mirror of https://github.com/schoebel/mars
IMPORTANT: the historic distinction between MARS Light and the future
MARS Full has been dropped. Now all versions are simply called "mars".

Old tagnames light* will remain valid, but newer names will follow the
convention s/light/mars/g (this means that the old version number counting
will be continued, only the "light" is substituted).

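The renaming convention can be illustrated with a minimal Python sketch.
The helper name modernize_tagname and the sample tag are made up for
illustration; only the s/light/mars/g substitution itself comes from the
text above.

```python
def modernize_tagname(tag: str) -> str:
    """Apply the s/light/mars/g convention to an old tag name."""
    # Hypothetical helper, not part of MARS: a plain global substitution.
    return tag.replace("light", "mars")


if __name__ == "__main__":
    # Illustrative tag of the form light0.1abeta%d.
    print(modernize_tagname("light0.1abeta7"))
```

The version number counting is untouched, only the name prefix changes.
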
Meaning of stable tagnames
--------------------------

Example: mars0.1stable01:
  0      = version of on-disk data structures
           (only incremented when downgrades are impossible)
           (not incremented on backwards-compatible upgrades)
  1      = version of feature set
  stable = feature set is frozen during this series
  01     = bugfix revision

Example: mars0.2beta2.3:
  The general idea is as before.
  "beta" means that new features are roughly tested
  in the lab, but not in production, so there may be
  some bugs. New features may be added during
  the beta phase.

Example: mars0.3alpha*:
  Never use this for production. Only for historic
  code inspection.

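The tagname scheme above can be taken apart mechanically. The following
Python sketch is only an illustration under stated assumptions: the
regular expression, the Tag type and parse_tag() are hypothetical names,
and the optional series letter (as in mars0.1astable125) is inferred from
the tag names used in this ChangeLog.

```python
import re
from typing import NamedTuple

# Assumed pattern: prefix, on-disk version, feature version,
# optional series letter, phase, revision (kept as a string).
TAG_RE = re.compile(
    r"^(?P<prefix>mars|light)"
    r"(?P<disk>\d+)\.(?P<feature>\d+)(?P<series>[a-z]?)"
    r"(?P<phase>stable|beta|alpha)"
    r"(?P<revision>\d+(?:\.\d+)*)$"
)


class Tag(NamedTuple):
    disk_format: int   # version of on-disk data structures
    feature_set: int   # version of feature set
    series: str        # "" or a letter such as "a"
    phase: str         # "stable", "beta" or "alpha"
    revision: str      # bugfix revision, e.g. "01" or "2.3"


def parse_tag(name: str) -> Tag:
    """Split a tag name into its components, or raise ValueError."""
    m = TAG_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognized tag name: {name!r}")
    return Tag(int(m["disk"]), int(m["feature"]),
               m["series"], m["phase"], m["revision"])


if __name__ == "__main__":
    for name in ("mars0.1stable01", "mars0.2beta2.3", "mars0.1astable125"):
        print(name, parse_tag(name))
```

The revision is kept as text so that leading zeros and dotted beta
revisions survive unchanged.
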
Release Conventions / Branches / Tagnames
-----------------------------------------

mars0.1 series (now EOL):
  - Unstable tagnames: light0.1beta%d.%d (obsolete)
  - Stable branch:     mars0.1.y (obsolete)
  - Stable tagnames:   mars0.1stable%02d (obsolete)

mars0.1a series (stable):
  New master branch. Now stable.
  This branch has been operational for several years on
  several thousands of servers, and several petabytes
  of data.
  - Unstable tagnames: light0.1abeta%d (obsolete)
  - Stable branch:     mars0.1a.y
  - Stable tagnames:   mars0.1astable%02d

mars1.0 series (planned):
  - Replace symlink tree by transactional status files
    (future-proof)
    This is required for upstream merging to the kernel.
    It has further advantages, such as better scalability.
  - Trying to additionally address public needs.
  - Potentially for Linux kernel upstream.
  - Unstable tagnames: mars1.0beta%d.%d (planned)
  - Stable branch:     mars1.0.y (planned)
  - Stable tagnames:   mars1.0stable%02d (planned)

WIP-* branches are for development and may be rebased onto anything
at any time without notice. They will disappear eventually.
Never use them for production!

*stable* branches mean the following:

  - Heavily tested. Has to obey an HA SLA of 99.98% end-to-end,
    including network outages and HumanError(tm) at 1&1 Ionos
    ShaHoLin. Thus the _component_ SLA of MARS must be much better.
  - There is always an upgrade path. Simply install the new
    version, obeying the compatibility rules below.
  - Rolling upgrades (temporarily different MARS kernel module
    versions at primary vs secondary side) are supported.
    Typically, do "rmmod mars; modprobe mars" at the secondary
    side first, then handover, then do the same at the former
    primary side.
    Or, of course, you may combine it with (typically
    security-triggered) rolling kernel reboots.
    I am putting high effort into maintaining rolling upgrades
    of kernel modules. The network protocols are designed to
    support this.
  - COMPATIBILITY RULES:
    Ensure that $marsadm_version >= $module_version.
    This is the safe side of your update strategy.
    Update marsadm first, before updating the kernel module.
    This way, the controls for newer features are already in
    place when the new kernel module is activated (no blind
    flight).
    Since marsadm is a plain Perl script with _no_ dependencies
    on anything else, this is something I can reasonably expect
    from users.
    REASON: ensuring forever backwards compatibility to stone-aged
    marsadm versions would make me ill. I cannot change old versions
    anymore, but just provide new versions. I cannot ensure and
    test all possible O(n^2) combinations of marsadm versions with
    kernel module versions to work eternally for all times when
    marsadm would be frozen, or even all O(n^3) combinations of
    frozen marsadm with mixed-operations kernel modules.
    The development of MARS would be hindered by too old marsadm
    versions, since my effort would grow quadratically or
    even worse.
    Hint: nevertheless, many combinations of old marsadm with newer
    kernel module versions are working anyway, in particular when
    the gap is a small $epsilon. But I cannot guarantee this in general.
    If you want to violate the above rule, you must test the
    combination yourself.
  - Best practice in bigger installations: test your upgrade
    or downgrade at some test clusters first.
    If you have a separate pre-live stage, it definitely is
    your friend.
  - As long as $marsadm_version + $epsilon >= $module_version
    remains true (at least "approximately") and has been tested
    in pre-live, marsadm may be upgraded and downgraded
    independently from kernel, and during operations
    (best via your favorite package manager).
    Of course, no magic will happen: newer features are only
    available when newer versions of both the userspace tool and
    the kernel modules are installed.
  - Please check this ChangeLog for any upgrade / downgrade
    incompatibility bugs. In case they are detected, they will
    be fixed. But I cannot retrospectively change already released
    versions and their bugs. Fixes are only possible in newer
    versions.
  - Downgrade is possible *inside* of the same stable branch
    series.
  - Downgrade to _prior_ *stable* branches may be restricted,
    or may require some extraordinary actions.
    Please read this ChangeLog for details.

Example: a new future-proof internal deletion format has been
introduced in mars0.1astable88. It is off by default.
If you never activate it, you can downgrade inside of mars0.1astable*
as you like.
Only if you actually activate it, and if you really need to
downgrade beyond that old version, you have to obey the
downgrade instructions documented below.

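The compatibility rule $marsadm_version >= $module_version from the
*stable* branch rules can be checked mechanically for stable tags. The
following Python sketch is not part of marsadm: stable_key() and
marsadm_is_new_enough() are hypothetical names, and comparing the series
letter lexicographically (so that 0.1a sorts after 0.1) is an assumption.

```python
import re

# Assumed shape of stable tags, e.g. mars0.1astable104.
_STABLE_RE = re.compile(r"^mars(\d+)\.(\d+)([a-z]?)stable(\d+)$")


def stable_key(tag: str) -> tuple:
    """Turn a stable tag into a tuple that sorts in release order."""
    m = _STABLE_RE.match(tag)
    if m is None:
        raise ValueError(f"not a stable tag: {tag!r}")
    disk, feature, series, revision = m.groups()
    return (int(disk), int(feature), series, int(revision))


def marsadm_is_new_enough(marsadm_tag: str, module_tag: str) -> bool:
    """True when marsadm is at least as new as the kernel module."""
    return stable_key(marsadm_tag) >= stable_key(module_tag)


if __name__ == "__main__":
    # marsadm newer than the module: the safe side.
    print(marsadm_is_new_enough("mars0.1astable104", "mars0.1astable103"))
    # marsadm older than the module: test the combination yourself.
    print(marsadm_is_new_enough("mars0.1astable85", "mars0.1astable104"))
```

Such a check could run in a deployment pipeline before the kernel module
is updated. It does not model the $epsilon tolerance mentioned above.
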
-----------------------------------
Changelog for series 0.1a:

This is the new master branch, starting January 2019.
The old stable branch mars0.1.y is EOL,
now fully superseded by this branch.

mars0.1astable125
  * Critical fixes: over a very long time, internal int counters
    could wrap around into negative numbers and cause kernel Oops.
  * Critical safeguard: about once per 1 million operation hours,
    a stacktrace was observed in copy_endio().
    At the moment, I have no reproducer for the very spurious
    bug. Hopefully it is fixed now.
  * Minor improvement: alternation between sync and replay
    now avoids unnecessary waiting.

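The counter wrap-around mentioned for mars0.1astable125 can be
reproduced in miniature. This Python sketch only demonstrates how a
32-bit signed C int behaves when it is incremented past its maximum. It
assumes a 32-bit int and says nothing about which MARS counters were
affected; add_as_c_int() is a hypothetical helper.

```python
import ctypes

INT_MAX = 2**31 - 1  # largest value of a 32-bit signed int


def add_as_c_int(counter: int, increment: int) -> int:
    """Return counter + increment as a 32-bit signed int would store it."""
    # ctypes truncates to 32 bits without raising an overflow error.
    return ctypes.c_int32(counter + increment).value


if __name__ == "__main__":
    print(add_as_c_int(INT_MAX - 1, 1))  # still positive
    print(add_as_c_int(INT_MAX, 1))      # wraps to a negative number
```

A counter that only ever grows will reach this point eventually, which
is why such bugs show up only after a very long time.
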
mars0.1astable124
  * Major improvement: support for LTS kernels 4.19 and 5.4.
    A new pre-patch generation obeying the new ksys_* conventions
    has been added. This will help further porting in the future.
    IMPORTANT: do NOT OMIT the fix for upstream bug
    0001-sched-wait-fix-endless-kthread-loop-at-timeout.patch
    from directory pre-patches/vanilla-*/ .
    Leaving out this fix may SERIOUSLY HARM your experience,
    due to kernel soft lockups happening when the network
    is interrupted, i.e. exactly during certain types of incidents.
    Anyway, please also use 0001-mars-v2-minimum-pre-patch-for-mars.patch
    because IO performance is MUCH WORSE without this pre-patch.

mars0.1astable123
  * Major fix: under very unlikely conditions, deadlock
    of the logger IO scheduling was possible.
    Never observed during millions of total operation hours.
    Please update for maximum HA safety.
  * Minor fix: scarce divide by zero in IO or network throughput
    limiters (normally not used) was possible under very
    special circumstances.

mars0.1astable122
  * Minor improvement (possibly a regression from 0.1astable116),
    observed in a very tricky situation where _both_ the primary
    and secondary RAIDs were heavily degraded at the same time:
    The replay could take too much preference over sync, leading
    to quasi-starvation of sync.
    Workaround was possible by temporarily setting
    /proc/sys/mars/sync_flip_interval_sec to 0, and manually
    switching between replay and sync.
    Now awful RAID degradation should be handled more gracefully.
  * Minor improvement, stimulated by Gabriel Francisco:
    Before marsadm asks DNS, first /etc/hosts is consulted
    via /usr/bin/getent.

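The getent-based lookup can be sketched as follows. marsadm itself is a
Perl script, so this Python code is not its implementation. It merely
shows the idea of asking /usr/bin/getent, which follows the system's
name service configuration and therefore typically reads /etc/hosts
before DNS. The function names are hypothetical.

```python
import subprocess


def parse_getent_hosts(output: str):
    """Return the first address from `getent hosts` output, or None."""
    for line in output.splitlines():
        fields = line.split()
        if fields:
            return fields[0]  # first column is the address
    return None


def resolve_host(name: str, getent: str = "/usr/bin/getent"):
    """Look up a hostname via getent; None when it cannot be resolved."""
    result = subprocess.run(
        [getent, "hosts", name],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    return parse_getent_hosts(result.stdout)


if __name__ == "__main__":
    print(resolve_host("localhost"))
```

Splitting the parsing from the subprocess call keeps the parsing
testable on machines without getent.
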
mars0.1astable121
  * Fix scarce use-after-free, only observed at rmmod operations
    under KASAN test kernel 4.14.
    For maximum safety, please update to this version.

mars0.1astable120
  * Fix build with LTS kernel 4.14.
    IMPORTANT: you need to patch your 4.14 kernel sources with
    pre-patches/vanilla-4.14/0001-sched-wait-fix-endless-kthread-loop-at-timeout.patch
    Otherwise you will encounter _massive_ problems!

mars0.1astable119
  * Major systemd update, only relevant when you are using the
    systemd template generator:
    - Now works completely lockless. This should improve
      the parallelism degree and reduce the risk of deadlocks.
    - Now uses per-resource triggers for parallel incremental updates
      after {create,join,leave}-resource etc.
      Attention: check out the new templates in systemd-testing/
      (the old ones will no longer work).
    - New pseudo unit type .script : in place of being interpreted
      by systemd, you may now write some (wrapper) scripts for
      more complex operations, and/or for achieving idempotence.
      Although I would prefer native systemd units, I
      added this feature after I became desperate when trying to
      achieve true idempotence via native systemd units. After several
      months of fruitless attempts, I gave up and added .script .
      For details, please read the new docs.
      An example of .script can be found in the new systemd-icpu/ .
      Notice that "nodeagent" is a third-party tool which in turn
      may call systemctl for startup of LXC containers (after
      contacting a database, and a plethora of other things).
      Any attempts to call nodeagent via ExecStart= and co did
      not really work as they should.
    - Some new template engine features, like markers DEFAULT_START
      and much more.
  * Updated docs on systemd.
  * No other changes outside of the systemd area.

mars0.1astable118
  * Critical fix, only relevant for trial builds without
    pre-patch: kernel NULL deref, fixed by Gabriel Francisco.
    I am releasing this alone because the next release will
    take some more time.

mars0.1astable117
  * Minor fix, only relevant for k > 2 replicas:
    invalidate and log-purge-all could abort unnecessarily due to races.
    Workaround was possible by retrying the command.
  * Minor usability: new commands {de,}activate-guest for
    enabling or getting rid of temporary guests. Only relevant for
    clusters with > 2 members.
  * Some minor fixes and improvements.
  * Minor doc update (new commands).
  * Further dkms improvements / tuning from Gabriel Francisco.

mars0.1astable116
  * Critical fix, only relevant for cluster naming schemes where hostname A
    may be a _prefix_ of hostname B: such naming schemes could have led to
    a multitude of bizarre and unexplainable confusions and problems.
    Example: hostnames icpu-bs6 and icpu-bs60 .
    Please UPDATE when such hostnames may occur in the _same_ cluster.
    I was unable to find the bug via the test suite because such hostnames
    were not deployed at the test machines. Thanks to Stephan Christiany
    who pointed me at the problem.
  * Fix annoying bug: during long-lasting sync (several TB), the automatic
    flipping between replay and sync could sometimes get stuck in sync mode,
    and then /mars could fill up because replay was starving unnecessarily
    (waiting for sync to finish, which could take a long time).
    Workaround was possible by "pause-sync"; wait until replay has
    caught up; "resume-sync".

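The hostname prefix problem from mars0.1astable116 belongs to a general
class of bugs that is easy to show in a few lines. The ChangeLog does
not say where exactly the confusion arose inside MARS, so this Python
sketch only contrasts prefix matching with exact matching, using the
example hostnames from above. Both function names are hypothetical.

```python
def hosts_matching_prefix(names, host):
    """Pitfall: also returns hosts whose name merely starts with host."""
    return [name for name in names if name.startswith(host)]


def hosts_matching_exact(names, host):
    """Safe variant: compare whole hostnames only."""
    return [name for name in names if name == host]


if __name__ == "__main__":
    cluster = ["icpu-bs6", "icpu-bs60"]
    print(hosts_matching_prefix(cluster, "icpu-bs6"))  # both hosts match
    print(hosts_matching_exact(cluster, "icpu-bs6"))   # only icpu-bs6
```

A test suite catches such a bug only when it contains at least one pair
of hostnames where one is a prefix of the other.
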
mars0.1astable115
  * Critical regression from mars0.1astable113 / 114, only relevant
    when the new ssh-less peer operations are actually used:
    Race on peer thread creation could lead to kernel memory corruption.
    For maximum safety, please avoid the affected kernel module versions.
  * dkms improvements from Gabriel Francisco.

mars0.1astable114
  * Major usability: ssh-less {merge,split}-cluster.
    Now all cluster operations should work without ssh and
    its agent forwarding.
    Of course, you will need to update mars.ko and marsadm
    on all of your machines first.
  * Doc update: describe new options and behaviour.
  * Some smaller fixes / safeguards / improvements.

mars0.1astable113
  * Critical fix: deadlock was possible after receiving _corrupted_
    data over the network. Very unlikely to trigger, since there
    is a lot of other magic checking, but anyway.
    Now treated like any other communication error.
  * Major improvement: "marsadm primary" (also with --force) now does
    the equivalent of "up", after the operation has succeeded.
    This should be useful for people who forget to do the "up" manually
    after a manual unplanned failover.
  * Minor fix: race at {wait,update}-cluster leading to unnecessary
    abort.
  * Minor fix: update-cluster did not always transfer directories.
  * Minor fix: new join-cluster method could sometimes fail
    at the first try. Workaround by repetition.
  * Minor fix: primitive macros wait-todo-primary-{on,off} were
    documented, but not implemented.
  * Minor improvement: by the way, all missing combinations from
    {is,nr,todo}-secondary and
    wait-{is,todo}-{primary,secondary}-{on,off} are also implemented.
  * Minor improvement: try to automatically fetch any unknown
    peer info. May help after failed join-cluster & co.
  * Minor improvement: speedup new join-cluster method.
  * Minor doc update: describe new primitives.
  * Some smaller fixes and improvements.

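The brace patterns used for the macro names in mars0.1astable113 can be
spelled out explicitly. This Python sketch only expands the two patterns
as written. It does not tell which of the resulting names were missing
before this release. The helper expand() is hypothetical.

```python
from itertools import product


def expand(*parts):
    """Expand a brace pattern given as a sequence of alternative lists."""
    return ["".join(combo) for combo in product(*parts)]


# {is,nr,todo}-secondary
SECONDARY_MACROS = expand(["is", "nr", "todo"], ["-secondary"])

# wait-{is,todo}-{primary,secondary}-{on,off}
WAIT_MACROS = expand(
    ["wait-"], ["is", "todo"], ["-"],
    ["primary", "secondary"], ["-"], ["on", "off"],
)

if __name__ == "__main__":
    for name in SECONDARY_MACROS + WAIT_MACROS:
        print(name)
```

The second pattern yields 2 * 2 * 2 = 8 names, from wait-is-primary-on
to wait-todo-secondary-off.
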
mars0.1astable112
  * Critical fix: generic mars_readlink() failed with an
    extremely low probability, so it slipped through years
    of testing. My reproducer indicates that it "fixed" itself
    after a while, just leading to some unnecessary delays.
    Nevertheless, I mark it "critical" under a HA viewpoint,
    although most people likely might have never noticed it.
    Recommendation: please update.
  * Major fix, only relevant for k > 2 replicas:
    fetch could get stuck in cyclic dependencies for some
    time, making only slow progress.
  * Major fix: join-resource could loop when
    old method is selected and ssh was not working.
  * Minor fix: do not produce alive-timestamp & co on a fresh
    /mars, before {create,join}-cluster has been executed.
  * Several smaller fixes and improvements.

mars0.1astable111
  * Minor fix, only relevant for new deletions:
    Split-brain cleanup was sometimes stumbling over
    deleted logfiles. Workaround by cron.

mars0.1astable110
  * Minor improvement: new disk-error for better diagnosing
    any problems with disk setup / LVM etc.
  * Doc update (new macros etc).

mars0.1astable109
  * Regression from mars0.1astable106: when the old
    deletions were active, logfiles could be unlinked
    unnecessarily (displayed as Orphan).
    It did not really harm due to automatic re-fetching, but
    caused unnecessary network traffic.

mars0.1astable108
  * Improved metadata scalability.
  * Some smaller fixes and improvements.

mars0.1astable107
  * Critical regression from mars0.1astable106: use-after-free.
  * Fix use-after-free at rmmod.

mars0.1astable106
  * Major regression from mars0.1astable97: marsadm primitive
    disk-present erroneously reported the disk name in place of
    boolean value 0 or 1.
  * Minor fix for new deletions (beta):
    invalidate / re- join-resource were sometimes hanging
    in Orphan due to a conflict with the new deletions.
  * Minor improvements: somewhat improved scalability both
    in #resources and in #hosts.

mars0.1astable105
  * Minor marsadm regression from mars0.1astable104: race on
    _old_ deletions could lead to lost deletions. Workaround
    by repeating any affected commands, e.g. leave-resource.

mars0.1astable104
  * Major fix: marsadm did not obey an abort of certain phased
    commands when a single resource argument was given. As a result,
    a wrong exit code could be returned in such a case.
  * Minor fix: when beta feature logfile digests were disabled
    _during_ operations, already existing old logfiles were
    not always checked correctly at the secondary,
    reporting DefectiveLog (although they were healthy).
    Workaround by just enabling again and invalidate.
    With the fix, you may now replay the old logfiles :)
  * Minor fix: inherent race between join-resource and log-rotate
    (unavoidable in the Distributed System) could lead to split brain,
    or to hanging replay. Now compensated.
  * Minor fix: join-cluster without ssh was sometimes not
    updating the local link tree immediately.
  * Usability (BETA feature): improved scalability in #hosts.
    The below BETA feature warnings apply.
    Do not exceed the "officially documented" limits too much.
  * Usability: join-resource avoids unnecessary fallback
    to ssh / rsync.
    IMPORTANT: please update marsadm first, before updating the
    kernel module. See the above compatibility rules.
    This time the compatibility rules are important. I know that
    marsadm < 0.1astable85 no longer does a reliable join-resource,
    while combinations with old 0.1astable95 appear to work. There is
    no merit in bisecting old marsadm releases, instead of just
    fucking updating the old userspace script in a controlled manner.
  * Usability: more accurate IOPS and friends.
  * Several smaller fixes and improvements.

mars0.1astable103
  * Major regression from mars0.1astable99: secondary replay could
    hang unnecessarily due to a cascade of race conditions.
    AFAICS consistency was not affected (thanks to md5 checksumming).
    Observed with a specific load pattern at less than 1% of resources,
    or on average after ~ 120 operation hours when logrotate
    was 12 times per hour. Unfortunately, it slipped through all my
    release tests due to relatively low trigger probability.
    Workaround by "invalidate", which is however not a good solution.
    Please avoid kernel module versions between *99 and *102
    for production.

mars0.1astable102
  * Major usability (BETA): scalability in number of hosts.
    It should have no visible side effect in functionality,
    but better non-functional properties.
    Tested in the _lab_ with 1000 additional dummy hosts
    and additionally 8000 dummy resources in total.
    BETA WARNING: at the moment, there are no practical experiences.
    There might be problems which might not show up during lab tests.
    Do not blindly roll out or merge-cluster big masses in production.
    I will tell you when practical experiences allow for raising
    the "official" limits as documented in the user manual.

mars0.1astable101
  * Major usability: join-cluster now works without ssh.
    Of course, you need to roll out the new marsadm and
    the new mars.ko first, and to modprobe it at any
    pre-existing cluster.
    The new feature is automatically activated when you
    modprobe _before_ doing join-cluster. By running
    join-cluster first (without modprobe), you can fall back
    to the old ssh + rsync based method.
    Important: now you can modprobe before /mars/uuid is
    created or retrieved. Previously, you could accidentally
    try the wrong sequence "modprobe mars; mount /mars"
    without harm because it was denied by missing uuid, but now
    such illegal attempts would result in a big fuckup.
    Such a fuckup is now prevented by always insisting on
    /mars being a mountpoint.
    This might break old ill-behaved scripts or buggy /etc/fstab
    or racy systemd dependencies, which need to be fixed.
    Always ensure that no modprobe is attempted before /mars
    has been mounted in a race-free and reboot-safe manner.
    Notice: merge-cluster and split-cluster are not yet
    ssh-free zones. This will be addressed in a later release.
  * Minor usability: show age of any hanging /dev/mars/
    IO requests. This is useful for diagnosing faulty RAID
    controllers etc.
  * Lots of further minor fixes and improvements.

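The requirement that /mars must be a mountpoint before any modprobe can
also be checked by your own startup scripts. The following Python sketch
is an illustration only; safe_to_modprobe() is a hypothetical name, and
the check performed inside the MARS kernel module may work differently.

```python
import os


def safe_to_modprobe(mars_dir: str = "/mars") -> bool:
    """True only when mars_dir exists and is a mountpoint."""
    # os.path.ismount() returns False for missing paths as well.
    return os.path.ismount(mars_dir)


if __name__ == "__main__":
    if safe_to_modprobe():
        print("/mars is mounted, modprobe mars may proceed")
    else:
        print("/mars is not a mountpoint, refusing to modprobe mars")
```

A wrapper script could run this check right before "modprobe mars", so
that a racy boot sequence fails early instead of loading the module on
an unmounted directory.
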
mars0.1astable100
  * Minor fix: UpToDate was not reported in a very weird
    corner case.
  * Minor fix, only relevant when the new deletion method
    is enabled: leave-resource did sometimes not delete
    all superfluous logfiles at the other peers, sometimes
    not clearing a split brain situation immediately.
    Workaround by cron which did the cleanup later.
  * Minor usability: reduced verbosity of "marsadm view all"
    with respect to the new compression / digest features.
    Full info can be obtained with --verbose.
  * Minor fix, only observed at join-cluster without ssh:
    Not all symlink infos were transferred in a corner case.
  * Further minor fixes and improvements.

mars0.1astable99
  * Minor fixes: some more corner cases of unnecessary
    split brain rarely occurring after fatal primary
    crashes.

mars0.1astable98
  * Minor regression from mars0.1astable97: when old
    kernel modules < mars0.1astable97 were combined with
    exactly that marsadm version, the presence of
    /dev/mars/$resource was detected incorrectly.
    Do not use exactly that combination. Simply skip
    the marsadm version mars0.1astable97.
    Other version combinations are still possible for independent
    and rolling updates of kernel and marsadm.
    Best practice: first update marsadm to mars0.1astable98
    or newer, so this bug is fixed, and then your rolling
    kernel updates will work again for updating or even
    downgrading old kernels.
  * Minor fix: in a hardly reachable corner case, detach
    was hanging. Workaround by rmmod was possible.
  * Minor fix: spurious races at join-resource without ssh could
    occur, so it sometimes did not notice that a new resource
    was added in the meantime. Usage of ssh, or just retrying
    was helpful. Thus hardly relevant in practice.
  * Various minor fixes and improvements. Some masked bugs,
    not visible, only triggerable by a future version of MARS.

mars0.1astable97
  * Critical fix: when logfile is damaged (e.g. after a
    primary crash), some corner cases of primary recovery
    could hang. Workaround by "detach ; attach" seemed
    possible (as far as observed during testing).
  * Critical fix for BETA feature network compression only:
    Memory deallocation could fail under certain circumstances,
    resulting in a memory leak, or potentially memory corruption.
    Only relevant when network transport compression is enabled.
  * Major fix: when a primary crash was occurring exactly during
    a very short log-rotate time window, a race condition could
    sometimes lead to unnecessary split brain (secondaries could
    bypass the primary).
  * Several minor fixes and improvements.

mars0.1astable96
  * Minor improvement: auto-correct defective symlink
    timestamps which are too far in the future.
    This can happen when running with a defective CMOS
    hardware clock, e.g. after a fatal hardware failure, and
    before ntpd has corrected the local clock.
  * Minor usability: more pretty formatting of compression
    and digest flags in "marsadm view".

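Symlink timestamps lying in the future can be spotted from userspace.
The following Python sketch only detects them and does not correct
anything. The ChangeLog does not state the threshold used by MARS, so
tolerance_sec is an assumption, and future_symlinks() is a hypothetical
name.

```python
import os
import time


def future_symlinks(root, tolerance_sec=3600.0, now=None):
    """List symlinks below root whose mtime is beyond now + tolerance."""
    if now is None:
        now = time.time()
    limit = now + tolerance_sec
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Symlinks show up in either list, depending on their target.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat() reads the timestamp of the link itself.
            if os.path.islink(path) and os.lstat(path).st_mtime > limit:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    for path in future_symlinks("/mars"):
        print(path)
```

Passing an explicit now value makes the check reproducible, for example
when comparing against a trusted time source instead of the local clock.
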
mars0.1astable95
  * Minor fix: sometimes, in a hardly relevant corner case,
    join-resource could abort unnecessarily.
  * Minor improvement: marsadm view now distinguishes role ForcedPrimary
    from plain Primary. This could help a larger team of sysadmins
    to notice a potentially upcoming SplitBrain earlier, even while the
    network is interrupted, so any actual SplitBrain cannot be
    detected, although it may be suspected.
  * Reduce footprint of some deprecated marsadm functions
    and macros.

mars0.1astable94
  * Major regression from mars0.1astable86:
    Memory leak in remote communication.
    This could accumulate over a longer time. Please update when
    affected.

mars0.1astable93
  * Minor improvement: in some special cases, secondaries
    may now follow primaries having a damaged logfile.

mars0.1astable92
  * Major improvement from an operational perspective:
    "marsadm view all" now reports the current status of
    /dev/mars/mydata in human-readable form, including
    the Open status, the current IOPS, the number of currently
    flying IO requests = IO queue length = indicator for IO problems
    or overload, and any error information.

mars0.1astable91
  * Major features, disabled by default:
    - Network transport compression.
      May relieve network bottlenecks.
    - Transaction logfile payload compression.
      May improve the filling speed of /mars.
  * Major feature, enabled by default:
    - More logfile checksumming digests, some
      consuming less CPU.
  * Rough benchmarks, supporting your activation decisions.
    Please read mars-user-manual.pdf for instructions.
    Rolling updates with mixed versions are supported.

mars0.1astable90
  * Minor improvement: more reactiveness. This release
    is meant as an anchor point in case you would need
    a downgrade.

mars0.1astable89
  * Minor improvement: better kernel module reactiveness.
    More on scalability is in the dev pipeline.
    For now, use marsadm --timeout=300 or similar when
    stretching the official limits (but don't stretch too
    much until I have improved all relevant parts).

mars0.1astable88
  * New experimental scalability feature, deactivated
    by default:
    New deletion method, uses the special symlink value
    ".deleted" as a marker for logically deleted symlinks.
    This leads to a _massive_ simplification of code,
    and improves scalability for future masses of
    resources and/or cluster hosts.
    After updating both mars.ko and marsadm, you may
    activate it via marsadm option --delete-method=0
    but ONLY FOR TESTING.
    I will tell you when it will be stable enough for
    production. Sometime in the future, it will hopefully
    become the default, and eventually the old complex code
    can be hopefully purged after the whole world
    uses the new method.
    Note: when never activated, it should not have any
    influence on old-style production. Both methods
    can be used in parallel on different clusters.
    So you can activate it on some test clusters first.
    Do not _directly_ rollback to old mars.ko and/or marsadm versions
    after activation. First deactivate the feature via
    --delete-method=1, then wait for a few hours until marsadm cron
    has done purging. "find /mars -type l -ls" must no longer report
    any "-> .deleted" values anywhere in the entire cluster.
    Then you can roll back to old releases.
  * Doc: small update on new marsadm command link-purge-all.

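The rollback precondition from mars0.1astable88 can be verified with a
small script instead of reading the output of "find /mars -type l -ls"
by eye. This Python sketch is an illustration, not a MARS tool:
find_deleted_markers() is a hypothetical name, and it only inspects the
local /mars, so it has to be run on every host of the cluster.

```python
import os


def find_deleted_markers(root: str = "/mars"):
    """Return all symlinks below root that point to ".deleted"."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk() does not follow symlinks by default; dangling
        # links such as "-> .deleted" are listed among the files.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and os.readlink(path) == ".deleted":
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    leftovers = find_deleted_markers()
    if leftovers:
        print("not yet safe to roll back, markers remain:")
        for path in leftovers:
            print("  " + path)
    else:
        print("no .deleted markers found on this host")
```

Only when the list is empty on all hosts is the condition described
above fulfilled.
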
mars0.1astable87
  * Minor fix: unnecessary split brain could result from a race
    between handover and log-rotate / cron.

mars0.1astable86
  * Minor improvement: speedup metadata traffic avoiding
    some O(n^2) internal algorithms.

mars0.1astable85
  * Minor improvement: avoid ssh / rsync at join-resource.
    Only when ordinary communication over port 7777 (default)
    fails, fallback to ssh connections.
  * Minor marsadm speedup by avoidance of unnecessary
    sleep times.
  * Minor fix: ensure that primary --force works even when a
    logfile was truncated forcefully.
  * Minor fix: use-after-free reported by KASAN, only
    triggerable with a future development version, not
    observed with the current stable version.
    I include it here for safeguarding.
  * Minor doc updates. Explain fundamental requirements for
    geo-redundancy, and some background on cost comparisons.

mars0.1astable84
  * Major improvement: try to automatically self-repair
    any defective logfile at secondaries, by fetching again
    from primary.
    This can only work when the version at the primary is
    healthy.
    When successful, "invalidate" is no longer necessary.

mars0.1astable83
  * Major improvement: new marsadm option --parallel can drastically
    speed up handover, provided that the rest of your infrastructure
    can deal with parallelism. Several cluster managers are
    known to have problems with that. So be careful, do not
    blindly use this feature!
    Future releases will try to improve the systemd interface
    such that parallelism is possible without problems.
  * Doc updates: describe dimensioning of storage networks
    and its realtime behaviour, against the background of Kirchhoff's
    law. Neglecting this may lead to much higher cost than
    necessary, and may lead to a variety of operational problems,
    up to failures of projects.
    Also, working with wrong definitions of Cloud Storage can lead
    to a similar effect.
    Recommended reading!

mars0.1astable82
  * Major improvement: the mars_main kernel thread is now working
    non-blocking in practically all relevant cases. Some more cases
    will be addressed in the future.
    Testing with 32 resources in parallel is now working, and even
    64 resources appear to work in the lab, although somewhat slower
    (on typical server iron).
    "marsadm primary all" is now much faster.
    More future improvements to come. Currently, "marsadm primary all"
    uses an internal barrier synchronisation model, which may lead
    to unnecessary waiting time for faster resources. There are
    plans to address this in future releases.
    ATTENTION! You will need NEW VERSIONS of your pre-patch.
    This will automatically adjust /proc/sys/fs/aio-max-nr to higher
    values when needed. If you don't use the new pre-patch, you will
    need to tune /proc/sys/fs/aio-max-nr yourself. Otherwise
    you will get serious operational deadlocks due to virtual
    resource limitations, even with only 32 resources, but a
    higher number of replicas.
    Since there is no practical experience yet (the biggest known
    productive installation uses only 24 resources), I do not yet
    increase the official limits as documented in the appendix of
    mars-user-manual.pdf.
    Although very slow due to some O(n^2) algorithms, 128 resources
    are just surviving now, without bombing or deadlocking, but are
    not yet really usable.
    Therefore, do not try to stretch the official limits too much.
    Please report any success stories (or problems) in case you
    are using some more resources _productively_.
  * Minor doc improvements. New slides from LCA2020 added.

mars0.1astable81
  * Minor doc improvement: explain why running MARS inside of VMs
    is a bad idea. Explain fully managed geo-location transparency
    of VMs.

mars0.1astable80
|
|
* Compatibility up to kernels <= 4.14.
|
|
Attention! There is a bug in upstream kernels >= 4.11, leading
|
|
to an endless loop in kernel mode under certain preconditions.
|
|
The fix is in pre-patches/vanilla-4.14/0001-sched-wait-fix-*
|
|
If you _forget_ to apply this fix for _affected_ kernels, you may
|
|
get "operational fun" at the wrong moment: ordinary operations
|
|
will likely be unaffected, but a _silent_ network outage at the
|
|
wrong moment (race condition) may hang up your kernel at the
|
|
secondary site, just in the moment when you probably want to do
|
|
a failover.
|
|
LTS kernels 4.9 and earlier are not affected by the bug, although
|
|
potentially present also there, but it is a _masked_ (sleeping)
|
|
bug there.
|
|
I already submitted the fix to LKML, but unfortunately it has been
|
|
ignored up to now.
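Whether a given host falls into the affected range can be decided from its kernel release string. The following is a rough sketch under the assumption that only the lower bound (>= 4.11) matters. The changelog names no upstream kernel that already contains the fix, so no upper bound is applied. The function names are hypothetical.

```python
import re


def kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '4.14.78-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"cannot parse kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def needs_sched_wait_fix(release: str) -> bool:
    """True for kernels >= 4.11, the range named in this changelog entry."""
    return kernel_version(release) >= (4, 11)
```

On a running system, the release string can be obtained from platform.release() or from "uname -r".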
|
|
|
|
mars0.1astable79
|
|
* Critical fix: in a multiple-failure scenario which is hard
|
|
to reach, and then acting badly by disregarding
|
|
heavy warnings from marsadm and from mars-user-manual.pdf,
|
|
data consistency could be violated. Detected by testing
|
|
(the situation has not been observed in practice up to now).
|
|
When unsure, better update to this fixed version.
|
|
* Minor fix: in a scarce corner case plus an additional
|
|
scarce race, primary handover could hang.
|
|
* Major systemd interface fixes and improvements:
|
|
- When handover fails due to failed systemd stopping at
|
|
the old primary (e.g. hanging umount etc), the application
|
|
stack will be automatically restarted before the handover
|
|
operation reports timeout. The idea is to keep your
|
|
applications running whenever possible.
|
|
- New commands marsadm set-systemd-want and get-systemd-want
|
|
for a temporary shutdown of the systemd unit stack.
|
|
This is useful e.g. for performing an fsck.
|
|
- Implemented transitive closure of indirectly referenced
|
|
further systemd units.
|
|
- Attach / detach now automatically starts / stops the
|
|
systemd unit stack.
|
|
- Improved reliability of systemd handover.
|
|
- Fixed many bugs in the systemd template macro processor.
|
|
- Updated doc accordingly.
|
|
|
|
mars0.1astable78
|
|
* Major or minor fix: memory leak, triggered under scarce conditions.
|
|
Observed cases were a few kilobytes. However, it could accumulate
|
|
over a very long time. When unsure, better update to this version.
|
|
* Minor usability: report each resource size.
|
|
|
|
mars0.1astable77
|
|
* Major doc update: the old mars-manual.pdf has been split into
|
|
- mars-user-manual.pdf (for sysadmins)
|
|
- mars-architecture-guide.pdf (for managers and architects)
|
|
- mars-for-kernel-developers.lyx (unfinished)
|
|
- football-user-manual.lyx
|
|
The first two manuals have been heavily rewritten and
|
|
extended!
|
|
* Minor fix: after primary crash without failover, the secondaries
|
|
could get stuck because a version symlink was not
|
|
updated under scarce preconditions.
|
|
* Minor improvement: emergency space calculation is now more
|
|
accurate.
|
|
* Minor usability: hint when marsadm resize would be possible.
|
|
* Several minor cosmetic improvements.
|
|
|
|
mars0.1astable76
|
|
* Major fix: when the primary was dead and the
|
|
secondary had an incomplete logfile which was
|
|
not recognized as being damaged, "primary --force"
|
|
did not always work under all circumstances.
|
|
* Minor fix: some config information was not
|
|
replicated throughout the cluster.
|
|
Ordinary users were typically not affected.
|
|
* Minor improvement: marsadm view now shows
|
|
the replication degree [$x/$y] at each individual
|
|
resource.
|
|
* Added slides from FrOSCon2019.
|
|
|
|
mars0.1astable75
|
|
* Major fix, only relevant for a scarce corner case:
|
|
When overflowing the kernel fscache with gigabytes of
|
|
data, and when a few more weird preconditions were met,
|
|
it was possible to potentially eat up the whole kernel
|
|
memory and to trigger OOM.
|
|
Notice: depending on kernel version, and depending on various
|
|
overload scenarios, you may trigger OOM anyway, independently
|
|
from MARS.
|
|
* Minor fix: marsadm now is reporting the amount of
|
|
Writeback data (as necessary for the Recovery phase after
|
|
a crash) more precisely.
|
|
* Minor improvement: speedup IOPS by better internal
|
|
hash dimensioning.
|
|
|
|
mars0.1astable74
|
|
* Full merge of EOL release mars0.1stable74,
|
|
which was the last stable release in EOL branch
|
|
mars0.1.y.
|
|
* Major fix, only relevant for a corner case:
|
|
Writeback made no human-visible progress under
|
|
multiple weird preconditions.
|
|
* Minor fix: ssh connections should be more robust
|
|
when clumsy firewalls are leading to ssh hangs.
|
|
* Minor usability improvement: marsadm view shows
|
|
more fancy details on logfile numbers.
|
|
* Minor speedups in internal infrastructure.
|
|
* Football subproject: update to Football-2.0
|
|
|
|
mars0.1astable73 (merged from mars0.1stable73)
|
|
* Critical fix, only relevant for kernels >= 4.2.x:
|
|
NULL deref occurs systematically when more than 64
|
|
file handles are being allocated.
|
|
There is already an upstream bugfix in linux-next
|
|
(missing initializer for resize_wait in fs/file.c).
|
|
Since this fix is missing in many LTS and distro kernels
|
|
(at the moment), I added a workaround in MARS.
|
|
Recommendation: anyone operating MARS on newer kernels
|
|
should update to mars0.1astable73 for safe operations.
|
|
Don't leave this unfixed. It can explode at the worst
|
|
moment, and restoring operations may only be possible
|
|
by completely giving up a secondary host, or with a fix.
|
|
|
|
mars0.1astable72 (merged from mars0.1stable72)
|
|
* Minor fix: writeback improved in a corner case.
|
|
* Minor improvement: display WriteBack data amount in
|
|
marsadm view.
|
|
* Major doc improvement: describe IO performance tuning.
|
|
|
|
mars0.1astable71 (merged from mars0.1stable71)
|
|
* Major fix: writeback at the primary was unnecessarily
|
|
slow in certain situations.
|
|
|
|
mars0.1astable70 (merged from mars0.1stable70)
|
|
* Critical fix: a few upper-layer kernel components are
|
|
allocating struct bio on the stack. This led to stack memory
|
|
corruption. If you ever had this problem, you certainly have
|
|
noticed it ;) Thus it should not have affected your data.
|
|
Unfortunately, I got no bug reports about this for several years.
|
|
Discovered when testing compatibility to very new kernels,
|
|
and now hopefully fixed.
|
|
* Major fixes: the systemd interface was not in a mature state.
|
|
Now improved a lot. More improvements are likely to follow
|
|
in the next months.
|
|
* Minor clarification: build for ancient kernel 2.6.32 was broken.
|
|
Fixing the build was no problem, but then the resulting kernel
|
|
deadlocked in certain situations (sb_mount mutex and sisters).
|
|
The reason is that stacking of filesystem instances (like
|
|
/vol/mydata relying on IO to /mars) is a pain in the very old
|
|
kernel architecture.
|
|
Any upstream kernel before 3.16 is EOL right now. Nevertheless,
|
|
I am officially supporting 3.2 at the moment, and have tested it.
|
|
Anyway, productive use of ancient kernels is not
|
|
recommended, for various reasons.
|
|
Notice that you also need old gcc versions for building such
|
|
EOL kernels.
|
|
Thus I decided to remove support for 2.6.32 officially.
|
|
If somebody needs it _really_, please contact me.
|
|
|
|
mars0.1astable69 (merged from mars0.1stable69)
|
|
* Major improvement: compatibility to upstream kernel 4.9.x.
|
|
|
|
mars0.1astable68 (merged from mars0.1stable68)
|
|
* Minor fix: sometimes sync was advancing only slowly.
|
|
* Minor fix: in extremely rare cases and under further conditions,
|
|
detach could hang due to a race.
|
|
Workaround was possible by re-attaching.
|
|
* Minor improvement: /dev/mars/mydata now disappears only after
|
|
writeback has finished. Although the old behaviour was correct,
|
|
certain userspace tools could have erroneously concluded that
|
|
the primary has finished working. The new behaviour is
|
|
hopefully closer to user expectations.
|
|
* Minor improvement: propagate physical and logical sector
|
|
sizes from the underlying disk to /dev/mars/mydata.
|
|
This can help mkfs and other tools in making better
|
|
decisions about their internal parameters.
|
|
* Minor safeguard: disallow manual --ignore-sync override
|
|
when the target primary is inconsistent, only relevant
|
|
for (non-existent) sysadmins who absolutely don't know what
|
|
they are doing when they are combining this with --force.
|
|
Sysadmins who really know what they are doing can use
|
|
fake-sync in front of it, and then they are explicitly stating
|
|
once again that they really want to force a defective system,
|
|
and that they really know the fact that it is defective.
|
|
* Minor improvement: additional warning when network connections
|
|
are interrupted (asymmetrically), such as by mis-configuration
|
|
of network interfaces / routing / firewall rules / etc.
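The propagation of sector sizes can be verified from userspace by comparing what the kernel reports for the two block devices. This is a sketch only: it relies on the standard Linux sysfs attributes logical_block_size and physical_block_size, while the sysfs names of the MARS device and of the underlying disk depend on the installation and have to be supplied by the caller.

```python
from pathlib import Path

SYSFS_BLOCK = Path("/sys/block")


def sector_sizes(device: str, sysfs_root: Path = SYSFS_BLOCK) -> tuple[int, int]:
    """Return (logical, physical) sector size in bytes for a block device."""
    queue = sysfs_root / device / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical


def same_geometry(upper: str, lower: str, sysfs_root: Path = SYSFS_BLOCK) -> bool:
    """True when both devices report identical sector sizes."""
    return sector_sizes(upper, sysfs_root) == sector_sizes(lower, sysfs_root)
```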
|
|
|
|
mars0.1astable67 (merged from mars0.1stable67)
|
|
* Minor fix: don't unnecessarily alert sysadmins when no systemd
|
|
unit files are installed.
|
|
* Minor doc update: new slides from LCA2019, updated old
|
|
slides from FrOSCon2018.
|
|
* Minor doc update: describe some more use cases, add some
|
|
advice for managers.
|
|
|
|
mars0.1astable66
|
|
* Merge mars0.1stable66. In detail:
|
|
* Critical fix, only relevant for kernels 4.3 to 4.4:
|
|
Due to a forgotten adaptation to newer kernels,
|
|
some userspace tools like xfs_repair could read/write
|
|
wrong data upon _large_ IO requests, and/or kernel memory
|
|
corruption could occur. Kernel-level filesystems
|
|
are typically _not_ affected because they typically use 4k
|
|
pages at maximum.
|
|
If you are operating such a kernel, please upgrade to
|
|
minimize any risks. You probably want userspace tools like
|
|
xfs_repair to not crash your kernel ;)
|
|
The problem was reproducibly detected at lab regression testing,
|
|
_before_ updating a big installation from kernel 3.16 to 4.4.
|
|
It did not show up with the old kernel.
|
|
Notice: kernels >4.6 are not yet supported at the moment,
|
|
but work on them will likely continue during the next
|
|
months. Stay tuned.
|
|
* Minor doc updates.
|
|
|
|
mars0.1abeta18
|
|
* Merge mars0.1stable65.
|
|
|
|
mars0.1abeta17
|
|
* Merge mars0.1stable64.
|
|
* Fix compiler warning at certain kernel versions.
|
|
|
|
mars0.1abeta16
|
|
* Merge mars0.1stable63.
|
|
|
|
mars0.1abeta15
|
|
* Merge mars0.1stable62.
|
|
|
|
mars0.1abeta14
|
|
* Merge mars0.1stable61.
|
|
|
|
mars0.1abeta13
|
|
* Minor feature: marsadm takes comma-separated list of
|
|
resource names in place of "all".
|
|
* Merge mars0.1stable60.
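The comma-separated resource list can be pictured as follows. This is an illustrative sketch, not the marsadm implementation: the function name is hypothetical, and the handling of whitespace and of unknown names is an assumption.

```python
def expand_resources(arg: str, known: list[str]) -> list[str]:
    """Expand 'all' or a comma-separated list into resource names."""
    if arg == "all":
        return list(known)
    names = [name.strip() for name in arg.split(",") if name.strip()]
    unknown = [name for name in names if name not in known]
    if unknown:
        # Assumption: unknown names are an error rather than silently skipped.
        raise ValueError(f"unknown resources: {', '.join(unknown)}")
    return names
```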
|
|
|
|
mars0.1abeta12
|
|
* Merge mars0.1stable59.
|
|
|
|
mars0.1abeta11
|
|
* Merge mars0.1stable58.
|
|
|
|
mars0.1abeta10
|
|
* Make IP_TOS compile-time configurable.
|
|
* Update doc on IP_TOS.
|
|
|
|
mars0.1abeta9
|
|
* Major feature: lowlevel TCP tuning, separately for traffic
|
|
types MARS_TRAFFIC_META (default port 7777),
|
|
and MARS_TRAFFIC_REPLICATION (default port 7778),
|
|
and MARS_TRAFFIC_SYNC (default port 7779).
|
|
* Merge mars0.1stable57.
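For firewall rules or monitoring, the three traffic types and their default ports can be kept in one place. The enum below is a sketch that only restates the defaults listed above. The Python names are hypothetical, and actual ports may differ where they have been reconfigured.

```python
from enum import Enum


class Traffic(Enum):
    """Default TCP ports of the MARS traffic types named in the changelog."""
    META = 7777         # MARS_TRAFFIC_META
    REPLICATION = 7778  # MARS_TRAFFIC_REPLICATION
    SYNC = 7779         # MARS_TRAFFIC_SYNC


def traffic_for_port(port: int) -> Traffic:
    """Map a default port back to its traffic type (ValueError if unknown)."""
    return Traffic(port)


def default_ports() -> list[int]:
    """Return all default ports in ascending order."""
    return sorted(member.value for member in Traffic)
```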
|
|
|
|
mars0.1abeta8
|
|
* Merge mars0.1stable56.
|
|
|
|
mars0.1abeta7
|
|
* Merge mars0.1stable55.
|
|
|
|
mars0.1abeta6
|
|
* Merge mars0.1stable54.
|
|
|
|
mars0.1abeta5
|
|
* Merge mars0.1stable53.
|
|
|
|
mars0.1abeta4
|
|
* Merge mars0.1stable52.
|
|
|
|
mars0.1abeta3
|
|
* Merge mars0.1stable51.
|
|
|
|
mars0.1abeta2
|
|
* Merge mars0.1stable50.
|
|
* Silence annoying false-positive network interruption messages.
|
|
|
|
mars0.1abeta1
|
|
* Merge mars0.1stable49.
|
|
* Several smaller fixes.
|
|
|
|
mars0.1abeta0
|
|
Forked off from 0.1balpha4.
|
|
Merge 0.1stable48 (in several intermediate steps).
|
|
Some infrastructure for version detection.
|
|
Backport of selected fixes from branch 0.1b.y.
|
|
Add marsadm split-cluster.
|
|
|
|
-----------------------------------
|
|
Changelog for the deprecated series 0.1b:
|
|
(only the part which has been merged with branch mars0.1a)
|
|
(notice that there were a few more historic branches which
|
|
were not really usable, and never went into production)
|
|
|
|
mars0.1balpha4
|
|
--------
|
|
* First improvements for scalability to thousands of nodes.
|
|
Not yet tested with really huge masses of nodes, only
|
|
with relatively small clusters.
|
|
* Merge fixes from mars0.1stable41 (see there)
|
|
* Doc update on socket bundling.
|
|
|
|
mars0.1balpha3.4
|
|
--------
|
|
* Merge fix from mars0.1stable40 (see there)
|
|
|
|
mars0.1balpha3.3
|
|
--------
|
|
* Merge fixes from mars0.1stable39
|
|
* Major fix: copy was sometimes hanging.
|
|
* Minor fix: unnecessary delay of metadata propagation.
|
|
* Performance improvements / bottleneck enhancements:
|
|
- Lamport clock
|
|
- Network
|
|
- md5 checksumming
|
|
* Userspace: faster logfile deletion via cron job.
|
|
|
|
mars0.1balpha3.2
|
|
--------
|
|
* Merge mars0.1stable38: now compiles without pre-patch
|
|
on certain kernel versions. Please read ChangeLog there.
|
|
|
|
mars0.1balpha3.1
|
|
--------
|
|
* Minor fix: deadlock on termination of copy thread.
|
|
|
|
mars0.1balpha3
|
|
--------
|
|
* Some tuning (more to come later):
|
|
* Speedup network by better corking.
|
|
* New scalable Lamport clock implementation.
|
|
|
|
mars0.1balpha2
|
|
--------
|
|
* Socket bundling (cherry-picked from mars0.2.y).
|
|
* Speedup copy processes (sync, logfile transfer).
|
|
* Speedup bio and md5 checksumming.
|
|
|
|
mars0.1balpha1
|
|
--------
|
|
* First improvements for scalability to more than 10 resources
|
|
per node. Already tested with 128 resources on a pair of nodes.
|
|
More improvements to come later.
|
|
No functional changes otherwise (from a sysadmin perspective).
|
|
Rollback to stable series 0.1 should be possible at
|
|
any time.
|
|
* Include fix from 0.1stable37.
|
|
|
|
mars0.1balpha0
|
|
--------
|
|
* Minor fix: the 1&1 specific feature set-sync-pref-list was
|
|
not used at all. Without it, the limitation feature for the sync
|
|
parallelism degree did not work correctly (without leading to harm,
|
|
other than suboptimal sync throughput / performance).
|
|
Removed the old _obsolete_ feature (for formal reasons,
|
|
this cannot be done in the 0.1stable branch).
|
|
Re-implemented the feature in a very simple form,
|
|
which is hopefully "obviously correct" now.
|
|
* Minor feature: please use "marsadm cron" as a fool-proof short form,
|
|
in particular at cron jobs.
|
|
|
|
-----------------------------------
|
|
Changelog for series 0.1:
|
|
|
|
Attention! This branch is now EOL.
|
|
Everything has been merged into branch mars0.1a.y which
|
|
is also the master branch.
|
|
PLEASE UPGRADE to the new branch.
|
|
Upgrade is easy: just roll out the new marsadm version,
|
|
install the new kernel modules, and load them where possible.
|
|
Mixed operation of different versions is no problem,
|
|
but is of course not the desired state, so keep this period
|
|
as short as possible.
|
|
Rollback is also easy.
|
|
|
|
Motivation: branch 0.1a has been in production for several years at 1&1.
|
|
Experiences: now runs provably better than 0.1.y with
|
|
better performance, smoother, etc.
|
|
|
|
mars0.1stable74 (last stable release in branch mars0.1.y)
|
|
* Major fix, only relevant for a corner case:
|
|
Writeback made no human-visible progress under
|
|
multiple weird preconditions.
|
|
* Minor usability improvement: marsadm view shows
|
|
more fancy details on logfile numbers.
|
|
|
|
mars0.1stable73
|
|
* Critical fix, only relevant for kernels >= 4.2.x:
|
|
NULL deref occurs systematically when more than 64
|
|
file handles are being allocated.
|
|
There is already an upstream bugfix in linux-next
|
|
(missing initializer for resize_wait in fs/file.c).
|
|
Since this fix is missing in many LTS and distro kernels
|
|
(at the moment), I added a workaround in MARS.
|
|
Recommendation: anyone operating MARS on newer kernels
|
|
should update to mars0.1astable73 for safe operations.
|
|
Don't leave this unfixed. It can explode at the worst
|
|
moment, and restoring operations may only be possible
|
|
by completely giving up a secondary host, or with a fix.
|
|
|
|
mars0.1stable72
|
|
* Minor fix: writeback improved in a corner case.
|
|
* Minor improvement: display WriteBack data amount in
|
|
marsadm view.
|
|
* Major doc improvement: describe IO performance tuning.
|
|
|
|
mars0.1stable71
|
|
* Major fix: writeback at the primary was unnecessarily
|
|
slow in certain situations.
|
|
|
|
mars0.1stable70
|
|
* Critical fix: a few upper-layer kernel components are
|
|
allocating struct bio on the stack. This led to stack memory
|
|
corruption. If you ever had this problem, you certainly have
|
|
noticed it ;) Thus it should not have affected your data.
|
|
Unfortunately, I got no bug reports about this for several years.
|
|
Discovered when testing compatibility to very new kernels,
|
|
and now hopefully fixed.
|
|
* Major fixes: the systemd interface was not in a mature state.
|
|
Now improved a lot. More improvements are likely to follow
|
|
in the next months.
|
|
* Minor clarification: build for ancient kernel 2.6.32 was broken.
|
|
Fixing the build was no problem, but then the resulting kernel
|
|
deadlocked in certain situations (sb_mount mutex and sisters).
|
|
The reason is that stacking of filesystem instances (like
|
|
/vol/mydata relying on IO to /mars) is a pain in the very old
|
|
kernel architecture.
|
|
Any upstream kernel before 3.16 is EOL right now. Nevertheless,
|
|
I am officially supporting 3.2 at the moment, and have tested it.
|
|
Anyway, productive use of ancient kernels is not
|
|
recommended, for various reasons.
|
|
Notice that you also need old gcc versions for building such
|
|
EOL kernels.
|
|
Thus I decided to remove support for 2.6.32 officially.
|
|
If somebody needs it _really_, please contact me.
|
|
|
|
mars0.1stable69
|
|
* Major improvement: compatibility to upstream kernel 4.9.x.
|
|
|
|
mars0.1stable68
|
|
* Minor fix: in extremely rare cases and under further conditions,
|
|
detach could hang due to a race.
|
|
Workaround was possible by re-attaching.
|
|
* Minor improvement: /dev/mars/mydata now disappears only after
|
|
writeback has finished. Although the old behaviour was correct,
|
|
certain userspace tools could have erroneously concluded that
|
|
the primary has finished working. The new behaviour is
|
|
hopefully closer to user expectations.
|
|
* Minor improvement: propagate physical and logical sector
|
|
sizes from the underlying disk to /dev/mars/mydata.
|
|
This can help mkfs and other tools in making better
|
|
decisions about their internal parameters.
|
|
* Minor safeguard: disallow manual --ignore-sync override
|
|
when the target primary is inconsistent, only relevant
|
|
for (non-existent) sysadmins who absolutely don't know what
|
|
they are doing when they are combining this with --force.
|
|
Sysadmins who really know what they are doing can use
|
|
fake-sync in front of it, and then they are explicitly stating
|
|
once again that they really want to force a defective system,
|
|
and that they really know the fact that it is defective.
|
|
* Minor improvement: additional warning when network connections
|
|
are interrupted (asymmetrically), such as by mis-configuration
|
|
of network interfaces / routing / firewall rules / etc.
|
|
|
|
mars0.1stable67
|
|
* Minor fix: don't unnecessarily alert sysadmins when no systemd
|
|
unit files are installed.
|
|
* Minor doc update: new slides from LCA2019, updated old
|
|
slides from FrOSCon2018.
|
|
* Minor doc update: describe some more use cases, add some
|
|
advice for managers.
|
|
|
|
mars0.1stable66
|
|
* Critical fix, only relevant for kernels 4.3 to 4.4:
|
|
Due to a forgotten adaptation to newer kernels,
|
|
some userspace tools like xfs_repair could read/write
|
|
wrong data upon _large_ IO requests, and/or kernel memory
|
|
corruption could occur. Kernel-level filesystems
|
|
are typically _not_ affected because they typically use 4k
|
|
pages at maximum.
|
|
If you are operating such a kernel, please upgrade to
|
|
minimize any risks. You probably want userspace tools like
|
|
xfs_repair to not crash your kernel ;)
|
|
The problem was reproducibly detected at lab regression testing,
|
|
_before_ updating a big installation from kernel 3.16 to 4.4.
|
|
It did not show up with the old kernel.
|
|
Notice: kernels >4.6 are not yet supported at the moment,
|
|
but work on them will likely continue during the next
|
|
months. Stay tuned.
|
|
* Minor doc updates.
|
|
|
|
mars0.1stable65
|
|
* Major fix, only observed during KASAN debugging:
|
|
Use-after-free which appears to splat only at Football
|
|
during final deletion of resources. Never observed at production.
|
|
Update if you are very cautious.
|
|
* A few minor fixes, not relevant for production.
|
|
* Minor doc improvements.
|
|
|
|
mars0.1stable64
|
|
* Major regression: split-brain detection did not display
|
|
correctly.
|
|
* Minor fix: rare race condition on O_NONBLOCK networking.
|
|
Only observed during testing with kernel 4.9 (sorry, _all_ the
|
|
adaptations are not yet ready for release, but it is making
|
|
progress now).
|
|
I am not sure whether this bug could also trigger with kernel
|
|
4.4 or earlier, therefore I am releasing the fix beforehand.
|
|
* Minor doc architectural explanations.
|
|
|
|
mars0.1stable63
|
|
* Minor fix: when compiling for some newer kernels (only there),
|
|
schedule() could be called during wait for some condition,
|
|
worsening performance unnecessarily.
|
|
* Minor improvement: starting join-resource in batches
|
|
was slow because each was waiting for cluster communication.
|
|
Use a manual "marsadm wait-cluster" before starting batches
|
|
of join-resource operations.
|
|
* Doc: some clarifications on BigCluster scalability behaviour.
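The advice on batches of join-resource operations amounts to one "marsadm wait-cluster" up front, followed by the individual joins. The sketch below only builds the command lines and does not execute anything. The resource names and device paths are hypothetical, and the argument order of join-resource (resource name, then local device) is stated here as an assumption to be checked against mars-user-manual.pdf.

```python
def batch_join_commands(resources: dict[str, str]) -> list[list[str]]:
    """Build the command sequence for joining several resources.

    `resources` maps a resource name to the local device path.
    """
    # One explicit cluster wait before the batch, as recommended above.
    commands = [["marsadm", "wait-cluster"]]
    for name, device in resources.items():
        commands.append(["marsadm", "join-resource", name, device])
    return commands


if __name__ == "__main__":
    # Hypothetical names and paths, printed only.
    plan = batch_join_commands({"vol1": "/dev/vg/vol1", "vol2": "/dev/vg/vol2"})
    for argv in plan:
        print(" ".join(argv))
```

Each returned list can be passed to subprocess.run() on a host where marsadm is installed.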
|
|
|
|
mars0.1stable62
|
|
* Minor fix: race between join-resource and log-rotate.
|
|
* Minor fix: report split brain logfile amount only when
|
|
actually detectable.
|
|
* Minor improvement: shift annoying error message over
|
|
to Orphan state detection.
|
|
* Football: update to Football-2.0-RC12
|
|
* doc: some updates.
|
|
|
|
mars0.1stable61
|
|
* Minor fix: in very rare cases where some symlinks are missing,
|
|
don't abort in try_to_avoid_splitbrain().
|
|
* Minor improvement: better human-readable numbers.
|
|
* Minor doc: more on asynchronous background operations.
|
|
|
|
mars0.1stable60
|
|
* Major improvement: new option --ignore-sync allows primary
|
|
Handover without --force even when some sync is running
|
|
somewhere. Any running syncs will restart from scratch
|
|
(which might take some time, depending on LV size and
|
|
many more factors like the network).
|
|
* Minor fix: split-cluster did not work correctly when no
|
|
resources existed anymore at all.
|
|
* Doc: major update. More explanation on CAP theorem, and
|
|
on differences / commonalities with DRBD.
|
|
|
|
mars0.1stable59
|
|
* Major fix: "marsadm up" did not work when sync could not
|
|
be started. Now does "best effort".
|
|
* Minor fix: marsadm systemd interface was active when
|
|
not activated.
|
|
* Minor usability improvement: new replication state "Orphaned"
|
|
indicates that logfiles are missing, and thus replication
|
|
is stuck.
|
|
|
|
mars0.1stable58
|
|
* Major fix for Football / split-cluster: for safety,
|
|
cron deletes some blocking left-overs.
|
|
* Major fix at _asymmetric_ split-cluster: ignore hindering
|
|
abort condition.
|
|
* Minor fix: not all internal systemd links were removed upon
|
|
marsadm set-systemd-unit mydata "".
|
|
* Doc: Football.
|
|
* Doc: architectural treatment of centralized storage.
|
|
|
|
mars0.1stable57
|
|
* Minor fix: silly deadlock upon scarce race at logging.
|
|
Without debug logging, probability should be extremely low
|
|
(only observed at rmmod).
|
|
* Added initial version of systemd templates (for future backward
|
|
compatibility with branch 0.1a).
|
|
* Doc: systemd templates.
|
|
|
|
mars0.1stable56
|
|
* Minor fix: split-cluster could unnecessarily abort
|
|
in some cases.
|
|
* Added initial version of submodule "football".
|
|
More updates will follow.
|
|
|
|
mars0.1stable55
|
|
* Major fix: unnecessary / false positive split brain could
|
|
occur after the primary logfile was truncated, e.g. at crashes
|
|
or disk damages. Systematic triggering in masses was possible
|
|
by keeping /dev/mars/mydata mounted while _forcing_
|
|
a reboot _during_ (!) its umount (e.g. by patching the
|
|
"reboot" command and/or patching systemd dependencies
|
|
or similar to provoke this regularly).
|
|
|
|
mars0.1stable54
|
|
* Major fix, only relevant for massive execution of
|
|
leave-resource, e.g. when playing Football (Tetris)
|
|
games:
|
|
When non-versioned symlinks were eventually deleted,
|
|
later re-creation did not always succeed.
|
|
Fixed by a new generic timestamp ordering approach.
|
|
* Stability client-side fixes (could lead to stacktraces),
|
|
backported from branch 0.1a (were forgotten long ago).
|
|
* Major doc update: new section on reliability of
|
|
storage architectures.
|
|
This explains why many BigCluster systems don't work as
|
|
expected.
|
|
Backed up by graphs and by mathematical formulas.
|
|
A must-read for anyone working in the storage area!
|
|
|
|
mars0.1stable53
|
|
* Major fix: rare corner case of split brain was not displayed
|
|
correctly.
|
|
* Major usability: show amount of data during split brain.
|
|
This hints the sysadmins about the size of future data loss
|
|
at later split brain resolution.
|
|
* Minor workaround: crashed /mars filesystems may contain
|
|
completely damaged symlinks with timestamps in the far
|
|
distant future, e.g. year >3000 etc. Safeguard unusual
|
|
Lamport time slips by ignoring implausible values.
|
|
* Major improvement: internal locking overhead reduced.
|
|
* Minor improvement: reduce message trigger overhead.
|
|
* Several minor improvements.
|
|
* Doc updates.
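The workaround for implausible timestamps can be illustrated with a simplified, timestamp-based Lamport clock. This is a sketch of the idea only, not the MARS kernel code. The class, its method names, and the 10-year plausibility window are assumptions made for illustration.

```python
import time
from typing import Optional

MAX_FUTURE_SKEW = 10 * 365 * 24 * 3600  # hypothetical window: 10 years in seconds


class LamportClock:
    """Simplified Lamport-style clock based on wall-clock seconds."""

    def __init__(self) -> None:
        self.stamp = 0.0

    def now(self, wall: Optional[float] = None) -> float:
        """Advance to at least the wall-clock time and return the stamp."""
        wall = time.time() if wall is None else wall
        self.stamp = max(self.stamp, wall)
        return self.stamp

    def observe(self, remote: float, wall: Optional[float] = None) -> bool:
        """Merge a remote stamp; return False when it was ignored as implausible."""
        wall = time.time() if wall is None else wall
        if remote > wall + MAX_FUTURE_SKEW:
            return False  # e.g. a damaged symlink dated far in the future
        self.stamp = max(self.stamp, remote, wall)
        return True
```

Without the plausibility check, a single damaged timestamp would drag the clock of every node far into the future, because a Lamport clock never moves backwards.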
|
|
|
|
mars0.1stable52
|
|
* Major contrib: new example scripts for MARS background data
|
|
migration during production. 1&1-specific code in a separate
|
|
plugin. You can write your own plugins for adaptation to
|
|
your needs.
|
|
* Minor fix: limit the size of the writeback buffer by the
|
|
remaining space in /mars. This is only relevant when
|
|
/mars is dimensioned smaller than RAM (which should
|
|
never be the case in production systems, but might happen
|
|
accidentally or for testing).
|
|
Analogously, limit the maximum logfile size.
|
|
* Minor fix: prevent creation of many tiny logfiles over time
|
|
when secondaries are not catching up.
|
|
The default threshold is a minimum of 5 GB size when more
|
|
than 10 logfiles are already present.
|
|
* Minor fix: cleanup old internal .tmp-* symlinks which might
|
|
remain as leftovers when marsadm is dying at the wrong
|
|
moment.
|
|
* Minor improvement: don't run O(n) mapfree under spinlock.
|
|
More speed improvements under preparation; will result in O(k).
|
|
* Some more minor improvements.
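The rule against many tiny logfiles can be read as a simple predicate. The sketch below is one possible interpretation, not the MARS implementation. It assumes that "5 GB" means 5 GiB and that rotation is unrestricted while 10 or fewer logfiles exist. The function and parameter names are hypothetical.

```python
GIB = 1024 ** 3


def may_rotate(current_size: int, logfile_count: int,
               min_size: int = 5 * GIB, max_count: int = 10) -> bool:
    """Decide whether starting a new logfile is acceptable.

    Once more than `max_count` logfiles are present, the current logfile
    must have reached `min_size` bytes before a new one is started.
    """
    if logfile_count > max_count:
        return current_size >= min_size
    return True  # assumption: no size restriction below the count threshold
```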
|
|
|
|
mars0.1stable51
|
|
* Minor fix: don't abort log-delete-all too early when there
|
|
are holes in the deletion sequence numbers.
|
|
* Backport of marsadm cron from branch 0.1a, in order to systematically
|
|
support mixed operation of different MARS versions in bigger installations
|
|
(avoid confusion at junior sysadmins and at monitoring staff).
|
|
* Rectified the semantics of log-delete, which now does the same as
|
|
log-delete-all. Single deletion is only needed for testing, and
|
|
has been renamed to log-delete-one.
|
|
Leaving the old semantics would have been an operational risk
|
|
when junior sysadmins or 24/7 surveillance people are not carefully
|
|
looking at the details of semantics. Now everything is hopefully
|
|
as everybody not familiar with MARS would naively assume.
|
|
* Doc update.
|
|
|
|
mars0.1stable50
|
|
* Major usability improvement (backport from 0.1a):
|
|
marsadm shows number of replicas of each resource, out of total number
|
|
of cluster members. Example: [2/4]
|
|
* Minor fix: automatically cleanup internal backups produced by the new
|
|
merge-cluster / split-cluster after 1 week.
|
|
* Minor fix: also cleanup some new symlink types replicated through
|
|
the network when running asymmetric clusters with mixed branches
|
|
0.1 and 0.1a.
|
|
* Minor annoyance: silence split-cluster error message when no
|
|
resources are present.
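Monitoring scripts may want to pick up the replication degree shown by marsadm. The following sketch extracts a pair like [2/4] from a line of text. Only the bracket notation is taken from the changelog. The surrounding output format is not specified here, so the sample lines and function names are hypothetical.

```python
import re

DEGREE = re.compile(r"\[(\d+)/(\d+)\]")


def parse_degree(text: str) -> tuple[int, int]:
    """Return (replicas, cluster_members) from text containing e.g. '[2/4]'."""
    match = DEGREE.search(text)
    if match is None:
        raise ValueError(f"no replication degree found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_fully_replicated(text: str) -> bool:
    """True when every cluster member holds a replica of the resource."""
    replicas, members = parse_degree(text)
    return replicas == members
```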
|
|
|
|
mars0.1stable49
|
|
* Backports of new marsadm commands merge-cluster and split-cluster.
|
|
The new functionality is needed for background migration of resources.
|
|
Please be aware that this branch has not been constructed for
|
|
scalability in the dimension of #nodes, so don't merge too many
|
|
nodes and use split-cluster after each background migration.
|
|
Better scalability is / will be addressed at the 0.1a and 0.1b
|
|
branches. However, currently they are not yet stable.
|
|
No changes at the kernel module (besides some bug fixes);
|
|
this is solely done at userspace level.
|
|
The new userspace-level commands should have almost no intersection
|
|
with (and therefore no impact onto) other parts of this well-proven
|
|
stable branch.
|
|
* Backports of new wait-cluster implementation.
|
|
This avoids irritating messages after split-cluster.
|
|
|
|
mars0.1stable48
|
|
* Critical fix: DDOS-like attacks at the MARS ports (or similar caused
|
|
by bugs / misbehaviour) are prevented by configurable limits
|
|
/proc/sys/mars/handler_dent_limit and
|
|
/proc/sys/mars/handler_limit .
|
|
* Critical safeguard: when the network is interrupted for a long time
|
|
while the log-rotate frequency is very high and a lot of resources
|
|
(exceeding the official limits as documented) had been used, masses of
|
|
deletion links may accumulate in /mars/todo/. First, already
|
|
existing deletions to the same targets are reused now.
|
|
Second, a maximum limit (of currently 512 entries)
|
|
is enforced, and a warning is emitted when too many deletions
|
|
are accumulated over time.
|
|
* Minor fix: earlier detection of socket hangups.
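The two safeguards for deletion links, reuse of an existing deletion for the same target and a hard upper limit, can be sketched as a small bounded queue. This illustrates the idea only and is not the MARS code. The class is hypothetical, and refusing new entries once the limit is reached is an assumption, since the changelog only states that the limit is enforced and that a warning is issued.

```python
import warnings
from typing import Optional


class DeletionQueue:
    """Bounded set of pending deletions, at most one entry per target."""

    def __init__(self, limit: int = 512) -> None:
        self.limit = limit
        self.pending: dict[str, int] = {}  # target path -> serial number
        self.next_serial = 0

    def request(self, target: str) -> Optional[int]:
        """Register a deletion; return its serial, or None when refused."""
        if target in self.pending:
            return self.pending[target]  # reuse the existing deletion
        if len(self.pending) >= self.limit:
            warnings.warn(f"too many pending deletions (limit {self.limit})")
            return None
        serial = self.next_serial
        self.next_serial += 1
        self.pending[target] = serial
        return serial

    def done(self, target: str) -> None:
        """Remove a completed deletion from the queue."""
        self.pending.pop(target, None)
```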
|
|
|
|
mars0.1stable47
|
|
* Critical fix: leave-cluster could lead to deadlocks, also
|
|
on remote nodes.
|
|
* Contrib: mass automation script (unmaintained).
|
|
|
|
mars0.1stable46
|
|
* Major fix: bugfix from 0.1stable44 (state "Detached" was
|
|
reported too early) was incorrect, now fixed.
|
|
* Minor fix: display of host lists in special case of
|
|
create-resource was misleading.
|
|
|
|
mars0.1stable45
|
|
* Major fix: on secondaries, orphaned files and symlinks were
|
|
sometimes created in /mars and could accumulate over a long time.
|
|
After several months or years of operation, the /mars directory
|
|
could appear to be full via "df /mars", but "du -s /mars" was
|
|
not reporting the hidden space allocation.
|
|
Also, upon remount or reboot the cleanup of orphaned files
|
|
could take a rather long time. Workaround was possible by
|
|
"rmmod mars; umount /mars; mount /mars; modprobe mars".
|
|
Fixed by regularly pruning the dentry cache of the /mars
|
|
filesystem.
|
|
|
|
mars0.1stable44
|
|
--------
|
|
* Major fix: state "Detached" was reported too early,
|
|
before the underlying disk was really closed.
|
|
* Doc: new updated slides from FrOSCon 2017.
|
|
New architectural comparison with Big Storage Clusters
|
|
in terms of scalability, reliability and costs.
|
|
|
|
mars0.1stable43
|
|
--------
|
|
* Major fix, only relevant for k >= 3 replicas:
|
|
Logfile fetch did not switch over to another alive peer
|
|
upon _specific_ network problems with the _current_
|
|
peer. As a consequence, an unaffected replica could
|
|
hang. Workaround was possible by pause-fetch /
|
|
resume-fetch or by fixing the network :)
|
|
|
|
mars0.1stable42
|
|
--------
|
|
* Minor fix: ssh IPs and port numbers are automatically probed
|
|
on join-cluster.
|
|
* Minor compatibility to branch mars0.1b.y: join-resource
|
|
does additional rsync for safety.
|
|
* Minor fix: rate display was not going down to 0
|
|
on switchoff or long pauses.
|
|
* Minor improvement: show peers in internal debugging info.
|
|
|
|
mars0.1stable41
|
|
--------
|
|
* Minor fix: a scarce race could lead to an unnecessary split brain
|
|
when umounting _after_ role transition from primary to secondary.
|
|
|
|
mars0.1stable40
|
|
--------
|
|
* Potentially critical fix: on very fast machines, and with
|
|
extremely low probability, a race in AIO could lead to a kernel
|
|
page fault.
|
|
For maximum safety, update to this version is recommended.
|
|
|
|
mars0.1stable39
|
|
--------
|
|
* Minor fix: hangs of logfile updates. Found by stress-testing
|
|
on fast hardware over 10GBit network links. Might explain
|
|
some extremely rare (1 per several million operation hours)
|
|
production hangs on secondaries. Workaround possible by
|
|
"pause-fetch; resume-fetch".
|
|
* Minor fixes of rare kthread retarding under very high load.
|
|
* Minor improvement: add version number to "marsadm version" which
|
|
can be used for future compatibility checking with respect to
|
|
new features.
|
|
|
|
mars0.1stable38
|
|
--------
|
|
* Compile without pre-patch on some kernel versions!
|
|
Whether the pre-patch is applied will be detected automatically.
|
|
However, there is some (hopefully minor) performance penalty when
|
|
the pre-patch is missing.
|
|
This will be addressed in a future release (but might go
|
|
to branch 0.1b instead, not yet decided).
|
|
Tested with vanilla kernels 3.10.105, 3.14.79, 3.16.43,
|
|
4.1.39, 4.4.67.
|
|
Vanilla kernels 4.8.x and later are _not_ yet working
|
|
(independently from pre-patches). This will be addressed
|
|
in a future release.
|
|
* No functional changes otherwise. Rollback to prior versions
|
|
should be easy. Please report any issues.
|
|
* Updated docs describing build methods.
|
|
|
|
mars0.1stable37
|
|
--------
|
|
* Minor fix: secondary logfile replication could hang in the
|
|
extremely unusual case that the expected primary logfile size
|
|
gets shortened after a crash followed by reboot.
|
|
Workaround was possible via "pause-fetch; resume-fetch".
|
|
|
|
mars0.1stable36
|
|
--------
|
|
* Doc: new slides from GUUG2017, both in English and in German.
|
|
Some very important hints for cost savings. May easily save
|
|
you a few millions when operating some petabytes of data.
|
|
* Doc: new chapter on cost savings in mars-manual.pdf.
|
|
Some parts of German oral explanations from the GUUG conference
|
|
translated to English for my English-speaking audience.
|
|
More to come later (hopefully; I need to get the time).
|
|
|
|
mars0.1stable35
|
|
--------
|
|
* Minor fix: when syncing a big resource (e.g. 40TiB) over a 1GBit
|
|
uplink, the sync may take longer than 1 day. This increases the
|
|
probability for triggering an unintended restart of that sync
|
|
from scratch.
|
|
Among further obscure preconditions, more than 5 logfiles must
|
|
exist such that the wrong assumption of an emergency mode can
|
|
happen at the secondary. In order to trigger the bug more likely,
|
|
it is therefore helpful to misconfigure /etc/cron.d/mars by
|
|
log-rotate'ing every 10 minutes, but doing log-delete-all only
|
|
once an hour (which contradicts my upstream documentation and
|
|
unnecessarily wastes valuable storage space in /mars).
|
|
Fixed by correction of a typo-like error.
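The misconfiguration described above (log-rotate and log-delete-all
running at different frequencies) can be detected with a sketch like
the following. The function names are hypothetical, the file is
assumed to use the /etc/cron.d format (five time fields, user,
command), and "both commands share one schedule" is only one plausible
reading of the upstream recommendation.

```shell
# Sketch only: compare the cron schedules of two marsadm commands.

# cron_schedule COMMAND [FILE]
# Prints the five time fields of every non-comment line in FILE whose
# command part contains the word COMMAND.
cron_schedule() {
    awk -v cmd="$1" '
        $1 ~ /^#/ { next }
        {
            # Field 6 is the user, the command starts at field 7.
            for (i = 7; i <= NF; i++)
                if ($i == cmd) { print $1, $2, $3, $4, $5; break }
        }' "${2:-/etc/cron.d/mars}"
}

# same_schedule [FILE]
# Succeeds if log-rotate and log-delete-all use identical time fields.
same_schedule() {
    [ "$(cron_schedule log-rotate "$1")" = "$(cron_schedule log-delete-all "$1")" ]
}
```

Calling "same_schedule || echo mismatch" on a host would flag a file
where the two jobs have drifted apart.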
|
|
|
|
mars0.1stable34
|
|
--------
|
|
* Minor fix: in some rare cases, when lots of gigabytes had to be
|
|
replayed in one big slurp, the replay position wasn't updated
|
|
during a longer time. Some admins were complaining that it
|
|
appeared "stuck" although it worked in reality.
|
|
Improved by increasing the update frequency of the replay link.
|
|
* Minor fix: after network errors, sometimes the sync restarted
|
|
from scratch, unnecessarily.
|
|
* Minor fix: under rare conditions, rmmod could hang forever.
|
|
A known reason has been fixed. Other theoretical reasons
|
|
hopefully improved by some further safeguards.
|
|
|
|
mars0.1stable33
|
|
--------
|
|
* Minor regression from stable29:
|
|
After a primary crash, without switchover, and when the primary
|
|
recovery phase involves a logrotate to an empty new logfile
|
|
which had been created shortly before the crash but
|
|
has not yet been used before the crash (race condition),
|
|
a kernel NULL pointer deref may stop the main thread.
|
|
Workaround: either remove the empty logfile by hand,
|
|
or just do a failover to the other side.
|
|
|
|
mars0.1stable32
|
|
--------
|
|
* Critical regression between stable30 and stable31 (can be avoided
|
|
by simply using stable30 for affected kernels): on _old_ kernels
|
|
(before 4.3.x) the removal of merge_bvec_fn() (see upstream commit
|
|
8ae126660fddbeebb9251a174e6fa45b6ad8f932) can lead to fatal
|
|
crashes at the primary side.
|
|
Fixed by using (hopefully) proper #ifdef's according to the
|
|
kernel version.
|
|
Notice: between stable30 and stable31 no true MARS fixes were
|
|
made (since no bugs were found). This strategy is likely to
|
|
continue for a while, for newer adaptations to even newer kernels.
|
|
In case of problems, go back. And, please, report it to me :)
|
|
|
|
mars0.1stable31
|
|
--------
|
|
* New _minimum_ pre-patches for vanilla LTS kernels 3.2.x to 4.7.x.
|
|
For security reasons, please prefer them over the old _generic_
|
|
pre-patch versions which expose many unnecessary EXPORT_SYMBOL
|
|
to potential attackers.
|
|
* Adaptations to vanilla kernels up to 4.7.x.
|
|
Note: 4.8rc-* does not yet work.
|
|
* Regression testing with many kernel versions: looks fine.
|
|
|
|
mars0.1stable30
|
|
--------
|
|
* Minor fix: in very rare cases of a primary crash, a missing
|
|
versionlink could lead to a hang.
|
|
* Minor fix: improved error reporting of replay code.
|
|
* Minor fix: improved switchback to former primary side.
|
|
* Minor fix: systematically add some missing macros.
|
|
* Minor improvements: add some example systemd unit and other
|
|
contrib stuff like a cronjob example.
|
|
* Doc: minor additions and improvements.
|
|
|
|
mars0.1stable29
|
|
--------
|
|
* Minor fix: on very fast hardware and networks, sync could take
|
|
a while for terminating.
|
|
* Minor fix: external module build.
|
|
* Major usability improvement: new expert commands marsadm
|
|
lowlevel-ls-host-ips, lowlevel-set-host-ip, lowlevel-delete-host.
|
|
Necessary for moves between networks, dedicated replication IPs,
|
|
etc.
|
|
* Minor doc update.
|
|
|
|
mars0.1stable28
|
|
--------
|
|
* Doc: describe new naming conventions.
|
|
MARS Light is now simply called MARS.
|
|
No distinction between "Light" and the future "Full" anymore.
|
|
Please note that the git branches light0.1.y and light0.2.y have
|
|
been renamed to mars0.1.y and mars0.2.y respectively.
|
|
* Minor sourcecode cleanup: s/light//g or s/light/main/g
|
|
where appropriate.
|
|
No other changes in the sourcecode, deliberately.
|
|
In case anyone encounters any build problems compiling MARS,
|
|
this release is separated just for the sake of build testing,
|
|
or Debian packaging testing, etc.
|
|
* Doc: minor clarifications.
|
|
|
|
mars0.1stable27
|
|
light0.1stable27
|
|
--------
|
|
* Critical fix: typo in sync progress comparison code could lead
|
|
to data version mismatches during sync when alternating with
|
|
replay. Only observed at a certain new hardware class, and only
|
|
while testing with an extremely high load (9 loaded resources
|
|
in parallel to 9 concurrent syncs). As a workaround,
|
|
echo 0 > /proc/sys/mars/sync_flip_interval_sec can be used.
|
|
Nevertheless, update is highly recommended!
|
|
* Major fix: slow memory leak (regression from light0.1stable26).
|
|
Only when starting the transaction logger (i.e. primary is typically
|
|
not affected). But don't let it run for a longer time.
|
|
Monitoring is possible via /proc/slabinfo (size-64 or siblings).
|
|
* Minor fix: join-cluster did not check for duplicate IP addresses.
|
|
* Minor fixes: some unnecessary annoying error messages.
|
|
* Doc: new slides from GUUG 2016 in Köln.
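The sync_flip_interval_sec workaround and the /proc/slabinfo monitoring
mentioned in this release can be scripted roughly as follows. This is
a sketch with hypothetical function names. The file arguments exist
only so that the helpers can be tried on copies, and the slab cache
name (size-64 here) differs between kernel versions.

```shell
# Sketch only: helpers for the two hints given in this release.

# set_sync_flip SECONDS [FILE]
# Writes SECONDS to the sysctl file named in this ChangeLog.
set_sync_flip() {
    echo "$1" > "${2:-/proc/sys/mars/sync_flip_interval_sec}"
}

# slab_active_objs CACHE [FILE]
# Prints the active object count (second column) of one slab cache.
slab_active_objs() {
    awk -v cache="$1" '$1 == cache { print $2 }' "${2:-/proc/slabinfo}"
}
```

For example, "set_sync_flip 0" applies the workaround, and calling
"slab_active_objs size-64" repeatedly shows whether the object count
keeps growing.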
|
|
|
|
light0.1stable26
|
|
--------
|
|
* Minor fixes: some primitive macros were reporting misleading or
|
|
even wrong values at split brain, or during/after emergency mode.
|
|
Some high-level macros as well as try_to_avoid_split_brain
|
|
should work better / more reliably now.
|
|
* Minor fix: potential deadlock after crash reboot, or after
|
|
defective /mars filesystem. Never observed in practice.
|
|
* Minor safeguard: unnecessary split brain could emerge at
|
|
secondaries under extremely rare and strange conditions.
|
|
Unsure whether it ever occurred in practice.
|
|
* Minor usability improvement: show incorrect permissions on /mars.
|
|
Some other sysadmin tools like Puppet seem to have their own
|
|
default notion of "secure permissions" ;)
|
|
* Minor doc reorg, better chapter structure.
|
|
|
|
light0.1stable25
|
|
--------
|
|
* Major fix: in rare cases "marsadm primary" (without --force)
|
|
could go into an endless loop, even if --timeout= was specified.
|
|
* Minor fix: in rare cases of hanging or defective IO, crashes
|
|
of the primary could replicate versionlinks to the secondary,
|
|
but after reboot they were missing at the primary because of
|
|
hanging IO or other IO / RAID controller problems.
|
|
Now using sync_filesystem() for either ensuring actuality,
|
|
or for letting the mars_light main control thread hang
|
|
(which will hopefully be noticed soon by monitoring).
|
|
* Minor fix: join-cluster uses rsync, which could abort due to
|
|
vanished filesystem objects while the primary is actively running.
|
|
Now it should tolerate such "errors".
|
|
* Minor fixes / additions at primitive macros.
|
|
* Tiny doc update.
|
|
|
|
light0.1stable24
|
|
--------
|
|
* Skip this release due to a regression.
|
|
|
|
light0.1stable23
|
|
--------
|
|
* Minor fix: the new replay-code error message was not reset
|
|
at secondaries. Now the annoying old error message
|
|
disappears after the next successful logrotate.
|
|
* Minor fixes of internal marsadm code (not in use until now).
|
|
* Minor doc update.
|
|
|
|
light0.1stable22
|
|
--------
|
|
* Critical fix for non-storage servers: the /mars directory
|
|
was readable by ordinary non-root users, opening a potential
|
|
security hole. Originally MARS was designed for standalone
|
|
storage servers solely, but now it is increasingly deployed to
|
|
machines where ordinary users can log in.
|
|
Update recommended, but only urgent for potentially affected
|
|
installations.
|
|
* Minor fix: when a logfile was damaged (observed at defective
|
|
hardware), this was often (but not always) detected by the
|
|
md5 data checksums in the transaction logfiles. So far so good.
|
|
The replay / recovery process stopped for a very good reason.
|
|
But it was not easily possible to _force_ any of the resource
|
|
members into primary role when the defect was already present at
|
|
the _primary_ (which happened once during 7 million operating
|
|
hours, and at a primary site which proved defective afterwards),
|
|
and the defect had been replicated to all secondaries.
|
|
As a workaround, the resource could be destroyed via leave-resource
|
|
everywhere, and resurrected from scratch. Clumsy.
|
|
Now an md5 checksum error in the middle of a logfile is
|
|
treated similarly to an EOF. "primary --force" will succeed now,
|
|
without applying the defective data (as before).
|
|
Split brain will result for sure in such a case.
|
|
* Minor improvement: md5 logfile checksum errors are now displayed
|
|
directly in the diskstate macro (and therefore also at plain
|
|
"view").
|
|
* Minor improvement: when "marsadm view all" told you "InConsistent"
|
|
as the disk state, this was _formally correct_ because it related
|
|
to the state of the _disk_, not to the state of the replication.
|
|
The former message could appear regularly during ordinary
|
|
out-of-order writeback at the primary side, without violating
|
|
the consistency of /dev/mars/mydata.
|
|
However, many people were confused and alarmed by the irritating
|
|
message.
|
|
Now a better wording is used: "WriteBack" and "Recovery" describe
|
|
more intuitively what is really happening :)
|
|
* Minor doc improvements.
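Whether /mars is readable by non-owners (the security issue fixed in
this release) can be checked with a portable sketch like this one. The
function names are hypothetical, and treating "no group or other read
bit" as the desired state is an assumption, not a statement about what
MARS itself enforces.

```shell
# Sketch only: inspect the read bits of a directory via "ls -ld".

# group_other_read_bits DIR
# Prints two characters: the group read bit and the other read bit,
# e.g. "--" for mode 700 or "rr" for mode 755.
group_other_read_bits() {
    ls -ld "$1" | cut -c5,8
}

# is_private [DIR]
# Succeeds if neither group nor others may read DIR (default: /mars).
is_private() {
    [ "$(group_other_read_bits "${1:-/mars}")" = "--" ]
}
```

A monitoring script could run "is_private || echo 'check /mars
permissions'" after configuration management tools have touched the
host.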
|
|
|
|
light0.1stable21
|
|
--------
|
|
* Hint: now MARS has been rolled out to more than 1600 servers,
|
|
including some MySQL database servers, and has collected more
|
|
than 6 million operation hours.
|
|
* Minor fixes, none of them observed in practice, only found
|
|
by testing while working on new features:
|
|
- potential read page fault
|
|
- potential deadlock
|
|
- incorrect remote symlink update under untypical circumstances
|
|
|
|
light0.1stable20
|
|
--------
|
|
* Hint: MARS is now running on more than 850 storage servers,
|
|
and has collected more than 4.5 million operation hours.
|
|
There were no new incidents with customer impact since the last
|
|
major bugfix (more than 3 million operation hours since then).
|
|
It is difficult to deduce a reliability from that, but it appears
|
|
that at least 99.999%, if not 99.9999% are now real for the
|
|
MARS component as a standalone component (not to be confused with
|
|
overall system reliability). Our storage hardware is clearly much
|
|
less reliable. MARS compensates for these defects all the time.
|
|
|
|
* Minor fix: memory leak in networking code, does not occur
|
|
at light0.1 operations (but maybe future versions of MARS).
|
|
* Doc: add presentation slides from FrOSCon 2015.
|
|
|
|
light0.1stable19
|
|
--------
|
|
* Minor safeguard: warn when somebody tries leave-resource --host=
|
|
for a damaged host, and later the dead host resurrects in an
|
|
unreasonable way.
|
|
* Doc update: describe use cases for DRBD vs MARS more clearly.
|
|
* Minor spelling fixes.
|
|
|
|
light0.1stable18
|
|
--------
|
|
* Minor safeguard: prevent join-resource when previous log-purge-all
|
|
has been forgotten. Prevent create-resource also when previous
|
|
delete-resource has been forgotten. Anyway, this happens only in
|
|
very exotic repair scenarios after very heavy failures.
|
|
* Doc updates: simplify descriptions of split-brain resolution and
|
|
emergency mode resolution. Nowadays 'invalidate' will do everything
|
|
in all tested cases; the more complex alternative methods have
|
|
been moved to the appendix.
|
|
|
|
light0.1stable17
|
|
--------
|
|
* Minor fix: stacktrace / oops in aio callback path due to a
|
|
subtle race, observed once during 2.5 million operation hours.
|
|
In the observed case, the secondary was hanging, without
|
|
customer impact. However, the error class could potentially
|
|
occur also at the primary side. Probably the bug was triggered
|
|
by a hardware problem from the RAID controller.
|
|
|
|
light0.1stable16
|
|
--------
|
|
* Minor fix: sync could take a long time to complete under high
|
|
application load, similarly to a live-lock.
|
|
* Some smaller minor fixes for annoying messages.
|
|
* Contrib: added configurable Nagios check.
|
|
* Contrib: added some example scripts which could be used by
|
|
clustermanagers etc.
|
|
* Doc: important new section on pitfalls when using existing
|
|
clustermanagers UNMODIFIED for long distance replication.
|
|
PLEASE READ!
|
|
|
|
light0.1stable15
|
|
--------
|
|
* NOTICE: MARS passed its baptism of fire on 04/22/2015 when a whole
|
|
co-location had a partial power blackout, followed by breakdown
|
|
of air conditioning, followed by mass hardware defects due to
|
|
overheating. MARS showed exactly 0 errors when (emergency)
|
|
switching to another datacenter was started en masse.
|
|
* Major fix of race in transaction logger: the primary could hang
|
|
when using very fast hardware, typically after ~24000 operation
|
|
hours. The problem was noticed 6 times during a grand total of
|
|
more than 1,000,000 operation hours on a mixed hardware park,
|
|
showing up only on specific hardware classes. Together with 3
|
|
other incidents during early beta phase which also had customer
|
|
impact, this means that we have reached a reliability of about
|
|
===> 99.999%
|
|
After this fix, the reliability should grow even higher.
|
|
A workaround for this bug exists:
|
|
# echo 2 > /proc/sys/mars/logger_completion_semantics
|
|
Update is only mandatory when you cannot use the workaround.
|
|
* Minor improvement in marsadm: re-allow --force combined with "all".
|
|
This is highly appreciated for speeding up operations / handling
|
|
during emergency datacenter switchover.
|
|
* Various smaller improvements.
|
|
* Contrib (unsupported): example rollout script for mass rollout.
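The reliability figure quoted in this release can be reproduced
approximately from the numbers given (6 + 3 = 9 incidents during
1,000,000 operation hours). The sketch below assumes that each
incident costs exactly one operation hour. That assumption is mine and
is not stated in this ChangeLog, so the result is only an illustration
of the arithmetic.

```shell
# Sketch only: availability in percent from incident counts.

# availability_percent INCIDENTS TOTAL_HOURS [HOURS_LOST_PER_INCIDENT]
# HOURS_LOST_PER_INCIDENT defaults to 1 (an assumption, see above).
availability_percent() {
    LC_ALL=C awk -v n="$1" -v total="$2" -v lost="${3:-1}" \
        'BEGIN { printf "%.4f\n", 100 * (1 - n * lost / total) }'
}
```

With these inputs, "availability_percent 9 1000000" prints 99.9991,
which is consistent with the "about 99.999%" stated above.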
|
|
|
|
light0.1stable14
|
|
--------
|
|
* Minor safeguard: modprobe mars will refuse to start when the
|
|
cluster UUID is missing.
|
|
* Minor fix: external race in marsadm resize, only relevant
|
|
for scripting.
|
|
* Minor fix: potential race on plugged IO requests.
|
|
* Clarify output of marsadm view. Many systematic improvements
|
|
and hints.
|
|
* Add some indispensable macros for scripting / automation.
|
|
* Various tiny improvements.
|
|
|
|
light0.1stable13
|
|
--------
|
|
* Critical safeguard for accidental join-cluster with wrong argument:
|
|
make UUID mandatory, disallow completely unrelated hosts to
|
|
communicate symlink tree updates when their UUIDs mismatch.
|
|
* Minor fix: leave-resource --host=other did not work when disks
|
|
were named differently throughout the cluster.
|
|
* Minor fix: detach --host=other --force (which is needed as a
|
|
precondition) did not work.
|
|
* Various minor fixes and clarifications. "marsadm view all"
|
|
now reports the communication status in the cluster.
|
|
|
|
light0.1stable12
|
|
--------
|
|
* Critical (but usually not extremely relevant) fix:
|
|
When emergency mode occurs just during a sync, the target could
|
|
remain inconsistent without notice. Now noticed.
|
|
You always could/should manually invalidate whenever an
|
|
emergency mode appeared.
|
|
Now this is automatically fixed by restarting any sync from
|
|
scratch (if one was actually running before; otherwise consistency
|
|
was never violated).
|
|
* Major documentation update / corrections.
|
|
* Major (but less relevant) fix: leave-cluster did not really work.
|
|
* Minor fix (regression): rmmod could hang when sync was running.
|
|
* Various minor fixes and clarifications.
|
|
|
|
light0.1stable11
|
|
--------
|
|
* Major documentation update. mars-manual.pdf increased from
|
|
66 to 80 pages. Please read! You probably should know this.
|
|
* Minor fixes: better cleanup on invalidate / leave-resource.
|
|
* Minor clarifications: more precise EIO error codes, more verbose
|
|
error reporting via "marsadm cat".
|
|
|
|
light0.1stable10
|
|
--------
|
|
* Major fixes of internal network protocol errors, leading to
|
|
internal shutdown of sockets, which were transparently re-opened.
|
|
It could affect network performance. Not sure whether
|
|
stability was also affected (probably under extremely high load);
|
|
for better safety you should upgrade.
|
|
* Major fix from Manuel Lausch: regex parsing sometimes went
|
|
completely wrong when hostnames followed a naming scheme similar
|
|
to that of internal symlinks.
|
|
* Major, only relevant for k>2 replicas: fix wrong internal sharing
|
|
of data structures resulting from parallel data connections.
|
|
* Minor fix: race in fake-sync.
|
|
* Minor fix: race in invalidate.
|
|
* Minor, only for k>2 replicas: fix direct primary handover when
|
|
some non-involved hosts are currently unreachable.
|
|
* Minor: improve becoming primary during split brain.
|
|
* Minor: improve becoming primary when emergency mode starts.
|
|
* Minor: silence some annoying stderr messages.
|
|
* Several internal minor fixes and clarifications.
|
|
|
|
light0.1stable09
|
|
--------
|
|
* Major fix of rare race (potentially critical): the bio response
|
|
thread could terminate too early, leading to a premature dealloc
|
|
of kernel memory. This has only been observed on slow virtual
|
|
machines with slow virtual devices, and very high load on k=4
|
|
replicas. This could potentially affect the stability of the system.
|
|
Although not observed at production machines at 1&1, I recommend
|
|
updating production machines to this release ASAP.
|
|
* Major usability fix: incorrect commandline options of marsadm
|
|
were just ignored if they appeared after the resource argument.
|
|
Misspellings could cause undesired effects. For instance,
|
|
"marsadm delete-resource vital --force --MISSPELLhost=banana"
|
|
was accidentally destroying the primary during operation (which
|
|
is _possible_ when using --force, and this was even a _required_
|
|
sort of "STONITH"-like feature -- however from a human point
|
|
of view it was intended to destroy _another_ host, so this was
|
|
an unexpected behaviour from a sysadmin point of view).
|
|
* Major workaround: the concept "actual primary" is wrong, because
|
|
during split brain there may exist several primaries. Do not
|
|
use the macro view-actual-primary any longer. It is deprecated now.
|
|
Use view-is-primary instead, on each host you are interested in.
|
|
* Minor fix: "marsadm invalidate" did not work in some weird
|
|
split brain situations / was not equivalent to
|
|
"marsadm leave-resource $res; marsadm join-resource $res".
|
|
The latter was the old workaround to fix the situation.
|
|
Now it shouldn't be necessary anymore.
|
|
* Minor fix: pause-fetch could take very long to terminate.
|
|
* Minor fix: marsadm wait-cluster did not wait for all hosts
|
|
participating in the resource, but only for one of them.
|
|
This is only relevant for k>2 replicas.
|
|
* Minor fix: the rates displayed by "marsadm view" did not drop down
|
|
to 0 when no progress was made.
|
|
* Minor fix: logging to syslog was incomplete.
|
|
* Minor usability fix: reduce the verbosity of "log-rotate"
|
|
and "log-delete" for cron jobs.
|
|
* Minor fixes: several internal awkwardnesses, potentially affecting
|
|
performance and/or stability in weird situations.
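Since view-actual-primary is deprecated in this release, the
recommended per-host query of view-is-primary can be sketched as a
loop. The function name, the SSH override variable and the use of ssh
for reaching the hosts are assumptions for illustration. The output of
view-is-primary is passed through unchanged because its exact format
is not described here.

```shell
# Sketch only: ask each host whether it considers itself primary.
# SSH may be overridden, e.g. with a wrapper or a test stub.
SSH=${SSH:-ssh}

# list_primaries RESOURCE HOST...
# Prints one line "<host>: <output of marsadm view-is-primary>" per host.
list_primaries() {
    res=$1
    shift
    for host in "$@"; do
        printf '%s: %s\n' "$host" \
            "$("$SSH" "$host" marsadm view-is-primary "$res")"
    done
}
```

During split brain more than one host may report itself as primary,
which is exactly why a single "actual primary" cannot be relied upon.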
|
|
|
|
light0.1stable08
|
|
--------
|
|
* Minor fix: after emergency mode, a versionlink was not
|
|
created. This could lead to unnecessary reports of split
|
|
brain and/or need for additional re-invalidate.
|
|
* Minor fix: the predicate 'view-is-consistent' reported 'false'
|
|
in some situations on secondaries when all was ok.
|
|
* Minor fix: it was impossible to determine the 'is-consistent'
|
|
from 'marsadm view' (without -1and1 suffix). Added a new [Cc-]
|
|
flag. This is absolutely needed to determine whether the
|
|
underlying disks must have the same checksum (provided that
|
|
both disks are detached and the network works and fetch+replay
|
|
had completed before the detach).
|
|
* Updated docs to reflect this.
|
|
* Minor fix: 'invalidate' did not work when the resource was not
|
|
completely detached. Now it implicitly does a detach before
|
|
starting invalidation.
|
|
* Minor fix: wait-umount was waiting for umount of _all_ primaries
|
|
during split brain. Now it waits only for umount of the local node.
|
|
Notice that having multiple primaries in parallel is an
|
|
erroneous state anyway.
|
|
* Minor fix: leave-cluster did not work without --force.
|
|
|
|
light0.1stable07
|
|
--------
|
|
* Minor fix: re-creation of a completely destroyed resource
|
|
did not always work correctly.
|
|
|
|
light0.1stable06
|
|
--------
|
|
* Major fix: becoming primary was hanging in rare situations.
|
|
* Minor fix: some split brains were not always detected correctly.
|
|
* Minor fix for Redhat openvz kernel builds.
|
|
* Several fixes for 1&1 internal Debian builds.
|
|
|
|
light0.1stable05
|
|
--------
|
|
* Major fix: incomplete calls to vfs_readdir()
|
|
which could lead to incomplete symlink updates /
|
|
replication hangs.
|
|
* Minor fix: rare race on replay EOF.
|
|
* Separated kernel from userspace build environment.
|
|
* Removed some potentially dangerous Kconfig options
|
|
if they would be set to wrong values (robustness against
|
|
accidentally producing bad kernel modules).
|
|
* Ditto: some additional checks against bad main Kconfig options
|
|
(mainly for out-of-tree builds).
|
|
* Separated contrib code from maintained code.
|
|
* Added some pre-patches for newer kernels
|
|
(WIP - not yet fully tested at all combinations)
|
|
* Minor doc addition: LinuxTag 2014 presentation.
|
|
|
|
light0.1stable04
|
|
--------
|
|
* Silence an annoying error message.
|
|
* Minor readability improvements.
|
|
* Minor doc updates.
|
|
|
|
light0.1stable03
|
|
--------
|
|
* Major: fix internal aio race (could lead to memory corruption).
|
|
* Fix refcounting in trans_logger.
|
|
* Some minor fixes in module code.
|
|
* Fix 1&1-internal out-of-tree builds.
|
|
* Various minor fixes.
|
|
* Update monitoring tools / docs (German, contributed by Jörg Mann).
|
|
|
|
light0.1stable02
|
|
--------
|
|
* Fix sorting of internal data structure.
|
|
* Fix IO error propagation at replay.
|
|
|
|
light0.1stable01
|
|
--------
|
|
* Fix parallelism of logfile propagation: sometimes a secondary
|
|
could get a more recent version than the primary had on stable
|
|
storage after its crash, eventually leading to an (annoying)
|
|
split brain. Some people might take this as a feature instead
|
|
of a bug, but now the logfile transfer starts only after the
|
|
primary _knows_ that the data is successfully committed to
|
|
stable storage.
|
|
* Fix memory leaks in error path.
|
|
* Fix error propagation between client and server.
|
|
* Make string allocation fully dynamic (remove limitation).
|
|
* Fix some annoying messages.
|
|
* Fix usage output of marsadm.
|
|
* Userspace: contributed bugfix for Debian udev rules by Jörg Mann.
|
|
* Improved debugging (only for testing).
|
|
|
|
light0.1beta0.18 (feature release)
|
|
--------
|
|
* New commands marsadm view-$macroname
|
|
* New customizable macro processor
|
|
* New err/warn/inf reporting via symlinks
|
|
* Per-resource emergency mode
|
|
* Allow limiting the sync parallelism
|
|
* New flood-protected syslogging
|
|
* Some smaller improvements
|
|
* Update docs
|
|
* Update test suite
|
|
|
|
light0.1beta0.17
|
|
--------
|
|
* Major bugfix: race in logfile switchover could sometimes
|
|
lead to the wrong logfile (extremely rare to hit, but
|
|
potentially harmful).
|
|
* Disallow primary switching when some secondaries are
|
|
syncing.
|
|
* Fix logfile fetch from multiple peers.
|
|
* Fix computation of transitive closure (affected
|
|
log-purge-all, split brain detection, and many others).
|
|
* Fix incorrect emergency mode detection.
|
|
* Primaries no longer fetch logfiles (unnecessarily, only
|
|
makes a difference at concurrent split brain operations).
|
|
* Detached resources no longer fetch logfiles (unexpectedly).
|
|
* Myriads of smaller fixes.
|
|
|
|
light0.1beta0.16
|
|
--------
|
|
|
|
* Critical bugfix: "marsadm primary --force" was assumed to be given
|
|
by sysadmins only in case of emergency, when the network is down.
|
|
When given in non-emergency cases where the old primary continues
|
|
to run (/dev/mars/* being actively used and written), the
|
|
old primary could suddenly do a "logrotate" to the
|
|
new split-brain logfile produced by the new (second) primary.
|
|
Now two primaries should be able to run concurrently in split-brain
|
|
mode without mutually trashing their logfiles.
|
|
* primary --force now only works in disconnected mode, in order
|
|
to hinder unintended forceful creation of split brain during
|
|
normal operation.
|
|
* Stop fetching of logfiles behind split brain points (save space
|
|
at the target hosts - usually the data will be discarded later).
|
|
* Fixed split brain detection in userspace.
|
|
* leave-resource now waits for local actions to take place
|
|
(remote actions stay asynchronous).
|
|
* invalidate / join-resource now work only if a designated primary
|
|
exists (otherwise they would not know uniquely from whom
|
|
to start initial sync).
|
|
* Update docs, clarify scenarios intended <-> emergency switching.
|
|
* Fixed mutual overwrite of deletion symlinks in case of racing
|
|
log-deletes spawned in parallel by cron jobs (resilience).
|
|
* Fixed races between deletion and re-creation (e.g. fresh
|
|
join-resource after leave-resource during network partitions).
|
|
* Fixed duration of network timeouts in case the network is down
|
|
(replaced non-working TCP_KEEPALIVE by explicit timeouts).
|
|
* New option --dry-run which does not really create symlinks.
|
|
* New command "delete-resource" (VERY DANGEROUS) for
|
|
forcefully destroying a resource, even when it is in use.
|
|
Intended only for _emergency_ cases when sysadmins are
|
|
desperate. Use only by hand, first run with --dry-run in order
|
|
to check what will happen!
|
|
* New command "log-purge-all" (potentially DANGEROUS) for
|
|
resolving split brain in desperate situations (cleanup of
|
|
leftovers). Only use by hand, first run with --dry-run!
|
|
* Lots of smaller improvements / usability / readability etc.
|
|
* Update test suite.
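The advice to run dangerous commands with --dry-run first can be
turned into a small interactive wrapper. This is a sketch: the
function name and the MARSADM override variable are hypothetical, and
placing --dry-run after the resource argument is an assumption about
the command line syntax.

```shell
# Sketch only: always show a dry run before a dangerous command.
# MARSADM may be overridden, e.g. for testing with "echo".
MARSADM=${MARSADM:-marsadm}

# careful_run COMMAND RESOURCE
# Example commands: delete-resource, log-purge-all.
careful_run() {
    cmd=$1
    res=$2
    # Step 1: dry run, which does not really create symlinks.
    "$MARSADM" "$cmd" "$res" --dry-run || return 1
    # Step 2: explicit confirmation by a human.
    printf 'Really run "%s %s"? Type yes to continue: ' "$cmd" "$res" >&2
    read -r answer
    if [ "$answer" = "yes" ]; then
        "$MARSADM" "$cmd" "$res"
    else
        echo "aborted" >&2
        return 1
    fi
}
```

Because it asks for confirmation on stdin, the wrapper is meant for
use by hand, in line with the warnings above, and not for cron jobs.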
|
|
|
|
light0.1beta0.15
|
|
--------
|
|
|
|
* Introduce write throttling of bulk writers.
|
|
* Update test suite.
|
|
|
|
light0.1beta0.14
|
|
--------
|
|
|
|
* Fix logfile transfer in case of "holes" created by
|
|
emergency mode.
|
|
* Fix "marsadm invalidate" after emergency mode had been entered.
|
|
* Fix "marsadm resize" capacity propagation from underlying LVM.
|
|
* Update test suite.
|
|
|
|
light0.1beta0.13
|
|
--------
|
|
|
|
* Fix shutdown during operation (flying requests).
|
|
* Fix unnecessary Lamport clock propagation storms.
|
|
* Reduce unnecessary page cache utilisation (mapfree).
|
|
* Update test suite.
|
|
|
|
|
|
light0.1beta0.12 and earlier
|
|
--------
|
|
|
|
There was no dedicated ChangeLog. For details, look at the
|
|
commit history.
|
|
|
|
Release Policy / Software Lifecycle
|
|
-----------------------------------
|
|
|
|
New source releases are simply announced by appearance of git tags.
|