mirror of
https://github.com/schoebel/mars
synced 2024-12-11 09:15:48 +00:00
1009 lines
43 KiB
Plaintext
1009 lines
43 KiB
Plaintext
IMPORTANT: the historic distinction between MARS Light and the future
|
|
MARS Full has been dropped. Now all versions are simply called "mars".
|
|
|
|
Old tagnames light* will remain valid, but newer names will follow the
|
|
convention s/light/mars/g (this means that the old version number counting
|
|
will be continued, only the "light" is substituted).
|
|
|
|
|
|
Meaning of stable tagnames
|
|
--------------------------
|
|
|
|
Example: mars0.1stable01:
|
|
0 = version of on-disk data structures
|
|
(only incremented when downgrades are impossible)
|
|
(not incremented on backwards-compatible upgrades)
|
|
1 = version of feature set
|
|
stable = feature set is frozen during this series
|
|
01 = bugfix revision
|
|
|
|
Example: mars0.2beta2.3:
|
|
The general idea is as before.
|
|
"beta" means that new features are roughly tested
|
|
in the lab, but not in production, so there may be
|
|
some bugs. New features may be added during
|
|
the beta phase.
|
|
|
|
Example: mars0.3alpha*:
|
|
Never use this for production. Only for historic
|
|
code inspection.
|
|
|
|
Release Conventions / Branches / Tagnames
|
|
-----------------------------------------
|
|
FLOW OF BUGFIXES: 0.1 -> 0.1a -> 0.1b -> 0.2 -> ...
|
|
|
|
mars0.1 series (stable):
|
|
- Asynchronous replication.
|
|
Currently operational at more than 3000 servers at
|
|
1&1, more than 25,000,000 operating hours (Feb 2017)
|
|
- Unstable tagnames: light0.1beta%d.%d (obsolete)
|
|
- Stable branch: mars0.1.y
|
|
- Stable tagnames: mars0.1stable%02d
|
|
|
|
mars0.1a series:
|
|
New designated master branch. Will become stable ASAP.
|
|
|
|
mars0.1b series (currently alpha):
|
|
This is an _imtermediate_ series between 0.1 and 0.2.
|
|
The goal is to improve _scalability_ to thousands of
|
|
hosts in one cluster, as well as thousands of resources.
|
|
Likely, this intermdiate branch will be merged into 0.2
|
|
and then continue development there. When this point
|
|
will arrive is uncertain at the moment.
|
|
Likely, the stabilization of the new scalability features
|
|
will occur together with the 0.2 series.
|
|
Reason for this: the rollout strategy at 1&1 to
|
|
thousands of machines wants to do small incremental
|
|
steps. The risk of directly going to 0.2 in _masses_
|
|
is minimized by first rolling out the really necessary
|
|
changes, and to postpone those developments which are
|
|
currently not yet really needed in mass deployment.
|
|
|
|
mars0.2 series (currently in beta stage):
|
|
Mostly for internal needs of 1&1 (but not limited to that).
|
|
- Getting rid of the kernel prepatch! MARS may be built
|
|
as an external kernel module for any supported
|
|
kernel version. First prototype is only tested for
|
|
unaltered 3.2.x vanilla kernel, but compatibility to
|
|
further vanilla kernel versions (maybe even
|
|
Redhat-specific ones) will follow during the course of
|
|
the MARS mars0.2 stable series. The problem is not
|
|
compatibility as such, but _testing_ that it really
|
|
works. These tests need a lot of time.
|
|
=> further arguments for getting to kernel upstream ASAP.
|
|
- Improved network throughput by parallel TCP connections
|
|
(in particular under packet loss).
|
|
Also called "socket bundling".
|
|
First benchmarks show an impressive speedup over
|
|
highly congested long-distance lines.
|
|
- Future-proof updates in the network protocol:
|
|
Mixed operation of 32/64bit and/or {big,low}endian
|
|
- Support for multi-homed network interfaces.
|
|
- Transparent data compression over low bandwidth lines.
|
|
Consumes a lot of CPU, therefore only recommended for
|
|
low write loads or for desperate network situations.
|
|
- Remote device: bypassing iSCSI. In essence,
|
|
/dev/mars/mydata can appear at any other cluster member
|
|
which doesn't necessarily need any local disks.
|
|
- Various smaller features and improvements.
|
|
- Unstable tagnames: mars0.2beta%d.%d (current)
|
|
- Stable branch: mars0.2.y (already in use for beta)
|
|
- Stable tagnames: mars0.2stable%02d (planned)
|
|
|
|
mars0.3 series (planned):
|
|
(some might possibly go to 1.0 series instead)
|
|
- Improve replication latency.
|
|
- New pseudo-synchronous replication modes.
|
|
For the internal needs of database folks at 1&1.
|
|
- (Maybe) old test suite could be retired, a new
|
|
one is at github.com/schoebel/test-suite
|
|
- Unstable tagnames: mars0.3beta%d.%d (planned)
|
|
- Stable branch: mars0.3.y (planned)
|
|
- Stable tagnames: mars0.3stable%02d (planned)
|
|
|
|
mars1.0 series (planned):
|
|
- Replace symlink tree by transactional status files
|
|
(future-proof)
|
|
This is required for upstream merging to the kernel.
|
|
It has further advantages, such as better scalability.
|
|
- Trying to additionally address public needs.
|
|
- Potentially for Linux kernel upstream,
|
|
- Unstable tagnames: mars1.0beta%d.%d (planned)
|
|
- Stable branch: mars1.0.y (planned)
|
|
- Stable tagnames: mars1.0stable%02d (planned)
|
|
|
|
WIP-* branches are for development and may be rebased onto anything
|
|
at any time without notice. They will disappear eventually.
|
|
|
|
*stable* branches mean that only bug fixes and documentation
|
|
updates / clarifications will be applied. Updates to the test suite /
|
|
new test cases potentially disguising bugs, and other minor additions
|
|
of debugging code / paranoia code which may lead to discovery
|
|
of bugs are also possible. Error messages / warnings and their
|
|
error class may also be changed.
|
|
NO NEW FEATURES, not even minor ones, except when absolutely
|
|
necessary for a bugfix, or for an important usability improvement
|
|
(such as clearer display of errors, hints for resolving them, etc).
|
|
|
|
-----------------------------------
|
|
Changelog for series 0.2:
|
|
|
|
(you need to checkout branch mars0.2.y to see any details)
|
|
|
|
-----------------------------------
|
|
Changelog for series 0.1b:
|
|
|
|
(you need to checkout branch mars0.1b.y to see any details)
|
|
|
|
-----------------------------------
|
|
Changelog for series 0.1a:
|
|
(you need to checkout branch mars0.1a.y to see any details)
|
|
|
|
Designated master branch in future.
|
|
Recommended for Football. Will get updates on
|
|
the Football sub-project.
|
|
Based on 0.1balpha4. Merge 0.1stable.
|
|
Receive mainly fixes.
|
|
|
|
-----------------------------------
|
|
Changelog for series 0.1:
|
|
|
|
Attention! This branch will go EOL in the next few months.
|
|
Reason: branch 0.1a is productive for several months at 1&1.
|
|
Experiences: seems to run better than 0.1.y with
|
|
better performance, smoother, etc.
|
|
Upgrade is easy: just rollout the new marsadm version,
|
|
install the new kernel modules, and load them where possible.
|
|
Mixed operation of different versions is no problem,
|
|
but is of course not the desired state, so keep it short.
|
|
Rollback is also easy.
|
|
If nobody complains, I'd like to get rid of this ancient
|
|
branch ASAP.
|
|
Before being closed, I am adding the new systemd templates
|
|
for better upgrade / downgrade in case you need it. If you don't
|
|
activate it, it should have zero impact.
|
|
|
|
Hint: branch 0.1a will get a merge from here, and then get the
|
|
Remote Device (which was beta until now).
|
|
The new code is not used when not activated, thus inclusion
|
|
of new features into 0.1a should be low risk.
|
|
Afterwards, 0.1a will be marked "stable", and newer features
|
|
(except Football related ones) will then go to 0.1b.
|
|
Finally, when 0.1a is stable, I will close this branch.
|
|
|
|
mars0.1stable57
|
|
* Minor fix: silly deadlock upon scarce race at logging.
|
|
Without debug logging, probability should be extremely low
|
|
(only observed at rmmod).
|
|
* Added initial version of systemd templates (for future backward
|
|
compatibility with branch 0.1a).
|
|
* Doc: systemd templates.
|
|
|
|
mars0.1stable56
|
|
* Minor fix: split-cluster could unnecessarily abort
|
|
in some cases.
|
|
* Added initial version of submodule "football".
|
|
More updates will follow.
|
|
|
|
mars0.1stable55
|
|
* Major fix: unnecessary / false positive split brain could
|
|
occur after the primary logfile was truncated, e.g. at crashes
|
|
or disk damages. Systematic triggering in masses was possible
|
|
by keeping /dev/mars/mydata mounted while _forcing_
|
|
a reboot _during_ (!) its umount (e.g. by patching the
|
|
"reboot" command and/or patching systemd dependencies
|
|
or similar to provoke this regularly).
|
|
|
|
mars0.1stable54
|
|
* Major fix, only relevant for massive execution of
|
|
leave-resource, e.g. when playing Football (Tetris)
|
|
games:
|
|
When non-versioned symlinks were eventually deleted,
|
|
later re-creation did not always succeed.
|
|
Fixed by an new generic timestamp ordering approach.
|
|
* Stability client-side fixes (could lead to stacktraces),
|
|
backported from branch 0.1a (were forgotten long ago).
|
|
* Major doc update: new section on reliability of
|
|
storage architectures.
|
|
This explains why many BigCluster systems don't work as
|
|
expected.
|
|
Backed up by graphs and by mathematical formulas.
|
|
A must-read for anyone working in the storage area!
|
|
|
|
mars0.1stable53
|
|
* Major fix: rare corner case of split brain was not displayed
|
|
correctly.
|
|
* Major usablilty: show amount of data during split brain.
|
|
This hints the sysadmins about the size of future data loss
|
|
at later split brain resolution.
|
|
* Minor workaround: crashed /mars filesystems may contain
|
|
completely damaged symlinks with timestamps in the far
|
|
distant future, e.g. year >3000 etc. Safeguard unusual
|
|
Lamport time slips by ignoring implausible values.
|
|
* Major improvement: internal locking overhead reduced.
|
|
* Minor improvment: reduce message trigger overhead.
|
|
* Several minor improvements.
|
|
* Doc updates.
|
|
|
|
mars0.1stable52
|
|
* Major contrib: new example scripts for MARS background data
|
|
migration during production. 1&1-specific code in a separate
|
|
plugin. You can write your own plugins for adaptation to
|
|
your needs.
|
|
* Minor fix: limit the size of the writeback buffer by the
|
|
rest space in /mars. This is only relevant when
|
|
/mars is dimensioned smaller than RAM (which should
|
|
never be the case in production systems, but might happen
|
|
accidentally or for testing).
|
|
Analogously, limit the maximum logfile size.
|
|
* Minor fix: prevent creation of many tiny logfiles over time
|
|
when secondaries are not catching up.
|
|
The default threshold is a minimum of 5 GB size when more
|
|
than 10 logfiles are already present.
|
|
* Minor fix: cleanup old internal .tmp-* symlinks which might
|
|
remain as leftovers when marsadm is dying at the wrong
|
|
moment.
|
|
* Minor improvement: don't run O(n) mapfree under spinlock.
|
|
More speed improvements under preparation; will result in O(k).
|
|
* Some more minor improvements.
|
|
|
|
mars0.1stable51
|
|
* Minor fix: don't abort log-delete-all too early when there
|
|
are holes in the deletion sequence numbers.
|
|
* Backport of marsadm cron from branch 0.1a, in order to systematically
|
|
support mixed operation of different MARS versions in bigger installations
|
|
(avoid confusion at junior sysadmins and at monitoring staff).
|
|
* Rectified the semantics of log-delete, which now does the same as
|
|
log-delete-all. Single deletion is only needed for testing, and
|
|
has been renamed to log-delete-one.
|
|
Leaving the old semantics would have been an operational risk
|
|
when junior sysadmins or 24/7 surveillance people are not carefully
|
|
looking at the details of semantics. Now everything is hopefully
|
|
as everybody not familiar with MARS would naively assume.
|
|
* Doc update.
|
|
|
|
mars0.1stable50
|
|
* Major usability improvement (backport from 0.1a):
|
|
marsadm shows number of replicas of each resource, out of total number
|
|
of cluster members. Example: [2/4]
|
|
* Minor fix: automatically cleanup internal backups produced by the new
|
|
merge-cluster / split-cluster after 1 week.
|
|
* Minor fix: also cleanup some new symlink types replicated through
|
|
the network when running asymmetric clusters with mixed branches
|
|
0.1 and 0.1a.
|
|
* Minor annoyance: silence split-cluster error message when no
|
|
resources are present.
|
|
|
|
mars0.1stable49
|
|
* Backports of new marsadm commands merge-cluster and split-cluster.
|
|
The new functionality is needed for background migration of resources.
|
|
Please be aware that this branch has not been constructed for
|
|
scalability in the dimension of #nodes, so don't merge too many
|
|
nodes and use split-cluster after each background migration.
|
|
Better scalability is / will be addressed at the 0.1a and 0.1b
|
|
branches. However, currently they are not yet stable.
|
|
No changes at the kernel module (besides some bug fixes);
|
|
this is solely done at userspace level.
|
|
The new userspace-level commands should have almost no intersection
|
|
with (and therefore no impact onto) other parts of this well-proven
|
|
stable branch.
|
|
* Backports of new wait-cluster implementation.
|
|
This avoids irritating messages after split-cluster.
|
|
|
|
mars0.1stable48
|
|
* Critical fix: DDOS-like attacks at the MARS ports (or similar caused
|
|
by bugs / misbehaviour) are prevented by configurable limits
|
|
/proc/sys/mars/handler_dent_limit and
|
|
/proc/sys/mars/handler_limit .
|
|
* Critical safeguard: when the network is interruted for a long time
|
|
while the log-rotate frequency is very high and a lot of resources
|
|
(exceeding the official limits as documented) had been used, masses of
|
|
deletion links may accumulate in /mars/todo/. First, already
|
|
existing deletions to the same targets are reused now.
|
|
Second, a maximum limit (of currently 512 entries)
|
|
is enforced, and a warning is spit when too many deletions
|
|
are accumulated over time.
|
|
* Minor fix: earlier detection of socket hangups.
|
|
|
|
mars0.1stable47
|
|
* Critical fix: leave-cluster could lead to deadlocks, also
|
|
on remote nodes.
|
|
* Contrib: mass automation script (unmaintained).
|
|
|
|
mars0.1stable46
|
|
* Major fix: bugfix from 0.1stable44 (state "Detached" was
|
|
reported too early) was incorrect, now fixed.
|
|
* Minor fix: display of host lists in special case of
|
|
create-resource was misleading.
|
|
|
|
mars0.1stable45
|
|
* Major fix: on secondaries, orphane files and symlinks were
|
|
sometimes created in /mars and could accumulate over a long time.
|
|
After several months or years of operation, the /mars directory
|
|
could appear being full via "df /mars", but "du -s /mars" was
|
|
not reporting the hidden space allocation.
|
|
Also, upon remount or reboot the cleanup of orphane files
|
|
could take a rather long time. Workaround was possible by
|
|
"rmmod mars; umount /mars; mount /mars; modprobe mars".
|
|
Fixed by regularly pruning the dentry cache of the /mars
|
|
filesystem.
|
|
|
|
mars0.1stable44
|
|
--------
|
|
* Major fix: state "Detached" was reported too early,
|
|
before the underlying disk was really closed.
|
|
* Doc: new updated slides from FrOSCon 2017.
|
|
New architectural comparison with Big Storage Clusters
|
|
in terms of scalability, reliability and costs.
|
|
|
|
mars0.1stable43
|
|
--------
|
|
* Major fix, only relevant for k >= 3 replicas:
|
|
Logfile fetch did not switch over to another alive peer
|
|
upon _speicfic_ network problems with the _current_
|
|
peer. As a consequence, an unaffected replica could
|
|
hang. Workarould was possible by pause-fetch /
|
|
resume-fetch or by fixing the network :)
|
|
|
|
mars0.1stable42
|
|
--------
|
|
* Minor fix: ssh IPs and port numbers are automatically probed
|
|
on join-cluster.
|
|
* Minor compatibility to branch mars.1b.y: join-resource
|
|
does additional rsync for safety.
|
|
* Minor fix: rate display was not going down to 0
|
|
on switchoff or long pauses.
|
|
* Minor improvement: show peers in internal debugging info.
|
|
|
|
mars0.1stable41
|
|
--------
|
|
* Minor fix: a scarce race could lead to an unnecessary split brain
|
|
when umounting _after_ role transition from primary to secondary.
|
|
|
|
mars0.1stable40
|
|
--------
|
|
* Potentially critical fix: on very fast machines, and with
|
|
extremely low probability, a race in AIO could lead to a kernel
|
|
page fault.
|
|
For maximum safety, update to this version is recommended.
|
|
|
|
mars0.1stable39
|
|
--------
|
|
* Minor fix: hangs of logfile updates. Found by stress-testing
|
|
on fast hardware over 10GBit network links. Might explain
|
|
some extremely rare (1 per several millions of operations hours)
|
|
production hangs on secondaries. Workaround possible by
|
|
"pause-fetch; resume-fetch".
|
|
* Minor fixes of rare kthread retarding under very high load.
|
|
* Minor improvement: add version number to "marsadm version" which
|
|
can be used for future compatibilty checking with respect to
|
|
new features.
|
|
|
|
mars0.1stable38
|
|
--------
|
|
* Compile without pre-patch on some kernel versions!
|
|
Whether the pre-patch is applied will be detected automatically.
|
|
However, there is some (hopefully minor) performance penalty when
|
|
the pre-patch is missing.
|
|
This will be addressed in a future release (but might go
|
|
to branch 0.1b instead, not yet decided).
|
|
Tested with vanilla kernels 3.10.105, 3.14.79, 3.16.43,
|
|
4.1.39, 4.4.67.
|
|
Vanilla kernels 4.8.x and later are _not_ yet working
|
|
(independently from pre-patches). This will be addressed
|
|
in a future release.
|
|
* No functional changes otherwise. Rollback to prior versions
|
|
should be easy. Please report any issues.
|
|
* Updated docs describing build methods.
|
|
|
|
mars0.1stable37
|
|
--------
|
|
* Minor fix: secondary logfile replication could hang in the
|
|
extremely unusual case that the expected primary logfile size
|
|
gets shortened after a crash followed by reboot.
|
|
Workaround was possible via "pause-fetch; resume-fetch".
|
|
|
|
mars0.1stable36
|
|
--------
|
|
* Doc: new slides from GUUG2017, both in English and in German.
|
|
Some very important hints for cost savings. May easily save
|
|
you a few millions when operating some petabytes of data.
|
|
* Doc: new chapter on cost savings in mars-manual.pdf.
|
|
Some parts of German oral explanations from the GUUG conference
|
|
translated to English for my English-speaking audience.
|
|
More to come later (hopefully; I need to get the time).
|
|
|
|
mars0.1stable35
|
|
--------
|
|
* Minor fix: when syncing a big resource (e.g. 40TiB) over an 1GBit
|
|
uplink, the sync may take longer than 1 day. This increases the
|
|
probability for triggering an unintended restart of that sync
|
|
from scratch.
|
|
Among further obscure preconditions, more than 5 logfiles must
|
|
exist such that the wrong assumption of an emergency mode can
|
|
happen at the secondary. In order to trigger the bug more likely,
|
|
it is therefore helpful to misconfigure /etc/cron.d/mars by
|
|
log-rotate'ing every 10 minutes, but doing log-delete-all only
|
|
once an hour (which contradicts my upstream documentation and
|
|
unnecessarily wastes valuable storage space in /mars).
|
|
Fixed by correction of a typo-like error.
|
|
|
|
mars0.1stable34
|
|
--------
|
|
* Minor fix: in some rare cases, when lots of gigabytes had to be
|
|
replayed in one big slurp, the replay position wasn't updated
|
|
during a longer time. Some admins were complaining that it
|
|
appeared "stuck" although it worked in reality.
|
|
Improved by increasing the update frequency of the replay link.
|
|
* Minor fix: after network errors, sometimes the sync restarted
|
|
from scratch, unnecessarily.
|
|
* Minor fix: under rare conditions, rmmod could hang forever.
|
|
A known reason has been fixed. Other theoretical reasons
|
|
hopefully improved by some further safeguards.
|
|
|
|
mars0.1stable33
|
|
--------
|
|
* Minor regression from stable29:
|
|
After a primary crash, without switchover, and when the primary
|
|
recovery phase involves a logrotate to an empty new logfile
|
|
which had been in the meantime shortly before the crash but
|
|
has not yet been used before the crash (race condition),
|
|
a kernel NULL pointer deref may stop the main thread.
|
|
Workaround: either remove the empty logfile by hand,
|
|
or just do a failover to the other side.
|
|
|
|
mars0.1stable32
|
|
--------
|
|
* Critical regression between stable30 and stable31 (can be avoided
|
|
by simply using stable30 for affected kernels): on _old_ kernels
|
|
(before 4.3.x) the removal of merge_bvec_fn() (see upstream commit
|
|
8ae126660fddbeebb9251a174e6fa45b6ad8f932) can lead to fatal
|
|
crashes at the primary side.
|
|
Fixed by using (hopefully) proper #ifdef's according to the
|
|
kernel version.
|
|
Notice: between stable30 and stable31 no true MARS fixes were
|
|
made (since no bugs were found). This strategy is likely to
|
|
continue for a while, for newer adaptations to even newer kernels.
|
|
In case of problems, go back. And, please, report it to me :)
|
|
|
|
mars0.1stable31
|
|
--------
|
|
* New _minimum_ pre-patches for vanilla LTS kernels 3.2.x to 4.7.x.
|
|
For security reasons, please prefer them over the old _generic_
|
|
pre-patch versions which expose many unnecessary EXPORT_SYMBOL
|
|
to potential attackers.
|
|
* Adaptions to vanilla kernels up to 4.7.x.
|
|
Note: 4.8rc-* does not yet work.
|
|
* Regression testing with many kernel versions: looks fine.
|
|
|
|
mars0.1stable30
|
|
--------
|
|
* Minor fix: in very rare cases of a primary crash, a missing
|
|
versionlink could lead to a hang.
|
|
* Minor fix: improved error reporting of replay code.
|
|
* Minor fix: improved switchback to former primary side.
|
|
* Minor fix: systematically add some missing macros.
|
|
* Minor improvements: add some example systemd unit and other
|
|
contrib stuff like a cronjob example.
|
|
* Doc: minor additions and improvements.
|
|
|
|
mars0.1stable29
|
|
--------
|
|
* Minor fix: on very fast hardware and networks, sync could take
|
|
a while for terminating.
|
|
* Minor fix: external module build.
|
|
* Major usability improvement: new expert commands marsadm
|
|
lowlevel-ls-host-ips, lowlevel-set-host-ip, lowlevel-delete-host.
|
|
Necessary for moves between networks, dedicated replication IPs,
|
|
etc.
|
|
* Minor doc update.
|
|
|
|
mars0.1stable28
|
|
--------
|
|
* Doc: describe new naming conventions.
|
|
MARS Light is now simply called MARS.
|
|
No distinction between "Light" and the future "Full" anymore.
|
|
Please note that the git branches light0.1.y and light0.2.y have
|
|
been renamed to mars0.1.y and mars0.2.y respectively.
|
|
* Minor sourcecode cleanup: s/light//g or s/light/main/g
|
|
where appropriate.
|
|
No other changes in the sourcecode, deliberately.
|
|
In case anyone encounters any build problems compiling MARS,
|
|
this release is separated just for the sake of build testing,
|
|
or Debian packaging testing, etc.
|
|
* Doc: minor clarifications.
|
|
|
|
mars0.1stable27
|
|
light0.1stable27
|
|
--------
|
|
* Critical fix: typo in sync progress comparison code could lead
|
|
to data version mismatches during sync when alternating with
|
|
replay. Only observed at a certain new hardware class, and only
|
|
while testing with an extremely high load (9 loaded resources
|
|
in parallel to 9 concurrent syncs). As a workaround,
|
|
echo 0 > /proc/sys/mars/sync_flip_interval_sec can be used.
|
|
Nevertheless, update is highly recommended!
|
|
* Major fix: slow memory leak (regression from light0.1stable26).
|
|
Only when starting the transaction logger (i.e. primary is typically
|
|
not affected). But don't let run it for a longer time.
|
|
Monitoring is possible via /proc/slabinfo (size-64 or siblings).
|
|
* Minor fix: join-cluster did not check for duplicate IP addresses.
|
|
* Minor fixes: some unnecessary annoying error messages.
|
|
* Docu: new slides from GUUG 2016 in Köln.
|
|
|
|
light0.1stable26
|
|
--------
|
|
* Minor fixes: some primitive macros were reporting misleading or
|
|
even wrong values at split brain, or during/after emergency mode.
|
|
Some high-level macros as well as try_to_avoid_split_brain
|
|
should work better / more reliable now.
|
|
* Minor fix: potential deadlock after crash reboot, or after
|
|
defective /mars filesystem. Never observed in practice.
|
|
* Minor safeguard: unnecessary split brain could emerge at
|
|
secondaries under extremely rare and strange conditions.
|
|
Unsure whether it ever occurred in practice.
|
|
* Minor usability improvement: show incorrect permissions on /mars.
|
|
Some other sysadmin tools like Puppet seem to have their own
|
|
default notion of "secure permissions" ;)
|
|
* Minor doc reorg, better chapter structure.
|
|
|
|
light0.1stable25
|
|
--------
|
|
* Major fix: in rare cases "marsadm primary" (without --force)
|
|
could go into an endless loop, even if --timeout= was specified.
|
|
* Minor fix: in rare cases of hanging or defective IO, crashes
|
|
of the primary could replicate versionlinks to the secondary,
|
|
but after reboot they were missing at the primary because of
|
|
of hanging IO or other IO / RAID controller problems.
|
|
Now using sync_filesystem() for either ensuring actuality,
|
|
or for letting the mars_light main control thread hang
|
|
(which will hopefully be noticed soon by monitoring).
|
|
* Minor fix: join-cluster uses rsync, which could abort due to
|
|
vanished filesystem objects while the primary is actively running.
|
|
Now it should tolerate such "errors".
|
|
* Minor fixes / additions at primitive macros.
|
|
* Tiny doc update.
|
|
|
|
light0.1stable24
|
|
--------
|
|
* Skip this release due to a regression.
|
|
|
|
light0.1stable23
|
|
--------
|
|
* Minor fix: the new replay-code error message was forgotten
|
|
to reset at secondaries. Now the annoying old error message
|
|
disappears after the next successful logrotate.
|
|
* Minor fixes of internal marsadm code (not in use until now).
|
|
* Minor doc update.
|
|
|
|
light0.1stable22
|
|
--------
|
|
* Critical fix for non-storage servers: the /mars directory
|
|
was readable by ordinary non-root users, opening a potential
|
|
security hole. Originally MARS was designed for standalone
|
|
storage servers solely, but now it is increasingly deployed to
|
|
machines where ordinary users can log in.
|
|
Update recommended, but only urgent for potentially affected
|
|
installations.
|
|
* Minor fix: when a logfile was damaged (observed at defective
|
|
hardware), this was often (but not always) detected by the
|
|
md5 data checksums in the transaction logfiles. So far so good.
|
|
The replay / recovery process stopped for a very good reason.
|
|
But it was not easily possible to _force_ any of the resource
|
|
members into primary role when the defect was already present at
|
|
the _primary_ (which happend once during 7 millions of operating
|
|
hours, and at a primary site which proved defective afterwards),
|
|
and the defect had been replicated to all secondaries.
|
|
As a workaround, the resource could be destroyed via leave-resource
|
|
everywhere, and re-surrected from scratch. Clumsy.
|
|
Now an md5 checksum error in the middle of a logfile is
|
|
treated similarly to an EOF. "primary --force" will succeed now,
|
|
without applying the defective data (as before).
|
|
Split brain will result for sure in such a case.
|
|
* Minor improvement: md5 logfile checksum errors are now displayed
|
|
directly in the diskstate macro (and therefore also at plain
|
|
"view").
|
|
* Minor improvement: when "marsadm view all" told you "InConsistent"
|
|
as the disk state, this was _formally correct_ because it related
|
|
to the state of the _disk_, not to the state of the replication.
|
|
The former message could appear regularly during ordinary
|
|
out-of-order writeback at the primary side, without violating
|
|
the consistency of /dev/mars/mydata.
|
|
However, many people were confused and alarmed by the irritating
|
|
message.
|
|
Now a better wording is used: "WriteBack" and "Recovery" describes
|
|
more intuitively what is really happening :)
|
|
* Minor doc improvements.
|
|
|
|
light0.1stable21
|
|
--------
|
|
* Hint: now MARS has been rolled out to more than 1600 servers,
|
|
including some MySQL database servers, and has collected more
|
|
than 6 millions of operation hours.
|
|
* Minor fixes, none of them observed in practice, only found
|
|
by testing while working on new features:
|
|
- potential read page fault
|
|
- potential deadlock
|
|
- incorrect remote symlink update under untypical circumstances
|
|
|
|
light0.1stable20
|
|
--------
|
|
* Hint: MARS is now running on more than 850 storage servers,
|
|
and has collected more than 4.5 millions of operation hours.
|
|
There were no new incidents with customer impact since the last
|
|
major bugfix (more than 3 millions of operation hours since then).
|
|
It is difficult to deduce a reliability from that, but it appears
|
|
that at least 99.999%, if not 99.9999% are now real for the
|
|
MARS component as a standalone component (not to be confused with
|
|
overall system reliability). Our storage hardware is clearly much
|
|
less reliable. MARS does compensate these defects all the time.
|
|
|
|
* Minor fix: memory leak in networking code, does not occur
|
|
at light0.1 operations (but maybe future versions of MARS).
|
|
* Doc: add presentation slides from Froscon2015.
|
|
|
|
light0.1stable19
|
|
--------
|
|
* Minor safeguard: warn when somebody tries leave-resource --host=
|
|
for a damaged host, and later the dead host resurrects in an
|
|
unreasonable way.
|
|
* Doc update: describe use cases for DRBD vs MARS more clearly.
|
|
* Minor spelling fixes.
|
|
|
|
light0.1stable18
|
|
--------
|
|
* Minor safeguard: prevent join-resource when previous log-purge-all
|
|
has been forgotten. Prevent create-resource also when previous
|
|
delete-resource has been forgotten. Anyway, this happens only in
|
|
very exotic repair scenarios after very heavy failures.
|
|
* Doc updates: simplify descriptions of split-brain resolution and
|
|
emergency mode resolution. Nowadays 'invalidate' will do everything
|
|
in all tested cases; the more complex alternative methods have
|
|
been moved to the appendix.
|
|
|
|
light0.1stable17
|
|
--------
|
|
* Minor fix: stacktrace / oops in aio callback path due to a
|
|
subtle race, observed once during 2.5 millions of operation hours.
|
|
In the observed case, the secondary was hanging, without
|
|
customer impact. However, the error class could potentially
|
|
occur also at the primary side. Probably the bug was triggered
|
|
by a hardware problem from the RAID controller.
|
|
|
|
light0.1stable16
|
|
--------
|
|
* Minor fix: sync could take a long time to complete under high
|
|
application load, similarly to a live-lock.
|
|
* Some smaller minor fixes for annoying messages.
|
|
* Contrib: added configurable Nagios check.
|
|
* Contrib: added some example scripts which could be used by
|
|
clustermanagers etc.
|
|
* Doc: important new section on pitfalls when using existing
|
|
clustermanagers UNMODIFIED for long distance replication.
|
|
PLEASE READ!
|
|
|
|
light0.1stable15
|
|
--------
|
|
* NOTICE: MARS succeeded baptism on fire at 04/22/2015 when a whole
|
|
co-location had a partial power blackout, followed by breakdown
|
|
of air conditioning, followed by mass hardware defects due to
|
|
overheating. MARS showed exactly 0 errors when (emergency)
|
|
switching to another datacenter was started in masses.
|
|
* Major fix of race in transaction logger: the primary could hang
|
|
when using very fast hardware, typically after ~24000 operation
|
|
hours. The problem was noticed 6 times during a grand total of
|
|
more than 1,000,000 operation hours on a mixed hardware park,
|
|
showing up only on specific hardware classes. Together with 3
|
|
other incidents during early beta phase which also had customer
|
|
impact, this means that we have reached a reliability of about
|
|
===> 99.999%
|
|
After this fix, the reliability should grow even higher.
|
|
A workaround for this bug exists:
|
|
# echo 2 > /proc/sys/mars/logger_completion_semantics
|
|
Update is only mandatory when you cannot use the workaround.
|
|
* Minor improvement in marsadm: re-allow --force combined with "all".
|
|
This is highly appreciated for speeding up operations / handling
|
|
during emergency datacenter switchover.
|
|
* Various smaller improvements.
|
|
* Contrib (unsupported): example rollout script for mass rollout.
|
|
|
|
light0.1stable14
|
|
--------
|
|
* Minor safeguard: modprobe mars will refuse to start when the
|
|
cluster UUID is missing.
|
|
* Minor fix: external race in marsadm resize, only relevant
|
|
for scripting.
|
|
* Minor fix: potential race on plugged IO requests.
|
|
* Clarify output of marsadm view. Many systematical improvements
|
|
and hints.
|
|
* Add some unevitable macros for scripting / automation.
|
|
* Various tiny improvements.
|
|
|
|
light0.1stable13
|
|
--------
|
|
* Critical safeguard for accidental join-cluster with wrong argument:
|
|
make UUID mandatory, disallow completely unrelated hosts to
|
|
communicate symlink tree updates when their UUIDs mismatch.
|
|
* Minor fix: leave-resource --host=other did not work when disks
|
|
were named differently throughout the cluster.
|
|
* Minor fix: detach --host=other --force (which is needed as a
|
|
precondition) did not work.
|
|
* Various minor fixes and clarifications. "marsadm view all"
|
|
now reports the communication status in the cluster.
|
|
|
|
light0.1stable12
|
|
--------
|
|
* Critical (but usually not extremely relevant) fix:
|
|
When emergency mode occurs just during a sync, the target could
|
|
remain inconsistent without notice. Now noticed.
|
|
You always could/should manually invalidate whenever an
|
|
emergency mode appeared.
|
|
Now this is automatically fixed by restarting any sync from
|
|
scratch (if one was actually running before; otherwise consistency
|
|
was never violated).
|
|
* Major documentation update / corrections.
|
|
* Major (but less relevant) fix: leave-cluster did not really work.
|
|
* Minor fix (regression): rmmod could hang when sync was running.
|
|
* Various minor fixes and clarifications.
|
|
|
|
light0.1stable11
|
|
--------
|
|
* Major documentation update. mars-manual.pdf increased from
|
|
66 to 80 pages. Please read! You probably should know this.
|
|
* Minor fixes: better cleanup on invalidate / leave-resource.
|
|
* Minor clarifications: more precise EIO error codes, more verbose
|
|
error reporting via "marsadm cat".
|
|
|
|
light0.1stable10
|
|
--------
|
|
* Major fixes of internal network protocol errors, leading to
|
|
internal shutdown of sockets, which were transparently re-opened.
|
|
It could affect network performance. Not sure whether
|
|
stability was also affected (probably under extremely high load);
|
|
for better safety you should upgrade.
|
|
* Major fix from Manuel Lausch: regex parsing sometimes went
|
|
completely wrong when hostnames followed a similar name scheme
|
|
than internal symlinks.
|
|
* Major, only relevant for k>2 replicas: fix wrong internal sharing
|
|
of data structures resulting from parallel data connections.
|
|
* Minor fix: race in fake-sync.
|
|
* Minor fix: race in invalidate.
|
|
* Minor, only for k>2 replicas: fix direct primary handover when
|
|
some non-involved hosts are currently unreachable.
|
|
* Minor: improve becoming primary during split brain.
|
|
* Minor: improve becoming primary when emergency mode starts.
|
|
* Minor: silence some annoying stderr messages.
|
|
* Several internal minor fixes and clarifications.
|
|
|
|
light0.1stable09
|
|
--------
|
|
* Major fix of scarce race (potentially critical): the bio response
|
|
thread could terminate too early, leading to a premature dealloc
|
|
of kernel memory. This has only been observed on slow virtual
|
|
machines with slow virtual devices, and very high load on k=4
|
|
replicas. This could potentially affect the stability of the system.
|
|
Although not observed at production machines at 1&1, I recommend
|
|
updating production machines to this release ASAP.
|
|
* Major usability fix: incorrect commandline options of marsadm
|
|
were just ignored if they appeared after the resource argument.
|
|
Misspellings could cause undesired effects. For instance,
|
|
"marsadm delete-resource vital --force --MISSPELLhost=banana"
|
|
was accidentally destroying the primary during operation (which
|
|
is _possible_ when using --force, and this was even a _required_
|
|
sort of "STONITH"-like feature -- however from a human point
|
|
of view it was intended to destroy _another_ host, so this was
|
|
an unexpected behaviour from a sysadmin point of view).
|
|
* Major workaround: the concept "actual primary" is wrong, because
|
|
during split brain there may exist several primaries. Do not
|
|
use the macro view-actual-primary any longer. It is deprecated now.
|
|
Use view-is-primary instead, on each host you are interested in.
|
|
* Minor fix: "marsadm invalidate" did not work in some weired
|
|
split brain situations / was not equivalent to
|
|
"marsadm leave-resource $res; marsadm join-resource $res".
|
|
The latter was the old workaround to fix the situation.
|
|
Now it shouldn't be necessary anymore.
|
|
* Minor fix: pause-fetch could take very long to terminate.
|
|
* Minor fix: marsadm wait-cluster did not wait for all hosts
|
|
particiapting in the resource, but only for one of them.
|
|
This is only relevant for k>2 replicas.
|
|
* Minor fix: the rates displayed by "marsadm view" did not drop down
|
|
to 0 when no progress was made.
|
|
* Minor fix: logging to syslog was incomplete.
|
|
* Minor usability fix: decrease boring speakyness of "log-rotate"
|
|
and "log-delete" for cron jobs.
|
|
* Minor fixes: several internal awkwardnesses, potentially affecting
|
|
performance and/or stability in weired situations.
|
|
|
|
light0.1stable08
|
|
--------
|
|
* Minor fix: after emergency mode, a versionlink was forgotten
|
|
to create. This could lead to unnecessary reports of split
|
|
brain and/or need for additional re-invalidate.
|
|
* Minor fix: the predicate 'view-is-consistent' reported 'false'
|
|
in some situations on secondaries when all was ok.
|
|
* Minor fix: it was impossible to determine the 'is-consistent'
|
|
from 'marsadm view' (without -1and1 suffix). Added a new [Cc-]
|
|
flag. This is absolutely needed to determine whether the
|
|
underlying disks must have the same checksum (provided that
|
|
both disks are detached and the network works and fetch+replay
|
|
had completed before the detach).
|
|
* Updated docs to reflect this.
|
|
* Minor fix: 'invalidate' did not work when the resource was not
|
|
completely detached. Now it implicitly does a detach before
|
|
starting invalidation.
|
|
* Minor fix: wait-umount was waiting for umount of _all_ primaries
|
|
during split brain. Now it waits only for umount of the local node.
|
|
Notice that having multiple primaries in parallel is an
|
|
erroneous state anyway.
|
|
* Minor fix: leave-cluster did not work without --force.
|
|
|
|
light0.1stable07
|
|
--------
|
|
* Minor fix: re-creation of a completely destroyed resource
|
|
did not always work correctly
|
|
|
|
light0.1stable06
|
|
--------
|
|
* Major fix: becoming primary was hanging in scarce situations.
|
|
* Minor fix: some split brains were not always detected correctly.
|
|
* Minor fix for Redhat openvz kernel builds.
|
|
* Several fixes for 1&1 internal Debian builds.
|
|
|
|
light0.1stable05
|
|
--------
|
|
* Major fix: incomplete calls to vfs_readdir()
|
|
which could lead to incomplete symlink updates /
|
|
replication hangs.
|
|
* Minor fix: scarce race on replay EOF.
|
|
* Separated kernel from userspace build environment.
|
|
* Removed some potentially dangerous Kconfig options
|
|
if they would be set to wrong values (robustness against
|
|
accidentally producing bad kernel modules).
|
|
* Dito: some additional checks against bad main Kconfig options
|
|
(mainly for out-of-tree builds).
|
|
* Separated contrib code from maintained code.
|
|
* Added some pre-patches for newer kernels
|
|
(WIP - not yet fully tested at all combinations)
|
|
* Minor doc addition: LinuxTag 2014 presentation.
|
|
|
|
light0.1stable04
|
|
--------
|
|
* Quiet annoying error message.
|
|
* Minor readability improvements.
|
|
* Minor doc updates.
|
|
|
|
light0.1stable03
|
|
--------
|
|
* Major: fix internal aio race (could lead to memory corruption).
|
|
* Fix refcounting in trans_logger.
|
|
* Some minor fixes in module code.
|
|
* Fix 1&1-internal out-of-tree builds.
|
|
* Various minor fixes.
|
|
* Update monitoring tools / docs (German, contributed by Jörg Mann).
|
|
|
|
light0.1stable02
|
|
--------
|
|
* Fix sorting of internal data structure.
|
|
* Fix IO error propagation at replay.
|
|
|
|
light0.1stable01
|
|
--------
|
|
* Fix parallelism of logfile propagation: sometimes a secondary
|
|
could get a more recent version than the primary had on stable
|
|
storage after its crash, eventually leading to an (annoying)
|
|
split brain. Some people might take this as a feature instead
|
|
of a bug, but now the logfile transfer starts only after the
|
|
primary _knows_ that the data is successfully committed to
|
|
stable storage.
|
|
* Fix memory leaks in error path.
|
|
* Fix error propagation between client and server.
|
|
* Make string allocation fully dynamic (remove limitation).
|
|
* Fix some annoying messages.
|
|
* Fix usage output of marsadm.
|
|
* Userspace: contributed bugfix for Debian udev rules by Jörg Mann.
|
|
* Improved debugging (only for testing).
|
|
|
|
light0.1beta0.18 (feature release)
|
|
--------
|
|
* New commands marsadm view-$macroname
|
|
* New customizable macro processor
|
|
* New err/warn/inf reporting via symlinks
|
|
* Per-resource emergency mode
|
|
* Allow limiting the sync parallelism
|
|
* New flood-protected syslogging
|
|
* Some smaller improvements
|
|
* Update docs
|
|
* Update test suite
|
|
|
|
light0.1beta0.17
|
|
--------
|
|
* Major bugfix: race in logfile switchover could sometimes
|
|
lead to the wrong logfile (extremely rare to hit, but
|
|
potentially harmful).
|
|
* Disallow primary switching when some secondaries are
|
|
syncing.
|
|
* Fix logfile fetch from multiple peers.
|
|
* Fix computation of transitive closure (affected
|
|
log-purge-all, split brain detection, and many others).
|
|
* Fix incorrect emergency mode detection.
|
|
* Primaries no longer fetch logfiles (unnecessarily, only
|
|
makes a difference at concurrent split brain operations).
|
|
* Detached resources no longer fetch logfiles (unexpectedly).
|
|
* Myriads of smaller fixes.
|
|
|
|
light0.1beta0.16
|
|
--------
|
|
|
|
* Critical bugfix: "marsadm primary --force" was assumed to be given
|
|
by sysadmins only in case of emergency, when the network is down.
|
|
When given in non-emergency cases where the old primary continues
|
|
to run (/dev/mars/* being actively used and written), the
|
|
old primary could suddendly do a "logrotate" to the
|
|
new split-brain logfile produced by the new (second) primary.
|
|
Now two primaries should be able to run concurrently in split-brain
|
|
mode without mutually trashing their logfiles.
|
|
* primary --force now only works in disconnected mode, in order
|
|
to hinder unintended forceful creation of split brain during
|
|
normal operation.
|
|
* Stop fetching of logfiles behind split brain points (save space
|
|
at the target hosts - usually the data will be discarded later).
|
|
* Fixed split brain detection in userspace.
|
|
* leave-resource now waits for local actions to take place
|
|
(remote actions stay asynchronously).
|
|
* invalidate / join-resource now work only if a designated primary
|
|
exists (otherwise they would not know uniquely from whom
|
|
to start initial sync).
|
|
* Update docs, clarify scenarios intended <-> emergengy switching.
|
|
* Fixed mutual overwrite of deletion symlinks in case of racing
|
|
log-deletes spawned in parallel by cron jobs (resilience).
|
|
* Fixed races between deletion and re-erection (e.g. fresh
|
|
join-resource after leave-resource during network partitions).
|
|
* Fixed duration of network timeouts in case the network is down
|
|
(replaced non-working TCP_KEEPALIVE by explicit timeouts).
|
|
* New option --dry-run which does not really create symlinks.
|
|
* New command "delete-resource" (VERY DANGEROUS) for
|
|
forcefully destroying a resource, even when it is in use.
|
|
Intended only for _emergency_ cases when sysadmins are
|
|
desperate. Use only by hand, first run with --dry-run in order
|
|
to check what will happen!
|
|
* New command "log-purge-all" (potentially DANGEROUS) for
|
|
resolving split brain in desperate situations (cleanup of
|
|
leftovers). Only use by hand, first run with --dry-run!
|
|
* Lots of smaller imprevements / usability / readability etc.
|
|
* Update test suite.
|
|
|
|
light0.1beta0.15
|
|
--------
|
|
|
|
* Introduce write throttling of bulk writers.
|
|
* Update test suite.
|
|
|
|
light0.1beta0.14
|
|
--------
|
|
|
|
* Fix logfile transfer in case of "holes" created by
|
|
emergency mode.
|
|
* Fix "marsadm invalidate" after emergency mode had been entered.
|
|
* Fix "marsadm resize" capacity propagation from underlying LVM.
|
|
* Update test suite.
|
|
|
|
light0.1beta0.13
|
|
--------
|
|
|
|
* Fix shutdown during operation (flying requests).
|
|
* Fix unnecessary Lamport clock propagation storms.
|
|
* Improve unnecessary page cache utilisation (mapfree).
|
|
* Update test suite.
|
|
|
|
|
|
light0.1beta0.12 and earlier
|
|
--------
|
|
|
|
There was no dedicated ChangeLog. For details, look at the
|
|
commit history.
|
|
|
|
Release Policy / Software Lifecycle
|
|
-----------------------------------
|
|
|
|
New source releases are simply announced by appearance of git tags.
|