mirror of https://github.com/schoebel/mars
505 lines
22 KiB
Plaintext
505 lines
22 KiB
Plaintext
Meaning of stable tagnames
|
|
--------------------------
|
|
|
|
Example: light0.1stable01:
|
|
0 = version of on-disk data structures
|
|
(only incremented when downgrades are impossible)
|
|
(not incremented on backwards-compatible upgrades)
|
|
1 = version of feature set
|
|
stable = feature set is frozen during this series
|
|
01 = bugfix revision
|
|
|
|
Release Conventions / Branches / Tagnames
|
|
-----------------------------------------
|
|
|
|
light0.1 series (stable):
|
|
- Asynchronous replication.
|
|
Currently operational at more than 1600 servers at
|
|
1&1, more than 6,000,000 operating hours (end of 2015)
|
|
- Unstable tagnames: light0.1beta%d.%d (obsolete)
|
|
- Stable branch: light0.1.y
|
|
- Stable tagnames: light0.1stable%02d
|
|
|
|
light0.2 series (currently in alpha stage):
|
|
Mostly for internal needs of 1&1 (but not limited to that).
|
|
- Getting rid of the kernel prepatch! MARS may be built
|
|
as an external kernel module for any supported
|
|
kernel version. First prototype is only tested for
|
|
unaltered 3.2.x vanilla kernel, but compatibility to
|
|
further vanilla kernel versions (maybe even
|
|
Redhat-specific ones) will follow during the course of
|
|
the MARS light0.2 stable series. The problem is not
|
|
compatibility as such, but _testing_ that it really
|
|
works. These tests need a lot of time.
|
|
=> further arguments for getting to kernel upstream ASAP.
|
|
- Improved network throughput by parallel TCP connections
|
|
(in particular under packet loss).
|
|
Also called "socket bundling".
|
|
First benchmarks show an impressive speedup over
|
|
highly congested long-distance lines.
|
|
- Future-proof updates in the network protocol:
|
|
Mixed operation of 32/64bit and/or {big,low}endian
|
|
- Support for multi-homed network interfaces.
|
|
- Transparent data compression over low bandwidth lines.
|
|
Consumes a lot of CPU, therefore only recommended for
|
|
low write loads or for desperate network situations.
|
|
- Remote device: bypassing iSCSI. In essence,
|
|
/dev/mars/mydata can appear at any other cluster member
|
|
which doesn't necessarily need any local disks.
|
|
- Various smaller features and improvements.
|
|
- Unstable tagnames: light0.2beta%d.%d (current)
|
|
- Stable branch: light0.2.y (planned)
|
|
- Stable tagnames: light0.2stable%02d (planned)
|
|
|
|
light0.3 series (planned):
|
|
(some might possibly go to 1.0 series instead)
|
|
- Improve replication latency.
|
|
- New pseudo-synchronous replication modes.
|
|
For the internal needs of database folks at 1&1.
|
|
- (Maybe) old test suite could be retired, a new
|
|
one is at github.com/schoebel/test-suite
|
|
- Unstable tagnames: light0.3beta%d.%d (planned)
|
|
- Stable branch: light0.3.y (planned)
|
|
- Stable tagnames: light0.3stable%02d (planned)
|
|
|
|
light1.0 series (planned):
|
|
- Replace symlink tree by transactional status files
|
|
(future-proof)
|
|
This is required for upstream merging to the kernel.
|
|
It has further advantages, such as better scalability.
|
|
- Trying to additionally address public needs.
|
|
- Potentially for Linux kernel upstream,
|
|
- Unstable tagnames: light1.0beta%d.%d (planned)
|
|
- Stable branch: light1.0.y (planned)
|
|
- Stable tagnames: light1.0stable%02d (planned)
|
|
|
|
full* (somewhen in future)
|
|
|
|
WIP-* branches are for development and may be rebased onto anything
|
|
at any time without notice. They will disappear eventually.
|
|
|
|
*stable* branches mean that only bug fixes and documentation
|
|
updates / clarifications will be applied. Updates to the test suite /
|
|
new test cases potentially disguising bugs, and other minor additions
|
|
of debugging code / paranoia code which may lead to discovery
|
|
of bugs are also possible. Error messages / warnings and their
|
|
error class may also be changed.
|
|
NO NEW FEATURES, not even minor ones, except when absolutely
|
|
necessary for a bugfix.
|
|
|
|
-----------------------------------
|
|
Changelog for series 0.2:
|
|
|
|
(you need to checkout branch light0.2.y to see any details)
|
|
|
|
-----------------------------------
|
|
Changelog for series 0.1:
|
|
|
|
light0.1stable21
|
|
--------
|
|
* Hint: now MARS has been rolled out to more than 1600 servers,
|
|
including some MySQL database servers, and has collected more
|
|
than 6 millions of operation hours.
|
|
* Minor fixes, none of them observed in practice, only found
|
|
by testing while working on new features:
|
|
- potential read page fault
|
|
- potential deadlock
|
|
- incorrect remote symlink update under untypical circumstances
|
|
|
|
light0.1stable20
|
|
--------
|
|
* Hint: MARS is now running on more than 850 storage servers,
|
|
and has collected more than 4.5 millions of operation hours.
|
|
There were no new incidents with customer impact since the last
|
|
major bugfix (more than 3 millions of operation hours since then).
|
|
It is difficult to deduce a reliability from that, but it appears
|
|
that at least 99.999%, if not 99.9999% are now real for the
|
|
MARS component as a standalone component (not to be confused with
|
|
overall system reliability). Our storage hardware is clearly much
|
|
less reliable. MARS does compensate these defects all the time.
|
|
|
|
* Minor fix: memory leak in networking code, does not occur
|
|
at light0.1 operations (but maybe future versions of MARS).
|
|
* Doc: add presentation slides from Froscon2015.
|
|
|
|
light0.1stable19
|
|
--------
|
|
* Minor safeguard: warn when somebody tries leave-resource --host=
|
|
for a damaged host, and later the dead host resurrects in an
|
|
unreasonable way.
|
|
* Doc update: describe use cases for DRBD vs MARS more clearly.
|
|
* Minor spelling fixes.
|
|
|
|
light0.1stable18
|
|
--------
|
|
* Minor safeguard: prevent join-resource when previous log-purge-all
|
|
has been forgotten. Prevent create-resource also when previous
|
|
delete-resource has been forgotten. Anyway, this happens only in
|
|
very exotic repair scenarios after very heavy failures.
|
|
* Doc updates: simplify descriptions of split-brain resolution and
|
|
emergency mode resolution. Nowadays 'invalidate' will do everything
|
|
in all tested cases; the more complex alternative methods have
|
|
been moved to the appendix.
|
|
|
|
light0.1stable17
|
|
--------
|
|
* Minor fix: stacktrace / oops in aio callback path due to a
|
|
subtle race, observed once during 2.5 millions of operation hours.
|
|
In the observed case, the secondary was hanging, without
|
|
customer impact. However, the error class could potentially
|
|
occur also at the primary side. Probably the bug was triggered
|
|
by a hardware problem from the RAID controller.
|
|
|
|
light0.1stable16
|
|
--------
|
|
* Minor fix: sync could take a long time to complete under high
|
|
application load, similarly to a live-lock.
|
|
* Some smaller minor fixes for annoying messages.
|
|
* Contrib: added configurable Nagios check.
|
|
* Contrib: added some example scripts which could be used by
|
|
clustermanagers etc.
|
|
* Doc: important new section on pitfalls when using existing
|
|
clustermanagers UNMODIFIED for long distance replication.
|
|
PLEASE READ!
|
|
|
|
light0.1stable15
|
|
--------
|
|
* NOTICE: MARS succeeded baptism on fire at 04/22/2015 when a whole
|
|
co-location had a partial power blackout, followed by breakdown
|
|
of air conditioning, followed by mass hardware defects due to
|
|
overheating. MARS showed exactly 0 errors when (emergency)
|
|
switching to another datacenter was started in masses.
|
|
* Major fix of race in transaction logger: the primary could hang
|
|
when using very fast hardware, typically after ~24000 operation
|
|
hours. The problem was noticed 6 times during a grand total of
|
|
more than 1,000,000 operation hours on a mixed hardware park,
|
|
showing up only on specific hardware classes. Together with 3
|
|
other incidents during early beta phase which also had customer
|
|
impact, this means that we have reached a reliability of about
|
|
===> 99.999%
|
|
After this fix, the reliability should grow even higher.
|
|
A workaround for this bug exists:
|
|
# echo 2 > /proc/sys/mars/logger_completion_semantics
|
|
Update is only mandatory when you cannot use the workaround.
|
|
* Minor improvement in marsadm: re-allow --force combined with "all".
|
|
This is highly appreciated for speeding up operations / handling
|
|
during emergency datacenter switchover.
|
|
* Various smaller improvements.
|
|
* Contrib (unsupported): example rollout script for mass rollout.
|
|
|
|
light0.1stable14
|
|
--------
|
|
* Minor safeguard: modprobe mars will refuse to start when the
|
|
cluster UUID is missing.
|
|
* Minor fix: external race in marsadm resize, only relevant
|
|
for scripting.
|
|
* Minor fix: potential race on plugged IO requests.
|
|
* Clarify output of marsadm view. Many systematical improvements
|
|
and hints.
|
|
* Add some unevitable macros for scripting / automation.
|
|
* Various tiny improvements.
|
|
|
|
light0.1stable13
|
|
--------
|
|
* Critical safeguard for accidental join-cluster with wrong argument:
|
|
make UUID mandatory, disallow completely unrelated hosts to
|
|
communicate symlink tree updates when their UUIDs mismatch.
|
|
* Minor fix: leave-resource --host=other did not work when disks
|
|
were named differently throughout the cluster.
|
|
* Minor fix: detach --host=other --force (which is needed as a
|
|
precondition) did not work.
|
|
* Various minor fixes and clarifications. "marsadm view all"
|
|
now reports the communication status in the cluster.
|
|
|
|
light0.1stable12
|
|
--------
|
|
* Critical (but usually not extremely relevant) fix:
|
|
When emergency mode occurs just during a sync, the target could
|
|
remain inconsistent without notice. Now noticed.
|
|
You always could/should manually invalidate whenever an
|
|
emergency mode appeared.
|
|
Now this is automatically fixed by restarting any sync from
|
|
scratch (if one was actually running before; otherwise consistency
|
|
was never violated).
|
|
* Major documentation update / corrections.
|
|
* Major (but less relevant) fix: leave-cluster did not really work.
|
|
* Minor fix (regression): rmmod could hang when sync was running.
|
|
* Various minor fixes and clarifications.
|
|
|
|
light0.1stable11
|
|
--------
|
|
* Major documentation update. mars-manual.pdf increased from
|
|
66 to 80 pages. Please read! You probably should know this.
|
|
* Minor fixes: better cleanup on invalidate / leave-resource.
|
|
* Minor clarifications: more precise EIO error codes, more verbose
|
|
error reporting via "marsadm cat".
|
|
|
|
light0.1stable10
|
|
--------
|
|
* Major fixes of internal network protocol errors, leading to
|
|
internal shutdown of sockets, which were transparently re-opened.
|
|
It could affect network performance. Not sure whether
|
|
stability was also affected (probably under extremely high load);
|
|
for better safety you should upgrade.
|
|
* Major fix from Manuel Lausch: regex parsing sometimes went
|
|
completely wrong when hostnames followed a similar name scheme
|
|
than internal symlinks.
|
|
* Major, only relevant for k>2 replicas: fix wrong internal sharing
|
|
of data structures resulting from parallel data connections.
|
|
* Minor fix: race in fake-sync.
|
|
* Minor fix: race in invalidate.
|
|
* Minor, only for k>2 replicas: fix direct primary handover when
|
|
some non-involved hosts are currently unreachable.
|
|
* Minor: improve becoming primary during split brain.
|
|
* Minor: improve becoming primary when emergency mode starts.
|
|
* Minor: silence some annoying stderr messages.
|
|
* Several internal minor fixes and clarifications.
|
|
|
|
light0.1stable09
|
|
--------
|
|
* Major fix of scarce race (potentially critical): the bio response
|
|
thread could terminate too early, leading to a premature dealloc
|
|
of kernel memory. This has only been observed on slow virtual
|
|
machines with slow virtual devices, and very high load on k=4
|
|
replicas. This could potentially affect the stability of the system.
|
|
Although not observed at production machines at 1&1, I recommend
|
|
updating production machines to this release ASAP.
|
|
* Major usability fix: incorrect commandline options of marsadm
|
|
were just ignored if they appeared after the resource argument.
|
|
Misspellings could cause undesired effects. For instance,
|
|
"marsadm delete-resource vital --force --MISSPELLhost=banana"
|
|
was accidentally destroying the primary during operation (which
|
|
is _possible_ when using --force, and this was even a _required_
|
|
sort of "STONITH"-like feature -- however from a human point
|
|
of view it was intended to destroy _another_ host, so this was
|
|
an unexpected behaviour from a sysadmin point of view).
|
|
* Major workaround: the concept "actual primary" is wrong, because
|
|
during split brain there may exist several primaries. Do not
|
|
use the macro view-actual-primary any longer. It is deprecated now.
|
|
Use view-is-primary instead, on each host you are interested in.
|
|
* Minor fix: "marsadm invalidate" did not work in some weired
|
|
split brain situations / was not equivalent to
|
|
"marsadm leave-resource $res; marsadm join-resource $res".
|
|
The latter was the old workaround to fix the situation.
|
|
Now it shouldn't be necessary anymore.
|
|
* Minor fix: pause-fetch could take very long to terminate.
|
|
* Minor fix: marsadm wait-cluster did not wait for all hosts
|
|
particiapting in the resource, but only for one of them.
|
|
This is only relevant for k>2 replicas.
|
|
* Minor fix: the rates displayed by "marsadm view" did not drop down
|
|
to 0 when no progress was made.
|
|
* Minor fix: logging to syslog was incomplete.
|
|
* Minor usability fix: decrease boring speakyness of "log-rotate"
|
|
and "log-delete" for cron jobs.
|
|
* Minor fixes: several internal awkwardnesses, potentially affecting
|
|
performance and/or stability in weired situations.
|
|
|
|
light0.1stable08
|
|
--------
|
|
* Minor fix: after emergency mode, a versionlink was forgotten
|
|
to create. This could lead to unnecessary reports of split
|
|
brain and/or need for additional re-invalidate.
|
|
* Minor fix: the predicate 'view-is-consistent' reported 'false'
|
|
in some situations on secondaries when all was ok.
|
|
* Minor fix: it was impossible to determine the 'is-consistent'
|
|
from 'marsadm view' (without -1and1 suffix). Added a new [Cc-]
|
|
flag. This is absolutely needed to determine whether the
|
|
underlying disks must have the same checksum (provided that
|
|
both disks are detached and the network works and fetch+replay
|
|
had completed before the detach).
|
|
* Updated docs to reflect this.
|
|
* Minor fix: 'invalidate' did not work when the resource was not
|
|
completely detached. Now it implicitly does a detach before
|
|
starting invalidation.
|
|
* Minor fix: wait-umount was waiting for umount of _all_ primaries
|
|
during split brain. Now it waits only for umount of the local node.
|
|
Notice that having multiple primaries in parallel is an
|
|
erroneous state anyway.
|
|
* Minor fix: leave-cluster did not work without --force.
|
|
|
|
light0.1stable07
|
|
--------
|
|
* Minor fix: re-creation of a completely destroyed resource
|
|
did not always work correctly
|
|
|
|
light0.1stable06
|
|
--------
|
|
* Major fix: becoming primary was hanging in scarce situations.
|
|
* Minor fix: some split brains were not always detected correctly.
|
|
* Minor fix for Redhat openvz kernel builds.
|
|
* Several fixes for 1&1 internal Debian builds.
|
|
|
|
light0.1stable05
|
|
--------
|
|
* Major fix: incomplete calls to vfs_readdir()
|
|
which could lead to incomplete symlink updates /
|
|
replication hangs.
|
|
* Minor fix: scarce race on replay EOF.
|
|
* Separated kernel from userspace build environment.
|
|
* Removed some potentially dangerous Kconfig options
|
|
if they would be set to wrong values (robustness against
|
|
accidentally producing bad kernel modules).
|
|
* Dito: some additional checks against bad main Kconfig options
|
|
(mainly for out-of-tree builds).
|
|
* Separated contrib code from maintained code.
|
|
* Added some pre-patches for newer kernels
|
|
(WIP - not yet fully tested at all combinations)
|
|
* Minor doc addition: LinuxTag 2014 presentation.
|
|
|
|
light0.1stable04
|
|
--------
|
|
* Quiet annoying error message.
|
|
* Minor readability improvements.
|
|
* Minor doc updates.
|
|
|
|
light0.1stable03
|
|
--------
|
|
* Major: fix internal aio race (could lead to memory corruption).
|
|
* Fix refcounting in trans_logger.
|
|
* Some minor fixes in module code.
|
|
* Fix 1&1-internal out-of-tree builds.
|
|
* Various minor fixes.
|
|
* Update monitoring tools / docs (German, contributed by Jörg Mann).
|
|
|
|
light0.1stable02
|
|
--------
|
|
* Fix sorting of internal data structure.
|
|
* Fix IO error propagation at replay.
|
|
|
|
light0.1stable01
|
|
--------
|
|
* Fix parallelism of logfile propagation: sometimes a secondary
|
|
could get a more recent version than the primary had on stable
|
|
storage after its crash, eventually leading to an (annoying)
|
|
split brain. Some people might take this as a feature instead
|
|
of a bug, but now the logfile transfer starts only after the
|
|
primary _knows_ that the data is successfully committed to
|
|
stable storage.
|
|
* Fix memory leaks in error path.
|
|
* Fix error propagation between client and server.
|
|
* Make string allocation fully dynamic (remove limitation).
|
|
* Fix some annoying messages.
|
|
* Fix usage output of marsadm.
|
|
* Userspace: contributed bugfix for Debian udev rules by Jörg Mann.
|
|
* Improved debugging (only for testing).
|
|
|
|
light0.1beta0.18 (feature release)
|
|
--------
|
|
* New commands marsadm view-$macroname
|
|
* New customizable macro processor
|
|
* New err/warn/inf reporting via symlinks
|
|
* Per-resource emergency mode
|
|
* Allow limiting the sync parallelism
|
|
* New flood-protected syslogging
|
|
* Some smaller improvements
|
|
* Update docs
|
|
* Update test suite
|
|
|
|
light0.1beta0.17
|
|
--------
|
|
* Major bugfix: race in logfile switchover could sometimes
|
|
lead to the wrong logfile (extremely rare to hit, but
|
|
potentially harmful).
|
|
* Disallow primary switching when some secondaries are
|
|
syncing.
|
|
* Fix logfile fetch from multiple peers.
|
|
* Fix computation of transitive closure (affected
|
|
log-purge-all, split brain detection, and many others).
|
|
* Fix incorrect emergency mode detection.
|
|
* Primaries no longer fetch logfiles (unnecessarily, only
|
|
makes a difference at concurrent split brain operations).
|
|
* Detached resources no longer fetch logfiles (unexpectedly).
|
|
* Myriads of smaller fixes.
|
|
|
|
light0.1beta0.16
|
|
--------
|
|
|
|
* Critical bugfix: "marsadm primary --force" was assumed to be given
|
|
by sysadmins only in case of emergency, when the network is down.
|
|
When given in non-emergency cases where the old primary continues
|
|
to run (/dev/mars/* being actively used and written), the
|
|
old primary could suddendly do a "logrotate" to the
|
|
new split-brain logfile produced by the new (second) primary.
|
|
Now two primaries should be able to run concurrently in split-brain
|
|
mode without mutually trashing their logfiles.
|
|
* primary --force now only works in disconnected mode, in order
|
|
to hinder unintended forceful creation of split brain during
|
|
normal operation.
|
|
* Stop fetching of logfiles behind split brain points (save space
|
|
at the target hosts - usually the data will be discarded later).
|
|
* Fixed split brain detection in userspace.
|
|
* leave-resource now waits for local actions to take place
|
|
(remote actions stay asynchronously).
|
|
* invalidate / join-resource now work only if a designated primary
|
|
exists (otherwise they would not know uniquely from whom
|
|
to start initial sync).
|
|
* Update docs, clarify scenarios intended <-> emergengy switching.
|
|
* Fixed mutual overwrite of deletion symlinks in case of racing
|
|
log-deletes spawned in parallel by cron jobs (resilience).
|
|
* Fixed races between deletion and re-erection (e.g. fresh
|
|
join-resource after leave-resource during network partitions).
|
|
* Fixed duration of network timeouts in case the network is down
|
|
(replaced non-working TCP_KEEPALIVE by explicit timeouts).
|
|
* New option --dry-run which does not really create symlinks.
|
|
* New command "delete-resource" (VERY DANGEROUS) for
|
|
forcefully destroying a resource, even when it is in use.
|
|
Intended only for _emergency_ cases when sysadmins are
|
|
desperate. Use only by hand, first run with --dry-run in order
|
|
to check what will happen!
|
|
* New command "log-purge-all" (potentially DANGEROUS) for
|
|
resolving split brain in desperate situations (cleanup of
|
|
leftovers). Only use by hand, first run with --dry-run!
|
|
* Lots of smaller imprevements / usability / readability etc.
|
|
* Update test suite.
|
|
|
|
light0.1beta0.15
|
|
--------
|
|
|
|
* Introduce write throttling of bulk writers.
|
|
* Update test suite.
|
|
|
|
light0.1beta0.14
|
|
--------
|
|
|
|
* Fix logfile transfer in case of "holes" created by
|
|
emergency mode.
|
|
* Fix "marsadm invalidate" after emergency mode had been entered.
|
|
* Fix "marsadm resize" capacity propagation from underlying LVM.
|
|
* Update test suite.
|
|
|
|
light0.1beta0.13
|
|
--------
|
|
|
|
* Fix shutdown during operation (flying requests).
|
|
* Fix unnecessary Lamport clock propagation storms.
|
|
* Improve unnecessary page cache utilisation (mapfree).
|
|
* Update test suite.
|
|
|
|
|
|
light0.1beta0.12 and earlier
|
|
--------
|
|
|
|
There was no dedicated ChangeLog. For details, look at the
|
|
commit history.
|
|
|
|
Release Policy / Software Lifecycle
|
|
-----------------------------------
|
|
|
|
New source releases are simply announced by appearance of git tags.
|
|
|
|
General Conventions
|
|
-------------------
|
|
|
|
The git tags have the following meaning:
|
|
|
|
full* for future use.
|
|
|
|
light1.0 The first number indicates the main symlink tree revision, the second number indicates the sub revision. The main symlink tree revision is only updated upon (potentially) incompatible changes. Upgrades of main revisions will always be possible, but downgrades are not automatically supported. The sub revision will indicate new releases, and they may also indicate symlink tree extersions which are both forwards and backwards compatible. It may just happen that new features are not available with elder releases :)
|
|
Example: 1.0 ff will indicate the future main production revision.
|
|
Extensions: suffixes like pre1 indicate pre-releases. Other suffixes like testing2 are reserved for future use.
|
|
Hint: you may automatically convert the MARS git tags into Debian release tags by a regex inserting a ~ after any transition from a digit to an alpha character. We just omitted the ~ because git treats it as an invalid character. The corresponding Debian tags _should_ result in the correct ordering according to the Debian guidelines. Please report a bug if not :)
|
|
|
|
light0.1beta* Internal 1&1 releases during the pilot phase. May be used by the public, but you should know that the 1.0 symlink tree revision will appear soon.
|
|
|
|
light0.0alpha* Very old prototypes; never use them. Vital feature were missing. Only for historic inspection.
|