mars/ChangeLog

IMPORTANT: the historic distinction between MARS Light and the future
MARS Full has been dropped. Now all versions are simply called "mars".

Old tagnames light* will remain valid, but newer names will follow the
convention s/light/mars/g (this means that the old version number counting
will be continued, only the "light" is substituted).


Meaning of stable tagnames
--------------------------

Example: mars0.1stable01:
             0             = version of on-disk data structures
                            (only incremented when downgrades are impossible)
                            (not incremented on backwards-compatible upgrades)
               1           = version of feature set
                stable     = feature set is frozen during this series
                      01   = bugfix revision

Example: mars0.2beta2.3:
			   The general idea is as before.
			   "beta" means that new features are roughly tested
			   in the lab, but not in production, so there may be
	                   some bugs. New features may be added during
			   the beta phase.

Example: mars0.3alpha*:
			  Never use this for production. Only for historic
			  code inspection.

Release Conventions / Branches / Tagnames
-----------------------------------------
	FLOW OF BUGFIXES: 0.1 -> 0.1a -> 0.1b -> 0.2 -> ...

	mars0.1 series (stable, will go EOL soon):
		 - Will run in parallel to branch 0.1a for a few
		   months, and then go EOL.
		 - Unstable tagnames: light0.1beta%d.%d (obsolete)
		 - Stable branch: mars0.1.y
		 - Stable tagnames: mars0.1stable%02d

	mars0.1a series (stable):
		 New master branch. Now stable.
		 This branch is operational for several years on
		 several thousands of servers, and several petabytes
		 of data.
		 - Unstable tagnames: light0.1abeta%d (obsolete)
		 - Stable branch: mars0.1a.y
		 - Stable tagnames: mars0.1astable%02d

	mars0.1b series (currently alpha):
		 This is an _imtermediate_ series between 0.1 and 0.2.
		 The goal is to improve _scalability_ to thousands of
		 hosts in one cluster, as well as thousands of resources.
		 Likely, this intermdiate branch will be merged into 0.2
		 and then continue development there. When this point
		 will arrive is uncertain at the moment.
		 Likely, the stabilization of the new scalability features
		 will occur together with the 0.2 series.
		 Reason for this: the rollout strategy at 1&1 to
		 thousands of machines wants to do small incremental
		 steps. The risk of directly going to 0.2 in _masses_
		 is minimized by first rolling out the really necessary
		 changes, and to postpone those developments which are
		 currently not yet really needed in mass deployment.

	mars0.2 series (currently in beta stage):
		   Mostly for internal needs of 1&1 (but not limited to that).
		 - Getting rid of the kernel prepatch! MARS may be built
		   as an external kernel module for any supported
		   kernel version. First prototype is only tested for
		   unaltered 3.2.x vanilla kernel, but compatibility to
		   further vanilla kernel versions (maybe even
		   Redhat-specific ones) will follow during the course of
		   the MARS mars0.2 stable series. The problem is not
		   compatibility as such, but _testing_ that it really
		   works. These tests need a lot of time.
		   => further arguments for getting to kernel upstream ASAP.
		 - Improved network throughput by parallel TCP connections
		   (in particular under packet loss).
		   Also called "socket bundling".
		   First benchmarks show an impressive speedup over
		   highly congested long-distance lines.
		 - Future-proof updates in the network protocol:
		   Mixed operation of 32/64bit and/or {big,low}endian
		 - Support for multi-homed network interfaces.
		 - Transparent data compression over low bandwidth lines.
		   Consumes a lot of CPU, therefore only recommended for
		   low write loads or for desperate network situations.
		 - Remote device: bypassing iSCSI. In essence,
		   /dev/mars/mydata can appear at any other cluster member
		   which doesn't necessarily need any local disks.
		 - Various smaller features and improvements.
		 - Unstable tagnames: mars0.2beta%d.%d (current)
		 - Stable branch: mars0.2.y (already in use for beta)
		 - Stable tagnames: mars0.2stable%02d (planned)

	mars0.3 series (planned):
		   (some might possibly go to 1.0 series instead)
		 - Improve replication latency.
		 - New pseudo-synchronous replication modes.
		   For the internal needs of database folks at 1&1.
		 - (Maybe) old test suite could be retired, a new
		   one is at github.com/schoebel/test-suite
		 - Unstable tagnames: mars0.3beta%d.%d (planned)
		 - Stable branch: mars0.3.y (planned)
		 - Stable tagnames: mars0.3stable%02d (planned)

	mars1.0 series (planned):
		 - Replace symlink tree by transactional status files
		   (future-proof)
		   This is required for upstream merging to the kernel.
		   It has further advantages, such as better scalability.
		 - Trying to additionally address public needs.
		 - Potentially for Linux kernel upstream,
		 - Unstable tagnames: mars1.0beta%d.%d (planned)
		 - Stable branch: mars1.0.y (planned)
		 - Stable tagnames: mars1.0stable%02d (planned)

	WIP-* branches are for development and may be rebased onto anything
	at any time without notice. They will disappear eventually.

	*stable* branches mean that only bug fixes and documentation
	updates / clarifications will be applied. Updates to the test suite /
	new test cases potentially disguising bugs, and other minor additions
	of debugging code / paranoia code which may lead to discovery
	of bugs are also possible. Error messages / warnings and their
	error class may	also be changed.
	NO NEW FEATURES, not even minor ones, except when absolutely
	necessary for a bugfix, or for an important usability improvement
	(such as clearer display of errors, hints for resolving them, etc).

-----------------------------------
Changelog for series 0.2:

(you need to checkout branch mars0.2.y to see any details)

-----------------------------------
Changelog for series 0.1b:

(you need to checkout branch mars0.1b.y to see any details)

-----------------------------------
Changelog for series 0.1a:
(you need to checkout branch mars0.1a.y to see any details)

	Designated master branch in future.
	Recommended for Football. Will get updates on
	the Football sub-project.
	Based on 0.1balpha4. Merge 0.1stable.
	Receive mainly fixes.

-----------------------------------
Changelog for series 0.1:

Attention! This branch will go EOL around February 2019.
	Branch mars0.1a.y is now the new master branch, and all
	commits from the 0.1 series will be fully merged.
	Upgrade is easy: just rollout the new marsadm version,
	install the new kernel modules, and load them where possible.
	Mixed operation of different versions is no problem,
	but is of course not the desired state, so keep this period
	as short as possible.
	Rollback is also easy.

	Motivation: branch 0.1a is productive for several years at 1&1.
	Experiences: now runs provably better than 0.1.y with
	better performance, smoother, etc.
	And even more stable, although the 0.1a releases were
	called "beta" up to now.

mars0.1stable69
	* Major improvement: compatibility to upstream kernel 4.9.x.

mars0.1stable68
	* Minor fix: in extremly rare cases and under further conditions,
	  detach could hang due to a race.
	  Workaround was possible by re-attaching.
	* Minor improvement: /dev/mars/mydata now disappears only after
	  writeback has finished. Although the old behaviour was correct,
	  certain userspace tool could have erronously concluded that
	  the primary has finished working. The new bevaiour is
	  hopefully more like to user expectance.
	* Minor improvement: propagate physical and logical sector
	  sizes from the underlying disk to /dev/mars/mydata.
	  This can affects mkfs and other tools for making better
	  decisions about their internal parameters.
	* Minor safeguard: disallow manual --ignore-sync override
	  when the target primary  is inconsistent, only relevant
	  for (non-existent) sysadmins who absolutely don't know what
	  they are doing when they are combining this with --force.
	  Systemadmins who really know what they are doing can use
	  fake-sync in front of it, and then they are explicitly stating
	  once again that they really want to force a defective system,
	  and that they really know the fact that it is defective.
	* Minor improvement: additional warning when network connections
	  are interrupted (asymmetrically), such as by mis-configuration
	  of network interfaces / routing / firewall rules / etc.

mars0.1stable67
	* Minor fix: don't unnecessarily alert sysadmins when no systemd
	  unit files are installed.
	* Minor doc update: new slides from LCA2019, updated old
	  slides from FrOSCon2018.
	* Minor doc update: describe some more use cases, add some
	  advice for managers.

mars0.1stable66
	* Critical fix, only relevant for kernels 4.3 to 4.4:
	  Due to a forgotten adaptation to newer kernels,
	  some userspace tools like xfs_repair could read/write
	  wrong data upon _large_ IO requests, and/or kernel memory
	  corruption could occur. Kernel-level filesystems
	  are typically _not_ affected because they typically use 4k
	  pages at maximum.
	  If you are operating such a kernel, please upgrade to
	  minimize any risks. You probably want userspace tools like
	  xfs_repair to not crash your kernel ;)
	  The problem was reproducibly detected at lab regression testing,
	  _before_ updating a big installation from kernel 3.16 to 4.4.
	  It did not show up with the old kernel.
	  Notice: kernels >4.6 are not yet supported at the moment,
	  but work on them is likely being continued during the next
	  months. Stay tuned.
	* Minor doc updates.

mars0.1stable65
	* Major fix, only observed during KASAN debugging:
	  Use-after-free which appears to splat only at Football
	  during final deletion of resources. Never observed at production.
	  Update if you are very cautious.
	* A few minor fixes, not relevant for production.
	* Minor doc improvements.

mars0.1stable64
	* Major regression: split-brain detection did not display
	  correctly.
	* Minor fix: rare race conditon on O_NONBLOCK networking.
	  Only observed during testing with kernel 4.9 (sorry, _all_ the
	  adaptations are not yet ready for release, but it is making
	  progress now).
	  I am not sure whether this bug could also trigger with kernel
	  4.4 or earlier, therefore I am releasing the fix beforehand.
	* Minor doc architectural explanations.

mars0.1stable63
	* Minor fix: when compiling for some newer kernels (only there),
	  schedule() could be called during wait for some condition,
	  worsening performance unnecessarily.
	* Minor improvement: starting join-resource in batches
	  was slow because each was waiting for cluster communication.
	  Use a manual "marsadm wait-cluster" before starting batches
	  of join-resource operations.
	* Doc: some clarifications on BigCluster scalability behaviour.

mars0.1stable62
	* Minor fix: race between join-resource and log-rotate.
	* Minor fix: report split brain logfile amount only when
	  actually detectable.
	* Minor improvement: shift annoying error message over
	  to Orphan state detection.
	* Football: update to Football-2.0-RC12
	* doc: some updates.

mars0.1stable61
	* Minor fix: in very rare cases where some symlinks are missing,
	  don't abort in try_to_avoid_splitbrain().
	* Minor improvement: better human-readable numbers.
	* Minor doc: more on asynchronous background operations.

mars0.1stable60
	* Major improvement: new option --ignore-sync allows primary
	  Handover without --force even when some sync is running
	  somewhere. Any running syncs will restart from scratch
	  (which might take some time, depending on LV size and
	  many more factors like the network).
	* Minor fix: split-cluster did not work correctly when no
	  resources were existing anymore, at all.
	* Doc: major update. More explanation on CAP theorem, and
	  on differences / commonalities with DRBD.

mars0.1stable59
	* Major fix: "marsadm up" did not work when sync could not
	  be started. Now does "best effort".
	* Minor fix: marsadm system interface was active when
	  not activated.
	* Minor usability improvement: new repliaction state "Orphaned"
	  indicates that logfiles are missing, and thus replication
	  is stuck.

mars0.1stable58
	* Major fix for Football / split-cluster: for safety,
	  cron deletes some blocking left-overs.
	* Major fix at _asymmetric_ split-cluster: ignore hindering
	  abort condition.
	* Minor fix: not all internal systemd links were removed upon
	  marsadm set-systemd-unit mydata "".
	* Doc: Football.
	* Doc: architectural treatment of centralized storage.

mars0.1stable57
	* Minor fix: silly deadlock upon scarce race at logging.
	  Without debug logging, probability should be extremely low
	  (only observed at rmmod).
	* Added initial version of systemd templates (for future backward
	  compatibility with branch 0.1a).
	* Doc: systemd templates.

mars0.1stable56
	* Minor fix: split-cluster could unnecessarily abort
	  in some cases.
	* Added initial version of submodule "football".
	  More updates will follow.

mars0.1stable55
	* Major fix: unnecessary / false positive split brain could
	  occur after the primary logfile was truncated, e.g. at crashes
	  or disk damages. Systematic triggering in masses was possible
	  by keeping /dev/mars/mydata mounted while _forcing_
	  a reboot _during_ (!) its umount (e.g. by patching the
	  "reboot" command and/or patching systemd dependencies
	  or similar to provoke this regularly).

mars0.1stable54
	* Major fix, only relevant for massive execution of
	  leave-resource, e.g. when playing Football (Tetris)
	  games:
	  When non-versioned symlinks were eventually deleted,
	  later re-creation did not always succeed.
	  Fixed by an new generic timestamp ordering approach.
	* Stability client-side fixes (could lead to stacktraces),
	  backported from branch 0.1a (were forgotten long ago).
	* Major doc update: new section on reliability of
	  storage architectures.
	  This explains why many BigCluster systems don't work as
	  expected.
	  Backed up by graphs and by mathematical formulas.
	  A must-read for anyone working in the storage area!

mars0.1stable53
	* Major fix: rare corner case of split brain was not displayed
	  correctly.
	* Major usablilty: show amount of data during split brain.
	  This hints the sysadmins about the size of future data loss
	  at later split brain resolution.
	* Minor workaround: crashed /mars filesystems may contain
	  completely damaged symlinks with timestamps in the far
	  distant future, e.g. year >3000 etc. Safeguard unusual
	  Lamport time slips by ignoring implausible values.
	* Major improvement: internal locking overhead reduced.
	* Minor improvment: reduce message trigger overhead.
	* Several minor improvements.
	* Doc updates.

mars0.1stable52
	* Major contrib: new example scripts for MARS background data
	  migration during production. 1&1-specific code in a separate
	  plugin. You can write your own plugins for adaptation to
	  your needs.
	* Minor fix: limit the size of the writeback buffer by the
	  rest space in /mars. This is only relevant when
	  /mars is dimensioned smaller than RAM (which should
	  never be the case in production systems, but might happen
	  accidentally or for testing).
	  Analogously, limit the maximum logfile size.
	* Minor fix: prevent creation of many tiny logfiles over time
	  when secondaries are not catching up.
	  The default threshold is a minimum of 5 GB size when more
	  than 10 logfiles are already present.
	* Minor fix: cleanup old internal .tmp-* symlinks which might
	  remain as leftovers when marsadm is dying at the wrong
	  moment.
	* Minor improvement: don't run O(n) mapfree under spinlock.
	  More speed improvements under preparation; will result in O(k).
	* Some more minor improvements.

mars0.1stable51
	* Minor fix: don't abort log-delete-all too early when there
	  are holes in the deletion sequence numbers.
	* Backport of marsadm cron from branch 0.1a, in order to systematically
	  support mixed operation of different MARS versions in bigger installations
	  (avoid confusion at junior sysadmins and at monitoring staff).
	* Rectified the semantics of log-delete, which now does the same as
	  log-delete-all. Single deletion is only needed for testing, and
	  has been renamed to log-delete-one.
	  Leaving the old semantics would have been an operational risk
	  when junior sysadmins or 24/7 surveillance people are not carefully
	  looking at the details of semantics. Now everything is hopefully
	  as everybody not familiar with MARS would naively assume.
	* Doc update.

mars0.1stable50
	* Major usability improvement (backport from 0.1a):
	  marsadm shows number of replicas of each resource, out of total number
	  of cluster members. Example: [2/4]
	* Minor fix: automatically cleanup internal backups produced by the new
	  merge-cluster / split-cluster after 1 week.
	* Minor fix: also cleanup some new symlink types replicated through
	  the network when running asymmetric clusters with mixed branches
	  0.1 and 0.1a.
	* Minor annoyance: silence split-cluster error message when no
	  resources are present.

mars0.1stable49
	* Backports of new marsadm commands merge-cluster and split-cluster.
	  The new functionality is needed for background migration of resources.
	  Please be aware that this branch has not been constructed for
	  scalability in the dimension of #nodes, so don't merge too many
	  nodes and use split-cluster after each background migration.
	  Better scalability is / will be addressed at the 0.1a and 0.1b
	  branches. However, currently they are not yet stable.
	  No changes at the kernel module (besides some bug fixes);
	  this is solely done at userspace level.
	  The new userspace-level commands should have almost no intersection
	  with (and therefore no impact onto) other parts of this well-proven
	  stable branch.
	* Backports of new wait-cluster implementation.
	  This avoids irritating messages after split-cluster.

mars0.1stable48
	* Critical fix: DDOS-like attacks at the MARS ports (or similar caused
	  by bugs / misbehaviour) are prevented by configurable limits
	  /proc/sys/mars/handler_dent_limit and
	  /proc/sys/mars/handler_limit .
	* Critical safeguard: when the network is interruted for a long time
	  while the log-rotate frequency is very high and a lot of resources
	  (exceeding the official limits as documented) had been used, masses of
	  deletion links may accumulate in /mars/todo/. First, already
	  existing deletions to the same targets are reused now.
	  Second, a maximum limit (of currently 512 entries)
	  is enforced, and a warning is spit when too many deletions
	  are accumulated over time.
	* Minor fix: earlier detection of socket hangups.

mars0.1stable47
	* Critical fix: leave-cluster could lead to deadlocks, also
	  on remote nodes.
	* Contrib: mass automation script (unmaintained).

mars0.1stable46
	* Major fix: bugfix from 0.1stable44 (state "Detached" was
	  reported too early) was incorrect, now fixed.
	* Minor fix: display of host lists in special case of
	  create-resource was misleading.

mars0.1stable45
	* Major fix: on	secondaries, orphane files and symlinks were
	  sometimes created in /mars and could accumulate over a long time.
	  After several months or years of operation, the /mars directory
	  could appear being full via "df /mars", but "du -s /mars" was
	  not reporting the hidden space allocation.
	  Also, upon remount or reboot the cleanup of orphane files
	  could take a rather long time. Workaround was possible by
	  "rmmod mars; umount /mars; mount /mars; modprobe mars".
	  Fixed by regularly pruning the dentry cache of the /mars
	  filesystem.

mars0.1stable44
--------
	* Major fix: state "Detached" was reported too early,
	  before the underlying disk was really closed.
	* Doc: new updated slides from FrOSCon 2017.
	  New architectural comparison with Big Storage Clusters
	  in terms of scalability, reliability and costs.

mars0.1stable43
--------
	* Major fix, only relevant for k >= 3 replicas:
	  Logfile fetch did not switch over to another alive peer
	  upon _speicfic_ network problems with the _current_
	  peer. As a consequence, an unaffected replica could
	  hang. Workarould was possible by pause-fetch /
	  resume-fetch or by fixing the network :)

mars0.1stable42
--------
	* Minor fix: ssh IPs and port numbers are automatically probed
	  on join-cluster.
	* Minor compatibility to branch mars.1b.y: join-resource
	  does additional rsync for safety.
	* Minor fix: rate display was not going down to 0
	  on switchoff or long pauses.
	* Minor improvement: show peers in internal debugging info.

mars0.1stable41
--------
	* Minor fix: a scarce race could lead to an unnecessary split brain
	  when umounting _after_ role transition from primary to secondary.

mars0.1stable40
--------
	* Potentially critical fix: on very fast machines, and with
	  extremely low probability, a race in AIO could lead to a kernel
	  page fault.
	  For maximum safety, update to this version is recommended.

mars0.1stable39
--------
	* Minor fix: hangs of logfile updates. Found by stress-testing
	  on fast hardware over 10GBit network links. Might explain
	  some extremely rare (1 per several millions of operations hours)
	  production hangs on secondaries. Workaround possible by
	  "pause-fetch; resume-fetch".
	* Minor fixes of rare kthread retarding under very high load.
	* Minor improvement: add version number to "marsadm version" which
	  can be used for future compatibilty checking with respect to
	  new features.

mars0.1stable38
--------
	* Compile without pre-patch on some kernel versions!
	  Whether the pre-patch is applied will be detected automatically.
	  However, there is some (hopefully minor) performance penalty when
	  the pre-patch is missing.
	  This will be addressed in a future release (but might go
	  to branch 0.1b instead, not yet decided).
	  Tested with vanilla kernels 3.10.105, 3.14.79, 3.16.43,
	  4.1.39, 4.4.67.
	  Vanilla kernels 4.8.x and later are _not_ yet working
	  (independently from pre-patches). This will be addressed
	  in a future release.
	* No functional changes otherwise. Rollback to prior versions
	  should be easy. Please report any issues.
	* Updated docs describing build methods.

mars0.1stable37
--------
	* Minor fix: secondary logfile replication could hang in the
	  extremely unusual case that the expected primary logfile size
	  gets shortened after a crash followed by reboot.
	  Workaround was possible via "pause-fetch; resume-fetch".

mars0.1stable36
--------
	* Doc: new slides from GUUG2017, both in English and in German.
	  Some very important hints for cost savings. May easily save
	  you a few millions when operating some petabytes of data.
	* Doc: new chapter on cost savings in mars-manual.pdf.
	  Some parts of German oral explanations from the GUUG conference
	  translated to English for my English-speaking audience.
	  More to come later (hopefully; I need to get the time).

mars0.1stable35
--------
	* Minor fix: when syncing a big resource (e.g. 40TiB) over an 1GBit
	  uplink, the sync may take longer than 1 day. This increases the
	  probability for triggering an unintended restart of that sync
	  from scratch.
	  Among further obscure preconditions, more than 5 logfiles must
	  exist such that the wrong assumption of an emergency mode can
	  happen at the secondary. In order to trigger the bug more likely,
	  it is therefore helpful to misconfigure /etc/cron.d/mars by
	  log-rotate'ing every 10 minutes, but doing log-delete-all only
	  once an hour (which contradicts my upstream documentation and
	  unnecessarily wastes valuable storage space in /mars).
	  Fixed by correction of a typo-like error.

mars0.1stable34
--------
	* Minor fix: in some rare cases, when lots of gigabytes had to be
	  replayed in one big slurp, the replay position wasn't updated
	  during a longer time. Some admins were complaining that it
	  appeared "stuck" although it worked in reality.
	  Improved by increasing the update frequency of the replay link.
	* Minor fix: after network errors, sometimes the sync restarted
	  from scratch, unnecessarily.
	* Minor fix: under rare conditions, rmmod could hang forever.
	  A known reason has been fixed. Other theoretical reasons
	  hopefully improved by some further safeguards.

mars0.1stable33
--------
	* Minor regression from stable29:
	  After a primary crash, without switchover, and when the primary
	  recovery phase involves a logrotate to an empty new logfile
	  which had been in the meantime shortly before the crash but
	  has not yet been used before the crash (race condition),
	  a kernel NULL pointer deref may stop the main thread.
	  Workaround: either remove the empty logfile by hand,
	  or just do a failover to the other side.

mars0.1stable32
--------
	* Critical regression between stable30 and stable31 (can be avoided
	  by simply using stable30 for affected kernels): on _old_ kernels
	  (before 4.3.x) the removal of merge_bvec_fn() (see upstream commit
	  8ae126660fddbeebb9251a174e6fa45b6ad8f932) can lead to fatal
	  crashes at the primary side.
	  Fixed by using (hopefully) proper #ifdef's according to the
	  kernel version.
	  Notice: between stable30 and stable31 no true MARS fixes were
	  made (since no bugs were found). This strategy is likely to
	  continue for a while, for newer adaptations to even newer kernels.
	  In case of problems, go back. And, please, report it to me :)

mars0.1stable31
--------
	* New _minimum_ pre-patches for vanilla LTS kernels 3.2.x to 4.7.x.
	  For security reasons, please prefer them over the old _generic_
	  pre-patch versions which expose many unnecessary EXPORT_SYMBOL
	  to potential attackers.
	* Adaptions to vanilla kernels up to 4.7.x.
	  Note: 4.8rc-* does not yet work.
	* Regression testing with many kernel versions: looks fine.

mars0.1stable30
--------
	* Minor fix: in very rare cases of a primary crash, a missing
	  versionlink could lead to a hang.
	* Minor fix: improved error reporting of replay code.
	* Minor fix: improved switchback to former primary side.
	* Minor fix: systematically add some missing macros.
	* Minor improvements: add some example systemd unit and other
	  contrib stuff like a cronjob example.
	* Doc: minor additions and improvements.

mars0.1stable29
--------
	* Minor fix: on very fast hardware and networks, sync could take
	  a while for terminating.
	* Minor fix: external module build.
	* Major usability improvement: new expert commands marsadm
	  lowlevel-ls-host-ips, lowlevel-set-host-ip, lowlevel-delete-host.
	  Necessary for moves between networks, dedicated replication IPs,
	  etc.
	* Minor doc update.

mars0.1stable28
--------
	* Doc: describe new naming conventions.
	  MARS Light is now simply called MARS.
	  No distinction between "Light" and the future "Full" anymore.
	  Please note that the git branches light0.1.y and light0.2.y have
	  been renamed to mars0.1.y and mars0.2.y respectively.
	* Minor sourcecode cleanup: s/light//g or s/light/main/g
	  where appropriate.
	  No other changes in the sourcecode, deliberately.
	  In case anyone encounters any build problems compiling MARS,
	  this release is separated just for the sake of build testing,
	  or Debian packaging testing, etc.
	* Doc: minor clarifications.

mars0.1stable27
light0.1stable27
--------
	* Critical fix: typo in sync progress comparison code could lead
	  to data version mismatches during sync when alternating with
	  replay. Only observed at a certain new hardware class, and only
	  while testing with an extremely high load (9 loaded resources
	  in parallel to 9 concurrent syncs). As a workaround,
	  echo 0 > /proc/sys/mars/sync_flip_interval_sec can be used.
	  Nevertheless, update is highly recommended!
	* Major fix: slow memory leak (regression from light0.1stable26).
	  Only when starting the transaction logger (i.e. primary is typically
	  not affected). But don't let run it for a longer time.
	  Monitoring is possible via /proc/slabinfo (size-64 or siblings).
	* Minor fix: join-cluster did not check for duplicate IP addresses.
	* Minor fixes: some unnecessary annoying error messages.
	* Docu: new slides from GUUG 2016 in Köln.

light0.1stable26
--------
	* Minor fixes: some primitive macros were reporting misleading or
	  even wrong values at split brain, or during/after emergency mode.
	  Some high-level macros as well as try_to_avoid_split_brain
	  should work better / more reliable now.
	* Minor fix: potential deadlock after crash reboot, or after
	  defective /mars filesystem. Never observed in practice.
	* Minor safeguard: unnecessary split brain could emerge at
	  secondaries under extremely rare and strange conditions.
	  Unsure whether it ever occurred in practice.
	* Minor usability improvement: show incorrect permissions on /mars.
	  Some other sysadmin tools like Puppet seem to have their own
	  default notion of "secure permissions" ;)
	* Minor doc reorg, better chapter structure.

light0.1stable25
--------
	* Major fix: in rare cases "marsadm primary" (without --force)
	  could go into an endless loop, even if --timeout= was specified.
	* Minor fix: in rare cases of hanging or defective IO, crashes
	  of the primary could replicate versionlinks to the secondary,
	  but after reboot they were missing at the primary because of
	  of hanging IO or other IO / RAID controller problems.
	  Now using sync_filesystem() for either ensuring actuality,
	  or for letting the mars_light main control thread hang
	  (which will hopefully be noticed soon by monitoring).
	* Minor fix: join-cluster uses rsync, which could abort due to
	  vanished filesystem objects while the primary is actively running.
	  Now it should tolerate such "errors".
	* Minor fixes / additions at primitive macros.
	* Tiny doc update.

light0.1stable24
--------
	* Skip this release due to a regression.

light0.1stable23
--------
	* Minor fix: the new replay-code error message was forgotten
	  to reset at secondaries. Now the annoying old error message
	  disappears after the next successful logrotate.
	* Minor fixes of internal marsadm code (not in use until now).
	* Minor doc update.

light0.1stable22
--------
	* Critical fix for non-storage servers: the /mars directory
	  was readable by ordinary non-root users, opening a potential
	  security hole. Originally MARS was designed for standalone
	  storage servers solely, but now it is increasingly deployed to
	  machines where ordinary users can log in.
	  Update recommended, but only urgent for potentially affected
	  installations.
	* Minor fix: when a logfile was damaged (observed at defective
	  hardware), this was often (but not always) detected by the
	  md5 data checksums in the transaction logfiles. So far so good.
	  The replay / recovery process stopped for a very good reason.
	  But it was not easily possible to _force_ any of the resource
	  members into primary role when the defect was already present at
	  the _primary_ (which happend once during 7 millions of operating
	  hours, and at a primary site which proved defective afterwards),
	  and the defect had been replicated to all secondaries.
	  As a workaround, the resource could be destroyed via leave-resource
	  everywhere, and re-surrected from scratch. Clumsy.
	  Now an md5 checksum error in the middle of a logfile is
	  treated similarly to an EOF. "primary --force" will succeed now,
	  without applying the defective data (as before).
	  Split brain will result for sure in such a case.
	* Minor improvement: md5 logfile checksum errors are now displayed
	  directly in the diskstate macro (and therefore also at plain
	  "view").
	* Minor improvement: when "marsadm view all" told you "InConsistent"
	  as the disk state, this was _formally correct_ because it related
	  to the state of the _disk_, not to the state of the replication.
	  The former message could appear regularly during ordinary
	  out-of-order writeback at the primary side, without violating
	  the consistency of /dev/mars/mydata.
	  However, many people were confused and alarmed by the irritating
	  message.
	  Now a better wording is used: "WriteBack" and "Recovery" describes
	  more intuitively what is really happening :)
	* Minor doc improvements.

light0.1stable21
--------
	* Hint: now MARS has been rolled out to more than 1600 servers,
	  including some MySQL database servers, and has collected more
	  than 6 millions of operation hours.
	* Minor fixes, none of them observed in practice, only found
	  by testing while working on new features:
	  - potential read page fault
	  - potential deadlock
	  - incorrect remote symlink update under untypical circumstances

light0.1stable20
--------
	* Hint: MARS is now running on more than 850 storage servers,
	  and has collected more than 4.5 millions of operation hours.
	  There were no new incidents with customer impact since the last
	  major bugfix (more than 3 millions of operation hours since then).
	  It is difficult to deduce a reliability from that, but it appears
	  that at least 99.999%, if not 99.9999% are now real for the
	  MARS component as a standalone component (not to be confused with
	  overall system reliability). Our storage hardware is clearly much
	  less reliable. MARS does compensate these defects all the time.

	* Minor fix: memory leak in networking code, does not occur
	  at light0.1 operations (but maybe future versions of MARS).
	* Doc: add presentation slides from Froscon2015.

light0.1stable19
--------
	* Minor safeguard: warn when somebody tries leave-resource --host=
	  for a damaged host, and later the dead host resurrects in an
	  unreasonable way.
	* Doc update: describe use cases for DRBD vs MARS more clearly.
	* Minor spelling fixes.

light0.1stable18
--------
	* Minor safeguard: prevent join-resource when previous log-purge-all
	  has been forgotten. Prevent create-resource also when previous
	  delete-resource has been forgotten. Anyway, this happens only in
	  very exotic repair scenarios after very heavy failures.
	* Doc updates: simplify descriptions of split-brain resolution and
	  emergency mode resolution. Nowadays 'invalidate' will do everything
	  in all tested cases; the more complex alternative methods have
	  been moved to the appendix.

light0.1stable17
--------
	* Minor fix: stacktrace / oops in aio callback path due to a
	  subtle race, observed once during 2.5 millions of operation hours.
	  In the observed case, the secondary was hanging, without
	  customer impact. However, the error class could potentially
	  occur also at the primary side. Probably the bug was triggered
	  by a hardware problem from the RAID controller.

light0.1stable16
--------
	* Minor fix: sync could take a long time to complete under high
	  application load, similarly to a live-lock.
	* Some smaller minor fixes for annoying messages.
	* Contrib: added configurable Nagios check.
	* Contrib: added some example scripts which could be used by
	  clustermanagers etc.
	* Doc: important new section on pitfalls when using existing
	  clustermanagers UNMODIFIED for long distance replication.
	  PLEASE READ!

light0.1stable15
--------
	* NOTICE: MARS succeeded baptism on fire at 04/22/2015 when a whole
	  co-location had a partial power blackout, followed by breakdown
	  of air conditioning, followed by mass hardware defects due to
	  overheating. MARS showed exactly 0 errors when (emergency)
	  switching to another datacenter was started in masses.
	* Major fix of race in transaction logger: the primary could hang
	  when using very fast hardware, typically after ~24000 operation
	  hours. The problem was noticed 6 times during a grand total of
	  more than 1,000,000 operation hours on a mixed hardware park,
	  showing up only on specific hardware classes. Together with 3
	  other incidents during early beta phase which also had customer
	  impact, this means that we have reached a reliability of about
				  ===> 99.999%
	  After this fix, the reliability should grow even higher.
	  A workaround for this bug exists:
	  # echo 2 > /proc/sys/mars/logger_completion_semantics
	  Update is only mandatory when you cannot use the workaround.
	* Minor improvement in marsadm: re-allow --force combined with "all".
	  This is highly appreciated for speeding up operations / handling
	  during emergency datacenter switchover.
	* Various smaller improvements.
	* Contrib (unsupported): example rollout script for mass rollout.

light0.1stable14
--------
	* Minor safeguard: modprobe mars will refuse to start when the
	  cluster UUID is missing.
	* Minor fix: external race in marsadm resize, only relevant
	  for scripting.
	* Minor fix: potential race on plugged IO requests.
	* Clarify output of marsadm view. Many systematical improvements
	  and hints.
	* Add some unevitable macros for scripting / automation.
	* Various tiny improvements.

light0.1stable13
--------
	* Critical safeguard for accidental join-cluster with wrong argument:
	  make UUID mandatory, disallow completely unrelated hosts to
	  communicate symlink tree updates when their UUIDs mismatch.
	* Minor fix: leave-resource --host=other did not work when disks
	  were named differently throughout the cluster.
	* Minor fix: detach --host=other --force (which is needed as a
	  precondition) did not work.
	* Various minor fixes and clarifications. "marsadm view all"
	  now reports the communication status in the cluster.

light0.1stable12
--------
	* Critical (but usually not extremely relevant) fix:
	  When emergency mode occurs just during a sync, the target could
	  remain inconsistent without notice. Now noticed.
	  You always could/should manually invalidate whenever an
	  emergency mode appeared.
	  Now this is automatically fixed by restarting any sync from
	  scratch (if one was actually running before; otherwise consistency
	  was never violated).
	* Major documentation update / corrections.
	* Major (but less relevant) fix: leave-cluster did not really work.
	* Minor fix (regression): rmmod could hang when sync was running.
	* Various minor fixes and clarifications.

light0.1stable11
--------
	* Major documentation update. mars-manual.pdf increased from
	  66 to 80 pages. Please read! You probably should know this.
	* Minor fixes: better cleanup on invalidate / leave-resource.
	* Minor clarifications: more precise EIO error codes, more verbose
	  error reporting via "marsadm cat".

light0.1stable10
--------
	* Major fixes of internal network protocol errors, leading to
	  internal shutdown of sockets, which were transparently re-opened.
	  It could affect network performance. Not sure whether
	  stability was also affected (probably under extremely high load);
	  for better safety you should upgrade.
	* Major fix from Manuel Lausch: regex parsing sometimes went
	  completely wrong when hostnames followed a similar name scheme
	  than internal symlinks.
	* Major, only relevant for k>2 replicas: fix wrong internal sharing
	  of data structures resulting from parallel data connections.
	* Minor fix: race in fake-sync.
	* Minor fix: race in invalidate.
	* Minor, only for k>2 replicas: fix direct primary handover when
	  some non-involved hosts are currently unreachable.
	* Minor: improve becoming primary during split brain.
	* Minor: improve becoming primary when emergency mode starts.
	* Minor: silence some annoying stderr messages.
	* Several internal minor fixes and clarifications.

light0.1stable09
--------
	* Major fix of scarce race (potentially critical): the bio response
	  thread could terminate too early, leading to a premature dealloc
	  of kernel memory. This has only been observed on slow virtual
	  machines with slow virtual devices, and very high load on k=4
	  replicas. This could potentially affect the stability of the system.
	  Although not observed at production machines at 1&1, I recommend
	  updating production machines to this release ASAP.
	* Major usability fix: incorrect commandline options of marsadm
	  were just ignored if they appeared after the resource argument.
	  Misspellings could cause undesired effects. For instance,
	  "marsadm delete-resource vital --force --MISSPELLhost=banana"
	  was accidentally destroying the primary during operation (which
	  is _possible_ when using --force, and this was even a _required_
	  sort of "STONITH"-like feature -- however from a human point
	  of view it was intended to destroy _another_ host, so this was
	  an unexpected behaviour from a sysadmin point of view).
	* Major workaround: the concept "actual primary" is wrong, because
	  during split brain there may exist several primaries. Do not
	  use the macro view-actual-primary any longer. It is deprecated now.
	  Use view-is-primary instead, on each host you are interested in.
	* Minor fix: "marsadm invalidate" did not work in some weired
	  split brain situations / was not equivalent to
	  "marsadm leave-resource $res; marsadm join-resource $res".
	  The latter was the old workaround to fix the situation.
	  Now it shouldn't be necessary anymore.
	* Minor fix: pause-fetch could take very long to terminate.
	* Minor fix: marsadm wait-cluster did not wait for all hosts
	  particiapting in the resource, but only for one of them.
	  This is only relevant for k>2 replicas.
	* Minor fix: the rates displayed by "marsadm view" did not drop down
	  to 0 when no progress was made.
	* Minor fix: logging to syslog was incomplete.
	* Minor usability fix: decrease boring speakyness of "log-rotate"
	  and "log-delete" for cron jobs.
	* Minor fixes: several internal awkwardnesses, potentially affecting
	  performance and/or stability in weired situations.

light0.1stable08
--------
	 * Minor fix: after emergency mode, a versionlink was forgotten
	   to create. This could lead to unnecessary reports of split
	   brain and/or need for additional re-invalidate.
	 * Minor fix: the predicate 'view-is-consistent' reported 'false'
	   in some situations on secondaries when all was ok.
	 * Minor fix: it was impossible to determine the 'is-consistent'
	   from 'marsadm view' (without -1and1 suffix). Added a new [Cc-]
	   flag. This is absolutely needed to determine whether the
	   underlying disks must have the same checksum (provided that
	   both disks are detached and the network works and fetch+replay
	   had completed before the detach).
	 * Updated docs to reflect this.
	 * Minor fix: 'invalidate' did not work when the resource was not
	   completely detached. Now it implicitly does a detach before
	   starting invalidation.
	 * Minor fix: wait-umount was waiting for umount of _all_ primaries
	   during split brain. Now it waits only for umount of the local node.
	   Notice that having multiple primaries in parallel is an
	   erroneous state anyway.
	 * Minor fix: leave-cluster did not work without --force.

light0.1stable07
--------
	 * Minor fix: re-creation of a completely destroyed resource
	   did not always work correctly

light0.1stable06
--------
	 * Major fix: becoming primary was hanging in scarce situations.
	 * Minor fix: some split brains were not always detected correctly.
	 * Minor fix for Redhat openvz kernel builds.
	 * Several fixes for 1&1 internal Debian builds.

light0.1stable05
--------
	 * Major fix: incomplete calls to vfs_readdir()
	   which could lead to incomplete symlink updates /
	   replication hangs.
	 * Minor fix: scarce race on replay EOF.
	 * Separated kernel from userspace build environment.
	 * Removed some potentially dangerous Kconfig options
	   if they would be set to wrong values (robustness against
	   accidentally producing bad kernel modules).
	 * Dito: some additional checks against bad main Kconfig options
	   (mainly for out-of-tree builds).
	 * Separated contrib code from maintained code.
	 * Added some pre-patches for newer kernels
	   (WIP - not yet fully tested at all combinations)
	 * Minor doc addition: LinuxTag 2014 presentation.

light0.1stable04
--------
	 * Quiet annoying error message.
	 * Minor readability improvements.
	 * Minor doc updates.

light0.1stable03
--------
	 * Major: fix internal aio race (could lead to memory corruption).
	 * Fix refcounting in trans_logger.
	 * Some minor fixes in module code.
	 * Fix 1&1-internal out-of-tree builds.
	 * Various minor fixes.
	 * Update monitoring tools / docs (German, contributed by Jörg Mann).

light0.1stable02
--------
	 * Fix sorting of internal data structure.
	 * Fix IO error propagation at replay.

light0.1stable01
--------
	 * Fix parallelism of logfile propagation: sometimes a secondary
	   could get a more recent version than the primary had on stable
	   storage after its crash, eventually leading to an (annoying)
	   split brain. Some people might take this as a feature instead
	   of a bug, but now the logfile transfer starts only after the
	   primary _knows_ that the data is successfully committed to
	   stable storage.
	 * Fix memory leaks in error path.
	 * Fix error propagation between client and server.
	 * Make string allocation fully dynamic (remove limitation).
	 * Fix some annoying messages.
	 * Fix usage output of marsadm.
	 * Userspace: contributed bugfix for Debian udev rules by Jörg Mann.
	 * Improved debugging (only for testing).

light0.1beta0.18 (feature release)
--------
	 * New commands marsadm view-$macroname
	 * New customizable macro processor
	 * New err/warn/inf reporting via symlinks
	 * Per-resource emergency mode
	 * Allow limiting the sync parallelism
	 * New flood-protected syslogging
	 * Some smaller improvements
	 * Update docs
	 * Update test suite

light0.1beta0.17
--------
	 * Major bugfix: race in logfile switchover could sometimes
	   lead to the wrong logfile (extremely rare to hit, but
	   potentially harmful).
	 * Disallow primary switching when some secondaries are
	   syncing.
	 * Fix logfile fetch from multiple peers.
	 * Fix computation of transitive closure (affected
	   log-purge-all, split brain detection, and many others).
	 * Fix incorrect emergency mode detection.
	 * Primaries no longer fetch logfiles (unnecessarily, only
	   makes a difference at concurrent split brain operations).
	 * Detached resources no longer fetch logfiles (unexpectedly).
	 * Myriads of smaller fixes.

light0.1beta0.16
--------

	 * Critical bugfix: "marsadm primary --force" was assumed to be given
	   by sysadmins only in case of emergency, when the network is down.
	   When given in non-emergency cases where the old primary continues
	   to run (/dev/mars/* being actively used and written), the
	   old primary could suddendly do a "logrotate" to the
	   new split-brain logfile produced by the new (second) primary.
	   Now two primaries should be able to run concurrently in split-brain
	   mode without mutually trashing their logfiles.
	 * primary --force now only works in disconnected mode, in order
	   to hinder unintended forceful creation of split brain during
	   normal operation.
	 * Stop fetching of logfiles behind split brain points (save space
	   at the target hosts - usually the data will be discarded later).
	 * Fixed split brain detection in userspace.
	 * leave-resource now waits for local actions to take place
	   (remote actions stay asynchronously).
	 * invalidate / join-resource now work only if a designated primary
	   exists (otherwise they would not know uniquely from whom
	   to start initial sync).
	 * Update docs, clarify scenarios intended <-> emergengy switching.
	 * Fixed mutual overwrite of deletion symlinks in case of racing
	   log-deletes spawned in parallel by cron jobs (resilience).
	 * Fixed races between deletion and re-erection (e.g. fresh
	   join-resource after leave-resource during network partitions).
	 * Fixed duration of network timeouts in case the network is down
	   (replaced non-working TCP_KEEPALIVE by explicit timeouts).
	 * New option --dry-run which does not really create symlinks.
	 * New command "delete-resource" (VERY DANGEROUS) for
	   forcefully destroying a resource, even when it is in use.
	   Intended only for _emergency_ cases when sysadmins are
	   desperate. Use only by hand, first run with --dry-run in order
	   to check what will happen!
	 * New command "log-purge-all" (potentially DANGEROUS) for
	   resolving split brain in desperate situations (cleanup of
	   leftovers). Only use by hand, first run with --dry-run!
	 * Lots of smaller imprevements / usability / readability etc.
	 * Update test suite.

light0.1beta0.15
--------

	 * Introduce write throttling of bulk writers.
	 * Update test suite.

light0.1beta0.14
--------

	 * Fix logfile transfer in case of "holes" created by
	   emergency mode.
	 * Fix "marsadm invalidate" after emergency mode had been entered.
	 * Fix "marsadm resize" capacity propagation from underlying LVM.
	 * Update test suite.

light0.1beta0.13
--------

	 * Fix shutdown during operation (flying requests).
	 * Fix unnecessary Lamport clock propagation storms.
	 * Improve unnecessary page cache utilisation (mapfree).
	 * Update test suite.


light0.1beta0.12 and earlier
--------

	There was no dedicated ChangeLog. For details, look at the
	commit history.

Release Policy / Software Lifecycle
-----------------------------------

	New source releases are simply announced by appearance of git tags.