Release Conventions / Branches / Tagnames
-----------------------------------------

light0.1 series (now stable):
  - Asynchronous replication for the internal needs of shared hosting
    and other ITOPS departments at 1&1.
  - Unstable tagnames: light0.1beta%d.%d
  - Stable branch: light0.1.y
  - Stable tagnames: light0.1stable%02d

light0.2 series (planned):
  - Improve network throughput by using parallel TCP connections
    (in particular under packet loss). Also for the internal needs of
    shared hosting at 1&1.
  - Unstable tagnames: light0.2beta%d.%d (planned)
  - Stable branch: light0.2.y (planned)
  - Stable tagnames: light0.2stable%02d (planned)

light0.3 series (planned):
  - Improve replication latency.
  - New pseudo-synchronous replication modes. For the internal needs of
    database folks at 1&1.
  - Unstable tagnames: light0.3beta%d.%d (planned)
  - Stable branch: light0.3.y (planned)
  - Stable tagnames: light0.3stable%02d (planned)

light1.0 series (planned):
  - New symlink tree structure (future-proof).
  - Additionally trying to address public needs.
  - Potentially for Linux kernel upstream.
  - Unstable tagnames: light1.0beta%d.%d (planned)
  - Stable branch: light1.0.y (planned)
  - Stable tagnames: light1.0stable%02d (planned)

full* (at some point in the future)

WIP-* branches are for development and may be rebased onto anything at
any time without notice. They will disappear eventually.

*stable* branches mean that only bug fixes and documentation updates /
clarifications will be applied. Updates to the test suite, new test
cases potentially uncovering bugs, and other minor additions of
debugging code / paranoia code which may lead to the discovery of bugs
are also possible. Error messages / warnings and their error class may
also be changed.
NO NEW FEATURES, not even minor ones, except when absolutely necessary
for a bugfix.

light0.1stable08
--------
* Minor fix: after emergency mode, a versionlink was not created. This
  could lead to unnecessary split brain reports and/or the need for an
  additional re-invalidate.
* Minor fix: the predicate 'view-is-consistent' reported 'false' in some
  situations on secondaries even when everything was ok.
* Minor fix: it was impossible to determine 'is-consistent' from
  'marsadm view' (without the -1and1 suffix). Added a new [Cc-] flag.
  This is absolutely needed to determine whether the underlying disks
  must have the same checksum (provided that both disks are detached,
  the network works, and fetch+replay had completed before the detach).
* Updated docs to reflect this.
* Minor fix: 'invalidate' did not work when the resource was not
  completely detached. Now it implicitly does a detach before starting
  the invalidation.
* Minor fix: wait-umount was waiting for the umount of _all_ primaries
  during split brain. Now it waits only for the umount on the local
  node. Notice that having multiple primaries in parallel is an
  erroneous state anyway.
* Minor fix: leave-cluster did not work without --force.

light0.1stable07
--------
* Minor fix: re-creation of a completely destroyed resource did not
  always work correctly.

light0.1stable06
--------
* Major fix: becoming primary could hang in rare situations.
* Minor fix: some split brains were not always detected correctly.
* Minor fix for Red Hat OpenVZ kernel builds.
* Several fixes for 1&1-internal Debian builds.

light0.1stable05
--------
* Major fix: incomplete calls to vfs_readdir() could lead to incomplete
  symlink updates / replication hangs.
* Minor fix: rare race on replay EOF.
* Separated kernel from userspace build environment.
* Removed some Kconfig options which were potentially dangerous if set
  to wrong values (robustness against accidentally producing bad kernel
  modules).
* Ditto: some additional checks against bad main Kconfig options
  (mainly for out-of-tree builds).
* Separated contrib code from maintained code.
* Added some pre-patches for newer kernels (WIP, not yet fully tested
  in all combinations).
* Minor doc addition: LinuxTag 2014 presentation.

light0.1stable04
--------
* Quieted an annoying error message.
* Minor readability improvements.
* Minor doc updates.

light0.1stable03
--------
* Major: fix internal aio race (could lead to memory corruption).
* Fix refcounting in trans_logger.
* Some minor fixes in module code.
* Fix 1&1-internal out-of-tree builds.
* Various minor fixes.
* Update monitoring tools / docs (German, contributed by Jörg Mann).

light0.1stable02
--------
* Fix sorting of internal data structure.
* Fix IO error propagation at replay.

light0.1stable01
--------
* Fix parallelism of logfile propagation: sometimes a secondary could
  get a more recent version than the primary had on stable storage
  after its crash, eventually leading to an (annoying) split brain.
  Some people might take this as a feature instead of a bug, but now
  the logfile transfer starts only after the primary _knows_ that the
  data has been successfully committed to stable storage.
* Fix memory leaks in error path.
* Fix error propagation between client and server.
* Make string allocation fully dynamic (remove limitation).
* Fix some annoying messages.
* Fix usage output of marsadm.
* Userspace: bugfix for Debian udev rules, contributed by Jörg Mann.
* Improved debugging (only for testing).

light0.1beta0.18 (feature release)
--------
* New commands marsadm view-$macroname
* New customizable macro processor
* New err/warn/inf reporting via symlinks
* Per-resource emergency mode
* Allow limiting the sync parallelism
* New flood-protected syslogging
* Some smaller improvements
* Update docs
* Update test suite

light0.1beta0.17
--------
* Major bugfix: a race in logfile switchover could sometimes lead to the
  wrong logfile (extremely rare to hit, but potentially harmful).
* Disallow primary switching when some secondaries are syncing.
* Fix logfile fetch from multiple peers.
* Fix computation of transitive closure (affected log-purge-all, split
  brain detection, and many others).
* Fix incorrect emergency mode detection.
* Primaries no longer fetch logfiles (this was unnecessary and only made
  a difference during concurrent split brain operations).
* Detached resources no longer fetch logfiles (this was unexpected).
* Myriads of smaller fixes.

light0.1beta0.16
--------
* Critical bugfix: "marsadm primary --force" was assumed to be given by
  sysadmins only in case of emergency, when the network is down. When it
  was given in non-emergency cases where the old primary continued to
  run (/dev/mars/* being actively used and written), the old primary
  could suddenly do a "logrotate" to the new split-brain logfile
  produced by the new (second) primary. Now two primaries should be able
  to run concurrently in split-brain mode without mutually trashing
  their logfiles.
* primary --force now only works in disconnected mode, in order to
  prevent unintended forceful creation of split brain during normal
  operation.
* Stop fetching logfiles behind split brain points (saves space on the
  target hosts; usually that data will be discarded later anyway).
* Fixed split brain detection in userspace.
* leave-resource now waits for local actions to take place (remote
  actions remain asynchronous).
* invalidate / join-resource now work only if a designated primary
  exists (otherwise they could not uniquely determine from whom to
  start the initial sync).
* Update docs, clarify intended vs. emergency switching scenarios.
* Fixed mutual overwrite of deletion symlinks in case of racing
  log-deletes spawned in parallel by cron jobs (resilience).
* Fixed races between deletion and re-creation (e.g. fresh
  join-resource after leave-resource during network partitions).
* Fixed duration of network timeouts in case the network is down
  (replaced non-working TCP_KEEPALIVE by explicit timeouts).
* New option --dry-run which does not actually create symlinks.
* New command "delete-resource" (VERY DANGEROUS) for forcefully
  destroying a resource, even when it is in use. Intended only for
  _emergency_ cases when sysadmins are desperate.
  Use only by hand; first run with --dry-run in order to check what will
  happen!
* New command "log-purge-all" (potentially DANGEROUS) for resolving
  split brain in desperate situations (cleanup of leftovers). Only use
  by hand; first run with --dry-run!
* Lots of smaller improvements / usability / readability etc.
* Update test suite.

light0.1beta0.15
--------
* Introduce write throttling of bulk writers.
* Update test suite.

light0.1beta0.14
--------
* Fix logfile transfer in case of "holes" created by emergency mode.
* Fix "marsadm invalidate" after emergency mode had been entered.
* Fix "marsadm resize" capacity propagation from underlying LVM.
* Update test suite.

light0.1beta0.13
--------
* Fix shutdown during operation (flying requests).
* Fix unnecessary Lamport clock propagation storms.
* Reduce unnecessary page cache utilisation (mapfree).
* Update test suite.

light0.1beta0.12 and earlier
--------
There was no dedicated ChangeLog. For details, look at the commit
history.


Release Policy / Software Lifecycle
-----------------------------------

New source releases are simply announced by the appearance of git tags.


General Conventions
-------------------

The git tags have the following meaning:

full*     For future use.

light1.0  The first number indicates the main symlink tree revision,
          the second number indicates the sub revision.
          The main symlink tree revision is only updated upon
          (potentially) incompatible changes. Upgrades of main
          revisions will always be possible, but downgrades are not
          automatically supported.
          The sub revision indicates new releases; it may also indicate
          symlink tree extensions which are both forwards and backwards
          compatible. Naturally, new features may not be available in
          older releases :)
          Example: 1.0 and later will indicate the future main
          production revisions.

          Extensions: suffixes like pre1 indicate pre-releases.
          Other suffixes like testing2 are reserved for future use.

Hint: you may automatically convert the MARS git tags into Debian
version numbers by a regex inserting a ~ after any transition from a
digit to an alpha character. We just omitted the ~ because git treats
it as an invalid character. The corresponding Debian versions _should_
result in the correct ordering according to the Debian guidelines.
Please report a bug if not :)

light0.1beta*  Internal 1&1 releases during the pilot phase. May be used
               by the public, but you should know that the 1.0 symlink
               tree revision will appear soon.

light0.0alpha* Very old prototypes; never use them. Vital features were
               missing. Only for historical inspection.
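
The hint about converting git tags into Debian version numbers can be
illustrated with a minimal sketch, assuming Python 3. The helper names
`mars_tag_to_debian` and `debian_compare` are hypothetical and not part
of MARS or dpkg. The first inserts `~` at every digit-to-letter
transition; the second is a simplified re-implementation of the Debian
upstream-version comparison rules (no epoch or Debian revision
handling), meant only to show the resulting ordering. For an
authoritative check, use `dpkg --compare-versions`.

```python
import re
import string
from functools import cmp_to_key

# Zero-width match between a digit and a following ASCII letter.
_DIGIT_TO_ALPHA = re.compile(r"(?<=[0-9])(?=[A-Za-z])")
_DIGIT_RUN = re.compile(r"[0-9]*")


def mars_tag_to_debian(tag: str) -> str:
    """Hypothetical helper: insert '~' at every digit-to-letter transition."""
    return _DIGIT_TO_ALPHA.sub("~", tag)


def _is_digit(c: str) -> bool:
    return len(c) == 1 and c in string.digits


def _order(c: str) -> int:
    """Sort weight of one character, modelled on dpkg's rules:
    '~' sorts before everything (even end of string), letters sort
    before other symbols. Digits and end of string weigh 0 here."""
    if c == "" or _is_digit(c):
        return 0
    if c in string.ascii_letters:
        return ord(c)
    if c == "~":
        return -1
    return ord(c) + 256


def debian_compare(a: str, b: str) -> int:
    """Simplified sketch: compare two upstream version strings.
    Returns a negative, zero or positive int (no epoch/revision support)."""
    i = j = 0
    while i < len(a) or j < len(b):
        # 1. Compare the non-digit prefixes character by character.
        while (i < len(a) and not _is_digit(a[i])) or (
            j < len(b) and not _is_digit(b[j])
        ):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return -1 if ac < bc else 1
            i += 1
            j += 1
        # 2. Compare the following digit runs numerically (empty run = 0).
        da = _DIGIT_RUN.match(a, i).group()
        db = _DIGIT_RUN.match(b, j).group()
        i += len(da)
        j += len(db)
        na, nb = int(da or "0"), int(db or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


if __name__ == "__main__":
    tags = ["light1.0pre1", "light0.1stable08", "light0.1beta0.18",
            "light0.1stable01", "light0.1beta0.13", "light1.0"]
    converted = [mars_tag_to_debian(t) for t in tags]
    for version in sorted(converted, key=cmp_to_key(debian_compare)):
        print(version)
```

The tilde matters most for pre-release suffixes: under these rules
light1.0~pre1 sorts before light1.0, whereas the unconverted
light1.0pre1 would sort after it.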