mars/ChangeLog

221 lines
9.2 KiB
Plaintext

Release Conventions / Branches / Tagnames
light0.1 series (now stable):
- Asynchronous replication for the internal needs of
shared hosting and other ITOPS departments at 1&1.
- Unstable tagnames: light0.1beta%d.%d
- Stable branch: light0.1.y
- Stable tagnames: light0.1stable%02d
light0.2 series (planned):
- Improve network throughput by parallel TCP connections
(in particular under packet loss).
Also for the internal needs of shared hosting at 1&1.
- Unstable tagnames: light0.2beta%d.%d (planned)
- Stable branch: light0.2.y (planned)
- Stable tagnames: light0.2stable%02d (planned)
light0.3 series (planned):
- Improve replication latency.
- New pseudo-synchronous replication modes.
For the internal needs of database folks at 1&1.
- Unstable tagnames: light0.3beta%d.%d (planned)
- Stable branch: light0.3.y (planned)
- Stable tagnames: light0.3stable%02d (planned)
light1.0 series (planned):
- New symlink tree structure (future-proof)
- Trying to additionally address public needs.
- Potentially for Linux kernel upstream,
- Unstable tagnames: light1.0beta%d.%d (planned)
- Stable branch: light1.0.y (planned)
- Stable tagnames: light1.0stable%02d (planned)
full* (somewhen in future)
WIP-* branches are for development and may be rebased onto anything
at any time without notice. They will disappear eventually.
*stable* branches mean that only bug fixes and documentation
updates / clarifications will be applied. Updates to the test suite /
new test cases potentially disguising bugs, and other minor additions
of debugging code / paranoia code which may lead to discovery
of bugs are also possible. Error messages / warnings and their
error class may also be changed.
NO NEW FEATURES, not even minor ones, except when absolutely
necessary for a bugfix.
light0.1stable05
--------
* Major fix: incomplete calls to vfs_readdir()
which could lead to incomplete symlink updates /
replication hangs.
* Minor fix: scarce race on replay EOF.
* Separated kernel from userspace build environment.
* Removed some potentially dangerous Kconfig options
if they would be set to wrong values (robustness against
accidentally producing bad kernel modules).
* Dito: some additional checks against bad main Kconfig options
(mainly for out-of-tree builds).
* Separated contrib code from maintained code.
* Added some pre-patches for newer kernels
(WIP - not yet fully tested at all combinations)
* Minor doc addition: LinuxTag 2014 presentation.
light0.1stable04
--------
* Quiet annoying error message.
* Minor readability improvements.
* Minor doc updates.
light0.1stable03
--------
* Major: fix internal aio race (could lead to memory corruption).
* Fix refcounting in trans_logger.
* Some minor fixes in module code.
* Fix 1&1-internal out-of-tree builds.
* Various minor fixes.
* Update monitoring tools / docs (German, contributed by Jörg Mann).
light0.1stable02
--------
* Fix sorting of internal data structure.
* Fix IO error propagation at replay.
light0.1stable01
--------
* Fix parallelism of logfile propagation: sometimes a secondary
could get a more recent version than the primary had on stable
storage after its crash, eventually leading to an (annoying)
split brain. Some people might take this as a feature instead
of a bug, but now the logfile transfer starts only after the
primary _knows_ that the data is successfully committed to
stable storage.
* Fix memory leaks in error path.
* Fix error propagation between client and server.
* Make string allocation fully dynamic (remove limitation).
* Fix some annoying messages.
* Fix usage output of marsadm.
* Userspace: contributed bugfix for Debian udev rules by Jörg Mann.
* Improved debugging (only for testing).
light0.1beta0.18 (feature release)
--------
* New commands marsadm view-$macroname
* New customizable macro processor
* New err/warn/inf reporting via symlinks
* Per-resource emergency mode
* Allow limiting the sync parallelism
* New flood-protected syslogging
* Some smaller improvements
* Update docs
* Update test suite
light0.1beta0.17
--------
* Major bugfix: race in logfile switchover could sometimes
lead to the wrong logfile (extremely rare to hit, but
potentially harmful).
* Disallow primary switching when some secondaries are
syncing.
* Fix logfile fetch from multiple peers.
* Fix computation of transitive closure (affected
log-purge-all, split brain detection, and many others).
* Fix incorrect emergency mode detection.
* Primaries no longer fetch logfiles (unnecessarily, only
makes a difference at concurrent split brain operations).
* Detached resources no longer fetch logfiles (unexpectedly).
* Myriads of smaller fixes.
light0.1beta0.16
--------
* Critical bugfix: "marsadm primary --force" was assumed to be given
by sysadmins only in case of emergency, when the network is down.
When given in non-emergency cases where the old primary continues
to run (/dev/mars/* being actively used and written), the
old primary could suddendly do a "logrotate" to the
new split-brain logfile produced by the new (second) primary.
Now two primaries should be able to run concurrently in split-brain
mode without mutually trashing their logfiles.
* primary --force now only works in disconnected mode, in order
to hinder unintended forceful creation of split brain during
normal operation.
* Stop fetching of logfiles behind split brain points (save space
at the target hosts - usually the data will be discarded later).
* Fixed split brain detection in userspace.
* leave-resource now waits for local actions to take place
(remote actions stay asynchronously).
* invalidate / join-resource now work only if a designated primary
exists (otherwise they would not know uniquely from whom
to start initial sync).
* Update docs, clarify scenarios intended <-> emergengy switching.
* Fixed mutual overwrite of deletion symlinks in case of racing
log-deletes spawned in parallel by cron jobs (resilience).
* Fixed races between deletion and re-erection (e.g. fresh
join-resource after leave-resource during network partitions).
* Fixed duration of network timeouts in case the network is down
(replaced non-working TCP_KEEPALIVE by explicit timeouts).
* New option --dry-run which does not really create symlinks.
* New command "delete-resource" (VERY DANGEROUS) for
forcefully destroying a resource, even when it is in use.
Intended only for _emergency_ cases when sysadmins are
desperate. Use only by hand, first run with --dry-run in order
to check what will happen!
* New command "log-purge-all" (potentially DANGEROUS) for
resolving split brain in desperate situations (cleanup of
leftovers). Only use by hand, first run with --dry-run!
* Lots of smaller imprevements / usability / readability etc.
* Update test suite.
light0.1beta0.15
--------
* Introduce write throttling of bulk writers.
* Update test suite.
light0.1beta0.14
--------
* Fix logfile transfer in case of "holes" created by
emergency mode.
* Fix "marsadm invalidate" after emergency mode had been entered.
* Fix "marsadm resize" capacity propagation from underlying LVM.
* Update test suite.
light0.1beta0.13
--------
* Fix shutdown during operation (flying requests).
* Fix unnecessary Lamport clock propagation storms.
* Improve unnecessary page cache utilisation (mapfree).
* Update test suite.
light0.1beta0.12 and earlier
--------
There was no dedicated ChangeLog. For details, look at the
commit history.
Release Policy / Software Lifecycle
-----------------------------------
New source releases are simply announced by appearance of git tags.
General Conventions
-------------------
The git tags have the following meaning:
full* for future use.
light1.0 The first number indicates the main symlink tree revision, the second number indicates the sub revision. The main symlink tree revision is only updated upon (potentially) incompatible changes. Upgrades of main revisions will always be possible, but downgrades are not automatically supported. The sub revision will indicate new releases, and they may also indicate symlink tree extersions which are both forwards and backwards compatible. It may just happen that new features are not available with elder releases :)
Example: 1.0 ff will indicate the future main production revision.
Extensions: suffixes like pre1 indicate pre-releases. Other suffixes like testing2 are reserved for future use.
Hint: you may automatically convert the MARS git tags into Debian release tags by a regex inserting a ~ after any transition from a digit to an alpha character. We just omitted the ~ because git treats it as an invalid character. The corresponding Debian tags _should_ result in the correct ordering according to the Debian guidelines. Please report a bug if not :)
light0.1beta* Internal 1&1 releases during the pilot phase. May be used by the public, but you should know that the 1.0 symlink tree revision will appear soon.
light0.0alpha* Very old prototypes; never use them. Vital feature were missing. Only for historic inspection.