mirror of https://github.com/schoebel/mars
IMPORTANT: the historic distinction between MARS Light and the future
MARS Full has been dropped. Now all versions are simply called "mars".

Old tagnames light* will remain valid, but newer names will follow the
convention s/light/mars/g (this means that the old version number counting
will be continued, only the "light" is substituted).

Meaning of stable tagnames
--------------------------

Example: mars0.1stable01:
    0      = version of on-disk data structures
             (only incremented when downgrades are impossible)
             (not incremented on backwards-compatible upgrades)
    1      = version of feature set
    stable = feature set is frozen during this series
    01     = bugfix revision

Example: mars0.2beta2.3:
    The general idea is as before.
    "beta" means that new features are roughly tested
    in the lab, but not in production, so there may be
    some bugs. New features may be added during
    the beta phase.

Example: mars0.3alpha*:
    Never use this for production. Only for historic
    code inspection.

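The naming scheme above can be checked mechanically. The following is only a sketch: the regular expression is derived from the example tagnames in this ChangeLog, and the names TAG_RE, MarsTag, parse_tag and production_ready are made up for illustration. The optional letter after the feature-set number (as in "0.1a") is assumed to belong to the feature set.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the example tagnames in this ChangeLog (an assumption,
# not an official grammar). A letter such as the "a" in "0.1a" is treated as
# part of the feature set.
TAG_RE = re.compile(
    r"^(?P<prefix>mars|light)"
    r"(?P<disk>\d+)\."
    r"(?P<feature>\d+[a-z]?)"
    r"(?P<phase>stable|beta|alpha)"
    r"(?P<revision>\d+(?:\.\d+)*)$"
)


class MarsTag(NamedTuple):
    prefix: str        # "mars" or the historic "light"
    disk_format: int   # version of on-disk data structures
    feature_set: str   # version of feature set, e.g. "1" or "1a"
    phase: str         # "stable", "beta" or "alpha"
    revision: str      # bugfix revision, kept as text ("01", "2.3")


def parse_tag(tag: str) -> Optional[MarsTag]:
    """Split a tagname into its parts, or return None if it does not fit."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    return MarsTag(m["prefix"], int(m["disk"]), m["feature"],
                   m["phase"], m["revision"])


def production_ready(tag: str) -> bool:
    """Only tags of the 'stable' phase are meant for production."""
    parsed = parse_tag(tag)
    return parsed is not None and parsed.phase == "stable"
```

For example, parse_tag("mars0.1astable90") yields disk format 0, feature set "1a", phase "stable" and revision "90", while WIP-* branch names are rejected.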
Release Conventions / Branches / Tagnames
-----------------------------------------

mars0.1 series (now EOL):
    - Unstable tagnames: light0.1beta%d.%d (obsolete)
    - Stable branch:     mars0.1.y (obsolete)
    - Stable tagnames:   mars0.1stable%02d (obsolete)

mars0.1a series (stable):
    New master branch. Now stable.
    This branch has been operational for several years on
    several thousand servers and several petabytes
    of data.
    - Unstable tagnames: light0.1abeta%d (obsolete)
    - Stable branch:     mars0.1a.y
    - Stable tagnames:   mars0.1astable%02d

mars1.0 series (planned):
    - Replace symlink tree by transactional status files
      (future-proof)
      This is required for upstream merging to the kernel.
      It has further advantages, such as better scalability.
    - Trying to additionally address public needs.
    - Potentially for Linux kernel upstream.
    - Unstable tagnames: mars1.0beta%d.%d (planned)
    - Stable branch:     mars1.0.y (planned)
    - Stable tagnames:   mars1.0stable%02d (planned)

WIP-* branches are for development and may be rebased onto anything
at any time without notice. They will disappear eventually.
Never use them for production!

*stable* branches mean the following:

    - Heavily tested. Has to obey an HA SLA of 99.98% end-to-end,
      including network outages and HumanError(tm) at 1&1 Ionos
      ShaHoLin. Thus the _component_ SLA of MARS must be much better.
    - There is always an upgrade path. Simply install the new
      version.
    - Rolling upgrades (temporarily different MARS kernel module
      versions at primary vs secondary side) are supported.
      Typically, do "rmmod mars; modprobe mars" at the secondary
      side first, then handover, then do the same at the former
      primary side.
      Or, of course, you may combine it with (typically
      security-triggered) rolling kernel reboots.
    - marsadm may be upgraded independently from kernel
      (during operations, best via your favorite package manager).
    - Downgrade is possible *inside* of the same stable branch
      series.
    - Downgrade to _prior_ *stable* branches may be restricted,
      or may require some extraordinary actions.
      Please read this ChangeLog for details.

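The order of a rolling upgrade, as described in the list above, can be written down as a small plan. This is a sketch only. The helper name rolling_upgrade_plan and the host names are made up, and it assumes that the new module is already installed on both hosts and that handover is triggered by "marsadm primary" at the designated new primary. The plan is only built, not executed.

```python
from typing import List, Tuple

# Command quoted in the list above for reloading the kernel module.
RELOAD = "rmmod mars; modprobe mars"


def rolling_upgrade_plan(primary: str, secondary: str,
                         resource: str = "all") -> List[Tuple[str, str]]:
    """Return (host, command) steps in the order described above.

    Assumption: the new mars.ko is already installed on both hosts,
    so that modprobe picks up the new version.
    """
    return [
        (secondary, RELOAD),                          # 1. secondary side first
        (secondary, f"marsadm primary {resource}"),   # 2. handover to it
        (primary, RELOAD),                            # 3. former primary side
    ]
```

Printing the steps of rolling_upgrade_plan("hostA", "hostB") gives a checklist which can be reviewed before anything is run by hand or by your cluster manager.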
Example: a new future-proof internal deletion format has been
introduced in mars0.1astable88. It is off by default.
If you never activate it, you can downgrade inside of mars0.1astable*
as you like.
Only if you actually activate it, you have to obey the
downgrade instructions documented below.

-----------------------------------
Changelog for series 0.1a:

This is the new master branch, starting January 2019.
The old stable branch mars 0.1.y is EOL,
now fully superseded by this branch.

Based on 0.1balpha4. Merged 0.1stable.
Now stable.
Receives mainly fixes.

mars0.1astable90
    * Minor improvement: more reactiveness. This release
      is meant as an anchor point in case you would need
      a downgrade.

mars0.1astable89
    * Minor improvement: better kernel module reactiveness.
      More on scalability is in the dev pipeline.
      For now, use marsadm --timeout=300 or similar when
      stretching the official limits (but don't stretch too
      much until I have improved all relevant parts).

mars0.1astable88
    * New experimental scalability feature, deactivated
      by default:
      New deletion method, uses the special symlink value
      ".deleted" as a marker for logically deleted symlinks.
      This leads to a _massive_ simplification of code,
      and improves scalability for future masses of
      resources and/or cluster hosts.
      After updating both mars.ko and marsadm, you may
      activate it via marsadm option --delete-method=0
      but ONLY FOR TESTING.
      I will tell you when it is stable enough for
      production. At some point in the future, it will hopefully
      become the default, and eventually the old complex code
      can hopefully be purged once the whole world
      uses the new method.
      Note: when never activated, it should not have any
      influence on old-style production. Both methods
      can be used in parallel on different clusters.
      So you can activate it on some test clusters first.
      Do not _directly_ roll back to old mars.ko and/or marsadm versions
      after activation. First deactivate the feature via
      --delete-method=1, then wait for a few hours until marsadm cron
      has done purging. "find /mars -type l -ls" must no longer report
      any "-> .deleted" values anywhere in the entire cluster.
      Then you can roll back to old releases.
    * Doc: small update on new marsadm command link-purge-all.

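The pre-rollback check for leftover ".deleted" markers can also be scripted. The following sketch does roughly what the quoted find command does, for one host only. The function name find_deleted_markers is made up, and the check has to be repeated on every host of the cluster.

```python
import os
from typing import List

# Special symlink value used by the new deletion method (see above).
DELETED_MARKER = ".deleted"


def find_deleted_markers(root: str = "/mars") -> List[str]:
    """Return all symlinks below root whose value is exactly ".deleted".

    Symlinks are never followed; only their stored value is inspected.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Symlinks may show up in either list, depending on their target.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and os.readlink(path) == DELETED_MARKER:
                hits.append(path)
    return sorted(hits)
```

An empty result on every host corresponds to the condition stated above. Anything else means that purging has not finished, and a rollback should wait.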
mars0.1astable87
    * Minor fix: unnecessary split brain could result from a race
      between handover and log-rotate / cron.

mars0.1astable86
    * Minor improvement: speedup metadata traffic avoiding
      some O(n^2) internal algorithms.

mars0.1astable85
    * Minor improvement: avoid ssh / rsync at join-resource.
      Only when ordinary communication over port 7777 (default)
      fails, fall back to ssh connections.
    * Minor marsadm speedup by avoidance of unnecessary
      sleep times.
    * Minor fix: ensure that primary --force works even when a
      logfile was truncated forcefully.
    * Minor fix: use-after-free reported by KASAN, only
      triggerable with a future development version, not
      observed with the current stable version.
      I include it here for safeguarding.
    * Minor doc updates. Explain fundamental requirements for
      geo-redundancy, and some background on cost comparisons.

mars0.1astable84
    * Major improvement: try to automatically self-repair
      any defective logfile at secondaries, by fetching again
      from primary.
      This can only work when the version at the primary is
      healthy.
      When successful, "invalidate" is no longer necessary.

mars0.1astable83
    * Major improvement: new marsadm option --parallel can drastically
      speed up handover, provided that the rest of your infrastructure
      can deal with parallelism. Several cluster managers are
      known to have problems with that. So be careful, do not
      blindly use this feature!
      Future releases will try to improve the systemd interface
      such that parallelism is possible without problems.
    * Doc updates: describe dimensioning of storage networks
      and its realtime behaviour, against the background of Kirchhoff's
      law. Neglecting this may lead to much higher cost than
      necessary, and may lead to a variety of operational problems,
      up to failures of projects.
      Also, working with wrong definitions of Cloud Storage can lead
      to a similar effect.
      Recommended reading!

mars0.1astable82
    * Major improvement: the mars_main kernel thread is now working
      non-blocking in practically all relevant cases. Some more cases
      will be addressed in future.
      Testing with 32 resources in parallel is now working, and even
      64 resources appear to work in the lab, although somewhat slower
      (on typical server iron).
      "marsadm primary all" is now much faster.
      More future improvements to come. Currently, "marsadm primary all"
      uses an internal barrier synchronisation model, which may lead
      to unnecessary waiting time for faster resources. There are
      plans to address this in future releases.
      ATTENTION! You will need NEW VERSIONS of your pre-patch.
      This will automatically adjust /proc/sys/fs/aio-max-nr to higher
      values when needed. If you don't use the new pre-patch, you will
      need to tune /proc/sys/fs/aio-max-nr yourself. Otherwise
      you will get serious operational deadlocks due to virtual
      resource limitations, even with only 32 resources, but a
      higher number of replicas.
      Since there is no practical experience yet (the biggest known
      productive installation uses only 24 resources), I do not yet
      increase the official limits as documented in the appendix of
      mars-user-manual.pdf.
      Although very slow due to some O(n^2) algorithms, 128 resources
      are just surviving now, without bombing or deadlocking, but are
      not yet really usable.
      Therefore, do not try to stretch the official limits too much.
      Please report any success stories (or problems) in case you
      are using some more resources _productively_.
    * Minor doc improvements. New slides from LCA2020 added.

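If you tune /proc/sys/fs/aio-max-nr yourself, a monitoring check along the following lines may help. It is only a sketch: this ChangeLog does not state which value is needed, so the parameter "required" is a placeholder that you must choose for your own setup, and the function names are made up.

```python
from pathlib import Path

# Kernel tunable named in the entry above.
AIO_MAX_NR = Path("/proc/sys/fs/aio-max-nr")


def read_aio_max_nr(path: Path = AIO_MAX_NR) -> int:
    """Read the current system-wide limit for AIO requests."""
    return int(path.read_text().strip())


def aio_limit_sufficient(required: int, path: Path = AIO_MAX_NR) -> bool:
    """Compare the current limit against a site-specific requirement.

    'required' is an assumption of the caller; no official number
    is given in this ChangeLog.
    """
    return read_aio_max_nr(path) >= required
```

The path parameter exists only so that the check can be tried against an ordinary file before it is pointed at /proc.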
mars0.1astable81
    * Minor doc improvement: explain why running MARS inside of VMs
      is a bad idea. Explain fully managed geo-location transparency
      of VMs.

mars0.1astable80
    * Compatibility up to kernels <= 4.14.
      Attention! There is a bug in upstream kernels >= 4.11, leading
      to an endless loop in kernel mode under certain preconditions.
      The fix is in pre-patches/vanilla-4.14/0001-sched-wait-fix-*
      If you _forget_ to apply this fix for _affected_ kernels, you may
      get "operational fun" at the wrong moment: ordinary operations
      will likely be unaffected, but a _silent_ network outage at the
      wrong moment (race condition) may hang up your kernel at the
      secondary site, just in the moment when you probably want to do
      a failover.
      LTS kernels 4.9 and earlier are not affected in practice: the bug
      is potentially present there as well, but it is a _masked_
      (sleeping) bug there.
      I already submitted the fix to LKML, but unfortunately it has been
      ignored up to now.

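Whether a given kernel needs the sched-wait pre-patch can be decided from its release string. The sketch below only encodes the statement ">= 4.11" from the entry above. It is an assumption that nothing else needs to be considered: the entry does not say whether later upstream kernels carry the fix. The function names are made up.

```python
import re
from typing import Tuple


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string like '4.14.78-amd64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"cannot parse kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def needs_sched_wait_fix(release: str) -> bool:
    """True for kernels >= 4.11, which the entry above lists as affected."""
    return kernel_version(release) >= (4, 11)
```

On a running system, the release string can be obtained from platform.release() of the Python standard library, or from "uname -r".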
mars0.1astable79
    * Critical fix: in a multiple-failure scenario which is hard
      to reach, and then acting badly by disregarding
      heavy warnings from marsadm and from mars-user-manual.pdf,
      data consistency could be violated. Detected by testing
      (the situation has not been observed in practice up to now).
      When unsure, better update to this fixed version.
    * Minor fix: in a rare corner case plus an additional
      rare race, primary handover could hang.
    * Major systemd interface fixes and improvements:
      - When handover fails due to failed systemd stopping at
        the old primary (e.g. hanging umount etc), the application
        stack will be automatically restarted before the handover
        operation reports timeout. The idea is to keep your
        applications running whenever possible.
      - New commands marsadm set-systemd-want and get-systemd-want
        for a temporary shutdown of the systemd unit stack.
        This is useful e.g. for performing an fsck.
      - Implemented transitive closure of indirectly referenced
        further systemd units.
      - Attach / detach now automatically starts / stops the
        systemd unit stack.
      - Improved reliability of systemd handover.
      - Fixed many bugs in the systemd template macro processor.
      - Updated doc accordingly.

mars0.1astable78
    * Major or minor fix: memory leak, triggered under rare conditions.
      Observed cases were a few kilobytes. However, it could accumulate
      over a very long time. When unsure, better update to this version.
    * Minor usability: report each resource size.

mars0.1astable77
    * Major doc update: the old mars-manual.pdf has been split into
      - mars-user-manual.pdf (for sysadmins)
      - mars-architecture-guide.pdf (for managers and architects)
      - mars-for-kernel-developers.lyx (unfinished)
      - football-user-manual.lyx
      The first two manuals have been heavily rewritten and
      extended!
    * Minor fix: after primary crash without failover, the secondaries
      could get stuck because a version symlink was not
      updated under rare preconditions.
    * Minor improvement: emergency space calculation is now more
      accurate.
    * Minor usability: hint when marsadm resize would be possible.
    * Several minor cosmetic improvements.

mars0.1astable76
    * Major fix: when the primary was dead and the
      secondary had an incomplete logfile which was
      not recognized as being damaged, "primary --force"
      did not work under all circumstances.
    * Minor fix: some config information was not
      replicated throughout the cluster.
      Ordinary users were typically not affected.
    * Minor improvement: marsadm view now shows
      the replication degree [$x/$y] at each individual
      resource.
    * Added slides from FrOSCon2019.

mars0.1astable75
    * Major fix, only relevant for a rare corner case:
      When overflowing the kernel fscache with gigabytes of
      data, and when a few more weird preconditions were met,
      it was possible to eat up the whole kernel
      memory and to trigger OOM.
      Notice: depending on kernel version, and depending on various
      overload scenarios, you may trigger OOM anyway, independently
      from MARS.
    * Minor fix: marsadm now reports the amount of
      Writeback data (as necessary for the Recovery phase after
      a crash) more precisely.
    * Minor improvement: speedup IOPS by better internal
      hash dimensioning.

mars0.1astable74
    * Full merge of mars0.1stable74,
      which was the last stable release in EOL branch
      mars0.1.y.
    * Major fix, only relevant for a corner case:
      Writeback made no human-visible progress under
      multiple weird preconditions.
    * Minor fix: ssh connections should be more robust
      when clumsy firewalls are leading to ssh hangs.
    * Minor usability improvement: marsadm view shows
      more fancy details on logfile numbers.
    * Minor speedups in internal infrastructure.
    * Football subproject: update to Football-2.0

mars0.1astable73 (merged from mars0.1stable73)
    * Critical fix, only relevant for kernels >= 4.2.x:
      NULL deref occurs systematically when more than 64
      file handles are being allocated.
      There is already an upstream bugfix in linux-next
      (missing initializer for resize_wait in fs/file.c).
      Since this fix is missing in many LTS and distro kernels
      (at the moment), I added a workaround in MARS.
      Recommendation: anyone operating MARS on newer kernels
      should update to mars0.1astable73 for safe operations.
      Don't leave this unfixed. It can explode at the worst
      moment, and restoring operations may only be possible
      by completely giving up a secondary host, or with a fix.

mars0.1astable72 (merged from mars0.1stable72)
    * Minor fix: writeback improved in a corner case.
    * Minor improvement: display WriteBack data amount in
      marsadm view.
    * Major doc improvement: describe IO performance tuning.

mars0.1astable71 (merged from mars0.1stable71)
    * Major fix: writeback at the primary was unnecessarily
      slow in certain situations.

mars0.1astable70 (merged from mars0.1stable70)
    * Critical fix: a few upper-layer kernel components are
      allocating struct bio on the stack. This led to stack memory
      corruption. If you ever had this problem, you certainly have
      noticed it ;) Thus it should not have affected your data.
      Unfortunately, I got no bug reports about this for several years.
      Discovered when testing compatibility to very new kernels,
      and now hopefully fixed.
    * Major fixes: the systemd interface was not in a mature state.
      Now improved a lot. More improvements are likely to follow
      in the next months.
    * Minor clarification: build for ancient kernel 2.6.32 was broken.
      Fixing the build was no problem, but then the resulting kernel
      deadlocked in certain situations (sb_mount mutex and sisters).
      The reason is that stacking of filesystem instances (like
      /vol/mydata relying on IO to /mars) is a pain in the very old
      kernel architecture.
      Any upstream kernel before 3.16 is EOL right now. Nevertheless,
      I am officially supporting 3.2 at the moment, and have tested it.
      Anyway, productive use of ancient kernels is not
      recommended, for various reasons.
      Notice that you also need old gcc versions for building such
      EOL kernels.
      Thus I decided to remove support for 2.6.32 officially.
      If somebody needs it _really_, please contact me.

mars0.1astable69 (merged from mars0.1stable69)
    * Major improvement: compatibility to upstream kernel 4.9.x.

mars0.1astable68 (merged from mars0.1stable68)
    * Minor fix: sometimes sync was advancing only slowly.
    * Minor fix: in extremely rare cases and under further conditions,
      detach could hang due to a race.
      Workaround was possible by re-attaching.
    * Minor improvement: /dev/mars/mydata now disappears only after
      writeback has finished. Although the old behaviour was correct,
      certain userspace tools could have erroneously concluded that
      the primary has finished working. The new behaviour is
      hopefully closer to user expectations.
    * Minor improvement: propagate physical and logical sector
      sizes from the underlying disk to /dev/mars/mydata.
      This can help mkfs and other tools to make better
      decisions about their internal parameters.
    * Minor safeguard: disallow manual --ignore-sync override
      when the target primary is inconsistent, only relevant
      for (non-existent) sysadmins who absolutely don't know what
      they are doing when they are combining this with --force.
      Sysadmins who really know what they are doing can use
      fake-sync in front of it, and then they are explicitly stating
      once again that they really want to force a defective system,
      and that they really know the fact that it is defective.
    * Minor improvement: additional warning when network connections
      are interrupted (asymmetrically), such as by mis-configuration
      of network interfaces / routing / firewall rules / etc.

mars0.1astable67 (merged from mars0.1stable67)
    * Minor fix: don't unnecessarily alert sysadmins when no systemd
      unit files are installed.
    * Minor doc update: new slides from LCA2019, updated old
      slides from FrOSCon2018.
    * Minor doc update: describe some more use cases, add some
      advice for managers.

mars0.1astable66
    * Merge mars0.1stable66. In detail:
      * Critical fix, only relevant for kernels 4.3 to 4.4:
        Due to a forgotten adaptation to newer kernels,
        some userspace tools like xfs_repair could read/write
        wrong data upon _large_ IO requests, and/or kernel memory
        corruption could occur. Kernel-level filesystems
        are typically _not_ affected because they typically use 4k
        pages at maximum.
        If you are operating such a kernel, please upgrade to
        minimize any risks. You probably want userspace tools like
        xfs_repair to not crash your kernel ;)
        The problem was reproducibly detected at lab regression testing,
        _before_ updating a big installation from kernel 3.16 to 4.4.
        It did not show up with the old kernel.
        Notice: kernels >4.6 are not yet supported at the moment,
        but work on them is likely to continue during the next
        months. Stay tuned.
      * Minor doc updates.

mars0.1abeta18
    * Merge mars0.1stable65.

mars0.1abeta17
    * Merge mars0.1stable64.
    * Fix compiler warning at certain kernel versions.

mars0.1abeta16
    * Merge mars0.1stable63.

mars0.1abeta15
    * Merge mars0.1stable62.

mars0.1abeta14
    * Merge mars0.1stable61.

mars0.1abeta13
    * Minor feature: marsadm takes comma-separated list of
      resource names in place of "all".
    * Merge mars0.1stable60.

mars0.1abeta12
    * Merge mars0.1stable59.

mars0.1abeta11
    * Merge mars0.1stable58.

mars0.1abeta10
    * Make IP_TOS compile-time configurable.
    * Update doc on IP_TOS.

mars0.1abeta9
    * Major feature: lowlevel TCP tuning, separately for traffic
      types MARS_TRAFFIC_META (default port 7777),
      and MARS_TRAFFIC_REPLICATION (default port 7778),
      and MARS_TRAFFIC_SYNC (default port 7779).
    * Merge mars0.1stable57.

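With three separate default ports, firewall mistakes can hit one traffic type but not the others. A plain TCP connect test per port, as sketched below, may help when checking firewall rules between cluster hosts. The port numbers are the defaults named above. The helper names are made up, and a successful connect only shows that something is listening, not that MARS is healthy.

```python
import socket
from typing import Dict

# Default ports per traffic type, as listed in the entry above.
DEFAULT_PORTS = {
    "MARS_TRAFFIC_META": 7777,
    "MARS_TRAFFIC_REPLICATION": 7778,
    "MARS_TRAFFIC_SYNC": 7779,
}


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_peer(host: str, ports: Dict[str, int] = DEFAULT_PORTS,
               timeout: float = 2.0) -> Dict[str, bool]:
    """Test every traffic type separately and report the result per type."""
    return {name: port_reachable(host, port, timeout)
            for name, port in ports.items()}
```

Calling check_peer with the name of a peer host returns one boolean per traffic type, so an asymmetric firewall rule shows up as a single False entry.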
mars0.1abeta8
    * Merge mars0.1stable56.

mars0.1abeta7
    * Merge mars0.1stable55.

mars0.1abeta6
    * Merge mars0.1stable54.

mars0.1abeta5
    * Merge mars0.1stable53.

mars0.1abeta4
    * Merge mars0.1stable52.

mars0.1abeta3
    * Merge mars0.1stable51.

mars0.1abeta2
    * Merge mars0.1stable50.
    * Silence annoying false-positive network interruption messages.

mars0.1abeta1
    * Merge mars0.1stable49.
    * Several smaller fixes.

mars0.1abeta0
    Forked off from 0.1balpha4.
    Merge 0.1stable48 (in several intermediate steps).
    Some infrastructure for version detection.
    Backport of selected fixes from branch 0.1b.y.
    Add marsadm split-cluster.

-----------------------------------
Changelog for the deprecated series 0.1b:
(only the part which has been merged with branch mars0.1a)
(notice that there were a few more historic branches which
were not really usable, and never went into production)

mars0.1balpha4
--------
    * First improvements for scalability to thousands of nodes.
      Not yet tested with really huge masses of nodes, only
      with relatively small clusters.
    * Merge fixes from mars0.1stable41 (see there)
    * Doc update on socket bundling.

mars0.1balpha3.4
--------
    * Merge fix from mars0.1stable40 (see there)

mars0.1balpha3.3
--------
    * Merge fixes from mars0.1stable39
    * Major fix: copy was sometimes hanging.
    * Minor fix: unnecessary delay of metadata propagation.
    * Performance improvements / bottleneck enhancements:
      - Lamport clock
      - Network
      - md5 checksumming
    * Userspace: faster logfile deletion via cron job.

mars0.1balpha3.2
--------
    * Merge mars0.1stable38: now compiles without pre-patch
      on certain kernel versions. Please read ChangeLog there.

mars0.1balpha3.1
--------
    * Minor fix: deadlock on termination of copy thread.

mars0.1balpha3
--------
    * Some tuning (more to come later):
      * Speedup network by better corking.
      * New scalable Lamport clock implementation.

mars0.1balpha2
--------
    * Socket bundling (cherry-picked from mars0.2.y).
    * Speedup copy processes (sync, logfile transfer).
    * Speedup bio and md5 checksumming.

mars0.1balpha1
--------
    * First improvements for scalability to more than 10 resources
      per node. Already tested with 128 resources on a pair of nodes.
      More improvements to come later.
      No functional changes otherwise (from a sysadmin perspective).
      Rollback to stable series 0.1 should be possible at
      any time.
    * Include fix from 0.1stable37.

mars0.1balpha0
--------
    * Minor fix: the 1&1 specific feature set-sync-pref-list was
      not used at all. Without it, the limitation feature for the sync
      parallelism degree did not work correctly (without leading to harm,
      other than suboptimal sync throughput / performance).
      Removed the old _obsolete_ feature (for formal reasons,
      this cannot be done in the 0.1stable branch).
      Re-implemented the feature in a very simple form,
      which is hopefully "obviously correct" now.
    * Minor feature: please use "marsadm cron" as a fool-proof short form,
      in particular at cron jobs.

-----------------------------------
Changelog for series 0.1:

Attention! This branch is now EOL.
Everything has been merged into branch mars0.1a.y which
is also the master branch.
PLEASE UPGRADE to the new branch.
Upgrade is easy: just roll out the new marsadm version,
install the new kernel modules, and load them where possible.
Mixed operation of different versions is no problem,
but is of course not the desired state, so keep this period
as short as possible.
Rollback is also easy.

Motivation: branch 0.1a has been in production for several years at 1&1.
Experiences: now runs provably better than 0.1.y with
better performance, smoother, etc.

mars0.1stable74 (last stable release in branch mars0.1.y)
    * Major fix, only relevant for a corner case:
      Writeback made no human-visible progress under
      multiple weird preconditions.
    * Minor usability improvement: marsadm view shows
      more fancy details on logfile numbers.

mars0.1stable73
    * Critical fix, only relevant for kernels >= 4.2.x:
      NULL deref occurs systematically when more than 64
      file handles are being allocated.
      There is already an upstream bugfix in linux-next
      (missing initializer for resize_wait in fs/file.c).
      Since this fix is missing in many LTS and distro kernels
      (at the moment), I added a workaround in MARS.
      Recommendation: anyone operating MARS on newer kernels
      should update to mars0.1astable73 for safe operations.
      Don't leave this unfixed. It can explode at the worst
      moment, and restoring operations may only be possible
      by completely giving up a secondary host, or with a fix.

mars0.1stable72
    * Minor fix: writeback improved in a corner case.
    * Minor improvement: display WriteBack data amount in
      marsadm view.
    * Major doc improvement: describe IO performance tuning.

mars0.1stable71
    * Major fix: writeback at the primary was unnecessarily
      slow in certain situations.

mars0.1stable70
    * Critical fix: a few upper-layer kernel components are
      allocating struct bio on the stack. This led to stack memory
      corruption. If you ever had this problem, you certainly have
      noticed it ;) Thus it should not have affected your data.
      Unfortunately, I got no bug reports about this for several years.
      Discovered when testing compatibility to very new kernels,
      and now hopefully fixed.
    * Major fixes: the systemd interface was not in a mature state.
      Now improved a lot. More improvements are likely to follow
      in the next months.
    * Minor clarification: build for ancient kernel 2.6.32 was broken.
      Fixing the build was no problem, but then the resulting kernel
      deadlocked in certain situations (sb_mount mutex and sisters).
      The reason is that stacking of filesystem instances (like
      /vol/mydata relying on IO to /mars) is a pain in the very old
      kernel architecture.
      Any upstream kernel before 3.16 is EOL right now. Nevertheless,
      I am officially supporting 3.2 at the moment, and have tested it.
      Anyway, productive use of ancient kernels is not
      recommended, for various reasons.
      Notice that you also need old gcc versions for building such
      EOL kernels.
      Thus I decided to remove support for 2.6.32 officially.
      If somebody needs it _really_, please contact me.

mars0.1stable69
    * Major improvement: compatibility to upstream kernel 4.9.x.

mars0.1stable68
    * Minor fix: in extremely rare cases and under further conditions,
      detach could hang due to a race.
      Workaround was possible by re-attaching.
    * Minor improvement: /dev/mars/mydata now disappears only after
      writeback has finished. Although the old behaviour was correct,
      certain userspace tools could have erroneously concluded that
      the primary has finished working. The new behaviour is
      hopefully closer to user expectations.
    * Minor improvement: propagate physical and logical sector
      sizes from the underlying disk to /dev/mars/mydata.
      This can help mkfs and other tools to make better
      decisions about their internal parameters.
    * Minor safeguard: disallow manual --ignore-sync override
      when the target primary is inconsistent, only relevant
      for (non-existent) sysadmins who absolutely don't know what
      they are doing when they are combining this with --force.
      Sysadmins who really know what they are doing can use
      fake-sync in front of it, and then they are explicitly stating
      once again that they really want to force a defective system,
      and that they really know the fact that it is defective.
    * Minor improvement: additional warning when network connections
      are interrupted (asymmetrically), such as by mis-configuration
      of network interfaces / routing / firewall rules / etc.

mars0.1stable67
    * Minor fix: don't unnecessarily alert sysadmins when no systemd
      unit files are installed.
    * Minor doc update: new slides from LCA2019, updated old
      slides from FrOSCon2018.
    * Minor doc update: describe some more use cases, add some
      advice for managers.

mars0.1stable66
    * Critical fix, only relevant for kernels 4.3 to 4.4:
      Due to a forgotten adaptation to newer kernels,
      some userspace tools like xfs_repair could read/write
      wrong data upon _large_ IO requests, and/or kernel memory
      corruption could occur. Kernel-level filesystems
      are typically _not_ affected because they typically use 4k
      pages at maximum.
      If you are operating such a kernel, please upgrade to
      minimize any risks. You probably want userspace tools like
      xfs_repair to not crash your kernel ;)
      The problem was reproducibly detected at lab regression testing,
      _before_ updating a big installation from kernel 3.16 to 4.4.
      It did not show up with the old kernel.
      Notice: kernels >4.6 are not yet supported at the moment,
      but work on them is likely to continue during the next
      months. Stay tuned.
    * Minor doc updates.

mars0.1stable65
    * Major fix, only observed during KASAN debugging:
      Use-after-free which appears to splat only at Football
      during final deletion of resources. Never observed at production.
      Update if you are very cautious.
    * A few minor fixes, not relevant for production.
    * Minor doc improvements.

mars0.1stable64
    * Major regression: split-brain detection did not display
      correctly.
    * Minor fix: rare race condition on O_NONBLOCK networking.
      Only observed during testing with kernel 4.9 (sorry, _all_ the
      adaptations are not yet ready for release, but it is making
      progress now).
      I am not sure whether this bug could also trigger with kernel
      4.4 or earlier, therefore I am releasing the fix beforehand.
    * Minor doc: architectural explanations.

mars0.1stable63
    * Minor fix: when compiling for some newer kernels (only there),
      schedule() could be called during wait for some condition,
      worsening performance unnecessarily.
    * Minor improvement: starting join-resource in batches
      was slow because each was waiting for cluster communication.
      Use a manual "marsadm wait-cluster" before starting batches
      of join-resource operations.
    * Doc: some clarifications on BigCluster scalability behaviour.

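The advice on batches of join-resource can be expressed as a command list with a single wait-cluster in front. This is a sketch. The helper name join_batch_commands, the resource names and the device paths are made up, and the form "marsadm join-resource <resource> <device>" is my understanding of the command, so check it against mars-user-manual.pdf. The commands are only built, not executed.

```python
from typing import Dict, List


def join_batch_commands(resources: Dict[str, str]) -> List[List[str]]:
    """Build the command sequence for joining several resources.

    'resources' maps a resource name to the local device to join with
    (both are placeholders supplied by the caller).
    """
    # One manual wait-cluster up front, as recommended above ...
    commands = [["marsadm", "wait-cluster"]]
    # ... followed by the whole batch of join-resource operations.
    for name, device in resources.items():
        commands.append(["marsadm", "join-resource", name, device])
    return commands
```

Each inner list can be passed to subprocess.run() once you have verified the commands for your installation.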
mars0.1stable62
|
|
* Minor fix: race between join-resource and log-rotate.
|
|
* Minor fix: report split brain logfile amount only when
|
|
actually detectable.
|
|
* Minor improvement: shift annoying error message over
|
|
to Orphan state detection.
|
|
* Football: update to Football-2.0-RC12
|
|
* doc: some updates.
|
|
|
|
mars0.1stable61
|
|
* Minor fix: in very rare cases where some symlinks are missing,
|
|
don't abort in try_to_avoid_splitbrain().
|
|
* Minor improvement: better human-readable numbers.
|
|
* Minor doc: more on asynchronous background operations.
|
|
|
|
mars0.1stable60
|
|
* Major improvement: new option --ignore-sync allows primary
|
|
Handover without --force even when some sync is running
|
|
somewhere. Any running syncs will restart from scratch
|
|
(which might take some time, depending on LV size and
|
|
many more factors like the network).
|
|
* Minor fix: split-cluster did not work correctly when no
|
|
resources existed anymore at all.
|
|
* Doc: major update. More explanation on CAP theorem, and
|
|
on differences / commonalities with DRBD.
|
|
|
|
mars0.1stable59
|
|
* Major fix: "marsadm up" did not work when sync could not
|
|
be started. Now does "best effort".
|
|
* Minor fix: marsadm system interface was active when
|
|
not activated.
|
|
* Minor usability improvement: new replication state "Orphaned"
|
|
indicates that logfiles are missing, and thus replication
|
|
is stuck.
|
|
|
|
mars0.1stable58
|
|
* Major fix for Football / split-cluster: for safety,
|
|
cron deletes some blocking left-overs.
|
|
* Major fix at _asymmetric_ split-cluster: ignore hindering
|
|
abort condition.
|
|
* Minor fix: not all internal systemd links were removed upon
|
|
marsadm set-systemd-unit mydata "".
|
|
* Doc: Football.
|
|
* Doc: architectural treatment of centralized storage.
|
|
|
|
mars0.1stable57
|
|
* Minor fix: silly deadlock upon scarce race at logging.
|
|
Without debug logging, probability should be extremely low
|
|
(only observed at rmmod).
|
|
* Added initial version of systemd templates (for future backward
|
|
compatibility with branch 0.1a).
|
|
* Doc: systemd templates.
|
|
|
|
mars0.1stable56
|
|
* Minor fix: split-cluster could unnecessarily abort
|
|
in some cases.
|
|
* Added initial version of submodule "football".
|
|
More updates will follow.
|
|
|
|
mars0.1stable55
|
|
* Major fix: unnecessary / false positive split brain could
|
|
occur after the primary logfile was truncated, e.g. at crashes
|
|
or disk damages. Systematic triggering in masses was possible
|
|
by keeping /dev/mars/mydata mounted while _forcing_
|
|
a reboot _during_ (!) its umount (e.g. by patching the
|
|
"reboot" command and/or patching systemd dependencies
|
|
or similar to provoke this regularly).
|
|
|
|
mars0.1stable54
|
|
* Major fix, only relevant for massive execution of
|
|
leave-resource, e.g. when playing Football (Tetris)
|
|
games:
|
|
When non-versioned symlinks were eventually deleted,
|
|
later re-creation did not always succeed.
|
|
Fixed by a new generic timestamp ordering approach.
|
|
* Stability client-side fixes (could lead to stacktraces),
|
|
backported from branch 0.1a (were forgotten long ago).
|
|
* Major doc update: new section on reliability of
|
|
storage architectures.
|
|
This explains why many BigCluster systems don't work as
|
|
expected.
|
|
Backed up by graphs and by mathematical formulas.
|
|
A must-read for anyone working in the storage area!
|
|
|
|
mars0.1stable53
|
|
* Major fix: rare corner case of split brain was not displayed
|
|
correctly.
|
|
* Major usability: show amount of data during split brain.
|
|
This gives sysadmins a hint about the size of future data loss
|
|
at later split brain resolution.
|
|
* Minor workaround: crashed /mars filesystems may contain
|
|
completely damaged symlinks with timestamps in the far
|
|
distant future, e.g. year >3000 etc. Safeguard unusual
|
|
Lamport time slips by ignoring implausible values.
|
|
* Major improvement: internal locking overhead reduced.
|
|
* Minor improvement: reduce message trigger overhead.
|
|
* Several minor improvements.
|
|
* Doc updates.
|
|
|
|
mars0.1stable52
|
|
* Major contrib: new example scripts for MARS background data
|
|
migration during production. 1&1-specific code in a separate
|
|
plugin. You can write your own plugins for adaptation to
|
|
your needs.
|
|
* Minor fix: limit the size of the writeback buffer by the
|
|
remaining space in /mars. This is only relevant when
|
|
/mars is dimensioned smaller than RAM (which should
|
|
never be the case in production systems, but might happen
|
|
accidentally or for testing).
|
|
Analogously, limit the maximum logfile size.
|
|
* Minor fix: prevent creation of many tiny logfiles over time
|
|
when secondaries are not catching up.
|
|
The default threshold is a minimum of 5 GB size when more
|
|
than 10 logfiles are already present.
|
|
* Minor fix: cleanup old internal .tmp-* symlinks which might
|
|
remain as leftovers when marsadm is dying at the wrong
|
|
moment.
|
|
* Minor improvement: don't run O(n) mapfree under spinlock.
|
|
More speed improvements under preparation; will result in O(k).
|
|
* Some more minor improvements.
|
|
|
|
mars0.1stable51
|
|
* Minor fix: don't abort log-delete-all too early when there
|
|
are holes in the deletion sequence numbers.
|
|
* Backport of marsadm cron from branch 0.1a, in order to systematically
|
|
support mixed operation of different MARS versions in bigger installations
|
|
(avoids confusion among junior sysadmins and monitoring staff).
|
|
* Rectified the semantics of log-delete, which now does the same as
|
|
log-delete-all. Single deletion is only needed for testing, and
|
|
has been renamed to log-delete-one.
|
|
Leaving the old semantics would have been an operational risk
|
|
when junior sysadmins or 24/7 surveillance people are not carefully
|
|
looking at the details of semantics. Now everything is hopefully
|
|
as everybody not familiar with MARS would naively assume.
|
|
* Doc update.
|
|
|
|
mars0.1stable50
|
|
* Major usability improvement (backport from 0.1a):
|
|
marsadm shows number of replicas of each resource, out of total number
|
|
of cluster members. Example: [2/4]
|
|
* Minor fix: automatically cleanup internal backups produced by the new
|
|
merge-cluster / split-cluster after 1 week.
|
|
* Minor fix: also cleanup some new symlink types replicated through
|
|
the network when running asymmetric clusters with mixed branches
|
|
0.1 and 0.1a.
|
|
* Minor annoyance: silence split-cluster error message when no
|
|
resources are present.
|
|
|
|
mars0.1stable49
|
|
* Backports of new marsadm commands merge-cluster and split-cluster.
|
|
The new functionality is needed for background migration of resources.
|
|
Please be aware that this branch has not been constructed for
|
|
scalability in the dimension of #nodes, so don't merge too many
|
|
nodes and use split-cluster after each background migration.
|
|
Better scalability is / will be addressed at the 0.1a and 0.1b
|
|
branches. However, currently they are not yet stable.
|
|
No changes at the kernel module (besides some bug fixes);
|
|
this is solely done at userspace level.
|
|
The new userspace-level commands should have almost no intersection
|
|
with (and therefore no impact onto) other parts of this well-proven
|
|
stable branch.
|
|
* Backports of new wait-cluster implementation.
|
|
This avoids irritating messages after split-cluster.
|
|
|
|
mars0.1stable48
|
|
* Critical fix: DDOS-like attacks at the MARS ports (or similar caused
|
|
by bugs / misbehaviour) are prevented by configurable limits
|
|
/proc/sys/mars/handler_dent_limit and
|
|
/proc/sys/mars/handler_limit .
|
|
* Critical safeguard: when the network is interrupted for a long time
|
|
while the log-rotate frequency is very high and a lot of resources
|
|
(exceeding the official limits as documented) had been used, masses of
|
|
deletion links may accumulate in /mars/todo/. First, already
|
|
existing deletions to the same targets are reused now.
|
|
Second, a maximum limit (of currently 512 entries)
|
|
is enforced, and a warning is emitted when too many deletions
|
|
are accumulated over time.
|
|
* Minor fix: earlier detection of socket hangups.
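To check which values the two handler limits introduced in this release currently have, something like the following sketch may help. The function name show_handler_limits and the SYSCTL_DIR override are hypothetical. Only the two file names under /proc/sys/mars come from the entry above.

```shell
# Print the current values of handler_dent_limit and handler_limit,
# if the files are present and readable.
# SYSCTL_DIR is an assumed override for testing; default is /proc/sys/mars.
show_handler_limits() {
    dir=${SYSCTL_DIR:-/proc/sys/mars}
    for name in handler_dent_limit handler_limit; do
        if [ -r "$dir/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s: not available\n' "$name"
        fi
    done
}
```

On a host without the mars module loaded, both entries are reported as not available.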
|
|
|
|
mars0.1stable47
|
|
* Critical fix: leave-cluster could lead to deadlocks, also
|
|
on remote nodes.
|
|
* Contrib: mass automation script (unmaintained).
|
|
|
|
mars0.1stable46
|
|
* Major fix: bugfix from 0.1stable44 (state "Detached" was
|
|
reported too early) was incorrect, now fixed.
|
|
* Minor fix: display of host lists in special case of
|
|
create-resource was misleading.
|
|
|
|
mars0.1stable45
|
|
* Major fix: on secondaries, orphan files and symlinks were
|
|
sometimes created in /mars and could accumulate over a long time.
|
|
After several months or years of operation, the /mars directory
|
|
could appear to be full via "df /mars", but "du -s /mars" was
|
|
not reporting the hidden space allocation.
|
|
Also, upon remount or reboot the cleanup of orphan files
|
|
could take a rather long time. Workaround was possible by
|
|
"rmmod mars; umount /mars; mount /mars; modprobe mars".
|
|
Fixed by regularly pruning the dentry cache of the /mars
|
|
filesystem.
|
|
|
|
mars0.1stable44
|
|
--------
|
|
* Major fix: state "Detached" was reported too early,
|
|
before the underlying disk was really closed.
|
|
* Doc: new updated slides from FrOSCon 2017.
|
|
New architectural comparison with Big Storage Clusters
|
|
in terms of scalability, reliability and costs.
|
|
|
|
mars0.1stable43
|
|
--------
|
|
* Major fix, only relevant for k >= 3 replicas:
|
|
Logfile fetch did not switch over to another alive peer
|
|
upon _specific_ network problems with the _current_
|
|
peer. As a consequence, an unaffected replica could
|
|
hang. Workaround was possible by pause-fetch /
|
|
resume-fetch or by fixing the network :)
|
|
|
|
mars0.1stable42
|
|
--------
|
|
* Minor fix: ssh IPs and port numbers are automatically probed
|
|
on join-cluster.
|
|
* Minor compatibility to branch mars.1b.y: join-resource
|
|
does additional rsync for safety.
|
|
* Minor fix: rate display was not going down to 0
|
|
on switchoff or long pauses.
|
|
* Minor improvement: show peers in internal debugging info.
|
|
|
|
mars0.1stable41
|
|
--------
|
|
* Minor fix: a scarce race could lead to an unnecessary split brain
|
|
when umounting _after_ role transition from primary to secondary.
|
|
|
|
mars0.1stable40
|
|
--------
|
|
* Potentially critical fix: on very fast machines, and with
|
|
extremely low probability, a race in AIO could lead to a kernel
|
|
page fault.
|
|
For maximum safety, update to this version is recommended.
|
|
|
|
mars0.1stable39
|
|
--------
|
|
* Minor fix: hangs of logfile updates. Found by stress-testing
|
|
on fast hardware over 10GBit network links. Might explain
|
|
some extremely rare (1 per several millions of operation hours)
|
|
production hangs on secondaries. Workaround possible by
|
|
"pause-fetch; resume-fetch".
|
|
* Minor fixes of rare kthread retarding under very high load.
|
|
* Minor improvement: add version number to "marsadm version" which
|
|
can be used for future compatibility checking with respect to
|
|
new features.
|
|
|
|
mars0.1stable38
|
|
--------
|
|
* Compile without pre-patch on some kernel versions!
|
|
Whether the pre-patch is applied will be detected automatically.
|
|
However, there is some (hopefully minor) performance penalty when
|
|
the pre-patch is missing.
|
|
This will be addressed in a future release (but might go
|
|
to branch 0.1b instead, not yet decided).
|
|
Tested with vanilla kernels 3.10.105, 3.14.79, 3.16.43,
|
|
4.1.39, 4.4.67.
|
|
Vanilla kernels 4.8.x and later are _not_ yet working
|
|
(independently from pre-patches). This will be addressed
|
|
in a future release.
|
|
* No functional changes otherwise. Rollback to prior versions
|
|
should be easy. Please report any issues.
|
|
* Updated docs describing build methods.
|
|
|
|
mars0.1stable37
|
|
--------
|
|
* Minor fix: secondary logfile replication could hang in the
|
|
extremely unusual case that the expected primary logfile size
|
|
gets shortened after a crash followed by reboot.
|
|
Workaround was possible via "pause-fetch; resume-fetch".
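The "pause-fetch; resume-fetch" workaround can be wrapped in a small helper, sketched below. The function name kick_fetch and the MARSADM override variable are hypothetical. Only the two marsadm subcommands come from the entry above.

```shell
# Hypothetical helper: restart logfile fetching for one resource by
# pausing and immediately resuming it.
# MARSADM may be overridden (e.g. for dry runs); it defaults to "marsadm".
kick_fetch() {
    cmd=${MARSADM:-marsadm}
    res=$1
    if [ -z "$res" ]; then
        echo "usage: kick_fetch <resource>" >&2
        return 2
    fi
    # Resume only when the pause succeeded.
    $cmd pause-fetch "$res" && $cmd resume-fetch "$res"
}
```

It would be called on the hanging secondary, for example as kick_fetch mydata.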
|
|
|
|
mars0.1stable36
|
|
--------
|
|
* Doc: new slides from GUUG2017, both in English and in German.
|
|
Some very important hints for cost savings. May easily save
|
|
you a few millions when operating some petabytes of data.
|
|
* Doc: new chapter on cost savings in mars-manual.pdf.
|
|
Some parts of German oral explanations from the GUUG conference
|
|
translated to English for my English-speaking audience.
|
|
More to come later (hopefully; I need to get the time).
|
|
|
|
mars0.1stable35
|
|
--------
|
|
* Minor fix: when syncing a big resource (e.g. 40TiB) over a 1GBit
|
|
uplink, the sync may take longer than 1 day. This increases the
|
|
probability for triggering an unintended restart of that sync
|
|
from scratch.
|
|
Among further obscure preconditions, more than 5 logfiles must
|
|
exist such that the wrong assumption of an emergency mode can
|
|
happen at the secondary. In order to trigger the bug more likely,
|
|
it is therefore helpful to misconfigure /etc/cron.d/mars by
|
|
log-rotate'ing every 10 minutes, but doing log-delete-all only
|
|
once an hour (which contradicts my upstream documentation and
|
|
unnecessarily wastes valuable storage space in /mars).
|
|
Fixed by correction of a typo-like error.
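One way to avoid the misconfiguration described above is to let a single cron-invoked wrapper do both steps, so that deletion runs as often as rotation. The following is a minimal sketch under assumptions: the function name mars_housekeeping and the MARSADM override are hypothetical, and the upstream documentation remains the reference for the recommended cron setup.

```shell
# Hypothetical cron wrapper: rotate and delete in the same run, so that
# logfile deletion happens at the same frequency as rotation.
# MARSADM may be overridden (e.g. for dry runs); it defaults to "marsadm".
mars_housekeeping() {
    cmd=${MARSADM:-marsadm}
    $cmd log-rotate all
    $cmd log-delete-all all
}
```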
|
|
|
|
mars0.1stable34
|
|
--------
|
|
* Minor fix: in some rare cases, when lots of gigabytes had to be
|
|
replayed in one big slurp, the replay position wasn't updated
|
|
for a longer time. Some admins were complaining that it
|
|
appeared "stuck" although it worked in reality.
|
|
Improved by increasing the update frequency of the replay link.
|
|
* Minor fix: after network errors, sometimes the sync restarted
|
|
from scratch, unnecessarily.
|
|
* Minor fix: under rare conditions, rmmod could hang forever.
|
|
A known reason has been fixed. Other theoretical reasons
|
|
hopefully improved by some further safeguards.
|
|
|
|
mars0.1stable33
|
|
--------
|
|
* Minor regression from stable29:
|
|
After a primary crash, without switchover, and when the primary
|
|
recovery phase involves a logrotate to an empty new logfile
|
|
which had been created shortly before the crash but
|
|
has not yet been used before the crash (race condition),
|
|
a kernel NULL pointer deref may stop the main thread.
|
|
Workaround: either remove the empty logfile by hand,
|
|
or just do a failover to the other side.
|
|
|
|
mars0.1stable32
|
|
--------
|
|
* Critical regression between stable30 and stable31 (can be avoided
|
|
by simply using stable30 for affected kernels): on _old_ kernels
|
|
(before 4.3.x) the removal of merge_bvec_fn() (see upstream commit
|
|
8ae126660fddbeebb9251a174e6fa45b6ad8f932) can lead to fatal
|
|
crashes at the primary side.
|
|
Fixed by using (hopefully) proper #ifdef's according to the
|
|
kernel version.
|
|
Notice: between stable30 and stable31 no true MARS fixes were
|
|
made (since no bugs were found). This strategy is likely to
|
|
continue for a while, for newer adaptations to even newer kernels.
|
|
In case of problems, go back. And, please, report it to me :)
|
|
|
|
mars0.1stable31
|
|
--------
|
|
* New _minimum_ pre-patches for vanilla LTS kernels 3.2.x to 4.7.x.
|
|
For security reasons, please prefer them over the old _generic_
|
|
pre-patch versions which expose many unnecessary EXPORT_SYMBOL
|
|
to potential attackers.
|
|
* Adaptions to vanilla kernels up to 4.7.x.
|
|
Note: 4.8rc-* does not yet work.
|
|
* Regression testing with many kernel versions: looks fine.
|
|
|
|
mars0.1stable30
|
|
--------
|
|
* Minor fix: in very rare cases of a primary crash, a missing
|
|
versionlink could lead to a hang.
|
|
* Minor fix: improved error reporting of replay code.
|
|
* Minor fix: improved switchback to former primary side.
|
|
* Minor fix: systematically add some missing macros.
|
|
* Minor improvements: add some example systemd unit and other
|
|
contrib stuff like a cronjob example.
|
|
* Doc: minor additions and improvements.
|
|
|
|
mars0.1stable29
|
|
--------
|
|
* Minor fix: on very fast hardware and networks, sync could take
|
|
a while for terminating.
|
|
* Minor fix: external module build.
|
|
* Major usability improvement: new expert commands marsadm
|
|
lowlevel-ls-host-ips, lowlevel-set-host-ip, lowlevel-delete-host.
|
|
Necessary for moves between networks, dedicated replication IPs,
|
|
etc.
|
|
* Minor doc update.
|
|
|
|
mars0.1stable28
|
|
--------
|
|
* Doc: describe new naming conventions.
|
|
MARS Light is now simply called MARS.
|
|
No distinction between "Light" and the future "Full" anymore.
|
|
Please note that the git branches light0.1.y and light0.2.y have
|
|
been renamed to mars0.1.y and mars0.2.y respectively.
|
|
* Minor sourcecode cleanup: s/light//g or s/light/main/g
|
|
where appropriate.
|
|
No other changes in the sourcecode, deliberately.
|
|
In case anyone encounters any build problems compiling MARS,
|
|
this release is separated just for the sake of build testing,
|
|
or Debian packaging testing, etc.
|
|
* Doc: minor clarifications.
|
|
|
|
mars0.1stable27
|
|
light0.1stable27
|
|
--------
|
|
* Critical fix: typo in sync progress comparison code could lead
|
|
to data version mismatches during sync when alternating with
|
|
replay. Only observed at a certain new hardware class, and only
|
|
while testing with an extremely high load (9 loaded resources
|
|
in parallel to 9 concurrent syncs). As a workaround,
|
|
echo 0 > /proc/sys/mars/sync_flip_interval_sec can be used.
|
|
Nevertheless, update is highly recommended!
|
|
* Major fix: slow memory leak (regression from light0.1stable26).
|
|
Only when starting the transaction logger (i.e. primary is typically
|
|
not affected). But don't let it run for a longer time.
|
|
Monitoring is possible via /proc/slabinfo (size-64 or siblings).
|
|
* Minor fix: join-cluster did not check for duplicate IP addresses.
|
|
* Minor fixes: some unnecessary annoying error messages.
|
|
* Docu: new slides from GUUG 2016 in Köln.
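The slab monitoring mentioned for the memory leak in this release could be scripted roughly as follows. This is a sketch under assumptions: the function name show_size_slabs and the SLABINFO override are hypothetical, and kernels using the SLUB allocator name these caches kmalloc-* instead of size-*, so both patterns are matched. Reading /proc/slabinfo usually requires root.

```shell
# Print name, active objects and total objects for the general-purpose
# slab caches (size-N or kmalloc-N) from a slabinfo-formatted file.
# SLABINFO is an assumed override for testing; default is /proc/slabinfo.
show_size_slabs() {
    file=${SLABINFO:-/proc/slabinfo}
    awk '$1 ~ /^(size|kmalloc)-[0-9]+$/ { print $1, $2, $3 }' "$file"
}
```

Running it periodically and comparing the object counts over time would show whether a cache keeps growing.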
|
|
|
|
light0.1stable26
|
|
--------
|
|
* Minor fixes: some primitive macros were reporting misleading or
|
|
even wrong values at split brain, or during/after emergency mode.
|
|
Some high-level macros as well as try_to_avoid_split_brain
|
|
should work better / more reliably now.
|
|
* Minor fix: potential deadlock after crash reboot, or after
|
|
defective /mars filesystem. Never observed in practice.
|
|
* Minor safeguard: unnecessary split brain could emerge at
|
|
secondaries under extremely rare and strange conditions.
|
|
Unsure whether it ever occurred in practice.
|
|
* Minor usability improvement: show incorrect permissions on /mars.
|
|
Some other sysadmin tools like Puppet seem to have their own
|
|
default notion of "secure permissions" ;)
|
|
* Minor doc reorg, better chapter structure.
|
|
|
|
light0.1stable25
|
|
--------
|
|
* Major fix: in rare cases "marsadm primary" (without --force)
|
|
could go into an endless loop, even if --timeout= was specified.
|
|
* Minor fix: in rare cases of hanging or defective IO, crashes
|
|
of the primary could replicate versionlinks to the secondary,
|
|
but after reboot they were missing at the primary because of
|
|
hanging IO or other IO / RAID controller problems.
|
|
Now using sync_filesystem() for either ensuring actuality,
|
|
or for letting the mars_light main control thread hang
|
|
(which will hopefully be noticed soon by monitoring).
|
|
* Minor fix: join-cluster uses rsync, which could abort due to
|
|
vanished filesystem objects while the primary is actively running.
|
|
Now it should tolerate such "errors".
|
|
* Minor fixes / additions at primitive macros.
|
|
* Tiny doc update.
|
|
|
|
light0.1stable24
|
|
--------
|
|
* Skip this release due to a regression.
|
|
|
|
light0.1stable23
|
|
--------
|
|
* Minor fix: the new replay-code error message was not
|
|
reset at secondaries. Now the annoying old error message
|
|
disappears after the next successful logrotate.
|
|
* Minor fixes of internal marsadm code (not in use until now).
|
|
* Minor doc update.
|
|
|
|
light0.1stable22
|
|
--------
|
|
* Critical fix for non-storage servers: the /mars directory
|
|
was readable by ordinary non-root users, opening a potential
|
|
security hole. Originally MARS was designed for standalone
|
|
storage servers solely, but now it is increasingly deployed to
|
|
machines where ordinary users can log in.
|
|
Update recommended, but only urgent for potentially affected
|
|
installations.
|
|
* Minor fix: when a logfile was damaged (observed at defective
|
|
hardware), this was often (but not always) detected by the
|
|
md5 data checksums in the transaction logfiles. So far so good.
|
|
The replay / recovery process stopped for a very good reason.
|
|
But it was not easily possible to _force_ any of the resource
|
|
members into primary role when the defect was already present at
|
|
the _primary_ (which happened once during 7 millions of operating
|
|
hours, and at a primary site which proved defective afterwards),
|
|
and the defect had been replicated to all secondaries.
|
|
As a workaround, the resource could be destroyed via leave-resource
|
|
everywhere, and resurrected from scratch. Clumsy.
|
|
Now an md5 checksum error in the middle of a logfile is
|
|
treated similarly to an EOF. "primary --force" will succeed now,
|
|
without applying the defective data (as before).
|
|
Split brain will result for sure in such a case.
|
|
* Minor improvement: md5 logfile checksum errors are now displayed
|
|
directly in the diskstate macro (and therefore also at plain
|
|
"view").
|
|
* Minor improvement: when "marsadm view all" told you "InConsistent"
|
|
as the disk state, this was _formally correct_ because it related
|
|
to the state of the _disk_, not to the state of the replication.
|
|
The former message could appear regularly during ordinary
|
|
out-of-order writeback at the primary side, without violating
|
|
the consistency of /dev/mars/mydata.
|
|
However, many people were confused and alarmed by the irritating
|
|
message.
|
|
Now a better wording is used: "WriteBack" and "Recovery" describe
|
|
more intuitively what is really happening :)
|
|
* Minor doc improvements.
|
|
|
|
light0.1stable21
|
|
--------
|
|
* Hint: now MARS has been rolled out to more than 1600 servers,
|
|
including some MySQL database servers, and has collected more
|
|
than 6 millions of operation hours.
|
|
* Minor fixes, none of them observed in practice, only found
|
|
by testing while working on new features:
|
|
- potential read page fault
|
|
- potential deadlock
|
|
- incorrect remote symlink update under untypical circumstances
|
|
|
|
light0.1stable20
|
|
--------
|
|
* Hint: MARS is now running on more than 850 storage servers,
|
|
and has collected more than 4.5 millions of operation hours.
|
|
There were no new incidents with customer impact since the last
|
|
major bugfix (more than 3 millions of operation hours since then).
|
|
It is difficult to deduce a reliability from that, but it appears
|
|
that at least 99.999%, if not 99.9999% are now real for the
|
|
MARS component as a standalone component (not to be confused with
|
|
overall system reliability). Our storage hardware is clearly much
|
|
less reliable. MARS does compensate these defects all the time.
|
|
|
|
* Minor fix: memory leak in networking code, does not occur
|
|
at light0.1 operations (but maybe future versions of MARS).
|
|
* Doc: add presentation slides from Froscon2015.
|
|
|
|
light0.1stable19
|
|
--------
|
|
* Minor safeguard: warn when somebody tries leave-resource --host=
|
|
for a damaged host, and later the dead host resurrects in an
|
|
unreasonable way.
|
|
* Doc update: describe use cases for DRBD vs MARS more clearly.
|
|
* Minor spelling fixes.
|
|
|
|
light0.1stable18
|
|
--------
|
|
* Minor safeguard: prevent join-resource when previous log-purge-all
|
|
has been forgotten. Prevent create-resource also when previous
|
|
delete-resource has been forgotten. Anyway, this happens only in
|
|
very exotic repair scenarios after very heavy failures.
|
|
* Doc updates: simplify descriptions of split-brain resolution and
|
|
emergency mode resolution. Nowadays 'invalidate' will do everything
|
|
in all tested cases; the more complex alternative methods have
|
|
been moved to the appendix.
|
|
|
|
light0.1stable17
|
|
--------
|
|
* Minor fix: stacktrace / oops in aio callback path due to a
|
|
subtle race, observed once during 2.5 millions of operation hours.
|
|
In the observed case, the secondary was hanging, without
|
|
customer impact. However, the error class could potentially
|
|
occur also at the primary side. Probably the bug was triggered
|
|
by a hardware problem from the RAID controller.
|
|
|
|
light0.1stable16
|
|
--------
|
|
* Minor fix: sync could take a long time to complete under high
|
|
application load, similarly to a live-lock.
|
|
* Some smaller minor fixes for annoying messages.
|
|
* Contrib: added configurable Nagios check.
|
|
* Contrib: added some example scripts which could be used by
|
|
clustermanagers etc.
|
|
* Doc: important new section on pitfalls when using existing
|
|
clustermanagers UNMODIFIED for long distance replication.
|
|
PLEASE READ!
|
|
|
|
light0.1stable15
|
|
--------
|
|
* NOTICE: MARS passed its baptism of fire on 04/22/2015 when a whole
|
|
co-location had a partial power blackout, followed by breakdown
|
|
of air conditioning, followed by mass hardware defects due to
|
|
overheating. MARS showed exactly 0 errors when (emergency)
|
|
switching to another datacenter was started in masses.
|
|
* Major fix of race in transaction logger: the primary could hang
|
|
when using very fast hardware, typically after ~24000 operation
|
|
hours. The problem was noticed 6 times during a grand total of
|
|
more than 1,000,000 operation hours on a mixed hardware park,
|
|
showing up only on specific hardware classes. Together with 3
|
|
other incidents during early beta phase which also had customer
|
|
impact, this means that we have reached a reliability of about
|
|
===> 99.999%
|
|
After this fix, the reliability should grow even higher.
|
|
A workaround for this bug exists:
|
|
# echo 2 > /proc/sys/mars/logger_completion_semantics
|
|
Update is only mandatory when you cannot use the workaround.
|
|
* Minor improvement in marsadm: re-allow --force combined with "all".
|
|
This is highly appreciated for speeding up operations / handling
|
|
during emergency datacenter switchover.
|
|
* Various smaller improvements.
|
|
* Contrib (unsupported): example rollout script for mass rollout.
|
|
|
|
light0.1stable14
|
|
--------
|
|
* Minor safeguard: modprobe mars will refuse to start when the
|
|
cluster UUID is missing.
|
|
* Minor fix: external race in marsadm resize, only relevant
|
|
for scripting.
|
|
* Minor fix: potential race on plugged IO requests.
|
|
* Clarify output of marsadm view. Many systematical improvements
|
|
and hints.
|
|
* Add some indispensable macros for scripting / automation.
|
|
* Various tiny improvements.
|
|
|
|
light0.1stable13
|
|
--------
|
|
* Critical safeguard for accidental join-cluster with wrong argument:
|
|
make UUID mandatory, disallow completely unrelated hosts to
|
|
communicate symlink tree updates when their UUIDs mismatch.
|
|
* Minor fix: leave-resource --host=other did not work when disks
|
|
were named differently throughout the cluster.
|
|
* Minor fix: detach --host=other --force (which is needed as a
|
|
precondition) did not work.
|
|
* Various minor fixes and clarifications. "marsadm view all"
|
|
now reports the communication status in the cluster.
|
|
|
|
light0.1stable12
|
|
--------
|
|
* Critical (but usually not extremely relevant) fix:
|
|
When emergency mode occurs just during a sync, the target could
|
|
remain inconsistent without notice. Now noticed.
|
|
You always could/should manually invalidate whenever an
|
|
emergency mode appeared.
|
|
Now this is automatically fixed by restarting any sync from
|
|
scratch (if one was actually running before; otherwise consistency
|
|
was never violated).
|
|
* Major documentation update / corrections.
|
|
* Major (but less relevant) fix: leave-cluster did not really work.
|
|
* Minor fix (regression): rmmod could hang when sync was running.
|
|
* Various minor fixes and clarifications.
|
|
|
|
light0.1stable11
|
|
--------
|
|
* Major documentation update. mars-manual.pdf increased from
|
|
66 to 80 pages. Please read! You probably should know this.
|
|
* Minor fixes: better cleanup on invalidate / leave-resource.
|
|
* Minor clarifications: more precise EIO error codes, more verbose
|
|
error reporting via "marsadm cat".
|
|
|
|
light0.1stable10
|
|
--------
|
|
* Major fixes of internal network protocol errors, leading to
|
|
internal shutdown of sockets, which were transparently re-opened.
|
|
It could affect network performance. Not sure whether
|
|
stability was also affected (probably under extremely high load);
|
|
for better safety you should upgrade.
|
|
* Major fix from Manuel Lausch: regex parsing sometimes went
|
|
completely wrong when hostnames followed a similar name scheme
|
|
as internal symlinks.
|
|
* Major, only relevant for k>2 replicas: fix wrong internal sharing
|
|
of data structures resulting from parallel data connections.
|
|
* Minor fix: race in fake-sync.
|
|
* Minor fix: race in invalidate.
|
|
* Minor, only for k>2 replicas: fix direct primary handover when
|
|
some non-involved hosts are currently unreachable.
|
|
* Minor: improve becoming primary during split brain.
|
|
* Minor: improve becoming primary when emergency mode starts.
|
|
* Minor: silence some annoying stderr messages.
|
|
* Several internal minor fixes and clarifications.
|
|
|
|
light0.1stable09
|
|
--------
|
|
* Major fix of scarce race (potentially critical): the bio response
|
|
thread could terminate too early, leading to a premature dealloc
|
|
of kernel memory. This has only been observed on slow virtual
|
|
machines with slow virtual devices, and very high load on k=4
|
|
replicas. This could potentially affect the stability of the system.
|
|
Although not observed at production machines at 1&1, I recommend
|
|
updating production machines to this release ASAP.
|
|
* Major usability fix: incorrect commandline options of marsadm
|
|
were just ignored if they appeared after the resource argument.
|
|
Misspellings could cause undesired effects. For instance,
|
|
"marsadm delete-resource vital --force --MISSPELLhost=banana"
|
|
was accidentally destroying the primary during operation (which
|
|
is _possible_ when using --force, and this was even a _required_
|
|
sort of "STONITH"-like feature -- however from a human point
|
|
of view it was intended to destroy _another_ host, so this was
|
|
an unexpected behaviour from a sysadmin point of view).
|
|
* Major workaround: the concept "actual primary" is wrong, because
|
|
during split brain there may exist several primaries. Do not
|
|
use the macro view-actual-primary any longer. It is deprecated now.
|
|
Use view-is-primary instead, on each host you are interested in.
|
|
* Minor fix: "marsadm invalidate" did not work in some weird
|
|
split brain situations / was not equivalent to
|
|
"marsadm leave-resource $res; marsadm join-resource $res".
|
|
The latter was the old workaround to fix the situation.
|
|
Now it shouldn't be necessary anymore.
|
|
* Minor fix: pause-fetch could take very long to terminate.
|
|
* Minor fix: marsadm wait-cluster did not wait for all hosts
|
|
participating in the resource, but only for one of them.
|
|
This is only relevant for k>2 replicas.
|
|
* Minor fix: the rates displayed by "marsadm view" did not drop down
|
|
to 0 when no progress was made.
|
|
* Minor fix: logging to syslog was incomplete.
|
|
* Minor usability fix: reduce the verbosity of "log-rotate"
|
|
and "log-delete" for cron jobs.
|
|
* Minor fixes: several internal awkwardnesses, potentially affecting
|
|
performance and/or stability in weired situations.
|
|
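
The usability fix above boils down to rejecting every unrecognized
option, wherever it appears on the command line. The following is
only a minimal sketch of that principle in Python. It is not the
marsadm implementation; the option table and the function name
check_options are hypothetical and serve purely as illustration.

```python
# Hypothetical subset of options; the real marsadm option list is larger.
KNOWN_OPTIONS = {"--force", "--dry-run", "--host"}


def check_options(argv):
    """Split argv into positional arguments and options.

    Raises ValueError for any unknown option, regardless of whether it
    appears before or after the resource argument.
    """
    positional = []
    options = {}
    for arg in argv:
        if arg.startswith("--"):
            name, _, value = arg.partition("=")
            if name not in KNOWN_OPTIONS:
                raise ValueError("unknown option: " + name)
            # Options without "=value" are treated as boolean flags.
            options[name] = value if value else True
        else:
            positional.append(arg)
    return positional, options
```

With such a check, the misspelled example
"delete-resource vital --force --MISSPELLhost=banana" fails with an
error instead of being silently ignored.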

light0.1stable08
--------
* Minor fix: after emergency mode, creation of a versionlink was
forgotten. This could lead to unnecessary reports of split
brain and/or need for an additional re-invalidate.
* Minor fix: the predicate 'view-is-consistent' reported 'false'
in some situations on secondaries when all was ok.
* Minor fix: it was impossible to determine 'is-consistent'
from 'marsadm view' (without -1and1 suffix). Added a new [Cc-]
flag. This is absolutely needed to determine whether the
underlying disks must have the same checksum (provided that
both disks are detached and the network works and fetch+replay
had completed before the detach).
* Updated docs to reflect this.
* Minor fix: 'invalidate' did not work when the resource was not
completely detached. Now it implicitly does a detach before
starting invalidation.
* Minor fix: wait-umount was waiting for umount of _all_ primaries
during split brain. Now it waits only for umount of the local node.
Notice that having multiple primaries in parallel is an
erroneous state anyway.
* Minor fix: leave-cluster did not work without --force.

light0.1stable07
--------
* Minor fix: re-creation of a completely destroyed resource
did not always work correctly.

light0.1stable06
--------
* Major fix: becoming primary was hanging in rare situations.
* Minor fix: some split brains were not always detected correctly.
* Minor fix for Redhat openvz kernel builds.
* Several fixes for 1&1 internal Debian builds.

light0.1stable05
--------
* Major fix: incomplete calls to vfs_readdir()
which could lead to incomplete symlink updates /
replication hangs.
* Minor fix: rare race on replay EOF.
* Separated kernel from userspace build environment.
* Removed some Kconfig options which are potentially dangerous
if set to wrong values (robustness against
accidentally producing bad kernel modules).
* Ditto: some additional checks against bad main Kconfig options
(mainly for out-of-tree builds).
* Separated contrib code from maintained code.
* Added some pre-patches for newer kernels
(WIP - not yet fully tested in all combinations).
* Minor doc addition: LinuxTag 2014 presentation.

light0.1stable04
--------
* Silence an annoying error message.
* Minor readability improvements.
* Minor doc updates.

light0.1stable03
--------
* Major: fix internal aio race (could lead to memory corruption).
* Fix refcounting in trans_logger.
* Some minor fixes in module code.
* Fix 1&1-internal out-of-tree builds.
* Various minor fixes.
* Update monitoring tools / docs (German, contributed by Jörg Mann).

light0.1stable02
--------
* Fix sorting of internal data structure.
* Fix IO error propagation at replay.

light0.1stable01
--------
* Fix parallelism of logfile propagation: sometimes a secondary
could get a more recent version than the primary had on stable
storage after its crash, eventually leading to an (annoying)
split brain. Some people might take this as a feature instead
of a bug, but now the logfile transfer starts only after the
primary _knows_ that the data is successfully committed to
stable storage.
* Fix memory leaks in error path.
* Fix error propagation between client and server.
* Make string allocation fully dynamic (remove limitation).
* Fix some annoying messages.
* Fix usage output of marsadm.
* Userspace: contributed bugfix for Debian udev rules by Jörg Mann.
* Improved debugging (only for testing).

light0.1beta0.18 (feature release)
--------
* New commands marsadm view-$macroname
* New customizable macro processor
* New err/warn/inf reporting via symlinks
* Per-resource emergency mode
* Allow limiting the sync parallelism
* New flood-protected syslogging
* Some smaller improvements
* Update docs
* Update test suite

light0.1beta0.17
--------
* Major bugfix: race in logfile switchover could sometimes
lead to the wrong logfile (extremely rare to hit, but
potentially harmful).
* Disallow primary switching when some secondaries are
syncing.
* Fix logfile fetch from multiple peers.
* Fix computation of transitive closure (affected
log-purge-all, split brain detection, and many others).
* Fix incorrect emergency mode detection.
* Primaries no longer fetch logfiles (unnecessarily, only
makes a difference at concurrent split brain operations).
* Detached resources no longer fetch logfiles (unexpectedly).
* Myriads of smaller fixes.

light0.1beta0.16
--------

* Critical bugfix: "marsadm primary --force" was assumed to be given
by sysadmins only in case of emergency, when the network is down.
When given in non-emergency cases where the old primary continues
to run (/dev/mars/* being actively used and written), the
old primary could suddenly do a "logrotate" to the
new split-brain logfile produced by the new (second) primary.
Now two primaries should be able to run concurrently in split-brain
mode without mutually trashing their logfiles.
* primary --force now only works in disconnected mode, in order
to hinder unintended forceful creation of split brain during
normal operation.
* Stop fetching of logfiles behind split brain points (save space
at the target hosts - usually the data will be discarded later).
* Fixed split brain detection in userspace.
* leave-resource now waits for local actions to take place
(remote actions remain asynchronous).
* invalidate / join-resource now work only if a designated primary
exists (otherwise they would not know uniquely from whom
to start initial sync).
* Update docs, clarify scenarios intended <-> emergency switching.
* Fixed mutual overwrite of deletion symlinks in case of racing
log-deletes spawned in parallel by cron jobs (resilience).
* Fixed races between deletion and re-creation (e.g. fresh
join-resource after leave-resource during network partitions).
* Fixed duration of network timeouts in case the network is down
(replaced non-working TCP_KEEPALIVE by explicit timeouts).
* New option --dry-run which does not really create symlinks.
* New command "delete-resource" (VERY DANGEROUS) for
forcefully destroying a resource, even when it is in use.
Intended only for _emergency_ cases when sysadmins are
desperate. Use only by hand, first run with --dry-run in order
to check what will happen!
* New command "log-purge-all" (potentially DANGEROUS) for
resolving split brain in desperate situations (cleanup of
leftovers). Only use by hand, first run with --dry-run!
* Lots of smaller improvements / usability / readability etc.
* Update test suite.
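
The advice to run the dangerous commands with --dry-run first can be
wrapped into a small helper. This is only a sketch under assumptions:
the function name, the confirm callback, and the placement of
--dry-run after the resource argument are hypothetical and not part
of marsadm.

```python
import subprocess


def run_with_dry_run_first(command, resource, confirm, run=subprocess.run):
    """Run "marsadm <command> <resource>" only after a --dry-run pass.

    confirm is a callable returning True if the operator accepts what
    the dry run reported. run is injectable so that the helper can be
    exercised without a MARS installation.
    """
    base = ["marsadm", command, resource]
    # First pass: --dry-run does not really create symlinks.
    run(base + ["--dry-run"], check=True)
    if not confirm():
        return False
    # Second pass: the real, potentially destructive operation.
    run(base, check=True)
    return True
```

For interactive use, confirm could simply ask the operator on the
terminal. The real command is never started when the dry run fails or
the operator declines.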

light0.1beta0.15
--------

* Introduce write throttling of bulk writers.
* Update test suite.

light0.1beta0.14
--------

* Fix logfile transfer in case of "holes" created by
emergency mode.
* Fix "marsadm invalidate" after emergency mode had been entered.
* Fix "marsadm resize" capacity propagation from underlying LVM.
* Update test suite.

light0.1beta0.13
--------

* Fix shutdown during operation (in-flight requests).
* Fix unnecessary Lamport clock propagation storms.
* Reduce unnecessary page cache utilisation (mapfree).
* Update test suite.


light0.1beta0.12 and earlier
--------

There was no dedicated ChangeLog. For details, look at the
commit history.

Release Policy / Software Lifecycle
-----------------------------------

New source releases are simply announced by the appearance of git tags.