This is needed to adapt the out-of-tree MARS version to multiple
kernel versions.
It will be greatly simplified after upstream merging, and/or
removed or replaced by something better.
After a primary --force, the error would never go away when a
logfile was defective. Months later, sysadmins were still needlessly
alarmed when looking at the primary.
In some rare cases (e.g. damaged /mars or crashed primaries),
the versionlink belonging to a logfile may be missing.
Don't insist on the existence of a versionlink if the logfile
stems from myself (automatic self-repair).
This is important for further hardening of MARS.
Simulate crashes at the "wrong moment", typically with
IO requests flying, or just before a symlink update.
Only for debugging. Never use for production.
Found by code inspection, not in practice or by testing.
Should not occur in practice, because it could only occur after
marsadm pause-fetch, which is an exceptional state only to be entered
for maintenance or for emergency failover.
Skipping over an incorrect logfile at a secondary may produce an
unnecessary split brain.
Fix the potential problem by skipping only after "primary --force",
and by never creating a new logfile, but always re-using existing
logfiles.
This has been found by testing.
In extremely rare cases, such as after crashes at the "wrong moment"
or after defective /mars filesystems, the replay link could show a
different length than the corresponding versionlink.
If the logfile additionally had the same length as the replay link,
the versionlink would never be updated again.
The incorrect versionlink would then lead to a lock.
Fix the problem by using the _minimum_ of all length indicators.
For safety, or when in doubt, replay more data, which will in turn
update the versionlink again to its correct value.
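
The minimum rule can be illustrated by a small userspace sketch. It is
written in Python rather than in the kernel C of MARS, and the function
name and its arguments are hypothetical; it only shows why the minimum
is the safe choice when the indicators disagree.

```python
def safe_replay_position(*length_indicators):
    """Return the position from which replay should continue.

    Hypothetical helper, not MARS code. Each argument is one length
    indicator in bytes (for example replay link, versionlink, logfile).
    Taking the minimum means that on disagreement more data is
    replayed rather than less.
    """
    if not length_indicators:
        raise ValueError("at least one length indicator is required")
    return min(length_indicators)
```

When all indicators agree, the result is simply that common value; when
one of them lags behind, replay restarts from the lagging position.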
A primary appeared to have died, and was rebooted.
In the meantime, the old secondary was forcefully switched
to primary.
Afterwards, the old primary = new secondary got stuck because two
versionlinks, which had been _produced_ by _itself_, were
missing, but they were present at the new primary = old secondary!
How could this happen?
All transaction logfiles were fully present and correct everywhere.
However, the kern.log of the old primary showed that a problem with
the RAID system must have existed. In addition, the RAID controller
error log also reported some problems which appeared to have healed.
Problem analysis shows the following possibility:
The transaction logger can continue to write data, even via
fsync(), while the _writeback_ of other parts of the /mars filesystem
(e.g. symlink updates) gets stuck for a long time due to an IO problem.
Usually, slow or even missing symlink updates are no problem because
upon recovery after a reboot, everything is healed by transaction
replay (possibly replaying much more data than really necessary,
but this does not affect semantics, and it is even advantageous
when RAID disks might contain defective data).
There is one exception: after a logrotate, the corresponding new
versionlink should appear within a short time. Otherwise, the
above-mentioned scenario could occur.
We now use sync_filesystem() to ensure that any update creating
a _new_ versionlink is either guaranteed to become persistent,
or (in case of IO problems) the mars_light thread will hang, which
will hopefully be noticed soon by monitoring.
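
The "persistent or hang" behaviour can be approximated in userspace.
The following Python sketch is only an analogy, not the kernel code:
instead of sync_filesystem() it calls fsync() on the parent directory,
which on common Linux filesystems typically flushes the new directory
entry. The function name and the paths used with it are hypothetical.

```python
import os


def create_symlink_durably(target, link_path):
    """Create a symlink and wait until its directory entry is flushed.

    Hypothetical userspace analogy. os.fsync() on the parent directory
    blocks until the flush has finished, so a stuck device makes the
    caller hang visibly (or raise OSError) instead of silently
    deferring the update.
    """
    os.symlink(target, link_path)
    parent = os.path.dirname(os.path.abspath(link_path))
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Flushing only the parent directory is narrower than syncing the whole
filesystem; it is used here merely because it is available from the
Python standard library.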