Commit Graph

1435 Commits

Author SHA1 Message Date
Thomas Schoebel-Theuer
8e2de8288d light: fix missing versionlink upon slow or defective IO
A primary appeared to have died, and was rebooted.
In the meantime, the old secondary was forcefully switched
to primary.

Afterwards, the old primary = new secondary got stuck because 2
versionlinks, which had been _produced_ by _itself_, were
missing, but they were present at the new primary = old secondary!

How could this happen?

All transaction logfiles were fully present and correct everywhere.

However, the old primary kern.log showed that a problem with the
RAID system must have existed. In addition, the RAID controller
errorlog also reported some problems which appeared to have healed.

Problem analysis shows the following possibility:

The transaction logger can continue to write data, even via
fsync(), while the _writeback_ of other parts of the /mars filesystem
(e.g. symlink updates) got stuck for a long time due to an IO problem.

Usually, slow or even missing symlink updates are no problem because
upon recovery after a reboot, everything is healed by transaction
replay (possibly replaying much more data than really necessary,
but this does not affect semantics, and it is even advantageous
when RAID disks might contain defective data).

There is one exception: after a logrotate, the corresponding new
versionlink should appear after a short time. Otherwise, the
above-mentioned scenario could emerge.

We use sync_filesystem() to ensure that any versionlink update
to a _new_ versionlink is either guaranteed to become persistent,
or (in case of IO problems) the mars_light thread will hang, which
will be (hopefully) noticed soon by monitoring.
2016-02-03 22:01:48 +01:00
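The persistence guarantee described in this commit can be illustrated in user space. The following is a minimal sketch, not the MARS kernel code: the commit uses sync_filesystem() inside the kernel, whereas this sketch swaps in the closest user-space analogue, an fsync() on the directory holding the symlink. The function name publish_symlink_durably and the link name used in the usage note are hypothetical.

```python
import os
import tempfile


def publish_symlink_durably(link_path: str, target: str) -> None:
    """Atomically (re)create a symlink and wait until the update is on disk.

    Hypothetical user-space analogue of the idea in the commit message:
    either the new link becomes persistent, or the caller blocks (or gets
    an OSError) when the underlying IO is stuck or failing.
    """
    directory = os.path.dirname(os.path.abspath(link_path))
    tmp_path = os.path.join(
        directory, f".{os.path.basename(link_path)}.tmp.{os.getpid()}"
    )

    # Remove a stale temporary link left over from an earlier crash.
    if os.path.lexists(tmp_path):
        os.unlink(tmp_path)

    # Create the new link under a temporary name, then rename it into place.
    # rename() acts on the symlink itself and replaces the old link atomically.
    os.symlink(target, tmp_path)
    os.replace(tmp_path, link_path)

    # Flush the directory entry. This call does not return before the
    # directory update has reached stable storage.
    dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Usage, with a made-up link name: `publish_symlink_durably("/tmp/demo/version-000000002-hostA", "some-value")`. If the directory flush hangs, the caller hangs with it, which is the behaviour the commit message expects monitoring to notice.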
Thomas Schoebel-Theuer
0e6bb47cb6 marsadm: fix edge cases of try_to_avoid_splitbrain()
Originally a trivial silly bug (boolean value was wrong), leading to an
endless loop when a local versionlink was missing, which can happen
only after a primary crash at the wrong moment shortly after a logrotate
(not even during ordinary operations), followed by a hard reboot.

As documented in mars-manual.pdf, you simply need "modprobe mars"
to recover after such a crash reboot. MARS remembers the primary state
persistently for you and restores everything _automatically_.

Using "marsadm primary" in such a case to switch the current primary
to primary again (after an unnecessary "marsadm secondary" which is
strongly discouraged by mars-manual.pdf), although the host is / was
already in primary state after the reboot, is at least as silly as
the mentioned bug. Doing this in an /etc/init.d/ startup script,
where it really doesn't belong, is even more silly.

The latter is even an OPERATIONAL RISK, because "marsadm secondary"
works _globally_ in the whole cluster (as documented in mars-manual.pdf).
Such an improper startup script _can_ (potentially) disturb another
cluster member which had become primary in the _meantime_ during reboot.
Global cluster operations don't belong in startup scripts, because
reboots may happen unintentionally at any time.
2016-02-03 22:00:47 +01:00
Thomas Schoebel-Theuer
cd01d1ae02 all: release light0.1stable23 2016-01-21 08:11:24 +01:00
Thomas Schoebel-Theuer
d9fd3de2a2 doc: update version 2016-01-21 08:10:26 +01:00
Thomas Schoebel-Theuer
e207443833 marsadm: fix binary operators =~ and "match" 2016-01-21 08:09:48 +01:00
Thomas Schoebel-Theuer
ea48664a14 light: disallow primary from rotating over damaged logfiles
Only a secondary is allowed to do this, because we assume that
logfile replay has the property of "anytime consistency"
only there.

When a primary cannot recover after a crash due to a defective
logfile, this is not true. The primary is simply lost in such a
(rare) case. Observed 2 times during almost 8 million
operating hours.

In such a case, hardware is truly defective, and you have only
the following options:

1) switchover to a secondary via "primary --force", OR

2) deconstruct the resource everywhere, run fsck or similar on
whatever replica seems to be the best version,
and reconstruct the resource from scratch, OR

3) restore your backup.
2016-01-21 08:09:47 +01:00
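The rule stated in this commit can be written down as a small decision function. This is only a sketch under stated assumptions: the names Role and may_rotate_logfile are hypothetical and do not come from the MARS sources, and the real check lives in kernel code that is not shown here.

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


def may_rotate_logfile(role: Role, logfile_damaged: bool) -> bool:
    """Return True if a logrotate may proceed past the current logfile.

    An intact logfile may always be rotated. A damaged one may only be
    skipped by a secondary; a primary must not rotate over it.
    """
    if not logfile_damaged:
        return True
    return role is Role.SECONDARY
```

A primary for which this returns False stays where it is, and the operator has to choose one of the three options listed in the commit message.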
Thomas Schoebel-Theuer
acdb9d7a42 light: fix reset of replay-code
Reset was forgotten in secondary role. Do it always whenever
a logfile is actually rotated.
2016-01-20 14:48:43 +01:00
Thomas Schoebel-Theuer
40e06b8577 all: release light0.1stable22 2016-01-15 18:23:46 +01:00
Thomas Schoebel-Theuer
d5bc9d592c doc: update version and PDF 2016-01-15 17:59:32 +01:00
Thomas Schoebel-Theuer
03523a61fc doc: clarify future way of symlink updates 2016-01-15 17:58:31 +01:00
Thomas Schoebel-Theuer
bda94f439f doc: remove accidental insertion 2016-01-15 17:58:30 +01:00
Thomas Schoebel-Theuer
b412ebac20 doc: explain blackbox principle of /mars 2016-01-15 17:58:30 +01:00
Thomas Schoebel-Theuer
feb0b34604 marsadm: fix irritating "Inconsistent" display at primary side
At an actual primary, "Inconsistent" would be the correct description
for the state of the _disk_.

However most sysadmins will confuse this with the state of the
_replication_ (which is of course never inconsistent during
writeback from the memory buffer).

Although documented correctly, misunderstandings continue
to survive, because humans are automatically abstracting away
from detail components such as a "disk", and are automatically
assuming that "marsadm view" would relate to the replication
as a whole.

Avoid misunderstandings by more detailed message distinctions
aiming to address all of these in parallel.
2016-01-15 17:58:30 +01:00
Thomas Schoebel-Theuer
cd122db700 marsadm: display logfile replay errors in diskstate 2016-01-15 17:58:27 +01:00
Thomas Schoebel-Theuer
cc1074fc53 marsadm: add primitive macro errno-text 2016-01-15 17:29:47 +01:00
Thomas Schoebel-Theuer
6c41326f7a marsadm: add basic macro replay-code 2016-01-15 17:23:14 +01:00
Thomas Schoebel-Theuer
496e57e1e1 logger: add new indicator for damaged logfiles 2016-01-15 17:10:58 +01:00
Thomas Schoebel-Theuer
cc1d786654 marsadm: disallow ordinary switching when logfiles are damaged
Only primary --force should be possible in such a (rare) case.
2016-01-15 17:10:48 +01:00
Thomas Schoebel-Theuer
d67336420d light: fix becoming primary when logfiles are damaged
Without this fix, becoming primary would be impossible when logfile
replay aborts with an error, and repair would only be possible by
complete destruction of the resource.

A previous version of this patch introduced
/proc/sys/mars/allow_primary_when_damaged which would complicate
the sysadmin interface. People would be unsure what to do.
2016-01-13 14:12:02 +01:00
Thomas Schoebel-Theuer
69386b33d9 marsadm: fix /mars security issues
Only relevant for non-storage servers to which customers have access.

Notice that /mars is a _reserved_ filesystem for MARS-internal purposes.
It has nothing to do with an ordinary filesystem.

Users generally have to be kept out.
2016-01-13 14:12:00 +01:00
Thomas Schoebel-Theuer
5ddc0b8991 all: release light0.1stable21 2016-01-02 10:50:43 +01:00
Thomas Schoebel-Theuer
3eedff125d infra: fix comparison
Under weird circumstances, when the new symlink content was just a
shortened version (prefix) of the old one, the symlink was not updated.
2016-01-02 10:18:33 +01:00
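The class of bug described in this commit can be shown with a short sketch. The commit does not show the original comparison, so both functions below are a hypothetical reconstruction of how a length-limited comparison misses a shortened value. The names differs_buggy and differs_fixed and the sample values are made up for illustration.

```python
def differs_buggy(old: bytes, new: bytes) -> bool:
    """Compare only the first len(new) bytes of the old value.

    If new is a prefix of old, the two look equal and the update is
    wrongly skipped.
    """
    return old[: len(new)] != new


def differs_fixed(old: bytes, new: bytes) -> bool:
    """Compare the lengths as well, so a shortened value counts as a change."""
    return len(old) != len(new) or old != new


old_value = b"log-000000012-hostA"
new_value = b"log-000000012"  # shortened version (prefix) of old_value

needs_update_buggy = differs_buggy(old_value, new_value)
needs_update_fixed = differs_fixed(old_value, new_value)
```

With these sample values the buggy variant reports no difference, so the symlink would be left untouched, while the fixed variant reports a change.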
Thomas Schoebel-Theuer
d18c60f232 infra: fix potential fault
Very old idiotic bug.
Under some circumstances, a byte beyond the end of a non-null-terminated
string (such as produced by the VFS) might be read, potentially leading
to a page fault just one byte after a page border.
2016-01-02 10:18:33 +01:00
Thomas Schoebel-Theuer
25d954051b logger: move ranking array from stack to brick instance
Don't allocate this on the stack, it might grow too big in future.
Reduces the risk of stack overflows (not observed until now, but
suspected).
2016-01-02 10:18:22 +01:00
Thomas Schoebel-Theuer
045d0e0356 logger: fix potential deadlock caused by incorrect accounting
Never observed in practice, found by testing with kernel upstream
versions.
2016-01-02 09:43:22 +01:00
Thomas Schoebel-Theuer
2b478a13e2 all: release light0.1stable20 2015-10-20 10:21:22 +02:00
Thomas Schoebel-Theuer
6ec43f63d0 doc: add Froscon2015 slides 2015-10-20 09:41:40 +02:00
Thomas Schoebel-Theuer
c1ee80f9f4 server: fix memory leak on writes
This was unnoticed for a long time because it simply did not occur
in ordinary MARS Light workloads.
2015-10-19 07:24:20 +02:00
Thomas Schoebel-Theuer
ccb5021e0f all: release light0.1stable19 2015-10-08 07:53:23 +02:00
Thomas Schoebel-Theuer
b726dc44ac doc: update version 2015-10-08 07:53:23 +02:00
Thomas Schoebel-Theuer
8f92f50799 doc: add quick table DRBD vs MARS 2015-10-08 07:52:37 +02:00
Thomas Schoebel-Theuer
54d8433b21 light: fix spelling 2015-10-07 10:46:04 +02:00
Thomas Schoebel-Theuer
4d8dc3a619 logger: fix spelling 2015-10-07 10:45:51 +02:00
Thomas Schoebel-Theuer
af6ac736c5 if: fix wrong error code ENOSYS 2015-10-07 10:44:44 +02:00
Thomas Schoebel-Theuer
66d200dbf1 infra: fix wrong error code ENOSYS 2015-10-07 10:44:35 +02:00
Thomas Schoebel-Theuer
96bbb42771 contrib: speedup mars_check.sh 2015-10-07 10:42:18 +02:00
Thomas Schoebel-Theuer
9d8dbe9181 contrib: speedup mars_check 2015-10-07 10:42:18 +02:00
Thomas Schoebel-Theuer
3a543d5ca5 marsadm: improve weird --host=other deletion 2015-10-07 10:42:07 +02:00
Thomas Schoebel-Theuer
224ad9f95f all: release light0.1stable18 2015-08-05 10:41:16 +02:00
Thomas Schoebel-Theuer
e534d9ed7e doc: update version 2015-08-05 10:18:33 +02:00
Thomas Schoebel-Theuer
f82d19c4ae doc: describe logrotate intervals 2015-08-05 10:17:47 +02:00
Thomas Schoebel-Theuer
1c424148dc doc: simplify emergency mode 2015-08-05 07:51:14 +02:00
Thomas Schoebel-Theuer
4eb7df274c doc: simplify split-brain resolution
marsadm invalidate is long-proven and the simplest method.
Move the complicated alternative methods to the appendix.
2015-08-04 15:02:25 +02:00
Thomas Schoebel-Theuer
60b6b56604 doc: simplify primary switching
Some caveats are no longer necessary: becoming primary --force
during split brain has worked for a long time, and has been tested
numerous times already.
2015-08-04 14:18:45 +02:00
Thomas Schoebel-Theuer
8e786d129f marsadm: remove distracting warning
This is no longer needed.
2015-08-04 14:18:45 +02:00
Thomas Schoebel-Theuer
58294defe5 marsadm: safeguard {create,join}-resource against old remains 2015-08-04 10:21:32 +02:00
Thomas Schoebel-Theuer
3e92223e47 marsadm: fix annoying warning in corner case 2015-07-22 12:19:41 +02:00
Thomas Schoebel-Theuer
cbd7cac4ad all: release light0.1stable17 2015-07-15 11:07:12 +02:00
Thomas Schoebel-Theuer
c6235c71d5 aio: fix race on shutdown 2015-07-15 10:38:49 +02:00
Thomas Schoebel-Theuer
550d02935e sio: fix race on shutdown 2015-07-15 10:38:49 +02:00