Commit Graph

195 Commits

Author SHA1 Message Date
Thomas Schoebel-Theuer dd4748bb52 light: clarify code 2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer 8fa728a0c9 light: fix annoying unnecessary error message 2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer 8abcbf196d light: safeguard sync vs replay 2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer e70ac4df8c light: safeguard position update 2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer fafad9512a light: always update position symlinks at logger switchoff 2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer 42c2dc98da light: fix typo in replay link comparison 2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer a312e3d93b light: fix memory leak
regression from f235b76900
2016-03-01 11:58:09 +01:00
Thomas Schoebel-Theuer 8bc1e80488 light: safeguard skipping of logfiles in disconnected state.
Found by code inspection; not observed in practice or in testing.

Should not occur in practice, because it could only occur after
marsadm pause-fetch, which is an exceptional state only to be entered
for maintenance or for emergency failover.

Skipping over an incorrect logfile at a secondary may produce an
unnecessary split brain.

Fix the potential problem by skipping only after "primary --force",
and by never creating a new logfile, but always re-using existing
logfiles.
2016-02-10 06:44:00 +01:00
Thomas Schoebel-Theuer f235b76900 light: fix potential deadlock on restart after inconsistent symlinks
This has been found by testing.

In extremely rare cases, such as after crashes at the "wrong moment"
or after defective /mars filesystems, the replay link could show a
different length than the corresponding versionlink.

The versionlink would no longer be updated if, in addition, the
logfile had the same length as the replay link.

The incorrect versionlink would then lead to a deadlock.

Fix the problem by using the _minimum_ of all length indicators.
For safety, or when in doubt, replay more data, which will in turn
update the versionlink again to its correct value.
2016-02-10 06:24:27 +01:00
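The "minimum of all length indicators" rule from the commit above can be illustrated with a minimal sketch. The function and parameter names are hypothetical and not taken from the MARS source; it also assumes the indicators are the replay link position, the versionlink position, and the logfile length, all expressed as byte offsets.

```c
#include <stdint.h>

/* Hypothetical helper, not part of MARS: smaller of two offsets. */
static int64_t min_i64(int64_t a, int64_t b)
{
    return a < b ? a : b;
}

/*
 * Pick the position from which replay should (re)start.
 * Taking the minimum of all indicators means that, when they
 * disagree, more data is replayed rather than less.
 */
int64_t safe_replay_start(int64_t replay_link_pos,
                          int64_t version_link_pos,
                          int64_t logfile_len)
{
    return min_i64(replay_link_pos,
                   min_i64(version_link_pos, logfile_len));
}
```

With a stale versionlink at 1024 and both other indicators at 4096, this sketch restarts replay at 1024, so the extra replay can bring the versionlink back to its correct value.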
Thomas Schoebel-Theuer 8e2de8288d light: fix missing versionlink upon slow or defective IO
A primary appeared to have died, and was rebooted.
In the meantime, the old secondary was forcefully switched
to primary.

Afterwards, the old primary = new secondary got stuck because 2
versionlinks, which had been _produced_ by _itself_, were
missing, but they were present at the new primary = old secondary!

How could this happen?

All transaction logfiles were fully present and correct everywhere.

However, the old primary kern.log showed that a problem with the
RAID system must have existed. In addition, the RAID controller
errorlog also reported some problems which appeared to have healed.

Problem analysis shows the following possibility:

The transaction logger can continue to write data, even via
fsync(), while the _writeback_ of other parts of the /mars filesystem
(e.g. symlink updates) got stuck for a long time due to an IO problem.

Usually, slow or even missing symlink updates are no problem because
upon recovery after a reboot, everything is healed by transaction
replay (possibly replaying much more data than really necessary,
but this does not affect semantics, and it is even advantageous
when RAID disks might contain defective data).

There is one exception: after a logrotate, the corresponding new
versionlink should appear after a short time. Otherwise, the
above-mentioned scenario could emerge.

We use sync_filesystem() to ensure that any versionlink update
to a _new_ versionlink is either guaranteed to become persistent,
or (in case of IO problems) the mars_light thread will hang, which
will be (hopefully) noticed soon by monitoring.
2016-02-03 22:01:48 +01:00
Thomas Schoebel-Theuer ea48664a14 light: disallow primary from rotating over damaged logfiles
Only a secondary is allowed to do this, because we assume that
logfile replay has the property of "anytime consistency"
only there.

When a primary cannot recover after a crash due to a defective
logfile, this is not true. The primary is simply lost in such a
(rare) case. Observed 2 times during almost 8 million
operating hours.

In such a case, hardware is truly defective, and you have only
the following options:

1) switch over to a secondary via "primary --force", OR

2) deconstruct the resource everywhere, run fsck or similar on
whatever replica seems to be the best version,
and reconstruct the resource from scratch, OR

3) restore your backup.
2016-01-21 08:09:47 +01:00
Thomas Schoebel-Theuer acdb9d7a42 light: fix reset of replay-code
Reset was forgotten in secondary role. Do it always whenever
a logfile is actually rotated.
2016-01-20 14:48:43 +01:00
Thomas Schoebel-Theuer 496e57e1e1 logger: add new indicator for damaged logfiles 2016-01-15 17:10:58 +01:00
Thomas Schoebel-Theuer d67336420d light: fix becoming primary when logfiles are damaged
When logfile replay aborted with an error, becoming primary was
impossible.
Without this fix, repair would only be possible by complete destruction
of the resource.

A previous version of this patch introduced
/proc/sys/mars/allow_primary_when_damaged which would complicate
the sysadmin interface. People would be unsure what to do.
2016-01-13 14:12:02 +01:00
Thomas Schoebel-Theuer 3eedff125d infra: fix comparison
Under unusual circumstances, when the new symlink content was just a
shortened version (prefix) of the old one, the symlink was not updated.
2016-01-02 10:18:33 +01:00
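The commit above does not show the faulty comparison itself. One common way such a bug arises, given here purely as an assumption, is a comparison bounded by the length of the new value. The sketch below uses hypothetical function names that are not from the MARS source and contrasts that pattern with a full string comparison.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Buggy pattern (assumed for illustration): only the first
 * strlen(new_val) bytes are compared, so a new value that is a
 * prefix of the old one looks "unchanged".
 */
bool needs_update_prefix_only(const char *old_val, const char *new_val)
{
    return strncmp(old_val, new_val, strlen(new_val)) != 0;
}

/* Full comparison: any difference, including length, counts. */
bool needs_update(const char *old_val, const char *new_val)
{
    return strcmp(old_val, new_val) != 0;
}
```

For an old value "log-12,4096" and a new value "log-12,40", the prefix-only variant reports no change, while the full comparison reports that an update is needed.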
Thomas Schoebel-Theuer 54d8433b21 light: fix spelling 2015-10-07 10:46:04 +02:00
Thomas Schoebel-Theuer c39a2988b7 light: fix long-lasting switchoff at end of sync 2015-06-17 11:33:27 +02:00
Thomas Schoebel-Theuer 4ecd6937c7 light: don't try fetching from (none) 2015-06-17 11:33:27 +02:00
Thomas Schoebel-Theuer 876625d66a light: disallow modprobe when UUID is missing 2015-03-23 13:48:11 +01:00
Thomas Schoebel-Theuer 7f565f77b6 light: prohibit communication with wrong UUID 2015-03-06 11:49:54 +01:00
Thomas Schoebel-Theuer 7ced30b24c infra: report peak IO latencies 2015-02-27 11:32:57 +01:00
Thomas Schoebel-Theuer c35065fe97 infra: report global IO hangs 2015-02-27 11:32:57 +01:00
Thomas Schoebel-Theuer c1823bbfab light: report actually running buildtag 2015-02-27 11:32:56 +01:00
Thomas Schoebel-Theuer 736489eccd light: suppress irrelevant warning 2015-02-24 15:51:28 +01:00
Thomas Schoebel-Theuer 036953fa54 light: provisionally allow fetch during detach 2015-02-24 15:51:28 +01:00
Thomas Schoebel-Theuer 0453fbae9b light: fix race on rmmod 2015-02-24 15:51:27 +01:00
Thomas Schoebel-Theuer f10e7358ad light: stop syncing upon logfile holes 2015-02-24 15:51:26 +01:00
Thomas Schoebel-Theuer 827b5b5192 light: fix syncpos indication of inconsistency 2015-02-24 12:08:41 +01:00
Thomas Schoebel-Theuer c03fc47539 light: fix start of sync 2015-02-24 12:08:41 +01:00
Thomas Schoebel-Theuer 0c38493e13 light: add hysteresis to emergency recovery 2015-02-24 12:08:39 +01:00
Thomas Schoebel-Theuer 092201decc light: less side effects by emergency mode 2015-02-24 11:15:29 +01:00
Thomas Schoebel-Theuer 5d81381664 all: disallow sync IO during emergency mode 2015-02-11 15:20:26 +01:00
Thomas Schoebel-Theuer e7464b3c02 all: correct error code EIO
The error code -EIO should always refer to a problem of
lower storage layers. Thus MARS should not generate that
code itself, but use other codes instead.
2015-01-20 15:20:10 +01:00
Thomas Schoebel-Theuer 802cc73b49 infra: additionally safeguard race on brick resource deallocation 2015-01-19 18:01:04 +01:00
Thomas Schoebel-Theuer fa49247b8e infra: fix stale dents 2015-01-19 18:01:04 +01:00
Thomas Schoebel-Theuer ce48d7031c all: fix hang of NotYetPrimary in lower emergency modes 2014-12-07 09:24:16 +01:00
Thomas Schoebel-Theuer 7366cb9dad light: fix leave-cluster communication 2014-12-07 09:24:16 +01:00
Thomas Schoebel-Theuer 28c8575cc0 light: fix becoming primary during split brain
Always prefer the node's own logfile if one exists.
This should improve becoming primary in most split brain situations.
2014-12-07 09:24:16 +01:00
Thomas Schoebel-Theuer aa09d7df30 all: clarify license GPLv2+ 2014-11-25 18:09:17 +01:00
Thomas Schoebel-Theuer 917d5ae2d2 light: fix client shutdown on slow network
On slow networks, the generic net_io_timeout is too long if you are
impatiently waiting for disconnect.

Change the io_timeout of the individual client brick to a short value.
2014-11-12 09:01:35 +01:00
Thomas Schoebel-Theuer 1295c43a7a infra: move io_timeout to generic interface
This is needed for the next commit.
2014-11-12 09:01:34 +01:00
Thomas Schoebel-Theuer 843a931cae light: fix zero progress of rate display 2014-11-12 09:01:33 +01:00
Thomas Schoebel-Theuer 547cc60a72 light: fix long-lasting pause-fetch effect 2014-11-12 09:01:33 +01:00
Thomas Schoebel-Theuer f6cca5ca72 light: fix copy switch off 2014-11-12 09:01:33 +01:00
Thomas Schoebel-Theuer ed57478ace light: fix versionlink in emergency mode 2014-08-25 09:43:06 +02:00
Thomas Schoebel-Theuer 6a176c26c7 light: fix propagation of maxnr 2014-08-14 10:01:21 +02:00
Thomas Schoebel-Theuer 3a6ff3d2c8 infra: quickfix Redhat/openvz builds 2014-07-14 17:27:11 +02:00
Thomas Schoebel-Theuer 4a2ee37b98 light: treat double logfiles directly as split brain 2014-07-11 08:19:10 +02:00
Thomas Schoebel-Theuer 16f5a5dd77 light: fix becoming primary in multiple logrotated situations 2014-07-11 07:55:33 +02:00
Thomas Schoebel-Theuer 1439d30ffb all: port to newer kernels (up to 3.15) 2014-06-18 12:10:55 +02:00