24 Commits

Author SHA1 Message Date
Thomas Schoebel-Theuer
8e2de8288d light: fix missing versionlink upon slow or defective IO
A primary appeared to have died and was rebooted.
In the meantime, the old secondary was forcefully switched
to primary.

Afterwards, the old primary = new secondary got stuck because 2
versionlinks, which had been _produced_ by _itself_, were
missing, but they were present at the new primary = old secondary!

How could this happen?

All transaction logfiles were fully present and correct everywhere.

However, the kern.log of the old primary showed that a problem with
the RAID system must have existed. In addition, the RAID controller
errorlog also reported some problems which appeared to have healed.

Problem analysis shows the following possibility:

The transaction logger can continue to write data, even via
fsync(), while the _writeback_ of other parts of the /mars filesystem
(e.g. symlink updates) is stuck for a long time due to an IO problem.

Usually, slow or even missing symlink updates are no problem because
upon recovery after a reboot, everything is healed by transaction
replay (possibly replaying much more data than really necessary,
but this does not affect semantics, and it is even advantageous
when RAID disks might contain defective data).

There is one exception: after a logrotate, the corresponding new
versionlink should appear within a short time. Otherwise, the
above-mentioned scenario could emerge.

We use sync_filesystem() to ensure that any versionlink update
to a _new_ versionlink is either guaranteed to become persistent,
or (in case of IO problems) the mars_light thread will hang, which
will be (hopefully) noticed soon by monitoring.
2016-02-03 22:01:48 +01:00
Thomas Schoebel-Theuer
aa09d7df30 all: clarify license GPLv2+ 2014-11-25 18:09:17 +01:00
Thomas Schoebel-Theuer
1439d30ffb all: port to newer kernels (up to 3.15) 2014-06-18 12:10:55 +02:00
Thomas Schoebel-Theuer
2f4696a9cc all: fix logfile size propagation 2014-03-31 06:59:09 +02:00
Thomas Schoebel-Theuer
6050b4157f infra: make string allocation fully dynamic 2014-03-26 11:43:05 +01:00
Thomas Schoebel-Theuer
2fc05b5373 light: allow limiting the sync parallelism 2014-03-19 17:49:40 +01:00
Thomas Schoebel-Theuer
9340f70c36 light: add info symlinks 2014-03-19 17:49:39 +01:00
Thomas Schoebel-Theuer
56f38641ff infra: fix/remove buggy d_{name,path}len
In rare cases, this could lead to buffer overflows.
Replace buggy concept from the prototype phase with more
robust (although slightly less performant) code.
2014-03-19 11:30:24 +01:00
Thomas Schoebel-Theuer
6d78a7bc8d light: do deletions only once 2014-03-19 11:30:23 +01:00
Thomas Schoebel-Theuer
3acb6a02fe infra: fix removal of stale directories 2014-03-19 11:30:23 +01:00
Thomas Schoebel-Theuer
5d2a682cfd infra: fix readlink() for very long paths 2014-03-19 11:30:23 +01:00
Thomas Schoebel-Theuer
8309fb97e6 light: add peer abort 2014-02-03 15:06:35 +01:00
Thomas Schoebel-Theuer
3346daf959 light: allow remote deletion of directories 2013-06-29 21:15:18 +02:00
Thomas Schoebel-Theuer
cdd7b85417 infra: systematics of make_brick_all() switching, remove superfluous parameter 2013-06-20 15:08:27 +02:00
Thomas Schoebel-Theuer
bdb6aaef1f light: fix detach operation 2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer
bf7c0c9f3b infra: remove superfluous parameter is_server 2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer
dfe2dc5b1c infra: remove recursive button operations
All buttons should be switched step-by-step in future.
The previous patch should ensure that no harm can occur.
2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer
ce110eb52b infra: disallow forbidden brick states
Switch on only when all predecessor bricks are also on.
Failing to do so can result in fatal errors.
Similarly, switch off only if no successor exists any more.
2013-06-03 09:05:46 +02:00
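
The switching rule from commit ce110eb52b can be written down as two small
predicates. This is a sketch under stated assumptions, not the MARS
implementation: struct brick, its fields, and the function names
may_switch_on() and may_switch_off() are hypothetical, and "no successor
exists" is modelled here as a zero count of attached successors.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified brick: an on/off state, the bricks it depends on
 * (predecessors), and a count of bricks depending on it (successors). */
struct brick {
    bool on;
    struct brick **pred;    /* predecessor bricks, may be NULL if n_pred == 0 */
    size_t n_pred;
    size_t n_succ;          /* number of successors still attached */
};

/* A brick may be switched on only when all of its predecessors are on. */
bool may_switch_on(const struct brick *b)
{
    for (size_t i = 0; i < b->n_pred; i++) {
        if (!b->pred[i]->on)
            return false;
    }
    return true;
}

/* A brick may be switched off only when no successor exists any more. */
bool may_switch_off(const struct brick *b)
{
    return b->n_succ == 0;
}
```

Checking both predicates before every single state change fits the
step-by-step switching described in the neighbouring commits, where no
recursive button operations are performed.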
Thomas Schoebel-Theuer
326ed48da2 light: remove superfluous timeout parameter
The concept was broken.
2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer
e3c10d31a9 net: decrease trigger turnaround time 2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer
f5fae8e4ba light: show runtime connection status information 2013-04-12 08:26:25 +02:00
Thomas Schoebel-Theuer
c275bec28d light: new systematics for emergency modes (filesystem full) 2013-04-08 17:02:57 +02:00
Thomas Schoebel-Theuer
795e931e1f all: make CONFIG_* constants tunable in /proc/sys/mars/ 2013-04-08 17:02:57 +02:00
Thomas Schoebel-Theuer
c58417d271 all: move kernel source into separate directory 2013-04-08 17:01:37 +02:00