Commit Graph

836 Commits

Author SHA1 Message Date
Thomas Schoebel-Theuer 35b9345d94 server: fix socket shutdown in error path 2013-10-17 07:48:32 +02:00
Thomas Schoebel-Theuer 99644a943a all: make *_switch() code idempotent
New semantics: it must be possible to call the switch functions
even when nothing has changed.
2013-10-17 07:48:32 +02:00
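The idempotency requirement stated in 99644a943a can be illustrated with a minimal sketch. The `Brick` class, its `switch()` method and the `transitions` counter are hypothetical names chosen for illustration; they are not the MARS kernel interface.

```python
class Brick:
    """Hypothetical stand-in for a brick with a power switch."""

    def __init__(self):
        self.power_on = False
        self.transitions = 0  # counts real state changes only

    def switch(self, want_on):
        """Bring the brick into the requested state.

        Calling this again with the same argument is a no-op,
        so callers may invoke it even when nothing has changed.
        """
        if self.power_on == want_on:
            return self.power_on  # already in the requested state
        self.power_on = want_on
        self.transitions += 1
        return self.power_on
```

Under this sketch, calling `switch(True)` twice in a row performs exactly one state change.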
Thomas Schoebel-Theuer 7a2755a56f light: prevent races on device size 2013-10-17 07:48:32 +02:00
Thomas Schoebel-Theuer ffc97c5c68 if: fix set_capacity() 2013-10-17 07:48:31 +02:00
Thomas Schoebel-Theuer be24c712e0 bio: fix usage of i_size_read() 2013-10-17 07:35:35 +02:00
Thomas Schoebel-Theuer 8971edad18 if: set capacity upon regular switch() maintenance 2013-10-17 07:35:34 +02:00
Thomas Schoebel-Theuer 7f8bf6c29a brick_mem: add /proc/sys/mars/mem_allow_freelist 2013-10-17 07:30:10 +02:00
Thomas Schoebel-Theuer 4abb584aad doc: move pictures to images/ 2013-10-04 10:53:42 +02:00
Thomas Schoebel-Theuer 3b1705af99 doc: new chapter about use cases MARS vs DRBD 2013-09-17 13:36:28 +02:00
Frank Liepold 6b41af4cd9 test_suite: new and updated test cases 2013-09-17 13:36:27 +02:00
Frank Liepold 08e5803cd1 light: workaround flying IO before reporting memory leaks
We report an error if there are unfreed mrefs after the device brick
has been switched to power off.

Instead of reporting an error at once, we report only warnings during the first 20
seconds. If there are still unfreed mrefs after that time, an error is reported.
2013-09-17 13:36:27 +02:00
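The grace period described in 08e5803cd1 might be sketched as follows. The function name, its parameters and the returned level strings are assumptions made for illustration; only the 20-second window comes from the commit message, and treating exactly 20 seconds as already expired is also an assumption.

```python
GRACE_SECONDS = 20  # window taken from the commit message


def leak_report_level(unfreed_mrefs, seconds_since_power_off):
    """Classify leftover mrefs after the device brick was switched off.

    Hypothetical helper: returns "ok", "warning" or "error".
    """
    if unfreed_mrefs <= 0:
        return "ok"  # nothing left over, nothing to report
    if seconds_since_power_off < GRACE_SECONDS:
        return "warning"  # IO may still be in flight, do not escalate yet
    return "error"  # still unfreed after the grace period
```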
Thomas Schoebel-Theuer 74e12ad531 infra: add mapfree_grace_keep_mb 2013-09-17 13:36:27 +02:00
Thomas Schoebel-Theuer 0755380a52 light: show CONFIG_DEBUG* in modinfo 2013-09-17 12:16:36 +02:00
Thomas Schoebel-Theuer 797132cfb8 sio: adapt to newer kernels (kmap_atomic) 2013-09-17 12:16:36 +02:00
Thomas Schoebel-Theuer 453fcb59d8 if: fix early kill of if_brick 2013-09-17 12:16:36 +02:00
Thomas Schoebel-Theuer 9134c1b771 light: add transferstatus symlink 2013-09-17 12:16:36 +02:00
Frank Liepold ebe0ca6ad9 light: reduce cascades on lamport clock workaround
Signed-off-by: Thomas Schoebel-Theuer <tst@1und1.de>

Some filesystems like ext3 have only full second resolution.

Therefore, we _must_ advance the Lamport clock in whole seconds
when working on such gear, since we want to prevent lost
updates which would be caused by standstill Lamport clocks.

Sometimes, the Lamport clock gets updated more than once per second.
In such cases, the Lamport clock will run much faster
than real time. After some weeks of operation, the Lamport clock
will be far in the future.

In general, we cannot do anything against that. When some fine-grained
information cannot be coded into some specific data type, it
cannot be coded.

However, when updates start to occur less frequently, we want to
_leave_ the workaround mode ASAP. The old code set tv_nsec to 0
which made it very likely that the workaround was triggered
again unnecessarily.

In order to _reduce_ that effect, we prevent unnecessary cascades
of whole-second leaps by setting the nanoseconds constantly to 1
if the full second was increased due to insufficient capabilities
of the underlying filesystem. At least in those cases where
Lamport timestamps are transferred over the network and/or we have
mixed configurations between ext3/ext4, we hope to
decrease the risk of endless cascades.

Experience shows that the new code behaves better.
2013-08-28 14:54:04 +02:00
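One possible reading of the change in ebe0ca6ad9 is sketched below, with timestamps modelled as `(seconds, nanoseconds)` tuples. Both function names and the `old_behaviour` flag are hypothetical, and the idea that the clock is later compared against a value truncated by the filesystem is an assumption, not something the commit message states.

```python
def leap_whole_second(sec, nsec, old_behaviour=False):
    """Advance the clock to the next full second.

    The incoming nanoseconds are discarded in both variants.
    Old code: nanoseconds become 0. New code: nanoseconds become 1.
    """
    return (sec + 1, 0 if old_behaviour else 1)


def truncate_to_seconds(stamp):
    """Model a filesystem with full-second resolution (e.g. ext3)."""
    sec, _nsec = stamp
    return (sec, 0)


new_stamp = leap_whole_second(100, 123456789)
old_stamp = leap_whole_second(100, 123456789, old_behaviour=True)

# With nanoseconds set to 1, the clock stays strictly ahead of the
# truncated value; with 0, both are equal and look like a standstill.
new_is_ahead = new_stamp > truncate_to_seconds(new_stamp)
old_is_ahead = old_stamp > truncate_to_seconds(old_stamp)
```

In this model `new_is_ahead` is true and `old_is_ahead` is false, which matches the commit's remark that the old code was likely to trigger the workaround again.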
Frank Liepold 96be062f63 tests: update 2013-08-06 14:40:16 +02:00
Thomas Schoebel-Theuer 4b59be870e copy: speedup by making overlap the default
Since commit 62e2f5944b, aio prevents races on the length
of a transaction logfile.

Therefore, we can safely enable IO parallelism for writes fired off
by copy.

The old behaviour was a serious IO bottleneck.
2013-08-06 14:30:05 +02:00
Thomas Schoebel-Theuer 3f3a4c365a copy: fix / improve IO debugging 2013-08-06 13:09:55 +02:00
Thomas Schoebel-Theuer 94e1ac2ad0 all: remove internal URLs 2013-08-02 11:45:21 +02:00
Frank Liepold 820708e712 marsadm: command down includes disconnect
Fixes the following bug:
marsadm down does not disconnect the resource.
2013-07-29 16:30:10 +02:00
Frank Liepold e3db28c4d7 marsadm: correct wrong condition for checking exclusive access
The fixed bug was introduced in commit:

marsadm: 'create-resource' --force no longer checks for exclusive access
2013-07-25 09:12:46 +02:00
Frank Liepold 2e441d0d11 marsadm: use lamport clock as mtime of symbolic links
Fixes the following bug:
Symbolic links which are created in userspace get the current time
as mtime.
2013-07-25 08:42:27 +02:00
Thomas Schoebel-Theuer c877c43eff copy: limit IO parallelism via /proc 2013-07-22 09:15:53 +02:00
Thomas Schoebel-Theuer 0d8d637dee marsadm: 'create-resource' --force no longer checks for exclusive access 2013-07-22 08:44:03 +02:00
Thomas Schoebel-Theuer 72a2537c6d copy: make io_prio configurable via /proc 2013-07-22 08:44:03 +02:00
Thomas Schoebel-Theuer 08b27c548e all: add lamport clock to all messages 2013-07-22 08:44:03 +02:00
Thomas Schoebel-Theuer 105bc07b58 infra: lamport clock can never appear as stopped
In case CONFIG_HIGH_RES_TIMERS is not set (or when it does not work
as expected), the lamport clock could "stop" in some extremely
rare cases. Theoretically, a symlink update could be missed, or
two transaction log records could accidentally get the same
timestamp. We want any timestamps to be unique (at least on
the same host).

This patch ensures that true forward stepping always takes place,
even when the system clock fails (or when other problems occur).

For now, the dependency on CONFIG_HIGH_RES_TIMERS is left in Kconfig
as a precondition for MARS.

After some tests and some observational time, it could probably be removed
some day.
2013-07-22 08:44:03 +02:00
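The forward-stepping guarantee described in 105bc07b58 can be sketched as a clock that never returns the same timestamp twice, even if the system clock stands still or jumps backwards. The `LamportClock` class and its `now()` method are hypothetical; stepping by one nanosecond is an assumption, and the real kernel code may choose a different increment.

```python
NSEC_PER_SEC = 1_000_000_000


class LamportClock:
    """Hypothetical clock handing out strictly increasing timestamps."""

    def __init__(self):
        self.last = (0, 0)  # (seconds, nanoseconds)

    def now(self, system_time):
        """Return a timestamp strictly greater than any returned before."""
        if system_time > self.last:
            # The system clock moved forward: follow it.
            self.last = system_time
        else:
            # The system clock stood still or went backwards: step anyway.
            sec, nsec = self.last
            nsec += 1
            if nsec >= NSEC_PER_SEC:
                sec, nsec = sec + 1, 0
            self.last = (sec, nsec)
        return self.last
```

Because every call either follows the system clock or steps past the last value, two log records created on the same host cannot receive the same timestamp in this model.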
Frank Liepold 08d4f863ff marsadm: command secondary must not change primary link if executed on a secondary
Fixes the bug that marsadm secondary sets the link <resource_dir>/primary
to (none) even if executed on a secondary host.
2013-07-22 08:44:03 +02:00
Frank Liepold c474e17d88 light: improve info message 2013-07-22 08:44:03 +02:00
Thomas Schoebel-Theuer ddf28af52d marsadm: fix 'invalidate' racing against replay 2013-07-22 08:44:03 +02:00
Thomas Schoebel-Theuer 22d4516d21 trans_logger: use kb as replay limiter units 2013-07-15 12:21:16 +02:00
Thomas Schoebel-Theuer 61e5d30757 copy: use kb as limiter units 2013-07-15 12:21:16 +02:00
Thomas Schoebel-Theuer dae1218c50 infra: fix limiter underflow
Credits to Daniel Hermann for revealing this bug.
2013-07-15 12:21:16 +02:00
Daniel Hermann 5eae58a3a4 gen_config: allow options to be overridden by environment
From caa2cd968be140ed3e91eeb0d30e0c006b55e17a Mon Sep 17 00:00:00 2001
From: Daniel Hermann <daniel.hermann@1und1.de>
Date: Thu, 11 Jul 2013 11:45:16 +0200
Subject: [PATCH] gen_config: allow options to be overridden by environment
2013-07-15 09:20:53 +02:00
Thomas Schoebel-Theuer d70a415b9a aio: fix dirty_head completion 2013-07-10 09:53:59 +02:00
Thomas Schoebel-Theuer ea42d36a15 infra: check Kconfig prerequisites 2013-07-10 09:08:39 +02:00
Thomas Schoebel-Theuer 5103e7b46c aio: increase robustness of file descriptors 2013-07-10 07:16:19 +02:00
Thomas Schoebel-Theuer c9bb358239 aio: improve debugging 2013-07-10 07:16:19 +02:00
Thomas Schoebel-Theuer 764d5ed7d8 infra: fix reference counter in lib_mapfree 2013-07-09 18:50:18 +02:00
Thomas Schoebel-Theuer acd9b194aa light: add syncpos symlink
This is needed for detecting the real end of inconsistencies
after sync has finished. Consistency is only (re-)reached after
a certain amount of logfile data has been successfully applied.

This patch remembers the replaylink from the primary at the time
when the sync has finished.

When at least that amount of logfile data has been applied, we
are certain that now we are consistent.
2013-07-08 10:55:58 +02:00
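The consistency check described in acd9b194aa might look roughly like the sketch below. Positions are modelled as `(logfile_sequence_number, offset)` tuples, which is an assumption about what the replay link encodes; the function and parameter names are hypothetical as well.

```python
def is_consistent(sync_finished, syncpos, replay_pos):
    """Decide whether a secondary can be regarded as consistent.

    syncpos:    primary's replay position remembered when sync finished
    replay_pos: how far the secondary has applied logfile data so far
    """
    if not sync_finished or syncpos is None:
        return False  # sync still running, or nothing was remembered
    # Consistent once at least the remembered amount has been applied.
    return replay_pos >= syncpos
```

For example, with `syncpos = (7, 4096)` a secondary at `(7, 1024)` is still treated as inconsistent, while one at `(7, 4096)` or `(8, 0)` is not.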
Thomas Schoebel-Theuer ad08afe074 light: use replay_tolerance only after failed replay attempt 2013-07-08 10:47:33 +02:00
Frank Liepold f38c56d5ab light: primary is more tolerant against truncated logfile 2013-07-08 10:19:38 +02:00
Thomas Schoebel-Theuer 2351f54d6f light: fix regression caused by tolerance
On the secondaries, switchover between logfiles could hang
when _check_logging_status() used the tolerance, but
is_switchover_possible() refused to switch over.
2013-07-05 14:35:11 +02:00
Thomas Schoebel-Theuer 156d493192 light: improve tolerance flexibility 2013-07-05 14:08:12 +02:00
Thomas Schoebel-Theuer 9a160c26ab Kconfig: make MARS_SEPARATE_PORTS the default 2013-07-05 14:06:35 +02:00
Thomas Schoebel-Theuer f75d402d0b doc: add mars-manual.{lyx,pdf} (work in progress) 2013-07-04 10:23:08 +02:00
Frank Liepold d01389e171 tests: add test suite (work in progress) 2013-07-04 10:22:39 +02:00
Thomas Schoebel-Theuer 637ec3fc4e tools: new checking tool write-reboot.c
Use this for testing the power blackout safety.
2013-07-04 09:06:36 +02:00