75 Commits

Author SHA1 Message Date
Thomas Schoebel-Theuer
65bdee3b08 infra: show cumulatives in all limiters 2013-11-19 12:22:45 +01:00
Frank Liepold
871e3994db light: fix throttling calculation of request sizes
Signed-off-by: Thomas Schoebel-Theuer <schoebel@bell.site>
2013-11-19 11:44:15 +01:00
Thomas Schoebel-Theuer
9134be1a3e all: allow throttling of bulk write requests 2013-10-31 08:24:56 +01:00
Frank Liepold
c832799910 light: allow logfiles not to be consecutive on secondary site
If there are holes in the logfile sequence and these holes concern only logfiles
which are already applied (i.e. logfiles lying before all replay links),
the secondary can continue working.
Warnings are written as long as the situation exists.

Signed-off-by: Thomas Schoebel-Theuer <tst@1und1.de>
2013-10-22 09:39:22 +02:00
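
The condition described in c832799910 can be illustrated with a small sketch. It is not taken from the MARS sources: the function name, the representation of logfiles as ascending sequence numbers, and the reading of "before all replay links" as "lower than the lowest replay position" are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the MARS implementation.
 * seqs:       ascending sequence numbers of the logfiles that are present
 * n:          number of entries in seqs
 * min_replay: lowest logfile sequence number referenced by any replay link
 *
 * Returns true when every missing logfile lies before min_replay,
 * i.e. when all holes concern already applied logfiles only.
 */
static bool holes_are_harmless(const unsigned *seqs, size_t n,
                               unsigned min_replay)
{
    assert(seqs != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        bool hole = seqs[i] != seqs[i - 1] + 1;

        /* The missing files are seqs[i-1]+1 .. seqs[i]-1. */
        if (hole && seqs[i] - 1 >= min_replay)
            return false;
    }
    return true;
}
```

With logfiles 1, 2, 5, 6, 7 present, the hole (3 and 4) is harmless when the lowest replay link points at logfile 6, but not when it points at logfile 4.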
Frank Liepold
675e46d689 light: report next logfile to be copyable in case of logfile sequence holes
Up to now holes in the logfile sequence caused the copy process to stop after
having fetched the last logfile before the hole.

E.g. in emergency mode such holes are created intentionally on the primary
side. After the situation has been cleaned up, the secondary must be able to
fetch newly created logfiles.

Signed-off-by: Thomas Schoebel-Theuer <tst@1und1.de>
2013-10-22 09:38:17 +02:00
Thomas Schoebel-Theuer
915f955333 light: fix copy_next_is_available propagation 2013-10-17 14:49:41 +02:00
Thomas Schoebel-Theuer
7a2755a56f light: prevent races on device size 2013-10-17 07:48:32 +02:00
Thomas Schoebel-Theuer
8971edad18 if: set capacity upon regular switch() maintenance 2013-10-17 07:35:34 +02:00
Thomas Schoebel-Theuer
7f8bf6c29a brick_mem: add /proc/sys/mars/mem_allow_freelist 2013-10-17 07:30:10 +02:00
Frank Liepold
08e5803cd1 light: workaround flying IO before reporting memory leaks
We report an error if there are unfreed mrefs after the device brick
has been switched to power off.

Instead of reporting an error at once, we report only warnings in the first 20
seconds. If there are still unfreed mrefs after that time an error is reported.
2013-09-17 13:36:27 +02:00
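
A minimal sketch of the grace period described in 08e5803cd1, under stated assumptions: the enum, the function name and the parameters are hypothetical, and whether the boundary at exactly 20 seconds counts as warning or error is a guess, since the commit message does not say.

```c
#include <assert.h>

/* Hypothetical names, chosen for illustration only. */
enum leak_report { LEAK_NONE, LEAK_WARN, LEAK_ERROR };

#define LEAK_GRACE_SECONDS 20   /* "the first 20 seconds" from the message */

/*
 * unfreed_mrefs:           number of mrefs still allocated
 * seconds_since_power_off: time since the device brick was switched off
 */
static enum leak_report classify_leak(int unfreed_mrefs,
                                      long seconds_since_power_off)
{
    assert(seconds_since_power_off >= 0);

    if (unfreed_mrefs <= 0)
        return LEAK_NONE;       /* nothing left in flight */
    if (seconds_since_power_off < LEAK_GRACE_SECONDS)
        return LEAK_WARN;       /* IO may still be flying */
    return LEAK_ERROR;          /* grace period is over: real leak */
}
```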
Thomas Schoebel-Theuer
74e12ad531 infra: add mapfree_grace_keep_mb 2013-09-17 13:36:27 +02:00
Thomas Schoebel-Theuer
0755380a52 light: show CONFIG_DEBUG* in modinfo 2013-09-17 12:16:36 +02:00
Thomas Schoebel-Theuer
9134c1b771 light: add transferstatus symlink 2013-09-17 12:16:36 +02:00
Frank Liepold
ebe0ca6ad9 light: reduce cascades on lamport clock workaround
Signed-off-by: Thomas Schoebel-Theuer <tst@1und1.de>

Some filesystems like ext3 have only full second resolution.

Therefore, we _must_ advance the Lamport clock in whole seconds
when working on such gear, since we want to prevent lost
updates which would be caused by standstill Lamport clocks.

Sometimes, the Lamport clock gets updated more frequently per second
than real time. In such cases, the Lamport clock will run much faster
than real time. After some weeks of operation, the Lamport clock
will be far in the future.

In general, we cannot do anything against that. When some fine-grained
information cannot be coded into some specific data type, it
cannot be coded.

However, when updates start to occur less frequently, we want to
_leave_ the workaround mode ASAP. The old code set tv_nsec to 0
which made it very likely that the workaround was triggered
again unnecessarily.

In order to _reduce_ that effect, we prevent unnecessary cascades
of whole-second leaps by setting the nanoseconds constantly to 1
if the full second was increased due to insufficient capabilities
of the underlying filesystem. At least in those cases where
Lamport timestamps are transferred over the network and/or we have
mixed configurations between ext3/ext4, we hope to
decrease the risk of endless cascades.

Experience shows that the new code behaves better.
2013-08-28 14:54:04 +02:00
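
The whole-second leap described in ebe0ca6ad9 might look roughly like the sketch below. It uses the userspace struct timespec and a hypothetical function name. It only illustrates the rule stated in the message (advance by a full second, then set the nanoseconds to 1 rather than 0) and does not claim to mirror the actual MARS code.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper, not the MARS implementation.
 * last: the previous stamp written to a filesystem that stores
 *       whole seconds only (e.g. ext3)
 * now:  the current Lamport time
 *
 * Returns a stamp whose seconds are strictly greater than last.tv_sec,
 * so that the update remains visible after truncation to seconds.
 */
static struct timespec next_stamp_coarse(struct timespec last,
                                         struct timespec now)
{
    struct timespec next = now;

    assert(now.tv_nsec >= 0 && now.tv_nsec < 1000000000L);

    if (next.tv_sec <= last.tv_sec) {
        /* Workaround mode: leap to the next full second. */
        next.tv_sec = last.tv_sec + 1;
        next.tv_nsec = 1;       /* the old code used 0 here */
    }
    return next;
}
```

When the clock has already moved on to a later second, the stamp is passed through unchanged and no leap takes place.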
Thomas Schoebel-Theuer
c877c43eff copy: limit IO parallelism via /proc 2013-07-22 09:15:53 +02:00
Thomas Schoebel-Theuer
72a2537c6d copy: make io_prio configurable via /proc 2013-07-22 08:44:03 +02:00
Frank Liepold
c474e17d88 light: improve info message 2013-07-22 08:44:03 +02:00
Thomas Schoebel-Theuer
ddf28af52d marsadm: fix 'invalidate' racing against replay 2013-07-22 08:44:03 +02:00
Thomas Schoebel-Theuer
acd9b194aa light: add syncpos symlink
This is needed for detection of the real end of inconsistencies
after sync has finished. Consistency is only (re-)reached after
a certain amount of logfile data has been successfully applied.

This patch remembers the replaylink from the primary at the time
when the sync has finished.

When at least that amount of logfile data has been applied, we
are certain that now we are consistent.
2013-07-08 10:55:58 +02:00
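
The check implied by acd9b194aa can be sketched as a position comparison. The struct and function names are hypothetical, and modelling a replay link as a pair of logfile sequence number and byte offset is an assumption made for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of a replay position: logfile number plus offset. */
struct log_pos {
    unsigned seq;
    unsigned long long offset;
};

/*
 * applied: how far the secondary has replayed
 * syncpos: the primary's replay link, remembered when the sync finished
 *
 * Returns true once the secondary has applied at least as much
 * logfile data as the remembered position.
 */
static bool is_consistent(struct log_pos applied, struct log_pos syncpos)
{
    if (applied.seq != syncpos.seq)
        return applied.seq > syncpos.seq;
    return applied.offset >= syncpos.offset;
}
```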
Thomas Schoebel-Theuer
ad08afe074 light: use replay_tolerance only after failed replay attempt 2013-07-08 10:47:33 +02:00
Frank Liepold
f38c56d5ab light: primary is more tolerant against truncated logfile 2013-07-08 10:19:38 +02:00
Thomas Schoebel-Theuer
2351f54d6f light: fix regression caused by tolerance
On the secondaries, switchover between logfiles could hang
when _check_logging_status() used the tolerance, but
is_switchover_possible() refused to switch over.
2013-07-05 14:35:11 +02:00
Thomas Schoebel-Theuer
156d493192 light: improve tolerance flexibility 2013-07-05 14:08:12 +02:00
Thomas Schoebel-Theuer
0f4cc33d15 logger: make replay_timeout configurable 2013-07-04 07:21:01 +02:00
Thomas Schoebel-Theuer
19f1a95a47 proc: allow some debugging even in production systems (default off) 2013-07-04 07:21:01 +02:00
Thomas Schoebel-Theuer
e33ddf63db light: fix wrong condition in logfile update 2013-07-04 07:21:00 +02:00
Thomas Schoebel-Theuer
f3613177a2 aio: prefer fdatasync() over filemap_write_and_wait_range() 2013-07-04 07:21:00 +02:00
Thomas Schoebel-Theuer
be20dd422d light: tolerate a few incomplete log entries at the end when switching to primary 2013-07-04 07:21:00 +02:00
Thomas Schoebel-Theuer
bfb6070d25 logger: add replay_tolerance 2013-07-04 07:21:00 +02:00
Thomas Schoebel-Theuer
0ee23aa3ef light: prevent remote symlink updates when delete is in progress 2013-07-04 07:21:00 +02:00
Thomas Schoebel-Theuer
1d955d9bed light: fix wrong target size of sync 2013-07-04 07:21:00 +02:00
Thomas Schoebel-Theuer
3346daf959 light: allow remote deletion of directories 2013-06-29 21:15:18 +02:00
Thomas Schoebel-Theuer
c917bc239b all: update pre-patches 2013-06-29 21:15:17 +02:00
Thomas Schoebel-Theuer
58e6ae23ad light: workaround nasty race on kthread_stop() 2013-06-29 21:15:17 +02:00
Thomas Schoebel-Theuer
2fee24fe49 light: report is_primary at the end of the round
The old code led to an unnecessary delay of 1 round.
2013-06-20 15:08:28 +02:00
Thomas Schoebel-Theuer
faa1c8d802 light: fix attach on locked device 2013-06-20 15:08:28 +02:00
Thomas Schoebel-Theuer
a0190b043d proc: add /proc/sys/mars/info 2013-06-20 15:08:27 +02:00
Thomas Schoebel-Theuer
7187462a6e marsadm: add global uuid to cluster 2013-06-20 15:08:27 +02:00
Thomas Schoebel-Theuer
cdd7b85417 infra: systematics of make_brick_all() switching, remove superfluous parameter 2013-06-20 15:08:27 +02:00
Thomas Schoebel-Theuer
62dd9c64dd light: fix logfile replication when trans_logger is not running 2013-06-03 09:05:47 +02:00
Thomas Schoebel-Theuer
bdb6aaef1f light: fix detach operation 2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer
59d706ba54 infra: replace brick_version by kill_round 2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer
bf7c0c9f3b infra: remove superfluous parameter is_server 2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer
dfe2dc5b1c infra: remove recursive button operations
All buttons should be switched step-by-step in future.
The previous patch should ensure that no harm can occur.
2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer
726bbe17fc infra: don't switch off if predecessors are working 2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer
ce110eb52b infra: disallow forbidden brick states
Switch on only when all predecessor bricks are also on.
Failing to do so can result in fatal errors.
Similarly, switch off only if no successor exists any more.
2013-06-03 09:05:46 +02:00
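
The two rules from ce110eb52b can be written down as predicates. This is a simplified sketch: the struct layout and the function names are hypothetical and far smaller than a real brick structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, strongly reduced model of a brick. */
struct brick {
    bool on;                     /* current power state */
    int nr_pred;                 /* number of predecessor bricks */
    const struct brick **pred;   /* predecessors this brick depends on */
    int nr_succ;                 /* number of successors still connected */
};

/* Switch on only when all predecessor bricks are also on. */
static bool may_switch_on(const struct brick *b)
{
    assert(b != NULL);

    for (int i = 0; i < b->nr_pred; i++) {
        if (!b->pred[i]->on)
            return false;
    }
    return true;
}

/* Switch off only if no successor exists any more. */
static bool may_switch_off(const struct brick *b)
{
    assert(b != NULL);
    return b->nr_succ == 0;
}
```

A caller would evaluate these predicates before each single switching step, which fits the step-by-step button handling mentioned in dfe2dc5b1c.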
Thomas Schoebel-Theuer
326ed48da2 light: remove superfluous timeout parameter
The concept was broken.
2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer
e3c10d31a9 net: decrease trigger turnaround time 2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer
0e15b38457 marsadm: add new commands 'wait-{cluster,resource}' 2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer
2dd3033ff4 marsadm: split command "primary" into phases 2013-06-03 09:05:46 +02:00