Thomas Schoebel-Theuer
41cf70e288
if: show open_count in statistics
2017-07-05 14:15:41 +02:00
Thomas Schoebel-Theuer
5b12f5c569
client: show number of active channels in statistics
2017-07-05 14:15:41 +02:00
Thomas Schoebel-Theuer
347bb102e7
infra: safeguard dent deallocation
2017-07-05 08:01:48 +02:00
Thomas Schoebel-Theuer
86d70bd6a5
main: more detailed messages on peers and paths
2017-07-05 08:01:48 +02:00
Thomas Schoebel-Theuer
aa5481a87d
main: fetch only metadata of interesting resources
2017-07-05 08:01:48 +02:00
Thomas Schoebel-Theuer
627a402617
main: compute list of participating resources
2017-07-05 08:01:48 +02:00
Thomas Schoebel-Theuer
6c4f72ceab
infra: allow pruning of subdirs
2017-07-05 08:01:48 +02:00
Thomas Schoebel-Theuer
e7a53ec4e3
server: propagate path from client
2017-07-05 08:01:48 +02:00
Thomas Schoebel-Theuer
c4fb7c2e41
main: verbose debugging
2017-07-05 08:01:48 +02:00
Thomas Schoebel-Theuer
f784c6555e
main: remote_trigger only communicating peers
2017-07-05 08:01:48 +02:00
Thomas Schoebel-Theuer
d382bd7037
main: terminate and restart peer thread when necessary
2017-07-05 08:01:47 +02:00
Thomas Schoebel-Theuer
14737303b7
main: show more peer debuginfo
2017-07-05 08:01:47 +02:00
Thomas Schoebel-Theuer
a41c0f8f98
main: run some additional peer threads
2017-07-05 08:01:47 +02:00
Thomas Schoebel-Theuer
c8ec870886
main: only scan the peers we are participating in
...
After this, nothing will be propagated to non-participating hosts.
The next patch is needed for fixing this.
2017-07-05 08:01:47 +02:00
Thomas Schoebel-Theuer
475b33d7ee
main: also scan other hostname contexts
2017-07-05 08:01:47 +02:00
Thomas Schoebel-Theuer
0adab134ac
Merge branch 'mars0.1.y' into mars0.1b.y
2017-07-05 07:48:08 +02:00
Thomas Schoebel-Theuer
c117bffa11
logger: reset limiter
2017-07-05 07:37:12 +02:00
Thomas Schoebel-Theuer
a856db082b
server: update limiter during idle time
2017-07-05 07:37:12 +02:00
Thomas Schoebel-Theuer
69d2f864d3
client: reset limiter
2017-07-05 07:37:12 +02:00
Thomas Schoebel-Theuer
25da408d66
copy: reset limiter
2017-07-05 07:37:12 +02:00
Thomas Schoebel-Theuer
ff2c948247
infra: add reset of limiter
2017-07-05 07:37:12 +02:00
Thomas Schoebel-Theuer
27eb38ff3e
infra: add total statistics to limiter
2017-07-05 07:37:12 +02:00
Thomas Schoebel-Theuer
a983bf42de
main: show peer debuginfo
2017-07-05 07:37:12 +02:00
Thomas Schoebel-Theuer
d976fde7fb
main: replace peer_lock spinlock by rwsem
2017-07-05 07:37:12 +02:00
Thomas Schoebel-Theuer
8b6fe3e3bb
infra: remove superfluous event trigger
2017-06-07 06:22:46 +02:00
Thomas Schoebel-Theuer
37fb40f8a7
logger: remove separate flying counter
2017-06-07 06:22:46 +02:00
Thomas Schoebel-Theuer
cbb7de25fe
logger: fix races on queues
2017-06-07 06:22:46 +02:00
Thomas Schoebel-Theuer
c95c478f30
logger: new activity counter
2017-06-07 06:22:46 +02:00
Thomas Schoebel-Theuer
0783946bc2
logger: downgrade atomic_t
2017-06-07 06:22:46 +02:00
Thomas Schoebel-Theuer
d8e2421de9
logger: remove useless counter
2017-06-07 06:22:46 +02:00
Thomas Schoebel-Theuer
746f84cd32
Merge branch 'mars0.1.y' into mars0.1b.y
2017-06-06 18:12:31 +02:00
Thomas Schoebel-Theuer
57c9da1800
aio: fix race on array index
2017-06-04 17:56:46 +02:00
Thomas Schoebel-Theuer
0da44a808f
copy: allow non-strict write order
2017-05-28 19:20:26 +02:00
Thomas Schoebel-Theuer
b32b2d57fe
net: use quadratic backoff sleeptime
2017-05-28 19:20:26 +02:00
Thomas Schoebel-Theuer
45a771b652
infra: speedup md5 checksums
2017-05-28 19:20:25 +02:00
Thomas Schoebel-Theuer
ee1cf12efa
infra: add non-strict version of Lamport clock
2017-05-28 19:20:25 +02:00
Thomas Schoebel-Theuer
08c973f181
main: fix forgotten notify
2017-05-28 19:20:25 +02:00
Thomas Schoebel-Theuer
e83bf34926
copy: fix hang
2017-05-28 19:20:24 +02:00
Thomas Schoebel-Theuer
be35a0af37
Merge branch 'mars0.1.y' into mars0.1b.y
2017-05-28 19:19:37 +02:00
Thomas Schoebel-Theuer
0fafba3cd0
infra: better debugging
2017-05-22 11:25:00 +02:00
Thomas Schoebel-Theuer
d12b20ef1c
main: fix hang of fetch
2017-05-22 11:25:00 +02:00
Thomas Schoebel-Theuer
fd72fef4c9
infra: fix signal handling
2017-05-22 11:25:00 +02:00
Thomas Schoebel-Theuer
088b103abf
infra: avoid frequent resched
2017-05-17 12:22:29 +02:00
Thomas Schoebel-Theuer
17832cd7ea
infra: generally disable irqs during spinlocks
2017-05-16 10:24:18 +02:00
Thomas Schoebel-Theuer
2d7f602a32
aio: disable irqs during spinlocks
2017-05-16 10:23:06 +02:00
Thomas Schoebel-Theuer
95d10d02a2
main: disable irqs during spinlocks
2017-05-16 10:21:31 +02:00
Thomas Schoebel-Theuer
386ae8e8d0
infra: disable irqs during spinlocks
2017-05-16 10:17:19 +02:00
Thomas Schoebel-Theuer
95ff42b7de
logger: disable irqs during spinlocks
2017-05-16 10:17:02 +02:00
Thomas Schoebel-Theuer
bb2f82503c
infra: memory debugging must disable irqs during spinlocks
2017-05-16 09:59:44 +02:00
Thomas Schoebel-Theuer
37f738bb5c
aio: workaround standard Unix filehandles
2017-05-14 16:57:01 +02:00
Thomas Schoebel-Theuer
84450d9d70
Merge branch 'mars0.1.y' into mars0.1b.y
2017-05-11 08:51:12 +02:00
Thomas Schoebel-Theuer
f129ae00e9
infra: modinfo shows io driver type
2017-05-09 08:52:48 +02:00
Thomas Schoebel-Theuer
8abf1a0928
infra: modinfo shows whether prepatch is used
2017-05-09 08:52:48 +02:00
Thomas Schoebel-Theuer
a1d4497a51
infra: remove unwanted sys_utimes()
2017-05-04 10:32:50 +02:00
Thomas Schoebel-Theuer
09c6b3112c
infra: replace unwanted sys_unlink() by provisionary wrapper
2017-05-04 10:28:43 +02:00
Thomas Schoebel-Theuer
b3b13d9187
infra: replace unwanted sys_rename() by provisionary wrapper
2017-05-04 10:08:29 +02:00
Thomas Schoebel-Theuer
c4b055584c
infra: replace sys_mkdir() by vfs_mkdir()
2017-05-04 10:08:29 +02:00
Thomas Schoebel-Theuer
8fe84d32d8
infra: replace sys_symlink() by vfs_symlink()
2017-05-04 10:08:29 +02:00
Thomas Schoebel-Theuer
05a5b49aed
infra: remove unwanted reference to min_free_kbyte
2017-05-04 10:08:07 +02:00
Thomas Schoebel-Theuer
b9383da97c
infra: remove unwanted rmdir()
2017-05-04 10:04:12 +02:00
Thomas Schoebel-Theuer
ac2c901943
infra: remove unwanted chmod()
2017-05-04 10:04:02 +02:00
Thomas Schoebel-Theuer
f654129e94
compat: disable aio when necessary
2017-05-04 09:16:17 +02:00
Thomas Schoebel-Theuer
0c714a8bfc
infra: start dual compatibility with/out prepatch
...
Automatic detection whether the prepatch is applied or not.
2017-05-04 09:10:44 +02:00
Thomas Schoebel-Theuer
eaa6fc0efc
infra: introduce wrapper layer for compatibility with multiple kernels
...
This is needed for adaptation of the out-of-tree MARS version to multiple
kernel versions.
It will be much simplified after upstream merging, and/or
removed/replaced by something better.
2017-05-04 09:09:19 +02:00
Thomas Schoebel-Theuer
79c7ffe9d4
infra: only allow compilation as a module
2017-05-04 06:14:02 +02:00
Thomas Schoebel-Theuer
7259f3aa5c
copy: fix deadlock on termination
2017-04-21 06:42:36 +02:00
Thomas Schoebel-Theuer
d9d31d831e
net: don't update Lamport clock too often
2017-04-15 18:10:45 +02:00
Thomas Schoebel-Theuer
104b3a522a
infra: new Lamport clock implementation
2017-04-15 18:10:45 +02:00
Thomas Schoebel-Theuer
4f071e362f
infra: new interface to Lamport clock
2017-04-15 18:10:44 +02:00
Thomas Schoebel-Theuer
bf2358f4dc
client: flush old buffers when channel is changed
2017-04-15 18:10:44 +02:00
Thomas Schoebel-Theuer
8045e6b632
net: do corking at mars_send_cb()
2017-04-15 18:10:44 +02:00
Thomas Schoebel-Theuer
9ed1d12ed9
net: do corking at mars_send_mref()
2017-04-15 18:10:44 +02:00
Thomas Schoebel-Theuer
5b2cad9f6e
net: use corking at mars_send_struct()
2017-04-15 18:10:44 +02:00
Thomas Schoebel-Theuer
7437e95776
client: use 2 sockets by default
2017-04-15 18:09:41 +02:00
Thomas Schoebel-Theuer
fdda26821c
client: improve bundling performance
2017-04-15 18:09:40 +02:00
Thomas Schoebel-Theuer
c42bbfec5d
client: fix socket bundling deadlock
2017-04-15 18:09:40 +02:00
Thomas Schoebel-Theuer
4d6317af21
client: fix deadlock on remissive server
...
Abort via timeout wasn't always executed when the server first
accepted the connect, but later closed it due to rejected
CMD_CONNECT or other reasons.
2017-04-15 18:09:40 +02:00
Thomas Schoebel-Theuer
ae7d89fdaf
client: better errors and warnings
2017-04-11 09:30:34 +02:00
Thomas Schoebel-Theuer
4793b2c0d2
client: tune socket bundling
2017-04-11 09:30:34 +02:00
Thomas Schoebel-Theuer
f4795b6c74
client: implement socket bundling
2017-04-11 09:30:34 +02:00
Thomas Schoebel-Theuer
d607e422d4
net: find out current tcp send buffer space available
2017-04-11 09:30:34 +02:00
Thomas Schoebel-Theuer
7f86c52f7c
net: use SHUT_RDWR
2017-04-11 09:30:34 +02:00
Thomas Schoebel-Theuer
ed70d7ae2c
copy: quiet potential warning flood
2017-04-11 09:30:34 +02:00
Thomas Schoebel-Theuer
a5247b7304
copy: do hinting per input
2017-04-11 09:27:58 +02:00
Thomas Schoebel-Theuer
4b8226158d
copy: earlier start tail requests
2017-04-11 09:27:58 +02:00
Thomas Schoebel-Theuer
67f82c7cb2
copy: increase table size
2017-04-11 09:27:58 +02:00
Thomas Schoebel-Theuer
0a9fcf5f8a
copy: speed up the speedup
2017-04-11 09:27:51 +02:00
Thomas Schoebel-Theuer
7e2de9c4ac
copy: speed up by hinting
2017-04-11 09:27:51 +02:00
Thomas Schoebel-Theuer
fa91db51ef
copy: avoid double work
2017-04-11 09:27:50 +02:00
Thomas Schoebel-Theuer
eadd8e3e61
copy: remember dirty area
2017-04-11 09:27:50 +02:00
Thomas Schoebel-Theuer
4e8f5d42e1
copy: fix error attribution to progress
2017-04-11 09:27:50 +02:00
Thomas Schoebel-Theuer
3a790eadfc
copy: increase possible copy_last advances
2017-04-11 09:27:50 +02:00
Thomas Schoebel-Theuer
b6d4b69be8
copy: remove obsolete mutex
2017-04-11 09:27:50 +02:00
Thomas Schoebel-Theuer
f1914c254a
bio: safety check on destructor
2017-04-11 09:23:04 +02:00
Thomas Schoebel-Theuer
123de577d8
infra: provisionary parallelizing of OLD md5 checksums
2017-04-11 09:23:04 +02:00
Thomas Schoebel-Theuer
b772878be6
bio: use multiple response threads
2017-04-11 09:23:04 +02:00
Thomas Schoebel-Theuer
670dd01cb9
bio: make response thread instantiable
2017-04-11 09:23:04 +02:00
Thomas Schoebel-Theuer
3066473a31
bio: separate response thread data
2017-04-11 09:21:35 +02:00
Thomas Schoebel-Theuer
84ff94faec
if: pimp nr_requests
2017-04-11 09:20:31 +02:00
Thomas Schoebel-Theuer
27be605623
bio: pimp nr_requests
2017-04-11 09:20:31 +02:00
Thomas Schoebel-Theuer
5a06fd26ab
copy: globally limit IO parallelism
2017-04-11 09:18:30 +02:00
Thomas Schoebel-Theuer
71bc90cc71
copy: make fly limitation global
2017-04-11 09:18:30 +02:00
Thomas Schoebel-Theuer
b7a770c91f
bio: speedup submit_thread termination
2017-04-04 08:42:16 +02:00
Thomas Schoebel-Theuer
b17944a512
bio: speedup response_thread termination
2017-04-04 08:42:16 +02:00
Thomas Schoebel-Theuer
ccf3c1b944
infra: increase say IDs
2017-04-04 08:42:16 +02:00
Thomas Schoebel-Theuer
94dcded654
main: earlier syncstatus update
2017-04-04 08:42:16 +02:00
Thomas Schoebel-Theuer
2e58ffadc1
main: introduce updater function at the right place
...
Updates must take place _before_ a copy is switched off.
2017-04-04 08:42:09 +02:00
Thomas Schoebel-Theuer
b7bd757d99
client: don't try get_info when brick isn't working
2017-04-04 08:38:16 +02:00
Thomas Schoebel-Theuer
4805d25cad
client: adapt timeout at get_info
2017-04-04 08:38:16 +02:00
Thomas Schoebel-Theuer
378cf8035f
main: earlier shutdown on rmmod
...
This is important when the network hangs.
2017-04-04 08:38:16 +02:00
Thomas Schoebel-Theuer
4934871905
client: shut down socket before stopping thread
2017-04-04 08:38:16 +02:00
Thomas Schoebel-Theuer
ec9e4cd536
client: earlier stop sender thread
2017-04-04 08:38:16 +02:00
Thomas Schoebel-Theuer
f84cf05316
client: earlier send stop on shutdown
2017-04-04 08:38:16 +02:00
Thomas Schoebel-Theuer
342e5e40a5
copy: allow stopping in parallel
2017-04-04 08:38:16 +02:00
Thomas Schoebel-Theuer
afe2513c21
infra: shutdown bricks in parallel
2017-04-04 08:38:15 +02:00
Thomas Schoebel-Theuer
9438c99647
client: adapt socket aborts to io_timeout
2017-04-04 08:38:15 +02:00
Thomas Schoebel-Theuer
c0da3f50fe
main: safeguard forceful killing
2017-04-04 08:38:15 +02:00
Thomas Schoebel-Theuer
994ae64b92
main: fix sequential wait upon shutdown
...
Instead, switch off all resources in parallel without waiting for
each shutdown.
2017-04-04 08:38:15 +02:00
Thomas Schoebel-Theuer
ea57a4e898
Merge branch 'mars0.1.y' into mars0.1b.y
2017-04-04 08:37:05 +02:00
Thomas Schoebel-Theuer
d1988b3d7c
copy: leave livelock when EOF position decreases
2017-04-04 08:03:09 +02:00
Thomas Schoebel-Theuer
85ca001f9f
copy: remove obsolete variable
2017-04-04 07:45:46 +02:00
Thomas Schoebel-Theuer
7f7b6b99a7
main: new simple sync parallelism limit
...
Hopefully this code is now "obviously correct"
2017-02-20 15:29:28 +01:00
Thomas Schoebel-Theuer
c3f931f660
main: remove obsolete 1&1-specific sync feature
2017-02-20 15:29:28 +01:00
Thomas Schoebel-Theuer
84a9273080
main: fix detection of logfile sequence holes
2017-02-16 07:21:09 +01:00
Thomas Schoebel-Theuer
1f11a21f53
aio: decrease context table
2017-02-09 10:13:31 +01:00
Thomas Schoebel-Theuer
1b46726241
main: avoid flipping of syncstatus update
2017-02-09 10:13:21 +01:00
Thomas Schoebel-Theuer
d726df70f3
client: correct timeout error code
2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer
f62a090575
copy: safeguard power_led_off
2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer
d897f9060e
infra: fix forced shutdown of bricks
2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer
bb89cf0dbb
infra: show brick creation timestamp in debuglogs
2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer
7bdf6ed6c2
infra: show additional variable in debug log
2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer
1080474ecc
all: use new wrapper
2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer
e370af69e1
infra: use new wrapper
2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer
0c76f0f1fd
infra: wrapper for generic_{dis,}connect with locking
2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer
f0381455cb
logger: increase position update frequency
2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer
fec2264766
main: fix unintended reset of syncstatus
2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer
300881a308
main: don't reset copy start_pos on network errors
2017-01-24 11:36:26 +01:00
Thomas Schoebel-Theuer
4e80236400
main: fix hang at rmmod
2017-01-24 11:36:26 +01:00
Thomas Schoebel-Theuer
b04db9a5ef
main: fix NULL pointer deref
...
Regression from e969219fca
2016-10-27 11:49:12 +02:00
Thomas Schoebel-Theuer
cc87a72637
if: fix merge_bvec_fn() regression for old kernels
2016-10-23 12:21:04 +02:00
Thomas Schoebel-Theuer
b6ef899ded
Revert "if: remove obsolete merge_bvec_fn()"
...
This reverts commit d96b6e3fbf.
Although newer kernels don't have this anymore, old kernels
need it.
Make it dependent on the kernel version.
2016-10-23 11:54:01 +02:00
Thomas Schoebel-Theuer
a92077dd5a
infra: use static inline for cpu_clock() (kernel 4.7)
...
Avoid compiler warnings caused by minor upstream changes
(2c923e94cd9c6acff3b22f0ae29cfe65e2658b40)
2016-08-25 15:39:06 +02:00
Thomas Schoebel-Theuer
0972d2b20d
infra: adapt to new crypto interface (kernel 4.6)
2016-08-25 15:39:06 +02:00
Thomas Schoebel-Theuer
d6e5b979ac
aio: adapt to changes in get_unused_fd()
...
Only relevant for the out-of-tree version.
The AIO stuff needs to be re-implemented anyway.
2016-08-25 15:39:06 +02:00
Thomas Schoebel-Theuer
bab7ba6300
if: adapt to kernel 4.4 BLK_QC_T_NONE
...
see dece16353ef47d8d33f5302bc158072a9d65e26f
2016-08-25 07:16:40 +02:00
Thomas Schoebel-Theuer
d96b6e3fbf
if: remove obsolete merge_bvec_fn()
2016-08-25 07:16:40 +02:00
Thomas Schoebel-Theuer
67977d7abf
if: adapt bio_endio() to kernel 4.3
2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer
500ddbc97f
bio: adapt bio_endio() to kernel 4.3
2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer
d04e8e23c4
if: adapt to renamed congestion handling (kernel 4.2)
2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer
275cc2a195
if: adapt to missing bi_cnt (kernel 4.2)
2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer
cf8ee66490
bio: adapt to missing BIO_EOPNOTSUPP (kernel 4.2)
2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer
d2abf4d64f
net: adapt to new sk_net_refcnt (kernel 4.2)
2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer
5f6c2a25fe
if: move and enable blk_cleanup_queue()
2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer
7d4dce3e27
infra: compatibility to new filldir_t
2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer
07887e1f74
net: compatibility to kernel 3.19
2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer
2ea01ece5f
proc: fix ctl_table conventions
2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer
df7105dfe2
light: make lockdep happy
2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer
3c244706a5
main: fix replay_code report in primary mode
...
After a primary --force, the error couldn't go away in case of
a defective logfile. Months later, sysadmins were needlessly alarmed
when looking at the primary.
2016-08-09 09:37:09 +02:00
Thomas Schoebel-Theuer
e969219fca
main: safeguard versionlink appearance
...
In some rare cases (e.g. damaged /mars or crashed primaries),
the versionlink belonging to a logfile may be missing.
Don't insist on the existence of a versionlink if the logfile is
stemming from myself (automatic self-repair).
2016-08-09 09:37:09 +02:00
Thomas Schoebel-Theuer
634499d3d2
all: testing of hangs
2016-08-09 09:37:09 +02:00
Thomas Schoebel-Theuer
90653476f6
all: crash testing hardening infrastructure
...
This is important for even more hardening of MARS.
Simulate crashes at the "wrong moment", typically with
IO requests flying, or just before a symlink update.
Only for debugging. Never use for production.
2016-08-09 09:34:19 +02:00
Thomas Schoebel-Theuer
f89e0a7d96
marsadm: lowlevel IP address commands
...
This is absolutely necessary for coping with changes in network
setups.
2016-03-09 09:42:38 +01:00
Thomas Schoebel-Theuer
e7f41563f2
main: fix livelock at end of sync
...
Only observed on very fast hardware.
Leaving the loop may unnecessarily take a long time.
2016-03-08 11:37:41 +01:00
Thomas Schoebel-Theuer
04b2f2120e
Kbuild: fix external 1&1 build process
2016-03-03 12:42:41 +01:00
Thomas Schoebel-Theuer
a5f8f3e464
main: rename mars_light.c to mars_main.c
2016-03-03 09:35:16 +01:00
Thomas Schoebel-Theuer
4d31d09534
all: remove CONFIG_MARS_BIGMODULE
2016-03-03 09:33:34 +01:00
Thomas Schoebel-Theuer
daa701edf1
light: s/light_class/main_class/g
2016-03-03 09:05:01 +01:00
Thomas Schoebel-Theuer
2990b9362e
light: s/light_thread/main_thread/g
2016-03-03 09:04:04 +01:00
Thomas Schoebel-Theuer
42a8bfaa60
all: s/light_(worker|checker)/main_\1/g
2016-03-03 08:57:07 +01:00
Thomas Schoebel-Theuer
dd4748bb52
light: clarify code
2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer
8fa728a0c9
light: fix annoying unnecessary error message
2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer
8abcbf196d
light: safeguard sync vs replay
2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer
e70ac4df8c
light: safeguard position update
2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer
fafad9512a
light: always update position symlinks at logger switchoff
2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer
42c2dc98da
light: fix typo in replay link comparison
2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer
a312e3d93b
light: fix memory leak
...
regression from f235b76900
2016-03-01 11:58:09 +01:00
Thomas Schoebel-Theuer
8bc1e80488
light: safeguard skipping of logfiles in disconnected state.
...
Found by code inspection, neither in practice nor by testing.
Should not occur in practice, because it could only occur after
marsadm pause-fetch, which is an exceptional state only to be entered
for maintenance or for emergency failover.
Skipping over an incorrect logfile at a secondary may produce an
unnecessary split brain.
Fix the potential problem by doing it only after "primary --force",
and by never creating a new logfile, always by re-using existing
logfiles.
2016-02-10 06:44:00 +01:00
Thomas Schoebel-Theuer
f235b76900
light: fix potential deadlock on restart after inconsistent symlinks
...
This has been found by testing.
In extremely rare cases, such as after crashes at the "wrong moment"
or after defective /mars filesystems, the replay link could show a
different length than the corresponding versionlink.
The versionlink wouldn't be updated anymore when additionally the
logfile has the same length as the replay link.
The incorrect versionlink will then lead to a deadlock.
Fix the problem by using the _minimum_ of all length indicators.
For safety, or when in doubt, replay more data, which will in turn
update the versionlink again to its correct value.
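The rule described above can be sketched as follows. This is only an illustration, not the MARS code: the function `safe_replay_start()` and its three parameters are hypothetical stand-ins for the length recorded in the replay link, the length recorded in the versionlink, and the actual logfile length.

```c
#include <stdint.h>

/* Return the smaller of two unsigned 64-bit values. */
static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical helper (name and signature are assumptions):
 * choose the position from which replay resumes as the minimum
 * of all available length indicators. When the indicators
 * disagree, this replays more data rather than less.
 */
uint64_t safe_replay_start(uint64_t replay_link_len,
                           uint64_t version_link_len,
                           uint64_t logfile_len)
{
    return min_u64(replay_link_len,
                   min_u64(version_link_len, logfile_len));
}
```

With a stale versionlink at 80 and the other two indicators at 100, the sketch resumes at 80, so the range between 80 and 100 is replayed again.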
2016-02-10 06:24:27 +01:00
Thomas Schoebel-Theuer
8e2de8288d
light: fix missing versionlink upon slow or defective IO
...
Some primary appeared to have died, and was rebooted.
In the meantime, the old secondary was forcefully switched
to primary.
Afterwards, the old primary = new secondary got stuck because 2
versionlinks, which had been _produced_ by _himself_, were
missing, but they were present at the new primary = old secondary!
How could this happen?
All transaction logfiles were fully present and correct everywhere.
However, the old primary kern.log showed that a problem with the
RAID system must have existed. In addition, the RAID controller
errorlog also reported some problems which appeared to have healed.
Problem analysis shows the following possibility:
The transaction logger can continue to write data, even via
fsync(), while the _writeback_ of other parts of the /mars filesystem
(e.g. symlink updates) got stuck for a long time due to an IO problem.
Usually, slow or even missing symlink updates are no problem because
upon recovery after a reboot, everything is healed by transaction
replay (possibly replaying much more data than really necessary,
but this does not affect semantics, and it is even advantageous
when RAID disks might contain defective data).
There is one exception: after a logrotate, the corresponding new
versionlink should appear after a short time. Otherwise, the
above mentioned scenario could emerge.
We use sync_filesystem() to ensure that any versionlink update
to a _new_ versionlink is either guaranteed to become persistent,
or (in case of IO problems) the mars_light thread will hang, which
will be (hopefully) noticed soon by monitoring.
2016-02-03 22:01:48 +01:00
Thomas Schoebel-Theuer
ea48664a14
light: disallow primary from rotating over damaged logfiles
...
Only a secondary is allowed to do this, because we assume that
logfile replay has the property of "anytime consistency"
only there.
When a primary cannot recover after a crash due to a defective
logfile, this is not true. The primary is simply lost in such a
(rare) case. Observed 2 times during almost 8 million
operating hours.
In such a case, hardware is truly defective, and you have only
the following options:
1) switchover to a secondary via "primary --force", OR
2) deconstruct the resource everywhere, run fsck or similar on
whatever replica seems to be the best version,
and reconstruct the resource from scratch, OR
3) restore your backup.
2016-01-21 08:09:47 +01:00
Thomas Schoebel-Theuer
acdb9d7a42
light: fix reset of replay-code
...
Reset was forgotten in secondary role. Do it always whenever
a logfile is actually rotated.
2016-01-20 14:48:43 +01:00
Thomas Schoebel-Theuer
496e57e1e1
logger: add new indicator for damaged logfiles
2016-01-15 17:10:58 +01:00
Thomas Schoebel-Theuer
d67336420d
light: fix becoming primary when logfiles are damaged
...
When logfile replay aborts with an error, becoming primary would be
impossible.
Without this, repair would be only possible by complete destruction
of the resource.
A previous version of this patch introduced
/proc/sys/mars/allow_primary_when_damaged which would complicate
the sysadmin interface. People would be unsure what to do.
2016-01-13 14:12:02 +01:00
Thomas Schoebel-Theuer
3eedff125d
infra: fix comparison
...
Under weird circumstances, when the new symlink content was just a
shortened version (prefix) of the old one, the symlink was not updated.
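The commit text does not show the faulty comparison itself. The following sketch is one way such a prefix bug can arise, under the assumption that only the first strlen(new) bytes were compared; both function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Assumed buggy pattern: only strlen(new_val) bytes are compared,
 * so a new value that is a prefix of the old one looks "unchanged".
 */
bool needs_update_buggy(const char *old_val, const char *new_val)
{
    return strncmp(old_val, new_val, strlen(new_val)) != 0;
}

/*
 * Full comparison: strcmp() also compares the terminating NUL,
 * so strings of different length are never treated as equal.
 */
bool needs_update_fixed(const char *old_val, const char *new_val)
{
    return strcmp(old_val, new_val) != 0;
}
```

For old content "abcdef" and new content "abc", the first variant reports that no update is needed, while the second one correctly requests an update.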
2016-01-02 10:18:33 +01:00
Thomas Schoebel-Theuer
d18c60f232
infra: fix potential fault
...
Very old idiotic bug.
Under some circumstances, a byte beyond the end of a non-null-terminated
string (such as produced by the VFS) might be read, potentially leading
to a page fault just one byte after a page border.
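A userspace sketch of the safe pattern, assuming the caller knows the exact length of the unterminated buffer. The helper name `dup_counted()` is hypothetical and not taken from MARS.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: duplicate a buffer of exactly len bytes
 * that is NOT null-terminated. Only src[0] .. src[len-1] are read,
 * so no access beyond the end of the source buffer can occur.
 * The terminator is written into the copy, never read from src.
 * Returns NULL on allocation failure; the caller frees the result.
 */
char *dup_counted(const char *src, size_t len)
{
    char *dst = malloc(len + 1);

    if (!dst)
        return NULL;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```

In contrast, calling strlen() or strcpy() on such a buffer would read past its last byte while searching for a terminator.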
2016-01-02 10:18:33 +01:00
Thomas Schoebel-Theuer
25d954051b
logger: move ranking array from stack to brick instance
...
Don't allocate this on the stack, it might grow too big in future.
Reduces the risk of stack overflows (not observed until now, but
suspected).
2016-01-02 10:18:22 +01:00
Thomas Schoebel-Theuer
045d0e0356
logger: fix potential deadlock caused by incorrect accounting
...
Never observed in practice, found by testing with kernel upstream
versions.
2016-01-02 09:43:22 +01:00
Thomas Schoebel-Theuer
c1ee80f9f4
server: fix memory leak on writes
...
This was unnoticed for a long time because it simply did not occur
in ordinary MARS Light workloads.
2015-10-19 07:24:20 +02:00
Thomas Schoebel-Theuer
54d8433b21
light: fix spelling
2015-10-07 10:46:04 +02:00
Thomas Schoebel-Theuer
4d8dc3a619
logger: fix spelling
2015-10-07 10:45:51 +02:00
Thomas Schoebel-Theuer
af6ac736c5
if: fix wrong error code ENOSYS
2015-10-07 10:44:44 +02:00
Thomas Schoebel-Theuer
66d200dbf1
infra: fix wrong error code ENOSYS
2015-10-07 10:44:35 +02:00
Thomas Schoebel-Theuer
c6235c71d5
aio: fix race on shutdown
2015-07-15 10:38:49 +02:00
Thomas Schoebel-Theuer
550d02935e
sio: fix race on shutdown
2015-07-15 10:38:49 +02:00
Thomas Schoebel-Theuer
91f458fe66
sio: convert to new mapfree infrastructure
2015-07-15 10:38:49 +02:00
Thomas Schoebel-Theuer
c39a2988b7
light: fix long-lasting switchoff at end of sync
2015-06-17 11:33:27 +02:00
Thomas Schoebel-Theuer
4ecd6937c7
light: don't try fetching from (none)
2015-06-17 11:33:27 +02:00
Thomas Schoebel-Theuer
6eb5cefc19
infra: clean buffer cache on opening block devices
2015-06-17 11:33:18 +02:00
Thomas Schoebel-Theuer
7cbb705882
logger: safeguard endio() calling conventions
2015-05-05 08:46:28 +02:00
Thomas Schoebel-Theuer
18f1ae84f3
logger: fix race on completion refcount
2015-05-05 08:46:28 +02:00