mirror of
https://github.com/schoebel/mars
synced 2025-01-16 03:50:57 +00:00
8e2de8288d
A primary appeared to have died and was rebooted. In the meantime, the old secondary was forcefully switched to primary. Afterwards, the old primary = new secondary got stuck because 2 versionlinks, which had been _produced_ by _itself_, were missing locally, although they were present at the new primary = old secondary! How could this happen? All transaction logfiles were fully present and correct everywhere. However, the old primary's kern.log showed that a problem with the RAID system must have existed. In addition, the RAID controller error log reported some problems which appeared to have healed. Problem analysis shows the following possibility: the transaction logger can continue to write data, even via fsync(), while the _writeback_ of other parts of the /mars filesystem (e.g. symlink updates) is stuck for a long time due to an IO problem. Usually, slow or even missing symlink updates are no problem, because upon recovery after a reboot everything is healed by transaction replay (possibly replaying much more data than really necessary, but this does not affect semantics, and it is even advantageous when RAID disks might contain defective data). There is one exception: after a logrotate, the corresponding new versionlink should appear within a short time. Otherwise, the above-mentioned scenario could emerge. We use sync_filesystem() to ensure that any versionlink update to a _new_ versionlink is either guaranteed to become persistent, or (in case of IO problems) the mars_light thread will hang, which will (hopefully) be noticed soon by monitoring. |
||
---|---|---|
sy_old | ||
brick_atomic.h | ||
brick_checking.h | ||
brick_locks.h | ||
brick_mem.c | ||
brick_mem.h | ||
brick_say.c | ||
brick_say.h | ||
brick.c | ||
brick.h | ||
gpl-2.0.txt | ||
Kbuild | ||
Kconfig | ||
lamport.c | ||
lamport.h | ||
lib_limiter.c | ||
lib_limiter.h | ||
lib_log.c | ||
lib_log.h | ||
lib_mapfree.c | ||
lib_mapfree.h | ||
lib_pairing_heap.h | ||
lib_queue.h | ||
lib_rank.c | ||
lib_rank.h | ||
lib_timing.c | ||
lib_timing.h | ||
Makefile | ||
mars_aio.c | ||
mars_aio.h | ||
mars_bio.c | ||
mars_bio.h | ||
mars_buf.c | ||
mars_buf.h | ||
mars_check.c | ||
mars_check.h | ||
mars_client.c | ||
mars_client.h | ||
mars_copy.c | ||
mars_copy.h | ||
mars_dummy.c | ||
mars_dummy.h | ||
mars_generic.c | ||
mars_if.c | ||
mars_if.h | ||
mars_net.c | ||
mars_net.h | ||
mars_server.c | ||
mars_server.h | ||
mars_sio.c | ||
mars_sio.h | ||
mars_trans_logger.c | ||
mars_trans_logger.h | ||
mars_usebuf.c | ||
mars_usebuf.h | ||
mars.h | ||
meta.h |
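
The commit message above describes forcing a newly created versionlink to stable storage with sync_filesystem(), so that an IO problem shows up as a hanging thread rather than as a silently missing link. The following is a minimal userspace sketch of the same idea, not the kernel code from this repository. It assumes Linux with glibc, where syncfs(2) is the closest userspace counterpart to the in-kernel sync_filesystem(). The helper name `publish_link_durably`, the temporary-name scheme, and the symlink-then-rename step are illustrative assumptions.

```c
#define _GNU_SOURCE             /* for syncfs() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of MARS): create or replace the symlink
 * <dir>/<name> pointing to <target>, then flush the whole filesystem
 * containing <dir>. Returns 0 on success, -1 on error.
 *
 * If the underlying storage is stuck, syncfs() blocks. This mirrors the
 * behaviour described in the commit message: the caller hangs visibly
 * instead of continuing with a link that never reached the disk.
 */
static int publish_link_durably(const char *dir, const char *name,
                                const char *target)
{
    char tmp_path[4096];
    char final_path[4096];
    int n, dirfd, ret = -1;

    assert(dir && name && target);

    n = snprintf(tmp_path, sizeof(tmp_path), "%s/.%s.tmp", dir, name);
    if (n < 0 || (size_t)n >= sizeof(tmp_path))
        return -1;
    n = snprintf(final_path, sizeof(final_path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(final_path))
        return -1;

    /* Build the link under a temporary name, then rename it into place,
     * so readers never observe a half-updated link. */
    unlink(tmp_path);           /* ignore errors: stale temp may not exist */
    if (symlink(target, tmp_path) < 0)
        return -1;
    if (rename(tmp_path, final_path) < 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Flush the filesystem that contains the directory. */
    dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;
    if (syncfs(dirfd) == 0)
        ret = 0;
    close(dirfd);
    return ret;
}
```

A caller would invoke something like `publish_link_durably("/tmp", "version-demo", "log-000000002")` right after a log rotation and treat a non-zero return as an error. Flushing the whole filesystem is heavier than an fsync() on the directory alone; the sketch uses it only because that is the approach the commit message names.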