RepoMirrors/mars

mirror of https://github.com/schoebel/mars synced 2024-12-15 03:05:12 +00:00

Author	SHA1	Message	Date
Thomas Schoebel-Theuer	afe2513c21	infra: shutdown bricks in parallel	2017-04-04 08:38:15 +02:00
Thomas Schoebel-Theuer	d897f9060e	infra: fix forced shutdown of bricks	2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer	bb89cf0dbb	infra: show brick creation timestamp in debuglogs	2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer	7bdf6ed6c2	infra: show additional variable in debug log	2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer	e370af69e1	infra: use new wrapper	2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer	0c76f0f1fd	infra: wrapper for generic_{dis,}connect with locking	2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer	7d4dce3e27	infra: compatibility to new filldir_t	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	634499d3d2	all: testing of hangs	2016-08-09 09:37:09 +02:00
Thomas Schoebel-Theuer	4d31d09534	all: remove CONFIG_MARS_BIGMODULE	2016-03-03 09:33:34 +01:00
Thomas Schoebel-Theuer	8e2de8288d	light: fix missing versionlink upon slow or defective IO Some primary appeared to have died, and was rebooted. In the meantime, the old secondary was forcefully switched to primary. Afterwards, the old primary = new secondary got stuck because 2 versionlinks, which had been _produced_ by _himself_, were missing, but they were present at the new primary = old secondary! How could this happen? All transaction logfiles were fully present and correct everywhere. However, the old primary kern.log showed that a problem with the RAID system must have existed. In addition, the RAID controller errorlog also reported some problems which appeared to have healed. Problem analysis shows the following possibility: The transaction logger can continue to write data, even via fsync(), while the _writeback_ of other parts of the /mars filesystem (e.g. symlink updates) got stuck for a long time due to an IO problem. Usually, slow or even missing symlink updates are no problem because upon recovery after a reboot, everything is healed by transaction replay (possibly replaying much more data than really necessary, but this does not affect semantics, and it is even advantageous when RAID disks might contain defective data). There is one exception: after a logrotate, the corresponding new versionlink should appear after a small time. Otherwise, the above mentioned scenario could emerge. We use sync_filesystem() to ensure that any versionlink update to a _new_ versionlink is either guaranteed to become persistent, or (in case of IO problems) the mars_light thread will hang, which will be (hopefully) noticed soon by monitoring.	2016-02-03 22:01:48 +01:00
Thomas Schoebel-Theuer	3eedff125d	infra: fix comparison Under weird circumstances, when a new symlink contents was just a shortened version (prefix) of the old one, the symlink was not updated.	2016-01-02 10:18:33 +01:00
Thomas Schoebel-Theuer	e7464b3c02	all: correct error code EIO The error code -EIO should always refer to a problem of lower storage layers. Thus MARS should not generate that code itself, but other ones.	2015-01-20 15:20:10 +01:00
Thomas Schoebel-Theuer	802cc73b49	infra: additionally safeguard race on brick resource deallocation	2015-01-19 18:01:04 +01:00
Thomas Schoebel-Theuer	fa49247b8e	infra: fix stale dents	2015-01-19 18:01:04 +01:00
Thomas Schoebel-Theuer	aa09d7df30	all: clarify license GPLv2+	2014-11-25 18:09:17 +01:00
Thomas Schoebel-Theuer	3a6ff3d2c8	infra: quickfix Redhat/openvz builds	2014-07-14 17:27:11 +02:00
Thomas Schoebel-Theuer	1439d30ffb	all: port to newer kernels (up to 3.15)	2014-06-18 12:10:55 +02:00
Thomas Schoebel-Theuer	b64ed7bd96	infra: fix readdir() call	2014-06-18 12:10:48 +02:00
Thomas Schoebel-Theuer	7aebfdf6bb	all: remove __exit annotation	2014-04-24 18:08:31 +02:00
Thomas Schoebel-Theuer	ce7dbc07f1	infra: fix list initialization	2014-04-08 10:12:58 +02:00
Thomas Schoebel-Theuer	90b19cd2f6	infra: fix dent list sorting	2014-04-08 10:06:15 +02:00
Thomas Schoebel-Theuer	2f4696a9cc	all: fix logfile size propagation	2014-03-31 06:59:09 +02:00
Thomas Schoebel-Theuer	2d68b755c2	infra: fix mem error messages	2014-03-26 11:43:05 +01:00
Thomas Schoebel-Theuer	6050b4157f	infra: make string allocation fully dynamic	2014-03-26 11:43:05 +01:00
Thomas Schoebel-Theuer	17ef391953	infra: fix string allocation in mars_readlink()	2014-03-26 11:43:05 +01:00
Thomas Schoebel-Theuer	e551c1aa87	light: fix emergency mode	2014-03-19 11:44:58 +01:00
Thomas Schoebel-Theuer	56f38641ff	infra: fix/remove buggy d_{name,path}len In rare cases, this could lead to buffer overflows. Replace buggy concept from the prototype phase with more robust (although slightly less performant) code.	2014-03-19 11:30:24 +01:00
Thomas Schoebel-Theuer	3acb6a02fe	infra: fix removal of stale directories	2014-03-19 11:30:23 +01:00
Thomas Schoebel-Theuer	5d2a682cfd	infra: fix readlink() for very long paths	2014-03-19 11:30:23 +01:00
Thomas Schoebel-Theuer	3e9aae53c8	all: fix potential buffer overflows, use vscnprintf()	2014-03-19 11:30:23 +01:00
Thomas Schoebel-Theuer	bd9b46fc05	infra: fix forgotten locking	2014-03-19 11:30:23 +01:00
Thomas Schoebel-Theuer	9a8a4d7eb2	light: fix delete-resource forced dealloc	2014-02-03 15:07:45 +01:00
Thomas Schoebel-Theuer	8971edad18	if: set capacity upon regular switch() maintenance	2013-10-17 07:35:34 +02:00
Frank Liepold	08e5803cd1	light: workaround flying IO before reporting memory leaks We report an error if there are unfreed mrefs after the device brick has been switched to power off. Instead of reporting an error at once, we report only warnings in the first 20 seconds. If there are still unfreed mrefs after that time an error is reported.	2013-09-17 13:36:27 +02:00
Frank Liepold	ebe0ca6ad9	light: reduce cascades on lamport clock workaround Signed-off-by: Thomas Schoebel-Theuer <tst@1und1.de> Some filesystems like ext3 have only full-second resolution. Therefore, we _must_ advance the Lamport clock in whole seconds when working on such gear, since we want to prevent lost updates which would be caused by standstill Lamport clocks. Sometimes, the Lamport clock gets updated more frequently per second than real time. In such cases, the Lamport clock will run much faster than real time. After some weeks of operation, the Lamport clock will be far in the future. In general, we cannot do anything against that. When some fine-grained information cannot be coded into some specific data type, it cannot be coded. However, when updates start to occur less frequently, we want to _leave_ the workaround mode ASAP. The old code set tv_nsec to 0 which made it very likely that the workaround was triggered again unnecessarily. In order to _reduce_ that effect, we prevent unnecessary cascades of whole-second leaps by setting the nanoseconds constantly to 1 if the full second was increased due to insufficient capabilities of the underlying filesystem. At least in those cases where Lamport timestamps are transferred over the network and/or we have mixed configurations between ext3/ext4, we hope to decrease the risk of endless cascades. Experience shows that the new code behaves better.	2013-08-28 14:54:04 +02:00
Thomas Schoebel-Theuer	3346daf959	light: allow remote deletion of directories	2013-06-29 21:15:18 +02:00
Thomas Schoebel-Theuer	cdd7b85417	infra: systematics of make_brick_all() switching, remove superfluous parameter	2013-06-20 15:08:27 +02:00
Thomas Schoebel-Theuer	bdb6aaef1f	light: fix detach operation	2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer	59d706ba54	infra: replace brick_version by kill_round	2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer	bf7c0c9f3b	infra: remove superfluous parameter is_server	2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer	dfe2dc5b1c	infra: remove recursive button operations All buttons should be switched step-by-step in future. The previous patch should ensure that no harm can occur.	2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer	726bbe17fc	infra: don't switch off if predecessors are working	2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer	ce110eb52b	infra: disallow forbidden brick states Switch on only when all predecessor bricks are also on. Failing to do so can result in fatal errors. Similarly, switch only off if no successor exists any more.	2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer	326ed48da2	light: remove superfluous timeout parameter The concept was broken.	2013-06-03 09:05:46 +02:00
Thomas Schoebel-Theuer	f5fae8e4ba	light: show runtime connection status information	2013-04-12 08:26:25 +02:00
Thomas Schoebel-Theuer	c275bec28d	light: new systematics for emergency modes (filesystem full)	2013-04-08 17:02:57 +02:00
Thomas Schoebel-Theuer	a6aaa93da7	infra: fix forgotten {get,set}_df()	2013-04-08 17:02:57 +02:00
Thomas Schoebel-Theuer	1c8fa83d1f	infra: control creation of log messages	2013-04-08 17:02:57 +02:00
Thomas Schoebel-Theuer	c58417d271	all: move kernel source into separate directory	2013-04-08 17:01:37 +02:00

49 Commits
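Commit 3eedff125d ("infra: fix comparison") describes a symlink that was not updated when its new contents were merely a shortened version (prefix) of the old contents. The actual MARS code is not part of this log; the following is a hypothetical C sketch of that bug class, assuming the faulty check used strncmp() bounded by the length of the new value. The function names needs_update_buggy() and needs_update() are illustrative only.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical buggy pattern: only strlen(new_val) bytes are compared,
 * so a new value that is a prefix of the old one looks "unchanged".
 */
static int needs_update_buggy(const char *old_val, const char *new_val)
{
	assert(old_val != NULL && new_val != NULL);
	return strncmp(old_val, new_val, strlen(new_val)) != 0;
}

/*
 * Corrected check: strcmp() also detects a length difference,
 * because it compares up to and including the terminating NUL.
 */
static int needs_update(const char *old_val, const char *new_val)
{
	assert(old_val != NULL && new_val != NULL);
	return strcmp(old_val, new_val) != 0;
}
```

With old contents "abcdef" and new contents "abc", the buggy check reports no change, while the strcmp()-based check correctly requests an update.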
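Commit ebe0ca6ad9 explains why the Lamport clock must jump in whole seconds on filesystems such as ext3, and why the nanoseconds are set to 1 instead of 0 after such a forced jump. The sketch below models only the forced whole-second leap and its cascades; the struct lamport_stamp type and the lamport_next() function are hypothetical, and the sketch does not reproduce how MARS decides when the workaround applies or how stamps are exchanged over the network.

```c
#include <assert.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical stamp type; MARS itself works on kernel time values. */
struct lamport_stamp {
	long long sec;
	long nsec;
};

/* Return nonzero if a <= b. */
static int stamp_le(struct lamport_stamp a, struct lamport_stamp b)
{
	return a.sec < b.sec || (a.sec == b.sec && a.nsec <= b.nsec);
}

/*
 * Produce the next Lamport stamp from the current clock and real time.
 * whole_seconds != 0 models a filesystem that stores only full seconds:
 * the stored second must then strictly increase, otherwise two updates
 * would carry identical timestamps and one could be lost.
 */
static struct lamport_stamp lamport_next(struct lamport_stamp *clock,
					 struct lamport_stamp now,
					 int whole_seconds)
{
	assert(now.nsec >= 0 && now.nsec < NSEC_PER_SEC);

	if (whole_seconds) {
		if (now.sec <= clock->sec) {
			/* Forced leap to the next full second; following
			 * the commit message, nsec is 1 rather than 0. */
			now.sec = clock->sec + 1;
			now.nsec = 1;
		}
	} else if (stamp_le(now, *clock)) {
		/* Fine-grained case: advance by one nanosecond. */
		now = *clock;
		if (++now.nsec >= NSEC_PER_SEC) {
			now.nsec = 0;
			now.sec++;
		}
	}
	*clock = now;
	return now;
}
```

Starting from a clock at 100 s, an update at real time 100.5 s leaps to 101 s + 1 ns. A further update during real second 101 leaps again to 102 s + 1 ns (a cascade), and the clock follows real time again once an update arrives at 103 s.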