RepoMirrors/mars

mirror of https://github.com/schoebel/mars synced 2025-01-12 18:01:52 +00:00

Author	SHA1	Message	Date
Thomas Schoebel-Theuer	95d10d02a2	main: disable irqs during spinlocks	2017-05-16 10:21:31 +02:00
Thomas Schoebel-Theuer	f129ae00e9	infra: modinfo shows io driver type	2017-05-09 08:52:48 +02:00
Thomas Schoebel-Theuer	8abf1a0928	infra: modinfo shows whether prepatch is used	2017-05-09 08:52:48 +02:00
Thomas Schoebel-Theuer	a1d4497a51	infra: remove unwanted sys_utimes()	2017-05-04 10:32:50 +02:00
Thomas Schoebel-Theuer	09c6b3112c	infra: replace unwanted sys_unlink() by provisionary wrapper	2017-05-04 10:28:43 +02:00
Thomas Schoebel-Theuer	b3b13d9187	infra: replace unwanted sys_rename() by provisionary wrapper	2017-05-04 10:08:29 +02:00
Thomas Schoebel-Theuer	c4b055584c	infra: replace sys_mkdir() by vfs_mkdir()	2017-05-04 10:08:29 +02:00
Thomas Schoebel-Theuer	8fe84d32d8	infra: replace sys_symlink() by vfs_symlink()	2017-05-04 10:08:29 +02:00
Thomas Schoebel-Theuer	05a5b49aed	infra: remove unwanted reference to min_free_kbyte	2017-05-04 10:08:07 +02:00
Thomas Schoebel-Theuer	b9383da97c	infra: remove unwanted rmdir()	2017-05-04 10:04:12 +02:00
Thomas Schoebel-Theuer	ac2c901943	infra: remove unwanted chmod()	2017-05-04 10:04:02 +02:00
Thomas Schoebel-Theuer	f654129e94	compat: disable aio when necessary	2017-05-04 09:16:17 +02:00
Thomas Schoebel-Theuer	eaa6fc0efc	infra: introduce wrapper layer for compatibility with multiple kernels This is needed for adaptation of the out-of-tree MARS version to multiple kernel versions. It will be much simplified after upstream merging, and/or removed/replaced by something better.	2017-05-04 09:09:19 +02:00
Thomas Schoebel-Theuer	d1988b3d7c	copy: leave livelock when EOF position decreases	2017-04-04 08:03:09 +02:00
Thomas Schoebel-Theuer	84a9273080	main: fix detection of logfile sequence holes	2017-02-16 07:21:09 +01:00
Thomas Schoebel-Theuer	1b46726241	main: avoid flipping of syncstatus update	2017-02-09 10:13:21 +01:00
Thomas Schoebel-Theuer	d897f9060e	infra: fix forced shutdown of bricks	2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer	bb89cf0dbb	infra: show brick creation timestamp in debuglogs	2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer	7bdf6ed6c2	infra: show additional variable in debug log	2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer	1080474ecc	all: use new wrapper	2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer	e370af69e1	infra: use new wrapper	2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer	0c76f0f1fd	infra: wrapper for generic_{dis,}connect with locking	2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer	fec2264766	main: fix unintended reset of syncstatus	2017-01-25 09:30:52 +01:00
Thomas Schoebel-Theuer	300881a308	main: don't reset copy start_pos on network errors	2017-01-24 11:36:26 +01:00
Thomas Schoebel-Theuer	4e80236400	main: fix hang at rmmod	2017-01-24 11:36:26 +01:00
Thomas Schoebel-Theuer	b04db9a5ef	main: fix NULL pointer deref Regression from `e969219fca`	2016-10-27 11:49:12 +02:00
Thomas Schoebel-Theuer	7d4dce3e27	infra: compatibility to new filldir_t	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	2ea01ece5f	proc: fix ctl_table conventions	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	df7105dfe2	light: make lockdep happy	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	3c244706a5	main: fix replay_code report in primary mode After a primary --force, the error couldn't go away in case of a defective logfile. Months later, sysadmins were needlessly alarmed when looking at the primary.	2016-08-09 09:37:09 +02:00
Thomas Schoebel-Theuer	e969219fca	main: safeguard versionlink appearance In some rare cases (e.g. damaged /mars or crashed primaries), the versionlink belonging to a logfile may be missing. Don't insist on the existence of a versionlink if the logfile is stemming from myself (automatic self-repair).	2016-08-09 09:37:09 +02:00
Thomas Schoebel-Theuer	634499d3d2	all: testing of hangs	2016-08-09 09:37:09 +02:00
Thomas Schoebel-Theuer	90653476f6	all: crash testing hardening infrastructure This is important for even more hardening of MARS. Simulate crashes at the "wrong moment", typically with IO requests flying, or just before a symlink update. Only for debugging. Never use for production.	2016-08-09 09:34:19 +02:00
Thomas Schoebel-Theuer	f89e0a7d96	marsadm: lowlevel IP address commands This is absolutely necessary for coping with changes in network setups.	2016-03-09 09:42:38 +01:00
Thomas Schoebel-Theuer	e7f41563f2	main: fix livelock at end of sync Only observed on very fast hardware. Leaving the loop may unnecessarily take a long time.	2016-03-08 11:37:41 +01:00
Thomas Schoebel-Theuer	a5f8f3e464	main: rename mars_light.c to mars_main.c	2016-03-03 09:35:16 +01:00
Thomas Schoebel-Theuer	4d31d09534	all: remove CONFIG_MARS_BIGMODULE	2016-03-03 09:33:34 +01:00
Thomas Schoebel-Theuer	daa701edf1	light: s/light_class/main_class/g	2016-03-03 09:05:01 +01:00
Thomas Schoebel-Theuer	2990b9362e	light: s/light_thread/main_thread/g	2016-03-03 09:04:04 +01:00
Thomas Schoebel-Theuer	42a8bfaa60	all: s/light_(worker\|checker)/main_\1/g	2016-03-03 08:57:07 +01:00
Thomas Schoebel-Theuer	dd4748bb52	light: clarify code	2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer	8fa728a0c9	light: fix annoying unnecessary error message	2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer	8abcbf196d	light: safeguard sync vs replay	2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer	e70ac4df8c	light: safeguard position update	2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer	fafad9512a	light: always update position symlinks at logger switchoff	2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer	42c2dc98da	light: fix typo in replay link comparison	2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer	a312e3d93b	light: fix memory leak regression from `f235b76900`	2016-03-01 11:58:09 +01:00
Thomas Schoebel-Theuer	8bc1e80488	light: safeguard skipping of logfiles in disconnected state. Found by code inspection, neither in practice nor by testing. Should not occur in practice, because it could only occur after marsadm pause-fetch, which is an exceptional state only to be entered for maintenance or for emergency failover. Skipping over an incorrect logfile at a secondary may produce an unnecessary split brain. Fix the potential problem by doing it only after "primary --force", and by never creating a new logfile, always by re-using existing logfiles.	2016-02-10 06:44:00 +01:00
Thomas Schoebel-Theuer	f235b76900	light: fix potential deadlock on restart after inconsistent symlinks This has been found by testing. In extremely rare cases, such as after crashes at the "wrong moment" or after defective /mars filesystems, the replay link could show a different length than the corresponding versionlink. The versionlink wouldn't be updated anymore when additionally the logfile has the same length as the replay link. The incorrect versionlink will then lead to a lock. Fix the problem by using the _minimum_ of all length indicators. For safety, or when in doubt, replay more data, which will in turn update the versionlink again to its correct value.	2016-02-10 06:24:27 +01:00
Thomas Schoebel-Theuer	8e2de8288d	light: fix missing versionlink upon slow or defective IO Some primary appeared to have died, and was rebooted. In the meantime, the old secondary was forcefully switched to primary. Afterwards, the old primary = new secondary got stuck because 2 versionlinks, which had been _produced_ by _itself_, were missing, but they were present at the new primary = old secondary! How could this happen? All transaction logfiles were fully present and correct everywhere. However, the old primary kern.log showed that a problem with the RAID system must have existed. In addition, the RAID controller errorlog also reported some problems which appeared to have healed. Problem analysis shows the following possibility: The transaction logger can continue to write data, even via fsync(), while the _writeback_ of other parts of the /mars filesystem (e.g. symlink updates) got stuck for a long time due to an IO problem. Usually, slow or even missing symlink updates are no problem because upon recovery after a reboot, everything is healed by transaction replay (possibly replaying much more data than really necessary, but this does not affect semantics, and it is even advantageous when RAID disks might contain defective data). There is one exception: after a logrotate, the corresponding new versionlink should appear after a short time. Otherwise, the above-mentioned scenario could emerge. We use sync_filesystem() to ensure that any versionlink update to a _new_ versionlink is either guaranteed to become persistent, or (in case of IO problems) the mars_light thread will hang, which will be (hopefully) noticed soon by monitoring.	2016-02-03 22:01:48 +01:00

235 Commits