RepoMirrors/mars

mirror of https://github.com/schoebel/mars synced 2025-01-12 09:40:07 +00:00

Author	SHA1	Message	Date
Thomas Schoebel-Theuer	b04db9a5ef	main: fix NULL pointer deref. Regression from e969219fca0669e611d79b4a3f71ec0fe5d2bba5	2016-10-27 11:49:12 +02:00
Thomas Schoebel-Theuer	cc87a72637	if: fix merge_bvec_fn() regression for old kernels	2016-10-23 12:21:04 +02:00
Thomas Schoebel-Theuer	b6ef899ded	Revert "if: remove obsolete merge_bvec_fn()". This reverts commit d96b6e3fbf23a1428629cd38942eef55d94925d4. Although newer kernels don't have this anymore, old kernels need it. Make it dependent on the kernel version.	2016-10-23 11:54:01 +02:00
Thomas Schoebel-Theuer	a92077dd5a	infra: use static inline for cpu_clock() (kernel 4.7). Avoid compiler warnings caused by minor upstream changes (2c923e94cd9c6acff3b22f0ae29cfe65e2658b40).	2016-08-25 15:39:06 +02:00
Thomas Schoebel-Theuer	0972d2b20d	infra: adapt to new crypto interface (kernel 4.6)	2016-08-25 15:39:06 +02:00
Thomas Schoebel-Theuer	d6e5b979ac	aio: adapt to changes in get_unused_fd(). Only relevant for the out-of-tree version. The AIO stuff needs to be re-implemented anyway.	2016-08-25 15:39:06 +02:00
Thomas Schoebel-Theuer	bab7ba6300	if: adapt to kernel 4.4 BLK_QC_T_NONE (see dece16353ef47d8d33f5302bc158072a9d65e26f)	2016-08-25 07:16:40 +02:00
Thomas Schoebel-Theuer	d96b6e3fbf	if: remove obsolete merge_bvec_fn()	2016-08-25 07:16:40 +02:00
Thomas Schoebel-Theuer	67977d7abf	if: adapt bio_endio() to kernel 4.3	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	500ddbc97f	bio: adapt bio_endio() to kernel 4.3	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	d04e8e23c4	if: adapt to renamed congestion handling (kernel 4.2)	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	275cc2a195	if: adapt to missing bi_cnt (kernel 4.2)	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	cf8ee66490	bio: adapt to missing BIO_EOPNOTSUPP (kernel 4.2)	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	d2abf4d64f	net: adapt to new sk_net_refcnt (kernel 4.2)	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	5f6c2a25fe	if: move and enable blk_cleanup_queue()	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	7d4dce3e27	infra: compatibility to new filldir_t	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	07887e1f74	net: compatibility to kernel 3.19	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	2ea01ece5f	proc: fix ctl_table conventions	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	df7105dfe2	light: make lockdep happy	2016-08-25 07:16:39 +02:00
Thomas Schoebel-Theuer	3c244706a5	main: fix replay_code report in primary mode. After a primary --force, the error couldn't go away in case of a defective logfile. Months later, sysadmins were needlessly alarmed when looking at the primary.	2016-08-09 09:37:09 +02:00
Thomas Schoebel-Theuer	e969219fca	main: safeguard versionlink appearance. In some rare cases (e.g. damaged /mars or crashed primaries), the versionlink belonging to a logfile may be missing. Don't insist on the existence of a versionlink if the logfile originates from this host itself (automatic self-repair).	2016-08-09 09:37:09 +02:00
Thomas Schoebel-Theuer	634499d3d2	all: testing of hangs	2016-08-09 09:37:09 +02:00
Thomas Schoebel-Theuer	90653476f6	all: crash testing hardening infrastructure. This is important for even more hardening of MARS. Simulate crashes at the "wrong moment", typically with IO requests flying, or just before a symlink update. Only for debugging. Never use for production.	2016-08-09 09:34:19 +02:00
Thomas Schoebel-Theuer	f89e0a7d96	marsadm: lowlevel IP address commands. This is absolutely necessary for coping with changes in network setups.	2016-03-09 09:42:38 +01:00
Thomas Schoebel-Theuer	e7f41563f2	main: fix livelock at end of sync. Only observed on very fast hardware. Leaving the loop may unnecessarily take a long time.	2016-03-08 11:37:41 +01:00
Thomas Schoebel-Theuer	04b2f2120e	Kbuild: fix external 1&1 build process	2016-03-03 12:42:41 +01:00
Thomas Schoebel-Theuer	a5f8f3e464	main: rename mars_light.c to mars_main.c	2016-03-03 09:35:16 +01:00
Thomas Schoebel-Theuer	4d31d09534	all: remove CONFIG_MARS_BIGMODULE	2016-03-03 09:33:34 +01:00
Thomas Schoebel-Theuer	daa701edf1	light: s/light_class/main_class/g	2016-03-03 09:05:01 +01:00
Thomas Schoebel-Theuer	2990b9362e	light: s/light_thread/main_thread/g	2016-03-03 09:04:04 +01:00
Thomas Schoebel-Theuer	42a8bfaa60	all: s/light_(worker\|checker)/main_\1/g	2016-03-03 08:57:07 +01:00
Thomas Schoebel-Theuer	dd4748bb52	light: clarify code	2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer	8fa728a0c9	light: fix annoying unnecessary error message	2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer	8abcbf196d	light: safeguard sync vs replay	2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer	e70ac4df8c	light: safeguard position update	2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer	fafad9512a	light: always update position symlinks at logger switchoff	2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer	42c2dc98da	light: fix typo in replay link comparison	2016-03-01 11:58:23 +01:00
Thomas Schoebel-Theuer	a312e3d93b	light: fix memory leak regression from `f235b76900`	2016-03-01 11:58:09 +01:00
Thomas Schoebel-Theuer	8bc1e80488	light: safeguard skipping of logfiles in disconnected state. Found by code inspection, not in practice or by testing. Should not occur in practice, because it could only occur after marsadm pause-fetch, which is an exceptional state only to be entered for maintenance or for emergency failover. Skipping over an incorrect logfile at a secondary may produce an unnecessary split brain. Fix the potential problem by doing it only after "primary --force", and by never creating a new logfile but always re-using existing logfiles.	2016-02-10 06:44:00 +01:00
Thomas Schoebel-Theuer	f235b76900	light: fix potential deadlock on restart after inconsistent symlinks. This has been found by testing. In extremely rare cases, such as after crashes at the "wrong moment" or after defective /mars filesystems, the replay link could show a different length than the corresponding versionlink. The versionlink would no longer be updated when, in addition, the logfile had the same length as the replay link. The incorrect versionlink then leads to a deadlock. Fix the problem by using the _minimum_ of all length indicators. For safety, or when in doubt, replay more data, which will in turn update the versionlink again to its correct value.	2016-02-10 06:24:27 +01:00
Thomas Schoebel-Theuer	8e2de8288d	light: fix missing versionlink upon slow or defective IO. Some primary appeared to have died and was rebooted. In the meantime, the old secondary was forcefully switched to primary. Afterwards, the old primary = new secondary got stuck because 2 versionlinks, which had been _produced_ by _itself_, were missing, but they were present at the new primary = old secondary! How could this happen? All transaction logfiles were fully present and correct everywhere. However, the old primary's kern.log showed that a problem with the RAID system must have existed. In addition, the RAID controller errorlog also reported some problems which appeared to have healed. Problem analysis shows the following possibility: the transaction logger can continue to write data, even via fsync(), while the _writeback_ of other parts of the /mars filesystem (e.g. symlink updates) gets stuck for a long time due to an IO problem. Usually, slow or even missing symlink updates are no problem, because upon recovery after a reboot, everything is healed by transaction replay (possibly replaying much more data than really necessary, but this does not affect semantics, and it is even advantageous when RAID disks might contain defective data). There is one exception: after a logrotate, the corresponding new versionlink should appear within a short time. Otherwise, the scenario described above could emerge. We use sync_filesystem() to ensure that any versionlink update to a _new_ versionlink either is guaranteed to become persistent or (in case of IO problems) makes the mars_light thread hang, which will hopefully be noticed soon by monitoring.	2016-02-03 22:01:48 +01:00
Thomas Schoebel-Theuer	ea48664a14	light: disallow primary from rotating over damaged logfiles. Only a secondary is allowed to do this, because we assume that logfile replay has the property of "anytime consistency" only there. When a primary cannot recover after a crash due to a defective logfile, this is not true. The primary is simply lost in such a (rare) case. Observed 2 times during almost 8 million operating hours. In such a case, the hardware is truly defective, and you have only the following options: 1) switch over to a secondary via "primary --force", OR 2) deconstruct the resource everywhere, run fsck or similar on whatever replica seems to be the best version, and reconstruct the resource from scratch, OR 3) restore your backup.	2016-01-21 08:09:47 +01:00
Thomas Schoebel-Theuer	acdb9d7a42	light: fix reset of replay-code. Reset was forgotten in secondary role. Always do it whenever a logfile is actually rotated.	2016-01-20 14:48:43 +01:00
Thomas Schoebel-Theuer	496e57e1e1	logger: add new indicator for damaged logfiles	2016-01-15 17:10:58 +01:00
Thomas Schoebel-Theuer	d67336420d	light: fix becoming primary when logfiles are damaged. When logfile replay aborted with an error, becoming primary was impossible. Without this fix, repair would only be possible by complete destruction of the resource. A previous version of this patch introduced /proc/sys/mars/allow_primary_when_damaged, which would have complicated the sysadmin interface; people would have been unsure what to do.	2016-01-13 14:12:02 +01:00
Thomas Schoebel-Theuer	3eedff125d	infra: fix comparison. Under weird circumstances, when the new symlink contents were just a shortened version (prefix) of the old contents, the symlink was not updated.	2016-01-02 10:18:33 +01:00
Thomas Schoebel-Theuer	d18c60f232	infra: fix potential fault. Very old idiotic bug. Under some circumstances, a byte beyond the end of a non-null-terminated string (such as produced by the VFS) might be read, potentially leading to a page fault just one byte after a page border.	2016-01-02 10:18:33 +01:00
Thomas Schoebel-Theuer	25d954051b	logger: move ranking array from stack to brick instance. Don't allocate this on the stack, it might grow too big in future. Reduces the risk of stack overflows (not observed so far, but suspected).	2016-01-02 10:18:22 +01:00
Thomas Schoebel-Theuer	045d0e0356	logger: fix potential deadlock caused by incorrect accounting. Never observed in practice, found by testing with kernel upstream versions.	2016-01-02 09:43:22 +01:00
Thomas Schoebel-Theuer	c1ee80f9f4	server: fix memory leak on writes. This went unnoticed for a long time because it simply did not occur in ordinary MARS Light workloads.	2015-10-19 07:24:20 +02:00
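
The two infra fixes from 2016-01-02 (3eedff125d and d18c60f232) both concern comparing symlink contents. The following is a minimal C sketch, not the MARS infra code. It assumes the old target arrives as a buffer plus a length without a terminating NUL, as readlink()-style interfaces deliver it. Comparing only strlen(new) bytes (e.g. with strncmp()) would report a shortened prefix as "unchanged", and calling strcmp() on the raw buffer could read one byte past its end. The helper name symlink_target_differs() is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only, not taken from MARS):
 * returns true if the old symlink target, given as a buffer of
 * old_len bytes that is NOT null-terminated, differs from new_str.
 */
static bool symlink_target_differs(const char *old_buf, size_t old_len,
                                   const char *new_str)
{
	size_t new_len = strlen(new_str);

	/* Different lengths always mean different contents. This also
	 * catches the case where new_str is a prefix of the old target. */
	if (old_len != new_len)
		return true;

	/* Same length: compare exactly old_len bytes, so old_buf[old_len]
	 * (possibly beyond the end of the buffer) is never accessed. */
	return memcmp(old_buf, new_str, old_len) != 0;
}
```

Checking the length first means a shortened target is always treated as a change, and memcmp() with an explicit length never depends on a terminator that the buffer may not have.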


354 Commits