RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2024-12-20 18:33:44 +00:00

Author	SHA1	Message	Date
Josh Durgin	e32874fc5a	objecter: resend all writes after osdmap loses the full flag Now that the osd does not respond if it gets a map with the full flag set first, clients need to resend all writes. Clients talking to old osds are still subject to the race condition, so both sides must be upgraded to avoid it. Refs: #6938 Backport: dumpling, emperor Signed-off-by: Josh Durgin <josh.durgin@inktank.com>	2013-12-06 14:33:35 -08:00
Josh Durgin	4111729dda	osd: drop writes when full instead of returning an error There's a race between the client and osd with a newly marked full osdmap. If the client gets the new map first, it blocks writes and everything works as expected, with no errors from the osd. If the osd gets the map first, however, it will respond to any writes with -ENOSPC. Clients will pass this up the stack, and not retry these writes later. -ENOSPC isn't handled well by all clients. RBD, for example, may pass it on to qemu or kernel rbd which will both interpret it as EIO. Filesystems on top of rbd will not behave well when they receive EIOs like this, especially if the cluster oscillates between full and not full, so some writes succeed. To fix this, never return ENOSPC from the osd because of a map marked full, and rely on the client to retry all writes when the map is no longer marked full. Old clients talking to osds with this fix will hang instead of propagating an error, but only if they run into this race condition. ceph-fuse and rbd with caching enabled are not affected, since the ObjectCacher will retry writes that return errors. Refs: #6938 Backport: dumpling, emperor Signed-off-by: Josh Durgin <josh.durgin@inktank.com>	2013-12-06 14:33:26 -08:00
Sage Weil	1d5427a790	Merge pull request #907 from ceph/wip-3x osd: default to 3x replication	2013-12-06 14:25:38 -08:00
Sage Weil	384f01dfd3	crush/mapper: dump indep partial progression for debugging ...if DEBUG_INDEP is #defined. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:03 -08:00
Sage Weil	e632a79b3c	PendingReleaseNotes: note change of CRUSH indep mode in release notes Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:03 -08:00
Sage Weil	c853019475	crush: add feature CRUSH_V2 for new indep mode and SET_*_TRIES rule steps Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:03 -08:00
Sage Weil	caa0e22e15	crush: CHOOSE_LEAF -> CHOOSELEAF throughout This aligns the internal identifier names with the user-visible names in the decompiled crush map language. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:03 -08:00
Sage Weil	431a13eb37	osd/OSDMap: fix feature calculation for CACHEPOOL We need to include the feature in the mask. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:02 -08:00
Sage Weil	03911b07e0	crush/CrushCompiler: [de]compile set_choose[leaf]_tries rule step Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:02 -08:00
Sage Weil	09ce7a2bd3	crush/CrushWrapper: set chooseleaf_tries to 5 for 'simple' indep rules When making a generic indep rule, set the recursive retry to 5. This gives better overall results. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:02 -08:00
Sage Weil	d1b97462cf	crush/mapper: add SET_CHOOSE_TRIES rule step Since we can specify the recursive retries in a rule, we may as well also specify the non-recursive tries too for completeness. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:02 -08:00
Sage Weil	64aeded50d	crush/mapper: apply chooseleaf_tries to firstn mode too Parameterize the attempts for the _firstn choose method, and apply the rule-specified tries count to firstn mode as well. Note that we have slightly different behavior here than with indep: If the firstn value is not specified for firstn, we pass through the normal attempt count. This maintains compatibility with legacy behavior. Note that this is usually not actually N^2 work, though, because of the descend_once tunable. However, descend_once is unfortunately not the same thing as 1 chooseleaf try because it is only checked on a reject but not on a collision. Sigh. In contrast, for indep, if tries is not specified we default to 1 recursive attempt, because that is simply more sane, and we have the option to do so. The descend_once tunable has no effect for indep. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:02 -08:00
Sage Weil	cb88763ccb	crush/mapper: fix up the indep tests Fix indentation. Simplify+fix the changed vs moved calculation. Use the new SET_CHOOSE_LEAF_TRIES command. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:02 -08:00
Sage Weil	580cf5f68c	Merge pull request #886 from ceph/wip-6922 Fix some pg_num change return codes and make them more resistant to mis-use Reviewed-by: Sage Weil <sage@inktank.com>	2013-12-06 14:15:56 -08:00
Sage Weil	63755c42f9	Merge pull request #909 from dachary/wip-crush-unittest more CrushWrapper unittest	2013-12-06 12:35:52 -08:00
Loic Dachary	4e26cc0dac	crush: unittest CrushWrapper::get_immediate_parent Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-06 20:40:48 +01:00
Loic Dachary	09938e6455	crush: unittest CrushWrapper::update_item Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-06 20:40:48 +01:00
Loic Dachary	16ac59042e	crush: unittest s/std::string/string/ Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-06 20:40:48 +01:00
Loic Dachary	b8190180c3	crush: unittest use const instead of define And reduce the depth of the hierarchy because three levels of buckets capture the same cases as four levels. Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-06 20:40:48 +01:00
Loic Dachary	dc095214d3	crush: unittest CrushWrapper::check_item_loc Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-06 20:40:48 +01:00
Loic Dachary	000c59a9a2	crush: unittest remove useless c->create() Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-06 20:40:48 +01:00
Yehuda Sadeh	516788d15b	Merge remote-tracking branch 'origin/next'	2013-12-06 11:24:06 -08:00
Sage Weil	cb26fbde52	osd: default to 3x replication 3x is the recommendation; it should be the default too. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 10:35:45 -08:00
Sage Weil	f4c16236b7	Merge pull request #901 from dachary/wip-crush-unittest crush: check for invalid names in loc[] Reviewed-by: Sage Weil <sage@inktank.com>	2013-12-06 08:29:01 -08:00
Loic Dachary	aedbc99ffc	crush: check for invalid names in loc[] Add the is_valid_crush_loc helper to test for invalid crush names in insert_item and update_item, before performing any side effect. Implement the associated unit tests. Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-06 09:43:47 +01:00
Sage Weil	fe03ad2801	osd: queue pg deletion after on_removal txn The removal is normally so slow that these don't really race, but they could. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-05 23:13:28 -08:00
Sage Weil	aa63d6730a	os/MemStore: implement reference 'memstore' backend This is (as near to) a trivial ObjectStore backend for the OSD as we can get at the moment. Everything is stored in memory. We are slightly tricky with the locking, but not overly so. On umount we dump everything out to disk, and on mount we load it all in again, so we have some very coarse persistence/durability... just enough to make this usable in a non-failure environment. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-05 23:13:28 -08:00
João Eduardo Luís	80fb336b4c	Merge pull request #900 from ceph/wip-mon-mds-trim mon: MDSMonitor: trim versions and let PaxosService decide whether to propose We were not trimming mdsmap versions and were generating a new map every time we modified the pending value. Now we not only make sure that MDSMonitor will trim old maps (configurable option allowing us to set the maximum number of maps to keep, defaulting to 500, much like other services do) but we also delegate to PaxosService the decision on whether to propose our pending value. We also perform several modifications to 'ceph-kvstore-tool', allowing one to obtain the contents of a given prefix:key and have them outputted to a file instead of stdout, and also add support for getting the size of a given prefix:key's value. 'ceph report' was also modified so that we always output the first and last committed versions for all services; up until this point, we would only output the first committed version on all services, and only a few were also outputting the last committed version. Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2013-12-05 18:15:21 -08:00
Joao Eduardo Luis	47ee79704f	mon: ceph-kvstore-tool: get size of value for prefix/key Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-12-06 01:06:17 +00:00
Joao Eduardo Luis	c98c1043e3	tools: ceph-kvstore-tool: output value contents to file on 'get' Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-12-06 01:06:17 +00:00
Joao Eduardo Luis	00048fe33f	mon: Have 'ceph report' print last committed versions Only for those services that weren't doing it. Backport: dumpling Backport: emperor Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-12-06 01:06:16 +00:00
Joao Eduardo Luis	cc64382822	mon: MDSMonitor: let PaxosService decide on whether to propose Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-12-06 01:06:11 +00:00
Sage Weil	5823146077	os/ObjectStore: make getattrs() pure virtual It is required. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-05 15:33:20 -08:00
tamil	11e26ee424	s/true/1 and s/false/0 Signed-off-by: tamil <tamil.muthamizhan@inktank.com>	2013-12-05 13:05:12 -08:00
Joao Eduardo Luis	cf099415ad	mon: MDSMonitor: implement 'get_trim_to()' to let the mon trim mdsmaps This commit also adds two options to the MDSMonitor: - mon_max_mdsmap_epochs: the maximum amount of maps we'll keep (def: 500) - mon_mds_force_trim: the version we want to trim to This results in 'get_trim_to()' returning the possible values: - if we have set mon_mds_force_trim, and this value is greater than the last committed version, trim to mon_mds_force_trim - if we hold more than the max number of maps, trim to last - max - if we have set mon_mds_force_trim and if we hold more than the max number of maps, and mon_mds_force_trim is lower than last - max, then trim to last - max Backport: dumpling Backport: emperor Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-12-05 17:47:37 +00:00
Joao Eduardo Luis	3e845b56a3	mon: MDSMonitor: print map on encode_pending() iff debug mon = 30+ Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-12-05 17:41:37 +00:00
Joao Eduardo Luis	62fb47509b	mon: MDSMonitor: consider 'debug level' parameter on 'print_map()' The parameter was there, just not used. It does default to 7, so existing callers are okay. Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-12-05 17:41:37 +00:00
Joao Eduardo Luis	032a00bb35	mon: MDSMonitor: remove reference to no-longer-used encode_trim() We weren't using it and it's no longer used by anyone anyway. Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-12-05 17:41:37 +00:00
Sage Weil	39ddd213da	Merge pull request #899 from dachary/wip-crush-unittest CrushWrapper::insert_item unittest and minor fixes Reviewed-by: Sage Weil <sage@inktank.com>	2013-12-05 09:18:50 -08:00
Loic Dachary	ccc6014512	crush: CrushWrapper unit tests Covers all cases for the following methods. All but insert_item are trivial. * insert_item * set_item_name * name_exists * item_exists * get_item_id * get_item_name * get_num_type_names * get_type_id * get_type_name * is_valid_crush_name Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-05 18:07:03 +01:00
Loic Dachary	b9bff8e8cb	crush: remove redundant test in insert_item A year after the last modification of test to check if an item was added twice to the same bucket, the subtree_contains test was added a few lines above it, making it redundant. Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-05 18:07:03 +01:00
Loic Dachary	8af75968ac	crush: insert_item returns on error if bucket name is invalid A bucket name may be created as a side effect of insert_item. All names in the loc argument are checked for validity at the beginning of the method and an error is returned immediately if one is found. This allows to not check for errors when setting the name of an item later on. Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-05 18:06:55 +01:00
Sage Weil	3b8371a4bf	os/ObjectStore: prevent copying Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-04 14:46:49 -08:00
Sage Weil	a70200e329	os/ObjectStore: pass cct to ctor Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-04 14:46:40 -08:00
Loic Dachary	0dd7d2985a	Merge pull request #892 from jpds/ceph-disk-journal-mbrtogpt Call --mbrtogpt on journal run of sgdisk should the drive require a GPT ... Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Loic Dachary <loic@dachary.org>	2013-12-04 11:42:30 -08:00
Sage Weil	ea600d0e0b	Merge pull request #782 from danchai/master ObjBencher: add rand_read_bench to support rand test in rados-bench	2013-12-04 07:42:48 -08:00
Jonathan Davies	35011e0b01	Call --mbrtogpt on journal run of sgdisk should the drive require a GPT table. Signed-off-by: Jonathan Davies <jonathan.davies@canonical.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Loic Dachary <loic@dachary.org>	2013-12-04 13:57:13 +00:00
danchai	cae10830c7	ObjBencher: add rand_read_bench functions to support rand test in rados-bench Signed-off-by: Tengwei Cai <tengweicai@gmail.com>	2013-12-04 15:41:42 +08:00
Sage Weil	e829859291	doc/rados/operations/crush: fix more Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-03 22:46:37 -08:00
Sage Weil	7709a10f52	doc/rados/operations/crush: fix rst Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-03 22:18:50 -08:00
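
The client/OSD interaction described in commits 4111729dda and e32874fc5a (the OSD drops writes without replying while its map is marked full, and the client resends all writes once the full flag clears) can be illustrated with a toy model. This is only a sketch and not Ceph's Objecter or OSD code: `ToyOsd`, `ToyClient`, and all of their methods are hypothetical names invented for the example, and the real map handling and messaging are far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOsd:
    """Hypothetical OSD: drops writes while its map is marked full."""
    full: bool = False
    stored: list = field(default_factory=list)

    def handle_write(self, data):
        if self.full:
            return False  # dropped: no reply at all, not -ENOSPC
        self.stored.append(data)
        return True  # acknowledged


@dataclass
class ToyClient:
    """Hypothetical client: keeps unacknowledged writes and resends them."""
    osd: ToyOsd
    map_full: bool = False
    pending: list = field(default_factory=list)

    def write(self, data):
        self.pending.append(data)
        if not self.map_full:
            self._send_pending()

    def handle_map(self, full):
        was_full = self.map_full
        self.map_full = full
        if was_full and not full:
            # The full flag was cleared: resend every outstanding write.
            self._send_pending()

    def _send_pending(self):
        still_pending = []
        for data in self.pending:
            if not self.osd.handle_write(data):
                still_pending.append(data)  # no ack, keep for a later resend
        self.pending = still_pending


def demo():
    osd = ToyOsd()
    client = ToyClient(osd)
    osd.full = True            # the OSD sees the full map first (the race)
    client.write("a")          # sent, silently dropped, stays pending
    client.handle_map(True)    # client catches up with the full map
    client.write("b")          # blocked locally while the map is full
    osd.full = False
    client.handle_map(False)   # flag cleared: all pending writes are resent
    return osd.stored, client.pending
```

In this model, `demo()` ends with both writes stored on the OSD and nothing left pending on the client. A client that never resends (the "old client" case in the commit message) would leave `"a"` pending indefinitely, which corresponds to the hang the message warns about.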

30072 Commits