RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2024-12-28 22:43:29 +00:00

Author	SHA1	Message	Date
Greg Farnum	9776e97af2	osd/PG: factor out get_next_version() Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Farnum <greg@inktank.com>	2013-12-06 14:37:26 -08:00
Greg Farnum	0b0d1e8e42	librados: add wait_for_latest_osdmap() There are times when users may need to make sure the client has the latest osdmap, for example after sending a mon command modifying pool properties. Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Farnum <greg@inktank.com> squash "librados: add wait_for_latest_osdmap()"	2013-12-06 14:37:26 -08:00
Sage Weil	828590688f	librados: expose methods for calculating object hash position Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-12-06 14:37:26 -08:00
Sage Weil	4b5ab3f106	osdc/Objecter: expose methods for getting object hash position and pg Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-12-06 14:37:26 -08:00
Sage Weil	92879f7787	osd: capture hashing of objects to hash positions/pgs in pg_pool_t The hashing is dependent on pool properties; capture (more of) it in a method instead of having it in OSDMap. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-12-06 14:37:25 -08:00
Sage Weil	76e0b88f56	osd/OSDMap: use new object_locator_t::hash to place object in a pg The hash value, if provided, becomes the ps (placement seed) portion of the pg_t, skipping any hashing of the object name (or locator key). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-12-06 14:37:25 -08:00
Greg Farnum	d692da34ab	osd/osd_types: add explicit hash to object_locator_t Instead of hashing the object name or key, we allow the hash position to be provided explicitly. Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Farnum <greg@inktank.com>	2013-12-06 14:37:25 -08:00
Greg Farnum	0d4ea9f746	encoding: allow users to specify a different compatv after encoding This way we can set the compatv preferentially depending on whether we've actually encoded new information or not. Signed-off-by: Greg Farnum <greg@inktank.com>	2013-12-06 14:37:25 -08:00
Sage Weil	d2963c0a3d	librados: add mon_command to C++ API This way librados users can execute monitor commands. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-12-06 14:37:25 -08:00
Sage Weil	468fffa529	librados: document aio_flush() Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-12-06 14:37:25 -08:00
Sage Weil	bc7ace2eef	librados: constify inbl command args Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-12-06 14:37:25 -08:00
Sage Weil	a29d4fc3fd	osdc/Objecter: constify inbl command args Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-12-06 14:37:24 -08:00
Sage Weil	fb49065fe7	mon/MonClient: constify inbl command args Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-12-06 14:37:24 -08:00
Sage Weil	ef0f255a4a	osdc/Objecter: reimplement list_objects Return to caller at the end of each PG. This allows the caller to look at the [pg_]hash_position and get something meaningful. If there are no objects in the PG, we skip it so that every callback has some data (unless the pool is totally empty!). So the real difference here is that we don't move on to the next PG just to reach max_entries. This gives the client some data sooner, but may mean more callbacks into client code. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:36:52 -08:00
Sage Weil	d2e6cc635f	librados: add get_pg_hash_position to determine pg while listing objects Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:36:49 -08:00
Sage Weil	eff932c60a	osdc/Objecter: stick bl inside ListContext This is simpler and less error-prone. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:36:45 -08:00
Sage Weil	8e5803abf7	osdc/Objecter: factor pg_read out of list_objects code This will get used later for other ops against PGs (instead of objects). Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:36:41 -08:00
Sage Weil	dd8c939841	osdc/Objecter: separate explicit pg target from current target The pgid field is used to store the pg the op mapped to. We were just setting it directly for PGLS. Instead, fill in a new base_pgid, and copy that to pgid in recalc_op_target(), the same way we do when we map an object name to a PG. In particular, we take this opportunity to map a raw pgid to an actual pgid. This means the base_pg could come from a raw hash value (although it doesn't, yet). Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Farnum <greg@inktank.com>	2013-12-06 14:36:37 -08:00
Sage Weil	9381b69378	osdc/Objecter: drop redundant condition We are inside an if (response_size) block. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:36:34 -08:00
Sage Weil	bffcca6a0a	osd/osd_types: make pref optional in pg_t constructor We don't use preferred placements any more, so this will make it easier to start dropping references to it in new code. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:36:31 -08:00
Josh Durgin	3caf3effcb	rbd: check write return code during bench-write This allows rbd-bench to detect http://tracker.ceph.com/issues/6938 when combined with rapidly changing the mon osd full ratio. Signed-off-by: Josh Durgin <josh.durgin@inktank.com>	2013-12-06 14:33:41 -08:00
Josh Durgin	e32874fc5a	objecter: resend all writes after osdmap loses the full flag Now that the osd does not respond if it gets a map with the full flag set first, clients need to resend all writes. Clients talking to old osds are still subject to the race condition, so both sides must be upgraded to avoid it. Refs: #6938 Backport: dumpling, emperor Signed-off-by: Josh Durgin <josh.durgin@inktank.com>	2013-12-06 14:33:35 -08:00
Josh Durgin	4111729dda	osd: drop writes when full instead of returning an error There's a race between the client and osd with a newly marked full osdmap. If the client gets the new map first, it blocks writes and everything works as expected, with no errors from the osd. If the osd gets the map first, however, it will respond to any writes with -ENOSPC. Clients will pass this up the stack, and not retry these writes later. -ENOSPC isn't handled well by all clients. RBD, for example, may pass it on to qemu or kernel rbd which will both interpret it as EIO. Filesystems on top of rbd will not behave well when they receive EIOs like this, especially if the cluster oscillates between full and not full, so some writes succeed. To fix this, never return ENOSPC from the osd because of a map marked full, and rely on the client to retry all writes when the map is no longer marked full. Old clients talking to osds with this fix will hang instead of propagating an error, but only if they run into this race condition. ceph-fuse and rbd with caching enabled are not affected, since the ObjectCacher will retry writes that return errors. Refs: #6938 Backport: dumpling, emperor Signed-off-by: Josh Durgin <josh.durgin@inktank.com>	2013-12-06 14:33:26 -08:00
Sage Weil	1d5427a790	Merge pull request #907 from ceph/wip-3x osd: default to 3x replication	2013-12-06 14:25:38 -08:00
Sage Weil	384f01dfd3	crush/mapper: dump indep partial progression for debugging ...if DEBUG_INDEP is #defined. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:03 -08:00
Sage Weil	e632a79b3c	PendingReleaseNotes: note change of CRUSH indep mode in release notes Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:03 -08:00
Sage Weil	c853019475	crush: add feature CRUSH_V2 for new indep mode and SET_*_TRIES rule steps Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:03 -08:00
Sage Weil	caa0e22e15	crush: CHOOSE_LEAF -> CHOOSELEAF throughout This aligns the internal identifier names with the user-visible names in the decompiled crush map language. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:03 -08:00
Sage Weil	431a13eb37	osd/OSDMap: fix feature calculation for CACHEPOOL We need to include the feature in the mask. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:02 -08:00
Sage Weil	03911b07e0	crush/CrushCompiler: [de]compile set_choose[leaf]_tries rule step Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:02 -08:00
Sage Weil	09ce7a2bd3	crush/CrushWrapper: set chooseleaf_tries to 5 for 'simple' indep rules When making a generic indep rule, set the recursive retry to 5. This gives better overall results. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:02 -08:00
Sage Weil	d1b97462cf	crush/mapper: add SET_CHOOSE_TRIES rule step Since we can specify the recursive retries in a rule, we may as well also specify the non-recursive tries too for completeness. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:02 -08:00
Sage Weil	64aeded50d	crush/mapper: apply chooseleaf_tries to firstn mode too Parameterize the attempts for the _firstn choose method, and apply the rule-specified tries count to firstn mode as well. Note that we have slightly different behavior here than with indep: If the firstn value is not specified for firstn, we pass through the normal attempt count. This maintains compatibility with legacy behavior. Note that this is usually not actually N^2 work, though, because of the descend_once tunable. However, descend_once is unfortunately not the same thing as 1 chooseleaf try because it is only checked on a reject but not on a collision. Sigh. In contrast, for indep, if tries is not specified we default to 1 recursive attempt, because that is simply more sane, and we have the option to do so. The descend_once tunable has no effect for indep. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:02 -08:00
Sage Weil	cb88763ccb	crush/mapper: fix up the indep tests Fix indentation. Simplify+fix the changed vs moved calculation. Use the new SET_CHOOSE_LEAF_TRIES command. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 14:24:02 -08:00
Sage Weil	580cf5f68c	Merge pull request #886 from ceph/wip-6922 Fix some pg_num change return codes and make them more resistant to mis-use Reviewed-by: Sage Weil <sage@inktank.com>	2013-12-06 14:15:56 -08:00
Sage Weil	63755c42f9	Merge pull request #909 from dachary/wip-crush-unittest more CrushWrapper unittest	2013-12-06 12:35:52 -08:00
Loic Dachary	4e26cc0dac	crush: unittest CrushWrapper::get_immediate_parent Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-06 20:40:48 +01:00
Loic Dachary	09938e6455	crush: unittest CrushWrapper::update_item Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-06 20:40:48 +01:00
Loic Dachary	16ac59042e	crush: unittest s/std::string/string/ Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-06 20:40:48 +01:00
Loic Dachary	b8190180c3	crush: unittest use const instead of define And reduce the depth of the hierarchy because three levels of buckets capture the same cases as four levels. Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-06 20:40:48 +01:00
Loic Dachary	dc095214d3	crush: unittest CrushWrapper::check_item_loc Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-06 20:40:48 +01:00
Loic Dachary	000c59a9a2	crush: unittest remove useless c->create() Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-06 20:40:48 +01:00
Yehuda Sadeh	516788d15b	Merge remote-tracking branch 'origin/next'	2013-12-06 11:24:06 -08:00
Sage Weil	cb26fbde52	osd: default to 3x replication 3x is the recommendation; it should be the default too. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-06 10:35:45 -08:00
Sage Weil	f4c16236b7	Merge pull request #901 from dachary/wip-crush-unittest crush: check for invalid names in loc[] Reviewed-by: Sage Weil <sage@inktank.com>	2013-12-06 08:29:01 -08:00
Loic Dachary	aedbc99ffc	crush: check for invalid names in loc[] Add the is_valid_crush_loc helper to test for invalid crush names in insert_item and update_item, before performing any side effect. Implement the associated unit tests. Signed-off-by: Loic Dachary <loic@dachary.org>	2013-12-06 09:43:47 +01:00
Sage Weil	fe03ad2801	osd: queue pg deletion after on_removal txn The removal is normally so slow that these don't really race, but they could. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-05 23:13:28 -08:00
Sage Weil	aa63d6730a	os/MemStore: implement reference 'memstore' backend This is (as near to) a trivial ObjectStore backend for the OSD as we can get at the moment. Everything is stored in memory. We are slightly tricky with the locking, but not overly so. On umount we dump everything out to disk, and on mount we load it all in again, so we have some very coarse persistence/durability... just enough to make this usable in a non-failure environment. Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-05 23:13:28 -08:00
João Eduardo Luís	80fb336b4c	Merge pull request #900 from ceph/wip-mon-mds-trim mon: MDSMonitor: trim versions and let PaxosService decide whether to propose We were not trimming mdsmap versions and were generating a new map every time we modified the pending value. Now we not only make sure that MDSMonitor will trim old maps (configurable option allowing us to set the maximum number of maps to keep, defaulting to 500, much like other services do) but we also delegate to PaxosService the decision on whether to propose our pending value. We also perform several modifications to 'ceph-kvstore-tool', allowing one to obtain the contents of a given prefix:key and have them outputted to a file instead of stdout, and also add support for getting the size of a given prefix:key's value. 'ceph report' was also modified so that we always output the first and last committed versions for all services; up until this point, we would only output the first committed version on all services, and only a few were also outputting the last committed version. Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2013-12-05 18:15:21 -08:00
Joao Eduardo Luis	47ee79704f	mon: ceph-kvstore-tool: get size of value for prefix/key Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-12-06 01:06:17 +00:00
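
Commits d692da34ab and 76e0b88f56 above describe an explicit hash on object_locator_t that, when provided, becomes the ps (placement seed) of the pg_t and skips hashing of the object name or locator key. A minimal Python sketch of that selection logic follows. It is an illustration only, not Ceph's code: the names ObjectLocator, hash_pos, placement_seed and raw_pg are hypothetical, None stands in for "no explicit hash", and zlib.crc32 is a stand-in for whatever hash function the pool actually uses.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ObjectLocator:
    """Hypothetical stand-in for object_locator_t (not the real type)."""
    pool: int
    key: str = ""                   # optional locator key
    hash_pos: Optional[int] = None  # explicit hash position, if provided


def placement_seed(name: str, loc: ObjectLocator) -> int:
    """Return a 32-bit placement seed for an object.

    If an explicit hash position is given it is used directly, so the
    object name and locator key are never hashed. Otherwise the locator
    key (if set) or the object name is hashed. crc32 is an assumption
    made for this sketch only.
    """
    if loc.hash_pos is not None:
        return loc.hash_pos & 0xFFFFFFFF
    source = loc.key if loc.key else name
    return zlib.crc32(source.encode("utf-8")) & 0xFFFFFFFF


def raw_pg(name: str, loc: ObjectLocator) -> Tuple[int, int]:
    """Return a (pool, seed) pair, a simplified picture of a raw pg id."""
    return (loc.pool, placement_seed(name, loc))
```

With an explicit position, raw_pg("foo", ObjectLocator(pool=3, hash_pos=42)) yields (3, 42) regardless of the object name. Two objects that share a locator key get the same seed, which is the usual reason for setting a key.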


30244 Commits