This aligns the internal identifier names with the user-visible names in
the decompiled crush map language.
Signed-off-by: Sage Weil <sage@inktank.com>
Since we can specify the recursive retries in a rule, we may as well also
specify the non-recursive tries too for completeness.
Signed-off-by: Sage Weil <sage@inktank.com>
Parameterize the attempts for the _firstn choose method, and apply the
rule-specified tries count to firstn mode as well. Note that we have
slightly different behavior here than with indep:
If the firstn value is not specified for firstn, we pass through the
normal attempt count. This maintains compatibility with legacy behavior.
Note that this is usually *not* actually N^2 work, though, because of the
descend_once tunable. However, descend_once is unfortunately *not* the
same thing as 1 chooseleaf try because it is only checked on a reject but
not on a collision. Sigh.
In contrast, for indep, if tries is not specified we default to 1
recursive attempt, because that is simply more sane, and we have the
option to do so. The descend_once tunable has no effect for indep.
Signed-off-by: Sage Weil <sage@inktank.com>
And reduce the depth of the hierarchy because three levels of buckets
capture the same cases as four levels.
Signed-off-by: Loic Dachary <loic@dachary.org>
Add the is_valid_crush_loc helper to test for invalid crush names in
insert_item and update_item, before performing any side
effect. Implement the associated unit tests.
Signed-off-by: Loic Dachary <loic@dachary.org>
This is (as near to) a trivial ObjectStore backend for the OSD as we can
get at the moment. Everything is stored in memory. We are slightly
tricky with the locking, but not overly so.
On umount we dump everything out to disk, and on mount we load it all in
again, so we have some very coarse persistence/durability... just enough
to make this usable in a non-failure environment.
Signed-off-by: Sage Weil <sage@inktank.com>
mon: MDSMonitor: trim versions and let PaxosService decide whether to propose
We were not trimming mdsmap versions and were generating a new map every time
we modified the pending value.
Now we not only make sure that MDSMonitor will trim old maps (configurable
option allowing us to set the maximum number of maps to keep, defaulting to 500,
much like other services do) but we also delegate to PaxosService the decision on
whether to propose our pending value.
We also perform several modifications to 'ceph-kvstore-tool', allowing one to obtain
the contents of a given prefix:key and have them outputted to a file instead of stdout,
and also add support for getting the size of a given prefix:key's value.
'ceph report' was also modified so that we always output the first and last
committed versions for all services; up until this point, we would only output the
first committed version on all services, and only a few were also outputting the
last committed version.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
This commit also adds two options to the MDSMonitor:
- mon_max_mdsmap_epochs: the maximum amount of maps we'll keep (def: 500)
- mon_mds_force_trim: the version we want to trim to
This results in 'get_trim_to()' returning the possible values:
- if we have set mon_mds_force_trim, and this value is greater than the
last committed version, trim to mon_mds_force_trim
- if we hold more than the max number of maps, trim to last - max
- if we have set mon_mds_force_trim and if we hold more than the max
number of maps, and mon_mds_force_trim is lower than last - max,
then trim to last - max
Backport: dumpling
Backport: emperor
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
A year after the last modification of test to check if an item was added
twice to the same bucket, the subtree_contains test was added a few
lines above it, making it redundant.
Signed-off-by: Loic Dachary <loic@dachary.org>
A bucket name may be created as a side effect of insert_item. All names
in the loc argument are checked for validity at the beginning of the
method and an error is returned immediately if one is found. This allows
to not check for errors when setting the name of an item later on.
Signed-off-by: Loic Dachary <loic@dachary.org>
Call --mbrtogpt on journal run of sgdisk should the drive require a GPT ...
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Loic Dachary <loic@dachary.org>
In order to support this, we pass TPHandle* through the ObjectStore
interface, and then if we have one we suspend the timeouts while
waiting to get our op/byte throttles in the FileStore.
The OSD uses this when doing PG splits.
Signed-off-by: Greg Farnum <greg@inktank.com>