fixes: #4982
fixes: #4983
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
This prevents an assert caused by unexpected scrub results from the previous
scrub on the leader.
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Remove locator arg from get_object_context()/find_object_context()
Remove locator from object_info_t but retain encode format
Remove locator from object_info_t dump output
Remove OLOC_BLANK
Signed-off-by: David Zafman <david.zafman@inktank.com>
Parse namespace spec in osd caps and use in is_match()
Add test cases to unit test
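As a rough illustration of namespace-aware cap matching, here is a minimal
sketch in Python. The CapMatch class, the parse_match() helper and the
"pool=... namespace=..." syntax are assumptions made for this example; they
are not the actual osd caps grammar or the real is_match() signature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CapMatch:
    """One match clause of a capability grant (hypothetical shape)."""
    pool: Optional[str] = None        # None means "any pool"
    namespace: Optional[str] = None   # None means "any namespace"

    def is_match(self, pool: str, namespace: str) -> bool:
        # A clause matches only if every field it specifies agrees.
        if self.pool is not None and self.pool != pool:
            return False
        if self.namespace is not None and self.namespace != namespace:
            return False
        return True


def parse_match(spec: str) -> CapMatch:
    """Parse 'pool=<name> namespace=<name>' (illustrative syntax only)."""
    fields = {}
    for token in spec.split():
        key, sep, value = token.partition("=")
        if not sep or key not in ("pool", "namespace"):
            raise ValueError(f"bad match token: {token!r}")
        fields[key] = value
    return CapMatch(fields.get("pool"), fields.get("namespace"))
```

In this sketch a clause that omits the namespace matches every namespace in
the pool, while an explicit empty value ("namespace=") matches only the
default namespace.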
feature: #4983 (OSD: namespaces pt 2 (caps))
Signed-off-by: David Zafman <david.zafman@inktank.com>
Add rados_ioctx_namespace_set_key() and librados::IoCtx::namespace_set_key()
Add namespace to admin-daemon operations
Support namespace in osd map command
Add namespace to object_locator_t and hobject_t
Add random namespaces to psim program
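To illustrate what adding a namespace to the object identity means, here is
a minimal sketch. The ObjectId class and its fields are hypothetical and much
simpler than hobject_t; the point is only that the same object name in two
namespaces names two distinct objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ObjectId:
    """Simplified object identity (hypothetical, not hobject_t)."""
    pool: int
    namespace: str   # "" stands for the default namespace
    name: str


# The same name in different namespaces yields distinct objects.
default_foo = ObjectId(pool=1, namespace="", name="foo")
tenant_foo = ObjectId(pool=1, namespace="tenant1", name="foo")
objects = {default_foo, tenant_foo}
```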
Feature: #4982 (OSD: namespaces pt 1 (librados/osd, not caps))
Signed-off-by: David Zafman <david.zafman@inktank.com>
This rarely represents an actual eversion_t as the epoch and seq values are
bumped semi-independently to ensure it is always unique. Break it into
two separate fields to avoid confusion.
Drop now-unused and slightly curious inc() method.
Signed-off-by: Sage Weil <sage@inktank.com>
We need to take PGs whose mapping has not changed in a long time into
account. For them, the pg state will indicate it was clean at the time of
the report, in which case we can use that as a lower-bound on their actual
latest epoch clean. If they are not currently clean (at report time), use
the last_epoch_clean value.
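The rule above can be sketched as follows. The PGStat record and its field
names are assumptions for illustration and do not mirror the real pg stat
structure.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PGStat:
    """Subset of a PG stat report (hypothetical field names)."""
    reported_epoch: int     # map epoch at the time of the report
    last_epoch_clean: int   # last epoch the PG recorded as clean
    clean: bool             # was the PG clean when it reported?


def clean_lower_bound(stat: PGStat) -> int:
    # A PG that was clean at report time was clean at least as recently
    # as the report epoch, so that epoch is a valid lower bound.
    if stat.clean:
        return stat.reported_epoch
    return stat.last_epoch_clean


def min_last_epoch_clean(stats: Iterable[PGStat]) -> Optional[int]:
    # The cluster-wide bound is the minimum over all PGs.
    bounds = [clean_lower_bound(s) for s in stats]
    return min(bounds) if bounds else None
```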
Fixes: #5519
Signed-off-by: Sage Weil <sage@inktank.com>
The mon needs a moderately accurate last_epoch_clean value in order to trim
old osdmaps. To prevent a PG that hasn't peered or received IO in forever
from preventing this, send pg stats at some minimum frequency. This will
increase the pg stat report workload for the mon over an idle pool, but
should be no worse than a cluster that is getting actual IO and sees these
updates from normal stat updates.
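A minimal sketch of the "minimum frequency" decision, assuming a
hypothetical max_interval setting and a simple clock in seconds; the real
option name and the OSD's reporting path are not shown in this message.

```python
def should_send_stats(now: float, last_sent: float,
                      stats_changed: bool, max_interval: float) -> bool:
    """Decide whether a PG should report its stats now (illustrative)."""
    # Normal case: something changed, so report as usual.
    if stats_changed:
        return True
    # Idle case: report anyway once max_interval has elapsed, so the
    # mon keeps receiving a reasonably fresh last_epoch_clean.
    return now - last_sent >= max_interval
```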
This makes the reported update a bit more aggressive/useful in that the epoch
is the last map epoch processed by this PG and not just one that is >= the
current interval. Note that the semantics of this field are pretty useless
at this point.
See #5519
Signed-off-by: Sage Weil <sage@inktank.com>
In certain cases the admin may know that it is safe to trim old osdmaps but
a bug or other issue is preventing the Monitor from deciding on its own.
Signed-off-by: Sage Weil <sage@inktank.com>
This particular failure is easily triggered by the crush_ops.sh
workunit. Make it a bit less likely to fail.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Call get_trim_to() when we need to know how much to trim (if any), and
calculate it then. No need to keep this in a hidden trim_version
variable and remember to update it. This drops several helpers and
accessors and makes get_trim_to() a single method that services need to
override.
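The shape of this change can be sketched as below. The class layout,
maybe_trim() and the oldest_needed field are assumptions for illustration;
only the get_trim_to() name comes from this message.

```python
class PaxosService:
    """Base service: computes the trim target on demand (sketch)."""

    def __init__(self, first_committed: int):
        self.first_committed = first_committed

    def get_trim_to(self) -> int:
        # Default: nothing to trim. Services override this.
        return 0

    def maybe_trim(self) -> int:
        # Ask for the target only when we actually need it, instead of
        # caching it in a separately maintained variable.
        trim_to = self.get_trim_to()
        if trim_to <= self.first_committed:
            return 0
        removed = trim_to - self.first_committed
        self.first_committed = trim_to
        return removed


class MapService(PaxosService):
    """Example service that can drop versions below oldest_needed."""

    def __init__(self, first_committed: int, oldest_needed: int):
        super().__init__(first_committed)
        self.oldest_needed = oldest_needed

    def get_trim_to(self) -> int:
        return self.oldest_needed
```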
Signed-off-by: Sage Weil <sage@inktank.com>
Add a paranoid check to prevent us from forgetting how far ahead our
last_committed was when we sync. This prevents an ill-timed forced-sync
from allowing paxos to warp back in time.
This should never happen unless there is a perfect storm of bad admin
decisions and/or bugs, but we guard against it anyway.
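A minimal sketch of such a guard, assuming a hypothetical SyncGuard helper
that remembers the highest last_committed seen before a sync begins. In a
real monitor this value would have to be persisted across the sync; the
sketch keeps it in memory only.

```python
class SyncGuard:
    """Refuse to finish a sync that would move paxos backwards (sketch)."""

    def __init__(self) -> None:
        self.floor = 0  # highest last_committed seen before any sync

    def start_sync(self, last_committed: int) -> None:
        # Remember how far ahead we were; never lower the floor.
        self.floor = max(self.floor, last_committed)

    def finish_sync(self, new_last_committed: int) -> None:
        if new_last_committed < self.floor:
            raise RuntimeError(
                f"sync would warp back from {self.floor} "
                f"to {new_last_committed}")
```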
See: #5256
Signed-off-by: Sage Weil <sage@inktank.com>
The sync no longer cares if we trim Paxos versions as we go, as long as we
don't trim so fast that we fall behind between GET_CHUNK messages, which
we can consider a tuning problem.
Remove this extra complexity!
Signed-off-by: Sage Weil <sage@inktank.com>
We were using paxos_max_join_drift to control the minimum number of
paxos transactions to keep around. Instead, make this explicit, and
separate from the join drift.
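As an illustration of an explicit minimum, here is a small sketch. The
min_keep parameter is a stand-in for the new setting, whose actual name is
not given in this message.

```python
def paxos_trim_to(first_committed: int, last_committed: int,
                  min_keep: int) -> int:
    """Return the version up to which trimming is allowed (sketch).

    Versions older than the returned value may be removed, which leaves
    at least min_keep recent transactions in place.
    """
    target = last_committed - min_keep
    # Never report a target below what is already the oldest version.
    return max(first_committed, target)
```

A result equal to first_committed means there is nothing to trim.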
Signed-off-by: Sage Weil <sage@inktank.com>
The previous sync implementation was highly stateful and very complex.
This made it very hard to understand and to debug, and there were bugs
still lurking in the timeout code (at least).
Replace it with something much simpler:
- sync providers are almost stateless. they keep an iterator, identified
by a unique cookie, that times out in a simple way.
- sync requesters sync from whomever they fancy. namely anyone with newer
committed paxos state.
There are a few extra fields that might allow sync continuation later, but
this is complex and not necessary at this point.
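The two roles described above can be sketched roughly as follows. The class
and function names, the counter-based cookie, the chunk size and the timeout
handling are assumptions for illustration; this is not the monitor's actual
message flow.

```python
import itertools


class SyncProvider:
    """Nearly stateless provider: one iterator per cookie (sketch)."""

    def __init__(self, store: dict, timeout: float, chunk_size: int = 2):
        self.store = sorted(store.items())
        self.timeout = timeout
        self.chunk_size = chunk_size
        self.sessions = {}              # cookie -> [position, expiry]
        self._cookies = itertools.count(1)

    def start(self, now: float) -> int:
        cookie = next(self._cookies)    # unique within this provider
        self.sessions[cookie] = [0, now + self.timeout]
        return cookie

    def get_chunk(self, cookie: int, now: float):
        self._expire(now)
        session = self.sessions.get(cookie)
        if session is None:
            raise KeyError("unknown or expired cookie")
        pos = session[0]
        chunk = self.store[pos:pos + self.chunk_size]
        session[0] = pos + len(chunk)
        session[1] = now + self.timeout  # each request renews the timer
        last = session[0] >= len(self.store)
        if last:
            del self.sessions[cookie]
        return chunk, last

    def _expire(self, now: float) -> None:
        expired = [c for c, (_, exp) in self.sessions.items() if exp <= now]
        for c in expired:
            del self.sessions[c]


def choose_providers(my_last_committed: int, peers: dict) -> list:
    """Requester side: any peer with newer committed state will do."""
    return sorted(p for p, lc in peers.items() if lc > my_last_committed)
```

If a requester stalls past the timeout, its cookie is simply dropped and it
starts over with a fresh one, which keeps the provider side simple.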
Signed-off-by: Sage Weil <sage@inktank.com>