This prevents an assert triggered by unexpected scrub results left over
from the previous scrub on the leader.
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
This rarely represents an actual eversion_t as the epoch and seq values are
bumped semi-independently to ensure it is always unique. Break it into
two separate fields to avoid confusion.
Drop now-unused and slightly curious inc() method.
Signed-off-by: Sage Weil <sage@inktank.com>
We need to take PGs whose mapping has not changed in a long time into
account. For them, the pg state will indicate it was clean at the time of
the report, in which case we can use that as a lower-bound on their actual
latest epoch clean. If they are not currently clean (at report time), use
the last_epoch_clean value.
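
The rule above can be sketched as follows. This is a minimal illustration,
not the monitor's actual code: the PGReport struct, its field names, and
taking the minimum across PGs as the trim floor are assumptions made for
the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using epoch_t = uint32_t;

// Hypothetical, simplified view of one PG's stat report.
struct PGReport {
  bool clean;                // was the PG clean at report time?
  epoch_t reported_epoch;    // map epoch at which the report was generated
  epoch_t last_epoch_clean;  // last epoch the PG is known to have been clean
};

// Lower bound on the epoch at which this PG was last clean: the report
// epoch if the PG was clean when it reported, else last_epoch_clean.
epoch_t clean_lower_bound(const PGReport& r) {
  return r.clean ? r.reported_epoch : r.last_epoch_clean;
}

// Assumed aggregation: the smallest lower bound across all PGs.
epoch_t min_last_epoch_clean(const std::vector<PGReport>& reports) {
  assert(!reports.empty());
  epoch_t floor = clean_lower_bound(reports.front());
  for (const PGReport& r : reports)
    floor = std::min(floor, clean_lower_bound(r));
  return floor;
}
```

With this shape, a PG that has been clean and unchanged for a long time no
longer holds the floor at its stale last_epoch_clean value.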
Fixes: #5519
Signed-off-by: Sage Weil <sage@inktank.com>
The mon needs a moderately accurate last_epoch_clean value in order to trim
old osdmaps. To prevent a PG that hasn't peered or received IO in forever
from preventing this, send pg stats at some minimum frequency. This will
increase the pg stat report workload for the mon over an idle pool, but
should be no worse than a cluster that is getting actual IO and sees these
updates from normal stat updates.
This makes the reported update a bit more aggressive/useful in that the epoch
is the last map epoch processed by this PG and not just one that is >= the
current interval. Note that the semantics of this field are pretty useless
at this point.
See #5519
Signed-off-by: Sage Weil <sage@inktank.com>
In certain cases the admin may know that it is safe to trim old osdmaps but
a bug or other issue is preventing the Monitor from deciding on its own.
Signed-off-by: Sage Weil <sage@inktank.com>
This particular failure is easily triggered by the crush_ops.sh
workunit. Make it a bit less likely to fail.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Call get_trim_to() when we need to know how much to trim (if any), and
calculate it then. No need to keep this in a hidden trim_version
variable and remember to update it. This drops several helpers and
accessors and makes get_trim_to() a single method that services need to
override.
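
A rough sketch of the resulting shape, assuming hypothetical class names
(Service, MapService), a hypothetical maybe_trim() caller, and an invented
"keep the last N versions" policy. Only get_trim_to() is taken from the
description above.

```cpp
#include <cassert>
#include <cstdint>

using version_t = uint64_t;

class Service {
public:
  virtual ~Service() = default;

  // The single method a service overrides. Returning 0 means "nothing to
  // trim". The value is computed on demand, not cached in a member.
  virtual version_t get_trim_to() const { return 0; }

  // Hypothetical caller: asks for the trim target only when it is needed.
  // Returns the number of versions trimmed.
  version_t maybe_trim() {
    const version_t trim_to = get_trim_to();
    if (trim_to <= first_committed)
      return 0;
    const version_t trimmed = trim_to - first_committed;
    trim(first_committed, trim_to);
    return trimmed;
  }

  version_t first_committed = 1;

private:
  void trim(version_t from, version_t to) {
    assert(from < to);
    first_committed = to;  // a real service would also erase [from, to)
  }
};

// Example service with an assumed policy: keep the most recent `keep`
// versions.
class MapService : public Service {
public:
  version_t last_committed = 1;
  version_t keep = 500;

  version_t get_trim_to() const override {
    return last_committed > keep ? last_committed - keep : 0;
  }
};
```

Because the target is derived from current state each time, there is no
separate variable that can go stale.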
Signed-off-by: Sage Weil <sage@inktank.com>
If libcurl supports curl_multi_wait(), use it; otherwise use select()
and force a timeout, even if the timeout has been disabled. Without the
forced timeout we may wait forever on events that select() cannot
report, as select() only handles fds < 1024.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Do not process reads (or, by PaxosService::dispatch() implication, writes)
until we have committed the initial service state. This avoids things like
EPERM due to missing keys when we race with mon creation, triggered by
teuthology tests doing their health check after startup.
Fixes: #5515
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
We want to trim old states even if there is no update activity. For
example, if a long-running rebalance finishes, all osdmap updates will
stop and we won't trim out old maps to free space.
Instead, trim at the same time as tick(). Remove the trim during
propose_pending() to force all trims through this path and avoid
introducing a new and rarely-exercised behavior.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Right after cluster creation, first_committed is 1 and latest stashed is 0,
but we don't have the initial full map yet. Thereafter, we do (because we
write it with trim). Fixes afd6c7d824.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
MOSDPG(Push|PushReply|Pull|SubOp|SubOpReply) need the
same thing checked prior to queueing the op, so they
share a templated handler.
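
A minimal sketch of the idea, with stand-in types. The message structs,
the OSD struct, and the two checks shown (OSD active, map epoch not ahead
of ours) are assumptions for illustration. The commit does not spell out
which checks the real handler performs.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-ins for the MOSDPG* message types. Each carries the epoch it was
// sent in and a name used here only to show which message was queued.
struct MsgPush      { uint32_t map_epoch; static const char* name() { return "push"; } };
struct MsgPushReply { uint32_t map_epoch; static const char* name() { return "push_reply"; } };
struct MsgPull      { uint32_t map_epoch; static const char* name() { return "pull"; } };

struct OSD {
  uint32_t current_epoch;
  bool active;
  std::vector<std::string> queue;

  // One templated handler: the same pre-queue checks for every message
  // type, written once.
  template <typename Msg>
  bool handle_replica_op(const Msg& m) {
    if (!active)
      return false;                  // assumed check: OSD must be active
    if (m.map_epoch > current_epoch)
      return false;                  // assumed check: need a newer map first
    queue.push_back(Msg::name());
    return true;
  }
};
```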
Signed-off-by: Samuel Just <sam.just@inktank.com>
Compare all keys within the sync'ed prefixes across members of the quorum
and compare the key counts and CRC for inconsistencies.
Currently this is a one-shot inefficient hammer. We'll want to make this
work in chunks before it is usable in production environments.
Protect with a feature bit to avoid sending MMonScrub to mons that can't
decode it.
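
The comparison can be sketched roughly as below. The Store and ScrubResult
types, the scrub() function, and the bitwise CRC-32 are assumptions for the
example; the CRC is a stand-in for whatever checksum the monitor store
actually uses.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Stand-in checksum: plain bitwise CRC-32 (reflected polynomial 0xEDB88320).
uint32_t crc32_update(uint32_t crc, const std::string& data) {
  crc = ~crc;
  for (unsigned char c : data) {
    crc ^= c;
    for (int i = 0; i < 8; ++i)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return ~crc;
}

// Hypothetical store layout: (prefix, key) -> value, iterated in key order.
using Store = std::map<std::pair<std::string, std::string>, std::string>;

struct ScrubResult {
  std::map<std::string, uint32_t> prefix_keys;  // key count per prefix
  std::map<std::string, uint32_t> prefix_crc;   // running CRC per prefix

  bool operator==(const ScrubResult& o) const {
    return prefix_keys == o.prefix_keys && prefix_crc == o.prefix_crc;
  }
};

// One-shot scrub: walk every key under the sync'ed prefixes and fold the
// key and value into that prefix's count and CRC.
ScrubResult scrub(const Store& store, const std::set<std::string>& prefixes) {
  ScrubResult r;
  for (const auto& kv : store) {
    const std::string& prefix = kv.first.first;
    if (!prefixes.count(prefix))
      continue;
    r.prefix_keys[prefix]++;
    uint32_t& crc = r.prefix_crc[prefix];
    crc = crc32_update(crc, kv.first.second);
    crc = crc32_update(crc, kv.second);
  }
  return r;
}
```

Each quorum member would compute a result like this and the leader would
compare them. Any mismatch in count or CRC for a prefix indicates an
inconsistency.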
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>