It's possible for 'tell osd.*' to race against an osd we stopped but that the
cluster doesn't yet know is down. In that case we'll get ENXIO for that
osd and the command will fail.
In this context, we don't care.
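A minimal sketch of tolerating that race, assuming 'manager' is the test's
CephManager instance; the injected option and the import path are
illustrative, not the exact change:

    from teuthology.orchestra.run import CommandFailedError

    try:
        # may hit an osd that is stopped but not yet marked down
        manager.raw_cluster_cmd('tell', 'osd.*', 'injectargs',
                                '--osd-max-backfills=1')
    except CommandFailedError:
        # ENXIO from the stale osd; harmless in this context
        pass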
Signed-off-by: Sage Weil <sage@redhat.com>
* move Thrasher._set_config() to CephManager, make it a public method,
  and rename it to inject_args()
* use this method instead of calling 'tell ... injectargs ...' directly
  (see the sketch below)
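A rough before/after sketch; the inject_args() signature is assumed from the
description above, and 'manager' is the usual CephManager instance:

    # before: callers built the tell command by hand
    manager.raw_cluster_cmd('tell', 'osd.0', 'injectargs',
                            '--osd-max-backfills=1')
    # after: go through the public CephManager helper instead
    manager.inject_args('osd', 0, 'osd_max_backfills', 1)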
Signed-off-by: Kefu Chai <kchai@redhat.com>
The osd will refuse to create new pgs until its pg count drops below
the max-pg-per-osd upper bound setting.
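A hedged sketch of the behavior under test; mon_max_pg_per_osd is assumed to
be the relevant upper bound here, and the pool parameters are placeholders:

    # drive the cluster over the per-osd pg limit
    manager.inject_args('osd', '*', 'mon_max_pg_per_osd', 2)
    manager.raw_cluster_cmd('osd', 'pool', 'create', 'over-quota', '64', '64')
    # pgs beyond the bound are not instantiated on the osd until the limit
    # is raised (or more osds are added), so they linger in a creating state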
Signed-off-by: Kefu Chai <kchai@redhat.com>
bluestore_fsck_on_mount and bluestore_fsck_on_mount_deep are enabled by
default, and bluestore is used as the default store backend. It takes
longer to perform the deep fsck with verbose logging, so prolong
revive_osd()'s timeout from 150 sec to 360 sec.
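For reference, a call reflecting the longer bound; the keyword follows
revive_osd()'s existing timeout parameter, assumed here:

    # deep fsck of a bluestore mount can take minutes with verbose logs
    manager.revive_osd(osd, timeout=360)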
Fixes: http://tracker.ceph.com/issues/21474
Signed-off-by: Kefu Chai <kchai@redhat.com>
PG states may all be active+clean when no recovery is going on,
so check again before timing out.
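A minimal sketch of the re-check, assuming a CephManager-style is_clean()
helper:

    import time

    def wait_until_clean(manager, timeout=300):
        start = time.time()
        while not manager.is_clean():
            if time.time() - start >= timeout:
                # recovery may have finished since the last poll; look once
                # more before declaring a timeout
                if manager.is_clean():
                    break
                raise RuntimeError('timed out waiting for clean pgs')
            time.sleep(3)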
Fixes: http://tracker.ceph.com/issues/21294
Signed-off-by: huangjun <huangjun@xsky.com>
We assume below that rerrosd is up, but it may not be when we exit the
loop.
Fixes: http://tracker.ceph.com/issues/21206
Signed-off-by: Sage Weil <sage@redhat.com>
This randomly issues pg force-recovery/force-backfill and
pg cancel-force-recovery/cancel-force-backfill during QA
testing. Disabled for upgrades from hammer, jewel and kraken.
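A hedged sketch of the kind of thrashing this adds; PG selection and
probabilities are illustrative only:

    import random

    pgs = [pg['pgid'] for pg in manager.get_pg_stats()]
    victims = random.sample(pgs, min(4, len(pgs)))
    if random.random() < 0.5:
        manager.raw_cluster_cmd('pg', 'force-recovery', *victims)
        manager.raw_cluster_cmd('pg', 'cancel-force-recovery', *victims)
    else:
        manager.raw_cluster_cmd('pg', 'force-backfill', *victims)
        manager.raw_cluster_cmd('pg', 'cancel-force-backfill', *victims)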
Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
New option "random_eio" to Thrasher, sets 1 osd random read percentage
New option "objectsize" to radosbench task (-o bench option)
New option "type" to radosbench specify write, seq or rand
Signed-off-by: David Zafman <dzafman@redhat.com>
Make sure OSDs are up *and* have flushed their PG stats before waiting
for recovery, to ensure that we do not see a stale 'clean' state.
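A sketch of the intended ordering, using CephManager-style helpers (names
assumed):

    osds = [0, 1, 2]  # the osds that were restarted
    for osd in osds:
        manager.wait_till_osd_is_up(osd)
    # flush stats so the mon/mgr view reflects the post-restart reality
    manager.flush_pg_stats(osds)
    manager.wait_for_recovery()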
Signed-off-by: Sage Weil <sage@redhat.com>
The helper gets a sequence number from the osd (or osds), and then
polls the mon until that seq is reflected there.
This is overkill in some cases, since many tests only require that the
stats be reflected on the mgr (not the mon), but waiting for it to also
reach the mon is sufficient!
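A minimal sketch of such a helper; the real one lives in
qa/tasks/ceph_manager.py and may differ in signature and in how it reads
the mon-side sequence (get_osd_stat_seq() below is a hypothetical query):

    import time

    def flush_pg_stats(self, osds, wait_for_mon=30):
        # each osd returns the sequence number of the stats it just flushed
        seqs = {osd: int(self.raw_cluster_cmd('tell', 'osd.%d' % osd,
                                              'flush_pg_stats'))
                for osd in osds}
        deadline = time.time() + wait_for_mon
        for osd, seq in seqs.items():
            # poll until the mon has caught up to that seq
            while self.get_osd_stat_seq(osd) < seq:
                if time.time() > deadline:
                    raise RuntimeError('timed out waiting for pg stats flush')
                time.sleep(3)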
Signed-off-by: Sage Weil <sage@redhat.com>
Pulling this out of the 'pg dump' heap is inefficient.
Also, pg dump data comes from the mgr and may be stale.
Signed-off-by: Sage Weil <sage@redhat.com>
Keep the pool flag around so we can distinguish between a pool that
should maintain hashes for each chunk (where a missing hash is a bug)
and an overwrites pool where we rely on bluestore checksums for
detecting corruption.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
'remap' is too non-specific a name. In particular, it
sounds like it is related to the 'remapped' PG state,
but in reality it is not.
'upmap' or 'pg-upmap' is more specific: it maps a pgid
to the 'up' set value (or item).
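For illustration, the renamed commands as issued through CephManager; the
pgid and osd ids are placeholders:

    # map pg 1.0's 'up' set to an explicit list of osds
    manager.raw_cluster_cmd('osd', 'pg-upmap', '1.0', '0', '1', '2')
    # or remap individual items: move pg 1.0 off osd.0 onto osd.3
    manager.raw_cluster_cmd('osd', 'pg-upmap-items', '1.0', '0', '3')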
Signed-off-by: Sage Weil <sage@redhat.com>
On slower machines (VPS, OVH) it takes time for the OSD to go down.
Fixes: http://tracker.ceph.com/issues/19556
Signed-off-by: Nathan Cutler <ncutler@suse.com>