os/bluestore: fix bug for calc extent_avg in reshard function
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Igor Fedotov <ifedotov@mirantis.com>
This commit is just a cleanup that makes the arguments of the methods
around crush_reweight consistent.
Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@redhat.com>
osd: fall back to failsafe threshold if osdmap doesn't set [near]full
Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
The reason is that the ioc may be reaped in the _aio_thread function
by the following statements:
    for (auto &&it : registered_devices)
      it->reap_ioc();
So if we still use the ioc's lock for a (random) read, it will cause
a core dump.
Signed-off-by: optimistyzy <optimistyzy@gmail.com>
tests: ceph_test_rados_api_watch_notify: test timeout using rados_wat…
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Currently dump_historic_ops dumps ops sorted by their initiation time,
which may not have any relation to how long each op took, and sorting the
output of that command by op duration is neither fast nor convenient.
New asok command ("dump_historic_ops_by_duration") outputs the same
op list, but ordered by their duration time (longest first).
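
For illustration only, the ordering described above can be sketched as
follows. This is not the OSD's op tracker code: OpRecord and
sort_by_duration are hypothetical names, and the fields are assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a tracked op; not Ceph's real type.
struct OpRecord {
  std::string description;
  double initiated_at;  // seconds; when the op started
  double duration;      // seconds; how long the op took
};

// Return a copy of the ops ordered by duration, longest first.
// stable_sort keeps the original (initiation) order for equal durations.
std::vector<OpRecord> sort_by_duration(std::vector<OpRecord> ops) {
  std::stable_sort(ops.begin(), ops.end(),
                   [](const OpRecord& a, const OpRecord& b) {
                     return a.duration > b.duration;
                   });
  return ops;
}
```

Sorting a copy leaves the initiation-ordered list untouched, so both views
can be served from the same history.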
Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
When get_partition_dev() fails, it reports the following message:
ceph_disk.main.Error: Error: partition 2 for /dev/sdb does not appear to exist
The code searches for a directory inside /sys/block/get_dev_name(os.path.realpath(dev)).
The issue is that the error message doesn't report that path on failure, even though the path may be part of the problem.
This patch reports where the code was looking when it tried to determine whether the partition was available.
Signed-off-by: Erwan Velu <erwan@redhat.com>
It's possible for the Sequencer to go away while the OpSequencer still has
txcs in flight. We were handling the case where the osr was on the
deferred_queue, but it may be off the deferred_queue but waiting for the
commit to happen, and we still need to wait for that.
Fix this by introducing a 'zombie' state for the osr, in which we keep the
osr in the osr_set.
Clean up the OpSequencer methods and a few other method names.
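
A minimal sketch of the zombie idea, under heavy simplification and not
taken from the BlueStore source: OpSeq, Store, discard, txc_committed and
drained are hypothetical names, and locking is omitted entirely.

```cpp
#include <cassert>
#include <memory>
#include <set>

// Hypothetical, simplified sequencer state.
struct OpSeq {
  int inflight = 0;     // txcs submitted but not yet committed
  bool zombie = false;  // owner went away while txcs were in flight
};

struct Store {
  std::set<std::shared_ptr<OpSeq>> osr_set;

  std::shared_ptr<OpSeq> create_osr() {
    auto osr = std::make_shared<OpSeq>();
    osr_set.insert(osr);
    return osr;
  }

  // Called when the owning Sequencer goes away.
  void discard(const std::shared_ptr<OpSeq>& osr) {
    if (osr->inflight == 0) {
      osr_set.erase(osr);   // nothing pending; safe to forget it now
    } else {
      osr->zombie = true;   // keep it registered until its txcs commit
    }
  }

  // Called when one txc of this osr commits.
  void txc_committed(const std::shared_ptr<OpSeq>& osr) {
    assert(osr->inflight > 0);
    --osr->inflight;
    if (osr->inflight == 0 && osr->zombie) {
      osr_set.erase(osr);   // last txc of a zombie: unregister it
    }
  }

  // True once no registered osr (zombie or not) has work in flight.
  bool drained() const {
    for (const auto& osr : osr_set) {
      if (osr->inflight > 0) return false;
    }
    return true;
  }
};
```

Because a zombie stays in osr_set, anything that waits on the set still
sees its pending commits.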
Signed-off-by: Sage Weil <sage@redhat.com>
We've been avoiding doing this for a while and it has finally caught up
with us: the SharedBlob may outlive the split due to deferred IO, and
a read on the child collection may load a competing Blob and SharedBlob
and read from the on-disk blocks that haven't been written yet.
Fix by preserving the one-SharedBlob-instance invariant by moving cache
items to the new Collection and cache shard like we should have from the
beginning.
Signed-off-by: Sage Weil <sage@redhat.com>
We can't use a bare Collection since we get/put refs, the last put will
delete it, and the dtor asserts nref == 0 (no faking a ref and deliberately
leaking!).
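
The constraint can be illustrated with a generic intrusive refcount. This
is a sketch under assumptions, not the actual Collection class:
FakeCollection, its destroyed flag and use_heap_collection are hypothetical.

```cpp
#include <cassert>

// Hypothetical intrusively ref-counted object.
struct FakeCollection {
  int nref = 0;
  bool* destroyed;  // illustration hook: set when the dtor runs

  explicit FakeCollection(bool* flag) : destroyed(flag) {}

  void get() { ++nref; }
  void put() {
    if (--nref == 0) delete this;  // last put frees the object
  }

  ~FakeCollection() {
    assert(nref == 0);  // a stack instance with live refs would trip this
    *destroyed = true;
  }
};

// A heap instance held by a real ref behaves correctly under get/put.
// A bare stack instance would instead be passed to `delete` by the last
// put(), which is undefined behavior.
bool use_heap_collection() {
  bool destroyed = false;
  FakeCollection* c = new FakeCollection(&destroyed);
  c->get();  // our own ref
  c->get();  // e.g. a cache item takes a ref
  c->put();  // the cache item drops it
  bool alive_midway = !destroyed;
  c->put();  // last ref: the object deletes itself
  return alive_midway && destroyed;
}
```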
Signed-off-by: Sage Weil <sage@redhat.com>
Otherwise cache items survive beyond umount into the next mount cycle!
Also, ensure that we flush_cache *before* clearing coll_map, as some cache
items have references back to the Collection.
Signed-off-by: Sage Weil <sage@redhat.com>
These can survive as long as the txc, which can be longer than the
Collection. Make sure we have a valid ref as both finish_write and
~SharedBlob use coll for the SharedBlobSet (and coll->store->cct for
debug).
Signed-off-by: Sage Weil <sage@redhat.com>