This was observed when an onode's removal is followed by a read,
and the latter causes the object to be released before the removal is finalized.
The root cause is an improper 'pinned' state assessment in Onode::get().
A more detailed overview:
At some point Onode::get() may encounter nref == 2 with pinned == true,
which means a parallel, incomplete put() is running on the onode: the ref count
has been decremented, but the pinned state is still unmodified (and the lock
hasn't even been acquired yet).
This can result in two put() calls racing over the same onode with nref == 2,
which in turn results in a premature onode release:
// nref = 3, pinned = 1
// Thread 1                             Thread 2
// o->put()                             o->get()
// --nref (n = 2, pinned = 1)
//                                      nref++ (n = 3, pinned = 1)
//                                      return
//                                      ...
//                                      o->put()
//                                      --nref (n = 2)
// pinned = 0,
// --nref (n = 1)
// ocs->_unpin_and_rm(o) -> o->put()
// ...
// --nref (n = 0)
// release o
//                                      o->c->get_onode_cache()
//                                      FAULT!
The suggested fix is to introduce an additional atomic counter (put_nref)
tracking running put() calls, and to permit onode release only when both
the regular nref and put_nref are equal to zero.
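A minimal C++ sketch of the release rule, assuming a standalone class: `PutTrackedRef`, `OnRelease` and the packed layout are hypothetical and not BlueStore's actual Onode code. The two counters are packed into one 64-bit atomic word (nref in the low 32 bits, put_nref in the high 32 bits) so that "both are zero" is observed in a single atomic step; this packing is a choice made for the sketch, not necessarily how the fix implements it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical sketch, not BlueStore's Onode. Both counters from the commit
// message live in one 64-bit atomic word:
//   low 32 bits  -> nref      (regular references)
//   high 32 bits -> put_nref  (put() calls still running)
class PutTrackedRef {
 public:
  using OnRelease = std::function<void()>;

  // The creator holds the first reference; allocate with new.
  explicit PutTrackedRef(OnRelease on_release)
      : word_(1), on_release_(std::move(on_release)) {}

  void get() { word_.fetch_add(1, std::memory_order_relaxed); }

  void put() {
    // One atomic step: ++put_nref and --nref.
    uint64_t old = word_.fetch_add(kOnePut - 1, std::memory_order_acq_rel);
    assert((old & kNrefMask) > 0 && "put() without a matching reference");
    uint64_t n = (old & kNrefMask) - 1;  // nref after this decrement
    if (n == 2) {
      // Slow-path placeholder: roughly where BlueStore takes the cache lock
      // and re-evaluates the pinned state. The object cannot be freed
      // meanwhile, because this call still contributes to put_nref.
    }
    // This put() no longer touches the object: --put_nref.
    uint64_t now = word_.fetch_sub(kOnePut, std::memory_order_acq_rel) - kOnePut;
    if (now == 0) {
      // nref == 0 and put_nref == 0: exactly one caller gets here.
      OnRelease cb = std::move(on_release_);
      delete this;
      if (cb) cb();
    }
  }

  uint32_t nref() const { return static_cast<uint32_t>(word_.load() & kNrefMask); }
  uint32_t put_nref() const { return static_cast<uint32_t>(word_.load() >> 32); }

 private:
  ~PutTrackedRef() = default;  // only put() may destroy the object
  static constexpr uint64_t kOnePut = uint64_t{1} << 32;
  static constexpr uint64_t kNrefMask = kOnePut - 1;
  std::atomic<uint64_t> word_;
  OnRelease on_release_;
};
```

Because put_nref stays non-zero until a put() has finished touching the object, a concurrent put() that drops nref to zero (as in the diagram above) cannot free the instance underneath it; whichever put() brings the combined word to zero performs the release.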
Fixes: https://tracker.ceph.com/issues/53002
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Update the Ceph version used in the example upgrade command to match the one mentioned in the text above it.
Signed-off-by: Foad Lind <foad.lind@citynetwork.eu>
crimson/os/seastore: avoid onode/omap laddr hint conflicts as much as possible
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
This should prevent omap and xattr extent allocations from clumping near
the onode's hint. Additionally, only generate them past the default
16MB object_data_handler reservation.
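One way to picture the hint placement, as a hedged sketch: `laddr_t` is modelled as a plain 64-bit integer, and `metadata_hint`, `kMetadataRange` and the random, block-aligned spread are assumptions for illustration, not crimson's actual API. Only the 16MB data reservation comes from the text above.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical model: crimson's real laddr_t and hint helpers differ.
using laddr_t = uint64_t;

// Default object_data_handler reservation (16MB, from the commit text).
constexpr uint64_t kDataReservation = 16ull << 20;
// Assumed width of the window over which omap/xattr hints are spread.
constexpr uint64_t kMetadataRange = 16ull << 20;

// Returns a block-aligned hint for an omap/xattr extent that lies past the
// onode's data reservation and is spread over a window, so successive
// metadata allocations do not all clump at the onode's own hint.
laddr_t metadata_hint(laddr_t onode_hint, uint64_t block_size,
                      std::mt19937_64& rng) {
  assert(block_size > 0 && kMetadataRange % block_size == 0);
  std::uniform_int_distribution<uint64_t> pick(0, kMetadataRange / block_size - 1);
  return onode_hint + kDataReservation + pick(rng) * block_size;
}
```

The random spread is just one way to avoid clumping; a deterministic offset derived from the object would serve the same purpose in this sketch.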
Signed-off-by: Samuel Just <sjust@redhat.com>
crimson/os/seastore/segment_cleaner: correct available space calculation
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
From the perspective of a user who wants to use deduplication,
it is hard to know how to use the dedup feature.
So, providing chunk-dedup might make deduplication
easier to use.
Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
The current available-space calculation is wrong: it only counts the space
occupied by extents, while deltas and other consumers of space are not taken into account.
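An illustrative sketch of the difference, assuming a hypothetical `SpaceStats` record: the field names are not the segment cleaner's real members, and `overhead_bytes` stands in for the unspecified "other" items.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical accounting record; names are illustrative only.
struct SpaceStats {
  uint64_t total_bytes;     // raw capacity of all segments
  uint64_t extent_bytes;    // space occupied by extents
  uint64_t delta_bytes;     // space occupied by journal deltas
  uint64_t overhead_bytes;  // any other space consumers
};

// Old behaviour: only extents are subtracted (saturating at zero).
uint64_t available_extents_only(const SpaceStats& s) {
  return s.extent_bytes >= s.total_bytes ? 0 : s.total_bytes - s.extent_bytes;
}

// Corrected behaviour: everything that consumes space is subtracted.
uint64_t available_space(const SpaceStats& s) {
  uint64_t used = s.extent_bytes + s.delta_bytes + s.overhead_bytes;
  return used >= s.total_bytes ? 0 : s.total_bytes - used;
}
```

With 1000 bytes total, 400 in extents, 100 in deltas and 50 of other overhead, the old formula reports 600 bytes free, while only 450 actually are.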
Fixes: https://tracker.ceph.com/issues/53409
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Add a conditional statement so that, when the autoscaler is
set to ON, the message about a pool having many more objects
per PG than the cluster average is omitted.
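A hedged sketch of the check, assuming hypothetical names (`PoolStats`, `warn_many_objects_per_pg`, `max_skew`) rather than Ceph's real health-check code: the pool's objects-per-PG is compared with the cluster average, and the new condition skips the warning entirely when the autoscaler is on, presumably because it manages pg_num for that pool.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-pool inputs; not Ceph's actual types.
struct PoolStats {
  uint64_t num_objects;
  uint32_t pg_num;
  bool autoscale_on;  // autoscaler set to ON for this pool
};

// True if the "many more objects per PG than average" message should be
// emitted for this pool.
bool warn_many_objects_per_pg(const PoolStats& p,
                              double cluster_avg_objects_per_pg,
                              double max_skew) {
  if (p.autoscale_on) {
    return false;  // the new conditional: omit the message
  }
  if (p.pg_num == 0 || cluster_avg_objects_per_pg <= 0.0 || max_skew <= 0.0) {
    return false;  // nothing meaningful to compare
  }
  double per_pg = static_cast<double>(p.num_objects) / p.pg_num;
  return per_pg / cluster_avg_objects_per_pg > max_skew;
}
```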
Fixes: https://tracker.ceph.com/issues/53516
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
Move the encoder and decoder methods into their associated class
files to eliminate undefined references to the class vtables.
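One common cause of "undefined reference to `vtable for ...`" with GCC and Clang (Itanium C++ ABI) is that a class's vtable is emitted only in the translation unit that defines its key function, i.e. its first virtual member function that is neither pure nor inline. The sketch below assumes this is the mechanism involved; `Encodable` and `Counter` are hypothetical classes, and the header/.cc split is indicated by comments because the example is a single file.

```cpp
#include <cassert>
#include <string>

// Illustrative interface, not a Ceph type.
struct Encodable {
  virtual ~Encodable() = default;
  virtual std::string encode() const = 0;
  virtual void decode(const std::string& in) = 0;
};

// --- would live in counter.h ---
class Counter : public Encodable {
 public:
  std::string encode() const override;  // key function: declared, not defined inline
  void decode(const std::string& in) override;
  int value = 0;
};

// --- would live in counter.cc, with the rest of Counter's members ---
// The vtable for Counter is emitted where encode() is defined. Keeping these
// definitions in the class's own file means any target that builds Counter
// also gets its vtable, instead of depending on an unrelated file.
std::string Counter::encode() const { return std::to_string(value); }
void Counter::decode(const std::string& in) { value = std::stoi(in); }
```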
https://tracker.ceph.com/issues/53596
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
python-common: improve OSD spec error messages
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
Commit 78983ad0d0 added cherrypy to ceph-mgr-cephadm's Requires,
but this needs to be split out into distro-specific sections due
to subtle/irritating naming differences.
Fixes: 78983ad0d0
Signed-off-by: Tim Serong <tserong@suse.com>
This should ensure that any captured members of extent_init_func are
still valid, at the cost of not being able to access the contents of the
extent at invocation time. With this, we should be able to rely on any
logical extents/lba extents in the cache having validly initialized lba
pins.
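A minimal synchronous model of the idea, with hypothetical `FakeCache`, `Extent` and `ExtentInitFunc` names (crimson's real cache is asynchronous and seastar-based): `init_func` runs as soon as the extent object exists, while anything it captures is still alive, but before the contents have been read.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical extent: pin_set stands in for "lba pin initialized".
struct Extent {
  uint64_t laddr = 0;
  bool pin_set = false;
  bool loaded = false;  // contents read from disk
  std::string contents;
};

using ExtentInitFunc = std::function<void(Extent&)>;

class FakeCache {
 public:
  // Invoke init_func immediately, while the caller's captures are valid.
  // The extent's contents are NOT available yet at this point.
  std::shared_ptr<Extent> get_extent(uint64_t laddr, ExtentInitFunc init_func) {
    assert(init_func);
    auto e = std::make_shared<Extent>();
    e->laddr = laddr;
    init_func(*e);          // runs before the read is issued
    pending_.push_back(e);  // the read completes later
    return e;
  }

  // Simulates completion of all outstanding reads.
  void complete_reads() {
    for (auto& e : pending_) {
      e->contents = "data@" + std::to_string(e->laddr);
      e->loaded = true;
    }
    pending_.clear();
  }

 private:
  std::vector<std::shared_ptr<Extent>> pending_;
};
```

In the usage below, the callback captures a local by reference that goes out of scope before the read completes; calling it eagerly is what keeps that capture valid.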
Fixes: https://tracker.ceph.com/issues/53555
Signed-off-by: Samuel Just <sjust@redhat.com>