This error only happens until the initiator is connected to the target.
Fixes: https://tracker.ceph.com/issues/36564
Signed-off-by: Ricardo Marques <rimarques@suse.com>
When a cache tier promotes an object with one or more error PG log
entries, these errors need to be propagated and recorded for dup
op detection.
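As a rough sketch of the idea (hypothetical types and names, not the actual PG log structures): the promoted entry carries not just the original request ids but also any error return codes, so a later dup op can be answered with the originally logged result.

  #include <cstdint>
  #include <map>
  #include <utility>
  #include <vector>

  // Hypothetical stand-ins for the PG log bookkeeping, illustration only.
  struct ReqId { uint64_t client; uint64_t tid; };

  struct PromotedLogInfo {
    std::vector<std::pair<ReqId, uint64_t>> extra_reqids;  // (reqid, user_version)
    std::map<uint32_t, int> extra_reqid_return_codes;      // index into extra_reqids -> error
  };

  // Record the reqid and, if the original op failed, its return code, so
  // dup op detection can reply with the same error after promotion.
  void record_promoted_reqid(PromotedLogInfo& info, const ReqId& reqid,
                             uint64_t user_version, int ret) {
    info.extra_reqids.emplace_back(reqid, user_version);
    if (ret < 0)
      info.extra_reqid_return_codes[info.extra_reqids.size() - 1] = ret;
  }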
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
If the base tier records an error against an operation, the cache
tier currently might incorrectly respond with a success return code.
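A minimal sketch of the intended behavior (hypothetical names, not the PrimaryLogPG code): when an op is recognized as a dup, reply with the return code that was logged for the original op rather than an unconditional success.

  #include <cstdint>
  #include <map>

  struct LoggedResult { uint64_t user_version; int ret; };

  // Code to send back for a replayed request: the logged result, which may
  // be an error, rather than a blanket 0/success.
  int dup_op_reply_code(const std::map<uint64_t, LoggedResult>& completed,
                        uint64_t reqid) {
    auto it = completed.find(reqid);
    if (it == completed.end())
      return 0;             // not a known dup; process as a new op
    return it->second.ret;  // propagate the original (possibly negative) code
  }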
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
* refs/pull/24686/head:
os/bluestore: show compress and buffered from WriteContext
os/bluestore: fix rename race with trim on replacement onode at old name
Reviewed-by: Jianpeng Ma <jianpeng.ma@intel.com>
Reviewed-by: Igor Fedotov <ifedotov@suse.com>
* refs/pull/24787/head:
Merge PR #24796 into nautilus
osd: fix heartbeat_reset unlock
Merge PR #24780 into nautilus
Merge PR #24761 into nautilus
Merge PR #24651 into nautilus
osd: fix race between op_wq and context_queue
test: Make sure kill_daemons failure will be easy to find
test: Add flush_pg_stats to make test more deterministic
* refs/pull/24666/head:
include/types: fixed compile warning for signed/unsigned comparison
osd/PrimaryLogPG: uncommitted dup ops should respond with logged return code
osd/PrimaryLogPG: propagate error return codes on object copy_get ops
osd/PGLog: optionally record error return codes for extra_reqids
osd/osd_types: include PG log return codes in object copy data
Reviewed-by: Sage Weil <sage@redhat.com>
* refs/pull/24688/head:
common: make ceph_abort store same crash info as ceph_assert
global: store assert msg in global and dump to crash meta
pybind/mgr: make 'ceph crash ls' output sorted list
log: don't clear ring when dump_recent is called
ceph-crash: make clear to user that 'posted' should be directory
Reviewed-by: Sage Weil <sage@redhat.com>
* refs/pull/24780/head:
osd: take heartbeat_lock before checking for session
Merge PR #24725 into nautilus
qa/tasks/qemu: use unique clone directory to avoid race with workunit
mds: add missing mds_lock
Reviewed-by: Gregory Farnum <gfarnum@redhat.com>
Referenced purge subcommand info via ceph osd command label.
Fixes: https://tracker.ceph.com/issues/36605
Signed-off-by: James McClune <jmcclune@mcclunetechnologies.net>
* refs/pull/24651/head:
test: Make sure kill_daemons failure will be easy to find
test: Add flush_pg_stats to make test more deterministic
Reviewed-by: Neha Ojha <nojha@redhat.com>
* refs/pull/24585/head:
doc: add developer documentation on new cephfs reclaim interfaces
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Zheng Yan <zyan@redhat.com>
In CephReleaseNamePipe, we used to blindly return the "release name" portion of
the version string. This ends up e.g. returning 'nautilus' for master right
now, which causes us to link to nonexistent documentation on ceph.com. This
change causes builds marked as 'dev' (as opposed to 'stable') to report
'master' as their release name.
Fixes: https://tracker.ceph.com/issues/36416
Signed-off-by: Zack Cerza <zack@redhat.com>
In case a reshard attempt is left in an incomplete state, i.e., flags
still show resharding even though the bucket reshard lock isn't being
held, try to recover by taking the bucket reshard lock and clearing
flags associated with resharding.
This change requires access to an RGWBucketInfo object, so callers higher
in the call stack should provide one to prevent unnecessary work. Changes
were made to provide this object.
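In outline the recovery looks roughly like the following (hypothetical helper names, not the RGW code): if the flag is set but the reshard lock can be taken, the state is stale, so clear it and write the bucket info back.

  #include <cerrno>

  // Illustration only; stand-ins for RGWBucketInfo and the reshard lock.
  struct BucketInfoSketch { bool reshard_in_progress = false; };
  struct ReshardLockSketch {
    bool try_take() { return true; }   // take the per-bucket reshard lock
    void release() {}
  };

  // A previous reshard attempt may have died with the flag still set even
  // though nobody holds the lock; take the lock and clear the stale state.
  int clear_stale_reshard_flags(BucketInfoSketch& info, ReshardLockSketch& lock) {
    if (!info.reshard_in_progress)
      return 0;            // nothing to recover
    if (!lock.try_take())
      return -EBUSY;       // lock is held: a real reshard is running
    info.reshard_in_progress = false;
    // ... write the updated bucket info back to the cluster here ...
    lock.release();
    return 0;
  }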
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
When we open a connection, there is a short window before we attach
the session. If a fault happens quickly, we won't get the reset, and
will persistently fail to send osd pings.
Move the lock up to avoid this. Note that we should rarely see
connections without sessions here anyway (except when this specific
race happens), so taking the lock where we weren't before should have
no negative impact.
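In essence this is the usual take-the-lock-before-the-check pattern; a simplified sketch with standard primitives (the real code uses the OSD's heartbeat_lock and session types):

  #include <memory>
  #include <mutex>

  struct Session {};                   // stand-in for the heartbeat session

  struct HeartbeatPeerSketch {
    std::mutex heartbeat_lock;
    std::shared_ptr<Session> session;  // attached shortly after connect
  };

  // Take heartbeat_lock *before* looking at the session, so a fault racing
  // with session attachment is observed consistently and the reset path is
  // not silently skipped.
  void heartbeat_reset(HeartbeatPeerSketch& peer) {
    std::lock_guard<std::mutex> l(peer.heartbeat_lock);
    if (!peer.session)
      return;                          // no session yet; nothing to reset
    // ... tear down and re-establish the ping connections here ...
  }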
Fixes: http://tracker.ceph.com/issues/36602
Signed-off-by: Sage Weil <sage@redhat.com>
There are other processes beyond resharding that would need to take a
bucket reshard lock (e.g., correcting bucket resharding flags in the
event of a crash, tools to remove bucket shard information from earlier
versions of ceph). Pulling this logic outside of RGWBucketReshard
allows this code to be re-used.
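Conceptually the factored-out helper only has to own the locking state, so both RGWBucketReshard and external tools can hold the same kind of lock; a hypothetical sketch of its shape (not the actual class):

  #include <string>

  class BucketReshardLockSketch {
    std::string lock_oid;   // object the lock lives on
    int duration_secs;      // lock lease duration
    bool held = false;
  public:
    BucketReshardLockSketch(const std::string& oid, int secs)
      : lock_oid(oid), duration_secs(secs) {}
    int lock()    { /* take an exclusive lock on lock_oid */ held = true; return 0; }
    void unlock() { /* release it */ held = false; }
    bool is_held() const { return held; }
  };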
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
ThreadA:

  sdata->shard_lock.Lock();
  if (sdata->pqueue->empty() &&
      !(is_smallest_thread_index && !sdata->context_queue.empty())) {

ThreadB:

  void queue(list<Context *>& ls) {
    bool empty = false;
    {
      std::scoped_lock l(q_mutex);
      if (q.empty()) {
        q.swap(ls);
        empty = true;
      } else {
        q.insert(q.end(), ls.begin(), ls.end());
      }
    }

    if (empty) {
      mutex.Lock();
      cond.Signal();
      mutex.Unlock();
    }
  }

ThreadA (continued):

    sdata->sdata_wait_lock.Lock();
    if (!sdata->stop_waiting) {
Fix by simply rechecking that context_queue is empty after taking the
wait lock. We still check it without taking that lock to keep the hot/busy
path fast (we avoid the wait lock in general) at the expense of taking
the context_queue qlock twice in the idle/wait path (where we don't care
so much about additional latency/cycles).
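A simplified sketch of the double-check with standard primitives (not the actual ShardedOpWQ code); the producer signals while holding the wait lock, so the recheck under that lock cannot miss a wakeup:

  #include <condition_variable>
  #include <deque>
  #include <mutex>

  struct ShardSketch {
    std::mutex sdata_wait_lock;
    std::condition_variable sdata_cond;
    std::mutex qlock;
    std::deque<int> context_queue;          // stand-in for the context queue

    bool context_queue_empty() {
      std::lock_guard<std::mutex> l(qlock);
      return context_queue.empty();
    }

    void queue_context(int c) {
      {
        std::lock_guard<std::mutex> l(qlock);
        context_queue.push_back(c);
      }
      std::lock_guard<std::mutex> wl(sdata_wait_lock);  // pairs with the recheck
      sdata_cond.notify_one();
    }

    void maybe_wait_for_work() {
      if (!context_queue_empty())           // cheap check, keeps the busy path fast
        return;
      std::unique_lock<std::mutex> wl(sdata_wait_lock);
      if (!context_queue_empty())           // recheck under the wait lock
        return;
      // A queue_context() racing with us now blocks on sdata_wait_lock and
      // will wake us after we sleep (real code would loop on the condition).
      sdata_cond.wait(wl);
    }
  };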
Fixes: http://tracker.ceph.com/issues/36473
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Previously, when resharding failed, we restored the shard status on
the bucket info object. However the status on each of the shards was
left indicating a reshard was underway. This prevented some write
operations from taking place, as they would wait for resharding to
complete. This adds the missing functionality. It also makes the
functionality available to other classes via static functions in
RGWBucketReshard.
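As a minimal sketch (illustrative types, not the actual RGW structures), the cleanup resets the status on the bucket info and on every shard, so writers stop waiting for a reshard that will never finish:

  #include <vector>

  enum class ReshardStatus { NOT_RESHARDING, IN_PROGRESS };  // illustrative

  struct BucketStateSketch {
    ReshardStatus bucket_status = ReshardStatus::IN_PROGRESS;
    std::vector<ReshardStatus> shard_status;   // one entry per bucket index shard
  };

  // On a failed reshard, clear the flag on the bucket info *and* on every
  // shard; otherwise writes keep blocking on a reshard that never finishes.
  void clear_resharding(BucketStateSketch& b) {
    b.bucket_status = ReshardStatus::NOT_RESHARDING;
    for (auto& s : b.shard_status)
      s = ReshardStatus::NOT_RESHARDING;
  }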
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
The bucket reshard lock was simply an exclusive lock that existed on
an object solely for the purpose of representing the lock. This is now
changed to an exclusive-ephemeral lock, so as not to leave these objects
behind.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Add a new type of cls lock -- exclusive ephemeral -- for which the object
only exists to represent the lock and should be deleted at unlock. This
prevents the accumulation of unneeded objects in the cluster by cleaning
them up automatically.
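An illustrative model of the two lock flavours (hypothetical names, not the real cls_lock interface): a plain exclusive lock leaves its object behind after unlock, while an exclusive-ephemeral lock deletes the object it created.

  #include <map>
  #include <string>

  enum class LockType { EXCLUSIVE, EXCLUSIVE_EPHEMERAL };

  struct LockObject { bool ephemeral = false; bool locked = false; };
  using Store = std::map<std::string, LockObject>;   // oid -> object

  bool lock(Store& s, const std::string& oid, LockType t) {
    auto& o = s[oid];                  // creates the lock object if absent
    if (o.locked)
      return false;                    // already held exclusively
    o.ephemeral = (t == LockType::EXCLUSIVE_EPHEMERAL);
    o.locked = true;
    return true;
  }

  void unlock(Store& s, const std::string& oid) {
    auto it = s.find(oid);
    if (it == s.end())
      return;
    it->second.locked = false;
    if (it->second.ephemeral)
      s.erase(it);                     // ephemeral: the lock object goes away too
  }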
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>