Otherwise it is very hard to identify which OSD ops are slow when we've
seen a SLOW_OPS health warning in a qa run.
Notably, without this, bugs like http://tracker.ceph.com/issues/23769
are very challenging to track down.
Signed-off-by: Sage Weil <sage@redhat.com>
CephFS uses a different path to remove selfmanaged snaps than librados,
so while the librados path goes through pg_pool_t::remove_unmanaged_snap(),
we open code the snap addition to the pool's removed_snaps here. If we
don't set FLAG_SELFMANAGED_SNAPS at that time, we will implicitly set it
during decode and get a CRC mismatch.
Fix by explicitly setting FLAG_SELFMANAGED_SNAPS flag here.
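
The mismatch can be reproduced with a toy encoding. This is only a sketch:
the wire format, the flag's bit value, and the helper names are all
assumptions made for illustration, not the real pg_pool_t encoding.

```python
import zlib

FLAG_SELFMANAGED_SNAPS = 1 << 0  # hypothetical bit value, for illustration only


def encode(flags, removed_snaps):
    """Toy wire format: '<flags>:<comma-separated snap ids>'."""
    return f"{flags}:{','.join(map(str, sorted(removed_snaps)))}".encode()


def decode(blob):
    """Decode the toy format; like the real decoder, imply the flag."""
    flags_text, snaps_text = blob.decode().split(":")
    flags = int(flags_text)
    removed = {int(s) for s in snaps_text.split(",") if s}
    if removed:
        # The implicit side effect: decoding turns the flag on.
        flags |= FLAG_SELFMANAGED_SNAPS
    return flags, removed


def crc_survives_roundtrip(flags, removed_snaps):
    """True if re-encoding the decoded value yields the same CRC."""
    blob = encode(flags, removed_snaps)
    return zlib.crc32(blob) == zlib.crc32(encode(*decode(blob)))
```

With the flag left unset and a non-empty removed_snaps, the round trip
changes the bytes and therefore the CRC; setting the flag up front keeps
both sides identical.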
Fixes: http://tracker.ceph.com/issues/23949
Signed-off-by: Sage Weil <sage@redhat.com>
This patch correctly sets the PERFCOUNTER_MASK to 3 so that the
PERFCOUNTER_TIME metrics are not ignored by the mgr_module code. It also
converts the TIME metrics from nanoseconds to seconds, just as the ceph
perf dump does, and exposes the metrics via the prometheus module.
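
A minimal sketch of the masking and conversion, assuming TIME is bit 1 and
U64 is bit 2 (so a mask of 3 covers both value types). The constant values
and the counter_value() helper are assumptions for illustration, not
copied from the mgr_module code.

```python
# Assumed bit values for the perf counter value types; illustrative only.
PERFCOUNTER_TIME = 1
PERFCOUNTER_U64 = 2
PERFCOUNTER_MASK = 3  # low two bits select the value type (TIME or U64)

NSEC_PER_SEC = 1_000_000_000


def counter_value(counter_type, raw_value):
    """Return the value to export for a raw counter (hypothetical helper)."""
    value_type = counter_type & PERFCOUNTER_MASK
    if value_type == PERFCOUNTER_TIME:
        # TIME counters are kept in nanoseconds; export them as seconds.
        return raw_value / NSEC_PER_SEC
    if value_type == PERFCOUNTER_U64:
        return raw_value
    return None  # unrecognised value type: nothing to export
```

With a mask that covered only the U64 bit, a TIME counter would mask down
to 0 and fall through to the "nothing to export" case.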
Signed-off-by: Boris Ranto <branto@redhat.com>
We could see the slot with a different PG than we expected if the old
PG was removed and a new one was instantiated in its place. We can't
just pick up the new PG pointer, however, since it isn't locked.
Fix by retrying with the slot's new pg (possibly null!). Move this check
below the other cases so that we know we are otherwise consistent with
the slot, since the next pass around we might get pg==null and skip the
to_process.empty() and requeue_seq checks entirely.
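
The retry can be pictured as the usual "lock, then re-check" loop. This is
a generic sketch, not the OSD code: the PG and Slot classes and
lock_current_pg() are hypothetical, and the shard lock handling is left
out.

```python
import threading


class PG:
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()


class Slot:
    def __init__(self, pg=None):
        self.pg = pg  # may be swapped for another PG, or cleared to None


def lock_current_pg(slot):
    """Return the slot's PG with its lock held, or None if the slot is empty."""
    pg = slot.pg
    while pg is not None:
        pg.lock.acquire()
        if slot.pg is pg:
            return pg  # still the slot's PG, and now it is locked
        # The slot changed while we waited: drop the stale lock and retry
        # with whatever the slot holds now (possibly None).
        pg.lock.release()
        pg = slot.pg
    return None
```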
Signed-off-by: Sage Weil <sage@redhat.com>
Otherwise the trimming won't advance so that the remaining inodes are marked
clean.
Fixes: http://tracker.ceph.com/issues/23923
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This is a pretty questionable check because it complains
about the caller of an API instead of the API itself, if
one of the API's members/arguments is one of the
forbidden variable names such as 'O'.
The interface to pyopenssl includes an 'O' member
on the certificate object.
Signed-off-by: John Spray <john.spray@redhat.com>
build/ops: rpm: Revert "ceph.spec: work around build.opensuse.org"
Reviewed-by: Ken Dreyer <kdreyer@redhat.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
The path
#9 Objecter::_finish_command (this=this@entry=0x7f76c00aeb30, c=c@entry=0x7f76b0000b10, r=<optimized out>, rs="osd down") at /build/ceph-13.0.2-1932-g458b4fb/src/osdc/Objecter.cc:4950
#10 0x00007f76d26de106 in Objecter::_check_command_map_dne (this=this@entry=0x7f76c00aeb30, c=c@entry=0x7f76b0000b10) at /build/ceph-13.0.2-1932-g458b4fb/src/osdc/Objecter.cc:1726
#11 0x00007f76d26e52e4 in Objecter::_scan_requests (this=this@entry=0x7f76c00aeb30, s=0x7f76c00af8a0, skipped_map=skipped_map@entry=false, cluster_full=cluster_full@entry=false, pool_full_map=0x7f76be7fb330, need_resend=..., need_resend_linger=..., need_resend_command=std::map with 0 elements, sul=..., gap_removed_snaps=0x7f76ac0016f8) at /build/ceph-13.0.2-1932-g458b4fb/src/osdc/Objecter.cc:1120
#12 0x00007f76d26eded5 in Objecter::handle_osd_map (this=this@entry=0x7f76c00aeb30, m=m@entry=0x7f76ac0014a0) at /build/ceph-13.0.2-1932-g458b4fb/src/osdc/Objecter.cc:1228
led to a recursive lock of the session mutex (locked in _scan_requests,
and again in _finish_command).
Fix by making the callers of _finish_command (and
_check_command_map_dne) take the session lock.
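
The locking convention can be sketched as follows. This is an illustration
in Python with hypothetical names, not the Objecter code: the lock is
taken once by the outermost caller, and the inner helpers only assert that
it is held.

```python
import threading


class Session:
    def __init__(self):
        self.lock = threading.Lock()  # non-reentrant, like a plain mutex
        self.finished = []


def finish_command(session, command):
    """Caller must already hold session.lock."""
    assert session.lock.locked(), "caller must hold the session lock"
    session.finished.append(command)


def check_command_map_dne(session, command):
    """Caller must already hold session.lock."""
    finish_command(session, command)


def scan_requests(session, commands):
    # Take the lock once here; the helpers below never lock it again.
    with session.lock:
        for command in commands:
            check_command_map_dne(session, command)
```

Had finish_command() tried to take session.lock itself, the call from
scan_requests() would block forever on the non-reentrant lock.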
Fixes: http://tracker.ceph.com/issues/23940
Signed-off-by: Sage Weil <sage@redhat.com>
hammer does not support async messenger, so set ms_type to "simple" for
hammer client.
Fixes: http://tracker.ceph.com/issues/23922
Signed-off-by: Kefu Chai <kchai@redhat.com>
In addition to line ordering, there were a couple of bogus ones:
E: 30, 0: No name 'version' in module 'distutils' (no-name-in-module)
E: 30, 0: Unable to import 'distutils.version' (import-error)
E: 36, 8: No name 'wsgiserver' in module 'cherrypy' (no-name-in-module)
E: 36, 8: Unable to import 'cherrypy.wsgiserver.wsgiserver2' (import-error)
I don't know why pylint can't see these modules, but they're definitely
there, so I've added them to the ignored list in .pylintrc
Signed-off-by: John Spray <john.spray@redhat.com>
Currently, a filesystem client hangs if a request is made after its
eviction. Prevent the client from hanging and allow a manual unmount
in such cases.
Fixes: http://tracker.ceph.com/issues/10915
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Apparently some applications use this (like mail servers) and since it's
trivial to support, let's do it. The idea is that st_nlink for a directory is
either 0 (it is unlinked) or 2 + the number of sub-directories (which have ..
parent links).
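
As a small worked example of the rule (directory_nlink() is a hypothetical
helper, not a function in the client):

```python
def directory_nlink(is_unlinked, num_subdirs):
    """Link count for a directory under the rule described above."""
    if is_unlinked:
        return 0
    # One link for the entry in the parent, one for the directory's own '.',
    # plus one '..' link from each sub-directory.
    return 2 + num_subdirs
```

An empty directory reports 2, and a directory with three sub-directories
reports 5.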
Fixes: https://tracker.ceph.com/issues/23873
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
- DeferRecovery event queued by AsyncReserver due to preemption
event. We are in Recovering state with RECOVERING bit set.
- We finish recovery, clear RECOVERING state bit, and queue
AllReplicasRecovered from PrimaryLogPG::start_recovery_ops()
- DeferRecovery event arrives, moving us from Recovering -> NotRecovering
- AllReplicasRecovered event arrives, crashing us.
This is all hard to deal with because the events are queued and may
arrive later. Solve the problem here by tolerating a delayed
DeferRecovery event: if the RECOVERING pg state bit isn't set, ignore
it (it's old). The async reserver cancel events are unpredictable.
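
A toy model of the race and the fix. The state and event names mirror the
sequence above, but the class and its transitions are assumptions made for
illustration, not the actual PG state machine.

```python
class PG:
    """Toy recovery state machine; logic is assumed, not taken from the OSD."""

    def __init__(self):
        self.state = "Recovering"
        self.recovering_bit = True  # stands in for the RECOVERING pg state bit

    def finish_recovery(self):
        # Recovery is done: clear the bit. AllReplicasRecovered is queued
        # separately and arrives later.
        self.recovering_bit = False

    def on_defer_recovery(self):
        if not self.recovering_bit:
            return False  # stale event from an earlier preemption: ignore it
        self.recovering_bit = False
        self.state = "NotRecovering"
        return True

    def on_all_replicas_recovered(self):
        if self.state != "Recovering":
            raise RuntimeError("AllReplicasRecovered outside Recovering")
        self.state = "Recovered"
```

Replaying the sequence (finish recovery, late DeferRecovery, then
AllReplicasRecovered) now ends in "Recovered" instead of raising.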
Fixes: http://tracker.ceph.com/issues/23860
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/21554/head:
client: avoid second lock on client_lock
Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>