86249 Commits

Author SHA1 Message Date
Sage Weil
126ffe6165 osd: log 'slow op' debug messages for individual slow ops
Otherwise it is very hard to identify which OSD ops are slow when we've
seen a SLOW_OPS health warning in a qa run.

Notably, without this, bugs like http://tracker.ceph.com/issues/23769
are very challenging to track down.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-05-01 13:53:49 -05:00
Sage Weil
1124839204 mon/OSDMonitor: set FLAG_SELFMANAGED_SNAPS on cephfs snap removal
CephFS uses a different path to remove selfmanaged snaps than librados,
so while the librados path goes through pg_pool_t::remove_unmanaged_snap(),
we open code the snap addition to the pool's removed_snaps here.  If we
don't set FLAG_SELFMANAGED_SNAPS at that time, we will implicitly set it
during decode and get a CRC mismatch.

Fix by explicitly setting FLAG_SELFMANAGED_SNAPS flag here.

Fixes: http://tracker.ceph.com/issues/23949
Signed-off-by: Sage Weil <sage@redhat.com>
2018-05-01 13:46:47 -05:00
Sage Weil
6024c5c52c mon/OSDMonitor: dump osdmaps if crc doesn't match
Dump both the json and hexdump at debug level 20.

Hunting http://tracker.ceph.com/issues/23949

Signed-off-by: Sage Weil <sage@redhat.com>
2018-05-01 12:39:03 -05:00
Sage Weil
c335bc16a4
Merge pull request #21742 from liewegas/wip-23940
osdc/Objecter: fix recursive locking in _finish_command

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-05-01 12:26:06 -05:00
Sage Weil
edcbb1bf15
Merge pull request #21745 from liewegas/wip-pg-removal-race
osd: fix _process handling for pg vs slot race

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2018-05-01 12:25:42 -05:00
Yuri Weinstein
b28ab5616d
Merge pull request #20678 from ceph/wip-s3a-fix
fix s3atests that have been failing for some time

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2018-05-01 09:28:24 -07:00
Yuri Weinstein
61a66b4e14
Merge pull request #20894 from ZVampirEM77/wip-multisite-cleanup
rgw: some cleanup for sync status

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2018-05-01 09:27:52 -07:00
Yuri Weinstein
f0e5e624b0
Merge pull request #21647 from yehudasa/wip-23859
rgw: fix for issue #21647 

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2018-05-01 09:27:32 -07:00
Yuri Weinstein
1d2b8b8025
Merge pull request #21648 from yehudasa/wip-cloud-sync-7
rgw: cloud sync fixes

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2018-05-01 09:27:10 -07:00
Kefu Chai
35b1e7ea63
Merge pull request #21678 from idiv-biodiversity/wip-doc-scrub_load_threshold
doc: fix error in osd scrub load threshold

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-05-01 20:22:09 +08:00
Jason Dillaman
ff82e168f6
Merge pull request #21727 from trociny/wip-23929
librbd: release lock executing deep copy progress callback

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-05-01 07:44:17 -04:00
Yan, Zheng
e4160d7e78 mds: don't report slow request for blocked filelock request
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Fixes: http://tracker.ceph.com/issues/22428
2018-05-01 12:54:29 +08:00
Patrick Donnelly
49365c70dc
Merge PR #21719 into master
* refs/pull/21719/head:
	mds: trim log during shutdown to clean metadata

Reviewed-by: Zheng Yan <zyan@redhat.com>
2018-04-30 17:26:07 -07:00
Patrick Donnelly
8153cfa696
Merge PR #21720 into master
* refs/pull/21720/head:
	mds: kick rdlock if waiting for dirfragtreelock

Reviewed-by: Zheng Yan <zyan@redhat.com>
2018-04-30 17:25:01 -07:00
Boris Ranto
056bc08d51 prometheus: Handle the TIME perf counter type metrics
This patch correctly sets the PERFCOUNTER_MASK to 3 so that the
PERFCOUNTER_TIME metrics are not ignored by the mgr_module code. It also
converts the TIME metrics from nanoseconds to seconds just like the ceph
perf dump does and exposes the metrics via prometheus module.

Signed-off-by: Boris Ranto <branto@redhat.com>
2018-05-01 01:20:44 +02:00
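The conversion described in commit 056bc08d51 can be sketched roughly as follows. This is only an illustration: the numeric constants other than the mask value of 3 and the function name `to_prometheus_value` are assumptions made for the example, not code taken from the prometheus module.

```python
# Assumed type-bit values for illustration; only PERFCOUNTER_MASK = 3
# comes from the commit message.
PERFCOUNTER_TIME = 1
PERFCOUNTER_U64 = 2
PERFCOUNTER_MASK = 3

NSEC_PER_SEC = 1_000_000_000


def to_prometheus_value(counter_type: int, raw_value: int) -> float:
    """Return the value to expose (hypothetical helper).

    TIME counters are stored in nanoseconds and exported in seconds;
    every other counter type is passed through unchanged.
    """
    if (counter_type & PERFCOUNTER_MASK) == PERFCOUNTER_TIME:
        return raw_value / NSEC_PER_SEC
    return float(raw_value)
```

With a mask of 3 both low type bits are kept, so a TIME counter is still recognised when higher flag bits are also set in `counter_type`.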
Patrick Donnelly
db3b6ca546
common: refactor for array size
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-04-30 13:43:54 -07:00
Sage Weil
7dbcc772f6 osd: fix _process handling for pg vs slot race
We could see the slot with a different PG than we expected if the old
PG was removed and a new one was instantiated in its place.  We can't
just pick up the new PG pointer, however, since it isn't locked.

Fix by retrying with the slot's new pg (possibly null!).  Move this check
below the other cases so that we know we are otherwise consistent with
the slot, since the next pass around we might get pg==null and skip the
to_process.empty() and requeue_seq checks entirely.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-30 15:38:59 -05:00
Mykola Golub
93b9eb7dd8 librbd: release lock executing deep copy progress callback
Fixes: http://tracker.ceph.com/issues/23929
Signed-off-by: Mykola Golub <mgolub@suse.com>
2018-04-30 22:24:19 +03:00
Josh Durgin
625c6895fb
Merge pull request #21706 from liewegas/wip-23860
osd/PG: fix DeferRecovery vs AllReplicasRecovered race

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-04-30 11:32:31 -07:00
Patrick Donnelly
9e12aa5d3b
mds: kick rdlock if waiting for dirfragtreelock
Fixes: https://tracker.ceph.com/issues/23919

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-04-30 10:04:08 -07:00
Patrick Donnelly
c60ef1b806
mds: trim log during shutdown to clean metadata
Otherwise the trimming won't advance so that the remaining inodes are marked
clean.

Fixes: http://tracker.ceph.com/issues/23923

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-04-30 09:58:10 -07:00
Sage Weil
f459de15aa
Merge pull request #21702 from theanalyst/wip-std-mutex
osdc/Objecter: use std::shared_mutex instead of boost::shared_mutex

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2018-04-30 11:18:11 -05:00
Patrick Donnelly
cec1fa0998
Merge PR #21731 into master
* refs/pull/21731/head:
	client: drop function _get_inodeno

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2018-04-30 09:16:48 -07:00
John Spray
1e768b3f4e mgr/dashboard: silence E741
This is a pretty questionable check because it complains
about the caller of an API instead of the API itself, if
one of the API's members/arguments is one of the
forbidden variable names such as 'O'.

The interface to pyopenssl includes an 'O' member
on the certificate object.

Signed-off-by: John Spray <john.spray@redhat.com>
2018-04-30 16:39:40 +01:00
Sage Weil
c584061d16
Merge pull request #21743 from yuriw/wip-yuriw-crontab
qa/tests: removed rest suite from the mix
2018-04-30 10:33:36 -05:00
Mykola Golub
6b752a3859
Merge pull request #21697 from dillaman/wip-18753-1
rbd-mirror: additional thrasher testing

Reviewed-by: Mykola Golub <mgolub@suse.com>
2018-04-30 18:25:35 +03:00
Yuri Weinstein
42fa821724 qa/tests: removed rest suite from the mix
Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
2018-04-30 08:20:06 -07:00
Ken Dreyer
a630681c65 Merge pull request #21716 from smithfarm/wip-drop-obs-kludge
build/ops: rpm: Revert "ceph.spec: work around build.opensuse.org"

Reviewed-by: Ken Dreyer <kdreyer@redhat.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
2018-04-30 09:15:23 -06:00
Sage Weil
854f44b247
Merge pull request #21739 from tchaikov/wip-23922
qa/suites/rados/thrash-old-clients: ms_type=simple

Reviewed-by: Sage Weil <sage@redhat.com>
2018-04-30 09:55:10 -05:00
Andrew Schoen
2f15a4fba3
Merge pull request #21685 from alfredodeza/wip-rm23874
ceph-volume: failed ceph-osd --mkfs command doesn't halt the OSD creation process

Reviewed-by: Andrew Schoen <aschoen@redhat.com>
2018-04-30 14:52:50 +00:00
Sage Weil
891f519242 osdc/Objecter: fix recursive locking in _finish_command
The path

#9  Objecter::_finish_command (this=this@entry=0x7f76c00aeb30, c=c@entry=0x7f76b0000b10, r=<optimized out>, rs="osd down") at /build/ceph-13.0.2-1932-g458b4fb/src/osdc/Objecter.cc:4950
#10 0x00007f76d26de106 in Objecter::_check_command_map_dne (this=this@entry=0x7f76c00aeb30, c=c@entry=0x7f76b0000b10) at /build/ceph-13.0.2-1932-g458b4fb/src/osdc/Objecter.cc:1726
#11 0x00007f76d26e52e4 in Objecter::_scan_requests (this=this@entry=0x7f76c00aeb30, s=0x7f76c00af8a0, skipped_map=skipped_map@entry=false, cluster_full=cluster_full@entry=false, pool_full_map=0x7f76be7fb330, need_resend=..., need_resend_linger=..., need_resend_command=std::map with 0 elements, sul=...,
    gap_removed_snaps=0x7f76ac0016f8) at /build/ceph-13.0.2-1932-g458b4fb/src/osdc/Objecter.cc:1120
#12 0x00007f76d26eded5 in Objecter::handle_osd_map (this=this@entry=0x7f76c00aeb30, m=m@entry=0x7f76ac0014a0) at /build/ceph-13.0.2-1932-g458b4fb/src/osdc/Objecter.cc:1228

led to recursive lock of the session mutex (locked in _scan_requests,
and again in _finish_command).

Fix by making the callers for _finish_command (and
_check_command_map_dne) take the session lock.

Fixes: http://tracker.ceph.com/issues/23940
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-30 09:52:38 -05:00
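The locking convention adopted in commit 891f519242 (callers take the session lock, the helper does not) might look like the sketch below. It uses a plain non-recursive Python lock as a stand-in for the session mutex; the class and method names are hypothetical and do not mirror the Objecter code.

```python
import threading


class SessionSketch:
    """Hypothetical stand-in for a session guarded by a non-recursive mutex."""

    def __init__(self):
        self.lock = threading.Lock()  # re-acquiring from the same thread would deadlock
        self.finished = []

    def _finish_command(self, cmd):
        # Convention: the caller already holds self.lock, so the helper
        # never tries to take it again.
        assert self.lock.locked()
        self.finished.append(cmd)

    def scan_requests(self, cmds):
        # Path that already needs the lock for its own work.
        with self.lock:
            for cmd in cmds:
                self._finish_command(cmd)

    def handle_reply(self, cmd):
        # Path that previously relied on the helper to lock; it now locks itself.
        with self.lock:
            self._finish_command(cmd)
```

Pushing the acquisition up to every caller keeps a single rule for the helper ("lock must be held on entry"), which avoids needing a recursive mutex.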
Kefu Chai
e62bc6bcd6
Merge pull request #21708 from dalgaaf/wip-da-SCA-20180425
Various fixes for SCA issues

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-04-30 21:57:19 +08:00
Kefu Chai
f072045ebf
Merge pull request #21690 from xiexingguo/wip-pr-20304
mon, osd: add create-time for pool

Reviewed-by: Sage Weil <sage@redhat.com>
2018-04-30 21:53:34 +08:00
Kefu Chai
ceaf329811
Merge pull request #21659 from yangDL/master
pybind/ceph_argparse.py: 'timeout' must be in kwargs when calling run_in_thread

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-04-30 21:48:37 +08:00
Kefu Chai
770dbae2ca qa/suites/rados/thrash-old-clients: ms_type=simple
hammer does not support async messenger, so set ms_type to "simple" for
hammer client.

Fixes: http://tracker.ceph.com/issues/23922
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-04-30 21:40:53 +08:00
John Spray
5596f489da mgr/dashboard: fix linter complaints
In addition to line ordering, there were a couple of bogus ones:
E: 30, 0: No name 'version' in module 'distutils' (no-name-in-module)
E: 30, 0: Unable to import 'distutils.version' (import-error)
E: 36, 8: No name 'wsgiserver' in module 'cherrypy' (no-name-in-module)
E: 36, 8: Unable to import 'cherrypy.wsgiserver.wsgiserver2' (import-error)

I don't know why pylint can't see these modules, but they're definitely
there, so I've added them to the ignored list in .pylintrc

Signed-off-by: John Spray <john.spray@redhat.com>
2018-04-30 14:27:49 +01:00
Jason Dillaman
5d99f4e719
Merge pull request #21733 from trociny/wip-23938
qa/workunits/rbd: potential race in mirror disconnect test

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-04-30 08:55:12 -04:00
Rishabh Dave
b14302d1fe qa/cephfs: test if evicted client unmounts without hanging
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-04-30 12:02:56 +00:00
Rishabh Dave
18a9d0c491 qa/tasks: allow custom timeout for umount_wait()
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-04-30 12:02:56 +00:00
Rishabh Dave
0f56c7e8e5 client: don't hang when MDS sessions are evicted
Currently, a filesystem client hangs if a request is made after its
eviction. Prevent the client from hanging and allow a manual unmount
in such cases.

Fixes: http://tracker.ceph.com/issues/10915
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-04-30 12:01:52 +00:00
John Spray
b869bfadd9
Merge pull request #21671 from jan--f/mgr-module-config-doc
doc/mgr/plugins: add note about distinction between config and kv store

Reviewed-by: John Spray <john.spray@redhat.com>
2018-04-30 12:42:18 +01:00
Mykola Golub
5bc1d4a51a qa/workunits/rbd: potential race in mirror disconnect test
(due to a typo in get_image_id command arg)

Fixes: http://tracker.ceph.com/issues/23938
Signed-off-by: Mykola Golub <mgolub@suse.com>
2018-04-30 09:44:12 +03:00
Jos Collin
ab46bb3314 client: drop function _get_inodeno
Drop _get_inodeno() as per the comment in https://github.com/ceph/ceph/pull/21554.

Signed-off-by: Jos Collin <jcollin@redhat.com>
2018-04-30 10:04:04 +05:30
Patrick Donnelly
67c7e46191
client: use common interp of st_nlink for dirs
Apparently some applications (such as mail servers) rely on this, and since
it's trivial to support, let's do it. The idea is that st_nlink for a directory
is either 0 (it is unlinked) or 2 + the number of subdirectories (which have ..
parent links).

Fixes: https://tracker.ceph.com/issues/23873

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-04-29 20:04:43 -07:00
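The link-count rule stated in commit 67c7e46191 reduces to a small formula. The sketch below is a minimal illustration; the function name `dir_nlink` and its parameters are hypothetical and not taken from the client code.

```python
def dir_nlink(is_unlinked: bool, num_subdirs: int) -> int:
    """Conventional st_nlink for a directory (hypothetical helper).

    An unlinked directory reports 0. Otherwise the count is 2 (the entry
    in the parent plus the directory's own ".") plus one ".." link from
    each subdirectory.
    """
    if is_unlinked:
        return 0
    return 2 + num_subdirs
```

For example, a live directory with three subdirectories would report a link count of 5, and an empty one would report 2.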
Sage Weil
cfe59cf20c osd/PG: fix DeferRecovery vs AllReplicasRecovered race
- DeferRecovery event queued by AsyncReserver due to preemption
  event.  We are in Recovering state with RECOVERING bit set.
- We finish recovery, clear RECOVERING state bit, and queue
  AllReplicasRecovered from PrimaryLogPG::start_recovery_ops()
- DeferRecovery event arrives, moving us from Recovering -> NotRecovering
- AllReplicasRecovered event arrives, crashing us.

This is all hard to deal with because the events are queued and may
arrive later.  Solve the problem here by tolerating a delayed
DeferRecovery event: if the RECOVERING pg state bit isn't set, ignore
it (it's old).  The async reserver cancel events are unpredictable.

Fixes: http://tracker.ceph.com/issues/23860
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-29 16:00:41 -05:00
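The event ordering in commit cfe59cf20c, and the fix of ignoring a stale DeferRecovery, can be modelled with a toy state machine. This is a simplified sketch under stated assumptions: the class `PGSketch`, its fields, and the `RECOVERED` end state are invented for illustration and do not reflect the real PG state machine.

```python
from enum import Enum


class State(Enum):
    RECOVERING = "Recovering"
    NOT_RECOVERING = "NotRecovering"
    RECOVERED = "Recovered"  # assumed end state for this toy model


class PGSketch:
    """Toy model of the race; not the actual PG code."""

    def __init__(self):
        self.state = State.RECOVERING
        self.recovering_bit = True

    def finish_recovery(self):
        # Recovery completes: the bit is cleared and AllReplicasRecovered
        # is (conceptually) queued for later delivery.
        self.recovering_bit = False

    def on_defer_recovery(self) -> bool:
        # The fix: a DeferRecovery that arrives after the bit was cleared
        # is old and is ignored instead of changing state.
        if not self.recovering_bit:
            return False
        self.recovering_bit = False
        self.state = State.NOT_RECOVERING
        return True

    def on_all_replicas_recovered(self):
        # Without the guard above, this would be reached in NotRecovering.
        if self.state is not State.RECOVERING:
            raise RuntimeError("AllReplicasRecovered outside Recovering")
        self.state = State.RECOVERED
```

Replaying the sequence from the commit message (finish recovery, then the late DeferRecovery, then AllReplicasRecovered) leaves the model in its end state rather than raising.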
Patrick Donnelly
543d8a0e4c
Merge PR #21554 into master
* refs/pull/21554/head:
	client: avoid second lock on client_lock

Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2018-04-29 11:05:33 -07:00
Patrick Donnelly
5a56301945
Merge PR #21592 into master
* refs/pull/21592/head:
	mds: filter out blacklisted clients when importing caps
	mds: don't add blacklisted clients to reconnect gather set
	mds: combine MDCache::{cap_exports,cap_export_targets}

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2018-04-29 11:05:27 -07:00
Patrick Donnelly
e7856ffa04
Merge PR #21593 into master
* refs/pull/21593/head:
	mds: properly check auth subtree count in MDCache::shutdown_pass()

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2018-04-29 11:05:22 -07:00
Patrick Donnelly
0c11a6fcb4
Merge PR #21601 into master
* refs/pull/21601/head:
	mds: don't discover inode/dirfrag when mds is in 'starting' state

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2018-04-29 11:05:16 -07:00
Patrick Donnelly
6c07c85796
Merge PR #21610 into master
* refs/pull/21610/head:
	cephfs-journal-tool: wait prezero ops before destroying journal

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2018-04-29 11:05:11 -07:00