Commit Graph

404 Commits

Author SHA1 Message Date
Thomas Bechtold
a004d92ae0 qa/standalone/test_ceph_daemon: Fix hang when CEPH_DAEMON is not set
When running test_ceph_daemon.sh from the root dir and not setting
$CEPH_DAEMON manually, the call hangs at:

$ ./qa/standalone/test_ceph_daemon.sh
[...]
+ for p in $PYTHONS
+ echo '=== re-running with python3 ==='
=== re-running with python3 ===
++ which python3
+ ln -s /usr/bin/python3 /tmp/tmp.6hneCsNMio/python
+ echo '#!/tmp/tmp.6hneCsNMio/python'
+ cat

Check that there is a ceph-daemon found before continue.

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2019-11-06 16:09:55 +01:00
Sage Weil
f5c7a8c986 qa/standalone/test_ceph_daemon: fix multi-version python test
We have to rewrite the shebang line, since it is no longer just
'#/usr/bin/env python' (as of e12ad1b016).

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-03 10:09:06 -06:00
Sage Weil
df40a49eb8 ceph-daemon: use client.admin keyring during bootstrap
It's usually okay to use the mon. key for CLI commands, except we had a
mgr but that prevented you from issuing mgr commands correctly.  We have
the new client.admin key available, so use that instead.

Update tests to not --skip-ssh (now that it doesn't hang).

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-30 14:07:52 -05:00
Sage Weil
debde146d2 qa/standalone/test_ceph_damon.sh: test with python2 and python3
Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-28 12:15:47 -05:00
Sage Weil
b830870bbb Merge PR #31130 into master
* refs/pull/31130/head:
	ceph-daemon: only set up crash dir mount if it exists

Reviewed-by: Sebastian Wagner <swagner@suse.com>
2019-10-25 09:18:20 -05:00
Sage Weil
f56a8db34d ceph-daemon: only set up crash dir mount if it exists
Sometimes we run containers on a host that doesn't have a crash dir set
up (becuase no daemon has been deployed).  Examples include shell and
ceph-volume.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-24 20:06:23 -05:00
David Zafman
4ea43f7342
Merge pull request #31133 from dzafman/wip-42476
ceph-objectstore-tool: call collection_bits() crashes on the meta col…

Reviewed-by: Sage Weil <sage@redhat.com>
2019-10-24 17:23:48 -07:00
David Zafman
2d79e77b6a ceph-objectstore-tool: call collection_bits() crashes on the meta collection
Skip new check for meta collection
test:
    Turn off osd_pool_default_pg_autoscale_mode just like bash tests do
    Fix test by checking for new error message

Caused by: f88b353454

Fixes: https://tracker.ceph.com/issues/42476

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-10-24 11:37:30 -07:00
Sage Weil
29c97547a9 Merge PR #30859 into master
* refs/pull/30859/head:
	auth: EACCES, not EPERM
	mon: shunt old tell commands from cli interface to asok
	mon: allow mgr to tell mon.foo smart
	mon: include quorum features in quorum_status
	qa/workunits/mon/caps.sh: fix test
	ceph_test_rados_api_cmd: fix MonDescribe test
	Merge branch 'vstart-fs-auth' of git://github.com/batrick/ceph into wip-cleanup-mon-asok
	test/pybind/test_ceph_argparse: fix tests
	vstart: add volume client keys to keyring
	vstart: use fs authorize to create master client key
	vstart: redirect some output to stderr
	vstart: output command strings to stderr
	qa/workunits/cephtool/test.sh: fix 'quorum enter' caller
	qa: change mon_status calls to quorum_status or tell commands
	mon: fix 'heap ...' command
	mon: consolidate 'sync force' commands
	mon: allow asok commands to return an error code
	mon: move 'quorum enter|exit' and 'mon_status' to asok
	mon: fix 'smart' asok command
	mon: remove old 'config set' and 'injectargs'

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-10-23 21:05:42 -05:00
Sage Weil
d7bd029b51 test_ceph_daemon: test unit, enter, shell
Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-23 15:08:55 -05:00
Sage Weil
86b2c8dd60 ceph-daemon: drop exec
It's not identical to enter.  enter seems more intuitive to me, but that
may be because I'm not a longtime docker user.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-23 15:08:55 -05:00
Sage Weil
202d615d38 qa/standalone/test_ceph_daemon.sh: add new functional tests
- sudo as needed
- clean up afterward

There is still a bit of missing coverage, but this captures most of it.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-23 15:08:55 -05:00
Sage Weil
70367de903 qa: change mon_status calls to quorum_status or tell commands
The tests were doing logs of 'ceph mon_status'; change that to
quorum_status or tell.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-12 12:05:36 -05:00
Sage Weil
1e44d86b2c osd: change trigger_[deep_]scrub tommands to a pg tell command
This is cleaner.  All users are currently standalone tests; updated.

It also means that *all* commands that have a name=pgid arg are pg tell
commands.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-04 09:07:02 -05:00
Sage Weil
d8d2b71db5 qa/standalone/mon/health-mute: use power of 2 for pg_num
Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-26 09:29:32 -05:00
Sage Weil
ab594b9b31 Merge PR #30475 into master
* refs/pull/30475/head:
	qa/standalone/ceph-helpers: default pg autoscale mode off for standalone
	os/bluestore: fix objectstore_blackhole read-after-write
	test,misc: do not specify pg_num per pool
	mgr/volumes: do not specify pg_num
	pybind/ceph_volume_client: do not specify pg_num for new pools
	doc: remove all pg_num arguments to 'osd pool create'
	mon: do not require pg_num to 'osd pool create'
	common: default pg_autoscale_mode=on for new pools

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-09-23 09:12:42 -05:00
Sage Weil
f71672c6ad qa/standalone/ceph-helpers: default pg autoscale mode off for standalone
Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-22 16:59:07 -05:00
Sage Weil
8994a65242 qa/standalone/osd/divergent-priors: add reproducer for bug 41816
Reproducer for https://tracker.ceph.com/issues/41816

Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-21 10:09:15 -05:00
David Zafman
b3e1c58b0e osd: Replace active/pending scrub tracking for local/remote
This is similar to how recovery reservations are split between
local and remote.

It was the case that scrubs_pending was used for reservations at
the replicas as well as at the primary while requesting reservations
from the replicas.  There was no need for scrubs_pending to turn
into scrubs_active at the primary as nothing treated that value
as special.  scrubber.active = true when scrubbing is
actually going.

Now scurbber.local_reserved indicates scrubs_local incremented
Now scrubber.remote_reserved indicates scrubs_remote incremented

Fixes: https://tracker.ceph.com/issues/41669

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-10 13:33:27 -07:00
David Zafman
b98950e707 osd: Rename dump_reservations to dump_recovery_reservations
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-10 13:32:29 -07:00
David Zafman
6d2e4cb109 test: Allow fractional milliseconds to make test possible
Fixes: https://tracker.ceph.com/issues/41689

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-06 11:23:52 -07:00
David Zafman
336b6b66ca
Merge pull request #28755 from dzafman/wip-network
feature: Health warnings on long network ping times, add "dump_osd_network" to get a report

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-09-05 07:54:43 -07:00
David Zafman
5f83a6158b osd doc mon mgr: To milliseconds for config value, user input and threshold out
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-04 17:13:32 +00:00
David Zafman
87d80eb417 test: ceph-objectstore-tool add remove --force with bad snapset test
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-08-27 22:30:02 +00:00
David Zafman
4fb42ea27e test: Add basic test for network ping tracking
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-08-26 15:25:34 +00:00
Sage Weil
2dca76ac84 Merge PR #29774 into master
* refs/pull/29774/head:
	qa/standalone/scrub/osd-scrub-snaps: snapmapper omap is now 'm'

Reviewed-by: David Zafman <dzafman@redhat.com>
2019-08-22 12:27:26 -05:00
Sage Weil
f5a1c57c94 qa/standalone/scrub/osd-scrub-snaps: snapmapper omap is now 'm'
...due to per-pool omap.

Fixes 91f533be71

Fixes: https://tracker.ceph.com/issues/41353
Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-20 16:18:41 -05:00
Sage Weil
1e36be9567 qa/standalone/mon/health-mute.sh: fix up rachet test
Make sure we provide time for the mute to get cleared out by tick().

Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-19 12:30:10 -05:00
Sage Weil
9352fc94ab qa/standalone/mon/health-mute.sh: s/kill daemons/kill_daemons/
Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-19 09:27:51 -05:00
Kefu Chai
fc55a51a87
Merge pull request #29579 from liewegas/wip-big-vs-bluestore
osd: scrub error on big objects; make bluestore refuse to start on big objects

Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-08-16 20:24:43 +08:00
Sage Weil
710fef96ea qa/standalone/mon/health-mutes: add tests
Make sure mute and unmute work.  Make sure stick is sticky. Mkae sure
counts can go down bupt if they go upt hte mute clears.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-14 20:40:08 -05:00
David Zafman
5928fe8ca0 osd/PG: scrub error when objects are larger than osd_max_object_size
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-08-14 20:25:12 -05:00
Kefu Chai
f13c7c83d9
Merge pull request #29342 from Jeegn-Chen/wip-scrub-extended-sleep
osd: support osd_scrub_extended_sleep

Reviewed-by: David Zafman <dzafman@redhat.com>
2019-08-13 09:09:52 +08:00
Jeegn Chen
3bfb5c2621 osd: support osd_scrub_extended_sleep
1. always take osd_scrub_sleep for manually initiated
   scrubs
2. when scrub_time_permit() return true for scheduled
   ones, the existing osd_scrub_sleep is used
3. when scrub_time_permit() return false for scheduled
   ones, there may be 2 scenarios
   3.1 if osd_scrub_extended_sleep <= osd_scrub_sleep,
       let's take osd_scrub_sleep
   3.2 otherwise, let's take osd_scrub_extended_sleep

Fixes: http://tracker.ceph.com/issues/40955
Signed-off-by: Jeegn Chen <jeegnchen@tencent.com>
2019-08-12 16:54:36 +08:00
David Zafman
b1c14b7f6e
Merge pull request #29494 from dzafman/wip-scrub-test
test: Bump sleep time for slower machines

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-08-07 18:30:31 -07:00
David Zafman
74d294d70b test: Bump sleep time for slower machines
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-08-05 07:40:09 -07:00
Changcheng Liu
43ad4bf0dc ceph-objectstore-tool: set log date format
Set datefmt parameter to track the log information
%F Equivalent to %Y-%m-%d
%T Equivalent to "%H:%M:%S"

Signed-off-by: Robert Church <robert.church@windriver.com>
Reviewed-by: Changcheng Liu <changcheng.liu@aliyun.com>
2019-07-25 09:39:19 +08:00
Sage Weil
1b46267cf7 Merge PR #28839 into master
* refs/pull/28839/head:
	osd: support osd_repair_during_recovery

Reviewed-by: David Zafman <dzafman@redhat.com>
2019-07-16 10:07:53 -05:00
Sage Weil
ff7813aa14 qa/standalone/scrub/osd-scrub-snaps.sh: adjust expected output
SnapSet now dumps just seq, not a (fake) SnapContext.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-12 09:55:06 -05:00
Sage Weil
03b9c66080 ceph-objectstore-tool: fix use of SnapSet::snaps
Instead, use clone_snaps to identify clones.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-12 09:55:06 -05:00
Sage Weil
23eaf7c498 qa/standalone/scrub/osd-scrub-snaps: fix kv grep
SnapMapper keys are now SNA_, not MAP_.

Fixes: http://tracker.ceph.com/issues/40725
Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-12 08:11:21 -05:00
Sage Weil
b2eb5232de Merge PR #28901 into master
* refs/pull/28901/head:
	qa/standalone/scrub/osd-scrub-repair: fix 'scrub ok' grep
	osd/osd_types: remove 'snap_context' from SnapSet::dump()

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-07-08 08:36:05 -05:00
Jeegn Chen
80f4e1f677 osd: support osd_repair_during_recovery
osd_repair_during_recovery=true allow explicitly requested reqair
to be scheduled on OSDs with active recovering.

Fixes: http://tracker.ceph.com/issues/40620
Signed-off-by: Jeegn Chen <jeegnchen@tencent.com>
2019-07-08 09:26:27 +08:00
Sage Weil
a960f2faa7 qa/standalone/scrub/osd-scrub-repair: fix 'scrub ok' grep
The log now also has a 'purged_snaps scrub ok' message that (generally)
precedes the first scrubbed PG.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-04 18:27:37 -05:00
Sage Weil
70ad54a0b3 osd/osd_types: remove 'snap_context' from SnapSet::dump()
We no longer have a snaps field with real values, so dumping this as a
"snap_context" is silly.  Instead, just dump the seq.

Adjust qa/standalone/scrub/osd-scrub-repair.sh accordingly.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-04 18:24:41 -05:00
Sage Weil
71e5cba00b Merge PR #28867 into master
* refs/pull/28867/head:
	qa/standalone/ceph-helpers: more osd debug

Reviewed-by: David Zafman <dzafman@redhat.com>
2019-07-03 21:27:20 -05:00
David Zafman
fe3b693d0f
Merge pull request #28334 from dzafman/wip-40073
osd: Fix the way that auto repair triggers after regular scrub

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-07-03 15:27:27 -07:00
Sage Weil
0d0759531a qa/standalone/ceph-helpers: more osd debug
debug_ms=1
debug_monc=20

Hunting down http://tracker.ceph.com/issues/40666

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-03 16:53:00 -05:00
David Zafman
27918bb906 osd: Handle scrub interval changes
Global changes reschedule all PG scrubs
Pool changes reschedule pool PG scrubs

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-06-27 14:20:54 -07:00
Neha Ojha
bd15824567
Merge pull request #28204 from dzafman/wip-39555
mon: Improve health status for backfill_toofull and recovery_toofull

Reviewed-by: Joao Eduardo Luis <joao@suse.de>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-06-20 11:12:10 -07:00