105 Commits

Author SHA1 Message Date
Patrick Donnelly
f13b3483e7
Merge PR #28855 into master
* refs/pull/28855/head:
	doc: document scrub summary in ceph status output
	test: extend scrub control test to validate mds task status
	mds: send scrub state changes to cluster log.
	mds: periodically send mds scrub status to ceph manager
	mgr, mon: allow normal ceph services to register with manager

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-08-23 16:16:16 -07:00
Patrick Donnelly
dad94db7ae
Merge PR #28378 into master
* refs/pull/28378/head:
	qa/tasks: introduce Thrasher base class
	qa/tasks: Fix typo
	qa/tasks: manage thrashers
	qa/tasks: start DaemonWatchdog when ceph starts
	qa/tasks: make watch and bark handle more daemons
	qa/tasks: move DaemonWatchdog to new file

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-08-21 10:57:15 -07:00
Jos Collin
f31791e35d
qa/tasks: introduce Thrasher base class
* Introduced a Thrasher base class.
* Updated thrashers to inherit from Thrasher.
* Replaced the magic variable e with Thrasher.exception as per the discussion.
  The exception variable is now set by default, because the thrashers inherit
  from the Thrasher class.

Fixes: https://github.com/ceph/ceph/pull/28378#discussion_r309337928
Fixes: https://tracker.ceph.com/issues/41133
Signed-off-by: Jos Collin <jcollin@redhat.com>
2019-08-21 10:49:46 +05:30
Sage Weil
8827bc1022 Merge PR #29493 into master
* refs/pull/29493/head:
	qa/tasks/mgr/mgr_test_case: get mgrmap from 'mgr dump', not status
	qa/tasks/ceph_manager: no newlines in 'ceph -s' output
	mon: make mon summary more concise in 'ceph -s'
	mon/MgrStatMonitor: set initial service_map 'modified' to cluster mkfs
	mon: remove double-nesting of "osdmap" for ceph status
	mon/MgrMap: make print_summary (used by 'ceph -s') more concise

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-08-08 15:52:45 -05:00
Venky Shankar
465a3adc6c test: extend scrub control test to validate mds task status
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2019-08-06 02:33:09 -04:00
Sage Weil
9719920920 qa/tasks/ceph_manager: no newlines in 'ceph -s' output
This gets dumped to the log, making it hard to read.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-05 22:02:31 -05:00
Sage Weil
41e4056174 qa/tasks/ceph_manager: remove race from all_active_or_peered()
Signed-off-by: Sage Weil <sage@redhat.com>
2019-08-05 14:01:02 -05:00
Kefu Chai
310ccd9f9f qa/tasks/ceph_manager.py: always use self.logger
in fbd4836d, a regression is introduced:

self.log("failed to read erasure_code_profile. %s was likely removed",
pool)

because `self.log` is actually a lambda which just does

self.logger.info(x)

in this change

* `Thrasher.log()` is added for three reasons:
 - in PEP-8,

> Always use a def statement instead of an assignment statement that
> binds a lambda expression directly to an identifier
so a better way is to define a method using `def`
 - and i think it helps with the readability
* `logger` parameter is now mandatory in the constructor of
  `Thrasher` class. because the instance of this class is only created
by `qa/tasks/thrashosds.py`, like:

thrash_proc = ceph_manager.Thrasher(
        cluster_manager,
        config,
        logger=log.getChild('thrasher')
        )

and `log.getChild()` does not return `None`, so there is no need to
handle that case.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-07-24 10:28:00 +08:00
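
The idea in the commit above can be illustrated with a minimal sketch. This `Thrasher` is a hypothetical, stripped-down stand-in and not the actual class in `qa/tasks/ceph_manager.py`; only the `log()` method defined with `def` and the mandatory `logger` argument are taken from the commit message, and the pool name `mypool` is made up.

```python
import logging


class Thrasher:
    """Hypothetical, stripped-down stand-in for the real Thrasher class."""

    def __init__(self, logger):
        # logger is mandatory: callers are expected to pass a real Logger,
        # e.g. log.getChild('thrasher'), so None is not handled here.
        self.logger = logger

    def log(self, msg, *args, **kwargs):
        # A def-based method (as PEP 8 recommends) that forwards format
        # arguments, which a one-argument lambda such as
        # `lambda x: self.logger.info(x)` cannot accept.
        self.logger.info(msg, *args, **kwargs)


logging.basicConfig(level=logging.INFO)
log = logging.getLogger('demo')
thrasher = Thrasher(logger=log.getChild('thrasher'))
thrasher.log("failed to read erasure_code_profile. %s was likely removed",
             "mypool")
```

With the one-argument lambda, the same call with a format argument would raise a `TypeError`, which is the regression the commit describes.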
Sage Weil
c8f35af6e1 Merge PR #29109 into master
* refs/pull/29109/head:
	qa/tasks/ceph_manager: wait for clean before asserting clean on minsize test

Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-07-22 13:39:31 -05:00
Sage Weil
848c5b4a9a qa/tasks/ceph_manager: fix thrash_pg_upmap_items when no pools
Follow-on to e7ca5a92d4

Fixes: https://tracker.ceph.com/issues/40635
Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-19 14:49:26 -05:00
Sage Weil
50b439f22f qa/tasks/ceph_manager: wait for clean before asserting clean on minsize test
Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-18 09:11:30 -05:00
Sage Weil
e7ca5a92d4 qa/tasks/ceph_manager: make upmap thrasher behave when no pools/pgs
Fixes: https://tracker.ceph.com/issues/40635
Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-16 10:04:35 -05:00
Sage Weil
d014b7924d qa/tasks/ceph_manager: 5s -> 15s for 'osd out' to be visible
Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-12 08:56:50 -05:00
Sage Weil
0b4ce2ab4c qa/tasks/ceph_manager: make is_{clean,recovered,active_or_down} less racy
Currently these can be thrown off if the cluster is creating or removing
pools at the same time.  Fix by taking a single snapshot of the pg stats
and basing our judgement on that.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-10 11:04:49 -05:00
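
A minimal sketch of the single-snapshot approach described in the commit above. The `fetch_pg_stats` callable and the shape of the stat dicts (a `state` string such as `active+clean`) are assumptions made for illustration, not the real `ceph_manager` interface.

```python
def is_clean(fetch_pg_stats):
    """Return True if every PG in one snapshot is active and clean.

    fetch_pg_stats is a hypothetical callable returning a list of dicts,
    each with a 'state' string such as 'active+clean'.
    """
    # Fetch once: the total and the clean count both come from this list,
    # so a pool created or removed in between cannot skew the comparison.
    pg_stats = fetch_pg_stats()
    clean = [pg for pg in pg_stats
             if {'active', 'clean'} <= set(pg['state'].split('+'))]
    return len(clean) == len(pg_stats)
```

The racy alternative would fetch the stats twice, once to count all PGs and once to count the clean ones, and compare numbers taken at different moments.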
Kefu Chai
fbd4836d24 qa/tasks/ceph_manager.py: ignore errors in test_pool_min_size
to be specific, ignore errors when querying erasure coded pool's
erasure-code-profile. the pool might be removed after
"test_pool_min_size" lists all pools and before it queries the pools'
erasure-code-profile. in that case, we should just continue on with the
next pool.

normally, the pools are created by the "radosbench" tasks. and they
don't delete the ec profiles after removing the ec pools using them, but
i don't want to rely on this fact. so, in this change, the `try` block
guards both `ceph osd pool get <pool_name> erasure_code_profile`
and `ceph osd erasure-code-profile get <profile>` calls.

Fixes: http://tracker.ceph.com/issues/40533
Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-06-27 19:00:23 +08:00
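
The guarded lookup described in the commit above might look roughly like the sketch below. `CommandFailed`, `collect_ec_profiles` and `run_cmd` are hypothetical names introduced for illustration; `run_cmd` is assumed to return already-parsed command output and to raise `CommandFailed` when the command fails.

```python
class CommandFailed(Exception):
    """Hypothetical error raised when a CLI command exits non-zero."""


def collect_ec_profiles(pools, run_cmd):
    """Return {pool: profile}, skipping pools that vanish mid-scan."""
    profiles = {}
    for pool in pools:
        try:
            # Both lookups sit in one try block: the pool or its profile
            # may be removed between listing the pools and querying them.
            name = run_cmd(['osd', 'pool', 'get', pool,
                            'erasure_code_profile'])
            profiles[pool] = run_cmd(['osd', 'erasure-code-profile',
                                      'get', name])
        except CommandFailed:
            continue  # pool was likely removed; move on to the next one
    return profiles
```

Keeping both calls inside the same `try` avoids relying on the profile outliving the pool that used it.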
Kefu Chai
1a2700f404 qa/tasks: extract {ERASURE_CODED,REPLICATED}_POOL out
so they can be reused by `Thrasher`.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-06-27 19:00:23 +08:00
Chang Liu
b02e2f6cf2 test: update test_pool_min_size test in thrasher
Signed-off-by: Chang Liu <liuchang0812@gmail.com>
2019-05-10 10:45:25 +08:00
Greg Farnum
0ee63a0450 qa: extend get_pool_property() to allow non-int values
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2019-05-10 10:45:25 +08:00
Greg Farnum
7950ce2488 qa: don't create rbd pool for min-size thrashing tests
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2019-05-10 10:45:25 +08:00
Greg Farnum
b701395065 qa: write a thrasher for putting PGs below min_size and watching them recover
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2019-05-10 10:45:25 +08:00
Greg Farnum
78755091f9 qa: remove unused variable from ceph_manager
Pyflakes warned me about this.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2019-05-10 10:45:25 +08:00
Sage Weil
54c5202b74 qa/tasks/ceph: stop any split/merge activity before scrubbing
If there are leftover merges at the end of the run they can take a long
time to get through, blowing our timeout for (waiting for pgs to become
active and to stop splitting/merging) and scrubbing pgs.  Stop all of that
at the end of the run so that we don't have to wait so long.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-14 06:51:21 -06:00
Sage Weil
0d4c4db3c0 qa/tasks/ceph_manager: compare osd flush seq #'s as ints
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:38 -06:00
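
Why the comparison in the commit above needs ints can be shown with a small sketch. The helper name `seq_reached` is hypothetical, and it assumes the sequence numbers arrive as decimal strings.

```python
def seq_reached(current, target):
    """Compare two sequence numbers numerically, not lexically."""
    return int(current) >= int(target)


# As strings, '9' sorts after '10' because '9' > '1' character-wise,
# so a string comparison would wrongly report that seq 9 has reached 10.
assert '9' >= '10'
assert not seq_reached('9', '10')
assert seq_reached('10', '9')
```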
Sage Weil
ac2430a43d qa/tasks/ceph_manager: make get_mon_status use mon addr
We don't have the 'mon addr' config property any more.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:31 -06:00
Sage Weil
28aaca58e7 qa/tasks/ceph_manager: avoid test_map_discontinuity stall with too few up osds
Some tests have m=2,k=2 and this will break them.  Sometimes even if we
have 5 up osds, we end up with 4 and CRUSH gets picky, so build in a
buffer and only do this if we have 6 up.

We don't have an easy way from here to see what the min up osds for healthy
is...  basically this map discontinuity test just sucks.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-11-20 17:12:43 -06:00
Sage Weil
b678356594 qa/tasks/ceph_manager: fix get_stuck_pgs from pg dump change
Fixes 95b7d2340c

Fixes: http://tracker.ceph.com/issues/36485
Signed-off-by: Sage Weil <sage@redhat.com>
2018-10-21 10:52:38 -05:00
Patrick Donnelly
d491227956
qa: fix run call args
Fixes: http://tracker.ceph.com/issues/36450
Introduced-by: 95746ecce9
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-10-15 14:45:18 -07:00
John Spray
67d147c00d
Merge pull request #23622 from renhwztetecs/renhw-wip-25103
mgr: fixup pgs show in unknown state

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: John Spray <john.spray@redhat.com>
2018-10-10 13:28:33 +01:00
Volker Theile
95746ecce9 mgr: Add ability to trigger a cluster/audit log message from Python
Fixes: https://tracker.ceph.com/issues/36194

Signed-off-by: Volker Theile <vtheile@suse.com>
2018-10-04 13:33:18 +02:00
huanwen ren
ed442447c0 qa: modify the format for add pgmap_ready.
Signed-off-by: huanwen ren <ren.huanwen@zte.com.cn>
2018-09-27 23:22:50 +08:00
Kefu Chai
4b0e2c8ed4 qa: fix typos
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-09-21 12:41:42 +08:00
Kefu Chai
510d9e1345
Merge pull request #23723 from xiexingguo/wip-list-missing
osd/PrimaryLogPG: rename list_missing -> list_unfound command

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-09-11 20:25:21 +08:00
Sage Weil
6bd682f53d ceph-objectstore-tool: prevent import of pg that has since merged
We currently import a portion of the PG if it has split.  Merge is more
complicated, though, mainly because COT is operating in a mode where it
fast-forwards the PG to the latest OSDMap epoch, which means it has to
implement any transformations to the PG (split/merge) independently.
Avoid doing this for merge.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-07 12:09:05 -05:00
Sage Weil
0b59b7a688 qa/tasks/thrashosds: support merging pgs too
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-07 12:09:05 -05:00
xie xingguo
85ba2f0a82 osd/PrimaryLogPG: s/list_missing/list_unfound/
Also:
- Do not print **offset** until specified
- Count missing objects correctly (used to be primary's local missing)

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-09-06 09:52:20 +08:00
Sage Weil
2c26fb0fe1 rados: drop mkpool, rmpool commands
- mkpool and rmpool users should use the normal cli/mon commands

Signed-off-by: Sage Weil <sage@redhat.com>
2018-08-31 09:27:36 -05:00
Dan Mick
7fc8714a27 qa/tasks/{ceph_manager.py,vstart_runner.py}: allow kwargs in raw_*
Allow passing kwargs (like stdin=) to the local and teuthology
clusters when running tests

Signed-off-by: Dan Mick <dan.mick@redhat.com>
2018-06-29 14:51:34 -07:00
David Zafman
151de1797b test: wait_for_pg_stats() should do another check after last 13 second sleep
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-05-23 17:27:14 -07:00
Vasu Kulkarni
7881a19d92 qa/tasks: wait_for_clean is called after the ceph task as well, once the osds are up.
The default timeout is None in that case, so it can hang forever in error
cases. Because this dumps quite a lot of info, the logs grow to gigabytes. With a
default timeout of 1200 we can avoid such huge logs and fail sooner. Any test needing
a higher timeout can pass the required value.

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2018-04-09 17:24:42 -07:00
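
A minimal sketch of a polling wait with a default timeout, in the spirit of the commit above. The signature is an assumption made for illustration: `is_clean`, `interval`, `clock` and `sleep` are hypothetical parameters, and only the default of 1200 comes from the commit message.

```python
import time


def wait_for_clean(is_clean, timeout=1200, interval=3,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll is_clean() until it returns True or the timeout expires.

    Passing timeout=None restores the old wait-forever behaviour.
    """
    start = clock()
    while not is_clean():
        if timeout is not None and clock() - start >= timeout:
            raise RuntimeError(
                'timed out waiting for clean after %s' % timeout)
        sleep(interval)
```

A test that needs longer can pass a larger value, for example `wait_for_clean(check, timeout=3600)`, where `check` is whatever cleanliness test the caller uses.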
Sage Weil
577737d007 osd: osd_mon_report_interval_min -> osd_mon_report_interval, kill _max
The _max isn't used.  Drop the _min suffix.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-06 11:00:14 -05:00
Tatjana Dehler
25a0ed93ec mgr/dashboard: add 'osd metadata' command call
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
2018-03-23 11:11:17 +01:00
Neha Ojha
e3899dc901 qa/tasks/ceph_manager: use set_config on revived osd
Signed-off-by: Neha Ojha <nojha@redhat.com>
2018-03-14 12:37:56 -07:00
Sage Weil
8651e15c93 qa/tasks/ceph_manager: tolerate failure to force backfill/recovery
The pool may have been deleted out from underneath us.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-01-03 08:37:02 -06:00
Sage Weil
aafb3a565d qa/tasks/ceph_manager: tolerate tell osd.* error
It's possible for tell osd.* to race against an osd we stopped but the
cluster doesn't know is down yet.  In that case we'll get ENXIO on that
osd and the command will fail.

In this context, we don't care.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-12-06 17:51:20 -06:00
Kefu Chai
a406553a79 qa/tasks/ceph_manager: add inject_args() method
* move Thrasher._set_config() to CephManager, and make it a public
  method, and rename it to inject_args(),
* use this method instead of using 'tell ... injectargs ...' directly

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-11-29 18:44:16 +08:00
Kefu Chai
749bbda075 qa/tasks: prolong revive_osd() timeout to 6 min
see also #17902

Fixes: http://tracker.ceph.com/issues/21474
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-11-20 13:40:59 +08:00
Kefu Chai
7f549af459 qa: do not wait for down/out osd for pg convergence
that osd is not involved in the PG state changes.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-11-08 14:50:10 +08:00
Sage Weil
d21809b14e qa/tasks/thrashosds: set min_in default to 4
We have EC tests with k=2,m=2, so we need a min of 4.

Fixes: http://tracker.ceph.com/issues/21997
Signed-off-by: Sage Weil <sage@redhat.com>
2017-11-01 08:32:48 -05:00
Patrick Donnelly
c58161f25b
Merge PR #17266 into master
* refs/pull/17266/head:
	qa: update test_ceph_argparse to test fs cmds
	qa: use fs rm_data_pool
	qa: fix mdsmap lookup
	qa: remove usage of mds dump
	PendingReleaseNotes: add obsoleted mds commands
	qa: remove use of obsolete mds commands
	ceph_volume_client: remove use of obsolete mds cmd
	doc: update on obsolete mds commands
	cephfs: obsolete deprecated mds commands

Reviewed-by: Douglas Fuller <dfuller@redhat.com>
2017-10-24 16:37:14 -07:00
Patrick Donnelly
3a5f090a1e
qa: remove usage of mds dump
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-10-24 11:32:43 -07:00