Commit Graph

193 Commits

Author SHA1 Message Date
Sage Weil
d014b7924d qa/tasks/ceph_manager: 5s -> 15s for 'osd out' to be visible
Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-12 08:56:50 -05:00
Sage Weil
0b4ce2ab4c qa/tasks/ceph_manager: make is_{clean,recovered,active_or_down} less racy
Currently these can be thrown off if the cluster is creating or removing
pools at the same time.  Fix by taking a single snapshot of the pg stats
and basing our judgement on that.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-10 11:04:49 -05:00
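The single-snapshot idea in 0b4ce2ab4c can be illustrated with a small sketch. It is not the code from ceph_manager.py: `racy_is_clean`, `snapshot_is_clean`, `make_fetch`, and the dictionary layout of a PG stat are all assumptions made for illustration. The point is only that two separate queries can disagree while pools are being created or removed, whereas one snapshot is internally consistent.

```python
from typing import Callable, Dict, Iterable, List

PgStats = List[Dict[str, str]]


def _is_active_clean(pg: Dict[str, str]) -> bool:
    # A PG state is a '+'-joined set of flags, e.g. "active+clean".
    flags = set(pg["state"].split("+"))
    return {"active", "clean"} <= flags


def racy_is_clean(fetch: Callable[[], PgStats]) -> bool:
    # Two separate queries: the set of PGs can change between them.
    num_clean = sum(1 for pg in fetch() if _is_active_clean(pg))
    return num_clean == len(fetch())


def snapshot_is_clean(fetch: Callable[[], PgStats]) -> bool:
    # One query: count and total come from the same snapshot.
    pgs = fetch()
    return all(_is_active_clean(pg) for pg in pgs)


def make_fetch(snapshots: Iterable[PgStats]) -> Callable[[], PgStats]:
    # Hypothetical stand-in for a cluster whose PG stats change over time.
    it = iter(snapshots)
    return lambda: next(it)
```

With a fake cluster that gains a PG in the `creating` state between two queries, the racy variant compares a count from the old view against a total from the new view, while the snapshot variant judges a single view.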
Kefu Chai
fbd4836d24 qa/tasks/ceph_manager.py: ignore errors in test_pool_min_size
to be specific, ignore errors when querying erasure coded pool's
erasure-code-profile. the pool might be removed after
"test_pool_min_size" lists all pools and before it queries the pools'
erasure-code-profile. in that case, we should just continue on with the
next pool.

normally, the pools are created by the "radosbench" tasks. and they
don't delete the ec profiles after removing the ec pools using them, but
i don't want to rely on this fact. so, in this change, the `try` block
guards both `ceph osd pool get <pool_name> erasure_code_profile`
and `ceph osd erasure-code-profile get <profile>` calls.

Fixes: http://tracker.ceph.com/issues/40533
Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-06-27 19:00:23 +08:00
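The guard described in fbd4836d24 can be sketched as below. This is a minimal illustration rather than the actual test code: `CommandFailed`, `collect_ec_profiles`, and the two callables standing in for `ceph osd pool get <pool_name> erasure_code_profile` and `ceph osd erasure-code-profile get <profile>` are hypothetical names. Both lookups sit inside one `try` block, and a failure in either simply moves on to the next pool.

```python
from typing import Callable, Dict, Iterable


class CommandFailed(Exception):
    """Hypothetical error raised when a cluster command fails."""


def collect_ec_profiles(
    pools: Iterable[str],
    get_pool_profile_name: Callable[[str], str],
    get_profile: Callable[[str], Dict[str, str]],
) -> Dict[str, Dict[str, str]]:
    profiles = {}
    for pool in pools:
        try:
            # Either call may fail if the pool (or its profile) was
            # removed after the pool list was taken.
            name = get_pool_profile_name(pool)
            profiles[pool] = get_profile(name)
        except CommandFailed:
            # The pool vanished underneath us; continue with the next one.
            continue
    return profiles
```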
Kefu Chai
1a2700f404 qa/tasks: extract {ERASURE_CODED,REPLICATED}_POOL out
so they can be reused by `Thrasher`.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-06-27 19:00:23 +08:00
Chang Liu
b02e2f6cf2 test: update test_pool_min_size test in thrasher
Signed-off-by: Chang Liu <liuchang0812@gmail.com>
2019-05-10 10:45:25 +08:00
Greg Farnum
0ee63a0450 qa: extend get_pool_property() to allow non-int values
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2019-05-10 10:45:25 +08:00
Greg Farnum
7950ce2488 qa: don't create rbd pool for min-size thrashing tests
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2019-05-10 10:45:25 +08:00
Greg Farnum
b701395065 qa: write a thrasher for putting PGs below min_size and watching them recover
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2019-05-10 10:45:25 +08:00
Greg Farnum
78755091f9 qa: remove unused variable from ceph_manager
Pyflakes warned me about this.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2019-05-10 10:45:25 +08:00
Sage Weil
54c5202b74 qa/tasks/ceph: stop any split/merge activity before scrubbing
If there are leftover merges at the end of the run they can take a long
time to get through, blowing our timeout for (waiting for pgs to become
active and to stop splitting/merging) and scrubbing pgs.  Stop all of that
at the end of the run so that we don't have to wait so long.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-14 06:51:21 -06:00
Sage Weil
0d4c4db3c0 qa/tasks/ceph_manager: compare osd flush seq #'s as ints
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:38 -06:00
Sage Weil
ac2430a43d qa/tasks/ceph_manager: make get_mon_status use mon addr
We don't have the 'mon addr' config property any more.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:31 -06:00
Sage Weil
28aaca58e7 qa/tasks/ceph_manager: avoid test_map_discontinuity stall with too few up osds
Some tests have m=2,k=2 and this will break them.  Sometimes even if we
have 5 up osds, we end up with 4 and CRUSH gets picky, so build in a
buffer and only do this if we have 6 up.

We don't have an easy way from here to see what the min up osds for healthy
is...  basically this map discontinuity test just sucks.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-11-20 17:12:43 -06:00
Sage Weil
b678356594 qa/tasks/ceph_manager: fix get_stuck_pgs from pg dump change
Fixes 95b7d2340c

Fixes: http://tracker.ceph.com/issues/36485
Signed-off-by: Sage Weil <sage@redhat.com>
2018-10-21 10:52:38 -05:00
Patrick Donnelly
d491227956 qa: fix run call args
Fixes: http://tracker.ceph.com/issues/36450
Introduced-by: 95746ecce9
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-10-15 14:45:18 -07:00
John Spray
67d147c00d Merge pull request #23622 from renhwztetecs/renhw-wip-25103
mgr: fixup pgs show in unknown state

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: John Spray <john.spray@redhat.com>
2018-10-10 13:28:33 +01:00
Volker Theile
95746ecce9 mgr: Add ability to trigger a cluster/audit log message from Python
Fixes: https://tracker.ceph.com/issues/36194

Signed-off-by: Volker Theile <vtheile@suse.com>
2018-10-04 13:33:18 +02:00
huanwen ren
ed442447c0 qa: modify the format for add pgmap_ready.
Signed-off-by: huanwen ren <ren.huanwen@zte.com.cn>
2018-09-27 23:22:50 +08:00
Kefu Chai
4b0e2c8ed4 qa: fix typos
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-09-21 12:41:42 +08:00
Kefu Chai
510d9e1345 Merge pull request #23723 from xiexingguo/wip-list-missing
osd/PrimaryLogPG: rename list_missing -> list_unfound command

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-09-11 20:25:21 +08:00
Sage Weil
6bd682f53d ceph-objectstore-tool: prevent import of pg that has since merged
We currently import a portion of the PG if it has split.  Merge is more
complicated, though, mainly because COT is operating in a mode where it
fast-forwards the PG to the latest OSDMap epoch, which means it has to
implement any transformations to the PG (split/merge) independently.
Avoid doing this for merge.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-07 12:09:05 -05:00
Sage Weil
0b59b7a688 qa/tasks/thrashosds: support merging pgs too
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-07 12:09:05 -05:00
xie xingguo
85ba2f0a82 osd/PrimaryLogPG: s/list_missing/list_unfound/
Also:
- Do not print **offset** until specified
- Count missing objects correctly (used to be primary's local missing)

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-09-06 09:52:20 +08:00
Sage Weil
2c26fb0fe1 rados: drop mkpool, rmpool commands
- mkpool and rmpool users should use the normal cli/mon commands

Signed-off-by: Sage Weil <sage@redhat.com>
2018-08-31 09:27:36 -05:00
Dan Mick
7fc8714a27 qa/tasks/{ceph_manager.py,vstart_runner.py}: allow kwargs in raw_*
Allow passing kwargs (like stdin=) to the local and teuthology
clusters when running tests

Signed-off-by: Dan Mick <dan.mick@redhat.com>
2018-06-29 14:51:34 -07:00
David Zafman
151de1797b test: wait_for_pg_stats() should do another check after last 13 second sleep
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-05-23 17:27:14 -07:00
Vasu Kulkarni
7881a19d92 qa/tasks: wait_for_clean is called after ceph task as well, after the osds are up.
The default timeout is None in that case, and there are error cases where it can hang forever.
Since this dumps quite a lot of info, the logs grow to GBs. With a default timeout of 1200
we can avoid such huge logs and fail sooner. Any test needing a higher timeout can pass the
required value.

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2018-04-09 17:24:42 -07:00
Sage Weil
577737d007 osd: osd_mon_report_interval_min -> osd_mon_report_interval, kill _max
The _max isn't used.  Drop the _min suffix.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-06 11:00:14 -05:00
Tatjana Dehler
25a0ed93ec mgr/dashboard: add 'osd metadata' command call
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
2018-03-23 11:11:17 +01:00
Neha Ojha
e3899dc901 qa/tasks/ceph_manager: use set_config on revived osd
Signed-off-by: Neha Ojha <nojha@redhat.com>
2018-03-14 12:37:56 -07:00
Sage Weil
8651e15c93 qa/tasks/ceph_manager: tolerate failure to force backfill/recovery
The pool may have been deleted out from underneath us.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-01-03 08:37:02 -06:00
Sage Weil
aafb3a565d qa/tasks/ceph_manager: tolerate tell osd.* error
It's possible for tell osd.* to race against an osd we stopped but the
cluster doesn't know is down yet.  In that case we'll get ENXIO on that
osd and the command will fail.

In this context, we don't care.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-12-06 17:51:20 -06:00
Kefu Chai
a406553a79 qa/tasks/ceph_manager: add inject_args() method
* move Thrasher._set_config() to CephManager, and make it a public
  method, and rename it to inject_args(),
* use this method instead of using 'tell ... injectargs ...' directly

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-11-29 18:44:16 +08:00
Kefu Chai
749bbda075 qa/tasks: prolong revive_osd() timeout to 6 min
see also #17902

Fixes: http://tracker.ceph.com/issues/21474
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-11-20 13:40:59 +08:00
Kefu Chai
7f549af459 qa: do not wait for down/out osd for pg convergence
that osd is not involved in the PG state changes.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-11-08 14:50:10 +08:00
Sage Weil
d21809b14e qa/tasks/thrashosds: set min_in default to 4
We have EC tests with k=2,m=2, so we need a min of 4.

Fixes: http://tracker.ceph.com/issues/21997
Signed-off-by: Sage Weil <sage@redhat.com>
2017-11-01 08:32:48 -05:00
Patrick Donnelly
c58161f25b Merge PR #17266 into master
* refs/pull/17266/head:
	qa: update test_ceph_argparse to test fs cmds
	qa: use fs rm_data_pool
	qa: fix mdsmap lookup
	qa: remove usage of mds dump
	PendingReleaseNotes: add obsoleted mds commands
	qa: remove use of obsolete mds commands
	ceph_volume_client: remove use of obsolete mds cmd
	doc: update on obsolete mds commands
	cephfs: obsolete deprecated mds commands

Reviewed-by: Douglas Fuller <dfuller@redhat.com>
2017-10-24 16:37:14 -07:00
Patrick Donnelly
3a5f090a1e qa: remove usage of mds dump
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-10-24 11:32:43 -07:00
Kefu Chai
4c7df944c7 osd: add max-pg-per-osd limit
osd will refuse to create new pgs, until its pg number is lower
than the max-pg-per-osd upper bound setting.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-10-17 23:08:40 +08:00
Kefu Chai
e21114274f qa: s/backfill/backfilling/
it's renamed "backfilling" in 4015343f.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-10-11 11:52:43 +08:00
Sage Weil
b6a5c09dba ceph-objectstore-tool: remove rm-past-intervals op
The OSD doesn't rebuild this on demand anymore.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-10-06 13:08:18 -05:00
Sage Weil
61799c4c8c Merge pull request #17810 from hjwsm1989/wip-21294
qa/ceph_manager: check pg state again before timedout

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-09-25 12:33:34 -05:00
Kefu Chai
42be200c56 qa/tasks: prolong revive_osd() timeout to 6 min
bluestore_fsck_on_mount and bluestore_fsck_on_mount_deep are enabled by
default. and bluestore is used as the default store backend. it takes
longer to perform the deep fsck with verbose log. so prolong the
revive_osd()'s timeout from 150 sec to 360 sec.

Fixes: http://tracker.ceph.com/issues/21474
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-09-22 10:58:41 +08:00
huangjun
fa40add7f0 qa/ceph_manager: check pg state again before timedout
PG states may all be active+clean when no recovery is going on,
so check again before timing out.

Fixes: http://tracker.ceph.com/issues/21294

Signed-off-by: huangjun <huangjun@xsky.com>
2017-09-20 00:04:04 +08:00
yonghengdexin735
fc5ac9ea69 common:fix error word
Signed-off-by: yonghengdexin735 <zhang.zezhu@zte.com.cn>
2017-09-13 10:22:08 +08:00
David Zafman
3bb20f6d75 ceph-objectstore-tool: Make pg removal require --force
Add new export-remove to combine the 2 operations

Fixes: http://tracker.ceph.com/issues/21272

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-09-08 17:56:05 -07:00
Sage Weil
21027233b2 qa/tasks/ceph_manager: revive osds before doing final rerr reset
We assume below that rerrosd is up, but it may not be when we exit the
loop.

Fixes: http://tracker.ceph.com/issues/21206
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-31 14:55:46 -04:00
Sage Weil
a40d94b163 qa/tasks/ceph: wait for pg stats to flush in healthy check
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 12:10:27 -04:00
Sage Weil
80978dea8a qa/tasks/ceph_manager: wait_for_all_up -> wait_for_all_osds_up
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 12:10:26 -04:00
Sage Weil
7648894e55 qa/tasks/ceph_manager: expose flush_all_pg_stats
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 12:10:26 -04:00
Sage Weil
02c2e853d3 Merge pull request #16509 from liewegas/wip-rgw-wait
qa/suites/rados/basic/tasks/rgw_snaps: wait for pools to be created

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2017-07-24 11:55:54 -05:00
Sage Weil
29549e6834 Merge pull request #13723 from ovh/bp-forced-recovery
osd/PG: make prioritized recovery possible

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-07-24 09:01:03 -05:00
Sage Weil
ecd1193ab9 qa/suites/rados/basic/tasks/rgw_snaps: wait for pools to be created
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-22 18:54:46 -04:00
Sage Weil
583a38bca2 qa/tasks/ceph_manager: wait for osd to start after objectstore-tool sequence
Fixes: http://tracker.ceph.com/issues/20705
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-20 11:41:36 -04:00
Piotr Dałek
b0134cc7a8 qa: add force/cancel recovery/backfill to QA testing
This randomly issues pg force-recovery/force-backfill and
pg cancel-force-recovery/cancel-force-backfill during QA
testing. Disabled for upgrades from hammer, jewel and kraken.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
2017-07-20 09:35:55 +02:00
Jason Dillaman
836ab7ad95 test: skip pool application metadata tests if OSDs not at min luminous
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2017-07-19 13:13:01 -04:00
Sage Weil
56e2965502 qa/tasks/ceph_manager: wait longer for pg stats to flush
An ill-timed mgr restart could blow the current 15s wait.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-13 12:13:45 -04:00
David Zafman
33edfe3a0f test: Add two new singleton test yamls random-eio and thrash-eio
New option "random_eio" to Thrasher, sets 1 osd random read percentage
New option "objectsize" to radosbench task (-o bench option)
New option "type" to radosbench to specify write, seq or rand

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-06-23 08:09:15 -07:00
Sage Weil
6a00ba0e26 qa/tasks/ceph_manager: get osds all in after thrashing
Otherwise we might end up with some PGs remapped, which means they won't
get scrubbed.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-20 12:07:25 -04:00
Sage Weil
f870cc5f28 qa/tasks/thrashosds: wait before wait_for_recovery
Make sure OSDs are up *and* they have flushed their PG stats before
waiting for recovery to ensure that we do not see a stale 'clean' state.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-15 12:14:24 -04:00
Kefu Chai
e8b23d6852 qa/tasks: add a blacklist for flush_pg_stats()
so we don't wait for marked out osds.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:06:50 -04:00
Sage Weil
ab1b78ae00 qa/tasks: use new reliable flush_pg_stats helper
The helper gets a sequence number from the osd (or osds), and then
polls the mon until that seq is reflected there.

This is overkill in some cases, since many tests only require that the
stats be reflected on the mgr (not the mon), but waiting for it to also
reach the mon is sufficient!

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:45 -04:00
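The polling scheme that ab1b78ae00 describes (take a sequence number from each osd, then poll the mon until it has caught up) can be sketched as follows. This is an illustration under assumptions, not the helper from qa/tasks: the function signature, the `get_osd_seq` and `get_mon_seq` callables, and the timeout handling are all hypothetical. Sequence numbers are converted with `int()` before comparison, since values that arrive as strings would otherwise compare lexicographically.

```python
import time
from typing import Callable, Iterable, Union

Seq = Union[int, str]


def flush_pg_stats(
    osds: Iterable[int],
    get_osd_seq: Callable[[int], Seq],
    get_mon_seq: Callable[[int], Seq],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> None:
    # Step 1: ask each osd for the sequence number of its latest stats flush.
    wanted = {osd: int(get_osd_seq(osd)) for osd in osds}

    # Step 2: poll the mon until it reports at least that sequence number.
    deadline = time.monotonic() + timeout
    for osd, need in wanted.items():
        # Compare as ints: as strings, "9" would sort after "10".
        while int(get_mon_seq(osd)) < need:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"osd.{osd} stats seq {need} not seen by mon")
            time.sleep(interval)
```

In a test, the two callables would wrap whatever commands report the flush sequence on the osd side and the last-seen sequence on the mon side; they are passed in here so the sketch stays independent of any particular cluster API.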
Kefu Chai
8abc6e1bea qa/tasks/rebuild_mondb: update to address ceph-mgr changes
- revive ceph-mgr after updating the keyring cap
- grant "mgr:allow *" to client.admin
- minor refactors

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-05-28 09:59:50 +08:00
Sage Weil
5ab996ab3c qa/tasks/ceph_manager: 'ceph $service tell ...' is obsolete
This died forever ago; no need for the fallback here.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-05-23 22:53:53 -04:00
Kefu Chai
da1161cbd8 qa/tasks/ceph_manager: always fix pgp_num when done with thrashosd task
Fixes: http://tracker.ceph.com/issues/19771
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-05-03 18:28:27 +08:00
Sage Weil
27dd6530a2 Merge pull request #14559 from liewegas/wip-pg-map
mon: move 'pg map' to OSDMonitor

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-04-21 18:53:17 -05:00
Sage Weil
069182f91f qa/tasks/ceph_manager: use 'pg map' for get_pg_{primary,replica}
Pulling this out of the 'pg dump' heap is inefficient.
Also, pg dump data comes from the mgr and may be stale.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-04-21 10:56:28 -04:00
Kefu Chai
6fa16c4477 Merge pull request #14584 from tchaikov/wip-19631
qa/suites: Revert "qa/suites: add mon-reweight-min-pgs-per-osd = 4"

Reviewed-by: Sage Weil <sage@redhat.com>
2017-04-21 22:56:21 +08:00
Kefu Chai
e6a436bb27 qa/tasks/ceph_manager: be able to store options with service type
so we are able to change options for services other than mon while
thrashing.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-04-20 14:18:21 +08:00
Kefu Chai
ee653ba87c Merge pull request #14608 from tchaikov/wip-19594
qa/tasks: assert on pg status with a timeout

Reviewed-by: Sage Weil <sage@redhat.com>
2017-04-20 10:49:12 +08:00
Kefu Chai
960032e513 qa/tasks: update tests with helper to wait for pg-stats
and remove unused helpers

Fixes: http://tracker.ceph.com/issues/19594
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-04-20 09:35:05 +08:00
Kefu Chai
1207caf3a2 qa/tasks/ceph_manager: add a "wait_for_pg_stats()" decorator
and accompany it with two helpers to access the pg stats in a more
natural way

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-04-20 09:35:04 +08:00
Josh Durgin
6fba80c1fa osd, OSDMonitor, qa: mark ec overwrites non-experimental
Keep the pool flag around so we can distinguish between a pool that
should maintain hashes for each chunk, and a missing one is a bug, vs
an overwrites pool where we rely on bluestore checksums for detecting
corruption.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2017-04-19 17:45:43 -07:00
Sage Weil
ee1bb01a54 Merge pull request #14556 from liewegas/wip-pgupmap
osd: pg-remap -> pg-upmap

Reviewed-by: David Zafman <dzafman@redhat.com>
2017-04-19 17:07:01 -05:00
Sage Weil
ce188e8fdf osd: pg-remap -> pg-upmap
'remap' is too non-specific a name.  In particular, it
sounds like it is related to the 'remapped' PG state
but in reality it is not related.

'upmap' or 'pg-upmap' is more specific: it maps a pgid
to the 'up' set value (or item)

Signed-off-by: Sage Weil <sage@redhat.com>
2017-04-18 12:59:40 -04:00
Kefu Chai
1b54b5f3f1 Merge pull request #14415 from smithfarm/wip-19556
tests: Thrasher: handle "OSD has the store locked" gracefully

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-04-18 23:18:35 +08:00
David Zafman
a5731076ad osd: Handle backfillfull_ratio just like nearfull and full
Add BACKFILLFULL as a local OSD cur_state
Notify monitor of this new fullness state

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:00:24 -07:00
Nathan Cutler
a5b19d2d73 tests: Thrasher: handle "OSD has the store locked" gracefully
On slower machines (VPS, OVH) it takes time for the OSD to go down.

Fixes: http://tracker.ceph.com/issues/19556
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2017-04-11 16:09:45 +02:00
Sage Weil
2a08cbbed5 qa/tasks/thrashosds,ceph_manager: thrash pg_remap[_items]
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-28 10:12:10 -04:00
Sage Weil
296708091c qa/tasks/ceph_manager: use new luminous set-full-ratio etc
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-07 16:39:09 -05:00
Sage Weil
a202b68d18 qa/tasks/thrashosds: chance_thrash_cluster_full
Induce a momentarily full cluster.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-07 13:33:44 -05:00
Samuel Just
44b26f6ab4 Merge pull request #13594 from athanatos/wip-snap-trim-sleep
osd: add snap trim reservation and re-implement osd_snap_trim_sleep

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-02-24 14:09:17 -08:00
Kefu Chai
c0f0cde399 test: Thrasher: do not update pools_to_fix_pgp_num if nothing happens
we should not update pools_to_fix_pgp_num if the pool is not expanded or
the pg_num is not increased due to pgs being created. this prevents us
from fixing the pgp_num after done with thrashing if we actually did
nothing when fixing the pgp_num when thrashing, but we removed the pool
from pools_to_fix_pgp_num after set_pool_pgpnum() returns.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-19 13:10:46 +08:00
Samuel Just
4aebf59d90 rados: check that pool is done trimming before removing it
Signed-off-by: Samuel Just <sjust@redhat.com>
2017-02-13 09:47:02 -08:00
Kefu Chai
de59b5102c test: Thrasher: restore changed options after done with thrash
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:51 +08:00
Kefu Chai
761a1dc391 tests: Thrasher: extract _set_config() method
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:50 +08:00
Kefu Chai
995e144e3e tests: CephManager: add get_config() method
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:50 +08:00
Kefu Chai
136483a8f9 test: Thrasher: update pgp_num of all expanded pools if not yet
otherwise wait_until_healthy will fail after timeout as seeing warning
like:

HEALTH_WARN pool cephfs_data pg_num 182 > pgp_num 172

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:50 +08:00
Nathan Cutler
db2582e25e tests: fix regression in qa/tasks/ceph_manager.py
https://github.com/ceph/ceph/pull/13194 introduced a regression:

2017-02-06T16:14:23.162 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/ceph_manager.py", line 722, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/ceph_manager.py", line 839, in do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/ceph_manager.py", line 305, in kill_osd
    output = proc.stderr.getvalue()
AttributeError: 'NoneType' object has no attribute 'getvalue'

This is because the original patch failed to pass "stderr=StringIO()" to run().

Fixes: http://tracker.ceph.com/issues/16263
Signed-off-by: Nathan Cutler <ncutler@suse.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-06 19:37:38 +01:00
Sage Weil
5fc3dd36e2 Merge pull request #13237 from smithfarm/wip-18799
tests: Thrasher: eliminate a race between kill_osd and __init__

Reviewed-by: Sage Weil <sage@redhat.com>
2017-02-05 12:49:30 -06:00
Nathan Cutler
b519d38fb1 tests: Thrasher: eliminate a race between kill_osd and __init__
If Thrasher.__init__() spawns the do_thrash thread before initializing the
ceph_objectstore_tool property, do_thrash races with the rest
of Thrasher.__init__() and in some cases do_thrash can call kill_osd() before
Thrasher.__init__() progresses much further. This can lead to an exception
("AttributeError: Thrasher instance has no attribute 'ceph_objectstore_tool'")
being thrown in kill_osd().

This commit eliminates the race by making sure the ceph_objectstore_tool
attribute is initialized before the do_thrash thread is spawned.

Fixes: http://tracker.ceph.com/issues/18799
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2017-02-02 23:23:54 +01:00
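The ordering fix described in b519d38fb1 follows a general pattern: set every attribute the worker thread reads before starting the thread. The sketch below shows that pattern only. `ThrasherSketch` and its `seen` attribute are hypothetical and are not the real Thrasher class.

```python
import threading


class ThrasherSketch:
    """Hypothetical stand-in showing the initialize-then-spawn ordering."""

    def __init__(self, ceph_objectstore_tool: bool = True) -> None:
        # Initialize everything the worker reads *before* starting it.
        # Starting the thread first would let do_thrash() run against a
        # partially constructed object and raise AttributeError.
        self.ceph_objectstore_tool = ceph_objectstore_tool
        self.seen = None
        self.thread = threading.Thread(target=self.do_thrash)
        self.thread.start()

    def do_thrash(self) -> None:
        # Safe: the attribute is guaranteed to exist by the time we run.
        self.seen = self.ceph_objectstore_tool

    def join(self) -> None:
        self.thread.join()
```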
Nathan Cutler
046e873026 tests: ignore bogus ceph-objectstore-tool error in ceph_manager
Fixes: http://tracker.ceph.com/issues/16263
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2017-01-31 00:49:05 +01:00
Sage Weil
c01f2ee0e2 move ceph-qa-suite dirs into qa/
2016-12-14 11:29:55 -06:00