Commit Graph

101 Commits

Author SHA1 Message Date
Sage Weil
7edca203d8 qa/tasks/ceph.py: give everyone mgr caps
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-29 11:39:26 -04:00
Sage Weil
2a08cbbed5 qa/tasks/thrashosds,ceph_manager: thrash pg_remap[_items]
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-28 10:12:10 -04:00
Casey Bodley
e3e3a71d1f qa: rgw task uses period instead of region-map
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-03-20 11:50:03 -04:00
Kefu Chai
bd36f13163 doc: fix the links to http://ceph.com/docs
they should point to http://docs.ceph.com/docs/master/.. instead

Fixes: http://tracker.ceph.com/issues/19090
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-03-15 16:40:07 +08:00
Yehuda Sadeh
515db13970 qa/tasks/radosgw_admin: adjust test to new bucket structure
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2017-03-09 09:18:56 -08:00
John Spray
41f8ded3e7 qa: update TestDamage for PurgeQueue
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:27:03 +00:00
John Spray
1a1951002d qa: update TestFlush for changed stray perf counters
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:27:03 +00:00
John Spray
6cf9c2956c qa: add TestStrays.test_purge_queue_op_rate
For ensuring that the PurgeQueue code is not generating
too many extra IOs.

Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:27:02 +00:00
John Spray
3e66de2182 mds: create purge queue if it's not found
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:26:59 +00:00
John Spray
f826c7e8aa qa/cephfs: add TestStrays.test_purge_on_shutdown
...and change test_migration_on_shutdown to
specifically target non-purgeable strays (i.e.
hardlink-ish things).

Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:26:55 +00:00
John Spray
3970502c9b qa: update test_strays for purgequeue
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:20:59 +00:00
Sage Weil
7fbe8fb085 Merge pull request #13759 from liewegas/wip-19133
osdc/Objecter: resend RWORDERED ops on full

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2017-03-07 21:31:50 -06:00
Sage Weil
296708091c qa/tasks/ceph_manager: use new luminous set-full-ratio etc
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-07 16:39:09 -05:00
Sage Weil
a202b68d18 qa/tasks/thrashosds: chance_thrash_cluster_full
Induce a momentarily full cluster.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-07 13:33:44 -05:00
Radoslaw Zarzynski
6440750f53 qa/tasks/rgw.py: start Apache before RadosGW.
At the end of start_rgw() we wait until establishing HTTP connections
with RadosGW becomes possible. However, if RadosGW uses FastCGI,
the condition can't be fulfilled without spawning the HTTP server first.

Signed-off-by: Radoslaw Zarzynski <rzarzynski@mirantis.com>
2017-03-07 17:31:52 +01:00
John Spray
73100305e5 Merge pull request #13262 from batrick/multimds-thrasher
Add multimds:thrash sub-suite and fix bugs in thrasher for multimds

Reviewed-by: John Spray <john.spray@redhat.com>
2017-03-07 14:29:18 +00:00
John Spray
39204abeda Merge pull request #13282 from jcsp/wip-fuse-mount-teardown
tasks/cephfs: tear down on mount() failure

Reviewed-by: Yan, Zheng <zyan@redhat.com>
2017-02-28 15:04:59 +00:00
Kefu Chai
edceabbd47 qa/tasks/workunit: use ceph.git as an alternative of ceph-ci.git for workunit repo
If we run an upgrade test where, for example, "jewel" is not in the
ceph-ci.git repo, we should check ceph.git to clone the workunits.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-27 17:36:05 +08:00
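
The fallback described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from qa/tasks/workunit: the function name `pick_workunit_repo` and the `has_branch` callable are hypothetical, and the repo names are the short names from the commit message rather than full URLs.

```python
def pick_workunit_repo(branch, candidates, has_branch):
    """Return the first repo in `candidates` that contains `branch`.

    `has_branch(repo, branch)` is a hypothetical callable supplied by the
    caller; a real implementation would query the remote in some way.
    """
    for repo in candidates:
        if has_branch(repo, branch):
            return repo
    raise LookupError("branch %r not found in any of %r" % (branch, candidates))
```

Passing the candidates in order of preference (ceph-ci.git first, then ceph.git) keeps the usual case unchanged and only falls back when the branch is missing.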
Sage Weil
af5dab0613 Merge pull request #13649 from liewegas/wip-ceph-scrub-debug
qa/tasks/ceph.py: debug which pgs aren't scrubbing

Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
2017-02-25 13:15:06 -06:00
Sage Weil
f777d849e7 qa/tasks/ceph.py: debug which pgs aren't scrubbing
Signed-off-by: Sage Weil <sage@redhat.com>
2017-02-24 23:07:34 -05:00
Samuel Just
44b26f6ab4 Merge pull request #13594 from athanatos/wip-snap-trim-sleep
osd: add snap trim reservation and re-implement osd_snap_trim_sleep

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-02-24 14:09:17 -08:00
Kefu Chai
4cf28de4c9 qa/tasks/workunit: use the suite repo for cloning workunit
as "workunits" reside in ceph/qa/workunits, it's more intuitive to
respect suite-repo option when cloning workunits.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-24 16:47:47 +08:00
John Spray
de5249436c Merge pull request #13359 from jcsp/wip-logrotate-sshexception
qa: handle SSHException in logrotate

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-02-22 10:05:07 +00:00
Kefu Chai
b3e516fc38 Merge pull request #13518 from tchaikov/wip-fix-pgp-num
test: Thrasher: do not update pools_to_fix_pgp_num if nothing happens

Reviewed-by: Sage Weil <sage@redhat.com>
2017-02-21 00:46:26 +08:00
Kefu Chai
c0f0cde399 test: Thrasher: do not update pools_to_fix_pgp_num if nothing happens
We should not update pools_to_fix_pgp_num if the pool is not expanded or
the pg_num is not increased because PGs are still being created. Otherwise
we cannot fix the pgp_num once thrashing is done: if fixing the pgp_num
during thrashing actually did nothing, the pool was still removed from
pools_to_fix_pgp_num after set_pool_pgpnum() returned.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-19 13:10:46 +08:00
Sage Weil
86c0d07e32 qa/tasks/ceph.py: fix timing of wait-for-* and osd markdown
Mark down osds, *then* wait for them to come up or for the cluster to be
healthy!

Signed-off-by: Sage Weil <sage@redhat.com>
2017-02-18 21:12:23 -05:00
Sage Weil
96bc86b537 Revert "qa/tasks/workunit: use the suite repo for cloning workunit"
2017-02-17 11:54:27 -06:00
Kefu Chai
1f82b9b944 qa/tasks/workunit: use the suite repo for cloning workunit
as "workunits" reside in ceph/qa/workunits, it's more intuitive to
respect suite-repo option when cloning workunits.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-16 15:05:51 +08:00
Samuel Just
4aebf59d90 rados: check that pool is done trimming before removing it
Signed-off-by: Samuel Just <sjust@redhat.com>
2017-02-13 09:47:02 -08:00
Kefu Chai
de59b5102c test: Thrasher: restore changed options after done with thrash
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:51 +08:00
Kefu Chai
761a1dc391 tests: Thrasher: extract _set_config() method
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:50 +08:00
Kefu Chai
995e144e3e tests: CephManager: add get_config() method
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:50 +08:00
Kefu Chai
136483a8f9 test: Thrasher: update pgp_num of all expanded pools if not yet
Otherwise wait_until_healthy will fail after the timeout on seeing a
warning like:

HEALTH_WARN pool cephfs_data pg_num 182 > pgp_num 172

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:50 +08:00
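
The condition behind the HEALTH_WARN quoted above can be sketched as a simple comparison. This is an illustration only: the function name `pools_needing_pgp_fix` and the dictionary layout are assumptions, not the Thrasher's actual data structures.

```python
def pools_needing_pgp_fix(pools):
    """Return the sorted names of pools whose pg_num exceeds pgp_num."""
    return sorted(
        name for name, info in pools.items()
        if info["pg_num"] > info["pgp_num"]
    )

# The cephfs_data values come from the warning quoted above;
# the second pool is made up for contrast.
example = {
    "cephfs_data": {"pg_num": 182, "pgp_num": 172},
    "cephfs_metadata": {"pg_num": 64, "pgp_num": 64},
}
lagging = pools_needing_pgp_fix(example)
```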
John Spray
880cbf09aa Merge pull request #13137 from jcsp/wip-18661
qa: fix race in Mount.open_background

Reviewed-by: Yan, Zheng <zyan@redhat.com>
2017-02-10 17:48:05 +00:00
John Spray
a3fd3f225c Merge pull request #13099 from jcsp/wip-18663
qa/tasks: force umount during kclient teardown
2017-02-10 17:42:37 +00:00
John Spray
6f9e11f03d qa: handle SSHException in logrotate
Yet another different type of exception we may get when
orchestra.run can't talk to a remote host.

Signed-off-by: John Spray <john.spray@redhat.com>
2017-02-10 17:16:24 +00:00
Nathan Cutler
6b7443fb50 tests: drop buildpackages.py
The buildpackages suite has been moved to teuthology. This cleans up a file
that was left behind by https://github.com/ceph/ceph/pull/13297

Fixes: http://tracker.ceph.com/issues/18846
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2017-02-08 21:23:54 +01:00
Loic Dachary
5a43f8d579 buildpackages: remove because it does not belong
It should live in teuthology, not in Ceph. And it is currently broken:
there is no need to keep it around.

Fixes: http://tracker.ceph.com/issues/18846

Signed-off-by: Loic Dachary <loic@dachary.org>
2017-02-07 18:37:26 +01:00
John Spray
6203f33df4 tasks/cephfs: tear down on mount() failure
There were some cases where we would leave a mountpoint
that would cause the teuthology teardown to get hung up
when it tried to look inside cephtest/

Signed-off-by: John Spray <john.spray@redhat.com>
2017-02-06 22:53:21 +00:00
Patrick Donnelly
d748226f00 qa: add DaemonWatchdog to stop tests on failure
Thrashing MDS will often result in failures which often do not stop the
test. The failure may also cause the test to stall which will force the
machines to needlessly be locked until a timeout is reached. This
watchdog will unmount mounts and kill daemons when a failure is
detected.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:14 -05:00
Patrick Donnelly
f005e8af6b qa: disable max_mds changes during thrashing
While the thrasher supports the behavior desired by issue 10792 [1], the
bugs uncovered due to deactivating MDS (and sometimes killing
deactivating MDS) are presently a distraction from addressing issues
during normal failures. So now thrashing max_mds is turned off by
default. I have added a TODO to deactivate ranks in order (configurably)
as random deactivation causes a lot of other problems.

This also fixes a bug: random.randrange(0.0, 1.0) always returns 0.
Oops.

[1] http://tracker.ceph.com/issues/10792

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:14 -05:00
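
The randrange bug mentioned in this commit can be illustrated with a small sketch. The helper name `should_thrash` is hypothetical and not taken from the thrasher. The integer call `random.randrange(0, 1)` stands in for the float call in the commit message, since recent Python versions reject non-integer arguments to randrange.

```python
import random

def should_thrash(chance, rng=random):
    """Return True with probability `chance` (hypothetical helper)."""
    # random() draws a float uniformly from [0.0, 1.0), so the comparison
    # succeeds with probability `chance`.
    return rng.random() < chance

# randrange(0, 1) picks an integer from the half-open range [0, 1),
# which contains only 0, so a probability check built on it never varies.
draws = {random.randrange(0, 1) for _ in range(1000)}
```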
Patrick Donnelly
82662edd7f qa: do not pretty the json to shorten stdout log
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:14 -05:00
Patrick Donnelly
a0052fc2d6 qa: use gevent.sleep so greenlet yields
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:14 -05:00
Patrick Donnelly
cf9e0da078 qa: use fs methods for setting configs
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
0098873fb7 qa: remove old comment
Filesystem is now cluster aware.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
fd4b61890d qa: allow revived MDS to be up:active
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
884215d933 qa: timeout waiting for thrashed MDS to revive
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
8e9ea7b6ac qa: configure thrashing while MDS are stopping
Currently multimds is prone to many failures when killing an active or
stopping MDS when there are MDS in the cluster which have been
deactivated (stopping). Have this turned off by default for now.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
6304b6ed5d qa: add deactivation log message
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
1185326c45 qa: avoid infinite wait if no repl. can be made
The thrasher can enter an infinite loop waiting for an MDS to take a
certain rank when a replacement may not be possible. For example,
max_mds actives are already running.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:12 -05:00
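
One common way to avoid this kind of unbounded wait is to poll against a deadline. The sketch below is a generic pattern under that assumption, not the thrasher's implementation; `wait_for` and its parameters are hypothetical names.

```python
import time

def wait_for(condition, timeout, interval=0.01):
    """Poll `condition` until it returns true or `timeout` seconds elapse.

    Returns True if the condition was met, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            # Give up instead of looping forever, e.g. when no
            # replacement can ever take the rank.
            return False
        time.sleep(interval)
```

The caller can then decide whether a False result is an error or an expected outcome, such as when max_mds actives are already running.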