Commit Graph

110 Commits

Author SHA1 Message Date
vasukulkarni
574049a90b Merge pull request #14229 from ceph/wip-systemd
qa: Add reboot case for systemd test
2017-03-31 09:15:53 -07:00
John Spray
992b8499d0 Merge pull request #14254 from idryomov/wip-vstart-runner-ps
qa/vstart_runner: amend ps invocation

Reviewed-by: John Spray <john.spray@redhat.com>
2017-03-31 17:15:30 +01:00
Kefu Chai
9ca7ccf5f1 tasks/workunit.py: specify the branch name when cloning a branch
c1309fb failed to specify a branch when cloning using --depth=1, which
by default clones the HEAD. and we can not "git checkout" a specific
sha1 if it is not HEAD, after cloning using '--depth=1', so in this
change, we dispatch "tag", "branch", "HEAD" using three Refspec classes.

Signed-off-by: Kefu Chai <kchai@redhat.com>
Signed-off-by: Dan Mick <dan.mick@redhat.com>
2017-03-30 20:30:09 -07:00
Sage Weil
578b0f7cfc Merge pull request #13617 from liewegas/wip-mgr-commands
mon,mgr: tag some commands for ceph-mgr

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-03-30 17:12:00 -05:00
Ilya Dryomov
8d8cd4e4d5 qa/vstart_runner: amend ps invocation
"ps -xwwu<id>" is parsed as BSD, because -x is not a UNIX option.
"u" is a BSD option for user-oriented format, so the <id> ends up being
parsed as an old-style "select by pid".  The only reason this command
doesn't dump other user's processes is that the BSD "only yourself"
restriction is in effect.

I'm not sure what's wrong with a simple "ps xww", but if we want to
select by euid, let's do it right.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-03-30 19:36:43 +02:00
Vasu Kulkarni
7b587304a5 Add reboot case for systemd test
test systemd units restart after reboot

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2017-03-29 10:30:49 -07:00
Sage Weil
5dc9b8d026 qa/tasks/dump_stuck.py: stop making assertions about 'health' report
Health comes from teh mon, while the pg stats come from teh mgr, so they
may be out of sync.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-29 11:39:27 -04:00
Sage Weil
fa0b2164ad qa/tasks/ceph.py: add 'skip_mgr_daemons' option
For upgrades

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-29 11:39:26 -04:00
Sage Weil
7edca203d8 qa/tasks/ceph.py: give everyone mgr caps
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-29 11:39:26 -04:00
Dan Mick
c1309fbef3 tasks/workunit.py: when cloning, use --depth=1
Help avoid killing git.ceph.com.  A depth 1 clone takes about
7 seconds, whereas a full one takes about 3:40 (much of it
waiting for the server to create a huge compressed pack)

Signed-off-by: Dan Mick <dan.mick@redhat.com>
2017-03-28 20:09:44 -07:00
Sage Weil
2a08cbbed5 qa/tasks/thrashosds,ceph_manager: thrash pg_remap[_items]
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-28 10:12:10 -04:00
Casey Bodley
e3e3a71d1f qa: rgw task uses period instead of region-map
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-03-20 11:50:03 -04:00
Kefu Chai
bd36f13163 doc: fix the links to http://ceph.com/docs
they should point to http://docs.ceph.com/docs/master/.. instead

Fixes: http://tracker.ceph.com/issues/19090
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-03-15 16:40:07 +08:00
Yehuda Sadeh
515db13970 qa/tasks/radosgw_admin: adjust test to new bucket structure
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2017-03-09 09:18:56 -08:00
John Spray
41f8ded3e7 qa: update TestDamage for PurgeQueue
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:27:03 +00:00
John Spray
1a1951002d qa: update TestFlush for changed stray perf counters
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:27:03 +00:00
John Spray
6cf9c2956c qa: add TestStrays.test_purge_queue_op_rate
For ensuring that the PurgeQueue code is not generating
too many extra IOs.

Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:27:02 +00:00
John Spray
3e66de2182 mds: create purge queue if it's not found
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:26:59 +00:00
John Spray
f826c7e8aa qa/cephfs: add TestStrays.test_purge_on_shutdown
...and change test_migration_on_shutdown to
specifically target non-purgeable strays (i.e.
hardlink-ish things).

Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:26:55 +00:00
John Spray
3970502c9b qa: update test_strays for purgequeue
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:20:59 +00:00
Sage Weil
7fbe8fb085 Merge pull request #13759 from liewegas/wip-19133
osdc/Objecter: resend RWORDERED ops on full

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2017-03-07 21:31:50 -06:00
Sage Weil
296708091c qa/tasks/ceph_manager: use new luminous set-full-ratio etc
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-07 16:39:09 -05:00
Sage Weil
a202b68d18 qa/tasks/thrashosds: chance_thrash_cluster_full
Induce a momentarily full cluster.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-07 13:33:44 -05:00
Radoslaw Zarzynski
6440750f53 qa/tasks/rgw.py: start Apache before RadosGW.
At the end of start_rgw() we wait till establishing HTTP connections
with RadosGW become possible. However, if RadosGW uses the FastCGI,
the condition can't be fulfilled without spawning HTTP server first.

Signed-off-by: Radoslaw Zarzynski <rzarzynski@mirantis.com>
2017-03-07 17:31:52 +01:00
John Spray
73100305e5 Merge pull request #13262 from batrick/multimds-thrasher
Add multimds:thrash sub-suite and fix bugs in thrasher for multimds

Reviewed-by: John Spray <john.spray@redhat.com>
2017-03-07 14:29:18 +00:00
John Spray
39204abeda Merge pull request #13282 from jcsp/wip-fuse-mount-teardown
tasks/cephfs: tear down on mount() failure

Reviewed-by: Yan, Zheng <zyan@redhat.com>
2017-02-28 15:04:59 +00:00
Kefu Chai
edceabbd47 qa/tasks/workunit: use ceph.git as an alternative of ceph-ci.git for workunit repo
if we run upgrade test, where, for example, "jewel" is not in
ceph-ci.git repo, we should check ceph.git to clone the workunits.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-27 17:36:05 +08:00
Sage Weil
af5dab0613 Merge pull request #13649 from liewegas/wip-ceph-scrub-debug
qa/tasks/ceph.py: debug which pgs aren't scrubbing

Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
2017-02-25 13:15:06 -06:00
Sage Weil
f777d849e7 qa/tasks/ceph.py: debug which pgs aren't scrubbing
Signed-off-by: Sage Weil <sage@redhat.com>
2017-02-24 23:07:34 -05:00
Samuel Just
44b26f6ab4 Merge pull request #13594 from athanatos/wip-snap-trim-sleep
osd: add snap trim reservation and re-implement osd_snap_trim_sleep

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-02-24 14:09:17 -08:00
Kefu Chai
4cf28de4c9 qa/tasks/workunit: use the suite repo for cloning workunit
as "workunits" reside in ceph/qa/workunits, it's more intuitive to
respect suite-repo option when cloning workunits.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-24 16:47:47 +08:00
John Spray
de5249436c Merge pull request #13359 from jcsp/wip-logrotate-sshexception
qa: handle SSHException in logrotate

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-02-22 10:05:07 +00:00
Kefu Chai
b3e516fc38 Merge pull request #13518 from tchaikov/wip-fix-pgp-num
test: Thrasher: do not update pools_to_fix_pgp_num if nothing happens

Reviewed-by: Sage Weil <sage@redhat.com>
2017-02-21 00:46:26 +08:00
Kefu Chai
c0f0cde399 test: Thrasher: do not update pools_to_fix_pgp_num if nothing happens
we should not update pools_to_fix_pgp_num if the pool is not expanded or
the pg_num is not increased due to pgs being created. this prevent us
from fixing the pgp_num after done with thrashing if we actually did
nothing when fixing the pgp_num when thrashing, but we removed the pool
from pools_to_fix_pgp_num after set_pool_pgpnum() returns.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-19 13:10:46 +08:00
Sage Weil
86c0d07e32 qa/tasks/ceph.py: fix timing of wait-for-* and osd markdown
Mark down osds, *then* wait for them to come up or for the cluster to be
healthy!

Signed-off-by: Sage Weil <sage@redhat.com>
2017-02-18 21:12:23 -05:00
Sage Weil
96bc86b537 Revert "qa/tasks/workunit: use the suite repo for cloning workunit" 2017-02-17 11:54:27 -06:00
Kefu Chai
1f82b9b944 qa/tasks/workunit: use the suite repo for cloning workunit
as "workunits" reside in ceph/qa/workunits, it's more intuitive to
respect suite-repo option when cloning workunits.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-16 15:05:51 +08:00
Samuel Just
4aebf59d90 rados: check that pool is done trimming before removing it
Signed-off-by: Samuel Just <sjust@redhat.com>
2017-02-13 09:47:02 -08:00
Kefu Chai
de59b5102c test: Thrasher: restore changed options after done with thrash
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:51 +08:00
Kefu Chai
761a1dc391 tests: Thrasher: extract _set_config() method
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:50 +08:00
Kefu Chai
995e144e3e tests: CephManager: add get_config() method
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:50 +08:00
Kefu Chai
136483a8f9 test: Thrasher: update pgp_num of all expanded pools if not yet
otherwise wait_until_healthy will fail after timeout as seeing warning
like:

HEALTH_WARN pool cephfs_data pg_num 182 > pgp_num 172

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:50 +08:00
John Spray
880cbf09aa Merge pull request #13137 from jcsp/wip-18661
qa: fix race in Mount.open_background

Reviewed-by: Yan, Zheng <zyan@redhat.com>
2017-02-10 17:48:05 +00:00
John Spray
a3fd3f225c Merge pull request #13099 from jcsp/wip-18663
qa/tasks: force umount during kclient teardown
2017-02-10 17:42:37 +00:00
John Spray
6f9e11f03d qa: handle SSHException in logrotate
Yet another different type of exception we may get when
orchestra.run can't talk to a remote host.

Signed-off-by: John Spray <john.spray@redhat.com>
2017-02-10 17:16:24 +00:00
Nathan Cutler
6b7443fb50 tests: drop buildpackages.py
The buildpackages suite has been moved to teuthology. This cleans up a file
that was left behind by https://github.com/ceph/ceph/pull/13297

Fixes: http://tracker.ceph.com/issues/18846
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2017-02-08 21:23:54 +01:00
Loic Dachary
5a43f8d579 buildpackages: remove because it does not belong
It should live in teuthology, not in Ceph. And it is currently broken:
there is no need to keep it around.

Fixes: http://tracker.ceph.com/issues/18846

Signed-off-by: Loic Dachary <loic@dachary.org>
2017-02-07 18:37:26 +01:00
John Spray
6203f33df4 tasks/cephfs: tear down on mount() failure
There were some cases where we would leave a mountpoint
that would cause the teuthology teardown to get hung up
when it tried to look inside cephtest/

Signed-off-by: John Spray <john.spray@redhat.com>
2017-02-06 22:53:21 +00:00
Patrick Donnelly
d748226f00
qa: add DaemonWatchdog to stop tests on failure
Thrashing MDS will often result in failures which often do not stop the
test. The failure may also cause the test to stall which will force the
machines to needlessly be locked until a timeout is reached. This
watchdog will unmount mounts and kill daemons when a failure is
detected.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:14 -05:00
Patrick Donnelly
f005e8af6b
qa: disable max_mds changes during thrashing
While the trasher supports the behavior desired by issue 10792 [1], the
bugs uncovered due to deactivating MDS (and sometimes killing
deactivating MDS) are presently a distraction from addressing issues
during normal failures. So now thrashing max_mds is turned off by
default. I have added a TODO to deactivate ranks in order (configurably)
as random deactivation causes a lot of other problems.

This also fixes a bug: random.randrange(0.0, 1.0) always returns 0.
Oops.

[1] http://tracker.ceph.com/issues/10792

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:14 -05:00