Josh Durgin
a763297685
misc: move deep_merge out of the MergeConfig class - it's generic
2011-11-17 13:06:36 -08:00
Josh Durgin
c6988a07f4
Save config after locking nodes, so targets are included.
2011-11-17 11:57:07 -08:00
Josh Durgin
4e6cd55c59
filestore_idempotent: remove unused import
2011-11-17 11:18:24 -08:00
Josh Durgin
7d51e3d381
mon_recovery: remove unused code and import
2011-11-17 11:16:08 -08:00
Josh Durgin
f4d527e743
thrashosds: timeout for every clean check, not just the last one
2011-11-17 11:11:33 -08:00
Josh Durgin
9d12b720e8
ceph_manager: add a default timeout of 5 minutes for mon quorum
2011-11-17 11:05:12 -08:00
Josh Durgin
cb9ac0897b
ceph_manager: log mon quorum status so the logs show progress (or lack thereof)
2011-11-17 10:45:19 -08:00
Yehuda Sadeh
f3c569ee23
rgw: add swift task
...
still not completely working (for some reason it skips all the tests)
2011-11-16 16:00:01 -08:00
Sage Weil
c5f070b8a9
filestore_idempotent.py: simple task to test non-idempotent osd ops
...
Write some non-idempotent events to the osd. Simulate a failure. Verify
the result is correct on replay.
This must be preceded by the ceph task just so that we get the binaries
installed. Should clean this up later if/when the installation gets
factored out of ceph.py.
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-10 21:35:11 -08:00
Sage Weil
77c977c1cf
misc: allow >1 monitor per role in get_mon_names()
...
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-10 14:13:24 -08:00
Sage Weil
303e863d32
add hammer.sh
...
simple script to repeat a test until it fails. can probably do something much more sophisticated
here, but this works.
2011-11-09 13:37:02 -08:00
Josh Durgin
afa56f16d1
nuke: increase reboot timeout
...
Some sepia nodes are very slow to reboot.
2011-11-09 10:49:37 -08:00
Sage Weil
6618a0275c
mon_recovery: add task to test monitor cluster failure recovery
...
Some simple tests to start with. We still need some sort of mon cluster
thrashing.
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-08 22:17:00 -08:00
Sage Weil
60863f70eb
ceph_manager: manipulate monitors
2011-11-08 22:17:00 -08:00
Sage Weil
6d39cc1146
ceph: keep ceph.conf at ctx.ceph.conf
...
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-08 22:17:00 -08:00
Josh Durgin
006a0dd423
Remove unused imports and variable.
2011-11-08 16:09:21 -08:00
Josh Durgin
5d32bcae50
Add nuke-on-error option.
...
This lets automated jobs nuke and unlock machines after failed
tests. Each machine is nuked individually, so one down machine won't
keep others from being nuked and unlocked.
2011-11-08 16:09:21 -08:00
Tommi Virtanen
c764b2475b
Fix leftover orchestra import clause.
...
This seems to be a leftover from a2372fce12, no idea how it stayed hidden this long.
2011-11-07 13:05:14 -08:00
Josh Durgin
4f3b113832
ceph_manager: log ceph -s output so progress is visible in the logs
2011-11-03 13:27:44 -07:00
Josh Durgin
0b451f9475
Keep each ssh connection alive.
...
With long-running jobs like thrashing, ssh connections were timing
out.
2011-11-03 13:08:49 -07:00
Josh Durgin
6e3e0d7cdc
connection: allow the caller to specify whether keep-alive should be used
2011-11-03 13:07:21 -07:00
Josh Durgin
b1a0c1adea
locker: fix race in locking
...
The isolation level is lower than I thought. This made it possible for
two clients to think they both locked the same machines, since the
update would still be modifying each row to change the locked_since
time.
2011-11-03 11:29:18 -07:00
Samuel Just
a2f406ef49
testrados: set CEPH_CLIENT_ID without a ;
...
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-11-02 11:33:37 -07:00
Samuel Just
810cae1a1d
testrados: specify CEPH_CONF directly
...
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-10-31 14:54:24 -07:00
Yehuda Sadeh
10c3508741
rgw: add user suspend/enable test
2011-10-27 12:11:28 -07:00
Yehuda Sadeh
86aa940ffb
rgw: log-to-stderr is now a binary flag
2011-10-27 11:32:12 -07:00
Samuel Just
8d0a7c5977
testrados: rename testsnaps to testrados and make snap testing optional
...
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-10-24 14:25:22 -07:00
Josh Durgin
a1249d07ca
workunit: set PYTHONPATH so we can test python bindings
2011-10-24 13:52:58 -07:00
Sage Weil
61cbb3218e
ceph.conf: python parser doesn't like ; comments
2011-10-23 10:30:27 -07:00
Sage Weil
3ed065625b
ceph.conf: more frequent osd scrubbing; remove old cruft
2011-10-22 22:16:39 -07:00
Sage Weil
b8beff3dd5
ceph_manager: count active+clean+<something else> as active+clean
...
In my case, one pg was active+clean+scrubbing.
Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-21 10:54:05 -07:00
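The idea in the commit above (b8beff3dd5) can be illustrated with a minimal sketch. It assumes PG states are plain '+'-separated strings such as "active+clean+scrubbing"; the function names are hypothetical and this is not the actual ceph_manager code.

```python
def is_active_clean(state):
    """Return True if a '+'-separated PG state string contains both
    the 'active' and 'clean' flags, whatever other flags are present.

    Hypothetical helper for illustration only.
    """
    flags = set(state.split("+"))
    return {"active", "clean"} <= flags


def count_active_clean(pg_states):
    """Count the PG state strings that qualify as active+clean."""
    return sum(1 for state in pg_states if is_active_clean(state))
```

With this check, a PG that is scrubbing while otherwise healthy still counts toward the clean total, so a wait-for-clean loop would not stall on it.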
Josh Durgin
409c57170d
coverage: don't remove ceph tarball
...
We want to keep it for examining core files, and we're already
fetching it here, once per suite run.
2011-10-20 16:28:32 -07:00
Sage Weil
4ec37b2391
add lost_unfound task
...
Also some misc useful bits to ceph_manager.
2011-10-17 15:32:22 -07:00
Josh Durgin
bcded7f163
ceph: add whitelist for cluster log errors
...
Some messages are expected when thrashing osds or creating unfound
objects.
Fixes: #1622
2011-10-17 14:42:08 -07:00
Josh Durgin
fba220ecaa
nuke: reset syslog configuration after rebooting
...
Previously we removed a file and rebooted without syncing, so the file
was never deleted.
2011-10-17 10:40:19 -07:00
Yehuda Sadeh
493596a7fd
radosgw-admin: test swift keys creation/removal
2011-10-12 15:37:33 -07:00
Josh Durgin
321381d75f
teuthology-worker: remove --keep-locked-on-error
2011-10-07 14:51:46 -07:00
Josh Durgin
3d3eb0efea
Remove --keep-locked-on-error, and behave as if it were specified
...
This will help prevent machines with cephtest dirs still present from
being used. It's easy to unlock machines - the targets yaml fragment
is output during a run.
2011-10-07 14:49:53 -07:00
Josh Durgin
c56ab97442
reconnect: ignore SSHExceptions before the timeout expires
...
Fixes: #1587
2011-10-06 17:18:35 -07:00
Samuel Just
4722d468c6
task/watch_notify_stress: watch_notify_stress now thrashes clients
...
This should exercise the watch notify timeout code.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-10-06 14:34:44 -07:00
Sage Weil
4e61e4835e
rgw: keep radosgw in foreground
...
It defaults to a daemon now.
2011-10-06 12:50:12 -07:00
Josh Durgin
107db6a913
Retry listing machines if the lock server goes down.
2011-10-04 17:21:00 -07:00
Sage Weil
39a1e76065
rgw: use normal logging mechanism
...
Keep capturing stdout/err, even though it should end up empty.
Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-04 16:09:51 -07:00
Josh Durgin
7b7ff6e8ce
teuthology-worker: clean up last_in_suite jobs
...
There's no reason not to delete them once they start.
2011-10-04 12:32:58 -07:00
Josh Durgin
3d3ba1ebb1
daemon-helper: detect the signal actually sent
...
I thought I fixed this when I implemented coverage collection, but I
guess it got lost in a rebase or something.
2011-10-04 12:17:19 -07:00
Josh Durgin
d305d61b86
ceph_manager: remove unused raw_pg_status method
2011-10-03 17:49:53 -07:00
Josh Durgin
8e031730c1
ceph_manager: run ceph -s as a normal program
...
This allows failures from it to be detected better.
2011-10-03 17:49:13 -07:00
Josh Durgin
bad609e63e
teuthology-results: include passed tests in email
2011-10-03 17:11:53 -07:00
Josh Durgin
8bcd2a74ca
teuthology-results: include reasons for failure in email
2011-10-03 17:08:29 -07:00
Josh Durgin
030161ed8d
teuthology-ls: show reasons for failures with -v
2011-10-03 17:07:41 -07:00