Josh Durgin
62bda12711
misc: always return a usable result from get_valgrind_args
2012-02-24 14:56:43 -08:00
Josh Durgin
e4801819f2
rgw: simplify valgrind args
2012-02-24 14:56:42 -08:00
Sage Weil
edbb41e1f8
add peer task
...
Force a pg to get stuck in 'down' state, verify we can query the peering
state, then start the OSD so it can recover.
2012-02-24 15:05:17 -08:00
Sage Weil
7ac04a422a
lost_unfound: list missing/unfound for each pg and verify the unfound counts
...
This also tests the pg list_missing functionality.
2012-02-24 12:42:39 -08:00
Sage Weil
c43e87d118
ceph_manager: list_pg_missing
...
List missing objects for the given pgid.
2012-02-24 12:42:39 -08:00
Josh Durgin
c93a08eda0
Whitespace and unnecessary formatting fixes
2012-02-24 12:05:35 -08:00
Josh Durgin
3bfb8d696e
ceph, ceph-fuse: simplify valgrind argument additions
2012-02-24 12:05:35 -08:00
Sage Weil
9ec047226f
refactor all valgrind users to use a get_valgrind_args() helper
...
This avoids much annoying, duplicated code.
2012-02-24 12:05:35 -08:00
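A shared helper of the kind described in 9ec047226f (with the "always return a usable result" behaviour from 62bda12711) might look roughly like the sketch below. The function name build_valgrind_prefix, its parameters, and the log file layout are hypothetical; this log does not show the real get_valgrind_args() signature.

```python
def build_valgrind_prefix(log_dir, name, extra_args=None):
    """Return a command prefix list; empty when valgrind is not requested.

    Hypothetical stand-in for a shared helper, not the real teuthology API.
    """
    if extra_args is None:
        # Still a usable list, so callers can always concatenate with it.
        return []
    if isinstance(extra_args, str):
        extra_args = [extra_args]
    return [
        "valgrind",
        "--log-file={0}/{1}.log".format(log_dir, name),
    ] + list(extra_args)


# Callers build their command the same way whether or not valgrind is on.
# "some-daemon" is a placeholder binary name.
plain_cmd = build_valgrind_prefix("/tmp/valgrind", "client.a") + ["some-daemon"]
vg_cmd = build_valgrind_prefix(
    "/tmp/valgrind", "client.a", ["--tool=memcheck"]
) + ["some-daemon"]
```

Returning an empty list rather than None is what lets every caller drop its own "is valgrind enabled?" branch.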
Sage Weil
90fdc84086
ceph: always create valgrind logs dir
...
Other tasks use it too. It's more annoying to conditionally create it.
2012-02-24 12:05:35 -08:00
Sage Weil
7af6e46c94
ceph: always try to process valgrind logs
...
Check for errors in valgrind logs even if there is no valgrind option in
the ceph task config stanza. Other tasks can run via valgrind (ceph-fuse,
rgw). If the logs aren't there, this is harmless.
2012-02-24 12:05:35 -08:00
Sage Weil
e2ea73d1a5
rgw: add valgrind support
...
    tasks:
    - ceph:
    - rgw:
        client.a:
          valgrind: [--tool=memcheck]
2012-02-24 12:05:35 -08:00
Sage Weil
7bf64b73ee
rgw: accept dict
...
e.g.,
    tasks:
    ...
    - rgw:
        client.0:
        client.1:
2012-02-24 12:05:35 -08:00
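Accepting a dict as well as a list of clients, as in 7bf64b73ee, usually comes down to normalizing the config once at the top of the task. The sketch below is an assumption about how that could be done; normalize_clients is a hypothetical name, not a function from the repository.

```python
def normalize_clients(config):
    """Turn a task config (None, list, or dict) into {client: options}.

    Hypothetical helper for illustration only.
    """
    if config is None:
        return {}
    if isinstance(config, dict):
        # "client.0:" with no value parses as None in YAML; treat it as {}.
        return {name: (opts or {}) for name, opts in config.items()}
    # Otherwise assume a plain list of client names.
    return {name: {} for name in config}


from_list = normalize_clients(["client.0", "client.1"])
from_dict = normalize_clients(
    {"client.0": None, "client.1": {"valgrind": ["--tool=memcheck"]}}
)
```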
Sage Weil
d40a9b275f
lost_unfound: new mark_unfound_lost syntax
2012-02-23 20:09:09 -08:00
Josh Durgin
81a46c462a
dump_stuck: flush stats before waiting for recovery/clean
2012-02-23 17:07:26 -08:00
Josh Durgin
995dc1f751
Add a task for testing stuck pg visibility.
2012-02-21 15:12:48 -08:00
Josh Durgin
2a1c74c5f5
Move duration calculation to an internal task
...
This excludes all generic start up costs, like waiting for locks,
rebooting into a new kernel, etc.
2012-02-21 15:12:26 -08:00
Josh Durgin
eb434a507a
Add necessary imports for s3 tasks, and keep them alphabetical.
2012-02-21 15:04:00 -08:00
Yehuda Sadeh
11073e505f
s3roundtrip, s3readwrite: access key uses url safe chars
...
Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
2012-02-21 12:23:38 -08:00
Yehuda Sadeh
6e1b3a5644
rgw: access key uses url safe chars
...
Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
2012-02-21 12:12:03 -08:00
Sage Weil
c5688e6570
ceph: valgrind trumps coverage when picking a flavor
...
valgrind will crash if we don't use notcmalloc; coverage will silently
fail to collect coverage info.
2012-02-20 15:17:52 -08:00
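The precedence described in c5688e6570 can be sketched as a simple ordered check. Only the notcmalloc flavor name comes from the commit message; the function name, the config keys, and the "gcov" and "default" flavor names are assumptions made for this example.

```python
def pick_flavor(config):
    """Choose a build flavor; valgrind wins over coverage.

    Hypothetical sketch: key names and the 'gcov'/'default' flavors are assumed.
    """
    if config.get("valgrind"):
        # valgrind crashes without notcmalloc, so it must take priority.
        return "notcmalloc"
    if config.get("coverage"):
        return "gcov"
    return config.get("flavor", "default")


both = pick_flavor({"valgrind": {"osd": ["--tool=memcheck"]}, "coverage": True})
```

Checking valgrind first reflects the reasoning in the message: a crash is a worse failure than silently missing coverage data.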
Sage Weil
5216d3c7a9
ceph.conf: no lockdep by default
2012-02-20 14:54:10 -08:00
Sage Weil
5f9445c88b
suite.results: include test duration in output
2012-02-20 13:38:06 -08:00
Sage Weil
71d0d97a97
cfuse -> ceph-fuse
2012-02-20 07:12:53 -08:00
Sage Weil
7ff9f044e7
ceph: allow valgrind per-type (not just per-name)
2012-02-20 07:04:45 -08:00
Sage Weil
eb93fa744d
lost_unfound: mark osds in when we revive them
...
so that we test what we meant to. It also lets us actually go clean at the
very end.
2012-02-19 19:40:45 -08:00
Sage Weil
45b6189b7d
ceph_manager: ignore stale states when counting
...
also remove assumptions about ordering of states
2012-02-18 14:44:53 -08:00
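One way to count pg states without depending on flag order, and while skipping stale entries as in 45b6189b7d, is to treat each state string as a set of flags. This is a minimal sketch that assumes states are "+"-separated strings such as "active+clean"; count_pgs_in_state is a hypothetical name.

```python
def count_pgs_in_state(pg_states, wanted_flags):
    """Count pgs whose state contains every wanted flag, ignoring stale pgs."""
    wanted = set(wanted_flags)
    count = 0
    for state in pg_states:
        flags = set(state.split("+"))
        if "stale" in flags:
            # A stale state is old information; do not count it.
            continue
        if wanted <= flags:
            count += 1
    return count


states = ["active+clean", "clean+active", "stale+active+clean", "active+degraded"]
num_clean = count_pgs_in_state(states, ["active", "clean"])
```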
Sage Weil
196d4a1f16
wait_till_clean -> wait_for_clean and wait_for_recovery
...
Clean now also means the correct number of replicas, whereas recovered
means we have done all the work we can do given the replicas/osds we have.
For example, degraded and clean are now mutually exclusive.
Also move away from 'till'.
2012-02-17 21:53:25 -08:00
Sage Weil
ad9d7fb6e1
backfill: wait for clean before writing+blackholing
...
If we have straggler pgs and blackhole osd.1, we can deadlock because we
need info from that osd to repeer and continue. Make sure we're clean, and
then start the write + blackhole + kill test.
2012-02-14 15:24:11 -08:00
Sage Weil
50cc60f02d
nuke: nuke testrados too
...
Slightly fewer nuke -r's
2012-02-14 15:23:19 -08:00
Sage Weil
6f3abc6ced
ceph_manager: mark in a bit more often than out
...
Otherwise we can get into cases where many/most nodes are out, and things
don't work as well. e.g., crush may start to fail.
2012-02-13 15:28:24 -08:00
Sage Weil
af4ce44233
ceph: use any fs, not just btrfs, on scratch devices
...
The
    btrfs: true
syntax is replaced with
    fs: btrfs
or ext4, xfs.
2012-02-13 15:28:24 -08:00
Sage Weil
975d73a2bb
nuke: nuke testrados and rados processes, too
...
So that -r is needed slightly less often.
2012-02-13 15:28:24 -08:00
Sage Weil
46b612efa4
misc: make get_scratch_devices look for (almost) any disk that's not mounted
2012-02-13 15:28:24 -08:00
Sage Weil
2adad559bd
hammer.sh: assume path is set
2012-02-11 14:19:49 -08:00
Josh Durgin
0cd16cf03d
ceph: always add logger for daemons
...
The extra log function added redundant info and didn't allow different
levels.
2012-02-02 09:36:04 -08:00
Josh Durgin
7af7c66bd0
ceph: rename type parameter to type_
...
type is a built-in and shouldn't be shadowed.
2012-02-02 09:35:58 -08:00
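The motivation for the type_ rename in 7af7c66bd0 can be shown in a few lines. The function below is a hypothetical illustration, not code from the repository.

```python
def role_name(type_, id_):
    """Build a role string such as 'osd.3'.

    Because the parameter is called type_ rather than type, the built-in
    type() is still available inside the function body.
    """
    if type(id_) is not str:
        id_ = str(id_)
    return "{0}.{1}".format(type_, id_)


name = role_name("osd", 3)
```

Had the parameter been named type, the call type(id_) would have tried to call the string "osd" and raised a TypeError.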
Josh Durgin
7146db9215
ceph: use the correct comparison operator
...
'is' compares identity (i.e. address in CPython), not value.
2012-02-02 09:27:04 -08:00
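The difference that 7146db9215 fixes is easy to reproduce with two equal but distinct objects. The variable names here are illustrative only.

```python
expected = ["osd", "0"]
actual = "osd.0".split(".")  # a new list with the same contents

same_value = (actual == expected)   # compares contents
same_object = (actual is expected)  # compares object identity
```

Here same_value is True while same_object is False, which is why value comparisons should use == rather than is.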
Josh Durgin
e7672b6433
ceph: sync before unmounting btrfs devices
...
There may still be writes in flight, since the osds may not have
shut down cleanly. This should prevent EBUSY when unmounting.
Fixes: #1997
2012-02-02 09:26:45 -08:00
Josh Durgin
1364b8826f
ceph: delay raising exceptions until all daemons are stopped
...
If a daemon crashes, the exception is raised when we stop it. This
caused some daemons to continue running during cleanup, since the rest
of the daemons of the same type would not be shut down. Also log each
daemon that crashed, for easier debugging.
Fixes: #1744
2012-02-02 09:26:25 -08:00
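The pattern described in 1364b8826f, stopping every daemon first and raising only afterwards, can be sketched as follows. The stop_all name, the daemon objects with a stop() method, and the RuntimeError used to report failures are assumptions for this example.

```python
import logging

log = logging.getLogger(__name__)


def stop_all(daemons):
    """Stop every daemon, then raise once if any of them had crashed.

    daemons: dict mapping a name to an object with a stop() method
    (a hypothetical interface used only for this sketch).
    """
    failures = []
    for name, daemon in daemons.items():
        try:
            daemon.stop()
        except Exception as exc:
            # Record and log the failure, but keep stopping the others.
            log.error("daemon %s failed: %s", name, exc)
            failures.append(name)
    if failures:
        raise RuntimeError("daemons crashed: " + ", ".join(failures))
```

Collecting the failures instead of re-raising immediately is what guarantees that no daemon is left running during cleanup.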
Sage Weil
0236dc0f5e
add backfill task
...
This does a basic test of backfill functionality, including a divergent
log on a backfill target (#1983).
2012-01-31 16:25:53 -08:00
Sage Weil
e337c4727c
ceph_manager: add manager.blackhole_kill_osd()
...
This will suspend disk writes for a couple of seconds and then kill the
daemon. It helps us simulate a hardware failure.
2012-01-31 16:13:59 -08:00
Tommi Virtanen
d7be77628c
Allow user to disable lock checking.
...
The new plana hardware isn't in the old sepia lock database,
and the machine pools are risky to merge as nothing in the
software guarantees allocation from just one pool. This allows
us to hand-allocate machines temporarily.
2012-01-31 08:05:36 -08:00
Tommi Virtanen
09bed16408
Allow user to provide flavor to use.
...
With this, you can use Ubuntu 11.10 machines with teuthology by saying::
    tasks:
    - ceph:
        flavor: oneiric
    ...
2012-01-31 07:59:43 -08:00
Josh Durgin
f84b4aa5e3
Add admin socket task.
...
This simply gets the output of an admin socket command, makes sure
it's json, and runs a user-provided test script on it.
2012-01-27 17:13:36 -08:00
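The two checks described in f84b4aa5e3 (the output must be JSON, and a user-provided test must accept it) can be sketched as below. The commit describes a test script; this sketch substitutes a Python callable for simplicity, and check_admin_socket_output is a hypothetical name.

```python
import json


def check_admin_socket_output(raw_output, test):
    """Parse raw_output as JSON and run a caller-supplied test on the result.

    test is any callable returning a truthy value on success; this stands in
    for the user-provided test script mentioned in the commit.
    """
    try:
        parsed = json.loads(raw_output)
    except ValueError as exc:
        raise AssertionError("output is not valid JSON: {0}".format(exc))
    if not test(parsed):
        raise AssertionError("test failed on {0!r}".format(parsed))
    return parsed


# The key "version" is made up for this usage example.
result = check_admin_socket_output('{"version": "0.42"}', lambda d: "version" in d)
```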
Samuel Just
4aa9ca4551
CephManager: base timeout on time since last change in active+clean
...
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-01-24 11:28:38 -08:00
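Basing the timeout on the time since the last change, as in 4aa9ca4551, means a slow but progressing cluster never times out, while a stuck one does. The sketch below assumes a caller-supplied get_num_clean function; the function name, parameters, and one-second poll interval are all hypothetical.

```python
import time


def wait_for_all_clean(get_num_clean, total, timeout,
                       clock=time.monotonic, sleep=time.sleep):
    """Wait until get_num_clean() reaches total.

    The timeout measures time since the clean count last changed,
    not time since the wait began.
    """
    last_count = get_num_clean()
    last_change = clock()
    while last_count < total:
        if clock() - last_change > timeout:
            raise RuntimeError(
                "no change in clean count for {0} seconds".format(timeout)
            )
        sleep(1)
        count = get_num_clean()
        if count != last_count:
            # Progress was made, so restart the timeout window.
            last_count = count
            last_change = clock()
    return last_count
```

The clock and sleep parameters exist only so the loop can be exercised with a fake clock instead of real delays.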
Josh Durgin
29885f3e42
kernel: ignore connection problems while waiting for reboot
2012-01-18 17:49:05 -08:00
Sage Weil
45e4c924fa
thrashosds: maxdead default to 0
...
This avoids any possibility of blocking peering.
2012-01-17 09:24:54 -08:00
Sage Weil
bf22a4fb92
task/rados: use new usage for radosmodel tool
2012-01-16 16:53:55 -08:00
Sage Weil
71390f9784
thrashosds: fix action selection
...
I'm not sure what the old code was trying to do, but I'm pretty sure it
wasn't doing it correctly. A chance_down of .1 was killing an OSD for me
virtually every time.
2012-01-16 15:05:43 -08:00
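For reference, probability-weighted action selection of the kind 71390f9784 is fixing can be sketched with a single random roll against cumulative probabilities. This is not the thrashosds implementation; choose_action and the action name "kill_osd" are hypothetical, and only the chance_down value of .1 comes from the commit message.

```python
import random


def choose_action(rng, chances):
    """Pick one action from (probability, name) pairs, or None.

    Probabilities are assumed to sum to at most 1; the remainder is the
    chance of doing nothing on this round.
    """
    roll = rng.random()
    cumulative = 0.0
    for probability, action in chances:
        cumulative += probability
        if roll < cumulative:
            return action
    return None


rng = random.Random(0)  # seeded so the run is repeatable
chance_down = 0.1
picks = [choose_action(rng, [(chance_down, "kill_osd")]) for _ in range(10000)]
kill_fraction = picks.count("kill_osd") / float(len(picks))
```

With a correct selector, kill_fraction stays close to chance_down rather than approaching 1.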
Sage Weil
8fc6086986
thrashosds: make actions less nonsensical
...
Make marking OSD up/down and in/out totally orthogonal.
Signed-off-by: Sage Weil <sage@newdream.net>
2012-01-16 15:05:43 -08:00