610 Commits

Author SHA1 Message Date
Josh Durgin
62bda12711 misc: always return a usable result from get_valgrind_args 2012-02-24 14:56:43 -08:00
Josh Durgin
e4801819f2 rgw: simplify valgrind args 2012-02-24 14:56:42 -08:00
Sage Weil
edbb41e1f8 add peer task
Force a pg to get stuck in 'down' state, verify we can query the peering
state, then start the OSD so it can recover.
2012-02-24 15:05:17 -08:00
Sage Weil
7ac04a422a lost_unfound: list missing/unfound for each pg and verify the unfound counts
This also tests the pg list_missing functionality.
2012-02-24 12:42:39 -08:00
Sage Weil
c43e87d118 ceph_manager: list_pg_missing
List missing objects for the given pgid.
2012-02-24 12:42:39 -08:00
Josh Durgin
c93a08eda0 Whitespace and unnecessary formatting fixes 2012-02-24 12:05:35 -08:00
Josh Durgin
3bfb8d696e ceph, ceph-fuse: simplify valgrind argument additions 2012-02-24 12:05:35 -08:00
Sage Weil
9ec047226f refactor all valgrind users to use a get_valgrind_args() helper
This avoids a lot of annoying duplicated code.
2012-02-24 12:05:35 -08:00
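A minimal sketch of what a shared get_valgrind_args() helper might look like. It assumes the per-daemon valgrind setting is absent (None), a single option string, or a list of options. The log directory default is hypothetical. Returning an empty list when valgrind is not configured also illustrates the "always return a usable result" behavior from 62bda12711.

```python
def get_valgrind_args(name, v, log_dir="/tmp/cephtest/archive/log/valgrind"):
    """Build the argv prefix that wraps one daemon in valgrind.

    name: daemon name, e.g. "osd.0".
    v: the user's valgrind setting: None, one option string, or a list.
    log_dir: hypothetical log location (assumption of this sketch).
    Always returns a list, so callers can prepend it unconditionally.
    """
    if v is None:
        # No valgrind requested: an empty prefix leaves argv unchanged.
        return []
    if not isinstance(v, list):
        v = [v]
    # One XML log per daemon so a later pass can scan them for errors.
    return [
        "valgrind",
        "--xml=yes",
        "--xml-file={d}/{n}.log".format(d=log_dir, n=name),
    ] + v
```

Callers can then write `get_valgrind_args("osd.0", opts) + ["ceph-osd", "-i", "0"]` whether or not opts is set.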
Sage Weil
90fdc84086 ceph: always create valgrind logs dir
Other tasks use it too.  It's more annoying to conditionally create it.
2012-02-24 12:05:35 -08:00
Sage Weil
7af6e46c94 ceph: always try to process valgrind logs
Check for errors in valgrind logs even if there is no valgrind option in
the ceph task config stanza.  Other tasks (ceph-fuse, rgw) can run under
valgrind.  If the logs aren't there, this is harmless.
2012-02-24 12:05:35 -08:00
Sage Weil
e2ea73d1a5 rgw: add valgrind support
tasks:
- ceph:
- rgw:
   client.a:
     valgrind: [--tool=memcheck]
2012-02-24 12:05:35 -08:00
Sage Weil
7bf64b73ee rgw: accept dict
e.g.,

tasks:
...
- rgw:
    client.0:
    client.1:
2012-02-24 12:05:35 -08:00
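A sketch of how a task might accept both the list form and the dict form. In YAML, a key with no value, like client.0: in the example above, loads as null, so both forms normalize to the same name-to-settings mapping. The function name is hypothetical. Treating a missing config as "no clients" is an assumption of this sketch; the real task may default to all clients instead.

```python
def normalize_clients(config):
    """Return a dict mapping client name -> per-client settings (or None)."""
    if config is None:
        # Assumption of this sketch: no config means no clients.
        return {}
    if isinstance(config, list):
        # List form carries no per-client settings, so map each name to None.
        return dict((name, None) for name in config)
    if isinstance(config, dict):
        return config
    raise TypeError("expected a list or dict, got %s" % type(config).__name__)
```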
Sage Weil
d40a9b275f lost_unfound: new mark_unfound_lost syntax 2012-02-23 20:09:09 -08:00
Josh Durgin
81a46c462a dump_stuck: flush stats before waiting for recovery/clean 2012-02-23 17:07:26 -08:00
Josh Durgin
995dc1f751 Add a task for testing stuck pg visibility. 2012-02-21 15:12:48 -08:00
Josh Durgin
2a1c74c5f5 Move duration calculation to an internal task
This excludes all generic start up costs, like waiting for locks,
rebooting into a new kernel, etc.
2012-02-21 15:12:26 -08:00
Josh Durgin
eb434a507a Add necessary imports for s3 tasks, and keep them alphabetical. 2012-02-21 15:04:00 -08:00
Yehuda Sadeh
11073e505f s3roundtrip, s3readwrite: access key uses url safe chars
Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
2012-02-21 12:23:38 -08:00
Yehuda Sadeh
6e1b3a5644 rgw: access key uses url safe chars
Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
2012-02-21 12:12:03 -08:00
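One way to guarantee a generated access key is URL safe is to draw it only from the RFC 3986 "unreserved" characters (letters, digits, "-", ".", "_", "~"), which never need percent-encoding. This is a sketch for modern Python 3. The key length and default alphabet are assumptions, not what rgw or the s3 tasks necessarily use.

```python
import secrets
import string

# RFC 3986 unreserved characters: safe in a URL without percent-encoding.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def make_access_key(length=20, alphabet=string.ascii_uppercase + string.digits):
    """Return a random key made only of URL-safe characters."""
    if not set(alphabet) <= UNRESERVED:
        raise ValueError("alphabet contains characters that need URL escaping")
    return "".join(secrets.choice(alphabet) for _ in range(length))
```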
Sage Weil
c5688e6570 ceph: valgrind trumps coverage when picking a flavor
valgrind will crash if we don't use notcmalloc; coverage will silently
fail to collect coverage info.
2012-02-20 15:17:52 -08:00
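The precedence described above can be expressed by checking valgrind last, so it overrides coverage. In this sketch, notcmalloc comes from the commit message. The flavor names "basic" and "gcov" and the config keys are assumptions for illustration.

```python
def pick_flavor(config):
    """Choose a build flavor from a task config dict (hypothetical keys)."""
    flavor = "basic"
    if config.get("coverage"):
        flavor = "gcov"
    # Checked last so it wins: valgrind crashes without notcmalloc, while
    # the wrong flavor for coverage only loses coverage data.
    if config.get("valgrind"):
        flavor = "notcmalloc"
    return flavor
```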
Sage Weil
5216d3c7a9 ceph.conf: no lockdep by default 2012-02-20 14:54:10 -08:00
Sage Weil
5f9445c88b suite.results: include test duration in output 2012-02-20 13:38:06 -08:00
Sage Weil
71d0d97a97 cfuse -> ceph-fuse 2012-02-20 07:12:53 -08:00
Sage Weil
7ff9f044e7 ceph: allow valgrind per-type (not just per-name) 2012-02-20 07:04:45 -08:00
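A sketch of a per-type lookup with a per-name fallback order. A key like "osd" applies to every OSD, and "osd.1" applies to one daemon. The function name, and the rule that a per-name entry beats a per-type entry, are assumptions of this sketch.

```python
def valgrind_opts_for(valgrind_cfg, type_, id_):
    """Look up valgrind options for one daemon, e.g. type_="osd", id_="0".

    Assumed precedence: an exact name ("osd.0") wins over a type ("osd").
    Returns None if neither is configured.
    """
    if not valgrind_cfg:
        return None
    name = "{t}.{i}".format(t=type_, i=id_)
    if name in valgrind_cfg:
        return valgrind_cfg[name]
    return valgrind_cfg.get(type_)
```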
Sage Weil
eb93fa744d lost_unfound: mark osds in when we revive them
so that we test what we meant to.  It also lets us actually go clean at the
very end.
2012-02-19 19:40:45 -08:00
Sage Weil
45b6189b7d ceph_manager: ignore stale states when counting
also remove assumptions about ordering of states
2012-02-18 14:44:53 -08:00
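PG states are "+"-joined flag strings such as "active+clean". Splitting them into a set is one way to drop any assumption about flag order, and skipping anything flagged "stale" implements the first half of this change. The function name and signature below are hypothetical.

```python
def count_pgs(pg_states, required):
    """Count PGs whose state contains every flag in `required`.

    Flags are compared as a set, so "clean+active" matches "active+clean".
    PGs flagged 'stale' are ignored entirely.
    """
    required = set(required)
    count = 0
    for state in pg_states:
        flags = set(state.split("+"))
        if "stale" in flags:
            continue
        if required <= flags:
            count += 1
    return count
```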
Sage Weil
196d4a1f16 wait_till_clean -> wait_for_clean and wait_for_recovery
Clean now also means the correct number of replicas, whereas recovered
means we have done all the work we can do given the replicas/osds we have.
For example, degraded and clean are now mutually exclusive.

Also move away from 'till'.
2012-02-17 21:53:25 -08:00
Sage Weil
ad9d7fb6e1 backfill: wait for clean before writing+blackholing
If we have straggler pgs and blackhole osd.1, we can deadlock because we
need info from that osd to repeer and continue.  Make sure we're clean, and
then start the write + blackhole + kill test.
2012-02-14 15:24:11 -08:00
Sage Weil
50cc60f02d nuke: nuke testrados too
Slightly fewer nuke -r's
2012-02-14 15:23:19 -08:00
Sage Weil
6f3abc6ced ceph_manager: mark in a bit more often than out
Otherwise we can get into cases where many/most nodes are out, and things
don't work as well.  e.g., crush may start to fail.
2012-02-13 15:28:24 -08:00
Sage Weil
af4ce44233 ceph: use any fs, not just btrfs, on scratch devices
The

  btrfs: true

syntax is replaced with

  fs: btrfs

or ext4, xfs.
2012-02-13 15:28:24 -08:00
Sage Weil
975d73a2bb nuke: nuke testrados and rados processes, too
So that -r is needed slightly less often.
2012-02-13 15:28:24 -08:00
Sage Weil
46b612efa4 misc: make get_scratch_devices look for (almost) any disk that's not mounted 2012-02-13 15:28:24 -08:00
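A sketch of the "not mounted" filter only. It parses /proc/mounts-style text, where the first field is the device, and rejects a disk if the disk itself or one of its numbered partitions is mounted. The helper name is hypothetical. It does not model the "(almost)" exclusions, and it ignores partition naming schemes like nvme0n1p1.

```python
def unmounted_disks(candidates, mounts_text):
    """Return the candidate disks with no mounted filesystem on them.

    mounts_text: contents of /proc/mounts (device is the first field).
    A disk such as /dev/sdb is in use if /dev/sdb or /dev/sdbN is mounted.
    """
    mounted = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields:
            mounted.add(fields[0])

    def in_use(dev):
        for m in mounted:
            if m == dev:
                return True
            # Only digits may follow, so /dev/sda does not match /dev/sdaa1.
            if m.startswith(dev) and m[len(dev):].isdigit():
                return True
        return False

    return [dev for dev in candidates if not in_use(dev)]
```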
Sage Weil
2adad559bd hammer.sh: assume path is set 2012-02-11 14:19:49 -08:00
Josh Durgin
0cd16cf03d ceph: always add logger for daemons
The extra log function added redundant info and didn't allow different
log levels.
2012-02-02 09:36:04 -08:00
Josh Durgin
7af7c66bd0 ceph: rename type parameter to type_
type is a built-in and shouldn't be shadowed.
2012-02-02 09:35:58 -08:00
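A small illustration of why the rename matters. A parameter named type hides the built-in type() for the whole function body. The function names are illustrative only.

```python
def describe_shadowed(type, value):
    # `type` is the caller's string here, not the built-in, so this call
    # raises TypeError: 'str' object is not callable.
    return "{0}: {1}".format(type, type(value).__name__)


def describe(type_, value):
    # With the trailing underscore, the built-in type() is still reachable.
    return "{0}: {1}".format(type_, type(value).__name__)
```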
Josh Durgin
7146db9215 ceph: use the correct comparison operator
is compares identity (i.e. the object address in CPython), not value.
2012-02-02 09:27:04 -08:00
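A short illustration of the difference. Two equal values are not necessarily the same object, so `==` is the right test for strings like daemon types. Whether two equal strings share one object is a CPython implementation detail, and Python 3.8+ warns about `is` with a literal. The helper name is illustrative.

```python
def is_osd(type_):
    # Compare values; `type_ is "osd"` would test object identity instead.
    return type_ == "osd"


built = "".join(["o", "s", "d"])  # equal to "osd", built at runtime

a = [1, 2]
b = [1, 2]  # a second, distinct list with the same contents
```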
Josh Durgin
e7672b6433 ceph: sync before unmounting btrfs devices
There may still be writes in flight, since the osds may not have
shut down cleanly. This should prevent EBUSY when unmounting.

Fixes: #1997
2012-02-02 09:26:45 -08:00
Josh Durgin
1364b8826f ceph: delay raising exceptions until all daemons are stopped
If a daemon crashes, the exception is raised when we stop it. This
caused some daemons to continue running during cleanup, since the rest
of the daemons of the same type would not be shut down. Also log each
daemon that crashed, for easier debugging.

Fixes: #1744
2012-02-02 09:26:25 -08:00
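A sketch of the "stop everything first, raise afterwards" pattern. Each daemon is modeled as a hypothetical zero-argument stop callable that raises if the daemon had crashed. Every failure is logged, and the first one is re-raised only after all daemons have been stopped.

```python
import logging

log = logging.getLogger(__name__)


def stop_all(daemons):
    """Stop every daemon in `daemons` (name -> stop callable).

    A crash in one daemon no longer prevents the rest from being stopped.
    """
    errors = []
    for name, stop in daemons.items():
        try:
            stop()
        except Exception as e:
            log.error("daemon %s crashed: %s", name, e)
            errors.append(e)
    if errors:
        raise errors[0]
```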
Sage Weil
0236dc0f5e add backfill task
This does a basic test of backfill functionality, including a divergent
log on a backfill target (#1983).
2012-01-31 16:25:53 -08:00
Sage Weil
e337c4727c ceph_manager: add manager.blackhole_kill_osd()
This will suspend disk writes for a couple of seconds and then kill the
daemon.  It helps us simulate a hardware failure.
2012-01-31 16:13:59 -08:00
Tommi Virtanen
d7be77628c Allow user to disable lock checking.
The new plana hardware isn't in the old sepia lock database,
and the machine pools are risky to merge as nothing in the
software guarantees allocation from just one pool. This allows
us to hand-allocate machines temporarily.
2012-01-31 08:05:36 -08:00
Tommi Virtanen
09bed16408 Allow user to provide flavor to use.
With this, you can use Ubuntu 11.10 machines with teuthology by saying::

  tasks:
  - ceph:
      flavor: oneiric
  ...
2012-01-31 07:59:43 -08:00
Josh Durgin
f84b4aa5e3 Add admin socket task.
This simply gets the output of an admin socket command, makes sure
it's json, and runs a user-provided test script on it.
2012-01-27 17:13:36 -08:00
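A sketch of the check described above, under simplifying assumptions. The admin socket command's stdout arrives as a string, and the user-provided test script is modeled as a Python callable that raises on failure. json.loads raises ValueError (json.JSONDecodeError is a subclass) when the output is not valid JSON.

```python
import json


def check_admin_socket_output(raw, test=None):
    """Parse admin socket output as JSON and optionally run a check on it.

    raw: the command's stdout.
    test: hypothetical callable standing in for the user's test script;
          it should raise (e.g. AssertionError) if the result is wrong.
    """
    result = json.loads(raw)  # ValueError if the output isn't JSON
    if test is not None:
        test(result)
    return result
```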
Samuel Just
4aa9ca4551 CephManager: base timeout on time since last change in active+clean
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-01-24 11:28:38 -08:00
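A sketch of a timeout measured from the last change in the active+clean count rather than from the start of the wait. Slow but steady progress never times out, while a count that stays unchanged for `timeout` seconds does. The signature is hypothetical. The clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import time


def wait_for_clean(get_num_clean, num_pgs, timeout,
                   sleep=time.sleep, clock=time.time, interval=3):
    """Block until get_num_clean() reaches num_pgs.

    Raises RuntimeError if the count stays unchanged for more than
    `timeout` seconds.
    """
    last = get_num_clean()
    start = clock()
    while last < num_pgs:
        if clock() - start > timeout:
            raise RuntimeError("active+clean count stuck for %s s" % timeout)
        sleep(interval)
        cur = get_num_clean()
        if cur != last:
            # Progress: restart the timeout window.
            last = cur
            start = clock()
```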
Josh Durgin
29885f3e42 kernel: ignore connection problems while waiting for reboot 2012-01-18 17:49:05 -08:00
Sage Weil
45e4c924fa thrashosds: maxdead default to 0
This avoids any possibility of blocking peering.
2012-01-17 09:24:54 -08:00
Sage Weil
bf22a4fb92 task/rados: use new usage for radosmodel tool 2012-01-16 16:53:55 -08:00
Sage Weil
71390f9784 thrashosds: fix action selection
I'm not sure what the old code was trying to do, but I'm pretty sure it
wasn't doing it correctly: a 0.1 chance_down was killing an OSD for me
virtually every time.
2012-01-16 15:05:43 -08:00
Sage Weil
8fc6086986 thrashosds: make actions less nonsensical
Make marking OSD up/down and in/out totally orthogonal.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-01-16 15:05:43 -08:00
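A sketch of weighted action selection consistent with the thrashosds commits above. Kill/revive (up/down) and out/in are independent candidate actions. "kill" is weighted by chance_down, so chance_down=0 never kills an OSD, and "in" carries a slightly higher weight than "out". The weights, the minimum-count guards, and the signature are all assumptions for illustration.

```python
import random


def choose_action(chance_down, num_in, num_out, num_live, num_dead, rng=random):
    """Pick one thrashing action by weight, or None if nothing is possible.

    All weights below are hypothetical; only their relative sizes matter.
    """
    actions = []
    if num_in > 1:
        actions.append(("out", 1.0))
    if num_out > 0:
        actions.append(("in", 1.5))  # mark in a bit more often than out
    if num_live > 1 and chance_down > 0:
        actions.append(("kill", chance_down))
    if num_dead > 0:
        actions.append(("revive", 1.0))
    if not actions:
        return None
    total = sum(weight for _, weight in actions)
    val = rng.uniform(0, total)
    for name, weight in actions:
        if val < weight:
            return name
        val -= weight
    return actions[-1][0]  # val == total after float rounding
```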