Commit Graph

651 Commits

Author SHA1 Message Date
Sage Weil
22b1f17f78 ls: another newline 2012-04-10 08:59:47 -07:00
Sage Weil
7757fbb9bd ls: remove stray newline 2012-04-10 08:57:19 -07:00
Dan Mick
9906d5ed08 Change to local mirror of linux-firmware repo to try to stop failures 2012-04-09 16:58:59 -07:00
Mark Nelson
3d7f1db731 Kernel: Pull linux-firmware from git
Signed-off-by: Mark Nelson <nhm@clusterfaq.org>
2012-04-05 08:49:19 -07:00
Mark Nelson
1836d4672f Added assertion to check that targets > roles
Signed-off-by: Mark Nelson <mark.nelson@dreamhost.com>
2012-04-03 15:56:51 -07:00
Sage Weil
952940272b nuke: don't run umount when no xargs args
Gets rid of this noise:

INFO:teuthology.nuke:Unmount any osd data directories...
INFO:teuthology.orchestra.run.err:Usage: umount -h | -V
INFO:teuthology.orchestra.run.err:       umount -a [-d] [-f] [-r] [-n] [-v] [-t vfstypes] [-O opts]
INFO:teuthology.orchestra.run.err:       umount [-d] [-f] [-r] [-n] [-v] special | node...
INFO:teuthology.orchestra.run.err:Usage: umount -h | -V
INFO:teuthology.orchestra.run.err:       umount -a [-d] [-f] [-r] [-n] [-v] [-t vfstypes] [-O opts]
INFO:teuthology.orchestra.run.err:       umount [-d] [-f] [-r] [-n] [-v] special | node...
...
2012-04-03 15:56:36 -07:00
Sage Weil
9a69c3f319 ceph.conf: enable 'osd recover clone overlap'
to test the recovery cloning in qa.  this was redone, but forgot to enable
it in qa.
2012-03-30 16:15:34 -07:00
Samuel Just
b4aa098f47 make Thrasher not inherit from Greenlet 2012-03-29 18:08:19 -07:00
Samuel Just
394d8b1ebd Add test for object source marked down 2012-03-29 18:08:19 -07:00
Samuel Just
749826c29b allow use of a separate journal block device 2012-03-27 17:18:44 -07:00
Josh Durgin
e30b7710f5 rbd: fix typo in default config
pyflakes would have caught this if 'all' weren't a built-in function
2012-03-26 11:57:07 -07:00
Sage Weil
397e7f2f7b add osd_recovery task to test divergent osd logs 2012-03-24 21:09:19 -07:00
Sage Weil
1c1192a9fb backfill: use 'rbd' pool instead of 'data'
(data has a replay interval, which makes writes take longer to resume
after repeering)
2012-03-24 21:09:19 -07:00
Sage Weil
ca9a5a4ac4 rename backfill -> osd_backfill 2012-03-24 16:05:11 -07:00
Sage Weil
22e808746f put filestore xattr option in [global]
...for test_filestore_idempotent's benefit
2012-03-24 15:36:08 -07:00
Josh Durgin
6f0f250b26 suite: add missing print statement 2012-03-21 12:00:55 -07:00
Josh Durgin
8a9a567067 suite: fix print statement when summary doesn't exist 2012-03-21 11:58:17 -07:00
Samuel Just
91c08f6eee Add watch op to rados.py
Signed-off-by: Samuel Just <sam.just@dreamhost.com>
2012-03-20 19:00:12 -07:00
Josh Durgin
815fc3e2f6 suite: failed runs might not have durations
This was one cause of emails not being sent - stale /tmp/cephtest dirs
fail without recording a duration.
2012-03-20 07:50:08 -07:00
Josh Durgin
a65d4136e5 suite, coverage: use absolute dirs for isdir checks
This fixes the results to wait for all jobs to complete again.
2012-03-19 14:16:14 -07:00
Josh Durgin
bdb72c282f filestore_idempotent: get coverage and coredumps 2012-03-19 11:57:02 -07:00
Josh Durgin
6c8db1a807 suite: more results logging 2012-03-19 11:31:33 -07:00
Sage Weil
7173a8afb6 ceph.conf: no comment 2012-03-18 11:56:18 -07:00
Sage Weil
7de798f6fa ceph.conf: set 'filestore xattr use omap = true' 2012-03-18 11:06:05 -07:00
Sage Weil
7d2e1056fd fix teuthology-ls isdir check 2012-03-18 10:50:17 -07:00
Sage Weil
94f0ba1efe run valgrind with cwd set to /tmp/cephtest/archive/coredump
This lets us capture the vgcore.* files, which always go to valgrind's
cwd.

Fixes: #1953
2012-03-18 10:48:51 -07:00
Josh Durgin
07b97fe77f suite: log results and coverage generation
Need to figure out where and when results emails are failing.
2012-03-16 11:44:13 -07:00
Josh Durgin
8fbd087d6b results: make sure email is sent before anything else fails 2012-03-15 17:34:19 -07:00
Mark Nelson
e14d428c98 Merge branch 'master' of github.com:ceph/teuthology 2012-03-14 15:32:23 -05:00
Sage Weil
5c9acbd897 gitbuilder: put flavor last
in case we refine the field later
2012-03-13 10:09:18 -07:00
Sage Weil
1a01ccaafb Pull from new gitbuilder.ceph.com locations.
Simplifies the flavor stuff into a tuple of

<package,type,flavor,dist,arch>

where package is ceph, kernel, etc.
type is tarball, deb
flavor is basic, gcov, notcmalloc
arch is x86_64, i686 (uname -m)
dist is oneiric, etc. (lsb_release -s -c)
2012-03-13 10:02:26 -07:00
Mark Nelson
3833ada8b9 Made the example better with multiple roles. 2012-03-12 15:13:36 -05:00
Mark Nelson
0a61ffad4c Added some example yaml files and an example parallel execution task. 2012-03-12 14:33:10 -05:00
Sage Weil
008cf7fd95 autotest: pull from github.com/ceph/autotest 2012-03-10 19:15:21 -08:00
Sage Weil
2124129e70 workunit: include python2.7 path too 2012-03-10 15:34:19 -08:00
Samuel Just
ddc1ab0c03 rados.py: include setattr and rmattr
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-08 16:14:44 -08:00
Mark Nelson
31762c0003 lock: Improved logging when there aren't enough nodes available to lock-many. 2012-03-07 12:55:54 -08:00
Mark Nelson
05a07dda7d lock: Added a --locked flag to teuthology-lock.
Can be used to restrict searches based on lock status, e.g.
'teuthology-lock --list -a --locked false --status up' shows available nodes.
2012-03-07 12:55:33 -08:00
Sage Weil
2a18c3e1d0 nuke: unmount osd data directories
This helps us avoid reboot to clean up osd data directories that are left
mounted.
2012-03-06 09:34:38 -08:00
Josh Durgin
1493674735 Use non-zero exit status if any tests failed
Fixes: #1989
2012-03-05 13:34:33 -08:00
Sage Weil
dc1abab211 github.com/NewDreamNetwork -> github.com/ceph 2012-03-02 10:55:56 -08:00
Josh Durgin
a80246c17f dump_stuck: note required ceph configuration 2012-02-29 15:47:17 -08:00
Josh Durgin
85cc96c11a dump_stuck: verify that 'ceph health' mentions the right number of inactive/unclean/stale pgs 2012-02-28 13:55:46 -08:00
Sage Weil
999e21928c peer: ignore +scrubbing portion of pg state
It can cause the mon state and osd states to not match.
2012-02-28 09:50:29 -08:00
Sage Weil
84cd4ed6c3 peer: wait for peering to complete, or block
We need to wait for peering to either complete, or block because it is
waiting for another PG.  _Then_ look at all the PG states and compare the
mon values with what we get from querying the OSDs directly.
2012-02-25 21:05:00 -08:00
Josh Durgin
b8739585a0 peer: remove unused variable 2012-02-24 15:01:34 -08:00
Josh Durgin
62bda12711 misc: always return a usable result from get_valgrind_args 2012-02-24 14:56:43 -08:00
Josh Durgin
e4801819f2 rgw: simplify valgrind args 2012-02-24 14:56:42 -08:00
Sage Weil
edbb41e1f8 add peer task
Force a pg to get stuck in 'down' state, verify we can query the peering
state, then start the OSD so it can recover.
2012-02-24 15:05:17 -08:00
Sage Weil
7ac04a422a lost_unfound: list missing/unfound for each pg and verify the unfound counts
This also tests the pg list_missing functionality.
2012-02-24 12:42:39 -08:00
Sage Weil
c43e87d118 ceph_manager: list_pg_missing
List missing objects for the given pgid.
2012-02-24 12:42:39 -08:00
Josh Durgin
c93a08eda0 Whitespace and unnecessary formatting fixes 2012-02-24 12:05:35 -08:00
Josh Durgin
3bfb8d696e ceph, ceph-fuse: simplify valgrind argument additions 2012-02-24 12:05:35 -08:00
Sage Weil
9ec047226f refactor all valgrind users to use a get_valgrind_args() helper
This avoids much annoying, duplicated code.
2012-02-24 12:05:35 -08:00
Sage Weil
90fdc84086 ceph: always create valgrind logs dir
Other tasks use it too.  It's more annoying to conditionally create it.
2012-02-24 12:05:35 -08:00
Sage Weil
7af6e46c94 ceph: always try to process valgrind logs
Check for errors in valgrind logs even if there is no valgrind option in
the ceph task config stanza.  Other tasks can run via valgrind (ceph-fuse,
rgw).  If the logs aren't there, this is harmless.
2012-02-24 12:05:35 -08:00
Sage Weil
e2ea73d1a5 rgw: add valgrind support
tasks:
- ceph:
- rgw:
   client.a:
     valgrind: [--tool=memcheck]
2012-02-24 12:05:35 -08:00
Sage Weil
7bf64b73ee rgw: accept dict
e.g.,

tasks:
...
- rgw:
    client.0:
    client.1:
2012-02-24 12:05:35 -08:00
Sage Weil
d40a9b275f lost_unfound: new mark_unfound_lost syntax 2012-02-23 20:09:09 -08:00
Josh Durgin
81a46c462a dump_stuck: flush stats before waiting for recovery/clean 2012-02-23 17:07:26 -08:00
Josh Durgin
995dc1f751 Add a task for testing stuck pg visibility. 2012-02-21 15:12:48 -08:00
Josh Durgin
2a1c74c5f5 Move duration calculation to an internal task
This excludes all generic start up costs, like waiting for locks,
rebooting into a new kernel, etc.
2012-02-21 15:12:26 -08:00
Josh Durgin
eb434a507a Add necessary imports for s3 tasks, and keep them alphabetical. 2012-02-21 15:04:00 -08:00
Yehuda Sadeh
11073e505f s3roundtrip, s3readwrite: access key uses url safe chars
Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
2012-02-21 12:23:38 -08:00
Yehuda Sadeh
6e1b3a5644 rgw: access key uses url safe chars
Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
2012-02-21 12:12:03 -08:00
Sage Weil
c5688e6570 ceph: valgrind trumps coverage when picking a flavor
valgrind will crash if we don't use notcmalloc; coverage will silently
fail to collect coverage info.
2012-02-20 15:17:52 -08:00
Sage Weil
5216d3c7a9 ceph.conf: no lockdep by default 2012-02-20 14:54:10 -08:00
Sage Weil
5f9445c88b suite.results: include test duration in output 2012-02-20 13:38:06 -08:00
Sage Weil
71d0d97a97 cfuse -> ceph-fuse 2012-02-20 07:12:53 -08:00
Sage Weil
7ff9f044e7 ceph: allow valgrind per-type (not just per-name) 2012-02-20 07:04:45 -08:00
Sage Weil
eb93fa744d lost_unfound: mark osds in when we revive them
so that we test what we meant to.  It also lets us actually go clean at the
very end.
2012-02-19 19:40:45 -08:00
Sage Weil
45b6189b7d ceph_manager: ignore stale states when counting
also remove assumptions about ordering of states
2012-02-18 14:44:53 -08:00
Sage Weil
196d4a1f16 wait_till_clean -> wait_for_clean and wait_for_recovery
Clean now also means the correct number of replicas, whereas recovered
means we have done all the work we can do given the replicas/osds we have.
For example, degraded and clean are now mutually exclusive.

Also move away from 'till'.
2012-02-17 21:53:25 -08:00
Sage Weil
ad9d7fb6e1 backfill: wait for clean before writing+blackholing
If we have straggler pgs and blackhole osd.1, we can deadlock because we
need info from that osd to repeer and continue.  Make sure we're clean, and
then start the write + blackhole + kill test.
2012-02-14 15:24:11 -08:00
Sage Weil
50cc60f02d nuke: nuke testrados too
Slightly fewer nuke -r's
2012-02-14 15:23:19 -08:00
Sage Weil
6f3abc6ced ceph_manager: mark in a bit more often than out
Otherwise we can get into cases where many/most nodes are out, and things
don't work as well.  e.g., crush may start to fail.
2012-02-13 15:28:24 -08:00
Sage Weil
af4ce44233 ceph: use any fs, not just btrfs, on scratch devices
The

  btrfs: true

syntax is replaced with

  fs: btrfs

or ext4, xfs.
2012-02-13 15:28:24 -08:00
Sage Weil
975d73a2bb nuke: nuke testrados and rados processes, too
So that -r is needed slightly less often.
2012-02-13 15:28:24 -08:00
Sage Weil
46b612efa4 misc: make get_scratch_devices look for (almost) any disk that's not mounted 2012-02-13 15:28:24 -08:00
Josh Durgin
0cd16cf03d ceph: always add logger for daemons
The extra log function added redundant info and didn't allow different
levels.
2012-02-02 09:36:04 -08:00
Josh Durgin
7af7c66bd0 ceph: rename type parameter to type_
type is a built-in and shouldn't be aliased.
2012-02-02 09:35:58 -08:00
Josh Durgin
7146db9215 ceph: use the correct comparison operator
is compares identity (i.e. address in cpython), not value.
2012-02-02 09:27:04 -08:00
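The distinction noted in 7146db9215 can be illustrated with a minimal sketch. This is not the code changed by that commit; `is_osd_role` and its argument are hypothetical names used only for illustration.

```python
# Illustrative only: not the code touched by commit 7146db9215.
a = [1, 2, 3]
b = [1, 2, 3]              # equal contents, but a separate list object

same_value = (a == b)      # == compares values
same_object = (a is b)     # is compares object identity
alias = a
alias_is_same = (alias is a)


def is_osd_role(role_type):
    """Hypothetical helper: compare strings by value, not identity."""
    return role_type == 'osd'
```

Using `==` for value checks avoids depending on whether the interpreter happens to reuse the same object for two equal values.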
Josh Durgin
e7672b6433 ceph: sync before unmounting btrfs devices
There may still be writes in flight, since the osds may not have
shutdown cleanly. This should prevent EBUSY when unmounting.

Fixes: #1997
2012-02-02 09:26:45 -08:00
Josh Durgin
1364b8826f ceph: delay raising exceptions until all daemons are stopped
If a daemon crashes, the exception is raised when we stop it. This
caused some daemons to continue running during cleanup, since the rest
of the daemons of the same type would not be shut down. Also log each
daemon that crashed, for easier debugging.

Fixes: #1744
2012-02-02 09:26:25 -08:00
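A rough sketch of the pattern described in 1364b8826f: stop every daemon, log each failure, and raise only once all of them have been stopped. `FakeDaemon` and `stop_all` are hypothetical names for this illustration and are not taken from the teuthology source.

```python
import logging

log = logging.getLogger(__name__)


class FakeDaemon(object):
    """Hypothetical stand-in for a daemon handle; stop() raises if it crashed."""

    def __init__(self, name, crashed=False):
        self.name = name
        self.crashed = crashed
        self.stopped = False

    def stop(self):
        self.stopped = True
        if self.crashed:
            raise RuntimeError('%s crashed' % self.name)


def stop_all(daemons):
    """Stop every daemon, then re-raise the first error seen, if any."""
    first_error = None
    for daemon in daemons:
        try:
            daemon.stop()
        except Exception as e:
            # Log each crashed daemon, but keep stopping the rest.
            log.error('daemon %s failed: %s', daemon.name, e)
            if first_error is None:
                first_error = e
    if first_error is not None:
        raise first_error
```

Deferring the raise means a crash in one daemon no longer leaves the remaining daemons of the same type running during cleanup.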
Sage Weil
0236dc0f5e add backfill task
This does a basic test of backfill functionality, including a divergent
log on a backfill target (#1983).
2012-01-31 16:25:53 -08:00
Sage Weil
e337c4727c ceph_manager: add manager.blackhole_kill_osd()
This will suspend disk writes for a couple seconds and then kill the
daemon.  It helps us simulate a hardware failure.
2012-01-31 16:13:59 -08:00
Tommi Virtanen
d7be77628c Allow user to disable lock checking.
The new plana hardware isn't in the old sepia lock database,
and the machine pools are risky to merge as nothing in the
software guarantees allocation from just one pool. This allows
us to hand-allocate machines temporarily.
2012-01-31 08:05:36 -08:00
Tommi Virtanen
09bed16408 Allow user to provide flavor to use.
With this, you can use Ubuntu 11.10 machines with teuthology by saying::

  tasks:
  - ceph:
      flavor: oneiric
  ...
2012-01-31 07:59:43 -08:00
Josh Durgin
f84b4aa5e3 Add admin socket task.
This simply gets the output of an admin socket command, makes sure
it's json, and runs a user-provided test script on it.
2012-01-27 17:13:36 -08:00
Samuel Just
4aa9ca4551 CephManager: base timeout on time since last change in active+clean
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-01-24 11:28:38 -08:00
Josh Durgin
29885f3e42 kernel: ignore connection problems while waiting for reboot 2012-01-18 17:49:05 -08:00
Sage Weil
45e4c924fa thrashosds: maxdead default to 0
This avoids any possibility of blocking peering.
2012-01-17 09:24:54 -08:00
Sage Weil
bf22a4fb92 task/rados: use new usage for radosmodel tool 2012-01-16 16:53:55 -08:00
Sage Weil
71390f9784 thrashosds: fix action selection
I'm not sure what the old code was trying to do, but I'm pretty sure it
wasn't doing it correctly.. a .1 chance_down was killing an OSD for me
virtually every time.
2012-01-16 15:05:43 -08:00
Sage Weil
8fc6086986 thrashosds: make actions less nonsensical
Make marking OSD up/down and in/out totally orthogonal.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-01-16 15:05:43 -08:00
Sage Weil
9419f583c6 ls: include duration, less noise 2012-01-16 13:18:49 -08:00
Sage Weil
8fb115fe2c include run duration in summary.yaml 2012-01-16 12:39:20 -08:00
Sage Weil
7b47e49fa8 ls: fix extraneous newline 2012-01-16 10:47:44 -08:00
Sage Weil
b58f9560ea ceph: ignore all leaks
unless/until we figure out where the DefinitelyLost records are coming
from.. at first glance they look bogus.
2012-01-16 09:55:47 -08:00
Sage Weil
40fb86ff81 ceph: take single arg or list for valgrind args 2012-01-16 09:22:45 -08:00
Sage Weil
c88ec5719e combined mon, osd, mds starter functions 2012-01-15 22:54:09 -08:00
Sage Weil
f8ec23e79d rbd: default to all: 2012-01-15 22:53:39 -08:00
Sage Weil
72057a9cd8 use local mirrors for (most) github urls
A cronjob on ceph.newdream.net updates these every 15 minutes.  Sigh.
2012-01-15 22:52:58 -08:00
Sage Weil
fbfa94bb09 teuthology-ls: show pid, last line of output for running jobs 2012-01-15 22:52:58 -08:00
Sage Weil
f70b158cd1 show host -> roles mapping on startup
Less guessing when manually inspecting an in-progress or hung run.
2012-01-15 22:52:58 -08:00
Sage Weil
f795261454 lost_unfound: make test work with backfill
If we backfill, we fail to peer instead of having every object show up as
'unfound'.  Avoid that by preventing log trimming, so that we always do
log recovery for this test.
2012-01-15 22:52:58 -08:00
Tommi Virtanen
3bfa41cf6a Use yaml.safe_dump so unicode doesn't mess up the yaml files.
In general, yaml.dump is comparable to pickle, and my personal
coding standard says *never* use it. yaml.safe_dump is much nicer.
yaml.dump should have been named yaml.unsafe_dump, yaml.safe_dump
should have been named yaml.dump :(
2012-01-13 11:26:36 -08:00
Josh Durgin
0da44591a9 nuke: take config files from -t argument
teuthology-lock and teuthology-updatekeys both use -t for this already
2012-01-12 14:48:36 -08:00
Josh Durgin
96e89d30ec kernel: loop reconnecting in case we race with shutdown
Previously, if we reconnected before shutdown completed we asserted
that the kernel did not boot into the new version, when we just needed
to wait for the machine to reboot.
2012-01-12 13:02:22 -08:00
Sage Weil
59369237c9 thrasher: don't mark down osds out; tell monitor same
Stopping ceph-osd doesn't make it out (immediately).  Prevent monitor
from doing this after a delay too so we can keep our notion of what is
up/down/in/out accurate.
2012-01-11 12:54:09 -08:00
Sage Weil
3c0346b4cb lost_unfound: typo 2012-01-11 12:54:09 -08:00
Sage Weil
6dae2f8ae3 thrasher: adjust min_dead default
Make this 1, not 2.  That's a bit more friendly.  It doesn't strictly
matter, tho, since we revive osds before waiting for clean.
2012-01-11 12:54:09 -08:00
Sage Weil
fb74b90152 thrasher: add max_dead
Add max_dead, and revive osds prior to waiting for clean.  Otherwise we
can leave too many OSDs down and the cluster will never go clean.
2012-01-11 12:54:08 -08:00
Sage Weil
50463ffddd verify all osds start before checking health
Just checking health isn't good enough, since it races with OSD startup:
we can have a healthy cluster with 0 (or something else < total) OSDs.
2012-01-11 12:54:08 -08:00
Josh Durgin
f4883ebf09 ceph: let the user running ceph-osd remove subvolumes
This will prevent EPERM when using the SNAP_DESTROY ioctl,
so the filestore will use btrfs snaps.
2012-01-10 16:07:04 -08:00
Josh Durgin
d2fadf9fe2 syslog: ignore lockdep non-static key warning
It looks like this warning was made default in linux 3.2.
This will keep happening until #1922 is done.
2012-01-10 15:28:42 -08:00
Sage Weil
b354ce4e91 run: put pid in archive dir
This will make it easy for teuthology-ls to show you the running process's
pid (if it's still running).  Or for other utilities to kill + clean up
a hung teuthology run.
2012-01-08 14:39:30 -08:00
Sage Weil
13445d237b ceph_manager: a booting osd is no longer automatically marked in
as of ceph.git commit 96b7b0d83e
2012-01-06 17:21:38 -08:00
Sage Weil
001701a0f7 mon_recovery: need n/2 + 1 monitors for quorum 2012-01-06 15:12:15 -08:00
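The majority rule in 001701a0f7 can be written out as a one-line helper. `quorum_size` is a hypothetical name for this sketch, not a function from the mon_recovery task.

```python
def quorum_size(num_mons):
    """Smallest number of monitors that forms a strict majority (n/2 + 1)."""
    return num_mons // 2 + 1
```

With integer division, three monitors need two for quorum and four monitors need three.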
Sage Weil
da9210779e ceph: don't skip monitor ports
We can use the same port multiple times if they are on different hosts.
2012-01-06 13:36:54 -08:00
Josh Durgin
561f06cf94 suite: make email-on-success the default behavior
This way you can tell when a run is complete, instead of wondering if
it's stuck in the queue.
2012-01-05 17:27:31 -08:00
Josh Durgin
ec3a3a9654 rados: fix example config 2012-01-03 14:07:45 -08:00
Josh Durgin
cdd5c456a0 nuke-on-error: only unlock if this run locked the machines 2012-01-03 13:02:31 -08:00
Josh Durgin
0176c9ab0f Remove unused mon.0 variables. 2012-01-03 13:02:31 -08:00
Josh Durgin
2e9b1c75f9 rados: use testrados instead of testsnaps and testreadwrite 2012-01-03 13:02:29 -08:00
Josh Durgin
932257fb6e rados: remove unused variable 2011-12-30 14:37:45 -08:00
Josh Durgin
0af9c0a2e7 rados: clean up argument construction
Only the client id varies, so it can be done outside the loop. Also
handle coredumps and coverage, and use LD_LIBRARY_PATH instead of
LD_PRELOAD.
2011-12-30 14:37:45 -08:00
Josh Durgin
6df4ce5075 rados: fix references to testrados 2011-12-30 14:37:45 -08:00
Josh Durgin
cdf142b597 rados: fix documentation format 2011-12-30 14:37:45 -08:00
Josh Durgin
2f71f03fdd misc: simplify reconnect logic
Ignore all errors until the timeout expires so we don't have to worry
about whitelisting them.
2011-12-30 14:37:37 -08:00
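The approach in 2f71f03fdd, ignoring every error until the timeout expires, might look roughly like the sketch below. The `reconnect` signature, including the `clock` and `sleep` parameters, is an assumption made so the example is self-contained; it is not the signature used in teuthology's misc module.

```python
import time


def reconnect(connect, timeout, interval=1.0, clock=time.time, sleep=time.sleep):
    """Call connect() until it succeeds; re-raise once the timeout has passed."""
    deadline = clock() + timeout
    while True:
        try:
            return connect()
        except Exception:
            # No whitelist of error types: everything is retried
            # until the deadline has passed.
            if clock() >= deadline:
                raise
            sleep(interval)
```

The trade-off is that genuine configuration errors are also retried until the deadline, in exchange for not having to enumerate transient failure types.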
Mark Kampe
f04e29557e teuthology rgw-admin: annotated test cases for inventory
this is not a nose suite, so I simply added test case
   descriptions in csv format, and put a file to extract
   them at the top of the file.
Signed-off-by: Mark Kampe <mark.kampe@dreamhost.com>
2011-12-29 13:09:08 -08:00
Josh Durgin
d0e90d71bd syslog checking: forgot a pipe 2011-12-16 18:09:17 -08:00
Yehuda Sadeh
7eec30946d roundtrip: add task 2011-12-15 13:24:53 -08:00
Yehuda Sadeh
97cc6c2990 readwrite: fix task with default conf 2011-12-15 12:39:39 -08:00
Yehuda Sadeh
659e66aa09 readwrite: fix conf, task runs 2011-12-14 17:14:30 -08:00
Yehuda Sadeh
7d085ad939 readwrite: add readwrite task
still not really running, but at least getting configured
2011-12-14 16:12:55 -08:00
Josh Durgin
31b5ccbf1b coverage: use locally stored build instead of downloading from a gitbuilder 2011-12-13 16:16:09 -08:00
Josh Durgin
c9e4504fbd Ignore lockdep being turned off for now.
Some machines are hitting this udev issue:
http://marc.info/?l=linux-kernel&m=132033587908426&w=2 and lockdep is
turned off after the first warning.
2011-12-12 16:29:41 -08:00
Josh Durgin
a768ad738a coverage: don't generate html reports for each test
These can always be generated from the lcov files later, right now they just waste space.
2011-12-08 17:47:14 -08:00
Josh Durgin
7b52dd1410 syslog: ignore 'task blocked' warnings
These will happen under heavy load (usually on the osd).
2011-12-08 17:17:47 -08:00
Josh Durgin
e69057e4a1 internal: check syslog for errors
This should catch lockdep warnings and mark tests with them as failed.
2011-12-07 15:20:33 -08:00
Josh Durgin
95e632475f workunit: set client id and secretfile env vars
These are used by the kernel rbd workunit to know how to map images.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-12-06 16:16:38 -08:00
Tommi Virtanen
e80c32c442 Rename "testrados" and "testswift" tasks to not begin with "test".
Anything "test*" looks like a unit test, and shouldn't be used for
actual code.
2011-12-05 10:07:25 -08:00
Tommi Virtanen
0dd4d69ffe Fix unit tests for SSH keep-alive setting.
Commit 6e3e0d7cdc failed to pass
unit tests.
2011-12-05 10:02:30 -08:00
Tommi Virtanen
50c4b312a2 Handle interactive-on-error also when error is from contextmanager exit.
Closes: http://tracker.newdream.net/issues/1745
2011-11-30 17:07:26 -08:00
Tommi Virtanen
c651c88eac Properly handle case where first error is inside a context manager __exit__.
Closes: http://tracker.newdream.net/issues/1743
2011-11-21 16:00:49 -08:00
Sage Weil
721c0e9720 nuke: don't specify full path
/tmp/cephtest/binary may have been removed; kill stray daemons by name
only.  we really don't care about false positives here!
2011-11-19 20:56:49 -08:00
Sage Weil
4b53288b0c ceph_manager: % 2011-11-19 20:56:49 -08:00
Josh Durgin
508f4f8359 Save summary after nuking machines.
This way you can tell when tests are entirely finished running.
2011-11-18 13:53:51 -08:00
Josh Durgin
42cecb5e55 suite: put common config before facets
This lets you add tasks to the beginning of a run, like the chef task.
2011-11-17 17:26:21 -08:00
Josh Durgin
044a88ce59 suite: schedule a list of collections for running instead of a single suite directory 2011-11-17 17:16:23 -08:00
Yehuda Sadeh
23aae67aff testswift: fix config 2011-11-17 16:53:57 -08:00
Tommi Virtanen
d8fc151365 Clean up C++isms. 2011-11-17 17:00:44 -08:00
Tommi Virtanen
c545094895 Add a task for easily running chef-solo on all the nodes. 2011-11-17 16:49:47 -08:00
Sage Weil
89f80412c2 ceph_manager: fix logging 2011-11-17 13:46:02 -08:00
Josh Durgin
f85f5dd7e3 ceph: deep merge overrides, so e.g. log whitelists can be overridden 2011-11-17 13:07:03 -08:00
Josh Durgin
a763297685 misc: move deep_merge out of the MergeConfig class - it's generic 2011-11-17 13:06:36 -08:00
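A deep merge of override dictionaries, as referred to in f85f5dd7e3 and a763297685, could be sketched as follows. This is an illustrative version, not the function from teuthology's misc module; concatenating lists (rather than replacing them) is a choice made for this sketch, and the config keys in the usage note are examples only.

```python
def deep_merge(base, overrides):
    """Recursively merge overrides into base and return the result.

    Nested dicts are merged key by key, lists are concatenated,
    and any other value in overrides replaces the one in base.
    """
    if isinstance(base, dict) and isinstance(overrides, dict):
        for key, value in overrides.items():
            if key in base:
                base[key] = deep_merge(base[key], value)
            else:
                base[key] = value
        return base
    if isinstance(base, list) and isinstance(overrides, list):
        return base + overrides
    return overrides
```

For example, merging `{'ceph': {'fs': 'btrfs'}}` with `{'ceph': {'fs': 'xfs'}}` changes only the `fs` key and leaves any sibling keys under `ceph` in place.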
Josh Durgin
c6988a07f4 Save config after locking nodes, so targets are included. 2011-11-17 11:57:07 -08:00
Josh Durgin
4e6cd55c59 filestore_idempotent: remove unused import 2011-11-17 11:18:24 -08:00
Josh Durgin
7d51e3d381 mon_recovery: remove unused code and import 2011-11-17 11:16:08 -08:00
Josh Durgin
f4d527e743 thrashosds: timeout for every clean check, not just the last one 2011-11-17 11:11:33 -08:00
Josh Durgin
9d12b720e8 ceph_manager: add a default timeout of 5 minutes for mon quorum 2011-11-17 11:05:12 -08:00
Josh Durgin
cb9ac0897b ceph_manager: log mon quorum status so the logs show progress (or lack thereof) 2011-11-17 10:45:19 -08:00
Yehuda Sadeh
f3c569ee23 rgw: add swift task
still not completely working (for some reason it skips all the tests)
2011-11-16 16:00:01 -08:00
Sage Weil
c5f070b8a9 filestore_idempotent.py: simple task to test non-idempotent osd ops
Write some non-idempotent events to the osd.  Simulate a failure.  Verify
the result is correct on replay.

This must be preceded by the ceph task just so that we get the binaries
installed.  Should clean this up later if/when the installation gets
factored out of ceph.py.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-10 21:35:11 -08:00
Sage Weil
77c977c1cf misc: allow >1 monitor per role in get_mon_names()
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-10 14:13:24 -08:00
Josh Durgin
afa56f16d1 nuke: increase reboot timeout
Some sepia nodes are very slow to reboot.
2011-11-09 10:49:37 -08:00
Sage Weil
6618a0275c mon_recovery: add task to test monitor cluster failure recovery
Some simple tests to start with.  We still need some sort of mon cluster
thrashing.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-08 22:17:00 -08:00
Sage Weil
60863f70eb ceph_manager: manipulate monitors 2011-11-08 22:17:00 -08:00
Sage Weil
6d39cc1146 ceph: keep ceph.conf at ctx.ceph.conf
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-08 22:17:00 -08:00
Josh Durgin
006a0dd423 Remove unused imports and variable. 2011-11-08 16:09:21 -08:00
Josh Durgin
5d32bcae50 Add nuke-on-error option.
This lets automated jobs nuke and unlock machines after failed
tests. Each machine is nuked individually, so one down machine won't
keep others from being nuked and unlocked.
2011-11-08 16:09:21 -08:00
Tommi Virtanen
c764b2475b Fix leftover orchestra import clause.
This seems to be a leftover from
a2372fce12,
no idea how it stayed hidden this long.
2011-11-07 13:05:14 -08:00
Josh Durgin
4f3b113832 ceph_manager: log ceph -s output so progress is visible in the logs 2011-11-03 13:27:44 -07:00
Josh Durgin
0b451f9475 Keep each ssh connection alive.
With long-running jobs like thrashing, ssh connections were timing
out.
2011-11-03 13:08:49 -07:00
Josh Durgin
6e3e0d7cdc connection: allow the caller to specify whether keep-alive should be used 2011-11-03 13:07:21 -07:00
Josh Durgin
b1a0c1adea locker: fix race in locking
The isolation level is lower than I thought. This made it possible for
two clients to think they both locked the same machines, since the
update would still be modifying each row to change the locked_since
time.
2011-11-03 11:29:18 -07:00
Samuel Just
a2f406ef49 testrados: set CEPH_CLIENT_ID without a ;
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-11-02 11:33:37 -07:00
Samuel Just
810cae1a1d testrados: specify CEPH_CONF directly
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-10-31 14:54:24 -07:00
Yehuda Sadeh
10c3508741 rgw: add user suspend/enable test 2011-10-27 12:11:28 -07:00
Yehuda Sadeh
86aa940ffb rgw: log-to-stderr is now a binary flag 2011-10-27 11:32:12 -07:00
Samuel Just
8d0a7c5977 testrados: rename testsnaps to testrados and make snap testing optional
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-10-24 14:25:22 -07:00
Josh Durgin
a1249d07ca workunit: set PYTHONPATH so we can test python bindings 2011-10-24 13:52:58 -07:00
Sage Weil
61cbb3218e ceph.conf: python parser doesn't like ; comments 2011-10-23 10:30:27 -07:00
Sage Weil
3ed065625b ceph.conf: more frequent osd scrubbing; remove old cruft 2011-10-22 22:16:39 -07:00
Sage Weil
b8beff3dd5 ceph_manager: count active+clean+<something else> as active+clean
In my case, one pg was active+clean+scrubbing.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-21 10:54:05 -07:00
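The counting rule in b8beff3dd5 can be sketched by splitting a pg state string on `+` and checking for both flags. The function names are hypothetical and this is not the ceph_manager implementation; in particular it does not handle the stale states addressed by a later commit.

```python
def is_active_clean(state):
    """True if a pg state such as 'active+clean+scrubbing' includes both flags."""
    flags = state.split('+')
    return 'active' in flags and 'clean' in flags


def count_active_clean(states):
    """Count pg states that are active+clean, with or without extra flags."""
    return sum(1 for s in states if is_active_clean(s))
```

Checking membership rather than comparing the whole string also avoids assuming any particular ordering of the flags.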
Sage Weil
4ec37b2391 add lost_unfound task
Also some misc useful bits to ceph_manager.
2011-10-17 15:32:22 -07:00
Josh Durgin
bcded7f163 ceph: add whitelist for cluster log errors
Some messages are expected when thrashing osds or creating unfound
objects.

Fixes: #1622
2011-10-17 14:42:08 -07:00
Josh Durgin
fba220ecaa nuke: reset syslog configuration after rebooting
Previously we removed a file and rebooted without syncing, so the file
was never deleted.
2011-10-17 10:40:19 -07:00
Yehuda Sadeh
493596a7fd radosgw-admin: test swift keys creation/removal 2011-10-12 15:37:33 -07:00
Josh Durgin
321381d75f teuthology-worker: remove --keep-locked-on-error 2011-10-07 14:51:46 -07:00
Josh Durgin
3d3eb0efea Remove --keep-locked-on-error, and behave as if it were specified
This will help prevent machines with cephtest dirs still present from
being used. It's easy to unlock machines - the targets yaml fragment
is output during a run.
2011-10-07 14:49:53 -07:00
Josh Durgin
c56ab97442 reconnect: ignore SSHExceptions before the timeout expires
Fixes: #1587
2011-10-06 17:18:35 -07:00
Samuel Just
4722d468c6 task/watch_notify_stress: watch_notify_stress now thrashes clients
This should exercise the watch notify timeout code.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-10-06 14:34:44 -07:00
Sage Weil
4e61e4835e rgw: keep radosgw in foreground
It defaults to a daemon now.
2011-10-06 12:50:12 -07:00
Josh Durgin
107db6a913 Retry listing machines if the lock server goes down. 2011-10-04 17:21:00 -07:00
Sage Weil
39a1e76065 rgw: use normal logging mechanism
Keep capturing stdout/err, even though it should end up empty.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-04 16:09:51 -07:00
Josh Durgin
7b7ff6e8ce teuthology-worker: clean up last_in_suite jobs
There's no reason not to delete them once they start.
2011-10-04 12:32:58 -07:00
Josh Durgin
3d3ba1ebb1 daemon-helper: detect the signal actually sent
I thought I fixed this when I implemented coverage collection, but I
guess it got lost in a rebase or something.
2011-10-04 12:17:19 -07:00
Josh Durgin
d305d61b86 ceph_manager: remove unused raw_pg_status method 2011-10-03 17:49:53 -07:00