Commit Graph

51 Commits

Author SHA1 Message Date
Sam Lang
58111595d4 Support power cycling osds/nodes through ipmi
This patch defines a RemoteConsole class associated
with each Remote class instance, allowing
power cycling a target through ipmi.

Fixes/Implements #3782.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-31 08:23:37 -06:00
Sam Lang
ace4cb07b2 Replace /tmp/cephtest/ with configurable path
Teuthology uses /tmp/cephtest/ as the scratch test directory for
a run.  This patch replaces /tmp/cephtest/ everywhere with a
per-run directory: {basedir}/{rundir} where {basedir} is a directory
configured in .teuthology.yaml (/tmp/cephtest if not specified),
and {rundir} is the name of the run, as given in --name.  If no name
is specified, {user}-{timestamp} is used.

To get the old behavior (/tmp/cephtest), set test_path: /tmp/cephtest
in .teuthology.yaml.

This change was motivated by #3782, which requires a test dir that
survives across reboots, but also resolves #3767.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-31 08:23:31 -06:00
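
The directory selection described in ace4cb07b2 can be sketched roughly as below. This is an illustration under stated assumptions, not teuthology's actual code: the function name `resolve_test_dir`, its parameters, the `base_test_dir` key name, and the timestamp format are all hypothetical. Only the `test_path` key, the /tmp/cephtest default, and the {user}-{timestamp} fallback come from the commit message.

```python
import getpass
import os
import time


def resolve_test_dir(config, name=None, user=None, now=None):
    """Return the scratch directory for a run (illustrative sketch only)."""
    # An explicit test_path restores the old fixed-directory behavior.
    if 'test_path' in config:
        return config['test_path']
    # 'base_test_dir' is an assumed key name; the commit message only says
    # the base directory is configured in .teuthology.yaml.
    basedir = config.get('base_test_dir', '/tmp/cephtest')
    if name is None:
        # No --name given: fall back to {user}-{timestamp}.
        user = user or getpass.getuser()
        # The timestamp format here is an assumption.
        stamp = time.strftime('%Y-%m-%d_%H-%M-%S', now or time.localtime())
        name = '{user}-{stamp}'.format(user=user, stamp=stamp)
    return os.path.join(basedir, name)
```

Calling `resolve_test_dir({}, name='nightly')` would yield a per-run directory under the default base, while `resolve_test_dir({'test_path': '/tmp/cephtest'})` returns the old fixed path unchanged.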
Sage Weil
b22e3ea526 internal: stop warning about lockdep circular dependency
This is coming from xfs, currently.  Bah.
2012-09-30 21:07:58 -07:00
Sage Weil
30748f36e2 fix lock held when returning to user space typo 2012-09-23 08:03:17 -07:00
Sage Weil
0395df3157 ignore 'lock held when returning to user space' from btrfs sb_internal crap 2012-09-19 16:42:02 -07:00
Sage Weil
dc1c247abc disable lockdep recursive warnings until #3040 is fixed 2012-08-24 19:23:34 -07:00
Sage Weil
b6b302890f internal: fix escaping of \b in syslog grep 2012-08-23 11:00:39 -07:00
Sage Weil
1c93d5ab4d syslog check: fix false-positive BUG matches in random strings 2012-07-29 12:15:51 -07:00
Sage Weil
ff0f4742fe set machine description to ctx.archive when auto-locking machines for a run 2012-07-16 10:53:25 -07:00
Sage Weil
cff2cfa217 internal: move pulling archive w/ tar to helper 2012-07-11 14:10:00 -07:00
Sage Weil
cc380dee40 ignore DEADLOCK line inside lockdep splat 2012-06-25 15:20:19 -07:00
Sage Weil
7773a93e3e whitelist current lockdep warnings in syslog
These are causing too much noise in the qa runs to leave, and #2617 is
sufficiently non-trivial to do this in the interim.  Putting a better
mechanism in place will include removing these coarse whitelist items and
replacing with something that specifically matches the failures we want
to ignore.
2012-06-21 13:20:18 -07:00
Sage Weil
715abdea56 ignore syslog cron noise 2012-05-01 22:26:03 -07:00
Sage Weil
407b2e0bc7 whitelist xfs_fsr syslog noise
Ignore lines like

2012-04-17T13:44:11-07:00 plana59 fsr[5454]: DEBUG: fsize=450560 blsz_dio=450560 d_min=512 d_max=2147483136 pgsz=4096
2012-04-18 11:21:10 -07:00
Josh Durgin
2a1c74c5f5 Move duration calculation to an internal task
This excludes all generic start up costs, like waiting for locks,
rebooting into a new kernel, etc.
2012-02-21 15:12:26 -08:00
Tommi Virtanen
d7be77628c Allow user to disable lock checking.
The new plana hardware isn't in the old sepia lock database,
and the machine pools are risky to merge as nothing in the
software guarantees allocation from just one pool. This allows
us to hand-allocate machines temporarily.
2012-01-31 08:05:36 -08:00
Sage Weil
f70b158cd1 show host -> roles mapping on startup
Less guessing when manually inspecting an in-progress or hung run.
2012-01-15 22:52:58 -08:00
Josh Durgin
d2fadf9fe2 syslog: ignore lockdep non-static key warning
It looks like this warning was made default in linux 3.2.
This will keep happening until #1922 is done.
2012-01-10 15:28:42 -08:00
Josh Durgin
d0e90d71bd syslog checking: forgot a pipe 2011-12-16 18:09:17 -08:00
Josh Durgin
c9e4504fbd Ignore lockdep being turned off for now.
Some machines are hitting this udev issue:
http://marc.info/?l=linux-kernel&m=132033587908426&w=2 and lockdep is
turned off after the first warning.
2011-12-12 16:29:41 -08:00
Josh Durgin
7b52dd1410 syslog: ignore 'task blocked' warnings
These will happen under heavy load (usually on the osd).
2011-12-08 17:17:47 -08:00
Josh Durgin
e69057e4a1 internal: check syslog for errors
This should catch lockdep warnings and mark tests with them as failed.
2011-12-07 15:20:33 -08:00
Josh Durgin
c6988a07f4 Save config after locking nodes, so targets are included. 2011-11-17 11:57:07 -08:00
Tommi Virtanen
c764b2475b Fix leftover orchestra import clause.
This seems to be a leftover from
a2372fce12,
no idea how it stayed hidden this long.
2011-11-07 13:05:14 -08:00
Josh Durgin
0b451f9475 Keep each ssh connection alive.
With long-running jobs like thrashing, ssh connections were timing
out.
2011-11-03 13:08:49 -07:00
Josh Durgin
3d3eb0efea Remove --keep-locked-on-error, and behave as if it were specified
This will help prevent machines with cephtest dirs still present from
being used. It's easy to unlock machines - the targets yaml fragment
is output during a run.
2011-10-07 14:49:53 -07:00
Josh Durgin
107db6a913 Retry listing machines if the lock server goes down. 2011-10-04 17:21:00 -07:00
Josh Durgin
1cad309d65 Add failure_reason to summary for the first failure detected.
For now, this is the exception raised during a task, the error found
in the central log, or coredumps found. More specific errors
(e.g. s3-tests had 3 failures) can be added later as exceptions raised
by tasks.
2011-10-03 17:07:41 -07:00
Tommi Virtanen
a2372fce12 Move orchestra to teuthology.orchestra so there's just one top-level package. 2011-09-13 14:53:02 -07:00
Tommi Virtanen
747deecaf6 Add assert to catch simple typos in roles list.
Input of "roles:\n- [mds,1]" used to make teuthology crash
in a non-obvious way.
2011-08-15 09:36:06 -07:00
Josh Durgin
3e6b17f1b8 Down machines shouldn't be considered free. 2011-08-05 10:59:16 -07:00
Josh Durgin
68e6f2b77e Make scheduled tasks leave some machines free. 2011-08-04 18:32:57 -07:00
Josh Durgin
4e399da700 Log connections to targets
This way you can tell which machines have problems in case of an
error.
2011-08-04 18:25:43 -07:00
Greg Farnum
6ac6f7ab38 teuthology: convert from bzip2 to gzip.
gzip is much, much faster on large log files. With a 7.7GB client log, gzip
took 2:45 to compress it to 624MB. bzip2 took 34:38 to compress it to
366MB. For our purposes the space savings are not worth the time loss.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-07-29 10:35:02 -07:00
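
The trade-off quoted in 6ac6f7ab38 can be checked with a few lines of arithmetic. The figures are the ones from the commit message; the helper name `to_seconds` and the variable names are made up for this sketch.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes:seconds duration to seconds."""
    return minutes * 60 + seconds


# Figures from the commit message (7.7GB client log).
gzip_time = to_seconds(2, 45)     # 2:45
bzip2_time = to_seconds(34, 38)   # 34:38
gzip_size_mb = 624
bzip2_size_mb = 366

# How much faster gzip was, and how much larger its output is.
speedup = bzip2_time / float(gzip_time)
size_ratio = gzip_size_mb / float(bzip2_size_mb)
extra_mb = gzip_size_mb - bzip2_size_mb

print('gzip is about %.1fx faster' % speedup)
print('gzip output is about %.1fx larger (%d MB more)' % (size_ratio, extra_mb))
```

On these numbers gzip finishes roughly 12.6 times faster at the cost of an archive about 1.7 times larger (258 MB more), which is the trade the commit accepts.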
Josh Durgin
271e066d6c Connect without using any known_hosts files. 2011-07-19 17:13:13 -07:00
Josh Durgin
8d196b001c Make targets a dictionary mapping hosts to ssh host keys. 2011-07-19 17:13:13 -07:00
Josh Durgin
5fadb1c11c Whitespace and style cleanup. 2011-07-11 18:07:37 -07:00
Josh Durgin
e69cf0b1b7 Success of test may not have been set yet. 2011-07-11 18:00:12 -07:00
Josh Durgin
28f19a4104 Add an option to keep machines locked if a test fails. 2011-07-11 16:23:05 -07:00
Sage Weil
2f35eddb27 clean up locked machine list 2011-07-11 15:28:15 -07:00
Sage Weil
91c6f351a1 tell user which machines you locked 2011-07-11 14:39:21 -07:00
Sage Weil
a8d4901fe6 make connect work if no roles are specified
This is useful for -nuke.
2011-07-11 14:23:31 -07:00
Josh Durgin
fd30ed76bf Add --block option to retry until machines are locked.
If there are not enough machines up, fail immediately.
2011-07-07 16:15:18 -07:00
Josh Durgin
a55d2eb53a Read lock server from ~/teuthology.yaml. 2011-07-07 12:35:11 -07:00
Josh Durgin
9bfca87980 Check that all machines are locked, and add an option to lock machines instead of providing targets. 2011-07-07 12:35:11 -07:00
Tommi Virtanen
e16556e377 Archive dir removal has to be unconditional.
Even when ctx.archive is False, ceph logging
needs the destination directory to exist, so
/tmp/cephtest/archive has to be created (and
thus removed) unconditionally.
2011-06-30 11:26:20 -07:00
Tommi Virtanen
e481db1337 Archive syslog messages while the test was in progress. 2011-06-20 14:31:41 -07:00
Tommi Virtanen
bc8cc868f9 Fix bug that thought all >1 node clusters always had core dumps.
Accidentally shared the stdout between all the runs.
2011-06-20 14:31:41 -07:00
Tommi Virtanen
57c542b9e8 Archive cores dumped during test, record test as failed if any seen. 2011-06-17 16:00:39 -07:00
Tommi Virtanen
78a3c23418 Move non-ceph logic out of the ceph task: base dir, archive transfer. 2011-06-16 14:36:22 -07:00