37 Commits

Author SHA1 Message Date
Josh Durgin
2a1c74c5f5 Move duration calculation to an internal task
This excludes all generic start up costs, like waiting for locks,
rebooting into a new kernel, etc.
2012-02-21 15:12:26 -08:00
Tommi Virtanen
d7be77628c Allow user to disable lock checking.
The new plana hardware isn't in the old sepia lock database,
and the machine pools are risky to merge as nothing in the
software guarantees allocation from just one pool. This allows
us to hand-allocate machines temporarily.
2012-01-31 08:05:36 -08:00
Sage Weil
f70b158cd1 show host -> roles mapping on startup
Less guessing when manually inspecting an in-progress or hung run.
2012-01-15 22:52:58 -08:00
Josh Durgin
d2fadf9fe2 syslog: ignore lockdep non-static key warning
It looks like this warning was made default in linux 3.2.
This will keep happening until #1922 is done.
2012-01-10 15:28:42 -08:00
Josh Durgin
d0e90d71bd syslog checking: forgot a pipe 2011-12-16 18:09:17 -08:00
Josh Durgin
c9e4504fbd Ignore lockdep being turned off for now.
Some machines are hitting this udev issue:
http://marc.info/?l=linux-kernel&m=132033587908426&w=2 and lockdep is
turned off after the first warning.
2011-12-12 16:29:41 -08:00
Josh Durgin
7b52dd1410 syslog: ignore 'task blocked' warnings
These will happen under heavy load (usually on the osd).
2011-12-08 17:17:47 -08:00
Josh Durgin
e69057e4a1 internal: check syslog for errors
This should catch lockdep warnings and mark tests with them as failed.
2011-12-07 15:20:33 -08:00
Josh Durgin
c6988a07f4 Save config after locking nodes, so targets are included. 2011-11-17 11:57:07 -08:00
Tommi Virtanen
c764b2475b Fix leftover orchestra import clause.
This seems to be a leftover from
a2372fce12,
no idea how it stayed hidden this long.
2011-11-07 13:05:14 -08:00
Josh Durgin
0b451f9475 Keep each ssh connection alive.
With long-running jobs like thrashing, ssh connections were timing
out.
2011-11-03 13:08:49 -07:00
Josh Durgin
3d3eb0efea Remove --keep-locked-on-error, and behave as if it were specified
This will help prevent machines with cephtest dirs still present from
being used. It's easy to unlock machines - the targets yaml fragment
is output during a run.
2011-10-07 14:49:53 -07:00
Josh Durgin
107db6a913 Retry listing machines if the lock server goes down. 2011-10-04 17:21:00 -07:00
Josh Durgin
1cad309d65 Add failure_reason to summary for the first failure detected.
For now, this is the exception raised during a task, the error found
in the central log, or coredumps found. More specific errors
(e.g. s3-tests had 3 failures) can be added later as exceptions raised
by tasks.
2011-10-03 17:07:41 -07:00
Tommi Virtanen
a2372fce12 Move orchestra to teuthology.orchestra so there's just one top-level package. 2011-09-13 14:53:02 -07:00
Tommi Virtanen
747deecaf6 Add assert to catch simple typos in roles list.
Input of "roles:\n- [mds,1]" used to make teuthology crash
in a non-obvious way.
2011-08-15 09:36:06 -07:00
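
For illustration only, the kind of check described in 747deecaf6 might look like the sketch below. It is not the code from that commit: the function name `check_roles` and the error messages are hypothetical. The sketch assumes the roles list has already been parsed from YAML, where `[mds,1]` becomes a list holding the string `mds` and the integer `1` rather than a single role name.

```python
def check_roles(roles):
    """Fail early with a readable message if the roles list looks mistyped.

    Hypothetical helper: `roles` is assumed to be the already-parsed value
    of the "roles" key, i.e. a list of lists of role names.
    """
    assert isinstance(roles, list), \
        'roles must be a list, got %r' % (roles,)
    for entry in roles:
        assert isinstance(entry, list), \
            'each roles entry must be a list, got %r' % (entry,)
        for role in entry:
            # A YAML entry like [mds,1] yields the int 1 here, not a string.
            assert isinstance(role, str), \
                'each role must be a string, got %r' % (role,)


# Well-formed input passes silently.
check_roles([['mds.1', 'osd.0'], ['client.0']])
```

With the mistyped input from the commit message, `check_roles([['mds', 1]])` raises an `AssertionError` naming the offending value instead of failing later in an unrelated place.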
Josh Durgin
3e6b17f1b8 Down machines shouldn't be considered free. 2011-08-05 10:59:16 -07:00
Josh Durgin
68e6f2b77e Make scheduled tasks leave some machines free. 2011-08-04 18:32:57 -07:00
Josh Durgin
4e399da700 Log connections to targets
This way you can tell which machines have problems in case of an
error.
2011-08-04 18:25:43 -07:00
Greg Farnum
6ac6f7ab38 teuthology: convert from bzip2 to gzip.
gzip is much, much faster on large log files. With a 7.7GB client log, gzip
took 2:45 to compress it to 624MB. bzip2 took 34:38 to compress it to
366MB. For our purposes the space savings are not worth the time loss.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-07-29 10:35:02 -07:00
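
The trade-off quoted in 6ac6f7ab38 can be worked through with a few lines of arithmetic. This sketch only uses the figures from the commit message and assumes the times are written as minutes:seconds; the helper name `to_seconds` is hypothetical.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string to a number of seconds."""
    minutes, seconds = mmss.split(':')
    return int(minutes) * 60 + int(seconds)


# Figures from the commit message, for a 7.7GB client log.
gzip_seconds = to_seconds('2:45')     # 165 seconds
bzip2_seconds = to_seconds('34:38')   # 2078 seconds
gzip_mb, bzip2_mb = 624, 366

speedup = bzip2_seconds / gzip_seconds   # how many times faster gzip was
size_ratio = gzip_mb / bzip2_mb          # how much larger gzip's output was
extra_mb = gzip_mb - bzip2_mb            # extra space used by gzip

print('gzip is about %.1fx faster' % speedup)
print('gzip output is about %.2fx larger (%d MB more)' % (size_ratio, extra_mb))
```

On these numbers gzip is roughly 12.6 times faster while producing output about 1.7 times larger, which is the basis for the commit's conclusion.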
Josh Durgin
271e066d6c Connect without using any known_hosts files. 2011-07-19 17:13:13 -07:00
Josh Durgin
8d196b001c Make targets a dictionary mapping hosts to ssh host keys. 2011-07-19 17:13:13 -07:00
Josh Durgin
5fadb1c11c Whitespace and style cleanup. 2011-07-11 18:07:37 -07:00
Josh Durgin
e69cf0b1b7 Success of test may not have been set yet. 2011-07-11 18:00:12 -07:00
Josh Durgin
28f19a4104 Add an option to keep machines locked if a test fails. 2011-07-11 16:23:05 -07:00
Sage Weil
2f35eddb27 clean up locked machine list 2011-07-11 15:28:15 -07:00
Sage Weil
91c6f351a1 tell user which machines you locked 2011-07-11 14:39:21 -07:00
Sage Weil
a8d4901fe6 make connect work if no roles are specified
This is useful for -nuke.
2011-07-11 14:23:31 -07:00
Josh Durgin
fd30ed76bf Add --block option to retry until machines are locked.
If there are not enough machines up, fail immediately.
2011-07-07 16:15:18 -07:00
Josh Durgin
a55d2eb53a Read lock server from ~/teuthology.yaml. 2011-07-07 12:35:11 -07:00
Josh Durgin
9bfca87980 Check that all machines are locked, and add an option to lock machines instead of providing targets. 2011-07-07 12:35:11 -07:00
Tommi Virtanen
e16556e377 Archive dir removal has to be unconditional.
Even when ctx.archive is False, ceph logging
needs the destination directory to exist, so
/tmp/cephtest/archive has to be created (and
thus removed) unconditionally.
2011-06-30 11:26:20 -07:00
Tommi Virtanen
e481db1337 Archive syslog messages while the test was in progress. 2011-06-20 14:31:41 -07:00
Tommi Virtanen
bc8cc868f9 Fix bug that thought all >1 node clusters always had core dumps.
Accidentally shared the stdout between all the runs.
2011-06-20 14:31:41 -07:00
Tommi Virtanen
57c542b9e8 Archive cores dumped during test, record test as failed if any seen. 2011-06-17 16:00:39 -07:00
Tommi Virtanen
78a3c23418 Move non-ceph logic out of the ceph task: base dir, archive transfer. 2011-06-16 14:36:22 -07:00
Tommi Virtanen
301ab56748 Move non-ceph logic out of the ceph task: host in use check.
To avoid every config always listing basic tasks, we silently
add internal.* tasks in front of the task list.
2011-06-16 14:36:21 -07:00