Josh Durgin
2a1c74c5f5
Move duration calculation to an internal task
...
This excludes all generic start up costs, like waiting for locks,
rebooting into a new kernel, etc.
2012-02-21 15:12:26 -08:00
Tommi Virtanen
d7be77628c
Allow user to disable lock checking.
...
The new plana hardware isn't in the old sepia lock database,
and the machine pools are risky to merge as nothing in the
software guarantees allocation from just one pool. This allows
us to hand-allocate machines temporarily.
2012-01-31 08:05:36 -08:00
Sage Weil
f70b158cd1
show host -> roles mapping on startup
...
Less guessing when manually inspecting an in-progress or hung run.
2012-01-15 22:52:58 -08:00
Josh Durgin
d2fadf9fe2
syslog: ignore lockdep non-static key warning
...
It looks like this warning was made default in linux 3.2.
This will keep happening until #1922 is done.
2012-01-10 15:28:42 -08:00
Josh Durgin
d0e90d71bd
syslog checking: forgot a pipe
2011-12-16 18:09:17 -08:00
Josh Durgin
c9e4504fbd
Ignore lockdep being turned off for now.
...
Some machines are hitting this udev issue:
http://marc.info/?l=linux-kernel&m=132033587908426&w=2 and lockdep is
turned off after the first warning.
2011-12-12 16:29:41 -08:00
Josh Durgin
7b52dd1410
syslog: ignore 'task blocked' warnings
...
These will happen under heavy load (usually on the osd).
2011-12-08 17:17:47 -08:00
Josh Durgin
e69057e4a1
internal: check syslog for errors
...
This should catch lockdep warnings and mark tests with them as failed.
2011-12-07 15:20:33 -08:00
Josh Durgin
c6988a07f4
Save config after locking nodes, so targets are included.
2011-11-17 11:57:07 -08:00
Tommi Virtanen
c764b2475b
Fix leftover orchestra import clause.
...
This seems to be a leftover from a2372fce12, no idea how it stayed
hidden this long.
2011-11-07 13:05:14 -08:00
Josh Durgin
0b451f9475
Keep each ssh connection alive.
...
With long-running jobs like thrashing, ssh connections were timing
out.
2011-11-03 13:08:49 -07:00
Josh Durgin
3d3eb0efea
Remove --keep-locked-on-error, and behave as if it were specified
...
This will help prevent machines with cephtest dirs still present from
being used. It's easy to unlock machines - the targets yaml fragment
is output during a run.
2011-10-07 14:49:53 -07:00
Josh Durgin
107db6a913
Retry listing machines if the lock server goes down.
2011-10-04 17:21:00 -07:00
Josh Durgin
1cad309d65
Add failure_reason to summary for the first failure detected.
...
For now, this is the exception raised during a task, the error found
in the central log, or coredumps found. More specific errors
(e.g. s3-tests had 3 failures) can be added later as exceptions raised
by tasks.
2011-10-03 17:07:41 -07:00
Tommi Virtanen
a2372fce12
Move orchestra to teuthology.orchestra so there's just one top-level package.
2011-09-13 14:53:02 -07:00
Tommi Virtanen
747deecaf6
Add assert to catch simple typos in roles list.
...
Input of "roles:\n- [mds,1]" used to make teuthology crash
in a non-obvious way.
2011-08-15 09:36:06 -07:00
Josh Durgin
3e6b17f1b8
Down machines shouldn't be considered free.
2011-08-05 10:59:16 -07:00
Josh Durgin
68e6f2b77e
Make scheduled tasks leave some machines free.
2011-08-04 18:32:57 -07:00
Josh Durgin
4e399da700
Log connections to targets
...
This way you can tell which machines have problems in case of an
error.
2011-08-04 18:25:43 -07:00
Greg Farnum
6ac6f7ab38
teuthology: convert from bzip2 to gzip.
...
gzip is much, much faster on large log files. With a 7.7GB client log, gzip
took 2:45 to compress it to 624MB. bzip2 took 34:38 to compress it to
366MB. For our purposes the space savings are not worth the time loss.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-07-29 10:35:02 -07:00
Josh Durgin
271e066d6c
Connect without using any known_hosts files.
2011-07-19 17:13:13 -07:00
Josh Durgin
8d196b001c
Make targets a dictionary mapping hosts to ssh host keys.
2011-07-19 17:13:13 -07:00
Josh Durgin
5fadb1c11c
Whitespace and style cleanup.
2011-07-11 18:07:37 -07:00
Josh Durgin
e69cf0b1b7
Success of test may not have been set yet.
2011-07-11 18:00:12 -07:00
Josh Durgin
28f19a4104
Add an option to keep machines locked if a test fails.
2011-07-11 16:23:05 -07:00
Sage Weil
2f35eddb27
clean up locked machine list
2011-07-11 15:28:15 -07:00
Sage Weil
91c6f351a1
tell user which machines you locked
2011-07-11 14:39:21 -07:00
Sage Weil
a8d4901fe6
make connect work if no roles are specified
...
This is useful for -nuke.
2011-07-11 14:23:31 -07:00
Josh Durgin
fd30ed76bf
Add --block option to retry until machines are locked.
...
If there are not enough machines up, fail immediately.
2011-07-07 16:15:18 -07:00
Josh Durgin
a55d2eb53a
Read lock server from ~/teuthology.yaml.
2011-07-07 12:35:11 -07:00
Josh Durgin
9bfca87980
Check that all machines are locked, and add an option to lock machines instead of providing targets.
2011-07-07 12:35:11 -07:00
Tommi Virtanen
e16556e377
Archive dir removal has to be unconditional.
...
Even when ctx.archive is False, ceph logging
needs the destination directory to exist, so
/tmp/cephtest/archive has to be created (and
thus removed) unconditionally.
2011-06-30 11:26:20 -07:00
Tommi Virtanen
e481db1337
Archive syslog messages while the test was in progress.
2011-06-20 14:31:41 -07:00
Tommi Virtanen
bc8cc868f9
Fix bug that thought all >1 node clusters always had core dumps.
...
Accidentally shared the stdout between all the runs.
2011-06-20 14:31:41 -07:00
Tommi Virtanen
57c542b9e8
Archive cores dumped during test, record test as failed if any seen.
2011-06-17 16:00:39 -07:00
Tommi Virtanen
78a3c23418
Move non-ceph logic out of the ceph task: base dir, archive transfer.
2011-06-16 14:36:22 -07:00
Tommi Virtanen
301ab56748
Move non-ceph logic out of the ceph task: host in use check.
...
To avoid every config always listing basic tasks, we silently
add internal.* tasks in front of the task list.
2011-06-16 14:36:21 -07:00