Commit Graph

58 Commits

Author SHA1 Message Date
Tommi Virtanen
99ac6b0b3e Disable asynchronous DNS lookups.
Especially on older hosts, we keep triggering errors::

  ServerNotFoundError: Unable to find the server at
  teuthology.front.sepia.ceph.com: [Errno 3] name does not exist

That comes from libevent's evdns via gevent.dns and httplib2. The
errors are rare enough that they look like timeouts, or something more
arbitrary. Busy-looping on DNS resolution calls has never triggered
them so far.

With ``monkey.patch_all(dns=False)``, the teuthology process will
block as a whole whenever doing DNS resolution. This will hopefully be
rare enough that it won't matter.

The only real "fix" seems to be upgrading libraries and hoping for the
best; this commit can be reverted after that is done.
2012-08-13 16:18:33 -07:00
Sage Weil
042edcbe1e schedule/suite: schedule job, suite N times 2012-07-14 13:51:51 -07:00
Sage Weil
e5fb49914c run: make -a short for --archive 2012-07-05 13:43:19 -07:00
Sage Weil
c8e1ec6a91 record owner at start of run
So that we can clean up easily even when we don't finish and there is no
summary.yaml.
2012-06-20 11:35:43 -07:00
Josh Durgin
25114bf9a4 nuke: refactor to run in parallel and add unlock option
nuke-on-error already did this, but now teuthology-nuke does it
too. Also outputs targets that couldn't be nuked at the end.
2012-04-24 17:52:01 -07:00
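
The behaviour described in 25114bf9a4 (nuke targets in parallel, then report the ones that could not be nuked) can be sketched roughly as follows. This is an illustration only, not the teuthology code: `nuke_all` and the `nuke_one` callback are hypothetical names, and a standard-library thread pool stands in for whatever concurrency mechanism the project actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def nuke_all(targets, nuke_one):
    """Run nuke_one(target) for every target in parallel.

    Returns the list of targets whose cleanup raised an exception,
    in the same order as the input. A failure on one target does not
    stop the others from being processed.
    """
    def attempt(target):
        try:
            nuke_one(target)   # hypothetical per-machine cleanup callback
            return None        # success: nothing to report
        except Exception:
            return target      # remember the target that failed

    # One worker per target; max(1, ...) keeps the pool valid for an empty list.
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        results = list(pool.map(attempt, targets))

    return [t for t in results if t is not None]
```

A caller would print the returned list at the end of the run so the operator can see which machines still need attention.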
Mark Nelson
1836d4672f Added assertion to check that targets > roles
Signed-off-by: Mark Nelson <mark.nelson@dreamhost.com>
2012-04-03 15:56:51 -07:00
Josh Durgin
1493674735 Use non-zero exit status if any tests failed
Fixes: #1989
2012-03-05 13:34:33 -08:00
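
The idea behind 1493674735 can be shown with a minimal sketch. The `exit_code` helper and the shape of its `results` argument (a mapping from test name to a pass/fail boolean) are assumptions made for illustration; the commit itself does not say how results are stored.

```python
def exit_code(results):
    """Return 0 when every test passed, 1 if any test failed.

    `results` is assumed to map a test name to True (passed)
    or False (failed).
    """
    return 0 if all(results.values()) else 1


# Typical use at the end of a command-line entry point:
#     import sys
#     sys.exit(exit_code(results))
```

Returning a non-zero status lets shell scripts and schedulers detect a failed run without parsing its output.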
Josh Durgin
2a1c74c5f5 Move duration calculation to an internal task
This excludes all generic start up costs, like waiting for locks,
rebooting into a new kernel, etc.
2012-02-21 15:12:26 -08:00
Sage Weil
8fb115fe2c include run duration in summary.yaml 2012-01-16 12:39:20 -08:00
Sage Weil
b354ce4e91 run: put pid in archive dir
This will make it easy for teuthology-ls to show you the running process's
pid (if it's still running).  Or for other utilities to kill + clean up
a hung teuthology run.
2012-01-08 14:39:30 -08:00
Josh Durgin
561f06cf94 suite: make email-on-success the default behavior
This way you can tell when a run is complete, instead of wondering if
it's stuck in the queue.
2012-01-05 17:27:31 -08:00
Josh Durgin
cdd5c456a0 nuke-on-error: only unlock if this run locked the machines 2012-01-03 13:02:31 -08:00
Josh Durgin
508f4f8359 Save summary after nuking machines.
This way you can tell when tests are entirely finished running.
2011-11-18 13:53:51 -08:00
Josh Durgin
a763297685 misc: move deep_merge out of the MergeConfig class - it's generic 2011-11-17 13:06:36 -08:00
Josh Durgin
c6988a07f4 Save config after locking nodes, so targets are included. 2011-11-17 11:57:07 -08:00
Josh Durgin
5d32bcae50 Add nuke-on-error option.
This lets automated jobs nuke and unlock machines after failed
tests. Each machine is nuked individually, so one down machine won't
keep others from being nuked and unlocked.
2011-11-08 16:09:21 -08:00
Josh Durgin
3d3eb0efea Remove --keep-locked-on-error, and behave as if it were specified
This will help prevent machines with cephtest dirs still present from
being used. It's easy to unlock machines - the targets yaml fragment
is output during a run.
2011-10-07 14:49:53 -07:00
Josh Durgin
c3c262656d schedule: put results timeout in the job
The default was always being used instead.
2011-09-21 11:05:33 -07:00
Tommi Virtanen
a2372fce12 Move orchestra to teuthology.orchestra so there's just one top-level package. 2011-09-13 14:53:02 -07:00
Sage Weil
d4a876f3e3 teuthology: do a deep merge of input yaml fragments
Concatenate lists, and recursively combine dicts.

If you specify inputs like

 foo:
 - a
 - b

and

 foo:
 - c

you should get

 foo:
 - a
 - b
 - c

Dicts should also be merged (last one wins), and the merging is deep. E.g.

 foo:
   a:
     b:
       c: 1

and

 foo:
   a:
     b:
       c: 2

is

 foo:
   a:
     b:
       c: 2

Fixes: #1497
2011-09-03 15:07:21 -07:00
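
The merge rules described in d4a876f3e3 (concatenate lists, recursively combine dicts, last value wins otherwise) can be sketched as below. This is a minimal illustration under those stated rules, not the function from the teuthology source; the name `deep_merge` matches the one mentioned in a later commit, but the signature and the handling of mismatched types are assumptions.

```python
def deep_merge(a, b):
    """Return the result of merging b on top of a.

    - two lists are concatenated
    - two dicts are merged key by key, recursing into shared keys
    - anything else: the later value (b) wins (an assumption for
      mismatched types, which the commit message does not cover)
    The inputs are not modified.
    """
    if isinstance(a, list) and isinstance(b, list):
        return a + b
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)  # shallow copy so that a is left untouched
        for key, value in b.items():
            if key in merged:
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = value
        return merged
    return b


# The two examples from the commit message:
lists = deep_merge({"foo": ["a", "b"]}, {"foo": ["c"]})
dicts = deep_merge({"foo": {"a": {"b": {"c": 1}}}},
                   {"foo": {"a": {"b": {"c": 2}}}})
```

Here `lists` holds `foo` with the three items a, b, c, and `dicts` holds the nested `c` set to 2, matching the expected results in the commit message.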
Josh Durgin
d340ebac4e schedule: add a way to delete jobs from the queue 2011-08-31 17:43:14 -07:00
Josh Durgin
7be9eaa030 suite: add option to send an email if the entire suite passed 2011-08-29 12:42:45 -07:00
Josh Durgin
4f4227a44d Generate coverage at the end of a suite run,
and optionally email failures and ongoing jobs.
2011-08-29 10:23:12 -07:00
Greg Farnum
af0d7c5e44 teuthology-nuke: move it into its own file.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-08-10 15:38:57 -07:00
Greg Farnum
453a0f99d4 teuthology-nuke: identify and reboot machines with kernel mounts
This includes untested code for just force-unmounting them
when that works again, but for now it does a full reboot-and-
reconnect cycle.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-08-10 14:37:46 -07:00
Greg Farnum
9566008468 teuthology-nuke: use a more robust cfuse mount finder
This way it can remove cfuse mounts in any location on
the system.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-08-10 14:37:41 -07:00
Greg Farnum
257d63137f teuthology-nuke: split out different pieces into different loops
This will let us behave more intelligently on things like
nuking kernel mounts.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-08-10 14:37:36 -07:00
Josh Durgin
5897d7b95d teuthology-nuke: run in parallel, and print each node being nuked 2011-08-03 14:52:55 -07:00
Josh Durgin
30a8dac323 Set success at the beginning of a run.
This way internal tasks like locking can tell whether the run
succeeded, and unlock nodes if it did.
2011-08-03 14:03:13 -07:00
Josh Durgin
e8676ce0eb teuthology-nuke: reset rsyslog config 2011-08-03 11:21:32 -07:00
Josh Durgin
02d0efad97 schedule: make default owner different from that of a normal run
This way the machines locked by scheduled jobs aren't confused
with those locked by manual runs, so they're harder to accidentally
unlock.
2011-07-19 17:25:57 -07:00
Josh Durgin
176b304c3d fusermount runs on a single mount point. 2011-07-13 14:02:46 -07:00
Josh Durgin
5fadb1c11c Whitespace and style cleanup. 2011-07-11 18:07:37 -07:00
Josh Durgin
28f19a4104 Add an option to keep machines locked if a test fails. 2011-07-11 16:23:05 -07:00
Sage Weil
6cf9633a6a nuke: use default owner 2011-07-11 14:39:04 -07:00
Josh Durgin
85c24bda7f Add teuthology-schedule and teuthology-worker.
schedule puts jobs in a beanstalk queue, worker takes them out and runs them.
2011-07-11 13:49:06 -07:00
Josh Durgin
fd30ed76bf Add --block option to retry until machines are locked.
If there are not enough machines up, fail immediately.
2011-07-07 16:15:18 -07:00
Josh Durgin
a55d2eb53a Read lock server from ~/teuthology.yaml. 2011-07-07 12:35:11 -07:00
Josh Durgin
9158c83167 Verify that machines are locked before nuking them. 2011-07-07 12:35:11 -07:00
Josh Durgin
9bfca87980 Check that all machines are locked, and add an option to lock machines instead of providing targets. 2011-07-07 12:35:11 -07:00
Josh Durgin
09bee43593 Move username to a utility method. 2011-07-07 12:32:58 -07:00
Sage Weil
f164dd7933 nuke: sudo for the final rm -rf 2011-07-05 16:47:00 -07:00
Sage Weil
2b168b033d nuke: do not escape fusermount .../mnt.* 2011-07-05 09:01:01 -07:00
Josh Durgin
effee7ffc6 Make kernel a separate entity outside of tasks.
It is run before anything other than checking for conflicts.
This way it can't step on the connections used by other tasks,
or clobber test files in /tmp when rebooting.
2011-06-30 16:05:53 -07:00
Sage Weil
b95e61ae29 teuthology-nuke
Take in a full config (or just the targets: portion) and do a destructive
cleanup.

Still need to clean up kernel mounts.
2011-06-29 12:23:44 -07:00
Sage Weil
2125e8dc1e include @hostname in owner 2011-06-29 12:09:38 -07:00
Sage Weil
052f43c958 pass owner, optional description through to summary.yaml
Owner can be overridden explicitly, otherwise it's the running unix user.

The description is optional and passed straight through.
2011-06-29 12:09:38 -07:00
Tommi Virtanen
e481db1337 Archive syslog messages while the test was in progress. 2011-06-20 14:31:41 -07:00
Tommi Virtanen
57c542b9e8 Archive cores dumped during test, record test as failed if any seen. 2011-06-17 16:00:39 -07:00
Tommi Virtanen
78a3c23418 Move non-ceph logic out of the ceph task: base dir, archive transfer. 2011-06-16 14:36:22 -07:00