1626 Commits

Author SHA1 Message Date
Zack Cerza
7f135ec94a Enable reporting of single jobs
(also switch to docopt)

Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-12-12 17:00:43 -06:00
Zack Cerza
3d23b9b205 Remove the child's stderr completely
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-12-12 15:45:58 -06:00
Zack Cerza
625f479b68 When starting a job, tell paddles it's running
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-12-12 11:47:45 -06:00
Sandon Van Ness
a7f87f3a3a Longer timeout after sync/reboot.
With only a 5 second sleep via ssh and python it looks like a
race-condition was sometimes hitting where it would think
the machine is back up before the reboot command had completed.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-12-11 18:07:43 -08:00
Zack Cerza
b3acff1d4f Use continue, not break
Fixes a bug where not all pids were being collected

Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-12-10 16:48:12 -06:00
Zack Cerza
4a6e47cdce Tweak logic for pid lookup
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-12-10 16:48:07 -06:00
Zack Cerza
77145f1b7f Fix indentation
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-12-10 16:25:28 -06:00
Zack Cerza
57574fefc1 Don't show child's stderr, but show archive path
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-12-10 13:19:56 -06:00
Zack Cerza
339b7c474a Add debug statements
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-12-10 10:06:39 -06:00
Sage Weil
6c856a2e94 rados: allow existing pool(s) to be used
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-09 16:02:13 -08:00
Sage Weil
2266eeb301 ceph.conf: put 2x command in [global]
so that osdmaptool sees it.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-09 15:37:58 -08:00
Zack Cerza
48b8ba4ad2 Create a DateTime object from the timestamp
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-12-09 16:57:11 -06:00
Zack Cerza
5ea5018dbe Make -a optional
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-12-09 16:42:15 -06:00
Zack Cerza
3d6feb4b60 Merge pull request #151 from ceph/wip-distro-kernel
Wip distro kernel
2013-12-09 13:16:33 -08:00
Zack Cerza
d7289f75e8 Auto-restart
If /tmp/teuthology-restart-workers is newer than the running process,
restart.

Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-12-09 15:01:33 -06:00
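The auto-restart check described in d7289f75e8 can be sketched as a comparison between the sentinel file's modification time and the time the worker started. This is only an illustration, not the code from the commit: the function name `need_restart` and the idea of recording the start time with `time.time()` are assumptions; only the path comes from the commit message.

```python
import os
import time

# Path taken from the commit message.
RESTART_FILE = '/tmp/teuthology-restart-workers'


def need_restart(start_time, restart_file=RESTART_FILE):
    """Return True if restart_file was modified after start_time.

    start_time is a Unix timestamp recorded when the process started
    (hypothetical bookkeeping, not taken from the commit).
    """
    try:
        return os.path.getmtime(restart_file) > start_time
    except OSError:
        # No sentinel file (or it is unreadable): nothing requests a restart.
        return False


if __name__ == '__main__':
    started = time.time()
    # A worker loop would call this between jobs and re-exec itself on True.
    print(need_restart(started))
```

Under this sketch, touching the sentinel file would be enough to ask every running worker to restart the next time it performs the check.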
Zack Cerza
33a3600ff3 Merge pull request #158 from ceph/wip-nuke
make nuke behave
2013-12-09 13:01:03 -08:00
Sage Weil
1b80f4aa1c nuke: ignore exceptions while issuing reboot command
I'm seeing failed tasks (and nuke) leak machines.  It looks like we are
getting an exception on the '... reboot -f -n' command when we should be
ignoring it and waiting for the machine to restart.

For example:
   http://qa-proxy.ceph.com/teuthology/sage-2013-12-08_19:25:06-rados:thrash-wip-tier-foo-basic-plana/136321/teuthology.log

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-09 11:42:12 -08:00
Sandon Van Ness
478ecc304f Remove unused variable.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-12-09 11:42:06 -08:00
Sandon Van Ness
ce8ff0a3c8 Added additional comments.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-12-09 11:35:23 -08:00
Sage Weil
a276606312 ceph.conf: default to 2x
A bunch of our tests rely on this; they need to be fixed
before we can run at 3x.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-07 13:20:58 -08:00
Sage Weil
c0a4327513 nuke: fix sync before reboot timeout
If you do 'timeout 5 sync' and sync hangs, timeout will block trying to
kill it.

Instead, just background sync, wait a few seconds, and reboot.  This means
we wait a few seconds even if sync returns immediately, but who cares!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 17:42:23 -08:00
Zack Cerza
856f83449c Implement a watchdog for queued jobs
This continually posts the run's status to the results server, if
configured, at an interval defaulting to 600 seconds.

Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-12-05 17:48:10 -06:00
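A rough sketch of the watchdog idea in 856f83449c: keep posting the run's status at a fixed interval for as long as the run is active. The names `watch`, `is_running` and `post_status` are hypothetical stand-ins, and the injectable `sleep` exists only to make the sketch easy to exercise; the 600-second default is the one figure taken from the commit message.

```python
import time


def watch(is_running, post_status, interval=600, sleep=time.sleep):
    """Call post_status() every `interval` seconds while is_running() is true.

    is_running and post_status are caller-supplied callables (assumed here).
    Returns the number of status posts made.
    """
    posts = 0
    while is_running():
        post_status()
        posts += 1
        sleep(interval)
    return posts
```

Passing the status check and the posting function in as callables keeps the loop independent of any particular results server.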
Warren Usui
421192617f A create_if_vm call was made more than once when a lock-many style lock
was performed.  This caused downburst to run twice, and the second
downburst fails as a result of the first downburst running.

Fixes: 6933
2013-12-04 17:49:21 -08:00
Warren Usui
207c910e85 Merge branch 'teuthology-fix-downburst-yaml-wusui'
2013-12-04 17:36:14 -08:00
Warren Usui
94f7dd1f3a Implement --downburst-conf parameter for teuthology-lock.
Load the appropriate yaml information when found (this formerly
did not work).  Make sure teuthology --lock works with a downburst
entry in the yaml files.  Document how this works in README.rst.

Fixes: #6921
Reviewed-by: Dan Mick
2013-12-04 17:31:55 -08:00
Josh Durgin
5cc60996cf rbd: make default size larger for xfstests
Test 167 runs out of space on newer kernels

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-03 17:31:45 -08:00
Warren Usui
49a48ae8cf Merge branch 'wip-fix-teuth-tgt-wusui'
2013-11-25 20:56:24 -08:00
Warren Usui
4c7dd504ca tgt and iscsi code need some minor fixes. Moved the settle call during
simple read testing.  In iscsi.py, generic_mkfs and generic_mount need
to be called from the main body of the task.  An extraneous iscsiadm
command was removed.  The tgt size is now not hard-coded.  It is extracted
from the property and defaults to 10240.

Fixes: #6782
2013-11-25 20:44:52 -08:00
Zack Cerza
e75b2d58a2 Merge pull request #154 from ceph/wip-multi-mtype
Wip multi mtype
2013-11-25 15:31:34 -08:00
Sandon Van Ness
c0297b436a Changes suggested per review.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-11-25 01:19:13 -08:00
Zack Cerza
deec86c703 Also catch httplib2.ServerNotFoundError
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-11-22 17:03:29 -06:00
Dan Mick
f6b5acc043 internal.py: nitty little spelling error
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-11-21 22:04:19 -08:00
Sandon Van Ness
f7af3e723e Schedule-suite Use 'multi' tube for multiple types. Scheduling.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-11-21 15:21:19 -08:00
Sandon Van Ness
c38eeec85f Allow ability to use multi machine type delimited by ,- \t.
I was originally attempting a more complicated locking mechanism
but I think it's almost as good to just have it attempt the other
machine type if one.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-11-21 14:19:44 -08:00
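The behaviour described in c38eeec85f could look roughly like the following: split the machine-type string on any of the listed delimiters (comma, hyphen, space, tab), then try each type in turn. Both function names and the `try_lock` callable are hypothetical, and the machine-type names in the usage note are placeholders; this is not the code from the commit.

```python
import re


def split_machine_types(value):
    """Split a machine-type string on commas, hyphens, spaces and tabs."""
    return [t for t in re.split(r'[,\- \t]+', value) if t]


def lock_first_available(machine_types, try_lock):
    """Return the first machine type for which try_lock(mtype) succeeds.

    try_lock is a caller-supplied callable (an assumption for this sketch).
    Returns None if no type could be locked.
    """
    for mtype in machine_types:
        if try_lock(mtype):
            return mtype
    return None
```

For example, `split_machine_types('typea,typeb typec')` yields three separate names, and `lock_first_available` falls through to the next name whenever a lock attempt fails.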
Zack Cerza
d04f3a6ae0 Skip cluster() if use_existing_cluster is True
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-11-21 13:56:41 -06:00
SandonV
bf9434dbe7 Merge pull request #153 from ceph/wip-6790
Reviewed by Warren.
2013-11-20 18:03:04 -08:00
Sandon Van Ness
c5a26b38de Use shortened version in order to avoid revision/arch mishaps.
Sometimes -X is added to package names which does not exist in the
/version file. Simply using the version string does not work on
RHEL (it does on centos). Until version and the packages match
identically we instead will just split the version at the - and
no longer specify the dist for better reliability but slightly
lower accuracy.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-11-20 16:37:31 -08:00
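As a minimal illustration of the version shortening described in c5a26b38de, the snippet below keeps only the part of a version string before the first hyphen. Which half the commit actually keeps is an assumption here, and both the function name and the sample version string are made up for the example.

```python
def short_version(version):
    """Return the part of a version string before the first '-'."""
    return version.split('-', 1)[0]


if __name__ == '__main__':
    # Hypothetical version string with a revision/dist suffix.
    print(short_version('1.2.3-4.el6'))
```

A string without a hyphen is returned unchanged, so the helper is safe to apply whether or not a suffix is present.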
Zack Cerza
f8150d44d0 Add optional 'use_existing_cluster' flag
If this flag is present, skip a few unnecessary steps

Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-11-20 16:23:07 -06:00
Sandon Van Ness
39830c613e Fix ceph.repo so it uses URI value.
Basically some weird cases where ceph-releases would be pointing
to the wrong branch/build when two branches had the same sha1.
This fixes that.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-11-14 21:47:41 -08:00
Samuel Just
04322d9fbb ceph_manager: provide unique pool names to avoid collision
Fixes: #6769
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-11-14 15:13:37 -08:00
Josh Durgin
07db94ef26 syslog: ignore perf nmi handler timeout
This seems to have started appearing in recent 3.12+ kernels
with perf enabled.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-11-13 15:27:30 -08:00
Zack Cerza
88792d62e1 Make report_job() always return an int
2013-11-12 17:07:15 -06:00
Sandon Van Ness
96cfb11b91 Add some debug logging.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-11-12 13:04:00 -08:00
Sandon Van Ness
f0e01ad0e5 Distro kernel bug-fixes.
Fixed some things that were being done incorrectly.

Some distro kernels have no debug so added | true when disabling
kdb. Also changed what was skipping kernels if non-ubuntu to also
schedule kernel install if a distro kernel.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-11-08 14:35:51 -08:00
Zack Cerza
8d9b86f5d7 Merge pull request #146 from ceph/wip-os-type
Wip os type
2013-11-08 12:24:42 -08:00
Sandon Van Ness
03f31c6caf Consolidate two excepts into one.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-11-08 11:02:48 -08:00
Zack Cerza
b3e730e346 Also catch socket.error in try_push_job_info
2013-11-07 18:39:16 -06:00
Zack Cerza
d8f98201ac Don't re-call logging.basicConfig()
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-11-06 16:04:39 -06:00
Zack Cerza
3fd3bd966d Fix hilariously long sentry_event para
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-11-05 15:09:36 -06:00
Zack Cerza
ed81960242 Don't use create_run() unless necessary
Runs are created automatically now.

Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-11-04 14:56:13 -06:00