Removing the scratchdir in the remote run command
at the end of the script invocation will do the remove
once the first script finishes. With possibly a shared
scratch dir across workunit clients, we want to wait to
remove the scratch dir once all the workunit scripts have
completed.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Thrasher can now with configurable frequency test min_size by
taking down all but one osd, waiting, killing that osd and bringing
back the others, and verifying that the cluster goes clean.
Signed-off-by: Samuel Just <sam.just@inktank.com>
This adds the ability to use the new repeat count argument to the
run_xfstests.sh script. By default, the test suite will be run
once, but if a count is specified the script will execute the suite
that many times, but will only perform the setup (building the
tests, etc.) once.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Ability to specify options
By default only export to current hosts
Fixes: 3245
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
teuthology-[schedule|suite] get a parameter to specify the branch,
to put the job in a branch-specific queue. Workers running that
branch of teuthology can pull jobs from that queue.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Exception objects don't contain the traceback of where they were
raised from (to avoid cyclic data structures wrecking gc and causing
mem leaks), so the singular "raise obj" form creates a new traceback
from the current execution location, thus losing the original location
of the error.
Gevent explicitly wants to throw away the traceback, to release any
objects the greenlet may still be referring to, closing files,
releasing locks etc. In this case, we think it's safe, so stash the
exception info away in a holder object, and resurrect it on the other
side of the results queue.
http://stackoverflow.com/questions/9268916/how-to-capture-a-traceback-in-gevent
This can be reproduced easily with
from teuthology.parallel import parallel
def f():
raise RuntimeError("bork")
with parallel() as p:
p.spawn(f)
and looking at the resulting traceback with and without this change.
This adds the ability to specify the rbd image format to use for the
scratch and test devices for the rbd.xfstests task.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
This adds the ability to specify an rbd image format (either 1 or 2)
for an rbd image.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
This task allows ceph to signal to teuth that it should die immediately
by touching a file under /tmp/cephtest
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Especially on older hosts, we keep triggering errors::
ServerNotFoundError: Unable to find the server at
teuthology.front.sepia.ceph.com: [Errno 3] name does not exist
That comes from libevent's evdns via gevent.dns and httplib2. The rate
of these errors is low enough that they seem to be perhaps timeouts,
or more arbitrary. Busy looping on DNS resolution calls has never
triggered them, so far.
With ``monkey.patch_all(dns=False)``, the teuthology process will
block as a whole whenever doing DNS resolution. This will hopefully be
rare enough that it won't matter.
The only real "fix" seems to be upgrading libraries and hoping for the
best; this commit can be reverted after that is done.