Commit Graph

723 Commits

Author SHA1 Message Date
Sage Weil
06310994df ceph: enable malloc debugging for ceph-osd 2013-01-02 12:31:54 -08:00
Joao Eduardo Luis
ed586c1bb0 task: ceph: don't wait for 'healthy' if 'wait-for-healthy' is false.
This new config option defaults to 'true', both to maintain
compatibility and because it makes sense.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-12-31 16:11:50 +00:00
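
A minimal sketch of the default described in the commit above, assuming the task receives its options as a plain dict. The helper name `should_wait_for_healthy` is hypothetical; only the option name 'wait-for-healthy' and its default of true come from the commit message.

```python
def should_wait_for_healthy(config):
    """Return True unless the task config explicitly disables the wait."""
    # 'wait-for-healthy' is the option named in the commit message.
    # A missing key (or a missing config) falls back to True, which
    # preserves the behaviour from before the option existed.
    return bool((config or {}).get('wait-for-healthy', True))
```

With this shape, existing job descriptions that never mention the option keep waiting for a healthy cluster, and only an explicit false value skips the wait.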
Sage Weil
bb4a2c558b rgw: enable logging in ceph.conf 2012-12-29 08:28:44 -08:00
Yehuda Sadeh
c02d34dce1 task/swift: change upstream repository url
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-12-21 10:20:02 -08:00
Samuel Just
f2dbe5edd7 CephManager: add ability to test split
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-11 15:11:06 -08:00
Joe Buck
b916f67982 pexec.py: Parse out role ID from the back.
Also, do not assume that the command needs to run from a specific directory.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
2012-12-11 14:07:28 -08:00
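
One way to read "parse out role ID from the back" is to split a role string on its last dot rather than its first. The sketch below assumes roles look like 'client.0'; the function name `split_role` and the error handling are illustrative and are not taken from pexec.py.

```python
def split_role(role):
    """Split 'client.0' into ('client', '0'), taking the ID from the right."""
    # rpartition splits on the LAST dot, so a prefix that itself contains
    # dots stays intact and only the trailing component is treated as the ID.
    prefix, sep, role_id = role.rpartition('.')
    if not sep or not role_id:
        raise ValueError('role has no ID suffix: %r' % (role,))
    return prefix, role_id
```

Splitting from the right keeps the ID correct even if the part before it grows extra dotted components.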
Joe Buck
0890d48a67 Adding a Hadoop task.
This task configures and starts a Hadoop cluster.
It does not run any jobs; that must be done after
this task runs.
Can run on either Ceph or HDFS.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
2012-12-11 14:07:28 -08:00
Joe Buck
0cd84b3aed New ssh task that adds keys for node -> node ssh.
This generates a new keypair, pushes it to all nodes
in the context and adds all hosts to all other hosts'
.ssh/authorized_keys files.
Cleans up all keys and authorized_keys entries
afterwards.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
2012-12-11 14:07:28 -08:00
Samuel Just
a5b9939e17 ceph.conf: default to smaller recovery chunk
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-10 14:37:01 -08:00
Josh Durgin
90d815621a qemu: set qemu cache mode based on rbd cache setting
If we don't do this, qemu assumes no caching is used and doesn't send flushes.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-05 16:17:41 -08:00
Joao Eduardo Luis
213787fb41 Merge branch 'wip-mon-thrasher' 2012-11-29 00:53:59 +00:00
Joao Eduardo Luis
f525359208 task: mon_thrash: thrash monitors while running other tests
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-11-29 00:12:34 +00:00
Sage Weil
d07d7289a4 run: save original config, too 2012-11-25 08:37:06 -08:00
Sage Weil
b2f80359c2 s3tests: fix typo 2012-11-22 13:59:58 -08:00
Sage Weil
ddcf2089bd workunit: fix indentation 2012-11-21 08:29:47 -08:00
Josh Durgin
ca086261a0 xfstests: run in parallel on multiple machines
xfstests itself still seems to have some global dependencies that
make it hard to run more than one instance per node, so keep
the one client per node restriction.

Name the image after the client using it, and only run the
nested context managers once, so this task can work with
more than one client.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-20 15:48:21 -08:00
Yehuda Sadeh
1c50db6a46 rgw-logsocket: a task to verify opslog socket works
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-11-20 13:57:24 -08:00
Sam Lang
df3b1b89b1 task/pexec: Output stderr to teuthology log
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-11-20 09:53:52 -06:00
Sam Lang
d516307d88 task/ceph-fuse: Add log messages for abort
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-11-19 10:33:18 -06:00
Sage Weil
7a602fa12e workunit: fix default subdir
Make subdir argument optional.
2012-11-18 09:24:10 -08:00
Sage Weil
fa63dd4213 valgrind: enumerate warnings in log; check leaks from client, mon only 2012-11-17 21:01:52 -08:00
Mike Ryan
2ab8b38857 task: benchmark recovery
Measures latency before and during recovery using smalliobench.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-11-16 16:55:30 -08:00
Sander Pool
69e613f053 Starting to auto-document this code. 2012-11-14 17:24:50 -08:00
Sam Lang
652c429408 workunit: Fix indentation
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-11-14 16:49:24 -06:00
Sam Lang
05065dff10 task/ceph-fuse: If umount fails, abort and cleanup
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-11-14 14:51:03 -06:00
Sam Lang
cfa2883d47 pexec: Logging each command isn't useful
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-11-14 10:45:10 -06:00
Sam Lang
25964046d2 Add task pexec to run bash commands in parallel
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-11-14 10:12:36 -06:00
Sam Lang
25d4f56067 misc: Show url on get failure
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-11-12 13:16:49 -06:00
Sage Weil
a46dd6b605 ceph-fuse: apply overrides[ceph-fuse] to config 2012-11-11 07:13:09 -08:00
Sage Weil
f9b4efeab1 valgrind.supp: deliberate onexit leak 2012-11-11 07:13:09 -08:00
Sage Weil
02d62d731d valgrind.supp: ceph-fuse leak from libfuse
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-11 07:13:09 -08:00
Samuel Just
f309c33d2d Clean up string interpolation operator spacing ceph_manager.py
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-11-09 10:52:16 -08:00
Samuel Just
f82d4a7b86 Add divergent_priors test
Tests scenario where merge_old_entry encounters a divergent
entry where the prior_version is prior to log_tail.  This
is a problem since it will go into the missing set, but won't
be re-added to the missing set during read_log() if the node
restarts prior to recovering the object.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-11-09 10:52:15 -08:00
Sam Lang
96458387c7 workunit: Move cleanup to separate run
Removing the scratchdir in the remote run command
at the end of the script invocation will do the remove
once the first script finishes.  With possibly a shared
scratch dir across workunit clients, we want to wait to
remove the scratch dir once all the workunit scripts have
completed.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-11-08 09:09:23 -06:00
Sam Lang
f0080b021e workunit: Allow scratch dir to already exist
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-11-08 09:09:23 -06:00
Sam Lang
ea02fb74b9 workunit: Add option to use specified subdir
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-11-08 09:09:23 -06:00
Samuel Just
bd83ed70dc ceph_manager: add test_min_size action
Thrasher can now test min_size, with configurable frequency, by
taking down all but one osd, waiting, killing that osd and bringing
back the others, and verifying that the cluster goes clean.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-11-07 12:56:31 -08:00
Josh Durgin
6c9d45e399 schedule: fix var name 2012-11-02 11:33:46 -07:00
Josh Durgin
5f4414e072 schedule: add option to display jobs in the queue
beanstalkd doesn't let you list jobs in the queue, but you can
inspect specific job ids.
2012-11-02 11:08:59 -07:00
Alex Elder
f88a2f73d8 rbd task: support xfstests repeat count
This adds the ability to use the new repeat count argument to the
run_xfstests.sh script.  By default, the test suite will be run
once; if a count is specified, the script will execute the suite
that many times but will perform the setup (building the
tests, etc.) only once.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2012-11-01 13:49:58 -05:00
Joe Buck
53ff33a7f0 Use the configured username for _make_scratch_dir 2012-10-25 17:42:32 -07:00
David Zafman
e10b99a301 Add exit to kcon_most script
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-25 10:08:05 -07:00
Josh Durgin
afcf2ea81c coverage: note db table structure 2012-10-24 16:11:12 -07:00
Sage Weil
b4bf14edd5 add exec task 2012-10-22 16:51:54 -07:00
David Zafman
ce7d997362 New nfs task that performs NFS client mount of export (see knfsd)
Fixes: 3245
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-10 18:18:14 -07:00
David Zafman
cac4a6acbc New knfsd task that does an nfs server export
Ability to specify options
    By default only export to current hosts

Fixes: 3245
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-10 18:18:14 -07:00
David Zafman
939c3aee7f New kcon_most task that enables most ceph kernel logging
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-10 18:18:14 -07:00
David Zafman
558590c1f9 Fix ceph-fuse example
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-10 18:18:14 -07:00
tamil
d94821244c Printing the number of tests passed when 'all' tests are successful
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
2012-10-05 17:33:57 -07:00
Yehuda Sadeh
7c9dc932e1 radosgw-admin: usage should time out after 20 minutes
Not 45 seconds.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-10-01 17:17:14 -07:00
Sage Weil
7a593a08ab console: add console task
Log the sol console of every target to a file in the archive dir.
2012-09-30 21:08:41 -07:00
Sage Weil
b22e3ea526 internal: stop warning about lockdep circular dependency
This is coming from xfs, currently.  Bah.
2012-09-30 21:07:58 -07:00
Sage Weil
ee3407fa04 include newpool in osd cap for client.0
This is needed by the kclient_workunit_kclient task.
2012-09-29 08:55:58 -07:00
Josh Durgin
13c91dba67 misc: use new syntax for osd caps
pool=pool1,pool2 is not valid for the new grammar
2012-09-28 10:07:45 -07:00
Sage Weil
30748f36e2 fix lock held when returning to user space typo 2012-09-23 08:03:17 -07:00
Josh Durgin
a09153b688 Allow scheduled jobs to use different teuthology branches
teuthology-[schedule|suite] get a parameter to specify the branch,
to put the job in a branch-specific queue. Workers running that
branch of teuthology can pull jobs from that queue.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-09-21 17:16:56 -07:00
Josh Durgin
57bb434def Fix errors found by pyflakes
A bunch of unused imports and variables.
2012-09-21 16:46:24 -07:00
Sage Weil
0395df3157 ignore 'lock held when returning to user space' from btrfs sb_internal crap 2012-09-19 16:42:02 -07:00
Sam Lang
a101e49190 replace tab with spaces
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-09-18 16:31:39 -07:00
Sam Lang
5ce4d70e4b fix error on teardown failing to unmount /mnt 2012-09-18 15:56:08 -07:00
tamil
78b7b02c07 imported subprocess module in nuke script
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
2012-09-14 15:04:40 -07:00
Josh Durgin
d27806a293 nuke: add missing import 2012-09-13 14:31:46 -07:00
Josh Durgin
c8c7014fc0 rbd: fix typo and cast to int before comparing format 2012-09-13 14:29:43 -07:00
Josh Durgin
055bf73d50 rbd: only specify --format if not using the default
This lets older versions that do not support --format still work with
format 1 images.
2012-09-12 11:31:28 -07:00
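
The two rbd commits above (pass --format only for non-default formats, and cast to int before comparing) can be sketched as a small argument builder. This assumes format 1 is the default, as the commit message implies; the function name `rbd_create_args` and the exact argument order are hypothetical and do not reproduce the task's real command line.

```python
def rbd_create_args(name, size_mb, image_format=None, default_format=1):
    """Build an 'rbd create' argument list, adding --format only if needed."""
    args = ['rbd', 'create', '--size', str(size_mb), name]
    # The value may arrive from YAML as a string, so cast to int before
    # comparing; otherwise '1' != 1 would wrongly add the flag.
    if image_format is not None and int(image_format) != default_format:
        args += ['--format', str(int(image_format))]
    return args
```

Omitting the flag for the default case is what lets older rbd binaries, which do not know --format, keep working with format 1 images.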
Tommi Virtanen
79607eed3c Don't lose tracebacks of exceptions raised in a greenlet.
Exception objects don't contain the traceback of where they were
raised from (to avoid cyclic data structures wrecking gc and causing
mem leaks), so the singular "raise obj" form creates a new traceback
from the current execution location, thus losing the original location
of the error.

Gevent explicitly wants to throw away the traceback, to release any
objects the greenlet may still be referring to, closing files,
releasing locks etc. In this case, we think it's safe, so stash the
exception info away in a holder object, and resurrect it on the other
side of the results queue.

http://stackoverflow.com/questions/9268916/how-to-capture-a-traceback-in-gevent

This can be reproduced easily with

	from teuthology.parallel import parallel
	def f():
	    raise RuntimeError("bork")
	with parallel() as p:
	    p.spawn(f)

and looking at the resulting traceback with and without this change.
2012-09-11 11:25:21 -07:00
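
The holder pattern described in the commit above can be sketched without gevent. This example swaps in a plain thread and `queue.Queue` for the greenlet and results queue, and is written for Python 3, where exception objects do carry `__traceback__`; it therefore illustrates the stash-and-resurrect idea only, not the actual teuthology.parallel code. The names `ExceptionHolder`, `worker`, and `resurrect` are hypothetical.

```python
import queue
import sys
import threading
import traceback


class ExceptionHolder(object):
    """Carries sys.exc_info() from the worker to the caller."""
    def __init__(self, exc_info):
        self.exc_info = exc_info


def worker(func, results):
    """Run func and put either its result or a holder on the queue."""
    try:
        results.put(func())
    except Exception:
        # Stash the full (type, value, traceback) triple, not just the value.
        results.put(ExceptionHolder(sys.exc_info()))


def resurrect(item):
    """Re-raise a stashed exception with its original traceback attached."""
    if isinstance(item, ExceptionHolder):
        _, exc, tb = item.exc_info
        raise exc.with_traceback(tb)
    return item


def f():
    raise RuntimeError("bork")


results = queue.Queue()
t = threading.Thread(target=worker, args=(f, results))
t.start()
t.join()

try:
    resurrect(results.get())
except RuntimeError:
    # Function names in the traceback, outermost first.
    frames = [frame.name for frame in traceback.extract_tb(sys.exc_info()[2])]
```

After this runs, `frames` includes both `worker` and `f`, so the location where the error was originally raised is still visible on the consuming side of the queue.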
Alex Elder
f64cedf4db rbd: allow xfstests task to specify rbd image formats
This adds the ability to specify the rbd image format to use for the
scratch and test devices for the rbd.xfstests task.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2012-09-10 19:38:21 -05:00
Alex Elder
73a29cdf91 rbd: allow image format to be specified
This adds the ability to specify an rbd image format (either 1 or 2)
for an rbd image.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2012-09-10 19:37:25 -05:00
tamil
39efbbcc2d Suppress valgrind error "Invalid write 8"
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
2012-09-10 15:02:47 -07:00
Yehuda Sadeh
d6c2ded087 radosgw-admin: update task for new usage reporting
Usage reporting output has been modified, also use the new
--categories input param.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-09-10 11:24:25 -07:00
Sage Weil
3473c2ed1d s3tests: run against arbitrary branch/sha1 of s3-tests.git 2012-09-10 11:08:57 -07:00
Sage Weil
db8037d998 debian ntp servers 2012-09-09 14:23:12 -07:00
Mike Ryan
f8e1f5c222 task: die on ceph error or coredump
This task allows ceph to signal to teuth that it should die immediately
by touching a file under /tmp/cephtest

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-09-04 09:52:38 -07:00
Sage Weil
dc1c247abc disable lockdep recursive warnings until #3040 is fixed 2012-08-24 19:23:34 -07:00
Sage Weil
b6b302890f internal: fix escaping of \b in syslog grep 2012-08-23 11:00:39 -07:00
Sage Weil
82cefa2477 suppress this valgrind error
<error>
  <unique>0x4</unique>
  <tid>1</tid>
  <kind>InvalidWrite</kind>
  <what>Invalid write of size 8</what>
  <stack>
    <frame>
      <ip>0x400A299</ip>
      <obj>/lib/x86_64-linux-gnu/ld-2.15.so</obj>
      <fn>do_lookup_x</fn>
      <dir>/build/buildd/eglibc-2.15/elf</dir>
      <file>dl-lookup.c</file>
      <line>250</line>
    </frame>
    <frame>
      <ip>0x403122F</ip>
    </frame>
    <frame>
      <ip>0x400A522</ip>
      <obj>/lib/x86_64-linux-gnu/ld-2.15.so</obj>
      <fn>_dl_lookup_symbol_x</fn>
      <dir>/build/buildd/eglibc-2.15/elf</dir>
      <file>dl-lookup.c</file>
      <line>739</line>
    </frame>
  </stack>
  <auxwhat>Address 0x7feffeec8 is on thread 1's stack</auxwhat>
</error>

pops up recently
2012-08-23 11:00:39 -07:00
Sage Weil
b800496bb4 ceph: fix cpu_profile default 2012-08-19 20:16:43 -07:00
Sage Weil
7d50411ca9 rbd.xfstests: default to 1gb (not 250mb) image 2012-08-18 20:10:54 -07:00
Mike Ryan
5b7ec43e0e task: run osd/mds/mon with Google CPU profiler via cpu_profile option
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-17 13:47:13 -07:00
Mike Ryan
7f6591b556 ceph: support tmpfs_journal option to put journal on tmpfs
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-16 15:50:10 -07:00
Sage Weil
6dbbcf03fc queue: fix logging of child return code 2012-08-14 15:08:21 -07:00
Tommi Virtanen
99ac6b0b3e Disable asynchronous DNS lookups.
Especially on older hosts, we keep triggering errors::

  ServerNotFoundError: Unable to find the server at
  teuthology.front.sepia.ceph.com: [Errno 3] name does not exist

That comes from libevent's evdns via gevent.dns and httplib2. The rate
of these errors is low enough that they seem to be perhaps timeouts,
or more arbitrary. Busy looping on DNS resolution calls has never
triggered them, so far.

With ``monkey.patch_all(dns=False)``, the teuthology process will
block as a whole whenever doing DNS resolution. This will hopefully be
rare enough that it won't matter.

The only real "fix" seems to be upgrading libraries and hoping for the
best; this commit can be reverted after that is done.
2012-08-13 16:18:33 -07:00
Tommi Virtanen
273a43eda8 Flush data to temp file before reading it in another process. 2012-08-09 09:42:35 -07:00
Tommi Virtanen
8aaf21d537 Oops tempfile now gives us file objects not fds. 2012-08-09 09:42:13 -07:00
Tommi Virtanen
99e99758e5 In teuthology-worker, shuffle the child stdout/stderr into our log.
Otherwise, the child can suffer a failure that does not get logged by
its own exception handling machinery, and we have no idea why.
2012-08-08 14:48:21 -07:00
Tommi Virtanen
05007f7e0f Minimize scope of try-except.
os.write and list.append won't raise CalledProcessError, and now
we don't need to try to contain them for temp file clean up reasons.
2012-08-08 14:45:49 -07:00
Tommi Virtanen
4b9e17626d Use tempfile.NamedTemporaryFile instead of mkstemp.
Simpler code, no manual cleanup needed. We see a littering of
zero-length temp files from teuthology-worker, and this seems
like a likely source.
2012-08-08 14:44:47 -07:00
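
The run of tempfile commits above (use `tempfile.NamedTemporaryFile`, remember it yields a file object rather than an fd, flush before another process reads the file) can be sketched together. This is an illustration under assumptions, not the teuthology-worker code: the function name `read_back_in_child` and the child command are made up, and reopening the file by name while it is still open is assumed to work, which holds on POSIX systems.

```python
import subprocess
import sys
import tempfile


def read_back_in_child(data):
    """Write bytes to a temp file and have a child process read them back."""
    # The context manager deletes the file on exit, so no manual cleanup
    # (and no stray zero-length files) is left behind.
    with tempfile.NamedTemporaryFile(suffix='.yaml') as tmp:
        # tmp is a file object, not a raw fd, so use its methods directly.
        tmp.write(data)
        # Without flush(), the data may still sit in this process's buffer
        # and the child would see an empty or truncated file.
        tmp.flush()
        return subprocess.check_output(
            [sys.executable, '-c',
             'import sys; sys.stdout.write(open(sys.argv[1]).read())',
             tmp.name])
```

Calling `read_back_in_child(b'job: 1\n')` should hand the same bytes back from the child; dropping the `flush()` call is the kind of bug the later commit fixes.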
Mike Ryan
3b85b2311b task: verify scrub detects files whose contents changed
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-02 11:14:51 -07:00
Mike Ryan
8665bdc164 task: scrub OSDs periodically
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-02 11:14:51 -07:00
Sage Weil
e4e239e268 kernel: push a local .deb instead of using gitbuilder
This lets you specify a path to an existing kernel deb
to be pushed and installed on the remote node.

Limitations:
 - We don't build the deb for you.  Figuring out what
   filename 'make deb-pkg' is building is annoying.
 - We need to be able to figure out the sha1 from the provided
   path.  It shouldn't be a problem, given the way make deb-pkg
   names the debs.
2012-07-29 12:40:13 -07:00
Sage Weil
1c93d5ab4d syslog check: fix false-positive BUG matches in random strings 2012-07-29 12:15:51 -07:00
Sage Weil
a0847694a5 osd_recovery: also test unfound discovery
This tests for bug #2866.
2012-07-28 10:53:09 -07:00
Sage Weil
8dd09cb21d osd_recovery: test incomplete pg recovery
4-osd test to reproduce #2860 and confirm the fix.
2012-07-28 10:23:18 -07:00
Sage Weil
a9f2bf622f ceph_manager: wait_for_active 2012-07-28 10:23:18 -07:00
Sage Weil
731d520900 ceph_manager: count 'incomplete' as 'down' 2012-07-28 10:23:18 -07:00
tamil
0d6ce42405 Fixed the code to pass 'yes' during mkfs
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
2012-07-26 13:48:11 -07:00
tamil
2b75ddef63 Added '-y' option for mkfs.ext4
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
2012-07-25 16:38:25 -07:00
Sage Weil
9bc86171ac admin_socket: make test optional
If it's not there, we just verify the output is valid json.
2012-07-24 15:26:06 -07:00
Sage Weil
f70b825042 ceph: fix mkfs/mount option defaults
Later code expects a list, not None.
2012-07-21 20:18:24 -07:00
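
The fix above amounts to normalising an unset option to an empty list before later code iterates over it. A minimal sketch, assuming the options live in a dict; the helper name `option_list` and the key 'mkfs_options' used in the example are assumptions, not confirmed by the commit message.

```python
def option_list(config, key):
    """Return the configured option list, or [] when the key is unset."""
    value = (config or {}).get(key)
    # Later code expects a list, so map both "missing" and an explicit
    # None to an empty list instead of passing None along.
    if value is None:
        return []
    return list(value)
```

For example, `option_list({'mkfs_options': None}, 'mkfs_options')` yields an empty list that can be safely extended or joined into a command line.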
Samuel Just
e1c98e7d19 tasks: add multibench task for testing pool creation
Also adds support for specifying a pool for radosbench
to create and then cleanup instead of "data".

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-07-19 15:51:55 -07:00
Sage Weil
c49daeca2f clock: print skew with ntp servers to log to help debug time issues 2012-07-18 13:44:59 -07:00