Commit Graph

1022 Commits

Author SHA1 Message Date
Sam Lang
9a9fe73ec3 task/ceph: Fix typo in previous commit
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-02-01 14:07:10 -06:00
Sam Lang
9de9ebcf05 nuke: get_testdir_base needs to be imported
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-02-01 13:01:25 -06:00
Sam Lang
edfe5eeda1 nuke: Fix cleanup of test dir
Nuke used to remove /tmp/cephtest, now it tries to
remove the test dir, which it may not have the name
for.  Instead of removing the test dir, we just
remove the base directory for all test directories,
which may or may not be /tmp/cephtest.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-02-01 11:45:04 -06:00
Sam Lang
4ebd90eb81 task/ceph: Initialize disk_config maps
The mount_options and fstype maps need to be
initialized properly for later.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-02-01 11:37:13 -06:00
Sam Lang
150a3d7d9e misc: Don't include existing partitions in devs
We don't want to include /dev/sda1, etc. in the
list of devices to use.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-02-01 10:53:47 -06:00
Sam Lang
3806dc5e72 task/ceph: Fix device list
dict.items() returns a tuple, whereas we want
the values().

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-02-01 10:16:44 -06:00
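
Annotation: the distinction this commit relies on can be seen in a few lines of Python. This is only an illustrative sketch; the `devs` mapping and its contents are hypothetical and not taken from the teuthology code.

```python
# Hypothetical mapping of device ids to device paths (illustrative values only).
devs = {
    "wwn-0x5000a": "/dev/sdb",
    "wwn-0x5000b": "/dev/sdc",
}

# items() yields (key, value) tuples, which is not a usable device list.
pairs = list(devs.items())

# values() yields only the device paths, which is what a device list needs.
device_list = list(devs.values())
```

Iterating over `pairs` hands each consumer a tuple rather than a path, which is the kind of mismatch the commit message describes.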
Sam Lang
64e3966779 misc: get_wwn_id_map() needs to return dict
If we can't find device ids, we need to return
a dict, not a list.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-02-01 09:13:48 -06:00
Sam Lang
dcf99e43b9 nuke: Optionally check console status
Only check the ipmi console status if the ipmi
parameters have been defined in .teuthology.yaml.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-02-01 08:24:41 -06:00
Sam Lang
ac4ba69d8d misc: Fix get_wwn_id_map() to be optional
Not all plana nodes have symlinks set up when
we check /dev/disk/by-id/wwn-*.  Instead of failing
here, just use the /dev/disk/sd* devices.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-02-01 08:20:43 -06:00
Sam Lang
933cc3c382 run.py: Fix argument parsing for --name
With the addition of the --name argument to the
teuthology program (run.py), jobs were failing
because --name was being treated as a non-arg
option, even though the name was being supplied
by the workers.  Fix that and give it a metavar.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-02-01 07:46:04 -06:00
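
Annotation: a minimal sketch of an option that takes a value and carries a metavar, assuming the standard-library `argparse` module. The parser below is hypothetical and is not the actual run.py code; only the `--name` option comes from the commit message.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the --name option is taken from the commit message.
    parser = argparse.ArgumentParser(prog="teuthology")
    # No action="store_true" here, so --name consumes the following argument.
    # The metavar controls how the value is shown in --help output.
    parser.add_argument("--name", metavar="NAME", help="name for this run")
    return parser


args = build_parser().parse_args(["--name", "nightly-run"])
```

Had the option been declared as a flag, the value supplied after `--name` would have been left over as an unexpected argument.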
Samuel Just
fadc22c0b9 ceph_manager: wait for admin socket on restart, use for set_config
Fixes: #3966
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-01-31 12:59:00 -08:00
Josh Durgin
8f9267cf0e thrashosds: note assumption for powercycling
2013-01-31 09:14:06 -08:00
Sam Lang
77e8d801b1 Remove console.py
Handling of ipmi via the console is now done through the
Console class in teuthology/orchestra/remote.py.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-31 08:23:41 -06:00
Sam Lang
8f720454cb Assign devices to osds using the device wwn
Linux doesn't guarantee device names (/dev/sdb, etc.)
are always mapped to the same disk.  Instead of assigning
nominal devices to osds, we map devices by their wwn
(/dev/disk/by-id/wwn-*) to an osd (both data and journal).

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-31 08:23:39 -06:00
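
Annotation: one way to build a map from kernel device names to stable wwn symlinks, sketched for a local filesystem only. The function name `map_devices_by_wwn` and its parameter are hypothetical; the real teuthology code inspects remote hosts and may work quite differently.

```python
import os


def map_devices_by_wwn(by_id_dir="/dev/disk/by-id"):
    """Return {resolved device path: wwn symlink path} for wwn-* entries."""
    wwn_map = {}
    if not os.path.isdir(by_id_dir):
        # Always return a dict, even when no ids can be found.
        return wwn_map
    for entry in sorted(os.listdir(by_id_dir)):
        if not entry.startswith("wwn-"):
            continue
        link = os.path.join(by_id_dir, entry)
        # realpath follows the symlink to the current kernel name.
        wwn_map[os.path.realpath(link)] = link
    return wwn_map
```

The wwn symlink path stays the same across reboots even if the kernel name it resolves to changes, which is why it is the safer handle to assign to an osd.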
Sam Lang
58111595d4 Support power cycling osds/nodes through ipmi
This patch defines a RemoteConsole class associated
with each Remote class instance, allowing
power cycling a target through ipmi.

Fixes/Implements #3782.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-31 08:23:37 -06:00
Sam Lang
87b9849628 add --name option to teuthology
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-31 08:23:34 -06:00
Sam Lang
ace4cb07b2 Replace /tmp/cephtest/ with configurable path
Teuthology uses /tmp/cephtest/ as the scratch test directory for
a run.  This patch replaces /tmp/cephtest/ everywhere with a
per-run directory: {basedir}/{rundir} where {basedir} is a directory
configured in .teuthology.yaml (/tmp/cephtest if not specified),
and {rundir} is the name of the run, as given in --name.  If no name
is specified, {user}-{timestamp} is used.

To get the old behavior (/tmp/cephtest), set test_path: /tmp/cephtest
in .teuthology.yaml.

This change was motivated by #3782, which requires a test dir that
survives across reboots, but also resolves #3767.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-31 08:23:31 -06:00
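
Annotation: the path rules described in this commit message can be sketched as below. This is a hedged illustration, not the teuthology implementation: `get_testdir`, its parameters, and the `base_test_dir` key are assumptions; only `test_path`, the `/tmp/cephtest` default, and the `{user}-{timestamp}` fallback come from the message.

```python
import posixpath


def get_testdir(config, name=None, user="ubuntu", timestamp="unknown"):
    """Return the scratch test directory for a run (illustrative only)."""
    # test_path is named in the commit message: it restores the old fixed path.
    if "test_path" in config:
        return config["test_path"]
    # "base_test_dir" is an assumed key name for {basedir}.
    basedir = config.get("base_test_dir", "/tmp/cephtest")
    # {rundir} is the run name, or {user}-{timestamp} when no name is given.
    rundir = name if name else "{0}-{1}".format(user, timestamp)
    return posixpath.join(basedir, rundir)
```

For example, `get_testdir({}, name="run1")` gives a directory under `/tmp/cephtest`, while a config containing `test_path` returns that path unchanged.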
Sam Lang
14730276b9 Fixes for syntax errors found by pyflakes.
This patch includes minor fixes to the teuthology
python code for syntax errors found by running
check-syntax.sh (which runs pyflakes on each file).

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-31 07:58:57 -06:00
Sam Lang
3390cc30a6 Scripts to use pyflakes to check python syntax.
pyflakes runs a basic syntax checker against python code.
The added check-syntax.sh script and Makefile run pyflakes
on the python code within the teuthology directory reporting
any syntax errors that are found.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-31 07:56:56 -06:00
Joao Eduardo Luis
a63fac32f8 task: mon_clock_skew_check: use absolute value when comparing mon_skew
The monitors may report either positive or negative clock skews, and by
not using an absolute value we were constantly ignoring reported negative
clock skews.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-01-30 20:52:39 +00:00
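
Annotation: a small sketch of why the absolute value matters when checking a reported skew against a threshold. The function name `skew_exceeds` and the numbers used are hypothetical.

```python
def skew_exceeds(reported_skew, max_skew):
    """Return True when the skew magnitude is above the allowed maximum."""
    # A negative skew is as significant as a positive one of the same size,
    # so compare the magnitude rather than the signed value.
    return abs(reported_skew) > max_skew


# A signed comparison silently ignores negative skews.
signed_check = -0.5 > 0.05
abs_check = skew_exceeds(-0.5, 0.05)
```

Here `signed_check` is false although the clock is off by half a second, while `abs_check` catches it.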
Joao Eduardo Luis
89e09fa90c task: mon_clock_skew_check: mark as ran once if an expected skew was found
... even if we didn't get a clean/finished result from the monitors

This ought to significantly cut the waiting time if something else (or
someone else) is leaving the leader hanging thus unable to finish a given
timecheck round.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-01-30 20:52:03 +00:00
Sage Weil
19f4273190 peer: fix filtering out of scrub from pg state
2013-01-29 14:04:09 -08:00
Sage Weil
e805b7d62e admin_socket: don't bother remote executing if there is no test
2013-01-29 03:45:45 -08:00
Samuel Just
e33b425db7 osd_recovery: use --no-cleanup for rados bench
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-01-28 20:22:33 -08:00
Samuel Just
1c31194920 osd_recovery: inject a recovery delay
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-01-28 20:22:33 -08:00
Sage Weil
3b27c9ecbc osd_backfill: --no-cleanup for rados bench
2013-01-28 19:53:34 -08:00
Josh Durgin
826e5860a0 cram: fix for runs with coverage enabled
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-28 14:54:49 -08:00
Sage Weil
b5f81636a2 osdthrasher: inject pause on a live (on in) osd
2013-01-26 13:13:08 -08:00
Joao Eduardo Luis
aa85d914c4 task: mon_clock_skew_check: increase timeout and kick it off only on stop
We were kicking-off the timeout as soon as we started; it's better however
to kick it off only when we are told to stop (as long as 'at-least-once'
is true).

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-01-25 17:56:09 -08:00
Joao Eduardo Luis
673101c72f task: mon_clock_skew_check: distinguish between on-going and finished check
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-01-25 17:56:05 -08:00
Samuel Just
3a5c70b89b ceph_manager: turn long stall injection off by default
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-01-24 17:31:38 -08:00
Sage Weil
006e70657d osd_recovery: fix up incomplete test
- stop rados bench from cleaning up
- flush pg stats
- fix sleep call

One or more of these helped fix this test, don't really care which.
2013-01-24 16:24:16 -08:00
Sage Weil
20af01f23b ceph_manager: fix get_num_active_recovered()
The states now have 'backfill' *or* 'recover' in them.
2013-01-24 16:23:33 -08:00
Sage Weil
b150e8e3f3 workunit: pass java path as env variable
The libcephfs-java test needs this.
2013-01-24 15:21:01 -08:00
Samuel Just
6a859bcd56 ceph_manager: use 80/70 as pause_long, pause_check_after defaults
OSD::op_tp suicides after 150.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-01-24 12:50:26 -08:00
Samuel Just
0f24dca2d7 ceph_manager: use do_rados for rmpool
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-01-24 10:08:44 -08:00
Sage Weil
9b56f3671a Merge remote-tracking branch 'gh/wip_heartbeat'
2013-01-23 18:43:02 -08:00
Samuel Just
ec5a14553f ceph_manager: default chance_down to 0.4
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-01-23 17:44:05 -08:00
Samuel Just
566ae5332e ceph_manager: add filestore and heartbeat stalls
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-01-23 17:40:40 -08:00
Sandon Van Ness
5d66c9ab01 Use ceph git repo instead of github.
This code change is so that instead of pulling the tarball from github,
which can be unreliable at times, it uses the ceph repo mirror,
which serves the same function. It now uses git archive and no
longer uses wget. Because of this, less tar-fu is needed to extract
the necessary files, as it can be done directly through git archive.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-01-23 17:22:31 -08:00
David Zafman
e714c77812 osd: Testing of deep-scrub omap changes
Fix scrub_test.py and add omap corruption test

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-01-22 15:48:45 -08:00
Joe Buck
b6e3edc6d8 test: create /tmp/cephtest/mnt.{id}
The workunit task assumes that a mount exists
at /tmp/cephtest/mnt.{id}
This patch creates the path if it doesn't
exist, enabling workunits to run in the absence
of kclient or ceph-fuse tasks.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-01-22 13:09:46 -08:00
Joao Eduardo Luis
98cc1b835c task: mon_clock_skew_check: add option to run at least one timecheck
  at-least-once          Runs at least once, even if we are told to stop.
                         (default: True)
  at-least-once-timeout  If we were told to stop but we are attempting to
                         run at least once, timeout after this many
                         seconds. (default: 300)

Fixes: #3854

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-01-21 21:10:45 -08:00
Sam Lang
53f22d9493 task/mds_thrasher: New task for thrashing the mds
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-01-18 15:48:52 -06:00
Alex Elder
dbc38eff62 rbd.py: update scratch and test image sizes
Test 167 was failing due to running out of space on the scratch
file system.  The test reserves 21MB in a file, and repeats 50
times.  It required just over 1GB, so I bumped the default size
for the testing device to 1200 MB.  I increased the test device
size as well.

This resolves http://tracker.newdream.net/issues/3864.

Signed-off-by: Alex Elder <elder@inktank.com>
2013-01-18 12:47:34 -06:00
Sage Weil
cd09be6ac8 ceph: pass ceph.conf to osdmaptool
This ensures it sees the chooseleaf option and generates the proper
CRUSH rules.
2013-01-17 12:27:17 -08:00
Loic Dachary
72db1a59cd When running teuthology with targets provisioned on OpenStack and kvm, the disks will show under /dev/vda, /dev/vdb etc. Add them to the list of devices to inspect and use for tests.
Signed-off-by: Loic Dachary <loic@dachary.org>
2013-01-16 20:48:15 -08:00
Josh Durgin
c8a9a9a888 Add cram task
This runs cram tests, which are an easy way to test output
stays consistent. We already use cram for basic cli tests with no cluster,
and now we can use it for whole system tests too.
2013-01-15 14:07:58 -08:00
Greg Farnum
71097b7b91 Revert "task/kclient: chmod root to 1777."
This reverts commit f17847e537. It had
a typo and we hopefully don't need it.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-01-14 16:14:08 -08:00
Sage Weil
92a9d9c229 ceph.conf: separate replicas across osds
ceph.git master now separates across crush hosts without this setting.
For teuthology clusters, we don't want that (unless the test specifies
otherwise).
2013-01-13 22:52:00 -08:00