Commit Graph

1048 Commits

Author SHA1 Message Date
Samuel Just
b124e8eafa ceph_manager: mount_osd_data expects osd as a str
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-05-01 13:14:53 -07:00
Samuel Just
b948406a07 ceph.py: set up ctx.disk_config outside of the loop
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-05-01 13:14:35 -07:00
Samuel Just
0382aa60e9 ceph.py: the journal component does not currently work with restart
Removing for the time being.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-05-01 13:13:52 -07:00
Josh Durgin
52742fb072 fix some errors found by pyflakes
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-30 17:09:19 -07:00
Josh Durgin
7df72f2652 s3tests: revert useless portion of 1c50db6a46
Perhaps it was attempting to debug something, but it shouldn't have been committed.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-30 17:02:28 -07:00
Josh Durgin
5a6e560706 rgw tests: remove users after each test
These should all be cleaned up at some point. They're
almost all the same code.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-30 16:49:04 -07:00
Josh Durgin
6aba6d2cad rgw tests: clean up immediately after the test
There's no need for an explicit cleanup function, so move it back
to where it came from (except in s3roundtrip, which did not have it).

Instead, since these use a nested contextmanager, pass through
and yield to the top-level run_tasks after the nested
contextmanager has finished (and thus run all the cleanup steps
in the subtasks for this test).

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-30 16:47:34 -07:00
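
The ordering described in the commit above (finish the nested context managers, then yield to the top-level runner) can be sketched as follows. This is only an illustration: it substitutes Python 3's contextlib.ExitStack for the nested contextmanager the commit mentions, and the names subtask, task, and the step names are hypothetical, not taken from the rgw tasks.

```python
import contextlib


@contextlib.contextmanager
def subtask(name, log):
    # Hypothetical subtask: records its setup and cleanup in `log`.
    log.append('setup ' + name)
    try:
        yield
    finally:
        log.append('cleanup ' + name)


@contextlib.contextmanager
def task(log):
    # Enter every subtask, then leave the stack so that all of their
    # cleanup steps run before control is handed back to the caller.
    with contextlib.ExitStack() as stack:
        for name in ('create_users', 'run_tests'):
            stack.enter_context(subtask(name, log))
    log.append('subtasks done')
    # Only now yield to the top-level runner.
    yield


log = []
with task(log):
    log.append('rest of run')
```

With this shape no explicit cleanup function is needed: by the time the rest of the run starts, the subtasks have already cleaned up in reverse order.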
Josh Durgin
935e8685e6 ceph: allow restarting radosgw
Only split once, since radosgw will have client.X after it.
Monitors and MDSs may have names with more .s as well.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-30 16:39:48 -07:00
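
A minimal sketch of the "split once" idea in the commit above. The helper name split_role and the example role strings are assumptions made for illustration; the actual strings handled by ceph.py are not shown in this log.

```python
def split_role(role):
    """Split a role string into (daemon type, daemon id) at the first dot only."""
    daemon_type, daemon_id = role.split('.', 1)
    return daemon_type, daemon_id


# The id keeps any further dots, so a client-style id survives intact.
print(split_role('rgw.client.0'))
print(split_role('mon.a.b'))
print(split_role('osd.3'))
```

Splitting on every dot would instead break an id such as client.0 into two pieces.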
Josh Durgin
55b16c790b rgw: add to ctx.daemons so it can be stopped/started dynamically
Name the daemon after the client it runs on, since only
one per host is supported anyway.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-30 16:37:48 -07:00
Josh Durgin
4979df32c5 misc: move daemon stopping function to a generic place
This will be useful for other daemons, like radosgw, in the future.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-30 16:35:11 -07:00
Sandon Van Ness
08bf16102a Verbose output on ceph-qa-chef.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-04-30 13:04:28 -07:00
Sage Weil
4f70c898ef misc: default base_test_dir to /home/ubuntu/cephtest
This matches what the teuthworker is currently doing.
2013-04-30 09:15:37 -07:00
Yehuda Sadeh
57404b6a7b swift, s3readwrite: add missing yield
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-04-30 07:06:49 -07:00
Sandon Van Ness
70ce4db476 Disable quiet-mode wget output for ceph-qa-chef
So maybe I can get a better idea of what is causing it to fail.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-04-29 17:11:27 -07:00
Yehuda Sadeh
c8ec76eed8 s3tests, s3readwrite, swift: cleanup explicitly
Cleaning up test dir explicitly after run, so that
consecutive runs don't fail.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-04-29 11:24:04 -07:00
Samuel Just
45df0b264e workunit: use passed refspec rather than checking sha1 again
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-28 12:27:52 -07:00
Sage Weil
de745dba8a install.upgrade: apt-get install instead of upgrade
Upgrade does not actually upgrade in some cases; use install!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-28 10:28:52 -07:00
Sage Weil
1e52fb9b81 install: prefer 'branch' over 'sha1'
The upgrade tasks specify 'branch' in the job file, but the
schedule_suite.sh script sets a sha1 in the overrides.  Make
the upgrade tests actually test an upgrade by preferring branch
over sha1 when both are specified.

This is fragile, but ought to do the trick for now!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-28 09:35:45 -07:00
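
The preference order described in the commit above could look roughly like this. The function name pick_ref and the flat dict layout are assumptions for illustration; the real install task merges job config and overrides in ways this log does not show.

```python
def pick_ref(config):
    """Return (kind, value) for the ref to install, preferring 'branch' over 'sha1'."""
    for key in ('branch', 'sha1'):
        value = config.get(key)
        if value is not None:
            return key, value
    return None


# A job that names a branch keeps it even when an override adds a sha1.
print(pick_ref({'branch': 'next', 'sha1': 'abc123'}))
```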
David Zafman
6b8f1c6bce repair_test.py: Additional test cases
Test repair with more than 1 damaged object and with different types of damage
Regression test for bug #4778

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-04-24 17:39:25 -07:00
Sandon Van Ness
1435cb5442 Merge branch 'next' of github.com:ceph/teuthology into next
2013-04-23 11:23:36 -07:00
Sandon Van Ness
0b50cb5e84 Increase IPMI attempts to try to get around flaky IPMI.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-04-23 11:22:52 -07:00
Sage Weil
7fbe467f2f ceph.conf: enable full debugging on the mon
2013-04-23 11:02:27 -07:00
Sage Weil
48d89c616a ceph-deploy: fix stop command
Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-22 13:01:02 -07:00
Sage Weil
4efed08415 ceph-deploy: stop daemons, archive, then purge[data]
Purge removes logs, and we want to archive those, so explicitly shut down
all daemons before doing the archiving step.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-18 10:15:44 -07:00
Sage Weil
a3c48351a4 ceph.conf: lower mon disk avail warning threshold
Only warn when we hit 90% instead of the default 70%.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit cf4bf09b2c)
2013-04-18 10:15:44 -07:00
Sam Lang
77cf9f4b68 misc: Fix for case status['description'] == None
Skip the machine that has a description, but the
value is None.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Warren Usui <warren.usui@inktank.com>
2013-04-17 17:43:14 -05:00
caleb miles
2bcbf1846a radosgw-admin-rest: Add task for RESTful admin api.
Signed-off-by: caleb miles <caleb.miles@inktank.com>
2013-04-17 08:49:26 -07:00
Sam Lang
750c69b08c misc: Check for 'None' string from yaml
The description attribute from the machines yaml returned by the
locker might be the string 'None'.  Need to explicitly check for
that to avoid using a test dir of /tmp/cephtest/None.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-04-17 10:30:57 -05:00
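
A small sketch of the check described in the commit above. The helper name usable_description is hypothetical, and treating a real None the same way as the string 'None' is an assumption added here for completeness.

```python
def usable_description(status):
    """Return the machine description, or None when it is missing or the string 'None'."""
    desc = status.get('description')
    # YAML round-trips can turn a missing value into the literal text 'None',
    # which is truthy and would otherwise end up in a path component.
    if desc is None or desc == 'None':
        return None
    return desc


print(usable_description({'description': 'None'}))
print(usable_description({'description': 'nightly-run'}))
```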
Sam Lang
1727d9b356 misc: Use pythonic 'is not None' for jobid case
The conditional 'if global_jobid:' evaluates to true
in some cases even when global_jobid is None.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-04-17 10:30:20 -05:00
Sam Lang
c1d47a2c63 misc: Fix name parsing
Use last two digits of year.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-04-17 10:30:02 -05:00
Sam Lang
b37f43db1b lock: Fix import cycle breakage
fa2049f caused an import cycle between lock.py and misc.py.  Move the
needed functions from lock.py to lockstatus.py so that we can avoid the
import cycle.

Signed-off-by: Sam Lang <sam.lang@inktank.com>

Conflicts:
	teuthology/lock.py
2013-04-17 10:28:55 -05:00
Sam Lang
72cbf1157a misc: Use job id and make short path for testdir
Nightlies run on teuthology currently use a testdir of
/home/ubuntu/cephtest, but this causes stale job errors occasionally
from the previous tests not getting properly cleaned up, which prevents
the nightlies from running successfully.

The misc.py get_testdir() function can specify a testdir that is
specific to the job, but previously the path was too long and would
cause separate job failures.

This patch does two things to resolve that.  First, it uses the job id
from the teuthology run if one exists.  This should be a relatively
short number that will identify the job run effectively.  Second,
if the job id isn't available, it creates a shortened form of the
job's name, for example the job name:

teuthology-2013-04-09_23:51:49-rgw-next-testing-basic

becomes:

te1304092351rntb

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-04-17 10:24:16 -05:00
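
The shortening rule is not spelled out in the commit above, so the following is only a guess that reproduces the single example given: two letters of the prefix, two-digit year, month, day, hour, minute, then the first letter of each remaining component. The function name short_job_name is hypothetical and the real get_testdir() logic may differ.

```python
import re


def short_job_name(name):
    """Shorten a job name of the form <prefix>-YYYY-MM-DD_HH:MM:SS-<a>-<b>-..."""
    m = re.match(
        r'(?P<prefix>[^-]+)-(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
        r'_(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})-(?P<rest>.+)$',
        name)
    if m is None:
        raise ValueError('unrecognized job name: %r' % name)
    # First letter of each dash-separated component after the timestamp.
    initials = ''.join(part[0] for part in m.group('rest').split('-') if part)
    return (m.group('prefix')[:2] + m.group('year')[2:] + m.group('month')
            + m.group('day') + m.group('hour') + m.group('minute') + initials)


print(short_job_name('teuthology-2013-04-09_23:51:49-rgw-next-testing-basic'))
# prints te1304092351rntb
```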
Sage Weil
e8aa0d8bb8 ceph-deploy: purge before archiving
Purge will uninstall and (in so doing) stop the daemons. This avoids trying
to tar up the mon data or logs while they are being written to, which
avoids errors like

2013-04-16T20:21:47.103 INFO:teuthology.task.ceph-deploy:Archiving mon data...
2013-04-16T20:21:47.545 INFO:teuthology.orchestra.run.err:tar: ./ceph-mira089/store.db/000009.log: file changed as we read it

Also drop the unnecessary uninstall (it is implied by purge).

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4befae4fbe)
2013-04-16 20:51:31 -07:00
Sage Weil
33a6693f45 scheduled_suite.sh: check clock skew at start and end of run
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 5c80201ec4)
2013-04-16 19:58:38 -07:00
Dan Mick
f69ddafde7 Revert "Revert "Install.py: Prevent prompts from breaking apt""
This reverts commit 67a616a979.

Sigh.  As it turns out, /etc/default/grub being hacked also
causes the same problem.  I think there's a way to fix that cleanly
as well, but until then, replacing the "accept installed version"
hack here so jobs can run.
2013-04-15 11:24:31 -07:00
Dan Mick
67a616a979 Revert "Install.py: Prevent prompts from breaking apt"
This reverts commit 5995ae7e78.

With the changes to ceph-qa-chef and the teuthology kernel task,
we're no longer touching packaged file /etc/grub.d/10_linux, which
was the reason for this apt forcing.  Remove so that we find other
package problems that might be masked by this; we can always
put it back if there are such problems until we can fix those as well.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit c2b0828b19)
2013-04-12 15:49:24 -07:00
Dan Mick
52cdaae683 kernel.py: put submenu name in 01_ceph_kernel if necessary
We had been writing 01_ceph_kernel with the kernel title, and
relying on the fact that grub.cfg would never have submenus in it
(implemented by a hack to /etc/grub.d/10_linux which neutered its
submenu creation).  However, that hack was modifying a package file,
and got in the way of later apt commands.  Rather than doing it
that way, this divines the title of the submenu and sets the
default variable to "submenu>kernel", which works to select the
desired kernel.

It depends on there being only one level of submenu, and on the
format of the menuentry and submenu commands, dictated by grub2.
None of this is likely to work at all outside Ubuntu.

Fixes: #4496
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 52aec32a7d)
2013-04-12 15:49:15 -07:00
Sandon Van Ness
9c9baef680 Fix: kdb: doesn't work on mira nodes
Change kernel.py to use ttyS2 for kdb output instead of ttyS1 when
the node is a mira machine. This is a fix for issue #4677
2013-04-09 13:09:39 -07:00
Sandon Van Ness
41028847f8 Install.py: Prevent prompts from breaking apt
Change apt commands to prevent prompts from coming up (forcing
non-interactive mode) so that grub and other packages don't
break teuthology runs.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-04 19:40:21 -07:00
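
One common way to keep apt from prompting is sketched below. The exact options used in Install.py are not shown in this log, so the combination here (DEBIAN_FRONTEND=noninteractive plus dpkg's --force-confdef and --force-confold) is an assumption, and the function name is hypothetical. The sketch only builds the argument list; it does not run anything.

```python
def noninteractive_apt_install(packages):
    """Build (but do not run) an apt-get install command that avoids prompts."""
    return [
        'sudo',
        'env', 'DEBIAN_FRONTEND=noninteractive',
        'apt-get', '-y',
        # Keep the currently installed config file when a package ships a new one.
        '-o', 'Dpkg::Options::=--force-confdef',
        '-o', 'Dpkg::Options::=--force-confold',
        'install',
    ] + list(packages)


print(' '.join(noninteractive_apt_install(['ceph', 'ceph-common'])))
```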
caleb miles
7b3973fff2 radosgw-admin: cluster info -> zone info
Signed-off-by: caleb.miles <caleb.miles@inktank.com>
2013-04-01 20:46:30 -07:00
Samuel Just
d81babffe5 repair_test: add test for repairing read errs and truncations
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 16:38:33 -07:00
Josh Durgin
2a1cdda90d locker: try to make up for apache timeouts
If the lock request succeeds in updating the db, but the client gets a
timeout from apache, they can now try again and get back the machines
they just locked.

Only automatic runs have a description set when locking several
machines, so this does not affect users of teuthology-lock
--lock-many, where no description can be set in the same request.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-29 16:34:15 -07:00
Sage Weil
aeb1bbe414 do not archive on pass if 'archive-on-error: True'
Optional flag makes us not suck down the archive (mostly the logs, which
might be huge for some debugging tests) unless the test has failed.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-29 14:27:20 -07:00
Sage Weil
a40b850eb3 locker: log desc too
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-29 14:27:13 -07:00
Sage Weil
9f46f47b6b run: clean up machine_type thing
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-29 12:19:05 -07:00
Sage Weil
e8afa454d8 ceph_manager: retry set_pool_property on EAGAIN
Retry indefinitely, for now.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-28 15:25:10 -07:00
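
A rough sketch of an indefinite retry on EAGAIN, as the commit above describes. How ceph_manager actually detects EAGAIN is not shown in this log; here the operation is assumed to return a negative errno value, and the names retry_on_eagain, operation, delay, and sleep are all hypothetical.

```python
import errno
import time


def retry_on_eagain(operation, delay=1.0, sleep=time.sleep):
    """Call operation() until it returns something other than -EAGAIN."""
    while True:
        rc = operation()
        if rc != -errno.EAGAIN:
            return rc
        # No attempt limit: the loop keeps retrying until the call goes through.
        sleep(delay)
```

Passing sleep as a parameter keeps the loop easy to exercise without real waiting.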
Sage Weil
b815268b58 run: machine-type: foo, not machine_type: foo
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-28 15:25:10 -07:00
Sage Weil
7dca4aee9e Merge pull request #6 from ceph/wip-mds-thrasher-logging
task/mds_thrash: Log mds dump after long delay
2013-03-27 08:56:04 -07:00
Sam Lang
6fd7ebd44d task/mds_thrash: Log mds dump after long delay
In cases where the mds thrasher continuously loops
waiting for an mds to be removed from the map, or
for a new mds to become active, we want to start logging
the mds state for debugging.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-03-27 08:48:45 -05:00
Sage Weil
bc54a8bfaa locker: make desc optional
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-26 13:27:53 -07:00