371 Commits

Author SHA1 Message Date
Sage Weil
7523ff3e58 ceph: simplify 'cluster' mon log handling
It's not a special file in the mon_data directory anymore, but instead
something in archive that will get slurped up normally.  Make sure we
grep for badness from the proper location.
2012-06-06 13:32:56 -07:00
Eleanor Cawthon
23c729305a task/: Added object map benchmarking test
Signed-off-by: Eleanor Cawthon <eleanor.cawthon@inktank.com>
2012-06-05 15:30:51 -07:00
Sage Weil
d3f855ec81 fix up dist var
This lets you override the default (now precise) in the ceph config yaml,
e.g.

- ceph:
    dist: oneiric
    branch: master
2012-05-31 21:39:33 -07:00
Dan Mick
af4fe154d8 Change hardcoded oneiric to precise
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-05-31 17:09:20 -07:00
Sage Weil
62f8f006b3 rbd.xfstests: default to 250mb instead of 100mb 2012-05-20 20:50:19 -07:00
Sage Weil
3d1fff89c9 rbd_fsx: resize to byte boundaries (not object multiples) 2012-05-05 21:22:30 -07:00
Sage Weil
396d1feff9 ceph.newdream.net -> ceph.com 2012-05-05 09:30:41 -07:00
Sage Weil
715abdea56 ignore syslog cron noise 2012-05-01 22:26:03 -07:00
Sage Weil
dcbb8d4013 osd_recovery: test no* osdmap flags 2012-04-30 11:13:02 -07:00
Sage Weil
6cf876733a filestore_idempotent: url has changed 2012-04-21 13:36:27 -07:00
Sage Weil
e3af087712 rbd_fsx: show progress
The updated fsx takes this arg.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-19 13:32:01 -07:00
Sage Weil
6a58314d46 fix misc checks that wait for N osds to be up
These all cut&pasted broken code, blah!
2012-04-19 12:44:10 -07:00
Sage Weil
407b2e0bc7 whitelist xfs_fsr syslog noise
Ignore lines like

2012-04-17T13:44:11-07:00 plana59 fsr[5454]: DEBUG: fsize=450560 blsz_dio=450560 d_min=512 d_max=2147483136 pgsz=4096
2012-04-18 11:21:10 -07:00
Josh Durgin
e875b89f93 Add task for running fsx on an rbd image. 2012-04-17 08:59:51 -07:00
Sage Weil
19e673ccf9 filestore_idempotent: use new sequence-based tester
random seed, inject at 50-300.
2012-04-14 14:06:12 -07:00
Sage Weil
6ba4efcd3a rbd.py: add xfstests functionality
Add tasks for running xfstests over a pair of rbd volumes.  The main
one is called xfstests, and it sets up rbd volumes of specified size
and runs a set of likely-to-be-successful tests.  The other one is
used by the first, and is called run_xfstests.  This provides a
generic (device rather than rbd device oriented) interface to
xfstests, and should probably be made standalone and distinct from
rbd at some point.

Using multiple rbd devices required the rbd udev rule manipulation
to ignore errors, since it appears that each device caused a
teardown attempt, which leads to failures the second time around.
There's probably a more robust solution, but this works for now.

Signed-off-by: Alex Elder <elder@dreamhost.com>
2012-04-13 22:28:05 -07:00
Josh Durgin
ddb98f7773 ceph_manager: don't try to start greenlet twice
spawn already scheduled it. Trying to start it again hits an assert.
2012-04-10 16:23:58 -07:00
Sage Weil
1ac5554d75 kernel: kludge around mysterious 0-byte .git/HEAD files
No idea where these are coming from, but they break nodes with behavior
like

ubuntu@plana08:~$ sudo install -d -m0755 /lib/firmware/updates && cd /lib/firmware/updates && sudo git init
Reinitialized existing Git repository in /lib/firmware/updates/.git/
ubuntu@plana08:/lib/firmware/updates$ sudo git --git-dir=/lib/firmware/updates/.git config --get remote.origin.url >/dev/null || sudo git --git-dir=/lib/firmware/updates/.git remote add origin git://ceph.newdream.net/git/linux-firmware.git
ubuntu@plana08:/lib/firmware/updates$ cd /lib/firmware/updates && sudo git pull origin master
fatal: Not a git repository (or any of the parent directories): .git

where the .git directory looks like

total 32
drwxr-xr-x 7 root root 4096 2012-04-10 12:52 .
drwxr-xr-x 3 root root 4096 2012-04-06 13:54 ..
drwxr-xr-x 2 root root 4096 2012-04-06 13:54 branches
-rwxr--r-- 1 root root  236 2012-04-10 11:33 config
-rw-r--r-- 1 root root    0 2012-04-10 12:52 config.lock
-rw-r--r-- 1 root root    0 2012-04-06 13:54 description
-rw-r--r-- 1 root root    0 2012-04-06 13:54 FETCH_HEAD
-rw-r--r-- 1 root root    0 2012-04-06 13:54 HEAD
drwxr-xr-x 2 root root 4096 2012-04-06 13:54 hooks
drwxr-xr-x 2 root root 4096 2012-04-06 13:54 info
drwxr-xr-x 4 root root 4096 2012-04-06 13:54 objects
drwxr-xr-x 4 root root 4096 2012-04-06 13:54 refs

Hopefully someone can figure out what is causing this and revert this
later.
2012-04-10 13:41:16 -07:00
Sage Weil
0d5918f8e4 kernel: reset to remote firmware branch; don't pull
Pull might merge if upstream rebases.  Just make our branch match the
remote one.
2012-04-10 09:17:24 -07:00
Sage Weil
9b755fd665 kernel: change git incantation for firmware pull
The 'git pull <uri>' seemed to consistently fail on some nodes.  Can't be
sure this was really the problem with them all down now, but this is more
common, and works.
2012-04-10 09:12:01 -07:00
Dan Mick
9906d5ed08 Change to local mirror of linux-firmware repo to try to stop failures 2012-04-09 16:58:59 -07:00
Mark Nelson
3d7f1db731 Kernel: Pull linux-firmware from git
Signed-off-by: Mark Nelson <nhm@clusterfaq.org>
2012-04-05 08:49:19 -07:00
Samuel Just
b4aa098f47 make Thrasher not inherit from Greenlet 2012-03-29 18:08:19 -07:00
Samuel Just
394d8b1ebd Add test for object source marked down 2012-03-29 18:08:19 -07:00
Samuel Just
749826c29b allow use of a separate journal block device 2012-03-27 17:18:44 -07:00
Josh Durgin
e30b7710f5 rbd: fix typo in default config
pyflakes would have caught this if 'all' weren't a built-in function
2012-03-26 11:57:07 -07:00
Sage Weil
397e7f2f7b add osd_recovery task to test divergent osd logs 2012-03-24 21:09:19 -07:00
Sage Weil
1c1192a9fb backfill: use 'rbd' pool instead of 'data'
(data has a replay interval, which makes writes take longer to resume
after repeering)
2012-03-24 21:09:19 -07:00
Sage Weil
ca9a5a4ac4 rename backfill -> osd_backfill 2012-03-24 16:05:11 -07:00
Samuel Just
91c08f6eee Add watch op to rados.py
Signed-off-by: Samuel Just <sam.just@dreamhost.com>
2012-03-20 19:00:12 -07:00
Josh Durgin
bdb72c282f filestore_idempotent: get coverage and coredumps 2012-03-19 11:57:02 -07:00
Sage Weil
94f0ba1efe run valgrind with cwd set to /tmp/cephtest/archive/coredump
This lets us capture the vgcore.* files, which always go to valgrind's
cwd.

Fixes: #1953
2012-03-18 10:48:51 -07:00
Mark Nelson
e14d428c98 Merge branch 'master' of github.com:ceph/teuthology 2012-03-14 15:32:23 -05:00
Sage Weil
1a01ccaafb Pull from new gitbuilder.ceph.com locations.
Simplifies the flavor stuff into a tuple of

<package,type,flavor,dist,arch>

where package is ceph, kernel, etc.
type is tarball, deb
flavor is basic, gcov, notcmalloc
arch is x86_64, i686 (uname -m)
dist is oneiric, etc. (lsb_release -s -c)
2012-03-13 10:02:26 -07:00
Mark Nelson
3833ada8b9 Made the example better with multiple roles. 2012-03-12 15:13:36 -05:00
Mark Nelson
0a61ffad4c Added some example yaml files and an example parallel execution task. 2012-03-12 14:33:10 -05:00
Sage Weil
008cf7fd95 autotest: pull from github.com/ceph/autotest 2012-03-10 19:15:21 -08:00
Sage Weil
2124129e70 workunit: include python2.7 path too 2012-03-10 15:34:19 -08:00
Samuel Just
ddc1ab0c03 rados.py: include setattr and rmattr
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-03-08 16:14:44 -08:00
Sage Weil
dc1abab211 github.com/NewDreamNetwork -> github.com/ceph 2012-03-02 10:55:56 -08:00
Josh Durgin
a80246c17f dump_stuck: note required ceph configuration 2012-02-29 15:47:17 -08:00
Josh Durgin
85cc96c11a dump_stuck: verify that 'ceph health' mentions the right number of inactive/unclean/stale pgs 2012-02-28 13:55:46 -08:00
Sage Weil
999e21928c peer: ignore +scrubbing portion of pg state
It can cause the mon state and osd states to not match.
2012-02-28 09:50:29 -08:00
Sage Weil
84cd4ed6c3 peer: wait for peering to complete, or block
We need to wait for peering to either complete, or block because it is
waiting for another PG.  _Then_ look at all the PG states and compare the
mon values with what we get from querying the OSDs directly.
2012-02-25 21:05:00 -08:00
Josh Durgin
b8739585a0 peer: remove unused variable 2012-02-24 15:01:34 -08:00
Josh Durgin
e4801819f2 rgw: simplify valgrind args 2012-02-24 14:56:42 -08:00
Sage Weil
edbb41e1f8 add peer task
Force a pg to get stuck in 'down' state, verify we can query the peering
state, then start the OSD so it can recover.
2012-02-24 15:05:17 -08:00
Sage Weil
7ac04a422a lost_unfound: list missing/unfound for each pg and verify the unfound counts
This also tests the pg list_missing functionality.
2012-02-24 12:42:39 -08:00
Sage Weil
c43e87d118 ceph_manager: list_pg_missing
List missing objects for the given pgid.
2012-02-24 12:42:39 -08:00
Josh Durgin
c93a08eda0 Whitespace and unnecessary formatting fixes 2012-02-24 12:05:35 -08:00