Sage Weil
196d4a1f16
wait_till_clean -> wait_for_clean and wait_for_recovery
...
Clean now also means the correct number of replicas, whereas recovered
means we have done all the work we can do given the replicas/osds we have.
For example, degraded and clean are now mutually exclusive.
Also move away from 'till'.
2012-02-17 21:53:25 -08:00
Sage Weil
ad9d7fb6e1
backfill: wait for clean before writing+blackholing
...
If we have straggler pgs and blackhole osd.1, we can deadlock because we
need info from that osd to repeer and continue. Make sure we're clean, and
then start the write + blackhole + kill test.
2012-02-14 15:24:11 -08:00
Sage Weil
6f3abc6ced
ceph_manager: mark in a bit more often than out
...
Otherwise we can get into cases where many/most nodes are out, and things
don't work as well. e.g., crush may start to fail.
2012-02-13 15:28:24 -08:00
Sage Weil
af4ce44233
ceph: use any fs, not just btrfs, on scratch devices
...
The
    btrfs: true
syntax is replaced with
    fs: btrfs
or ext4, xfs.
2012-02-13 15:28:24 -08:00
Josh Durgin
0cd16cf03d
ceph: always add logger for daemons
...
The extra log function added redundant info and didn't allow different
levels.
2012-02-02 09:36:04 -08:00
Josh Durgin
7af7c66bd0
ceph: rename type parameter to type_
...
type is a built-in and shouldn't be aliased.
2012-02-02 09:35:58 -08:00
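A minimal sketch of the problem behind the type_ rename, assuming nothing about the real function in the ceph task. Both function names below are hypothetical and exist only for illustration.

```python
def describe_shadowed(type, value):
    # The parameter named `type` hides the built-in inside this function,
    # so type(value) calls the argument (a string here) and fails.
    try:
        return type(value).__name__
    except TypeError:
        return "built-in type() is shadowed"


def describe(type_, value):
    # With the trailing underscore the built-in stays reachable.
    return "%s daemon, id is a %s" % (type_, type(value).__name__)
```

Calling describe_shadowed("osd", 3) hits the TypeError branch, while describe("osd", 3) can still use the built-in.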
Josh Durgin
7146db9215
ceph: use the correct comparison operator
...
is compares identity (i.e. address in cpython), not value.
2012-02-02 09:27:04 -08:00
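The difference between the two operators in the commit above can be shown with two equal lists. This is a generic illustration, not the code that was changed.

```python
a = [1, 2, 3]
b = list(a)  # equal contents, but a distinct object

same_value = (a == b)    # compares contents
same_object = (a is b)   # compares identity
alias_is_same = (a is a)  # an object is always identical to itself
```

Here same_value is True and same_object is False, which is why `is` is the wrong operator for value comparisons.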
Josh Durgin
e7672b6433
ceph: sync before unmounting btrfs devices
...
There may still be writes in flight, since the osds may not have
shutdown cleanly. This should prevent EBUSY when unmounting.
Fixes: #1997
2012-02-02 09:26:45 -08:00
Josh Durgin
1364b8826f
ceph: delay raising exceptions until all daemons are stopped
...
If a daemon crashes, the exception is raised when we stop it. This
caused some daemons to continue running during cleanup, since the rest
of the daemons of the same type would not be shut down. Also log each
daemon that crashed, for easier debugging.
Fixes: #1744
2012-02-02 09:26:25 -08:00
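The shutdown pattern described in the commit above (stop everything, log each crash, raise afterwards) might look roughly like this. FakeDaemon and stop_all are hypothetical names for this sketch and are not teuthology's classes or functions.

```python
import logging

log = logging.getLogger(__name__)


class FakeDaemon(object):
    """Hypothetical stand-in for a daemon handle."""

    def __init__(self, name, crashed=False):
        self.name = name
        self.crashed = crashed
        self.stopped = False

    def stop(self):
        self.stopped = True
        if self.crashed:
            raise RuntimeError("%s crashed" % self.name)


def stop_all(daemons):
    """Stop every daemon, then re-raise the first failure, if any."""
    first_error = None
    for daemon in daemons:
        try:
            daemon.stop()
        except Exception as e:
            # Log each crashed daemon, but keep stopping the rest.
            log.exception("daemon %s failed", daemon.name)
            if first_error is None:
                first_error = e
    if first_error is not None:
        raise first_error
```

The point of the design is that one crashed daemon no longer prevents the remaining daemons from being stopped during cleanup.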
Sage Weil
0236dc0f5e
add backfill task
...
This does a basic test of backfill functionality, including a divergent
log on a backfill target (#1983).
2012-01-31 16:25:53 -08:00
Sage Weil
e337c4727c
ceph_manager: add manager.blackhole_kill_osd()
...
This will suspend disk writes for a couple of seconds and then kill the
daemon. It helps us simulate a hardware failure.
2012-01-31 16:13:59 -08:00
Tommi Virtanen
d7be77628c
Allow user to disable lock checking.
...
The new plana hardware isn't in the old sepia lock database,
and the machine pools are risky to merge as nothing in the
software guarantees allocation from just one pool. This allows
us to hand-allocate machines temporarily.
2012-01-31 08:05:36 -08:00
Tommi Virtanen
09bed16408
Allow user to provide flavor to use.
...
With this, you can use Ubuntu 11.10 machines with teuthology by saying::
    tasks:
    - ceph:
        flavor: oneiric
...
2012-01-31 07:59:43 -08:00
Josh Durgin
f84b4aa5e3
Add admin socket task.
...
This simply gets the output of an admin socket command, makes sure
it's json, and runs a user-provided test script on it.
2012-01-27 17:13:36 -08:00
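A rough sketch of the check the admin socket task describes: parse the command output as JSON, then apply a user-supplied test. The function name and the use of a callable for the test are assumptions for this sketch; the real task may accept its test in a different form.

```python
import json


def check_admin_socket_output(raw, test):
    """Parse raw command output as JSON and apply a user-supplied check."""
    data = json.loads(raw)  # raises ValueError if the output is not JSON
    if not test(data):
        raise AssertionError("test failed for output: %r" % (data,))
    return data
```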
Samuel Just
4aa9ca4551
CephManager: base timeout on time since last change in active+clean
...
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-01-24 11:28:38 -08:00
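One way to read the CephManager change above is that the timeout clock restarts whenever the active+clean count changes, so a slow but progressing cluster is not failed. The sketch below assumes that reading; wait_for_all_clean and its parameters are hypothetical and are not the CephManager API.

```python
import time


def wait_for_all_clean(get_num_active_clean, total_pgs, timeout,
                       sleep=time.sleep, clock=time.time):
    """Wait until every PG is active+clean.

    The timeout counts from the last change in the active+clean count,
    not from the start of the wait.
    """
    last_count = get_num_active_clean()
    last_change = clock()
    while last_count != total_pgs:
        if clock() - last_change > timeout:
            raise RuntimeError("no progress toward active+clean")
        sleep(1)
        count = get_num_active_clean()
        if count != last_count:
            # Progress was made: restart the timeout window.
            last_count = count
            last_change = clock()
    return last_count
```

The sleep and clock parameters are injectable only so the loop can be exercised without real waiting.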
Josh Durgin
29885f3e42
kernel: ignore connection problems while waiting for reboot
2012-01-18 17:49:05 -08:00
Sage Weil
45e4c924fa
thrashosds: maxdead default to 0
...
This avoids any possibility of blocking peering.
2012-01-17 09:24:54 -08:00
Sage Weil
bf22a4fb92
task/rados: use new usage for radosmodel tool
2012-01-16 16:53:55 -08:00
Sage Weil
71390f9784
thrashosds: fix action selection
...
I'm not sure what the old code was trying to do, but I'm pretty sure it
wasn't doing it correctly: a chance_down of .1 was killing an OSD for me
virtually every time.
2012-01-16 15:05:43 -08:00
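For reference, a conventional weighted choice in which an action with weight .1 is picked about 10% of the time could be sketched as follows. This is not a reconstruction of the actual fix; choose_action and the action names are hypothetical.

```python
import random


def choose_action(actions, rng=random):
    """Pick a name from (weight, name) pairs, proportionally to weight."""
    total = sum(weight for weight, _ in actions)
    val = rng.uniform(0, total)
    for weight, name in actions:
        if val < weight:
            return name
        val -= weight
    # Guard against floating-point edge cases at the upper bound.
    return actions[-1][1]
```

With actions such as [(0.1, "down"), (0.9, "noop")], "down" should come up in roughly one draw out of ten.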
Sage Weil
8fc6086986
thrashosds: make actions less nonsensical
...
Make marking OSD up/down and in/out totally orthogonal.
Signed-off-by: Sage Weil <sage@newdream.net>
2012-01-16 15:05:43 -08:00
Sage Weil
b58f9560ea
ceph: ignore all leaks
...
unless/until we figure out where the DefinitelyLost records are coming
from.. at first glance they look bogus.
2012-01-16 09:55:47 -08:00
Sage Weil
40fb86ff81
ceph: take single arg or list for valgrind args
2012-01-16 09:22:45 -08:00
Sage Weil
c88ec5719e
combined mon, osd, mds starter functions
2012-01-15 22:54:09 -08:00
Sage Weil
f8ec23e79d
rbd: default to all:
2012-01-15 22:53:39 -08:00
Sage Weil
72057a9cd8
use local mirrors for (most) github urls
...
A cronjob on ceph.newdream.net updates these every 15 minutes. Sigh.
2012-01-15 22:52:58 -08:00
Sage Weil
f70b158cd1
show host -> roles mapping on startup
...
Less guessing when manually inspecting an in-progress or hung run.
2012-01-15 22:52:58 -08:00
Sage Weil
f795261454
lost_unfound: make test work with backfill
...
If we backfill, we fail to peer instead of having every object show up as
'unfound'. Avoid that by preventing log trimming, so that we always do
log recovery for this test.
2012-01-15 22:52:58 -08:00
Tommi Virtanen
3bfa41cf6a
Use yaml.safe_dump so unicode doesn't mess up the yaml files.
...
In general, yaml.dump is comparable to pickle, and my personal
coding standard says *never* use it. yaml.safe_dump is much nicer.
yaml.dump should have been named yaml.unsafe_dump, yaml.safe_dump
should have been named yaml.dump :(
2012-01-13 11:26:36 -08:00
Josh Durgin
96e89d30ec
kernel: loop reconnecting in case we race with shutdown
...
Previously, if we reconnected before shutdown completed we asserted
that the kernel did not boot into the new version, when we just needed
to wait for the machine to reboot.
2012-01-12 13:02:22 -08:00
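The retry behaviour described in the commit above, where an early reconnect may still see the old kernel, can be sketched as a loop that keeps trying until the expected version appears. All names and parameters here are hypothetical; the real kernel task connects and checks versions in its own way.

```python
import time


def reconnect_until_version(connect, get_version, want_version,
                            tries=10, delay=1.0, sleep=time.sleep):
    """Reconnect repeatedly until the host reports want_version."""
    for _ in range(tries):
        try:
            conn = connect()
            if get_version(conn) == want_version:
                return conn
            # Old version: we probably reconnected before shutdown finished.
        except (IOError, OSError):
            # Connection problems are expected while the machine reboots.
            pass
        sleep(delay)
    raise RuntimeError("host did not come up with %r" % (want_version,))
```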
Sage Weil
59369237c9
thrasher: don't mark down osds out; tell monitor same
...
Stopping ceph-osd doesn't make it out (immediately). Prevent monitor
from doing this after a delay too so we can keep our notion of what is
up/down/in/out accurate.
2012-01-11 12:54:09 -08:00
Sage Weil
3c0346b4cb
lost_unfound: typo
2012-01-11 12:54:09 -08:00
Sage Weil
6dae2f8ae3
thrasher: adjust min_dead default
...
Make this 1, not 2. That's a bit more friendly. It doesn't strictly
matter, tho, since we revive osds before waiting for clean.
2012-01-11 12:54:09 -08:00
Sage Weil
fb74b90152
thrasher: add max_dead
...
Add max_dead, and revive osds prior to waiting for clean. Otherwise we
can leave too many OSDs down and the cluster will never go clean.
2012-01-11 12:54:08 -08:00
Sage Weil
50463ffddd
verify all osds start before checking health
...
Just checking health isn't good enough, since it races with OSD startup:
we can have a healthy cluster with 0 (or something else < total) OSDs.
2012-01-11 12:54:08 -08:00
Josh Durgin
f4883ebf09
ceph: let the user running ceph-osd remove subvolumes
...
This will prevent EPERM when using the SNAP_DESTROY ioctl,
so the filestore will use btrfs snaps.
2012-01-10 16:07:04 -08:00
Josh Durgin
d2fadf9fe2
syslog: ignore lockdep non-static key warning
...
It looks like this warning was made default in linux 3.2.
This will keep happening until #1922 is done.
2012-01-10 15:28:42 -08:00
Sage Weil
13445d237b
ceph_manager: a booting osd is no longer automatically marked in
...
as of ceph.git commit 96b7b0d83e
2012-01-06 17:21:38 -08:00
Sage Weil
001701a0f7
mon_recovery: need n/2 + 1 monitors for quorum
2012-01-06 15:12:15 -08:00
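The quorum rule in the mon_recovery subject above is a strict majority, which in integer arithmetic is floor(n / 2) + 1. The helper name below is hypothetical.

```python
def quorum_size(num_mons):
    """Smallest number of monitors that forms a strict majority."""
    return num_mons // 2 + 1
```

For example, 3 monitors need 2 for quorum, and both 4 and 5 monitors need 3.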
Josh Durgin
ec3a3a9654
rados: fix example config
2012-01-03 14:07:45 -08:00
Josh Durgin
0176c9ab0f
Remove unused mon.0 variables.
2012-01-03 13:02:31 -08:00
Josh Durgin
2e9b1c75f9
rados: use testrados instead of testsnaps and testreadwrite
2012-01-03 13:02:29 -08:00
Josh Durgin
932257fb6e
rados: remove unused variable
2011-12-30 14:37:45 -08:00
Josh Durgin
0af9c0a2e7
rados: clean up argument construction
...
Only the client id varies, so it can be done outside the loop. Also
handle coredumps and coverage, and use LD_LIBRARY_PATH instead of
LD_PRELOAD.
2011-12-30 14:37:45 -08:00
Josh Durgin
6df4ce5075
rados: fix references to testrados
2011-12-30 14:37:45 -08:00
Josh Durgin
cdf142b597
rados: fix documentation format
2011-12-30 14:37:45 -08:00
Mark Kampe
f04e29557e
teuthology rgw-admin: annotated test cases for inventory
...
this is not a nose suite, so I simply added test case
descriptions in csv format, and put a file to extract
them at the top of the file.
Signed-off-by: Mark Kampe <mark.kampe@dreamhost.com>
2011-12-29 13:09:08 -08:00
Josh Durgin
d0e90d71bd
syslog checking: forgot a pipe
2011-12-16 18:09:17 -08:00
Yehuda Sadeh
7eec30946d
roundtrip: add task
2011-12-15 13:24:53 -08:00
Yehuda Sadeh
97cc6c2990
readwrite: fix task with default conf
2011-12-15 12:39:39 -08:00
Yehuda Sadeh
659e66aa09
readwrite: fix conf, task runs
2011-12-14 17:14:30 -08:00