Commit Graph

37 Commits

Author SHA1 Message Date
Warren Usui
694827bc0c Allow scrubbing while thrashing
Added the ability to scrub while thrashing: scrub_interval in the
config can be set to an interval, in the same way as clean_interval.
It defaults to 0, which means that no scrubbing takes place.
Added a scrub_interval description to the thrashosds docstring.

Fixes: 7199
Signed-off-by: Warren Usui <warren.usui@inktank.com>
2014-04-28 11:00:38 -07:00
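
The scrub_interval default described in this commit can be sketched as follows. This is a minimal illustration, not the code from the commit: it assumes the task config arrives as a plain dict (or None), and the helper names get_scrub_interval and scrubbing_enabled are hypothetical.

```python
def get_scrub_interval(config):
    """Return the configured scrub interval; 0 when it is not set.

    Assumption: `config` is a plain dict (or None), as a task config
    parsed from YAML might be. The function name is hypothetical.
    """
    if config is None:
        config = {}
    return float(config.get('scrub_interval', 0))


def scrubbing_enabled(config):
    """An interval of 0 (the default) means no scrubbing takes place."""
    return get_scrub_interval(config) > 0
```

With this sketch, scrubbing_enabled({}) is false and scrubbing_enabled({'scrub_interval': 60}) is true.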
Zack Cerza
158f9ba1ff Revert "Lines formerly of the form '(remote,) = ctx.cluster.only(role).remotes.keys()'"
This reverts commit d693b3f895.
2014-03-27 11:35:28 -05:00
Warren Usui
d693b3f895 Lines formerly of the form '(remote,) = ctx.cluster.only(role).remotes.keys()'
and '(remote,) = ctx.cluster.only(role).remotes.iterkeys()' would fail with
ValueError and no message if there were zero keys or more than one key.
Now a new function, get_single_remote_value(), is called, which prints
more understandable messages.

Fixes: 7510
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Warren Usui <warren.usui@inktank.com>
2014-03-26 18:43:48 -07:00
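
The idea behind get_single_remote_value() can be sketched as below. The signature and the message wording here are assumptions made for illustration (the commit text does not show them), and the remotes are modeled as a plain dict keyed by remote.

```python
def get_single_remote_value(remotes, role):
    """Return the only key of `remotes`, or raise with a clear message.

    Assumptions: `remotes` is a dict keyed by remote, and this
    two-argument signature is hypothetical.
    """
    keys = list(remotes.keys())
    if len(keys) == 0:
        raise ValueError("no remote found for role %r" % (role,))
    if len(keys) > 1:
        raise ValueError(
            "expected exactly one remote for role %r, found %d"
            % (role, len(keys)))
    return keys[0]
```

Compared with the bare tuple unpacking '(remote,) = ...', the failure now names the role and says whether too few or too many remotes matched.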
Sage Weil
3d0ce6936d thrashosds: allow primary-affinity thrashing to be disabled
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-17 13:16:42 -08:00
Sage Weil
495f2163a8 thrashosds: change min_in from 2 -> 3
See #7171. In rare cases CRUSH can't handle it when only 2/6 of
the OSDs are marked in.  Avoid those situations for now.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-10 11:00:55 -08:00
Warren Usui
a1d8225b7d Added docstrings, and improved some of the comments on several tasks.
2013-10-12 01:35:34 -07:00
Samuel Just
a355d9f570 ceph_manager: add test_map_discontinuity to thrasher
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-26 10:40:58 -07:00
Samuel Just
77cae4bf35 thrashosds: add delay option after recovery
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-22 16:30:57 -07:00
Warren Usui
a4994e3bde Support added for running scheduled tasks on virtual machines.
This included:
    A.) Changes so that full path names are used for some files
        (scheduled tasks start in different home directories).
    B.) Changes to ensure tasks come up on the beanstalkc queue properly.
    C.) Finding and inserting the libvirt equivalent code for virtual machines
        in order to simulate ipmi actions.
    D.) Fixing the host key code and reporting the valgrind issue more clearly.
    E.) Some message and downburst call changes.

    Fix #4988
    Fix #5122
    Signed-off-by: Warren Usui <warren.usui@inktank.com>
2013-06-07 19:32:15 -07:00
Sage Weil
6c9292c80f thrashosds: sync before doing powercycle testing
Hopefully fixes #5112
2013-05-20 12:26:49 -07:00
Samuel Just
5741228f60 ceph_manager: add timeout option to revive, increase for power_cycle
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-05-07 15:51:36 -07:00
Samuel Just
c50b143e92 thrashosds: add test_backfill_full
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2013-03-25 15:39:12 -07:00
Samuel Just
97a5c05141 thrashosds.py: fix line length
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2013-03-25 15:39:11 -07:00
Sam Lang
6be6f6c607 task/thrashosds: Ipmi checking/setup in thrashosds
We don't need to set up the ipmi console on runs that
don't use powercycling, so delay setup of the RemoteConsole
with ipmi to the thrashosds task, and do it only if the powercycle
config is set.  This avoids spurious test failures from flaky
ipmi.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-03-13 10:19:48 -05:00
Josh Durgin
8f9267cf0e thrashosds: note assumption for powercycling
2013-01-31 09:14:06 -08:00
Sam Lang
58111595d4 Support power cycling osds/nodes through ipmi
This patch defines a RemoteConsole class associated
with each Remote class instance, allowing
power cycling a target through ipmi.

Fixes/Implements #3782.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-31 08:23:37 -06:00
Samuel Just
3a5c70b89b ceph_manager: turn long stall injection off by default
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-01-24 17:31:38 -08:00
Samuel Just
6a859bcd56 ceph_manager: use 80/70 as pause_long, pause_check_after defaults
OSD::op_tp suicides after 150.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-01-24 12:50:26 -08:00
Samuel Just
ec5a14553f ceph_manager: default chance_down to 0.4
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-01-23 17:44:05 -08:00
Samuel Just
566ae5332e ceph_manager: add filestore and heartbeat stalls
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-01-23 17:40:40 -08:00
Samuel Just
f2dbe5edd7 CephManager: add ability to test split
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-11 15:11:06 -08:00
Samuel Just
bd83ed70dc ceph_manager: add test_min_size action
The thrasher can now test min_size, with configurable frequency, by
taking down all but one osd, waiting, killing that osd, bringing
back the others, and verifying that the cluster goes clean.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-11-07 12:56:31 -08:00
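
The order of operations in test_min_size can be sketched with a toy model. Everything here is hypothetical (ToyCluster, run_min_size_sequence, and the choice of the lowest-numbered osd as the survivor); it only illustrates the kill and revive ordering from the message above, and the final "cluster goes clean" check is left out because it needs a real cluster.

```python
class ToyCluster:
    """Hypothetical stand-in that only tracks which osds are up or down."""

    def __init__(self, osds):
        self.up = set(osds)
        self.down = set()
        self.log = []

    def kill(self, osd):
        self.up.discard(osd)
        self.down.add(osd)
        self.log.append(('kill', osd))

    def revive(self, osd):
        self.down.discard(osd)
        self.up.add(osd)
        self.log.append(('revive', osd))


def run_min_size_sequence(cluster, wait=lambda: None):
    """Take down all but one osd, wait, kill that osd, revive the rest."""
    osds = sorted(cluster.up)
    survivor, others = osds[0], osds[1:]
    for osd in others:
        cluster.kill(osd)
    wait()                      # placeholder for the waiting period
    cluster.kill(survivor)
    for osd in others:
        cluster.revive(osd)
    # A real test would now verify that the cluster goes clean.
    return survivor
```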
Sage Weil
196d4a1f16 wait_till_clean -> wait_for_clean and wait_for_recovery
Clean now also means the correct number of replicas, whereas recovered
means we have done all the work we can do given the replicas/osds we have.
For example, degraded and clean are now mutually exclusive.

Also move away from 'till'.
2012-02-17 21:53:25 -08:00
Sage Weil
45e4c924fa thrashosds: maxdead default to 0
This avoids any possibility of blocking peering.
2012-01-17 09:24:54 -08:00
Sage Weil
6dae2f8ae3 thrasher: adjust min_dead default
Make this 1, not 2.  That's a bit more friendly.  It doesn't strictly
matter, tho, since we revive osds before waiting for clean.
2012-01-11 12:54:09 -08:00
Sage Weil
fb74b90152 thrasher: add max_dead
Add max_dead, and revive osds prior to waiting for clean.  Otherwise we
can leave too many OSDs down and the cluster will never go clean.
2012-01-11 12:54:08 -08:00
Josh Durgin
f4d527e743 thrashosds: timeout for every clean check, not just the last one
2011-11-17 11:11:33 -08:00
Samuel Just
a3c886af19 ceph.py/cephmanager.py: add ctx.daemons for restarting daemons
ctx.daemons will now be an instance of CephState.

Use ctx.daemons.get_daemon(role, id).stop() to stop a daemon, restart() to
restart the daemon, etc.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-09-15 17:08:34 -07:00
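
The access pattern named in this commit, get_daemon(role, id) followed by stop() or restart(), can be sketched with a small registry. The classes below are hypothetical toys, not the CephState implementation, and they only flip a flag where real code would manage a process.

```python
class ToyDaemon:
    """Hypothetical daemon handle; tracks only a running flag."""

    def __init__(self, role, id_):
        self.role = role
        self.id_ = id_
        self.running = False

    def restart(self):
        # Real code would stop any running process and start a new one.
        self.running = True

    def stop(self):
        self.running = False


class ToyDaemonRegistry:
    """Hypothetical registry keyed by (role, id), looked up via get_daemon."""

    def __init__(self):
        self._daemons = {}

    def add_daemon(self, role, id_):
        daemon = ToyDaemon(role, id_)
        self._daemons[(role, id_)] = daemon
        return daemon

    def get_daemon(self, role, id_):
        # Returns None when no such daemon was registered.
        return self._daemons.get((role, id_))
```

A caller would then write registry.get_daemon('osd', '0').stop(), mirroring the ctx.daemons call in the commit message.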
Josh Durgin
1970bad9d9 thrashosds: fix timeout when no options are specified
2011-09-09 10:31:08 -07:00
Josh Durgin
8dd52f9941 thrashosds: fail if cluster doesn't finally become clean in 5 minutes
2011-09-08 18:09:11 -07:00
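
Failing when the cluster does not become clean within 5 minutes amounts to polling with a deadline. A minimal sketch, assuming a caller-supplied predicate that reports whether the cluster is clean; the function name and parameters are hypothetical, and only the 300-second default comes from the commit title.

```python
import time


def wait_until(predicate, timeout=300, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it is true; raise once `timeout` seconds pass.

    `clock` and `sleep` are injectable so the loop can be exercised
    without real waiting.
    """
    start = clock()
    while not predicate():
        if clock() - start >= timeout:
            raise RuntimeError(
                "condition not met within %s seconds" % timeout)
        sleep(interval)
```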
Josh Durgin
b72c5a8363 thrashosds: wait for every pg to go active and clean before exiting
2011-09-08 14:07:23 -07:00
Sage Weil
c502418fca thrashosds: make it work when first mon isn't mon.0
2011-09-01 12:56:29 -07:00
Sage Weil
3ce1cbb3c4 thrashosds: no camelcaps, add some whitespace
2011-09-01 12:56:29 -07:00
Greg Farnum
fb33ef3c69 thrasher: improve documentation a little
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-08-25 15:27:30 -07:00
Greg Farnum
0f9b74e28c thrasher: allow a config to set values
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-08-25 15:18:42 -07:00
Josh Durgin
5fadb1c11c Whitespace and style cleanup.
2011-07-11 18:07:37 -07:00
Samuel Just
883991a057 added thrashosds
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-06-13 17:01:02 -07:00