Josh Durgin
f28f881bda
ceph_manager: test offline split via ceph-objectstore-tool
...
When killing an osd, split all pools with a low threshold.
This will slow down tests, but should not impact correctness.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-08-26 00:28:32 +00:00
Josh Durgin
c12b3513a7
Merge pull request #1003 from athanatos/wip-15655
...
ceph_manager: test [test-]reweight-by-(utilization|pg)
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-07-22 14:46:21 -07:00
Samuel Just
19854c095b
ec_lost_unfound: set min_size to 2
...
We changed the default to k+1 instead of k. Adjust test to compensate.
Fixes: http://tracker.ceph.com/issues/16416
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-07-05 14:05:12 -07:00
Josh Durgin
274d79ade3
tasks/ceph_manager: make utility_task cluster-aware
...
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:52:00 -07:00
Josh Durgin
256ebf8a12
tasks: move find_remote to util, rename and add helper
...
This is useful for any cluster-aware task.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
bba323834b
tasks/ceph_manager: make Thrasher cluster-aware
...
Just a few spots need to know to look up only osds in this cluster, or
prefix a filename with the cluster. Use CephManager.find_remote() to
avoid a bunch of repetition and look only in the intended cluster.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
713e717fda
tasks/ceph_manager: make mount_osd_data() cluster-aware
...
Use a cluster-specific mount point, and address osds by full role,
rather than just id, in the ctx.disk_config structures.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
524e6d7a5e
tasks/ceph_manager: add cluster param to write_conf()
...
Only used by cephfs right now, so don't bother changing callers.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
ff49deb6f0
tasks/ceph_manager: simplify remote lookup, and make it cluster aware
...
Re-implement find_remote() using ctx.cluster.only() with a matcher
function that includes the manager's cluster, and use it instead of
miscellaneous ctx.cluster.only() calls elsewhere.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
141c73d399
tasks/ceph_manager: parameterize CephManager with cluster
...
Add --cluster arguments, pass cluster to get_daemon() and
iter_daemons_of_role, replace 'ceph' with cluster in paths, and use
ctx.ceph[cluster] instead of ctx.ceph.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:58 -07:00
Samuel Just
482a12f348
ceph_manager: test [test-]reweight-by-(utilization|pg)
...
Fixes: http://tracker.ceph.com/issues/15655
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-29 15:11:19 -07:00
David Zafman
8da6e97bd4
CephManager: Wait 1 second for pool creation to get far enough along
...
Fixes: http://tracker.ceph.com/issues/15673
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 14:29:13 -07:00
David Zafman
a595651c54
CephManager: Maximum 2 minutes for raw cluster commands
...
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 09:43:39 -07:00
David Zafman
447bf873a8
thrasher: Add noscrub_toggle_delay and flip the noscrub osd flags
...
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 09:43:39 -07:00
David Zafman
7a528763d1
thrasher: Add dump_ops_enable and optrack_toggle_delay options
...
Add dump_ops_enable, which continuously dumps ops using 3 commands.
Add optrack_toggle_delay to alternately enable and disable op tracking.
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 09:43:38 -07:00
David Zafman
4ad3b86604
ceph_manager: Add timeout to admin_socket/osd_admin_socket
...
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 09:43:38 -07:00
Samuel Just
7e53203e80
rados/singleton-nomsgr: add lfn upgrade tests
...
Upgrade from hammer/infernalis to x and verify lfn objects are valid
across upgrade.
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
Samuel Just
93892eb82a
ceph_manager: return exit status on do_get, do_put, do_rm
...
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
Samuel Just
269d6002f1
ceph_manager: add do_rm
...
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
Samuel Just
670ca43dfc
ceph_manager: extend do_put and do_get to allow a namespace
...
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
Samuel Just
c8f7694d52
ceph_manager: fix do_get to actually do a get
...
Currently unused.
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
John Spray
53f4430d03
tasks/cephfs: further thrasher fixes
...
Move the thrasher-specific methods out of CephManager
into MDSThrasher and plumb them into MDSCluster.
Signed-off-by: John Spray <john.spray@redhat.com>
2016-03-11 10:39:37 +00:00
Sage Weil
6deba7c649
tasks/ceph_manager: dump pgs if other peering timeouts expire
...
We were doing this for one of the recovery timeouts but not all of them.
Signed-off-by: Sage Weil <sage@redhat.com>
2016-03-07 12:21:10 -05:00
Samuel Just
8cf25611fb
ceph_manager: use time before mon command for timeout
...
Slow mon commands can cause a false failure.
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-02-19 12:28:36 -08:00
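One possible reading of the timeout commit above, as a minimal sketch: the clock is sampled before the slow monitor query, so time spent inside that query does not count against the timeout. The names `wait_until_clean`, `is_clean`, `making_progress`, `clock`, and `sleep` are hypothetical and are not taken from ceph_manager.

```python
import time


def wait_until_clean(is_clean, making_progress, timeout,
                     clock=time.time, sleep=lambda: time.sleep(1)):
    """Poll until is_clean() returns True or no progress is seen for `timeout`.

    All callables are hypothetical stand-ins; making_progress() represents
    a monitor command that may itself be slow.
    """
    start = clock()
    while not is_clean():
        # Sample the time BEFORE the potentially slow command, so its own
        # latency cannot push the elapsed time over the limit.
        now = clock()
        if making_progress():
            start = clock()  # progress seen: reset the timeout window
        elif now - start >= timeout:
            raise RuntimeError("no progress within %s seconds" % timeout)
        sleep()
    return True
```

Injecting `clock` and `sleep` is only a convenience for exercising the loop without real delays.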
John Spray
d8106fa9e1
tasks: add run_ceph_w to CephManager
...
Analogous to raw_cluster_command, but instead
of calling a blocking CLI command, we invoke
the -w mode.
Signed-off-by: John Spray <john.spray@redhat.com>
2016-01-05 18:58:00 +00:00
Samuel Just
89dcc0daf3
ceph_manager: do_pg_scrub: keep scrubbing until it's done
...
The ceph pg scrub ... command isn't really guaranteed to
start a scrub, so keep reissuing it until the scrub actually
happens.
Related: #12746
Signed-off-by: Samuel Just <sjust@redhat.com>
2015-11-19 15:07:38 -08:00
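The retry behavior described in the do_pg_scrub commit above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: `scrub_until_done`, `get_scrub_stamp`, and `request_scrub` are hypothetical names, and detecting completion by a changed scrub stamp is assumed here.

```python
import time


def scrub_until_done(get_scrub_stamp, request_scrub,
                     interval=1.0, max_tries=100):
    """Reissue a scrub request until the PG's scrub stamp changes.

    get_scrub_stamp and request_scrub are hypothetical callables standing
    in for whatever reads the PG stats and issues the scrub command.
    Returns the number of requests that were needed.
    """
    initial = get_scrub_stamp()
    for attempt in range(1, max_tries + 1):
        request_scrub()          # may be silently dropped by the cluster
        time.sleep(interval)
        if get_scrub_stamp() != initial:
            return attempt       # the scrub actually happened
    raise RuntimeError("scrub did not happen after %d requests" % max_tries)
```

The `max_tries` bound is an addition for safety in this sketch; the commit message itself only says to keep reissuing the command.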
Sage Weil
f467a98a29
tasks/ceph_manager: %d -> %s
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-30 14:58:32 -04:00
Sage Weil
a53a80b9f0
tasks/ceph_manager: fix logging on failed pool property
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-30 09:24:38 -04:00
Samuel Just
4e9f1df514
rados: add test for 13234.yaml
...
Signed-off-by: Samuel Just <sjust@redhat.com>
2015-09-29 21:19:10 -07:00
Sage Weil
0e2814d81e
tasks/ceph_manager: ignore failure getting pg_num
...
Otherwise, we may fail while racing with a workload that deletes a pool:
2015-09-23T15:01:52.855 INFO:tasks.workunit.client.1.vpm128.stdout:[ RUN ] LibRadosTwoPoolsPP.PromoteSnapTrimRace
2015-09-23T15:01:53.892 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .rgw pg_num'
2015-09-23T15:01:54.206 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .rgw.gc pg_num'
2015-09-23T15:01:54.462 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .users.uid pg_num'
2015-09-23T15:01:54.696 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .users.email pg_num'
2015-09-23T15:01:55.006 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .users pg_num'
2015-09-23T15:01:55.296 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .rgw.buckets.index pg_num'
2015-09-23T15:01:55.523 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .log pg_num'
2015-09-23T15:01:55.752 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .usage pg_num'
2015-09-23T15:01:56.188 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .rgw.buckets.extra pg_num'
2015-09-23T15:01:56.625 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get test-rados-api-vpm128-17360-6 pg_num'
2015-09-23T15:01:56.928 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get test-rados-api-vpm128-17360-13 pg_num'
2015-09-23T15:01:57.193 INFO:teuthology.orchestra.run.vpm176.stderr:Error ENOENT: unrecognized pool 'test-rados-api-vpm128-17360-13'
2015-09-23T15:01:57.206 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-24 12:19:07 -04:00
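The race in the pg_num commit above (a pool is deleted between listing pools and querying its pg_num) can be tolerated by skipping pools whose lookup fails. A minimal sketch, assuming a hypothetical `get_pg_num` callable that raises when the pool no longer exists; none of these names come from ceph_manager.

```python
def collect_pg_nums(pool_names, get_pg_num):
    """Return {pool: pg_num}, skipping pools whose lookup fails.

    get_pg_num is a hypothetical callable that raises an exception when
    the pool has disappeared, for example because a concurrent workload
    deleted it.
    """
    pg_nums = {}
    for name in pool_names:
        try:
            pg_nums[name] = get_pg_num(name)
        except Exception:
            # The pool vanished while we were iterating; ignore it
            # rather than failing the whole run.
            continue
    return pg_nums
```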
Sage Weil
dad981d339
tasks: sudo ceph for cli
...
/var/run/ceph is 770. This is mainly necessary for any
interaction with the daemon sockets, but it is what users do
and it may avoid log noise.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-11 12:15:01 -04:00
Sage Weil
a328e3e60d
tasks/ceph_manager: dump pgs when recover times out
...
It is really hard to map a stuck recovery back to the pgs that
are stuck. This will make it easy.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-08 08:59:49 -04:00
Sage Weil
c93fe1f1c6
tasks/ceph_manager: be silent about sending SIGHUPs
...
At the default interval this generates tons of log noise.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-08-04 13:08:04 -04:00
Andrew Schoen
a3c9a763b1
ceph_manager: don't add an osd to live_osds until it's been revived
...
Also waits to remove it from dead_osds. This fixes an issue where
do_sighup tries to send a signal to an osd that has not been revived
yet.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2015-07-28 17:05:31 -05:00
Andrew Schoen
84d24038aa
ceph_manager: adds a do_sighup method
...
This method runs in a separate greenlet from do_thrash and will pick a
random live osd to send a signal.SIGHUP to. There is a config option,
sighup_delay, which controls how long to delay between sending the
signals.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2015-07-28 14:46:12 -05:00
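The loop described in the do_sighup commit above might look roughly like the following. This is a sketch under assumptions: `sighup_loop`, `signal_osd`, `stopping`, and `sleep` are hypothetical names, and the greenlet machinery is left out so the loop can run standalone.

```python
import random
import signal


def sighup_loop(live_osds, signal_osd, stopping, sleep,
                delay=0.1, rng=random):
    """Periodically send SIGHUP to a randomly chosen live osd.

    signal_osd(osd, sig), stopping(), and sleep(seconds) are hypothetical
    callables supplied by the caller; delay plays the role of the
    sighup_delay option mentioned in the commit message.
    """
    while not stopping():
        sleep(delay)
        if not live_osds:
            continue  # nothing alive to signal right now
        osd = rng.choice(list(live_osds))
        signal_osd(osd, signal.SIGHUP)
```

Choosing only from `live_osds` is what makes it matter that an osd is added to that list after it has actually been revived.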
Andrew Schoen
ed73f67991
ceph_manager: adds a signal_osd method
...
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2015-07-28 14:13:30 -05:00
David Zafman
b255db820f
thrasher: Can't test ceph-objectstore-tool if nodes turned off (powercycle)
...
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-06-09 14:24:47 -07:00
Samuel Just
91b300d12c
rados/thrash: add test for radosgw with snaps
...
Signed-off-by: Samuel Just <sjust@redhat.com>
2015-05-28 15:36:39 -07:00
Samuel Just
2a60852a1d
squash: ceph_manager: add utility_task doc string
2015-05-04 14:21:31 -07:00
Samuel Just
015ed70f8a
suites/rados: add test for 11429
...
This patch also adds some convenience facilities for making
some of the ceph_manager methods into tasks usable from a
yaml file.
Signed-off-by: Samuel Just <sjust@redhat.com>
2015-05-04 11:53:54 -07:00
John Spray
0de712f42a
tasks/ceph_manager: DRY in mds_status
...
Signed-off-by: John Spray <john.spray@redhat.com>
2015-04-14 14:13:38 +01:00
John Spray
5c1071b103
ceph_manager: fix bad type assertions
...
In Python 2, isinstance(foo, str) returns False if
a unicode string is passed in. The correct check
is against basestring.
Signed-off-by: John Spray <john.spray@redhat.com>
2015-04-14 14:13:38 +01:00
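The type check from the commit above can be illustrated with a small sketch. The fallback for Python 3, where `basestring` no longer exists, is an addition made here so the example runs on either version; `string_types` and `is_string` are hypothetical names.

```python
try:
    # Python 2: basestring is the common base class of str and unicode.
    string_types = basestring
except NameError:
    # Python 3: basestring is gone and str is the text type.
    string_types = str


def is_string(value):
    """Return True for any text string, including Python 2 unicode."""
    return isinstance(value, string_types)
```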
Yuri Weinstein
581fcf192f
Merge pull request #380 from ceph/wip-11204
...
Make sure that ulimits are adjusted for ceph-objectstore-tool
2015-03-27 12:23:37 -07:00
Sage Weil
dcb5e8da9d
Merge remote-tracking branch 'gh/hammer'
...
Conflicts:
.gitignore
2015-03-26 17:09:33 -07:00
David Zafman
e6ce90fdb1
Make sure that ulimits are adjusted for ceph-objectstore-tool
...
Fixes: #11204
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-03-26 15:18:47 -07:00
David Zafman
6c5300552d
ceph_manager: Check for exit status 11 from ceph-objectstore-tool import
...
Fixes: #11139
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-03-20 21:25:41 -07:00
Alfredo Deza
4ed442e44c
stdin is no longer a kwarg
...
Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit 49a61dc2d2)
2015-02-26 14:48:40 -05:00
Alfredo Deza
33f7982480
add the log object to ceph_manager
...
Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit f7c1ca4a1e)
2015-02-26 14:48:30 -05:00
Alfredo Deza
49a61dc2d2
stdin is no longer a kwarg
...
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2015-02-26 11:34:21 -05:00
Alfredo Deza
f7c1ca4a1e
add the log object to ceph_manager
...
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2015-02-26 11:33:47 -05:00