Commit Graph

94 Commits

Author SHA1 Message Date
Josh Durgin
f28f881bda ceph_manager: test offline split via ceph-objectstore-tool
When killing an osd, split all pools with a low threshold.
This will slow down tests, but should not impact correctness.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-08-26 00:28:32 +00:00
Josh Durgin
c12b3513a7 Merge pull request #1003 from athanatos/wip-15655
ceph_manager: test [test-]reweight-by-(utilization|pg)

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-07-22 14:46:21 -07:00
Samuel Just
19854c095b ec_lost_unfound: set min_size to 2
We changed the default to k+1 instead of k.  Adjust test to compensate.

Fixes: http://tracker.ceph.com/issues/16416
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-07-05 14:05:12 -07:00
Josh Durgin
274d79ade3 tasks/ceph_manager: make utility_task cluster-aware
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:52:00 -07:00
Josh Durgin
256ebf8a12 tasks: move find_remote to util, rename and add helper
This is useful for any cluster-aware task.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
bba323834b tasks/ceph_manager: make Thrasher cluster-aware
Just a few spots need to know to look up only osds in this cluster, or
prefix a filename with the cluster. Use CephManager.find_remote() to
avoid a bunch of repetition and look only in the intended cluster.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
713e717fda tasks/ceph_manager: make mount_osd_data() cluster-aware
Use a cluster-specific mount point, and address osds by full role,
rather than just id, in the ctx.disk_config structures.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
524e6d7a5e tasks/ceph_manager: add cluster param to write_conf()
Only used by cephfs right now, so don't bother changing callers.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
ff49deb6f0 tasks/ceph_manager: simplify remote lookup, and make it cluster aware
Re-implement find_remote() using ctx.cluster.only() with a matcher
function that includes the manager's cluster, and use it instead of
miscellaneous ctx.cluster.only() calls elsewhere.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
141c73d399 tasks/ceph_manager: parameterize CephManager with cluster
Add --cluster arguments, pass cluster to get_daemon() and
iter_daemons_of_role, replace 'ceph' with cluster in paths, and use
ctx.ceph[cluster] instead of ctx.ceph.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:58 -07:00
Samuel Just
482a12f348 ceph_manager: test [test-]reweight-by-(utilization|pg)
Fixes: http://tracker.ceph.com/issues/15655
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-29 15:11:19 -07:00
David Zafman
8da6e97bd4 CephManager: Wait 1 second for pool creation to get far enough along
Fixes: http://tracker.ceph.com/issues/15673

Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 14:29:13 -07:00
David Zafman
a595651c54 CephManager: Maximum 2 minutes for raw cluster commands
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 09:43:39 -07:00
David Zafman
447bf873a8 thrasher: Add noscrub_toggle_delay and flip the noscrub osd flags
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 09:43:39 -07:00
David Zafman
7a528763d1 thrasher: Add dump_ops_enable and optrack_toggle_delay options
Add dump_ops_enable, which continuously dumps ops using 3 commands.
Add optrack_toggle_delay to alternate op tracking enablement.

Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 09:43:38 -07:00
David Zafman
4ad3b86604 ceph_manager: Add timeout to admin_socket/osd_admin_socket
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 09:43:38 -07:00
Samuel Just
7e53203e80 rados/singleton-nomsgr: add lfn upgrade tests
Upgrade from hammer/infernalis to x and verify lfn objects are valid
across upgrade.

Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
Samuel Just
93892eb82a ceph_manager: return exit status on do_get, do_put, do_rm
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
Samuel Just
269d6002f1 ceph_manager: add do_rm
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
Samuel Just
670ca43dfc ceph_manager: extend do_put and do_get to allow a namespace
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
Samuel Just
c8f7694d52 ceph_manager: fix do_get to actually do a get
Currently unused.

Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
John Spray
53f4430d03 tasks/cephfs: further thrasher fixes
Move the thrasher-specific methods out of CephManager
into MDSThrasher and plumb them into MDSCluster.

Signed-off-by: John Spray <john.spray@redhat.com>
2016-03-11 10:39:37 +00:00
Sage Weil
6deba7c649 tasks/ceph_manager: dump pgs if other peering timeouts expire
We were doing this for one of the recovery timeouts but not all of them.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-03-07 12:21:10 -05:00
Samuel Just
8cf25611fb ceph_manager: use time before mon command for timeout
Slow mon commands can cause a false failure.

Signed-off-by: Samuel Just <sjust@redhat.com>
2016-02-19 12:28:36 -08:00
John Spray
d8106fa9e1 tasks: add run_ceph_w to CephManager
Analogous to raw_cluster_command, but instead
of calling a blocking CLI command, we're invoking
the -w mode.

Signed-off-by: John Spray <john.spray@redhat.com>
2016-01-05 18:58:00 +00:00
Samuel Just
89dcc0daf3 ceph_manager: do_pg_scrub: keep scrubbing until it's done
The ceph pg scrub ... command isn't really guaranteed to
start a scrub; keep reissuing it until the scrub actually
happens.

Related: #12746
Signed-off-by: Samuel Just <sjust@redhat.com>
2015-11-19 15:07:38 -08:00
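A minimal sketch of the retry pattern described in this commit. It assumes two hypothetical callables supplied by the caller: one that issues the scrub request and one that returns the pg's last-scrub timestamp. The actual do_pg_scrub in ceph_manager may detect completion differently.

```python
import time


def scrub_until_done(issue_scrub, get_last_scrub_stamp, timeout=300, interval=5,
                     clock=time.monotonic, sleep=time.sleep):
    """Reissue a scrub request until the pg's last-scrub stamp changes.

    issue_scrub and get_last_scrub_stamp are hypothetical stand-ins for the
    real CLI calls; this is an illustrative sketch, not ceph_manager code.
    """
    initial = get_last_scrub_stamp()
    deadline = clock() + timeout
    while True:
        # The scrub command is only a request and may not start a scrub,
        # so it is sent again on every iteration.
        issue_scrub()
        sleep(interval)
        if get_last_scrub_stamp() != initial:
            return True  # a scrub actually ran
        if clock() > deadline:
            raise RuntimeError("scrub did not happen within %d seconds" % timeout)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting.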
Sage Weil
f467a98a29 tasks/ceph_manager: %d -> %s
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-30 14:58:32 -04:00
Sage Weil
a53a80b9f0 tasks/ceph_manager: fix logging on failed pool property
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-30 09:24:38 -04:00
Samuel Just
4e9f1df514 rados: add test for 13234.yaml
Signed-off-by: Samuel Just <sjust@redhat.com>
2015-09-29 21:19:10 -07:00
Sage Weil
0e2814d81e tasks/ceph_manager: ignore failure getting pg_num
Otherwise, we may fail while racing with a workload that deletes a pool:

2015-09-23T15:01:52.855 INFO:tasks.workunit.client.1.vpm128.stdout:[ RUN      ] LibRadosTwoPoolsPP.PromoteSnapTrimRace
2015-09-23T15:01:53.892 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .rgw pg_num'
2015-09-23T15:01:54.206 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .rgw.gc pg_num'
2015-09-23T15:01:54.462 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .users.uid pg_num'
2015-09-23T15:01:54.696 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .users.email pg_num'
2015-09-23T15:01:55.006 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .users pg_num'
2015-09-23T15:01:55.296 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .rgw.buckets.index pg_num'
2015-09-23T15:01:55.523 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .log pg_num'
2015-09-23T15:01:55.752 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .usage pg_num'
2015-09-23T15:01:56.188 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .rgw.buckets.extra pg_num'
2015-09-23T15:01:56.625 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get test-rados-api-vpm128-17360-6 pg_num'
2015-09-23T15:01:56.928 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get test-rados-api-vpm128-17360-13 pg_num'
2015-09-23T15:01:57.193 INFO:teuthology.orchestra.run.vpm176.stderr:Error ENOENT: unrecognized pool 'test-rados-api-vpm128-17360-13'
2015-09-23T15:01:57.206 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
...

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-24 12:19:07 -04:00
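A sketch of tolerating the race shown in the log above. It assumes a hypothetical `get_pg_num(pool)` callable that raises a local `CommandFailed` exception when `ceph osd pool get <pool> pg_num` exits non-zero, for example with ENOENT after the pool was deleted. The exception and function names are illustrative, not teuthology's.

```python
class CommandFailed(Exception):
    """Hypothetical stand-in for a failed `ceph osd pool get <pool> pg_num`."""


def collect_pg_nums(pools, get_pg_num):
    """Return {pool: pg_num}, skipping pools that vanish mid-iteration."""
    result = {}
    for pool in pools:
        try:
            result[pool] = get_pg_num(pool)
        except CommandFailed:
            # A concurrent workload may have deleted the pool between
            # listing pools and querying it; ignore it instead of failing.
            continue
    return result
```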
Sage Weil
dad981d339 tasks: sudo ceph for cli
/var/run/ceph is 770.  This is mainly necessary for any
interaction with the daemon sockets, but it is what users do
and it may avoid log noise.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-11 12:15:01 -04:00
Sage Weil
a328e3e60d tasks/ceph_manager: dump pgs when recover times out
It is really hard to map a stuck recovery back to the pgs that
are stuck.  This will make it easy.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-08 08:59:49 -04:00
Sage Weil
c93fe1f1c6 tasks/ceph_manager: be silent about sending SIGHUPs
At the default interval this generates tons of log noise.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-08-04 13:08:04 -04:00
Andrew Schoen
a3c9a763b1 ceph_manager: don't add an osd to live_osds until it's been revived
Also waits to remove it from dead_osds. This fixes an issue where
do_sighup tries to send a signal to an osd that has not been revived
yet.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2015-07-28 17:05:31 -05:00
Andrew Schoen
84d24038aa ceph_manager: adds a do_sighup method
This method runs in a separate greenlet from do_thrash and will pick a
random live osd to send a signal.SIGHUP to. There is a config option,
sighup_delay, which controls how long to delay between sending the
signals.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2015-07-28 14:46:12 -05:00
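A rough sketch of the do_sighup loop described above. It uses `threading` in place of the gevent greenlet the commit mentions. `send_signal` is a hypothetical callable standing in for signal_osd, and its signature is assumed. Only revived osds should appear in `live_osds`, which matches the follow-up fix in a3c9a763b1.

```python
import random
import signal
import threading


def sighup_loop(live_osds, send_signal, sighup_delay, stop_event, rng=random):
    """Send SIGHUP to a random live osd every sighup_delay seconds until stopped.

    send_signal(osd_id, signum) is a hypothetical stand-in for signal_osd.
    """
    while not stop_event.is_set():
        candidates = list(live_osds)  # snapshot; the thrasher may mutate the list
        if candidates:
            send_signal(rng.choice(candidates), signal.SIGHUP)
        # Event.wait acts as an interruptible sleep between signals.
        stop_event.wait(sighup_delay)
```

Driving it from a `threading.Thread` and setting `stop_event` lets the caller stop the loop promptly, without waiting out a full delay.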
Andrew Schoen
ed73f67991 ceph_manager: adds a signal_osd method
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2015-07-28 14:13:30 -05:00
David Zafman
b255db820f thrasher: Can't test ceph-objectstore-tool if nodes turned off (powercycle)
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-06-09 14:24:47 -07:00
Samuel Just
91b300d12c rados/thrash: add test for radosgw with snaps
Signed-off-by: Samuel Just <sjust@redhat.com>
2015-05-28 15:36:39 -07:00
Samuel Just
2a60852a1d squash: ceph_manager: add utility_task doc string
2015-05-04 14:21:31 -07:00
Samuel Just
015ed70f8a suites/rados: add test for 11429
This patch also adds some convenience facilities for making
some of the ceph_manager methods into tasks usable from a
yaml file.

Signed-off-by: Samuel Just <sjust@redhat.com>
2015-05-04 11:53:54 -07:00
John Spray
0de712f42a tasks/ceph_manager: DRY in mds_status
Signed-off-by: John Spray <john.spray@redhat.com>
2015-04-14 14:13:38 +01:00
John Spray
5c1071b103 ceph_manager: fix bad type assertions
In Python 2, isinstance(foo, str) will fail if
a unicode string is passed in.  The correct check
is basestring.

Signed-off-by: John Spray <john.spray@redhat.com>
2015-04-14 14:13:38 +01:00
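A small sketch of a portable form of that check. `string_types` and `check_name` are illustrative names that do not come from ceph_manager. The fallback lets the snippet run on Python 3 as well, because `basestring` does not exist there and `str` is already unicode.

```python
# Python 2: basestring is the common base of str and unicode.
# Python 3: basestring is gone, and str already covers unicode text.
try:
    string_types = (basestring,)  # noqa: F821  (Python 2 only)
except NameError:
    string_types = (str,)


def check_name(name):
    """Type-check a name that may arrive as a byte str or a unicode string."""
    # isinstance(name, str) alone would reject u"..." values on Python 2.
    assert isinstance(name, string_types), "expected a string, got %r" % (name,)
    return name
```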
Yuri Weinstein
581fcf192f Merge pull request #380 from ceph/wip-11204
Make sure that ulimits are adjusted for ceph-objectstore-tool
2015-03-27 12:23:37 -07:00
Sage Weil
dcb5e8da9d Merge remote-tracking branch 'gh/hammer'
Conflicts:
	.gitignore
2015-03-26 17:09:33 -07:00
David Zafman
e6ce90fdb1 Make sure that ulimits are adjusted for ceph-objectstore-tool
Fixes: #11204

Signed-off-by: David Zafman <dzafman@redhat.com>
2015-03-26 15:18:47 -07:00
David Zafman
6c5300552d ceph_manager: Check for exit status 11 from ceph-objectstore-tool import
Fixes: #11139

Signed-off-by: David Zafman <dzafman@redhat.com>
2015-03-20 21:25:41 -07:00
Alfredo Deza
4ed442e44c stdin is no longer a kwarg
Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit 49a61dc2d2)
2015-02-26 14:48:40 -05:00
Alfredo Deza
33f7982480 add the log object to ceph_manager
Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit f7c1ca4a1e)
2015-02-26 14:48:30 -05:00
Alfredo Deza
49a61dc2d2 stdin is no longer a kwarg
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2015-02-26 11:34:21 -05:00
Alfredo Deza
f7c1ca4a1e add the log object to ceph_manager
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2015-02-26 11:33:47 -05:00