Josh Durgin
c12b3513a7
Merge pull request #1003 from athanatos/wip-15655
...
ceph_manager: test [test-]reweight-by-(utilization|pg)
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-07-22 14:46:21 -07:00
Samuel Just
19854c095b
ec_lost_unfound: set min_size to 2
...
We changed the default to k+1 instead of k. Adjust test to compensate.
Fixes: http://tracker.ceph.com/issues/16416
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-07-05 14:05:12 -07:00
Josh Durgin
274d79ade3
tasks/ceph_manager: make utility_task cluster-aware
...
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:52:00 -07:00
Josh Durgin
256ebf8a12
tasks: move find_remote to util, rename and add helper
...
This is useful for any cluster-aware task.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
bba323834b
tasks/ceph_manager: make Thrasher cluster-aware
...
Just a few spots need to know to look up only osds in this cluster, or
prefix a filename with the cluster. Use CephManager.find_remote() to
avoid a bunch of repetition and look only in the intended cluster.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
713e717fda
tasks/ceph_manager: make mount_osd_data() cluster-aware
...
Use a cluster-specific mount point, and address osds by full role,
rather than just id, in the ctx.disk_config structures.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
524e6d7a5e
tasks/ceph_manager: add cluster param to write_conf()
...
Only used by cephfs right now, so don't bother changing callers.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
ff49deb6f0
tasks/ceph_manager: simplify remote lookup, and make it cluster aware
...
Re-implement find_remote() using ctx.cluster.only() with a matcher
function that includes the manager's cluster, and use it instead of
miscellaneous ctx.cluster.only() calls elsewhere.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:59 -07:00
Josh Durgin
141c73d399
tasks/ceph_manager: parameterize CephManager with cluster
...
Add --cluster arguments, pass cluster to get_daemon() and
iter_daemons_of_role, replace 'ceph' with cluster in paths, and use
ctx.ceph[cluster] instead of ctx.ceph.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-05-09 11:51:58 -07:00
Samuel Just
482a12f348
ceph_manager: test [test-]reweight-by-(utilization|pg)
...
Fixes: http://tracker.ceph.com/issues/15655
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-29 15:11:19 -07:00
David Zafman
8da6e97bd4
CephManager: Wait 1 second for pool creation to get far enough along
...
Fixes: http://tracker.ceph.com/issues/15673
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 14:29:13 -07:00
David Zafman
a595651c54
CephManager: Maximum 2 minutes for raw cluster commands
...
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 09:43:39 -07:00
David Zafman
447bf873a8
thrasher: Add noscrub_toggle_delay and flip the noscrub osd flags
...
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 09:43:39 -07:00
David Zafman
7a528763d1
thrasher: Add dump_ops_enable and optrack_toggle_delay options
...
Add dump_ops_enable which continuously dumps ops using 3 commands
Add optrack_toggle_delay to alternate op tracking enablement
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 09:43:38 -07:00
David Zafman
4ad3b86604
ceph_manager: Add timeout to admin_socket/osd_admin_socket
...
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-04-29 09:43:38 -07:00
Samuel Just
7e53203e80
rados/singleton-nomsgr: add lfn upgrade tests
...
Upgrade from hammer/infernalis to x and verify lfn objects are valid
across upgrade.
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
Samuel Just
93892eb82a
ceph_manager: return exit status on do_get, do_put, do_rm
...
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
Samuel Just
269d6002f1
ceph_manager: add do_rm
...
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
Samuel Just
670ca43dfc
ceph_manager: extend do_put and do_get to allow a namespace
...
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
Samuel Just
c8f7694d52
ceph_manager: fix do_get to actually do a get
...
Currently unused.
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-04-07 15:35:30 -07:00
John Spray
53f4430d03
tasks/cephfs: further thrasher fixes
...
Move the thrasher-specific methods out of CephManager
into MDSThrasher and plumb them into MDSCluster.
Signed-off-by: John Spray <john.spray@redhat.com>
2016-03-11 10:39:37 +00:00
Sage Weil
6deba7c649
tasks/ceph_manager: dump pgs if other peering timeouts expire
...
We were doing this for one of the recovery timeouts but not all of them.
Signed-off-by: Sage Weil <sage@redhat.com>
2016-03-07 12:21:10 -05:00
Samuel Just
8cf25611fb
ceph_manager: use time before mon command for timeout
...
Slow mon commands can cause a false failure.
Signed-off-by: Samuel Just <sjust@redhat.com>
2016-02-19 12:28:36 -08:00
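
One reading of "use time before mon command for timeout" is sketched below: the clock is sampled before the possibly slow query runs, so the query's own latency is not charged against the timeout. This is a minimal illustration, not the code from the commit; `wait_until`, `check`, `clock`, and `sleep` are hypothetical names introduced here.

```python
import time


def wait_until(check, timeout, clock=time.monotonic, sleep=time.sleep,
               interval=0.0):
    """Poll check() until it returns True; return False on timeout.

    The clock is sampled *before* each call to check(), so time spent
    inside a slow check is not counted against the timeout.
    """
    start = clock()
    while True:
        now = clock()  # sampled before the (possibly slow) check
        if check():
            return True
        if now - start >= timeout:
            return False
        sleep(interval)
```

If `now` were sampled after `check()` instead, a single slow command could push the elapsed time past the timeout even though the cluster was making progress.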
John Spray
d8106fa9e1
tasks: add run_ceph_w to CephManager
...
Analogous to raw_cluster_command, but instead
of calling a blocking CLI command we're invoking
the -w mode.
Signed-off-by: John Spray <john.spray@redhat.com>
2016-01-05 18:58:00 +00:00
Samuel Just
89dcc0daf3
ceph_manager: do_pg_scrub: keep scrubbing until it's done
...
The ceph pg scrub ... command isn't really guaranteed to
start a scrub, so keep reissuing it until the scrub actually
happens.
Related: #12746
Signed-off-by: Samuel Just <sjust@redhat.com>
2015-11-19 15:07:38 -08:00
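
The "keep reissuing it until the scrub actually happens" idea from the do_pg_scrub commit can be sketched as a retry loop. This is an illustration under assumptions, not the method itself: `issue_scrub` and `last_scrub_stamp` are hypothetical callables standing in for "send the scrub command" and "read some value that changes once a scrub has completed".

```python
import time


def scrub_until_done(issue_scrub, last_scrub_stamp, interval=1.0,
                     max_attempts=None, sleep=time.sleep):
    """Reissue a scrub request until the observed scrub stamp changes.

    Returns the number of times the request was issued.
    """
    initial = last_scrub_stamp()
    attempts = 0
    while last_scrub_stamp() == initial:
        if max_attempts is not None and attempts >= max_attempts:
            raise RuntimeError("scrub did not start after %d attempts"
                               % attempts)
        issue_scrub()  # the request may be dropped, so keep sending it
        attempts += 1
        sleep(interval)
    return attempts
```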
Sage Weil
f467a98a29
tasks/ceph_manager: %d -> %s
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-30 14:58:32 -04:00
Sage Weil
a53a80b9f0
tasks/ceph_manager: fix logging on failed pool property
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-30 09:24:38 -04:00
Samuel Just
4e9f1df514
rados: add test for 13234.yaml
...
Signed-off-by: Samuel Just <sjust@redhat.com>
2015-09-29 21:19:10 -07:00
Sage Weil
0e2814d81e
tasks/ceph_manager: ignore failure getting pg_num
...
Otherwise, we may fail while racing with a workload that deletes a pool:
2015-09-23T15:01:52.855 INFO:tasks.workunit.client.1.vpm128.stdout:[ RUN ] LibRadosTwoPoolsPP.PromoteSnapTrimRace
2015-09-23T15:01:53.892 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .rgw pg_num'
2015-09-23T15:01:54.206 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .rgw.gc pg_num'
2015-09-23T15:01:54.462 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .users.uid pg_num'
2015-09-23T15:01:54.696 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .users.email pg_num'
2015-09-23T15:01:55.006 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .users pg_num'
2015-09-23T15:01:55.296 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .rgw.buckets.index pg_num'
2015-09-23T15:01:55.523 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .log pg_num'
2015-09-23T15:01:55.752 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .usage pg_num'
2015-09-23T15:01:56.188 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get .rgw.buckets.extra pg_num'
2015-09-23T15:01:56.625 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get test-rados-api-vpm128-17360-6 pg_num'
2015-09-23T15:01:56.928 INFO:teuthology.orchestra.run.vpm176:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool get test-rados-api-vpm128-17360-13 pg_num'
2015-09-23T15:01:57.193 INFO:teuthology.orchestra.run.vpm176.stderr:Error ENOENT: unrecognized pool 'test-rados-api-vpm128-17360-13'
2015-09-23T15:01:57.206 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
...
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-24 12:19:07 -04:00
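
The race described in "ignore failure getting pg_num" (a pool is listed, then deleted before its pg_num is queried) can be tolerated by skipping pools whose lookup fails. The sketch below only illustrates that pattern; `PoolLookupError`, `collect_pg_nums`, and `get_pg_num` are hypothetical names, not teuthology APIs.

```python
class PoolLookupError(Exception):
    """Hypothetical error raised when a pool no longer exists."""


def collect_pg_nums(pool_names, get_pg_num):
    """Return {pool: pg_num}, skipping pools that vanish mid-scan."""
    result = {}
    for name in pool_names:
        try:
            result[name] = get_pg_num(name)
        except PoolLookupError:
            # The pool was deleted by a concurrent workload; ignore it.
            continue
    return result
```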
Sage Weil
dad981d339
tasks: sudo ceph for cli
...
/var/run/ceph is 770. This is mainly necessary for any
interaction with the daemon sockets, but it is what users do
and it may avoid log noise.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-11 12:15:01 -04:00
Sage Weil
a328e3e60d
tasks/ceph_manager: dump pgs when recover times out
...
It is really hard to map a stuck recovery back to the pgs that
are stuck. This will make it easy.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-09-08 08:59:49 -04:00
Sage Weil
c93fe1f1c6
tasks/ceph_manager: be silent about sending SIGHUPs
...
At the default interval this generates tons of log noise.
Signed-off-by: Sage Weil <sage@redhat.com>
2015-08-04 13:08:04 -04:00
Andrew Schoen
a3c9a763b1
ceph_manager: don't add an osd to live_osds until it's been revived
...
Also waits to remove it from dead_osds. This fixes an issue where
do_sighup tries to send a signal to an osd that has not been revived
yet.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2015-07-28 17:05:31 -05:00
Andrew Schoen
84d24038aa
ceph_manager: adds a do_sighup method
...
This method runs in a separate greenlet from do_thrash and will pick a
random live osd to send a signal.SIGHUP to. There is a config option,
sighup_delay, which controls how long to delay between sending the
signals.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2015-07-28 14:46:12 -05:00
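
The do_sighup behavior described above (pick a random live osd, signal it, wait sighup_delay, repeat) might look roughly like the loop below. It is a sketch under assumptions: `signal_osd` and `stopping` are hypothetical callables, and the greenlet scheduling is left out.

```python
import random
import time


def sighup_loop(live_osds, signal_osd, sighup_delay, stopping,
                rng=random, sleep=time.sleep):
    """Signal one random live osd per iteration until stopping() is True."""
    while not stopping():
        if live_osds:
            # Only osds currently in live_osds are candidates, which is why
            # an osd should not be added there before it has been revived.
            signal_osd(rng.choice(list(live_osds)))
        sleep(sighup_delay)
```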
Andrew Schoen
ed73f67991
ceph_manager: adds a signal_osd method
...
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2015-07-28 14:13:30 -05:00
David Zafman
b255db820f
thrasher: Can't test ceph-objectstore-tool if nodes turned off (powercycle)
...
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-06-09 14:24:47 -07:00
Samuel Just
91b300d12c
rados/thrash: add test for radosgw with snaps
...
Signed-off-by: Samuel Just <sjust@redhat.com>
2015-05-28 15:36:39 -07:00
Samuel Just
2a60852a1d
squash: ceph_manager: add utility_task doc string
2015-05-04 14:21:31 -07:00
Samuel Just
015ed70f8a
suites/rados: add test for 11429
...
This patch also adds some convenience facilities for making
some of the ceph_manager methods into tasks usable from a
yaml file.
Signed-off-by: Samuel Just <sjust@redhat.com>
2015-05-04 11:53:54 -07:00
John Spray
0de712f42a
tasks/ceph_manager: DRY in mds_status
...
Signed-off-by: John Spray <john.spray@redhat.com>
2015-04-14 14:13:38 +01:00
John Spray
5c1071b103
ceph_manager: fix bad type assertions
...
In python, isinstance(foo, str) will fail if
a unicode string is passed in. The correct check
is basestring.
Signed-off-by: John Spray <john.spray@redhat.com>
2015-04-14 14:13:38 +01:00
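
The type-assertion fix above concerns Python 2, where str and unicode are separate types sharing the basestring base class. A small sketch of the check follows; the Python 3 fallback and the `require_string` helper are additions for illustration, not part of the commit.

```python
try:
    string_types = (basestring,)  # Python 2: matches both str and unicode
except NameError:
    string_types = (str,)         # Python 3: basestring no longer exists


def require_string(value):
    """Assert that value is a string of either flavor and return it."""
    assert isinstance(value, string_types), \
        "expected a string, got %r" % type(value)
    return value
```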
Yuri Weinstein
581fcf192f
Merge pull request #380 from ceph/wip-11204
...
Make sure that ulimits are adjusted for ceph-objectstore-tool
2015-03-27 12:23:37 -07:00
Sage Weil
dcb5e8da9d
Merge remote-tracking branch 'gh/hammer'
...
Conflicts:
.gitignore
2015-03-26 17:09:33 -07:00
David Zafman
e6ce90fdb1
Make sure that ulimits are adjusted for ceph-objectstore-tool
...
Fixes: #11204
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-03-26 15:18:47 -07:00
David Zafman
6c5300552d
ceph_manager: Check for exit status 11 from ceph-objectstore-tool import
...
Fixes: #11139
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-03-20 21:25:41 -07:00
Alfredo Deza
4ed442e44c
stdin is no longer a kwarg
...
Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit 49a61dc2d2)
2015-02-26 14:48:40 -05:00
Alfredo Deza
33f7982480
add the log object to ceph_manager
...
Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit f7c1ca4a1e)
2015-02-26 14:48:30 -05:00
Alfredo Deza
49a61dc2d2
stdin is no longer a kwarg
...
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2015-02-26 11:34:21 -05:00
Alfredo Deza
f7c1ca4a1e
add the log object to ceph_manager
...
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2015-02-26 11:33:47 -05:00
Kefu Chai
64de3cd574
Thrasher: log backtrace of thrown exception
...
* add a wrapper to log uncaught exceptions to self.logger. greenlet also
prints the backtrace and exception to stderr, but teuthology.log does
not capture stderr, so we need to catch them ourselves to reveal
more info to root-cause this issue.
* log uncaught exceptions thrown by Thrasher.do_thrash() to self.log.
See: #10630
Signed-off-by: Kefu Chai <kchai@redhat.com>
2015-02-25 16:10:52 +08:00
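
The wrapper described in "Thrasher: log backtrace of thrown exception" could be written as a decorator along these lines. This is a sketch, not the code from the commit: the name `log_uncaught` is hypothetical, and it assumes the decorated method's object exposes a standard `logging.Logger` as `self.logger`.

```python
import functools
import logging


def log_uncaught(func):
    """Log any uncaught exception, with its backtrace, then re-raise it."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except Exception:
            # Logger.exception() records the message at ERROR level and
            # appends the current traceback, so it lands in the log file
            # rather than only on stderr.
            self.logger.exception("uncaught exception in %s", func.__name__)
            raise
    return wrapper


class ExampleThrasher(object):
    """Hypothetical stand-in used only to show how the decorator is applied."""

    def __init__(self, logger=None):
        self.logger = logger or logging.getLogger(__name__)

    @log_uncaught
    def do_thrash(self):
        raise ValueError("simulated failure")
```

Re-raising after logging keeps the original failure visible to the caller; the wrapper only adds the log record.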