This commit amends the MDS thrasher task to also work on multimds
clusters. Main changes:
o New FSStatus class in tasks/cephfs/filesystem.py which gets a snapshot
of the fsmap (`ceph fs dump`). This allows consecutive operations on
the same fsmap without repeated fs dumps.
o Only one MDSThrasher is started for each file system.
o The MDSThrasher operates on ranks instead of names (and groups of
standbys following the initial active).
o The MDSThrasher will also change the cluster's max_mds to a new
value in [1, current) or (current, starting max_mds]. When max_mds is
reduced, randomly selected MDSs other than rank 0 are deactivated to
reach the new max_mds. The likelihood of changing max_mds in a given
cycle of the MDSThrasher is set by the "thrash_max_mds" config (see
the sketch after this list).
o The MDSThrasher prints stats on completion, e.g. the number of MDSs
deactivated and the number of times max_mds was changed.
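A minimal sketch of that max_mds thrashing, assuming illustrative
helpers (get_status, get_max_mds, set_max_mds, deactivate) modeled on
the Filesystem/FSStatus abstractions rather than their exact API:

    import random

    def thrash_max_mds(fs, max_mds_start, stats):
        # One fsmap snapshot (the FSStatus idea above): every decision
        # below reads the same dump instead of re-running `ceph fs dump`.
        status = fs.get_status()        # hypothetical snapshot helper
        current = status.get_max_mds()  # hypothetical accessor
        # Draw the new max_mds from [1, current) or (current, starting max].
        choices = list(range(1, current)) + \
                  list(range(current + 1, max_mds_start + 1))
        if not choices:
            return
        new_max = random.choice(choices)
        fs.set_max_mds(new_max)
        if new_max < current:
            # Deactivate randomly selected ranks other than rank 0 until
            # the active count matches the new max_mds.
            for rank in random.sample(range(1, current), current - new_max):
                fs.deactivate(rank)
        stats['max_mds'] = stats.get('max_mds', 0) + 1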
Pre-requisite for: http://tracker.ceph.com/issues/10792
Partially fixes: http://tracker.ceph.com/issues/15134
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This was only used in this task, and it is much too
ceph-specific to belong in teuthology.
Fixes: http://tracker.ceph.com/issues/17614
Signed-off-by: John Spray <john.spray@redhat.com>
Previously, if an error occurred during healthy(), the finally block
would still invoke osd_scrub_pgs, which relies on CephManager having
been constructed; the scrub would then die, hiding the original
exception.
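A minimal sketch of the shape of the fix; the getattr check on
ctx.managers is an assumption standing in for "CephManager was
constructed", not the task's exact test:

    import contextlib

    @contextlib.contextmanager
    def task(ctx, config):
        try:
            healthy(ctx=ctx, config=config)  # may raise before a manager exists
            yield
        finally:
            # Only scrub if CephManager was actually constructed, so a
            # failure in healthy() is not hidden by a crash in here.
            if getattr(ctx, 'managers', None):
                osd_scrub_pgs(ctx, config)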
Signed-off-by: John Spray <john.spray@redhat.com>
ceph.restart should mark restarted osds down in order to avoid a race
condition with ceph_manager.wait_for_clean.
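A hedged sketch of the intended ordering, using CephManager calls that
exist (raw_cluster_cmd, wait_for_clean) around an invented osd_ids
list:

    def restart_osds(manager, osd_ids):
        for osd_id in osd_ids:
            manager.ctx.daemons.get_daemon('osd', osd_id).restart()
            # Mark the osd down explicitly so wait_for_clean below does
            # not race against the monitors noticing the restart.
            manager.raw_cluster_cmd('osd', 'down', str(osd_id))
        manager.wait_for_clean()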
Fixes: http://tracker.ceph.com/issues/15778
Signed-off-by: Warren Usui <wusui@redhat.com>
These setup and parse logs on all hosts, so they should be run only
for the first cluster's setup. That first cluster will be torn down
last, so the cleanup happens after all other clusters are shut down as
well.
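One illustrative way to express "first cluster only"; whether the task
keys off a ctx attribute exactly like this is an assumption:

    # Host-wide log setup/parsing runs once, with the first ceph task;
    # later ceph tasks (other clusters) skip it.
    first_ceph_cluster = not hasattr(ctx, 'daemons')

    subtasks = []
    if first_ceph_cluster:
        subtasks = [
            lambda: ceph_log(ctx=ctx, config=None),
            lambda: valgrind_post(ctx=ctx, config=config),
        ]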
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Add --cluster arguments, pass cluster to get_daemon() and
iter_daemons_of_role, replace 'ceph' with cluster in paths, and use
ctx.ceph[cluster] instead of ctx.ceph.
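A short sketch of the pattern; the daemon-group calls shown take a
cluster argument, and the path layout follows the usual
cluster-qualified conventions:

    cluster = config.get('cluster', 'ceph')

    # Look up daemons by cluster instead of assuming 'ceph'.
    mon0 = ctx.daemons.get_daemon('mon', 'a', cluster)
    for osd in ctx.daemons.iter_daemons_of_role('osd', cluster):
        osd.restart()

    # Cluster-qualified paths and per-cluster context.
    conf_path = '/etc/ceph/{0}.conf'.format(cluster)
    conf = ctx.ceph[cluster].conf   # instead of ctx.ceph.conf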
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
ceph.healthy may be used as a standalone task, so it may not always
have the cluster name in its configuration.
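The task therefore falls back to the default cluster name; a minimal
sketch:

    # Standalone invocations may not carry a cluster name; default it.
    config = config or {}
    cluster_name = config.get('cluster', 'ceph')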
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Note that cephfs tests using the Filesystem abstractions will need to
be converted to understand multiple clusters later. This just updates
the ceph task portion.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Add a cluster option to the ceph task, and pass that through to
cluster(). Make sure monitors and clients don't collide by adding
their cluster to paths they use.
This assumes there is one ceph task per cluster, and that osds from
multiple clusters do not share hosts (otherwise the block device
assignment won't work).
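A sketch of the collision avoidance, using the conventional
cluster-qualified layout (the exact paths in the task may differ):

    def cluster_paths(cluster, mon_id):
        # Embedding the cluster name keeps monitors and clients of two
        # clusters on one host from colliding on these files.
        return {
            'conf': '/etc/ceph/{c}.conf'.format(c=cluster),
            'keyring': '/etc/ceph/{c}.keyring'.format(c=cluster),
            'mon_data': '/var/lib/ceph/mon/{c}-{i}'.format(c=cluster, i=mon_id),
            'asok': '/var/run/ceph/{c}-mon.{i}.asok'.format(c=cluster, i=mon_id),
        }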
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
This is the correct implementation of 685d76a77c,
merged while broken in ff1655cb57 and
reverted in 4cccde634f.
Signed-off-by: John Spray <john.spray@redhat.com>
This reverts commit ff1655cb57, reversing
changes made to 2b25080d4f.
Since we haven't actually started the MDS daemons yet, this code is broken.
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
/var/run/ceph is 770. This is mainly necessary for any interaction
with the daemon sockets, but it is also what real users do and it may
avoid log noise.
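The message does not show the mechanism; one hedged sketch of coping
with the 0770 directory is to reach the sockets through sudo
(illustrative helper, not necessarily this commit's change):

    def asok_command(remote, daemon, command):
        # /var/run/ceph is mode 0770, so go through sudo to query the
        # daemon's admin socket.
        return remote.run(
            args=['sudo', 'ceph', '--admin-daemon',
                  '/var/run/ceph/ceph-{d}.asok'.format(d=daemon)] + command)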
Signed-off-by: Sage Weil <sage@redhat.com>
Pass -f to mkfs.btrfs by default instead of first trying without it
and *then* retrying with it.
Among other things, this avoids a confusing failure where we run
`mkfs.ext4 device` (no -f), fail for some reason, and then retry with
-f and get a usage error (-f does not mean force for mke2fs).
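A minimal sketch of the resulting behavior (function name
illustrative):

    def mkfs_command(fs_type, device):
        # -f means "force" for btrfs, so pass it up front; never retry
        # other filesystems with -f, since e.g. mke2fs interprets it
        # differently.
        if fs_type == 'btrfs':
            return ['mkfs.btrfs', '-f', device]
        return ['mkfs.{t}'.format(t=fs_type), device]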
Signed-off-by: Sage Weil <sage@redhat.com>