This commit amends the MDS thrasher task to also work on multimds
clusters. Main changes:
o New FSStatus class in tasks/cephfs/filesystem.py which gets a snapshot
of the fsmap (`ceph fs dump`). This allows consecutive operations on
the same fsmap without repeated fs dumps.
o Only one MDSThrasher is started for each file system.
o The MDSThrasher operates on ranks instead of names (and groups of
standbys following the initial active).
o The MDSThrasher will also change max_mds for the cluster to a new
value in [1, current) or (current, starting max_mds]. When reduced,
randomly selected MDSs other than rank 0 will be deactivated to reach
the new max_mds. The likelihood of changing max_mds in a given cycle of
the MDSThrasher is set by the "thrash_max_mds" config.
o The MDSThrasher prints out stats on completion, e.g. the number of
MDSs deactivated or the number of max_mds changes.
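The max_mds selection described above could be sketched roughly as follows. The function names are illustrative and not the thrasher's actual API. The uniform choice across both ranges, and the treatment of "thrash_max_mds" as a probability in [0, 1], are assumptions.

```python
import random


def should_thrash_max_mds(thrash_max_mds, rng=random):
    """Decide whether this cycle changes max_mds.

    Assumption: thrash_max_mds is a probability in [0, 1].
    """
    return rng.random() < thrash_max_mds


def pick_new_max_mds(current, starting_max_mds, rng=random):
    """Pick a value from [1, current) or (current, starting_max_mds].

    Returns None when no other value is available.
    """
    candidates = [n for n in range(1, starting_max_mds + 1) if n != current]
    if not candidates:
        return None
    return rng.choice(candidates)


def ranks_to_deactivate(active_ranks, new_max_mds, rng=random):
    """Randomly choose ranks other than 0 to deactivate to reach new_max_mds."""
    excess = len(active_ranks) - new_max_mds
    if excess <= 0:
        return []
    eligible = [r for r in active_ranks if r != 0]
    return sorted(rng.sample(eligible, excess))
```

For example, with ranks [0, 1, 2, 3] active and a new max_mds of 2, two of the ranks 1, 2 and 3 would be selected and rank 0 is never touched.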
Pre-requisite for: http://tracker.ceph.com/issues/10792
Partially fixes: http://tracker.ceph.com/issues/15134
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
A more generic CephTestCase and CephCluster, for
writing non-cephfs test cases.
This avoids overloading one class with the functionality
needed by lots of different subsystems.
Signed-off-by: John Spray <john.spray@redhat.com>
For the case where a daemon comes up that
wants to be a standby replay, but another daemon
is already following the target, so it has to
be just a regular standby instead.
Signed-off-by: John Spray <john.spray@redhat.com>
While Filesystem at large requires the new commands, for
use from the `ceph` task we must support old style commands,
as the ceph task is used to instantiate old clusters during
upgrade testing.
Fixes: #15124, #15049, #15106
Signed-off-by: John Spray <john.spray@redhat.com>
Move the thrasher-specific methods out of CephManager
into MDSThrasher and plumb them into MDSCluster.
Signed-off-by: John Spray <john.spray@redhat.com>
A quick check that clients refuse to mount
when daemons are laggy, and while we're at it,
that the basics of failover work. It's a trivial
test, but it's nice to have this kind of thing
so that we don't have to wait for weird thrasher
failures if something breaks.
Signed-off-by: John Spray <john.spray@redhat.com>
It was trying to get the output file from
a different remote than the one used to
run the journal tool.
Signed-off-by: John Spray <john.spray@redhat.com>
In teuthology this isn't needed because we join the
mds child processes after killing them. In vstart
we're killing them asynchronously, so be a bit more
careful to ensure they can't re-insert themselves
to the mdsmap between our calling fail and our calling
fs rm.
Signed-off-by: John Spray <john.spray@redhat.com>
Use this during test setup to check whether
a filesystem is configured at all, before
trying to tear it down.
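A minimal sketch of the idea, assuming the caller has already captured the output of `ceph fs ls --format=json` as a JSON list. The helper names here are hypothetical and are not the ones added by this change.

```python
import json


def filesystem_exists(fs_ls_json):
    """Return True if the JSON list of filesystems is non-empty."""
    return len(json.loads(fs_ls_json)) > 0


def teardown_if_present(fs_ls_json, teardown):
    """Call teardown() only when a filesystem is configured.

    Returns True if teardown ran, False if there was nothing to do.
    """
    if filesystem_exists(fs_ls_json):
        teardown()
        return True
    return False
```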
Signed-off-by: John Spray <john.spray@redhat.com>
So that my vstart subclass can put ./ before
all the commands.
One could set $PATH, but I like to unambiguously point
it at the locally built binaries in case someone also
has some system-wide packages.
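One way this kind of hook can look, as a sketch only. The class and attribute names below are hypothetical and not taken from the actual code.

```python
class Cluster:
    # Hypothetical attribute: prefix prepended to every binary name.
    bin_prefix = ""

    def build_args(self, binary, *args):
        """Return the argv list for a command, with the prefix applied."""
        return [self.bin_prefix + binary] + list(args)


class VstartCluster(Cluster):
    # Point unambiguously at binaries in the current build directory.
    bin_prefix = "./"
```

With this shape, VstartCluster().build_args("ceph", "fs", "ls") yields ["./ceph", "fs", "ls"], while the base class leaves the binary name alone.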
Signed-off-by: John Spray <john.spray@redhat.com>
This is for verifying the new layout-writing behaviour. While
we're at it, test that the pre-existing backtrace behaviours
are really happening (updating old_pools).
Signed-off-by: John Spray <john.spray@redhat.com>
Run the same procedure as TestClusterFull, but
instead of limiting OSD memstore size, use pool
quota on the data pool.
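For illustration, the quota could be applied with `ceph osd pool set-quota`. The helper below only builds the argument list. Its name and the example pool name are hypothetical.

```python
def pool_quota_args(pool_name, max_bytes):
    """Build the argv for setting a byte quota on a pool."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    return ["ceph", "osd", "pool", "set-quota",
            pool_name, "max_bytes", str(max_bytes)]
```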
Signed-off-by: John Spray <john.spray@redhat.com>
... s/mon_remote/admin_remote/ and allow caller to pass
in which remote they want to use for that. Enables use
with ceph_deploy task which does not give admin keys
to mons.
Signed-off-by: John Spray <john.spray@redhat.com>
This tests the new purge file/ops throttling
in the MDS, via the new perf counters for
strays/purging.
Fixes: #10390
Signed-off-by: John Spray <john.spray@redhat.com>
...as long as only one is active, all the ops
that default to talking to a single MDS should
be happy to talk to the active MDS, even if there
happens to be a standby lying around too.
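A sketch of the selection rule, assuming each daemon is described by a dict with "name" and "state" keys. That shape is hypothetical, chosen only for this example.

```python
def get_lone_active(mds_infos):
    """Return the name of the single active MDS, ignoring standbys."""
    active = [i["name"] for i in mds_infos if i["state"] == "up:active"]
    if len(active) != 1:
        raise RuntimeError(
            "expected exactly one active MDS, found %d" % len(active))
    return active[0]
```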
Signed-off-by: John Spray <john.spray@redhat.com>
Where multiple MDSs were on the same node, trying
to concurrently update their firewall state was
causing an exception because the iptables command
errors out if another instance is already running.
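One possible way to avoid the collision is to serialize firewall updates per host. This is a sketch of that approach under stated assumptions and not necessarily what this change does. The function and variable names are hypothetical.

```python
import threading
from collections import defaultdict

# One lock per host, so daemons on different nodes still update in parallel.
_host_locks = defaultdict(threading.Lock)
_registry_lock = threading.Lock()


def update_firewall(host, apply_rules):
    """Run apply_rules() while holding the lock for the given host."""
    with _registry_lock:
        lock = _host_locks[host]
    with lock:
        return apply_rules()
```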
Fixes: #10948
Signed-off-by: John Spray <john.spray@redhat.com>
This tests the new #9883 repair functionality
where we selectively scrape dentries out of
the journal while the MDS is offline.
Signed-off-by: John Spray <john.spray@redhat.com>
This was only used in get_first_mon, which doesn't actually
need the parameter itself. Makes it easier to casually
use Filesystem from any place with a ctx to hand.
Signed-off-by: John Spray <john.spray@redhat.com>