* use a list comprehension instead of concatenating two ranges for
better readability -- we want to skip the current max_mds when
changing it. this helps the reader understand the goal of thrashing
* random.sample() is replaced with random.choice(). the latter is a
better alternative when the number of samples is 1.
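
A minimal sketch of the two points above, assuming max_mds is chosen from 1 up to some upper bound. The function name `pick_new_max_mds` and its parameters are hypothetical and are not taken from the thrasher code:

```python
import random


def pick_new_max_mds(current_max_mds, upper_bound, rng=random):
    """Pick a value in [1, upper_bound] that differs from current_max_mds.

    Assumes upper_bound >= 2, so at least one other value exists.
    """
    # One comprehension states the intent directly: every candidate
    # except the value we currently have.
    options = [i for i in range(1, upper_bound + 1) if i != current_max_mds]
    # random.choice() returns a single element, whereas
    # random.sample(options, 1) returns a one-element list.
    return rng.choice(options)


new_max_mds = pick_new_max_mds(current_max_mds=2, upper_bound=4)
```
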
Signed-off-by: Kefu Chai <kchai@redhat.com>
instead of taking the length of a `filter()` result, use `sum()` to
count the matching items, as in Python3 `filter()` returns a `filter`
object (an iterator) instead of a list.
in this change, `filter()` calls are replaced with `sum()`
for Python3 compatibility.
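
A small illustration of the difference, using made-up status dictionaries rather than the real MDS map. `count_matching` is a hypothetical helper:

```python
def count_matching(items, predicate):
    """Count the items for which predicate(item) is true."""
    # A generator expression fed to sum() works the same way on
    # Python 2 and Python 3 and never builds an intermediate list.
    return sum(1 for item in items if predicate(item))


# Illustrative data only.
statuses = [
    {"state": "up:active"},
    {"state": "up:standby"},
    {"state": "up:active"},
]


def is_active(status):
    return status["state"] == "up:active"


active = count_matching(statuses, is_active)

# On Python 3 a filter object has no length, so len() raises TypeError.
try:
    len(filter(is_active, statuses))
    filter_has_len = True
except TypeError:
    filter_has_len = False
```
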
Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
* Dropped name setter and property from Thrasher base class
* Updated each Thrasher class with a name attribute
Signed-off-by: Jos Collin <jcollin@redhat.com>
* refs/pull/28378/head:
qa/tasks: introduce Thrasher base class
qa/tasks: Fix typo
qa/tasks: manage thrashers
qa/tasks: start DaemonWatchdog when ceph starts
qa/tasks: make watch and bark handle more daemons
qa/tasks: move DaemonWatchdog to new file
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* Introduced a Thrasher base class.
* Updated thrashers to inherit from Thrasher.
* Replaced the magic variable e with Thrasher.exception as per the discussion.
The exception variable is now set by default, since the thrashers inherit
from the Thrasher class.
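
The shape of such a base class could look roughly like the sketch below. Only the names `Thrasher` and `exception` come from this message; the setter name, `ExampleThrasher`, and `do_thrash` are assumptions made for illustration:

```python
class Thrasher(object):
    """Base class giving every thrasher an `exception` attribute."""

    def __init__(self):
        super(Thrasher, self).__init__()
        self._exception = None  # set by default for every subclass

    @property
    def exception(self):
        return self._exception

    def set_thrasher_exception(self, e):
        # Hypothetical setter name; records the failure for later inspection.
        self._exception = e


class ExampleThrasher(Thrasher):
    """Hypothetical thrasher used only to show the inheritance."""

    def do_thrash(self):
        raise RuntimeError("simulated thrashing failure")

    def run(self):
        try:
            self.do_thrash()
        except Exception as e:
            self.set_thrasher_exception(e)


thrasher = ExampleThrasher()
before = thrasher.exception
thrasher.run()
after = thrasher.exception
```

With this layout a caller can check `thrasher.exception` after the run instead of relying on a separately named variable.
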
Fixes: https://github.com/ceph/ceph/pull/28378#discussion_r309337928
Fixes: https://tracker.ceph.com/issues/41133
Signed-off-by: Jos Collin <jcollin@redhat.com>
* Added daemons to thrashers
* Join the mds thrasher, as the other thrashers did
Fixes: http://tracker.ceph.com/issues/10369
Signed-off-by: Jos Collin <jcollin@redhat.com>
* Start DaemonWatchdog when ceph starts
* Drop the DaemonWatchdog starting in mds_thrash.py
* Bring the thrashers in mds_thrash.py into the context
Fixes: http://tracker.ceph.com/issues/10369
Signed-off-by: Jos Collin <jcollin@redhat.com>
The monitor currently only allows deactivating one mds at a time. In
addition, the mds to deactivate should be the one with the max rank id.
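
The constraint can be sketched as an ordering problem: when shrinking the number of actives, deactivate from the highest rank downwards, one rank per step. `ranks_to_deactivate` is a hypothetical helper, not a function from the test suite:

```python
def ranks_to_deactivate(active_ranks, new_max_mds):
    """Return the ranks to deactivate, highest rank first.

    The caller is expected to deactivate them one at a time, waiting for
    each to stop before moving on to the next.
    """
    ordered = sorted(active_ranks, reverse=True)
    excess = max(0, len(ordered) - new_max_mds)
    return ordered[:excess]


plan = ranks_to_deactivate([0, 1, 2, 3], new_max_mds=2)
```
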
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Thrashing MDS will often result in failures which do not stop the
test. A failure may also cause the test to stall, which needlessly
keeps the machines locked until a timeout is reached. This
watchdog will unmount mounts and kill daemons when a failure is
detected.
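
A rough sketch of the idea, using stand-in objects instead of real daemon and mount handles. The classes `FakeDaemon`, `FakeMount`, and `Watchdog` and all of their attributes are hypothetical and do not reflect the actual implementation:

```python
class FakeDaemon(object):
    """Stand-in for a daemon handle."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.running = True

    def stop(self):
        self.running = False


class FakeMount(object):
    """Stand-in for a client mount."""

    def __init__(self):
        self.mounted = True

    def umount(self):
        self.mounted = False


class Watchdog(object):
    def __init__(self, daemons, mounts):
        self.daemons = daemons
        self.mounts = mounts

    def bark(self):
        # Unmount clients first, then stop every daemon, so the test
        # ends promptly instead of stalling until a timeout.
        for mount in self.mounts:
            mount.umount()
        for daemon in self.daemons:
            daemon.stop()

    def watch(self):
        """Run one polling pass; return True if a failure triggered bark()."""
        if any(not d.healthy for d in self.daemons):
            self.bark()
            return True
        return False


daemons = [FakeDaemon("mds.a"), FakeDaemon("mds.b", healthy=False)]
mounts = [FakeMount()]
barked = Watchdog(daemons, mounts).watch()
```
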
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
While the thrasher supports the behavior desired by issue 10792 [1], the
bugs uncovered by deactivating MDS (and sometimes by killing an MDS that
is deactivating) are presently a distraction from addressing issues
during normal failures. So thrashing max_mds is now turned off by
default. I have added a TODO to deactivate ranks in order (configurably),
as random deactivation causes a lot of other problems.
This also fixes a bug: random.randrange(0.0, 1.0) always returns 0.
Oops.
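
For context on the randrange bug: random.randrange() selects from an integer range, so it cannot produce a fraction to compare against a probability. A probability check is usually written with random.random(), as in the sketch below. This assumes that is how the fix works, which the message does not state, and `should_thrash` is a hypothetical name:

```python
import random


def should_thrash(probability, rng=random):
    """Return True with roughly the given probability (0.0 to 1.0)."""
    # random.random() returns a float in [0.0, 1.0), so the comparison
    # is never true for 0.0 and always true for 1.0.
    return rng.random() < probability


# Seeded generator so the illustration is repeatable.
rng = random.Random(0)
hits = sum(1 for _ in range(1000) if should_thrash(0.5, rng))
```
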
[1] http://tracker.ceph.com/issues/10792
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Currently multimds is prone to many failures when an active or
stopping MDS is killed while there are MDS in the cluster which have
been deactivated (stopping). Have this turned off by default for now.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The thrasher can enter an infinite loop waiting for an MDS to take a
certain rank when a replacement may not be possible. For example,
when max_mds actives are already running.
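
One common way to avoid this kind of unbounded wait is to poll against a deadline. The sketch below shows that general pattern and is not necessarily the approach taken in this change; `wait_for` and its parameters are hypothetical:

```python
import time


def wait_for(condition, timeout, interval=0.01):
    """Poll condition() until it is true or timeout seconds have passed.

    Returns True if the condition became true, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # One last check so a condition that turned true during the final
    # sleep is not reported as a timeout.
    return bool(condition())


# A condition that can never hold gives up after the timeout instead of
# looping forever.
gave_up = not wait_for(lambda: False, timeout=0.05)
```
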
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
During the course of thrashing max_mds, the ranks assigned to MDSs may
develop holes. This causes the thrasher to wrongly try to deactivate
ranks that are not assigned.
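
To illustrate the problem: iterating over range(max_mds) assumes the ranks are contiguous, while collecting the ranks actually held by daemons does not. The data layout and the helper `assigned_ranks` below are assumptions made for the example, including the use of -1 for a daemon without a rank:

```python
def assigned_ranks(mds_infos):
    """Return the sorted ranks that are actually held by some MDS."""
    # In this sketch a daemon without a rank (a standby) carries rank -1.
    return sorted(info["rank"] for info in mds_infos if info.get("rank", -1) >= 0)


# Ranks 0 and 2 are held; rank 1 is a hole; mds.c is a standby.
mds_infos = [
    {"name": "mds.a", "rank": 0},
    {"name": "mds.b", "rank": 2},
    {"name": "mds.c", "rank": -1},
]

held = assigned_ranks(mds_infos)

# A naive range over the rank count would include the unassigned rank 1.
naive = list(range(3))
unassigned_in_naive = [r for r in naive if r not in held]
```
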
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>