24 Commits

Author SHA1 Message Date
Jos Collin
f13f9f9fc1
qa/tasks: drop object inherit
Signed-off-by: Jos Collin <jcollin@redhat.com>
2019-08-23 15:29:27 +05:30
Patrick Donnelly
dad94db7ae
Merge PR #28378 into master
* refs/pull/28378/head:
	qa/tasks: introduce Thrasher base class
	qa/tasks: Fix typo
	qa/tasks: manage thrashers
	qa/tasks: start DaemonWatchdog when ceph starts
	qa/tasks: make watch and bark handle more daemons
	qa/tasks: move DaemonWatchdog to new file

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-08-21 10:57:15 -07:00
Jos Collin
f31791e35d
qa/tasks: introduce Thrasher base class
* Introduced a Thrasher base class.
* Updated thrashers to inherit from Thrasher.
* Replaced the magic variable e with Thrasher.exception as per the discussion.
  The exception variable is now set by default because the thrashers inherit
  from the Thrasher class.

Fixes: https://github.com/ceph/ceph/pull/28378#discussion_r309337928
Fixes: https://tracker.ceph.com/issues/41133
Signed-off-by: Jos Collin <jcollin@redhat.com>
2019-08-21 10:49:46 +05:30
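
The base-class idea in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the attribute and method names (`_exception`, `set_thrasher_exception`) and the `DemoThrasher` subclass are assumptions made for the example.

```python
class Thrasher(object):
    """Hypothetical base class: every subclass gets an `exception` slot."""

    def __init__(self):
        super(Thrasher, self).__init__()
        # Defaults to None, so callers can always read `thrasher.exception`.
        self._exception = None

    @property
    def exception(self):
        return self._exception

    def set_thrasher_exception(self, e):
        self._exception = e


class DemoThrasher(Thrasher):
    """Illustrative subclass; a real thrasher would act on Ceph daemons."""

    def run(self):
        try:
            raise RuntimeError("simulated thrash failure")
        except Exception as e:
            # Record the failure instead of relying on an ad hoc variable.
            self.set_thrasher_exception(e)


thrasher = DemoThrasher()
before = thrasher.exception
thrasher.run()
after = thrasher.exception
```

Because the attribute is defined once in the base class, a supervisor can check `exception` on any thrasher without knowing its concrete type.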
Jos Collin
51d851815e
qa/tasks: fixed typo in the comment
Signed-off-by: Jos Collin <jcollin@redhat.com>
2019-08-20 15:31:07 +05:30
Jos Collin
3f13a355c7
qa/tasks: manage thrashers
* Added daemons to thrashers
* Join the mds thrasher, as the other thrashers did

Fixes: http://tracker.ceph.com/issues/10369
Signed-off-by: Jos Collin <jcollin@redhat.com>
2019-08-06 06:36:39 +05:30
Jos Collin
08b99eef27
qa/tasks: start DaemonWatchdog when ceph starts
* Start DaemonWatchdog when ceph starts
* Drop the DaemonWatchdog starting in mds_thrash.py
* Bring the thrashers in mds_thrash.py into the context

Fixes: http://tracker.ceph.com/issues/10369
Signed-off-by: Jos Collin <jcollin@redhat.com>
2019-08-06 06:36:33 +05:30
Jos Collin
b7a1f5ca6c
qa/tasks: move DaemonWatchdog to new file
* Moved DaemonWatchdog class to a new file daemonwatchdog.py
* Dropped the client watch

Signed-off-by: Jos Collin <jcollin@redhat.com>
2019-08-06 06:36:11 +05:30
Patrick Donnelly
8cbdad9f9b
qa: update testing for standby-replay
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-02-27 21:39:12 -08:00
Patrick Donnelly
1dc5b62557
qa: mds_thrash updates for new max_mds behavior
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-04-17 11:26:56 -07:00
Shengjing Zhu
2cbba835aa
misc: fix various spelling errors
Signed-off-by: Shengjing Zhu <i@zhsj.me>
2018-03-10 23:39:20 +08:00
Patrick Donnelly
a84e3c89bf
qa: thrash max_mds and deactivate ranks
Fixes: http://tracker.ceph.com/issues/10792

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-07-06 22:29:41 -07:00
Yan, Zheng
8d1828dc60
qa: update thrash max mds testing
Current monitor only allows deactivating one mds at a time. Besides,
the mds to deactivate should have max rank id.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
2017-06-27 22:08:26 +08:00
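
The constraint described in the commit above (one deactivation at a time, always the highest rank) could be expressed along these lines. The function name and the plain set of integers standing in for MDS ranks are assumptions for illustration only; no monitor interaction is modeled.

```python
def next_rank_to_deactivate(active_ranks, target_max_mds):
    """Return the single rank to deactivate next, or None if none is needed.

    Only the highest rank is ever chosen, and only one rank per call.
    """
    if len(active_ranks) <= target_max_mds:
        return None
    return max(active_ranks)


# Shrink from three active ranks down to one, one step at a time.
ranks = {0, 1, 2}
order = []
while True:
    rank = next_rank_to_deactivate(ranks, target_max_mds=1)
    if rank is None:
        break
    order.append(rank)
    ranks.discard(rank)
```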
Patrick Donnelly
d748226f00
qa: add DaemonWatchdog to stop tests on failure
Thrashing MDS often results in failures that do not stop the test. A
failure may also stall the test, leaving the machines needlessly locked
until a timeout is reached. This watchdog unmounts mounts and kills
daemons when a failure is detected.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:14 -05:00
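
A rough sketch of the watchdog behavior described in the commit above: check the daemons and, on the first failure, unmount and stop everything. The `FakeDaemon`, `FakeMount`, and `Watchdog` classes and their methods are hypothetical stand-ins, not the DaemonWatchdog implementation.

```python
class FakeDaemon(object):
    """Stand-in for a managed daemon (hypothetical)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.running = True

    def stop(self):
        self.running = False


class FakeMount(object):
    """Stand-in for a client mount (hypothetical)."""

    def __init__(self):
        self.mounted = True

    def unmount(self):
        self.mounted = False


class Watchdog(object):
    def __init__(self, daemons, mounts):
        self.daemons = daemons
        self.mounts = mounts

    def bark(self):
        # Tear everything down so the test fails fast instead of stalling.
        for mount in self.mounts:
            mount.unmount()
        for daemon in self.daemons:
            daemon.stop()

    def watch_once(self):
        # One polling pass; a real watchdog would repeat this in a loop.
        failed = [d.name for d in self.daemons if not d.healthy]
        if failed:
            self.bark()
        return failed


daemons = [FakeDaemon("mds.a"), FakeDaemon("mds.b", healthy=False)]
mounts = [FakeMount()]
failed = Watchdog(daemons, mounts).watch_once()
```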
Patrick Donnelly
f005e8af6b
qa: disable max_mds changes during thrashing
While the thrasher supports the behavior desired by issue 10792 [1], the
bugs uncovered due to deactivating MDS (and sometimes killing
deactivating MDS) are presently a distraction from addressing issues
during normal failures. So now thrashing max_mds is turned off by
default. I have added a TODO to deactivate ranks in order (configurably)
as random deactivation causes a lot of other problems.

This also fixes a bug: random.randrange(0.0, 1.0) always returns 0.
Oops.

[1] http://tracker.ceph.com/issues/10792

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:14 -05:00
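
The randrange bug mentioned in the commit above comes from `random.randrange` drawing from an integer range. The sketch below uses the integer form `randrange(0, 1)` as a stand-in, since recent Python versions may reject float arguments outright; `should_thrash` is a hypothetical helper name, not taken from the commit.

```python
import random

# randrange draws from an integer range; range(0, 1) contains only 0,
# so a probability check built on it can never vary.
always_zero = {random.randrange(0, 1) for _ in range(1000)}


def should_thrash(probability):
    """Return True with the given probability (hypothetical helper)."""
    # random.random() returns a float in [0.0, 1.0).
    return random.random() < probability


never = should_thrash(0.0)
always = should_thrash(1.0)
```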
Patrick Donnelly
a0052fc2d6
qa: use gevent.sleep so greenlet yields
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:14 -05:00
Patrick Donnelly
fd4b61890d
qa: allow revived MDS to be up:active
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
884215d933
qa: timeout waiting for thrashed MDS to revive
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
8e9ea7b6ac
qa: configure thrashing while MDS are stopping
Currently multimds is prone to many failures when an active or stopping
MDS is killed while the cluster contains MDS that have been deactivated
(stopping). This is turned off by default for now.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
6304b6ed5d
qa: add deactivation log message
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
1185326c45
qa: avoid infinite wait if no repl. can be made
The thrasher can enter an infinite loop waiting for an MDS to take a
certain rank when a replacement may not be possible, for example when
max_mds actives are already running.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:12 -05:00
Patrick Donnelly
638bccb2bb
qa: timeout thrasher if fs does not stabilize
After 5 minutes of waiting, it's reasonable to stop as the cluster is
probably stuck.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:12 -05:00
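
The timeout described in the commit above amounts to a polling loop with a deadline. A minimal sketch follows, assuming a caller-supplied `is_stable` predicate; the function name, exception type, and polling interval are illustrative, and only the 5-minute default comes from the commit message.

```python
import time


class StabilizeTimeout(Exception):
    """Raised when the predicate never becomes true (hypothetical)."""


def wait_for_stable(is_stable, timeout=300, interval=1.0):
    """Poll is_stable() until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while not is_stable():
        if time.monotonic() >= deadline:
            raise StabilizeTimeout(
                "not stable after %s seconds" % timeout)
        time.sleep(interval)


# Demonstration with a short timeout and a predicate that never succeeds.
try:
    wait_for_stable(lambda: False, timeout=0.05, interval=0.01)
    timed_out = False
except StabilizeTimeout:
    timed_out = True
```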
Patrick Donnelly
8f3e745344
qa: check replacement MDS is active in thrasher
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:12 -05:00
Patrick Donnelly
19289725c8
qa: handle thrashing ranks with holes
During the course of thrashing max_mds, the ranks assigned to MDSs may
develop holes. This causes the thrasher to try to wrongly deactivate
ranks that are not assigned.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:12 -05:00
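
The rank-hole problem in the commit above can be illustrated by comparing two ways of listing candidate ranks. Both function names and the rank-to-daemon dictionary are assumptions for the example; they do not mirror the thrasher's actual data structures.

```python
def naive_candidates(max_mds):
    # Assumes ranks 0..max_mds-1 are all assigned, which fails with holes.
    return list(range(max_mds))


def assigned_candidates(rank_to_mds):
    # Only consider ranks that currently have an MDS assigned.
    return sorted(rank_to_mds)


# Rank 1 is a hole: no MDS holds it.
rank_to_mds = {0: "mds.a", 2: "mds.c"}
naive = naive_candidates(3)
safe = assigned_candidates(rank_to_mds)
```

Here `naive` includes rank 1 even though nothing holds it, while `safe` lists only ranks 0 and 2.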
Sage Weil
c01f2ee0e2
move ceph-qa-suite dirs into qa/
2016-12-14 11:29:55 -06:00