RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2024-12-24 20:33:27 +00:00

Author	SHA1	Message	Date
Patrick Donnelly	d748226f00	qa: add DaemonWatchdog to stop tests on failure Thrashing MDS will often result in failures which often do not stop the test. The failure may also cause the test to stall which will force the machines to needlessly be locked until a timeout is reached. This watchdog will unmount mounts and kill daemons when a failure is detected. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2017-02-06 14:07:14 -05:00
Patrick Donnelly	f005e8af6b	qa: disable max_mds changes during thrashing While the thrasher supports the behavior desired by issue 10792 [1], the bugs uncovered due to deactivating MDS (and sometimes killing deactivating MDS) are presently a distraction from addressing issues during normal failures. So now thrashing max_mds is turned off by default. I have added a TODO to deactivate ranks in order (configurably) as random deactivation causes a lot of other problems. This also fixes a bug: random.randrange(0.0, 1.0) always returns 0. Oops. [1] http://tracker.ceph.com/issues/10792 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2017-02-06 14:07:14 -05:00
Patrick Donnelly	a0052fc2d6	qa: use gevent.sleep so greenlet yields Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2017-02-06 14:07:14 -05:00
Patrick Donnelly	fd4b61890d	qa: allow revived MDS to be up:active Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2017-02-06 14:07:13 -05:00
Patrick Donnelly	884215d933	qa: timeout waiting for thrashed MDS to revive Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2017-02-06 14:07:13 -05:00
Patrick Donnelly	8e9ea7b6ac	qa: configure thrashing while MDS are stopping Currently multimds is prone to many failures when killing an active or stopping MDS when there are MDS in the cluster which have been deactivated (stopping). Have this turned off by default for now. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2017-02-06 14:07:13 -05:00
Patrick Donnelly	6304b6ed5d	qa: add deactivation log message Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2017-02-06 14:07:13 -05:00
Patrick Donnelly	1185326c45	qa: avoid infinite wait if no repl. can be made The thrasher can enter an infinite loop waiting for an MDS to take a certain rank when a replacement may not be possible. For example, max_mds actives are already running. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2017-02-06 14:07:12 -05:00
Patrick Donnelly	638bccb2bb	qa: timeout thrasher if fs does not stabilize After 5 minutes of waiting, it's reasonable to stop as the cluster is probably stuck. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2017-02-06 14:07:12 -05:00
Patrick Donnelly	8f3e745344	qa: check replacement MDS is active in thrasher Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2017-02-06 14:07:12 -05:00
Patrick Donnelly	19289725c8	qa: handle thrashing ranks with holes During the course of thrashing max_mds, the ranks assigned to MDSs may develop holes. This causes the thrasher to try to wrongly deactivate ranks that are not assigned. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2017-02-06 14:07:12 -05:00
Sage Weil	c01f2ee0e2	move ceph-qa-suite dirs into qa/	2016-12-14 11:29:55 -06:00

12 Commits
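
Commit f005e8af6b mentions that random.randrange(0.0, 1.0) always returns 0. randrange picks from an integer range, so it cannot produce a fractional probability draw, and newer Python 3 releases reject float arguments outright. The sketch below shows one common way to make a probabilistic decision with random.random() instead. It is not the code from the commit: the name should_thrash and the rng parameter are hypothetical, chosen only for illustration.

```python
import random


def should_thrash(probability, rng=random):
    """Return True with roughly the given probability (0.0 to 1.0).

    Hypothetical helper for illustration; not taken from the ceph qa code.
    """
    # random.random() draws a float uniformly from [0.0, 1.0), so the
    # comparison is never true for probability 0.0 and always true for 1.0.
    return rng.random() < probability


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs behave the same
    hits = sum(should_thrash(0.25, rng) for _ in range(10000))
    print("thrashed in %d of 10000 trials" % hits)
```

Passing a seeded random.Random instance as rng keeps the decision reproducible in tests, while the default falls back to the module-level generator.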