At the end of start_rgw() we wait until establishing HTTP connections
with RadosGW becomes possible. However, if RadosGW uses FastCGI, that
condition cannot be met before the HTTP server is spawned.
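For illustration, a minimal sketch of this kind of readiness check
(hypothetical helper name and host/port, not the actual task code):

    import socket
    import time

    def wait_for_radosgw(host, port, timeout=60):
        # With FastCGI, nothing listens on this port until the HTTP
        # server (e.g. Apache) is spawned, so this must run after that.
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                socket.create_connection((host, port), timeout=5).close()
                return True
            except socket.error:
                time.sleep(1)
        return False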
Signed-off-by: Radoslaw Zarzynski <rzarzynski@mirantis.com>
If we run an upgrade test where, for example, the "jewel" branch is not
in the ceph-ci.git repo, we should fall back to ceph.git when cloning
the workunits.
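A sketch of the fallback (hypothetical helper; the URLs are the public
repos, not necessarily what teuthology is configured with):

    import subprocess

    def pick_workunit_repo(branch):
        ci_repo = 'https://github.com/ceph/ceph-ci.git'
        upstream = 'https://github.com/ceph/ceph.git'
        # ls-remote prints nothing if the branch is absent from the repo
        out = subprocess.check_output(
            ['git', 'ls-remote', '--heads', ci_repo, branch])
        return ci_repo if out.strip() else upstream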
Signed-off-by: Kefu Chai <kchai@redhat.com>
as "workunits" reside in ceph/qa/workunits, it's more intuitive to
respect suite-repo option when cloning workunits.
Signed-off-by: Kefu Chai <kchai@redhat.com>
We should not remove a pool from pools_to_fix_pgp_num if the pool was
not expanded, or if its pg_num was not increased because PGs were still
being created. Previously, the pool was dropped from
pools_to_fix_pgp_num as soon as set_pool_pgpnum() returned, even when it
had actually done nothing; this prevented us from fixing the pgp_num
after thrashing was done.
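A minimal sketch of the corrected bookkeeping (hypothetical names and
return value, not the actual ceph_manager.py code):

    def fix_pgp_nums(pools_to_fix_pgp_num, set_pool_pgpnum):
        # set_pool_pgpnum is assumed to return True only when pgp_num
        # was actually increased.
        for pool in list(pools_to_fix_pgp_num):
            if set_pool_pgpnum(pool):
                # the bump happened, so we can stop tracking this pool
                pools_to_fix_pgp_num.discard(pool)
            # otherwise keep it queued so it is fixed after thrashing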
Signed-off-by: Kefu Chai <kchai@redhat.com>
as "workunits" reside in ceph/qa/workunits, it's more intuitive to
respect suite-repo option when cloning workunits.
Signed-off-by: Kefu Chai <kchai@redhat.com>
It should live in teuthology, not in Ceph. It is also currently broken,
so there is no need to keep it around.
Fixes: http://tracker.ceph.com/issues/18846
Signed-off-by: Loic Dachary <loic@dachary.org>
There were some cases where we would leave a mountpoint behind, causing
teuthology's teardown to hang when it tried to look inside cephtest/.
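The idea, as a sketch (hypothetical helper, not the task's actual
cleanup code):

    import os
    import subprocess

    def cleanup_mountpoint(path):
        if os.path.ismount(path):
            # lazy unmount detaches the mount even if it is busy, so a
            # later listing of cephtest/ cannot hang on it
            subprocess.call(['sudo', 'umount', '-l', path])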
Signed-off-by: John Spray <john.spray@redhat.com>
Thrashing the MDS will often result in failures which do not stop the
test. Such a failure may also cause the test to stall, needlessly
keeping machines locked until a timeout is reached. This watchdog
unmounts mounts and kills daemons when a failure is detected.
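A simplified sketch of the watchdog idea (hypothetical daemon and
callback interfaces, not the actual qa/tasks code):

    import threading

    class DaemonWatchdog(threading.Thread):
        def __init__(self, daemons, unmount_all, interval=5):
            super(DaemonWatchdog, self).__init__()
            self.daemons = daemons          # objects with running()/kill()
            self.unmount_all = unmount_all  # callback to unmount clients
            self.interval = interval
            self.stopping = threading.Event()

        def run(self):
            while not self.stopping.wait(self.interval):
                if any(not d.running() for d in self.daemons):
                    self.unmount_all()      # unblock eventual teardown
                    for d in self.daemons:
                        d.kill()
                    break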
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
While the thrasher supports the behavior desired by issue 10792 [1], the
bugs uncovered by deactivating MDSs (and sometimes killing deactivating
MDSs) are presently a distraction from addressing issues during normal
failures. So thrashing max_mds is now turned off by default. I have
added a TODO to deactivate ranks in order (configurably), as random
deactivation causes a lot of other problems.
This also fixes a bug: random.randrange(0.0, 1.0) always returns 0.
Oops.
[1] http://tracker.ceph.com/issues/10792
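For reference, the randrange pitfall and the fix:

    import random

    # Under Python 2 (which this code ran on), randrange() coerces its
    # float arguments to the integer range [0, 1), whose only member is 0:
    #     random.randrange(0.0, 1.0)  # always 0; newer Pythons reject floats
    # Drawing a float in [0.0, 1.0) is what a probability check needs:
    if random.random() < 0.5:
        pass  # e.g. thrash max_mds on this iteration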
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Currently, multimds is prone to many failures when killing an active or
stopping MDS while other MDSs in the cluster have been deactivated
(stopping). Turn this off by default for now.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The thrasher can enter an infinite loop while waiting for an MDS to take
a certain rank when a replacement may not be possible, for example
because max_mds actives are already running.
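A bounded-wait sketch of the fix (hypothetical callback and timeout, not
the thrasher's actual code):

    import time

    def wait_for_rank_holder(get_holder, rank, timeout=60):
        # give up instead of spinning forever when no replacement can
        # ever take `rank`, e.g. max_mds actives are already running
        deadline = time.time() + timeout
        while time.time() < deadline:
            mds = get_holder(rank)  # the MDS holding rank, or None
            if mds is not None:
                return mds
            time.sleep(2)
        raise RuntimeError('no MDS took rank %d in %ds' % (rank, timeout))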
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
During the course of thrashing max_mds, the ranks assigned to MDSs may
develop holes. This causes the thrasher to wrongly try to deactivate
ranks that are not assigned.
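The fix, sketched with hypothetical inputs:

    import random

    def pick_rank_to_deactivate(assigned_ranks):
        # choose among ranks that actually exist, e.g. {0, 2, 5} after
        # holes develop, instead of assuming the range 0..max_mds-1
        return random.choice(sorted(assigned_ranks))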
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
https://github.com/ceph/ceph/pull/13194 introduced a regression:
2017-02-06T16:14:23.162 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/ceph_manager.py", line 722, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/ceph_manager.py", line 839, in do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/ceph_manager.py", line 305, in kill_osd
    output = proc.stderr.getvalue()
AttributeError: 'NoneType' object has no attribute 'getvalue'
This is because the original patch failed to pass "stderr=StringIO()" to run().
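The corrected call, sketched (the command is illustrative; `remote` is a
teuthology remote, and StringIO is the Python 2 module, matching
qa/tasks at the time):

    from StringIO import StringIO  # Python 2, as in qa/tasks then

    def kill_osd_sketch(remote, osd_id):
        # run() leaves proc.stderr as None unless a file-like object is
        # supplied, hence the AttributeError in the traceback above
        proc = remote.run(
            args=['sudo', 'stop', 'ceph-osd', 'id=%s' % osd_id],  # illustrative
            stderr=StringIO(),   # the piece the original patch omitted
            check_status=False,
        )
        return proc.stderr.getvalue()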
Fixes: http://tracker.ceph.com/issues/16263
Signed-off-by: Nathan Cutler <ncutler@suse.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
If Thrasher.__init__() spawns the do_thrash thread before initializing the
ceph_objectstore_tool property, do_thrash races with the rest
of Thrasher.__init__(), and in some cases do_thrash can call kill_osd() before
Thrasher.__init__() progresses much further. This can lead to an exception
("AttributeError: Thrasher instance has no attribute 'ceph_objectstore_tool'")
being thrown in kill_osd().
This commit eliminates the race by making sure the ceph_objectstore_tool
attribute is initialized before the do_thrash thread is spawned.
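A simplified illustration of the ordering (not the real Thrasher class):

    import threading

    class ThrasherSketch(object):
        def __init__(self):
            # every attribute the background thread reads is set first...
            self.ceph_objectstore_tool = False
            # ...and only then is the thread spawned, so do_thrash can
            # never observe a partially constructed instance
            self.thread = threading.Thread(target=self.do_thrash)
            self.thread.start()

        def do_thrash(self):
            if self.ceph_objectstore_tool:  # safe: always initialized
                pass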
Fixes: http://tracker.ceph.com/issues/18799
Signed-off-by: Nathan Cutler <ncutler@suse.com>
The umount process can get stuck, in which case we want to fail the test
rather than wait around for it. During teardown of the kclient task,
catch this timeout explicitly so that we can powercycle the node if
needed.
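The shape of the change, sketched (hypothetical helper; the real task
uses teuthology's own run/timeout machinery):

    import subprocess

    def umount_with_timeout(mountpoint, timeout=300):
        try:
            subprocess.run(['sudo', 'umount', mountpoint],
                           timeout=timeout, check=True)
        except subprocess.TimeoutExpired:
            # surface the hang so the caller can powercycle the node
            raise RuntimeError('umount of %s is stuck' % mountpoint)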
Signed-off-by: John Spray <john.spray@redhat.com>