RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2025-01-29 22:43:40 +00:00

Author	SHA1	Message	Date
Sage Weil	22b1f17f78	ls: another newline	2012-04-10 08:59:47 -07:00
Sage Weil	7757fbb9bd	ls: remove stray newline	2012-04-10 08:57:19 -07:00
Dan Mick	9906d5ed08	Change to local mirror of linux-firmware repo to try to stop failures	2012-04-09 16:58:59 -07:00
Mark Nelson	3d7f1db731	Kernel: Pull linux-firmware from git Signed-off-by: Mark Nelson <nhm@clusterfaq.org>	2012-04-05 08:49:19 -07:00
Mark Nelson	1836d4672f	Added assertion to check that targets > roles Signed-off-by: Mark Nelson <mark.nelson@dreamhost.com>	2012-04-03 15:56:51 -07:00
Sage Weil	952940272b	nuke: don't run umount when no xargs args Gets rid of this noise: INFO:teuthology.nuke:Unmount any osd data directories... INFO:teuthology.orchestra.run.err:Usage: umount -h | -V INFO:teuthology.orchestra.run.err: umount -a [-d] [-f] [-r] [-n] [-v] [-t vfstypes] [-O opts] INFO:teuthology.orchestra.run.err: umount [-d] [-f] [-r] [-n] [-v] special | node... INFO:teuthology.orchestra.run.err:Usage: umount -h | -V INFO:teuthology.orchestra.run.err: umount -a [-d] [-f] [-r] [-n] [-v] [-t vfstypes] [-O opts] INFO:teuthology.orchestra.run.err: umount [-d] [-f] [-r] [-n] [-v] special | node... ...	2012-04-03 15:56:36 -07:00
Sage Weil	9a69c3f319	ceph.conf: enable 'osd recover clone overlap' to test the recovery cloning in qa. this was redone, but forgot to enable it in qa.	2012-03-30 16:15:34 -07:00
Samuel Just	b4aa098f47	make Thrasher not inherit from Greenlet	2012-03-29 18:08:19 -07:00
Samuel Just	394d8b1ebd	Add test for object source marked down	2012-03-29 18:08:19 -07:00
Samuel Just	749826c29b	allow use of a separate journal block device	2012-03-27 17:18:44 -07:00
Josh Durgin	e30b7710f5	rbd: fix typo in default config pyflakes would have caught this if 'all' weren't a built-in function	2012-03-26 11:57:07 -07:00
Sage Weil	397e7f2f7b	add osd_recovery task to test divergent osd logs	2012-03-24 21:09:19 -07:00
Sage Weil	1c1192a9fb	backfill: use 'rbd' pool instead of 'data' (data has a replay interval, which makes writes take longer to resume after repeering)	2012-03-24 21:09:19 -07:00
Sage Weil	ca9a5a4ac4	rename backfill -> osd_backfill	2012-03-24 16:05:11 -07:00
Sage Weil	22e808746f	put filestore xattr option in [global] ...for test_filestore_idempotent's benefit	2012-03-24 15:36:08 -07:00
Josh Durgin	6f0f250b26	suite: add missing print statement	2012-03-21 12:00:55 -07:00
Josh Durgin	8a9a567067	suite: fix print statement when summary doesn't exist	2012-03-21 11:58:17 -07:00
Samuel Just	91c08f6eee	Add watch op to rados.py Signed-off-by: Samuel Just <sam.just@dreamhost.com>	2012-03-20 19:00:12 -07:00
Josh Durgin	815fc3e2f6	suite: failed runs might not have durations This was one cause of emails not being sent - stale /tmp/cephtest dirs fail without recording a duration.	2012-03-20 07:50:08 -07:00
Josh Durgin	a65d4136e5	suite, coverage: use absolute dirs for isdir checks This fixes the results to wait for all jobs to complete again.	2012-03-19 14:16:14 -07:00
Josh Durgin	bdb72c282f	filestore_idempotent: get coverage and coredumps	2012-03-19 11:57:02 -07:00
Josh Durgin	6c8db1a807	suite: more results logging	2012-03-19 11:31:33 -07:00
Sage Weil	7173a8afb6	ceph.conf: no comment	2012-03-18 11:56:18 -07:00
Sage Weil	7de798f6fa	ceph.conf: set 'filestore xattr use omap = true'	2012-03-18 11:06:05 -07:00
Sage Weil	7d2e1056fd	fix teuthology-ls isdir check	2012-03-18 10:50:17 -07:00
Sage Weil	94f0ba1efe	run valgrind with cwd set to /tmp/cephtest/archive/coredump This lets us capture the vgcore.* files, which always go to valgrind's cwd. Fixes: #1953	2012-03-18 10:48:51 -07:00
Josh Durgin	07b97fe77f	suite: log results and coverage generation Need to figure out where and when results emails are failing.	2012-03-16 11:44:13 -07:00
Josh Durgin	8fbd087d6b	results: make sure email is sent before anything else fails	2012-03-15 17:34:19 -07:00
Mark Nelson	e14d428c98	Merge branch 'master' of github.com:ceph/teuthology	2012-03-14 15:32:23 -05:00
Sage Weil	5c9acbd897	gitbuilder: put flavor last in case we refine the field later	2012-03-13 10:09:18 -07:00
Sage Weil	1a01ccaafb	Pull from new gitbuilder.ceph.com locations. Simplifies the flavor stuff into a tuple of <package,type,flavor,dist,arch> where package is ceph, kernel, etc. type is tarball, deb flavor is basic, gcov, notcmalloc arch is x86_64, i686 (uname -m) dist is oneiric, etc. (lsb_release -s -c)	2012-03-13 10:02:26 -07:00
Mark Nelson	3833ada8b9	Made the example better with multiple roles.	2012-03-12 15:13:36 -05:00
Mark Nelson	0a61ffad4c	Added some example yaml files and an example parallel execution task.	2012-03-12 14:33:10 -05:00
Sage Weil	008cf7fd95	autotest: pull from github.com/ceph/autotest	2012-03-10 19:15:21 -08:00
Sage Weil	2124129e70	workunit: include python2.7 path too	2012-03-10 15:34:19 -08:00
Samuel Just	ddc1ab0c03	rados.py: include setattr and rmattr Signed-off-by: Samuel Just <samuel.just@dreamhost.com>	2012-03-08 16:14:44 -08:00
Mark Nelson	31762c0003	lock: Improved logging when there aren't enough nodes available to lock-many.	2012-03-07 12:55:54 -08:00
Mark Nelson	05a07dda7d	lock: Added a --locked flag to teuthology-lock. Can be used to restrict searches based on lock status, e.g. 'teuthology-lock --list -a --locked false --status up' shows available nodes.	2012-03-07 12:55:33 -08:00
Sage Weil	2a18c3e1d0	nuke: unmount osd data directories This helps us avoid reboot to clean up osd data directories that are left mounted.	2012-03-06 09:34:38 -08:00
Josh Durgin	1493674735	Use non-zero exit status if any tests failed Fixes: #1989	2012-03-05 13:34:33 -08:00
Sage Weil	dc1abab211	github.com/NewDreamNetwork -> github.com/ceph	2012-03-02 10:55:56 -08:00
Josh Durgin	a80246c17f	dump_stuck: note required ceph configuration	2012-02-29 15:47:17 -08:00
Josh Durgin	85cc96c11a	dump_stuck: verify that 'ceph health' mentions the right number of inactive/unclean/stale pgs	2012-02-28 13:55:46 -08:00
Sage Weil	999e21928c	peer: ignore +scrubbing portion of pg state It can cause the mon state and osd states to not match.	2012-02-28 09:50:29 -08:00
Sage Weil	84cd4ed6c3	peer: wait for peering to complete, or block We need to wait for peering to either complete, or block because it is waiting for another PG. _Then_ look at all the PG states and compare the mon values with what we get from querying the OSDs directly.	2012-02-25 21:05:00 -08:00
Josh Durgin	b8739585a0	peer: remove unused variable	2012-02-24 15:01:34 -08:00
Josh Durgin	62bda12711	misc: always return a usable result from get_valgrind_args	2012-02-24 14:56:43 -08:00
Josh Durgin	e4801819f2	rgw: simplify valgrind args	2012-02-24 14:56:42 -08:00
Sage Weil	edbb41e1f8	add peer task Force a pg to get stuck in 'down' state, verify we can query the peering state, then start the OSD so it can recover.	2012-02-24 15:05:17 -08:00
Sage Weil	7ac04a422a	lost_unfound: list missing/unfound for each pg and verify the unfound counts This also tests the pg list_missing functionality.	2012-02-24 12:42:39 -08:00
Sage Weil	c43e87d118	ceph_manager: list_pg_missing List missing objects for the given pgid.	2012-02-24 12:42:39 -08:00
Josh Durgin	c93a08eda0	Whitespace and unnecessary formatting fixes	2012-02-24 12:05:35 -08:00
Josh Durgin	3bfb8d696e	ceph, ceph-fuse: simplify valgrind argument additions	2012-02-24 12:05:35 -08:00
Sage Weil	9ec047226f	refactor all valgrind users to use a get_valgrind_args() helper This avoids much annoying, duplicated code.	2012-02-24 12:05:35 -08:00
Sage Weil	90fdc84086	ceph: always create valgrind logs dir Other tasks use it too. It's more annoying to conditionally create it.	2012-02-24 12:05:35 -08:00
Sage Weil	7af6e46c94	ceph: always try to process valgrind logs Check for errors in valgrind logs even if there is no valgrind option the ceph task config stanza. Other tasks can run via valgrind (ceph-fuse, rgw). If the logs aren't there, this is harmless.	2012-02-24 12:05:35 -08:00
Sage Weil	e2ea73d1a5	rgw: add valgrind support tasks: - ceph: - rgw: client.a: valgrind: [--tool=memcheck]	2012-02-24 12:05:35 -08:00
Sage Weil	7bf64b73ee	rgw: accept dict e.g., tasks: ... - rgw: client.0: client.1:	2012-02-24 12:05:35 -08:00
Sage Weil	d40a9b275f	lost_unfound: new mark_unfound_lost syntax	2012-02-23 20:09:09 -08:00
Josh Durgin	81a46c462a	dump_stuck: flush stats before waiting for recovery/clean	2012-02-23 17:07:26 -08:00
Josh Durgin	995dc1f751	Add a task for testing stuck pg visibility.	2012-02-21 15:12:48 -08:00
Josh Durgin	2a1c74c5f5	Move duration calculation to an internal task This excludes all generic start up costs, like waiting for locks, rebooting into a new kernel, etc.	2012-02-21 15:12:26 -08:00
Josh Durgin	eb434a507a	Add necessary imports for s3 tasks, and keep them alphabetical.	2012-02-21 15:04:00 -08:00
Yehuda Sadeh	11073e505f	s3roundtrip, s3readwrite: access key uses url safe chars Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>	2012-02-21 12:23:38 -08:00
Yehuda Sadeh	6e1b3a5644	rgw: access key uses url safe chars Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>	2012-02-21 12:12:03 -08:00
Sage Weil	c5688e6570	ceph: valgrind trumps coverage when picking a flavor valgrind will crash if we don't use notcmalloc; coverage will silently fail to collect coverage info.	2012-02-20 15:17:52 -08:00
Sage Weil	5216d3c7a9	ceph.conf: no lockdep by default	2012-02-20 14:54:10 -08:00
Sage Weil	5f9445c88b	suite.results: include test duration in output	2012-02-20 13:38:06 -08:00
Sage Weil	71d0d97a97	cfuse -> ceph-fuse	2012-02-20 07:12:53 -08:00
Sage Weil	7ff9f044e7	ceph: allow valgrind per-type (not just per-name)	2012-02-20 07:04:45 -08:00
Sage Weil	eb93fa744d	lost_unfound: mark osds in when we revive them so that we test what we meant to. It also lets us actually go clean at the very end.	2012-02-19 19:40:45 -08:00
Sage Weil	45b6189b7d	ceph_manager: ignore stale states when counting also remove assumptions about ordering of states	2012-02-18 14:44:53 -08:00
Sage Weil	196d4a1f16	wait_till_clean -> wait_for_clean and wait_for_recovery Clean now also means the correct number of replicas, whereas recovered means we have done all the work we can do given the replicas/osds we have. For example, degraded and clean are now mutually exclusive. Also move away from 'till'.	2012-02-17 21:53:25 -08:00
Sage Weil	ad9d7fb6e1	backfill: wait for clean before writing+blackholing If we have straggler pgs and blackhole osd.1, we can deadlock because we need info from that osd to repeer and continue. Make sure we're clean, and then start the write + blackhole + kill test.	2012-02-14 15:24:11 -08:00
Sage Weil	50cc60f02d	nuke: nuke testrados too Slightly fewer nuke -r's	2012-02-14 15:23:19 -08:00
Sage Weil	6f3abc6ced	ceph_manager: mark in a bit more often than out Otherwise we can get into cases where many/most nodes are out, and things don't work as well. e.g., crush may start to fail.	2012-02-13 15:28:24 -08:00
Sage Weil	af4ce44233	ceph: use any fs, not just btrfs, on scratch devices The btrfs: true syntax is replaced with fs: btrfs or ext4, xfs.	2012-02-13 15:28:24 -08:00
Sage Weil	975d73a2bb	nuke: nuke testrados and rados processes, too So that -r is needed slightly less often.	2012-02-13 15:28:24 -08:00
Sage Weil	46b612efa4	misc: make get_scratch_devices look for (almost) any disk that's not mounted	2012-02-13 15:28:24 -08:00
Josh Durgin	0cd16cf03d	ceph: always add logger for daemons The extra log function added redundant info and didn't allow different levels.	2012-02-02 09:36:04 -08:00
Josh Durgin	7af7c66bd0	ceph: rename type parameter to type_ type is a built-in and shouldn't be aliased.	2012-02-02 09:35:58 -08:00
Josh Durgin	7146db9215	ceph: use the correct comparison operator is compares identity (i.e. address in cpython), not value.	2012-02-02 09:27:04 -08:00
Josh Durgin	e7672b6433	ceph: sync before unmounting btrfs devices There may still be writes in flight, since the osds may not have shutdown cleanly. This should prevent EBUSY when unmounting. Fixes: #1997	2012-02-02 09:26:45 -08:00
Josh Durgin	1364b8826f	ceph: delay raising exceptions until all daemons are stopped If a daemon crashes, the exception is raised when we stop it. This caused some daemons to continue running during cleanup, since the rest of the daemons of the same type would not be shut down. Also log each daemon that crashed, for easier debugging. Fixes: #1744	2012-02-02 09:26:25 -08:00
Sage Weil	0236dc0f5e	add backfill task This does a basic test of backfill functionality, including a divergent log on a backfill target (#1983).	2012-01-31 16:25:53 -08:00
Sage Weil	e337c4727c	ceph_manager: add manager.blackhole_kill_osd() This will suspend disk writes for a couple seconds and then kill the daemon. It helps us simulate a hardware failure.	2012-01-31 16:13:59 -08:00
Tommi Virtanen	d7be77628c	Allow user to disable lock checking. The new plana hardware isn't in the old sepia lock database, and the machine pools are risky to merge as nothing in the software guarantees allocation from just one pool. This allows us to hand-allocate machines temporarily.	2012-01-31 08:05:36 -08:00
Tommi Virtanen	09bed16408	Allow user to provide flavor to use. With this, you can use Ubuntu 11.10 machines with teuthology by saying:: tasks: - ceph: flavor: oneiric ...	2012-01-31 07:59:43 -08:00
Josh Durgin	f84b4aa5e3	Add admin socket task. This simply gets the output of an admin socket command, makes sure it's json, and runs a user-provided test script on it.	2012-01-27 17:13:36 -08:00
Samuel Just	4aa9ca4551	CephManager: base timeout on time since last change in active+clean Signed-off-by: Samuel Just <samuel.just@dreamhost.com>	2012-01-24 11:28:38 -08:00
Josh Durgin	29885f3e42	kernel: ignore connection problems while waiting for reboot	2012-01-18 17:49:05 -08:00
Sage Weil	45e4c924fa	thrashosds: maxdead default to 0 This avoids any possibility of blocking peering.	2012-01-17 09:24:54 -08:00
Sage Weil	bf22a4fb92	task/rados: use new usage for radosmodel tool	2012-01-16 16:53:55 -08:00
Sage Weil	71390f9784	thrashosds: fix action selection I'm not sure what the old code was trying to do, but I'm pretty sure it wasn't doing it correctly.. a .1 chance_down was killing an OSD for me virtually every time.	2012-01-16 15:05:43 -08:00
Sage Weil	8fc6086986	thrashosds: make actions less nonsensical Make marking OSD up/down and in/out totally orthogonal. Signed-off-by: Sage Weil <sage@newdream.net>	2012-01-16 15:05:43 -08:00
Sage Weil	9419f583c6	ls: include duration, less noise	2012-01-16 13:18:49 -08:00
Sage Weil	8fb115fe2c	include run duration in summary.yaml	2012-01-16 12:39:20 -08:00
Sage Weil	7b47e49fa8	ls: fix extraneous newline	2012-01-16 10:47:44 -08:00
Sage Weil	b58f9560ea	ceph: ignore all leaks unless/until we figure out where the DefinitelyLost records are coming from.. at first glance they look bogus.	2012-01-16 09:55:47 -08:00
Sage Weil	40fb86ff81	ceph: take single arg or list for valgrind args	2012-01-16 09:22:45 -08:00
Sage Weil	c88ec5719e	combined mon, osd, mds starter functions	2012-01-15 22:54:09 -08:00
Sage Weil	f8ec23e79d	rbd: default to all:	2012-01-15 22:53:39 -08:00
Sage Weil	72057a9cd8	use local mirrors for (most) github urls A cronjob on ceph.newdream.net updates these every 15 minutes. Sigh.	2012-01-15 22:52:58 -08:00
Sage Weil	fbfa94bb09	teuthology-ls: show pid, last line of output for running jobs	2012-01-15 22:52:58 -08:00
Sage Weil	f70b158cd1	show host -> roles mapping on startup Less guessing when manually inspecting an in-progress or hung run.	2012-01-15 22:52:58 -08:00
Sage Weil	f795261454	lost_unfound: make test work with backfill If we backfill, we fail to peer instead of having every object show up as 'unfound'. Avoid that by preventing log trimming, so that we always do log recovery for this test.	2012-01-15 22:52:58 -08:00
Tommi Virtanen	3bfa41cf6a	Use yaml.safe_dump so unicode doesn't mess up the yaml files. In general, yaml.dump is comparable to pickle, and my personal coding standard says never use it. yaml.safe_dump is much nicer. yaml.dump should have been named yaml.unsafe_dump, yaml.safe_dump should have been named yaml.dump :(	2012-01-13 11:26:36 -08:00
Josh Durgin	0da44591a9	nuke: take config files from -t argument teuthology-lock and teuthology-updatekeys both use -t for this already	2012-01-12 14:48:36 -08:00
Josh Durgin	96e89d30ec	kernel: loop reconnecting in case we race with shutdown Previously, if we reconnected before shutdown completed we asserted that the kernel did not boot into the new version, when we just needed to wait for the machine to reboot.	2012-01-12 13:02:22 -08:00
Sage Weil	59369237c9	thrasher: don't mark down osds out; tell monitor same Stopping ceph-osd doesn't make it out (immediately). Prevent monitor from doing this after a delay too so we can keep our notion of what is up/down/in/out accurate.	2012-01-11 12:54:09 -08:00
Sage Weil	3c0346b4cb	lost_unfound: typo	2012-01-11 12:54:09 -08:00
Sage Weil	6dae2f8ae3	thrasher: adjust min_dead default Make this 1, not 2. That's a bit more friendly. It doesn't strictly matter, tho, since we revive osds before waiting for clean.	2012-01-11 12:54:09 -08:00
Sage Weil	fb74b90152	thrasher: add max_dead Add max_dead, and revive osds prior to waiting for clean. Otherwise we can leave too many OSDs down and the cluster will never go clean.	2012-01-11 12:54:08 -08:00
Sage Weil	50463ffddd	verify all osds start before checking health Just checking health isn't good enough, since it races with OSD startup: we can have a healthy cluster with 0 (or something else < total) OSDs.	2012-01-11 12:54:08 -08:00
Josh Durgin	f4883ebf09	ceph: let the user running ceph-osd remove subvolumes This will prevent EPERM when using the SNAP_DESTROY ioctl, so the filestore will use btrfs snaps.	2012-01-10 16:07:04 -08:00
Josh Durgin	d2fadf9fe2	syslog: ignore lockdep non-static key warning It looks like this warning was made default in linux 3.2. This will keep happening until #1922 is done.	2012-01-10 15:28:42 -08:00
Sage Weil	b354ce4e91	run: put pid in archive dir This will make it easy for teuthology-ls to show you the running process's pid (if it's still running). Or for other utilities to kill + clean up a hung teuthology run.	2012-01-08 14:39:30 -08:00
Sage Weil	13445d237b	ceph_manager: a booting osd is no longer automatically marked in as of ceph.git commit `96b7b0d83e`	2012-01-06 17:21:38 -08:00
Sage Weil	001701a0f7	mon_recovery: need n/2 + 1 monitors for quorum	2012-01-06 15:12:15 -08:00
Sage Weil	da9210779e	ceph: don't skip monitor ports We can use the same port multiple times if they are on a different hosts.	2012-01-06 13:36:54 -08:00
Josh Durgin	561f06cf94	suite: make email-on-success the default behavior This way you can tell when a run is complete, instead of wondering if it's stuck in the queue.	2012-01-05 17:27:31 -08:00
Josh Durgin	ec3a3a9654	rados: fix example config	2012-01-03 14:07:45 -08:00
Josh Durgin	cdd5c456a0	nuke-on-error: only unlock if this run locked the machines	2012-01-03 13:02:31 -08:00
Josh Durgin	0176c9ab0f	Remove unused mon.0 variables.	2012-01-03 13:02:31 -08:00
Josh Durgin	2e9b1c75f9	rados: use testrados instead of testsnaps and testreadwrite	2012-01-03 13:02:29 -08:00
Josh Durgin	932257fb6e	rados: remove unused variable	2011-12-30 14:37:45 -08:00
Josh Durgin	0af9c0a2e7	rados: clean up argument construction Only the client id varies, so it can be done outside the loop. Also handle coredumps and coverage, and use LD_LIBRARY_PATH instead of LD_PRELOAD.	2011-12-30 14:37:45 -08:00
Josh Durgin	6df4ce5075	rados: fix references to testrados	2011-12-30 14:37:45 -08:00
Josh Durgin	cdf142b597	rados: fix documentation format	2011-12-30 14:37:45 -08:00
Josh Durgin	2f71f03fdd	misc: simplify reconnect logic Ignore all errors until the timeout expires so we don't have to worry about whitelisting them.	2011-12-30 14:37:37 -08:00
Mark Kampe	f04e29557e	teuthology rgw-admin: annotated test cases for inventory this is not a nose suite, so I simply added test case descriptions in csv format, and put a file to extract them at the top of the file. Signed-off-by: Mark Kampe <mark.kampe@dreamhost.com>	2011-12-29 13:09:08 -08:00
Josh Durgin	d0e90d71bd	syslog checking: forgot a pipe	2011-12-16 18:09:17 -08:00
Yehuda Sadeh	7eec30946d	roundtrip: add task	2011-12-15 13:24:53 -08:00
Yehuda Sadeh	97cc6c2990	readwrite: fix task with default conf	2011-12-15 12:39:39 -08:00
Yehuda Sadeh	659e66aa09	readwrite: fix conf, task runs	2011-12-14 17:14:30 -08:00
Yehuda Sadeh	7d085ad939	readwrite: add readwrite task still not really running, but at least getting configured	2011-12-14 16:12:55 -08:00
Josh Durgin	31b5ccbf1b	coverage: use locally stored build instead of downloading from a gitbuilder	2011-12-13 16:16:09 -08:00
Josh Durgin	c9e4504fbd	Ignore lockdep being turned off for now. Some machines are hitting this udev issue: http://marc.info/?l=linux-kernel&m=132033587908426&w=2 and lockdep is turned off after the first warning.	2011-12-12 16:29:41 -08:00
Josh Durgin	a768ad738a	coverage: don't generate html reports for each test These can always be generated from the lcov files later, right now they just waste space.	2011-12-08 17:47:14 -08:00
Josh Durgin	7b52dd1410	syslog: ignore 'task blocked' warnings These will happen under heavy load (usually on the osd).	2011-12-08 17:17:47 -08:00
Josh Durgin	e69057e4a1	internal: check syslog for errors This should catch lockdep warnings and mark tests with them as failed.	2011-12-07 15:20:33 -08:00
Josh Durgin	95e632475f	workunit: set client id and secretfile env vars These are used by the kernel rbd workunit to know how to map images. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>	2011-12-06 16:16:38 -08:00
Tommi Virtanen	e80c32c442	Rename "testrados" and "testswift" tasks to not begin with "test". Anything "test*" looks like a unit test, and shouldn't be used for actual code.	2011-12-05 10:07:25 -08:00
Tommi Virtanen	0dd4d69ffe	Fix unit tests for SSH keep-alive setting. Commit `6e3e0d7cdc` failed to pass unit tests.	2011-12-05 10:02:30 -08:00
Tommi Virtanen	50c4b312a2	Handle interactive-on-error also when error is from contextmanager exit. Closes: http://tracker.newdream.net/issues/1745	2011-11-30 17:07:26 -08:00
Tommi Virtanen	c651c88eac	Properly handle case where first error is inside a context manager __exit__. Closes: http://tracker.newdream.net/issues/1743	2011-11-21 16:00:49 -08:00
Sage Weil	721c0e9720	nuke: don't specify full path /tmp/cephtest/binary may have been removed; kill stray daemons by name only. we really don't care about false positives here!	2011-11-19 20:56:49 -08:00
Sage Weil	4b53288b0c	ceph_manager: %	2011-11-19 20:56:49 -08:00
Josh Durgin	508f4f8359	Save summary after nuking machines. This way you can tell when tests are entirely finished running.	2011-11-18 13:53:51 -08:00
Josh Durgin	42cecb5e55	suite: put common config before facets This lets you add tasks to the beginning of a run, like the chef task.	2011-11-17 17:26:21 -08:00
Josh Durgin	044a88ce59	suite: schedule a list of collections for running instead of a single suite directory	2011-11-17 17:16:23 -08:00
Yehuda Sadeh	23aae67aff	testswift: fix config	2011-11-17 16:53:57 -08:00
Tommi Virtanen	d8fc151365	Clean up C++isms.	2011-11-17 17:00:44 -08:00
Tommi Virtanen	c545094895	Add a task for easily running chef-solo on all the nodes.	2011-11-17 16:49:47 -08:00
Sage Weil	89f80412c2	ceph_manager: fix logging	2011-11-17 13:46:02 -08:00
Josh Durgin	f85f5dd7e3	ceph: deep merge overrides, so e.g. log whitelists can be overridden	2011-11-17 13:07:03 -08:00
Josh Durgin	a763297685	misc: move deep_merge out of the MergeConfig class - it's generic	2011-11-17 13:06:36 -08:00
Josh Durgin	c6988a07f4	Save config after locking nodes, so targets are included.	2011-11-17 11:57:07 -08:00
Josh Durgin	4e6cd55c59	filestore_idempotent: remove unused import	2011-11-17 11:18:24 -08:00
Josh Durgin	7d51e3d381	mon_recovery: remove unused code and import	2011-11-17 11:16:08 -08:00
Josh Durgin	f4d527e743	thrashosds: timeout for every clean check, not just the last one	2011-11-17 11:11:33 -08:00
Josh Durgin	9d12b720e8	ceph_manager: add a default timeout of 5 minutes for mon quorum	2011-11-17 11:05:12 -08:00
Josh Durgin	cb9ac0897b	ceph_manager: log mon quorum status so the logs show progress (or lack thereof)	2011-11-17 10:45:19 -08:00
Yehuda Sadeh	f3c569ee23	rgw: add swift task still not completely working (for some reason it skips all the tests)	2011-11-16 16:00:01 -08:00
Sage Weil	c5f070b8a9	filestore_idempotent.py: simple task to test non-idempotent osd ops Write some non-idempotent events to the osd. Simulate a failure. Verify the result is correct on replay. This must be preceded by the ceph task just so that we get the binaries installed. Should clean this up later if/when the installation gets factored out of ceph.py. Signed-off-by: Sage Weil <sage@newdream.net>	2011-11-10 21:35:11 -08:00
Sage Weil	77c977c1cf	misc: allow >1 monitor per role in get_mon_names() Signed-off-by: Sage Weil <sage@newdream.net>	2011-11-10 14:13:24 -08:00
Josh Durgin	afa56f16d1	nuke: increase reboot timeout Some sepia nodes are very slow to reboot.	2011-11-09 10:49:37 -08:00
Sage Weil	6618a0275c	mon_recovery: add task to test monitor cluster failure recovery Some simple tests to start with. We still need some sort of mon cluster thrashing. Signed-off-by: Sage Weil <sage@newdream.net>	2011-11-08 22:17:00 -08:00
Sage Weil	60863f70eb	ceph_manager: manipulate monitors	2011-11-08 22:17:00 -08:00
Sage Weil	6d39cc1146	ceph: keep ceph.conf at ctx.ceph.conf Signed-off-by: Sage Weil <sage@newdream.net>	2011-11-08 22:17:00 -08:00
Josh Durgin	006a0dd423	Remove unused imports and variable.	2011-11-08 16:09:21 -08:00
Josh Durgin	5d32bcae50	Add nuke-on-error option. This lets automated jobs nuke and unlock machines after failed tests. Each machine is nuked individually, so one down machine won't keep others from being nuked and unlocked.	2011-11-08 16:09:21 -08:00
Tommi Virtanen	c764b2475b	Fix leftover orchestra import clause. This seems to be a leftover from `a2372fce12`, no idea how it stayed hidden this long.	2011-11-07 13:05:14 -08:00
Josh Durgin	4f3b113832	ceph_manager: log ceph -s output so progress is visible in the logs	2011-11-03 13:27:44 -07:00
Josh Durgin	0b451f9475	Keep each ssh connection alive. With long-running jobs like thrashing, ssh connections were timing out.	2011-11-03 13:08:49 -07:00
Josh Durgin	6e3e0d7cdc	connection: allow the caller to specify whether keep-alive should be used	2011-11-03 13:07:21 -07:00
Josh Durgin	b1a0c1adea	locker: fix race in locking The isolation level is lower than I thought. This made it possible for two clients to think they both locked the same machines, since the update would still be modifying each row to change the locked_since time.	2011-11-03 11:29:18 -07:00
Samuel Just	a2f406ef49	testrados: set CEPH_CLIENT_ID without a ; Signed-off-by: Samuel Just <samuel.just@dreamhost.com>	2011-11-02 11:33:37 -07:00
Samuel Just	810cae1a1d	testrados: specify CEPH_CONF directly Signed-off-by: Samuel Just <samuel.just@dreamhost.com>	2011-10-31 14:54:24 -07:00
Yehuda Sadeh	10c3508741	rgw: add user suspend/enable test	2011-10-27 12:11:28 -07:00
Yehuda Sadeh	86aa940ffb	rgw: log-to-stderr is now a binary flag	2011-10-27 11:32:12 -07:00
Samuel Just	8d0a7c5977	testrados: rename testsnaps to testrados and make snap testing optional Signed-off-by: Samuel Just <samuel.just@dreamhost.com>	2011-10-24 14:25:22 -07:00
Josh Durgin	a1249d07ca	workunit: set PYTHONPATH so we can test python bindings	2011-10-24 13:52:58 -07:00
Sage Weil	61cbb3218e	ceph.conf: python parser doesn't like ; comments	2011-10-23 10:30:27 -07:00
Sage Weil	3ed065625b	ceph.conf: more frequent osd scrubbing; remove old cruft	2011-10-22 22:16:39 -07:00
Sage Weil	b8beff3dd5	ceph_manager: count active+clean+<something else> as active+clean In my case, one pg was active+clean+scrubbing. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-21 10:54:05 -07:00
Sage Weil	4ec37b2391	add lost_unfound task Also some misc useful bits to ceph_manager.	2011-10-17 15:32:22 -07:00
Josh Durgin	bcded7f163	ceph: add whitelist for cluster log errors Some messages are expected when thrashing osds or creating unfound objects. Fixes: #1622	2011-10-17 14:42:08 -07:00
Josh Durgin	fba220ecaa	nuke: reset syslog configuration after rebooting Previously we removed a file and rebooted without syncing, so the file was never deleted.	2011-10-17 10:40:19 -07:00
Yehuda Sadeh	493596a7fd	radosgw-admin: test swift keys creation/removal	2011-10-12 15:37:33 -07:00
Josh Durgin	321381d75f	teuthology-worker: remove --keep-locked-on-error	2011-10-07 14:51:46 -07:00
Josh Durgin	3d3eb0efea	Remove --keep-locked-on-error, and behave as if it were specified This will help prevent machines with cephtest dirs still present from being used. It's easy to unlock machines - the targets yaml fragment is output during a run.	2011-10-07 14:49:53 -07:00
Josh Durgin	c56ab97442	reconnect: ignore SSHExceptions before the timeout expires Fixes: #1587	2011-10-06 17:18:35 -07:00
Samuel Just	4722d468c6	task/watch_notify_stress: watch_notify_stress now thrashes clients This should exercise the watch notify timeout code. Signed-off-by: Samuel Just <samuel.just@dreamhost.com>	2011-10-06 14:34:44 -07:00
Sage Weil	4e61e4835e	rgw: keep radosgw in foreground It defaults to a daemon now.	2011-10-06 12:50:12 -07:00
Josh Durgin	107db6a913	Retry listing machines if the lock server goes down.	2011-10-04 17:21:00 -07:00
Sage Weil	39a1e76065	rgw: use normal logging mechanism Keep capturing stdout/err, even though it should end up empty. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-04 16:09:51 -07:00
Josh Durgin	7b7ff6e8ce	teuthology-worker: clean up last_in_suite jobs There's no reason not to delete them once they start.	2011-10-04 12:32:58 -07:00
Josh Durgin	3d3ba1ebb1	daemon-helper: detect the signal actually sent I thought I fixed this when I implemented coverage collection, but I guess it got lost in a rebase or something.	2011-10-04 12:17:19 -07:00
Josh Durgin	d305d61b86	ceph_manager: remove unused raw_pg_status method	2011-10-03 17:49:53 -07:00

651 Commits