RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2024-12-26 21:43:10 +00:00

Author	SHA1	Message	Date
Sage Weil	fe9fb49e27	ceph_manager: use get() for self.config powercycle checks I think this is what is going on... Traceback (most recent call last): File "/var/lib/teuthworker/teuthology-master/teuthology/contextutil.py", line 27, in nested yield vars File "/var/lib/teuthworker/teuthology-master/teuthology/task/ceph.py", line 1158, in task yield File "/var/lib/teuthworker/teuthology-master/teuthology/run_tasks.py", line 25, in run_tasks manager = _run_one_task(taskname, ctx=ctx, config=config) File "/var/lib/teuthworker/teuthology-master/teuthology/run_tasks.py", line 14, in _run_one_task return fn(**kwargs) File "/var/lib/teuthworker/teuthology-master/teuthology/task/dump_stuck.py", line 93, in task manager.kill_osd(id_) File "/var/lib/teuthworker/teuthology-master/teuthology/task/ceph_manager.py", line 665, in kill_osd if 'powercycle' in self.config and self.config['powercycle']: TypeError: argument of type 'NoneType' is not iterable	2013-02-02 21:01:08 -08:00
Samuel Just	fadc22c0b9	ceph_manager: wait for admin socket on restart, use for set_config Fixes: #3966 Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-01-31 12:59:00 -08:00
Sam Lang	8f720454cb	Assign devices to osds using the device wwn Linux doesn't guarantee device names (/dev/sdb, etc.) are always mapped to the same disk. Instead of assigning nominal devices to osds, we map devices by their wwn (/dev/disk/by-id/wwn-*) to an osd (both data and journal). Signed-off-by: Sam Lang <sam.lang@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-31 08:23:39 -06:00
Sam Lang	58111595d4	Support power cycling osds/nodes through ipmi This patch defines a RemoteConsole class associated with each Remote class instance, allowing power cycling a target through ipmi. Fixes/Implements #3782. Signed-off-by: Sam Lang <sam.lang@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-31 08:23:37 -06:00
Sam Lang	ace4cb07b2	Replace /tmp/cephtest/ with configurable path Teuthology uses /tmp/cephtest/ as the scratch test directory for a run. This patch replaces /tmp/cephtest/ everywhere with a per-run directory: {basedir}/{rundir} where {basedir} is a directory configured in .teuthology.yaml (/tmp/cephtest if not specified), and {rundir} is the name of the run, as given in --name. If no name is specified, {user}-{timestamp} is used. To get the old behavior (/tmp/cephtest), set test_path: /tmp/cephtest in .teuthology.yaml. This change was motivated by #3782, which requires a test dir that survives across reboots, but also resolves #3767. Signed-off-by: Sam Lang <sam.lang@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-31 08:23:31 -06:00
Sam Lang	14730276b9	Fixes for syntax errors found by pyflakes. This patch includes minor fixes to the teuthology python code for syntax errors found by running check-syntax.sh (which runs pyflakes on each file). Signed-off-by: Sam Lang <sam.lang@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-31 07:58:57 -06:00
Samuel Just	1c31194920	osd_recovery: inject a recovery delay Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-01-28 20:22:33 -08:00
Sage Weil	b5f81636a2	osdthrasher: inject pause on a live (on in) osd	2013-01-26 13:13:08 -08:00
Samuel Just	3a5c70b89b	ceph_manager: turn long stall injection off by default Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-01-24 17:31:38 -08:00
Sage Weil	20af01f23b	ceph_manager: fix get_num_active_recovered() The states now have 'backfill' or 'recover' in them.	2013-01-24 16:23:33 -08:00
Samuel Just	6a859bcd56	ceph_manager: use 80/70 as pause_long, pause_check_after defaults OSD::op_tp suicides after 150. Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-01-24 12:50:26 -08:00
Samuel Just	0f24dca2d7	ceph_manager: use do_rados for rmpool Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-01-24 10:08:44 -08:00
Samuel Just	ec5a14553f	ceph_manager: default chance_down to 0.4 Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-01-23 17:44:05 -08:00
Samuel Just	566ae5332e	ceph_manager: add filestore and heartbeat stalls Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-01-23 17:40:40 -08:00
David Zafman	e714c77812	osd: Testing of deep-scrub omap changes Fix scrub_test.py and add omap corruption test Signed-off-by: David Zafman <david.zafman@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>	2013-01-22 15:48:45 -08:00
Sam Lang	53f22d9493	task/mds_thrasher: New task for thrashing the mds Signed-off-by: Sam Lang <sam.lang@inktank.com>	2013-01-18 15:48:52 -06:00
Joao Eduardo Luis	e88b909a1d	task: ceph_manager: add 'get_mon_health' function Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>	2013-01-04 17:03:55 +00:00
Samuel Just	f2dbe5edd7	CephManager: add ability to test split Signed-off-by: Samuel Just <sam.just@inktank.com>	2012-12-11 15:11:06 -08:00
Samuel Just	f309c33d2d	Clean up string interpolation operator spacing ceph_manager.py Signed-off-by: Samuel Just <sam.just@inktank.com>	2012-11-09 10:52:16 -08:00
Samuel Just	f82d4a7b86	Add divergent_priors test Tests scenario where merge_old_entry encounters a divergent entry where the prior_version is prior to log_tail. This is a problem since it will go into the missing set, but won't be re-added to the missing set during read_log() if the node restarts prior to recovering the object. Signed-off-by: Samuel Just <sam.just@inktank.com>	2012-11-09 10:52:15 -08:00
Samuel Just	bd83ed70dc	ceph_manager: add test_min_size action Thrasher can now with configurable frequency test min_size by taking down all but one osd, waiting, killing that osd and bringing back the others, and verifying that the cluster goes clean. Signed-off-by: Samuel Just <sam.just@inktank.com>	2012-11-07 12:56:31 -08:00
Mike Ryan	3b85b2311b	task: verify scrub detects files whose contents changed Signed-off-by: Mike Ryan <mike.ryan@inktank.com>	2012-08-02 11:14:51 -07:00
Sage Weil	a9f2bf622f	ceph_manager: wait_for_active	2012-07-28 10:23:18 -07:00
Sage Weil	731d520900	ceph_manager: count 'incomplete' as 'down'	2012-07-28 10:23:18 -07:00
Josh Durgin	ddb98f7773	ceph_manager: don't try to start greenlet twice spawn already scheduled it. Trying to start it again hits an assert.	2012-04-10 16:23:58 -07:00
Samuel Just	b4aa098f47	make Thrasher not inherit from Greenlet	2012-03-29 18:08:19 -07:00
Sage Weil	84cd4ed6c3	peer: wait for peering to complete, or block We need to wait for peering to either complete, or block because it is waiting for another PG. _Then_ look at all the PG states and compare the mon values with what we get from querying the OSDs directly.	2012-02-25 21:05:00 -08:00
Sage Weil	c43e87d118	ceph_manager: list_pg_missing List missing objects for the given pgid.	2012-02-24 12:42:39 -08:00
Josh Durgin	995dc1f751	Add a task for testing stuck pg visibility.	2012-02-21 15:12:48 -08:00
Sage Weil	45b6189b7d	ceph_manager: ignore stale states when counting also remove assumptions about ordering of states	2012-02-18 14:44:53 -08:00
Sage Weil	196d4a1f16	wait_till_clean -> wait_for_clean and wait_for_recovery Clean now also means the correct number of replicas, whereas recovered means we have done all the work we can do given the replicas/osds we have. For example, degraded and clean are now mutually exclusive. Also move away from 'till'.	2012-02-17 21:53:25 -08:00
Sage Weil	6f3abc6ced	ceph_manager: mark in a bit more often than out Otherwise we can get into cases where many/most nodes are out, and things don't work as well. e.g., crush may start to fail.	2012-02-13 15:28:24 -08:00
Sage Weil	e337c4727c	ceph_manager: add manager.blackhole_kill_osd() This will suspend disk writes for a couple seconds and then kill the daemon. It helps us simulate a hardware failure.	2012-01-31 16:13:59 -08:00
Samuel Just	4aa9ca4551	CephManager: base timeout on time since last change in active+clean Signed-off-by: Samuel Just <samuel.just@dreamhost.com>	2012-01-24 11:28:38 -08:00
Sage Weil	45e4c924fa	thrashosds: maxdead default to 0 This avoids any possibility of blocking peering.	2012-01-17 09:24:54 -08:00
Sage Weil	71390f9784	thrashosds: fix action selection I'm not sure what the old code was trying to do, but I'm pretty sure it wasn't doing it correctly.. a .1 chance_down was killing an OSD for me virtually every time.	2012-01-16 15:05:43 -08:00
Sage Weil	8fc6086986	thrashosds: make actions less nonsensical Make marking OSD up/down and in/out totally orthogonal. Signed-off-by: Sage Weil <sage@newdream.net>	2012-01-16 15:05:43 -08:00
Sage Weil	59369237c9	thrasher: don't mark down osds out; tell monitor same Stopping ceph-osd doesn't make it out (immediately). Prevent monitor from doing this after a delay too so we can keep our notion of what is up/down/in/out accurate.	2012-01-11 12:54:09 -08:00
Sage Weil	6dae2f8ae3	thrasher: adjust min_dead default Make this 1, not 2. That's a bit more friendly. It doesn't strictly matter, tho, since we revive osds before waiting for clean.	2012-01-11 12:54:09 -08:00
Sage Weil	fb74b90152	thrasher: add max_dead Add max_dead, and revive osds prior to waiting for clean. Otherwise we can leave too many OSDs down and the cluster will never go clean.	2012-01-11 12:54:08 -08:00
Sage Weil	13445d237b	ceph_manager: a booting osd is no longer automatically marked in as of ceph.git commit `96b7b0d83e`	2012-01-06 17:21:38 -08:00
Sage Weil	4b53288b0c	ceph_manager: %	2011-11-19 20:56:49 -08:00
Sage Weil	89f80412c2	ceph_manager: fix logging	2011-11-17 13:46:02 -08:00
Josh Durgin	f4d527e743	thrashosds: timeout for every clean check, not just the last one	2011-11-17 11:11:33 -08:00
Josh Durgin	9d12b720e8	ceph_manager: add a default timeout of 5 minutes for mon quorum	2011-11-17 11:05:12 -08:00
Josh Durgin	cb9ac0897b	ceph_manager: log mon quorum status so the logs show progress (or lack thereof)	2011-11-17 10:45:19 -08:00
Sage Weil	60863f70eb	ceph_manager: manipulate monitors	2011-11-08 22:17:00 -08:00
Josh Durgin	006a0dd423	Remove unused imports and variable.	2011-11-08 16:09:21 -08:00
Josh Durgin	4f3b113832	ceph_manager: log ceph -s output so progress is visible in the logs	2011-11-03 13:27:44 -07:00
Sage Weil	b8beff3dd5	ceph_manager: count active+clean+<something else> as active+clean In my case, one pg was active+clean+scrubbing. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-21 10:54:05 -07:00
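Several commits above (b8beff3dd5, 45b6189b7d, 731d520900) describe how ceph_manager counts PG states. A state is a '+'-joined string. Extra flags such as 'scrubbing' should not stop a PG from counting as active+clean, flag order should not matter, and 'incomplete' counts as 'down'. The following is a minimal sketch of that counting logic. It assumes the states arrive as plain strings, and the function names are hypothetical. It is an illustration, not the actual teuthology code.

```python
from typing import Iterable, Set


def parse_state(state: str) -> Set[str]:
    """Split a '+'-joined PG state such as 'active+clean+scrubbing' into flags."""
    return set(state.split("+"))


def count_active_clean(pg_states: Iterable[str]) -> int:
    """Count PGs that are both active and clean.

    Extra flags (e.g. 'scrubbing') are allowed, and flag order is ignored
    because the comparison is done on sets rather than on the raw string.
    """
    return sum(1 for s in pg_states if {"active", "clean"} <= parse_state(s))


def count_down(pg_states: Iterable[str]) -> int:
    """Count PGs that are down, treating 'incomplete' as 'down'."""
    return sum(1 for s in pg_states if parse_state(s) & {"down", "incomplete"})


if __name__ == "__main__":
    states = [
        "active+clean",
        "active+clean+scrubbing",  # extra flag: still active+clean
        "clean+active",            # different order: still active+clean
        "active+degraded",
        "incomplete",              # counted as down
        "down+peering",
    ]
    print(count_active_clean(states))  # 3
    print(count_down(states))          # 2
```

Comparing flag sets rather than exact strings is what makes the count robust to extra flags and to ordering. An exact match on "active+clean" would miss the scrubbing PG in the example.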


66 Commits
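Commit ace4cb07b2 describes how the per-run scratch directory is chosen: {basedir}/{rundir}, with /tmp/cephtest as the default base and {user}-{timestamp} as the default run name. The following is a minimal sketch of that rule, assuming the parsed .teuthology.yaml is available as a dict.

The `base_test_dir` key and the timestamp format are hypothetical stand-ins, since the commit message names neither. `test_path` is read here as a full-path override, because the message says that setting it to /tmp/cephtest restores the old behavior.

```python
import getpass
import os
import time
from typing import Optional

DEFAULT_BASEDIR = "/tmp/cephtest"


def get_testdir(config: dict, run_name: Optional[str] = None,
                user: Optional[str] = None,
                timestamp: Optional[str] = None) -> str:
    """Return the scratch directory for a run (sketch, not teuthology's code)."""
    # Assumption: 'test_path' overrides everything, e.g. /tmp/cephtest
    # to get the pre-ace4cb07b2 behavior back.
    if config.get("test_path"):
        return config["test_path"]
    # Hypothetical key name for the configured {basedir}.
    basedir = config.get("base_test_dir", DEFAULT_BASEDIR)
    if not run_name:
        # No --name given: fall back to {user}-{timestamp}.
        user = user or getpass.getuser()
        # Hypothetical timestamp format.
        timestamp = timestamp or time.strftime("%Y-%m-%d_%H:%M:%S")
        run_name = f"{user}-{timestamp}"
    return os.path.join(basedir, run_name)
```

For example, `get_testdir({}, run_name="myrun")` yields "/tmp/cephtest/myrun" on a POSIX system. A per-run directory under a configurable base is what lets the test dir survive reboots (#3782) without different runs colliding.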