RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2025-01-11 13:41:02 +00:00

Author	SHA1	Message	Date
Sage Weil	abd2ae7423	mon: factor reporter lagginess into grace adjustment Use reporters as a proxy for laggy subclusters within the overall cluster. See #3046. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-18 14:39:00 -07:00
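Below is a minimal sketch of how reporter lagginess could feed into the grace adjustment. It assumes the reporters' contribution is the unweighted mean of their own `laggy_probability * laggy_interval` terms, added on top of the target OSD's own term. `LaggyInfo` and `grace_with_reporters()` are hypothetical names for illustration, not the monitor's actual code.

```cpp
#include <vector>

// Hypothetical per-OSD laggy statistics (field names follow the commit
// messages; the struct itself is illustrative only).
struct LaggyInfo {
  double laggy_probability;  // chance a down-marking was due to lagginess
  double laggy_interval;     // expected seconds to recover when laggy
};

// Assumption: reporters act as a proxy for a laggy subcluster, so their
// mean expected laggy delay is added to the target's own adjusted grace.
double grace_with_reporters(double base_grace, const LaggyInfo& target,
                            const std::vector<LaggyInfo>& reporters) {
  double grace =
      base_grace + target.laggy_probability * target.laggy_interval;
  if (reporters.empty())
    return grace;
  double peer = 0.0;
  for (const LaggyInfo& r : reporters)
    peer += r.laggy_probability * r.laggy_interval;
  return grace + peer / static_cast<double>(reporters.size());
}
```

A plain mean treats every reporter equally; a real implementation might weight or decay older statistics instead.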
Sage Weil	adf0fe6a10	mon: scale heartbeat grace based on laggy probability, interval If, based on historical behavior, an observed osd failure is likely to be due to unresponsiveness and not the daemon stopping, scale the heartbeat grace period accordingly: grace' = grace + laggy_probability * laggy_interval This will avoid fruitlessly marking OSDs down and generating additional map update overhead when the cluster is overloaded and potentially struggling to keep up with map updates. See #3045. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-18 14:39:00 -07:00
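The scaling formula from this commit can be written directly as a function. This is a sketch; `adjusted_grace()` and `should_mark_down()` are hypothetical names, and comparing a reported `failed_for` against the scaled grace is an assumed simplification of the monitor's check.

```cpp
// grace' = grace + laggy_probability * laggy_interval
double adjusted_grace(double grace, double laggy_probability,
                      double laggy_interval) {
  return grace + laggy_probability * laggy_interval;
}

// An OSD that has looked failed for `failed_for` seconds is only marked
// down once that reaches the scaled grace period.
bool should_mark_down(double failed_for, double grace,
                      double laggy_probability, double laggy_interval) {
  return failed_for >=
         adjusted_grace(grace, laggy_probability, laggy_interval);
}
```

For example, with a 20-second base grace, an OSD with laggy_probability 0.5 and a 30-second laggy_interval gets 20 + 0.5 * 30 = 35 seconds before it is marked down, so a 25-second outage no longer triggers a map update.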
Sage Weil	3f51d31639	mon: check failures in tick Currently we only trigger a failure on receipt of a failure report. Move the checks into a helper and check during tick() too, so that we will trigger failures even when the thresholds are not met at failure report time. This is rarely true now, but will be true once we locally scale the grace period. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-18 14:39:00 -07:00
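The pattern of running one failure check both on report receipt and on every tick can be sketched as a small tracker. `FailureTracker` is a hypothetical, simplified model that uses a fixed grace; it is not the monitor's real class.

```cpp
#include <map>
#include <set>

// Hypothetical tracker: the same check runs when a report arrives and
// on every periodic tick().
class FailureTracker {
 public:
  explicit FailureTracker(double grace) : grace_(grace) {}

  // Record that `osd` has been failed since `failed_since`.
  void report(int osd, double failed_since, double now) {
    pending_[osd] = failed_since;
    check_failure(osd, now);
  }

  // Re-check every pending OSD, so a failure that did not meet the
  // threshold at report time can still be acted on later.
  void tick(double now) {
    for (auto it = pending_.begin(); it != pending_.end();) {
      int osd = it->first;
      ++it;  // advance first: check_failure() may erase this entry
      check_failure(osd, now);
    }
  }

  const std::set<int>& marked_down() const { return down_; }

 private:
  void check_failure(int osd, double now) {
    auto it = pending_.find(osd);
    if (it != pending_.end() && now - it->second >= grace_) {
      down_.insert(osd);
      pending_.erase(it);
    }
  }

  double grace_;
  std::map<int, double> pending_;  // osd id -> failed_since
  std::set<int> down_;
};
```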
Sage Weil	09b251cd22	mon: clean up osd failure logging Debug log when we get a report, info log when we actually fail the osd. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-18 14:39:00 -07:00
Sage Weil	a3e8ed1e4e	mon: reply to all reporters when an osd is failed Track the latest report message for each reporter. When the osd is eventually marked failed, send map updates to them all. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-18 14:39:00 -07:00
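Keeping only the latest report per reporter and answering all of them at once can be sketched with a map keyed by reporter id. `Report` and `ReporterTracker` are hypothetical stand-ins for the real message and bookkeeping types.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a failure report message.
struct Report {
  int reporter;
  std::string msg_id;
};

class ReporterTracker {
 public:
  // A newer report from the same reporter replaces the older one.
  void add(const Report& r) { latest_[r.reporter] = r; }

  // When the target is marked failed: one report per reporter to reply
  // to, after which the tracked state is dropped.
  std::vector<Report> take_all_for_reply() {
    std::vector<Report> out;
    for (const auto& kv : latest_)
      out.push_back(kv.second);
    latest_.clear();
    return out;
  }

 private:
  std::map<int, Report> latest_;  // reporter id -> latest report
};
```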
Sage Weil	7952c35926	mon: locally apply osd heartbeat grace to failure checks Aggregate the failure reports into a single mon 'failed_since' value (the max, currently), and wait until we have exceeded the grace period to consider the osd failed. WARNING: This slightly changes the semantics. Previously, the grace could be adjusted in the [osd] section. Now, the [osd] option controls when the failure messages are sent, and the [mon] option controls when it is marked down, and sane users should set it once in [global]. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-18 14:39:00 -07:00
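The aggregation described here (take the max of the reported `failed_since` values, then wait out the grace) fits in a few lines. `failed_past_grace()` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <vector>

// Combine per-report failed_since timestamps into one value (the max)
// and treat the OSD as failed only once that is older than the grace.
bool failed_past_grace(const std::vector<double>& reported_failed_since,
                       double now, double grace) {
  if (reported_failed_since.empty())
    return false;
  double failed_since = *std::max_element(reported_failed_since.begin(),
                                          reported_failed_since.end());
  return now - failed_since >= grace;
}
```

Taking the max is the conservative choice: the OSD must appear failed for a full grace period even from the viewpoint of the reporter that noticed most recently.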
Sage Weil	3eb7341aab	mon: no_reply() to failure messages we don't reply to This makes us clean up request state when requests have been forwarded. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-18 14:39:00 -07:00
Sage Weil	d328a28cc6	mon: send 'null' reply to requests we won't reply to This is a no-op if the client was talking to us, but in the forwarded request case will clean up the request state (and request message) on the forwarding monitor. Otherwise, MOSDFailure messages (and probably others) can accumulate on the non-leader mon indefinitely. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-18 14:39:00 -07:00
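The leak being fixed can be illustrated with a toy forwarding monitor: each forwarded request is remembered until the leader answers, and a "null" reply must still clear that entry. `ForwardingMon` and its methods are hypothetical and greatly simplified.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

class ForwardingMon {
 public:
  // Remember a request forwarded to the leader; returns its tid.
  std::uint64_t forward(const std::string& request) {
    std::uint64_t tid = next_tid_++;
    routed_[tid] = request;
    return tid;
  }

  // Handles both real and null replies from the leader. Returns true if
  // a reply should be relayed to the original client.
  bool handle_route(std::uint64_t tid, bool has_payload) {
    routed_.erase(tid);  // clean up request state either way
    return has_payload;  // a null reply is not relayed
  }

  std::size_t pending() const { return routed_.size(); }

 private:
  std::uint64_t next_tid_ = 1;
  std::map<std::uint64_t, std::string> routed_;
};
```

Without the null reply, `pending()` would grow with every failure report the leader never answers.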
Sage Weil	e06818be04	mon: refactor osd failure report tracking - use structs to track allegedly failed nodes, and reports against them. - use methods to handle report, and failure threshold logic. - calculate failed_since based on OSD's reported failed_for duration This will make it simpler to extend the logic when we add dynamic grace periods. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-18 14:38:59 -07:00
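The struct-based tracking, with `failed_since` derived from the OSD's reported `failed_for`, might look roughly like this. `FailureReporter` and `FailureInfo` are hypothetical names, and times are plain seconds for simplicity.

```cpp
#include <map>

// One reporter's claim against a target OSD.
struct FailureReporter {
  double failed_since;  // receipt time minus the reported failed_for
};

// Everything known about one allegedly failed OSD.
struct FailureInfo {
  std::map<int, FailureReporter> reporters;  // reporter id -> its report

  // The reporter says the target has been unresponsive for `failed_for`
  // seconds as of `received`, so the failure began at the difference.
  // A newer report from the same reporter replaces the older one.
  void add_report(int reporter, double received, double failed_for) {
    reporters[reporter] = FailureReporter{received - failed_for};
  }
};
```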
Sage Weil	66f31c1091	mon: adjust or decay laggy probabilities on osd boot On each osd boot, determine whether the osd was laggy (wrongly marked down) or newly booted. Either update the laggy probability and interval or decay the values, as appropriate. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-18 14:38:59 -07:00
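One way to "update or decay" the laggy statistics is an exponential moving average. This is an assumed technique: the blending `weight`, `LaggyStats`, and `on_boot()` are hypothetical, and how the monitor decides `was_laggy` is taken as given.

```cpp
struct LaggyStats {
  double probability = 0.0;  // chance a down-marking was due to lagginess
  double interval = 0.0;     // expected seconds until a laggy OSD returns
};

// was_laggy: the OSD had been wrongly marked down (it kept running);
// down_duration: how long it was marked down before this boot;
// weight: assumed blending factor in (0, 1].
void on_boot(LaggyStats& s, bool was_laggy, double down_duration,
             double weight) {
  if (was_laggy) {
    // Move both estimates toward this observation.
    s.probability = weight + (1.0 - weight) * s.probability;
    s.interval = weight * down_duration + (1.0 - weight) * s.interval;
  } else {
    // A genuine restart: decay both estimates toward zero.
    s.probability *= (1.0 - weight);
    s.interval *= (1.0 - weight);
  }
}
```

A larger `weight` makes the estimates react faster to recent boots but forget history sooner.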
Sage Weil	e9f051ef3c	osdmap: include osd_xinfo_t to track laggy probabilities, timestamps Track information about laggy probabilities for each OSD. That is, the probability that if it is marked down it is because it is laggy, and the expected interval it will take to recover if it is laggy. We store this in the OSDMap because it is not convenient to keep it elsewhere in the monitor. Yet. When the new mon infrastructure is in place, there is a bunch of stuff that can be moved out of the OSDMap 'extended' section into other mon data structures. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-18 14:38:59 -07:00
Sage Weil	b64641c3dd	osd: include boot_epoch in MOSDBoot This will let the monitor infer whether we were wrongly marked down or the daemon restarted. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-18 14:38:59 -07:00
Sage Weil	4f1792d769	osd: include failed_for in MOSDFailure reports The monitor will need this to dynamically adjust the heartbeat grace. Closes: #3044 Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-18 14:38:59 -07:00
Sage Weil	331bbcfbc0	Merge remote-tracking branch 'gh/wip-crush' Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-09-11 16:04:58 -07:00
Tommi Virtanen	d8cb19dd09	upstart: Add ceph-create-keys.conf to package. Signed-off-by: Tommi Virtanen <tv@inktank.com>	2012-09-11 15:31:06 -07:00
John Wilkins	ced6c2c358	:doc: Fixed typo. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 15:24:12 -07:00
Sage Weil	de811db914	obsync: if OrdinaryCallingFormat fails, try SubdomainCallingFormat This blindly tries the Subdomain calling format if the ordinary method fails. In particular, this works around buckets that present a PermanentRedirect message. See bug #3128. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Matthew Wodrich <matthew.wodrich@dreamhost.com>	2012-09-11 14:50:53 -07:00
Samuel Just	ef3eab74e3	Merge remote-tracking branch 'upstream/next' Conflicts: src/osd/ReplicatedPG.cc	2012-09-11 14:06:51 -07:00
Samuel Just	4e5283d476	ReplicatedPG: do not start_recovery_op if we are already pushing Should fix bug #2761. If we are already pushing soid, recovery_ops will only be decremented once for all current pushes, so only increment recovery_ops if we are not currently pushing it. This bug causes us to leak a recovery op and get stuck in backfill. Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-09-11 13:37:03 -07:00
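The accounting rule behind this fix can be modeled with a set of objects being pushed: the counter goes up only when a push for an object actually starts, matching the single decrement on completion. `Recovery`, `start_push()` and `finish_push()` are hypothetical, not the ReplicatedPG API.

```cpp
#include <set>
#include <string>

class Recovery {
 public:
  void start_push(const std::string& soid) {
    if (pushing_.insert(soid).second)  // true only if not already pushing
      ++recovery_ops_;
  }

  void finish_push(const std::string& soid) {
    if (pushing_.erase(soid) > 0)
      --recovery_ops_;
  }

  int recovery_ops() const { return recovery_ops_; }

 private:
  std::set<std::string> pushing_;
  int recovery_ops_ = 0;
};
```

Incrementing unconditionally would leave `recovery_ops()` at 1 after the push finishes, which is the leaked op that stalls backfill.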
Sage Weil	656ab158ce	osd: fill in user log entry last after snapdir tran Reorder the snapdir logic and ctx->at_version adjustments prior to filling in the object_info_t and user_versions and all that stuff. Adjust at_version after appending the log entry (so that it points to the next position/version we will write at.. culminating in the actual user event). The user log entry contains the request id, which will be used by replay ops to put themselves in the correct place in the waiting_for_commit/ack maps. Thus, the repop needs to be tagged with the same version as the log entry with the request id. Thus, the request id bearing log entry should be the last in the log entry vector. This should fix #3072, wherein a replay which should wait on the repop tagged as version '36 will instead wait on '35. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>	2012-09-11 13:37:03 -07:00
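The ordering invariant can be shown with a toy op context: each entry is written at `at_version`, which is then bumped to the next version, and the request-id-bearing user entry is appended last so the repop's version matches it. `LogEntry`, `OpContext` and `repop_version()` are hypothetical simplifications of the real structures.

```cpp
#include <string>
#include <vector>

struct LogEntry {
  unsigned version;
  std::string reqid;  // empty for entries that carry no request id
};

struct OpContext {
  unsigned at_version;
  std::vector<LogEntry> log;

  void append(const std::string& reqid) {
    log.push_back(LogEntry{at_version, reqid});
    ++at_version;  // now points at the next version we will write
  }
};

// The repop is tagged with the version of the last entry, which must be
// the one carrying the request id. Precondition: ctx.log is non-empty.
unsigned repop_version(const OpContext& ctx) {
  return ctx.log.back().version;
}
```

Appending a snapdir entry at 35 and then the user entry at 36 tags the repop with 36, so a replayed request waits on the correct version.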
John Wilkins	a4fb9c1a09	:doc: Added tunables to crush-map.rst. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 13:05:07 -07:00
John Wilkins	911433fd7d	:doc: Removed old pg tuning. New section was added. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 13:00:22 -07:00
John Wilkins	9256a2955a	:doc: Trimmed the old ops tree. Will remove when all porting verified. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 12:59:35 -07:00
John Wilkins	203ba59ed2	:doc: Trimmed the tree for failures/troubleshooting. RGW remains. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 12:52:12 -07:00
John Wilkins	662fd0325b	:doc: removed. RBD now has its own section. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 12:51:06 -07:00
Sage Weil	e6141005f2	mon: adjust number of req args for loc At least one loc key/value pair is required to do anything useful with these commands. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-11 12:05:01 -07:00
Sage Weil	344fef772e	mon: move loc map parsing into a helper Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-11 12:05:01 -07:00
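A loc-map parsing helper of the kind described here can be sketched as follows. It turns arguments such as `host=node1 rack=r2` into a map and rejects an empty list, since at least one pair is needed. `parse_loc_map()` is a hypothetical name, and its error handling is an assumption.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Returns std::nullopt on a malformed argument or when no pairs are given.
std::optional<std::map<std::string, std::string>>
parse_loc_map(const std::vector<std::string>& args) {
  std::map<std::string, std::string> loc;
  for (const std::string& arg : args) {
    std::string::size_type eq = arg.find('=');
    if (eq == std::string::npos || eq == 0 || eq + 1 == arg.size())
      return std::nullopt;  // need a non-empty key and value
    loc[arg.substr(0, eq)] = arg.substr(eq + 1);
  }
  if (loc.empty())
    return std::nullopt;  // at least one key/value pair is required
  return loc;
}
```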
Sage Weil	50c957dbdc	crush: constify loc map arguments Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-11 12:05:01 -07:00
Sage Weil	9636991376	crush: add const string& versions of accessors Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-11 12:05:01 -07:00
Sage Weil	babef41a06	doc/control.rst: add 'osd crush create-or-move ...' Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-11 12:05:01 -07:00
Sage Weil	dd9819e376	doc: make note of crush usage change Even though it is compatible. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-11 12:05:01 -07:00
Sage Weil	0817b941d5	mon: make redundant osd.NNN argument optional Instead of 'osd crush set NNN osd.NNN weight loc...', make the second osd.NNN option optional, and allow either NNN or osd.NNN to specify the osd id. This makes the usage much more sane, but maintains backward compatibility. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-11 12:05:01 -07:00
Sage Weil	01a8146983	ceph tool: add 'osd crush create-or-move ...' to help Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-11 11:35:45 -07:00
John Wilkins	44fa233b77	:doc: Deleting this. Wrote a new one, but will be revised a bit soon. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 11:26:19 -07:00
John Wilkins	32f30f9aff	:doc: Removed old ops pool section. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 11:19:47 -07:00
John Wilkins	0313365ddf	:doc: Removed old authentication section. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 11:19:22 -07:00
John Wilkins	d1053d9d75	:doc: Removed old resize OSD section. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 11:18:55 -07:00
John Wilkins	ad909f3f45	:doc: Removed old mon resize section. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 11:15:17 -07:00
John Wilkins	7d881dc809	:doc: Removed from old ops doc. Still needs to be composed though. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 11:14:40 -07:00
John Wilkins	bf342d1474	:doc: New cluster ops section addresses the todo. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 11:13:53 -07:00
John Wilkins	e844989576	:doc: Removed old OSD troubleshooting. New version to be updated shortly. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 11:13:17 -07:00
John Wilkins	72f802c52e	:doc: Removed old monitor troubleshooting. New version to be revised shortly. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 11:12:37 -07:00
John Wilkins	fe609b7a10	:doc: Removed old mds troubleshooting. Still needs to be composed. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 11:12:01 -07:00
John Wilkins	a4733b864e	:doc: Removed old cephfs discussion. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 11:11:16 -07:00
John Wilkins	d4e00bce76	:doc: Trimmed toctree to last bits of legacy data. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 11:10:44 -07:00
John Wilkins	922c59ff10	:doc: Updated FAQ with a friendlier message. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-09-11 11:09:01 -07:00
Sage Weil	f1b605c0cb	mon: parse '<id>' or 'osd.<id>' for 'osd crush create-or-move ...' Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-11 10:48:02 -07:00
Sage Weil	1da73e5df4	mon: fail on trailing characters after parsing numbers parse '8' but not '8asdf'. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-11 10:48:02 -07:00
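Accepting `<id>` or `osd.<id>` while rejecting trailing characters can be sketched with `std::strtol` and an end-pointer check. `parse_osd_id()` is a hypothetical helper. Requiring a leading digit, which also rejects signs and whitespace, is an added assumption.

```cpp
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <optional>
#include <string>

// Accept "8" or "osd.8"; reject "8asdf", "osd.", "-1".
std::optional<int> parse_osd_id(const std::string& arg) {
  const std::string prefix = "osd.";
  std::string num = arg;
  if (num.compare(0, prefix.size(), prefix) == 0)
    num.erase(0, prefix.size());  // drop the optional "osd." prefix
  // Require a leading digit so strtol cannot skip whitespace or signs.
  if (num.empty() || !std::isdigit(static_cast<unsigned char>(num[0])))
    return std::nullopt;
  errno = 0;
  char* end = nullptr;
  long value = std::strtol(num.c_str(), &end, 10);
  // Fail on trailing characters: the whole string must be consumed.
  if (*end != '\0' || errno == ERANGE || value > INT_MAX)
    return std::nullopt;
  return static_cast<int>(value);
}
```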
Sage Weil	b2409a2c80	mon: 'osd crush create-or-move <id> <initial-weight> <loc ...>' Create an item in the tree with the given weight, or move it (without touching the weight) if it is already present. Closes: #3101 Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-11 10:48:02 -07:00
Sage Weil	adedd6b600	crush: create_or_move_item() Create an item if it doesn't exist, with the specified weight. If it is already in the tree, move it, but do not adjust the weight. Signed-off-by: Sage Weil <sage@inktank.com>	2012-09-11 10:48:01 -07:00
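The create-or-move semantics can be illustrated on a toy, flat model of the hierarchy: create the item with the given weight if it is absent, otherwise move it and keep its existing weight. `ToyCrushMap` and `Placement` are hypothetical and are not the CrushWrapper API.

```cpp
#include <map>
#include <string>

struct Placement {
  std::string bucket;  // parent bucket name
  double weight;
};

class ToyCrushMap {
 public:
  // Returns true if the item was created, false if it was moved.
  bool create_or_move_item(int id, double weight, const std::string& bucket) {
    auto it = items_.find(id);
    if (it == items_.end()) {
      items_.emplace(id, Placement{bucket, weight});
      return true;
    }
    it->second.bucket = bucket;  // move only; weight left untouched
    return false;
  }

  const Placement& get(int id) const { return items_.at(id); }

 private:
  std::map<int, Placement> items_;
};
```

Leaving the weight alone on a move means that rerunning the command on an already-placed OSD does not clobber a weight an administrator has since tuned.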


21105 Commits