Instead of automatically marking unfound objects lost (once we've tried
every location we can think of), do it only when the administrator explicitly
says to. This avoids incorrectly marking objects lost when there are
peering issues, and also lets the administrator decide whether there
are offline OSDs that may be worth bringing back online.
Signed-off-by: Sage Weil <sage@newdream.net>
We may not want to do this automatically until we have more confidence in
the recovery code. Even then, possibly not. In particular, the OSDs may
believe they have contacted all possible homes for the data even though
there is some long-lost OSD that has the data on disk but is offline.
For now, we make the marking process explicit so that the administrator can
make the call.
Signed-off-by: Sage Weil <sage@newdream.net>
A few things:
- track Connection* instead of entity_inst_t for hb peers
- we can only send maps over the cluster_messenger
- if the peer is still alive, do that
- if the peer is not, send a final MOSDPing with the YOU_DIED flag set (sketched below)
If we forget the peer epoch when we see them go down, we won't share the
map later in update_heartbeat_peers() to tell them they're down.
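The intended flow, as a hedged sketch with stand-in types and stub helpers
(none of these names are the real OSD or messenger symbols):

  #include <iostream>

  struct Connection {};                       // stand-in for Ceph's Connection
  struct OSDMapRef {};                        // stand-in for a shared map ref
  constexpr unsigned PING_FLAG_YOU_DIED = 1;  // illustrative flag value

  struct HeartbeatPeer {
    int osd = -1;
    Connection* con = nullptr;  // keep the Connection*, not an entity_inst_t,
                                // so we always tear down the right session
  };

  // Stub helpers so the sketch is self-contained.
  bool peer_is_up(int /*osd*/) { return true; }
  void send_map_over_cluster_messenger(int osd, const OSDMapRef&) {
    std::cout << "sharing map with osd." << osd << " over cluster messenger\n";
  }
  void send_ping(Connection*, unsigned flags) {
    std::cout << "final ping, flags=" << flags << "\n";
  }

  void drop_heartbeat_peer(const HeartbeatPeer& p, const OSDMapRef& curmap) {
    if (peer_is_up(p.osd))
      send_map_over_cluster_messenger(p.osd, curmap);  // peer alive: share map
    else
      send_ping(p.con, PING_FLAG_YOU_DIED);            // peer dead: say so
  }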
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
We try to keep track of which epochs our peers have so that we can be
semi-intelligent about which map incrementals we send preceding any
messages. Since this is useful from the heartbeat and cluster channels/
threads, protect the data with an inner lock and clean up the callers.
Be smarter about when we forget.
Make note of peer epoch when we receive a ping.
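A minimal sketch of the shape of this, assuming invented names (the real
bookkeeping lives in the OSD class): the per-peer epoch map sits behind its
own small mutex so the heartbeat and cluster dispatch paths can note, query,
and forget epochs without taking the big OSD lock.

  #include <map>
  #include <mutex>
  #include <algorithm>
  #include <cstdint>

  using epoch_t = uint32_t;

  class PeerEpochTracker {
    std::mutex lock;                       // inner lock; never held across I/O
    std::map<int, epoch_t> peer_epoch;     // osd id -> newest epoch we know it has

  public:
    // Remember that 'peer' has at least epoch 'e' (e.g. learned from a ping).
    epoch_t note_peer_epoch(int peer, epoch_t e) {
      std::lock_guard<std::mutex> l(lock);
      auto& cur = peer_epoch[peer];
      cur = std::max(cur, e);
      return cur;
    }

    // Forget what we knew, e.g. once the peer has been marked down and told so.
    void forget_peer_epoch(int peer, epoch_t as_of) {
      std::lock_guard<std::mutex> l(lock);
      auto it = peer_epoch.find(peer);
      if (it != peer_epoch.end() && it->second <= as_of)
        peer_epoch.erase(it);
    }

    // Which incrementals do we need to send before our next message?
    epoch_t get_peer_epoch(int peer) {
      std::lock_guard<std::mutex> l(lock);
      auto it = peer_epoch.find(peer);
      return it == peer_epoch.end() ? 0 : it->second;
    }
  };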
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Clean up the code to mirror the _to case.
Previously we would not mark down an old _from that is still a _to but with
a new address. Now we do.
Share a map while we're at it, just to be nice!
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
If a peer remains a _to target but their address changes, we still want
to mark down the old connection.
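Roughly the check in question, as a sketch with invented stand-in types:

  #include <string>

  struct Connection {
    bool down = false;
    void mark_down() { down = true; }   // stand-in for the real teardown call
  };

  struct HBTarget {
    std::string addr;           // heartbeat address we were sending to
    Connection* con = nullptr;  // connection bound to that address
  };

  // If the osd stays a _to target but its address changed across maps,
  // tear down the old connection before switching to the new one.
  void refresh_to_target(HBTarget& tgt, const std::string& new_addr,
                         Connection* new_con) {
    if (tgt.con && tgt.addr != new_addr)
      tgt.con->mark_down();
    tgt.addr = new_addr;
    tgt.con = new_con;
  }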
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
This could conceivably screw up ordering, and priority doesn't matter
anyway when this is the first message we send to this peer.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Consider peer P.
- P goes down in, say, epoch 60, and back up in epoch 70
- P requests a heartbeat, as_of 70
- We update to map 50, and coincidentally add the same peer as a target
- We set the heartbeat_to[P] = 50 and start sending to the _old_ address
- P marks us down because we stop sending to the new addr
- We eventually get map 70, but it's too late!
Make sure we preserve any _to targets _and_ their epoch+inst.
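In code terms the rule is roughly this (a sketch with invented names; the
real update_heartbeat_peers() is more involved): a target established at a
newer epoch must never be clobbered by whatever an older map says.

  #include <map>
  #include <string>
  #include <cstdint>

  using epoch_t = uint32_t;

  struct HBTarget {
    epoch_t epoch;        // epoch the target was established for
    std::string addr;     // heartbeat address at that epoch
  };

  void update_heartbeat_to(std::map<int, HBTarget>& heartbeat_to,
                           const std::map<int, HBTarget>& wanted /* from our osdmap */) {
    for (const auto& [osd, tgt] : wanted) {
      auto it = heartbeat_to.find(osd);
      if (it != heartbeat_to.end() && it->second.epoch > tgt.epoch)
        continue;                 // peer asked as_of a newer epoch; keep theirs
      heartbeat_to[osd] = tgt;    // otherwise adopt what our map says
    }
  }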
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
We can't blindly ask for everything since last_epoch_started because that
may mean we get some fragment of a backlog. Look at the peer's log
ranges and request the correct thing. Also, in fulfill_log, infer what
the primary should have asked for if they make a bad request.
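The shape of the fix, as a hedged sketch (invented names, simplified
eversion_t): compare our head against the peer's log bounds and only fall
back to a backlog when their plain log genuinely cannot bridge the gap.

  #include <cstdint>

  struct eversion_t {
    uint32_t epoch = 0;
    uint64_t version = 0;
  };

  inline bool operator<(const eversion_t& a, const eversion_t& b) {
    return a.epoch < b.epoch || (a.epoch == b.epoch && a.version < b.version);
  }

  struct peer_log_bounds_t {
    eversion_t log_tail;      // oldest entry still in the peer's log
    eversion_t last_update;   // newest entry in the peer's log
  };

  struct log_request_t {
    eversion_t since;         // "send me entries after this point"
    bool backlog = false;     // also need a backlog to bridge the gap
  };

  // Decide what to ask the chosen peer for, given our own last_update.
  log_request_t choose_log_request(const eversion_t& my_last_update,
                                   const peer_log_bounds_t& peer) {
    log_request_t req;
    req.since = my_last_update;
    // If our head predates everything in the peer's log, their plain log
    // cannot connect to ours and a backlog is required; otherwise the
    // overlapping portion after our head is enough.
    req.backlog = my_last_update < peer.log_tail;
    return req;
  }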
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
We weren't accounting for the case where we have
(foo,foo]+backlog
i.e., everything is backlog, and rbegin().version != log.head.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
If the peer log is empty, and we break out of the loop on the first pass,
then clearly last_update has not been adjusted.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
I'm hitting a case where the primary is compensating for a replica's
last_complete < log.tail by sending a log+backlog, but the replica
isn't smart enough to take advantage. In this case,
replica: log(781'26629,781'26631]
from primary: log(781'26629,781'26631]+backlog
result: log(781'26629,781'26631]
Doh!
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
If the peer has a last_complete below their tail, we can get by with our
log (without backlog) if our tail is _before_ their last_complete, not
after. Otherwise, we need a backlog!
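Expressed as a condition, the corrected check looks roughly like this
(invented names, simplified eversion_t):

  struct eversion_t {
    unsigned epoch = 0;
    unsigned long version = 0;
  };

  inline bool operator<(const eversion_t& a, const eversion_t& b) {
    return a.epoch < b.epoch || (a.epoch == b.epoch && a.version < b.version);
  }

  // Does this peer need a backlog from us, or will our plain log do?
  bool peer_needs_backlog(const eversion_t& my_log_tail,
                          const eversion_t& peer_last_complete,
                          const eversion_t& peer_log_tail) {
    if (!(peer_last_complete < peer_log_tail))
      return false;   // peer's own log already reaches its last_complete
    // Peer's last_complete is below its tail: our log alone is enough only
    // if our tail reaches back to (or before) their last_complete.
    return peer_last_complete < my_log_tail;
  }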
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
This fixes a bug where we were excluding up (but not acting) nodes from
past intervals, which in turn was triggering a nasty choose_acting loop
(because we _do_ already include acting but !up from the current
interval).
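A much-simplified sketch of the idea (the real prior-set construction has
many more conditions; names are invented):

  #include <set>
  #include <vector>

  struct PastInterval {
    std::vector<int> up;       // who was up in the interval
    std::vector<int> acting;   // who was acting in the interval
  };

  // Build the prior set from past intervals.  The point of the fix: consider
  // the up set as well, not just the acting set, mirroring how the current
  // interval already includes acting-but-not-up members.
  std::set<int> build_prior(const std::vector<PastInterval>& intervals) {
    std::set<int> prior;
    for (const auto& i : intervals) {
      prior.insert(i.acting.begin(), i.acting.end());
      prior.insert(i.up.begin(), i.up.end());   // previously left out
    }
    return prior;
  }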
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Check them before entering the state machine so we can
safely enter the Crashed state on unexpected messages
from the current interval.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Without preferring an OSD with a backlog, PGs would get stuck in the
active state when acting != up and the backlog was on an OSD with the
same last_update but a lower number or log_tail.
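A sketch of the tie-break (invented names, scalar stand-ins for eversion_t):
with equal last_update, a candidate that has a backlog wins.

  #include <vector>

  struct Candidate {
    int osd;
    unsigned long last_update;   // simplified stand-in for eversion_t
    bool has_backlog;
    unsigned long log_tail;
  };

  int choose_newest_update(const std::vector<Candidate>& cands) {
    int best = -1;
    for (size_t i = 0; i < cands.size(); ++i) {
      if (best < 0) { best = (int)i; continue; }
      const Candidate& a = cands[i];
      const Candidate& b = cands[best];
      if (a.last_update > b.last_update ||
          (a.last_update == b.last_update &&
           ((a.has_backlog && !b.has_backlog) ||            // prefer a backlog
            (a.has_backlog == b.has_backlog && a.log_tail < b.log_tail))))
        best = (int)i;
    }
    return best < 0 ? -1 : cands[best].osd;
  }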
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
We already hold the lock from a few frames up the stack (ms_dispatch).
Reported-by: Simon Tian <aixt2006@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
newest_update osd should be stable when the primary changes, to
prevent cycles of acting set choices. For the same reason, we should
not treat the primary as a special case in choose_acting.
Also remove the magic -1 that was used to represent the current primary.
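A sketch of the stability property (invented names, scalar stand-in for
last_update): the choice depends only on the infos, never on which osd
happens to be primary, so re-running it after a primary change gives the
same answer.

  #include <map>

  struct Info { unsigned long last_update; };

  int find_newest_update_osd(const std::map<int, Info>& infos /* osd -> info */) {
    int best = -1;
    unsigned long best_lu = 0;
    for (const auto& [osd, info] : infos) {
      if (best < 0 || info.last_update > best_lu) {   // strictly newer wins;
        best = osd;                                   // ties keep the lowest osd id
        best_lu = info.last_update;
      }
    }
    return best;   // -1 if no infos; no special case for the current primary
  }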
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
If we request a log from one osd, and then another member of our prior
set comes up with a later last_update, we should not fail when we
receive the first log.
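One way to express the guard, as a sketch with invented names: a log reply
that no longer matches the peer we most recently asked, or that predates a
newer prior-set member, is simply stale and can be ignored rather than
tripping an assert.

  struct LogReply {
    int from;
    unsigned long last_update;   // simplified stand-in for eversion_t
  };

  bool should_use_log_reply(const LogReply& reply,
                            int currently_requested_from,
                            unsigned long best_last_update) {
    if (reply.from != currently_requested_from)
      return false;                       // we've since asked someone else
    if (reply.last_update < best_last_update)
      return false;                       // a newer prior-set member showed up
    return true;
  }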
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>