14387 Commits

Author SHA1 Message Date
Sage Weil
ce04e3dbaf osd: add ability to explicitly mark unfound as lost
Instead of automatically marking unfound objects lost (once we've tried
every location we can think of), do it when the administrator explicitly
says to.  This avoids marking things lost incorrectly when there are
peering issues, and also allows the administrator to decide whether there
may be offline osds that are worth bringing online.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-24 09:47:06 -07:00
Sage Weil
87309e946b osd: make automatic marking of unfound as lost optional
We may not want to do this automatically until we have more confidence in
the recovery code.  Even then, possibly not.  In particular, the OSDs may
believe they have contacted all possible homes for the data even though there
is some long-lost OSD that has the data on disk that is offline.

For now, we make the marking process explicit so that the administrator can
make the call.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-24 09:42:39 -07:00
Sage Weil
081acc4ce2 mds: initialize stray_index on startup
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-24 09:24:42 -07:00
Sage Weil
d66c6ca19b v0.28.1 2011-05-23 21:11:44 -07:00
Sage Weil
127dcde18b crushtool: default to hash 0 (rjenkins1)
Otherwise we get 255 which is undefined and get bad results!

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 15:44:15 -07:00
Sage Weil
4a83de1832 osd: update last_epoch_clean in PG::Info::History::merge()
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 15:15:12 -07:00
Sage Weil
c22aca1f99 osd: small cleanup
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 15:15:12 -07:00
Sage Weil
e3191b7dc6 osd: merge history when primary sends replica new pg info
This, among other things, lets us update last_epoch_started and
last_epoch_clean.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 15:15:12 -07:00
Sage Weil
a51bf3e9df osd: more heartbeat rework
A few things:
 - track Connection* instead of entity_inst_t for hb peers
 - we can only send maps over the cluster_messenger
   - if peer is still alive, do that
   - if peer is not, send dying MOSDPing ping with YOU_DIED flag
2011-05-20 15:15:12 -07:00
Sage Weil
b5ebe6b5a6 msgr: don't close close_on_empty until outgoing messages are acked
Otherwise, if we close the socket, we may lose in-flight data.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 15:15:12 -07:00
Sage Weil
bac1021e06 osd: only forget peer epochs if they are down AND no longer heartbeat peers
If we forget the peer epoch when we see them go down, we won't share the
map later in update_heartbeat_peers() to tell them they're down.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 15:15:12 -07:00
Sage Weil
bc960ac1ea osd: show last_epoch_clean in PG::Info::History printer
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 15:15:12 -07:00
Sage Weil
726aebea8f osd: rework peer map epoch caching
We try to keep track of which epochs our peers have so that we can be
semi-intelligent about which map incrementals we send preceding any
messages.  Since this is useful from the heartbeat and cluster channels/
threads, protect the data with an inner lock and clean up the callers.

Be smarter about when we forget.

Make note of peer epoch when we receive a ping.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 15:15:12 -07:00
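
The caching described in 726aebea8f can be pictured with a small sketch. This is only an illustration under stated assumptions, not the actual OSD code: the class name PeerEpochCache, its methods, the use of plain int peer ids, and the choice of std::mutex are all hypothetical. It shows the idea of keeping the per-peer epoch behind its own inner lock so that heartbeat and cluster threads can both call it, and of never letting a recorded epoch move backwards.

```cpp
#include <cstdint>
#include <map>
#include <mutex>

using epoch_t = std::uint32_t;

// Hypothetical sketch: tracks the newest map epoch each peer is known to have.
class PeerEpochCache {
public:
  // Record that `peer` has at least epoch `e` (e.g. when a ping arrives).
  // The stored value never decreases. Returns the epoch now on record.
  epoch_t note(int peer, epoch_t e) {
    std::lock_guard<std::mutex> l(lock_);
    epoch_t& cur = epochs_[peer];  // default-initialized to 0 if new
    if (e > cur)
      cur = e;
    return cur;
  }

  // Return the epoch on record for `peer`, or 0 if nothing is known.
  epoch_t get(int peer) const {
    std::lock_guard<std::mutex> l(lock_);
    auto it = epochs_.find(peer);
    return it == epochs_.end() ? 0 : it->second;
  }

  // Drop what we know about `peer`.
  void forget(int peer) {
    std::lock_guard<std::mutex> l(lock_);
    epochs_.erase(peer);
  }

private:
  mutable std::mutex lock_;          // inner lock protecting epochs_
  std::map<int, epoch_t> epochs_;    // peer id -> newest known epoch
};
```

A caller that is about to send a message could compare get(peer) with its own current epoch to decide which map incrementals to send first.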
Sage Weil
27c0bce63c mon: fix parsing of 'osd foo N ...' commands with multiple ids
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 15:15:12 -07:00
Colin Patrick McCabe
68021ce81e dout: reopen log files on SIGHUP
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-05-20 14:27:27 -07:00
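
One common way to reopen log files on SIGHUP is sketched below. It is an assumption about the general technique, not a description of how dout does it: the names g_reopen_requested, on_sighup, install_sighup_handler and LogFile are hypothetical. The handler only sets a flag, and the file is closed and reopened on the next write, so that a log rotated away by an external tool is replaced by a fresh file at the same path.

```cpp
#include <csignal>
#include <fstream>
#include <string>
#include <utility>

// Set by the signal handler, consumed by the next write.
static volatile std::sig_atomic_t g_reopen_requested = 0;

// Keep the handler minimal: only set a flag.
extern "C" void on_sighup(int) { g_reopen_requested = 1; }

// Install the handler (SIGHUP is provided by POSIX systems).
void install_sighup_handler() { std::signal(SIGHUP, on_sighup); }

// Hypothetical append-only log file that reopens its path on request.
class LogFile {
public:
  explicit LogFile(std::string path) : path_(std::move(path)) { open(); }

  void write(const std::string& line) {
    if (g_reopen_requested) {
      g_reopen_requested = 0;
      out_.close();
      open();  // picks up a fresh file if the old one was rotated away
    }
    out_ << line << '\n';
    out_.flush();
  }

private:
  void open() { out_.open(path_, std::ios::app); }

  std::string path_;
  std::ofstream out_;
};
```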
Sage Weil
74691e7cea osd: clean up old _from target cleanup; fix one case; share map
Clean up the code to mirror the _to case.

Previously we would not mark down an old _from that is still a _to but with
a new address.  Now we do.

Share a map while we're at it, just to be nice!

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 11:29:05 -07:00
Sage Weil
0f1be62914 osd: mark down old _to targets
If a peer remains a _to target but their address changes, we still want
to mark down the old connection.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 11:25:27 -07:00
Sage Weil
3811d8bf1f osd: share map with old _to peers
Use new msgr hooks to do this cleanly.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 11:20:20 -07:00
Sage Weil
f87e1dd58e osd: clean up handle_osd_ping output
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 11:17:19 -07:00
Sage Weil
3a7931c749 osd: ignore stale requests for heartbeats
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 10:54:46 -07:00
Sage Weil
f9bea34034 osd: don't prioritize heartbeat requests
This could conceivably screw up ordering, and priority doesn't matter
anyway when this is the first message we send to this peer.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 10:43:12 -07:00
Sage Weil
7a574d88ae osd: do not clobber explicitly requested heartbeat_to target address
Consider peer P.

- P goes down in, say, epoch 60, and back up in epoch 70
- P requests a heartbeat, as_of 70
- We update to map 50, and coincidentally add the same peer as a target
- We set the heartbeat_to[P] = 50 and start sending to the _old_ address
- P marks us down because we stop sending to the new addr
- We eventually get map 70, but it's too late!

Make sure we preserve any _to targets _and_ their epoch+inst.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 10:42:16 -07:00
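
The scenario in 7a574d88ae can be sketched as follows. This is a simplified illustration, not the actual fix: HeartbeatTarget, add_target_from_map, the string addresses and the "keep the entry if its epoch is at least as new" rule are all assumptions made for the example. The point is that a target learned from an older map (epoch 50) must not overwrite one that the peer explicitly requested as of a newer epoch (70).

```cpp
#include <cstdint>
#include <map>
#include <string>

using epoch_t = std::uint32_t;

// Hypothetical record of where we send heartbeats for one peer.
struct HeartbeatTarget {
  epoch_t as_of;      // epoch in which this address was valid
  std::string addr;   // stand-in for the peer's epoch+inst address
};

// Add `peer` as a heartbeat target based on the map we just processed,
// but preserve an existing entry that is as new or newer than that map.
void add_target_from_map(std::map<int, HeartbeatTarget>& heartbeat_to,
                         int peer, epoch_t map_epoch,
                         const std::string& map_addr) {
  auto it = heartbeat_to.find(peer);
  if (it != heartbeat_to.end() && it->second.as_of >= map_epoch)
    return;  // explicitly requested (newer) target wins; do not clobber it
  heartbeat_to[peer] = HeartbeatTarget{map_epoch, map_addr};
}
```

With heartbeat_to[P] already set to {70, new address}, processing map 50 leaves the entry untouched, so heartbeats keep going to the new address.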
Sage Weil
e1830dbd09 osd: request proper log extent for missing
We can't blindly ask for everything since last_epoch_started because that
may mean we get some fragment of a backlog.  Look at the peer's log
ranges and request the correct thing.  Also, in fulfill_log, infer what
the primary should have asked for if they make a bad request.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 09:29:10 -07:00
Sage Weil
ff031ce810 osd: fix log bounds check
We weren't accounting for the case where we have

 (foo,foo]+backlog

i.e., everything is backlog, and rbegin().version != log.head.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 08:44:42 -07:00
Sage Weil
1dba8dd6e8 osd: osd# is in log entry header/prefix
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 08:35:44 -07:00
Sage Weil
d75f62378e osd: log broken pg state to monitor on startup, activate
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 08:33:07 -07:00
Sage Weil
b7b8127e0f osd: fix proc_replica_log when peer log is empty
If the peer log is empty, and we break out of the loop on the first pass,
then clearly last_update has not been adjusted.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 08:09:11 -07:00
Sage Weil
f400110859 osd: encode keyring as plaintext after --mkkey
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 07:25:24 -07:00
Sage Weil
93709f89f6 keyring: make encode_plaintext method
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 07:25:16 -07:00
Sage Weil
6995fd515c Merge branch 'wip_choose_acting' into stable 2011-05-20 00:41:31 -07:00
Sage Weil
bdc371e593 osd: take remote log when it is clearly superior
I'm hitting a case where the primary is compensating for a replica's
last_complete < log.tail by sending a log+backlog, but the replica
isn't smart enough to take advantage.  In this case,

      replica: log(781'26629,781'26631]
 from primary: log(781'26629,781'26631]+backlog
       result: log(781'26629,781'26631]

Doh!

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 00:27:00 -07:00
Sage Weil
4c97cb5f34 osd: fix compensation for bad last_complete
If the peer has a last_complete below their tail, we can get by with our
log (without backlog) if our tail is _before_ their last_complete, not
after.  Otherwise, we need a backlog!

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-20 00:14:24 -07:00
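
The condition in 4c97cb5f34 can be written as a small predicate. This is a sketch under assumptions, not the OSD code: Version, Send and compensate_for_bad_last_complete are hypothetical names, versions are modelled as (epoch, version) pairs such as 781'26629, and equality is treated as sufficient because a log written as (tail, head] holds every entry after its tail.

```cpp
#include <cstdint>
#include <utility>

// (epoch, version), compared lexicographically, e.g. 781'26629.
using Version = std::pair<std::uint32_t, std::uint64_t>;

enum class Send { LogOnly, LogPlusBacklog };

// Called when the peer's last_complete is below the peer's own log tail.
// Our log (our_tail, head] covers everything after peer_last_complete only
// if our tail is not after it; otherwise the peer also needs a backlog.
// Assumption: our_tail == peer_last_complete counts as "before".
Send compensate_for_bad_last_complete(Version our_tail,
                                      Version peer_last_complete) {
  return our_tail <= peer_last_complete ? Send::LogOnly
                                        : Send::LogPlusBacklog;
}
```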
Sage Weil
332565f1f8 osd: remove some build_prior stringstream cruft
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-19 23:48:53 -07:00
Sage Weil
45e8627ce9 osd: remove useless debug print
We dump this (and more) at the end of the PgPriorSet constructor.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-19 23:46:19 -07:00
Sage Weil
a2cb690d8d osd: include past acting osds if they were up
This fixes a bug where we were excluding up (but not acting) nodes from
past intervals, which in turn was triggering a nasty choose_acting loop
(because we _do_ already include acting but !up from the current
interval).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-19 23:40:12 -07:00
Sage Weil
d4b44f9e5a osd: do not exclude me during build_prior
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-19 23:38:25 -07:00
Sage Weil
f7e6b1c1fe osd: show final build_prior result
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-05-19 23:25:32 -07:00
Josh Durgin
dfe52d9e02 OSD, PG: ignore peering messages from before the last peering restart
Check them before entering the state machine so we can
safely enter the Crashed state on unexpected messages
from the current interval.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-05-19 18:02:55 -07:00
Josh Durgin
628665bcb9 OSD: decrement message refcount before returning
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-05-19 18:02:42 -07:00
Samuel Just
a71981c00d PG: add_event, add_next_event: ignore prior_version on backlog events
We would not have the previous version if we are merging backlog events.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-05-19 15:59:40 -07:00
Josh Durgin
bc2c31e070 PG: choose_log_location: prefer OSDs with a backlog
Without preferring an OSD with a backlog, PGs would get stuck in the
active state when acting != up and the backlog was on an OSD with the
same last_update but a lower number or log_tail.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-05-19 14:33:29 -07:00
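
The preference described in bc2c31e070 can be sketched as a selection over peer summaries. This is a hypothetical illustration only: PeerInfo, the (epoch, version) Version pair and this version of choose_log_location are assumptions, and the real function considers more state. The sketch picks the newest last_update and, among OSDs with the same last_update, prefers one that has a backlog.

```cpp
#include <cstdint>
#include <map>
#include <utility>

// (epoch, version), compared lexicographically.
using Version = std::pair<std::uint32_t, std::uint64_t>;

// Hypothetical per-OSD summary used for the choice.
struct PeerInfo {
  Version last_update;
  bool has_backlog;
};

// Return the OSD id to fetch the log from, or -1 if `peers` is empty.
// Newest last_update wins; on a tie, an OSD with a backlog is preferred.
// Remaining ties keep the lowest OSD id, since std::map iterates in order.
int choose_log_location(const std::map<int, PeerInfo>& peers) {
  int best = -1;
  for (auto it = peers.begin(); it != peers.end(); ++it) {
    if (best < 0) {
      best = it->first;
      continue;
    }
    const PeerInfo& b = peers.at(best);
    const PeerInfo& c = it->second;
    bool newer = c.last_update > b.last_update;
    bool same_but_backlog = c.last_update == b.last_update &&
                            c.has_backlog && !b.has_backlog;
    if (newer || same_but_backlog)
      best = it->first;
  }
  return best;
}
```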
Josh Durgin
fe298f6461 OSD: send a log in response to a log query when the pg dne
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-05-19 14:33:29 -07:00
Samuel Just
bcbcf30237 ReplicatedPG: wait_for_missing_object in _rollback_to
Previously, we failed if the relevant clone had not yet been recovered.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-05-19 12:24:36 -07:00
Sage Weil
f16903d724 client: do not retake lock in sync_write_commit
We already hold the lock from a few frames up the stack (ms_dispatch).

Reported-by: Simon Tian <aixt2006@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-19 10:14:29 -07:00
Sage Weil
4d39f1be6f journaler: ENOENT is okay on trim
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-19 09:41:24 -07:00
Sage Weil
ecb7c96167 mkcephfs: pick rdir based on whether current daemon is local or not
We need to pick $rdir as local or remote inside the for name loop.

Fixes: #1094
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-19 09:29:11 -07:00
Josh Durgin
2a0f0cd179 PG: remove unused argument to adjust_need_up_thru
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-05-18 17:04:17 -07:00
Josh Durgin
2452d41503 PG: include ourselves in the prior set
All acting OSDs should be in the prior set, since any of them may have
the newest update.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-05-18 17:04:17 -07:00
Josh Durgin
cad3dfaeaf PG: choose acting set and newest_update_osd based on a map of all osds
newest_update osd should be stable when the primary changes, to
prevent cycles of acting set choices. For the same reason, we should
not treat the primary as a special case in choose_acting.

Also remove the magic -1 from representing the current primary.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-05-18 17:04:17 -07:00
Josh Durgin
524ab3a6f8 PG: GetLog: don't fail if we get an outdated log
If we request a log from one osd, and then another member of our prior
set comes up with a later last_update, we should not fail when we
receive the first log.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-05-18 17:04:17 -07:00