Commit Graph

33298 Commits

Author SHA1 Message Date
Gregory Farnum
523619b0e5 Merge pull request #1532 from ceph/wip-fast-dispatch
fast dispatch
This series adds an ms_fast_dispatch interface to the Messenger/Dispatcher, designed so that you can dispatch messages directly from the Pipe threads without going through the Dispatch queue.
It also sets the OSD to make use of this interface for most operations, and switches to finer-grained locking and use of local data in a bunch of different paths to enable that.

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-05-06 23:06:14 -07:00
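In sketch form, the interface split this series introduces (the hook names follow the series; the surrounding types are simplified stand-ins, not the real Messenger headers):

    struct Message;  // stand-in for Ceph's Message

    struct Dispatcher {
      virtual ~Dispatcher() {}
      // Return true if m may bypass the dispatch queue entirely.
      virtual bool ms_can_fast_dispatch(Message *m) const { return false; }
      // Called inline from the Pipe reader thread; must not block for long.
      virtual void ms_fast_dispatch(Message *m) {}
      // The traditional path, called from the dispatch queue thread.
      virtual bool ms_dispatch(Message *m) = 0;
    };

The reader thread consults ms_can_fast_dispatch() on each incoming message and either dispatches it inline or queues it as before.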
Greg Farnum
5e5a0867b1 Merge remote-tracking branch 'origin/master' into wip-fast-dispatch
Conflicts:
	src/osd/OSD.cc
2014-05-06 22:23:06 -07:00
Sage Weil
b7134c9a2e Merge pull request #1774 from ceph/wip-8296
osd/ReplicatedPG: fix whiteouts for other cache modes

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-05-06 13:08:39 -07:00
Greg Farnum
2d5d3097c3 Pipe: wait for Pipes to finish running, instead of just stop()ing them
Add a stop_and_wait() function that, in addition to closing the Pipe and killing
its socket, waits for any fast_dispatch call that is in progress. Use this in
several parts of the Pipe and SimpleMessenger code where appropriate.

This fixes several races with fast_dispatch and other avenues; here are two:
1) It could be that we grab the lock while the existing pipe is fast_dispatching
and then proceed to dispatch messages ourselves, beating it. Instead, wait for
the other pipe. Add a "reader_dispatching" member which tells us this is
happening, and when re-locking, signal the cond if we're shutting down.

2) It could be that a normally-dispatched Message in the OSD triggers a
mark_down() on the Connection and then clears out the Session
(Connection::priv) pointer, causing a racing fast_dispatch()'ed function to
assert out in the OSD because it requires a valid Session.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-05-06 11:39:34 -07:00
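A minimal sketch of the stop_and_wait() idea from the commit above, with std primitives standing in for Ceph's Mutex/Cond and nearly all of the real Pipe omitted:

    #include <condition_variable>
    #include <mutex>

    class Pipe {
      std::mutex pipe_lock;
      std::condition_variable cond;
      bool reader_dispatching = false;  // a fast_dispatch call is in flight

      void reader_fast_dispatch(/* Message *m */) {
        std::unique_lock<std::mutex> l(pipe_lock);
        reader_dispatching = true;
        l.unlock();
        // ... dispatcher->ms_fast_dispatch(m), with pipe_lock not held ...
        l.lock();
        reader_dispatching = false;
        cond.notify_all();              // wake anyone in stop_and_wait()
      }

    public:
      void stop() {
        std::lock_guard<std::mutex> l(pipe_lock);
        // ... mark the Pipe stopped and shut down its socket ...
      }

      void stop_and_wait() {
        stop();
        std::unique_lock<std::mutex> l(pipe_lock);
        // unlike plain stop(), wait out any in-progress fast dispatch
        cond.wait(l, [this] { return !reader_dispatching; });
      }
    };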
Sage Weil
3e387d62ed osd/ReplicatedPG: fix whiteouts for other cache modes
We were special-casing WRITEBACK mode for handling whiteouts; this needs to
also include the FORWARD and READONLY modes.  To avoid having to list
specific cache modes, though, just check != NONE.

Fixes: #8296
Backport: firefly
Signed-off-by: Sage Weil <sage@inktank.com>
2014-05-06 11:01:27 -07:00
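Illustratively, the shape of the fix (the real code tests the pool's cache_mode; this enum is a stand-in):

    enum class CacheMode { NONE, WRITEBACK, FORWARD, READONLY };

    bool handle_whiteouts(CacheMode m) {
      // before: m == CacheMode::WRITEBACK, which missed FORWARD/READONLY
      return m != CacheMode::NONE;  // after: every caching mode qualifies
    }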
Samuel Just
650051cd17 Merge pull request #1601 from ceph/wip-7576
osd: prevent pg map epochs from lagging too far behind

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-05-06 10:49:46 -07:00
Josh Durgin
2b48e52c4c Merge pull request #1748 from onlyjob/docs
sample.ceph.conf update

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-05-06 10:12:45 -07:00
Josh Durgin
9c0e92f0ea Merge pull request #1653 from ceph/wip-7499
rgw, radosgw-admin: bucket link uses bucket instance id now

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-05-06 10:10:59 -07:00
Sage Weil
f31e3ee00a Merge pull request #1768 from daniel-j-h/code_quality
Variable-length array of std::string (not legal in C++) changed to std::vector<std::string>

Reviewed-by: Sage Weil <sage@inktank.com>
2014-05-06 07:12:00 -07:00
Sage Weil
e65a9da93a Revert "Fix installation into user home directory, broken by d3f0c0b"
This reverts commit 7539281037.

This breaks mount.fuse.ceph installation.
2014-05-06 07:04:56 -07:00
John Wilkins
cdbbf86fa3 doc: Fixed artifacts from merge.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-05-06 03:54:45 -07:00
John Wilkins
a31b9e9c75 doc: Added sudo to setenforce. Restored merge artifact.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-05-06 03:54:08 -07:00
John Wilkins
515827223f doc: Added erasure coding and cache tiering notes. Special thanks to Loic Dachary.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-05-06 03:53:03 -07:00
Daniel J. Hofmann
08a4e88897 Variable-length array of std::string (not legal in C++) changed to std::vector<std::string>
Signed-off-by: Daniel J. Hofmann <daniel@trvx.org>
2014-05-06 09:51:37 +02:00
Sage Weil
38408f6b68 Merge pull request #1770 from ceph/wip-8290
client: check snap_caps in Inode::is_any_caps()

Reviewed-by: Sage Weil <sage@inktank.com>
2014-05-05 16:55:49 -07:00
Yan, Zheng
ae434a3536 client: check snap_caps in Inode::is_any_caps()
Fixes: #8290
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-05-06 07:46:12 +08:00
Sage Weil
7f3a4206a3 Merge pull request #1764 from eile/master
Fix installation into user home directory, broken by d3f0c0b

Reviewed-by: Sage Weil <sage@inktank.com>
2014-05-05 16:26:27 -07:00
Greg Farnum
4bf20afcf8 SimpleMessenger: Don't grab the lock when sending messages if we don't have to
We'd like it if sending a message didn't require any global locks, but the
submit_message() function conditionally needs it in order to create new
Pipes. So:
1) When failing on a dud Pipe, verify that it's still the Pipe the Connection
is linked to; if not, try sending along the newly-linked Pipe.
2) Add an "already_locked" param to submit_message
3) Have the Connection-based interface set this param to false, and
the addr-based interface set it to true, reflecting whether they have
taken the SimpleMessenger::lock.
4) If we discover we need to reference global data structures in
submit_message:
  4a) if locked, do as we previously have
  4b) if not locked, take the lock and call into submit_message again.

The net effect of this is that in the typical case, the Connection-based
_send_message() function no longer acquires global locks, only per-Connection
ones. In the case where the Connection must recreate a Pipe, it falls back to
performing like the addr-based _send_message() does. In the case where
we are racing with somebody else recreating a Pipe (either us or the other
end), we may try twice but we will still only take per-Connection/Pipe locks,
which is a fair tradeoff for not taking the global lock.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-05-05 15:29:21 -07:00
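A rough sketch of the locking scheme the commit above describes, under heavily simplified stand-in types (the real SimpleMessenger tracks Pipes in a global map and has many more states):

    #include <memory>
    #include <mutex>

    struct Message;
    struct Pipe {
      std::mutex lock;
      bool failed = false;
      // ... out queue, socket, reader/writer threads ...
    };

    struct Connection {
      std::mutex lock;
      std::shared_ptr<Pipe> pipe;  // relinked under Connection::lock
    };

    class SimpleMessenger {
      std::mutex global_lock;  // guards the global Pipe map and friends

      // already_locked reflects whether the caller holds global_lock:
      // false for the Connection-based path, true for the addr-based one.
      void submit_message(Message *m, Connection *con, bool already_locked) {
        std::shared_ptr<Pipe> pipe;
        {
          std::lock_guard<std::mutex> l(con->lock);
          pipe = con->pipe;  // snapshot the currently linked Pipe
        }
        if (pipe && try_queue(pipe.get(), m))
          return;  // fast path: per-Connection/Pipe locks only

        // the Pipe may have failed and been replaced: recheck the link
        std::shared_ptr<Pipe> pipe2;
        {
          std::lock_guard<std::mutex> l(con->lock);
          pipe2 = con->pipe;
        }
        if (pipe2 && pipe2 != pipe && try_queue(pipe2.get(), m))
          return;

        if (!already_locked) {
          // Slow path: creating a Pipe needs the global structures,
          // so take the lock and run the locked variant of ourselves.
          std::lock_guard<std::mutex> g(global_lock);
          submit_message(m, con, true);
          return;
        }
        // ... holding global_lock: look up or create a Pipe, queue m ...
      }

      bool try_queue(Pipe *p, Message *m) {
        std::lock_guard<std::mutex> l(p->lock);
        if (p->failed)
          return false;  // dud Pipe; caller rechecks con->pipe
        // ... push m onto the Pipe's out queue ...
        return true;
      }
    };

The recursion runs at most once, so the global lock is only taken when a Pipe genuinely has to be looked up or created.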
Greg Farnum
b038f0c504 OSD: rename share_map_incoming and share_map_outgoing
share_map_incoming -> share_map
share_map_outgoing -> share_map_peer

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:21 -07:00
Greg Farnum
e1277ba6b3 OSD: move the peer_epoch and map sharing infrastructure into OSDService
None of this code requires OSD-internal data or acquires locks from anybody
else.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:21 -07:00
Greg Farnum
938feb49bd OSD: move the {boot,up,bind} epochs into OSDService
Provide interfaces around setting and retrieving them, instead of accessing
them directly with a lock.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:20 -07:00
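The accessor style this describes, sketched with std::mutex standing in for Ceph's locking and approximate signatures:

    #include <mutex>

    typedef unsigned epoch_t;

    class OSDService {
      mutable std::mutex epoch_lock;
      epoch_t boot_epoch = 0;  // first epoch we were marked up
      epoch_t up_epoch = 0;    // most recent epoch we were marked up
      epoch_t bind_epoch = 0;  // epoch we last bound our sockets at

    public:
      // Pass nullptr for any epoch the caller doesn't care about.
      void retrieve_epochs(epoch_t *boot, epoch_t *up, epoch_t *bind) const {
        std::lock_guard<std::mutex> l(epoch_lock);
        if (boot) *boot = boot_epoch;
        if (up)   *up   = up_epoch;
        if (bind) *bind = bind_epoch;
      }
      void set_epochs(const epoch_t *boot, const epoch_t *up,
                      const epoch_t *bind) {
        std::lock_guard<std::mutex> l(epoch_lock);
        if (boot) boot_epoch = *boot;
        if (up)   up_epoch   = *up;
        if (bind) bind_epoch = *bind;
      }
    };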
Greg Farnum
2ec92c7602 OSD: scan for dropped PGs in consume_map instead of advance_map
We have to wait until after we know that nobody will be adding ops for
newly-dead PGs to the list. While we're moving it, switch the locking
so we only hold a write lock while deleting the actual lists.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:20 -07:00
Greg Farnum
fccf1c7010 OSD: do not take the pre_publish_lock in connection utility functions
They loop back around for local connections and deadlock, so we use the
map reservation mechanism instead.
TODO: actually that issue is out of date, do we still want this change?

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:20 -07:00
Greg Farnum
fd2b57eac0 OSD: enable ms_fast_dispatch
We've been setting it up; now this patch actually adds a fast path for osd ops
which bypasses the osd_lock and should not block on any long-held locks. In
addition to the actual ms_fast_dispatch, we take advantage of the fast_notify
functions to create a Session for every peer, since that is now the
data structure around which we handle incoming Messages and waitlisting, and
of fast_preprocess to track when a peer has already sent us a new map
(otherwise, if we see an op with a too-new epoch, we have to request it from
the monitor).

Signed-off-by: Samuel Just <sam.just@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:20 -07:00
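Roughly, the two hooks the commit relies on (the hook names follow the Messenger's fast-path interface; the types and the message-type constant here are simplified stand-ins):

    #include <atomic>

    struct Session {
      std::atomic<unsigned> received_map_epoch{0};  // newest epoch peer sent
      // ... per-session waitlists for ops awaiting maps ...
    };
    struct Connection { Session *priv = nullptr; };
    struct Message { int type = 0; unsigned map_epoch = 0; Connection *con = nullptr; };
    const int MSG_OSD_MAP = 41;  // illustrative constant

    // fast_notify: attach a Session to every peer at connect/accept time,
    // since the Session now carries the waitlisting state for its ops.
    void ms_handle_fast_connect(Connection *con) { con->priv = new Session; }

    // fast_preprocess: remember that this peer already sent us a new map,
    // so a later op stamped with that epoch isn't re-requested from the
    // monitor.
    void ms_fast_preprocess(Message *m) {
      if (m->type == MSG_OSD_MAP && m->con->priv)
        m->con->priv->received_map_epoch = m->map_epoch;
    }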
Greg Farnum
62b2d43acb OSD: remove dead comment
enqueue_op no longer requires holding the osd_lock.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:20 -07:00
Greg Farnum
9028f95e54 OSD: Juggle the locking when resurrecting a PG
Don't hold the old PG's lock in _create_lock_pg. Instead, just copy the
necessary data bits into a holding location. Note that this means we aren't
protecting it against change while the new PG is created, which I *think*
is okay...

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:20 -07:00
Greg Farnum
5268e51b79 OSD: don't share_map_incoming() directly from handle_replica_op()
Let the op_tp handle it, or our C_SendMap callback in the op_gen_wq.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:19 -07:00
Greg Farnum
ebdc097047 OSD: use the async workqueue to send OSDMap updates on dropped ops
Check whether we actually want to send a map in-line, and if we do, create
a GenContext which does so and put that in the op_gen_wq.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:19 -07:00
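A sketch of that arrangement, with std::function standing in for Ceph's GenContext and the threadpool plumbing omitted:

    #include <functional>
    #include <mutex>
    #include <queue>

    class OpGenWorkQueue {
      std::mutex lock;
      std::queue<std::function<void()>> items;

    public:
      void queue(std::function<void()> fn) {  // e.g. a C_SendMap closure
        std::lock_guard<std::mutex> l(lock);
        items.push(std::move(fn));
      }

      // called repeatedly by op_tp worker threads
      void process_one() {
        std::function<void()> fn;
        {
          std::lock_guard<std::mutex> l(lock);
          if (items.empty())
            return;
          fn = std::move(items.front());
          items.pop();
        }
        fn();  // send the incremental map outside the fast path
      }
    };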
Greg Farnum
6c98e36f89 OSD: add an op threadpool GenContext workqueue
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:19 -07:00
Greg Farnum
9fba69a11a OSD: allow build_incremental_map_msg to fail on lookups
Since we're now building incremental map messages out-of-band with other
map updates, we need to tolerate lookup failures at the bottom
end. Do so by returning a NULL message in that case.
Handle that in send_incremental_map by looping until we get a
message back -- if we fail on the first attempt, we'll get
the OSDSuperblock again and deal with it.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:19 -07:00
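The retry loop described above, in sketch form (stand-in types; the stub bounds function takes the place of re-reading the superblock):

    #include <memory>

    typedef unsigned epoch_t;
    struct MOSDMap { epoch_t first, last; };  // stand-in map message

    // Returns nullptr when a map lookup fails underneath us, for example
    // because the range was trimmed between the bounds check and the read.
    std::unique_ptr<MOSDMap> build_incremental_map_msg(epoch_t since,
                                                       epoch_t to) {
      if (since >= to)
        return nullptr;  // nothing buildable from these bounds
      return std::unique_ptr<MOSDMap>(new MOSDMap{since + 1, to});
    }

    // Stub: the real code re-reads the superblock for fresh bounds.
    epoch_t newest_map_epoch() { return 100; }

    // Assumes `since` is older than the cluster's newest epoch.
    void send_incremental_map(epoch_t since /*, Connection *con */) {
      std::unique_ptr<MOSDMap> m;
      while (!m) {
        // refresh the bounds on every attempt: a failed build means
        // the superblock moved under us
        m = build_incremental_map_msg(since, newest_map_epoch());
      }
      // ... hand m off to the Connection ...
    }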
Greg Farnum
0ffdeab900 OSD: fix a few map sharing bugs
1) do not share OSD maps with peers that already have them
2) do not share maps with oneself

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:19 -07:00
Greg Farnum
0fbaa160c1 OSD: move should_share_map and share_map_incoming to OSDService
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:19 -07:00
Greg Farnum
399e67f884 OSD: pass a pointer to last_sent_epoch instead of the whole Session
We don't use any other part of the Session, and this interface will
be easier to move out of the OSD class.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:19 -07:00
Greg Farnum
c97f96837a OSD: share map updates in the op_tp threads instead of the main dispatch thread
Sharing maps can require disk accesses and other slow operations. We don't want to do that
in our fast path, so do it in OSD::dequeue_op instead of OSD::handle_op. We're
cheating slightly and still doing it in handle_op if no op actually gets queued,
but we're going to put those into a separate work queue next. We'll also be
moving all the functions necessary for this into OSDService so that our completion
struct doesn't need to be a friend to OSD.
To make this easier, we're adding send_map_update and sent_epoch members to
OpRequest.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:18 -07:00
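The two bookkeeping members mentioned above, sketched with comments marking the division of labor the commit sets up:

    typedef unsigned epoch_t;

    struct OpRequest {
      // set in handle_op on the fast path; acted on in dequeue_op,
      // which runs in an op_tp worker thread
      bool send_map_update = false;  // should dequeue_op share a map?
      epoch_t sent_epoch = 0;        // the epoch recorded when deciding
      // ... message, connection, and op-tracking state ...
    };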
Greg Farnum
d78988bf41 OSD: refactor handle_op error handling cases
We move our map version-checking code earlier (to dispatch_op) and refactor
our other fail-to-dispatch cases. This is friendlier for the no-lock
message processing we'll use with fast dispatch.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:18 -07:00
Greg Farnum
276a4fe422 OSD: change Session handling around _share_map_incoming
Move responsibility for the reference up to _share_map_incoming's caller,
and start using the Session::sent_epoch_lock. This looks a little silly
now, but we're going to split up the decide-to-send-maps and send-maps steps
and don't want to block in the decide-to-send step, so we need some
pretty flexible locking up at this level.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:18 -07:00
Greg Farnum
1e3c4959a9 OSD: add a Session::sent_epoch_lock
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:18 -07:00
Greg Farnum
667769c624 OSD: simplify _share_map_incoming based on _should_share_map()
Also, remove the bool return code since nobody looks at it.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:18 -07:00
Greg Farnum
b53cec43d1 OSD: add _should_share_map function
Just copy _share_map_incoming and rip out all the parts that actually
update data structures.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:18 -07:00
Greg Farnum
b2187ac935 OSD: use an OSDMapRef& and require the Session* in _share_map_incoming
You can no longer pass in a NULL Session*, but both callers already provide
one; and using an OSDMapRef& reduces shared_ptr copies.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:18 -07:00
Greg Farnum
9835866e8e OSD: use safe params in map-sharing functions
We were previously using unprotected access to OSD members.

Unfortunately, this does not make them completely safe: we are looking up
maps asynchronously from when we got access to the cached map bounds, and
so the OSD could delete a map out from underneath us. Fixing that will
require some kind of map bounds lock. :/

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:17 -07:00
Samuel Just
b199194db1 OSD::send_incremental_map: use service superblock so we can avoid locking osd_lock
TODO: make it actually safe by dealing with build_incremental_map_msg()

Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-05-05 15:29:17 -07:00
Samuel Just
812c67236d OSD::_share_map_incoming: pass osdmap in explicitly
We'll want to be able to use this method without the osd_lock. Note
that we can't do so yet -- we call send_incremental_map, which is not
safe to call unlocked.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:17 -07:00
Greg Farnum
2f97f4776f OSD: protect state member with a Spinlock
This member was previously protected by the osd_lock (although setting
SHUTDOWN was synchronized with the heartbeat lock, too), but we need
to read it for fast dispatch, so protect it under its own lock at all times.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:17 -07:00
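In sketch form (std::mutex standing in for Ceph's Spinlock, and only an illustrative subset of states):

    #include <mutex>

    class OSD {
    public:
      enum { STATE_INITIALIZING, STATE_BOOTING, STATE_ACTIVE,
             STATE_STOPPING };  // illustrative subset of states

    private:
      mutable std::mutex state_lock;
      int state = STATE_INITIALIZING;

    public:
      int get_state() const {
        std::lock_guard<std::mutex> l(state_lock);
        return state;
      }
      void set_state(int s) {
        std::lock_guard<std::mutex> l(state_lock);
        state = s;
      }
      // safe to call from ms_fast_dispatch without osd_lock
      bool is_active() const { return get_state() == STATE_ACTIVE; }
    };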
Greg Farnum
a94a64d9d0 OSD: protect access to boot_epoch, up_epoch, bind_epoch
We need to access these members in some call chains via fast_dispatch,
where they're otherwise unprotected.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:17 -07:00
Greg Farnum
767e94ac3d OSD: shard heartbeat_lock
heartbeat_need_update must be protected independently in order to avoid a
lock-ordering loop with the pg_map_lock and the PG::_lock.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:17 -07:00
Greg Farnum
9d8c797e65 OSD: Push responsibility for grabbing pg_map_lock up to callers of _remove_pg()
The atomicity requirements of other systems prevent us from dropping the PG
lock inside that function, and the PG lock is ordered underneath the
pg_map_lock.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:17 -07:00
Samuel Just
00d36f6e8a OSD: wake_pg_waiters atomically with pg_map update
Also, call enqueue_op directly rather than going back
through the entire dispatch machinery.
Be sure to grab the pg lock under the pg_map_lock in _open_lock_pg() to
preserve necessary lock ordering.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:16 -07:00
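A sketch of the ordering this preserves, under simplified stand-in types: the PG lock is taken and the waiters are re-queued while the pg_map_lock write hold is still in place:

    #include <list>
    #include <map>
    #include <mutex>
    #include <shared_mutex>

    struct OpRequest {};
    struct PG { std::mutex _lock; /* ... */ };

    class OSD {
      std::shared_mutex pg_map_lock;
      std::map<int, PG*> pg_map;                     // spg_t -> PG, simplified
      std::map<int, std::list<OpRequest*>> waiters;  // ops awaiting each PG

      void enqueue_op(PG *pg, OpRequest *op) { /* push onto pg's queue */ }

      PG *_open_lock_pg(int pgid) {
        std::unique_lock<std::shared_mutex> wl(pg_map_lock);
        PG *pg = new PG;
        pg->_lock.lock();           // PG lock taken *under* pg_map_lock
        pg_map[pgid] = pg;
        // wake_pg_waiters, atomically with the pg_map update: enqueue
        // directly rather than re-running the whole dispatch machinery
        for (OpRequest *op : waiters[pgid])
          enqueue_op(pg, op);
        waiters.erase(pgid);
        return pg;                  // returned locked
      }
    };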
Samuel Just
3755318342 OSD: remove wake_all_pg_waiters
We shouldn't need this -- we check the pg waiters list on each
map.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:16 -07:00
Samuel Just
eb30f88c94 OSD: add session waiting_for_map mechanisms
This will replace the existing waiting_for_osdmap mechanism
with a per-session wait list.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2014-05-05 15:29:16 -07:00
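A sketch of what a per-session wait list can look like (names and signatures are illustrative, not the exact Ceph code):

    #include <list>
    #include <mutex>

    typedef unsigned epoch_t;
    struct OpRequest { epoch_t min_epoch = 0; /* ... */ };

    struct Session {
      std::mutex lock;
      std::list<OpRequest*> waiting_on_map;  // ops parked on this session
    };

    // Park an op that arrived stamped with a too-new epoch.
    void wait_session_on_map(Session &s, OpRequest *op) {
      std::lock_guard<std::mutex> l(s.lock);
      s.waiting_on_map.push_back(op);
    }

    // When a new map is consumed, replay whatever is now dispatchable.
    template <typename Requeue>
    void dispatch_session_waiting(Session &s, epoch_t osdmap_epoch,
                                  Requeue requeue) {
      std::lock_guard<std::mutex> l(s.lock);
      for (auto it = s.waiting_on_map.begin();
           it != s.waiting_on_map.end(); ) {
        if ((*it)->min_epoch <= osdmap_epoch) {
          requeue(*it);                      // map is new enough now
          it = s.waiting_on_map.erase(it);
        } else {
          ++it;
        }
      }
    }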