Commit Graph

20035 Commits

Author SHA1 Message Date
Sage Weil
7b398a5d9c msgr: indicate whether clients are lossy
We need to know whether the client is lossy before we connect to the peer
in order to know whether to deliver a RESET event or not on connection
failure.  Lossy clients get one, lossless do not.

And in any case, we know ahead of time, so we may as well indicate as much
in the Policy.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-20 18:36:32 -07:00
Sage Weil
525830cd0b msgr: do not discard_queue in Pipe reaper
The IncomingQueue can live beyond the Pipe.  In particular, there is no
reason not to deliver messages we've received on this connection even
though the socket has errored out.

Separate incoming queue discard from outgoing, and only do the latter in
the reaper.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-20 18:36:32 -07:00
Sage Weil
8966f71ae5 msg/IncomingQueue: make the pipe parent informational only
Use this pointer only for debug output prefix; do not dereference, as we
may live beyond the original parent.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-20 18:36:32 -07:00
Sage Weil
999c506d5e msg/DispatchQueue: give IncomingQueue ref to queue
We want to be able to queue an event (e.g., RESET) and deliver it even
after the Pipe is destroyed.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-20 18:36:31 -07:00
Sage Weil
5a62dfef3d msg/DispachQueue: hold lock in IncomingQueue::discard_queue()
This prevents races with the dispatch thread, among other things.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-20 18:36:31 -07:00
Sage Weil
35b7bca357 msg: kill tcp.{cc,h}
Move the remaining comparator into msg_types.h and kill this off.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-20 18:36:31 -07:00
Sage Weil
5ecc5bce18 msg/DispatchQueue: cleanup debug prefix
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-20 18:22:27 -07:00
Sage Weil
89b07f4703 msg/Pipe: move tcp_* functions into Pipe class
This lets us print nice debug prefixes.  It also calls BS on the
Pipe vs tcp.cc separation.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-20 18:22:24 -07:00
Sage Weil
d034e46dd8 msgr: move Accepter into separate .cc
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-20 18:22:21 -07:00
Sage Weil
3e98617c3a msg/Pipe: get_state_name()
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-20 18:22:15 -07:00
Sage Weil
f78a4010f1 msgr: rework accept() connect_seq/race handling
We change a couple of key things here:

 * If there is a matching connect_seq and the existing connection is in OPEN (or
   STANDBY; same thing + a failure), we send a RETRY_SESSION and ask the peer to
   bump their connect_seq.  This handles the case where there was a race, our
   end successfully opened, but the peer's racing attempt was slowly processed.
 * We always reply with connect_seq + 1.  This handles the above case
   more cleanly, and lets us use the same code path.

Also avoid duplicating the RETRY_SESSION path with a goto.  Beautiful!

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-20 18:22:10 -07:00
Sage Weil
a542d89ee5 mds: fix race in connection accept; fix con replacement
We solve two problems with this patch.  The first is that the messenger
will now reuse an existing session's Connection with a new connection,
which means that we don't want to change session->connection when we
are validating an authorizer.  Instead, set (but do not change) it.

We also want to avoid a race where:

 - mds recovers, replays Sessions with no con's
 - multiple connection attempts for the same session race in the msgr
 - both are authorized, but out of order
 - Session->connection gets set to the losing attempt's Connection*

Instead, we take advantage of an accept event that is called only for
accepted winners.

Signed-off-by: Sage Weil <sage@inktank.com>

fixup
2012-07-10 19:04:42 -07:00
Sage Weil
68bad03b2c msgr: queue accept event when pipe is accepted
Queue an event when an incoming connection is accepted.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 13:59:24 -07:00
Sage Weil
fab6e824c4 msg/DispatchQueue: queue and deliver accept events
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 13:59:21 -07:00
Sage Weil
d4ef004e64 dispatcher: new 'accept' event type
Create a new event type when we successfully accept a connection.  This is
distinct from the authorizor verification, which may happen for multiple
racing connection attempts.  In contrast, this will only happen on those
that win the race(s).  I don't think this is that important for stateless
servers (OSD, MON), but it is important for the MDS to ensure that it keeps
its Session con reference pointing to the most recently-successful
connection attempt.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 13:30:44 -07:00
Sage Weil
1a4a133071 msgr: drop unnecessary (un)locking on queuing connection events
This used to be necessary because the pipe_lock was used when queueing
the pipe in the dispatch queue.  Now that is handled by IncomingQueue's
own lock, so these can be removed.

By no longer dropping the lock, we eliminate a whole category of potential
hard-to-debug races.  (Not that any were observed, but now we dno't need to
worry about them.)

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 13:30:33 -07:00
Sage Weil
e84565d9e8 msgr: move dispatch thread into DispatchQueue
The DispatchQueue class now completely owns message delivery.  This is
cleaner and lets us drop the redundant destination_stopped flag from
msgr (DQ has its own stop flag).

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 13:30:33 -07:00
Sage Weil
9e291bae96 msgr: simplify checks for queueing connection events
Looking through git history it is not clear exactly how these checks
came to be.  They seem to have grown during the multiple-entity-per-rank
transition a few years back.  I'm not fully convinced they are necessary,
but we will keep them regardless.

Push checks into DispatchQueue and look at the local stop flag to
determine whether these events should be queued.  This moves us away from
the kludgey SimpleMessenger::destination_stopped flag (which will soon
be removed).

Also move the refcount futzing into the DispatchQueue methods.  This makes
the callers much simpler.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 13:30:33 -07:00
Sage Weil
bafcbdeb74 msgr: remove unnecessary accept check
We don't need to worry about racing with shutdown here; the cleanup
procedure will stop the accepter thread before cleaning up all the
pipes.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 13:30:33 -07:00
Sage Weil
bffd46c56a msgr: remove obsolete dead path
This hasn't triggered in years.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 13:30:33 -07:00
Sage Weil
34908140bf msgr: uninline ctor and dtor
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 13:30:33 -07:00
Sage Weil
912123f28d msgr: move Pipe, DispatchQueue into separate files
These don't need to be subclasses of SimpleMessenger.  Separate!

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 13:30:33 -07:00
Sage Weil
c7caf91baf msgr: simplify IncomingQueue creation, pointers
* create it via DispatchQueue
 * keep pointer to parent DispatchQueue
 * drop now-useless contextual arguments to most methods

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 13:28:31 -07:00
Sage Weil
fa4c19208b msgr: use local IncomingQueue instead of Pipe
Simpler, cleaner.  No need for the rest of the Pipe crap.  We just need to
queue messages for ourselves.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 13:28:31 -07:00
Sage Weil
2d071b9ec1 msgr: use explicit Connection for messages sent to ourself
Move to an explicit Connection for messages sent to ourselves, instead of
using the one on the local_pipe (which we'll remove shortly).

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 13:28:31 -07:00
Sage Weil
35b1326692 msgr: take over existing Connection on Pipe replacement
If a new pipe/socket is taking over an existing session, it should also
take over the Connection* associated with the existing session.  Because
we cannot clear existing->connection_state, we just take another reference.

Clean up the comments a bit while we're here.

This affects MDS<->client sessions when reconnecting after a socket fault.
It probably also affects intra-cluster (osd/osd, mds/mds, mon/mon)
sessions as well, but I did not confirm that.

Backport: argonaut
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 13:25:31 -07:00
Sage Weil
f7d4e39740 msgr: restart_queue when replacing existing pipe and taking over the queue
The queue may have been previously stopped (by discard_queue()), and needs
to be restarted.

Fixes consistent failures from the mon_recovery.py integration tests.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 17:54:00 -07:00
Sage Weil
5dfd2a512d msgr: choose incoming connection if ours is STANDBY
If the connect_seq matches, but our existing connection is in STANDBY, take
the incoming one.  Otherwise, the other end will wait indefinitely for us
to connect but we won't.

Alternatively, we could "win" the race and trigger a connection by sending
a keepalive (or similar), but that is more work; we may as well accept the
incoming connection we have now.

This removes STANDBY from the acceptable WAIT case states.  It also keeps
responsibility squarely on the shoulders of the peer with something to
deliver.

Without this patch, a 3-osd vstart cluster with
'ms inject socket failures = 100' and rados bench write -b 4096 would start
generating slow request warnings after a few minutes due to the osds
failing to connect to each other.  With the patch, I complete a 10 minute
run without problems.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 17:54:00 -07:00
Sage Weil
b7007a159f msgr: preserve incoming message queue when replacing pipes
If we replace an existing pipe with a new one, move the incoming queue
of messages that have not yet been dispatched over to the new Pipe so that
they are not lost.  This prevents messages from being lost.

Alternatively, we could set in_seq = existing->in_seq - existing->in_qlen,
but that would make the other end resend those messages, which is a waste
of bandwidth.

Very easy to reproduce the original bug with 'ms inject socket failures'.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 17:54:00 -07:00
Sage Weil
1f3a722e15 msgr: move dispatch_entry into DispatchQueue class
A bit cleaner.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 17:54:00 -07:00
Sage Weil
03445290da msgr: move incoming queue to separate class
This extricates the incoming queue and its funky relationship with
DispatchQueue from Pipe and moves it into IncomingQueue.  There is now a
single IncomingQueue attached to each Pipe.  DispatchQueue is now no
longer tied to Pipe.

This modularizes the code a bit better (tho that is still a work in
progress) and (more importantly) will make it possible to move the
incoming messages from one pipe to another in accept().

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 17:54:00 -07:00
Sage Weil
0dbc541695 msgr: make D_CONNECT constant non-zero, fix ms_handle_connect() callback
A while ago we inadvertantly broke ms_handle_connect() callbacks because
of a check for m being non-zero in the dispatch_entry() thread.  Adjust the
enums so that they get delivered again.

This fixes hangs when, for example, the ceph tool sends a command, gets a
connection reset, and doesn't get the connect callback to resend after
reconnecting to a new monitor.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 17:54:00 -07:00
Sage Weil
2429556a51 msgr: fix pipe replacement assert
We may replace an existing pipe in the STANDBY state if the previous
attempt failed during accept() (see previous patches).

This might fix #1378.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 17:54:00 -07:00
Sage Weil
204bc594be msgr: do not try to reconnect con with CLOSED pipe
If we have a con with a closed pipe, drop the message.  For lossless
sessions, the state will be STANDBY if we should reconnect.  For lossy
sessions, we will end up with CLOSED and we *should* drop the message.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 17:54:00 -07:00
Sage Weil
e6ad6d25a5 msgr: move to STANDBY if we replace during accept and then fail
If we replace an existing pipe during accept() and then fail, move to
STANDBY so that our connection state (connect_seq, etc.) is preserved.
Otherwise, we will throw out that information and falsely trigger a
RESETSESSION on the next connection attempt.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 17:53:59 -07:00
Sage Weil
a1fe589209 mon: initialize quorum_features
This could cause us to incorrectly encode new features into the monstore
that an old mon won't understand.

This is overly conservative; we probably need to persist the set of quorum
features that are supported and use those.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 16:05:16 -07:00
Samuel Just
2472034c4f OSD::do_command: unlock pg only if we had it
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-07-02 09:51:37 -07:00
Samuel Just
841451f2fe MOSDSubOp: set hobject_incorrect_pool in decode_payload
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-07-02 09:49:52 -07:00
Sage Weil
deceb709ea filestore: initialize m_filestore_do_dump
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-02 07:10:33 -07:00
Sage Weil
0810ab6de6 osdmap: check new pool name on rename
Ensure the new pool name doesn't already exist, both in the current and
project map.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-29 19:56:07 -07:00
Sage Weil
5a93550912 osd: handle pool name changes properly
* Remove the old name from the name->id map.

Fixes: #2676
Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-29 19:54:35 -07:00
Sage Weil
a8d7fd959d mon: 'osd pool rename <oldname> <newname>'
Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-29 14:51:32 -07:00
Yehuda Sadeh
15ebf2028e rest-bench: mark request as complete later
We marked a request as complete in the callback, however
it might be that we're still inside S3_runall_request_context()
which means that request is not really complete yet.
Possibly fixes bug #2652.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-06-28 11:28:35 -07:00
Samuel Just
335b918dc0 DBObjectMap: clones must inherit spos from parent
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-06-28 09:53:22 -07:00
Samuel Just
cc1da95895 filestore: sync object_map object in lfn_remove when nlink > 1
In the following sequence:

1) create (a, 1)
2) setattr (a, 1)
3) link (a, 1), (b, 1)
4) remove (a, 1)

If we play 1-4 and then replay 1-4 again, we will end up removing
(b, 1)'s attributes since nlink for (a, 1) the second time through
is 1.  We fix this by marking spos on the object_map header for
(a, 1) when we remove (a, 1) but not eh attributes.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-06-28 09:53:19 -07:00
Sage Weil
9d6013e0db debian: move metadata server into ceph-mds
Also adjust the recommends and depends, so that libcephfs1 and ceph-fuse
hang off of ceph-mds instead of ceph.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-27 20:25:13 -07:00
Sage Weil
915f321096 debian: move mount.ceph and cephfs into ceph-fs-common
Based on patches from Laszlo Boszormenyi (GCS) <gcs@debian.hu>.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-27 20:25:13 -07:00
Sage Weil
0d9b558f5c debian: arch linux-any
Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-27 20:25:13 -07:00
Laszlo Boszormenyi (GCS)
89492329d1 debian: build with libnss instead of crypto++
Signed-off-by: Laszlo Boszormenyi (GCS) <gcs@debian.hu>
2012-06-27 20:25:13 -07:00
Sage Weil
9d7f048073 doc/config-cluster/authentication: keyring default locations, simplify key management
- keyrings have new default locations that everyone should use.
- the user key setup is vastly simplified if you use the
  'ceph auth get-or-create' command.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-27 17:49:23 -07:00