Normally we never go from need_addr == false to need_addr == true.
It always starts out as true, so this else is useless on the first
call to Accepter::bind().
The only exception is rebind(). Add an unlearn_addr() that sets need_addr
back to true. This is almost unnecessary, but doing so fixes a small bug
where the local_connection->peer_addr doesn't get updated when we do a
rebind().
Drop now-unused set_need_addr(). We keep get_need_addr() only because
it is useful in the debug output and for the assert.
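A minimal standalone sketch of the idea (the Accepter and field names below
are simplified stand-ins, not the real messenger code): rebind() re-arms
need_addr so the next bind() re-learns, and therefore republishes, the
address.

    #include <cassert>
    #include <string>

    struct Accepter {
      bool need_addr = true;       // starts out true
      std::string learned_addr;    // stand-in for the learned entity_addr_t

      bool get_need_addr() const { return need_addr; }
      void unlearn_addr() { need_addr = true; }   // the new helper

      void bind(const std::string &observed_addr) {
        if (need_addr) {
          learned_addr = observed_addr;  // would also refresh
                                         // local_connection->peer_addr
          need_addr = false;
        }
        // else: address already learned, nothing to do
      }

      void rebind(const std::string &observed_addr) {
        unlearn_addr();              // force bind() to re-learn the addr
        bind(observed_addr);
      }
    };

    int main() {
      Accepter a;
      a.bind("1.2.3.4:6789");
      a.rebind("1.2.3.4:6800");      // stays fresh instead of going stale
      assert(!a.get_need_addr());
    }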
Signed-off-by: Sage Weil <sage@inktank.com>
If we send 'ceph tell <foo> ...' to a non-monitor, we need to send
keepalives to ensure we detect a TCP drop. (Not so for monitors; monclient
already does
its own keepalive thing.)
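A rough sketch of the behavior, with made-up types (TargetType and
Connection::send_keepalive() here are illustrative, not the actual tool's
API): only non-monitor targets get explicit keepalives from us.

    #include <iostream>

    enum class TargetType { MON, OSD, MDS };

    struct Connection {
      void send_keepalive() { std::cout << "keepalive sent\n"; }
    };

    // called from a periodic timer while the tell command is outstanding
    void tick(TargetType target, Connection &con) {
      if (target == TargetType::MON)
        return;                 // monclient keeps the mon session alive itself
      con.send_keepalive();     // osd/mds target: detect a dead TCP session
    }

    int main() {
      Connection con;
      tick(TargetType::OSD, con);   // keepalive sent
      tick(TargetType::MON, con);   // no-op
    }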
Signed-off-by: Sage Weil <sage@inktank.com>
send_command() was blocking waiting for the osdmap, and was also called
from the connect callback. Instead, re-call it from the handle_osd_map()
callback
so that it never blocks.
This was easy to trigger with 'ceph osd tell osd.0 foo' and ms failure
injection.
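The pattern, sketched with simplified stand-in types (CommandClient, the
pending slot, and the callback below are illustrative): stash the command if
the map isn't there yet and re-issue it from the map callback.

    #include <optional>
    #include <string>
    #include <iostream>

    struct CommandClient {
      bool have_osdmap = false;
      std::optional<std::string> pending;

      void send_command(const std::string &cmd) {
        if (!have_osdmap) {     // don't block; we may be in a msgr callback
          pending = cmd;
          return;
        }
        std::cout << "sending: " << cmd << "\n";
      }

      void handle_osd_map() {   // called when the map finally arrives
        have_osdmap = true;
        if (pending) {
          std::string cmd = *pending;
          pending.reset();
          send_command(cmd);    // re-call now that we can address the target
        }
      }
    };

    int main() {
      CommandClient c;
      c.send_command("osd tell osd.0 foo");  // queued, not blocked
      c.handle_osd_map();                    // actually sent here
    }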
Signed-off-by: Sage Weil <sage@inktank.com>
Mark our outgoing connection attempt if we send a WAIT in accept(). This
ensures we don't go to standby or closed in fault() on the outgoing
connection for any reason.
Signed-off-by: Sage Weil <sage@inktank.com>
We may have a sequence like:
- client does REQUEST_CLOSE
- mds sends reply
- connection faults, client doesn't get the reply
- mds closes out its connection
- client tries to reconnect/resend, gets RESET_SESSION
-> continues lamely waiting
If we get a session reset and we were asking to close the session anyway,
we are happy: it was closed.
This was exposed by ceph-fuse start/stop tests with socket failure
injection.
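A reduced sketch of the intended behavior (the ClientSession type and flags
are hypothetical): a session reset that arrives while we are waiting for our
close to be acked counts as the close succeeding.

    #include <iostream>

    struct ClientSession {
      bool waiting_for_close = false;
      bool closed = false;

      void request_close() { waiting_for_close = true; }

      void handle_session_reset() {
        if (waiting_for_close) {
          closed = true;               // good enough: the session is gone
          waiting_for_close = false;
          std::cout << "close satisfied by reset\n";
          return;
        }
        std::cout << "reconnect and replay state\n";
      }
    };

    int main() {
      ClientSession s;
      s.request_close();
      s.handle_session_reset();        // does not lamely wait
    }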
Signed-off-by: Sage Weil <sage@inktank.com>
In particular, lossless_peers should use STANDBY, but lossless_clients
should reconnect immediately since they are already doing their own session
management.
Specifically, this fixes the problem where the Client tries to open a
connection to the MDS and faults after delivering its OPEN_SESSION message
but before it gets the reply: the session isn't open yet, so it isn't
pinging. It could ping, but it is simpler and faster to make the msgr layer
keep the connection open instead of waiting for a periodic keepalive.
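Roughly, the fault-handling decision looks like this (Policy here is heavily
simplified, and standby_on_fault is an illustrative flag, not the real field
name):

    #include <iostream>

    struct Policy {
      bool standby_on_fault;
      static Policy lossless_peer()   { return {true};  }   // e.g. osd<->osd
      static Policy lossless_client() { return {false}; }   // e.g. client->mds
    };

    enum class State { STANDBY, CONNECTING };

    State fault(const Policy &policy) {
      if (policy.standby_on_fault)
        return State::STANDBY;    // park and wait for traffic or the peer
      return State::CONNECTING;   // reconnect now; don't wait for a ping
    }

    int main() {
      std::cout << (fault(Policy::lossless_peer())   == State::STANDBY)    << "\n";
      std::cout << (fault(Policy::lossless_client()) == State::CONNECTING) << "\n";
    }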
Fixes: #2824
Signed-off-by: Sage Weil <sage@inktank.com>
Go directly to the STANDBY state, and print a more accurate message.
Otherwise, we do the same check in writer() and go to STANDBY then. This
is less confusing.
Signed-off-by: Sage Weil <sage@inktank.com>
If we have an active peer whose Connection fails, open a new one. This
is necessary now that a lossy client connection does not automatically
reopen on its own (which is needed to avoid races with session-based
lossy clients and the ms_handle_reset callback).
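A sketch of the dispatch-side reaction, using illustrative types and an
integer peer id in place of the real entities: when a peer's Connection
resets, replace it with a fresh one rather than waiting for traffic.

    #include <iostream>
    #include <map>
    #include <memory>

    struct Connection {};

    struct Messenger {
      std::shared_ptr<Connection> get_connection(int peer) {
        return std::make_shared<Connection>();  // opens a new pipe underneath
      }
    };

    struct PeerMap {
      Messenger *msgr;
      std::map<int, std::shared_ptr<Connection>> cons;

      void ms_handle_reset(int peer) {
        auto it = cons.find(peer);
        if (it == cons.end())
          return;
        it->second = msgr->get_connection(peer);  // replace, don't wait
      }
    };

    int main() {
      Messenger m;
      PeerMap p{&m};
      p.cons[3] = m.get_connection(3);
      p.ms_handle_reset(3);            // peer 3 gets a fresh Connection
      std::cout << p.cons.size() << "\n";
    }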
Signed-off-by: Sage Weil <sage@inktank.com>
This could null deref if the Pipe is registered but failed.
We need to loop here because the Pipe vs Connection stuff sucks; hopefully
this gets fixed up soonish.
Signed-off-by: Sage Weil <sage@inktank.com>
AFAICS these checks are pointless. There should be no harm in queueing
messages on a closed connection; they'll get cleaned up when it is
deregistered. Moreover, the *queuer* shouldn't be the one who has to
unregister a Pipe.
Signed-off-by: Sage Weil <sage@inktank.com>
There was a race where:
- sending stuff to a lossy Connection
- it fails, and queues itself for reap, queues a RESET event
- reaper clears the Pipe
- some thread queues new messages and the Pipe is reopened, messages sent
- RESET event delivered to dispatch, connection is closed and reopened.
The result was that messages got sent to the OSD out of order during the
window between the fault() and ms_handle_reset() getting called. This will
prevent that.
Signed-off-by: Sage Weil <sage@inktank.com>
This fixes a problem where:
- pipe faults, con->pipe is cleared
- ms_handle_reset tries to mark_down, but it doesn't know the pipe
Signed-off-by: Sage Weil <sage@inktank.com>
When we have a lossy connection failure, immediately disconnect the Pipe
and set the Connection failed flag. There is no reason to wait until the
reaper comes along.
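The gist, with toy types (Pipe and Connection below are stripped down to the
two fields that matter): on a lossy fault, detach and mark failed
immediately instead of leaving that to the reaper.

    #include <iostream>

    struct Pipe;

    struct Connection {
      Pipe *pipe = nullptr;
      bool failed = false;
    };

    struct Pipe {
      Connection *con;
      bool lossy;
      void fault() {
        if (lossy) {
          con->failed = true;   // later sends see a failed lossy con and drop
          con->pipe = nullptr;  // don't wait for the reaper to detach us
        }
      }
    };

    int main() {
      Connection c;
      Pipe p{&c, /*lossy=*/true};
      c.pipe = &p;
      p.fault();
      std::cout << c.failed << " " << (c.pipe == nullptr) << "\n";  // 1 1
    }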
Signed-off-by: Sage Weil <sage@inktank.com>
The locking was awkward with locally delivered messages: we dropped the
dq lock, took the inq lock, re-took the dq lock, etc. We would also take
+ drop + retake
+ drop the dq lock when queuing events. Blech!
Instead:
* simplify the queueing of cons for the local_queue
* dequeue the con under the original dq lock
* queue events under a single dq lock interval, by telling
local_queue.queue() we already have it.
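A small sketch of the single-lock-interval idea; LocalQueue and the
already_locked flag are illustrative, not the exact interface:

    #include <mutex>
    #include <deque>
    #include <string>

    struct LocalQueue {
      std::mutex dq_lock;
      std::deque<std::string> q;

      void queue(const std::string &ev, bool already_locked = false) {
        if (already_locked) {
          q.push_back(ev);             // caller holds dq_lock for us
          return;
        }
        std::lock_guard<std::mutex> l(dq_lock);
        q.push_back(ev);
      }

      void deliver_local() {
        std::lock_guard<std::mutex> l(dq_lock);  // one lock interval
        // dequeue the con and queue follow-on events without re-locking
        queue("ms_handle_connect", /*already_locked=*/true);
      }
    };

    int main() {
      LocalQueue lq;
      lq.deliver_local();
    }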
Signed-off-by: Sage Weil <sage@inktank.com>
We need to know whether the client is lossy before we connect to the peer
in order to know whether to deliver a RESET event or not on connection
failure. Lossy clients get one, lossless do not.
And in any case, we know ahead of time, so we may as well indicate as much
in the Policy.
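Sketched with a pared-down Policy (just the one flag that matters here): the
lossy bit is known when the Policy is constructed, so fault handling never
has to guess.

    #include <iostream>

    struct Policy {
      bool lossy;
      static Policy lossy_client()    { return {true};  }
      static Policy lossless_client() { return {false}; }
    };

    // Decide, at fault time, whether dispatch should see a RESET event.
    bool queue_reset_on_fault(const Policy &policy) {
      return policy.lossy;   // lossy clients get ms_handle_reset; lossless don't
    }

    int main() {
      std::cout << queue_reset_on_fault(Policy::lossy_client())    << "\n";  // 1
      std::cout << queue_reset_on_fault(Policy::lossless_client()) << "\n";  // 0
    }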
Signed-off-by: Sage Weil <sage@inktank.com>
The IncomingQueue can live beyond the Pipe. In particular, there is no
reason not to deliver messages we've received on this connection even
though the socket has errored out.
Separate incoming queue discard from outgoing, and only do the latter in
the reaper.
Signed-off-by: Sage Weil <sage@inktank.com>
Use this pointer only for the debug output prefix; do not dereference it,
as we may live beyond the original parent.
Signed-off-by: Sage Weil <sage@inktank.com>
We change a couple of key things here:
* If there is a matching connect_seq and the existing connection is in OPEN (or
STANDBY; same thing + a failure), we send a RETRY_SESSION and ask the peer to
bump their connect_seq. This handles the case where there was a race: our
end opened successfully, but the peer's racing attempt was processed slowly.
* We always reply with connect_seq + 1. This handles the above case
more cleanly, and lets us use the same code path.
Also avoid duplicating the RETRY_SESSION path with a goto. Beautiful!
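A very reduced sketch of the accept-side decision (states and reply codes
simplified; the real handshake carries much more):

    #include <cstdint>
    #include <iostream>

    enum class State { OPEN, STANDBY };
    enum class Reply { ACCEPT, RETRY_SESSION };

    struct Existing { State state; uint32_t connect_seq; };

    Reply handle_connect(const Existing &existing, uint32_t peer_cseq,
                         uint32_t *reply_cseq) {
      *reply_cseq = existing.connect_seq + 1;   // always reply with cseq + 1
      if (peer_cseq == existing.connect_seq &&
          (existing.state == State::OPEN || existing.state == State::STANDBY)) {
        // We already won this race; ask the peer to retry with a bumped cseq.
        return Reply::RETRY_SESSION;
      }
      return Reply::ACCEPT;
    }

    int main() {
      uint32_t rc;
      Reply r = handle_connect({State::OPEN, 3}, 3, &rc);
      std::cout << (r == Reply::RETRY_SESSION) << " " << rc << "\n";  // 1 4
    }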
Signed-off-by: Sage Weil <sage@inktank.com>
We solve two problems with this patch. The first is that the messenger
will now reuse an existing session's Connection with a new connection,
which means that we don't want to change session->connection when we
are validating an authorizer. Instead, set it if it is unset, but never
replace an existing value.
We also want to avoid a race where:
- mds recovers, replays Sessions with no con's
- multiple connection attempts for the same session race in the msgr
- both are authorized, but out of order
- Session->connection gets set to the losing attempt's Connection*
Instead, we take advantage of an accept event that is called only for
accepted winners.
Signed-off-by: Sage Weil <sage@inktank.com>
Create a new event type when we successfully accept a connection. This is
distinct from the authorizer verification, which may happen for multiple
racing connection attempts. In contrast, this will only happen on those
that win the race(s). I don't think this is that important for stateless
servers (OSD, MON), but it is important for the MDS to ensure that it keeps
its Session con reference pointing to the most recently-successful
connection attempt.
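A sketch of how the two callbacks divide the work (Session,
ms_verify_authorizer, and ms_handle_accept below are simplified
illustrations of the intent, not the exact interfaces):

    #include <iostream>

    struct Connection { int id; };

    struct Session { Connection *con = nullptr; };

    // May run for several racing attempts, possibly out of order.
    void ms_verify_authorizer(Session &s, Connection *c) {
      if (!s.con)
        s.con = c;           // set, but never change
    }

    // Runs only for the connection that won the race in the messenger.
    void ms_handle_accept(Session &s, Connection *c) {
      s.con = c;             // safe to (re)point the session here
    }

    int main() {
      Session s;
      Connection a{1}, b{2};
      ms_verify_authorizer(s, &a);
      ms_verify_authorizer(s, &b);   // ignored: already set
      ms_handle_accept(s, &b);       // b won; session now points at b
      std::cout << s.con->id << "\n";  // 2
    }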
Signed-off-by: Sage Weil <sage@inktank.com>
This used to be necessary because the pipe_lock was used when queueing
the pipe in the dispatch queue. Now that is handled by IncomingQueue's
own lock, so these can be removed.
By no longer dropping the lock, we eliminate a whole category of potential
hard-to-debug races. (Not that any were observed, but now we don't need to
worry about them.)
Signed-off-by: Sage Weil <sage@inktank.com>
The DispatchQueue class now completely owns message delivery. This is
cleaner and lets us drop the redundant destination_stopped flag from
msgr (DQ has its own stop flag).
Signed-off-by: Sage Weil <sage@inktank.com>
Looking through git history it is not clear exactly how these checks
came to be. They seem to have grown during the multiple-entity-per-rank
transition a few years back. I'm not fully convinced they are necessary,
but we will keep them regardless.
Push checks into DispatchQueue and look at the local stop flag to
determine whether these events should be queued. This moves us away from
the kludgey SimpleMessenger::destination_stopped flag (which will soon
be removed).
Also move the refcount futzing into the DispatchQueue methods. This makes
the callers much simpler.
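Roughly, with toy refcounting (the types below are stand-ins): the queueing
method consults the DispatchQueue's own stop flag and takes or releases the
Connection reference itself.

    #include <deque>
    #include <iostream>

    struct Connection {
      int nref = 1;
      Connection *get() { ++nref; return this; }
      void put() { if (--nref == 0) delete this; }
    };

    struct DispatchQueue {
      bool stop = false;
      std::deque<Connection*> events;  // queued remote-reset events, say

      void queue_remote_reset(Connection *con) {
        if (stop)
          return;                      // shutting down: drop it, take no ref
        events.push_back(con->get());  // refcounting handled here, not by caller
      }
    };

    int main() {
      DispatchQueue dq;
      Connection *c = new Connection;
      dq.queue_remote_reset(c);        // caller doesn't touch the refcount
      std::cout << c->nref << "\n";    // 2: ours plus the queue's
      dq.events.front()->put();        // dispatch would drop this after delivery
      dq.events.pop_front();
      c->put();                        // drop our own ref; object is freed
    }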
Signed-off-by: Sage Weil <sage@inktank.com>
We don't need to worry about racing with shutdown here; the cleanup
procedure will stop the accepter thread before cleaning up all the
pipes.
Signed-off-by: Sage Weil <sage@inktank.com>