911 Commits

Author SHA1 Message Date
Sage Weil
ea6880f8a2 msg/DispatchQueue: do not discard queued events on stop
When the shutdown/stop flag is set, continue to work through the queue.
Process events, but discard messages.  This avoids the loss of reset events
on shutdown that are necessary to clean up ref cycles.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 10:53:06 -07:00
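The stop behavior described in ea6880f8a2 can be sketched as follows. This is a minimal illustration, not the DispatchQueue code: `QueueItem` and `drain` are hypothetical names, and the real queue is threaded and priority-aware.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical, simplified queue item: either a message or a connection event.
struct QueueItem {
  enum class Kind { Message, ResetEvent } kind;
  std::string label;
};

// Drain the whole queue. While running, every item is delivered. Once
// `stopping` is set, messages are discarded but events are still delivered,
// so reset handlers get the chance to break reference cycles.
std::vector<std::string> drain(std::deque<QueueItem>& q, bool stopping) {
  std::vector<std::string> delivered;
  while (!q.empty()) {
    QueueItem item = q.front();
    q.pop_front();
    if (stopping && item.kind == QueueItem::Kind::Message)
      continue;  // discard messages on shutdown
    delivered.push_back(item.label);
  }
  return delivered;
}
```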
Sage Weil
de64bc50f2 msgr: queue reset exactly once on any connection
Use the atomic pipe link removal as a signal that we are the one failing
the con and use that to queue the reset event.

This fixes the case where we have an open, the session gets set up via the
handle_accept callback, and then race with another connection and go into
wait + close, or just close.  In that case, fault() needs to queue a reset
event to match the accept.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 10:52:18 -07:00
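The "exactly once" idea in de64bc50f2 can be illustrated with an atomic unlink: whichever caller actually removes the pipe link is the one that queues the reset. The sketch below uses hypothetical `Connection`, `Pipe`, `clear_pipe`, and `fault` stand-ins and a counter in place of a real event queue; it is not the messenger's implementation.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for a connection and the pipe it is linked to.
struct Pipe {};

struct Connection {
  std::atomic<Pipe*> pipe{nullptr};
  std::atomic<int> resets_queued{0};  // counts queued resets, for illustration

  // Atomically unlink the pipe. Only the caller that actually removed the
  // link (the compare-exchange succeeded) gets true back.
  bool clear_pipe(Pipe* old_p) {
    assert(old_p != nullptr);
    Pipe* expected = old_p;
    return pipe.compare_exchange_strong(expected, nullptr);
  }
};

// Called when a pipe fails; queues a reset only if this caller won the unlink.
void fault(Connection& con, Pipe* p) {
  if (con.clear_pipe(p))
    ++con.resets_queued;
}
```

Calling `fault` twice for the same pipe queues a single reset, because the second compare-exchange sees the link already cleared.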
Sage Weil
26e16c008d msg/Pipe: include con ref in debug prestring
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 10:52:18 -07:00
Sage Weil
eea73ab88f msg/Pipe: reset replaced pipes
This gives the ms_handle_reset call a chance to clean up (for example, by
breaking a con->priv <-> session reference cycle).

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 10:52:18 -07:00
Sage Weil
e96c0ceec7 msgr: use ConnectionRef throughout
Make RefCountedObject a private parent of Connection so that users are
forced to use ConnectionRef whenever references are taken.

Many methods can still take a raw Connection* when they are using the
caller's reference but not taking their own; this is cheaper than
twiddling the reference count, and the lifetime is still well defined.
Local variables generally use ConnectionRef, though.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 10:52:18 -07:00
David Zafman
7acf3de604 cls,msg: Fix use of set_in4_quad() to set an entity_addr_t
Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-06-05 10:33:57 -07:00
Sage Weil
532dee523a Merge remote-tracking branch 'yan/wip-mds'
Reviewed-by: Sage Weil <sage@inktank.com>

Conflicts:
	src/mds/MDCache.cc
2013-05-29 10:26:56 -07:00
Yan, Zheng
eeb68eb33d mds: open inode by ino
This patch adds an "open-by-ino" helper. It uses the backtrace to find the
inode's path and open the inode. The algorithm looks like:

1. Check MDS peers. If any MDS has the inode in its cache, goto step 6.
2. Fetch backtrace. If the backtrace was previously fetched and the same
   backtrace is returned again, return -EIO.
3. Traverse the path in backtrace. If the inode is found, goto step 6;
   if a non-auth dirfrag is encountered, goto next step. If the inode is
   not found in its parent dir, goto step 1.
4. Request MDS peers to traverse the path in backtrace. If the inode
   is found, goto step 6. If an MDS peer encounters a non-auth dirfrag, it
   stops traversing. If any MDS peer fails to find the inode in its
   parent dir, goto step 1.
5. Use the same algorithm to open the inode's parent. Goto step 3 on
   success; goto step 1 on failure.
6. Return the inode's auth MDS ID.

The algorithm has two main assumptions:
1. If an inode is in its auth MDS's cache, its on-disk backtrace
   can be out of date.
2. If an inode is not in any MDS's cache, its on-disk backtrace
   must be up to date.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-05-28 13:57:22 +08:00
Sage Weil
27381c0c62 osd: ping both front and back interfaces
Send ping requests to both the front and back hb addrs for peer osds.  If
the front hb addr is not present, do not send it and interpret a reply
as coming from both.  This handles the transition from old to new OSDs
seamlessly.

Note both the front and back rx times.  Both need to be up to date in order
for the peer to be healthy.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-22 16:13:37 -07:00
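The health rule in 27381c0c62 (both receive times must be current, and a peer without a front address has one reply count for both) might look roughly like this. `HeartbeatPeer` and its members are hypothetical names chosen for the sketch, and times are plain seconds rather than Ceph's time types.

```cpp
#include <cassert>

// Hypothetical per-peer heartbeat state; times are seconds on any
// monotonic clock.
struct HeartbeatPeer {
  bool has_front = false;  // false for an old peer with no front hb addr
  double last_rx_front = 0;
  double last_rx_back = 0;

  // Record a ping reply. With no front address, one reply counts for both.
  void note_reply(bool from_front, double now) {
    if (!has_front) {
      last_rx_front = last_rx_back = now;
    } else if (from_front) {
      last_rx_front = now;
    } else {
      last_rx_back = now;
    }
  }

  // Healthy only if both interfaces have replied since the cutoff.
  bool is_healthy(double cutoff) const {
    return last_rx_front > cutoff && last_rx_back > cutoff;
  }
};
```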
Sage Weil
92a558bf0e msgr: add Messenger reference to Connection
This allows us to get the messenger associated with a connection.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-22 16:13:37 -07:00
Sage Weil
28851424bf msgr: take an arbitrary set of ports to avoid binding to
We used to only need to avoid 2 ports; now we need 3.  Make it a set so we
don't have this problem later.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-22 16:13:37 -07:00
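Passing a set instead of a fixed number of ports, as 28851424bf describes, keeps the selection logic independent of how many ports must be skipped. The sketch below shows only the selection step; `pick_port` is a hypothetical helper, and a real messenger would also attempt to bind each candidate.

```cpp
#include <cassert>
#include <set>

// Return the first port in [lo, hi] that is not in `avoid`, or -1 if every
// port in the range must be avoided. No socket is bound here.
int pick_port(int lo, int hi, const std::set<int>& avoid) {
  for (int port = lo; port <= hi; ++port) {
    if (avoid.count(port) == 0)
      return port;
  }
  return -1;
}
```

For example, with an arbitrary range of 6800 to 6810 and an avoid set of three ports, the caller gets the fourth port back without any change to the function.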
Samuel Just
49eeaeba3f Messenger: add interface to get oldest queued message arrival time
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-23 18:27:28 -07:00
Samuel Just
297c6714b3 DispatchQueue: track queued message arrival times and expose oldest
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-23 18:27:28 -07:00
Greg Farnum
cecbb4d88a Merge remote-tracking branch 'origin/wip-osd-throttle2' into next
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-09 12:11:15 -07:00
Samuel Just
d7b7acefc8 Pipe: call discard_requeued_up_to under pipe_lock
Fixes: #4627
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-08 17:02:45 -07:00
Sage Weil
f7070e9568 msgr: add second per-message throttler to message policy
We already have a throttler that lets us limit the amount of memory
consumed by messages from a given source.  Currently this is based only
on the size of the message payload.  Add a second throttler that limits
the number of messages so that we can effectively throttle small requests
as well.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-06 08:17:01 -07:00
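To see why f7070e9568 adds a second limit, consider a byte budget alone: zero-byte or tiny requests never exhaust it. The sketch below pairs a byte budget with a message-count budget. `Budget` and `MessagePolicy` here are hypothetical, non-blocking simplifications and do not reproduce Ceph's Throttle or Policy classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical non-blocking budget: admits a request only while it fits.
class Budget {
 public:
  explicit Budget(std::uint64_t max) : max_(max) {}
  bool can_take(std::uint64_t n) const { return n <= max_ - current_; }
  void take(std::uint64_t n) { assert(can_take(n)); current_ += n; }
  void put(std::uint64_t n) { assert(n <= current_); current_ -= n; }

 private:
  std::uint64_t max_;
  std::uint64_t current_ = 0;
};

// A message is admitted only if it fits both the byte and the count budget.
struct MessagePolicy {
  Budget bytes;
  Budget messages;
  MessagePolicy(std::uint64_t max_bytes, std::uint64_t max_msgs)
      : bytes(max_bytes), messages(max_msgs) {}

  bool try_admit(std::uint64_t payload_bytes) {
    if (!messages.can_take(1) || !bytes.can_take(payload_bytes))
      return false;
    messages.take(1);
    bytes.take(payload_bytes);
    return true;
  }

  void release(std::uint64_t payload_bytes) {
    messages.put(1);
    bytes.put(payload_bytes);
  }
};
```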
Greg Farnum
4f8ba0e775 msgr: allow users to mark_down a NULL Connection*
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
2013-03-29 10:42:04 -07:00
Samuel Just
b8929c4262 messages: add MOSDMarkMeDown
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-03-21 18:37:35 -07:00
Sage Weil
541cd3c64b msg/Pipe: fix seq handshake on reconnect
We go to the trouble to exchange our seq numbers during the handshake, but
the bit that then avoids resending old messages was broken because we
already requeue_sent() before we get to this point.  Fix it by discarding
queued items (in the high prio slot) that we don't need to resend, and
adjust out_seq as needed.

Drop the optional arg to requeue_sent() now that it is unused.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-20 21:52:21 -07:00
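The discard step in 541cd3c64b can be pictured as trimming the front of the requeued list up to the sequence number the peer reports having received. The function name below matches the one mentioned in d7b7acefc8, but the signature, the `Msg` type, and the free-function form are assumptions made for this sketch; adjusting `out_seq` is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical message carrying only its sequence number.
struct Msg {
  std::uint64_t seq;
};

// Drop requeued messages the peer has already received (seq <= peer_in_seq).
// Assumes `requeued` is ordered by increasing seq. Returns how many were
// discarded.
std::size_t discard_requeued_up_to(std::deque<Msg>& requeued,
                                   std::uint64_t peer_in_seq) {
  std::size_t discarded = 0;
  while (!requeued.empty() && requeued.front().seq <= peer_in_seq) {
    requeued.pop_front();
    ++discarded;
  }
  return discarded;
}
```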
Sage Weil
7e7a19ba18 Merge pull request #115 from ceph/wip-4199
Resolves #4199

Reviewed-by: Sage Weil <sage@inktank.com>
2013-03-18 15:53:56 -07:00
Joao Eduardo Luis
b781400f1f mon: HealthMonitor: Keep track of monitor cluster's health
The HealthMonitor builds upon the QuorumService interface, and should be
used to keep track of all and any relevant information about the monitor
cluster (maybe even about all the cluster if need be).

This patch also introduces the HealthService interface, used to define
a HealthMonitor service, responsible for dispatching 'MMonHealth' messages
(the QuorumService interface dispatches generic 'Message').

Based on the HealthService interface, we introduce the DataHealthService
class, a service that will track disk space consumption by the monitors,
warn when a given threshold is crossed, and gracefully shutdown the monitor
if disk space usage hits critical levels that might affect the correct
monitor behavior.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-03-18 22:43:55 +00:00
Danny Al-Gaaf
9704dde121 msg/Pipe.cc: prefer prefix ++operator for iterators
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2013-03-18 12:35:35 +01:00
Danny Al-Gaaf
bd4f1a3705 msg/Messenger.h: prefer prefix ++operator for iterators
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2013-03-13 16:59:40 +01:00
Danny Al-Gaaf
a6f4de924c msg/SimpleMessenger.cc: use static_cast instead of C-Style cast
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2013-03-11 13:45:33 +01:00
Danny Al-Gaaf
f1f1c77697 SimpleMessenger.cc: use static_cast instead of C-Style cast
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2013-03-01 15:54:21 +01:00
Danny Al-Gaaf
5009c77d60 msg/Pipe.cc: reduce scope of some variables
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2013-03-01 15:40:51 +01:00
Sage Weil
862c761554 Merge branch 'next'
2013-02-28 16:58:02 -08:00
Sage Weil
0f42eddef5 msgr: drop messages on cons with CLOSED Pipes
Back in commit 6339c5d439, we tried to make
this deal with a race between a faulting pipe and new messages being
queued.  The sequence is

- fault starts on pipe
- fault drops pipe_lock to unregister the pipe
- user (objecter) queues new message on the con
- submit_message reopens a Pipe (due to this bug)
- the message manages to make it out over the wire
- fault finishes faulting, calls ms_reset
- user (objecter) closes the con
- user (objecter) resends everything

It appears as though the previous patch *meant* to drop *m on the floor in
this case, which is what this patch does.  And that fixes the crash I am
hitting; see #4271.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-02-28 16:57:42 -08:00
Sage Weil
c47c02dd87 msg/Pipe: allow tuning of TCP receive buffer size
Performance tests on high-end machines have indicated the Linux autotuning
of the receive buffer sizes can cause throughput collapse.  See bug
#2100, and this email discussion:

   http://marc.info/?l=ceph-devel&m=133009796706284&w=2

Initially default to 0, which leaves us with the default.  We may adjust
the default in the future.

Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-02-28 13:01:27 -08:00
Sage Weil
e10c1d1453 msg/Pipe: move setting of socket options into a common method
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-02-28 13:00:19 -08:00
Sage Weil
d6b4a7be20 Merge branch 'next'
2013-02-26 17:29:48 -08:00
Sage Weil
c8dd2b67b3 msg: fix entity_addr_t::is_same_host() for IPv6
We weren't checking the memcmp return value properly!  Aie...

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-26 14:07:12 -08:00
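The commit message for c8dd2b67b3 does not say exactly how the memcmp result was mishandled. One common form of this mistake is treating a nonzero return as "equal". The sketch below shows the correct comparison on a hypothetical 16-byte address type; `Addr6` is not Ceph's `entity_addr_t`.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical holder for the 16 raw bytes of an IPv6 address.
struct Addr6 {
  unsigned char bytes[16];
};

// memcmp returns 0 when the buffers are equal, so the result must be
// compared against 0. Writing `if (memcmp(...))` tests for a mismatch.
bool is_same_host(const Addr6& a, const Addr6& b) {
  return std::memcmp(a.bytes, b.bytes, sizeof(a.bytes)) == 0;
}
```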
Sage Weil
be31390978 msgr: print dump before asserting (if that is enabled)
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-26 12:38:33 -08:00
Sage Weil
20b093395d msgr: dump corrupt message to log (at high debug levels)
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-26 12:35:06 -08:00
Joao Eduardo Luis
beafca57fb Merge branch 'wsp.bobtail.2merge' into wsp.bobtail.master
Conflicts:
	src/.gitignore
	src/Makefile.am
	src/include/ceph_features.h
	src/mon/MDSMonitor.cc
	src/mon/PGMonitor.cc
2013-02-21 18:04:22 +00:00
Joao Eduardo Luis
6db25a3885 message: MMonSync: Monitor Synchronization message
The monitor's synchronization process requires a specific message type
to carry the required information. Since this process significantly
differs from slurping, reusing the MMonProbe message is not an option as
it would require major changes and, for all intents and purposes, it
would be far outside the scope of the MMonProbe message.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-02-21 18:02:22 +00:00
Sage Weil
1b05517a83 msg/Messenger: rename option
It's unhandled, not unexpected.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-12 14:18:37 -08:00
Sage Weil
1e68ccf6aa msg/Messenger: do not crash on unhandled message
This is just polite.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-12 14:10:18 -08:00
Danny Al-Gaaf
99f217503a src/msg/Messenger.h: pass function parameter by reference
Fix "(performance) Function parameter 'm' should be passed by reference."
from cppchecker.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2013-02-11 11:38:02 +01:00
Danny Al-Gaaf
d427d9828e src/msg/msg_types.h: pass function parameter by reference
Fix "Function parameter 'm' should be passed by reference." from cppchecker.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2013-02-11 11:38:02 +01:00
Danny Al-Gaaf
db0dbe5db8 msg/Message.h: fix C-style pointer casting
Replace C-style pointer casting with correct static_cast<>().

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2013-02-06 08:42:04 -08:00
Sage Weil
a7059eb3f3 msgr: add get_loopback_connection() method
Return the Connection* for ourselves, so we can queue messages for
ourselves.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-25 09:38:30 -08:00
Sage Weil
6e3363b20e common/PrioritizedQueue: add min cost, max tokens per bucket
Two problems.

First, we need to cap the tokens per bucket.  Otherwise, a stream of
items at one priority over time will indefinitely inflate the tokens
available at another priority.  The cap should represent how "bursty"
we allow a given bucket to be.  Start with 4MB for now.

Second, set a floor on the item cost.  Otherwise, we can have an
infinite queue of 0 cost items that starve other queues.  More
realistically, we need to balance the overhead of processing small items
with the cost of large items.  I.e., a 4 KB item is not 1/1000th as
expensive as a 4MB item.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 14:47:41 -08:00
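The two fixes in 6e3363b20e amount to a cap and a floor. A minimal sketch, assuming a per-priority `Bucket` type and an `effective_cost` helper that are invented here for illustration (the 4 MB cap comes from the commit message; any minimum cost value is an assumption):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical per-priority token bucket with a cap on accumulated tokens.
struct Bucket {
  std::uint64_t tokens = 0;
  std::uint64_t max_tokens;

  explicit Bucket(std::uint64_t cap) : max_tokens(cap) {}

  // Add tokens but never exceed the cap, so an idle priority cannot bank
  // an unbounded burst.
  void add_tokens(std::uint64_t n) {
    assert(tokens <= max_tokens);
    tokens = (n > max_tokens - tokens) ? max_tokens : tokens + n;
  }
};

// Apply a floor to the item cost so zero-cost items still consume tokens.
std::uint64_t effective_cost(std::uint64_t cost, std::uint64_t min_cost) {
  return std::max(cost, min_cost);
}
```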
Sage Weil
e8e0da1a57 osd: use Message::get_cost() function for queueing
The data payload is a decent proxy for cost in most cases, but not all.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 14:47:40 -08:00
Sage Weil
50db10dc25 msg/Pipe: require MSG_AUTH feature on server if option is enabled
If we

  negotiate cephx AND
  are a server AND
  cephx require signatures = true

then require the MSG_AUTH feature bit.  Put this in the Policy struct for
this connection so that the existing feature bit checks and error reporting
are used, and the peer knows what feature it is missing.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-17 15:12:00 -08:00
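The three-part condition in 50db10dc25 reduces to a single conjunction that adds one bit to the connection's required features. In the sketch below, `FEATURE_MSG_AUTH` and its bit position, as well as `required_features`, are hypothetical; the real feature bit is defined in Ceph's feature headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit position, for illustration only.
constexpr std::uint64_t FEATURE_MSG_AUTH = 1ULL << 0;

// Require MSG_AUTH only when cephx was negotiated, we are the server side,
// and signatures are required by configuration.
std::uint64_t required_features(std::uint64_t base, bool negotiated_cephx,
                                bool is_server, bool require_signatures) {
  if (negotiated_cephx && is_server && require_signatures)
    return base | FEATURE_MSG_AUTH;
  return base;
}
```

Folding the bit into the required-feature mask, rather than checking it separately, is what lets the existing feature checks report the missing feature to the peer.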
Joao Eduardo Luis
aa40de9088 messages: add MTimeCheck
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-11 00:44:21 +00:00
Sage Weil
d16ad9263d msg/Pipe: prepare Message data for wire under pipe_lock
We cannot trust the Message bufferlists or other structures to be
stable without pipe_lock, as another Pipe may claim and modify the sent
list items while we are writing to the socket.

Related to #3678.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-07 13:02:58 -08:00
Sage Weil
40706afc66 msgr: update Message envelope in encode, not write_message
Fill out the Message header, footer, and calculate CRCs during
encoding, not write_message().  This removes most modifications from
Pipe::write_message().

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-07 13:02:58 -08:00
Sage Weil
4cfc4903c6 msg/Pipe: encode message inside pipe_lock
This modifies bufferlists in the Message struct, and it is possible
for multiple instances of the Pipe to get references on the Message;
make sure they don't modify those bufferlists concurrently.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-06 20:38:28 -08:00
Sage Weil
a058f16113 msg/Pipe: associate sending msgs to con inside lock
Associate a sending message with the connection inside the pipe_lock.
This way, if a racing thread tries to steal these messages, it will
be sure to reset the con pointer *after* we do, so that the con
pointer is valid in encode_payload() (and later).

This may be part of #3678.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-06 20:38:25 -08:00