38640 Commits

Author SHA1 Message Date
Sage Weil
6986ec1cea osd/PG: populate blocked_by with peers we are trying to activate
Once peering finishes, all osds need to persist their info and ack before
we are fully active.  Populate blocked_by with those peers so we can tell
when they are stalling the process.

Fixes: #10477
Signed-off-by: Sage Weil <sage@redhat.com>
2015-01-14 16:41:45 -08:00
Dan Mick
9542416890 Merge pull request #3366 from ceph/wip-formatter
formatter: improve pretty output, rename factory method

Reviewed-by: Dan Mick <dan.mick@redhat.com>
2015-01-14 15:25:25 -08:00
Loic Dachary
5b0e8aef67 mailmap: Yehuda Sadeh name normalization
Signed-off-by: Loic Dachary <ldachary@redhat.com>
2015-01-15 00:12:07 +01:00
Sage Weil
3f03a7b2ee doc/release-notes: v0.91
Signed-off-by: Sage Weil <sage@redhat.com>
2015-01-14 15:11:19 -08:00
Sage Weil
4ca69313e5 doc/release-notes: typo
Signed-off-by: Sage Weil <sage@redhat.com>
2015-01-14 15:11:19 -08:00
Josh Durgin
e7cc6117ad qa: ignore duplicates in rados ls
These can happen with split or with state changes due to reordering
results within the hash range requested. It's easy enough to filter
them out at this stage.

Backport: giant, firefly
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2015-01-14 15:02:38 -08:00
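The duplicate filtering described in e7cc6117ad can be sketched as follows. This is only an illustration of dropping repeated names from a listing while keeping first-seen order; the function name `unique_in_order` and the sample object names are hypothetical and are not taken from the commit.

```python
from typing import Iterable, Iterator


def unique_in_order(names: Iterable[str]) -> Iterator[str]:
    """Yield each name once, preserving first-seen order (hypothetical helper)."""
    seen = set()
    for name in names:
        if name in seen:
            # A repeat, e.g. from a PG split or reordering within the hash range.
            continue
        seen.add(name)
        yield name


# Sample listing with repeats (made-up object names).
listing = ["obj1", "obj2", "obj1", "obj3", "obj2"]
deduped = list(unique_in_order(listing))
```

Filtering on the consumer side keeps the check independent of how the listing was produced.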
Gregory Farnum
6fa29f6f19 Merge pull request #3372 from ceph/wip-10539
qa: fail_all_mds between fs reset and fs rm

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2015-01-14 14:50:46 -08:00
John Spray
e5591f8a98 qa: fail_all_mds between fs reset and fs rm
Because fs reset opens a brief window for the previously
failed MDSs to spring back into life.

Fixes: #10539

Signed-off-by: John Spray <john.spray@redhat.com>
2015-01-14 22:08:09 +00:00
Loic Dachary
26a2df2835 mailmap: Josh Durgin name normalization
Signed-off-by: Loic Dachary <ldachary@redhat.com>
2015-01-14 23:00:32 +01:00
Sage Weil
d6a9d25cf1 doc/release-notes: v0.80.8
Signed-off-by: Sage Weil <sage@redhat.com>
2015-01-14 13:48:32 -08:00
Matt Benjamin
45e9cd5bd4 Fix make check blockers.
Replace ceph-helpers.sh check for ms_nocrc with the new formula
for this.  Fixes make check for default build.

	Additionally, fix linkage of several unittests when building with
	--enable-xio.
	xio:  add missing noinst headers
		The common/address_helper.h file was not mentioned, also
		msg/xio/XioSubmit.h.
	Fix for Message.cc compilation error when Xio disabled.
	Mention simple_dispatcher.h and xio_dispatcher.h in noinst_HEADERS.
	xio:  require boost-regex.
	Make address_helper conditional on Xio.
		This carries over to simple_client/simple_server,
		for convenience.

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:44:47 -05:00
Vu Pham
daefad7a4b xio: enable accelio debug on level 2
Enable accelio debug (mostly on connection) on level 2
and sync with XioConnection debug events

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:44:37 -05:00
Vu Pham
aa5f1955a8 xio: Get the right Accelio errno code
Get the right Accelio errno code on xio_send_msg in
order to correctly requeue or fail the xmsg

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:44:30 -05:00
Matt Benjamin
37719c3b57 Dequeue XioMsg on send-fail
If a message send hard fails, be sure to remove it from the
send_q.  Leaving it there results in an assert when the queue is in
safe mode; with safe mode disabled, send_q would be corrupted.

Don't fall through and erase the iterator twice.  Continue the loop,
as in the incoming release case.

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:44:24 -05:00
Matt Benjamin
d16e1817b6 Reduce lock spam in XioPortal SubmitQueue.
In SubmitQueue::deq, scan up to nlanes, exit immediately with
found work, recalling last lane.

Yield to Accelio iff a full scan finds no work.

xio: avoid starving run loop, don't stop loop in shutdown()
     Fix 2 issues flagged in review by Alex Rosenbaum.
     * In the XioPortal main loop, the recent reduce lock contention change
       also made it easier to starve Accelio under steady send work.  Restore
       the original behavior.
     * Remove the call to xio_context_stop_loop() in XioPortal::shutdown(),
       to ensure that Accelio can finish cleaning up.

Move queue guard check.
Release der spinlock.

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:44:17 -05:00
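A minimal sketch of the lane scan described in d16e1817b6: scan at most nlanes lanes, return as soon as one has work, remember where the scan stopped, and report "no work" only after a full scan. The class `LaneQueue` and its methods are hypothetical, locking is omitted, and resuming at the lane after the one that produced work is an assumption about what "recalling last lane" means.

```python
from collections import deque
from typing import Deque, List, Optional


class LaneQueue:
    """Hypothetical multi-lane queue; not the actual XioPortal SubmitQueue."""

    def __init__(self, nlanes: int) -> None:
        self.lanes: List[Deque[str]] = [deque() for _ in range(nlanes)]
        self.last_lane = 0  # where the next scan starts

    def enq(self, lane: int, item: str) -> None:
        self.lanes[lane % len(self.lanes)].append(item)

    def deq(self) -> Optional[List[str]]:
        n = len(self.lanes)
        for step in range(n):  # scan at most nlanes lanes
            ix = (self.last_lane + step) % n
            lane = self.lanes[ix]
            if lane:
                batch = list(lane)
                lane.clear()
                # Assumption: resume after the lane that just yielded work.
                self.last_lane = (ix + 1) % n
                return batch  # exit immediately with found work
        return None  # full scan found nothing; the caller may yield


q = LaneQueue(3)
q.enq(1, "a")
q.enq(1, "b")
q.enq(2, "c")
first = q.deq()
second = q.deq()
third = q.deq()
```

Returning `None` only after a complete pass is what lets the caller decide when to yield to the transport.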
Matt Benjamin
f276145fb6 xio: initial mark_* and queueing/flow control
This change implements explicit support for Accelio sender-side
flow control, which requires queuing messages for later delivery
when the connection is ready to send.

This requirement to queue messages for later delivery, and related
connection state logic, is substantially shared with new session
reset behavior, so we've pulled a subset of that logic forward.

Again due to shared implementation logic, this change also adds
implementations of mark_down(), mark_down_all(), mark_disposable(),
and related methods from Messenger, which were required to be
implemented after Hammer.

Add XioSubmit.h.

For now, start at state UP, READY.

When considering if a flow-controlled connection can be unblocked,
consider only the computed queue depth.  Re-activate and flush the
connection iff the computed queue depth <= 1/2 of the queue
high-water mark.

Placeholder added for byte-throttled case.
Fix lock flags abuse (found by Casey).

Discard deferred and unsent messages on unplanned disconnect.
	The change causes discard_input_queue() to be called in Accelio's
	on_disconnect_event() handler, as well as on mark_down().

xio: Change new established connection's state to up and ready
     Change the new established passive connection's state to up and
     ready then flush all pending msgs in input_queue

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:44:11 -05:00
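The unblocking rule in f276145fb6 (re-activate and flush only once the computed queue depth is at most half of the high-water mark) can be sketched as below. The class `FlowControlledConn`, its fields, and the choice to block when the in-flight count reaches the high-water mark are hypothetical; only the half-of-high-water-mark condition comes from the commit message.

```python
from typing import List


class FlowControlledConn:
    """Hypothetical sender-side flow control; not the XioConnection code."""

    def __init__(self, high_mark: int) -> None:
        self.high_mark = high_mark
        self.in_flight = 0            # computed queue depth
        self.blocked = False
        self.deferred: List[str] = []  # messages queued for later delivery
        self.sent: List[str] = []

    def send(self, msg: str) -> None:
        # Assumption: block once the in-flight count reaches the high mark.
        if self.blocked or self.in_flight >= self.high_mark:
            self.blocked = True
            self.deferred.append(msg)
            return
        self._transmit(msg)

    def _transmit(self, msg: str) -> None:
        self.in_flight += 1
        self.sent.append(msg)

    def on_send_complete(self) -> None:
        self.in_flight -= 1
        # Re-activate only when depth <= 1/2 of the high-water mark.
        if self.blocked and self.in_flight * 2 <= self.high_mark:
            self.blocked = False
            self._flush()

    def _flush(self) -> None:
        while self.deferred and self.in_flight < self.high_mark:
            self._transmit(self.deferred.pop(0))
        if self.deferred:
            self.blocked = True


conn = FlowControlledConn(high_mark=4)
for i in range(6):
    conn.send(f"m{i}")
conn.on_send_complete()
blocked_after_first = conn.blocked
conn.on_send_complete()
```

Unblocking at half the mark rather than just below it avoids toggling between blocked and unblocked on every completion.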
Vu Pham
1c2efde84d xio: Enable Accelio flow control with msgs and bytes throttlers
* Enable Accelio flow control in general
* Read out policy for messages and bytes throttlers from connection's peer_type
* Set Accelio connection flow control with policy throttlers or default values
* Set q_high_mark for xio_connection (80% of queue_depth)
  xio: Correct q_high_mark setting

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:44:00 -05:00
Vu Pham
3c7e857b83 xio: Configure Accelio internal pool
Temporarily hardcoded all 6 allocators + growing quantum and max size
1k allocator - quantum 4k - max 256k
4k allocator - quantum 4k - max 256k
16k allocator - quantum 4k - max 256k
64k allocator - quantum 1k - max 64k
256k allocator - quantum 512 - max 16k
1m allocator - quantum 128 - max 8k

Later we need to calculate the sustainable workload and dynamically
configure Accelio's internal pool accordingly

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:43:54 -05:00
Matt Benjamin
dcfb80a8db Accelio Autotools glue.
Add Accelio to the build process when --enable-xio is provided.

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:43:43 -05:00
Casey Bodley
aba35bcaff cmake: add xio
Signed-off-by: Casey Bodley <casey@linuxbox.com>
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:43:36 -05:00
Matt Benjamin
610d66f531 Ceph Accelio/RDMA Transport (XioMessenger).
XioMessenger implements a Ceph Messenger provider for Accelio,
a high-performance messaging transport by Mellanox.  Current
Accelio is layered on ibverbs, and supports Infiniband, ROCE,
and other RDMA transports.  Future Accelio versions will support
alternative transports (including TCP), and flexible transport
selection.

config: cluster_rdma drives messenger creation
ceph_mds ceph_mon and ceph_osd use XioMessengers for cluster
	communication when cluster_rdma is set
Move XioMessenger to msg/xio.
	This matches the other new Messenger locations.
test: tests for tcp and xio messengers
	(Not tests only.)
buffer: add subclass for xio buffers
xio: convert to Connection::send_message interface
config: -x, --xio as aliases for client_rdma
ceph-fuse: create xio messenger if client_rdma
Find XioMessenger.h and QueueStrategy.h in msg/xio.
ceph-syn: create xio messenger if client_rdma
librados: create xio messenger if client_rdma
Find XioMessenger.h and QueueStrategy.h in msg/xio.
Restore non-abort from Xio Mon integration.
Fix xio_client send count, again.
xio: must signal cond under mutex lock
xio: dispatch strategies support ms_fast_dispatch
xio: config variable xio_port_shift
remove set_port_shift() from XioMessenger, and just use the value
	from the configuration
xio: don't depend on g_ceph_context for dout
XioMessenger now uses its own cct for all logging operations
	the accelio log function, however, still depends on a global
	CephContext. so we maintain an extra one, separate from g_ceph_context,
	in XioMessenger.cc that is initialized on first construction and a
	reference is held indefinitely
script: cephfsnew to automate pool and fs creation
Use new on_ow_msg_send_complete hook.
	Replaces on_msg_delivered for one-way message style.
Prototype new xio_discon behavior.
	On shutdown, XioPortal threads should not exit before Accelio
	finalizes all sessions.
Inline join_sessions, it needs sh_mtx held across wait loop.
Fix assert on Cond::Signal.  Adds Cond2.
Avoid deadlock, xio_disconnect can deliver a session teardown event.
Also Mutex2.
	(Note, Mutex2 and Cond2 are replaced by standard C++ downstream.)
Restore SimpleDispatcher Timings.
	The simple_client/simple_server timings are based on a ping/pong
	of messages between the client and server, unlike those of the
	xio_client/server programs, which are one-way (so their corresponding
	1-way bandwidth is appx. 2x what the test reports).  We assert
	that the results are in general comparable, because in both setups,
	a fixed number of messages (def. 50) is maintained in flight.
Wrap Accelio mempool in XioPool, add stats.
	To enable stat prints, set xio_trace_mempool.  Currently, prints
	to stdout at each 64K messages sent or received.
Restore _send_message(..).
Fix merge errors in simple_client, simple_dispatcher.
xio: fix for size in pool stats
Add in/outbound msg counters to XioPoolStats.
Pool stats are easier to read.
	Pool stats are easier to read, and if enabled, print on session
	teardown. This is a convenient time to view stats, and with a small
Make pool stats counters atomic.
Track requests using hook ctor/dtor.
	Lockless, portal thread provides atomicity.
Adapt to recent changes on Accelio for_next
	* Accelio options now of opaque type
	* on_msg_err with extra direction param
	* RDMA behavior now governed by 2 new options
    		XIO_OPTNAME_MAX_INLINE_DATA
   	 	XIO_OPTNAME_MAX_INLINE_HEADER
	* Separated send and recv queue depth
xio_messenger: Change xio optname queue depth msgs
	* Set 16k threshold to rdma buffers instead of send
	* Change xio optname for queue depth msgs
  		XIO_OPTNAME_SND/RCV_QUEUE_DEPTH_MSGS
xio_messenger: Protect Accelio queue depth.
	(Minimal send flow control.)
	The guard is per xio_connection, and considering batches.
	Increment happens only if xio_send_msg succeeded, decrement in
	on_ow_msg_send_complete and on_msg_error.  Note that we don't need
	atomics because counters are touched only in the correct portal
	thread.
Find XioMsg.h in msg/xio
Find XioMessenger.h and QueueStrategy.h in msg/xio (tests).
Adapt to 2 Accelio API changes.
	1. xio_context_stop_loop() takes only 1 argument
	2. xio_connect() now takes a structure argument, by reference
Set CMP0046 iff CMake version >= 3
Move XioMessenger to msg/xio
xio: fix for segfault on xio_connect()
No more Mutex2, Cond2.
xio: number of portal threads is configurable
xio: only create additional portals on bind()
xio: use QueueStrategy(1) as default
xio: Messenger factory accepts ms_type "xio"
xio: use ms_type instead of client,cluster_rdma
     removing the ability to configure the client and cluster networks
     separately in favor of a single global messenger type
     --xio is now a command-line alias for --ms_type xio
     all daemons now use the Messenger::create() factory function instead of
     conditionally creating XioMessengers
     the OSD and Monitor classes no longer need separate messengers to
     deal with both tcp/rdma clients
xio: portal binding honors ms_bind_port_min,max
xio: remove xio_port_shift
     port shifting is no longer necessary, because we won't create both tcp
     and xio messengers for the same service
     Use Accelio sglist helper macros.
     xio: make xio buffer unshareable
xio: Nuke special_handling.
Replace GENERIC with MON (requested by Sage).

Signed-off-by: Casey Bodley <casey@cohortfs.com>
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:43:30 -05:00
Matt Benjamin
a064237166 Cosmetic ceph_mon.cc.
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:43:14 -05:00
Matt Benjamin
53bc4d1757 Cosmetic ceph_osd.cc.
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:43:08 -05:00
Matt Benjamin
fd5cd938d2 Cosmetic ceph_mds.cc.
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:42:57 -05:00
Matt Benjamin
d53b378ea0 Introduce Message flag values used by XioMessenger.
These correspond to bits in Message::magic and the erstwhile
"special_handling" member.

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:42:39 -05:00
Matt Benjamin
b4447e9a33 Add Message::set_src(const entity_name_t& src)
Permit setting the source endpoint of a Message.

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:42:30 -05:00
Matt Benjamin
a96373faab Remove pure virtuals from Message::CompletionHook.
This was introduced for XioMessenger, but is no longer used.

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:42:24 -05:00
Matt Benjamin
ef7e735559 Add intrusive list anchor for Message dispatch to Message.
This is currently used by XioMessenger dispatch strategies, but
could be extended to other Messenger types.

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:42:13 -05:00
Matt Benjamin
984a3eedab Add MDataPing.
This message type is used for Messenger testing.

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:42:03 -05:00
Matt Benjamin
71d08b4ab3 Accelio ceph::buffer Extensions
Adds custom buffer::raw type xio_mempool, with hooks into Accelio
memory lifecycle.

The xio_mempool type is non-sharable by default.

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:41:57 -05:00
Matt Benjamin
4cbf2d5595 Cosmetic: Normalize an entity_name_t initialization in ceph-syn.
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:41:40 -05:00
Casey Bodley
2ffacbe6ef msg: crc configuration in messenger
Add new header_crc and data_crc configuration booleans, and use
them consistently to govern whether CRC is performed in the
Message encode, decode, and transit paths.

Remove ms_nocrc, changes per Sage.
Minimally adapt AsyncMessenger for crcflags.

Signed-off-by: Casey Bodley <casey@linuxbox.com>
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:41:32 -05:00
Matt Benjamin
b677a86fb2 Build rbd-fuse as a C++ unit (matching its existing linkage).
Since rbd-fuse is linking C++ libraries, link it with the C++
runtime as we already do for ceph-fuse.

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:40:46 -05:00
Casey Bodley
80b3ff0ef7 mon: OSDMonitor sends maps over connection
Signed-off-by: Casey Bodley <casey@linuxbox.com>
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:40:39 -05:00
Casey Bodley
9fff0c53bd msg: remove create_anon_connection from Messenger
the monitor now defines its own subclass of Connection to use for
Monitor::handle_forward(), rather than tying it to the Messenger
interface

Signed-off-by: Casey Bodley <casey@linuxbox.com>
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:40:33 -05:00
Matt Benjamin
2401c3ba90 dout: dlog_p macro for should_gather
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
Signed-off-by: Casey Bodley <casey@linuxbox.com>
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:40:26 -05:00
Casey Bodley
a39cbe2b6f atomic: add and sub return their result
Signed-off-by: Casey Bodley <casey@linuxbox.com>
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:40:20 -05:00
Ali Maredia
0f6b9f2816 Combined CMake Build for Hammer
CMake Ceph Build System (Firefly)
CMake.  Add tests.
Respace src/CMakeLists.txt.
CMake.  Spacing cleanups.
CMake for Firefly is Triumphant
CMake for Giant
Adapt to Giant.
Fix installation for scripts and man pages
Fix CEPH_LIBDIR and CEPH_PKGLIBDIR defines
Add erasure-code libraries
	uses try_compile() to detect support for -msse flags
Fix rados object classes
Propagate Casey's cls library change to src/test.
Fix CMake build for Hammer.
Try-add rados and common to librbd link.
Fix name and linkage of libec_lrc.
Rename arch/neon.c arm.c
Fix libcommon.a dependencies (some unit tests).

Authors:
	Ali Maredia <ali@cohortfs.com>
	Casey Bodley <casey@cohortfs.com>
	Adam Emerson <aemerson@cohortfs.com>
	Marcus Watts <mdw@cohortfs.com>
	Matt Benjamin <matt@cohortfs.com>

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:40:05 -05:00
Matt Benjamin
4368e0a513 Null tracepoint macro when !WITH_LTTNG.
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:39:57 -05:00
Matt Benjamin
f57383ae74 Don't use __cplusplus to mean !__KERNEL__
Of course, this is Linux-centric.

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:39:50 -05:00
Matt Benjamin
71e49879ef Add missing Messenger::create ms_type in test_msgr.
Fixes trivial build breakage.

Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:39:43 -05:00
Matt Benjamin
3ce683a308 Fixup int_types.h.
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
2015-01-14 16:39:35 -05:00
Jason Dillaman
3424baed03 librbd: fix coverity false-positives for tests
Coverity flagged two variables as uninitialized prior to use.
Explicitly initialize the variables in the constructors to remove
the false-positives.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2015-01-14 15:39:28 -05:00
Yehuda Sadeh
f3a57ee6a6 rgw: wait for completion only if not completion available
In a bucket aio operation, wait for completions only if there are no
completions available. Otherwise we might wait forever, as everything
is already complete.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2015-01-14 11:47:18 -08:00
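The rule in f3a57ee6a6 (block for a completion only when none is already available) can be illustrated with a standard-library queue. This is a sketch under assumptions, not the rgw code: `next_completion`, the `pending` counter, and the timeout are all hypothetical.

```python
import queue
from typing import Optional


def next_completion(done: "queue.Queue[str]", pending: int,
                    timeout: float = 1.0) -> Optional[str]:
    """Return a completion, blocking only if none is already available."""
    try:
        # Take a completion that is already there without waiting.
        return done.get_nowait()
    except queue.Empty:
        pass
    if pending == 0:
        # Hypothetical guard: nothing in flight, so waiting would never return.
        return None
    # Only now is it safe to wait for an in-flight operation.
    return done.get(timeout=timeout)


done: "queue.Queue[str]" = queue.Queue()
done.put("op1")
first_result = next_completion(done, pending=0)
second_result = next_completion(done, pending=0)
```

Checking for ready completions before waiting is what prevents the hang the commit message describes.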
Samuel Just
833b277769 ceph_test_objectstore: enable keyvaluestore experimental option
Fixes: #10535
Signed-off-by: Samuel Just <sjust@redhat.com>
2015-01-14 09:48:58 -08:00
Samuel Just
204fa0f09e ReplicatedPG::_scrub: don't record digests for snapdirs
They are always empty, and the finish_ctx machinery doesn't really
work for snapdirs anyway.

Fixes: #10536
Signed-off-by: Samuel Just <sjust@redhat.com>
2015-01-14 09:47:52 -08:00
Sage Weil
5c8ee3388f Merge remote-tracking branch 'gh/next'
2015-01-14 08:57:33 -08:00
John Spray
9daeaec5c6 mds: handle heartbeat_reset during shutdown
Because any thread might grab mds_lock and call heartbeat_reset
immediately after a call to suicide() completes, this needs
to be handled as a special case where we tolerate MDS::hb having
already been destroyed.

Fixes: #10382
Signed-off-by: John Spray <john.spray@redhat.com>
2015-01-14 12:00:17 +00:00
Zhiqiang Wang
fc5cb3cf2e osd/ReplicatedPG: remove unnecessary parameters
In functions can_skip_promote and do_cache_redirect.

Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
2015-01-14 14:26:03 +08:00
Zhiqiang Wang
78b2cf0327 osd: force promotion for watch/notify ops
Watch/notify ops can't be proxied.

Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
2015-01-14 14:25:10 +08:00