Commit Graph

34812 Commits

Author SHA1 Message Date
Ilya Dryomov
eb697dd9ee librbd: make rbd_get_parent_info() accept NULL out params
The C++ version of rbd_get_parent_info() allows passing NULL for parent
pool name, image name and snapshot name out parameters.  Make the C API do
the same, both for consistency and to make it easier to check whether
the image at hand has a parent or not.
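Below is a minimal sketch of the NULL-tolerant out-parameter convention. `ParentSpec`, `copy_out()` and `get_parent_info_sketch()` are hypothetical stand-ins, not librbd code. The `-ENOENT` ("no parent") and `-ERANGE` ("buffer too small") return conventions are assumptions made for illustration. Each field is copied only when the caller supplies a buffer, so a caller that passes NULL everywhere gets a pure "does this image have a parent?" check.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical in-memory stand-in for an image's parent linkage.
struct ParentSpec {
  bool has_parent = false;
  std::string pool, image, snap;
};

// Copy src into a caller-supplied buffer, but only if the caller asked for it.
static int copy_out(char *buf, std::size_t len, const std::string &src) {
  if (buf == nullptr)
    return 0;                    // NULL: caller doesn't want this field
  if (len < src.size() + 1)
    return -ERANGE;              // assumed convention: buffer too small
  std::memcpy(buf, src.c_str(), src.size() + 1);
  return 0;
}

// Hypothetical C-style accessor in which every out parameter may be NULL.
int get_parent_info_sketch(const ParentSpec &p,
                           char *pool, std::size_t pool_len,
                           char *name, std::size_t name_len,
                           char *snap, std::size_t snap_len) {
  if (!p.has_parent)
    return -ENOENT;              // assumed convention: no parent
  int r = copy_out(pool, pool_len, p.pool);
  if (r == 0)
    r = copy_out(name, name_len, p.image);
  if (r == 0)
    r = copy_out(snap, snap_len, p.snap);
  return r;
}
```

A caller that only wants a yes/no answer can pass `nullptr, 0` for all three pairs and test the return value.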

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-07-28 13:53:54 +04:00
Xiaoxi Chen
04d0526718 PGMonitor: fix bug in calculating pool avail space
Currently, for pools with different rules, "ceph df" cannot report the
correct available space for each of them. For a detailed assessment
of the bug, please refer to bug report #8943.

This patch fixes the bug and makes "ceph df" work correctly.
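One way to reason about per-rule availability is a simplified model. Assume a rule's usable space is limited by whichever of its OSDs would fill first, relative to the share of the rule's data that OSD receives. Assume also that a replicated pool's usable space is that figure divided by its replica count. `OsdShare`, `rule_avail()` and `pool_avail()` are hypothetical helpers, and this is a sketch of the idea rather than the PGMonitor code. Two pools on different rules then get different answers, instead of sharing one cluster-wide figure.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// One OSD reachable by a CRUSH rule (hypothetical simplified model).
struct OsdShare {
  std::uint64_t avail_bytes;  // free space on this OSD
  double weight_frac;         // fraction of the rule's data placed on this OSD
};

// An OSD receiving fraction f of the rule's writes fills after avail / f
// bytes have been written through the rule, so the smallest such
// projection bounds the whole rule.
std::uint64_t rule_avail(const std::vector<OsdShare> &osds) {
  std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
  bool any = false;
  for (const auto &o : osds) {
    if (o.weight_frac <= 0.0)
      continue;  // this OSD receives no data under the rule
    auto proj = static_cast<std::uint64_t>(o.avail_bytes / o.weight_frac);
    if (proj < best)
      best = proj;
    any = true;
  }
  return any ? best : 0;
}

// Usable space for a replicated pool that stores `size` copies of each object.
std::uint64_t pool_avail(const std::vector<OsdShare> &rule_osds, unsigned size) {
  return size ? rule_avail(rule_osds) / size : 0;
}
```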

Fixes Bug #8943

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2014-07-28 17:47:51 +08:00
Sage Weil
3695b255ae Merge pull request #2149 from yuyuyu101/wip-flush-set
Fix dup bh_write for TX state bh

Tested-by: Sage Weil <sage@redhat.com>
Reviewed-by: Haomai Wang <haomaiwang@gmail.com>

Original changeset 

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-07-27 19:39:34 -07:00
Sage Weil
b08470f0bf configure.ac: link libboost_thread only with json-spirit
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-27 16:58:08 -07:00
Sage Weil
9d23cc6aa6 configure: don't link blkid, udev to everything
These are already explicitly called out for libkrbd; don't need them in
LIBS.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-27 11:25:47 -07:00
Haomai Wang
de9cfcaa7d Only write bufferhead when it's dirty
A TX state bh should be skipped because it is already in flight. We only
need to write dirty bhs. Both TX and dirty state bhs should be waited on
until they are flushed.
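The rule above can be sketched as follows: write only dirty bhs, but wait on dirty and TX bhs alike. The three-state `BhState` enum and the `flush()` helper are a hypothetical, simplified model of ObjectCacher's BufferHead handling. The state change to `Tx` stands in for an actual `bh_write()`.

```cpp
#include <cassert>
#include <vector>

// Simplified BufferHead states (hypothetical; ObjectCacher tracks more).
enum class BhState { Clean, Dirty, Tx };

struct BufferHead {
  BhState state = BhState::Clean;
};

// Start writes for dirty bhs only. A Tx bh is already in flight, so
// re-sending it would issue a duplicate write. Returns every bh the caller
// must wait on: the ones just sent plus the ones that were already in flight.
std::vector<BufferHead *> flush(std::vector<BufferHead> &bhs, int &writes_issued) {
  std::vector<BufferHead *> waitfor;
  for (auto &bh : bhs) {
    if (bh.state == BhState::Dirty) {
      bh.state = BhState::Tx;   // stand-in for bh_write(): now in flight
      ++writes_issued;
      waitfor.push_back(&bh);
    } else if (bh.state == BhState::Tx) {
      waitfor.push_back(&bh);   // don't rewrite it, but do wait for it
    }
  }
  return waitfor;
}
```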

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2014-07-27 13:37:49 +08:00
Josh Durgin
1c26266dbf ObjectCacher: fix bh_{add,remove} dirty_or_tx_bh accounting
tx buffers need to go on the bh_lru_rest as well, and removing them erases
them from (rather than inserts them into) dirty_or_tx_bh.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-07-27 13:36:28 +08:00
Josh Durgin
727ac1d084 ObjectCacher: fix dirty_or_tx_bh logic in bh_set_state()
The else-if chain here was wrong. Handling dirty or tx buffers and
errors should be in independent conditions.
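The following sketch shows why the bookkeeping needs independent conditions. A single transition can change two things at once: whether a bh belongs in `dirty_or_tx_bh`, and whether it counts as an error. An else-if chain would apply only the first matching update. `Cache`, the integer bh ids and the `error_bhs` counter are hypothetical simplifications, not ObjectCacher's actual members.

```cpp
#include <cassert>
#include <set>

enum class BhState { Clean, Dirty, Tx, Error };

// Hypothetical cache that indexes bhs which are dirty or in flight and
// counts bhs in the error state.
struct Cache {
  std::set<int> dirty_or_tx_bh;  // ids of dirty or TX bhs
  int error_bhs = 0;

  void bh_set_state(int id, BhState &cur, BhState s) {
    const bool was_dtx = cur == BhState::Dirty || cur == BhState::Tx;
    const bool is_dtx = s == BhState::Dirty || s == BhState::Tx;
    // Two independent checks, not an else-if chain: a Tx -> Error
    // transition must both leave dirty_or_tx_bh and bump error_bhs.
    if (was_dtx != is_dtx) {
      if (is_dtx)
        dirty_or_tx_bh.insert(id);
      else
        dirty_or_tx_bh.erase(id);
    }
    if ((cur == BhState::Error) != (s == BhState::Error))
      error_bhs += (s == BhState::Error) ? 1 : -1;
    cur = s;
  }
};
```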

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-07-27 13:36:19 +08:00
Haomai Wang
5283cfee5b Wait for tx state buffers in flush_set
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2014-07-27 13:33:51 +08:00
Haomai Wang
d858fdc501 Add rbdcache max dirty object option
Librbd calculates the max number of dirty objects from rbd_cache_max_size,
which is not suitable for every case. If the user sets the image order to 24,
the calculated result is too small in practice. This increases the overhead
of the trim call, which is made on each read/write op.

Now we make it an option for tuning; by default this value is still calculated.
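A little arithmetic illustrates the problem. Assume, purely for illustration, that the automatic value is the cache size divided by the object size (2^order bytes), and that a configured value of 0 means "calculate it". `max_dirty_objects()` and its parameters are hypothetical and need not match librbd's exact formula. Under that assumption, a 32 MiB cache with order-22 (4 MiB) objects allows 8 dirty objects. With order-24 (16 MiB) objects the same cache allows only 2.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: choose how many dirty objects the cache may track.
// configured == 0 means "derive it from the cache size" (assumed convention).
std::uint64_t max_dirty_objects(std::uint64_t cache_bytes, unsigned order,
                                std::uint64_t configured) {
  if (configured != 0)
    return configured;                          // explicit tuning wins
  std::uint64_t object_bytes = 1ULL << order;   // image objects are 2^order bytes
  std::uint64_t n = cache_bytes / object_bytes;
  return n ? n : 1;                             // always allow at least one
}
```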

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2014-07-27 13:33:44 +08:00
Haomai Wang
b8a56685fe Reduce ObjectCacher flush overhead
The flush op in ObjectCacher iterates over the whole active object set, and
each dirty object may own several BufferHeads. If the object set is large,
this consumes too much time.

Use dirty_bh instead to reduce the overhead. Now only dirty BufferHeads
are checked.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2014-07-27 13:33:38 +08:00
Sage Weil
288908b331 Revert "Merge pull request #2129 from ceph/wip-librbd-oc"
This reverts commit 74b386f03e, reversing
changes made to 36265d0db0.

The dirty_or_tx list is used by flush_set, which means we can
resubmit new IOs for writes that are already in progress.  This
has a compounding effect that overwhelms the OSDs with dup IOs
and stalls out the client.

See, for example, the failures in this run:
  /a/sage-2014-07-25_17:14:20-fs-wip-msgr-testing-basic-plana

The fix is probably pretty simple, but reverting for now to make
the tests pass.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-07-26 21:19:34 -07:00
Sage Weil
2088c267d6 Merge remote-tracking branch 'gh/next'
Conflicts:
	src/osdc/Journaler.h
2014-07-25 21:42:35 -07:00
Yehuda Sadeh
0553890e79 rgw: call processor->handle_data() again if needed
Fixes: #8937

Following the fix for #8928, we end up accumulating pending data that
needs to be written. Previously this worked fine because we were
feeding it exactly the number of bytes we were writing.
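The caller-side loop implied by the subject line can be sketched as below. `Processor`, its `max_chunk` limit and the boolean "data still pending" return value are hypothetical. They model a processor whose chunk boundaries no longer match the caller's buffer sizes. In that situation a single call can leave bytes behind, unless the caller calls `handle_data()` again.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical processor that writes at most max_chunk bytes per call and
// keeps the remainder pending.
struct Processor {
  std::size_t max_chunk;  // most bytes written per call
  std::string pending;    // accepted but not yet written
  std::string written;    // what has reached "storage"

  // Writes at most one chunk; returns true if data is still pending.
  bool handle_data(const std::string &in) {
    assert(max_chunk > 0);
    pending += in;
    std::size_t n = std::min(max_chunk, pending.size());
    written.append(pending, 0, n);
    pending.erase(0, n);
    return !pending.empty();
  }
};

// Keep calling handle_data() with no new input until nothing is pending.
void feed(Processor &p, const std::string &data) {
  bool again = p.handle_data(data);
  while (again)
    again = p.handle_data("");
}
```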

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2014-07-25 20:38:44 -07:00
John Spray
d3de69f8a5 mds: fix journal reformat failure in standbyreplay
In the 0.82 release, standbyreplay MDS daemons would try
to reformat the journal if they saw an older version on
disk, whereas this should only have been done by the active
MDS for the rank.  Depending on timing, this could cause
fatal corruption of the journal.

This change handles the following cases:
* only do reformat if not in standbyreplay (else raise EAGAIN
to keep trying until an active mds reformats it)
* if journal header goes away while in standbyreplay then raise
EAGAIN (handle rewrite happening in background)
* if journal version is greater than the max supported, suicide
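The three cases can be summarized as a decision function. The `Action` enum, the parameter names and the integer version numbers are hypothetical. `RetryLater` stands for raising EAGAIN. A missing header on a non-standby MDS is not covered by the list, so this sketch leaves it out.

```cpp
#include <cassert>

enum class Action { Replay, Reformat, RetryLater, Suicide };

// Hypothetical model of the journal checks listed above.
Action decide(bool header_present, int disk_version, int max_supported,
              bool standby_replay) {
  // A missing header on the active MDS is outside this sketch.
  assert(header_present || standby_replay);
  if (!header_present)
    return Action::RetryLater;   // rewrite happening in background: EAGAIN
  if (disk_version > max_supported)
    return Action::Suicide;      // newer format than we understand
  if (disk_version < max_supported)
    return standby_replay ? Action::RetryLater   // EAGAIN until the active MDS reformats
                          : Action::Reformat;    // only the active MDS rewrites it
  return Action::Replay;
}
```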

Fixes: #8811

Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 5438500af8)
2014-07-25 15:34:09 -07:00
Sage Weil
96fb418f0e Merge pull request #2112 from ceph/wip-rbd-defaults
respect rbd_default_* parameters in /usr/bin/rbd

Reviewed-by: Sage Weil <sage@redhat.com>
2014-07-25 15:23:25 -07:00
Sage Weil
8fb761b660 osd/ReplicatedPG: requeue cache full waiters if no longer writeback
If the cache is full, we block some requests; if we then change the
cache_mode to something else (say, forward), the full waiters don't get
requeued until the cache is no longer full.  In the meantime, however, later
requests will get processed and redirected, breaking the op ordering.

Fix this by requeueing any full waiters if we see that the cache_mode is
not writeback.
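Here is a sketch of the requeue rule. It assumes a PG-side list of ops parked while the cache tier is full, plus an op queue those ops return to. `PG`, `CacheMode` and `maybe_requeue_full_waiters()` are hypothetical names. Once the mode is no longer writeback, the waiters go back at the front of the queue in their original order. That way, ops that arrived later cannot overtake them.

```cpp
#include <cassert>
#include <deque>
#include <string>

enum class CacheMode { Writeback, Forward };

// Hypothetical PG-side state: ops parked because the cache tier was full,
// and the queue they are returned to.
struct PG {
  CacheMode cache_mode = CacheMode::Writeback;
  std::deque<std::string> waiting_for_cache_not_full;
  std::deque<std::string> op_queue;

  void maybe_requeue_full_waiters(bool cache_full) {
    if (cache_full && cache_mode == CacheMode::Writeback)
      return;  // still legitimately blocked
    // Put the waiters back ahead of newer ops, preserving their order.
    op_queue.insert(op_queue.begin(),
                    waiting_for_cache_not_full.begin(),
                    waiting_for_cache_not_full.end());
    waiting_for_cache_not_full.clear();
  }
};
```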

Fixes: #8931
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-25 14:50:52 -07:00
Sage Weil
36aaab9eee osd/ReplicatedPG: fix cache full -> not full requeueing when !active
We only want to do this if is_active().  Otherwise, the normal
requeueing code will do its thing, taking care to get the queue orders
correct.

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-25 14:50:51 -07:00
Josh Durgin
ba9d52e8e1 librbd: store and retrieve snapshot metadata based on id
Snapshots are usually accessed by id internally, so change accessors
to take id instead of name. Keep a separate map of name -> id for
looking up human-specified names.
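A sketch of the id-keyed layout described above: metadata lives in a map keyed by snapshot id, and a second map resolves human-specified names to ids. `SnapTable`, `SnapInfo` and `snap_id_t` are hypothetical and do not mirror librbd's actual members.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using snap_id_t = std::uint64_t;  // hypothetical alias

struct SnapInfo {
  std::string name;
  std::uint64_t size;
};

class SnapTable {
  std::map<snap_id_t, SnapInfo> by_id_;           // authoritative metadata
  std::map<std::string, snap_id_t> ids_by_name_;  // name -> id index

 public:
  void add(snap_id_t id, const SnapInfo &info) {
    by_id_[id] = info;
    ids_by_name_[info.name] = id;
  }
  // Internal callers already hold an id.
  const SnapInfo *get(snap_id_t id) const {
    auto it = by_id_.find(id);
    return it == by_id_.end() ? nullptr : &it->second;
  }
  // Human-specified names are translated to an id once, at the edge.
  bool lookup_id(const std::string &name, snap_id_t *id) const {
    auto it = ids_by_name_.find(name);
    if (it == ids_by_name_.end())
      return false;
    *id = it->second;
    return true;
  }
};
```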

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-07-25 14:08:16 -07:00
Sage Weil
c5f766bb16 ceph_test_rados_api_tier: do fewer writes in HitSetWrite
We don't need to do quite so many writes.  It can be slow when we are
thrashing and aren't doing anything in parallel.

Fixes: #8932
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-25 13:53:03 -07:00
Dan Mick
d0b98bcb35 Merge pull request #2145 from ceph/wip-ref-put
common/RefCountedObject: fix use-after-free in debug print

Reviewed-by: Dan Mick <dan.mick@inktank.com>
2014-07-25 13:19:42 -07:00
Sage Weil
f3609205e7 common/RefCountedObject: fix use-after-free in debug print
We could race with another thread that deletes this right after we call
dec().  Our access of cct would then become a use-after-free.  Valgrind
managed to turn this up.

Copy it into a local variable before the dec() to be safe, and move the
dout line below to make this possibility explicit and obvious in the code.
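The fix follows a general pattern: copy whatever you need out of `*this` before dropping the reference, because after the decrement another thread may free the object. Below is a minimal sketch. It uses a hypothetical `RefCounted` class (not Ceph's `RefCountedObject`) and a `LogContext` standing in for the CephContext used for logging.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical stand-in for the logging context; it outlives the objects.
struct LogContext {
  std::string *log;
};

class RefCounted {
  std::atomic<int> nref_{1};
  LogContext *cct_;

 public:
  explicit RefCounted(LogContext *cct) : cct_(cct) {}
  virtual ~RefCounted() = default;

  void get() { ++nref_; }

  void put() {
    // Copy what we need out of *this BEFORE the decrement: afterwards,
    // another thread may drop the last reference and delete the object,
    // so reading cct_ then would be a use-after-free.
    LogContext *local_cct = cct_;
    int v = --nref_;
    if (v == 0)
      delete this;
    // From here on, only locals are touched.
    *local_cct->log += "put -> " + std::to_string(v) + "\n";
  }
};
```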

Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-25 13:17:32 -07:00
Josh Durgin
d8eb656069 Merge pull request #2143 from ceph/wip-rgw-align
Wip rgw align

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-07-25 11:36:29 -07:00
Yehuda Sadeh
14cad5ece7 rgw: object write should not exceed part size
Fixes: #8928

This can happen if the stripe size is not a multiple of the chunk size.
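Here is an illustration of the clamping involved. It assumes one part of `part_size` bytes is written as a series of requests of at most `chunk_size` bytes. `plan_writes()` is a hypothetical helper. Take a 10-byte part and 4-byte chunks: three unclamped 4-byte writes would extend 2 bytes past the end of the part.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: split one part into write requests of at most chunk_size
// bytes, clamping each to the bytes left in the part so a part whose size
// is not a multiple of chunk_size is never overrun.
std::vector<std::uint64_t> plan_writes(std::uint64_t part_size,
                                       std::uint64_t chunk_size) {
  assert(chunk_size > 0);
  std::vector<std::uint64_t> writes;
  for (std::uint64_t off = 0; off < part_size;) {
    std::uint64_t len = std::min(chunk_size, part_size - off);
    writes.push_back(len);
    off += len;
  }
  return writes;
}
```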

Backport: firefly

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2014-07-25 11:35:10 -07:00
Yehuda Sadeh
fc83e197ab rgw: align object chunk size with pool alignment
Fixes: #8442
Backport: firefly
Data pools might have strict write alignment requirements. Use pool
alignment info when setting the max_chunk_size for the write.
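Here is a sketch of the alignment step. It assumes the pool reports a required alignment in bytes (0 meaning no requirement), and that the chunk size is rounded down to a multiple of it. `aligned_chunk_size()` is hypothetical. The 12288-byte alignment used below is only an example value, such as an erasure-coded stripe of three 4 KiB data chunks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: round the preferred write size down to a multiple of the
// pool's required alignment; if that leaves nothing, use one alignment unit.
std::uint64_t aligned_chunk_size(std::uint64_t max_chunk, std::uint64_t alignment) {
  if (alignment == 0)
    return max_chunk;                      // pool has no alignment requirement
  std::uint64_t aligned = max_chunk - (max_chunk % alignment);
  return aligned ? aligned : alignment;
}
```

For example, `aligned_chunk_size(524288, 12288)` yields 516096 (42 * 12288).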

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2014-07-25 11:35:01 -07:00
Sage Weil
c91b22c0b4 Merge pull request #2141 from ceph/wip-8882
osd: set pg flag INCOMPLETE_CLONES when turning off cache pool

Reviewed-by: Greg Farnum <greg@inktank.com>

First patch Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2014-07-25 10:34:33 -07:00
John Wilkins
1f9c7324a0 doc: Add additional hyperlink to Cache Tiering defaults.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-07-25 09:55:52 -07:00
John Wilkins
4047660ce4 doc: Update doc from user feedback.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-07-25 09:55:28 -07:00
Sage Weil
d1dfb9b607 osd: fix bad Message* defer in C_SendMap and send_map_on_destruct
We were carrying a bare Message*, which could get freed if the op was
canceled (or possibly completed).  Instead, just stash the entity_name_t,
the only piece we need.  The Connection is properly ref counted so no
worries there.

Fixes: #8926
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-25 09:20:50 -07:00
Sage Weil
3e7ed42379 Merge pull request #2142 from ceph/wip-data-pool
test: catch a straggler still using 'data' pool

Reviewed-by: Sage Weil <sage@redhat.com>
2014-07-25 09:03:34 -07:00
John Spray
5740266096 test: catch a straggler still using 'data' pool
Used rbd pool instead, which is still created by default.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-07-25 17:01:39 +01:00
Ma Jianpeng
4eb18dd487 os/FileJournal: Update the journal header when closing journal
When closing the journal, check must_write_header and update the
journal header if must_write_header is already set.
This reduces needless journal replay after restarting the OSD.
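The following toy model assumes the on-disk header records where replay starts, and that the header is rewritten lazily whenever `must_write_header` is set. `Journal`, `mark_committed()` and `replay_count()` are hypothetical, not FileJournal's real interface. The sketch only shows why flushing a pending header on close shrinks the replay after a restart.

```cpp
#include <cassert>
#include <cstdint>

struct Journal {
  std::uint64_t header_start = 0;  // replay start recorded in the on-disk header
  std::uint64_t committed = 0;     // in-memory commit point
  bool must_write_header = false;

  void mark_committed(std::uint64_t seq) {
    committed = seq;
    must_write_header = true;      // on-disk header is now stale
  }
  void write_header() {
    header_start = committed;
    must_write_header = false;
  }
  void close() {
    // Persist a pending header update before shutting down, so the next
    // mount does not replay entries that were already committed.
    if (must_write_header)
      write_header();
  }
  // Entries a restart would replay, given the last sequence written.
  std::uint64_t replay_count(std::uint64_t last_seq) const {
    return last_seq - header_start;
  }
};
```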

Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-07-24 18:54:33 -07:00
Sage Weil
63c1711a9e msg/SimpleMessenger: drop local_connection priv link on shutdown
This breaks ref cycles between the local_connection and session, and lets
us drop the explicit set_priv() calls in OSD::shutdown().
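The cycle can be illustrated with `std::shared_ptr` standing in for Ceph's intrusive reference counting. The connection's priv pointer keeps the session alive, and the session keeps the connection alive, so neither is freed until one link is dropped. `Connection`, `Session` and `messenger_shutdown()` here are hypothetical stand-ins, not the SimpleMessenger API.

```cpp
#include <cassert>
#include <memory>

struct Session;

// The connection holds an opaque "priv" reference to its session...
struct Connection {
  std::shared_ptr<Session> priv;
};

// ...and the session holds a reference back to the connection.
struct Session {
  std::shared_ptr<Connection> con;
};

// Dropping the priv link at shutdown breaks the cycle in one place, so
// callers no longer have to clear it themselves.
void messenger_shutdown(std::shared_ptr<Connection> &local_connection) {
  local_connection->priv.reset();  // frees the session, which releases its con ref
  local_connection.reset();        // last reference: connection freed
}
```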

Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-24 18:22:22 -07:00
Josh Durgin
2545e80d27 librbd: fix crash using clone of flattened image
The crash occurs due to ImageCtx->parent->parent being uninitialized,
since the inital open_parent() -> open_image(parent) ->
ictx_refresh(parent) occurs before ImageCtx->parent->snap_id is set,
so refresh_parent() is not called to open an ImageCtx for the parent
of the parent. This leaves the ImageCtx->parent->parent NULL, but the
rest of ImageCtx->parent updated to point at the correct parent snapshot.

Setting the parent->snap_id earlier has some unintended side effects
currently, so for now just call refresh_parent() during
open_parent(). This is the easily backportable version of the
fix. Further patches can clean up this whole initialization process.

Fixes: #8845
Backport: firefly, dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-07-24 16:15:20 -07:00
John Wilkins
4fe07925e4 doc: Updated mon doc per feedback. Fixed hyperlinks.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-07-24 16:00:52 -07:00
Gregory Farnum
2a6b5309e5 Merge pull request #2079 from nereocystis/seq_read_bench-args
Make the declaration argument names match those in the implementation (as used by callers).

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-07-24 14:36:21 -07:00
Abhishek Lekshmanan
c51182257e doc: update radosgw man page with available opts
Fixes: #8112

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
2014-07-24 13:21:25 -07:00
Abhishek Lekshmanan
e259aca55a rgw: list all available options during help()
Adding the available help arguments from the man page

Fixes: #8112

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
2014-07-24 13:17:26 -07:00
Abhishek Lekshmanan
99e80a5f62 rgw: format help options to align with the rest
Whitespace removal to make all help options align in a similar fashion

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
2014-07-24 13:15:53 -07:00
Sage Weil
95aaeb654a osd: use Connection::send_message()
Signed-off-by: Sage Weil <sage@inktank.com>
2014-07-24 11:37:07 -07:00
Sage Weil
be91daf0a8 common/LogClient: use con-based send_message
Signed-off-by: Sage Weil <sage@inktank.com>
2014-07-24 11:37:07 -07:00
Sage Weil
694ced9e56 client: use con-based send_message
Signed-off-by: Sage Weil <sage@inktank.com>
2014-07-24 11:37:07 -07:00
Sage Weil
2c28548e5d msgr: remove Messenger::mark_disposable()
Signed-off-by: Sage Weil <sage@inktank.com>
2014-07-24 11:37:07 -07:00
Sage Weil
8b49b3a3d1 mds: use Connection::mark_disposable()
Signed-off-by: Sage Weil <sage@inktank.com>
2014-07-24 11:37:07 -07:00
Sage Weil
2970396d46 msgr: add Connection::mark_disposable()
Signed-off-by: Sage Weil <sage@inktank.com>
2014-07-24 11:37:07 -07:00
Sage Weil
3ca533e4f8 msgr: kill mark_down_on_empty()
No users!

Signed-off-by: Sage Weil <sage@inktank.com>
2014-07-24 11:37:07 -07:00
Sage Weil
637ada2da9 msgr: kill addr-based send_keepalive()
Signed-off-by: Sage Weil <sage@inktank.com>
2014-07-24 11:37:06 -07:00
Sage Weil
322908b952 msg: drop Messenger::mark_down() and send_keepalive() con-based calls
All users must stick to the Connection-based calls.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-07-24 11:37:06 -07:00
Sage Weil
f04810251d osdc/Objecter: use Connection::mark_down()
Signed-off-by: Sage Weil <sage@inktank.com>
2014-07-24 11:37:06 -07:00
Sage Weil
841d5ac836 osd: use Connection::mark_down()
Signed-off-by: Sage Weil <sage@inktank.com>
2014-07-24 11:37:06 -07:00