Commit Graph

32750 Commits

Author SHA1 Message Date
Josh Durgin
1af95e7b02 Merge pull request #1557 from ceph/wip-7867
client: fix assert(!unclean) due to readahead vs close race

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-03-28 15:08:07 -07:00
Sage Weil
f1c7b4ef0c client: pin Inode during readahead
Make sure the Inode does not go away while a readahead is in progress.  In
particular:

 - read_async
   - start a readahead
   - get actual read from cache, return
 - close/release
   - call ObjectCacher::release_set() and get unclean > 0, assert

Fixes: #7867
Backport: emperor, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-28 14:55:13 -07:00
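The race above can be illustrated with a minimal C++ sketch. `Inode`, `ReadaheadCompletion`, `get()`/`put()` and the `unclean` counter are hypothetical stand-ins, not Ceph's actual `Client` or `ObjectCacher` types. The sketch shows only that the readahead holds its own reference until it completes, so a close that races with it cannot reach the release path while data is still in flight.

```cpp
#include <cassert>

// Hypothetical stand-in for a client inode: a manual reference count
// plus a count of readahead bytes still in flight ("unclean").
struct Inode {
  int ref = 1;          // reference held by the open file handle
  int unclean = 0;      // readahead bytes not yet completed
  bool released = false;

  void get() { ++ref; }
  void put() {
    if (--ref == 0) {
      // Release the cache set only once nobody uses the inode; with the
      // readahead pinning the inode, nothing can still be in flight here.
      assert(unclean == 0);
      released = true;
    }
  }
};

// Hypothetical readahead completion: pins the inode when the readahead
// starts and drops the pin only after the async I/O has finished.
struct ReadaheadCompletion {
  Inode* in;
  int len;
  ReadaheadCompletion(Inode* i, int l) : in(i), len(l) {
    in->get();           // pin: the inode cannot go away under us
    in->unclean += len;  // readahead data is now in flight
  }
  void finish() {
    in->unclean -= len;  // I/O done, the cached range is clean again
    in->put();           // unpin; may perform the deferred release
  }
};
```

If the file is closed while the readahead is outstanding, the close only drops its own reference; the release happens later, from `finish()`.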
Sage Weil
032d4ec53e osdc/ObjectCacher: call read completion even when no target buffer
If we do not assemble a target bl, we still want to return a valid return
code with the number of bytes read ahead so that the C_RetryRead completion
will see this as a finish and call the caller's provided Context.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-28 14:55:09 -07:00
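The completion contract described above can be sketched as follows: the read reports a non-negative byte count even when there is no target buffer, so the retry wrapper treats it as finished and invokes the caller's callback. `RetryRead` and `cached_read()` are hypothetical simplifications, not the real `C_RetryRead` or `ObjectCacher` interfaces.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical retry completion: treats r >= 0 as "finished" and
// forwards the result to the caller's callback.
struct RetryRead {
  std::function<void(int)> on_finish;
  void complete(int r) {
    if (r >= 0) on_finish(r);
    // A negative "try again" code would re-issue the read (omitted).
  }
};

// Hypothetical cached read: copies data into `target` only if the caller
// supplied one, but always reports how many bytes were handled.
int cached_read(const std::string& cache, std::string* target) {
  if (target) *target = cache;
  return static_cast<int>(cache.size());  // valid even with no target
}
```

With a null target (the pure readahead case), the callback still fires with the byte count instead of being left waiting.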
Sage Weil
c166215d81 Merge pull request #1553 from ceph/wip-7874
ReplicatedPG: disable clone subsets for cache pools

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-28 14:07:50 -07:00
Sage Weil
3c18fad4de Merge pull request #1554 from ceph/wip-7828
ReplicatedPG:: s/_delete_head/_delete_oid, adjust head_exists iff is_hea...

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-28 14:06:24 -07:00
Sage Weil
decbe2c0a8 Merge pull request #1555 from ceph/wip-7835
ReplicatedPG::make_writeable: fill in ssc on clone

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-28 14:05:41 -07:00
Yehuda Sadeh
68dc0c6b87 rgw: move max_chunk_size initialization
RGWRados::initialize() is not called when doing
RGWRados::get_raw_storage_provider(). This was the culprit for issue

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2014-03-28 14:05:00 -07:00
Yehuda Sadeh
dfd3cb5140 rgw: only look at prefetched data if we actually prefetched
Fixes: #7903
If we didn't prefetch, we can't rely on the data actually being
there. In that case, just move on and read the object.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2014-03-28 13:25:47 -07:00
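A minimal sketch of the guard, assuming a hypothetical `GetObjState` that carries an explicit `prefetched` flag next to the prefetch buffer, and a hypothetical `read_object_from_store()` standing in for the normal read path; the real RGW request structures differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical request state: `prefetch` is meaningful only if a
// prefetch was actually issued and completed.
struct GetObjState {
  bool prefetched = false;
  std::string prefetch;
};

// Hypothetical stand-in for reading the object from the backing store.
std::string read_object_from_store() { return "object-data"; }

std::string get_data(const GetObjState& s) {
  // Trust the prefetch buffer only if we really prefetched; otherwise
  // its contents say nothing about the object, so read it normally.
  if (s.prefetched) return s.prefetch;
  return read_object_from_store();
}
```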
Sage Weil
d78e678855 osd/PG: fix choose_acting revert to up case
If we decide to revert back to up, we need to

1- return false, so that we go into the NeedActingChange state, and
2- actually ask for that change.

It's too fugly to try to jump down to the existing queue_want_pg_temp
call 100+ lines down in this function, so just do it here.  We already
know that we are requesting to clear the pg_temp.

Fixes: #7902
Backport: emperor, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-28 13:10:06 -07:00
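A minimal sketch of the corrected control flow, assuming a hypothetical `PGState` with plain `std::vector<int>` up/acting sets and a list of queued pg_temp requests, where an empty vector means "clear pg_temp". It is not the real `PG::choose_acting()`; it only shows that the revert-to-up case must both return false and queue the request.

```cpp
#include <cassert>
#include <vector>

// Hypothetical PG state; the real PG class tracks far more.
struct PGState {
  std::vector<int> up;
  std::vector<int> acting;
  std::vector<std::vector<int>> pg_temp_requests;  // queued for the monitor
};

// Returns true if the current acting set can be used as-is; false means
// the caller should move to NeedActingChange.
bool choose_acting(PGState& pg, const std::vector<int>& want) {
  if (want == pg.up && pg.acting != pg.up) {
    // Reverting to up: ask for pg_temp to be cleared (an empty mapping)
    // right here, and report that the acting set must change.
    pg.pg_temp_requests.push_back(std::vector<int>{});
    return false;
  }
  if (want != pg.acting) {
    // Any other change: request `want` as the temporary mapping.
    pg.pg_temp_requests.push_back(want);
    return false;
  }
  return true;
}
```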
Yan, Zheng
0bb911c6a8 mds: don't trim non-auth root inode/dirfrag
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:14 +08:00
Yan, Zheng
90b4e53c19 mds: include authority of the overwritten inode in rename witnesses
The rename operation needs to adjust the overwritten inode's link count.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:14 +08:00
Yan, Zheng
367987faff mds: don't increase nlink when rolling back stray reintegration
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:14 +08:00
Yan, Zheng
79aa26ffba mds: allow sending MMDSFindIno to an MDS that is in clientreplay state
This is needed because MDCache::kick_find_ino_peers() is called when an MDS
enters clientreplay state.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:14 +08:00
Yan, Zheng
34ef91a279 mds: fix negative dirstat assertion
When splitting a dirfrag, the delta dirstat is always added to the first new
dirfrag. Before the delta dirstat is propagated to the inode, unlinking files
from the remaining dirfrags can cause a negative inode dirstat.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:14 +08:00
Yan, Zheng
ce93616169 mds: fix stack overflow caused by nested dispatch
Commit bc3325b37 fixed a stack overflow bug that happens when replaying
client requests. A similar stack overflow can happen when processing
finished contexts.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:14 +08:00
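The general technique behind this kind of fix can be sketched as follows: instead of completing contexts recursively, queue them and drain the queue in a loop. `ContextQueue` is a hypothetical class, not Ceph's `Finisher` or the MDS's actual dispatch code.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical context queue: a completion that arrives while the queue
// is already being drained is appended instead of run recursively, so a
// long chain of contexts that each complete another one uses constant
// stack depth.
class ContextQueue {
  std::deque<std::function<void()>> queue_;
  bool draining_ = false;
  int depth_ = 0;

 public:
  int max_depth = 0;  // deepest nesting of the drain loop observed

  void complete(std::function<void()> c) {
    queue_.push_back(std::move(c));
    if (draining_) return;  // the outer drain loop will run it
    draining_ = true;
    if (++depth_ > max_depth) max_depth = depth_;
    while (!queue_.empty()) {
      auto next = std::move(queue_.front());
      queue_.pop_front();
      next();  // may call complete() again; that only enqueues
    }
    --depth_;
    draining_ = false;
  }
};
```

A chain of 100000 contexts, each completing the next, runs with a drain depth of 1; a naive recursive `complete()` would nest 100000 frames deep.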
Yan, Zheng
71fa779e08 mds: don't clear scatter dirty when cache rejoin ack is received
The auth MDS has received the dirty scatterlock state, but it hasn't
journaled the dirty state yet. The log segment that marked the
scatterlock dirty needs to be preserved. Therefore, we can't clear
the scatterlock's dirty flag.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:14 +08:00
Yan, Zheng
fbf4fbc37a mds: explicitly set nonce for imported dirfrag
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:13 +08:00
Yan, Zheng
d14ec95e8a mds: skip non-opened session when flushing client sessions
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:13 +08:00
Yan, Zheng
fb19100f44 mds: fix null pointer dereference in MDCache::rejoin_send_rejoins()
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:13 +08:00
Yan, Zheng
ed49d5ef90 mds: journal EFragment::OP_COMMIT before drop locks
Dropping locks can dispatch other requests, and these requests can submit
log entries.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:13 +08:00
Yan, Zheng
1bd575e223 mds: fix CInode::get_approx_dirfrag
Return NULL if there is no open dirfrag.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:13 +08:00
Yan, Zheng
a1f5c645bb mds: don't trim ambiguous import dirfrags
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:13 +08:00
Yan, Zheng
598c5f18b2 mds: trim empty non-auth dirfrags
Fragmenting a non-auth dirfrag results in several smaller dirfrags. Some
of the resulting dirfrags can be empty; these are not used to connect
to the auth subtree.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:13 +08:00
Yan, Zheng
3c6c712414 mds: trim non-auth inode with remote parents
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:13 +08:00
Yan, Zheng
e811b07e19 mds: properly journal fragment rollback
If dirfrags are subtree roots, mark the dirfragtreelock as scattered
dirty, otherwise journal the dirfragtree change.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:13 +08:00
Yan, Zheng
6a548a97f8 mds: fix CDir::WAIT_ANY_MASK
Make sure CDir::WAIT_ANY_MASK includes MDSCacheObject::WAIT_UNFREEZE.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:13 +08:00
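Why a missing bit in a wait mask matters can be shown with a small bitmask example. The constants below are hypothetical and only mimic the pattern of a derived class's "any" mask needing to include a base-class wait bit; they are not the real MDS values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments: one inherited "base" wait bit and a few
// directory-specific ones. The real MDS constants differ.
constexpr std::uint64_t WAIT_UNFREEZE = 1ull << 0;  // inherited bit
constexpr std::uint64_t WAIT_DENTRY   = 1ull << 1;
constexpr std::uint64_t WAIT_COMPLETE = 1ull << 2;
constexpr std::uint64_t WAIT_FROZEN   = 1ull << 3;

// Incomplete mask: covers only the directory-specific bits.
constexpr std::uint64_t WAIT_ANY_MASK_OLD =
    WAIT_DENTRY | WAIT_COMPLETE | WAIT_FROZEN;
// Fixed mask: also includes the inherited unfreeze bit.
constexpr std::uint64_t WAIT_ANY_MASK = WAIT_ANY_MASK_OLD | WAIT_UNFREEZE;

// The waiter bits that a "wake everything in the mask" call would take.
constexpr std::uint64_t woken(std::uint64_t waiting, std::uint64_t mask) {
  return waiting & mask;
}
```

With the incomplete mask, a waiter registered on the unfreeze bit is silently left behind.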
Yan, Zheng
e535f7f2b9 mds: avoid journaling non-auth opened inode
An exporting inode has the AUTH bit set while the EExport event is being
journaled.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:13 +08:00
Yan, Zheng
ffcbcdd61f mds: handle race between cache rejoin and fragmenting
MDCache::handle_cache_expire() ignores mismatched dirfrags. This is
OK during normal operation because the MDS doesn't trim replica inodes
whose dirfrags are likely being fragmented (see commit 22535340).

During recovery, the recovering MDS can receive a survivor MDS's cache
expire message before it sends cache rejoin acks. In this case,
there can still be mismatched dirfrags, but nothing prevents the
survivor MDS from trimming the inodes of these mismatched dirfrags. So
there can be unconnected dirfrags when the recovering MDS sends cache
rejoin acks.

The fix is: when a mismatched dirfrag is encountered during recovery,
check whether the dirfrag's inode is still replicated to the sender MDS.
If the inode is not replicated, remove the sender MDS from the replica
maps of all child dirfrags.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:13 +08:00
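The recovery-time fix can be sketched with simplified bookkeeping. `Inode`, `DirFrag`, their `replicas` sets and `handle_mismatched_expire()` are hypothetical; the real logic lives in `MDCache::handle_cache_expire()` and uses Ceph's own replica maps.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified replica bookkeeping: each cache object records
// which peer MDS ranks hold a replica of it.
struct DirFrag {
  std::set<int> replicas;
};
struct Inode {
  std::set<int> replicas;
  std::vector<DirFrag*> dirfrags;  // this inode's child dirfrags
};

// Called during recovery when an expire names a dirfrag that does not
// match ours (because of fragmentation).
void handle_mismatched_expire(Inode& in, int sender) {
  if (in.replicas.count(sender)) return;  // sender still has the inode
  // The sender trimmed the inode, so it cannot still hold any of its
  // dirfrags: drop it from every child dirfrag's replica set.
  for (DirFrag* df : in.dirfrags) df->replicas.erase(sender);
}
```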
Yan, Zheng
6963a8f9cb mds: handle interaction between slave rollback and fragmenting
For slave rename and rmdir events, the MDS needs to preserve the non-auth
dirfrag where the renamed inode originally lived until the slave commit
event is encountered. The current method is to use
MDCache::uncommitted_slave_rename_olddir to track any non-auth dirfrag that
needs to be preserved. This method does not work well if any preserved
dirfrag gets fragmented by a log event (such as ESubtreeMap) between the
slave prepare event and the slave commit event.

The fix is to track the inode of the dirfrag instead of directly tracking
the dirfrag that needs to be preserved.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-29 02:08:13 +08:00
Sage Weil
0dcb54f71e Merge pull request #1549 from dachary/wip-doc
doc: fix typos in tiering dev doc
2014-03-28 08:23:46 -07:00
Loic Dachary
72eaa5e885 doc: fix typos in tiering dev doc
Signed-off-by: Loic Dachary <loic@dachary.org>
2014-03-28 14:02:25 +01:00
Yan, Zheng
1b5e8f4306 mds: properly propagate dirty dirstat to auth inode
Propagate dirty dirstat to a freezing auth inode if the inode is
already auth pinned by the Mutation. Otherwise, the dirstat can
be propagated to the inode after the client changes the inode's mtime.

Fixes: #7880
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-28 13:24:36 +08:00
Samuel Just
7f4be9e9d0 Merge pull request #1547 from ceph/wip-cache-scrub
osd: improve scrub checks on clones; tolerate missing clones on cache pools

Fixes: #7885
Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-03-27 17:14:34 -07:00
Greg Farnum
38d4c71a45 Pipe: rename keepalive->send_keepalive
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-03-27 16:45:45 -07:00
Sage Weil
7a1990b66e Merge branch 'wip-7875'
Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-03-27 16:39:36 -07:00
Sage Weil
c64d03d0a8 mon/OSDMonitor: require OSD_CACHEPOOL feature before using tiering features
The OSDs need to support this feature before we allow users to turn it
on.  This is similar to what the erasure pool support does.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-27 16:39:01 -07:00
Sage Weil
69321bf57f mon/OSDMonitor: prevent setting hit_set unless all OSDs support it
We are using OSD_CACHEPOOL as a proxy for the support for the tiering
OSDMap infrastructure.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-27 16:38:46 -07:00
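Both monitor checks above amount to gating a command on a feature bit that every OSD must advertise. A sketch, assuming OSD feature masks are available as plain 64-bit integers; `FEATURE_OSD_CACHEPOOL` and its bit position are placeholders, not the real value of Ceph's feature constant, and `all_osds_support()` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Placeholder feature bit; the real OSD_CACHEPOOL feature value is
// defined in Ceph and is not reproduced here.
constexpr std::uint64_t FEATURE_OSD_CACHEPOOL = 1ull << 35;

// True only if every OSD advertises `feature`: intersect all the OSDs'
// feature masks, then test the bit in the common set.
bool all_osds_support(const std::vector<std::uint64_t>& osd_features,
                      std::uint64_t feature) {
  std::uint64_t common = ~0ull;
  for (std::uint64_t f : osd_features) common &= f;
  return (common & feature) == feature;
}
```

A monitor following this pattern would refuse the tiering or hit_set command whenever the check fails, since a single old OSD would not understand the new OSDMap fields.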
Sage Weil
eb71924ea2 osd/ReplicatedPG: tolerate missing clones in cache pools
A few cases:

- As we are working through the list, if we see a clone that is lower than
  the next one we were expecting, we should be able to skip the missing ones.
- If we see a head, we can skip all of the rest of the clones.
- If we get to the end and next_clone was set, we can ignore it.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-27 15:12:25 -07:00
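The first and last cases in the list above can be sketched as a walk over one object's clones. This assumes, hypothetically, that both the expected clones (from the head's snapset) and the clones present in the scrub map are given as snap ids sorted highest first, and it treats "reached the next head" the same as "reached the end". `count_missing_clones()` is illustrative only, not the actual `ReplicatedPG` scrub code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scrub walk for one object in a cache pool. `expected` holds
// the clone snap ids listed in the head's snapset and `seen` the clones
// actually present, both highest first. Returns how many expected clones
// are missing; a cache pool tolerates these instead of asserting.
// Assumes `seen` is a subset of `expected`.
int count_missing_clones(const std::vector<int>& expected,
                         const std::vector<int>& seen) {
  int missing = 0;
  std::size_t next = 0;  // index of the next clone we expect to see
  for (int clone : seen) {
    // A clone lower than the one we expected means the expected ones in
    // between are missing: skip past them.
    while (next < expected.size() && expected[next] > clone) {
      ++missing;
      ++next;
    }
    if (next < expected.size() && expected[next] == clone) ++next;
  }
  // Reaching the end (or the next head) with clones still expected: the
  // rest are missing too, and can be ignored.
  missing += static_cast<int>(expected.size() - next);
  return missing;
}
```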
Sage Weil
6508d5efe3 osd/ReplicatedPG: improve clone vs head checking
- notice when we are missing a clone (that isn't at the end of the list)
- notice when we are missing a clone on the last object in the scrub map
- do not assert when we are missing a clone

There is still more we could do to improve this (like noticing one missing
clone but still checking the others), but we'll leave that aside for just
a moment...

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-27 15:00:52 -07:00
Sage Weil
9e2cd5feaf osd/ReplicatedPG: do not assert on clone_size mismatch
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-27 13:48:33 -07:00
Sage Weil
7f026ba608 ceph_test_rados_api_tier: scrub while cache tier is missing clones
Trigger a scrub to verify that we can handle a cache tier that is missing
some clones.  We rely on the test harness to notice the error, and we do
not confirm that the scrub happened.  In practice this is plenty of time,
however.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-27 13:28:10 -07:00
Dan Mick
c5682e78e9 Merge pull request #1546 from ceph/wip-fix-pools
fix pool ops test
2014-03-27 13:01:05 -07:00
Sage Weil
7cb1d3a43d qa/workunits/mon/pool_ops.sh: fix test
The pool create command doesn't take k/v pairs any more.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-27 12:57:40 -07:00
Sage Weil
233801c622 qa/workunits/mon/pool_ops.sh: use expect_false
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-27 12:56:44 -07:00
Josh Durgin
ce59760aea Merge pull request #1545 from ceph/wip-7849-b
ceph-conf: do not log

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-03-27 12:35:50 -07:00
Sage Weil
72715b235a ceph-conf: no admin_socket
We don't need to worry about pidfile because that is done by the fork
functions, which ceph-conf doesn't call.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-27 12:30:39 -07:00
Josh Durgin
e91f5c8cc4 Merge pull request #1522 from themgt/patch-1
document adding dev key for custom Apache/FCGI install

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-03-27 12:03:25 -07:00
Sage Weil
fb208237a1 jerasure: fix up .gitignore
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-27 11:41:57 -07:00
Sage Weil
acc31e75a3 ceph-conf: do not log
If you are querying the conf for an osd and it has a log configured, we
should not generate any log activity.

This isn't super pretty, but it is much less intrusive than wiring a 'do
not log' flag down into CephContext and a zillion other places.

Fixes: #7849
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-27 11:36:42 -07:00
Josh Durgin
3f1417a850 Merge pull request #1542 from onlyjob/debian
logrotate: do not rotate empty logs (2nd logrotate file)

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-03-27 11:33:58 -07:00