23602 Commits

Author SHA1 Message Date
Josh Durgin
3bc2114355 ObjectCacher: fix flush_set when no flushing is needed
C_GatherBuilder takes ownership of the Context we pass it. Deleting it
in flush_set after constructing the C_GatherBuilder results in a
double delete.

Fixes: #3946
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-01-29 14:35:28 -08:00
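A minimal C++ sketch of the ownership rule this fix relies on. `Callback`, `OwningGather`, and `flush_sketch` are hypothetical stand-ins, not Ceph's `Context`, `C_GatherBuilder`, or `ObjectCacher::flush_set`. The sketch assumes the builder frees the finisher once it fires, so the caller must not delete it, even when nothing needed flushing.

```cpp
#include <cassert>
#include <memory>

// Hypothetical completion callback (not Ceph's Context). It counts live
// instances so that a leak or a double delete would show up in tests.
struct Callback {
  static int live;
  int* fired;
  explicit Callback(int* f) : fired(f) { ++live; }
  ~Callback() { --live; }
  void complete() { ++*fired; }
};
int Callback::live = 0;

// Hypothetical gather builder that takes ownership of its finisher.
class OwningGather {
 public:
  explicit OwningGather(Callback* onfinish) : onfinish_(onfinish) {}
  void new_sub() { ++pending_; }
  void sub_done() {
    --pending_;
    maybe_finish();
  }
  // Called once the caller has registered all sub-operations.
  void activate() {
    active_ = true;
    maybe_finish();
  }

 private:
  void maybe_finish() {
    if (active_ && pending_ == 0 && onfinish_) {
      onfinish_->complete();
      onfinish_.reset();  // the builder, not the caller, frees the callback
    }
  }
  std::unique_ptr<Callback> onfinish_;
  int pending_ = 0;
  bool active_ = false;
};

// Sketch of a flush routine: once `onfinish` is handed to the builder the
// caller never deletes it, even when there are no dirty objects.
bool flush_sketch(int dirty_objects, Callback* onfinish) {
  OwningGather gather(onfinish);
  for (int i = 0; i < dirty_objects; ++i) {
    gather.new_sub();
    gather.sub_done();  // pretend each write completes synchronously
  }
  gather.activate();
  return dirty_objects > 0;  // note: no `delete onfinish;` here
}
```

Adding a `delete onfinish;` on the "nothing to flush" path of `flush_sketch` would free the object a second time when `gather` is destroyed or fires, which is the class of bug the commit describes.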
Alex Elder
59ac4d3534 qa: add rbd/concurrent workunit
This defines a new workunit shell script that performs a bunch of
rbd operations concurrently in order to exercise code paths and
catch reference count and bad pointer problems.

Signed-off-by: Alex Elder <elder@inktank.com>
2013-01-29 16:42:44 -06:00
Sam Lang
907c709ccb mds: Send created ino in journaled_reply
The MDS avoids sending an early reply if a request
triggered inode allocation (no preallocated inodes yet).
For create, this prevented the created ino from being
sent back to the client, which uses it to tell that the
file was created (as opposed to already existing).
This commit fixes the issue by adding the created ino
to the journaled (safe) reply.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-01-29 11:28:00 -06:00
Sam Lang
cf7c3f7d3f client: Don't use geteuid/gid for fuse ll_create
Fixes a bug in ll_create where files that already exist at the MDS
don't get the created flag set on reply.  This causes a permissions
check, which fails because geteuid/getegid are 0/0 for ll_create.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-01-29 10:18:29 -06:00
Gary Lowell
0b66994c18 ceph.spec.in: package rbd udev rule
Package udev/50-rbd.rules per bug 3930.

Signed-off-by: Gary Lowell  <gary.lowell@inktank.com>
2013-01-28 22:49:45 -08:00
Sage Weil
a7d15afb52 mon: smooth pg stat rates over last N pgmaps
This smooths the recovery and throughput stats over the last N pgmaps,
defaulting to 2.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-28 19:46:33 -08:00
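A minimal sketch of smoothing a rate over the last N intervals. It assumes each pgmap supplies a timestamp and cumulative counters, and that N counts the intervals between consecutive pgmaps; `RateSmoother` is hypothetical and is not the monitor's actual code. The smoothed rate is the sum of the counter deltas divided by the sum of the interval durations in the window.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// One interval between two consecutive snapshots.
struct Delta {
  double dt;  // elapsed seconds
  double dv;  // change in the cumulative counter
};

// Hypothetical smoother over the last `n` intervals (default 2).
class RateSmoother {
 public:
  explicit RateSmoother(std::size_t n = 2) : n_(n) {}

  // Record a snapshot: cumulative counter `total` observed at time `t`.
  void add(double t, double total) {
    if (has_prev_) {
      window_.push_back({t - prev_t_, total - prev_total_});
      if (window_.size() > n_) window_.pop_front();  // keep only the last n
    }
    prev_t_ = t;
    prev_total_ = total;
    has_prev_ = true;
  }

  // Average rate across the window; 0 until at least one interval exists.
  double rate() const {
    double dt = 0, dv = 0;
    for (const Delta& d : window_) {
      dt += d.dt;
      dv += d.dv;
    }
    return dt > 0 ? dv / dt : 0.0;
  }

 private:
  std::size_t n_;
  std::deque<Delta> window_;
  bool has_prev_ = false;
  double prev_t_ = 0, prev_total_ = 0;
};
```

Weighting by elapsed time, rather than averaging per-interval rates, keeps a short interval with a burst from dominating the reported value.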
Sage Weil
0f7a9e56fd Merge remote-tracking branch 'yan/wip-mds'
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-28 19:17:48 -08:00
Ross Turk
ecda12081a doc: fix overly-big fixed-width text in Firefox
Changed the font size for <pre> elements to 15pt instead of 1.5em; Firefox seems to render 1.1em a bit bigger than other browsers.

Signed-off-by: Ross Turk <ross@inktank.com>
2013-01-28 19:03:56 -08:00
Sage Weil
3f6837e022 mon/PGMap: report IO rates
This does not appear to be very accurate; probably the stat values we're
displaying are not being calculated correctly.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-28 18:50:03 -08:00
Sage Weil
208b02a748 mon/PGMap: report recovery rates
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-28 18:50:03 -08:00
Sage Weil
76e9fe5f06 mon/PGMap: include timestamp
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-28 18:50:03 -08:00
Sage Weil
a2495f658c osd: track recovery ops in stats
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-28 18:50:03 -08:00
Sage Weil
4aea19ee60 osd_types: add recovery counts to object_sum_stats_t
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-28 18:50:03 -08:00
Sage Weil
193dbedb91 rbd-fuse: fix warning
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-28 18:49:20 -08:00
John Wilkins
1e24ce22a9 doc: Removed indep, and clarified explanation.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-01-28 18:44:07 -08:00
Yan, Zheng
829aeba63a mds: clear inode dirty when slave rename finishes.
The inode is linked to a non-auth directory, so remove it from LogSegment's
dirty inode list.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:37 +08:00
Yan, Zheng
5884177667 mds: mark export bounds for cross authority directory rename
This guarantees that the importing MDS gets the directory fragment's
up-to-date fragstat/rstat.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:37 +08:00
Yan, Zheng
abc4c78550 mds: allow handling slave request in the clientreplay stage
Replaying a client request may require creating a slave request, and the
slave MDS may also be in the clientreplay stage.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:37 +08:00
Yan, Zheng
e69e7e5d0e mds: fix 'discover' handling in the rejoin stage
If the MDS is in the resolve stage, the current MDCache::handle_discover() only
handles 'discover' from MDSs from which it has already received a rejoin
acknowledgement. This can cause a circular wait, because
MDCache::rejoin_gather_finish() fetches reconnected inodes before sending rejoin
acknowledgements, and fetching a reconnected inode may trigger 'discover'. The
fix is to not delay handling 'discover' from MDSs that are also in the rejoin
stage.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:37 +08:00
Yan, Zheng
0e9c8124a1 mds: add projected rename's subtree bounds to ESubtreeMap
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:37 +08:00
Yan, Zheng
77946dcdae mds: fetch missing inodes from disk
The problem with fetching missing inodes from replicas is that replicated inodes
do not have up-to-date rstat and fragstat, so just fetch missing inodes from
disk.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:36 +08:00
Yan, Zheng
f4abf00af5 mds: rejoin remote wrlocks and frozen auth pin
Include remote wrlocks and the frozen auth pin in the cache rejoin strong message.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:36 +08:00
Yan, Zheng
710bba3a4b mds: move variables special to rename into MDRequest::more
My previous patches added two pointers (ambiguous_auth_inode and
auth_pin_freeze) to class Mutation. Both are used by cross-authority
rename, and both point to the renamed inode. Later patches need to add
more rename-specific state to MDRequest, so just move them into
MDRequest::more.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:36 +08:00
Yan, Zheng
4fc68a4811 mds: properly clear CDir::STATE_COMPLETE when replaying EImportStart
When replaying EImportStart, we should set or clear the directory's COMPLETE
flag according to the flag in the journal entry.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:36 +08:00
Yan, Zheng
9a0cfcc56d mds: don't journal opened non-auth inode
If we journal an opened non-auth inode, then during journal replay the
corresponding entry will add non-auth objects to the cache. But the MDS does not
journal all subsequent modifications (rmdir, rename) to these non-auth objects,
so the code that manages the cache and subtrees may get confused. Besides,
non-auth objects will be trimmed at the resolve stage.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:36 +08:00
Yan, Zheng
0cf5e4e55d mds: journal inode's projected parent when doing link rollback
Otherwise the journal entry will revert the effect of any on-going
rename operation for the inode.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:36 +08:00
Yan, Zheng
c93cf2d23b mds: fix for MDCache::disambiguate_imports
In the resolve stage, if no MDS claims another MDS's ambiguous subtree
import, the subtree's dir_auth is undefined.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:36 +08:00
Yan, Zheng
baa6bd6b1c mds: fix for MDCache::adjust_bounded_subtree_auth
After swallowing extra subtrees, the subtree bounds may change, so it
should re-check them.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:35 +08:00
Yan, Zheng
e0aa64d04d mds: don't replace existing slave request
The MDS may receive a client request but find that there is an existing
slave request. This means another MDS is handling the same request, so
we should not replace the slave request with a new client request; just
forward the request.

The client request may include embedded cap releases; we need to process
them even if the request is forwarded.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:35 +08:00
Yan, Zheng
85294a5988 mds: always use {push,pop}_projected_linkage to change linkage
The current code skips using {push,pop}_projected_linkage to modify a replica
dentry's linkage. This confuses EMetaBlob::add_dir_context() and makes
it record an out-of-date path when TO_ROOT mode is used. This patch changes
the code to always use {push,pop}_projected_linkage to modify a dentry's
linkage. This makes sure MDCache::create_subtree_map() records a correct
and up-to-date subtree map.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:35 +08:00
Yan, Zheng
3a66656b4f mds: send resolve messages after all MDS reach resolve stage
The current code sends resolve messages whenever the set of resolving MDSs
changes. There is no need to send resolve messages when some MDSs leave the
resolve stage. Sending messages while some MDSs are still replaying is also
not very useful.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:35 +08:00
Yan, Zheng
a42a9187f4 mds: split resolve into two sub-stages
The resolve stage serves to disambiguate the fate of uncommitted slave
updates and to resolve subtree authority. The MDS sends a resolve message
that claims subtree authority immediately when the resolve stage is entered,
and when it receives a resolve message, it also processes it immediately.
This may cause problems if there are uncommitted slave renames and some of
them need rollback later, because slave rename rollback may modify the
subtree map.

The fix is to split resolve into two sub-stages. The first sub-stage serves
to disambiguate slave updates and to do slave commit or rollback. After the
first sub-stage finishes, the MDS sends resolve messages that claim subtree
authority to the other MDSs and processes the resolve messages it has received.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:35 +08:00
Yan, Zheng
844cd46c77 mds: fix slave rename rollback
The main issue with the old slave rename rollback code is that it assumes
all affected objects are in the cache. This assumption does not hold
when the MDS does rollback in the resolve stage. This patch removes the
assumption and makes Server::do_rename_rollback() check each individual
object and roll back its change.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:35 +08:00
Yan, Zheng
1a6626f032 mds: preserve non-auth/unlinked objects until slave commit
The MDS should not trim objects in a non-auth subtree immediately after
replaying a slave rename, because the slave rename may require rollback
later and these objects are needed for the rollback.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:35 +08:00
Yan, Zheng
9944d9fbc9 mds: don't journal non-auth rename source directory
After replaying a slave rename, the non-auth directory that we rename out of
will be trimmed, so there is no need to journal it.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:35 +08:00
Yan, Zheng
fb49713514 mds: force journal straydn for rename if necessary
A rename may overwrite an empty directory inode and move it into the stray
directory. An MDS that has an auth subtree beneath the overwritten directory
needs to journal the stray dentry when handling the rename slave request.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:34 +08:00
Yan, Zheng
ce431eb5db mds: split rename force journal check into separate function
The function will be used by a later patch that fixes rename rollback.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:34 +08:00
Yan, Zheng
c9ff21a9e6 mds: fix "had dentry linked to wrong inode" warning
The reason for the "had dentry linked to wrong inode" warning is that
Server::_rename_prepare() adds the destdir to the EMetaBlob before
adding the straydir. So during MDS recovery, the destdir is replayed
first, and the old inode is directly replaced by the source inode.
We can avoid the warning by adding the straydir first.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:34 +08:00
Yan, Zheng
cd8d91078c mds: don't set xlocks on dentries done when early reply rename
_rename_finish() does not send dentry link/unlink messages to replicas.
We should prevent dentries that are modified by the rename operation
from getting new replicas while the rename operation is committing,
so don't mark the xlocks on those dentries as "done".

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-01-29 10:17:05 +08:00
Sage Weil
87d85fa263 Merge remote-tracking branch 'gh/next'
2013-01-28 18:15:35 -08:00
John Wilkins
e58fe51980 Merge branch 'master' of https://github.com/ceph/ceph
2013-01-28 17:51:20 -08:00
John Wilkins
b429a3a3bb doc: Updated to add indep and first n to chooseleaf. Num only used with firstn.
fixes: #3711

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-01-28 17:50:47 -08:00
Yehuda Sadeh
f41010c44b rgw: fix crash when missing content-type in POST object
Fixes: #3941
This fixes a crash when handling an S3 POST request whose content type
is not provided.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-01-28 17:31:10 -08:00
Josh Durgin
c79f7c6c03 Merge branch 'wip-pool-delete'
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-28 16:53:53 -08:00
Sage Weil
26988038e1 Merge branch 'wip-osd-down-out'
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-01-28 17:22:25 -08:00
Yehuda Sadeh
09522e5a62 rgw: fix crash when missing content-type in POST object
Fixes: #3941
This fixes a crash when handling an S3 POST request whose content type
is not provided.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-01-28 17:14:53 -08:00
Sage Weil
b955a599a6 mon: set limit so that we do not mark an entire down subtree out
Add a new configurable, 'mon osd down out subtree limit', so that you can
prevent marking out an entire subtree. For example, if an entire rack is
down, do not mark anything in it out. If less than the whole rack is down,
everything is fair game.

Set the default to 'rack'.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-28 17:13:59 -08:00
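A sketch of the policy described above, under simplifying assumptions: each OSD's containing subtree at the limit type (here, a rack) is given as a plain map, and `may_mark_out` is a hypothetical helper rather than the monitor's real code path.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical helper: may a down OSD be marked out, given the subtree limit?
// `osd_rack` maps each OSD id to its containing subtree at the limit type.
bool may_mark_out(int osd,
                  const std::map<int, std::string>& osd_rack,
                  const std::set<int>& down) {
  if (down.count(osd) == 0) return false;  // only down OSDs are candidates
  const std::string& rack = osd_rack.at(osd);
  for (const auto& kv : osd_rack) {
    // Some OSD in the same rack is still up: the rack is only partly down,
    // so everything in it is fair game.
    if (kv.second == rack && down.count(kv.first) == 0) return true;
  }
  return false;  // the whole rack is down: leave its OSDs in
}
```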
Sage Weil
2b8ba7ca23 osdmap: implement subtree_is_down() and containing_subtree_is_down()
Implement two methods to check whether an entire subtree is down, and whether
the containing parent node of type T of a given node is completely down.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-28 17:13:59 -08:00
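A simplified sketch of the two checks, assuming a toy hierarchy in which negative ids are buckets and non-negative ids are OSDs. `Hierarchy`, `add_child`, and `set_type` are hypothetical; the real methods operate on the OSDMap and its CRUSH map, and their exact traversal may differ.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy CRUSH-like hierarchy (hypothetical, not Ceph's CrushWrapper/OSDMap).
struct Hierarchy {
  std::map<int, std::vector<int>> children;  // bucket id -> child ids
  std::map<int, int> parent;                 // item id -> parent bucket id
  std::map<int, std::string> type;           // bucket id -> type name
  std::set<int> down_osds;                   // OSDs currently marked down

  void set_type(int bucket, const std::string& t) { type[bucket] = t; }
  void add_child(int bucket, int child) {
    children[bucket].push_back(child);
    parent[child] = bucket;
  }

  // True if no OSD beneath `id` is up (an empty bucket counts as down).
  bool subtree_is_down(int id) const {
    if (id >= 0) return down_osds.count(id) > 0;
    auto it = children.find(id);
    if (it == children.end()) return true;
    for (int child : it->second)
      if (!subtree_is_down(child)) return false;
    return true;
  }

  // True if the nearest ancestor of `id` with type `t` is entirely down.
  bool containing_subtree_is_down(int id, const std::string& t) const {
    int cur = id;
    while (true) {
      auto p = parent.find(cur);
      if (p == parent.end()) return false;  // reached the root, no match
      cur = p->second;
      auto ty = type.find(cur);
      if (ty != type.end() && ty->second == t) return subtree_is_down(cur);
    }
  }
};
```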
Sage Weil
75f6ba56e1 crush: implement get_children(), get_immediate_parent_id()
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-28 17:13:59 -08:00
Sage Weil
428ddb7dff Merge remote-tracking branch 'gh/wip-timecheck'
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-28 17:12:07 -08:00