We may hold non-auth metadata only because we have a subtree nested
beneath it. If we rename a directory out of a non-auth subtree, we
should try to trim any non-auth content from that subtree, which may
now be trimmable because the child subtrees are linked elsewhere.
Fixes: #1146
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
If we replay a metablob that unlinks something, throw it out immediately,
recursively. This comes up when:
- we rename a file from one mds to another, and we replay the event on
the source mds. The inode gets thrown out.
- we rename a directory from one mds to another, and when journaled, the
source mds had no nested metadata. Same thing: we throw it out. We
may have something in our cache nested beneath that, though, that has
since been committed and such, but the fact that we didn't journal it
being reattached elsewhere implies that it was clean and gone when our
event was journaled, so we can throw it all out, recursively.
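
A minimal sketch of the recursive throw-out described above. ToyInode,
toy_count and toy_replay_unlink are hypothetical names for illustration
only; they are not the real CInode/MDCache interfaces, and the real code
has to deal with dentries, dirfrags and subtree bounds that this toy
ignores.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a cached inode; not the real CInode.
struct ToyInode {
  std::map<std::string, std::unique_ptr<ToyInode>> children;
};

// Count this node plus everything cached beneath it.
inline std::size_t toy_count(const ToyInode& in) {
  std::size_t n = 1;
  for (const auto& kv : in.children)
    n += toy_count(*kv.second);
  return n;
}

// Replay of an unlink: drop the named child and, recursively, everything
// nested beneath it. Returns how many nodes were thrown out.
inline std::size_t toy_replay_unlink(ToyInode& parent, const std::string& name) {
  auto it = parent.children.find(name);
  if (it == parent.children.end())
    return 0;                       // nothing cached under that name
  std::size_t dropped = toy_count(*it->second);
  parent.children.erase(it);        // unique_ptr frees the whole subtree
  return dropped;
}
```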
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Open up any child frags of the imported renamed inode that are noted in
the journal event. (Note we blindly open up that list here; it's up to the
journaler to only populate it when appropriate.) If the listed frags are
not already open, open them up and set the dir_auth to unknown; presumably
they belong to the rename source/exporter. If we already had them open,
then the adjust_subtree_after_rename call above will have caught them and
already done the necessary subtree adjustment.
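
As a rough illustration of the replay step above, the sketch below opens
every journaled frag that is not already open and marks it as having
unknown auth, leaving already-open frags untouched. ToyDirInode,
TOY_AUTH_UNKNOWN and toy_replay_open_frags are made-up names, and frags
are reduced to plain integers; this is an assumption-laden toy, not the
actual MDCache code.

```cpp
#include <map>
#include <vector>

// Hypothetical marker for "dir_auth unknown".
constexpr int TOY_AUTH_UNKNOWN = -1;

// Hypothetical directory inode: frag id -> dir_auth of the open dirfrag.
struct ToyDirInode {
  std::map<unsigned, int> open_frags;
};

// Open each journaled frag that is not already open, with unknown auth.
// Frags that were already open keep whatever auth they had.
// Returns the number of frags newly opened.
inline int toy_replay_open_frags(ToyDirInode& in,
                                 const std::vector<unsigned>& journaled) {
  int opened = 0;
  for (unsigned frag : journaled) {
    auto res = in.open_frags.emplace(frag, TOY_AUTH_UNKNOWN);
    if (res.second)   // true only if the frag was not open before
      ++opened;
  }
  return opened;
}
```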
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
If we are importing the renamed inode, and it is a directory, journal a
list of all open dirfrags (currently, this is actually all frags) so that
we can open them up during journal replay.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
If a rename witness has any subtrees that are nested beneath the renamed
directory, we need to journal the rename event so that our cache is
properly updated on journal replay.
Further, if we are exporting srci, we also need to journal the dest
(even if we aren't auth for destdn) if we have any open dirfrags because
those will turn into nested subtrees shortly.
We still need to ensure that the cache is properly trimmed during replay.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- make the pop match position with the project in prepare
- don't pop on linkmerge, since we don't project in that case
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
We need the ability to drop caps on another inode that isn't req->inode
or req->old_inode in the request struct.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
If the target is a remote dentry, we need to consider that the destdn
and desti may have different auths.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
We want to send link requests to the auth for the new name, not the
target inode. We also want to drop FILE_SHARED caps on new name's
directory.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Only add target into the stray dir if we are renaming over a primary
dentry. (Otherwise we aren't moving the target.)
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- create xlock import/export helpers
- fix/simplify checks: we want to export/import only xlocks on the inode
that is being migrated, unless they are locallock.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
If we replay a dir rename operation, we need to adjust the subtree map
accordingly.
This covers the case where the metablob contains both the src and dest
dentries. Remaining cases will follow.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Function-scoped globals are protected by a mutex, and taking a mutex
inside a spin lock implementation kind of defeats the point...
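
One way to avoid the problem, sketched here with a hypothetical
ToySpinLock rather than the actual Ceph code: keep the lock state in a
member (or namespace-scope) std::atomic_flag, so that no function-local
static, and hence no compiler-generated initialization guard, is touched
on the lock path.

```cpp
#include <atomic>

// Hypothetical spin lock whose state lives in the object itself rather
// than in a function-scoped static.
class ToySpinLock {
  std::atomic_flag flag = ATOMIC_FLAG_INIT;

public:
  // Busy-wait until the flag was previously clear.
  void lock() {
    while (flag.test_and_set(std::memory_order_acquire)) {
    }
  }

  // Single attempt; returns true if the lock was acquired.
  bool try_lock() {
    return !flag.test_and_set(std::memory_order_acquire);
  }

  void unlock() { flag.clear(std::memory_order_release); }
};
```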
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
MonClient should contain a KeyRing and a RotatingKeyRing. All the
MonClient users, except possibly csyn, don't want to manage those
objects themselves.
Don't chdir until after we have opened the KeyRing. If the KeyRing is at
a relative path, a chdir may make it inaccessible. Separate the chdir
function from the daemonize function.
Refactor the cmds argument parsing a little bit. Separate the special
actions from the normal operations of the daemon.
This should allow librados and libceph to support CephX finally! yay!
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
This reverts commit e8ac5aa2a4.
This commit is just erroneous. It adds checks on a pipe write
for the result and an abort if the write failed. But that's broken
in the desired case where we succeed, block on ceph_fuse_ll_main(),
and the parent process is long-gone by the time we get to this code!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Somehow, in the last major change, the constraints that kept the
bencher from trying to read non-existent objects got removed. Put
a check back in the main bench loop to fix that.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
This has been broken for a while in terms of journaling
things the MDS isn't auth for. This patch should fix that, and
adds a few asserts to that effect.
Also adds a new not_journaling flag to _rename_prepare
for those cases which call the function and then discard
the bufferlist results.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Previously, we could get stuck thinking that we'd flushed caps
(that went to the original MDS, waited on freeze for export,
and then were dropped) without ever telling the auth MDS that we
wanted to do so. This caused hung shutdowns:
1) during shutdown we drop all our caps
2) we get stuck and notice that we have a flushing cap
3) we send cap flush
4) MDS ignores it (I think because the actual data already got updated,
and now we don't have the proper caps either)
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Previously we checked whether there were CEPH_CAP_FILE_BUFFER refs,
but that was racy if we had other threads (they could hold caps for
sync writes or something). Instead, see if we have any in-flight
writes or uncommitted objects.
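
The shape of the new check, as a toy sketch. ToyInodeState and its field
names are assumptions made up for this example and do not match the real
client structures; the point is only that the decision looks at pending
writes and uncommitted objects, not at the cap ref count that other
threads can perturb.

```cpp
// Hypothetical per-inode bookkeeping; field names are illustrative only.
struct ToyInodeState {
  int file_buffer_cap_refs = 0;  // may be nonzero because of other threads
  int inflight_writes = 0;       // writes sent but not yet acked
  int uncommitted_objects = 0;   // objects written but not yet committed
};

// True if there is still data that has not safely reached the OSDs.
// Deliberately ignores file_buffer_cap_refs.
inline bool toy_has_unflushed_data(const ToyInodeState& s) {
  return s.inflight_writes > 0 || s.uncommitted_objects > 0;
}
```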
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>