This makes replay work on the auth by updating the subtrees accordingly
(since rmdir is really just renaming into the stray dir).
Note that this doesn't completely fix things for the dirfrag auth; see
#1295.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
We were only deferring if frozen. We also need to defer if freezing, because
of the way cap messages are deferred. We defer cap messages if
- inode is frozen
- inode is freezing and locks are stable (to avoid starvation)
So if we are in a stable freezing state and start deferring caps, we can't
twiddle locks further or else we can
- potentially starve (okay, in rare cases)
- get stuck because we already started deferring cap messages
We would also screw up the cap message ordering if we became unstable again
and were allowed to start processing cap messages while others were still
deferred.
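
The two deferral rules above can be sketched as a pair of predicates. This
is a minimal illustration, not the MDS code: InodeState and both function
names are hypothetical, and the second predicate assumes the fix amounts to
deferring lock changes whenever the inode is frozen or freezing.

```cpp
#include <cassert>

// Hypothetical, simplified view of the inode state that matters here.
struct InodeState {
  bool frozen = false;
  bool freezing = false;
  bool locks_stable = false;
};

// Cap messages are deferred if the inode is frozen, or if it is freezing
// and its locks are stable (the two conditions listed above).
inline bool should_defer_caps(const InodeState& in) {
  return in.frozen || (in.freezing && in.locks_stable);
}

// Assumed shape of the fix: lock changes are deferred when frozen OR
// freezing. Checking only `frozen` would let locks change while cap
// messages are already being deferred.
inline bool should_defer_lock_change(const InodeState& in) {
  return in.frozen || in.freezing;
}
```

With these definitions, every state in which caps are deferred is also a
state in which lock changes are deferred, which is the property the
paragraph above argues for.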
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
This slightly changes the unlock order for drop_locks(): rdlocks are now
dropped last, instead of after xlocks and before [remote_]wrlocks.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
This ensures that we hold a wrlock on the srcdn auth when the slave
makes its changes to the src directory, and prevents us from corrupting
the scatterlock state.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
For the rename code to behave, we need to hold a wrlock on the slave node
to ensure that any racing gather (mix->lock) is not sent prior to the
_rename_prepare() running; otherwise we violate the locking rules and
corrupt rstats.
Implement a remote_wrlock that will be used by rename. The wrlock is held
on a remote node instead of the local node, and is set up similarly to
remote_xlocks.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
There is a problem with the wrlocks and cross-mds renames:
- master (dest auth, srci auth, srcdir replica) takes wrlock on srcdiri
- something triggers a srcdiri lock, putting inest/ifile lock in mix->lock
state
- slave (srcdir auth) sends LOCKACK
- master sends prepare_rename
- slave (srcdir auth) does rename prepare, which modifies srcdir
Even though the master holds a wrlock on the srcdiri, the gather starts
immediately and the slave sends the LOCKACK before the master's wrlock is
released.
To fix this, we add a new mix->lock(2) state, and we do not start the
mix->lock gather from replicas until the local gather completes, _after_
the auth's wrlock is released. This makes the master's wrlock sufficient
to ensure the prepare_rename on the slave is safe.
This also works when the slave is the srci auth, since the gather won't
complete until the master releases its wrlock. BUT, it does NOT work if a
third MDS is the srcdiri auth, since it can still gather from the slave
prior to the master releasing its wrlock.
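
The two-phase gather described above can be sketched as a toy state machine.
Everything here is illustrative: ToyScatterLock, its fields, and the state
names are hypothetical stand-ins, not the MDS lock definitions. The point is
only that replicas are not asked to gather until the auth's own wrlocks have
drained.

```cpp
#include <cassert>

// Toy model; state names are illustrative, not the real MDS lock states.
enum class LockState { Mix, MixLock, MixLock2, Lock };

struct ToyScatterLock {
  LockState state = LockState::Mix;
  int num_replicas = 0;          // replicas that must ack the gather
  int local_wrlocks = 0;         // wrlocks currently held on the auth
  int replica_acks_pending = 0;  // outstanding LOCKACKs

  // Begin mix->lock. Only the local gather starts here.
  void start_lock() {
    assert(state == LockState::Mix);
    state = LockState::MixLock;
    eval_gather();
  }

  // Release one local wrlock and re-evaluate the gather.
  void put_wrlock() {
    assert(local_wrlocks > 0);
    --local_wrlocks;
    eval_gather();
  }

  // A replica acknowledged the gather.
  void handle_lockack() {
    assert(state == LockState::MixLock2 && replica_acks_pending > 0);
    --replica_acks_pending;
    eval_gather();
  }

  void eval_gather() {
    // Phase 1: wait for local wrlocks, then ask the replicas.
    if (state == LockState::MixLock && local_wrlocks == 0) {
      state = LockState::MixLock2;
      replica_acks_pending = num_replicas;
    }
    // Phase 2: wait for every replica to ack.
    if (state == LockState::MixLock2 && replica_acks_pending == 0)
      state = LockState::Lock;
  }
};
```

In this model a replica cannot send its LOCKACK while the auth still holds a
wrlock, because the replica gather has not been started yet.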
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
You can't look at pending_inc in preprocess methods, or return an error
based on pending_inc before it commits. Fix up the snap-related error
checking.
Rename _pool_op() to _pool_op_reply().
Signed-off-by: Sage Weil <sage@newdream.net>
Recent kernels got the new CEPH_LOCK_DN definition but we were still
setting the old bit. Set both so we work with both classes of clients. In
the meantime, update the kernel to ignore this field so that eventually we
can drop/reuse it.
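
A minimal sketch of the "set both" approach. The constant values below are
placeholders chosen for illustration; the real definitions live in the Ceph
headers and are not reproduced here. The helper names are hypothetical too.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder values, NOT the real Ceph definitions.
constexpr uint16_t LOCK_DN_OLD = 1;  // hypothetical bit older clients check
constexpr uint16_t LOCK_DN_NEW = 2;  // hypothetical bit newer clients check

// Set both bits so either class of client recognizes the dentry lease.
inline uint16_t dentry_lease_mask() {
  return LOCK_DN_OLD | LOCK_DN_NEW;
}

// How each class of client would test the mask in this sketch.
inline bool old_client_sees_dn(uint16_t mask) {
  return (mask & LOCK_DN_OLD) != 0;
}
inline bool new_client_sees_dn(uint16_t mask) {
  return (mask & LOCK_DN_NEW) != 0;
}
```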
Signed-off-by: Sage Weil <sage@newdream.net>
We need to know the Ceph absolute path. We can't actually
derive that for sure (if we aren't mounted into the root), but this
at least lets us deal with being in our own subdirectories.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
- work in current directory, not hardcoded mnt path
- use CEPH_TOOL variable rather than hardcoded local executable
- pass CEPH_ARGS to scripts so you don't need to export it into the
  environment
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
The scatter_writebehind_finish() is always followed up by an eval_gather(),
which does the clear_flushed(). For everyone else (replicas!), we need to
clear it immediately to avoid confusing things later.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
finish() requires the caller to delete. complete() does that for you by
calling finish() and then doing delete this, unless you override it and
do something else. This will allow us to make Contexts that are reusable,
for example, by overriding complete() instead of finish() and managing
the lifecycle in some other way.
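
The finish()/complete() split can be sketched as follows. This is a
simplified stand-in for the interface described above, not the actual
header; C_SetResult and C_Counter are hypothetical subclasses made up for
the example.

```cpp
#include <cassert>

// Simplified stand-in for the Context interface.
class Context {
public:
  virtual ~Context() {}
  virtual void finish(int r) = 0;
  // Default lifecycle: run finish(), then delete ourselves.
  virtual void complete(int r) {
    finish(r);
    delete this;
  }
};

// One-shot context: heap allocated, freed by the default complete().
class C_SetResult : public Context {
  int* out;
public:
  explicit C_SetResult(int* o) : out(o) {}
  void finish(int r) override { *out = r; }
};

// Reusable context: overrides complete() so nothing is deleted, and the
// owner manages the object's lifetime some other way (here, the stack).
class C_Counter : public Context {
public:
  int calls = 0;
  int last = 0;
  void finish(int) override {}  // unused; complete() does the work
  void complete(int r) override {
    ++calls;
    last = r;
  }
};
```

A caller would use `(new C_SetResult(&result))->complete(r)` for the
one-shot case, and could call complete() repeatedly on a single C_Counter
without it being freed.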
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Logrotate ignores entries after a rule that doesn't match any files.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Sage Weil <sage@newdream.net>
If we are in XSYN state and want to move to anything else, we must go via
EXCL, but we may not be loner anymore. Weaken the file_excl() assert so we
don't crash.
Reported-by: Fyodor Ustinov <ufm@ufm.su>
Signed-off-by: Sage Weil <sage@newdream.net>