This ensures that we hold a wrlock on the srcdn auth when the slave
makes its changes to the src directory, and prevents us from corrupting
the scatterlock state.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
For the rename code to behave, we need to hold a wrlock on the slave node
to ensure that any racing gather (mix->lock) is not sent prior to the
_rename_prepare() running; otherwise we violate the locking rules and
corrupt rstats.
Implement a remote_wrlock that will be used by rename. The wrlock is held
on a remote node instead of the local node, and is set up similarly to
remote_xlocks.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
There is a problem with the wrlocks and cross-mds renames:
- master (dest auth, srci auth, srcdir replica) takes wrlock on srcdiri
- something triggers a srcdiri lock, putting inest/ifile lock in mix->lock
state
- slave (srcdir auth) sends LOCKACK
- master sends prepare_rename
- slave (srcdir auth) does rename prepare, which modifies srcdir
Even though the master holds a wrlock on the srcdiri, the gather starts
immediately and the slave sends the LOCKACK before the master's wrlock is
released.
To fix this, we add a new mix->lock(2) state, and we do not start the
mix->lock gather from replicas until the local gather completes, _after_
the auth's wrlock is released. This makes the master's wrlock sufficient
to ensure the prepare_rename on the slave is safe.
This also works when the slave is the srci auth, since the gather won't
complete until the master releases its wrlock. BUT, it does NOT work if a
third MDS is the srcdiri auth, since it can still gather from the slave
prior to the master releasing its wrlock.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
The scatter_writebehind_finish() is always followed up by an eval_gather(),
which does the clear_flushed(). For everyone else (replicas!), we need to
clear it immediately to avoid confusing things later.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
finish() requires the caller to delete. complete() does that for you by
calling finish() and then doing delete this, unless you override it to
do something else. This will allow us to make Contexts that are reusable,
for example, by overriding complete() instead of finish() and managing
the lifecycle in some other way.
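
As a rough illustration of the split between finish() and complete(), here
is a minimal sketch. The Context class below is a simplified stand-in, not
the actual header, and OneShot and Reusable are hypothetical names used only
for this example.

```cpp
#include <cassert>

// Simplified stand-in for the Context base class (assumption, not the
// real header).
class Context {
public:
  virtual ~Context() {}
  // Subclasses put the callback body here.
  virtual void finish(int r) = 0;
  // Default lifecycle: run the callback once, then free the object.
  virtual void complete(int r) {
    finish(r);
    delete this;
  }
};

// Hypothetical one-shot context: heap allocated, freed by complete().
class OneShot : public Context {
  int *out;
public:
  explicit OneShot(int *o) : out(o) {}
  void finish(int r) override { *out = r; }
};

// Hypothetical reusable context: overrides complete() so that it is
// not deleted and can be completed more than once.
class Reusable : public Context {
public:
  int calls = 0;
  int last = 0;
  void finish(int r) override { last = r; }
  void complete(int r) override {
    ++calls;
    finish(r);  // no "delete this": the owner manages the lifetime
  }
};
```

A caller would always invoke complete() and never delete the object itself;
only the Reusable variant may live on the stack or be completed repeatedly.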
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Logrotate ignores entries after a rule that doesn't match any files.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Sage Weil <sage@newdream.net>
If we are in XSYN state and want to move to anything else, we must go via
EXCL, but we may not be loner anymore. Weaken the file_excl() assert so we
don't crash.
Reported-by: Fyodor Ustinov <ufm@ufm.su>
Signed-off-by: Sage Weil <sage@newdream.net>
If the id is specified, mark a nonexistent osd rank as existing. The id
must fall within the current [0,max) range. This is the counterpart of
'osd rm <id>'.
If the id is not specified, allocate an unused osd id and set the EXISTS
flag. Increase max_osd as needed.
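
The allocation rule can be sketched roughly as follows. This is not the
monitor code: osd_create is a hypothetical helper, the vector stands in for
the per-osd EXISTS flags (its size plays the role of max_osd), and returning
-1 for an out-of-range or already existing id is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper. exists[i] models the EXISTS flag of osd i and
// exists.size() models max_osd. Returns the id that was marked as
// existing, or -1 on error (assumed error convention).
int osd_create(std::vector<bool>& exists, int id = -1) {
  if (id >= 0) {
    // Explicit id: must fall within [0,max) and must not exist yet.
    if (id >= static_cast<int>(exists.size()) || exists[id])
      return -1;
    exists[id] = true;
    return id;
  }
  // No id given: reuse the first unused id if there is one.
  for (std::size_t i = 0; i < exists.size(); ++i) {
    if (!exists[i]) {
      exists[i] = true;
      return static_cast<int>(i);
    }
  }
  // Otherwise grow max_osd by one and use the new slot.
  exists.push_back(true);
  return static_cast<int>(exists.size()) - 1;
}
```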
Closes: #1244
Signed-off-by: Sage Weil <sage@newdream.net>
We grew several copies of this code, and it turns out none of them were correct.
- assign flush tid in send_cap() helper
- pin inode on (dirty | flushing), not either/both
- add a proper mark_caps_flushing helper
and a bunch of other stuff. This brings this bit of code in alignment with
the kernel implementation.
Also call flush_caps() on cap import.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
We need to wait for the client to flush snapped caps if the client has
not already flushed for the given snap. If the client has already flushed
caps through the last snapid for the old inode, we do not need to set up
the snapped inode's locks to wait for that.
This fixes an occasional hang on the snaps/snaptest-multiple-capsnaps.sh
workunit.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Formerly, libraries did not respect the CEPH_CONF environment variable.
Now they do: it overrides the default path when reading the config file.
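
A minimal sketch of the lookup order, assuming the default path is passed in
by the caller. config_path is a hypothetical helper, and treating an empty
CEPH_CONF as unset is an assumption of this sketch.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: prefer CEPH_CONF when it is set and non-empty,
// otherwise fall back to the caller's default path.
std::string config_path(const std::string& default_path) {
  const char *env = std::getenv("CEPH_CONF");
  if (env && *env)
    return env;
  return default_path;
}
```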
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
This mirrors a kclient change a while back (e835124).
We only want to send one flushsnap cap message per MDS session:
- it's a waste to send multiples
- the mds will only reply to the first one
If the mds restarts we need to resend.
This fixes a hang where we send multiples, the first (and only) reply is
ignored (due to tid mismatch), and we are left with dangling references to
the inode and hang on umount. (Reliably reproduced by running the full
snaps/ workunit directory.)
Fixes: #1239
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Add an activate() function that must be called before we call the
onfinish callback. This is especially important in multi-threaded
contexts, since otherwise if completions arrive in the wrong order, we
may delete the C_Gather object right before calling new_sub on it!
Also delete rm_subs because it is redundant with sub_finish.
Finally, num_subs_created, num_subs_remaining are now methods on
C_GatherBuilder rather than C_Gather.
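
The ordering problem and the role of activate() can be sketched as below.
This is a single-threaded simplification, not the real classes: Gather and
GatherBuilder are hypothetical names, std::function stands in for Context,
and running the finisher directly when no subs were created is an assumption
of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Simplified gather: fires onfinish once all subs have completed AND
// activate() has been called, then deletes itself.
class Gather {
  std::function<void()> onfinish;
  int remaining = 0;
  bool activated = false;

  void maybe_finish() {
    if (activated && remaining == 0) {
      onfinish();
      delete this;  // nothing touches members after this point
    }
  }

public:
  explicit Gather(std::function<void()> f) : onfinish(std::move(f)) {}

  // Hand out a sub-completion; not allowed once activated.
  std::function<void()> new_sub() {
    assert(!activated);
    ++remaining;
    return [this] { --remaining; maybe_finish(); };
  }

  void activate() {
    activated = true;
    maybe_finish();
  }
};

// Stack helper: creates the Gather lazily and covers the no-subs case.
class GatherBuilder {
  std::function<void()> onfinish;
  Gather *gather = nullptr;
  bool activated = false;

public:
  explicit GatherBuilder(std::function<void()> f) : onfinish(std::move(f)) {}

  std::function<void()> new_sub() {
    assert(!activated);
    if (!gather)
      gather = new Gather(onfinish);
    return gather->new_sub();
  }

  void activate() {
    if (activated)
      return;
    activated = true;
    if (gather)
      gather->activate();  // may fire and delete the Gather right here
    else
      onfinish();          // assumption: no subs means finish immediately
  }
};
```

Because the finisher cannot run before activate(), subs that complete early
cannot delete the gather while more subs are still being created.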
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Filer.h now uses C_GatherBuilder to avoid memory leaks.
Also, C_GatherBuilder's constructor now takes a Context.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
C_Gather objects are deleted by the last sub-context to execute.
If you create a C_Gather object manually, you must worry about the case
where there are no sub-contexts.
C_GatherBuilder is a little object that sits on the stack that allows
you to build C_Gather objects without worrying about this.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
The past primary was sending out scrub unreserve messages to all the
non-primary OSDs in the acting set on a PG state change. They're
spurious since the other OSDs will cancel the scrubs themselves
on state change, and they weren't right anyway because the loop
was looking at all the non-primary OSDs and sending out a message,
which could have excluded the new primary (if it was a replica before),
included other OSDs new to the PG, and included the current OSD.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
The dentries we reference may have been unlinked prior to us sending this
request. That's fine as long as we don't dereference a null dentry.
Signed-off-by: Sage Weil <sage@newdream.net>