Note that this will let the parent nestlock 'dirty' state get out of
sync with the lock state, as the whole point of the dirty rstat lists is
that it can happen any time. It does, however, queue us up.
We move the part of predirty_journal_parents() that calls
project_rstat_inode_to_frag() into a common helper and use that.
Signed-off-by: Sage Weil <sage@newdream.net>
The importer also needs to scatter pin. This avoids scatterlock gather
races like so:
A: start exporting to B
A: freeze, scatter pin tree
C: initiate gather
A: delay reply to gather
B: reply to gather, do not include (non-auth) dirfrag
A,B: finish migration
A: reply to gather, do not include (now non-auth) dirfrag
C: gets no info about the dirfrag!
By pinning on the importer, we ensure that at least one MDS will respond
to the gather with auth dirfrag info.
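
The race above can be replayed in a toy model. This is only a sketch under stated assumptions: ToyMds and gatherer_sees_dirfrag() are hypothetical names, not MDS code, and the model assumes that a scatter-pinned node defers its gather reply until the pin is dropped.

```cpp
#include <cassert>

// Toy model of one MDS's view of the migrating dirfrag (hypothetical type).
struct ToyMds {
  bool is_auth;         // currently authoritative for the dirfrag?
  bool scatter_pinned;  // assumption: a pinned node defers its gather reply
};

// Returns true if the gatherer (C) receives auth dirfrag info from anyone.
bool gatherer_sees_dirfrag(bool importer_pins) {
  ToyMds a = {true, true};            // exporter: auth, pinned while frozen
  ToyMds b = {false, importer_pins};  // importer: not yet auth

  // C initiates the gather mid-migration. A pinned node defers its reply;
  // an unpinned node replies right away with whatever it is auth for.
  bool got_info = false;
  const bool a_deferred = a.scatter_pinned;
  const bool b_deferred = b.scatter_pinned;
  if (!a_deferred) got_info = got_info || a.is_auth;
  if (!b_deferred) got_info = got_info || b.is_auth;

  // Migration finishes: authority moves from A to B and the pins drop.
  assert(a.is_auth && !b.is_auth);
  a.is_auth = false;
  b.is_auth = true;
  a.scatter_pinned = false;
  b.scatter_pinned = false;

  // Deferred replies go out now, reflecting post-migration authority.
  if (a_deferred) got_info = got_info || a.is_auth;
  if (b_deferred) got_info = got_info || b.is_auth;
  return got_info;
}
```

In this model, gatherer_sees_dirfrag(false) reproduces the lost-dirfrag case, while gatherer_sees_dirfrag(true) has B reply as the new auth.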
Signed-off-by: Sage Weil <sage@newdream.net>
The accounted_rstat must always remain consistent with the parent dirfrag,
which in turn means it is governed by the parent's nestlock.
The rstat is protected by _this_ inode's nestlock, and is updated by
scatter_writebehind() or predirty_journal_parents().
Signed-off-by: Sage Weil <sage@newdream.net>
Be careful about when we update bounding dirfrag info during an import. If
the lock is in a MIX state, we do NOT want to update, since the inode
auth doesn't know jack (unless they are also dirfrag auth, in which case
we'll find out when we unscatter anyway).
Fixes fix 9d81f9d6.
This can cause the inode rstat etc to become out of sync with dirfrag
accounted_rstat when the scatterlock is not in a gathered state: the
local values will get updated but those on other nodes will not, and the
inode will drift out of sync with the dirfrags.
Other callers to scatter_writebehind() are all in contexts where we have
_just_ gathered dirfrag state, or there is no remote dirfrag state to
gather.
Signed-off-by: Sage Weil <sage@newdream.net>
This is simpler (for the migrator), and wrlocks allow scatter_writebehind,
which is a no-no for a frozen tree. By pinning the frozen dir's parent
inode, we prevent any scatter or unscatter operations from implicitly
updating metadata within the frozen root dirfrag.
An election can start either because we call it, or because someone else
calls it. Either way, we need to reset our state, so move that code into
the election_starting() callback, which is called by the elector's
start()/call_election() anyway.
This hopefully fixes a case where we see a timeout expire on the monitor
and fail the assertion:
mon/Paxos.cc: In function 'void Paxos::lease_timeout()':
mon/Paxos.cc:684: FAILED assert(mon->is_peon())
1: (SafeTimer::EventWrapper::finish(int)+0x259) [0x52da29]
2: (Timer::timer_entry()+0x8e3) [0x52f523]
3: (Timer::TimerThread::entry()+0xd) [0x46d45d]
4: (Thread::_entry_func(void*)+0xa) [0x458aca]
5: (()+0x6a3a) [0x7fe0bd6a4a3a]
6: (clone()+0x6d) [0x7fe0bc8c277d]
The Paxos::election_starting() hook resets the timer, and will at least
close this possible cause.
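
A minimal sketch of the idea, assuming a much simplified timer and role model. ToyTimer, ToyPaxos and Role are hypothetical stand-ins, not the real SafeTimer or Paxos classes; only the election_starting() and lease_timeout() names come from the text above.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical one-shot timer: holds at most one pending callback.
struct ToyTimer {
  std::function<void()> pending;
  void schedule(std::function<void()> cb) { pending = std::move(cb); }
  void cancel() { pending = nullptr; }
  // Fire the pending event, if any; returns true if a callback ran.
  bool fire() {
    if (!pending) return false;
    std::function<void()> cb = std::move(pending);
    pending = nullptr;
    cb();
    return true;
  }
};

enum class Role { Electing, Leader, Peon };

struct ToyPaxos {
  Role role = Role::Electing;
  ToyTimer timer;
  bool timed_out = false;

  // Becoming a peon arms the lease timeout.
  void become_peon() {
    role = Role::Peon;
    timer.schedule([this] { lease_timeout(); });
  }

  // Only valid while we are still a peon (mirrors the failed assertion).
  void lease_timeout() {
    assert(role == Role::Peon);
    timed_out = true;
  }

  // Called whenever an election starts, no matter who called it:
  // cancel the stale timer before the role changes.
  void election_starting() {
    timer.cancel();
    role = Role::Electing;
  }
};
```

Without the cancel() in election_starting(), a later fire() would reach lease_timeout() with role == Role::Electing and trip the assertion.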
Reported-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Make sure the straydn->first matches the rename target (destdnl->inode).
Unfortunately the cow happens _after_ the destdn->first is set, so instead
of trivially copying it, we dup the MAX calculation. Add some temp
variables to clean up similar code in this method.
Signed-off-by: Sage Weil <sage@newdream.net>
The dir commit/fetch and LogSegment::try_to_expire() rely on any new
items in the directory getting new versions that correspond to a bump in
the dirfrag version. This must include dentries/inodes that are created
by the cow process, or else we have problems during dir commit/fetch or
segment expire.
Change the dirty list in the Mutation to include the pv so that we can
properly mark them dirty later.
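
Roughly, the dirty list now carries the projected version alongside each dentry. The sketch below is illustrative only: ToyCowDentry and ToyMutation are hypothetical types, and the member names are assumptions rather than the actual Mutation fields.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <utility>

typedef uint64_t version_t;

// Hypothetical dentry: just the bits needed to show the idea.
struct ToyCowDentry {
  version_t version = 0;
  bool dirty = false;
  void mark_dirty(version_t pv) {
    version = pv;
    dirty = true;
  }
};

// Hypothetical mutation: remembers each cowed dentry together with the
// projected version (pv) it should carry once the update is applied.
struct ToyMutation {
  std::list<std::pair<ToyCowDentry*, version_t> > dirty_cow_dentries;

  void add_cow_dentry(ToyCowDentry* dn, version_t pv) {
    assert(dn != nullptr);
    dirty_cow_dentries.push_back(std::make_pair(dn, pv));
  }

  // Later, at apply time, mark each dentry dirty with its recorded pv.
  void apply() {
    for (auto& p : dirty_cow_dentries)
      p.first->mark_dirty(p.second);
    dirty_cow_dentries.clear();
  }
};
```

Recording the pv at projection time means the dentry ends up with a version that lines up with the dirfrag version bump, rather than whatever version happens to be current when apply() runs.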
Leave the inode one alone. We could theoretically do the same for the
dirty inodes, but this way we avoid projecting them and copying stuff
around. Any dirty cowed inode will also have a dirty dentry, so it will
still get saved regardless.
Signed-off-by: Sage Weil <sage@newdream.net>
We should only return the pdnvec for a full traverse. i.e., either a
success, or a failure in which we instantiate a null dn for the trailing
entry. This makes pdnvec well defined, and allows callers like
rdlock_path_pin_ref() to reply with a null lease when appropriate.
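
The intended pdnvec contract can be sketched on a toy tree. Everything here (ToyInode, ToyDentry, toy_traverse()) is a hypothetical illustration of the rule, not the real path traversal code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical in-memory tree node.
struct ToyInode {
  std::map<std::string, std::unique_ptr<ToyInode> > children;
};

// Hypothetical dentry record; is_null marks a negative (null) dentry.
struct ToyDentry {
  std::string name;
  bool is_null;
};

// Walks `path` from `root`. pdnvec is filled only for a full traverse:
// on success, or on a miss at the trailing component (ending in a null dn).
// A miss on an earlier component leaves pdnvec empty.
int toy_traverse(ToyInode& root, const std::vector<std::string>& path,
                 std::vector<ToyDentry>* pdnvec) {
  assert(!path.empty());
  if (pdnvec) pdnvec->clear();
  std::vector<ToyDentry> seen;
  ToyInode* cur = &root;
  for (std::size_t i = 0; i < path.size(); ++i) {
    auto it = cur->children.find(path[i]);
    if (it == cur->children.end()) {
      if (i + 1 == path.size()) {
        ToyDentry null_dn = {path[i], true};  // null dn for trailing entry
        seen.push_back(null_dn);
        if (pdnvec) *pdnvec = seen;
      }
      return -ENOENT;
    }
    ToyDentry dn = {path[i], false};
    seen.push_back(dn);
    cur = it->second.get();
  }
  if (pdnvec) *pdnvec = seen;
  return 0;
}
```

A caller can then treat a non-empty pdnvec whose last entry is null as the case where a null lease reply is appropriate.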
Signed-off-by: Sage Weil <sage@newdream.net>
The dentry needs a [first,last] range and we don't know what first is when
we miss a lookup. And part of the point of instantiating null dentries is
to issue leases against them, which we don't do. The client will cache
the null result.