This patch adds an "open-by-ino" helper. It uses the backtrace to find
an inode's path and open the inode. The algorithm is:
1. Check MDS peers. If any MDS has the inode in its cache, go to step 6.
2. Fetch the backtrace. If a backtrace was fetched previously and the
same backtrace is returned again, return -EIO.
3. Traverse the path in the backtrace. If the inode is found, go to
step 6; if a non-auth dirfrag is encountered, go to the next step. If
the inode cannot be found in its parent dir, go to step 1.
4. Ask MDS peers to traverse the path in the backtrace. If the inode
is found, go to step 6. An MDS peer stops traversing when it
encounters a non-auth dirfrag. If any MDS peer fails to find the inode
in its parent dir, go to step 1.
5. Use the same algorithm to open the inode's parent. Go to step 3 on
success; go to step 1 on failure.
6. Return the inode's auth MDS ID.
The algorithm has two main assumptions:
1. If an inode is in its auth MDS's cache, its on-disk backtrace
can be out of date.
2. If an inode is not in any MDS's cache, its on-disk backtrace
must be up to date.
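
The numbered steps can be read as a small state machine. The sketch
below is only an illustration of that control flow, not the code in
this patch: the Step and Outcome enums and next_step() are hypothetical
names, and all I/O and messaging is left out.

```cpp
#include <cassert>

// Hypothetical model of the six steps; none of these names come from Ceph.
enum class Step { CheckPeers, FetchBacktrace, TraverseLocal, TraversePeers,
                  OpenParent, Done, Error };

enum class Outcome {
  Found,            // inode located (peer cache hit or successful traversal)
  NotFound,         // step 1: no peer has the inode in its cache
  NewBacktrace,     // step 2: backtrace differs from the previous fetch
  SameBacktrace,    // step 2: the same backtrace was fetched again
  NonAuthDirfrag,   // steps 3/4: traversal stopped at a non-auth dirfrag
  MissingInParent,  // steps 3/4: the parent dir does not contain the inode
  ParentOpened,     // step 5: opening the parent succeeded
  ParentFailed      // step 5: opening the parent failed
};

// Returns the step to run next. Step::Error stands for "return -EIO".
Step next_step(Step s, Outcome o) {
  switch (s) {
  case Step::CheckPeers:
    return o == Outcome::Found ? Step::Done : Step::FetchBacktrace;
  case Step::FetchBacktrace:
    return o == Outcome::SameBacktrace ? Step::Error : Step::TraverseLocal;
  case Step::TraverseLocal:
    if (o == Outcome::Found) return Step::Done;
    if (o == Outcome::NonAuthDirfrag) return Step::TraversePeers;
    return Step::CheckPeers;  // not found in parent dir: start over
  case Step::TraversePeers:
    if (o == Outcome::Found) return Step::Done;
    if (o == Outcome::MissingInParent) return Step::CheckPeers;
    return Step::OpenParent;  // peers stopped at a non-auth dirfrag
  case Step::OpenParent:
    return o == Outcome::ParentOpened ? Step::TraverseLocal
                                      : Step::CheckPeers;
  default:
    assert(s == Step::Done || s == Step::Error);
    return s;  // terminal states stay where they are
  }
}
```

Restarting from step 1 after a failed lookup relies on the second
assumption: once no MDS has the inode cached, the on-disk backtrace is
expected to be current, so a repeated identical backtrace means -EIO.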
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
We may want to fetch a backtrace while the corresponding inode isn't
instantiated. MDCache::fetch_backtrace() will be used by a later
patch.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
To queue a backtrace update, the current code allocates a BacktraceInfo
structure and adds it to the log segment's update_backtraces list. The
main issue with this approach is that BacktraceInfo is independent
of the inode, which makes it very inconvenient to find the pending
backtrace updates for a given inode. When exporting inodes from one
MDS to another, we need to find and cancel all pending backtrace
updates on the source MDS.
This patch brings back the old backtrace handling code and adapts it
to the current backtrace format. The basic idea behind the old
code is: when an inode's backtrace becomes dirty, add the inode to
the log segment's dirty_parent_inodes list.
Compared to the current backtrace handling, another difference is
that the backtrace update is journalled in EMetaBlob::fullbit.
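
A minimal sketch of why tracking the inode itself helps, assuming toy
types: ToyInode, ToyLogSegment, mark_dirty_parent() and
clear_dirty_parent() are hypothetical stand-ins, and std::set replaces
whatever list type the real code uses. Only the dirty_parent_inodes
name comes from this message.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

struct ToyLogSegment;

// Hypothetical inode: remembers which segment holds its pending update.
struct ToyInode {
  ToyLogSegment* dirty_parent_seg = nullptr;
};

// Hypothetical log segment: lists inodes whose backtrace is dirty.
struct ToyLogSegment {
  std::set<ToyInode*> dirty_parent_inodes;
};

// Called when the inode's backtrace becomes dirty.
void mark_dirty_parent(ToyInode& in, ToyLogSegment& ls) {
  if (in.dirty_parent_seg)
    in.dirty_parent_seg->dirty_parent_inodes.erase(&in);
  ls.dirty_parent_inodes.insert(&in);
  in.dirty_parent_seg = &ls;
}

// Cancels the pending update, e.g. when the inode is exported.
void clear_dirty_parent(ToyInode& in) {
  if (!in.dirty_parent_seg) return;
  std::size_t erased = in.dirty_parent_seg->dirty_parent_inodes.erase(&in);
  assert(erased == 1);
  (void)erased;
  in.dirty_parent_seg = nullptr;
}
```

With this layout, cancelling on export needs no search: the inode
already knows which segment it is queued on.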
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
The current way to journal a backtrace update is to set
EMetaBlob::update_bt to true. The problem is that an EMetaBlob can
include several inodes. If an EMetaBlob's update_bt is true, the
journal replay code has to queue backtrace updates for all inodes in
the EMetaBlob.
This patch adds two new flags to class EMetaBlob::fullbit so that it
can journal backtrace updates per inode.
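
The difference between a blob-wide boolean and per-inode flags can be
sketched as follows. This message does not name the two flags, so
FLAG_DIRTY_PARENT and FLAG_DIRTY_POOL, like ToyFullbit and
replay_backtrace_updates(), are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-inode journal entry; flag names are assumed.
struct ToyFullbit {
  static constexpr std::uint8_t FLAG_DIRTY_PARENT = 1 << 0;  // assumed name
  static constexpr std::uint8_t FLAG_DIRTY_POOL = 1 << 1;    // assumed name
  std::uint64_t ino;
  std::uint8_t flags;
};

// On replay, queue a backtrace update only for inodes that carry a flag,
// instead of for every inode in the blob.
std::vector<std::uint64_t>
replay_backtrace_updates(const std::vector<ToyFullbit>& blob) {
  const std::uint8_t mask =
      ToyFullbit::FLAG_DIRTY_PARENT | ToyFullbit::FLAG_DIRTY_POOL;
  std::vector<std::uint64_t> queued;
  for (const ToyFullbit& fb : blob) {
    if (fb.flags & mask)
      queued.push_back(fb.ino);
  }
  return queued;
}
```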
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
When there is more than one active MDS, restarting an MDS triggers the
assertion "reconnected_snaprealms.empty()" quite often. If there
are no snapshots in the FS, the items left in reconnected_snaprealms
should be other MDSes' mdsdirs, which I think is harmless.
If there are snapshots in the FS, the assertion can probably catch
real bugs. But at present the snapshot feature is broken and fixing it
is non-trivial, so replace the assertion with a warning.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
If an MDiscover message is for discovering a base inode, want_base_dir
should be false and path should be empty.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
For a replica, a filelock in the LOCK_LOCK state doesn't allow the Fc
cap, so a filelock in the LOCK_SYNC_LOCK/LOCK_EXCL_LOCK state
shouldn't allow the Fc cap either.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
When an inode is freezing or frozen, we defer processing MClientCaps
messages and cap releases embedded in requests. The same deferral
logic should also cover MClientCapRelease messages.
After sending a cache rejoin message, a replica needs to notify the
auth MDS when cap_wanted changes. But it can send the MInodeFileCaps
message only after receiving the auth MDS's rejoin ack.
Locker::request_inode_file_caps() has the correct wait logic, but it
skips sending the MInodeFileCaps message if the auth MDS is still in
the rejoin state.
The fix is to defer sending the MInodeFileCaps message until the auth
MDS is active. It also makes the function's wait logic less tricky.
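
The deferral can be pictured as "queue a retry until the peer is
active". The sketch below is a simplified model under that assumption;
ToyCluster and its members are hypothetical and only borrow the name
request_inode_file_caps from the function mentioned above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

// Hypothetical model: ranks are ints, "sending" records the target rank.
struct ToyCluster {
  std::map<int, bool> active;  // rank -> is the MDS active?
  std::map<int, std::vector<std::function<void()>>> wait_for_active;
  std::vector<int> sent;       // ranks an MInodeFileCaps-like message went to

  void request_inode_file_caps(int auth) {
    if (!active[auth]) {
      // Do not drop the message: retry once the auth MDS is active.
      wait_for_active[auth].push_back(
          [this, auth] { request_inode_file_caps(auth); });
      return;
    }
    sent.push_back(auth);
  }

  void mds_became_active(int rank) {
    active[rank] = true;
    std::vector<std::function<void()>> waiters;
    waiters.swap(wait_for_active[rank]);
    for (auto& w : waiters) w();
  }
};
```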
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
CInode::mds_caps_wanted is used to keep track of caps wanted by non-auth
MDSes. The auth MDS checks it when choosing lock states.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
When the failure of a peer is detected, MDCache::handle_mds_failure()
checks whether there are requests waiting for slave replies from the
failed peer, and adds them to the "wait for active peer" list.
The "retry request" logic only covers slave requests sent before
MDCache::handle_mds_failure() is called. If a slave request is
sent while the peer isn't up, we wait for its reply forever.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
I previously added code to handle a corner case of cache rejoin: an
entire subtree, together with the inode that the subtree root belongs
to, was trimmed between sending the cache rejoin and receiving the
rejoin ack. In this case, we should send a cache expire message to the
subtree's auth MDS. But the code is completely broken, so remove it
temporarily.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
The current code uses the import state to detect obsolete import
discover/prep messages. It does not work in this case: cancel a subtree
import, import the same subtree again, and then the discover/prep
message for the first import gets dispatched.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
For unlink/rename requests, the target dentry's linkage may change
before all locks are acquired, so we need to check whether the existing
stray dentry is valid.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
An MDS may crash after journalling a slave commit but before sending
the commit ack to the master. When the MDS later restarts, it will not
send the commit ack, so the master waits for it forever. The fix is to
remove the failed MDS from the requests' uncommitted slave lists. When
the failed MDS recovers, its resolve message tells the master which
slave requests are not committed, and the master re-adds the
recovering MDS to the requests' uncommitted slave lists if
necessary.
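
A minimal sketch of the bookkeeping described above, assuming each
request keeps a set of slave ranks it still expects a commit ack from.
ToyMaster, its members, and the integer request ids are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>

using ReqId = int;  // hypothetical request identifier
using Rank = int;   // hypothetical MDS rank

struct ToyMaster {
  // request -> slaves whose commit ack is still outstanding
  std::map<ReqId, std::set<Rank>> uncommitted_slaves;

  // A failed MDS will never send the ack, so stop waiting for it.
  void handle_mds_failure(Rank who) {
    for (auto& entry : uncommitted_slaves)
      entry.second.erase(who);
  }

  // The resolve message lists requests the recovering MDS has not
  // committed; wait for it again on those requests only.
  void handle_resolve(Rank who, const std::set<ReqId>& not_committed) {
    for (ReqId r : not_committed) {
      auto it = uncommitted_slaves.find(r);
      if (it != uncommitted_slaves.end())
        it->second.insert(who);
    }
  }

  bool waiting_for_acks(ReqId r) const {
    auto it = uncommitted_slaves.find(r);
    return it != uncommitted_slaves.end() && !it->second.empty();
  }
};
```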
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
New waiters may be added while the master is committing, so we should
take the waiters and wake them up once the master has committed.
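
One way to read "take the waiters" is to move the whole list out when
the commit finishes and then run it. The sketch below assumes that
reading; ToyMasterRequest and its members are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

struct ToyMasterRequest {
  bool committing = false;
  std::vector<std::function<void()>> waiters;

  // Waiters may be added at any time, including during the commit.
  void add_waiter(std::function<void()> fn) {
    waiters.push_back(std::move(fn));
  }

  // Called once the master has committed: take every queued waiter,
  // including those added while committing, and wake them up.
  void finish_commit() {
    committing = false;
    std::vector<std::function<void()>> taken;
    taken.swap(waiters);
    for (auto& fn : taken) fn();
  }
};
```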
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
We only journal the finish of a subtree export, so we shouldn't
consider export bounds as subtree roots.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Call AioCompletion::release() to free the resources once the
completion is no longer needed.
CID 727981 (#3 of 3): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable "top_aioc" going out of scope leaks the
storage it points to.
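
The leak pattern can be shown with a stand-in type. ToyCompletion and
do_io() below are hypothetical; only the release() call mirrors the
AioCompletion::release() named above, and the live counter exists just
to make the leak observable.

```cpp
#include <cassert>

// Stand-in for a heap-allocated completion that must be released by hand.
struct ToyCompletion {
  static int live;  // number of completions not yet released
  ToyCompletion() { ++live; }
  void release() {
    assert(live > 0);
    --live;
    delete this;
  }
private:
  ~ToyCompletion() = default;  // force callers to go through release()
};
int ToyCompletion::live = 0;

// Returns 0 on success, -1 if submitting the I/O fails.
int do_io(bool submit_fails) {
  ToyCompletion* c = new ToyCompletion();
  if (submit_fails) {
    c->release();  // error path: without this the completion leaks
    return -1;
  }
  // Pretend the I/O finished and the result was consumed.
  c->release();
  return 0;
}
```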
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Call AioCompletion::release() to free the resources once the
completion is no longer needed.
CID 727983: Resource leak (RESOURCE_LEAK)
leaked_storage: Variable "aioc" going out of scope leaks the
storage it points to.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>