The PGLog (merge) code still uses the pg_log_t entries, but operator<<
(called by the message printer) doesn't look at them. Document this.
Signed-off-by: Sage Weil <sage@redhat.com>
The RBD C API now supports scatter/gather IO. The C++ API uses
bufferlists, which can also support scatter/gather IO.
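
For readers unfamiliar with the term, scatter/gather IO means one call
writes from, or reads into, several separate buffers. The sketch below
illustrates the idea with POSIX writev()/readv() over a pipe. It is not
the librbd API, and the helper name scatter_gather_roundtrip is made up
for this illustration.

```cpp
#include <cassert>
#include <limits.h>
#include <string>
#include <sys/uio.h>
#include <unistd.h>

// Gather: write two separate buffers with a single writev() call.
// Scatter: read the bytes back into two separate buffers with readv().
// Returns the reassembled data, or an empty string on error.
std::string scatter_gather_roundtrip(const std::string& head,
                                     const std::string& tail) {
  // Keep the payload small so the pipe write is atomic and cannot block.
  assert(head.size() + tail.size() <= static_cast<size_t>(PIPE_BUF));

  int fds[2];
  if (pipe(fds) != 0)
    return "";

  iovec out[2];
  out[0].iov_base = const_cast<char*>(head.data());
  out[0].iov_len = head.size();
  out[1].iov_base = const_cast<char*>(tail.data());
  out[1].iov_len = tail.size();
  ssize_t written = writev(fds[1], out, 2);
  close(fds[1]);

  std::string a(head.size(), '\0');
  std::string b(tail.size(), '\0');
  iovec in[2];
  in[0].iov_base = &a[0];
  in[0].iov_len = a.size();
  in[1].iov_base = &b[0];
  in[1].iov_len = b.size();
  ssize_t got = readv(fds[0], in, 2);
  close(fds[0]);

  if (written < 0 || got != written)
    return "";
  return a + b;
}
```

Calling scatter_gather_roundtrip("hello, ", "world") sends both buffers
with one system call and fills two destination buffers with another.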
Fixes: http://tracker.ceph.com/issues/13025
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
MDCache::handle_cache_rejoin_strong() may add new inodes (race with
cache expire). These inodes are only updated at the very end of the
function. Before they get updated, MDCache::handle_cache_rejoin_strong()
may add dentries to them. So the dir_hash type of these inodes
should be set to the default value.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
MDSRankDispatcher::handle_mds_map() calls kick_discovers() when
the recovering mds enters rejoin state. No need to call it again
when the recovering mds enters clientreplay/active state.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Commit 22535340 tried to fix a race between cache expire and
MDentryLink. It avoids trimming a null dentry whose lock is
not readable. The fix does not handle the case where the MDS
first receives an MDentryUnlink message, then receives an
MDentryLink message.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
When handling a trans-authority rename, the master mds may ask a slave
mds to wrlock a lock, then try to wrlock the same lock locally.
If the master can't wrlock the lock locally, it needs to drop the
remote wrlock and wait. Otherwise a deadlock happens. The code does
not handle a corner case: Lock::wrlock_start() can sleep even
when SimpleLock::can_wrlock() returns true.
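
A minimal sketch of the back-off rule described above, using toy types.
ToyLock, ToyRequest and wrlock_locally_or_back_off are hypothetical
stand-ins, not the MDS classes, and the needs_wait flag is an assumption
that models "the start call can sleep even though can_wrlock() returns
true". Under that assumption, the decision to drop the remote wrlock
follows the outcome of the start call rather than the can_wrlock() check.

```cpp
#include <cassert>

// Hypothetical toy lock: can_wrlock() may say yes while wrlock_start()
// still has to wait (modelled by needs_wait).
struct ToyLock {
  bool wrlocked = false;
  bool needs_wait = false;
  bool can_wrlock() const { return !wrlocked; }
  bool wrlock_start() {
    if (!can_wrlock() || needs_wait)
      return false;  // caller must wait and retry
    wrlocked = true;
    return true;
  }
};

// Hypothetical toy request that tracks whether a remote wrlock is held.
struct ToyRequest {
  bool remote_wrlocked = false;
};

// Try the local wrlock. If it cannot be taken right now, drop the remote
// wrlock before waiting so the two sides cannot block each other.
bool wrlock_locally_or_back_off(ToyLock& lock, ToyRequest& req) {
  if (lock.wrlock_start())
    return true;
  req.remote_wrlocked = false;
  return false;
}

// Usage example: can_wrlock() is true, yet the start call must wait,
// so the remote wrlock is released.
void example_back_off() {
  ToyLock lock;
  lock.needs_wait = true;
  ToyRequest req;
  req.remote_wrlocked = true;
  assert(lock.can_wrlock());
  assert(!wrlock_locally_or_back_off(lock, req));
  assert(!req.remote_wrlocked);
}
```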
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
If early reply is not allowed for an open request, the MDS does not
send a reply to the client immediately after adding new caps. Later,
when the MDS sends the reply, the client session can be in stale state.
If the MDS does not issue the new caps to the client along with the
reply, the new caps get lost. This issue can cause the MDS to hang
while revoking caps.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
An mds can do a slave rename that moves a directory inode (whose
dirfrags are all non-auth) to a new auth, then roll back the slave
rename. If there is an ESubtreeMap event between the log event of the
slave rename and the log event of the rollback, the ESubtreeMap does
not have information about the inode's non-auth dirfrags.
Later, when the mds replays the log, the log event of the slave rename
can be missing. So the mds needs to re-create subtree bounds when
replaying the log event of the rename rollback.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
The linkage of the rename source dentry may change while freezing the
auth pin for the rename source inode. So we may freeze the auth pin
for the wrong inode.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
When doing a trans-authority rename, the master mds may send two slave
requests to the auth mds of the rename source inode. The first slave
request sets ambiguous auth on the rename source inode. The second
slave request is sent after receiving all bystanders' slave request
replies.
Current code uses the mdr->more()->is_ambiguous_auth bit to indicate
whether the first slave request was sent. The is_ambiguous_auth bit is
set when calling Server::_rename_prepare_witness(). This causes a
problem if Server::_rename_prepare_witness() can't send the slave
request immediately and wants to retry the MDRequest later. The fix is
to set is_ambiguous_auth when receiving the reply for the first slave
request.
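
The ordering change can be sketched with a toy model. ToyRequest,
prepare_witness and handle_first_reply are hypothetical names used only
for illustration and do not mirror the real Server code. The flag is
left untouched when the request cannot be sent, so a later retry still
sends the first slave request.

```cpp
#include <cassert>

// Hypothetical toy request. is_ambiguous_auth records that the first
// slave request has been acknowledged.
struct ToyRequest {
  bool is_ambiguous_auth = false;
  int first_requests_sent = 0;
};

// Returns true if the first slave request was sent by this call.
// When sending is not possible, the flag stays false so a retry works.
bool prepare_witness(ToyRequest& r, bool can_send_now) {
  if (r.is_ambiguous_auth)
    return false;  // first request already acknowledged
  if (!can_send_now)
    return false;  // retry later, nothing recorded yet
  ++r.first_requests_sent;
  return true;
}

// The flag is set only when the reply to the first request arrives.
void handle_first_reply(ToyRequest& r) {
  r.is_ambiguous_auth = true;
}

// Usage example: a failed attempt followed by a successful retry.
void example_retry() {
  ToyRequest r;
  assert(!prepare_witness(r, false));  // cannot send yet
  assert(!r.is_ambiguous_auth);        // so the retry is not skipped
  assert(prepare_witness(r, true));    // retry sends the request
  handle_first_reply(r);
  assert(r.is_ambiguous_auth);
  assert(r.first_requests_sent == 1);
}
```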
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
When the mds cluster is in rejoin state, we know all mds have finished
their exports. All export abort notifications have been processed
by bystander mds. So it's safe to disambiguate other mds' imports.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
To disambiguate another mds' failed import, a surviving bystander mds
needs to receive either the exporter mds' export abort notification or
the exporter mds' resolve message. For a bystander mds, it's hard to
distinguish "export succeeded" from the case "hasn't received
export abort notification".
To handle this problem, we rely on the fact that a survivor mds does
not send a resolve message to the recovering mds until it finishes
all its exports. Without the resolve message, the recovering mds
can't go to rejoin state. So when the mds cluster is in rejoin state,
we know all mds have finished their exports. If export abort
notifications also require acknowledgments, then when the mds cluster
is in rejoin state we know all export abort notifications have been
processed by bystander mds. So a bystander mds can disambiguate other
mds' imports.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
When the auth mds of the rename source dentry fails, slave updates in
witness mds become ambiguous. Witnesses need to ask the master if they
should roll back the updates. This type of rollback is special: the
corresponding MDRequest struct needs to be preserved after rollback.
If the master mds also fails, slave updates in witness mds are no
longer special. The corresponding MDRequest struct needs to be cleaned
up after rollback. See commit e62e48bb for more information.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
When a survivor mds sends a resolve message to the recovering mds, it
also records committing slave requests in the message. So the
recovering mds knows the slave commit is still being journaled. It
journals the master commit after receiving the corresponding
OP_COMMITTED message.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
When handling mds failure, we need to distinguish committing and
'rolling back' slave requests from unprepared slave requests.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
EMetaBlob::add_dir_context() skips adding inodes that have already
been journaled in the last ESubtreeMap. The log replay code only
replays the first ESubtreeMap. For the remaining ESubtreeMaps, it just
verifies that the subtree map in the cache matches the ESubtreeMap. If
unnecessary inodes were included in a non-first ESubtreeMap, these
inodes do not get added to the cache, and the log replay code can
find them missing when replaying the remaining events in the log
segment.
The previous attempt (commit a9b959dfb7) to fix this issue is not
complete. This patch makes MDCache::create_subtree_map() journal
dirfrags according to the simplified subtree map. It should fix this
issue completely.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
DEB packaging builds happen with LTTNG enabled but are missing a few
files.
* libosd_tp.so*, libos_tp.so* are needed to trace the OSD
* librados_tp.so, librbd_tp.so are needed along with the other files
  for trace visibility within the lttng tool.
Signed-off-by: Ganesh Mahalingam <ganesh.mahalingam@intel.com>