An inode with multiple hardlinks is attached to the global snaprealm.
Before modifying a hardlink, record the snaps that reference the
hardlink. When all hardlinks are removed, the stray inode gets moved
into a normal snaprealm. By checking the recorded snaps, the mds knows
whether any snaps still reference the stray inode.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
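
As a rough illustration of the bookkeeping in the commit above, the following Python sketch models an inode that records the snaps referencing a hardlink before that hardlink is modified. It is a toy model under stated assumptions, not the MDS code: `ToyInode`, `record_snaps`, `unlink`, and `still_referenced` are hypothetical names, and snaps are plain integers.

```python
class ToyInode:
    """Toy model of a multiply-linked inode (all names are hypothetical)."""

    def __init__(self, links):
        self.links = set(links)       # names of the hardlinks to this inode
        self.recorded_snaps = set()   # snaps known to reference a hardlink

    def record_snaps(self, snaps_covering_link):
        # Called before a hardlink is modified: remember which snaps
        # reference it so the information survives the modification.
        self.recorded_snaps |= set(snaps_covering_link)

    def unlink(self, name, snaps_covering_link):
        self.record_snaps(snaps_covering_link)
        self.links.discard(name)

    def is_stray(self):
        # With no hardlinks left, the inode is a stray.
        return not self.links

    def still_referenced(self, existing_snaps):
        # A stray inode must be kept while any recorded snap still exists.
        return bool(self.recorded_snaps & set(existing_snaps))


inode = ToyInode(links={"a/file", "b/file"})
inode.unlink("a/file", snaps_covering_link={1, 2})
inode.unlink("b/file", snaps_covering_link={3})
```

Once both links are gone the inode is a stray, and `still_referenced` answers whether it can be purged given the set of snaps that still exist.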
The dummy global snaprealm includes all snapshots in the filesystem.
For any later snapshot, mds will COW the inode and preserve snap data.
These snap data will cover any possible snapshot on remote linkages
of the inode.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
This simplifies trans-authority rename. The master can prepare a new
snaprealm for the source inode even if it is not the auth mds.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
When creating a new snaprealm, we need to split its parent snaprealm's
inodes_with_caps. If the new snaprealm is created during rename, the
inode's original snaprealm's inodes_with_caps should be split. So in
the rename/rmdir cases, we should pop the projected snaprealm before
the inode's parent changes.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
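
The ordering constraint in the commit above can be illustrated with a small toy model. Everything here is an assumption made for illustration: `ToyRealm`, `ToyDir`, `pop_projected_snaprealm`, and `rename` are simplified stand-ins rather than the MDS functions, and inodes are plain strings.

```python
class ToyRealm:
    def __init__(self, name):
        self.name = name
        self.inodes_with_caps = set()


class ToyDir:
    def __init__(self, name, realm, children):
        self.name = name
        self.realm = realm              # realm the directory currently falls under
        self.children = set(children)   # inodes below this directory


def pop_projected_snaprealm(d):
    """Give `d` its own realm by splitting the realm it currently falls under."""
    new_realm = ToyRealm(d.name)
    moved = d.realm.inodes_with_caps & d.children
    d.realm.inodes_with_caps -= moved
    new_realm.inodes_with_caps |= moved
    d.realm = new_realm
    return new_realm


def rename(d, dest_realm, pop_first):
    if pop_first:
        # Split while `d` still belongs to its original realm.
        return pop_projected_snaprealm(d)
    # Parent changes first, so the split is applied to the wrong realm.
    d.realm = dest_realm
    return pop_projected_snaprealm(d)


def run(pop_first):
    src, dst = ToyRealm("src"), ToyRealm("dst")
    src.inodes_with_caps = {"d/f1", "d/f2", "other"}
    d = ToyDir("d", src, {"d/f1", "d/f2"})
    new_realm = rename(d, dst, pop_first)
    return src.inodes_with_caps, new_realm.inodes_with_caps


good = run(pop_first=True)
bad = run(pop_first=False)
```

In this model, popping first moves the directory's inodes out of the original realm, while popping after the parent change leaves them behind in it.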
Unlike locks of other types, an isnap lock or dentry lock in an
unreadable state can block path traversal, so these locks should be
kept in the sync state as much as possible.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
In a multimds setup, it's possible that an mds receives the snap update
message after receiving client requests that look up the newly created
snapshot.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
For a new-format snaprealm, there is no need to open past parents:
SnapRealm::have_past_parents_open() always returns true. In a multimds
setup, an mds may use a snaprealm without opening past parents, so the
assertion in SnapRealm::check_cache() is wrong.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
rmdir and rename may create/update snaprealms. If snaprealms are
created/updated, encode the updated snaprealms in slave requests
and dentry unlink messages, so that when rmdir or rename finishes,
snaprealms on different mds are in sync.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Send CEPH_SNAP_OP_SPLIT and CEPH_SNAP_OP_UPDATE messages to clients
centrally in MDCache::open_snaprealms().
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
The basic idea is:
1. For a recovering mds:
   Learn other mds' pending snaptable commits from resolve messages.
   Load the snaptable cache from the snapserver when resolve is done.
2. For a survivor mds:
   Refresh the snaptable cache from the snapserver when the cluster is
   in the resolving state.
   Learn the recovering mds' pending snaptable commits from resolve
   messages.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
After a snaptable update gets prepared, push the update preparation to
all active snaptable clients, then send the reply to the update
initiator. This way, the initiator knows that all mds have recorded the
update preparation in their caches. When committing the snaptable
update, the initiator notifies all mds about the commit. A bystander
mds' snaptable cache gets synchronized when it receives the
notification.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
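
The message flow in the commit above can be sketched as a toy in-process model. This is only an illustration under stated assumptions: `ToyMds`, `ToySnapServer`, `notify_prepare`, `notify_commit`, and `commit_update` are hypothetical names, messages are plain method calls, and failures are not modelled.

```python
class ToyMds:
    def __init__(self, rank):
        self.rank = rank
        self.pending = {}    # tid -> snapid, prepared but not yet committed
        self.snaps = set()   # committed snaps in the local cache

    def notify_prepare(self, tid, snapid):
        # Record the preparation, then acknowledge it.
        self.pending[tid] = snapid
        return "ack"

    def notify_commit(self, tid):
        # Apply the committed update to the local cache.
        self.snaps.add(self.pending.pop(tid))


class ToySnapServer:
    def __init__(self, clients):
        self.clients = clients
        self.next_tid = 1

    def prepare(self, snapid):
        tid = self.next_tid
        self.next_tid += 1
        # Push the preparation to every active client before replying.
        acks = [c.notify_prepare(tid, snapid) for c in self.clients]
        if any(a != "ack" for a in acks):
            raise RuntimeError("a client did not acknowledge the prepare")
        return tid   # stands in for the reply to the initiator


def commit_update(all_mds, tid):
    # The initiator tells every mds (itself included) about the commit.
    for m in all_mds:
        m.notify_commit(tid)


mds = [ToyMds(0), ToyMds(1), ToyMds(2)]
server = ToySnapServer(mds)
tid = server.prepare(snapid=10)
pending_everywhere = all(tid in m.pending for m in mds)
commit_update(mds, tid)
```

Because `prepare` returns only after every client has acknowledged, the initiator can rely on `pending_everywhere` holding by the time it sees the reply.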
To get the effective snaps in past snaprealms, we just need to filter
out deleted snaps by using the global snap infos. This avoids the
complexity of opening 'past parents'.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
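
A minimal sketch of the filtering idea from the commit above, assuming snap ids are integers and the global snap infos are a mapping from snap id to name. The function name `effective_snaps` and the sample data are hypothetical.

```python
def effective_snaps(past_realm_snaps, global_snap_infos):
    """Return the snaps of a past realm that still exist globally."""
    return sorted(s for s in past_realm_snaps if s in global_snap_infos)


# Snaps 1 and 7 were deleted, so they are absent from the global infos.
global_snap_infos = {2: "snap-b", 5: "snap-e", 9: "snap-i"}
past_realm_snaps = [1, 2, 5, 7]
live = effective_snaps(past_realm_snaps, global_snap_infos)
```

The lookup needs only the past realm's snap list and the global table, so nothing about the past parent itself has to be opened.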
The idea is to cache both snap infos and pending updates in snapclient.
The snapclient also tracks updates that are being committed and applies
these commits to its cached snap infos. Steps to update snaptable are:
- mds.x acquires locks (xlock on snaplock of affected snaprealm inode)
- mds.x prepares snaptable update. (sends prepare to snapserver and waits
  for 'agree' reply)
- snapserver sends notification about the update to all mds and waits
  for ACKs. (not implemented by this patch)
- snapserver sends 'agree' reply to mds.x
- mds.x journals corresponding
- mds.x commits the snaptable update and notifies all mds that it
  commits that update. Then the mds drops locks.
When receiving a committing notification, an mds applies the committing
update to its cached snap infos. This way, cached snap infos get
synchronized before the snaplock becomes readable.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
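
The client-side cache described in the commit above might look roughly like the following toy model. It is a sketch under assumptions, not the snapclient code: `ToySnapClient` and its methods are hypothetical, updates are limited to "create" and "destroy", and committing updates are applied when the cache is read rather than stored back.

```python
class ToySnapClient:
    def __init__(self, snaps):
        self.cached_snaps = dict(snaps)   # snapid -> name
        self.pending = {}                 # tid -> (op, snapid, name)
        self.committing = set()           # tids known to be committing

    def notify_prepare(self, tid, op, snapid, name=None):
        # Cache a prepared update; it does not affect readers yet.
        self.pending[tid] = (op, snapid, name)

    def notify_commit(self, tid):
        if tid not in self.pending:
            raise KeyError(f"unknown tid {tid}")
        self.committing.add(tid)

    def get_snaps(self):
        # Cached snap infos with all committing updates applied.
        view = dict(self.cached_snaps)
        for tid in sorted(self.committing):
            op, snapid, name = self.pending[tid]
            if op == "create":
                view[snapid] = name
            else:
                view.pop(snapid, None)
        return view


client = ToySnapClient({1: "s1", 2: "s2"})
client.notify_prepare(10, "create", 3, "s3")
client.notify_prepare(11, "destroy", 1)
before = client.get_snaps()
client.notify_commit(10)
client.notify_commit(11)
after = client.get_snaps()
```

Prepared updates stay invisible to `get_snaps` until their commit notification arrives, which mirrors the requirement that the cache is in sync before the snaplock becomes readable.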
This is preparation for a later change that caches the snaptable in
snapclient and syncs the cached snaptable between mds.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
* refs/pull/19954/head:
test/encoding: refactor to avoid escaping shell magic
mds: minor refactor of SimpleLock
mds: track Capability in mempool
mds: move CInode container members to mempool
mds: move CDentry container members to mempool
mds: move CDir container members to mempool
mds: put MDSCacheObject compact_map in mempool
common: use size_t for object size
mds: convert to allocator agnostic string_view
mds: simplify initialization
compact_*: support mempool allocated containers
Reviewed-by: Zheng Yan <zyan@redhat.com>
* refs/pull/20190/head:
mon: allow removal of tier of ec overwritable pool
Reviewed-by: John Spray <john.spray@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: João Eduardo Luís <joao@suse.de>