At the very beginning, this test unmounts the first client and then
deletes the CephFS already present on the cluster. But it leaves the
second client mounted on the deleted CephFS for the rest of the test.
At the very end (during tearDown()), the test attempts to remount the
second client, which hangs and causes the test runner to crash.
Unmount the second client beforehand to prevent the bug, and delete the
mount_b object so that future readers are not confused about whether
the 2nd mountpoint exists.
Fixes: https://tracker.ceph.com/issues/66077
Signed-off-by: Rishabh Dave <ridave@redhat.com>
After the mirror daemon is blocklisted or failed, wait for it to
restart, which happens after a 30-second timeout, and then check for
the new rados_inst.
Fixes: https://tracker.ceph.com/issues/64927
Signed-off-by: Jos Collin <jcollin@redhat.com>
* refs/pull/57619/head:
qa/cephfs: use wait_for_daemon() instead of sleep()-ing
qa/cephfs: mark file system joinable for fs rename tests before unmounting clients
Reviewed-by: Rishabh Dave <ridave@redhat.com>
* refs/pull/53503/head:
qa: add tests for `mds last-seen` command
doc/cephfs: add documentation for `mds last-seen`
PendingReleaseNotes: add note on last-seen command
mon/MDSMonitor: add command to lookup when mds was last seen
mon/MDSMonitor: set birth time on FSMap during encode
pybind/mgr/dashboard: show context diff for openapi check
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Add a test class, test_misc.TestSessionClientEvict, which contains
tests for the issues mentioned in this PR.
Fixes: https://tracker.ceph.com/issues/58619
Signed-off-by: Neeraj Pratap Singh <neesingh@redhat.com>
When clients or sessions are evicted while the
mds_session_blocklist_on_evict option is disabled, the clients should
reconnect to the MDS successfully once new IOs are sent.
URL: https://tracker.ceph.com/issues/65647
Signed-off-by: Xiubo Li <xiubli@redhat.com>
The asok interface will mangle stdout if the command actually fails.
`flush path` is run via the asok interface because these tests were
written before the tell/asok interfaces were unified, when `flush path`
was only available via the asok interface.
Fixes: https://tracker.ceph.com/issues/66184
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The new name makes these tests easier to find because it resembles the
health warning (MDS_CACHE_OVERSIZED) that they test for.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
1. Instead of accepting a health report as an argument, get one directly.
2. Since it is not used elsewhere, move it to the class where it is
used.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Code to generate MDS_TRIM and MDS_CACHE_OVERSIZED health warnings is
repeated in the test methods of TestMDSFail and TestFSFail. Move this
code to separate helper methods so that it can be reused instead of
duplicated. Also move these helper methods to TestAdminCommands to
make them conveniently available for reuse.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
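
The refactoring described above, where warning-generating code moves into
helper methods on a shared class, might look roughly like the sketch
below. All class and method names here are hypothetical stand-ins, and an
in-memory set replaces the real cluster health report; this is not the
actual qa code.

```python
import unittest


class AdminCommandsBase(unittest.TestCase):
    """Hypothetical stand-in for a shared class such as TestAdminCommands."""

    def setUp(self):
        # Assumption: an in-memory set stands in for the cluster health report.
        self.health_warnings = set()

    def gen_health_warn_mds_cache_oversized(self):
        # A real helper would push the MDS cache past its limit; here we
        # only record the warning name.
        self.health_warnings.add("MDS_CACHE_OVERSIZED")

    def gen_health_warn_mds_trim(self):
        # Likewise a placeholder for whatever triggers MDS_TRIM.
        self.health_warnings.add("MDS_TRIM")


class MDSFailSketch(AdminCommandsBase):
    def test_with_cache_oversized(self):
        # Reuses the shared helper instead of duplicating its body.
        self.gen_health_warn_mds_cache_oversized()
        self.assertIn("MDS_CACHE_OVERSIZED", self.health_warnings)


class FSFailSketch(AdminCommandsBase):
    def test_with_trim(self):
        self.gen_health_warn_mds_trim()
        self.assertIn("MDS_TRIM", self.health_warnings)
```

Both test classes inherit the helpers, so a change to how a warning is
generated only has to be made in one place.
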
* refs/pull/57579/head:
mds/quiesce: disable quiesce root debug parameters by default
mds/quiesce-agt: never send a synchronous ack
mds/quiesce-agt: add test for a rapid async ack
mds/quiesce: always abort fragmenting asynchronously to prevent reentrancy
mds/quiesce: overdrive an export if it hasn't frozen the tree yet
mds/quiesce: quiesce_inode should not hold on to remote auth pins
qa/cephfs: check that a completed quiesce doesn't hold remote auth pins
mds: add `--lifetime` parameter to the `lock path` asok command
mds/quiesce: accept a regular file as the quiesce root
mds: command_quiesce_path: rename `--wait` to `--await` for consistency
mds: command_quiesce_path: do not block the asok thread and return an adequate rc
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
1. avoid taking a remote authpin for the quiesce lock
2. drop remote authpins that were taken because of other locks
We should not be forcing a mustpin when taking quiesce lock.
This creates unnecessary overhead due to the distributed nature
of the quiesce: all ranks will execute quiesce_inode, including
the auth rank, which will authpin the inode.
Auth pinning on the auth rank is important to synchronize quiesce
with operations that are managed by the auth, like fragmenting
and exporting.
If we let a remote quiesce process take a foreign authpin then
it may block freezing on the auth, which will stall quiesce locally.
This wouldn't be a problem if the quiesce that is blocked on the auth
and the quiesce that's holding a remote authpin from the replica side
were unrelated, but in our case it may be the same logical quiesce
that effectively steps on its own toes. This creates an opportunity
for a deadlock.
Fixes: https://tracker.ceph.com/issues/66152
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
To avoid all sorts of races that could happen when using
sleep().
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
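
The general idea of polling for a condition instead of sleeping for a
fixed time can be sketched as follows. `wait_until` and `daemon_is_up`
are hypothetical names used only for illustration; this is not the
wait_for_daemon() helper itself, whose implementation is not shown here.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical daemon that only reports "up" on the third check.
checks = {"count": 0}


def daemon_is_up():
    checks["count"] += 1
    return checks["count"] >= 3


ready = wait_until(daemon_is_up, timeout=5.0, interval=0.01)
```

A fixed sleep() either waits longer than needed or returns before the
daemon is ready; polling returns as soon as the condition holds and
fails explicitly when the deadline passes.
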
When a request is blocked on the quiesce lock, it should release
all remote authpins, especially those that make an inode AUTHPIN_FROZEN.
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
* when the quiesce lock is taken by this op, don't consider the inode `quiesced`
* drop all locks taken during traversal
* drop all local authpins after the locks are taken
* add --await functionality that will block the command until locks are taken or an error is encountered
* return the RC that represents the operation result. 0 if the operation was scheduled and hasn't failed so far
* add authpin control flags
** --ap-freeze - to auth_pin_freeze the target inode
** --ap-dont-block - to pass auth_pin_nonblocking when acquiring the target inode locks
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Temporarily block test_idem_unaffected_root_squash and
test_multifs_single_path_rootsquash.
These tests fail due to a known bug. Block them temporarily so that
test_admin.py can run fully and PRs under QA can be tested fully.
Otherwise, the failure halts test_admin.py, which leaves the PR
partially untested.
The failure is then dismissed as unrelated, which lets the buggy
code get merged. This has happened recently.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
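
One common way to block a test temporarily in a Python unittest suite is
the `unittest.skip` decorator. The sketch below shows that mechanism with
hypothetical class and test names; it is an assumption that the blocking
is done this way, and the actual change may use a different mechanism.

```python
import unittest


class RootSquashSketch(unittest.TestCase):
    """Hypothetical test class; names are for illustration only."""

    @unittest.skip("blocked temporarily: fails due to a known bug")
    def test_known_broken(self):
        # Never executed while the skip decorator is in place.
        self.fail("would fail until the known bug is fixed")

    def test_still_runs(self):
        # The rest of the class keeps running and reporting results.
        self.assertTrue(True)


def run_sketch():
    """Run the class above and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(RootSquashSketch)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The skipped test is reported as skipped with its reason, so the known
failure stays visible without halting the remaining tests.
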
After running TestFSFail, CephFSTestCase.tearDown() fails attempting
to unmount CephFS. Set joinable on FS and wait for the MDS to be up
before exiting the test. This will ensure that unmounting is
successful in teardown.
Fixes: https://tracker.ceph.com/issues/65841
Signed-off-by: Rishabh Dave <ridave@redhat.com>
This issue was not caught in the original QA run because "ceph mds fail"
returns 0 even when the MDS name it receives as an argument does not
exist. This is done for the sake of idempotency, but it caused this
bug to go uncaught.
Fixes: https://tracker.ceph.com/issues/65864
Signed-off-by: Rishabh Dave <ridave@redhat.com>
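
Why an idempotent command can hide a wrong name is easiest to see in a
toy model. The sketch below uses a hypothetical in-memory class, not the
real monitor code: `fail` returns 0 for any name, so a caller that only
checks the return code cannot tell a typo from a success, while
`fail_checked` verifies the name first.

```python
class MDSMapSketch:
    """Hypothetical in-memory stand-in for the set of known MDS daemons."""

    def __init__(self, names):
        self.active = set(names)

    def fail(self, name):
        # Idempotent: failing an unknown or already-failed MDS is a no-op
        # that still reports success, as described above.
        self.active.discard(name)
        return 0


def fail_checked(mds_map, name):
    """Fail an MDS, but reject names that are not currently known."""
    if name not in mds_map.active:
        raise ValueError(f"unknown MDS name: {name!r}")
    return mds_map.fail(name)
```

A test that wants to catch a wrong name therefore has to assert on the
resulting state, or validate the name up front, rather than rely on the
return code alone.
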