Commit Graph

1598 Commits

Author SHA1 Message Date
Venky Shankar
92ad0c83aa
Merge pull request #57458 from lxbsz/wip-session-evict
qa/cephfs: add test_session_evict_non_blocklisted test case

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2024-08-07 09:50:00 +05:30
Venky Shankar
f4b5465815 Merge PR #51332 into main
* refs/pull/51332/head:
	qa: add test for ceph tell with unknown cephtype
	pybind/ceph_argparse: fixing error message for ceph tell command

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2024-08-01 12:19:52 +05:30
Kotresh HR
983f893fb9 qa: Add mds caps test for testing fs read and a path rw
Fixes: https://tracker.ceph.com/issues/67212
Signed-off-by: Kotresh HR <khiremat@redhat.com>
2024-07-30 23:18:22 +05:30
neeraj pratap singh
decf32e823 qa: add test for ceph tell with unknown cephtype
Fixes: https://tracker.ceph.com/issues/59624
Signed-off-by: Neeraj Pratap Singh <neesingh@redhat.com>
2024-07-15 17:28:30 +05:30
Venky Shankar
2ab14159a6
Merge pull request #49974 from neesingh-rh/wip-58619
mds: fix session/client evict command.

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
2024-07-15 15:55:20 +05:30
Venky Shankar
62eb72731a
Merge pull request #56193 from joscollin/wip-B64927-test_cephfs_mirror_blocklist-fail
cephfs_mirror, qa: fix mirror daemon doesn't restart when blocklisted or failed

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2024-07-15 15:25:19 +05:30
Rishabh Dave
2130ec8ebc qa/cephfs: fix test_single_path_authorize_on_nonalphanumeric_fsname
This test deletes the CephFS already present on the cluster at the very
beginning and unmounts the first client beforehand. But it leaves the
second client mounted on the deleted CephFS for the rest of the test.
At the very end of the test, tearDown() attempts to remount the second
client, which hangs and causes the test runner to crash.

Unmount the second client beforehand to prevent the bug, and delete the
mount_b object so that future readers are not confused about whether
the second mountpoint exists.

Fixes: https://tracker.ceph.com/issues/66077
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2024-07-11 22:08:06 +05:30
Rishabh Dave
384acdeb47
Merge pull request #57492 from rishabh-d-dave/qa-fs-mds-fail-improve
qa/cephfs: improvements for "mds fail" and "fs fail"

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2024-07-11 16:37:16 +05:30
Venky Shankar
69704e91bf Merge PR #53301 into main
* refs/pull/53301/head:
	qa: adding test for preventing scrub when mds is inactive
	mds: prevent scrub start for standby-replay MDS

Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
2024-07-09 11:08:26 +05:30
neeraj pratap singh
b9a2d0571f qa: adding test for preventing scrub when mds is inactive
Fixes: https://tracker.ceph.com/issues/62537
Signed-off-by: Neeraj Pratap Singh <neesingh@redhat.com>
2024-07-08 15:33:54 +05:30
Jos Collin
a9a56919ff
qa: Wait for mirror daemon restart before getting new rados_inst
After the mirror daemon is blocklisted or has failed, wait for it to
restart, which happens after a 30-second timeout, and then check for the
new rados_inst.

Fixes: https://tracker.ceph.com/issues/64927
Signed-off-by: Jos Collin <jcollin@redhat.com>
2024-07-05 10:14:20 +05:30
Venky Shankar
dbc9816d2e Merge PR #57619 into main
* refs/pull/57619/head:
	qa/cephfs: use wait_for_daemon() instead of sleep()-ing
	qa/cephfs: mark file system joinable for fs rename tests before unmounting clients

Reviewed-by: Rishabh Dave <ridave@redhat.com>
2024-06-27 22:04:37 +05:30
Venky Shankar
cac7dcd634 Merge PR #53755 into main
* refs/pull/53755/head:
	PendingReleaseNotes: add note about CephFS set_vxattrs
	doc/cephfs: Update docs to match remove functionality and respective vxattrs
	qa: Add test coverage for vxattr behavior
	qa: Add removexattr to support setfattr removal.
	mds: Implement remove for ceph vxattrs

Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
2024-06-27 19:58:50 +05:30
Patrick Donnelly
c8053b11e7
Merge PR #53503 into main
* refs/pull/53503/head:
	qa: add tests for `mds last-seen` command
	doc/cephfs: add documentation for `mds last-seen`
	PendingReleaseNotes: add note on last-seen command
	mon/MDSMonitor: add command to lookup when mds was last seen
	mon/MDSMonitor: set birth time on FSMap during encode
	pybind/mgr/dashboard: show context diff for openapi check

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2024-06-25 12:27:28 -04:00
Venky Shankar
d17c681296 Merge PR #56052 into main
* refs/pull/56052/head:
	qa/suites: ignore unresponsive client when the test passes
	qa: enhance per-client labelled perf counters test

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2024-06-25 12:40:25 +05:30
Venky Shankar
0627148110 Merge PR #57034 into main
* refs/pull/57034/head:
	qa: cleanup snapshots before subvolume delete

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2024-06-25 12:37:56 +05:30
Patrick Donnelly
edc584a533
qa: add tests for mds last-seen command
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-06-20 21:32:56 -04:00
Patrick Donnelly
ac092f63bf
qa: add test to verify recovery of alternate_name from journal
Test without the fix:

    2024-05-16T22:34:21.781 DEBUG:teuthology.orchestra.run.smithi044:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 300 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok --format=json dump tree /dir 0
    ...
    [
      {
        "accounted_rstat": {
          "rbytes": 4096,
          "rctime": "2024-05-16T22:34:15.251864+0000",
          "rfiles": 1,
          "rsnaps": 0,
          "rsubdirs": 1,
          "version": 0
        },
        "atime": "2024-05-16T22:34:14.498880+0000",
        "auth_pins": 0,
        "auth_state": {
          "replicas": {}
        },
        "authlock": {},
        "backtrace_version": 14,
        "btime": "2024-05-16T22:34:14.498880+0000",
        "change_attr": 3,
        "client_caps": [
          {
            "client_id": 5184,
            "issued": "pAsLsXsFsx",
            "last_sent": 4,
            "pending": "pAsLsXsFsx",
            "wanted": "pAsLsXsFsxcral"
          }
        ],
        "client_ranges": [],
        "ctime": "2024-05-16T22:34:15.249864+0000",
        "damage_flags": 0,
        "dir_layout": {
          "dir_hash": 2,
          "unused1": 0,
          "unused2": 0,
          "unused3": 0
        },
        "dirfrags": [
          {
            "auth_pins": 0,
            "auth_state": {
              "replicas": {}
            },
            "committed_version": "0",
            "committing_version": "0",
            "dentries": [
              {
                "alternate_name": "bUHUwH9E8uiVaf8xZ+zONcB1CToj53x5aUUnKdnNj5U37zbh28l1AaWwHhbOT3HyzqKjmSKKW1o4odQJc7nF9xrKIB8D3b4qqb2Cs6s7t2106hHhQk5/YV7DtpeNPZnorcTqxPM/hExtWHSS4P+S+Dpwj62hMyh/77sGhiW1Filvv1gQjV+sN/GozPNwHgfleadkUs1OkRkYtgWrCjbKP0MayRtiOLrVTRuYyOp/Qt3+XCIyiS87B9bUcOFjWratF+yR0kpJ0RYriix7NKVkBJ0kGWYSCY+PYjiLeMYJBMQcCxW/nwfVku+m6fgFJvb6pjEFxIk9zT5cunSImsjr",
                "auth_pins": 0,
                "auth_state": {
                  "replicas": {}
                },
                "inode": 1099511628283,
                "is_auth": true,
                "is_freezing": false,
                "is_frozen": false,
                "is_new": false,
                "is_null": false,
                "is_primary": true,
                "is_remote": false,
                "lock": {},
                "nref": 2,
                "path": "dir/bUHUwH9E8uiVaf8xZ+zONcB1CToj53x5aUUnKdnNj5U37zbh28l1AaWwHhbOT3HyzqKjmSKKW1o4odQJc7nF9xrKIB8D3b4qqb2Cs6s7t2106hHhQk5,YV7DtpeNPZnorcTqxPM,hExtWHSS4P+S+Dpwj62hMyh,77sGhiW1Filvv1gQjV+sN,GozPNwHgfleadkUuZ+PMLCaKQXhuid9WvmHanxJnaabYDLj4VEz+EX2WsG",
    ...
    # fail + journal recovery
    2024-05-16T22:35:31.077 DEBUG:teuthology.orchestra.run.smithi044:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 300 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok --format=json dump tree /dir 0
    ...
    [
      {
        "accounted_rstat": {
          "rbytes": 4096,
          "rctime": "2024-05-16T22:34:15.251864+0000",
          "rfiles": 1,
          "rsnaps": 0,
          "rsubdirs": 1,
          "version": 0
        },
        "atime": "2024-05-16T22:34:14.498880+0000",
        "auth_pins": 0,
        "auth_state": {
          "replicas": {}
        },
        "authlock": {},
        "backtrace_version": 14,
        "btime": "2024-05-16T22:34:14.498880+0000",
        "change_attr": 3,
        "client_caps": [],
        "client_ranges": [],
        "ctime": "2024-05-16T22:34:15.249864+0000",
        "damage_flags": 0,
        "dir_layout": {
          "dir_hash": 2,
          "unused1": 0,
          "unused2": 0,
          "unused3": 0
        },
        "dirfrags": [
          {
            "auth_pins": 0,
            "auth_state": {
              "replicas": {}
            },
            "committed_version": "5",
            "committing_version": "5",
            "dentries": [
              {
                "alternate_name": "",
                "auth_pins": 0,
                "auth_state": {
                  "replicas": {}
                },
                "inode": 1099511628283,
                "is_auth": true,
                "is_freezing": false,
                "is_frozen": false,
                "is_new": false,
                "is_null": false,
                "is_primary": true,
                "is_remote": false,
                "lock": {},
                "nref": 2,
                "path": "dir/bUHUwH9E8uiVaf8xZ+zONcB1CToj53x5aUUnKdnNj5U37zbh28l1AaWwHhbOT3HyzqKjmSKKW1o4odQJc7nF9xrKIB8D3b4qqb2Cs6s7t2106hHhQk5,YV7DtpeNPZnorcTqxPM,hExtWHSS4P+S+Dpwj62hMyh,77sGhiW1Filvv1gQjV+sN,GozPNwHgfleadkUuZ+PMLCaKQXhuid9WvmHanxJnaabYDLj4VEz+EX2WsG",

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-06-10 12:56:00 -04:00
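
The comparison shown in the two `dump tree` outputs above can be sketched as follows. This is a minimal illustration, not the test added by the commit: the function names alternate_names and lost_alternate_names are hypothetical, and the only assumption about the JSON is the nesting visible in the output above (a list of inodes, each with "dirfrags", each with "dentries" carrying "path" and "alternate_name").

```python
import json


def alternate_names(dump_tree_json):
    """Map each dentry path to its alternate_name in `dump tree` JSON output.

    Hypothetical helper; it relies only on the nesting visible in the
    output quoted in the commit message.
    """
    names = {}
    for inode in json.loads(dump_tree_json):
        for dirfrag in inode.get("dirfrags", []):
            for dentry in dirfrag.get("dentries", []):
                names[dentry["path"]] = dentry.get("alternate_name", "")
    return names


def lost_alternate_names(before_json, after_json):
    """Return the paths whose alternate_name differs after journal recovery."""
    before = alternate_names(before_json)
    after = alternate_names(after_json)
    return [path for path, name in before.items() if after.get(path) != name]
```

Without the fix, the second dump reports an empty alternate_name for the same dentry, so a check of this shape would return that dentry's path; with the fix, it would return an empty list.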
neeraj pratap singh
f3e674d424 qa: add test for fix of client/session evict command
Adds a test class, test_misc.TestSessionClientEvict,
which contains tests for the issues mentioned in this PR.

Fixes: https://tracker.ceph.com/issues/58619
Signed-off-by: Neeraj Pratap Singh <neesingh@redhat.com>
2024-06-08 08:19:31 +05:30
Xiubo Li
d2645fd157 qa/cephfs: add test_session_evict_non_blocklisted test case
When clients or sessions are evicted while the
mds_session_blocklist_on_evict option is disabled, the clients should
reconnect to the MDS successfully once new I/O is sent.

URL: https://tracker.ceph.com/issues/65647
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2024-06-07 10:19:38 +08:00
Patrick Donnelly
d38dfda252
qa: add killpoint testing for dirfrags
Fixes: https://tracker.ceph.com/issues/7320
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Sidharth Anupkrishnan <sanupkri@redhat.com>
2024-06-06 13:58:47 -04:00
Patrick Donnelly
94a8113a18
qa: stringify arguments to setfattr
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-06-06 13:58:47 -04:00
Patrick Donnelly
82c43d096f
qa: restore default for config to split exports
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-06-06 13:58:47 -04:00
Patrick Donnelly
7af0288505
Merge PR #57877 into main
* refs/pull/57877/head:
	qa: correct json lookup for new `lock path` output

Reviewed-by: Leonid Usov <leonid.usov@ibm.com>
2024-06-06 13:56:00 -04:00
Patrick Donnelly
715c951c34
qa: use tell interface for command that may fail
The asok interface will mangle stdout if the command actually fails.

The reason `flush path` is done via the asok interface is because the tell/asok
interfaces were unified after these tests were written and `flush path` was
only available via the asok interface.

Fixes: https://tracker.ceph.com/issues/66184
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-06-04 15:44:25 -04:00
Patrick Donnelly
1b9b9a809a
qa: correct json lookup for new lock path output
Fixes: https://tracker.ceph.com/issues/66355
Fixes: 3552fc5a9e
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-06-04 14:24:05 -04:00
Christopher Hoffman
697c2fc877 qa: Add test coverage for vxattr behavior
Add tests to validate default and remove xattr behaviors.

Signed-off-by: Christopher Hoffman <choffman@redhat.com>
2024-05-29 13:43:32 +00:00
Christopher Hoffman
26a352aae9 qa: Add removexattr to support setfattr removal.
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
2024-05-29 13:43:32 +00:00
Rishabh Dave
0f41207dac qa/cephfs: rename couple of test methods
The new names make these tests easier to find by matching the health
warning (MDS_CACHE_OVERSIZED) that they test.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2024-05-29 17:31:15 +05:30
Rishabh Dave
b1918686a3 qa/cephfs: improve and move _get_unhealthy_mds_name to TestMDSFail
1. Instead of accepting health report as argument, get one directly.
2. Since it is not being used elsewhere move it to the class where it is
   being used.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2024-05-29 17:31:15 +05:30
Rishabh Dave
5972cafb7a qa/cephfs: use wait_for_health() instead of the new method
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2024-05-29 17:31:15 +05:30
Rishabh Dave
79b047b22e qa/cephfs: make code for generating health warnings reusable
Code to generate MDS_TRIM and MDS_CACHE_OVERSIZED health warnings is
repeated in test methods of TestMDSFail and TestFSFail. Move this code
to separate helper methods so that it can be reused instead of
duplicated, and move these helper methods to TestAdminCommands so that
they are conveniently available for reuse.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2024-05-29 17:31:15 +05:30
Venky Shankar
75bcfd1bbf Merge PR #55758 into main
* refs/pull/55758/head:
	doc: update 'journal reset' command with --yes-i-really-really-mean-it
	qa: fix cephfs-journal-tool command options and make fs inactive
	cephfs-journal-tool: Add warning messages during 'journal reset' and prevent execution on active fs

Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
2024-05-29 15:04:58 +05:30
Patrick Donnelly
25e4ee2fa7
Merge PR #57579 into main
* refs/pull/57579/head:
	mds/quiesce: disable quiesce root debug parameters by default
	mds/quiesce-agt: never send a synchronous ack
	mds/quiesce-agt: add test for a rapid async ack
	mds/quiesce: always abort fragmenting asynchronously to prevent reentrancy
	mds/quiesce: overdrive an export if it hasn't frozen the tree yet
	mds/quiesce: quiesce_inode should not hold on to remote auth pins
	qa/cephfs: check that a completed quiesce doesn't hold remote auth pins
	mds: add `--lifetime` parameter to the `lock path` asok command
	mds/quiesce: accept a regular file as the quiesce root
	mds: command_quiesce_path: rename `--wait` to `--await` for consistency
	mds: command_quiesce_path: do not block the asok thread and return an adequate rc

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2024-05-28 12:46:08 -04:00
Leonid Usov
b1cb6d9856 mds/quiesce: quiesce_inode should not hold on to remote auth pins
1. avoid taking a remote authpin for the quiesce lock
2. drop remote authpins that were taken because of other locks

We should not be forcing a mustpin when taking quiesce lock.
This creates unnecessary overhead due to the distributed nature
of the quiesce: all ranks will execute quiesce_inode, including
the auth rank, which will authpin the inode.

Auth pinning on the auth rank is important to synchronize quiesce
with operations that are managed by the auth, like fragmenting
and exporting.

If we let a remote quiesce process take a foreign authpin then
it may block freezing on the auth, which will stall quiesce locally.
This wouldn't be a problem if the quiesce that is blocked on the auth
and the quiesce that's holding a remote authpin from the replica side
were unrelated, but in our case it may be the same logical quiesce
that effectively steps on its own toes. This creates an opportunity
for a deadlock.

Fixes: https://tracker.ceph.com/issues/66152
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
2024-05-26 11:33:52 +03:00
Leonid Usov
e32fb12b8e qa/cephfs: check that a completed quiesce doesn't hold remote auth pins
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
2024-05-26 11:33:52 +03:00
Leonid Usov
f706ae8c2d mds/quiesce: accept a regular file as the quiesce root
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
2024-05-26 11:33:52 +03:00
Leonid Usov
c20221574e mds: command_quiesce_path: rename --wait to --await for consistency
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
2024-05-26 11:33:52 +03:00
Leonid Usov
df546a4fba mds: command_quiesce_path: do not block the asok thread and return an adequate rc
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
2024-05-26 11:33:52 +03:00
Rishabh Dave
3005495225 qa/cephfs: use wait_for_daemon() instead of sleep()-ing
To avoid all sorts of races that could happen when using
sleep().

Signed-off-by: Rishabh Dave <ridave@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2024-05-23 01:01:06 -04:00
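
The difference between sleeping for a fixed time and waiting on a condition can be sketched with a small polling helper. This is an illustration only and not the wait_for_daemon() implementation used by the Ceph QA framework; the name wait_until and its parameters are hypothetical.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Hypothetical helper for illustration; it is not the Ceph QA
    wait_for_daemon() API.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Check the condition first so an already-satisfied condition
        # returns immediately instead of paying a fixed delay.
        if predicate():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(interval)
```

A fixed sleep() either wastes time when the daemon is ready early or races when it is ready late; polling with a deadline returns as soon as the condition holds and fails loudly if it never does.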
Venky Shankar
74452ad308 qa/cephfs: mark file system joinable for fs rename tests before unmounting clients
Fixes: http://tracker.ceph.com/issues/66088
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2024-05-23 01:01:06 -04:00
Patrick Donnelly
c3463d5005
Merge PR #57493 into main
* refs/pull/57493/head:
	qa/cephfs: pass MDS name, not FS name, to "ceph mds fail" cmd

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2024-05-22 21:18:17 -04:00
Leonid Usov
bed8a47b80 qa/cephfs/test_quiesce: test proper handling of remote authpins
When a request is blocked on the quiesce lock, it should release
all remote authpins, especially those that make an inode AUTHPIN_FROZEN

Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
2024-05-21 00:10:20 +03:00
Leonid Usov
3552fc5a9e mds: enhance the lock path asok command
* when the quiesce lock is taken by this op, don't consider the inode `quiesced`
* drop all locks taken during traversal
* drop all local authpins after the locks are taken
* add --await functionality that will block the command until locks are taken or an error is encountered
* return the RC that represents the operation result. 0 if the operation was scheduled and hasn't failed so far
* add authpin control flags
** --ap-freeze - to auth_pin_freeze the target inode
** --ap-dont-block - to pass auth_pin_nonblocking when acquiring the target inode locks

Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
2024-05-21 00:10:19 +03:00
Leonid Usov
2b2af17ae4 qa/cephfs/test_quiesce: enhance the fragmentation test
Repeatedly quiesce under a heavy balancer load

Fixes: https://tracker.ceph.com/issues/65716
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
2024-05-21 00:10:19 +03:00
Jos Collin
5337a127f0
qa: enhance per-client labelled perf counters test
Fixes: https://tracker.ceph.com/issues/65497
Signed-off-by: Jos Collin <jcollin@redhat.com>
2024-05-17 20:56:02 +05:30
Rishabh Dave
19ee59ecab
Merge pull request #57496 from rishabh-d-dave/block-test_idem_unaffected_root_squash
qa/cephfs: block buggy tests in test_admin.py

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2024-05-17 17:56:01 +05:30
Rishabh Dave
b7d07700d6 qa/cephfs: block buggy tests in test_admin.py
Temporarily block test_idem_unaffected_root_squash and
test_multifs_single_path_rootsquash.

These tests fail due to a known bug. Block them temporarily so that
test_admin.py can run fully and PRs under QA can be tested fully.
Otherwise, this test fails and that halts test_admin.py, which leaves
the PR partially untested.

This failure is then seen as an unrelated failure which lets the buggy
code get merged. This has happened recently.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2024-05-17 10:18:13 +05:30
Rishabh Dave
faa30e03f3 qa/cephfs: set joinable on FS before exiting tests in TestFSFail
After running TestFSFail, CephFSTestCase.tearDown() fails attempting
to unmount CephFS. Set joinable on FS and wait for the MDS to be up
before exiting the test. This will ensure that unmounting is
successful in teardown.

Fixes: https://tracker.ceph.com/issues/65841
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2024-05-16 22:11:01 +05:30
Rishabh Dave
ab643f7a50 qa/cephfs: pass MDS name, not FS name, to "ceph mds fail" cmd
This issue was not caught in the original QA run because "ceph mds fail"
returns 0 even when the MDS name passed to it does not exist. This is
done for the sake of idempotency; however, it caused this bug to go
uncaught.

Fixes: https://tracker.ceph.com/issues/65864
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2024-05-16 12:07:51 +05:30
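
Why an idempotent command can hide a wrong argument is sketched below with a toy model. It is not the Ceph implementation of "ceph mds fail"; the functions mds_fail and mds_fail_checked and the set-based cluster state are hypothetical and exist only to illustrate the reasoning in the commit message.

```python
def mds_fail(active_mds, name):
    """Toy idempotent 'fail': return 0 whether or not `name` exists.

    Hypothetical model, not the Ceph command implementation.
    """
    active_mds.discard(name)  # removing an unknown name is a no-op
    return 0


def mds_fail_checked(active_mds, name):
    """Test-side wrapper that verifies the name before failing it."""
    if name not in active_mds:
        raise ValueError("unknown MDS name: %r" % name)
    return mds_fail(active_mds, name)
```

In this model, passing a file system name instead of an MDS name to mds_fail still returns 0 and leaves the MDS running, so a test that checks only the return code passes; a test has to check the resulting state, or validate the name first, to notice the mistake.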