1368 Commits

Author SHA1 Message Date
Rishabh Dave
00f23e9a03
Merge pull request #52924 from rishabh-d-dave/test-nfs-pr-52556
qa: inherit RunCephCmd in CephTestCase instead of CephFSTestCase

Reviewed-by: Adam King <adking@redhat.com>
2023-09-05 20:21:30 +05:30
Milind Changire
d7dfac8111
Merge PR #52686 into main
* refs/pull/52686/head:
	PendingReleaseNotes: note about mandatory fs argument
	doc/cephfs: add note about mandatory --fs argument to snap-schedule
	qa: add test for mandatory fs argument to snap-schedule commands
	mgr/snap-schedule: tweaks to keep mypy happy
	mgr/snap_schedule: validate fs before execution

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
2023-08-29 20:13:50 +05:30
Venky Shankar
52a908a605 Merge PR #52940 into main
* refs/pull/52940/head:
	qa: Wait for purge to complete in test_volume_info_pending_subvol_deletions

Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
2023-08-28 11:51:11 +05:30
Milind Changire
484336e1c3
qa: add test for mandatory fs argument to snap-schedule commands
Signed-off-by: Milind Changire <mchangir@redhat.com>
2023-08-25 21:34:40 +05:30
Venky Shankar
e9f8be4bac Merge PR #52944 into main
* refs/pull/52944/head:
	PendingReleaseNotes: add a note for `mds_session_metadata_threshold` mds config
	test: add test to verify that a buggy client is blocklisted
	mds: add perf counter to track number of sessions evicted due to metadata threshold being exceeded
	mds: blocklist clients with "bloated" session metadata

Reviewed-by: Robin H. Johnson <robbat2@orbis-terrarum.net>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2023-08-25 18:46:20 +05:30
Rishabh Dave
4b369cf18e qa: inherit RunCephCmd in CephTestCase instead of CephFSTestCase
MgrTestCase also needs RunCephCmd. If RunCephCmd is inherited by
CephTestCase, instead of CephFSTestCase, MgrTestCase will automatically
inherit RunCephCmd because it inherits CephTestCase.

Fixes: https://tracker.ceph.com/issues/62084
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2023-08-24 23:45:27 +05:30
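The inheritance change described in 4b369cf18e can be illustrated with a minimal sketch. Only the class names come from the commit message; the class bodies are simplified stand-ins rather than the actual qa code, and the behaviour of run_ceph_cmd() here is a placeholder assumed for illustration.

```python
class RunCephCmd:
    """Stand-in mixin (hypothetical body); the real one runs Ceph commands."""

    def run_ceph_cmd(self, *args):
        # Placeholder behaviour: return the command line that would be run.
        return "ceph " + " ".join(args)


class CephTestCase(RunCephCmd):
    """Base test case; it now carries the mixin itself."""


class CephFSTestCase(CephTestCase):
    """No longer needs to list RunCephCmd explicitly."""


class MgrTestCase(CephTestCase):
    """Gains run_ceph_cmd() automatically through CephTestCase."""


print(MgrTestCase().run_ceph_cmd("fs", "ls"))
print([cls.__name__ for cls in MgrTestCase.__mro__])
```

Placing the mixin on the common base class means every subclass picks it up through the method resolution order, so no per-subclass change is needed.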
Venky Shankar
726e5d7dde Merge PR #52676 into main
* refs/pull/52676/head:
	mds/Server: mark a cap acquisition throttle event in the request

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
2023-08-24 17:35:04 +05:30
Venky Shankar
f2e17e40e7 Merge PR #52741 into main
* refs/pull/52741/head:
	qa/cephfs: switch to python3 for centos stream 9

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
2023-08-22 13:23:40 +05:30
Venky Shankar
84df4b3d0c test: add test to verify that a buggy client is blocklisted
... when its session metadata is bloated due to buildup of
`completed_requests`.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
2023-08-22 00:47:27 -04:00
Leonid Usov
749c770676 mds/Server: mark a cap acquisition throttle event in the request
Fixes: https://tracker.ceph.com/issues/59067
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
2023-08-18 18:11:29 +03:00
Venky Shankar
9545e578a4 Merge PR #52547 into main
* refs/pull/52547/head:
	qa: add test cases for vanilla ops commands
	mds: dump locks when printing mutation ops
	common/TrackedOp: support overriding the _dump method
	mds: remove op field obsoleted by more usable "reqid"
	mds: dump metareq_t instead of full op
	mds: add lock type to formatter dump of SimpleLock
	mds: mark print methods const
	mds: drop MDRequestImpl::msg_lock
	mds: lock TrackedOp when dumping
	mds: avoid recursive locks dumping state
	common/TrackedOp: fix race updating description with proper lock
	common/Formatter: add support for dumping null
	common/Formatter: refactor generating xml name

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
2023-08-14 17:52:47 +05:30
Kotresh HR
72125396d4 qa: Wait for purge to complete in test_volume_info_pending_subvol_deletions
Fixes: https://tracker.ceph.com/issues/62278
Signed-off-by: Kotresh HR <khiremat@redhat.com>
2023-08-11 14:27:29 +05:30
Venky Shankar
53f89ea09b Merge PR #52765 into main
* refs/pull/52765/head:
	mgr/volumes: Fix pending_subvolume_deletions in volume info
	qa: Add testcase for pending_subvolume_deletions count

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Neeraj Pratap Singh <neesingh@redhat.com>
2023-08-11 11:39:52 +05:30
Patrick Donnelly
ca4d0dc42b
qa: add test cases for vanilla ops commands
To test they work, not that the output is useful.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2023-08-08 08:58:42 -04:00
Leonid S. Usov
8262586cd0
Merge pull request #52792 from leonid-s-usov/bulk-data-pool
mgr/volumes: create bulk data pool for new volumes
2023-08-08 11:24:59 +03:00
Kotresh HR
8b1303f4b1 qa: Add testcase for pending_subvolume_deletions count
Fixes: https://tracker.ceph.com/issues/62278
Signed-off-by: Kotresh HR <khiremat@redhat.com>
2023-08-04 12:47:24 +05:30
Leonid Usov
9a8219cc2b mgr/volumes: set the 'bulk' flag for data pools created automatically for a new volume
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Fixes: https://tracker.ceph.com/issues/61595
2023-08-03 19:41:12 +03:00
Venky Shankar
cd18c51548 Merge PR #48732 into main
* refs/pull/48732/head:
	doc: add MDS treatise on segments
	pybind/mgr/dashboard: bump teuthology version
	qa/tasks/vstart_runner: stop overriding _run_python
	qa: stop overriding ceph_w prefix in vstart_runner
	qa/tasks/vstart_runner: update teuthology helper tool paths
	qa/tasks/vstart_runner: allow writing to command's stdin
	vstart.sh: always add CEPH_CONF to vstart_environment.sh
	qa: use stdin-killer for python3 command
	qa: add killpoint testing for mds shutdown
	qa: fix background exit condition
	qa: add filesystem helper for setting transient config
	qa: add helper for waiting for a rank to fail
	mds: add incompat feature for minor log segments
	mds: introduce ELid event to create/close log
	mds: change EResetJournal to major segment boundary
	mds: add killpoints for MDS shutdown
	qa: add numerous subtree test
	mds: track larger log events in perf dump
	mds: add minor LogSegment boundaries
	mds: obviate MDLog::start_entry
	mds: retype to properly sized unsigned ints
	mds: use unsigned type for event count
	mds: use base Context class for generalization
	mds: optimize segment lookup
	mds: add stream dump for LogSegment
	mds: handle conf changes in mdlog
	mds: remove redundant comment
	mds: remove unused method
	mds: set a reasonable minimum number of segments
	mds: sort configs

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2023-08-02 22:07:58 +05:30
Rishabh Dave
0584cd34b6
Merge pull request #52709 from rishabh-d-dave/cephfs-test_snapshots
qa/cephfs: fix test_disallow_monitor_managed_snaps_for_fs_pools

Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
2023-08-02 19:18:07 +05:30
Xiubo Li
0a296183e0 qa/cephfs: switch to python3 for centos stream 9
Fixes: https://tracker.ceph.com/issues/62277
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2023-08-02 18:45:56 +08:00
Casey Bodley
568a21d83c qa/cephfs: redefinition of unused 'random' from line 7
seeing this run-tox-qa failure about tasks/cephfs/test_client_recovery.py:

246/285 Test #264: run-tox-qa ................................***Failed   58.54 sec
Requirement already satisfied: tox in /home/jenkins-build/build/workspace/ceph-pull-requests/build/qa-virtualenv/lib/python3.10/site-packages (4.6.4)
Requirement already satisfied: cachetools>=5.3.1 in /home/jenkins-build/build/workspace/ceph-pull-requests/build/qa-virtualenv/lib/python3.10/site-packages (from tox) (5.3.1)
Requirement already satisfied: chardet>=5.1 in /home/jenkins-build/build/workspace/ceph-pull-requests/build/qa-virtualenv/lib/python3.10/site-packages (from tox) (5.1.0)
Requirement already satisfied: colorama>=0.4.6 in /home/jenkins-build/build/workspace/ceph-pull-requests/build/qa-virtualenv/lib/python3.10/site-packages (from tox) (0.4.6)
Requirement already satisfied: filelock>=3.12.2 in /home/jenkins-build/build/workspace/ceph-pull-requests/build/qa-virtualenv/lib/python3.10/site-packages (from tox) (3.12.2)
Requirement already satisfied: packaging>=23.1 in /home/jenkins-build/build/workspace/ceph-pull-requests/build/qa-virtualenv/lib/python3.10/site-packages (from tox) (23.1)
Requirement already satisfied: platformdirs>=3.8 in /home/jenkins-build/build/workspace/ceph-pull-requests/build/qa-virtualenv/lib/python3.10/site-packages (from tox) (3.10.0)
Requirement already satisfied: pluggy>=1.2 in /home/jenkins-build/build/workspace/ceph-pull-requests/build/qa-virtualenv/lib/python3.10/site-packages (from tox) (1.2.0)
Requirement already satisfied: pyproject-api>=1.5.2 in /home/jenkins-build/build/workspace/ceph-pull-requests/build/qa-virtualenv/lib/python3.10/site-packages (from tox) (1.5.3)
Requirement already satisfied: tomli>=2.0.1 in /home/jenkins-build/build/workspace/ceph-pull-requests/build/qa-virtualenv/lib/python3.10/site-packages (from tox) (2.0.1)
Requirement already satisfied: virtualenv>=20.23.1 in /home/jenkins-build/build/workspace/ceph-pull-requests/build/qa-virtualenv/lib/python3.10/site-packages (from tox) (20.24.2)
Requirement already satisfied: distlib<1,>=0.3.7 in /home/jenkins-build/build/workspace/ceph-pull-requests/build/qa-virtualenv/lib/python3.10/site-packages (from virtualenv>=20.23.1->tox) (0.3.7)
flake8: install_deps /home/jenkins-build/build/workspace/ceph-pull-requests/qa> python -I -m pip install flake8
flake8: freeze /home/jenkins-build/build/workspace/ceph-pull-requests/qa> python -m pip freeze --all
flake8: flake8==6.1.0,mccabe==0.7.0,pip==22.3.1,pycodestyle==2.11.0,pyflakes==3.1.0,setuptools==65.6.3,wheel==0.38.4
flake8: commands[0] /home/jenkins-build/build/workspace/ceph-pull-requests/qa> flake8 --select=F,E9 --exclude=venv,.tox
./tasks/cephfs/test_client_recovery.py:12:1: F811 redefinition of unused 'random' from line 7
flake8: exit 1 (3.72 seconds) /home/jenkins-build/build/workspace/ceph-pull-requests/qa> flake8 --select=F,E9 --exclude=venv,.tox pid=706315
flake8: FAIL ✖ in 15.42 seconds

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2023-08-01 12:31:44 -04:00
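For context on the F811 message in 568a21d83c, the following is a rough sketch of how a repeated module-level import can be spotted with Python's ast module. It is an approximation written for illustration, not how pyflakes implements the check: pyflakes reports F811 only when the first binding is still unused at the point of redefinition, whereas this sketch flags any repeated import name. The function name repeated_import_names and the sample source are hypothetical.

```python
import ast


def repeated_import_names(source):
    """Return (name, first_line, repeat_line) for names imported twice."""
    first_seen = {}
    repeats = []
    for node in ast.parse(source).body:  # module-level statements only
        if not isinstance(node, (ast.Import, ast.ImportFrom)):
            continue
        for alias in node.names:
            # The bound name is the alias if given, else the top-level module.
            bound = alias.asname or alias.name.split(".")[0]
            if bound in first_seen:
                repeats.append((bound, first_seen[bound], node.lineno))
            else:
                first_seen[bound] = node.lineno
    return repeats


sample = "import random\nimport time\nimport random\n"
print(repeated_import_names(sample))
```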
Patrick Donnelly
20184c23d3
qa: use stdin-killer for python3 command
This relies on the new stdin-killer [1] teuthology helper that allows
interacting with the command's stdin.

[1] https://github.com/ceph/teuthology/pull/1846

Fixes: 8bb77ed9e1
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2023-08-01 11:16:26 -04:00
Patrick Donnelly
936da39a15
qa: add killpoint testing for mds shutdown
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2023-08-01 11:16:02 -04:00
Patrick Donnelly
1fa0039a98
qa: fix background exit condition
This change causes the program to exit gracefully when stdin is closed
rather than with a Python exception.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2023-08-01 11:16:02 -04:00
Patrick Donnelly
1962322f59
qa: add filesystem helper for setting transient config
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2023-08-01 11:16:02 -04:00
Patrick Donnelly
a6b8bbd2cb
qa: add helper for waiting for a rank to fail
For killpoint testing.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2023-08-01 11:16:01 -04:00
Patrick Donnelly
2142114a2d
qa: add numerous subtree test
When the ESubtreeMap is very large (~5k+ subtrees), the MDS will
end up logging only a few events (as bad as 1) per segment as the
subtree map dominates the segment size.

This test simply creates an artificially large subtree and confirms that
other file system activity completes in a timely manner. This is now
taking advantage of the minor segments which allows for a normal set of
events per log segment (and fewer subtree maps). The test fails on the
current main HEAD.

Historical note: when I first observed this aberrant behavior, the
vstart cluster was actually using mds_debug_subtrees = True (the default
for every vstart cluster). This caused the MDS to write out the subtree
map (for debugging reasons) with every event. When testing the MDS with
large subtrees (distributed ephemeral pinning), this caused the MDS to
slow to a trickle of operations per second. Despite this unintentional
misconfiguration, the problem still exists but the number of auth
subtrees must be large for a particular rank to replicate the behavior.

On main HEAD, the creation of 10k files (workload stage) takes ~110
seconds. On this branch, it takes ~30 seconds.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2023-08-01 11:16:01 -04:00
Venky Shankar
86c9a7a08d Merge PR #51995 into main
* refs/pull/51995/head:
	qa: wait for file to have correct size

Reviewed-by: Milind Changire <mchangir@redhat.com>
2023-08-01 20:36:59 +05:30
Rishabh Dave
27f43c9a89 qa/cephfs: fix test_disallow_monitor_managed_snaps_for_fs_pools
The run_cluster_cmd() method is no longer available because it was
deleted in this PR:
https://github.com/ceph/ceph/pull/50569/files#diff-1c6c246ba42f343603d7174198dd1fb9c2654b6c883594d1a0891096b7a35875L408

Fixes: https://tracker.ceph.com/issues/62243
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2023-07-31 18:18:51 +05:30
Venky Shankar
4e5d800406 Merge PR #51539 into main
* refs/pull/51539/head:
	doc: users now need to provide scrub_mdsdir and recursive flags
	qa: add recursive flag to test_flag_scrub_mdsdir
	mds: remove code to bypass dumping empty header scrub info
	mds: dump_values no more needed
	mds: enqueue ~mdsdir at the time of enqueing root

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2023-07-18 20:15:24 +05:30
Rishabh Dave
f0588bd3b3
Merge pull request #50569 from rishabh-d-dave/CephManager-in-CephFSTestCase
qa/cephfs: add helper methods in CephFSTestCase

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2023-07-13 12:44:22 +05:30
Venky Shankar
a9391e5b29 Merge PR #51959 into main
* refs/pull/51959/head:
	qa: test for session ls with filters
	mds: session ls command appears twice in command listing

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
2023-07-13 11:18:34 +05:30
Venky Shankar
94f3e167a0 Merge PR #51278 into main
* refs/pull/51278/head:
	mgr/snap_schedule: rephrase log message when pruning
	doc: add note about snap-schedule snapshot retention
	qa: test user defined number of snaps retention spec
	mgr/snap_schedule: adapt test to new argument list
	doc/cephfs: Add note how mds_max_snaps_per_dir affects snapshot retention
	mgr/snap_schedule: Use mds_max_snaps_per_dir as snapshot count limit

Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
2023-07-13 11:17:16 +05:30
Yuri Weinstein
6e02660f10
Merge pull request #51275 from mchangir/mon-block-osd-pool-mksnap-for-fs-pools
mon: block osd pool mksnap for fs pools


Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2023-07-07 17:09:07 -04:00
Dhairya Parmar
e40ca408a1 qa: add recursive flag to test_flag_scrub_mdsdir
The code has changed: to scrub ~mdsdir at root, the recursive flag
must now be provided along with scrub_mdsdir.

Fixes: https://tracker.ceph.com/issues/59350
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
2023-06-28 18:12:27 +05:30
Rishabh Dave
0a781ef080 qa/cephfs: use run_ceph_cmd() when cmd output is not needed
In filesystem.py and wherever instances of class Filesystem are used, use
run_ceph_cmd() instead of get_ceph_cluster_stdout() when the output of the
Ceph command is not required.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2023-06-28 17:38:19 +05:30
Rishabh Dave
1bf87e6eec qa/cephfs: add helper methods to filesystem.py
Add run_ceph_cmd(), get_ceph_cmd_stdout() and get_ceph_cmd_result() to
class Filesystem so that running Ceph commands is easier. This affects
not only methods inside class Filesystem but also methods elsewhere that
use an instance of class Filesystem to run Ceph commands.

Writing "self.fs.run_ceph_cmd()" instead of
"self.fs.mon_manager.raw_cluster_cmd()" will suffice.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2023-06-28 17:38:19 +05:30
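The delegation pattern behind these helpers can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch series: FakeMonManager is a hypothetical stand-in that only records calls, the method signatures are guesses, and the mapping of each helper to run_cluster_cmd(), raw_cluster_cmd() or raw_cluster_cmd_result() is inferred from the commit messages.

```python
class FakeMonManager:
    """Hypothetical stand-in for the manager object; it only records calls."""

    def __init__(self):
        self.calls = []

    def run_cluster_cmd(self, **kwargs):
        self.calls.append(("run_cluster_cmd", kwargs))

    def raw_cluster_cmd(self, *args):
        self.calls.append(("raw_cluster_cmd", args))
        return "fake stdout"

    def raw_cluster_cmd_result(self, *args):
        self.calls.append(("raw_cluster_cmd_result", args))
        return 0


class Filesystem:
    """Simplified stand-in showing only the three helper methods."""

    def __init__(self, mon_manager):
        self.mon_manager = mon_manager

    def run_ceph_cmd(self, *args):
        # Output not needed: run the command and discard the result.
        self.mon_manager.run_cluster_cmd(args=args)

    def get_ceph_cmd_stdout(self, *args):
        # Assumed mapping: return whatever the command printed.
        return self.mon_manager.raw_cluster_cmd(*args)

    def get_ceph_cmd_result(self, *args):
        # Assumed mapping: return the command's exit status.
        return self.mon_manager.raw_cluster_cmd_result(*args)


fs = Filesystem(FakeMonManager())
fs.run_ceph_cmd("fs", "ls")
print(fs.get_ceph_cmd_stdout("fs", "dump"))
print(fs.get_ceph_cmd_result("fs", "fail", "cephfs"))
```

With helpers like these, a test calls one short method on the object it already holds instead of reaching through the nested manager attribute each time.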
Rishabh Dave
c7c38ba558 qa/cephfs: when cmd output is not needed call run_ceph_cmd()
instead of get_ceph_cmd_stdout().

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2023-06-28 17:38:19 +05:30
Rishabh Dave
13168834e3 qa/cephfs: add and use get_ceph_cmd_stdout()
Add method get_ceph_cmd_stdout() to class CephFSTestCase so that one
doesn't have to type something as long as
"self.mds_cluster.mon_manager.raw_cluster_cmd()" to execute a
command and get its output. And delete and replace
CephFSTestCase.run_cluster_cmd() too.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2023-06-28 17:38:19 +05:30
Rishabh Dave
f8f2154e54 qa/cephfs: add and use run_ceph_cmd()
Instead of writing something as long as
"self.mds_cluster.mon_manager.run_cluster_cmd()" to execute a command,
let's add a helper method to class CephFSTestCase and use it instead.

With this, running a command becomes simple - "self.run_ceph_cmd()".

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2023-06-28 17:38:19 +05:30
Rishabh Dave
82814ac49d qa/cephfs: add and use get_ceph_cmd_result()
To run a command and get its return value, instead of typing something
as long as "self.mds_cluster.mon_manager.raw_cluster_cmd_result" add a
helper method in CephFSTestCase and use it. This makes this task very
simple - "self.get_ceph_cmd_result()".

Also, remove method CephFSTestCase.run_cluster_cmd_result() in favour of
this new method.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2023-06-28 17:38:14 +05:30
Venky Shankar
809d475814 Merge PR #49971 into main
* refs/pull/49971/head:
	doc/cephfs: document MDS_CLIENTS_LAGGY health warning
	qa: ignore warnings
	qa: add test cases to check client eviction if an OSD is laggy
	mds,messages: enable beacon to report clients lagginess
	mds: do not evict client on laggy osds
	common: add new config option to defer client eviction
	osd: add method to check for laggy osds

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2023-06-28 10:23:54 +05:30
Venky Shankar
f370b581f6 qa: assign file system affinity for replaced MDS
Otherwise, the MDS that just got replaced can transition to a rank
for another file system and the test cannot deterministically infer
which MDS needs to be checked.

Fixes: http://tracker.ceph.com/issues/61764
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2023-06-27 09:23:14 +05:30
neeraj pratap singh
36bf907f9e qa: test for session ls with filters
Fixes: https://tracker.ceph.com/issues/61444
Signed-off-by: Neeraj Pratap Singh <neesingh@redhat.com>
2023-06-26 17:55:15 +05:30
Rishabh Dave
2e12e5086d qa/cephfs: create admin_remote instance in CephFSTestCase
admin_remote contains lots of methods that can be useful during testing,
so let's have easy access to it too.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2023-06-24 19:33:44 +05:30
Rishabh Dave
0c0041005e qa/cephfs: create CephManager instance in CephFSTestCase
To run a Ceph command conveniently, run_cluster_cmd(), raw_cluster_cmd()
or raw_cluster_cmd_result() must be called. These methods are available
in class CephManager which in turn is available only if an instance of
Filesystem, MDSCluster, CephCluster or MgrCluster is initialized. Having
an instance of CephManager in CephFSTestCase will provide easy access to
these methods.

For example, in CephFS tests writing "self.mon_manager.raw_cluster_cmd()"
instead of writing "self.mds_cluster.mon_manager.raw_cluster_cmd()" will
suffice.

This commit provides a basis for upcoming commits in this patch series.
With the next patches, running a Ceph command will be further simplified.
Just writing self.run_ceph_cmd() will suffice for running a CephFS command.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2023-06-24 19:33:36 +05:30
Rishabh Dave
437f2c75f5 qa/cephfs: don't import entire module needlessly
Importing entire module ceph_manager.py is pointless since only
ceph_manager.CephManager is required in qa/tasks/cephfs/filesystem.py.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2023-06-24 19:18:05 +05:30
Milind Changire
a63ffd4079
qa: test user defined number of snaps retention spec
Signed-off-by: Milind Changire <mchangir@redhat.com>
2023-06-20 20:22:33 +05:30
Rishabh Dave
67b1935a18
Merge pull request #51132 from lxbsz/wip-59349
qa: wait for 100 seconds to make sure the quota to be enforced

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Rishabh Dave <ridave@redhat.com>
2023-06-16 17:52:00 +05:30
Xiubo Li
6183c992d7
Merge pull request #51703 from lxbsz/wip-59683
xfstests_dev: install extra packages from powertools repo for xfsprogs
2023-06-13 09:44:24 +08:00