* refs/pull/52547/head:
qa: add test cases for vanilla ops commands
mds: dump locks when printing mutation ops
common/TrackedOp: support overriding the _dump method
mds: remove op field obsoleted by more usable "reqid"
mds: dump metareq_t instead of full op
mds: add lock type to formatter dump of SimpleLock
mds: mark print methods const
mds: drop MDRequestImpl::msg_lock
mds: lock TrackedOp when dumping
mds: avoid recursive locks dumping state
common/TrackedOp: fix race updating description with proper lock
common/Formatter: add support for dumping null
common/Formatter: refactor generating xml name
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
This relies on the new stdin-killer [1] teuthology helper that allows
interacting with the command's stdin.
[1] https://github.com/ceph/teuthology/pull/1846
Fixes: 8bb77ed9e1
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This change causes the program to exit gracefully when stdin is closed
rather than with a Python exception.
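As an illustration, the pattern looks roughly like this (a sketch, not
the actual patch; the command loop and handle() below are
hypothetical):

    import sys

    def handle(line: str) -> None:
        # hypothetical per-command handler
        print(f"got: {line.strip()}")

    def main() -> int:
        try:
            while True:
                handle(input())  # blocks until the next command arrives
        except EOFError:
            # stdin was closed: fall through to a clean exit instead
            # of dying with an unhandled exception and a traceback
            return 0

    if __name__ == '__main__':
        sys.exit(main())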
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
When the ESubtreeMap is very large (~5k+ subtrees), the MDS will
end up logging only a few events (as few as one) per segment, as the
subtree map dominates the segment size.
This test simply creates an artificially large subtree and confirms
that other file system activity completes in a timely manner. The MDS
now takes advantage of minor segments, which allow for a normal number
of events per log segment (and fewer subtree maps). The test fails on
the current main HEAD.
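The rough shape of such a test, as a sketch (helper names like
setfattr/create_n_files and the thresholds here are assumptions, not
the verbatim test):

    import time

    # Spread many auth subtrees across ranks via distributed ephemeral
    # pinning, making the ESubtreeMap very large (~5k subtrees).
    self.mount_a.run_shell(["mkdir", "big"])
    self.mount_a.setfattr("big", "ceph.dir.pin.distributed", "1")
    self.mount_a.run_shell(
        ["mkdir"] + [f"big/dir.{i}" for i in range(5000)])

    # Unrelated file system activity must still complete in a timely
    # manner (the workload stage mentioned below).
    start = time.time()
    self.mount_a.create_n_files("workload/file", 10000)
    self.assertLess(time.time() - start, 120)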
Historical note: when I first observed this aberrant behavior, the
vstart cluster was actually using mds_debug_subtrees = True (the default
for every vstart cluster). This caused the MDS to write out the subtree
map (for debugging reasons) with every event. When testing the MDS with
large subtrees (distributed ephemeral pinning), this caused the MDS to
slow to a trickle of operations per second. Despite this unintentional
misconfiguration, the problem still exists but the number of auth
subtrees must be large for a particular rank to replicate the behavior.
On main HEAD, the creation of 10k files (workload stage) takes ~110
seconds. On this branch, it takes ~30 seconds.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/51539/head:
doc: users now need to provide scrub_mdsdir and recursive flags
qa: add recursive flag to test_flag_scrub_mdsdir
mds: remove code to bypass dumping empty header scrub info
mds: dump_values no longer needed
mds: enqueue ~mdsdir at the time of enqueing root
Reviewed-by: Venky Shankar <vshankar@redhat.com>
* refs/pull/51959/head:
qa: test for session ls with filters
mds: session ls command appears twice in command listing
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
* refs/pull/51278/head:
mgr/snap_schedule: rephrase log message when pruning
doc: add note about snap-schedule snapshot retention
qa: test user defined number of snaps retention spec
mgr/snap_schedule: adapt test to new argument list
doc/cephfs: Add note how mds_max_snaps_per_dir affects snapshot retention
mgr/snap_schedule: Use mds_max_snaps_per_dir as snapshot count limit
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
The code has been changed: in order to scrub ~mdsdir at the root, the
recursive flag now needs to be provided along with scrub_mdsdir.
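For example, in a qa test the call now looks something like this (a
sketch; run_ceph_cmd() is the helper introduced elsewhere in this
series, and the exact mds target spec may differ):

    # both scrubopts must be passed together to include ~mdsdir:
    self.run_ceph_cmd("tell", "mds.cephfs:0", "scrub", "start", "/",
                      "recursive,scrub_mdsdir")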
Fixes: https://tracker.ceph.com/issues/59350
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
In filesystem.py, and wherever instances of class Filesystem are used,
use run_ceph_cmd() instead of get_ceph_cmd_stdout() when the output of
the Ceph command is not required.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Add run_ceph_cmd(), get_ceph_cmd_stdout() and get_ceph_cmd_result() to
class Filesystem so that running Ceph commands is easier. This affects
not only methods inside class Filesystem but also methods elsewhere
that use an instance of class Filesystem to run Ceph commands.
Instead of "self.fs.mon_manager.raw_cluster_cmd()", writing
"self.fs.run_ceph_cmd()" will suffice.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Add method get_ceph_cmd_stdout() to class CephFSTestCase so that one
doesn't have to type something as long as
"self.mds_cluster.mon_manager.raw_cluster_cmd()" to execute a command
and get its output. Also delete CephFSTestCase.run_cluster_cmd() and
replace its callers with the new method.
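Sketch of the new helper (assuming the test case already holds
self.mds_cluster):

    # qa/tasks/cephfs/cephfs_test_case.py (sketch)
    class CephFSTestCase(CephTestCase):

        def get_ceph_cmd_stdout(self, *args, **kwargs):
            # run a Ceph command and return its stdout
            return self.mds_cluster.mon_manager.raw_cluster_cmd(
                *args, **kwargs)

so that a test can write, for example:

    output = self.get_ceph_cmd_stdout("fs", "dump")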
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Instead of writing something as long as
"self.mds_cluster.mon_manager.run_cluster_cmd()" to execute a command,
let's add a helper method to class CephFSTestCase and use it instead.
With this, running a command becomes simple: "self.run_ceph_cmd()".
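A sketch of the helper, alongside the one above:

    class CephFSTestCase(CephTestCase):

        def run_ceph_cmd(self, *args, **kwargs):
            # run a Ceph command when its output is not needed
            self.mds_cluster.mon_manager.run_cluster_cmd(args=args,
                                                         **kwargs)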
Signed-off-by: Rishabh Dave <ridave@redhat.com>
To run a command and get its return value, instead of typing something
as long as "self.mds_cluster.mon_manager.raw_cluster_cmd_result()", add
a helper method in CephFSTestCase and use it. This makes the task very
simple: "self.get_ceph_cmd_result()".
Also, remove method CephFSTestCase.run_cluster_cmd_result() in favour
of this new method.
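Sketch, with a typical use of the returned exit status (the EPERM
check below is an invented example):

    import errno

    class CephFSTestCase(CephTestCase):

        def get_ceph_cmd_result(self, *args, **kwargs):
            # run a Ceph command and return only its exit status
            return self.mds_cluster.mon_manager.raw_cluster_cmd_result(
                *args, **kwargs)

    # e.g. assert that removing a file system without the confirmation
    # flag is rejected:
    self.assertEqual(self.get_ceph_cmd_result("fs", "rm", "cephfs"),
                     errno.EPERM)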
Signed-off-by: Rishabh Dave <ridave@redhat.com>
* refs/pull/49971/head:
doc/cephfs: document MDS_CLIENTS_LAGGY health warning
qa: ignore warnings
qa: add test cases to check client eviction if an OSD is laggy
mds,messages: enable beacon to report clients lagginess
mds: do not evict client on laggy osds
common: add new config option to defer client eviction
osd: add method to check for laggy osds
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Otherwise, the MDS that just got replaced can transition to a rank
in another file system, and the test cannot deterministically infer
which MDS needs to be checked.
Fixes: http://tracker.ceph.com/issues/61764
Signed-off-by: Venky Shankar <vshankar@redhat.com>
admin_remote contains lots of methods that can be useful during
testing, so let's have easy access to it too.
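For instance (a sketch):

    # in CephFSTestCase.setUp() (sketch)
    self.admin_remote = self.mds_cluster.admin_remote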
Signed-off-by: Rishabh Dave <ridave@redhat.com>
To run a Ceph command conveniently, run_cluster_cmd(), raw_cluster_cmd()
or raw_cluster_cmd_result() must be called. These methods are available
in class CephManager, which in turn is available only if an instance of
Filesystem, MDSCluster, CephCluster or MgrCluster is initialized. Having
an instance of CephManager in CephFSTestCase will provide easy access to
these methods.
For example, in CephFS tests, writing
"self.mon_manager.raw_cluster_cmd()" instead of
"self.mds_cluster.mon_manager.raw_cluster_cmd()" will suffice.
This commit provides a basis for upcoming commits in this patch series.
With the next patches, running a Ceph command will be further
simplified: just writing self.run_ceph_cmd() will suffice for running a
CephFS command.
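Sketch of the shortcut:

    # in CephFSTestCase.setUp() (sketch)
    self.mon_manager = self.mds_cluster.mon_manager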
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Importing the entire module ceph_manager.py is pointless since only
ceph_manager.CephManager is required in qa/tasks/cephfs/filesystem.py.
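That is, something like the following (exact module path assumed):

    # before: pulls in the whole module
    from tasks import ceph_manager
    # after: import only what is needed
    from tasks.ceph_manager import CephManager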
Signed-off-by: Rishabh Dave <ridave@redhat.com>
qa: wait for 100 seconds to make sure the quota is enforced
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Rishabh Dave <ridave@redhat.com>
CentOS Stream 8 has removed the 'device-mapper-devel', 'libedit-devel'
and 'userspace-rcu-devel' packages from the mirrors, so we need to
install them from the powertools repo.
Fixes: https://tracker.ceph.com/issues/59683
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Otherwise, suspending the netns of the other mount will prevent it
from completing a flush on the file handle, or even from telling the
MDS that the file size has changed!
Fixes: https://tracker.ceph.com/issues/61409
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/50875/head:
mon/MDSMonitor: ignore extraneous up:boot messages
qa: add test case for mds sending multiple boot messages
qa: support checking for a log message that should not exist
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
* refs/pull/49691/head:
qa: add test for opening a file via a hard link that is not in the same mds as the inode
mds: rdlock_path_xlock_dentry supports returning auth target inode
Reviewed-by: Venky Shankar <vshankar@redhat.com>
* refs/pull/51251/head:
PendingReleaseNotes: add a note about deleting files from lost+found directory
qa: add checks that validate removal of entries from lost+found dir
mds: allow unlink operation under lost+found directory
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Running a file system scrub is recommended after running file system
data and metadata recovery, but running scrub isn't currently covered
by the tests.
Fixes: http://tracker.ceph.com/issues/59527
Signed-off-by: Venky Shankar <vshankar@redhat.com>