About the commit date: this commit was dropped from the patch series
during a PR branch update but has now been added back.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Passing "exec sudo" to "ceph -w" caused "Ceph API test" CI job to fail.
Error was not related to this tracker issue but the code added for it
is reversed now in this commit. The tracker issue -
https://tracker.ceph.com/issues/49644.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
We convert all command arguments to str and pass bash functions along
to override certain commands within those arguments. Let's save the
command arguments without those bash functions, since they can be
useful later (for example, for printing command arguments in logs,
which is what this patch does).
Signed-off-by: Rishabh Dave <ridave@redhat.com>
The intention behind copying these note points is to document the
behaviour of vstart_runner.py inside vstart_runner.py as well, so that
developers don't miss it while working on it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Overriding commands is much better than deleting these commands from
the command argument string using Python since, unlike deleting,
overriding doesn't require parsing. A note about this has been added
to vstart_runner.py's module docstring and to the Ceph Developer Guide.
Since shell functions don't work with the sh shell, vstart_runner.py
will use the bash shell from here onwards to make overriding work.
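A minimal bash sketch of the overriding idea ("sudo" is used for
illustration, since vstart_runner.py needs to neutralize it for local
runs; the exact mechanics in vstart_runner.py may differ):

    # A function shadows the command of the same name; bash resolves
    # functions before searching PATH, so no argument parsing is needed.
    sudo() { "$@"; }     # make "sudo" a pass-through no-op
    sudo ls /tmp         # actually runs "ls /tmp" as the current user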
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Convert all command arguments from list to str, update the checks and
adjustments performed on command arguments accordingly, update the
documentation to include warnings about some critical parts of
vstart_runner.py, and update tasks.cephfs.mount.CephFSMount.run_shell().
Fixes: https://tracker.ceph.com/issues/47849
Signed-off-by: Rishabh Dave <ridave@redhat.com>
When we set the proxy mode to remove a writeback cache according to
the official Ceph documentation, an error occurred:
[root@controller-1 root]# ceph osd tier cache-mode cachepool proxy
Invalid command: proxy not in writeback|readproxy|readonly|none
osd tier cache-mode writeback|readproxy|readonly|none [--yes-i-really-mean-it]:
specify the caching mode for cache tier
According to the official documentation: since a writeback cache may
have modified data, you must take steps to ensure that you do not lose
any recent changes to objects in the cache before you disable and
remove it. Change the cache mode to proxy so that new and modified
objects will be flushed to the backing storage pool.
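For reference, the documented removal procedure is roughly the
following (pool names follow the example above; "storagepool" is an
assumed name for the backing pool):

    ceph osd tier cache-mode cachepool proxy     # the step failing above
    rados -p cachepool cache-flush-evict-all     # flush remaining objects
    ceph osd tier remove-overlay storagepool     # stop routing client I/O
    ceph osd tier remove storagepool cachepool   # detach the cache tier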
Fixes: https://tracker.ceph.com/issues/54576
Signed-off-by: tan changzhi <544463199@qq.com>
The way the config option was set resulted in the respective
MDS codepaths never getting exercised.
Fixes: https://tracker.ceph.com/issues/55170
Signed-off-by: Venky Shankar <vshankar@redhat.com>
This commit adds logic to automatically detect whether sse-s3 is
available and, if not, disables the sse-s3 tests by default.
Configuration options are provided to override the default either way.
Signed-off-by: Marcus Watts <mwatts@redhat.com>
run_shell() in qa.tasks.cephfs.mount.CephFSMount prepends "sudo" to its
command arguments, but it doesn't tell the underlying method that
"sudo" shouldn't be deleted from the command arguments.
Fixes: https://tracker.ceph.com/issues/53601
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Use LocalContext instance to set LocalCephManager.cluster.
Fixes: https://tracker.ceph.com/issues/53601
Signed-off-by: Rishabh Dave <ridave@redhat.com>
In xfstests-dev, "./check generic/abcd" doesn't end in error even when
there is no test named abcd in generic. It's better to check stdout to
verify success, and to print the return code, stdout and stderr of the
command in the logs so that such errors can be found by reading the logs.
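A sketch of the problematic behaviour (using the nonexistent test id
from above):

    $ ./check generic/abcd
    $ echo $?
    0    # exits zero even though generic/abcd doesn't exist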
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Print return code, stdout and stderr for the command that launches
xfstests-dev tests against CephFS when it fails.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
StringIO can be operated on without the extra hassle of conversion, so
replace BytesIO with StringIO in test_acls.py and xfstests_dev.py.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
When running "sudo ./check generic/099" in test_acls.py's test method
named test_acls(), set omit_sudo to False because without it
vstart_runner.py will remove "sudo" from command arguments and so the
command will fail unnecessarily.
Fixes: https://tracker.ceph.com/issues/55374
Signed-off-by: Rishabh Dave <ridave@redhat.com>
15 minutes is an unnecessarily large default timeout for a command.
Not having to wait unnecessarily when a command crashes will reduce
teuthology's testing queue and save individual developers' time when
running tests locally.
The lines modified for this purpose are also updated to follow the
style guideline, specifically wrapping at 80 characters.
Fixes: https://tracker.ceph.com/issues/54236
Signed-off-by: Rishabh Dave <ridave@redhat.com>
os/bluestore: Add CoDel to BlueStore for Bufferbloat mitigation
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
Right now, cephfs-shell on receiving an unrecognized command prints an
appropriate message on stderr, but the return value is zero. This is a
serious problem for users as well as for tests. It must exit with a
non-zero return value.
The return value chosen for this case is 127, the same as bash.
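For comparison, bash's behaviour for an unrecognized command:

    $ no-such-command
    bash: no-such-command: command not found
    $ echo $?
    127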
The changes in test_cephfs_shell.py, besides the addition of
TestGeneric, fix tests that were themselves buggy; the behaviour of
those tests changes now that the cephfs-shell bug has been fixed.
Fixes: https://tracker.ceph.com/issues/55399
Signed-off-by: Rishabh Dave <ridave@redhat.com>
The exclamation mark is a special character for bash as well as for
cephfs-shell. For bash, it substitutes the current command with the
matching command from the command history; for cephfs-shell, it runs
the command as an OS-level command and not inside cephfs-shell.
And every command executed in tests (say "ls") is run by passing it as
a parameter to the cephfs-shell command (that is, "cephfs-shell -c
<conf> -- ls"). So an exclamation mark, when used in tests, is consumed
by bash instead of cephfs-shell.
To avoid these complications, it's best (and even simpler!) to issue a
command meant for bash directly on bash, without going through
cephfs-shell.
Fixes: https://tracker.ceph.com/issues/55394
Signed-off-by: Rishabh Dave <ridave@redhat.com>
This adds one test case for a possible kernel crash when doing a file
sync or a filesystem sync.
Fixes: https://tracker.ceph.com/issues/55329
Signed-off-by: Xiubo Li <xiubli@redhat.com>
This will test the file sync of a directory, which may get stuck for
up to 5 seconds. This happened because the related code waits for all
the unsafe requests to get a safe reply from the MDSes, but the MDSes
consider it unnecessary to flush the mdlog immediately after an early
reply, and the mdlog is flushed only every 5 seconds in the tick
thread.
This should have been fixed in the kclient and libcephfs by triggering
an mdlog flush before waiting for the requests' safe replies.
Fixes: https://tracker.ceph.com/issues/55283
Signed-off-by: Xiubo Li <xiubli@redhat.com>
This will test the sync of the filesystem, which may get stuck for
up to 5 seconds. This happened because the related code waits for all
the unsafe requests to get a safe reply from the MDSes, but the MDSes
consider it unnecessary to flush the mdlog immediately after an early
reply, and the mdlog is flushed only every 5 seconds in the tick
thread.
This should have been fixed in the kclient and libcephfs by triggering
an mdlog flush before waiting for the requests' safe replies.
Fixes: https://tracker.ceph.com/issues/55283
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Due to the lack of Windows support in Teuthology, the test case adopts
the following workaround:
* Deploy a bare-metal machine with `ubuntu_latest.yaml` and
configure it with libvirt KVM.
* Create a libvirt VM and provision it with Windows Server 2019, using
the official ISO from Microsoft.
* Configure SSH in the Windows VM, and run the tests remotely via SSH.
The implementation of the test case consists of workunit scripts.
`qa/workunits/windows/test_rbd_wnbd.py` is the main Python script,
testing basic Ceph on Windows functionality. It is executed in the
libvirt VM configured with Windows Server 2019.
Co-authored-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
Co-authored-by: Daniel Vincze <dvincze@cloudbasesolutions.com>
Signed-off-by: Ionut Balutoiu <ibalutoiu@cloudbasesolutions.com>
Otherwise the tests may run forever. This was already done for the
mds upgrade sequence; this just adds it in the other two places here.
Related to: https://tracker.ceph.com/issues/53939
Signed-off-by: Adam King <adking@redhat.com>
- Avoid crashing when a call out to an external program expectedly does
not return exit status zero.
Some programs communicate information other than error/no error
through their exit status. E.g. `systemctl status` returns different
exit codes depending on the actual status of the units in question.
In cases where this is expected, crashing with a RuntimeError exception
is inappropriate and should be avoided.
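For example (unit name illustrative; per systemctl(1), exit code 3
from "status" means the unit is not running, which here is information
rather than an error):

    $ systemctl status some-stopped.service; echo $?
    ...
    3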
Fixes: https://tracker.ceph.com/issues/55117
Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
These tests are supposed to validate that we don't accept invalid IPs,
but they left out the "add" subcommand, so they're all failing on that!
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Check that reshard prunes olh entries with empty name.
Fixes: https://tracker.ceph.com/issues/54500
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
The following tests are added (a sketch of the commands exercised
follows the list):
1. Set custom metadata on a subvolume.
2. Set custom metadata on a subvolume (idempotency).
3. Get custom metadata for a specified key.
4. Get custom metadata when the specified key doesn't exist (expecting error ENOENT).
5. Get custom metadata when no key-value pair has been added, i.e. the section doesn't exist (expecting error ENOENT).
6. Update the value for an existing key in custom metadata.
7. List the custom metadata of a subvolume.
8. List the custom metadata of a subvolume when no key-value pair has been added (expecting an empty JSON dictionary).
9. Remove custom metadata for a specified key.
10. Remove custom metadata when the specified key doesn't exist (expecting error ENOENT).
11. Remove custom metadata when no key-value pair has been added, i.e. the section doesn't exist (expecting error ENOENT).
12. Remove custom metadata with the --force option.
13. Remove custom metadata with the --force option when the specified key doesn't exist (expecting the command to succeed because of the '--force' option).
14. Set and get custom metadata on a legacy subvolume.
15. List and remove custom metadata on a legacy subvolume.
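The commands exercised are broadly of the form (per the mgr/volumes
interface added alongside these tests):

    ceph fs subvolume metadata set <vol_name> <subvol_name> <key> <value>
    ceph fs subvolume metadata get <vol_name> <subvol_name> <key>
    ceph fs subvolume metadata ls <vol_name> <subvol_name>
    ceph fs subvolume metadata rm <vol_name> <subvol_name> <key> [--force]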
Fixes: https://tracker.ceph.com/issues/54472
Signed-off-by: Nikhilkumar Shelke <nshelke@redhat.com>
This commit removes orchestrator commands from the
Rook task and the Rook test suite because the Rook
orchestrator is not being maintained, and the Rook
orchestrator CLI is obsolete. This should also
clarify the issue:
https://tracker.ceph.com/issues/53680
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
Deleting a subvolume that has already been deleted with the
retain-snapshots option fails with an 'EAGAIN: clone in progress'
error. After a subvolume is deleted with retained snapshots, the
subvolume exists until the trash directory (which resides inside the
subvolume) is cleaned up. A subvolume deletion issued while the trash
directory is not empty should pass. This patch fixes that.
Credit: issue discovery and fix suggestion by John Mulligan <jmulligan@redhat.com>
Fixes: https://tracker.ceph.com/issues/54625
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Modify test_activate_osd() to get the type of scheduler in use and then
verify the value of osd_max_backfills. This is because the mclock
scheduler overrides this option to 1000 upon OSD initialization.
The test used to pass because the OSD daemon was killed but not marked
down, and upon being brought up, the wait-for-OSD-up check passed
quickly; however, the OSD still didn't have the latest config values.
Now, upon killing the OSD, the osd_fast_shutdown sequence notifies the
mon (see PR: https://github.com/ceph/ceph/pull/44807) and the OSD is
marked down and dead. Upon bringing it up, the wait-for-OSD-up check
takes longer, which is sufficient for the config values to be updated.
This results in the correct values being read from the config 'Values'
map.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Uses the client's global id to get the metrics, instead of using the index.
This ensures that test_perf_stats_stale_metrics checks only the clients mounted for
the tests.
Fixes: https://tracker.ceph.com/issues/54971
Signed-off-by: Jos Collin <jcollin@redhat.com>
The default pids-limit (4096 for docker, 2048 for podman) prevents some
customizations from working (HTTP threads on RGW) or limits the number
of LUNs per iSCSI target.
Fixes: https://tracker.ceph.com/issues/52898
Signed-off-by: Teoman ONAY <tonay@redhat.com>
Commit 08df6e0fd0 ("qa/workunits/rbd: expand LevelSpec parsing
coverage") didn't account for images with a separate data pool. This
was missed because of small-cache-pool.yaml breakage.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Add a test to verify that the NFS servers don't restart when the
access type of a CephFS NFS export is updated, and check that the NFS
servers are restarted when the pseudo path of a CephFS NFS export is
updated.
Signed-off-by: Ramana Raja <rraja@redhat.com>
Add the snaptrim duration to the JSON-formatted output of the pg dump
stats. Define methods for a PG to set the snaptrim begin time and then
to calculate the total time spent trimming all the objects for the
snaps in the snap_trimq for the PG.
Tests:
- Librados C and C++ API tests to verify the time spent for a snaptrim
operation on a PG. These tests use the self-managed snaps APIs.
- Standalone tests to verify snaptrim duration using rados pool snaps.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Add a new column, OBJECTS_TRIMMED, to the pg dump stats that shows the
number of objects trimmed when a snap is removed.
When a pg splits, the stats from the parent pg are copied to the child
pg. In such a case, reset objects_trimmed to 0 for the child pg
(see PeeringState::split_into()). Otherwise, incorrect stats will be
shown for the child pg after the split operation.
Tests:
- Librados C and C++ API tests to verify the number of objects trimmed
during snaptrim operation. These tests use the self-managed snaps APIs.
- Standalone tests to verify objects trimmed using rados pool snaps.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
mgr/rbd_support: cast pool_id from int to str when collecting LevelSpec
Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Invoke "rbd mirror snapshot schedule ls -R" and "rbd mirror snapshot
schedule status" commands on all levels, consistently. In particular,
make sure that an image level schedule is listed for a recursive query
at the pool level both before and after the schedule kicks in:
$ rbd create --size 1G --mirror-image-mode snapshot -p foo bar
$ rbd mirror snapshot schedule add -p foo --image bar 1m
$ rbd mirror snapshot schedule ls -p foo -R
POOL  NAMESPACE  IMAGE  SCHEDULE
foo              bar    every 1m
<wait for schedule to become visible in status>
$ rbd mirror snapshot schedule ls -p foo -R
POOL  NAMESPACE  IMAGE  SCHEDULE
foo              bar    every 1m
Also, make sure that pool and image level status queries work:
$ rbd mirror snapshot schedule status -p foo
SCHEDULE TIME        IMAGE
2022-03-04 07:14:00  foo/bar
$ rbd mirror snapshot schedule status -p foo --image bar
SCHEDULE TIME        IMAGE
2022-03-04 07:14:00  foo/bar
Both of these issues are fixed by the previous commit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Test the commands:
`osd pool create` <pool> --target_size_ratio <float>
`osd pool set` <pool> target_size_ratio <float>
`osd pool get` <pool> target_size_ratio
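For example (pool name and ratio values illustrative):

    ceph osd pool create foo --target_size_ratio 0.1
    ceph osd pool set foo target_size_ratio 0.2
    ceph osd pool get foo target_size_ratio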
Signed-off-by: Kamoltat <ksirivad@redhat.com>
Fix the expected log message to match the scrub code, by removing
the redundant part.
Fixes: https://tracker.ceph.com/issues/54458
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Added MDS daemons so that the test can create CephFS pools and set
options using `do_set_pool()` in FSCommands.cc, such that we can
cover corner cases like the one in
https://tracker.ceph.com/issues/54263
Signed-off-by: Kamoltat <ksirivad@redhat.com>
mon/ConfigMonitor: fix config get key with whitespace
Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
A new job that doesn't want ms_mode to be set underneath it is about to
be added. Rename rxbounce to ms_modeless to make this purpose obvious.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Test that `ceph fs perf stats` doesn't output stale metrics
after the rank 0 MDS failover.
Fixes: https://tracker.ceph.com/issues/50033
Signed-off-by: Jos Collin <jcollin@redhat.com>
It's more appropriate to use --subset to reduce the scheduling size. It
was previously laid out this way because we wanted to link to the common
`qa/cephfs/mount` directory so that ceph-fuse mounts are not needlessly
multiplied. We should just organize it correctly so that is not an
issue.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
These seem to fail sometimes, but in my testing these events sometimes
happen a few seconds after we hit the timeout. Trying to see if this
makes the tests more consistent. There is no need to mark the test as
failed if we report something in 34 seconds vs. 25, especially when
cephadm works on a cyclic daemon refresh.
Signed-off-by: Adam King <adking@redhat.com>
* refs/pull/42000/head:
qa: update rhel kclient to setup container tools
qa: stop overriding distro for k-testing
qa: only use RHEL for workload testing
qa: convert fs:workload to use cephadm
qa: split fs begin task
qa/tasks/cephadm: setup CephManager when OSDs are provisioned
qa/tasks/cephadm: setup file system if MDS are provisioned
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Accessing a single dentry can be sped up by setting this option to
false when the directory is not in memory.
Signed-off-by: "Shen, Hang" <shenhang@kuaishou.com>
Use the override in ./src/qa/distros/container-hosts/ubuntu_20.04.yaml
in order to use the HWE kernel for Ubuntu 20.04.
This is because the Ubuntu 20.04 kernel (5.4) has a bug that prevents
nvme-loop from being used.
See https://lkml.org/lkml/2020/9/21/1456
Fixes: https://tracker.ceph.com/issues/54094
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
For basic, rbd and rbd-nomount subsuites, replace legacy and crc
facets with "legacy or legacy+rxbounce" and "crc or crc+rxbounce"
facets (chosen at random).
For fsx, singleton and thrash subsuites, add legacy+rxbounce and
crc+rxbounce facets and drop prefer-crc facet. The expected behaviour
of the latter depends on cluster configuration and should be tested
separately.
The total number of jobs remains the same.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The progress module disables the pg recovery event by default since
the event is expensive and has interrupted other services when OSDs
are being marked in/out of the cluster.
To turn the event on manually:
ceph config set mgr mgr/progress/allow_pg_recovery_event true
Updated qa/tasks/mgr/test_progress.py to enable
the pg recovery event when testing the progress module.
Signed-off-by: Kamoltat <ksirivad@redhat.com>
So links can be elsewhere in the qa suite (not used yet) and to simplify
a find command in a follow-up commit.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This is a continuation of a previous commit,
qa: only use RHEL for workload testing
We don't want to test fs:workload with centos/ubuntu, to avoid
packaging issues and to reduce the matrix of distros we're running
workloads on. Also, the testing kernel should install fine on the
"supported" random distros we test with.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
It's not useful testing workloads with different distributions; it just
adds to the maintenance burden of this qa suite as distro upgrades often
break compilation of various tests.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Note: it's important to keep the install task which supplies packages
needed for some workloads.
Fixes: https://tracker.ceph.com/issues/51333
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>