The following tests are added:
1. Set custom metadata for a subvolume snapshot.
2. Set custom metadata for a subvolume snapshot (idempotency).
3. Get custom metadata for a specified key.
4. Get custom metadata when the specified key does not exist (expecting error ENOENT).
5. Get custom metadata when no key-value pair has been added, i.e. the metadata section does not exist (expecting error ENOENT).
6. Update the value of an existing key in custom metadata.
7. List the custom metadata of a subvolume snapshot.
8. List the custom metadata of a subvolume snapshot when no key-value pair has been added (expecting an empty JSON object/dictionary).
9. Remove custom metadata for a specified key.
10. Remove custom metadata when the specified key does not exist (expecting error ENOENT).
11. Remove custom metadata when no key-value pair has been added, i.e. the metadata section does not exist (expecting error ENOENT).
12. Remove custom metadata with the --force option.
13. Remove custom metadata with the --force option when the specified key does not exist (expecting the command to succeed because of the '--force' option).
14. Remove a subvolume snapshot and verify whether the snapshot's metadata is removed as well.
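The semantics these tests expect can be summarized with a small in-memory model. This is a hypothetical sketch: the class and method names are illustrative only, and the real tests exercise the mgr volumes commands rather than a Python class.

```python
import errno


class SnapshotMetadata:
    """Hypothetical in-memory model of per-snapshot custom metadata."""

    def __init__(self):
        # None means the metadata section has not been created yet.
        self._section = None

    def set(self, key, value):
        # Setting the same key/value twice is harmless (idempotent);
        # setting an existing key to a new value updates it.
        if self._section is None:
            self._section = {}
        self._section[key] = value

    def get(self, key):
        # Missing key and missing section both map to ENOENT.
        if self._section is None or key not in self._section:
            raise OSError(errno.ENOENT, f"key '{key}' does not exist")
        return self._section[key]

    def list(self):
        # Nothing set yet: return an empty dictionary, not an error.
        return dict(self._section or {})

    def remove(self, key, force=False):
        if self._section is None or key not in self._section:
            if force:
                # --force: a missing key is not an error.
                return
            raise OSError(errno.ENOENT, f"key '{key}' does not exist")
        del self._section[key]
```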
Fixes: https://tracker.ceph.com/issues/55401
Signed-off-by: Nikhilkumar Shelke <nshelke@redhat.com>
whitelist_health.yaml -> ignorelist_health.yaml
whitelist_wrongly_marked_down.yaml -> ignore_wrongly_marked_down.yaml
This was mostly addressed in
2ee9365d0b,
but the rename wasn't done there.
Signed-off-by: Zack Cerza <zack@cerza.org>
These temporary files don't matter for test execution with teuthology
but they do matter for execution with vstart_runner.py since the test
fails if these files exist already. And tests are often run repeatedly
with vstart_runner.py, unlike with teuthology.
Fixes: https://tracker.ceph.com/issues/55719
Signed-off-by: Rishabh Dave <ridave@redhat.com>
All `rados/thrash-erasure-code-big` tests that die due to the “wait_for_recovery” timeout have one thing in common: They contain either `thrashers/pggrow` or `thrashers/mapgap`.
The difference between pggrow/mapgap and all other non-offending thrashers (default, careful, fastread, and morepggrow) is that pggrow and mapgap lack an override setting for `osd max backfills`. `osd max backfills` is the maximum number of backfill operations allowed to/from an OSD. The higher the number, the quicker the recovery. By default, this value is 1. On all of the non-offending thrashers, the default value of 1 is overridden in their .yaml files with a value > 1; pggrow and mapgap have no such override.
The mclock op scheduler is known to override `osd max backfills` with a high value, but all of the thrash-erasure-code-big thrashers have their op queue set to “debug_random”, which chooses randomly between op queues (the debug_random op queue is set to override the default mclock_scheduler in qa/config/rados.yaml). So, combined with the “debug_random” op queue, the low `osd max backfills` setting causes some tests to time out in recovery.
WITHOUT `osd max backfills`, as they are now, “mapgap” and “pggrow” tests die due to timed-out recovery about 17/100 times, as seen here with a pggrow test: http://pulpito.front.sepia.ceph.com/lflores-2022-05-18_14:24:29-rados:thrash-erasure-code-big-master-distro-default-smithi/
WITH `osd max backfills` specified, as I have suggested in this PR, 99/100 tests passed, with one test failing for a different reason:
http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_22:40:27-rados:thrash-erasure-code-big-master-distro-default-smithi/
I also scheduled 145 tests WITH `osd max backfills` that are a mix of pggrow and mapgap thrashers. 144/145 tests passed, with one test failing for a different reason. http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_15:27:54-rados:thrash-erasure-code-big-master-distro-default-smithi/
Fixes: https://tracker.ceph.com/issues/51076
Signed-off-by: Laura Flores <lflores@redhat.com>
rgw/qa: enable s3-tests related to cloud-transition feature
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Maredia, Ali <amaredia@redhat.com>
Run cloudtier tests with parameter 'retain_head_object'
set to true and false.
However, having multiple cloudtier storage classes in the same task
increases the transition time and results in spurious failures.
Hence, until there is a consistent way of running the tests without
having to depend on lc_debug_interval, one of the configs is disabled
for now.
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
1. Method cluster() in ceph.py creates a dictionary "ctx.ceph", attaches
a namespace to ctx.ceph[cluster_name], creates an attribute "fsid" and
stores the Ceph cluster's FSID in it.
2. The method kernel_mount.KernelMount._get_debug_dir() uses that "fsid"
attribute to get the Ceph cluster's FSID. (The exact line that does that is
"fsid = self.ctx.ceph[cluster_name].fsid".)
3. Test test_readahead.TestReadahead.test_flush() crashes with
vstart_runner.py because that test eventually calls _get_debug_dir()
and "ctx" in case of vstart_runner.py doesn't hold "ceph" dictionary
or anything similar.
Adding a dictionary, similar to the one added in ceph.py, to
vstart_runner.LocalContext's instances will fix this issue.
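The shape of the fix can be sketched as follows. This is a trimmed-down, hypothetical stand-in for vstart_runner.LocalContext: the default FSID below is a placeholder, and where the real class obtains the FSID (presumably from the running vstart cluster) is an assumption not covered here.

```python
from types import SimpleNamespace


class LocalContext:
    """Hypothetical, trimmed-down stand-in for vstart_runner.LocalContext."""

    def __init__(self, cluster_name="ceph",
                 fsid="00000000-0000-0000-0000-000000000000"):
        # Mirror ceph.py: ctx.ceph maps a cluster name to a namespace
        # whose "fsid" attribute holds the cluster's FSID.
        self.ceph = {cluster_name: SimpleNamespace(fsid=fsid)}


def get_fsid(ctx, cluster_name="ceph"):
    # Same lookup that KernelMount._get_debug_dir() performs.
    return ctx.ceph[cluster_name].fsid
```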
Fixes: https://tracker.ceph.com/issues/55694
Signed-off-by: Rishabh Dave <ridave@redhat.com>
DACs are overridable for directories. For files,
read/write DACs are always overridable, but executable
DACs are overridable only when at least one exec bit
is set.
DAC overriding for files and directories was handled the
same way for root, which is incorrect. This patch fixes
DAC overriding for root as described above.
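The rule above can be sketched as a predicate. This is a hypothetical Python illustration of the described policy, not the client code that implements it; `want` is an assumed parameter holding the requested access as a set of "r", "w" and "x".

```python
import stat

ANY_EXEC = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def root_can_override(mode, want):
    """Hypothetical check: may root bypass DACs for access `want`?"""
    if stat.S_ISDIR(mode):
        # Directories: DACs are always overridable.
        return True
    if "x" in want:
        # Files: exec is overridable only if some exec bit is set.
        return bool(mode & ANY_EXEC)
    # Files: read/write DACs are always overridable.
    return True
```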
Fixes: https://tracker.ceph.com/issues/55313
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Removing the subvol support exposed a spurious argument to the status
command which was assigned to the 'subvol' parameter but was unused in
this command's implementation.
The spurious argument is now removed.
Signed-off-by: Milind Changire <mchangir@redhat.com>
don't rely on the ceph manager task to parse a config file. each rgw
could be using a different config. instead, revert to an s3tests
override called 'with-sse-s3'
this way, the only job that enables sse-s3, vault_transit.yaml, contains
both the 'rgw crypt sse s3' configurables, and the flag to enable the
associated test cases
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Rationale: get and put now require both paths.
Also, testing get and put without target paths has been taken care
of by other tests in class TestGetAndPut().
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
Result of os.path.join() before: "./bin/./ceph-mds"; after:
"./bin/ceph-mds".
Before -
2022-05-05 19:36:11,100.100 DEBUG:__main__:> ./bin/./ceph-mds -i a
After -
2022-05-05 19:38:48,179.179 DEBUG:__main__:> ./bin/ceph-mds -i a
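The stray "." comes from joining with a component that itself starts with "./". The exact change to vstart_runner.py is not shown here; this sketch assumes it amounts to dropping the leading "./" from the second argument. It uses posixpath, which is what os.path resolves to on Linux, so the result is platform-independent.

```python
import posixpath

# A second component starting with "./" keeps the "." in the joined path.
before = posixpath.join("./bin", "./ceph-mds")
# Joining with the bare executable name gives the cleaner path.
after = posixpath.join("./bin", "ceph-mds")
print(before)  # ./bin/./ceph-mds
print(after)   # ./bin/ceph-mds
```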
Signed-off-by: Rishabh Dave <ridave@redhat.com>
The message regarding deletion of helper tools is printed for every
command. This message should be printed only when applicable.
Besides -
* Move XXX comments to _do_run() since it increases visibility of
these messages.
* Move the argument-omission logic to a new method to clear up the clutter.
* And remove shell as a parameter from _perform_checks_and_adjustments
since it's redundant.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
NOTE: Although most of the issues are fixed, a few function
and variable names are left unchanged in order to prevent
ambiguity and preserve their meaning.
They are:
- functions: setUp(), test_ls_H_prints_human_readable_file_size()
- variables: ls_H_output, ls_H_file_size
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
With the introduction of range blocklists, the 'blocklist ls' command outputs
two lists. To avoid a regression, get the blocklisted clients directly from
'osd dump' instead, which is also straightforward.
Fixes: https://tracker.ceph.com/issues/55516
Signed-off-by: Jos Collin <jcollin@redhat.com>
This method fails to collect the return value from
FuseMount._run_mount_cmd() and return it. This leads to a bug in tests
that expect the mount command to fail when executed with vstart_runner.py.
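The bug pattern is a wrapper that calls a helper but drops its result. Here is a hypothetical sketch; the class and method names only loosely follow FuseMount, and the helper is a stand-in that returns nonzero on failure.

```python
class FuseMountSketch:
    """Hypothetical class; names only loosely follow FuseMount."""

    def _run_mount_cmd(self, fail):
        # Stand-in for the real mount command: nonzero on failure.
        return 1 if fail else 0

    def mount_buggy(self, fail=False):
        # Bug: the return value is dropped, so callers always get None.
        self._run_mount_cmd(fail)

    def mount_fixed(self, fail=False):
        # Fix: propagate the return value so tests can check for failure.
        return self._run_mount_cmd(fail)
```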
Fixes: https://tracker.ceph.com/issues/55553
Signed-off-by: Rishabh Dave <ridave@redhat.com>
In these methods, the parameter "sudo" indicates whether or not sudo is
set to True, but this value is not passed on to the methods underneath.
It needs to be passed for the parameter to fulfill its commitment.
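A hypothetical sketch of the problem and the fix. The function names are illustrative only, and the lowest layer just builds the argument list so the effect of the flag is visible.

```python
def run(args, sudo=False):
    # Lowest layer: actually honours the flag.
    return (["sudo"] if sudo else []) + list(args)


def write_file_buggy(path, sudo=False):
    # Bug: "sudo" is accepted but never forwarded.
    return run(["tee", path])


def write_file_fixed(path, sudo=False):
    # Fix: forward "sudo" so the parameter has an effect.
    return run(["tee", path], sudo=sudo)
```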
Fixes: https://tracker.ceph.com/issues/55557
Signed-off-by: Rishabh Dave <ridave@redhat.com>
And therefore get rid of methods duplicated in LocalRemote and add a
call to empty constructor of RemoteShell in LocalRemote.__init__().
Signed-off-by: Rishabh Dave <ridave@redhat.com>
vstart_runner.py is written assuming that it can run commands with
superuser privileges whenever possible and vstart_runner.py is meant to
be executed without sudo.
So, it's better to kill a process using "sudo kill -9 <PID>" instead of
using os.kill(), because os.kill() can't kill a process launched with
superuser privileges.
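A minimal sketch of the approach, assuming passwordless sudo as vstart_runner.py does. The helper names are hypothetical. The command is built separately so it can be inspected without actually sending a signal.

```python
import subprocess


def kill_cmd(pid):
    # "sudo kill -9" can signal processes started with superuser
    # privileges, which os.kill() from an unprivileged process cannot.
    return ["sudo", "kill", "-9", str(pid)]


def kill_process(pid):
    # check=False: the process may already have exited.
    return subprocess.run(kill_cmd(pid), check=False).returncode
```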
Signed-off-by: Rishabh Dave <ridave@redhat.com>
About the commit date: this commit got dropped from the patch series
during some PR branch update but is added back now.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Passing "exec sudo" to "ceph -w" caused the "Ceph API test" CI job to fail.
The error was not related to this tracker issue, but the code added for
it is reverted in this commit. The tracker issue:
https://tracker.ceph.com/issues/49644
Signed-off-by: Rishabh Dave <ridave@redhat.com>
We convert all cmd args to str and pass bash functions along to override
certain arguments in those command arguments. Let's save cmd args
without those bash functions since they can be useful later (for
example, printing cmd args in logs, which is the case in this patch.)
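A hypothetical sketch of keeping both forms. The override prefix and class name are illustrative, and using shlex.join() for the list-to-string conversion is an assumption.

```python
import shlex

# Hypothetical bash function prefix that overrides "sudo".
BASH_OVERRIDES = 'sudo() { "$@"; }; '


class LocalCommand:
    def __init__(self, args):
        # Plain command string, saved for later use (e.g. logging).
        self.args_str = shlex.join(args)
        # What actually gets handed to bash: overrides + command.
        self.full_cmd = BASH_OVERRIDES + self.args_str
```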
Signed-off-by: Rishabh Dave <ridave@redhat.com>
The intention behind copying these notes is to document the
behaviour of vstart_runner.py inside vstart_runner.py as well, so that
developers don't miss them while working on it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Overriding commands is much better than deleting these commands from
the command argument string using Python since, unlike deleting,
overriding doesn't require parsing. A note about this has been added to
vstart_runner.py's module docstring and to the Ceph Developer's Guide.
Since these functions don't work with the sh shell, vstart_runner.py
will use the bash shell from here onwards to make overriding work.
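The idea can be shown in a few lines. This is a hypothetical override; the actual function definitions used by vstart_runner.py are not shown here. Prepending a bash function named `sudo` makes bash resolve `sudo` to the function before searching PATH, so the command string never has to be parsed or edited in Python.

```python
import subprocess

# Hypothetical override: a bash function named sudo that just runs its
# arguments, turning "sudo <cmd>" into plain "<cmd>".
OVERRIDE = 'sudo() { "$@"; }; '


def run_with_override(cmd):
    # Functions take precedence over PATH lookup in bash, so nothing in
    # cmd needs to be parsed or deleted.
    result = subprocess.run(["bash", "-c", OVERRIDE + cmd],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_with_override("sudo echo hello")` runs `echo hello` without sudo.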
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Convert all command arguments to str from list, update checks and
adjustments performed on command arguments accordingly and update
documentation to include warnings about some critical parts of
vstart_runner.py and update tasks.cephfs.mount.MountCephFS.run_shell().
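The commit does not say how the list is turned into a string. One safe way in Python 3.8+ is shlex.join(), which quotes each element so the string round-trips through a shell; treat its use here as an assumption.

```python
import shlex

args = ["ls", "-l", "dir with spaces"]
# shlex.join() quotes elements that need it.
cmd = shlex.join(args)
print(cmd)  # ls -l 'dir with spaces'
# shlex.split() recovers the original list.
assert shlex.split(cmd) == args
```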
Fixes: https://tracker.ceph.com/issues/47849
Signed-off-by: Rishabh Dave <ridave@redhat.com>
When we set the cache mode to proxy to remove a writeback cache, as the
official Ceph documentation describes, an error occurred:
[root@controller-1 root]# ceph osd tier cache-mode cachepool proxy
Invalid command: proxy not in writeback|readproxy|readonly|none
osd tier cache-mode writeback|readproxy|readonly|none [--yes-i-really-mean-it]:
specify the caching mode for cache tier
According to the official documentation: since
a writeback cache may have modified data, you must take steps to ensure
that you do not lose any recent changes to objects in the cache before
you disable and remove it. Change the cache mode to proxy so that new and
modified objects will flush to the backing storage pool.
Fixes: https://tracker.ceph.com/issues/54576
Signed-off-by: tan changzhi <544463199@qq.com>
The way the config option was set resulted in the respective
MDS code paths never getting exercised.
Fixes: http://tracker.ceph.com/issues/55170
Signed-off-by: Venky Shankar <vshankar@redhat.com>
This commit adds logic to automatically detect when sse-s3 is
available and if not, disables sse-s3 tests by default.
Configuration options are provided to override the default either way.
Signed-off-by: Marcus Watts <mwatts@redhat.com>
run_shell() in qa.tasks.cephfs.mount.CephFSMount prepends "sudo" to its
command arguments but it doesn't specify to the underlying method that
"sudo" shouldn't be deleted from the command arguments.
Fixes: https://tracker.ceph.com/issues/53601
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Use LocalContext instance to set LocalCephManager.cluster.
Fixes: https://tracker.ceph.com/issues/53601
Signed-off-by: Rishabh Dave <ridave@redhat.com>
In xfstests-dev, "./check generic/abcd" doesn't end in error even when
there is no test abcd in generic. It's better to check stdout to
verify success and to print the return code, stdout and stderr of the
command in the logs so that such errors can be spotted by reading them.
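A hypothetical helper showing the idea. The function names are illustrative, and treating "Passed all" in stdout as xfstests' success marker is an assumption.

```python
import subprocess


def xfstests_passed(returncode, stdout):
    # The exit status alone is not trusted, since "./check generic/abcd"
    # can exit 0 without running anything. "Passed all" is an assumed marker.
    return returncode == 0 and "Passed all" in stdout


def run_xfstests(args, cwd=None, log=print):
    proc = subprocess.run(args, cwd=cwd, capture_output=True, text=True)
    # Always log everything so such silent failures show up in the logs.
    log(f"returncode: {proc.returncode}")
    log(f"stdout:\n{proc.stdout}")
    log(f"stderr:\n{proc.stderr}")
    if not xfstests_passed(proc.returncode, proc.stdout):
        raise RuntimeError("xfstests-dev did not report success")
    return proc
```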
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Print return code, stdout and stderr for the command that launches
xfstests-dev tests against CephFS when it fails.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
StringIO can be operated on without the extra hassle of converting bytes
to str. So replace BytesIO with StringIO in test_acls.py and xfstests_dev.py.
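A minimal illustration of the difference. The sample output line is made up.

```python
from io import BytesIO, StringIO

out_bytes = BytesIO(b"generic/099 passed\n")
# BytesIO holds bytes, so text checks need an explicit decode step.
assert "passed" in out_bytes.getvalue().decode()

out_text = StringIO("generic/099 passed\n")
# StringIO holds str, so it can be checked directly.
assert "passed" in out_text.getvalue()
```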
Signed-off-by: Rishabh Dave <ridave@redhat.com>
When running "sudo ./check generic/099" in test_acls.py's test method
named test_acls(), set omit_sudo to False because without it
vstart_runner.py will remove "sudo" from command arguments and so the
command will fail unnecessarily.
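A hypothetical sketch of why omit_sudo matters. The function names are illustrative and only mimic the behaviour described here: by default "sudo" is stripped from the arguments, so a caller that really needs sudo has to pass omit_sudo=False.

```python
def adjust_args(args, omit_sudo=True):
    """Hypothetical vstart_runner-style argument adjustment."""
    # By default, drop "sudo" from the command arguments.
    if omit_sudo:
        return [a for a in args if a != "sudo"]
    return list(args)


def run_check_with_sudo(args, omit_sudo=False):
    # The caller prepends sudo itself, so it must tell the lower layer
    # not to strip it again.
    return adjust_args(["sudo"] + list(args), omit_sudo=omit_sudo)
```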
Fixes: https://tracker.ceph.com/issues/55374
Signed-off-by: Rishabh Dave <ridave@redhat.com>
15 minutes is unnecessarily large as a default timeout value for a
command. Not having to wait unnecessarily on a crashed command will
reduce teuthology's testing queue and save individual developers'
time while running tests locally.
All lines modified for this purpose are also modified to follow
the style guideline, specifically wrapping at 80 characters.
Fixes: https://tracker.ceph.com/issues/54236
Signed-off-by: Rishabh Dave <ridave@redhat.com>
os/bluestore: Add CoDel to BlueStore for Bufferbloat mitigation
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
Right now, when cephfs-shell receives an unrecognized command, it prints an
appropriate message on stderr but the return value is zero. This is a
serious problem for users as well as for tests. It must exit with a
non-zero return value.
The return value chosen for this case is 127, same as bash.
Besides the addition of TestGeneric, the changes in test_cephfs_shell.py
fix tests that were buggy and whose behaviour changes now that the
cephfs-shell bug has been fixed.
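The mechanism can be sketched with Python's standard `cmd` module, which calls `default()` for any unrecognized command. This is a stand-in, not cephfs-shell's actual code: the class and helper names are hypothetical, and cephfs-shell itself is built on a different shell framework.

```python
import cmd
import sys


class Shell(cmd.Cmd):
    """Minimal stand-in for cephfs-shell built on the stdlib cmd module."""

    def __init__(self):
        super().__init__()
        self.exit_code = 0

    def do_ls(self, arg):
        self.exit_code = 0

    def default(self, line):
        # Called for unrecognized commands: report on stderr and record
        # bash's "command not found" status.
        print(f"{line.split()[0]}: command not found", file=sys.stderr)
        self.exit_code = 127


def run_one(line):
    shell = Shell()
    shell.onecmd(line)
    return shell.exit_code
```

A wrapper would then call `sys.exit(run_one(line))` so the status reaches the caller.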
Fixes: https://tracker.ceph.com/issues/55399
Signed-off-by: Rishabh Dave <ridave@redhat.com>
The exclamation mark is a special character for bash as well as for
cephfs-shell. For bash, it substitutes the current command with a matching
command from the command history; for cephfs-shell, it runs the command
as an OS-level command rather than inside cephfs-shell.
And every command executed in tests (say "ls") is run by passing it as a
parameter to the cephfs-shell command (that is, "cephfs-shell -c <conf> --
ls"). So, an exclamation mark used in tests is consumed by bash
instead of cephfs-shell.
To avoid these complications it's best (and even simpler!) to issue the
command meant for bash on bash without going through cephfs-shell.
Fixes: https://tracker.ceph.com/issues/55394
Signed-off-by: Rishabh Dave <ridave@redhat.com>
This adds a test case for a possible kernel crash bug when doing
a file sync or filesystem sync.
Fixes: https://tracker.ceph.com/issues/55329
Signed-off-by: Xiubo Li <xiubli@redhat.com>
This will test the file sync of a directory, which may be stuck for
up to 5 seconds. This happens because the related code waits for
all the unsafe requests to get a safe reply from the MDSes, but the MDSes
consider it unnecessary to flush the mdlog immediately after an early
reply, and the mdlog is only flushed every 5 seconds in the tick
thread.
This should have been fixed in kclient and libcephfs by triggering an
mdlog flush before waiting for the requests' safe replies.
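A minimal sketch of timing a directory fsync, assuming a Linux host. The helper name is hypothetical. Pointing it at a directory on a CephFS mount is what would exercise the MDS path; the temporary directory below only shows the mechanics. A real test would presumably compare the elapsed time against a bound below the 5-second tick interval, and that exact bound is an assumption.

```python
import os
import tempfile
import time


def timed_dir_fsync(dirpath):
    """Create a file in dirpath, then time fsync() on the directory."""
    with open(os.path.join(dirpath, "file"), "w") as f:
        f.write("data")
    # A read-only fd on the directory is enough to fsync it on Linux.
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        start = time.monotonic()
        os.fsync(fd)
        return time.monotonic() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(f"directory fsync took {timed_dir_fsync(d):.3f}s")
```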
Fixes: https://tracker.ceph.com/issues/55283
Signed-off-by: Xiubo Li <xiubli@redhat.com>
This will test the sync of the filesystem, which may be stuck for
up to 5 seconds. This happens because the related code waits
for all the unsafe requests to get a safe reply from the MDSes, but the
MDSes consider it unnecessary to flush the mdlog immediately after an
early reply, and the mdlog is only flushed every 5 seconds in the tick
thread.
This should have been fixed in kclient and libcephfs by triggering an
mdlog flush before waiting for the requests' safe replies.
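The filesystem-wide variant can be sketched with os.sync(), which wraps sync(2) on Unix. As before, the helper name is hypothetical, the path would point into a CephFS mount in a real test, and any timing threshold is an assumption.

```python
import os
import tempfile
import time


def timed_sync_after_write(path):
    """Write a small file at path, then time a filesystem-wide sync()."""
    with open(path, "w") as f:
        f.write("data")
    start = time.monotonic()
    os.sync()  # flush all filesystems
    return time.monotonic() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        t = timed_sync_after_write(os.path.join(d, "file"))
        print(f"sync took {t:.3f}s")
```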
Fixes: https://tracker.ceph.com/issues/55283
Signed-off-by: Xiubo Li <xiubli@redhat.com>