15 minutes is an unnecessarily large default timeout for a command. Not
having to wait unnecessarily when a command crashes will shorten
teuthology's testing queue and save individual developers' time while
running tests locally.
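As a rough Python illustration only (the value and helper below are hypothetical, not the ones in this change), a smaller default means a crashed or hung command fails fast instead of holding a job for the full 15 minutes:
```
import subprocess

# Hypothetical sketch: a smaller default timeout bounds how long a hung or
# crashed command can stall a test run. 300s is an example, not the chosen value.
DEFAULT_TIMEOUT = 300

def run_cmd(args, timeout=DEFAULT_TIMEOUT):
    # raises subprocess.TimeoutExpired instead of blocking indefinitely
    return subprocess.run(args, check=True, timeout=timeout)
```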
The lines modified for this purpose are also updated to follow the style
guideline, specifically wrapping at 80 characters.
Fixes: https://tracker.ceph.com/issues/54236
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Right now, on receiving an unrecognized command, cephfs-shell prints an
appropriate message on stderr but returns zero. This is a serious problem
for users as well as for tests; it must exit with a non-zero return value.
The return value chosen for this case is 127, same as bash.
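A minimal Python sketch of the expected behaviour (the conf path and command name are placeholders):
```
import subprocess

# An unrecognized command should now exit with 127, matching bash, rather than 0.
proc = subprocess.run(
    ['cephfs-shell', '-c', 'ceph.conf', '--', 'no-such-command'],
    capture_output=True, text=True)
assert proc.returncode == 127, proc.stderr
```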
The changes in test_cephfs_shell.py, besides the addition of TestGeneric,
fix tests that were themselves buggy; their behaviour changes now that the
cephfs-shell bug has been fixed.
Fixes: https://tracker.ceph.com/issues/55399
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Exclamation mark is a special character for bash as well as for
cephfs-shell. For bash, it substitutes the current command with the
matching command from the command history, and for cephfs-shell it runs
the command as an OS-level command rather than inside cephfs-shell.
And every command executed in tests (say "ls") is run by passing it as a
parameter to the cephfs-shell command (that is, "cephfs-shell -c <conf> --
ls"). So an exclamation mark, when used in tests, is consumed by bash
instead of cephfs-shell.
To avoid these complications it's best (and even simpler!) to issue the
command meant for bash on bash without going through cephfs-shell.
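For example (a Python sketch with a placeholder command), a test that previously wrapped a `!`-prefixed command in cephfs-shell can just run the OS-level command directly:
```
import subprocess

# Instead of: cephfs-shell -c ceph.conf -- '!stat somefile'
# (where bash may consume the '!'), run the bash-level command directly:
subprocess.run(['stat', 'somefile'], check=True)
```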
Fixes: https://tracker.ceph.com/issues/55394
Signed-off-by: Rishabh Dave <ridave@redhat.com>
This adds a test case for the possible kernel crash bug when doing a
file sync or filesystem sync.
Fixes: https://tracker.ceph.com/issues/55329
Signed-off-by: Xiubo Li <xiubli@redhat.com>
This will test the file sync of a directory, which may be stuck for up
to 5 seconds. This is because the related code waits for all the unsafe
requests to get a safe reply from the MDSes, but the MDSes consider it
unnecessary to flush the mdlog immediately after the early reply, and
the mdlog is only flushed every 5 seconds by the tick thread.
This should already be fixed in kclient and libcephfs by triggering an
mdlog flush before waiting for the requests' safe replies.
Fixes: https://tracker.ceph.com/issues/55283
Signed-off-by: Xiubo Li <xiubli@redhat.com>
This will test the sync of the filesystem, which may be stuck for up
to 5 seconds. This is because the related code waits for all the unsafe
requests to get a safe reply from the MDSes, but the MDSes consider it
unnecessary to flush the mdlog immediately after the early reply, and
the mdlog is only flushed every 5 seconds by the tick thread.
This should already be fixed in kclient and libcephfs by triggering an
mdlog flush before waiting for the requests' safe replies.
Fixes: https://tracker.ceph.com/issues/55283
Signed-off-by: Xiubo Li <xiubli@redhat.com>
qa/workunits/fs/misc/subvolume.sh is getting in the way of fs:workload
testing with subvolumes. Hence this script has been moved to a Python test.
Signed-off-by: Milind Changire <mchangir@redhat.com>
The following tests are added (see the command sketch after this list):
1. Set custom metadata for a subvolume.
2. Set custom metadata for a subvolume (idempotency).
3. Get custom metadata for a specified key.
4. Get custom metadata if the specified key does not exist (expecting error ENOENT).
5. Get custom metadata if no key-value pair has been added, i.e., the section does not exist (expecting error ENOENT).
6. Update the value of an existing key in custom metadata.
7. List the custom metadata of a subvolume.
8. List the custom metadata of a subvolume if no key-value pair has been added (expecting an empty json/dictionary).
9. Remove custom metadata for a specified key.
10. Remove custom metadata if the specified key does not exist (expecting error ENOENT).
11. Remove custom metadata if no key-value pair has been added, i.e., the section does not exist (expecting error ENOENT).
12. Remove custom metadata with the --force option.
13. Remove custom metadata with the --force option if the specified key does not exist (expecting the command to succeed because of the '--force' option).
14. Set and get custom metadata for a legacy subvolume.
15. List and remove custom metadata from a legacy subvolume.
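A rough Python sketch of the command set these tests exercise (volume/subvolume names and exact argument order are assumptions):
```
import subprocess

def ceph(*args):
    # thin illustrative wrapper around the ceph CLI
    return subprocess.run(('ceph',) + args, check=True,
                          capture_output=True, text=True).stdout

ceph('fs', 'subvolume', 'metadata', 'set', 'vol1', 'subvol1', 'key1', 'value1')
print(ceph('fs', 'subvolume', 'metadata', 'get', 'vol1', 'subvol1', 'key1'))
print(ceph('fs', 'subvolume', 'metadata', 'ls', 'vol1', 'subvol1'))
ceph('fs', 'subvolume', 'metadata', 'rm', 'vol1', 'subvol1', 'key1', '--force')
```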
Fixes: https://tracker.ceph.com/issues/54472
Signed-off-by: Nikhilkumar Shelke <nshelke@redhat.com>
This commit removes orchestrator commands from the
Rook task and the Rook test suite because the Rook
orchestrator is not being maintained, and the Rook
orchestrator CLI is obsolete. This should also
clarify the issue:
https://tracker.ceph.com/issues/53680
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
Deleting a subvolume that has already been deleted with the retain-snapshots
option fails with an 'EAGAIN: clone in progress' error. After a subvolume
deletion with retained snapshots, the subvolume exists until the trash
directory (which resides inside the subvolume) is cleaned up. A subvolume
deletion issued while the trash directory is not yet empty should pass.
This patch fixes that.
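A Python sketch of the scenario being fixed (names are placeholders, flags hedged):
```
import subprocess

def ceph(*args):
    subprocess.run(('ceph',) + args, check=True)

# Delete a subvolume while retaining its snapshots; the subvolume lingers
# until its internal trash directory is cleaned up.
ceph('fs', 'subvolume', 'rm', 'vol1', 'subvol1', '--retain-snapshots')
# A second deletion issued before the trash is empty should now pass
# instead of failing with "EAGAIN: clone in progress".
ceph('fs', 'subvolume', 'rm', 'vol1', 'subvol1', '--retain-snapshots')
```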
Credit: Issue discovery and fix suggestion to John Mulligan <jmulligan@redhat.com>
Fixes: https://tracker.ceph.com/issues/54625
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Uses the client's global id to get the metrics, instead of using the index.
This ensures that test_perf_stats_stale_metrics checks only the clients mounted for
the tests.
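A hedged Python sketch of the idea (the output layout and key format are assumptions, not the literal test code):
```
import json
import subprocess

def client_metrics(client_gid):
    """Look up a client's metrics by its global id instead of by list index."""
    out = subprocess.run(['ceph', 'fs', 'perf', 'stats'],
                         capture_output=True, text=True, check=True).stdout
    stats = json.loads(out)
    # assumed layout: client_metadata keyed by "client.<gid>"
    return stats.get('client_metadata', {}).get('client.%d' % client_gid)
```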
Fixes: https://tracker.ceph.com/issues/54971
Signed-off-by: Jos Collin <jcollin@redhat.com>
Add a test to verify that the NFS servers don't restart when the
access type of a CephFS NFS export is updated, and check that the NFS
servers are restarted when the pseudo path of a CephFS NFS export is
updated.
Signed-off-by: Ramana Raja <rraja@redhat.com>
Test that `ceph fs perf stats` doesn't output stale metrics
after a rank 0 MDS failover.
Fixes: https://tracker.ceph.com/issues/50033
Signed-off-by: Jos Collin <jcollin@redhat.com>
These seem to be failing sometimes, but in my testing the expected
events sometimes happen a few seconds after we hit the timeout.
Trying to see if this makes the tests more consistent. There is no
need to mark the test as failed if we report something in 34 seconds
vs 25, especially when cephadm works on a cyclic daemon refresh.
Signed-off-by: Adam King <adking@redhat.com>
* refs/pull/42000/head:
qa: update rhel kclient to setup container tools
qa: stop overriding distro for k-testing
qa: only use RHEL for workload testing
qa: convert fs:workload to use cephadm
qa: split fs begin task
qa/tasks/cephadm: setup CephManager when OSDs are provisioned
qa/tasks/cephadm: setup file system if MDS are provisioned
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
The progress module now disables the pg recovery event by default
since the event is expensive and has interrupted other services
when OSDs are being marked in/out of the cluster.
To turn the event on manually:
ceph config set mgr mgr/progress/allow_pg_recovery_event true
Updated qa/tasks/mgr/test_progress.py to enable
the pg recovery event when testing the progress module.
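Roughly (a sketch; the helper and base-class names follow the usual qa conventions and are assumptions here), the test setup now enables the event before exercising the module:
```
from tasks.mgr.mgr_test_case import MgrTestCase

class TestProgress(MgrTestCase):
    def setUp(self):
        super().setUp()
        # Turn the pg recovery event back on for the duration of the test.
        self.mgr_cluster.mon_manager.raw_cluster_cmd(
            'config', 'set', 'mgr', 'mgr/progress/allow_pg_recovery_event', 'true')
```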
Signed-off-by: Kamoltat <ksirivad@redhat.com>
The Filesystem object may use this when configuring EC data pools at
file system creation (via a FuseMount).
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
On rhel/centos the ceph user does not have permission
to access these certs which leads to s3-test failures
in teuthology.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
This PR adds some visual hints for OSDs that are near full or full.
Fixes: https://tracker.ceph.com/issues/53334
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
The following command:
```
echo /dev/sda | tee /sys/kernel/config/nvmet/subsystems/sda/namespaces/1/device_path
```
makes nvme_loop fail because, fascinatingly, it adds an unexpected newline.
See:
```
/dev/sda
/dev/sda
1
tee: /sys/kernel/config/nvmet/subsystems/sda/namespaces/1/enable: No such file or directory
/dev/sda
1
```
Other distros don't have the same behavior:
```
CentOS 8
/dev/sda
/dev/sda
1
Ubuntu 20.04
/dev/sda
/dev/sda
1
```
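One illustrative way around it is to write the value without the trailing newline that `echo | tee` appends, for example from Python (a sketch, not necessarily the fix applied here):
```
# Illustrative workaround only: write the device path without a trailing newline.
with open('/sys/kernel/config/nvmet/subsystems/sda/namespaces/1/device_path', 'w') as f:
    f.write('/dev/sda')
```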
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The `fs volume rename` command renames the volume, i.e.,
orchestrator MDS service, file system, and the data and
metadata pool of the file system.
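Hedged Python usage sketch (volume names are placeholders; the confirmation flag is an assumption):
```
import subprocess

# Rename the volume, which also renames the orchestrator MDS service, the
# file system, and its data and metadata pools.
subprocess.run(['ceph', 'fs', 'volume', 'rename', 'oldvol', 'newvol',
                '--yes-i-really-mean-it'], check=True)
```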
Fixes: https://tracker.ceph.com/issues/51162
Signed-off-by: Ramana Raja <rraja@redhat.com>
rgw: Add rgw rate limiting per user and per bucket
Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
Reviewed-by: Yuval Lifshitz <ylifshit@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Add a sleep after running the ./kafka-server-stop.sh and
./zookeeper-server-stop.sh scripts so that nothing gets logged into the
kafka logs after the sleep time, and finally kill the processes.
This resolves: https://tracker.ceph.com/issues/53220
Signed-off-by: Kalpesh Pandya <kapandya@redhat.com>
Filestore will be deprecated in Quincy, considering
that BlueStore has been the default objectstore for
quite some time.
Fixes: https://tracker.ceph.com/issues/49275
Signed-off-by: Prashant D <pdhange@redhat.com>
* refs/pull/44342/head:
mds: trigger stray reintegration when loading dentry
qa: test that scrub causes reintegration
Reviewed-by: Xiubo Li <xiubli@redhat.com>
* refs/pull/44322/head:
mds: skip directory size checks for reintegration
qa: test reintegration with directory limits
Reviewed-by: Xiubo Li <xiubli@redhat.com>
1) Write more data to the pool so we operate with larger ratios.
2) Round up ratios when truncating (see the sketch below).
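For instance (an illustrative Python helper, assuming two decimal places of precision), truncation that rounds up instead of down:
```
import math

def truncate_ratio_up(ratio, places=2):
    """Truncate a ratio but round *up*, so the result never under-reports usage."""
    factor = 10 ** places
    return math.ceil(ratio * factor) / factor

assert truncate_ratio_up(0.8501) == 0.86
```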
Fixes: https://tracker.ceph.com/issues/53677
Signed-off-by: Mykola Golub <mgolub@suse.com>
This commit adds testing for the drive_group_loop in the Rook orchestrator
that reapplies drive groups that were applied previously.
This test removes an OSD, zaps the underlying device then waits for the OSD
to be re-created by the drive_group_loop.
This commit also updates the rook test suite to test v1.7.2 instead of 1.7.0
since `orch device zap` is only supported from v1.7.2 onwards.
Fixes: https://tracker.ceph.com/issues/53501
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
This commit is a workaround of a bug in the virtio interface in qemu 6.1.0+.
Fixes: https://tracker.ceph.com/issues/53587
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Adds a get_name() method to rgw::sal::Store, by which each store
returns its unique name in lowercase.
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Otherwise, certain upgrade tests that install pacific or earlier
releases fail, since the mount helper does not understand this mount
option and passes it on to the kernel, which does not handle this
config, causing the mount to fail in tests.
Note that this mount config is only used during teuthology tests
to catch v2-style syntax implementation bugs in the kernel.
Fixes: http://tracker.ceph.com/issues/53487
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Validate that the symlink xattr is stored on the first data
object of the symlink, along with the backtrace, if the
'mds_symlink_recovery' option is enabled, and vice-versa.
Also add a 'string_wrapper' class to decode a bufferlist
to a string. This lets the 'ceph-dencoder' tool decode the
stored symlink target, which the tests use for validation.
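A Python sketch of how the tests can use this (the pool, object name, and xattr key below are hypothetical):
```
import subprocess

# Fetch the symlink xattr from the symlink's first data object ...
subprocess.run(
    'rados -p cephfs_data getxattr 1000000000.00000000 symlink > symlink.bin',
    shell=True, check=True)
# ... and decode it with ceph-dencoder's string_wrapper to get the target.
out = subprocess.run(
    ['ceph-dencoder', 'type', 'string_wrapper',
     'import', 'symlink.bin', 'decode', 'dump_json'],
    capture_output=True, text=True, check=True)
print(out.stdout)  # should contain the symlink target
```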
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Fixes: https://tracker.ceph.com/issues/46166
Validate that the 'cephfs-data-scan' tool correctly recovers
symlinks as symlinks during disaster recovery of the metadata
pool from the data pool.
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Fixes: https://tracker.ceph.com/issues/46166
1m isn't quite enough for teuthology, mainly because ceph.py
creates the monmap, then does --mkfs on all mons and osds (to create
the initial keyring), and *then* starts the mons.
2m looks like it'll be enough for most cases.
sage-2021-12-02_14:45:50-rados-wip-sage2-testing-2021-12-01-2041-distro-basic-smithi/6540015
Signed-off-by: Sage Weil <sage@newdream.net>
TestFragmentation.test_deep_split relies on `num_strays`
reaching zero, expecting that the purge threads would have
deleted the directory entries. However, checking `num_strays`
cannot be relied on, since PurgeQueue merely journals the
purge item (see PurgeQueue::push), after which the
StrayManager marks the stray as removed, thereby updating
`num_strays`.
So, add an additional condition to check if the purge
threads have finished processing items.
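A hedged Python sketch of the extra wait (the helper and counter names are assumptions, not the literal change):
```
def wait_for_purge_complete(test, timeout=120):
    """Wait until the MDS purge queue reports no items still executing."""
    def purge_idle():
        pq = test.fs.mds_asok(['perf', 'dump', 'purge_queue'])['purge_queue']
        return pq.get('pq_executing', 0) == 0
    test.wait_until_true(purge_idle, timeout=timeout)
```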
Fixes: http://tracker.ceph.com/issues/52487
Signed-off-by: Venky Shankar <vshankar@redhat.com>
But do not throw away the old-style mount syntax: we want to continue
testing it, since users (and scripts) might still be using it.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
We cannot schedule a daemon start if there is another daemon action
with a higher priority (including stop) scheduled. However,
that state isn't cleared until *after* the osd goes down, the
systemctl command returns, and mgr/cephadm gets around to updating
the inventory scheduled_daemon_action state.
Semi-fix: (1) wait for the orch status to change, and then (2)
wait a few more seconds after that.
Signed-off-by: Sage Weil <sage@newdream.net>
This doesn't affect bootstrap, but it does mean we avoid any delay
the first time we cephadm.shell on some non-bootstrap host.
Signed-off-by: Sage Weil <sage@newdream.net>
If we use a new remote for each shell command, we end up waiting
for the image to pull on every host in sequence.
Signed-off-by: Sage Weil <sage@newdream.net>
Test if the number of snaps on the file-system and the stats on created
snaps in the DB match.
NOTE:
Since it is difficult to capture the exact second at which a snapshot
is created, the timestamp comparison is limited to minute granularity.
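For example (illustrative Python only; the timestamp format is an assumption), both timestamps can be truncated to the minute before comparing:
```
from datetime import datetime

def same_to_the_minute(ts_a, ts_b, fmt='%Y-%m-%d %H:%M:%S'):
    """Compare two timestamp strings only down to minute granularity."""
    trunc = lambda ts: datetime.strptime(ts, fmt).replace(second=0, microsecond=0)
    return trunc(ts_a) == trunc(ts_b)

assert same_to_the_minute('2021-11-10 12:34:56', '2021-11-10 12:34:03')
```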
Signed-off-by: Milind Changire <mchangir@redhat.com>
Restore the ability to run the radosgw_admin.py unit standalone,
improved to use vstart_runner hooks.
The local rgwadmin(...) wrapper was suggested as a cleanup in review by Casey.
Fixes: https://tracker.ceph.com/issues/52837
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Recently, Luis posted a patch to turn the metrics debugfs file into a
directory with separate files for the different sections in the old
metrics file.
Account for this change in get_op_read_count().
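Roughly what get_op_read_count() has to cope with now (a Python sketch; the file layout and names below are assumptions):
```
import os

def read_metrics(debugfs_dir):
    """Return the raw metrics text whether 'metrics' is the old single file
    or the new directory of per-section files."""
    path = os.path.join(debugfs_dir, 'metrics')
    if os.path.isdir(path):
        parts = []
        for name in sorted(os.listdir(path)):
            with open(os.path.join(path, name)) as f:
                parts.append(f.read())
        return ''.join(parts)
    with open(path) as f:
        return f.read()
```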
Fixes: https://tracker.ceph.com/issues/53214
Signed-off-by: Jeff Layton <jlayton@redhat.com>
The cephfs mirror daemon thrasher needs to send SIGTERM to mirror
daemons. The mirror daemon needs to run in the foreground for
it to receive the signal via `daemon.signal`.
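A minimal Python sketch of what the thrasher does (the daemon handle follows the usual teuthology convention; treat it as an assumption):
```
import signal

def thrash_mirror_daemon(daemon):
    """Deliver SIGTERM to a cephfs-mirror daemon via its teuthology handle.
    The signal only reaches the process if it runs in the foreground."""
    daemon.signal(signal.SIGTERM)
```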
Signed-off-by: Venky Shankar <vshankar@redhat.com>
* refs/pull/43666/head:
qa/vstart_runner: add "managers" to LocalContext instances
Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Sometimes OpenFileTable::commit() only happens just after the 30
seconds of waiting.
Fixes: https://tracker.ceph.com/issues/52887
Signed-off-by: Xiubo Li <xiubli@redhat.com>
* refs/pull/43590/head:
qa: test that new mounts of same fs function after old mount is evicted
qa: remove REQUIRE_KCLIENT_REMOTE
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
This commit implements `orch apply rbd-mirror` in the Rook orchestrator:
it creates a CR with a default name if the service_id isn't specified in
the spec, otherwise it sets the name of the CR to the spec's service_id.
This commit also adds `orch apply rbd-mirror` to the Rook QA and
implements `orch rm rbd-mirror`.
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
Put the vault token file in a location that ceph can read.
Make it readable only by ceph.
On rhel8 (and indeed, any vanilla rhel machine), $HOME is liable to be
mode 700. This means the ceph user can't read things in that user's
directory. This causes radosgw to emit the confusing message "ERROR:
Vault token file ... not found" even though the teuthology log will
plainly show it was created and made readable by ceph.
Fixes: http://tracker.ceph.com/issues/51539
Signed-off-by: Marcus Watts <mwatts@redhat.com>
Without this, plenty of tests become incompatible with vstart_runner.py.
Ideally, vstart_runner.py should have been updated in commit 7812cfb674.
Fixes: https://tracker.ceph.com/issues/53043
Signed-off-by: Rishabh Dave <ridave@redhat.com>
`_run_tests()` accepts a subdir argument (to run a workunit with
the passed-in sub-directory as cwd). One invocation was missing
the subdir argument, causing the `subdir` tag in yaml to be ineffective.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
* refs/pull/38752/head:
qa: enable dynamic debug support to kclient
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
mgr/dashboard: move NFS_GANESHA_SUPPORTED_FSALS to mgr_module.py
Importing from the nfs module throws an AttributeError because, as a side effect, the dashboard module impersonates the nfs module.
https://gist.github.com/varshar16/61ac26426bbe5f5f562ebb14bcd0f548
mgr/dashboard: 'Create NFS export' form: list clusters from nfs module
mgr/dashboard: frontend+backend cleanups for NFS export
Removed all code and references related to daemons. UI cleanup and adopted unit testing for
the nfs-export create form for the CEPHFS backend. Cleanup for the export list/get/create/set/delete endpoints.
mgr/dashboard: rm set-ganesha ref + update docs
Remove existing set-ganesha-clusters-rados-pool-namespace references as
they are no longer required. Moreover, the nfs documentation in the
dashboard docs is updated to reflect the current nfs status.
mgr/dashboard: add nfs-export e2e test coverage
mgr/dashboard: 'Create NFS export' form: remove RGW user id field.
- Improve bucket typeahead behavior.
- Increase version for bucket list endpoint.
- Some refactoring.
mgr/dashboard: 'Create NFS export' form: allow RGW backend only when default realm is selected.
When RGW multisite is configured, the NFS module can only handle buckets in the default realm.
mgr/dashboard: 'Create service' form: fix NFS service creation.
After https://github.com/ceph/ceph/pull/42073, NFS pool and namespace are not customizable.
mgr/dashboard: 'Create NFS export' form: add bucket validation.
- Allow only existing buckets.
- Refactoring:
- Moved bucket validator from bucket form to cd-validators.ts
- Split the bucket validator into two: a bucket name validator and a bucket existence validator (that checks either existence or non-existence).
mgr/dashboard: 'Create NFS export' form: path validation refactor: allow only existing paths.
Fixes: https://tracker.ceph.com/issues/46493
Fixes: https://tracker.ceph.com/issues/51479
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
* we use setup_iscsi_client.py to deploy iscsi client services,
configuring the initiator and multipath; this is done by the qa task
ceph_iscsi_client
* qa/cephadm: adds the remotes' ip addresses to the iscsi gateway
* rename poolname: iscsi >> datapool, which we usually use for tests and
which expresses the type of pool more clearly.
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
The osd backfill reservation does not take compression into account so
we need to operate with "uncompressed" bytes when calculating nearfull
ratio.
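Illustrative Python arithmetic only (the stat names mirror `ceph df detail` but are assumptions here): add back the space that compression saved before computing the ratio:
```
def uncompressed_full_ratio(bytes_used, bytes_total,
                            compress_bytes_used, compress_under_bytes):
    """Fullness ratio based on uncompressed bytes, since the osd backfill
    reservation ignores compression."""
    saved = compress_under_bytes - compress_bytes_used
    return (bytes_used + saved) / bytes_total
```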
Signed-off-by: Mykola Golub <mgolub@suse.com>