Using an NVMe loop device makes the LVs look like "real" disks,
which means we can exercise all of the normal code paths for
provisioning, deprovisioning, and zapping.
Signed-off-by: Sage Weil <sage@newdream.net>
* refs/pull/43163/head:
qa: fsync dir for asynchronous creat on stray tests
qa: refactor and generalize create_n_files
qa: only set frag confs for workloads
mds: improve debugging for fragment size check
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
modified: qa/standalone/erasure-code/test-erasure-code-plugins.sh
new file: qa/suites/rados/thrash-erasure-code-isa/arch/aarch64.yaml
Signed-off-by: Dai Zhiwei <daizhiwei3@huawei.com>
Use the enhanced create_n_files to dedup code. Also split the large test
into three.
Fixes: https://tracker.ceph.com/issues/52606
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
A few things (sketched below):
- Allow calling fsync on directory (to support async create kernel).
- Allow immediately unlinking the created file (for stray testing).
- Close any file descriptors created.
- Write unique content (the i variable) to each file.
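A minimal sketch of what the generalized helper can look like (the real
create_n_files lives in the qa suite and differs in detail; the names and
parameters here are illustrative):
```
import os

def create_n_files(dirpath, count, fsync_dir=False, unlink=False):
    for i in range(count):
        fname = os.path.join(dirpath, "file_{}".format(i))
        fd = os.open(fname, os.O_CREAT | os.O_WRONLY)
        try:
            os.write(fd, str(i).encode())  # unique content: the i variable
        finally:
            os.close(fd)                   # close any file descriptors created
        if unlink:
            os.unlink(fname)               # immediately unlink (stray testing)
    if fsync_dir:
        dirfd = os.open(dirpath, os.O_DIRECTORY)
        try:
            os.fsync(dirfd)                # flush async creates (kernel client)
        finally:
            os.close(dirfd)
```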
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
After 3863eb89512f1698b8e56f1f1ffc78a6ca8d5826 (rgw: permit logging of
list-bucket and any other no-bucket op), the radosgw ops-log contains
entries for ops with no associated bucket, e.g., list_buckets. When
examining such a log object in the radosgw_admin task, don't assert
that it has a bucket name.
Fixes: https://tracker.ceph.com/issues/52647
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
The test does not need to check that the new MDS becomes active, only
that a replacement occurs.
Fixes: https://tracker.ceph.com/issues/52677
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Stop importing CommandFailedError from teuthology.orchestra.run; it is
actually defined in teuthology.exceptions.
Fixes: https://tracker.ceph.com/issues/51226
Signed-off-by: Rishabh Dave <ridave@redhat.com>
This also checks max_mds>1 and allow_standby_replay are restored to
previous values.
Future work can add tests for multiple file systems (or volumes).
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This allows hooks for `cephadm shell` to function so that this code
works with cephadm deployments.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This needs to be available for the cephfs_setup task so administration
mounts can run ceph commands, potentially through `cephadm shell`.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Add a new column, SCRUB_DURATION, to the PG stats that stores the time taken for a PG scrub.
Fixes: https://tracker.ceph.com/issues/52605
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
* refs/pull/43287/head:
mgr/rook, qa/tasks/rook: change rgw daemon service name
mgr/rook: fix placement_spec_to_node_selector
mgr/rook: orch rm no longer uses rook api delete
qa/tasks/rook: fix cluster deletion hanging due to CephObjectStore CR
mgr/rook: use default replication size in orch apply rgw
mgr/rook: add placement specs to apply rgw
Reviewed-by: Sage Weil <sage@redhat.com>
qa/mgr/dashboard: add extra wait to test
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
mgr/dashboard: Move force maintenance test to the workflow test suite
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
This commit changes the rgw daemon service name format from
rgw.<realm name>.<zone name> to rgw.<resource_name> and changes the daemon
removal in the QA accordingly. This also gets rid of the Rook API when
describing services.
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
This commit fixes the issue where the cluster deletion hangs in the QA
while a CephObjectStore CR is still up by removing all rgw/nfs/mds/rbd-mirror
daemons before tearing down the rest of the cluster.
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
This commit changes `orch apply rgw` to use the osd_pool_default_size
when setting the replication size for the data pool and metadata pool
of the rgw daemon. This commit also adds `orch apply rgw` to the Rook
QA.
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
mgr/dashboard: make modified API endpoints backward compatible
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Fixes: https://tracker.ceph.com/issues/52480
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Introducing an APIVersion class to handle versioning for API endpoints and making
them backward compatible.
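As a hedged illustration of the idea (not necessarily the dashboard's
actual class), an APIVersion can be a simple major/minor pair whose
compatibility check lets older clients keep working against newer
endpoints:
```
from typing import NamedTuple

class APIVersion(NamedTuple):
    major: int
    minor: int

    def supports(self, client: "APIVersion") -> bool:
        # Backward compatible: same major version, and the server offers
        # at least the minor features the client was built against.
        return self.major == client.major and self.minor >= client.minor

# e.g. a v1.1 endpoint still serves a v1.0 client:
assert APIVersion(1, 1).supports(APIVersion(1, 0))
```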
* refs/pull/42719/head:
mgr/volumes: Fix permission during subvol creation with mode
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/42584/head:
doc: fix `daemon status` interface (exclude file system name)
test: adjust mirroring tests for `daemon status` change
mgr/mirroring: `daemon status` command does not require file system name
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Add a 'kmount_count' counter in ctx to make sure the dynamic debug
log won't be disabled until the last kernel mounter is unmounted.
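A sketch of the refcounting this describes, with the dynamic-debug
toggles stubbed out (the surrounding task code is illustrative):
```
def _set_dynamic_debug(enabled):
    pass  # placeholder: toggle the kernel dynamic debug log

def on_kernel_mount(ctx):
    # the first kernel mounter enables the log
    ctx.kmount_count = getattr(ctx, 'kmount_count', 0) + 1
    if ctx.kmount_count == 1:
        _set_dynamic_debug(True)

def on_kernel_unmount(ctx):
    # only the last kernel mounter disables it
    ctx.kmount_count -= 1
    if ctx.kmount_count == 0:
        _set_dynamic_debug(False)
```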
Fixes: https://tracker.ceph.com/issues/48736
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Fixes a lexical error in one line of code added on 2021-08-16 in
90e9307ab0, the change that removed the dependency on lsb_release.
Fixes: https://tracker.ceph.com/issues/52613
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Currently, to recover a file system after recovering the monitor store,
you need to stop all the MDSs; create an FSMap with defaults using the
`fs new` command; execute the `fs reset` command to get the file
system's rank 0 into the existing-but-failed state; and then restart the
MDSs.
Add a 'recover' flag to the `fs new` command that sets the file system's
rank 0 to the existing-but-failed state and sets the file system's
'joinable' setting to False. Using the `fs new` command with the
'recover' flag gets rid of the steps to stop all the MDSs and execute
the `fs reset` command when recovering the file system after recovering
the monitor store.
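A minimal sketch of the simplified flow, assuming a file system named
`cephfs` with typical pool names (the helper is illustrative; only the
'recover' flag itself comes from this change):
```
import subprocess

def ceph(*args):
    # illustrative helper: run a ceph CLI command
    return subprocess.check_output(("ceph",) + args, text=True)

# Instead of stopping all MDSs, running `fs new`, then `fs reset`:
ceph("fs", "new", "cephfs", "cephfs_metadata", "cephfs_data",
     "--force", "--recover")        # rank 0 left existing-but-failed,
                                    # 'joinable' set to False
# ...recover the metadata, then allow MDSs to join again:
ceph("fs", "set", "cephfs", "joinable", "true")
```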
Fixes: https://tracker.ceph.com/issues/51716
Signed-off-by: Ramana Raja <rraja@redhat.com>
mon/MonCap: Update osd profile to allow cmd to set iops capacity on mon db
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Creating a subvolume with a specific mode led to the parent directories
('/volumes/_no_group') being created with the same mode if they did not
already exist. Fixed the same.
Similarly, creating a subvolumegroup with a specific mode led to the
parent directory ('/volumes') being created with the same mode if it did
not already exist. Fixed the same.
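A hedged sketch of the fix's essence (the real code is in mgr/volumes
and also handles umask and ownership; this helper is illustrative):
intermediate directories get the default mode, and the caller's mode is
applied only to the leaf directory:
```
import os

def create_subvolume_dir(path, mode):
    # parents ('/volumes', '/volumes/_no_group') get the default mode...
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # ...and only the subvolume itself gets the requested mode
    os.mkdir(path, mode)
```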
Fixes: https://tracker.ceph.com/issues/51870
Signed-off-by: Kotresh HR <khiremat@redhat.com>
IMO the number of symlinks we have to manually maintain is tedious and
error prone. Any ideas on improving things?
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
Force a subset of tests that explicitly employ the filestore backend to
use the WPQ scheduler. This is because the mclock scheduler will not be
optimized for filestore.
Fixes: https://tracker.ceph.com/issues/52025
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
This commit adds OSD creation to the Rook QA tasks. The Rook task will
explicitly wait for the mgr to start and the CLI to work (instead of
implicitly doing so while waiting for 'ceph osd dump' to work).
Then it will do `ceph orch apply osd --all-available-devices` to create
OSDs on the rest of the PVs.
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
Add a standalone test - test_activate_osd_skip_benchmark() in ceph-helpers.sh
that exercises the osd-mclock-skip-benchmark option.
Fixes: https://tracker.ceph.com/issues/52025
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Assign the default caps for osds to be the same as what the AuthMonitor
sets for a new osd. See AuthMonitor::validate_osd_new() which sets the
following caps for a new osd:
mon='allow profile osd'
mgr='allow profile osd'
osd='allow *'
When an actual real-world cluster is deployed, the above caps are
applied. Unless the user modifies the defaults, a cluster will operate
with the above caps. Therefore, it makes sense to use the defaults when
testing Ceph so that any issues due to the default settings may be
caught and fixed.
Accordingly, the caps for the 'osd' type are reset to the defaults in
generate_caps(). The caps for 'mgr' already reflect the system defaults.
The caps for the 'mds' type are not changed in this commit and will be
investigated and changed later if necessary.
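For illustration, the defaults described above map to something like the
following table in generate_caps() (a sketch; the real table lives in
the qa task code and may differ in shape):
```
# caps applied to a new 'osd' entity, mirroring AuthMonitor::validate_osd_new()
DEFAULT_OSD_CAPS = {
    'mon': 'allow profile osd',
    'mgr': 'allow profile osd',
    'osd': 'allow *',
}
```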
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
* refs/pull/42687/head:
qa: test the "ms_mode" options in kclient workflows
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/38481/head:
qa/vstart_runner: inherit methods instead of duplicating them
qa/ceph_manager: make it possible to reuse few methods
qa/vstart_runner: don't use "shell=False" in run_ceph_w()
qa/ceph_manager: minor refactor
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/42371/head:
mgr/volumes: Fix a race during clone cancel
mgr/volumes: Fail subvolume removal if it's in progress
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
mgr/dashboard: stats=false not working when listing buckets
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
Not really fixing anything, but this moves the failures out of the
normal upgrade suite.
Fixes: https://tracker.ceph.com/issues/49955
Signed-off-by: Casey Bodley <cbodley@redhat.com>
The lsb_release utility brings in a lot of other dependencies. Remove
it from the RGW workunit Perl scripts.
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
In the LoadRequest in the ImageMap class, add initial cleanup to remove
stale entries. To clean up, the LoadRequest will query the mirror image
list and remove all image_map entries that are not in the list.
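The cleanup amounts to a set difference; a hedged Python rendering of
the C++ logic (names are illustrative, the real code lives in
rbd-mirror):
```
def remove_stale_image_map_entries(image_map, mirror_image_ids):
    # drop every image_map entry whose image is no longer present
    # in the mirror image list
    live = set(mirror_image_ids)
    stale = [image_id for image_id in image_map if image_id not in live]
    for image_id in stale:
        del image_map[image_id]
    return stale
```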
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
This makes sure that all images are deleted in the existing qa scripts
and checks that all rbd-mirror metadata in OMAP are correctly deleted.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
This commit adds the device ls command to the rook qa task
since that command should be working from now on.
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
Note that I didn't bother adding the prefer-* options, as I figure it's
better to be definite.
Fixes: https://tracker.ceph.com/issues/52068
Signed-off-by: Jeff Layton <jlayton@redhat.com>
* refs/pull/42691/head:
mgr/nfs: add --port to 'nfs cluster create' and port to 'nfs cluster info'
qa/suites/orch/cephadm/smoke-roleless: test taking ganeshas offline
qa/tasks/vip: exec with bash -ex
qa/suites/orch/cephadm: separate test_nfs from test_orch_cli
Reviewed-by: Varsha Rao <varao@redhat.com>
- Rename the dashboard command to better reflect its behavior.
- Rename the '_radosgw_admin' method to 'send_rgwadmin_command' for consistency with
'send_mon_command' and move it to mgr_module.py.
- Cleanup: remove unneeded rgw settings.
- Better error handling and test coverage.
Fixes: https://tracker.ceph.com/issues/44605
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
This is no longer required because we removed cosbench workloads in
fd350fd015. Removing it also prevents failures like the following, or
any other changes that break the rgw task:
```
2021-08-06T20:13:25.812 INFO:teuthology.orchestra.run.smithi060.stderr:curl: (7) Failed to connect to smithi060.front.sepia.ceph.com port 80: Connection refused
2021-08-06T20:15:33.813 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_04c2febe7099917d97a71271f17abb5710030132/teuthology/contextutil.py", line 31, in nested
vars.append(enter())
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/teuthworker/src/github.com_ceph_ceph-c_3c0f8c8164075af7aac4d1f2805d3f4580709461/qa/tasks/rgw.py", line 191, in start_rgw
wait_for_radosgw(url, remote)
File "/home/teuthworker/src/github.com_ceph_ceph-c_3c0f8c8164075af7aac4d1f2805d3f4580709461/qa/tasks/util/rgw.py", line 94, in wait_for_radosgw
assert exit_status == 0
AssertionError
```
Signed-off-by: Neha Ojha <nojha@redhat.com>
```
Downloading 461087a514/cryptography-3.4.7.tar.gz (546kB)
Complete output from command python setup.py egg_info:
=============================DEBUG ASSISTANCE==========================
If you are seeing an error here please try the following to
successfully install cryptography:
Upgrade to the latest pip and try again. This will fix errors for most
users. See: https://pip.pypa.io/en/stable/installing/#upgrading-pip
=============================DEBUG ASSISTANCE==========================
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-7fhnk5us/cryptography/setup.py", line 14, in <module>
from setuptools_rust import RustExtension
ModuleNotFoundError: No module named 'setuptools_rust'
```
Fixes: https://tracker.ceph.com/issues/52070
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Inherit methods run_ceph_w(), run_cluster_cmd(), raw_cluster_cmd() and
raw_cluster_cmd_result() from ceph_manager.CephManager in
vstart_runner.LocalCephManager instead of duplicating them.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Make minor adjustments to ceph_manager.CephManager so that the methods
run_ceph_w(), run_cluster_cmd(), raw_cluster_cmd() and
raw_cluster_cmd_result() can be reused, instead of duplicated, in
subclasses. The adjustments are (see the sketch below) -
* Having variables contain arguments that'll be prepended to every
command received by the methods above.
* Grouping variables that need to be overridden together so that it is
easy for users to spot and override them.
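A sketch of the pattern under the stated assumptions (class and
attribute names are illustrative, not the actual qa code):
```
import subprocess

class CephManager:
    # Grouped here so subclasses can spot and override them in one place:
    # arguments prepended to every cluster command.
    ceph_cmd_prefix = ['ceph']

    def raw_cluster_cmd(self, *args):
        # one shared implementation, reused by subclasses via inheritance
        return subprocess.check_output(
            self.ceph_cmd_prefix + list(args), text=True)

class LocalCephManager(CephManager):
    # vstart clusters only need different prefixes, not duplicated methods
    ceph_cmd_prefix = ['./bin/ceph']
```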
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Instead prepend "exec sudo" to the command arguments of
LocalCephManager.run_ceph_w(). This makes the default parameter
"shell=False" redundant in case of
ceph_manager.CephManager.run_ceph_w(), so get rid of it too and update
calls to run_ceph_w() accordingly.
The reason behind using either of these workarounds is that running "ceph
-w" with "shell" set to True leads to a crash in the Ceph API CI job. See
this ticket for more details: https://tracker.ceph.com/issues/49644.
The reason behind switching the workaround is that in the following
commits to reduce duplication LocalCephManager.run_ceph_w() will be
deleted and CephManager.run_ceph_w() will be used by LocalCephManager
via inheritance. However, due to the issue described above, Ceph API
test will fail since "shell" is set to "True" for the command issued by
CephManager.run_ceph_w(). Prepending "exec sudo" to the command when it
is used in LocalCephManager makes this duplication unnecessary and also
prevents Ceph API test from failing.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Save the return value of method "teuthology.get_testdir()" instead of
calling it repeatedly in the same class.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
so we don't need to use the virtualenv Python package for creating a
virtualenv; the "venv" module in Python 3 suffices.
see also https://docs.python.org/3/library/venv.html
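For example, the standard-library equivalent of the virtualenv package:
```
# CLI form: python3 -m venv ./venv
import venv
venv.create('./venv', with_pip=True)  # create a virtualenv with pip available
```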
Signed-off-by: Kefu Chai <kchai@redhat.com>
Modified test cases:
1. ver-health.sh:
a. TEST_check_version_health_1():
To avoid intermittent timeouts observed in wait_for_health_string(),
increase the wait time to 20 secs.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The following tests in the test files mentioned below use the
"osd_scrub_sleep" option to introduce delays during scrubbing to help
determine scrubbing states, validate reservations during scrubbing, etc.
This works when using the "wpq" scheduler.
But when the "mclock_scheduler" is enabled, the "osd_scrub_sleep" is
disabled and overridden to 0. This is done to delegate the scheduling of
the background scrubs to the "mclock_scheduler" based on the set QoS
parameters. Due to this, the checks to verify the scrub states,
reservations etc. fail since the window to check them is very short
due to scrubs completing very quickly. This affects a small subset of
scrub tests mentioned below,
1. osd-scrub-dump.sh -> TEST_recover_unexpected()
2. osd-scrub-repair.sh -> TEST_auto_repair_bluestore_tag()
3. osd-scrub-test.sh -> TEST_scrub_abort(), TEST_deep_scrub_abort()
Only for the above tests, until there's a reliable way to query scrub
states with "--osd-scrub-sleep" set to 0, the "osd_op_queue" config
option is set to "wpq".
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Modified test cases:
1. test-erasure-eio.sh:
a. TEST_ec_backfill_unfound():
- Set osd_mclock_profile to high_recovery_ops profile.
- Increase the wait for backfill_unfound timeout to 240 secs.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Modified test cases:
1. osd-backfill-prio.sh:
Set osd_op_queue = wpq for all tests since mclock doesn't
consider recovery priority as part of its scheduling algorithm.
2. osd-backfill-space.sh:
Set osd_mclock_profile to high_recovery_ops and increase the wait
for backfills timeout to 1200 secs for the following tests:
- TEST_backfill_test_simple()
- TEST_backfill_test_multi()
- TEST_backfill_test_sametarget()
- TEST_backfill_multi_partial()
- TEST_ec_backfill_simple()
- TEST_ec_backfill_multi()
- SKIP_TEST_ec_backfill_multi_partial()
3. osd-backfill-stats.sh:
- TEST_backfill_ec_down_all_out():
Set osd_mclock_profile to high_recovery_ops and increase the wait
for recovery timeout to 240 secs.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Modified test cases:
1. osd-recovery-prio.sh:
Set osd_op_queue = wpq for all tests since mclock
doesn't consider recovery priority as part of its
scheduling algorithm.
2. osd-recovery-stats.sh:
a. TEST_recovery_undersized():
- Set osd_mclock_profile to high_recovery_ops profile.
- Increase wait for recovery timeout to 300 secs.
3. osd-rep-recov-eio.sh:
a. TEST_rep_backfill_unfound():
- Set osd_mclock_profile to high_recovery_ops profile.
- Increase wait for backfill_unfound to 360 secs.
4. repeer-on-acting-back.sh:
a. TEST_repeer_on_down_act():
- Set osd_mclock_profile to high_recovery_ops profile.
(To improve the test duration)
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
List of changes:
1. Remove the enforcement to use osd_op_queue=wpq when an osd is brought
up in the following functions:
- run_osd()
- run_osd_filestore() and
- activate_osd()
2. New functions:
- get_op_scheduler() - Get the current osd_op_queue for an osd (see the
sketch after this list).
3. Modified test cases:
- test_run_osd() - Add check for the osd_max_backfills count.
The mclock scheduler overrides the count to 1000.
4. New test cases:
- test_activate_osd_after_mark_down()
- test_get_op_scheduler()
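As a hedged illustration of what get_op_scheduler() queries (the real
helper is a bash function in ceph-helpers.sh; this Python rendering is
illustrative):
```
import json
import subprocess

def get_op_scheduler(osd_id):
    # ask the osd over its admin socket which op queue it is running
    out = subprocess.check_output(
        ['ceph', 'daemon', 'osd.{}'.format(osd_id),
         'config', 'get', 'osd_op_queue'], text=True)
    return json.loads(out)['osd_op_queue']
```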
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Removing an in-progress subvolume clone with force doesn't
remove the clone index (tracker). This results in the cloner
thread getting stuck in a loop trying to clone the deleted one.
This patch addresses the issue by not allowing the subvolume clone
to be removed if it's not complete/cancelled/failed, even with the force
option. It returns the error EAGAIN, asking the user to cancel the
pending clone and retry.
Fixes: https://tracker.ceph.com/issues/51707
Signed-off-by: Kotresh HR <khiremat@redhat.com>