We currently run "iogen -n 5 -s 2g" for about 10 minutes. This workload
does not always generate the export/import of subtrees that iogen.yaml
checks for. The iogen workload is suited for running heavily fragmented
I/O on a file system, not for growing directory trees.
Fixes: https://tracker.ceph.com/issues/54108
Signed-off-by: Ramana Raja <rraja@redhat.com>
scrub/osd: add clearer reminders that a scrub is blocked
Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: Matan Breizman <mbreizma@redhat.com>
As some Teuthology tests seem to block objects for many minutes,
we must not issue the "scrub is blocked for too long" warning
(that warning causes the tests to fail).
A new configuration parameter now controls the grace period before
the warning is issued. Some tests were modified to set this
configuration parameter to a large value.
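As a sketch, a test can raise the grace period like so (the option
name used here, osd_blocked_scrub_grace_period, is an assumption
about how this change exposes the new parameter):

    # illustration only; a large value effectively mutes the warning
    ceph config set osd osd_blocked_scrub_grace_period 3600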
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
commit 4fbf4c4f58 increases the
number of tags used in snaptest-git-ceph.sh tests. This makes
the tests run longer than the default 3h timeout, so they time out.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
The RWL mode needs DAX and is dog slow otherwise -- the
qemu_xfstests.yaml job always hits the 6 hour max_job_time limit.
As our tmpfs instance is limited and qemu_xfstests.yaml opens three
images at the same time, reduce the "big cache" size to 5G. This facet
was added to iron out 32-bit head/tail pointer issues and 5G still does
the job there.
Going through the loop device is needed because tmpfs doesn't support
O_DIRECT.
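Roughly, the setup looks like this (paths and sizes illustrative):

    # tmpfs itself rejects O_DIRECT, but a loop device layered on
    # top of it accepts it
    truncate -s 5G /mnt/tmpfs/cache.img
    losetup -f --show /mnt/tmpfs/cache.img   # prints e.g. /dev/loop0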
Fixes: https://tracker.ceph.com/issues/55400
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
qa/tasks/rgw_multisite.py uses 'zonegroup set' to create zonegroups from
their json format. this doesn't enable any of the supported zonegroup
features by default, so this adds the 'enabled_features' field to the
json representations
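a rough sketch of the flow (the feature name is illustrative):

    radosgw-admin zonegroup get > zonegroup.json
    # edit zonegroup.json to add e.g. "enabled_features": ["resharding"]
    radosgw-admin zonegroup set < zonegroup.json
    radosgw-admin period update --commit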
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Added a test where we have 5 MONs on
5 different hosts and try to reduce
the number of MONs from 5 to 3 using
the command ``ceph orch apply mon 3``,
then increase the number of MONs from
3 to 5 using the command
``ceph orch apply mon 5``.
The test evaluates the correctness of
the commands and inspects for crashes;
see the sketch below.
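In outline, the test drives the orchestrator like this (the
verification steps are a sketch):

    ceph orch apply mon 3
    ceph mon stat    # expect 3 mons once the orchestrator converges
    ceph crash ls    # expect no new crashes
    ceph orch apply mon 5
    ceph mon stat    # expect 5 mons again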
This test was motivated by the bug:
https://tracker.ceph.com/issues/50089
Signed-off-by: Kamoltat <ksirivad@redhat.com>
Changes a few NULL to nullptr.
Uses std::filesystem for path building so paths are platform-independent.
Fixes a bug where DBStoreManager's second constructor did not create the DB.
Adds unit tests for the DB path and prefix.
Fixes: https://tracker.ceph.com/issues/55731
Signed-off-by: 0xavi0 <xavi.garcia@suse.com>
whitelist_health.yaml -> ignorelist_health.yaml
whitelist_wrongly_marked_down.yaml -> ignore_wrongly_marked_down.yaml
This was mostly addressed in
2ee9365d0b,
but the rename wasn't done there.
Signed-off-by: Zack Cerza <zack@cerza.org>
All `rados/thrash-erasure-code-big` tests that die due to the “wait_for_recovery” timeout have one thing in common: they contain either `thrashers/pggrow` or `thrashers/mapgap`.
What sets pggrow and mapgap apart from the non-offending thrashers (default, careful, fastread, and morepggrow) is a missing override for `osd max backfills`, the maximum number of backfill operations allowed to or from a single OSD. The higher the number, the quicker the recovery. By default, this value is 1; each non-offending thrasher overrides it in its .yaml file with a value > 1, but pggrow and mapgap do not.
The mclock op scheduler is known to override `osd max backfills` with a high value, but all of the thrash-erasure-code-big thrashers have their op queue set to “debug_random”, which chooses randomly between op queues (the debug_random op queue is set to override the default mclock_scheduler in qa/config/rados.yaml). So, coupled with the “debug_random” op queue, the low `osd max backfills` setting causes some tests to time out in recovery.
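The yaml override added here is equivalent to bumping the option at
runtime, e.g. (value illustrative):

    ceph config set osd osd_max_backfills 6
    ceph config get osd.0 osd_max_backfills   # verify it took effect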
WITHOUT `osd max backfills`, as they are now, “mapgap” and “pggrow” tests die due to timed-out recovery about 17/100 times, as seen here with a pggrow test: http://pulpito.front.sepia.ceph.com/lflores-2022-05-18_14:24:29-rados:thrash-erasure-code-big-master-distro-default-smithi/
WITH `osd max backfills` specified, as I have suggested in this PR, 99/100 tests passed, with one test failing for a different reason:
http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_22:40:27-rados:thrash-erasure-code-big-master-distro-default-smithi/
I also scheduled 145 tests WITH `osd max backfills` that are a mix of pggrow and mapgap thrashers. 144/145 tests passed, with one test failing for a different reason. http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_15:27:54-rados:thrash-erasure-code-big-master-distro-default-smithi/
Fixes: https://tracker.ceph.com/issues/51076
Signed-off-by: Laura Flores <lflores@redhat.com>
rgw/qa: enable s3-tests related to cloud-transition feature
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Maredia, Ali <amaredia@redhat.com>
Run cloudtier tests with the parameter 'retain_head_object'
set to true and false.
However, having multiple cloudtier storage classes in the same task
increases the transition time and results in spurious failures.
Hence, until there is a consistent way of running the tests without
having to depend on lc_debug_interval, one of the configs has been
disabled for now.
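For context, these runs shorten lifecycle transition times via the
lc debug knob, roughly like this (value illustrative; the option
name is an assumption based on rgw's lc debug setting):

    ceph config set client.rgw rgw_lc_debug_interval 10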
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
DACs are overridable for directories. For files,
read/write DACs are always overridable, but executable
DACs are overridable only when at least one exec bit
is set.
File and directory DAC overrides were handled the
same way for root, which is incorrect. This patch fixes
DAC overriding for root as described above.
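A concrete illustration of the intended rule, run as root on a
CephFS mount (file name illustrative):

    touch testfile; chmod 0644 testfile
    ./testfile    # denied: no exec bit is set anywhere
    chmod 0744 testfile
    ./testfile    # exec DAC override now applies: an exec bit is set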
Fixes: https://tracker.ceph.com/issues/55313
Signed-off-by: Kotresh HR <khiremat@redhat.com>
don't rely on the ceph manager task to parse a config file. each rgw
could be using a different config. instead, revert to an s3tests
override called 'with-sse-s3'
this way, the only job that enables sse-s3, vault_transit.yaml, contains
both the 'rgw crypt sse s3' configurables, and the flag to enable the
associated test cases
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Due to the lack of Windows support in Teuthology, the test case adopts
the following workaround:
* Deploy baremetal machine with `ubuntu_latest.yaml` and
configure it with libvirt KVM.
* Create a libvirt VM and provision it with Windows Server 2019, using
the official ISO from Microsoft.
* Configure SSH in the Windows VM, and run the tests remotely via SSH.
The implementation of the test case consists of workunit scripts.
`qa/workunits/windows/test_rbd_wnbd.py` is the main Python script
to test Ceph on Windows basic functionality. This is executed in the
libvirt VM configured with Windows Server 2019.
Co-authored-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
Co-authored-by: Daniel Vincze <dvincze@cloudbasesolutions.com>
Signed-off-by: Ionut Balutoiu <ibalutoiu@cloudbasesolutions.com>
Otherwise the tests may run forever. This was already done for the
mds upgrade sequence; just adding it in the other two places here.
Related to: https://tracker.ceph.com/issues/53939
Signed-off-by: Adam King <adking@redhat.com>
qa/workunits/fs/misc/subvolume.sh is getting in the way of fs:workload
testing with subvolumes. Hence, this script was moved to a Python test.
Signed-off-by: Milind Changire <mchangir@redhat.com>
This commit removes orchestrator commands from the
Rook task and the Rook test suite because the Rook
orchestrator is not being maintained, and the Rook
orchestrator CLI is obsolete. This should also
clarify the issue:
https://tracker.ceph.com/issues/53680
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
The default pids-limit (docker 4096/podman 2048) prevents some
customizations from working (HTTP threads on RGW) or limits the
number of LUNs per iSCSI target.
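For reference, the container runtimes expose this via --pids-limit,
e.g. (values illustrative; the exact semantics of the unlimited
value differ per runtime):

    podman run --pids-limit=-1 ...
    docker run --pids-limit=65536 ...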
Fixes: https://tracker.ceph.com/issues/52898
Signed-off-by: Teoman ONAY <tonay@redhat.com>
Added mds daemons so that the test can create
CephFS pools and set options using
`do_set_pool()` in FSCommand.cc, letting
us cover corner cases like the one in
https://tracker.ceph.com/issues/54263
Signed-off-by: Kamoltat <ksirivad@redhat.com>
A new job that doesn't want ms_mode to be set underneath it is about to
be added. Rename rxbounce to ms_modeless to make this purpose obvious.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
It's more appropriate to use --subset to reduce the scheduling size. It
was previously laid out this way because we wanted to link to the common
`qa/cephfs/mount` directory so that ceph-fuse mounts are not needlessly
multiplied. We should just organize it correctly so that is not an
issue.
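For reference, scheduling with a subset looks roughly like this
(suite name and fraction illustrative):

    teuthology-suite --suite fs:workload --subset 0/32 ...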
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/42000/head:
qa: update rhel kclient to setup container tools
qa: stop overriding distro for k-testing
qa: only use RHEL for workload testing
qa: convert fs:workload to use cephadm
qa: split fs begin task
qa/tasks/cephadm: setup CephManager when OSDs are provisioned
qa/tasks/cephadm: setup file system if MDS are provisioned
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Accessing a single dentry can be sped up by setting this option to
false when the directory is not in memory.
Signed-off-by: "Shen, Hang" <shenhang@kuaishou.com>
Use the override in ./src/qa/distros/container-hosts/ubuntu_20.04.yaml
in order to use the HWE kernel for Ubuntu 20.04.
This is because the Ubuntu 20.04 kernel (5.4) has a bug that prevents
the use of nvme-loop.
see https://lkml.org/lkml/2020/9/21/1456
Fixes: https://tracker.ceph.com/issues/54094
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
For basic, rbd and rbd-nomount subsuites, replace legacy and crc
facets with "legacy or legacy+rxbounce" and "crc or crc+rxbounce"
facets (chosen at random).
For fsx, singleton and thrash subsuites, add legacy+rxbounce and
crc+rxbounce facets and drop prefer-crc facet. The expected behaviour
of the latter depends on cluster configuration and should be tested
separately.
The total number of jobs remains the same.
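For example, a crc+rxbounce facet boils down to a mapping like this
(pool/image illustrative):

    rbd map -o ms_mode=crc,rxbounce mypool/myimage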
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
So links can be elsewhere in the qa suite (not used yet) and to simplify
a find command in a follow-up commit.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
It's not useful testing workloads with different distributions; it just
adds to the maintenance burden of this qa suite as distro upgrades often
break compilation of various tests.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Note: it's important to keep the install task which supplies packages
needed for some workloads.
Fixes: https://tracker.ceph.com/issues/51333
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
With mon (rbd_support mgr module in this case) command definitions
generated automatically by the @CLI{Read,Write}Command decorators, it's
very easy to accidentally break the external-facing API.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Currently, every rados run of ~400 jobs is running ~150 cephadm tests,
which is unnecessary and redundant. With this change, we will run some
basic cephadm tests within the rados suite. The following seems to be
a good start.
qa/suites/rados/cephadm/osds
qa/suites/rados/cephadm/smoke
qa/suites/rados/cephadm/smoke-singlehost
qa/suites/rados/cephadm/workunits
Signed-off-by: Neha Ojha <nojha@redhat.com>
On rhel/centos, the ceph user does not have permission
to access these certs, which leads to s3-test failures
in teuthology.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
disable the sending of async datalog notifications on one zone per
cluster. this helps to verify that tests don't rely on notifications to
succeed
Signed-off-by: Casey Bodley <cbodley@redhat.com>
qa/suites/orch/cephadm: Also run the rbd/iscsi suite
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Melissa Li <mingkli@redhat.com>
Set and unset the noautoscale flag,
and evaluate whether the results are
what we expect. Also evaluate whether
the flag is correct when we create
new pools; see the outline below.
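In outline (assuming the global flag commands introduced alongside
this test):

    ceph osd pool set noautoscale
    ceph osd pool get noautoscale    # expect: on
    ceph osd pool unset noautoscale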
Signed-off-by: Kamoltat <ksirivad@redhat.com>
This commit adds testing for the drive_group_loop in the Rook orchestrator
that reapplies drive groups that were applied previously.
This test removes an OSD, zaps the underlying device then waits for the OSD
to be re-created by the drive_group_loop.
This commit also updates the rook test suite to test v1.7.2 instead of 1.7.0
since `orch device zap` is only supported from v1.7.2 onwards.
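The zap step is roughly (host and device illustrative):

    ceph orch device zap <host> /dev/vdb --force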
Fixes: https://tracker.ceph.com/issues/53501
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
mgr/cephadm: store container registry credentials in config-key
Reviewed-by: Sage Weil <sage@newdream.net>
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
mgr/cephadm: Add client.admin keyring when upgrading from older version
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Sage Weil <sage@newdream.net>
* refs/pull/43936/head:
qa/tasks/cephadm: pull image to all hosts in parallel
qa/tasks/cephadm: add hosts via mon remote
qa/tasks/cephadm: use shortname for remote directory
qa/tasks/cephadm: deploy no more than 5 mons in roleless mode
qa/tasks/radosbench: default clients to all clients (not client.0)
qa/tasks/ceph_manager: parallelize flush_pg_stats()
qa/suites/big: remove thrasher
qa/suites/big: update for cephadm
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
* refs/pull/43974/head:
qa: disable metrics on kernel client during upgrade
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
v16.2.4 MDS triggers an assert from these messages.
Also: add latest pacific for extra coverage.
Fixes: https://tracker.ceph.com/issues/53293
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>