The I/O workload in this test is xfstests (qa/run_xfstests_qemu.sh).
In every other subsuite (e.g. qemu/workloads/qemu_xfstests.yaml), it
isn't subject to any timeout other than the global max_job_time limit.
But here, there is a parallel "op" workload defined as a workunit.
The workunit task has a default timeout of 3 hours which is effectively
imposed on the entire job. In the "rbd cache = false" configuration,
it's sometimes exceeded.
Fixes: https://tracker.ceph.com/issues/48038
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
It doesn't really thrash anything, just repeatedly restarts the
workload on top of a dirty cache file. rbd_pwl_cache_recovery is
more on point and gets covered by existing CODEOWNERS.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Add a thrash test for the persistent write log cache. It runs rbd bench
on top of the persistent write log cache, thrashes rbd bench, and tests
the recovery function of the persistent write log cache.
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
We currently run "iogen -n 5 -s 2g" for about 10 minutes. This workload
does not always generate the export/import of subtrees that is checked
for by iogen.yaml. The iogen workload is suited to running heavily
fragmented I/O on a file system, not to growing directory trees.
Fixes: https://tracker.ceph.com/issues/54108
Signed-off-by: Ramana Raja <rraja@redhat.com>
scrub/osd: add clearer reminders that a scrub is blocked
Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: Matan Breizman <mbreizma@redhat.com>
As some Teuthology tests seem to block objects for many minutes,
we must not issue the "scrub is blocked for too long" warning
in those tests (that warning causes the tests to fail).
A new configuration parameter now controls the grace period before
the warning is issued. Some tests were modified to set this
configuration parameter to a large value.
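A minimal sketch of the grace-period idea, assuming a simple elapsed-time comparison. The function name and both grace values are hypothetical and are not taken from the patch, which does not name the parameter here:

```python
def should_warn_scrub_blocked(blocked_since, now, grace_seconds):
    """Return True once a scrub has been blocked longer than the grace period."""
    return (now - blocked_since) > grace_seconds


# Hypothetical values: a short default grace and a large one set by a test.
DEFAULT_GRACE = 120.0
TEST_GRACE = 3600.0

blocked_for = 600.0  # the scrub has been blocked on an object for 10 minutes
print(should_warn_scrub_blocked(0.0, blocked_for, DEFAULT_GRACE))  # True
print(should_warn_scrub_blocked(0.0, blocked_for, TEST_GRACE))     # False
```

With the larger grace value, the same 10-minute block no longer triggers the warning, which is the effect the modified tests rely on.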
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Commit 4fbf4c4f58 increased the
number of tags used in the snaptest-git-ceph.sh tests. This makes
the tests run longer than the default 3h timeout, so they time out.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
The RWL mode needs DAX and is dog slow otherwise -- qemu_xfstests.yaml
job always hits the 6 hour max_job_time limit.
As our tmpfs instance is limited and qemu_xfstests.yaml opens three
images at the same time, reduce the "big cache" size to 5G. This facet
was added to iron out 32-bit head/tail pointer issues and 5G still does
the job there.
Going through the loop device is needed because tmpfs doesn't support
O_DIRECT.
Fixes: https://tracker.ceph.com/issues/55400
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
qa/tasks/rgw_multisite.py uses 'zonegroup set' to create zonegroups from
their JSON format. This doesn't enable any of the supported zonegroup
features by default, so this adds the 'enabled_features' field to the
JSON representations.
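As a rough illustration, a zonegroup document that lists its features explicitly might be built like this. The helper name, the reduced set of fields, the zonegroup name, the endpoint and the "resharding" feature are illustrative assumptions, not the actual code in rgw_multisite.py:

```python
import json


def zonegroup_json(name, endpoints, enabled_features):
    """Build a (heavily reduced) zonegroup JSON document.

    Features must be listed explicitly, because creating a zonegroup
    from JSON does not enable any of them by default.
    """
    return json.dumps(
        {
            "name": name,
            "endpoints": list(endpoints),
            "enabled_features": list(enabled_features),
        },
        indent=2,
    )


# Hypothetical zonegroup with a single feature enabled.
doc = zonegroup_json("a", ["http://localhost:8000"], ["resharding"])
print(doc)
```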
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Changes a few NULL to nullptr.
Uses std::filesystem for path building so that paths are platform-independent.
Fixes a bug where DBStoreManager's second constructor did not create the DB.
Adds unit tests for the DB path and prefix.
Fixes: https://tracker.ceph.com/issues/55731
Signed-off-by: 0xavi0 <xavi.garcia@suse.com>
whitelist_health.yaml -> ignorelist_health.yaml
whitelist_wrongly_marked_down.yaml -> ignore_wrongly_marked_down.yaml
This was mostly addressed in
2ee9365d0b,
but the rename wasn't done there.
Signed-off-by: Zack Cerza <zack@cerza.org>
All `rados/thrash-erasure-code-big` tests that die due to the “wait_for_recovery” timeout have one thing in common: They contain either `thrashers/pggrow` or `thrashers/mapgap`.
The difference between pggrow and mapgap and the non-offending thrashers (default, careful, fastread, and morepggrow) is that pggrow and mapgap lack an override setting for `osd max backfills`. `osd max backfills` is the maximum number of backfill operations allowed to or from an OSD; the higher the number, the quicker the recovery. By default, this value is 1. Each of the non-offending thrashers overrides the default in its .yaml file with a value > 1.
The mclock op scheduler is known to override `osd max backfills` with a high value, but all of the thrash-erasure-code-big thrashers have their op queue set to “debug_random”, which chooses randomly between op queues (the debug_random op queue is set to override the default mclock_scheduler in qa/config/rados.yaml), so mclock is not always the scheduler in use. So, coupled with the “debug_random” op queue, the low `osd max backfills` setting is causing some tests to time out in recovery.
WITHOUT `osd max backfills`, as they are now, “mapgap” and “pggrow” tests die due to timed-out recovery about 17/100 times, as seen here with a pggrow test: http://pulpito.front.sepia.ceph.com/lflores-2022-05-18_14:24:29-rados:thrash-erasure-code-big-master-distro-default-smithi/
WITH `osd max backfills` specified, as I have suggested in this PR, 99/100 tests passed, with one test failing for a different reason:
http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_22:40:27-rados:thrash-erasure-code-big-master-distro-default-smithi/
I also scheduled 145 tests WITH `osd max backfills` that are a mix of pggrow and mapgap thrashers. 144/145 tests passed, with one test failing for a different reason. http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_15:27:54-rados:thrash-erasure-code-big-master-distro-default-smithi/
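The check that separates the offending thrashers from the others can be sketched as follows. This is a simplified illustration: the nested `overrides` / `ceph` / `conf` / `osd` layout and the example values are assumptions about how the thrasher .yaml files look once loaded, not a copy of them:

```python
def has_backfill_override(thrasher_conf):
    """Return True if the config raises `osd max backfills` above the default of 1."""
    osd_conf = (
        thrasher_conf.get("overrides", {})
        .get("ceph", {})
        .get("conf", {})
        .get("osd", {})
    )
    return osd_conf.get("osd max backfills", 1) > 1


# Hypothetical, already-parsed thrasher configs.
thrashers = {
    "default": {"overrides": {"ceph": {"conf": {"osd": {"osd max backfills": 2}}}}},
    "pggrow": {"overrides": {"ceph": {"conf": {"osd": {}}}}},
}

missing = sorted(
    name for name, conf in thrashers.items() if not has_backfill_override(conf)
)
print(missing)  # ['pggrow']
```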
Fixes: https://tracker.ceph.com/issues/51076
Signed-off-by: Laura Flores <lflores@redhat.com>
rgw/qa: enable s3-tests related to cloud-transition feature
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Maredia, Ali <amaredia@redhat.com>
Run cloudtier tests with the parameter 'retain_head_object'
set to true and false.
However, having multiple cloudtier storage classes in the same task
increases the transition time and results in spurious failures.
Hence, until there is a consistent way of running the tests without
having to depend on lc_debug_interval, one of the configs is disabled
for now.
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
DACs are overridable for directories. For files,
read/write DACs are always overridable, but executable
DACs are overridable only when there is at least one exec bit
set.
File and directory DAC overriding was handled the
same way for root, which is incorrect. This patch fixes
DAC overriding for root as described above.
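The rule can be sketched as a small predicate. This is an illustration of the described behavior only; the function name and signature are hypothetical and do not mirror the client code touched by the patch:

```python
import stat


def root_may_override(mode, want_exec=False):
    """Return True if root may override the DAC check for this inode mode.

    `want_exec` marks an execute request; read and write requests need no flag.
    """
    if stat.S_ISDIR(mode):
        # Directories: every DAC check is overridable for root.
        return True
    if want_exec:
        # Files: execute is overridable only if at least one exec bit is set.
        return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    # Files: read and write are always overridable.
    return True


print(root_may_override(stat.S_IFREG | 0o644, want_exec=True))  # False
print(root_may_override(stat.S_IFREG | 0o010, want_exec=True))  # True
print(root_may_override(stat.S_IFDIR | 0o000, want_exec=True))  # True
```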
Fixes: https://tracker.ceph.com/issues/55313
Signed-off-by: Kotresh HR <khiremat@redhat.com>