The RWL mode needs DAX and is dog slow otherwise -- qemu_xfstests.yaml
job always hits the 6 hour max_job_time limit.
As our tmpfs instance is limited and qemu_xfstests.yaml opens three
images at the same time, reduce the "big cache" size to 5G. This facet
was added to iron out 32-bit head/tail pointer issues and 5G still does
the job there.
Going through the loop device is needed because tmpfs doesn't support
O_DIRECT.
Fixes: https://tracker.ceph.com/issues/55400
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
qa/tasks/rgw_multisite.py uses 'zonegroup set' to create zonegroups from
their json format. this doesn't enable any of the supported zonegroup
features by default, so this adds the 'enabled_features' field to the
json representations
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Changes a few NULL to nullptr.
Adds std::filesystem for path building so they're platform independant.
Fixes a bug for DBStoreManager's second constructor not creating the DB.
Adds unit tests to test DB path and prefix.
Fixes: https://tracker.ceph.com/issues/55731
Signed-off-by: 0xavi0 <xavi.garcia@suse.com>
whitelist_health.yaml -> ignorelist_health.yaml
whitelist_wrongly_marked_down.yaml -> ignore_wrongly_marked_down.yaml
This was mostly addressed in
2ee9365d0b,
but the rename wasn't done there.
Signed-off-by: Zack Cerza <zack@cerza.org>
All `rados/thrash-erasure-code-big` tests that die due to the “wait_for_recovery” timeout have one thing in common: They contain either `thrashers/pggrow` or `thrashers/mapgap`.
The difference between pggrow and mapgap vs. all other non-offending thrashers (default, careful, fastread, and morepggrow) is that they lack an override setting for `osd max backfills`. `osd max backfills` is the max number of backfill operations allowed to/from an OSD. The higher the number, the quicker the recovery. By default, this value is 1. On all of the non-offending thrashers (default, careful, fastread, and morepggrow), the default 1 value gets overridden in their .yaml files with a value > 1. This is not the case for pggrow and mapgap, however, as they lack an `osd max backfills` override setting.
The mclock op scheduler is known to override `osd max backfills` with a high value, but all of the thrash-erasure-code-big thrashers have their op queue set to “debug_random”, which chooses randomly between op queues (the debug_random op queue is set to override the default mclock_scheduler in qa/config/rados.yaml). So, coupled with the “debug_random” op queue, the low `osd max backfill` setting is causing some tests to time out in recovery.
WITHOUT `osd max backfills`, as they are now, “mapgap” and “pggrow” tests die due to timed-out recovery about 17/100 times, as seen here with a pggrow test: http://pulpito.front.sepia.ceph.com/lflores-2022-05-18_14:24:29-rados:thrash-erasure-code-big-master-distro-default-smithi/
WITH `osd max backfills` specified, as I have suggested in this PR, 99/100 tests passed, with one test failing for a different reason:
http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_22:40:27-rados:thrash-erasure-code-big-master-distro-default-smithi/
I also scheduled 145 tests WITH `osd max backfills` that are a mix of pggrow and mapgap thrashers. 144/145 tests passed, with one test failing for a different reason. http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_15:27:54-rados:thrash-erasure-code-big-master-distro-default-smithi/
Fixes: https://tracker.ceph.com/issues/51076
Signed-off-by: Laura Flores <lflores@redhat.com>
rgw/qa: enable s3-tests related to cloud-transition feature
Reviewed-by: casey Bodley <cbodley@redhat.com>
Reviewed-by: Maredia, Ali <amaredia@redhat.com>
Run cloudtier tests with parameter 'retain_head_object'
set to true and false.
However having multiple cloudtier storage classes in the same task
is increasing the transition time and resulting in spurious failures.
Hence until there is a consistent way of running the tests, without
having to depend on lc_debug_interval, disabled one of the config for
now.
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
DACs are overridable for directories. For files,
Read/write DACs are always overridable but executable
DACs are overridable when there is at least one exec bit
set.
The files and directory DACS overriding were handled the
same way for root which is incorrect. This patch fixes
DACs overriding as described above for the root.
Fixes: https://tracker.ceph.com/issues/55313
Signed-off-by: Kotresh HR <khiremat@redhat.com>
don't rely on the ceph manager task to parse a config file. each rgw
could be using a different config. instead, revert to an s3tests
override called 'with-sse-s3'
this way, the only job that enables sse-s3, vault_transit.yaml, contains
both the 'rgw crypt sse s3' configurables, and the flag to enable the
associated test cases
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Due to lack of Windows support in the Teuthology, the test case adopts
the following workaround:
* Deploy baremetal machine with `ubuntu_latest.yaml` and
configure it with libvirt KVM.
* Create a libvirt VM and provision it with Windows Server 2019, using
the official ISO from Microsoft.
* Configure SSH in the Windows VM, and run the tests remotely via SSH.
The implementation of the test case consists of workunit scripts.
`qa/workunits/windows/test_rbd_wnbd.py` is the main Python script
to test Ceph on Windows basic functionality. This is executed in the
libvirt VM configured with Windows Server 2019.
Co-authored-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
Co-authored-by: Daniel Vincze <dvincze@cloudbasesolutions.com>
Signed-off-by: Ionut Balutoiu <ibalutoiu@cloudbasesolutions.com>
Otherwise the tests may run forever. This was already done for
mds upgrade sequence, justadding it in the other two places here
Related to: https://tracker.ceph.com/issues/53939
Signed-off-by: Adam King <adking@redhat.com>
qa/workunits/fs/misc/subvolume.sh is getting in the way of fs:workload
testing with subvolumes. Hence moved this script to a python test.
Signed-off-by: Milind Changire <mchangir@redhat.com>
This commit removes orchestrator commands from the
Rook task and the Rook test suite because the Rook
orchestrator is not being maintained, and the Rook
orchestrator CLI is obsolete. This should also
clarify the issue:
https://tracker.ceph.com/issues/53680
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
The default pids-limit (docker 4096/podman 2048) prevent some
customization from working (http threads on RGW) or limits the number
of luns per iscsi target.
Fixes: https://tracker.ceph.com/issues/52898
Signed-off-by: Teoman ONAY <tonay@redhat.com>