
874 Commits

Author SHA1 Message Date
Yuri Weinstein
556bd56b70
Merge pull request #48175 from amathuria/wip-add-test-case-bz-2011756
DaemonServer.cc: fix config show command for RGW daemons

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
2024-01-05 07:37:54 -08:00
Radoslaw Zarzynski
7fc77efe2b qa: qa/suites/rados/upgrade/parallel points to reef
```
$ git rm qa/suites/rados/upgrade/parallel
$ ln -s ../../upgrade/reef-x/parallel qa/suites/rados/upgrade/parallel
$ git add qa/suites/rados/upgrade/parallel
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2023-12-04 16:27:51 +01:00
Radoslaw Zarzynski
081177f6a4 qa: stubs and clean up for reef
- remove upgrades from octopus
- stubs for completing upgrade to reef

Still missing the quincy-x upgrade tests.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2023-12-04 16:27:51 +01:00
Vallari Agrawal
b896bebf38
Merge pull request #54209 from VallariAg/wip-xml-scanner
qa: use Remote.run_unit_test and ValgrindScanner
2023-11-29 12:21:02 +05:30
Gabriel BenHanokh
abba1a8b2c osd/SnapMapper: maintain the prefix_itr between calls to SnapMapper::get_next_objects_to_trim()
Maintain the prefix_itr between calls to SnapMapper::get_next_objects_to_trim() to prevent searching depleted prefixes.
There are 8 distinct hash prefixes used for searching objects owned by a given PG.
On each call to SnapMapper::get_next_objects_to_trim() we start from the first prefix, even after all objects mapped to it were depleted.
This means that we search for 1 non-existing prefix after the first prefix is depleted, 2 after the first two prefixes are depleted, and so on, until we search 7 non-existing prefixes after the first 7 prefixes are depleted.

This is a performance improvement PR only!
It maintains the existing behavior and does not try to fix/change any of the TRIM logic.
I added an extra step after the last object is trimmed: a full scan of the DB, which returns ENOENT only if no object was found.
This should make the new code no worse than the existing code, which returns ENOENT after a full scan finds no object.
It should not impact performance for real-life snaps, as it should only happen once per snap.

Added snap-mapper tests to rados-test-suite.
Disabled osd_debug_trim_objects when running (SnapMapperTest, prefix_itr) to prevent asserts (as this code does illegal inserts into DELETED snaps).
Code beautifying.

Disabled the assert, as there is a corner case when we retrieve the last valid object(s) in a snap.
The prefix_itr is advanced past the last valid value (as we completed a full scan).
If the OSD calls get_next_objects_to_trim() before the retrieved object(s) were processed and removed from the SnapMapper DB, they won't be found by the next call (as the prefix_itr is invalid).
The object will then be found in the second pass, which makes it seem as if it was added after the trim was started (which is illegal), and this would trigger an ASSERT.

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
2023-11-02 19:25:16 +00:00
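The idea described in this commit message can be sketched with a toy model. This is not the SnapMapper code: `PrefixScanner`, `next_objects`, the dict-based store, and the `lookups` counter are all hypothetical names used for illustration, and the sketch assumes returned objects are removed from the store immediately (the real OSD removes them later, which is the corner case the message mentions).

```python
import errno


class PrefixScanner:
    """Toy model of a scanner that remembers its prefix position between calls."""

    def __init__(self, store, prefixes):
        self.store = store              # hypothetical: dict of prefix -> list of objects
        self.prefixes = list(prefixes)
        self.pos = 0                    # remembered between calls (the "prefix_itr" idea)
        self.lookups = 0                # number of prefix lookups, for illustration only

    def _take(self, start, max_objects):
        found = []
        pos = start
        while pos < len(self.prefixes) and len(found) < max_objects:
            self.lookups += 1
            bucket = self.store.get(self.prefixes[pos], [])
            while bucket and len(found) < max_objects:
                found.append(bucket.pop(0))
            if bucket:
                break                   # this prefix still has objects; stay on it
            pos += 1                    # prefix depleted; never revisit it in this pass
        return found, pos

    def next_objects(self, max_objects):
        # Resume from the remembered prefix instead of restarting at the first one.
        found, self.pos = self._take(self.pos, max_objects)
        if found:
            return found
        # Nothing left from the remembered position: do one full scan, and only
        # report ENOENT if that scan finds nothing either.
        found, self.pos = self._take(0, max_objects)
        if found:
            return found
        raise OSError(errno.ENOENT, "no objects left to trim")
```

With three prefixes where only the first two hold objects, successive calls resume where the previous call stopped, and the full rescan happens only once, right before ENOENT is raised.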
Aishwarya Mathuria
b88cecdc7c DaemonServer.cc: fix config show command for RGW daemons
RGW daemons register in the servicemap by gid, which allows multiple radosgw instances to share an auth key/identity. The daemon name is sent as part of the metadata (84c265238b).
All other daemons register by the daemon name and the manager stores all daemon state information with daemon name as key. The 'config show' command looks up the daemon_state map with the daemon name the user mentions as key (for example: 'osd.0', 'client.rgw', 'mon.a').
Due to the change in RGW daemon registration, the key used for storing daemon state has become rgw.gid and 'config show client.rgw' no longer works.

This change goes through the daemon metadata to look for the RGW daemon name when a user enters the config show command for an RGW daemon. Once the correct daemon is found, we retrieve the corresponding daemon key (rgw.gid) and use that to query the daemon_state map.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2011756
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
2023-10-30 17:28:05 +00:00
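The lookup described in this commit message can be illustrated with a small sketch. It is a toy model in Python, not the DaemonServer.cc code: the function name `resolve_daemon_key`, the dict layout, the `"id"` metadata field, and the `client.` prefix handling are all assumptions made for the example.

```python
def resolve_daemon_key(daemon_state, requested):
    """Return the daemon_state key for the name the user typed, or None."""
    # Most daemons are stored under their own name, so try that first.
    if requested in daemon_state:
        return requested

    # Hypothetical convention: the user refers to an RGW daemon as "client.<name>".
    prefix = "client."
    if not requested.startswith(prefix):
        return None
    name = requested[len(prefix):]

    # RGW entries are keyed by "rgw.<gid>", so match on the name in the metadata.
    for key, state in daemon_state.items():
        if key.startswith("rgw.") and state.get("metadata", {}).get("id") == name:
            return key
    return None
```

For example, with one entry keyed `osd.0` and one keyed `rgw.4107` whose metadata carries the name `rgw`, a request for `client.rgw` resolves to `rgw.4107`, while `osd.0` resolves to itself.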
Vallari Agrawal
ccf2bba418
qa: use Remote.run_unit_test when "unit_test_scan" config is used
The optional "unit_test_scan" config uses Remote.run_unit_test()
to scan XML files generated by unit tests and raise better failure
messages (for s3tests and gtests run by workunit).
It also creates "unit_test_summary.yaml" with more exception details
from the XML files.

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
2023-10-26 12:13:37 +05:30
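As a rough illustration of what scanning unit-test XML for failures can look like, here is a minimal sketch. It does not reproduce `Remote.run_unit_test()` or the actual scanner; the function name `summarize_failures` and the output layout are hypothetical, and the input is assumed to be JUnit-style XML with `testcase`, `failure`, and `error` elements.

```python
import xml.etree.ElementTree as ET


def summarize_failures(xml_text):
    """Collect failed or errored test cases from JUnit-style XML text."""
    root = ET.fromstring(xml_text)
    summary = []
    for case in root.iter("testcase"):
        for kind in ("failure", "error"):
            node = case.find(kind)
            if node is not None:
                summary.append({
                    "test": f"{case.get('classname', '')}.{case.get('name', '')}",
                    "kind": kind,
                    # Prefer the message attribute, fall back to the element text.
                    "message": node.get("message") or (node.text or "").strip(),
                })
    return summary
```

The returned list could then be written out as a summary file or used to build a more specific exception message than a bare non-zero exit status.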
Matan Breizman
cc4b75718f qa/suites/rados/thrash/thrashers/mapgap: Increase trimming probability
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2023-09-13 15:29:35 +00:00
Radoslaw Zarzynski
b77ebe8cd4 Revert "osd/SnapMapper: Maintain the prefix_itr between calls to avoid search…"
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
2023-08-23 20:12:11 +02:00
Gabriel BenHanokh
690739e821 osd/SnapMapper:
Maintain the prefix_itr between calls to SnapMapper::get_next_objects_to_trim() to prevent searching depleted prefixes.
There are 8 distinct hash prefixes used for searching objects owned by a given PG.
On each call to SnapMapper::get_next_objects_to_trim() we start from the first prefix, even after all objects mapped to it were depleted.
This means that we search for 1 non-existing prefix after the first prefix is depleted, 2 after the first two prefixes are depleted, and so on, until we search 7 non-existing prefixes after the first 7 prefixes are depleted.

This is a performance improvement PR only!
It maintains the existing behavior and does not try to fix/change any of the TRIM logic.
I added an extra step after the last object is trimmed: a full scan of the DB, which returns ENOENT only if no object was found.
This should make the new code no worse than the existing code, which returns ENOENT after a full scan finds no object.
It should not impact performance for real-life snaps, as it should only happen once per snap.

Added snap-mapper tests to rados-test-suite.
Disabled osd_debug_trim_objects when running (SnapMapperTest, prefix_itr) to prevent asserts (as this code does illegal inserts into DELETED snaps).
Code beautifying.

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
2023-08-23 13:47:45 +00:00
Prashant D
990806e635 mon, qa: issue pool application warning even if pool is empty
Ceph status fails to report the pool application warning if
the pool is empty. Report the pool application warning
even if the pool has 0 objects stored in it.

Add POOL_APP_NOT_ENABLED cluster warnings to log-ignorelist
to fix rados suite.

Fixes: https://tracker.ceph.com/issues/57097

Signed-off-by: Prashant D <pdhange@redhat.com>
2023-07-31 19:09:29 -04:00
Casey Bodley
3281eb85ce
Merge pull request #52143 from cbodley/wip-61567
test/pybind: replace nose with pytest

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
2023-07-20 14:51:05 -04:00
Yuri Weinstein
bace1df330
Merge pull request #51438 from NitzanMordhai/wip-nitzan-cbt-perf-ci
cbt perf ci

Reviewed-by: Samuel Just <sjust@redhat.com>
2023-07-19 12:06:32 -04:00
Casey Bodley
cbdd520995 qa/suites: install pytest for pybind tasks
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2023-07-17 16:31:08 -04:00
Nitzan Mordechai
4a73f963c5 test: Add Rado/perf suites with cbt stat collection
Add stat collection so that CPU usage per operation can be collected.

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
2023-07-09 08:25:37 +00:00
Laura Flores
2fca433c71 qa/suites/rados/singleton/all: remove test_envlibrados_for_rocksdb
In rocksdb 7.0, all envlibrados files were moved to a separate repository (ref: https://github.com/facebook/rocksdb/pull/9206).
The new repo is temporary and serves as an example until it is decided where RADOS support will be hosted and by whom.

Since this new repo is outside of the rocksdb repo and in an uncertain state, we should remove support for it in main
and Reef test suites. Quincy and below still use rocksdb 6.0, so the same does not apply.

Fixes: https://tracker.ceph.com/issues/59057
Signed-off-by: Laura Flores <lflores@redhat.com>
2023-07-06 12:14:05 -05:00
NitzanMordhai
293f13ed49 qa: adding clay test to thrash erasure code big
Currently we don't have any clay tests in the erasure code big tests,
so add clay tests as well.

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
2023-06-16 06:58:32 +00:00
Laura Flores
8c6374e8da
Merge pull request #51927 from ljflores/wip-rook-tests
qa/suites/rados: remove rook coverage from the rados suite
2023-06-06 13:35:35 -05:00
Laura Flores
66a6e7fdeb qa/suites/rados: whitelist POOL_APP_NOT_ENABLED for rados cls tests
Fixes: https://tracker.ceph.com/issues/59192
Signed-off-by: Laura Flores <lflores@redhat.com>
2023-06-05 15:35:54 -05:00
Laura Flores
c26674ef4c qa/suites/rados: remove rook coverage from the rados suite
The rook team relies on a daily CI system to validate
rook changes. It doesn't seem that the teuthology tests
are maintained, so it makes sense to remove them from the
rados suite.

By removing this symlink, rook test coverage will remain
in the orch suite, and coverage will only be removed from the
rados suite.

Workaround for: https://tracker.ceph.com/issues/58585
Signed-off-by: Laura Flores <lflores@redhat.com>
2023-06-05 15:23:42 -05:00
Yuri Weinstein
b2ec2aff80
Merge pull request #50651 from rosinL/cleanup
Cleanup the LevelDB residue


Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2023-06-05 11:32:51 -04:00
Yuri Weinstein
925edda1cb
Merge pull request #51527 from NitzanMordhai/wip-nitzan-thrash-eio-pool-size-correct
test: correct osd pool default size


Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: Matan Breizman <Matan.Brz@gmail.com>
2023-05-25 12:08:48 -04:00
Nitzan Mordechai
c9d98ec310 test: correct osd pool default size
Using the default pool size of 2 with random EIO thrashing can cause
some of the objects to be marked as lost.
Fix the typo 'osd default pool size: 3' to 'osd pool default size: 3'
so that the pool size is correctly set to 3.

Fixes: https://tracker.ceph.com/issues/49888
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
2023-05-18 04:34:50 +00:00
Nitzan Mordechai
3a91670aa5 tests: change override to overrides so conf will take effect
A few test suites use 'override' in their yaml files,
while the ceph.py task looks for 'overrides'; in that case
those config params do not take effect.

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
2023-05-17 10:39:59 +00:00
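A misspelled top-level key like this is silently ignored, so a small check can catch it before a run. The following is a hedged sketch only: `misspelled_override_keys` is a hypothetical helper, not part of teuthology or ceph.py, and it assumes the suite yaml has already been parsed into a Python dict.

```python
def misspelled_override_keys(config):
    """Return top-level keys that look like a misspelling of 'overrides'."""
    return [
        key for key in config
        # 'overrides' is the expected spelling; anything else starting with
        # 'override' (for example 'override') is flagged.
        if key != "overrides" and key.lower().startswith("override")
    ]
```

A parsed suite fragment with a top-level `override` key would be flagged, while one using `overrides` yields an empty list.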
Kamoltat Sirivadhna
78a43309b2
Merge pull request #50857 from kamoltat/wip-ksirivad-iswriteable
mon/Monitor.cc: exit function if !osdmon()->is_writeable()
Reviewed-by: Gregory Farnum <gfarnum@redhat.com>
2023-05-08 21:04:59 -04:00
luo rixin
75adc57feb qa: remove leveldb support from qa
qa/suites: remove leveldb log setting
qa/rebuild_mondb: replace leveldb with rocksdb
qa/valgrind: remove leveldb from valgrind.supp

Signed-off-by: luo rixin <luorixin@huawei.com>
2023-05-04 10:43:08 +08:00
Kamoltat
431c4559c4 qa/standalone: create mon-stretch standalone test
Separate `mon-stretch` from `mon`.

Renamed `mon-stretched-cluster.sh` to
`mon-stretch-fail-recovery.sh`.

Isolating the stretch cluster test enables
developers to get results faster for stretch-cluster
related changes.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2023-04-17 16:06:22 +00:00
Yuri Weinstein
23a958d647
Merge pull request #47893 from kotreshhr/ceph-mgr-finisher-block
mgr: Add one finisher thread per module

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
2023-04-06 09:23:24 -07:00
Kotresh HR
2c2ef6d56b qa: Add test for per-module finisher thread
Fixes: https://tracker.ceph.com/issues/51177
Signed-off-by: Kotresh HR <khiremat@redhat.com>
2023-03-29 11:34:07 +05:30
Nitzan Mordechai
4c4967dbb9 qa/suites: change all osd objectstore filestore
Removing and changing all suites to no longer use filestore

Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>

ceph_volume: remove all filestore tests suites
Since filestore has been removed, there is no need to test it

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
2023-02-12 06:11:29 +00:00
Brad Hubbard
d371237c57
Merge pull request #49109 from NitzanMordhai/wip-nitzan-fixing-few-rados/test.sh
Wip nitzan fixing few rados/test.sh

Reviewed-by: Kefu Chai <tchaikov@gmail.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
2023-01-27 09:46:53 +10:00
Laura Flores
c190aa9c82
Merge pull request #49181 from ljflores/wip-envlibrados-rocksdb-fix
qa/workunits/rados: skip running envlibrados rocksdb tests on ubuntu
2023-01-17 16:32:20 -06:00
Laura Flores
acc8c7e2ef qa/workunits/rados: skip running envlibrados rocksdb tests on ubuntu
This test passes on centos and rhel, but fails on ubuntu from an
invalid pointer. Since the envlibrados rocksdb tests are experimental
and don't have any actual users, we can just run them on rhel and
centos.

At the moment, the actual bug is not fully understood, but it was
decided that fixing it is low priority, and removing the test from
problematic distros is okay for the time being. This commit
is considered a workaround to the actual issue.

Related tracker: https://tracker.ceph.com/issues/57632
Signed-off-by: Laura Flores <lflores@redhat.com>
2023-01-17 11:38:19 -06:00
NitzanMordhai
8e76c17571 qa/suites/rados/verify: set watch timeout longer
Some of the tests in Rados.sh can fail when checking the
watch_list return size if we hit the watch timeout.
Increase the watch timeout for the rados tests.

Fixes: https://tracker.ceph.com/issues/47025
Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>
2022-12-21 14:36:07 +00:00
Sridhar Seshasayee
5b2fee21e8 qa: Allow tests to override recovery configs with mClock scheduler enabled
Set osd_mclock_override_recovery_settings option to true for tests that
modify recovery/backfill configuration options. This prevents logging of
the cluster warning when modifying recovery/backfill limits.

Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2022-12-12 18:12:46 +05:30
Radoslaw Zarzynski
5eaff49330 qa: qa/suites/rados/upgrade/parallel points to quincy
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2022-09-20 14:29:57 +00:00
Radoslaw Zarzynski
4baea08565 doc, qa: stubs and clean up for reef
- remove upgrades from octopus
- stubs for completing upgrade to reef

Still missing the quincy-x upgrade tests.

`c8e1f4c2b547a152e049af2b529bf415f6d76e59` has moved
the `thrash-old-clients` tests back to the rados suite.
This commit fixes the `release-checklists.rst` accordingly.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2022-09-20 14:29:47 +00:00
Sage Weil
39da18b31b qa/workunits/mon/auth_key_rotation.sh: exercise pending key / rotation
Signed-off-by: Sage Weil <sage@newdream.net>
2022-09-12 17:02:59 +00:00
Matan Breizman
5db85e6f45 qa/suites: Reduce rados_python time out
This test runs for a few minutes;
reduce the timeout from 3h to 1h to avoid hanging jobs.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2022-08-17 12:05:20 +00:00
Laura Flores
40062676c2 qa/suites/rados/thrash-erasure-code-big/thrashers: add osd max backfills setting to mapgap and pggrow
All `rados/thrash-erasure-code-big` tests that die due to the “wait_for_recovery” timeout have one thing in common: They contain either `thrashers/pggrow` or `thrashers/mapgap`.

The difference between pggrow and mapgap vs. all other non-offending thrashers (default, careful, fastread, and morepggrow) is that they lack an override setting for `osd max backfills`. `osd max backfills` is the max number of backfill operations allowed to/from an OSD. The higher the number, the quicker the recovery. By default, this value is 1. On all of the non-offending thrashers (default, careful, fastread, and morepggrow), the default 1 value gets overridden in their .yaml files with a value > 1. This is not the case for pggrow and mapgap, however, as they lack an `osd max backfills` override setting.

The mclock op scheduler is known to override `osd max backfills` with a high value, but all of the thrash-erasure-code-big thrashers have their op queue set to “debug_random”, which chooses randomly between op queues (the debug_random op queue is set to override the default mclock_scheduler in qa/config/rados.yaml). So, coupled with the “debug_random” op queue, the low `osd max backfills` setting is causing some tests to time out in recovery.

WITHOUT `osd max backfills`, as they are now, “mapgap” and “pggrow” tests die due to timed-out recovery about 17/100 times, as seen here with a pggrow test: http://pulpito.front.sepia.ceph.com/lflores-2022-05-18_14:24:29-rados:thrash-erasure-code-big-master-distro-default-smithi/

WITH `osd max backfills` specified, as I have suggested in this PR, 99/100 tests passed, with one test failing for a different reason:
http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_22:40:27-rados:thrash-erasure-code-big-master-distro-default-smithi/

I also scheduled 145 tests WITH `osd max backfills` that are a mix of pggrow and mapgap thrashers. 144/145 tests passed, with one test failing for a different reason. http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_15:27:54-rados:thrash-erasure-code-big-master-distro-default-smithi/

Fixes: https://tracker.ceph.com/issues/51076
Signed-off-by: Laura Flores <lflores@redhat.com>
2022-05-19 18:29:00 -05:00
Neha Ojha
8a8945e640
Merge pull request #44868 from neha-ojha/wip-move-to-stream
qa/distros: remove centos8

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2022-02-04 11:56:08 -08:00
Patrick Donnelly
1f714da814
qa: fix or add missing .qa links
Using this command:

    find qa/suites/ -type d -execdir ln -sfT ../.qa/ {}/.qa \;

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2022-02-03 10:08:30 -05:00
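For readers less familiar with `find -execdir`, the effect of the command quoted in this commit can be approximated in Python. This is a sketch under assumptions, not a drop-in replacement: `link_qa_dirs` is a hypothetical helper, and unlike `ln -sfT` it skips (rather than replaces) a regular file or real directory that already sits at `.qa`.

```python
import os


def link_qa_dirs(root):
    """Give every directory under root (including root) a '.qa' -> '../.qa' symlink."""
    created = []
    # os.walk does not descend into symlinked directories by default,
    # so existing '.qa' links are never followed.
    for dirpath, _dirnames, _filenames in os.walk(root):
        link = os.path.join(dirpath, ".qa")
        if os.path.islink(link):
            os.unlink(link)            # replace an existing link, like 'ln -sf'
        elif os.path.lexists(link):
            continue                   # something real is there; leave it alone
        os.symlink("../.qa", link)
        created.append(link)
    return created
```

Because each link is relative (`../.qa`), every directory's `.qa` resolves through its parent's `.qa`, which is why a single missing link breaks the chain for everything below it.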
Neha Ojha
8ca5729d21 qa/suites/rados/thrash-old-clients: remove centos_8.3_container_tools_3.0
Signed-off-by: Neha Ojha <nojha@redhat.com>
2022-02-02 23:26:54 +00:00
Neha Ojha
f849f1554c qa/suites/rados: reduce the number of cephadm tests
Currently, every rados run of ~400 jobs is running ~150 cephadm tests,
which is unnecessary and redundant. With this change, we will run some
basic cephadm tests within the rados suite. The following seems to be
a good start.

qa/suites/rados/cephadm/osds
qa/suites/rados/cephadm/smoke
qa/suites/rados/cephadm/smoke-singlehost
qa/suites/rados/cephadm/workunits

Signed-off-by: Neha Ojha <nojha@redhat.com>
2022-01-21 23:38:53 +00:00
Pere Diaz Bou
15dfa71cf7 mgr: TTLCache basic implementation
Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
Fixes: https://tracker.ceph.com/issues/48388
2022-01-05 10:11:58 +01:00
Kamoltat
bb42c71e7e qa: Added workunit test for noautoscale flag
Set and unset the noautoscale flag and
evaluate whether the results are what
we expect. Also evaluate whether
the flag is correct when we
create new pools.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2021-12-22 21:42:28 +00:00
Kamoltat
c194f4a3eb qa/workunits/mon/pg_autoscaler: modified test script
Modified test script to include `bulk` and
remove all `profile` options.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2021-12-20 21:46:37 +00:00
Sage Weil
b430fd538f qa/suites/rados/thrash-old-clients: use better-support cephadm distro/podman
Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-30 10:47:53 -06:00
Ernesto Puerta
515af762bb
Merge pull request #43987 from rhcs-dashboard/53123-dashboard-nfs-cleanup
mgr/dashboard: NFS non-existent files cleanup

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: ljflores <NOT@FOUND>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
2021-11-19 20:40:41 +01:00
Sage Weil
411b2d39c2 qa/suites/rados/dashboard: use single-container-host.yaml
Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-17 09:02:42 -06:00