Commit Graph

842 Commits

Author SHA1 Message Date
Laura Flores
c190aa9c82
Merge pull request #49181 from ljflores/wip-envlibrados-rocksdb-fix
qa/workunits/rados: skip running envlibrados rocksdb tests on ubuntu
2023-01-17 16:32:20 -06:00
Laura Flores
acc8c7e2ef qa/workunits/rados: skip running envlibrados rocksdb tests on ubuntu
This test passes on centos and rhel, but fails on ubuntu from an
invalid pointer. Since the envlibrados rocksdb tests are experimental
and don't have any actual users, we can just run them on rhel and
centos.

At the moment, the actual bug is not fully understood, but it was
decided that fixing it is low priority, and removing the test from
problematic distros is okay for the time being. This commit
is considered a workaround to the actual issue.
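
The distro restriction described above can be sketched as follows. This is a minimal illustration only: the helper names are hypothetical, the commit's actual mechanism is not shown in this log, and the sketch assumes the distro is identified by the `ID` field of an os-release file.

```python
# Hypothetical helpers (not from the commit): decide whether the
# envlibrados rocksdb tests should run, based on os-release content.
SUPPORTED_DISTROS = {"centos", "rhel"}  # distros where the test passes, per the message


def parse_os_release_id(text):
    """Return the lowercase ID value from os-release content, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ID":
            # Values may be quoted, e.g. ID="centos"
            return value.strip().strip("\"'").lower()
    return None


def should_run_envlibrados(os_release_text):
    """True only on distros listed in SUPPORTED_DISTROS."""
    return parse_os_release_id(os_release_text) in SUPPORTED_DISTROS
```

Parsing the text rather than reading `/etc/os-release` directly keeps the check easy to exercise on any machine.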

Related tracker: https://tracker.ceph.com/issues/57632
Signed-off-by: Laura Flores <lflores@redhat.com>
2023-01-17 11:38:19 -06:00
Sridhar Seshasayee
5b2fee21e8 qa: Allow tests to override recovery configs with mClock scheduler enabled
Set osd_mclock_override_recovery_settings option to true for tests that
modify recovery/backfill configuration options. This prevents logging of
the cluster warning when modifying recovery/backfill limits.
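
As a rough sketch of what such an override looks like, the snippet below enables the option in a job description held as a Python dict. The nested `overrides` / `ceph` / `conf` / `osd` layout is an assumption about how the test configuration is structured, and `allow_recovery_overrides` is a hypothetical helper; only the option name comes from the commit message.

```python
import copy

OVERRIDE_OPTION = "osd_mclock_override_recovery_settings"  # option named in the commit


def allow_recovery_overrides(job_config):
    """Return a copy of job_config with the override option enabled for OSDs.

    Assumes a nested overrides -> ceph -> conf -> osd layout (an assumption,
    not taken from the commit itself).
    """
    config = copy.deepcopy(job_config)
    osd_conf = (
        config.setdefault("overrides", {})
        .setdefault("ceph", {})
        .setdefault("conf", {})
        .setdefault("osd", {})
    )
    osd_conf[OVERRIDE_OPTION] = True
    return config
```

Existing OSD settings in the job are preserved, and the input dict is left untouched.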

Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2022-12-12 18:12:46 +05:30
Radoslaw Zarzynski
5eaff49330 qa: qa/suites/rados/upgrade/parallel points to quincy
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2022-09-20 14:29:57 +00:00
Radoslaw Zarzynski
4baea08565 doc, qa: stubs and clean up for reef
- remove upgrades from octopus
- stubs for completing upgrade to reef

Still missing the quincy-x upgrade tests.

`c8e1f4c2b547a152e049af2b529bf415f6d76e59` has moved
the `thrash-old-clients` tests back to the rados suite.
This commit fixes the `release-checklists.rst` accordingly.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2022-09-20 14:29:47 +00:00
Sage Weil
39da18b31b qa/workunits/mon/auth_key_rotation.sh: exercise pending key / rotation
Signed-off-by: Sage Weil <sage@newdream.net>
2022-09-12 17:02:59 +00:00
Matan Breizman
5db85e6f45 qa/suites: Reduce rados_python time out
This test runs for a few minutes,
reducing timeout from 3h to 1h to avoid hanging jobs.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2022-08-17 12:05:20 +00:00
Laura Flores
40062676c2 qa/suites/rados/thrash-erasure-code-big/thrashers: add osd max backfills setting to mapgap and pggrow
All `rados/thrash-erasure-code-big` tests that die due to the “wait_for_recovery” timeout have one thing in common: They contain either `thrashers/pggrow` or `thrashers/mapgap`.

The difference between pggrow and mapgap vs. all other non-offending thrashers (default, careful, fastread, and morepggrow) is that they lack an override setting for `osd max backfills`. `osd max backfills` is the max number of backfill operations allowed to/from an OSD. The higher the number, the quicker the recovery. By default, this value is 1. On all of the non-offending thrashers (default, careful, fastread, and morepggrow), the default 1 value gets overridden in their .yaml files with a value > 1. This is not the case for pggrow and mapgap, however, as they lack an `osd max backfills` override setting.

The mclock op scheduler is known to override `osd max backfills` with a high value, but all of the thrash-erasure-code-big thrashers have their op queue set to “debug_random”, which chooses randomly between op queues (the debug_random op queue is set to override the default mclock_scheduler in qa/config/rados.yaml). So, coupled with the “debug_random” op queue, the low `osd max backfills` setting is causing some tests to time out in recovery.

WITHOUT `osd max backfills`, as they are now, “mapgap” and “pggrow” tests die due to timed-out recovery about 17/100 times, as seen here with a pggrow test: http://pulpito.front.sepia.ceph.com/lflores-2022-05-18_14:24:29-rados:thrash-erasure-code-big-master-distro-default-smithi/

WITH `osd max backfills` specified, as I have suggested in this PR, 99/100 tests passed, with one test failing for a different reason:
http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_22:40:27-rados:thrash-erasure-code-big-master-distro-default-smithi/

I also scheduled 145 tests WITH `osd max backfills` that are a mix of pggrow and mapgap thrashers. 144/145 tests passed, with one test failing for a different reason. http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_15:27:54-rados:thrash-erasure-code-big-master-distro-default-smithi/
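
For a quick side-by-side of the three runs quoted above, the counts can be turned into percentages. This is plain arithmetic on the numbers in the message; the variable names are only for illustration.

```python
def rate(count, total):
    """Percentage of count out of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Counts taken from the runs described in the commit message.
timeout_rate_without = rate(17, 100)   # recovery timeouts without the setting
pass_rate_with = rate(99, 100)         # passes with the setting
pass_rate_with_mix = rate(144, 145)    # passes with the setting, pggrow + mapgap mix
```

In both runs with the setting, the single failure was attributed to a different cause, so no recovery timeouts were observed there.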

Fixes: https://tracker.ceph.com/issues/51076
Signed-off-by: Laura Flores <lflores@redhat.com>
2022-05-19 18:29:00 -05:00
Neha Ojha
8a8945e640
Merge pull request #44868 from neha-ojha/wip-move-to-stream
qa/distros: remove centos8

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2022-02-04 11:56:08 -08:00
Patrick Donnelly
1f714da814
qa: fix or add missing .qa links
Using this command:

    find qa/suites/ -type d -execdir ln -sfT ../.qa/ {}/.qa \;
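
For readers less familiar with `find -execdir`, a rough Python equivalent of the command is sketched below. The function name is hypothetical; the link name and relative target are taken from the command itself. Like `ln -sfT`, it replaces an existing link or file, and it will raise an error if `.qa` already exists as a real directory.

```python
import os

LINK_NAME = ".qa"
LINK_TARGET = "../.qa/"  # same relative target as in the find/ln command


def link_qa_dirs(root):
    """Create or replace a .qa symlink in root and in every real directory below it."""
    created = []
    for dirpath, dirnames, _filenames in os.walk(root):
        # Like `find -type d`, do not treat symlinked directories as directories.
        dirnames[:] = [
            d for d in dirnames if not os.path.islink(os.path.join(dirpath, d))
        ]
        link = os.path.join(dirpath, LINK_NAME)
        if os.path.islink(link) or os.path.isfile(link):
            os.remove(link)  # mirrors `ln -f`: replace an existing link or file
        os.symlink(LINK_TARGET, link)
        created.append(link)
    return created
```

Running it twice on the same tree gives the same result, since existing links are replaced rather than duplicated.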

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2022-02-03 10:08:30 -05:00
Neha Ojha
8ca5729d21 qa/suites/rados/thrash-old-clients: remove centos_8.3_container_tools_3.0
Signed-off-by: Neha Ojha <nojha@redhat.com>
2022-02-02 23:26:54 +00:00
Neha Ojha
f849f1554c qa/suites/rados: reduce the number of cephadm tests
Currently, every rados run of ~400 jobs is running ~150 cephadm tests,
which is unnecessary and redundant. With this change, we will run some
basic cephadm tests within the rados suite. The following seems to be
a good start.

qa/suites/rados/cephadm/osds
qa/suites/rados/cephadm/smoke
qa/suites/rados/cephadm/smoke-singlehost
qa/suites/rados/cephadm/workunits

Signed-off-by: Neha Ojha <nojha@redhat.com>
2022-01-21 23:38:53 +00:00
Pere Diaz Bou
15dfa71cf7 mgr: TTLCache basic implementation
Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
Fixes: https://tracker.ceph.com/issues/48388
2022-01-05 10:11:58 +01:00
Kamoltat
bb42c71e7e qa: Added workunit test for noautoscale flag
Set and unset the noautoscale flag and
check that the results are what we
expect. Also check that the flag is
correct when new pools are
created.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2021-12-22 21:42:28 +00:00
Kamoltat
c194f4a3eb qa/workunits/mon/pg_autoscaler: modified test script
Modified test script to include `bulk` and
remove all `profile` options.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2021-12-20 21:46:37 +00:00
Sage Weil
b430fd538f qa/suites/rados/thrash-old-clients: use better-support cephadm distro/podman
Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-30 10:47:53 -06:00
Ernesto Puerta
515af762bb
Merge pull request #43987 from rhcs-dashboard/53123-dashboard-nfs-cleanup
mgr/dashboard: NFS non-existent files cleanup

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: ljflores <NOT@FOUND>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
2021-11-19 20:40:41 +01:00
Sage Weil
411b2d39c2 qa/suites/rados/dashboard: use single-container-host.yaml
Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-17 09:02:42 -06:00
Alfonso Martínez
045d2d0f76 mgr/dashboard: NFS non-existent files cleanup
After https://github.com/ceph/ceph/pull/42526 and https://github.com/ceph/ceph/pull/43725 merges,
the following files do not exist but there were still references to them:
- src/pybind/mgr/dashboard/services/ganesha.py
- qa/tasks/mgr/dashboard/test_ganesha.py

The following files were renamed but there were still references to old names:
- src/pybind/mgr/dashboard/controllers/nfsganesha.py:  nfsganesha.py --> nfs.py
- src/pybind/mgr/dashboard/tests/test_ganesha.py:  test_ganesha.py --> test_nfs.py

Other changes in qa/suites/rados/dashboard/tasks/dashboard.yaml:
- Add missing task: tasks.mgr.dashboard.test_api
- Sort dashboard tasks alphabetically.

Fixes: https://tracker.ceph.com/issues/53123
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
2021-11-17 13:25:17 +01:00
Sebastian Wagner
116a8c4208
qa/suites/rados/mgr: use only one objectstore instead of all
I think we have enough coverage. Always testing all
objectstores is a bit excessive in my opinion

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
2021-10-28 15:01:29 +02:00
Kefu Chai
70b049ffdb
Merge pull request #43239 from trociny/wip-48959
osd: handle inconsistent hash info during backfill and deep scrub gracefully

Reviewed-by: Samuel Just <sjust@redhat.com>
2021-10-14 22:43:16 +08:00
Ernesto Puerta
90bbcab09f
Merge pull request #42557 from ceph/feature-50336-cluster-creation-wizard
mgr/dashboard: Cluster Creation/Expansion Wizard

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: sebastian-philipp <NOT@FOUND>
Reviewed-by: Volker Theile <vtheile@suse.com>
2021-10-14 15:12:42 +02:00
Nizamudeen A
59cbf97e6c mgr/dashboard: Cluster Creation Add Host Section and e2es
Add host section of the cluster creation workflow.

1. Fix bug in the modal where going forward one step on the wizard and coming back opens up the add host modal.
2. Rename Create Cluster to Expand Cluster as per the discussions
3. A skip confirmation modal to warn the user when they try to skip the
   cluster creation
4. Adapted all the tests
5. Did some UI improvements like fixing and aligning the styles,
   colors..
- Used routed modal for host Addition form
- Renamed the Create to Add in Host Form

Fixes: https://tracker.ceph.com/issues/51517
Fixes: https://tracker.ceph.com/issues/51640
Fixes: https://tracker.ceph.com/issues/50336
Fixes: https://tracker.ceph.com/issues/50565
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
Signed-off-by: Nizamudeen A <nia@redhat.com>
2021-10-13 15:55:23 +05:30
Zack Cerza
b57539dc94 Revert "qa: support isal ec test for aarch64"
This commit has been causing scheduled jobs to request e.g. aarch64
smithi machines, which don't exist. The dispatcher then tries to find
them forever, requiring the dispatcher to be killed and restarted. The
queue will sit idle until someone notices the problem.

Signed-off-by: Zack Cerza <zack@redhat.com>
2021-10-12 12:53:58 -06:00
Dai Zhiwei
eaa385f3da qa: support isal ec test for aarch64
modified: qa/standalone/erasure-code/test-erasure-code-plugins.sh
new file: qa/suites/rados/thrash-erasure-code-isa/arch/aarch64.yaml

Signed-off-by: Dai Zhiwei <daizhiwei3@huawei.com>
2021-10-08 14:37:25 +08:00
Kefu Chai
958b22e3ab
Merge pull request #43335 from liewegas/debug-51815
mon,auth: fix proposal (and mon db rebuild) of rotating secrets

Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-10-07 06:45:45 +08:00
Neha Ojha
363b223844
Merge pull request #42964 from trociny/wip-52448
osd: re-cache peer_bytes on every peering state activate

Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-10-06 09:26:16 -07:00
Sage Weil
eddfbbc421 qa/suites/rados/singleton/rebuild-mon-db: debug auth 30
Hunting https://tracker.ceph.com/issues/51815

Signed-off-by: Sage Weil <sage@newdream.net>
2021-10-01 14:42:23 -04:00
Mykola Golub
d35920da5e qa/suites/rados: add inconsistent hinfo test
Signed-off-by: Mykola Golub <mgolub@suse.com>
2021-09-28 16:43:02 +01:00
Sage Weil
0b361fc8b9 qa/packages: install ceph-volume
Signed-off-by: Sage Weil <sage@newdream.net>
2021-09-19 21:51:19 -04:00
Mykola Golub
76743e0058 qa/suites/rados: add backfill_toofull test
Signed-off-by: Mykola Golub <mgolub@suse.com>
2021-09-15 17:21:11 +03:00
Sridhar Seshasayee
7dcede75df qa: Use osd_op_queue=wpq for tests using filestore backend.
Force a subset of tests that explicitly employ the filestore backend to
use WPQ scheduler. This is because mclock scheduler will not be
optimized for filestore.
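
The selection rule can be written down in a few lines. This is a sketch with a hypothetical helper name; the `osd_op_queue` option and the `wpq` value come from the commit title, and tests on other backends are assumed to keep whatever op queue they already use.

```python
def op_queue_override(objectstore):
    """Return the OSD conf override for a test using the given objectstore.

    Filestore tests are forced to the WPQ scheduler; other backends get no
    override from this helper.
    """
    if objectstore == "filestore":
        return {"osd_op_queue": "wpq"}
    return {}
```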

Fixes: https://tracker.ceph.com/issues/52025
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-09-02 18:15:54 +05:30
Mykola Golub
7311f6656f qa/suites/rados: add crushdiff test
Signed-off-by: Mykola Golub <mykola.golub@clyso.com>
2021-08-27 17:45:40 +03:00
Sebastian Wagner
e436483c77
qa/distro: Add centos_8.2_container_tools_3.0.yaml
Let's avoid latest kubic stable

Fixes: https://tracker.ceph.com/issues/52279
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
2021-08-20 10:53:11 +02:00
Neha Ojha
119544bb29 qa/suites/rados/perf/ceph.yaml: remove rgw
The rgw task is no longer required because we removed cosbench workloads in
fd350fd015. Removing it also prevents failures like the following, as well
as failures from any other changes that break the rgw task:

```
2021-08-06T20:13:25.812 INFO:teuthology.orchestra.run.smithi060.stderr:curl: (7) Failed to connect to smithi060.front.sepia.ceph.com port 80: Connection refused
2021-08-06T20:15:33.813 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_04c2febe7099917d97a71271f17abb5710030132/teuthology/contextutil.py", line 31, in nested
    vars.append(enter())
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_3c0f8c8164075af7aac4d1f2805d3f4580709461/qa/tasks/rgw.py", line 191, in start_rgw
    wait_for_radosgw(url, remote)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_3c0f8c8164075af7aac4d1f2805d3f4580709461/qa/tasks/util/rgw.py", line 94, in wait_for_radosgw
    assert exit_status == 0
AssertionError
```

Signed-off-by: Neha Ojha <nojha@redhat.com>
2021-08-09 15:08:11 +00:00
Neha Ojha
c9f8846b7f
Merge pull request #41907 from kamoltat/wip-ksirivad-progress-time-interval
pybind/mgr/progress: introduce 5 second sleep interval

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2021-07-21 16:53:38 -07:00
Volker Theile
f7f163e75c mgr/dashboard: Add configurable MOTD or wall notification
Fixes: https://tracker.ceph.com/issues/51408

Signed-off-by: Volker Theile <vtheile@suse.com>
2021-07-14 10:48:49 +02:00
Kamoltat
5f33f2f6e0 mgr/test_progress.py: Delay recover in test_progress
Changes some of the tests in teuthology to make
them more deterministic.
Using:

`ceph osd set norecover` and
`ceph osd set nobackfill` when marking osds in
or out. This will delay the recovery and make
sure the test cases get the chance to check
that there are actually events popping up in
the progress module.
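
One way to picture the pattern is a context manager that sets both flags, lets the test act, and then clears them. This is a sketch only, not the code in test_progress.py: the helper names are hypothetical, and the command runner is injectable so the logic can be exercised without a cluster.

```python
import subprocess
from contextlib import contextmanager

FLAGS = ("norecover", "nobackfill")  # flags named in the commit message


def run_ceph(args):
    """Run a ceph CLI command and raise if it fails."""
    subprocess.run(["ceph", *args], check=True)


@contextmanager
def recovery_paused(run=run_ceph):
    """Set norecover/nobackfill for the duration of the block, then unset them."""
    try:
        for flag in FLAGS:
            run(["osd", "set", flag])
        yield
    finally:
        # Always clear the flags so recovery can proceed afterwards.
        for flag in FLAGS:
            run(["osd", "unset", flag])
```

A test would mark an OSD in or out inside `with recovery_paused():`, check that the progress event exists, and let recovery resume on exit.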

Took out test_osd_cannot_recover from
tasks/mgr/test_progress.py since it is no longer
a relevant test case: recovery will get
triggered regardless of whether the pg is unmoved.

Ignoring `OSDMAP_FLAGS` in teuthology
because we are using norecover and nobackfill
to delay the recovery process; setting these
flags creates a health warning that would
otherwise fail the teuthology test.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2021-07-13 19:33:20 +00:00
Kefu Chai
15fa32dc86 qa: run e2e test on centos only
this change is a follow up of 02b8b0f490,
which failed to remove the random facet for distro.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-02 23:06:27 +08:00
Kefu Chai
e5c9315b11
Merge pull request #42084 from tchaikov/wip-49638
qa: run e2e test on centos only

Reviewed-by: Laura Paduano <lpaduano@suse.com>
2021-06-30 19:26:42 +08:00
Kefu Chai
812e58c597
Merge pull request #42013 from ronen-fr/wip-ronenf-scrubs-config
qa/suites/rados: add simultaneous scrubs to the thrasher

Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-06-29 16:21:52 +08:00
Kefu Chai
02b8b0f490 qa: run e2e test on centos only
it's a regression introduced by the restructure of the test suites,
let's pin the test to CentOS8.

See-also: https://tracker.ceph.com/issues/49638
Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-29 13:09:53 +08:00
Kefu Chai
29064f1bf8
Merge pull request #41937 from liewegas/mgr-crash
mgr: generate crash dumps for Python exceptions in mgr modules

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-06-26 22:18:14 +08:00
Sage Weil
3edc04a46b qa/suites/rados/mgr: whitelist module crash during selftest
One of the selftests triggers an exception from serve().

Signed-off-by: Sage Weil <sage@newdream.net>
2021-06-25 13:48:45 -04:00
Ronen Friedman
d232c4e8d8 qa/suites/rados: add simultaneous scrubs (multiple options) to the thrasher
Setting osd-max-scrubs to either 2 or 3.

Triggered by https://tracker.ceph.com/issues/50346

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2021-06-24 18:53:50 +03:00
Sage Weil
fe9963b03c qa/suites/rados/dashboard: fix e2e test
Move roles into task yaml.  Rename e2e.

Fixes: https://tracker.ceph.com/issues/51292
Signed-off-by: Sage Weil <sage@newdream.net>
2021-06-23 09:54:40 -05:00
Sage Weil
9074e87611 Merge PR #41827 into master
* refs/pull/41827/head:
	qa: move dashboard e2e from cephadm -> rados suite

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
2021-06-14 09:11:04 -04:00
Sage Weil
ac05b3568f qa: move dashboard e2e from cephadm -> rados suite
This test fails ~20% of the time.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-06-12 07:52:54 -05:00
Patrick Donnelly
d6c66f3fa6
qa,pybind/mgr: allow disabling .mgr pool
This is mostly for testing: a lot of tests assume that there are no
existing pools. These tests relied on a config to turn off creating the
"device_health_metrics" pool which generally exists for any new Ceph
cluster. It would be better to make these tests tolerant of the new .mgr
pool but clearly there's a lot of these. So just convert the config to
make it work.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-11 19:35:17 -07:00
Radoslaw Zarzynski
cec7c15f19 qa: use dump_metrics as alternative of get_heap_property
"get_heap_property *" asock commands are exposed to operators
to check the tcmalloc internals for understanding the performance
of the memory subsystem. but crimson uses the builtin seastar allocator
which is not backed by tcmalloc. but we can dump the metrics using
the "dump_metrics" asock command which is only available from
crimson-osd.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-03 14:24:23 +08:00