Commit Graph

11292 Commits

Author SHA1 Message Date
Casey Bodley
4ee8e591f3
Merge pull request #56597 from liangmingyuanneo/optimize-reshard
rgw reshard: optimize reshard process to minimum blocking time

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2024-09-05 13:21:47 -04:00
Laura Flores
c2456be1ff
Merge pull request #59474 from athanatos/sjust/for-review/wip-67755-fix-msr-feature
osd: fix require_min_compat_client handling for msr rules
2024-09-04 20:03:28 -05:00
Nizamudeen A
93ba7b05d0
Merge pull request #59530 from rhcs-dashboard/api_test_mgr_module_failure
qa/tests: fix test_list_enabled_modules timeout error

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
2024-09-04 19:27:52 +05:30
Ilya Dryomov
f7168600a8
Merge pull request #59551 from idryomov/wip-67845
librbd/migration: prune snapshot extents in RawFormat::list_snaps()

Reviewed-by: Ramana Raja <rraja@redhat.com>
2024-09-04 13:03:04 +02:00
Nizamudeen A
b2da7394ee qa/tests: fix test_list_enabled_modules timeout error
This test enables and disables mgr modules. My assumption is that after
enabling a module, the test waits for an active mgr but cannot find one
in time, so it fails. Taking inspiration from 6c7253be6f, add retries
and logging to see whether that is the case.
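
The retry-with-logging idea above can be sketched as a small polling loop. This is a minimal illustration, not the actual dashboard test code. `wait_for_active_mgr` and the `get_active_mgr` callable are hypothetical names, and the retry count and delay are arbitrary assumptions.

```python
import time

def wait_for_active_mgr(get_active_mgr, retries=5, delay=2.0, log=print):
    """Poll get_active_mgr() until it returns a name, logging each attempt.

    get_active_mgr is a hypothetical callable that returns the active
    mgr's name, or None while no mgr is active yet.
    """
    for attempt in range(1, retries + 1):
        active = get_active_mgr()
        if active:
            log(f"attempt {attempt}/{retries}: active mgr is {active}")
            return active
        log(f"attempt {attempt}/{retries}: no active mgr yet")
        if attempt < retries:
            time.sleep(delay)  # give the mgr failover time to finish
    raise TimeoutError(f"no active mgr after {retries} attempts")


# Simulate a mgr that only becomes active on the third poll.
polls = iter([None, None, "mgr.x"])
print(wait_for_active_mgr(lambda: next(polls), delay=0))
```

The per-attempt log lines are what would confirm or refute the "no active mgr in time" hypothesis when a run fails.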

Fixes: https://tracker.ceph.com/issues/62972
Signed-off-by: Nizamudeen A <nia@redhat.com>
2024-09-04 14:51:07 +05:30
liangmingyuan
196a73cbd4 cls/rgw: add a helper function for calls to cls_cxx_map_remove_key()
Also add some test cases and do some cleanup.

Signed-off-by: Mingyuan Liang <liangmingyuan@baidu.com>
2024-09-04 09:49:18 +08:00
Casey Bodley
a10155d43e
Merge pull request #59535 from cbodley/wip-qa-rgw-multisite-account-zone
qa/rgw/multisite: specify realm/zonegroup/zone args for 'account create'

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
2024-09-03 20:43:34 -04:00
Vallari Agrawal
53cc78b222
Merge pull request #59178 from VallariAg/wip-nvmeof-teuthology-v6
qa: add namespace and scale testing for nvmeof teuthology suite
2024-09-03 11:29:01 +05:30
Vallari Agrawal
da8e95c392
qa/suites/nvmeof: wait for service "nvmeof.mypool.mygroup0"
This is because nvmeof gateway group names are now
part of the service id.
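
The service name format can be sketched as a one-line helper. The format (service type, pool, and gateway group joined by dots) is inferred from the example name "nvmeof.mypool.mygroup0" in the subject line. `nvmeof_service_name` is a hypothetical helper, and the real naming logic in cephadm may differ.

```python
def nvmeof_service_name(pool: str, group: str) -> str:
    """Build an nvmeof service name, assuming the '<type>.<pool>.<group>'
    format implied by "nvmeof.mypool.mygroup0"."""
    return f"nvmeof.{pool}.{group}"


print(nvmeof_service_name("mypool", "mygroup0"))  # nvmeof.mypool.mygroup0
```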

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
2024-09-02 19:42:34 +05:30
Vallari Agrawal
4d97b1aa6b
qa/suites/nvmeof: increase hosts in cluster setup
In the "nvmeof" task, change the "client" config to "installer",
which accepts inputs like "host.a".

nvmeof/basic: change 2-gateway-2-initiator to
	       4-gateway-2-initiator cluster
nvmeof/thrash: change 3-gateway-1-initiator to
	        4-gateway-1-initiator cluster

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
2024-09-02 19:42:12 +05:30
Vallari Agrawal
2ed818ebd8
qa: move nvmeof shell scripts to qa/workunits/nvmeof
Move all qa/workunits/rbd/nvmeof_*.sh scripts
to qa/workunits/nvmeof/*.sh

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
2024-09-02 17:04:55 +05:30
Venky Shankar
f070510eb3
Merge pull request #58543 from rishabh-d-dave/tracker-65808
cephfs: disallow removing root_squash via "fs authorize" cmd

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2024-09-02 15:13:34 +05:30
Ilya Dryomov
d9192b5aca librbd/migration: prune snapshot extents in RawFormat::list_snaps()
list-snaps is exempt from clipping in ImageDispatcher::PreprocessVisitor
because it's considered to be an internal API.  Further, reads issued
by ObjectCopyRequest based on list-snaps results may also be exempt
because of READ_FLAG_DISABLE_CLIPPING.

Since RawFormat allows specifying a set of snapshots (possibly of
varying size!) to be imported, it needs to compensate for that in its
list-snaps implementation.  Otherwise, an out-of-bounds read will
eventually be submitted to the stream.
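
The compensation described above amounts to clipping each snapshot's extents to that snapshot's size before anything is read. The following is a minimal Python sketch of that clipping under stated assumptions. The real fix is C++ in librbd's RawFormat::list_snaps(), and `prune_snapshot_extents` and its dict-based inputs are hypothetical.

```python
def prune_snapshot_extents(snap_extents, snap_sizes):
    """Clip each snapshot's (offset, length) extents to that snapshot's size.

    snap_extents: {snap_id: [(offset, length), ...]}   (hypothetical shape)
    snap_sizes:   {snap_id: size_in_bytes}
    Extents starting at or beyond the size are dropped; extents that
    straddle the size are truncated, so no read goes out of bounds.
    """
    pruned = {}
    for snap_id, extents in snap_extents.items():
        size = snap_sizes[snap_id]
        kept = []
        for offset, length in extents:
            if offset >= size:
                continue  # entirely past the end of this snapshot
            kept.append((offset, min(offset + length, size) - offset))
        pruned[snap_id] = kept
    return pruned


# Two snapshots of different sizes sharing the same candidate extents.
extents = [(0, 4096), (4096, 8192), (16384, 4096)]
print(prune_snapshot_extents({1: extents, 2: extents}, {1: 8192, 2: 16384}))
```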

Fixes: https://tracker.ceph.com/issues/67845
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-09-01 19:37:30 +02:00
Casey Bodley
15df7efca5 qa/rgw/multisite: add optional --default arg to 'realm pull'
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2024-08-30 13:52:01 -04:00
Casey Bodley
7bbaa31664 qa/rgw/multisite: fix spelling of is_default in realm configs
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2024-08-30 11:32:38 -04:00
Casey Bodley
e4157c8e98 qa/rgw/multisite: specify realm/zonegroup/zone args for 'account create'
in the rgw/multisite suite, jobs fail on user creation:

> radosgw-admin --cluster c1 account create --account-id RGW11111111111111111
> radosgw-admin --cluster c1 user create --uid rgw-multisite-test-user --account-id RGW11111111111111111 --account-root --rgw-zone test-zone1 --rgw-zonegroup test-zonegroup --rgw-realm test-realm --display-name TestUser --gen-access-key --gen-secret
> could not create user: unable to create user, Failed to load account by id

realms/two-zones.yaml misspells `is_default` as `is default` for the
realm, so it doesn't get set as default. the `account create` command
doesn't specify a realm/zonegroup/zone, so it operates on the "default"
zone and zonegroup.

use `zone_args()` to add the explicit realm/zonegroup/zone arguments
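
The fix can be sketched as appending explicit realm/zonegroup/zone flags to the command. The flags `--rgw-realm`, `--rgw-zonegroup`, and `--rgw-zone` are the ones already used in the `user create` command quoted above. `zone_args` here is a hypothetical stand-in whose signature may not match the real qa helper, and `account_create_cmd` is purely illustrative.

```python
def zone_args(realm: str, zonegroup: str, zone: str) -> list[str]:
    """Hypothetical stand-in for the qa helper: explicit zone flags."""
    return ["--rgw-realm", realm, "--rgw-zonegroup", zonegroup, "--rgw-zone", zone]


def account_create_cmd(cluster, account_id, realm, zonegroup, zone):
    cmd = ["radosgw-admin", "--cluster", cluster, "account", "create",
           "--account-id", account_id]
    # Without these flags, the command acts on the "default" zone/zonegroup.
    return cmd + zone_args(realm, zonegroup, zone)


print(" ".join(account_create_cmd("c1", "RGW11111111111111111",
                                  "test-realm", "test-zonegroup", "test-zone1")))
```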

Fixes: https://tracker.ceph.com/issues/67839

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2024-08-30 11:32:29 -04:00
Venky Shankar
1650722139 Merge PR #59309 into main
* refs/pull/59309/head:
	qa: ignore warnings variations

Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
2024-08-30 17:53:31 +05:30
Venky Shankar
52deba6b14 Merge PR #58547 into main
* refs/pull/58547/head:
	qa: failfast mount for better performance

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2024-08-30 11:03:26 +05:30
Samuel Just
4f9289e11a qa/tasks/ceph_manager: set-require-min-compat-client to squid for msr profiles
Signed-off-by: Samuel Just <sjust@redhat.com>
2024-08-30 00:34:46 +00:00
Patrick Donnelly
0a05dacc07
Merge PR #59310 into main
* refs/pull/59310/head:
	qa: load all dirfrags before testing altname recovery

Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
2024-08-29 13:39:56 -04:00
Adam King
f597caacea
Merge pull request #59419 from phlogistonjohn/jjm-smb-ctdb-vips
smb: cluster public ip addresses support

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Anoop C S <anoopcs@cryptolab.net>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2024-08-28 08:45:43 -04:00
Vallari Agrawal
e5a9cda326
qa/suites/nvmeof/basic: add nvmeof_scalability test
Add a test that scales nvmeof
gateways up and down.

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
2024-08-28 18:12:29 +05:30
Vallari Agrawal
58d8be9fd8
qa: Expand nvmeof thrasher and add nvmeof_namespaces.yaml job
1. qa/tasks/nvmeof.py: add other methods to stop nvmeof daemons
2. add qa/workunits/rbd/nvmeof_namespace_test.sh, which adds and
   deletes new namespaces. It runs in the nvmeof_namespaces.yaml
   job while fio runs against other namespaces in the background.

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
2024-08-28 18:12:28 +05:30
Vallari Agrawal
02fe44ac60
Merge pull request #59434 from VallariAg/fix-nvmeof-apply-teuthology
qa/tasks/nvmeof.py: add nvmeof gw-group to deployment
2024-08-28 18:07:35 +05:30
John Mulligan
dc09d17eca qa/suites/orch: add test for smb with ctdb and cluster public ips
Signed-off-by: John Mulligan <jmulligan@redhat.com>
2024-08-27 17:12:56 -04:00
Patrick Donnelly
782c88aa96
qa: ignore warnings variations
Fixes: https://tracker.ceph.com/issues/67601
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-08-27 16:51:52 -04:00
Adam King
601fcfa918
Merge pull request #58380 from adk3798/squid-base-mds-upgrade-sequence-cephadm
qa/suites/fs: pull compiled cephadm for squid branch in mds_upgrade_sequence

Reviewed-by: John Mulligan <jmulligan@redhat.com>
2024-08-27 13:33:15 -04:00
Adam King
639916859f
Merge pull request #59421 from phlogistonjohn/jjm-teuth-cephadm-from-ctr
qa/tasks: add a new cephadm_from_container feature to cephadm task

Reviewed-by: Adam King <adking@redhat.com>
2024-08-27 13:32:43 -04:00
Patrick Donnelly
64e2bd347b
Merge PR #58419 into main
* refs/pull/58419/head:
	mds: generate correct path for unlinked snapped files
	qa: add test for cephx path check on unlinked snapped dir tree
	mds: add debugging for stray_prior_path

Reviewed-by: Milind Changire <mchangir@redhat.com>
2024-08-27 13:10:54 -04:00
Patrick Donnelly
925c1f9fb1
Merge PR #58987 into main
* refs/pull/58987/head:
	qa/cephfs: update ignorelist

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2024-08-27 13:10:10 -04:00
Patrick Donnelly
305235f11e
Merge PR #59095 into main
* refs/pull/59095/head:
	qa: wait for file creation before changing mode

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2024-08-27 13:09:11 -04:00
Adam King
418b53c1a3
Merge pull request #59409 from adk3798/teuth-reinstall-nvme-cli
qa/distros: reinstall nvme-cli on centos 9 nodes

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>
2024-08-27 08:48:26 -04:00
Matan Breizman
b034233666
Merge pull request #57952 from NitzanMordhai/wip-nitzan-bench-osd-admin-command
crimson: Add support for bench osd command

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
2024-08-27 13:03:02 +03:00
Ilya Dryomov
2b4a221c57
Merge pull request #59433 from idryomov/wip-drop-xmlstarlet-variable
qa: drop XMLSTARLET variable, use xmlstarlet directly

Reviewed-by: Ramana Raja <rraja@redhat.com>
2024-08-27 08:53:38 +02:00
Venky Shankar
409001969e
Merge pull request #54620 from rishabh-d-dave/mgr-vol-clone-stats
mgr/vol: show progress and stats for the subvolume snapshot clones

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2024-08-26 15:44:53 +05:30
Vallari Agrawal
c9a6fedbfa
qa/tasks/nvmeof.py: add nvmeof gw-group to deployment
https://github.com/ceph/ceph/pull/58860 made the group a required
parameter of `ceph orch apply nvmeof <pool> <group>`. That broke the
`nvmeof` suite, so this PR fixes it.

Right now, all gateways are deployed in a single group.
Later, this will be changed to use multiple groups for better testing.

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
2024-08-26 15:15:10 +05:30
Nitzan Mordechai
ee84f8970a crimson: Add support for bench osd command
This commit adds support for the 'bench' admin command in the OSD,
allowing administrators to perform benchmark tests on the OSD. The
'bench' command accepts 4 optional parameters with the following
default values:

1. count - Total number of bytes to write (default: 1GB).
2. size - Block size for each write operation (default: 4MB).
3. object_size - Size of each object to write (default: 0).
4. object_num - Number of objects to write (default: 0).

The results of the benchmark are returned in a JSON formatted output,
which includes the following fields:

1. bytes_written - Total number of bytes written during the benchmark.
2. blocksize - Block size used for each write operation.
3. elapsed_sec - Total time taken to complete the benchmark in seconds.
4. bytes_per_sec - Write throughput in bytes per second.
5. iops - Number of input/output operations per second.
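
The two derived fields can be reproduced from the other three. The sketch below assumes bytes_per_sec = bytes_written / elapsed_sec and iops = bytes_per_sec / blocksize, which agrees with the numbers in the example output (1073741824 / 0.5 = 2147483648, and 2147483648 / 4194304 = 512). It is not taken from the crimson implementation, and `bench_results` is a hypothetical helper.

```python
import json

def bench_results(bytes_written: int, blocksize: int, elapsed_sec: float) -> dict:
    """Derive throughput and IOPS, assuming each write is one block."""
    bytes_per_sec = bytes_written / elapsed_sec
    iops = bytes_per_sec / blocksize  # writes per second
    return {"osd_bench_results": {
        "bytes_written": bytes_written,
        "blocksize": blocksize,
        "elapsed_sec": elapsed_sec,
        "bytes_per_sec": bytes_per_sec,
        "iops": iops,
    }}


# Default count (1GB, i.e. 1073741824 bytes) and size (4MB, i.e. 4194304 bytes).
print(json.dumps(bench_results(1 << 30, 4 << 20, 0.5), indent=2))
```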

Example JSON output:

```json
{
  "osd_bench_results": {
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 0.5,
    "bytes_per_sec": 2147483648,
    "iops": 512
  }
}
```

Fixes: https://tracker.ceph.com/issues/66380
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
2024-08-26 05:43:17 +00:00
Ronen Friedman
d3a1626108
Merge pull request #58858 from ronen-fr/wip-rf-entry
osd/scrub: a scrub queue of level-specific entries

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Nitzan Mordechai <nmordech@redhat.com>
2024-08-25 19:44:03 +03:00
Ronen Friedman
dffbdf45ae test/osd/scrub: fix searched-for log string
To match the modified log message in
OsdScrub::restrictions_on_scrubbing().

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2024-08-25 08:01:00 -05:00
Ronen Friedman
503ebee8f9 test/osd/scrub: disable tests for deleted scrub functionality
The scrub scheduler no longer "upgrades" shallow scrubs into
deep ones on error, so the tests that check this functionality
are no longer valid.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2024-08-25 08:01:00 -05:00
Ronen Friedman
51a593e7e2 osd/scrub: fix the conditions for auto-repair scrubs
The conditions for auto-repair scrubs should have been changed
when need_auto lost some of its setters.

Also fix the rescheduling of repair scrubs
when the last scrub ended with errors.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2024-08-25 08:01:00 -05:00
Ronen Friedman
709302478e qa/standalone/scrub: disable scrub_extended_sleep test
Disabling osd-scrub-test.sh::TEST_scrub_extended_sleep,
as the test is no longer valid (updated code no longer
produces the same logs or the same behavior).

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2024-08-25 08:01:00 -05:00
Ilya Dryomov
4f309603ca qa: drop XMLSTARLET variable, use xmlstarlet directly
The variable was added in commit 9b6b7c35d0 ("Handle
differently-named xmlstarlet binary for *suse") but this
compatibility business is long outdated:

  Mon Oct 13 08:52:37 UTC 2014 - toms@opensuse.org

  - SPEC file changes
    - Added link from /usr/bin/xml to /usr/bin/xmlstarlet as other
      distributions do the same
    - Did the same for the manpage

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-08-25 13:22:08 +02:00
John Mulligan
0baf2e4f19 qa/tasks: add a new cephadm_from_container feature to cephadm task
The cephadm_from_container option allows one to do a single container build
and then point teuthology at that image as the "single source of truth".
I find this extremely convenient when running teuthology locally and
I keep carrying this patch around - I figure having it upstream will
simplify my workflow. Maybe someday it'll benefit others too.

To use it I set up a yaml overrides file with the following content:
```yaml
overrides:
  cephadm:
    image: "quay.io/phlogistonjohn/ceph:dev"
    cephadm_from_container: true
  verify_ceph_hash: false
verify_ceph_hash: false
```

This lets me test my custom builds fairly easily!

Signed-off-by: John Mulligan <phlogistonjohn@asynchrono.us>
2024-08-23 14:35:55 -04:00
Venky Shankar
db4959e44f Merge PR #58487 into main
* refs/pull/58487/head:
	qa/suites/fs/workload: drop mgrmodules stanza
	qa/tasks/ceph: fix "ceph mgr module enable" command

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
2024-08-23 22:02:34 +05:30
Adam King
bc103d8702
Merge pull request #59086 from phlogistonjohn/jjm-smb-ctdb-clustering
smb: ctdb clustering

Reviewed-by: Adam King <adking@redhat.com>
2024-08-23 09:06:33 -04:00
Milind Changire
daf4798086
qa: failfast mount for better performance
During teuthology tests, tearing down the cluster between two tests
causes the config to be reset and a config_notify to be generated. This
leads to a race to create a new mount using the old fscid, but by the
time the mount is attempted, the new fs has been created with a new
fscid. As a result, the client mount waits 5 minutes (the default
timeout) for a connection completion notification from the mds and
eventually gives up.
However, the default teuthology command timeout is 2 minutes, so
teuthology fails the command and declares the job failed well before
the mount can time out.

The resolution is to lower the client mount timeout to 30 seconds so
that the config_notify fails fast, paving the way for subsequent
commands to run against the new fs.
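
The timing argument above can be sketched by comparing the two timeouts. The values come from this message: 300 s is the default mount timeout, 120 s is the teuthology command timeout, and 30 s is the new mount timeout. `first_to_fire` is a hypothetical helper that assumes the mount never completes.

```python
TEUTHOLOGY_CMD_TIMEOUT = 120  # seconds, default teuthology command timeout
OLD_MOUNT_TIMEOUT = 300       # seconds, default client mount timeout (5 minutes)
NEW_MOUNT_TIMEOUT = 30        # seconds, lowered value from this change


def first_to_fire(mount_timeout: int, cmd_timeout: int = TEUTHOLOGY_CMD_TIMEOUT) -> str:
    """Return which timeout expires first when the mount never completes."""
    return "mount" if mount_timeout < cmd_timeout else "teuthology"


print(first_to_fire(OLD_MOUNT_TIMEOUT))  # teuthology: the job fails first
print(first_to_fire(NEW_MOUNT_TIMEOUT))  # mount: fails fast, later commands proceed
```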

An unhandled cluster warning about an unresponsive client is also
emitted later, during qa job termination, which leads teuthology to
declare the job failed. For now this warning seems harmless, since it
is emitted during the cluster cleanup phase.
So, this warning is added to the log-ignorelist section in the
snap-schedule YAML.

Fixes: https://tracker.ceph.com/issues/66009
Signed-off-by: Milind Changire <mchangir@redhat.com>
2024-08-23 15:06:13 +05:30
Ilya Dryomov
9ac05d9030
Merge pull request #44470 from orozery/rbd-external-migrate
librbd/migration: add external clusters support

Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
2024-08-23 10:20:47 +02:00
Adam King
4e5f269c01 qa/distros: reinstall nvme-cli on centos 9 nodes
To work around a potential linking issue between
nvme-cli and libnvme that prevents nvme-cli from
correctly generating a hostnqn, causing

nvme_fabrics: found same hostid edb4e426-766f-44c6-b127-da2a5b7446ef but different hostnqn hostnqn

messages in dmesg and the inability to set up nvme
loop devices

Fixes: https://tracker.ceph.com/issues/67684

Signed-off-by: Adam King <adking@redhat.com>
2024-08-22 15:30:44 -04:00
Ilya Dryomov
4a6800f146 qa/workunits/rbd: exercise snap_{name,id} parsing in test_import_native_format()
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-08-22 12:30:38 +02:00