Commit Graph

10646 Commits

Author SHA1 Message Date
Ramana Raja
b70160ac4d rbd-nbd: map using netlink interface by default
Mapping rbd images to nbd devices using ioctl interface is not
robust. It was discovered that the device size or the md5 checksum
of the nbd device was incorrect immediately after mapping using
ioctl method. When using the nbd netlink interface to map RBD images
the issue was not encountered. Switch to using nbd netlink interface
for mapping.

Fixes: https://tracker.ceph.com/issues/64063
Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit fcbf7367d2)

Conflicts:
	PendingReleaseNotes [ moved to >=18.2.5 section ]
2025-03-07 21:05:56 +01:00
Yuri Weinstein
2db28f8e95
Merge pull request #55431 from adk3798/reef-mcltf-true
reef: qa/tasks/cephadm: enable mon_cluster_log_to_file

Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2025-03-05 10:43:20 -08:00
NitzanMordhai
3f5efed5ed
Merge pull request #61434 from idryomov/wip-57864-reef
reef: qa/tasks: Include stderr on tasks badness check.
2025-03-05 18:53:57 +02:00
Casey Bodley
d3510e5b41 qa/rgw: avoid 'user rm' of keystone users
partial backport of 2390788b89 did not
include a nearby change from ff81a31ad6

Fixes: https://tracker.ceph.com/issues/70152

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2025-03-04 09:43:29 -05:00
SrinivasaBharathKanta
ebe369c04e
Merge pull request #56408 from batrick/wip-65082-reef
reef: mon: do not log MON_DOWN if monitor uptime is less than threshold
2025-03-04 04:14:45 +05:30
Ilya Dryomov
5555ae2b27 qa/workunits/rbd: add a test for force promote with a user snapshot
Add a reproducer for the crash on a bad variant access which was fixed
in commit 7d75161051 ("librbd: fix a crash in get_rollback_snap_id").

The reproducer deliberately works around many other issues with force
promote in snapshot-based mirroring: stopping rbd-mirror daemon
shouldn't be necessary (let alone with SIGKILL), get_rollback_snap_id()
and its caller can_create_primary_snapshot() are flawed and can pick
the wrong snapshot to roll back to or skip rollback when it's actually
required, the user snapshot in this scenario should be removed as part
of force promoting because it's incomplete and won't be usable after
the image is promoted, etc.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 0f4a37dd9f)

Conflicts:
	qa/workunits/rbd/rbd_mirror_journal.sh [ commits 3fd8a03887
	  ("qa/workunits/rbd: merge journal and snapshot test scripts")
	  and 3fdbc160bb ("rbd-mirror: allow mirroring to a different
	  namespace") not in reef ]
	qa/workunits/rbd/rbd_mirror_snapshot.sh [ duplicated/cloned for
	  snapshot-based mirroring ]
2025-02-28 20:47:57 +01:00
Patrick Donnelly
ad999dfb5e
Merge PR #57190 into reef
* refs/pull/57190/head:
	pybind/mgr/mgr_module: turn off all automatic transactions
	pybind/mgr: disable sqlite3/python autocommit
	qa/tasks/mgr: add tests for sqlite autocommit
	qa/tasks/vstart_runner: run daemons in foreground
	qa/tasks/vstart_runner: add missing poll method
	qa/suites/rados/mgr: add cli/devicehealth tasks
	qa: reorganize mgr unit tests
	qa: use position-independent link
	qa: add missing terminating newline
	pybind/mgr: add killpoint for sqlite3 database setup
	mgr: allow specifying module option level
	mon/MgrMonitor: promote standby when unsetting down flag
	mon/MgrMonitor: only drop active if exists

Reviewed-by: Laura Flores <lflores@redhat.com>
2025-02-27 19:29:26 -05:00
Yuri Weinstein
13daa36a0f
Merge pull request #61831 from idryomov/wip-69911-reef
reef: librbd: fix mirror image status summary in a namespace

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2025-02-26 07:59:50 -08:00
Patrick Donnelly
8aab45c653
qa/tasks/mgr: add tests for sqlite autocommit
Verify that autocommit is properly turned off and that commits via context
managers work as expected.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit fb82b6d35a)
2025-02-25 11:17:15 -05:00
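The behaviour that commit 8aab45c653 tests (commits happening through context managers rather than automatically) can be illustrated with a minimal sketch. It uses only Python's standard `sqlite3` module and an in-memory database. It is not the mgr module code, and the table name `kv` and the function `demo` are hypothetical.

```python
import sqlite3


def demo():
    """Show commit-on-success and rollback-on-error with `with conn:`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.commit()

    # The context manager commits when the block exits without an exception.
    with conn:
        conn.execute("INSERT INTO kv VALUES ('a', '1')")

    # If the block raises, the context manager rolls the transaction back.
    try:
        with conn:
            conn.execute("INSERT INTO kv VALUES ('b', '2')")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass

    keys = [row[0] for row in conn.execute("SELECT k FROM kv ORDER BY k")]
    conn.close()
    return keys
```

Only the row inserted in the block that completed normally should survive. The second insert is discarded by the rollback.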
Patrick Donnelly
97170e78e6
qa/tasks/vstart_runner: run daemons in foreground
This mirrors teuthology and makes it possible to check the exit status of a
daemon.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit e2e2144a56)
2025-02-25 11:17:15 -05:00
Patrick Donnelly
fd8ea7ba82
qa/tasks/vstart_runner: add missing poll method
Otherwise you cannot use LocalDaemon.check_status.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 9748d0c465)
2025-02-25 11:17:15 -05:00
Patrick Donnelly
ef70d3bb19
qa/suites/rados/mgr: add cli/devicehealth tasks
These should have been part of the commit adding the tests.

Fixes: 9ebcbdbed0
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 440f25e1ec)
2025-02-25 11:17:15 -05:00
Patrick Donnelly
49c0b76725
qa: reorganize mgr unit tests
Refactor common tasks and allow loading mgrmodules before unittests start.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 2f48dc9a00)
2025-02-25 11:17:15 -05:00
Patrick Donnelly
3a2d10bcf8
qa: use position-independent link
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 1749edd668)
2025-02-25 11:17:15 -05:00
Patrick Donnelly
7d8a3b290e
qa: add missing terminating newline
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 8ac4bbc682)
2025-02-25 11:17:15 -05:00
Patrick Donnelly
6cf495fbc4
mgr: allow specifying module option level
Some are for development purposes and should be filtered out by the dashboard.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 0d94eebb0d)
2025-02-25 11:17:14 -05:00
Yuri Weinstein
3d65c524a2
Merge pull request #61405 from cbodley/wip-69183-reef
reef: Revert "rgw/auth: Fix the return code returned by AuthStrategy,"

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
2025-02-24 12:13:51 -08:00
Adam King
ac33352c4f
Merge pull request #61917 from adk3798/wip-68647-reef
reef: qa/cephadm: wait a bit before checking rgw daemons upgraded w/ `ceph versions`

Reviewed-by: John Mulligan <jmulligan@redhat.com>
2025-02-24 11:47:20 -05:00
NitzanMordhai
0f835fa155
Merge pull request #61750 from NitzanMordhai/wip-69888-reef
reef: workunit/dencoder: dencoder test forward incompat fix
2025-02-24 13:58:51 +02:00
Adam King
c73f8c4ed5 qa/cephadm: wait a bit before checking rgw daemons upgraded w/ ceph versions
The `ceph versions` output seems to take a little while to be updated, so the
tests end up failing despite the rgw daemons actually being upgraded successfully.

Fixes: https://tracker.ceph.com/issues/67758

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit b9f63e1257)
2025-02-19 15:48:37 -05:00
Adam King
7791542538
Merge pull request #61718 from adk3798/wip-67463-reef
reef: qa/upgrade: fix checks to make sure upgrade is still in progress

Reviewed-by: John Mulligan <jmulligan@redhat.com>
2025-02-19 13:28:36 -05:00
Adam King
325e29ad9b
Merge pull request #61711 from adk3798/wip-66476-reef
reef: qa/suites: add "mon down" log variations to ignorelist

Reviewed-by: John Mulligan <jmulligan@redhat.com>
2025-02-19 11:46:21 -05:00
Adam King
cbb1a17054
Merge pull request #61027 from adk3798/wip-69186-reef
reef: qa/tasks/nvme_loop: update task to work with new nvme list format

Reviewed-by: Laura Flores <lflores@ibm.com>
2025-02-19 11:43:45 -05:00
Adam King
0228900530
Merge pull request #56714 from adk3798/reef-test-cephadm-correct-bootstrap-image
reef: qa/cephadm: use reef image as default for test_cephadm workunit

Reviewed-by: John Mulligan <jmulligan@redhat.com>
2025-02-19 11:40:16 -05:00
Ilya Dryomov
37f1145287 qa/workunits/rbd: use create_image_and_enable_mirror() in bootstrap tests
The reason create_image() + enable_mirror() happens to work for
PARENT_POOL is that PARENT_POOL is enabled for mirroring in image mode
unconditionally, unlike POOL, POOL/NS1 or PARENT_POOL/NS1 for which
MIRROR_POOL_MODE setting is respected.  This isn't immediately obvious
because it's done in setup_pools() in rbd_mirror_helpers.sh.

Switch to create_image_and_enable_mirror() for clarity.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 44804a374a)
2025-02-14 15:11:34 +01:00
Ilya Dryomov
490956786c librbd: fix mirror image status summary in a namespace
For the purposes of the summary with image counts, "rbd mirror pool
status" command is supposed to count each image only once.  To this
end, for unidirectional mirroring the status of the receiving site
should be taken while for bidirectional mirroring the statuses should
be combined/reduced.  For example, if mirroring is enabled on a single
image and everything is in order, the summary is expected to be

  image health: OK
  images: 1 total
      1 replaying

on both clusters even though on the primary the local status is
MIRROR_IMAGE_STATUS_STATE_STOPPED and only on the secondary it's
MIRROR_IMAGE_STATUS_STATE_REPLAYING.

Currently this isn't the case for custom namespaces.  In the same
scenario the primary ends up reporting

  image health: OK
  images: 1 total
      1 stopped

based solely on the local status in a namespace.

Fixes: https://tracker.ceph.com/issues/69911
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit f5eadfff80)

Conflicts:
	qa/workunits/rbd/rbd_mirror_bootstrap.sh [ commits 3fd8a03887
	  ("qa/workunits/rbd: merge journal and snapshot test scripts")
	  and 3fdbc160bb ("rbd-mirror: allow mirroring to a different
	  namespace") not in reef ]
2025-02-14 15:09:03 +01:00
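The counting rule described in commit 490956786c (each image counted once, with per-site statuses reduced to one) can be sketched roughly as follows. This is an illustration only, not the librbd implementation. The state strings, the function names and the reduction rule itself (any state other than "stopped" wins) are assumptions chosen to reproduce the example in the commit message.

```python
from collections import Counter


def reduce_image_state(site_states):
    """Collapse the per-site states of one image into a single state.

    Assumed rule (hypothetical): a state other than "stopped" reported by
    any site wins, so the receiving site's "replaying" overrides the
    primary's local "stopped".
    """
    active = [state for state in site_states if state != "stopped"]
    return active[0] if active else "stopped"


def summarize(images):
    """Count every image exactly once, using its reduced state."""
    counts = Counter(reduce_image_state(states) for states in images.values())
    return {"total": len(images), "states": dict(counts)}
```

With a single mirrored image whose primary reports "stopped" and whose secondary reports "replaying", `summarize({"img1": ["stopped", "replaying"]})` counts one image in the "replaying" state, matching the summary the commit message expects on both clusters.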
Hemanth
dfde102975
Merge pull request #59375 from rishabh-d-dave/mds-fs-fail-reef
reef: qa/cephfs: use different config options to generate MDS_TRIM
2025-02-13 19:47:21 +05:30
Nitzan Mordechai
a3420e3382 workunit/dencoder: fix corpus test for backward and forward compatibility
- changed the non-deterministic check: return code 1 is also legitimate
- unneeded check for is_dir, if it exists
- limit the number of threads to prevent errors

Fixes: https://tracker.ceph.com/issues/67263
Signed-off-by: NitzanMordhai <nmordech@redhat.com>
(cherry picked from commit 30921272dd)
2025-02-11 08:48:31 +00:00
nmordech@redhat.com
b1ec68cd1c suites: adding dencoder test multi versions
We are currently conducting regular ceph-dencoder tests for backward compatibility.
However, we are omitting tests for forward compatibility.
This suite will introduce tests against the ceph-objects-corpus to address forward
compatibility issues that may arise.
The script will install the N-2 version and run it against the latest corpus objects
that we have, then install the N-1 to N versions and check them as well.

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
(cherry picked from commit 3f26a965f6)
2025-02-11 08:48:23 +00:00
Patrick Donnelly
03ed227588
qa: extend mon timeout coming up after mondb creation
Fixes: https://tracker.ceph.com/issues/64968
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 57b9e161f4)
2025-02-10 15:43:13 -05:00
Patrick Donnelly
ed98069731
qa: update dashboard schema for mon_status
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 23de8e318f)
2025-02-10 15:43:13 -05:00
SrinivasaBharathKanta
a618b6e4df
Merge pull request #60004 from kamoltat/wip-68281-reef
reef: src/mon/ConnectionTracker.cc: Fix dump function
2025-02-10 16:09:01 +05:30
NitzanMordhai
54fbe86fd3
Merge pull request #59193 from NitzanMordhai/wip-67502-reef
reef: qa/tasks: watchdog should terminate thrasher
2025-02-09 10:07:28 +02:00
Ilya Dryomov
35316b126f
Merge pull request #61602 from idryomov/wip-69679-reef
reef: mon/OSDMonitor: relax cap enforcement for unmanaged snapshots

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2025-02-07 22:47:17 +01:00
Adam King
347f64545b qa/upgrade: use staggered upgrade features for reef-x/stress-split
This test was trying to partially upgrade the mons and OSDs by
kicking off an upgrade and then checking every 2 seconds if
enough had been upgraded. Since staggered upgrade parameters
were present in the initial reef release (not true for quincy)
it makes sense to use them instead in order to do this in a
more controlled manner.

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit f1ca0c79de)
2025-02-07 16:21:23 -05:00
Adam King
eb2dfacee2 qa/upgrade: fix checks to make sure upgrade is still in progress
Without checking both for the upgrade being in progress and that
the status isn't reporting an error, we can end up in a scenario
where the test is just waiting for an upgrade that has already
been marked failed and will never complete. This same sort of
change was already done in the orch suite upgrade tests and
has helped with jobs timing out there.

Fixes: https://tracker.ceph.com/issues/65546

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 61a48c5ced)
2025-02-07 16:21:23 -05:00
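The check described in commit eb2dfacee2 (keep waiting only while the upgrade is in progress and the status reports no error) can be sketched as a polling loop. This is a hedged illustration, not the qa suite's actual code. The `get_status` callable, the `in_progress` and `message` keys, and the "error" substring test are assumptions about how the status might be exposed.

```python
import time


def wait_for_upgrade(get_status, interval=30.0, max_polls=120, sleep=time.sleep):
    """Poll until the upgrade finishes, failing fast if it reports an error.

    `get_status` is a hypothetical callable returning a dict with
    "in_progress" (bool) and "message" (str) keys.
    """
    for _ in range(max_polls):
        status = get_status()
        message = status.get("message", "")
        # An upgrade already marked as failed will never complete, so
        # stop waiting instead of running into the job timeout.
        if "error" in message.lower():
            raise RuntimeError(f"upgrade failed: {message}")
        if not status.get("in_progress", False):
            return
        sleep(interval)
    raise TimeoutError("upgrade still in progress after polling limit")
```

Checking the error condition before the in-progress flag is the point of the sketch: a loop that only looked at whether the upgrade had finished would wait for the full timeout after a failure.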
Laura Flores
59bbf5639e qa/suites: add "mon down" log variations to ignorelist
Fixes: https://tracker.ceph.com/issues/64864
Signed-off-by: Laura Flores <lflores@ibm.com>
(cherry picked from commit d475ac3e6a)

Conflicts:
	qa/suites/orch/cephadm/smoke/start.yaml
	qa/suites/orch/cephadm/workunits/task/test_host_drain.yaml
	qa/suites/orch/cephadm/workunits/task/test_monitoring_stack_basic.yaml
	qa/suites/orch/cephadm/workunits/task/test_rgw_multisite.yaml
	qa/suites/orch/cephadm/workunits/task/test_set_mon_crush_locations.yaml
	qa/tasks/thrashosds-health.yaml
2025-02-07 15:36:42 -05:00
Shilpa Jagannath
4842fb05b5
Merge pull request #61367 from cbodley/wip-67268-reef
reef: rgw/rgw_rados: fix server side-copy orphans tail-objects
2025-02-06 08:41:59 -08:00
Casey Bodley
85ad7ae8ef qa/rgw: fix user cleanup in s3tests task
the if condition was backwards, preventing non-keystone users from being
removed after the s3tests task runs

Fixes: https://tracker.ceph.com/issues/69741

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 4a0ca73f53)
2025-02-04 11:21:16 -05:00
Milind Changire
ba280dff90
Merge pull request #59923 from mchangir/wip-68076-reef
reef: qa: relocate subvol creation overrides and test
2025-02-04 15:19:01 +05:30
Milind Changire
26fb879870
Merge pull request #60689 from vshankar/wip-68110-reef
reef: mds: batch backtrace updates by pool-id when expiring a log segment
2025-02-04 14:55:13 +05:30
Milind Changire
ec60a4c43b
Merge pull request #60390 from rishabh-d-dave/wip-68616-reef
reef: qa/cephfs: ignore when specific OSD is reported down during upgrade
2025-02-04 14:54:10 +05:30
Milind Changire
88ff2c0f56
Merge pull request #60188 from kotreshhr/wip-68413-reef
reef: mgr/status: Fix 'fs status' json output
2025-02-04 14:50:26 +05:30
Patrick Donnelly
47193eb9b1
qa: ignore warnings variations
Fixes: https://tracker.ceph.com/issues/67601
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 782c88aa96)

Conflicts:
	qa/cephfs/overrides/ignorelist_health.yaml: trivial
2025-02-04 10:05:56 +05:30
Milind Changire
5b65447d07
qa: relocate subvol creation overrides and test
Fixes: https://tracker.ceph.com/issues/65829
Signed-off-by: Milind Changire <mchangir@redhat.com>
(cherry picked from commit ef68253a87)
2025-02-03 16:40:35 +05:30
Nitzan Mordechai
3a7b0de9ec thrashers: standardize stop and join method names
Thrashers that do not inherit from ThrasherGreenlet previously used a
method called do_join, which combined stop and join functionality. To
ensure consistency and clarity, we want all thrashers to use separate
stop, join, and stop_and_join methods.

This commit renames methods and implements missing stop and stop_and_join
methods in thrashers that did not inherit from ThrasherGreenlet.

Fixes: https://tracker.ceph.com/issues/66698
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
(cherry picked from commit a035b5a22f)
2025-02-03 10:34:08 +00:00
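The method split that commit 3a7b0de9ec standardizes (separate stop, join and stop_and_join) might look like the sketch below for a thrasher that does not inherit from ThrasherGreenlet. The class name `SketchThrasher` and its thread-based loop are hypothetical. The real thrashers in qa/tasks differ.

```python
import threading


class SketchThrasher:
    """Hypothetical thrasher exposing stop, join and stop_and_join."""

    def __init__(self):
        self._stopping = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.iterations = 0

    def start(self):
        self._thread.start()

    def _run(self):
        # Loop until stop() is requested, waking up every 10 ms.
        while not self._stopping.wait(0.01):
            self.iterations += 1

    def stop(self):
        """Request termination without waiting for it."""
        self._stopping.set()

    def join(self, timeout=None):
        """Wait for the worker to finish."""
        self._thread.join(timeout)

    def stop_and_join(self):
        """Convenience wrapper replacing the old combined do_join."""
        self.stop()
        self.join()
```

Keeping stop and join separate lets a caller signal several thrashers first and wait for them afterwards, while stop_and_join covers the common case in one call.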
NitzanMordhai
cdd5126235
Merge pull request #61481 from NitzanMordhai/wip-69622-reef
reef: test: ceph daemon command with asok path
2025-02-03 12:25:50 +02:00
Ilya Dryomov
a8857ef872 mon/OSDMonitor: relax cap enforcement for unmanaged snapshots
Since commit 4972e054b3 ("mon/OSDMonitor: enforce caps when
creating/deleting unmanaged snapshots"), a) write access to the MON
service, b) write access to the OSD service for a pool or c) permission
for "osd pool op unmanaged-snap" command for a pool is required.  For
"profile rbd" we configure read-only access to the MON service and rely
on write access to the OSD service, however the corresponding check in
is_osd_writable() is too strict.

An OSD cap like "profile rbd namespace=myns" or "allow w namespace=myns"
allows write access to myns namespace of any pool, but is_osd_writable()
disallows operations with unmanaged snapshots with such a cap because
its match.pool_namespace.pool_name.empty() is true.  This condition
appears to serve as the "doesn't include support for the application
tag" guard, but it should actually be match.pool_tag.is_match_all()
(or match.pool_tag.application.empty() if open-coded) -- no restriction
on the pool name doesn't automatically mean that there is a restriction
on the application tag.

Fixes: https://tracker.ceph.com/issues/69679
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 5f3815e800)
2025-01-31 10:15:04 +01:00
Yuri Weinstein
ba91b0bb84
Merge pull request #57229 from galsalomon66/wip-65245-reef
reef: rgw/s3select: s3select response handler refactor

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
2025-01-29 15:24:10 -08:00
Milind Changire
a66db03d22
Merge pull request #59705 from vshankar/wip-67375-reef
reef: mon: fix `fs set down` to adjust max_mds only when cluster is not down
2025-01-29 18:01:16 +05:30