Mapping RBD images to nbd devices using the ioctl interface is not
robust. Immediately after mapping with the ioctl method, the device
size or the md5 checksum of the nbd device was found to be incorrect.
The issue was not encountered when the nbd netlink interface was used
to map RBD images. Switch to the nbd netlink interface for mapping.
Fixes: https://tracker.ceph.com/issues/64063
Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit fcbf7367d2)
Conflicts:
PendingReleaseNotes [ moved to >=18.2.5 section ]
Add a reproducer for the crash on a bad variant access which was fixed
in commit 7d75161051 ("librbd: fix a crash in get_rollback_snap_id").
The reproducer deliberately works around many other issues with force
promote in snapshot-based mirroring: stopping rbd-mirror daemon
shouldn't be necessary (let alone with SIGKILL), get_rollback_snap_id()
and its caller can_create_primary_snapshot() are flawed and can pick
the wrong snapshot to roll back to or skip rollback when it's actually
required, the user snapshot in this scenario should be removed as part
of force promoting because it's incomplete and won't be usable after
the image is promoted, etc.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 0f4a37dd9f)
Conflicts:
qa/workunits/rbd/rbd_mirror_journal.sh [ commits 3fd8a03887
("qa/workunits/rbd: merge journal and snapshot test scripts")
and 3fdbc160bb ("rbd-mirror: allow mirroring to a different
namespace") not in reef ]
qa/workunits/rbd/rbd_mirror_snapshot.sh [ duplicated/cloned for
snapshot-based mirroring ]
* refs/pull/57190/head:
pybind/mgr/mgr_module: turn off all automatic transactions
pybind/mgr: disable sqlite3/python autocommit
qa/tasks/mgr: add tests for sqlite autocommit
qa/tasks/vstart_runner: run daemons in foreground
qa/tasks/vstart_runner: add missing poll method
qa/suites/rados/mgr: add cli/devicehealth tasks
qa: reorganize mgr unit tests
qa: use position-independent link
qa: add missing terminating newline
pybind/mgr: add killpoint for sqlite3 database setup
mgr: allow specifying module option level
mon/MgrMonitor: promote standby when unsetting down flag
mon/MgrMonitor: only drop active if exists
Reviewed-by: Laura Flores <lflores@redhat.com>
Verify that autocommit is properly turned off and that commits via
context managers work as expected.
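
As an illustration of the context-manager behavior such a test can
exercise, here is a minimal sketch using Python's standard sqlite3
module with an in-memory database. The table name and the helper
function are hypothetical and are not taken from the mgr code, and the
connection uses Python's default transaction handling rather than
whatever settings the mgr module applies.

```python
import sqlite3


def demo_context_manager_commit():
    """Show that `with conn:` commits on success and rolls back on error."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table used only for this illustration.
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.commit()

    # Successful block: the implicit transaction is committed on exit.
    with conn:
        conn.execute("INSERT INTO kv VALUES ('a', '1')")
    committed = not conn.in_transaction

    # Failing block: the implicit transaction is rolled back on exit.
    try:
        with conn:
            conn.execute("INSERT INTO kv VALUES ('b', '2')")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass

    rows = conn.execute("SELECT k FROM kv ORDER BY k").fetchall()
    conn.close()
    return committed, rows
```

Only the row written in the successful block should remain; the row
written before the simulated failure is discarded.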
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit fb82b6d35a)
This mirrors teuthology and makes it possible to check the exit status of a
daemon.
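
To illustrate why a foreground process makes the exit status
observable, here is a minimal sketch using Python's subprocess module.
A short-lived Python child stands in for a daemon; the helper names are
hypothetical and this is not the vstart_runner implementation.

```python
import subprocess
import sys


def run_foreground(args):
    """Start a child that stays attached, so we keep a handle to it."""
    return subprocess.Popen(args)


def exit_status(proc, timeout=10.0):
    """Wait up to `timeout` seconds, then report the exit status."""
    proc.wait(timeout=timeout)
    # poll() returns None while the child runs and the return code after.
    return proc.poll()


def demo():
    # Stand-in "daemon" that exits with status 3.
    proc = run_foreground([sys.executable, "-c", "import sys; sys.exit(3)"])
    return exit_status(proc)
```

A process that daemonizes itself detaches from its parent, so the
parent only sees the status of the short-lived launcher rather than of
the daemon proper.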
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit e2e2144a56)
These should have been part of the commit adding the tests.
Fixes: 9ebcbdbed0
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 440f25e1ec)
Refactor common tasks and allow loading mgrmodules before unittests start.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 2f48dc9a00)
Some are for development purposes and should be filtered out by the dashboard.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 0d94eebb0d)
This seems to take a little while to be updated, so the tests end
up failing even though the rgw daemons were actually upgraded
successfully.
Fixes: https://tracker.ceph.com/issues/67758
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit b9f63e1257)
The reason create_image() + enable_mirror() happens to work for
PARENT_POOL is that PARENT_POOL is enabled for mirroring in image mode
unconditionally, unlike POOL, POOL/NS1 or PARENT_POOL/NS1 for which
MIRROR_POOL_MODE setting is respected. This isn't immediately obvious
because it's done in setup_pools() in rbd_mirror_helpers.sh.
Switch to create_image_and_enable_mirror() for clarity.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 44804a374a)
For the purposes of the summary with image counts, "rbd mirror pool
status" command is supposed to count each image only once. To this
end, for unidirectional mirroring the status of the receiving site
should be taken while for bidirectional mirroring the statuses should
be combined/reduced. For example, if mirroring is enabled on a single
image and everything is in order, the summary is expected to be
  image health: OK
  images: 1 total
      1 replaying
on both clusters even though on the primary the local status is
MIRROR_IMAGE_STATUS_STATE_STOPPED and only on the secondary it's
MIRROR_IMAGE_STATUS_STATE_REPLAYING.
Currently this isn't the case for custom namespaces. In the same
scenario the primary ends up reporting
  image health: OK
  images: 1 total
      1 stopped
based solely on the local status in a namespace.
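
The idea of counting each image once by reducing per-site statuses can
be sketched as follows. This is a simplified model, not the librbd or
cls_rbd logic: the state names, the precedence order and the function
names are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical state names, loosely modeled on MIRROR_IMAGE_STATUS_STATE_*.
UNKNOWN = "unknown"
STOPPED = "stopped"
REPLAYING = "replaying"
ERROR = "error"

# Assumed precedence for the reduction (higher wins); illustration only.
_RANK = {UNKNOWN: 0, STOPPED: 1, REPLAYING: 2, ERROR: 3}


def reduce_states(site_states):
    """Collapse the states reported by all sites into one state per image."""
    return max(site_states, key=_RANK.__getitem__)


def summarize(images):
    """images maps an image name to the list of states reported per site."""
    counts = Counter(reduce_states(states) for states in images.values())
    return {"total": len(images), "states": dict(counts)}
```

With both sites taken into account, `summarize({"img": [STOPPED,
REPLAYING]})` counts the image as replaying. Looking only at the local
status on the primary, `summarize({"img": [STOPPED]})`, counts it as
stopped, which is the behavior described above for namespaces.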
Fixes: https://tracker.ceph.com/issues/69911
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit f5eadfff80)
Conflicts:
qa/workunits/rbd/rbd_mirror_bootstrap.sh [ commits 3fd8a03887
("qa/workunits/rbd: merge journal and snapshot test scripts")
and 3fdbc160bb ("rbd-mirror: allow mirroring to a different
namespace") not in reef ]
- changed the non-deterministic check; return code 1 is also legitimate
- removed the unneeded check for is_dir, if it exists
- limited the number of threads to prevent errors
Fixes: https://tracker.ceph.com/issues/67263
Signed-off-by: NitzanMordhai <nmordech@redhat.com>
(cherry picked from commit 30921272dd)
We are currently conducting regular ceph-dencoder tests for backward compatibility.
However, we are omitting tests for forward compatibility.
This suite will introduce tests against the ceph-objects-corpus to address forward
compatibility issues that may arise.
The script will install the N-2 version and run against the latest
version of the corpus objects that we have, then install the N-1 to N
versions and check them as well.
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
(cherry picked from commit 3f26a965f6)
This test was trying to partially upgrade the mons and OSDs by
kicking off an upgrade and then checking every 2 seconds if
enough had been upgraded. Since staggered upgrade parameters
were present in the initial reef release (not true for quincy)
it makes sense to use them instead in order to do this in a
more controlled manner.
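
As a rough sketch of what a staggered upgrade invocation looks like,
the helper below assembles a "ceph orch upgrade start" command line.
The helper name and the image reference are hypothetical, and the
--daemon-types, --hosts and --limit options are written as I understand
them from the cephadm staggered upgrade documentation; check them
against the release under test.

```python
def staggered_upgrade_cmd(image, daemon_types=None, hosts=None, limit=None):
    """Build an argv list for a staggered `ceph orch upgrade start`."""
    cmd = ["ceph", "orch", "upgrade", "start", "--image", image]
    if daemon_types:
        # Restrict the upgrade to the given daemon types, e.g. mon and mgr.
        cmd += ["--daemon-types", ",".join(daemon_types)]
    if hosts:
        # Restrict the upgrade to daemons on the given hosts.
        cmd += ["--hosts", ",".join(hosts)]
    if limit is not None:
        # Upgrade at most this many daemons in this pass.
        cmd += ["--limit", str(limit)]
    return cmd


# Hypothetical image reference, used only for illustration.
EXAMPLE = staggered_upgrade_cmd(
    "registry.example/ceph:target", daemon_types=["mon"], limit=2
)
```

Bounding the upgrade this way avoids racing a polling loop against an
upgrade that keeps progressing in the background.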
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit f1ca0c79de)
Without checking both for the upgrade being in progress and that
the status isn't reporting an error, we can end up in a scenario
where the test is just waiting for an upgrade that has already
been marked failed and will never complete. This same sort of
change was already done in the orch suite upgrade tests and
has helped with jobs timing out there.
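
The combined check can be sketched as a small polling helper. The
"in_progress" and "message" keys are assumptions about the shape of the
upgrade status output, and the function names are hypothetical; the
actual tests may express this differently.

```python
import time


def still_upgrading(status):
    """Keep waiting only while the upgrade runs and reports no error."""
    in_progress = bool(status.get("in_progress"))
    failed = "Error" in (status.get("message") or "")
    return in_progress and not failed


def wait_for_upgrade(get_status, interval=30.0, max_polls=120,
                     sleep=time.sleep):
    """Poll get_status() until the upgrade finishes or reports an error."""
    for _ in range(max_polls):
        status = get_status()
        if not still_upgrading(status):
            return status
        sleep(interval)
    raise TimeoutError("upgrade did not finish within the polling budget")
```

Checking only "in_progress" would keep looping on an upgrade that has
already reported an error; the second condition lets the caller stop
early and inspect the message.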
Fixes: https://tracker.ceph.com/issues/65546
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 61a48c5ced)
The if condition was backwards, preventing non-keystone users from being
removed after the s3tests task runs.
Fixes: https://tracker.ceph.com/issues/69741
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 4a0ca73f53)
Thrashers that do not inherit from ThrasherGreenlet previously used a
method called do_join, which combined stop and join functionality. To
ensure consistency and clarity, we want all thrashers to use separate
stop, join, and stop_and_join methods.
This commit renames methods and implements missing stop and stop_and_join
methods in thrashers that did not inherit from ThrasherGreenlet.
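
The intended split between the three methods can be sketched with a
plain thread-based class. This is a hypothetical stand-in built on
Python's threading module, not one of the actual thrashers and not
ThrasherGreenlet itself.

```python
import threading


class SketchThrasher:
    """Hypothetical thrasher exposing stop, join and stop_and_join."""

    def __init__(self, interval=0.01):
        self._stopping = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._interval = interval
        self.iterations = 0

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait() returns True once stop() has been called.
        while not self._stopping.wait(self._interval):
            self.iterations += 1  # placeholder for the real thrashing work

    def stop(self):
        """Request termination without blocking."""
        self._stopping.set()

    def join(self, timeout=None):
        """Block until the worker has exited (or the timeout expires)."""
        self._thread.join(timeout)

    def stop_and_join(self, timeout=None):
        """Convenience wrapper for the common stop-then-wait sequence."""
        self.stop()
        self.join(timeout)

    def is_alive(self):
        return self._thread.is_alive()
```

Keeping stop and join separate lets a caller signal several thrashers
first and wait for all of them afterwards, while stop_and_join covers
the simple case.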
Fixes: https://tracker.ceph.com/issues/66698
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
(cherry picked from commit a035b5a22f)
Since commit 4972e054b3 ("mon/OSDMonitor: enforce caps when
creating/deleting unmanaged snapshots"), a) write access to the MON
service, b) write access to the OSD service for a pool or c) permission
for "osd pool op unmanaged-snap" command for a pool is required. For
"profile rbd" we configure read-only access to the MON service and rely
on write access to the OSD service, however the corresponding check in
is_osd_writable() is too strict.
An OSD cap like "profile rbd namespace=myns" or "allow w namespace=myns"
allows write access to myns namespace of any pool, but is_osd_writable()
disallows operations with unmanaged snapshots with such a cap because
its match.pool_namespace.pool_name.empty() is true. This condition
appears to serve as the "doesn't include support for the application
tag" guard, but it should actually be match.pool_tag.is_match_all()
(or match.pool_tag.application.empty() if open-coded) -- no restriction
on the pool name doesn't automatically mean that there is a restriction
on the application tag.
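
The difference between the two guards can be illustrated with a small
model. This is a simplified Python sketch, not the C++ code in
is_osd_writable(): the CapMatch class, its fields and both functions
are hypothetical, and the handling of an empty pool name in the new
check is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CapMatch:
    """Hypothetical, flattened view of an OSD cap match."""
    pool_name: str = ""    # empty: no restriction on the pool name
    namespace: str = ""    # empty: no restriction on the namespace
    application: str = ""  # empty: no restriction on the application tag

    def tag_is_match_all(self):
        return self.application == ""


def writable_old(match, pool):
    # Old guard: an empty pool name is treated as "not writable".
    return match.pool_name != "" and match.pool_name == pool


def writable_new(match, pool):
    # New guard: only an application tag restriction disqualifies the cap;
    # an empty pool name is assumed to match any pool.
    return match.tag_is_match_all() and match.pool_name in ("", pool)
```

For a cap modeled as `CapMatch(namespace="myns")`, the old guard
rejects the operation for any pool while the new one accepts it; a cap
restricted by application tag is rejected by the new guard as well.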
Fixes: https://tracker.ceph.com/issues/69679
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 5f3815e800)