664 Commits

Author SHA1 Message Date
Ilya Dryomov
d1d848276f qa/workunits/rbd: wait for replaying status in bootstrap tests
wait_for_replay_complete() doesn't wait for image status to get
updated.  This didn't matter previously because these tests are run on
two different pools and nothing else was following.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-05-06 11:47:52 +02:00
Ilya Dryomov
b7e79642d5 rbd-mirror: remove callout when destroying pool replayer
If a pool replayer is removed in an error state (e.g. after failing to
connect to the remote cluster), its callout should be removed as well.
Otherwise, the error would persist causing "daemon health: ERROR"
status to be reported even after a new pool replayer is created and
started successfully.

Fixes: https://tracker.ceph.com/issues/65487
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-05-05 21:11:54 +02:00
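
The behaviour fixed in b7e79642d5 can be illustrated with a minimal sketch. `HealthRegistry` and `PoolReplayer` below are hypothetical stand-ins, not rbd-mirror's actual classes; they only show why an error callout that outlives its pool replayer keeps the aggregate health at ERROR.

```python
class HealthRegistry:
    """Hypothetical registry of error callouts, keyed by callout id."""

    def __init__(self):
        self._callouts = {}
        self._next_id = 1

    def add_callout(self, message):
        callout_id = self._next_id
        self._next_id += 1
        self._callouts[callout_id] = message
        return callout_id

    def remove_callout(self, callout_id):
        self._callouts.pop(callout_id, None)

    def health(self):
        # Any remaining callout is reported as an error.
        return "ERROR" if self._callouts else "OK"


class PoolReplayer:
    """Hypothetical pool replayer that may register one error callout."""

    def __init__(self, registry):
        self._registry = registry
        self._callout_id = None

    def fail(self, message):
        self._callout_id = self._registry.add_callout(message)

    def destroy(self):
        # Removing the callout here is the point of the fix: without it,
        # the stale error would persist after the replayer is gone.
        if self._callout_id is not None:
            self._registry.remove_callout(self._callout_id)
            self._callout_id = None
```
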
Ilya Dryomov
c870ead3d4
Merge pull request #55595 from VallariAg/wip-nvmeof-test-v3
qa/suite/rbd/nvmeof: Deploy multiple gateways and namespaces

Reviewed-by: Barak Davidov <barakda@il.ibm.com>
Reviewed-by: Aviv Caro <Aviv.Caro@ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2024-03-20 10:49:36 +01:00
Vallari Agrawal
00651cfac2
qa/suite/rbd/nvmeof: Deploy multiple gateways and namespaces
1. Deploy 2 gateways on different nodes, then check for multi-path.
    To add another gateway, only "roles" need to be changed in job yaml.
2. Create "n" nvmeof namespaces, configured by 'namespaces_count'
3. Rename qa/suites/rbd/nvmeof/cluster/fixed-3.yaml to fixed-4.yaml
    which contains 2 gateways and 2 initiators.

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
2024-03-19 20:48:26 +05:30
Ilya Dryomov
166a236237 qa/workunits/rbd: switch rbd-mirror workunits to bash
By making use of here strings in commit ea3a567f7f ("qa/workunits:
make wait_for_status_in_pool_dir() reentrant") we grew a dependency on
bash.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-03-10 18:19:57 +01:00
Ilya Dryomov
fa5ef874ac
Merge pull request #54802 from ajarr/wip-61617
qa: Add tests to validate synced images on rbd-mirror

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2024-02-23 23:47:42 +01:00
Ramana Raja
b7aae5c3c5 qa: Add tests to validate syncing of images using rbd-mirror
Introduce functional tests to validate that the images under
workloads are correctly mirrored between two clusters using snapshot
based mirroring.

Run workload on a primary image using a krbd or nbd client. Take
mirror snapshots of the image under workload. Unmount the mapped image
and calculate its MD5 checksum before demoting it. After demotion,
wait for the mirror status of the image to be 'up+unknown' in both
the clusters. This is to make sure that the non-primary image in the
other cluster is ready to be promoted. Now promote the non-primary
image in the other cluster. Map the promoted image and calculate its
MD5 checksum. Verify that the checksums of the demoted and promoted
images in the two clusters are the same.

The above test is run as part of two different workunits:
 - a workunit that validates the syncing of multiple mirrored images
   with workloads running on them
 - another workunit that validates the syncing of a single mirrored
   image with a workload running on it, where the image is set as
   primary alternately in the two clusters, as happens during
   failover and failback scenarios.

Fixes: https://tracker.ceph.com/issues/61617
Signed-off-by: Ramana Raja <rraja@redhat.com>
Co-authored-by: Ilya Dryomov <idryomov@redhat.com>
Co-authored-by: Christopher Hoffman <choffman@redhat.com>
2024-02-22 11:44:36 -05:00
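
The checksum comparison at the heart of the test in b7aae5c3c5 can be sketched as follows. This is an illustration only, not the workunit's code: the helper names `md5_of` and `images_match` are hypothetical, and the paths would be the mapped devices of the demoted and promoted images.

```python
import hashlib


def md5_of(path, chunk_size=4 * 1024 * 1024):
    """Return the hex MD5 digest of a file or block device, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(demoted_path, promoted_path):
    """True if both mapped images hold byte-identical content."""
    return md5_of(demoted_path) == md5_of(promoted_path)
```
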
Ramana Raja
ea3a567f7f qa/workunits: make wait_for_status_in_pool_dir() reentrant
In rbd_mirror_helpers.sh, the `wait_for_status_in_pool_dir()` helper
stored `mirror image status` and `mirror pool status` command outputs
in files that could be shared over successive calls or calls from
multiple threads. Instead store the command outputs in local variables
to make `wait_for_status_in_pool_dir()` reentrant.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2024-02-22 11:44:28 -05:00
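
The idea behind ea3a567f7f, keeping command output in a local variable rather than in a file shared between calls, can be sketched in Python. The actual helper is a shell function; `wait_for_status`, `run_status`, `attempts` and `delay` are hypothetical names used only for illustration.

```python
import time


def wait_for_status(run_status, pattern, attempts=30, delay=1.0):
    """Poll run_status() until its output contains pattern.

    The output is held in a local variable, so nested or concurrent
    calls cannot overwrite each other's data the way a shared status
    file could.
    """
    for _ in range(attempts):
        output = run_status()  # local to this call
        if pattern in output:
            return True
        time.sleep(delay)
    return False
```
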
Mykola Golub
5442f7eb21 tools/rbd: make 'children' command support --image-id
Fixes: https://tracker.ceph.com/issues/64376
Signed-off-by: Mykola Golub <mykola.golub@clyso.com>
2024-02-13 15:50:32 +00:00
Vallari Agrawal
1713c4852c
qa: add qa/tasks/nvmeof.py and rbd/nvmeof_basic_task and fio workunits
This is v2 of the rbd/nvmeof test: It deploys 1 gateway and 1 initiator.
Then does basic verification on nvme commands and runs fio.

This commit creates:
1. qa/tasks/nvmeof.py: adds a new 'Nvmeof' task which deploys
    the gateway and shares config with the initiator hosts.
    Sharing config was previously done by 'nvmeof_gateway_cfg' task
    in qa/tasks/cephadm.py (that task is removed in this commit).
2. qa/workunits/rbd/nvmeof_basic_tests.sh:
    Runs nvme commands (discovery, connect, connect-all, disconnect-all,
    and list-subsys) and does basic verification of the output.
3. qa/workunits/rbd/nvmeof_fio_test.sh:
    Runs fio command. Also runs iostat in parallel if IOSTAT_INTERVAL
    variable is set. This variable configures the delay between each iostat
    print.

nvmeof-cli upgrade from v0.0.6 to v0.0.7 introduced major changes
to all nvmeof commands. This commit changes v0.0.6 commands to
v0.0.7 in qa/workunits/rbd/nvmeof_initiator.sh

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
2024-02-12 13:00:09 +05:30
Ramana Raja
fcbf7367d2 rbd-nbd: map using netlink interface by default
Mapping rbd images to nbd devices using ioctl interface is not
robust. It was discovered that the device size or the md5 checksum
of the nbd device was incorrect immediately after mapping using
ioctl method. When using the nbd netlink interface to map RBD images
the issue was not encountered. Switch to using nbd netlink interface
for mapping.

Fixes: https://tracker.ceph.com/issues/64063
Signed-off-by: Ramana Raja <rraja@redhat.com>
2024-01-25 11:00:59 -05:00
Ramana Raja
1eebb7ba79 rbd_nbd: fix resize of images mapped using netlink
Include device identifier or cookie in the message sent to the kernel
to resize images mapped to NBD devices using netlink. Otherwise,
netlink_resize() fails and the size of the device isn't updated.

Fixes: https://tracker.ceph.com/issues/64139
Signed-off-by: Ramana Raja <rraja@redhat.com>
2024-01-24 15:33:50 -05:00
Ilya Dryomov
d9147a14c4
Merge pull request #54205 from VallariAg/wip-nvmeof-test
qa: add rbd/nvmeof integration test

Reviewed-by: Zack Cerza <zack@redhat.com>
Reviewed-by: Aviv Caro <Aviv.Caro@ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2023-12-04 18:14:38 +01:00
Vallari Agrawal
42e121a42a
qa: add rbd/nvmeof test
A basic test for ceph-nvmeof[1] where
nvmeof initiator is created.
It requires use of a new task "nvmeof_gateway_cfg"
under cephadm which shares config information
between two remote hosts.

[1] https://github.com/ceph/ceph-nvmeof/

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
2023-12-04 19:27:54 +05:30
Ramana Raja
ea033fe860 qa/workunits/rbd/cli_generic.sh: narrow race window
... when checking whether a rbd_support module command fails after
blocklisting the module's client.

In tests that check the recovery of the rbd_support module after its
client is blocklisted, the rbd_support module's client is
blocklisted using the `osd blocklist add` command. Next,
`osd blocklist ls` command is issued to confirm that the client is
blocklisted. A rbd_support module command is then issued and expected
to fail in order to verify that the blocklisting has affected the
rbd_support module's operations. Sometimes it was observed that before
this rbd_support module command reached the ceph-mgr, the rbd_support
module detected the blocklisting, recovered from it, and was able to
serve the command. To reduce the race window that occurs when trying to
verify that the rbd_support module's operation is affected by client
blocklisting, get rid of the `osd blocklist ls` command.

Fixes: https://tracker.ceph.com/issues/63673
Signed-off-by: Ramana Raja <rraja@redhat.com>
2023-11-29 13:49:06 -05:00
Suyashd999
9b773eec4a qa/suites/rbd: Cleanup of MIRROR_IMAGE_MODE
Fixes: https://tracker.ceph.com/issues/63431
Signed-off-by: Suyash Dongre <suyashd999@gmail.com>
2023-11-14 18:28:02 +05:30
Ilya Dryomov
c93a53aa66
Merge pull request #48508 from pkalever/rbd-tests
qa/workunits/rbd: merge journal and snapshot test scripts

Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2023-11-03 12:55:02 +01:00
Prasanna Kumar Kalever
3fd8a03887 qa/workunits/rbd: merge journal and snapshot test scripts
The idea is to avoid the maintenance of duplicate code in both the journal
and snapshot test scripts.

Usage:
   RBD_MIRROR_MODE=journal rbd_mirror.sh

Use environment variable RBD_MIRROR_MODE to set the mode
Available modes: snapshot | journal

Fixes: https://tracker.ceph.com/issues/54312
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2023-11-02 18:11:55 +05:30
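
A minimal sketch of selecting the mode from RBD_MIRROR_MODE, as described in 3fd8a03887. The commit message does not say what happens when the variable is unset, so rejecting that case here is an assumption, and `mirror_mode` is a hypothetical helper rather than part of the script.

```python
import os

VALID_MODES = ("snapshot", "journal")


def mirror_mode(environ=os.environ):
    """Return the mirroring mode requested via RBD_MIRROR_MODE."""
    mode = environ.get("RBD_MIRROR_MODE")
    if mode not in VALID_MODES:
        # Assumption: an unset or unknown mode is treated as an error.
        raise ValueError(
            "RBD_MIRROR_MODE must be one of: " + " | ".join(VALID_MODES)
        )
    return mode
```
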
Ilya Dryomov
c5eb0ce432
Merge pull request #53535 from ajarr/wip-62891
qa/suites/rbd: add test to check rbd_support module recovery

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mykola Golub <mgolub@suse.com>
2023-11-01 10:45:59 +01:00
Ilya Dryomov
bf82a7bd34
Merge pull request #50593 from pkalever/fix-feature-disable
rbd-nbd: fix stuck with disable request

Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2023-10-30 08:55:09 +01:00
Ramana Raja
2f2cd3bcff qa/suites/rbd: add test to check rbd_support module recovery
... on repeated blocklisting of its client.

There were issues with rbd_support module not being able to recover
from its RADOS client being repeatedly blocklisted. This occurred for
example in clusters with OSDs slow to process RBD requests while the
module's mirror_snapshot_scheduler was taking mirror snapshots by
requesting exclusive locks on the RBD images and workloads were running
on the snapshotted images via kernel clients.

Fixes: https://tracker.ceph.com/issues/62891
Signed-off-by: Ramana Raja <rraja@redhat.com>
2023-10-10 12:58:19 -04:00
Ilya Dryomov
237aa221eb qa/suites/krbd: stress test for recovering from watch errors
Fixes: https://tracker.ceph.com/issues/63010
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-10-02 12:21:12 +02:00
Prasanna Kumar Kalever
dbb4daff40 rbd-nbd: fix stuck with disable request
Problem:
-------
Trying to disable any feature on an rbd image mapped with nbd causes rbd-nbd
to get stuck.

rbd-nbd registers a watcher callback, NBDWatchCtx::handle_notify(), to detect
image resizes. handle_notify() calls the image info method, which calls
refresh_if_required() and gets stuck there.

It is getting stuck in ImageState::refresh_if_required() because
DisableFeaturesRequest issues update notifications while still holding onto
the exclusive lock with everything that has to do with it blocked.

Solution:
--------
Only set a notify flag in NBDWatchCtx::handle_notify() and handle resize
detection in a different thread.

Fixes: https://tracker.ceph.com/issues/58740
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2023-09-21 11:18:03 +05:30
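
The pattern described in dbb4daff40, where the notify callback only sets a flag and a separate thread does the potentially blocking work, can be sketched in Python. The real fix is in rbd-nbd's C++ code; `ResizeWatcher`, `get_size` and `on_resize` are hypothetical names for illustration.

```python
import threading


class ResizeWatcher:
    """Hypothetical watcher that defers size checks to a worker thread."""

    def __init__(self, get_size, on_resize):
        self._get_size = get_size
        self._on_resize = on_resize
        self._size = get_size()
        self._notified = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def handle_notify(self):
        # Callback context: only set the flag, never block here.
        self._notified.set()

    def _run(self):
        while True:
            self._notified.wait()
            if self._stop.is_set():
                return
            # Clear before reading so a notification that arrives during
            # get_size() triggers another pass.
            self._notified.clear()
            new_size = self._get_size()  # may block; off the callback path
            if new_size != self._size:
                self._size = new_size
                self._on_resize(new_size)

    def stop(self):
        self._stop.set()
        self._notified.set()  # wake the worker so it can exit
        self._thread.join()
```
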
Ilya Dryomov
153df2d64b qa: add "failover / failback loop" test for rbd-mirror
For snapshot-based mirroring, check that demote (or other mirror)
snapshots don't pile up.  Nothing in particular to assert on for
journal-based mirroring but the test is still useful.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-09-01 19:05:36 +02:00
Ilya Dryomov
d49df8d74c qa/workunits/rbd: use jammy version of qemu-iotests for centos 9
It's the one we are using for all recent distros.

While at it, get rid of custom bin directory -- it appears that both
v2.3.0 and v2.11.0 tests are happy with just symlinks in the current
directory.

Fixes: https://tracker.ceph.com/issues/61565
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-07-25 14:00:04 +02:00
Casey Bodley
af04457a43 test/pybind/rbd: convert from nose to pytest
* use fixtures for temporary images and groups
* use pytest.skip instead of nose.SkipTest
* replace setUp/tearDown with setup/teardown_method
* add @pytest.mark.skip_if_crimson
* replace nose assertions

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2023-07-06 11:02:11 -04:00
Ilya Dryomov
acb270a3dd qa/workunits/rbd: make continuous export-diff test actually work
The current version is pretty useless:

- "rbd bench" writes the same byte (0xff) over and over again, so
  almost all checksumming is in vain
- snapshots are taken in a steady state (i.e. not under I/O), so no
  race conditions can get exposed
- even with these caveats, it's not wired up into the suite

Redo this workunit to be a reliable reproducer for the issue fixed
in the previous commit and wire it up for both krbd and rbd-nbd.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-06-20 22:14:39 +02:00
Matan
653b97e472
Merge pull request #51388 from Matan-B/wip-matanb-c-enable-rbd-tests
qa/suites/crimson: Enhance rbd api testing

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Radosław Zarzyński <rzarzyns@redhat.com>
2023-05-11 16:28:55 +02:00
Ramana Raja
a2f15d4b2f qa/workunits/rbd: Add tests for rbd_support module recovery
... after the module's RADOS client is blocklisted.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2023-05-08 16:45:41 -04:00
Matan Breizman
5823c04542 qa/suites/crimson: Skip unsupported tests (Crimson)
Align with `rbd_api_tests` and skip deep_copy and breaklock tests
in Crimson.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2023-05-08 10:57:06 +00:00
Josh Soref
965ee91d3f rbd: fix spelling errors
* acquire
* are
* asynchronous
* attempt
* bootstrap
* concurrent
* consume
* couldn't
* cumulative
* disable
* disabling
* disaster
* disconnected
* endianness
* entries
* exclusive
* filesystem
* flag
* generic
* github
* image
* information
* initiating
* latency
* limitations
* metadata
* modify
* namespace
* noautoconsole
* ourselves
* prefetch
* propagate
* protection
* recorder
* recover
* release
* replicated
* reserved
* selection
* sentinel
* several
* snapshot
* source
* specifying
* suppress
* synchronize
* the
* transfer
* triggering
* unknown
* validation
* version
* visible
* write log entries

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2023-04-26 09:30:53 -04:00
Ilya Dryomov
3b1610997a qa/workunits/rbd: use bionic version of qemu-iotests for jammy
Same as in commit 2de2146c30 ("qa/workunits/rbd: use bionic version
of qemu-iotests for focal").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-03-15 17:12:36 +01:00
Matan Breizman
b73d8fd860 qa/*/crimson: Separate Crimson's rbd api testing
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2023-03-07 08:57:03 +00:00
Ilya Dryomov
b21a379c5b librbd: call apply_changes() after setting librados_thread_count
Otherwise the setting doesn't take effect.  While at it, replace
home-grown stringify() with standard to_string().

Fixes: https://tracker.ceph.com/issues/58833
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-02-23 12:50:45 +01:00
Ilya Dryomov
f4edd7728a
Merge pull request #49614 from isodude/wip-librbd-misalign-discard
librbd: Fix local rbd mirror journals growing forever

Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2023-02-17 18:09:39 +01:00
Ilya Dryomov
fcfef0a19e qa/workunits/rbd-nbd: work around "rbd feature disable" hang
"rbd feature disable" appears to reliably hang if the corresponding
remote request is proxied to rbd-nbd (because rbd-nbd happens to own
the exclusive lock after a series of blkdiscard calls) [1].  Work
around it here by enabling journaling before the image is mapped
and disabling it after the image is unmapped.

Also, don't assert on the output of "rbd journal inspect --verbose"
having a certain number of entries.  This is racy: if the script gets
delayed after the last blkdiscard call for some reason, there may be
fewer entries present in the journal or none at all.

[1] https://tracker.ceph.com/issues/58740

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-02-16 13:05:05 +01:00
Ilya Dryomov
5cec2670be qa/suites/rbd: fix sporadic "rx-only direction" test failures
The existing

    xmlstarlet sel -t -v  '//mirror/peers/peer[1]/uuid')" = ""

test is bogus since a tx-only peer gets added after the remote
rbd-mirror daemon pings the local cluster.  It happened to pass most
of the time because xmlstarlet filter just failed on an empty peers
array, producing the wrongly expected empty string by accident.

Fixes: https://tracker.ceph.com/issues/58688
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-02-10 15:26:27 +01:00
Josef Johansson
21a26a7528 librbd: Fix local rbd mirror journals growing forever
This commit fixes commit 7ca1bab90f by pushing properly aligned
discards back to m_image_extents, if corrected.

If discards are misaligned (off 0, len 4608, gran=4096), they are
corrected properly, but only in object_extents and not in
m_image_extents.

When journal_append_event is triggered it will only append from
m_image_extents and does not know about the alignment fixes. In
commit_io_events_extent it will log a message and return without
completing the io since the larger misaligned area was sent to the journal.
This will in turn break rbd journal mirroring since the local client will wait
indefinitely on the commit to be completed, which it never does.

This does not affect rbd-mirror in any way, which may be confusing and
dangerous since it's only rbd-mirror that updates ceph health, and not
the local client.

Setting `rbd_skip_partial_discard = false` under client will restore the
pre 7ca1bab behaviour and thus not trigger the bug with journals growing.
This will set `rbd_discard_granularity_bytes = 0` internally. This
setting is only changed during startup of a client.

Fixes: 7ca1bab90f
Fixes: https://tracker.ceph.com/issues/57396
Signed-off-by: Josef Johansson <josef@oderland.se>
2023-01-20 11:59:16 +01:00
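
The misaligned discard from 21a26a7528 (off 0, len 4608, gran=4096) can be worked through with a short sketch. It assumes the correction rounds the start up and the end down to the granularity; `align_discard` is a hypothetical helper, not librbd's implementation.

```python
def align_discard(offset, length, granularity):
    """Shrink a discard extent to granularity-aligned boundaries.

    Returns (aligned_offset, aligned_length). A length of 0 means no
    fully aligned region remains inside the original extent.
    """
    if granularity <= 0:
        return offset, length  # alignment disabled
    start = -(-offset // granularity) * granularity        # round up
    end = (offset + length) // granularity * granularity   # round down
    if end <= start:
        return start, 0
    return start, end - start


# The example from the commit message: only the first 4096 bytes are
# fully covered, so the trailing 512 bytes are dropped from the discard.
print(align_discard(0, 4608, 4096))  # (0, 4096)
```

Under that assumption the object extents describe 4096 bytes while the uncorrected image extents still describe 4608 bytes, which is the mismatch the commit removes by pushing the aligned values back to m_image_extents.
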
Ilya Dryomov
8780f602a9
Merge pull request #48618 from idryomov/rbd-clone-encryption-part2
librbd: add encryption format support for clones (part 2/2)

Reviewed-by: Mykola Golub <mgolub@suse.com>
Acked-by: Or Ozeri <oro@il.ibm.com>
2022-12-05 17:47:19 +01:00
Ilya Dryomov
8d5d478532 qa/workunits/rbd: add encryption-aware resize test
Note that we are hitting https://tracker.ceph.com/issues/58160 here
because by the time we get to "rbd resize" RAW_DEV mapping owns the
lock (due to a write to /dev/mapper/cryptsetupdev being last).

While at it, resurrect the ability to easily run this script on
vstart clusters -- see commit f737c2855a ("qa/workunits/rbd: make
luks-encryption test work on vstart cluster").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-12-04 18:24:10 +01:00
Ilya Dryomov
a27ee2bdf8 rbd, rbd-nbd: make --encryption-format optional
If no --encryption-format is specified at all, default to "luks" for each
specified --encryption-passphrase-file.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-12-04 18:19:19 +01:00
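
The defaulting rule from a27ee2bdf8 can be sketched as follows. `resolve_encryption_formats` is a hypothetical helper, the set of accepted names is taken from commit e62e3b6613 in this log, and the handling of a count mismatch is an assumption, since the commit message only covers the case where no format is given at all.

```python
ACCEPTED_FORMATS = ("luks", "luks1", "luks2")


def resolve_encryption_formats(formats, passphrase_files):
    """Return one encryption format per passphrase file."""
    if not formats:
        # No --encryption-format given at all: default each one to "luks".
        return ["luks"] * len(passphrase_files)
    if len(formats) != len(passphrase_files):
        # Assumption: a partial list of formats is rejected.
        raise ValueError("expected one format per passphrase file")
    for fmt in formats:
        if fmt not in ACCEPTED_FORMATS:
            raise ValueError("unknown encryption format: " + fmt)
    return list(formats)
```
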
Ilya Dryomov
e62e3b6613 rbd, rbd-nbd: accept "luks", "luks1" and "luks2" formats
Since RBD_ENCRYPTION_FORMAT_LUKS1, RBD_ENCRYPTION_FORMAT_LUKS2
and RBD_ENCRYPTION_FORMAT_LUKS aren't treated the same when loading
encryption anymore, "luks1" and "luks2" formats need to be accepted
in addition to "luks" format.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-12-04 18:19:19 +01:00
Ilya Dryomov
d642f7804b rbd, rbd-nbd: don't strip trailing newline in passphrase files
One of the stated goals is compatibility with standard LUKS tools,
in particular being able to load encryption on images formatted with
cryptsetup.  cryptsetup doesn't do this and this really interferes
with randomly generated (binary) passphrases.

While at it, open passphrase files as binary -- it communicates the
intent if nothing else on POSIX.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-12-04 18:19:19 +01:00
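
The difference d642f7804b makes can be shown with a small sketch. The actual change is in the C++ tools; `read_passphrase` is a hypothetical helper that reads the file as binary and returns every byte, including a trailing newline.

```python
def read_passphrase(path):
    """Read a passphrase file verbatim, without stripping anything."""
    with open(path, "rb") as f:
        return f.read()
```

With a randomly generated binary passphrase, a final 0x0a byte is part of the key material, so stripping it would yield a different passphrase from the one the image was formatted with.
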
Ilya Dryomov
8f712733af qa: rbd_groups.sh: change interpreter to bash
Commit e0da2a4e8c ("qa/workunits/rbd: Add test to list snapshots of
consistency group") added bash-specific syntax.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-12-04 13:20:44 +01:00
Ilya Dryomov
9ca2ec704e
Merge pull request #48549 from pkalever/snap-list
cls/rbd: update last_read in group::snap_list

Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2022-12-02 13:18:08 +01:00
Ilya Dryomov
af6ed506f2
Merge pull request #48680 from pkalever/snap-id
rbd: add --snap-id option to "rbd device map" to allow mapping arbitrary snapshots

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2022-11-27 14:10:31 +01:00
Ilya Dryomov
4a7150cd36 qa/workunits/rbd-nbd: clear DEV after detach tests
Otherwise we attempt to unmap it in cleanup(), needlessly.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-11-26 13:27:33 +01:00
Ilya Dryomov
5a425927ed mgr/rbd_support: avoid wedging the task queue if pool is removed
rados.ObjectNotFound exception handler was referencing ioctx variable
which is assigned only if the pool exists and rados.open_ioctx() call
succeeds.  This led to a fatal error

  mgr[rbd_support] Failed to locate pool mypool
  mgr[rbd_support] execute_task: [errno 2] error opening pool 'b'mypool''
  mgr[rbd_support] Fatal runtime error: local variable 'ioctx' referenced before assignment

and wedged the task queue.  No other commands were processed until
ceph-mgr daemon restart.

Fixes: https://tracker.ceph.com/issues/52932
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-11-23 23:11:42 +01:00
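
The failure mode in 5a425927ed can be reproduced with a minimal sketch. `ObjectNotFound` and `open_ioctx` here are local stand-ins, not the rados bindings, and the fixed variant shows one possible way to avoid the problem rather than the exact change made in the module.

```python
class ObjectNotFound(Exception):
    """Stand-in for rados.ObjectNotFound."""


def open_ioctx(pool_name, pools):
    """Stand-in for opening an I/O context; fails if the pool is gone."""
    if pool_name not in pools:
        raise ObjectNotFound("error opening pool " + pool_name)
    return {"pool": pool_name}


def execute_task_buggy(pool_name, pools):
    try:
        ioctx = open_ioctx(pool_name, pools)
        return "ran task in " + ioctx["pool"]
    except ObjectNotFound:
        # Bug: ioctx was never assigned if open_ioctx() raised, so this
        # line itself raises UnboundLocalError.
        return "pool gone: " + ioctx["pool"]


def execute_task_fixed(pool_name, pools):
    try:
        ioctx = open_ioctx(pool_name, pools)
    except ObjectNotFound:
        # Only use names that are guaranteed to be bound here.
        return "pool gone: " + pool_name
    return "ran task in " + ioctx["pool"]
```
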
Prasanna Kumar Kalever
92480e6561 qa/workunits/rbd: added tests for --snap-id
Fixes: https://tracker.ceph.com/issues/57902
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-11-10 19:28:30 +05:30
Prasanna Kumar Kalever
e0da2a4e8c qa/workunits/rbd: Add test to list snapshots of consistency group
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-11-09 11:19:35 +05:30