RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2024-12-20 18:33:44 +00:00

Author	SHA1	Message	Date
Ilya Dryomov	d1d848276f	qa/workunits/rbd: wait for replaying status in bootstrap tests wait_for_replay_complete() doesn't wait for image status to get updated. This didn't matter previously because these tests are run on two different pools and nothing else was following. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-05-06 11:47:52 +02:00
Ilya Dryomov	b7e79642d5	rbd-mirror: remove callout when destroying pool replayer If a pool replayer is removed in an error state (e.g. after failing to connect to the remote cluster), its callout should be removed as well. Otherwise, the error would persist causing "daemon health: ERROR" status to be reported even after a new pool replayer is created and started successfully. Fixes: https://tracker.ceph.com/issues/65487 Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-05-05 21:11:54 +02:00
Ilya Dryomov	c870ead3d4	Merge pull request #55595 from VallariAg/wip-nvmeof-test-v3 qa/suite/rbd/nvmeof: Deploy multiple gateways and namespaces Reviewed-by: Barak Davidov <barakda@il.ibm.com> Reviewed-by: Aviv Caro <Aviv.Caro@ibm.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com>	2024-03-20 10:49:36 +01:00
Vallari Agrawal	00651cfac2	qa/suite/rbd/nvmeof: Deploy multiple gateways and namespaces 1. Deploy 2 gateways on different nodes, then check for multi-path. To add another gateway, only "roles" need to be changed in job yaml. 2. Create "n" nvmeof namespaces, configured by 'namespaces_count' 3. Rename qa/suites/rbd/nvmeof/cluster/fixed-3.yaml to fixed-4.yaml which contains 2 gateways and 2 initiators. Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>	2024-03-19 20:48:26 +05:30
Ilya Dryomov	166a236237	qa/workunits/rbd: switch rbd-mirror workunits to bash By making use of here strings in commit `ea3a567f7f` ("qa/workunits: make wait_for_status_in_pool_dir() reentrant") we grew a dependency on bash. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-03-10 18:19:57 +01:00
Ilya Dryomov	fa5ef874ac	Merge pull request #54802 from ajarr/wip-61617 qa: Add tests to validate synced images on rbd-mirror Reviewed-by: Ilya Dryomov <idryomov@gmail.com>	2024-02-23 23:47:42 +01:00
Ramana Raja	b7aae5c3c5	qa: Add tests to validate syncing of images using rbd-mirror Introduce functional tests to validate that the images under workloads are correctly mirrored between two clusters using snapshot based mirroring. Run workload on a primary image using a krbd or nbd client. Take mirror snapshots of the image under workload. Unmount the mapped image and calculate its MD5 checksum before demoting it. After demotion, wait for the mirror status of the image to be 'up+unknown' in both the clusters. This is to make sure that the non-primary image in the other cluster is ready to be promoted. Now promote the non-primary image in the other cluster. Map the promoted image and calculate its MD5 checksum. Verify that the checksums of the demoted and promoted images in the two clusters are the same. The above test is run as part of two different workunits: - a workunit that validates the syncing of multiple mirrored images with workloads running on them - another workunit that validates the syncing of a single mirrored image with workload running on it and the image is set as primary alternatively between the two clusters, as it happens during failover and failback scenarios. Fixes: https://tracker.ceph.com/issues/61617 Signed-off-by: Ramana Raja <rraja@redhat.com> Co-authored-by: Ilya Dryomov <idryomov@redhat.com> Co-authored-by: Christopher Hoffman <choffman@redhat.com>	2024-02-22 11:44:36 -05:00
Ramana Raja	ea3a567f7f	qa/workunits: make wait_for_status_in_pool_dir() reentrant In rbd_mirror_helpers.sh, the `wait_for_status_in_pool_dir()` helper stored `mirror image status` and `mirror pool status` command outputs in files that could be shared over successive calls or calls from multiple threads. Instead store the command outputs in local variables to make `wait_for_status_in_pool_dir()` reentrant. Signed-off-by: Ramana Raja <rraja@redhat.com>	2024-02-22 11:44:28 -05:00
Mykola Golub	5442f7eb21	tools/rbd: make 'children' command support --image-id Fixes: https://tracker.ceph.com/issues/64376 Signed-off-by: Mykola Golub <mykola.golub@clyso.com>	2024-02-13 15:50:32 +00:00
Vallari Agrawal	1713c4852c	qa: add qa/tasks/nvmeof.py and rbd/nvmeof_basic_task and fio workunits This is v2 of the rbd/nvmeof test: It deploys 1 gateway and 1 initiator. Then does basic verification on nvme commands and runs fio. This commit creates: 1. qa/tasks/nvmeof.py: adds a new 'Nvmeof' task which deploys the gateway and shares config with the initiator hosts. Sharing config was previously done by 'nvmeof_gateway_cfg' task in qa/tasks/cephadm.py (that task is removed in this commit). 2. qa/workunits/rbd/nvmeof_basic_tests.sh: Runs nvme commands (discovery, connect, connect-all, disconnect-all, and list-subsys) and does basic verification of the output. 3. qa/workunits/rbd/nvmeof_fio_test.sh: Runs fio command. Also runs iostat in parallel if IOSTAT_INTERVAL variable is set. This variable configures the delay between each iostat print. nvmeof-cli upgrade from v0.0.6 to v0.0.7 introduced major changes to all nvmeof commands. This commit changes v0.0.6 commands to v0.0.7 in qa/workunits/rbd/nvmeof_initiator.sh Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>	2024-02-12 13:00:09 +05:30
Ramana Raja	fcbf7367d2	rbd-nbd: map using netlink interface by default Mapping rbd images to nbd devices using ioctl interface is not robust. It was discovered that the device size or the md5 checksum of the nbd device was incorrect immediately after mapping using ioctl method. When using the nbd netlink interface to map RBD images the issue was not encountered. Switch to using nbd netlink interface for mapping. Fixes: https://tracker.ceph.com/issues/64063 Signed-off-by: Ramana Raja <rraja@redhat.com>	2024-01-25 11:00:59 -05:00
Ramana Raja	1eebb7ba79	rbd_nbd: fix resize of images mapped using netlink Include device identifier or cookie in the message sent to the kernel to resize images mapped to NBD devices using netlink. Otherwise, netlink_resize() fails and the size of the device isn't updated. Fixes: https://tracker.ceph.com/issues/64139 Signed-off-by: Ramana Raja <rraja@redhat.com>	2024-01-24 15:33:50 -05:00
Ilya Dryomov	d9147a14c4	Merge pull request #54205 from VallariAg/wip-nvmeof-test qa: add rbd/nvmeof integration test Reviewed-by: Zack Cerza <zack@redhat.com> Reviewed-by: Aviv Caro <Aviv.Caro@ibm.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com>	2023-12-04 18:14:38 +01:00
Vallari Agrawal	42e121a42a	qa: add rbd/nvmeof test A basic test for ceph-nvmeof[1] where nvmeof initiator is created. It requires use of a new task "nvmeof_gateway_cfg" under cephadm which shares config information between two remote hosts. [1] https://github.com/ceph/ceph-nvmeof/ Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>	2023-12-04 19:27:54 +05:30
Ramana Raja	ea033fe860	qa/workunits/rbd/cli_generic.sh: narrow race window ... when checking whether a rbd_support module command fails after blocklisting the module's client. In tests that check the recovery of the rbd_support module after its client is blocklisted, the rbd_support module's client is blocklisted using the `osd blocklist add` command. Next, `osd blocklist ls` command is issued to confirm that the client is blocklisted. A rbd_support module command is then issued and expected to fail in order to verify that the blocklisting has affected the rbd_support module's operations. Sometimes it was observed that before this rbd_support module command reached the ceph-mgr, the rbd_support module detected the blocklisting, recovered from it, and was able to serve the command. To reduce the race window that occurs when trying to verify that the rbd_support module's operation is affected by client blocklisting, get rid of the `osd blocklist ls` command. Fixes: https://tracker.ceph.com/issues/63673 Signed-off-by: Ramana Raja <rraja@redhat.com>	2023-11-29 13:49:06 -05:00
Suyashd999	9b773eec4a	qa/suites/rbd: Cleanup of MIRROR_IMAGE_MODE Fixes: https://tracker.ceph.com/issues/63431 Signed-off-by: Suyash Dongre <suyashd999@gmail.com>	2023-11-14 18:28:02 +05:30
Ilya Dryomov	c93a53aa66	Merge pull request #48508 from pkalever/rbd-tests qa/workunits/rbd: merge journal and snapshot test scripts Reviewed-by: Ramana Raja <rraja@redhat.com> Reviewed-by: Mykola Golub <mgolub@suse.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 12:55:02 +01:00
Prasanna Kumar Kalever	3fd8a03887	qa/workunits/rbd: merge journal and snapshot test scripts The idea is to avoid the maintenance of duplicate code in both the journal and snapshot test scripts. Usage: RBD_MIRROR_MODE=journal rbd_mirror.sh Use environment variable RBD_MIRROR_MODE to set the mode Available modes: snapshot | journal Fixes: https://tracker.ceph.com/issues/54312 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2023-11-02 18:11:55 +05:30
Ilya Dryomov	c5eb0ce432	Merge pull request #53535 from ajarr/wip-62891 qa/suites/rbd: add test to check rbd_support module recovery Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mykola Golub <mgolub@suse.com>	2023-11-01 10:45:59 +01:00
Ilya Dryomov	bf82a7bd34	Merge pull request #50593 from pkalever/fix-feature-disable rbd-nbd: fix stuck with disable request Reviewed-by: Mykola Golub <mgolub@suse.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com>	2023-10-30 08:55:09 +01:00
Ramana Raja	2f2cd3bcff	qa/suites/rbd: add test to check rbd_support module recovery ... on repeated blocklisting of its client. There were issues with rbd_support module not being able to recover from its RADOS client being repeatedly blocklisted. This occurred for example in clusters with OSDs slow to process RBD requests while the module's mirror_snapshot_scheduler was taking mirror snapshots by requesting exclusive locks on the RBD images and workloads were running on the snapshotted images via kernel clients. Fixes: https://tracker.ceph.com/issues/62891 Signed-off-by: Ramana Raja <rraja@redhat.com>	2023-10-10 12:58:19 -04:00
Ilya Dryomov	237aa221eb	qa/suites/krbd: stress test for recovering from watch errors Fixes: https://tracker.ceph.com/issues/63010 Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-10-02 12:21:12 +02:00
Prasanna Kumar Kalever	dbb4daff40	rbd-nbd: fix stuck with disable request Problem: ------- Trying to disable any feature on an rbd image mapped with nbd leads to stuck in rbd-nbd. The rbd-nbd registers a watcher callback to detect image resize in NBDWatchCtx::handle_notify(). The handle_notify calls image info method, which calls refresh_if_required and it got stuck there. It is getting stuck in ImageState::refresh_if_required() because DisableFeaturesRequest issues update notifications while still holding onto the exclusive lock with everything that has to do with it blocked. Solution: -------- Set only notify flag as part of NBDWatchCtx::handle_notify() and handle the resize detection part as part of a different thread. Fixes: https://tracker.ceph.com/issues/58740 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2023-09-21 11:18:03 +05:30
Ilya Dryomov	153df2d64b	qa: add "failover / failback loop" test for rbd-mirror For snapshot-based mirroring, check that demote (or other mirror snapshots) don't pile up. Nothing in particular to assert on for journal-based mirroring but the test is still useful. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-09-01 19:05:36 +02:00
Ilya Dryomov	d49df8d74c	qa/workunits/rbd: use jammy version of qemu-iotests for centos 9 It's the one we are using for all recent distros. While at it, get rid of custom bin directory -- it appears that both v2.3.0 and v2.11.0 tests are happy with just symlinks in the current directory. Fixes: https://tracker.ceph.com/issues/61565 Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-07-25 14:00:04 +02:00
Casey Bodley	af04457a43	test/pybind/rbd: convert from nose to pytest * use fixtures for temporary images and groups * use pytest.skip instead of nose.SkipTest * replace setUp/tearDown with setup/teardown_method * add @pytest.mark.skip_if_crimson * replace nose assertions Signed-off-by: Casey Bodley <cbodley@redhat.com>	2023-07-06 11:02:11 -04:00
Ilya Dryomov	acb270a3dd	qa/workunits/rbd: make continuous export-diff test actually work The current version is pretty useless: - "rbd bench" writes the same byte (0xff) over and over again, so almost all checksumming is in vain - snapshots are taken in a steady state (i.e. not under I/O), so no race conditions can get exposed - even with these caveats, it's not wired up into the suite Redo this workunit to be a reliable reproducer for the issue fixed in the previous commit and wire it up for both krbd and rbd-nbd. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-06-20 22:14:39 +02:00
Matan	653b97e472	Merge pull request #51388 from Matan-B/wip-matanb-c-enable-rbd-tests qa/suites/crimson: Enhance rbd api testing Reviewed-by: Samuel Just <sjust@redhat.com> Reviewed-by: Radosław Zarzyński <rzarzyns@redhat.com>	2023-05-11 16:28:55 +02:00
Ramana Raja	a2f15d4b2f	qa/workunits/rbd: Add tests for rbd_support module recovery ... after the module's RADOS client is blocklisted. Signed-off-by: Ramana Raja <rraja@redhat.com>	2023-05-08 16:45:41 -04:00
Matan Breizman	5823c04542	qa/suites/crimson: Skip unsupported tests (Crimson) Align with `rbd_api_tests` and skip deep_copy and breaklock tests in Crimson. Signed-off-by: Matan Breizman <mbreizma@redhat.com>	2023-05-08 10:57:06 +00:00
Josh Soref	965ee91d3f	rbd: fix spelling errors * acquire * are * asynchronous * attempt * bootstrap * concurrent * consume * couldn't * cumulative * disable * disabling * disaster * disconnected * endianness * entries * exclusive * filesystem * flag * generic * github * image * information * initiating * latency * limitations * metadata * modify * namespace * noautoconsole * ourselves * prefetch * propagate * protection * recorder * recover * release * replicated * reserved * selection * sentinel * several * snapshot * source * specifying * suppress * synchronize * the * transfer * triggering * unknown * validation * version * visible * write log entries Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>	2023-04-26 09:30:53 -04:00
Ilya Dryomov	3b1610997a	qa/workunits/rbd: use bionic version of qemu-iotests for jammy Same as in commit `2de2146c30` ("qa/workunits/rbd: use bionic version of qemu-iotests for focal"). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-03-15 17:12:36 +01:00
Matan Breizman	b73d8fd860	qa/*/crimson: Separate Crimson's rbd api testing Signed-off-by: Matan Breizman <mbreizma@redhat.com>	2023-03-07 08:57:03 +00:00
Ilya Dryomov	b21a379c5b	librbd: call apply_changes() after setting librados_thread_count Otherwise the setting doesn't take effect. While at it, replace home-grown stringify() with standard to_string(). Fixes: https://tracker.ceph.com/issues/58833 Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-02-23 12:50:45 +01:00
Ilya Dryomov	f4edd7728a	Merge pull request #49614 from isodude/wip-librbd-misalign-discard librbd: Fix local rbd mirror journals growing forever Reviewed-by: Mykola Golub <mgolub@suse.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com>	2023-02-17 18:09:39 +01:00
Ilya Dryomov	fcfef0a19e	qa/workunits/rbd-nbd: work around "rbd feature disable" hang "rbd feature disable" appears to reliably hang if the corresponding remote request is proxied to rbd-nbd (because rbd-nbd happens to own the exclusive lock after a series of blkdiscard calls) [1]. Work around it here by enabling journaling before the image is mapped and disabling it after the image is unmapped. Also, don't assert on the output of "rbd journal inspect --verbose" having a certain number of entries. This is racy: if the script gets delayed after the last blkdiscard call for some reason, there may be fewer entries present in the journal or none at all. [1] https://tracker.ceph.com/issues/58740 Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-02-16 13:05:05 +01:00
Ilya Dryomov	5cec2670be	qa/suites/rbd: fix sporadic "rx-only direction" test failures The existing xmlstarlet sel -t -v '//mirror/peers/peer[1]/uuid')" = "" test is bogus since a tx-only peer gets added after the remote rbd-mirror daemon pings the local cluster. It happened to pass most of the time because xmlstarlet filter just failed on an empty peers array, producing the wrongly expected empty string by accident. Fixes: https://tracker.ceph.com/issues/58688 Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-02-10 15:26:27 +01:00
Josef Johansson	21a26a7528	librbd: Fix local rbd mirror journals growing forever This commit fixes commit `7ca1bab90f` by pushing properly aligned discards back to m_image_extents, if corrected. If discards are misaligned (off 0, len 4608, gran=4096), they are corrected properly, but only in object_extents and not in m_image_extents. When journal_append_event is triggered it will only append from m_image_extents and does not know about the alignment fixes. In commit_io_events_extent it will log a message and return without completing the io since the larger misaligned area was sent to the journal. This will in turn break rbd journal mirroring since the local client will wait indefinitely on the commit to be completed, which it never does. This does not affect rbd-mirror in any way, which may be confusing and dangerous since it's only rbd-mirror that updates ceph health, and not the local client. Setting `rbd_skip_partial_discard = false` under client will restore the pre `7ca1bab` behaviour and thus not trigger the bug with journals growing. This will set `rbd_discard_granularity_bytes = 0` internally. This setting is only changed during startup of a client. Fixes: `7ca1bab90f` Fixes: https://tracker.ceph.com/issues/57396 Signed-off-by: Josef Johansson <josef@oderland.se>	2023-01-20 11:59:16 +01:00
Ilya Dryomov	8780f602a9	Merge pull request #48618 from idryomov/rbd-clone-encryption-part2 librbd: add encryption format support for clones (part 2/2) Reviewed-by: Mykola Golub <mgolub@suse.com> Acked-by: Or Ozeri <oro@il.ibm.com>	2022-12-05 17:47:19 +01:00
Ilya Dryomov	8d5d478532	qa/workunits/rbd: add encryption-aware resize test Note that we are hitting https://tracker.ceph.com/issues/58160 here because by the time we get to "rbd resize" RAW_DEV mapping owns the lock (due to a write to /dev/mapper/cryptsetupdev being last). While at it, resurrect the ability to easily run this script on vstart clusters -- see commit `f737c2855a` ("qa/workunits/rbd: make luks-encryption test work on vstart cluster"). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-12-04 18:24:10 +01:00
Ilya Dryomov	a27ee2bdf8	rbd, rbd-nbd: make --encryption-format optional If no --encryption-format specified at all, default to "luks" for each specified --encryption-passphrase-file. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-12-04 18:19:19 +01:00
Ilya Dryomov	e62e3b6613	rbd, rbd-nbd: accept "luks", "luks1" and "luks2" formats Since RBD_ENCRYPTION_FORMAT_LUKS1, RBD_ENCRYPTION_FORMAT_LUKS2 and RBD_ENCRYPTION_FORMAT_LUKS aren't treated the same when loading encryption anymore, "luks1" and "luks2" formats need to be accepted in addition to "luks" format. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-12-04 18:19:19 +01:00
Ilya Dryomov	d642f7804b	rbd, rbd-nbd: don't strip trailing newline in passphrase files One of the stated goals is compatibility with standard LUKS tools, in particular being able to load encryption on images formatted with cryptsetup. cryptsetup doesn't do this and this really interferes with randomly generated (binary) passphrases. While at it, open passphrase files as binary -- it communicates the intent if nothing else on POSIX. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-12-04 18:19:19 +01:00
Ilya Dryomov	8f712733af	qa: rbd_groups.sh: change interpreter to bash Commit `e0da2a4e8c` ("qa/workunits/rbd: Add test to list snapshots of consistency group") added bash-specific syntax. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-12-04 13:20:44 +01:00
Ilya Dryomov	9ca2ec704e	Merge pull request #48549 from pkalever/snap-list cls/rbd: update last_read in group::snap_list Reviewed-by: Mykola Golub <mgolub@suse.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com>	2022-12-02 13:18:08 +01:00
Ilya Dryomov	af6ed506f2	Merge pull request #48680 from pkalever/snap-id rbd: add --snap-id option to "rbd device map" to allow mapping arbitrary snapshots Reviewed-by: Ilya Dryomov <idryomov@gmail.com>	2022-11-27 14:10:31 +01:00
Ilya Dryomov	4a7150cd36	qa/workunits/rbd-nbd: clear DEV after detach tests Otherwise we attempt to unmap it in cleanup(), needlessly. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-11-26 13:27:33 +01:00
Ilya Dryomov	5a425927ed	mgr/rbd_support: avoid wedging the task queue if pool is removed rados.ObjectNotFound exception handler was referencing ioctx variable which is assigned only if the pool exists and rados.open_ioctx() call succeeds. This led to a fatal error mgr[rbd_support] Failed to locate pool mypool mgr[rbd_support] execute_task: [errno 2] error opening pool 'b'mypool'' mgr[rbd_support] Fatal runtime error: local variable 'ioctx' referenced before assignment and wedged the task queue. No other commands were processed until ceph-mgr daemon restart. Fixes: https://tracker.ceph.com/issues/52932 Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-11-23 23:11:42 +01:00
Prasanna Kumar Kalever	92480e6561	qa/workunits/rbd: added tests for --snap-id Fixes: https://tracker.ceph.com/issues/57902 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2022-11-10 19:28:30 +05:30
Prasanna Kumar Kalever	e0da2a4e8c	qa/workunits/rbd: Add test to list snapshots of consistency group Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2022-11-09 11:19:35 +05:30

664 Commits