wait_for_replay_complete() doesn't wait for image status to get
updated. This didn't matter previously because these tests are run on
two different pools and nothing else was following.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
If a pool replayer is removed in an error state (e.g. after failing to
connect to the remote cluster), its callout should be removed as well.
Otherwise, the error would persist causing "daemon health: ERROR"
status to be reported even after a new pool replayer is created and
started successfully.
Fixes: https://tracker.ceph.com/issues/65487
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
1. Deploy 2 gateways on different nodes, then check for multi-path.
To add another gateway, only "roles" need to be changed in job yaml.
2. Create "n" nvmeof namespaces, configured by 'namespaces_count'.
3. Rename qa/suites/rbd/nvmeof/cluster/fixed-3.yaml to fixed-4.yaml
which contains 2 gateways and 2 initiators.
Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
By making use of here strings in commit ea3a567f7f ("qa/workunits:
make wait_for_status_in_pool_dir() reentrant") we grew a dependency on
bash.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Introduce functional tests to validate that the images under
workloads are correctly mirrored between two clusters using snapshot
based mirroring.
Run workload on a primary image using a krbd or nbd client. Take
mirror snapshots of the image under workload. Unmount the mapped image
and calculate its MD5 checksum before demoting it. After demotion,
wait for the mirror status of the image to be 'up+unknown' in both
the clusters. This is to make sure that the non-primary image in the
other cluster is ready to be promoted. Now promote the non-primary
image in the other cluster. Map the promoted image and calculate its
MD5 checksum. Verify that the checksums of the demoted and promoted
images in the two clusters are the same.
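The checksum comparison at the end of this procedure can be sketched as
follows. This is a minimal illustration, not the workunit's code:
`md5_of_stream()` and `images_match()` are hypothetical names, and
in-memory buffers stand in for the mapped krbd or nbd block devices.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Return the hex MD5 of a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def images_match(demoted, promoted):
    """Compare the checksum taken before demotion with the one taken
    after promotion on the other cluster."""
    return md5_of_stream(demoted) == md5_of_stream(promoted)


# Assumption: in-memory buffers stand in for the two mapped devices.
demoted_image = io.BytesIO(b"workload data" * 4096)
promoted_image = io.BytesIO(b"workload data" * 4096)
stale_image = io.BytesIO(b"workload data" * 4095)
```

Reading in chunks keeps memory use flat regardless of image size; with
real devices the streams would come from open(path, "rb").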
The above test is run as part of two different workunits:
- a workunit that validates the syncing of multiple mirrored images
with workloads running on them
- another workunit that validates the syncing of a single mirrored
image with a workload running on it, where the image is set as primary
alternately between the two clusters, as happens during failover and
failback scenarios.
Fixes: https://tracker.ceph.com/issues/61617
Signed-off-by: Ramana Raja <rraja@redhat.com>
Co-authored-by: Ilya Dryomov <idryomov@redhat.com>
Co-authored-by: Christopher Hoffman <choffman@redhat.com>
In rbd_mirror_helpers.sh, the `wait_for_status_in_pool_dir()` helper
stored `mirror image status` and `mirror pool status` command outputs
in files that could be shared over successive calls or calls from
multiple threads. Instead store the command outputs in local variables
to make `wait_for_status_in_pool_dir()` reentrant.
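The idea can be sketched in Python (the real helper is a bash function;
`wait_for_status()` and its parameters below are hypothetical). Each
call keeps the command output in a local variable, so overlapping calls
cannot read each other's results the way they could through a shared
file:

```python
import re
import time


def wait_for_status(run_status_cmd, state_pattern, attempts=5, delay=0.0):
    """Poll a status command until its output matches state_pattern.

    run_status_cmd is any callable returning the command output as a
    string (an assumption made to keep the sketch self-contained).
    """
    for _ in range(attempts):
        # The output lives only in this local variable, never in a file
        # that another call could overwrite.
        status = run_status_cmd()
        if re.search(state_pattern, status):
            return True
        time.sleep(delay)
    return False
```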
Signed-off-by: Ramana Raja <rraja@redhat.com>
This is v2 of the rbd/nvmeof test: it deploys 1 gateway and 1 initiator,
then does basic verification of nvme commands and runs fio.
This commit creates:
1. qa/tasks/nvmeof.py: adds a new 'Nvmeof' task which deploys
the gateway and shares config with the initiator hosts.
Sharing config was previously done by 'nvmeof_gateway_cfg' task
in qa/tasks/cephadm.py (that task is removed in this commit).
2. qa/workunits/rbd/nvmeof_basic_tests.sh:
Runs nvme commands (discovery, connect, connect-all, disconnect-all,
and list-subsys) and does basic verification of the output.
3. qa/workunits/rbd/nvmeof_fio_test.sh:
Runs fio command. Also runs iostat in parallel if IOSTAT_INTERVAL
variable is set. This variable configures the delay between each iostat
print.
nvmeof-cli upgrade from v0.0.6 to v0.0.7 introduced major changes
to all nvmeof commands. This commit changes v0.0.6 commands to
v0.0.7 in qa/workunits/rbd/nvmeof_initiator.sh
Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
Mapping rbd images to nbd devices using the ioctl interface is not
robust. It was discovered that the device size or the md5 checksum
of the nbd device was incorrect immediately after mapping using the
ioctl method. When using the nbd netlink interface to map RBD images,
the issue was not encountered. Switch to using the nbd netlink
interface for mapping.
Fixes: https://tracker.ceph.com/issues/64063
Signed-off-by: Ramana Raja <rraja@redhat.com>
Include device identifier or cookie in the message sent to the kernel
to resize images mapped to NBD devices using netlink. Otherwise,
netlink_resize() fails and the size of the device isn't updated.
Fixes: https://tracker.ceph.com/issues/64139
Signed-off-by: Ramana Raja <rraja@redhat.com>
A basic test for ceph-nvmeof[1] where
an nvmeof initiator is created.
It requires use of a new task "nvmeof_gateway_cfg"
under cephadm which shares config information
between two remote hosts.
[1] https://github.com/ceph/ceph-nvmeof/
Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
... when checking whether a rbd_support module command fails after
blocklisting the module's client.
In tests that check the recovery of the rbd_support module after its
client is blocklisted, the rbd_support module's client is
blocklisted using the `osd blocklist add` command. Next,
`osd blocklist ls` command is issued to confirm that the client is
blocklisted. A rbd_support module command is then issued and expected
to fail in order to verify that the blocklisting has affected the
rbd_support module's operations. Sometimes it was observed that before
this rbd_support module command reached the ceph-mgr, the rbd_support
module detected the blocklisting, recovered from it, and was able to
serve the command. To reduce the race window that occurs when trying to
verify that the rbd_support module's operation is affected by client
blocklisting, get rid of the `osd blocklist ls` command.
Fixes: https://tracker.ceph.com/issues/63673
Signed-off-by: Ramana Raja <rraja@redhat.com>
The idea is to avoid the maintenance of duplicate code in both the journal
and snapshot test scripts.
Usage:
RBD_MIRROR_MODE=journal rbd_mirror.sh
Use environment variable RBD_MIRROR_MODE to set the mode
Available modes: snapshot | journal
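As a rough illustration of selecting the mode from the environment, the
following Python sketch validates the variable against the two modes
listed above. The script itself is bash; `get_mirror_mode()` is a
hypothetical name, and rejecting an unset or unknown value is an
assumption rather than the script's documented behavior.

```python
VALID_MODES = ("snapshot", "journal")


def get_mirror_mode(environ):
    """Return the mirroring mode selected via RBD_MIRROR_MODE."""
    mode = environ.get("RBD_MIRROR_MODE")
    if mode not in VALID_MODES:
        # Assumption: an unset or unknown mode is treated as an error.
        raise ValueError(
            f"RBD_MIRROR_MODE must be one of {VALID_MODES}, got {mode!r}"
        )
    return mode
```

In a real script this would be called with os.environ.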
Fixes: https://tracker.ceph.com/issues/54312
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
... on repeated blocklisting of its client.
There were issues with the rbd_support module not being able to recover
from its RADOS client being repeatedly blocklisted. This occurred, for
example, in clusters with OSDs slow to process RBD requests while the
module's mirror_snapshot_scheduler was taking mirror snapshots by
requesting exclusive locks on the RBD images and workloads were running
on the snapshotted images via kernel clients.
Fixes: https://tracker.ceph.com/issues/62891
Signed-off-by: Ramana Raja <rraja@redhat.com>
Problem:
-------
Trying to disable any feature on an rbd image mapped with nbd causes
rbd-nbd to get stuck.
rbd-nbd registers a watcher callback to detect image resize in
NBDWatchCtx::handle_notify(). handle_notify() calls the image info method,
which calls refresh_if_required() and gets stuck there.
It is getting stuck in ImageState::refresh_if_required() because
DisableFeaturesRequest issues update notifications while still holding onto
the exclusive lock with everything that has to do with it blocked.
Solution:
--------
Set only notify flag as part of NBDWatchCtx::handle_notify() and handle
the resize detection part as part of a different thread.
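The pattern described in the solution can be sketched in Python. This
is not the rbd-nbd implementation (which is C++); `ResizeWatcher` and
`get_size` are hypothetical names. The notify callback only sets a
flag, and a separate worker thread performs the potentially blocking
size query:

```python
import threading


class ResizeWatcher:
    def __init__(self, get_size):
        self._get_size = get_size  # may block, so never call it from notify
        self._notified = threading.Event()
        self._stop = threading.Event()
        self.resized = threading.Event()
        self.size = get_size()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def handle_notify(self):
        # Callback context: just record that something changed and return.
        self._notified.set()

    def _run(self):
        while True:
            self._notified.wait()
            self._notified.clear()
            if self._stop.is_set():
                return
            # The blocking work happens here, outside the callback.
            new_size = self._get_size()
            if new_size != self.size:
                self.size = new_size
                self.resized.set()

    def stop(self):
        self._stop.set()
        self._notified.set()  # wake the worker so it can exit
        self._worker.join()
```

Because the callback returns immediately, whoever sends the
notification is never blocked by the size query.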
Fixes: https://tracker.ceph.com/issues/58740
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
For snapshot-based mirroring, check that demote (or other mirror)
snapshots don't pile up. Nothing in particular to assert on for
journal-based mirroring but the test is still useful.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
It's the one we are using for all recent distros.
While at it, get rid of custom bin directory -- it appears that both
v2.3.0 and v2.11.0 tests are happy with just symlinks in the current
directory.
Fixes: https://tracker.ceph.com/issues/61565
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* use fixtures for temporary images and groups
* use pytest.skip instead of nose.SkipTest
* replace setUp/tearDown with setup/teardown_method
* add @pytest.mark.skip_if_crimson
* replace nose assertions
Signed-off-by: Casey Bodley <cbodley@redhat.com>
The current version is pretty useless:
- "rbd bench" writes the same byte (0xff) over and over again, so
almost all checksumming is in vain
- snapshots are taken in a steady state (i.e. not under I/O), so no
race conditions can get exposed
- even with these caveats, it's not wired up into the suite
Redo this workunit to be a reliable reproducer for the issue fixed
in the previous commit and wire it up for both krbd and rbd-nbd.
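Why a uniform byte pattern defeats checksumming can be shown with a
small Python sketch. The helper names are hypothetical, and reversing
the block order is just one stand-in for data ending up in the wrong
place:

```python
import hashlib

BLOCK_SIZE = 4096


def reverse_blocks(data, block_size=BLOCK_SIZE):
    """Simulate blocks landing at the wrong offsets by reversing them."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return b"".join(reversed(blocks))


def md5(data):
    return hashlib.md5(data).hexdigest()


# The pattern the text attributes to "rbd bench": the same byte everywhere.
uniform = b"\xff" * (BLOCK_SIZE * 4)
# Assumption: a workload whose blocks differ from one another.
varied = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
```

Here md5(reverse_blocks(uniform)) equals md5(uniform), so the
corruption goes unnoticed, while the same reordering of `varied`
changes its checksum.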
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Otherwise the setting doesn't take effect. While at it, replace
home-grown stringify() with standard to_string().
Fixes: https://tracker.ceph.com/issues/58833
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
"rbd feature disable" appears to reliably hang if the corresponding
remote request is proxied to rbd-nbd (because rbd-nbd happens to own
the exclusive lock after a series of blkdiscard calls) [1]. Work
around it here by enabling journaling before the image is mapped
and disabling it after the image is unmapped.
Also, don't assert on the output of "rbd journal inspect --verbose"
having a certain number of entries. This is racy: if the script gets
delayed after the last blkdiscard call for some reason, there may be
fewer entries present in the journal or none at all.
[1] https://tracker.ceph.com/issues/58740
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The existing
xmlstarlet sel -t -v '//mirror/peers/peer[1]/uuid')" = ""
test is bogus since a tx-only peer gets added after the remote
rbd-mirror daemon pings the local cluster. It happened to pass most
of the time because xmlstarlet filter just failed on an empty peers
array, producing the wrongly expected empty string by accident.
Fixes: https://tracker.ceph.com/issues/58688
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This commit fixes commit 7ca1bab90f by pushing properly aligned
discards back to m_image_extents, if corrected.
If discards are misaligned (off 0, len 4608, gran=4096), they are
corrected properly, but only in object_extents and not in
m_image_extents.
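The arithmetic behind the example can be sketched in Python. This is an
illustration only: `align_discard()` is a hypothetical name, and
shrinking the extent inward (start rounded up, end rounded down) is an
assumption about how the correction is applied.

```python
def align_discard(offset, length, granularity):
    """Shrink [offset, offset + length) inward to granularity boundaries."""
    if granularity <= 0:
        return offset, length
    start = -(-offset // granularity) * granularity         # round up
    end = ((offset + length) // granularity) * granularity  # round down
    if start >= end:
        return start, 0  # nothing aligned is left to discard
    return start, end - start


# The case from the text: off 0, len 4608, gran 4096.
aligned = align_discard(0, 4608, 4096)
```

Under that assumption the discard becomes (0, 4096), while the
uncorrected image extent still says (0, 4608).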
When journal_append_event is triggered it will only append from
m_image_extents and does not know about the alignment fixes. In
commit_io_events_extent it will log a message and return without
completing the io since the larger misaligned area was sent to the journal.
This will in turn break rbd journal mirroring since the local client will wait
indefinitely for the commit to be completed, which never happens.
This does not affect rbd-mirror in any way, which may be confusing and
dangerous since it's only rbd-mirror that updates ceph health, and not
the local client.
Setting `rbd_skip_partial_discard = false` under client will restore the
pre 7ca1bab behaviour and thus not trigger the bug with journals growing.
This will set `rbd_discard_granularity_bytes = 0` internally. This
setting is only changed during startup of a client.
Fixes: 7ca1bab90f
Fixes: https://tracker.ceph.com/issues/57396
Signed-off-by: Josef Johansson <josef@oderland.se>
Note that we are hitting https://tracker.ceph.com/issues/58160 here
because by the time we get to "rbd resize" RAW_DEV mapping owns the
lock (due to a write to /dev/mapper/cryptsetupdev being last).
While at it, resurrect the ability to easily run this script on
vstart clusters -- see commit f737c2855a ("qa/workunits/rbd: make
luks-encryption test work on vstart cluster").
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
If no --encryption-format is specified at all, default to "luks" for each
specified --encryption-passphrase-file.
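The defaulting rule can be sketched in Python. The rbd CLI is C++;
`resolve_encryption_formats()` is a hypothetical name, and rejecting a
count mismatch when formats are given is an assumption.

```python
def resolve_encryption_formats(formats, passphrase_files):
    """Return one encryption format per passphrase file."""
    if not formats:
        # No --encryption-format at all: use "luks" for every passphrase file.
        return ["luks"] * len(passphrase_files)
    if len(formats) != len(passphrase_files):
        # Assumption: a partial list of formats is treated as an error.
        raise ValueError("need one encryption format per passphrase file")
    return list(formats)
```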
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Since RBD_ENCRYPTION_FORMAT_LUKS1, RBD_ENCRYPTION_FORMAT_LUKS2
and RBD_ENCRYPTION_FORMAT_LUKS aren't treated the same when loading
encryption anymore, "luks1" and "luks2" formats need to be accepted
in addition to "luks" format.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
One of the stated goals is compatibility with standard LUKS tools,
in particular being able to load encryption on images formatted with
cryptsetup. cryptsetup doesn't do this and this really interferes
with randomly generated (binary) passphrases.
While at it, open passphrase files as binary -- it communicates the
intent if nothing else on POSIX.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Commit e0da2a4e8c ("qa/workunits/rbd: Add test to list snapshots of
consistency group") added bash-specific syntax.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
rados.ObjectNotFound exception handler was referencing ioctx variable
which is assigned only if the pool exists and rados.open_ioctx() call
succeeds. This led to a fatal error
mgr[rbd_support] Failed to locate pool mypool
mgr[rbd_support] execute_task: [errno 2] error opening pool 'b'mypool''
mgr[rbd_support] Fatal runtime error: local variable 'ioctx' referenced before assignment
and wedged the task queue. No other commands were processed until
ceph-mgr daemon restart.
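The failure mode can be reproduced with a short Python sketch. The
classes and function names below are hypothetical stand-ins, not the
rbd_support module's code: the handler touches a variable that is bound
only when the open call succeeds.

```python
class ObjectNotFound(Exception):
    """Stand-in for rados.ObjectNotFound (hypothetical)."""


class FakeCluster:
    """Hypothetical stand-in for a cluster handle with open_ioctx()."""

    def __init__(self, pools):
        self.pools = set(pools)

    def open_ioctx(self, pool_name):
        if pool_name not in self.pools:
            raise ObjectNotFound(f"error opening pool '{pool_name}'")
        return f"ioctx:{pool_name}"


def execute_task_buggy(cluster, pool_name):
    try:
        ioctx = cluster.open_ioctx(pool_name)
        return f"ran task on {ioctx}"
    except ObjectNotFound:
        # BUG: 'ioctx' was never bound because open_ioctx() raised.
        return f"pool gone, cleaning up {ioctx}"


def execute_task_fixed(cluster, pool_name):
    try:
        ioctx = cluster.open_ioctx(pool_name)
    except ObjectNotFound:
        # Only names bound before the failing call are used here.
        return f"pool {pool_name} not found, task dropped"
    return f"ran task on {ioctx}"
```

With a missing pool, the buggy variant raises UnboundLocalError from
inside the handler, while the fixed variant reports the missing pool
and returns.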
Fixes: https://tracker.ceph.com/issues/52932
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>