Include the device identifier, or cookie, in the netlink message sent to
the kernel to resize an image mapped to an NBD device. Otherwise,
netlink_resize() fails and the size of the device isn't updated.
Fixes: https://tracker.ceph.com/issues/64139
Signed-off-by: Ramana Raja <rraja@redhat.com>
Problem:
-------
Disabling any feature on an rbd image mapped with rbd-nbd causes rbd-nbd
to get stuck.
rbd-nbd registers a watcher callback, NBDWatchCtx::handle_notify(), to
detect image resizes. handle_notify() calls the image info method, which
calls refresh_if_required() and gets stuck there.
It gets stuck in ImageState::refresh_if_required() because
DisableFeaturesRequest issues update notifications while still holding
the exclusive lock, with everything that has to do with it blocked.
Solution:
--------
Only set a notify flag in NBDWatchCtx::handle_notify() and handle
resize detection in a separate thread.
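
The split between the callback and the worker thread can be sketched as
follows. This is a minimal Python illustration of the pattern only, not the
rbd-nbd C++ code: the class name ResizeWatcher and the get_size/on_resize
callables are hypothetical and do not exist in Ceph.

```python
import threading


class ResizeWatcher:
    """Hypothetical illustration of the notify-flag pattern."""

    def __init__(self, get_size, on_resize):
        self._get_size = get_size    # may block, so never call it from handle_notify
        self._on_resize = on_resize
        self._notified = threading.Event()
        self._stop = threading.Event()
        self._size = get_size()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def handle_notify(self):
        # Watch callback: only record that something changed, then return.
        self._notified.set()

    def _run(self):
        # Worker thread: the potentially blocking size query happens here.
        while True:
            self._notified.wait()
            if self._stop.is_set():
                return
            self._notified.clear()
            new_size = self._get_size()
            if new_size != self._size:
                self._size = new_size
                self._on_resize(new_size)

    def shutdown(self):
        self._stop.set()
        self._notified.set()         # wake the worker so it can exit
        self._thread.join()
```

Because handle_notify() never queries the image itself, it cannot block on
a refresh while the notifier still holds the lock.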
Fixes: https://tracker.ceph.com/issues/58740
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Otherwise the setting doesn't take effect. While at it, replace
home-grown stringify() with standard to_string().
Fixes: https://tracker.ceph.com/issues/58833
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
"rbd feature disable" appears to reliably hang if the corresponding
remote request is proxied to rbd-nbd (because rbd-nbd happens to own
the exclusive lock after a series of blkdiscard calls) [1]. Work
around it here by enabling journaling before the image is mapped
and disabling it after the image is unmapped.
Also, don't assert on the output of "rbd journal inspect --verbose"
having a certain number of entries. This is racy: if the script gets
delayed after the last blkdiscard call for some reason, there may be
fewer entries present in the journal or none at all.
[1] https://tracker.ceph.com/issues/58740
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This commit fixes commit 7ca1bab90f by pushing properly aligned
discards back to m_image_extents, if corrected.
If discards are misaligned (off 0, len 4608, gran=4096), they are
corrected properly, but only in object_extents and not in
m_image_extents.
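
As a worked example of the correction, here is a minimal Python sketch. It
assumes that alignment rounds the start of the discard up and the end down
to the granularity; align_discard and the image_extents list are
hypothetical names used for illustration, not librbd code.

```python
def align_discard(offset, length, granularity):
    """Shrink a discard to the largest granularity-aligned range inside it.

    Returns (offset, length); length is 0 when no aligned range fits.
    Assumption: the start is rounded up and the end is rounded down.
    """
    if granularity <= 0:
        return offset, length                      # alignment disabled
    start = -(-offset // granularity) * granularity           # round up
    end = ((offset + length) // granularity) * granularity    # round down
    if end <= start:
        return start, 0
    return start, end - start


# The extents a journal append would read must hold the corrected values.
image_extents = [(0, 4608)]
granularity = 4096
image_extents = [align_discard(off, ln, granularity) for off, ln in image_extents]
print(image_extents)  # [(0, 4096)]
```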
When journal_append_event is triggered it will only append from
m_image_extents and does not know about the alignment fixes. In
commit_io_events_extent it will log a message and return without
completing the io since the larger misaligned area was sent to the journal.
This will in turn break rbd journal mirroring since the local client will wait
indefinitely for the commit to complete, which it never does.
This does not affect rbd-mirror in any way, which may be confusing and
dangerous since it's only rbd-mirror that updates ceph health, and not
the local client.
Setting `rbd_skip_partial_discard = false` under client will restore the
pre 7ca1bab behaviour and thus not trigger the bug with journals growing.
This will set `rbd_discard_granularity_bytes = 0` internally. This
setting is only changed during startup of a client.
Fixes: 7ca1bab90f
Fixes: https://tracker.ceph.com/issues/57396
Signed-off-by: Josef Johansson <josef@oderland.se>
This patch includes two new test cases:
a. map/unmap test with only the image name and
b. map/unmap test after changing the default pool, which expects the image
to come from the new default pool.
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
For `detach`, failing to find the process is fatal, while `unmap`
will still try to send a disconnect to the device.
Signed-off-by: Mykola Golub <mgolub@suse.com>
The commands allow restarting a daemon without destroying the nbd
device.
Now, if netlink is used, a dead connection timeout is set during
nbd device setup, so the device is not immediately released
if the rbd-nbd process terminates without a disconnect (unmap).
The detach command just sends a terminate signal to the rbd-nbd
process. The attach command starts a new process and connects to
the existing device.
Signed-off-by: Mykola Golub <mgolub@suse.com>
Previously there was still a race: unmap_device could return success
because the device was not found in `rbd-nbd list-mapped` (the nbd
device was removed), but the test would fail because the process was
still found in the ps table.
Fixes: https://tracker.ceph.com/issues/47394
Signed-off-by: Mykola Golub <mgolub@suse.com>
In recent versions `rbd list-mapped` does not print the white space
at the end of the line.
Fixes: https://tracker.ceph.com/issues/45305
Signed-off-by: Mykola Golub <mgolub@suse.com>
The unmap action only sends a signal to the kernel to notify the
rbd-nbd daemon to disconnect. Therefore, it's possible that an
unmap followed by an immediate re-map to the same device might
fail since the unmap is still in-progress.
Fixes: https://tracker.ceph.com/issues/44567
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The "unmap" request is asynchronous, so wait for a short amount
of time for the "rbd-nbd" daemon process to exit.
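
A bounded wait of this kind can be sketched in Python as below. The helper
names wait_until and process_gone, the timeout, and the polling interval
are illustrative assumptions and are not taken from the test script.

```python
import os
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def process_gone(pid):
    """Return True once no process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)          # signal 0 only checks for existence
    except ProcessLookupError:
        return True
    except PermissionError:
        return False             # exists, but owned by another user
    return False
```

A caller would then check something like
`wait_until(lambda: process_gone(pid), timeout=10)` after requesting the
unmap, and fail only if the wait returns False.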
Fixes: http://tracker.ceph.com/issues/39598
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The pool and namespace can now be specified as in a
<pool-name>[/<namespace-name>] format as positional
arguments.
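
To illustrate the format, here is a small Python sketch that splits such a
positional argument. parse_pool_spec is a hypothetical helper, and its
validation rules are assumptions rather than the actual argument parser.

```python
def parse_pool_spec(spec):
    """Split '<pool-name>[/<namespace-name>]' into (pool, namespace).

    The namespace is None when it is omitted.
    """
    pool, sep, namespace = spec.partition("/")
    if not pool:
        raise ValueError("pool name is required")
    if sep and not namespace:
        raise ValueError("namespace name is empty")
    if "/" in namespace:
        raise ValueError("unexpected extra '/' in pool spec")
    return pool, (namespace if sep else None)
```

For example, `parse_pool_spec("rbd/ns1")` yields `("rbd", "ns1")`, while
`parse_pool_spec("rbd")` yields `("rbd", None)`.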
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
/bin/bash is a Linuxism. Other operating systems install bash to
different paths. Use /usr/bin/env in shebangs to find bash.
Signed-off-by: Alan Somers <asomers@gmail.com>
Previously, running the script as an unprivileged user was not very useful
because it was difficult to change the path in which sudo looked for the
command to execute.
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
- cleanup on a test failure;
- minimize interference with other processes (tests) that are
run concurrently;
- use xmlstarlet when parsing rbd output;
- add exit status test.
Signed-off-by: Mykola Golub <mgolub@mirantis.com>