The commands allow to restart a daemon without destroying the nbd
device.
Now, if the netlink is used, a dead connection timeout is set on
the nbd device setup, so the device is not immediately released
if the rbd-nbd process terminates without disconnect (unmap).
The attach command just sends terminate signal to the rbd-nbd
process. The detach command starts a new process and connects to
the existing device.
Signed-off-by: Mykola Golub <mgolub@suse.com>
Previously it still could race when unmap_device returned success
because the device was not found in `rbd-nbd list-mapped` (the nbd
device was removed) but the test failed because the process was still
found in the ps table.
Fixes: https://tracker.ceph.com/issues/47394
Signed-off-by: Mykola Golub <mgolub@suse.com>
Previously, the peer uuid variable was empty which resulted in the failure
to remove the duplicate peer.
Fixes: https://tracker.ceph.com/issues/47007
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
We need to temporary disable "exit on error" mode so it does not
abort when `rbd mirror pool peer add` returns "already exists"
error code.
Signed-off-by: Mykola Golub <mgolub@suse.com>
In recent versions `rbd list-mapped` does not print the white space
at the end of the line.
Fixes: https://tracker.ceph.com/issues/45305
Signed-off-by: Mykola Golub <mgolub@suse.com>
We might race with the remote rbd-mirror daemon creating a
tx-only peer when adding a new peer. Therefore, delete the
tx-only peer and attempt to re-create it.
Fixes: https://tracker.ceph.com/issues/44938
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The free-form journal replay status description is now JSON-encoded. The
"master"/"mirror" designators have been changed to "primary"/"non_primary"
to better align with RBD terminology.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
There is a potential race between the expected exceptions being
thrown and Python shutting down racing with librados background
threads. Ensure that librados is properly shut down prior to
exiting Python.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
For the case when the non-global level does not have a schedule
and a higher level is used as the parent, it wrongly listed
schedules from all branches under the parent, instead of only the
interested one.
Signed-off-by: Mykola Golub <mgolub@suse.com>
The OpenStack tempests tests do not stay stable and break approximately
every six months. Remove the test suite for now.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The unmap action only sends a signal to the kernel to notify the
rbd-nbd daemon to disconnect. Therefore, it's possible that an
unmap followed by an immediate re-map to the same device might
fail since the unmap is still in-progress.
Fixes: https://tracker.ceph.com/issues/44567
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Ensure that snapshot-based mirroring is tested in different RBD image
feature combinations.
Fixes: https://tracker.ceph.com/issues/44396
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The 'ceph' CLI and 'rbd mirror pool/image status' commandsshould revert
to use the admin user so that it has proper credentials for the cluster.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The mirroring site name is stored in the MON config which requires
higher privledges than the standard "client.mirror" user.
Fixes: https://tracker.ceph.com/issues/44066
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
A new functional test for snapshot-based mirroring will be created and
the other stress-tests should eventually be applied to both snapshot-
and journal-based mirroring.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
In Python 3.5 json.tool was changed to produce unsorted output and
--sort-keys option was added to compensate. This wasn't caught by
4fe245cc2f ("qa: update krbd tests for python3") because it raced
with 50933b863a ("qa: krbd_exclusive_option.sh: update for recent
kernel changes").
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* refs/pull/32252/head:
qa/cephfs/begin: libaio-devel on el8
qa/tasks: nosetests -> python -m nose
qa/tasks/rbd_fio: fio 2.21 -> 3.16
src/test/cli-integration/rbd/snap-diff.t: python -> python
qa/workunits: use nose 3
qa/tasks/cbt: install python3 deps
qa/tasks/ceph_manager.py: do not use python to write a file
test/pybind/test_rados: execute takes a bytes (not str) payload
qa/packages/packages: python[3]-ceph is no more
qa: use python3 for venvs etc
packaging: remove python3-ipaddres, as it is part of the stdlib in py3
qa/packages: python-ceph -> python3-ceph
qa/distros: centos7 -> centos8, rhel7 -> rhel8
spec: remove _python_buildid in favor of python3_pkgversion macro
spec: remove python2 packages and conditions
debian: remove python >= 2.7 requirement
debian: add mgr python versions
debian: explicitly set PYTHON2=OFF to prevent picking up python2 interpreter
debian: update control file to use python3 dependency names
debian: remove all python2 overrides and declarations
debian: remove all python2 install files
Reviewed-by: Alfredo Deza <adeza@redhat.com>
The 'rbd bench' command was recently modified to print IEC units
instead of bytes/sec. This broke the handling for QoS throughput
tests since it was incorrectly evaluating the available RBD
throughput. Additionally, the QoS tests should use a "<="
comparison operator since the QoS is the upper-bound limit.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
- don't use journaling feature to chose the mode;
- provide new API function mirror_image_enable2;
- return back the old behavior to automatically enable/disable
journaling feature on enabling/disabling image mirroring.
Signed-off-by: Mykola Golub <mgolub@suse.com>
Enabling mirroring for an image that does not support journaling
assumes snapshot based mirroring, which is supported only when the
pool is in the "image" mirror mode.
Also for the pool in the "image" mirror mode disabling/enabling
journaling feature for a mirroring image will switch
snapshot/journal mirror mode.
Signed-off-by: Mykola Golub <mgolub@suse.com>
"rbd snap rollback" expects an unlocked image, but we may get there
locked if object map is enabled (or if lock_on_read is specified in
rbd_default_map_options).
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Since 5.3:
- a plain "rbd map" acquires the lock, so it's not different from
"rbd map -o exclusive" in this regard
- if the lock is held by the exclusive peer, I/O is failed right away
instead of blocking
- lock_timeout option is respected only by "rbd map" and not by I/O
Since 5.5:
- if the mapping is read-only, the lock isn't acquired
Added blacklisting test case, dropped lock_timeout test case.
Fixes: https://tracker.ceph.com/issues/43127
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This will ensure that the Ceph dashbord's block mirroring page and
the CLI's 'mirror pool status' have matching health indications.
Fixes: https://tracker.ceph.com/issues/42748
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Snapshot existence validation code was removed from krbd. It was racy
and relied on having watch established for snapshots.
Fixes: https://tracker.ceph.com/issues/42916
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Therefore the v1 parent spec should not attempt to populate the
namespace.
Fixes: https://tracker.ceph.com/issues/41938
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This helps to to avoid the case where new tasks were not being scheduled
when an image name was re-used after having a task created under the
same name.
Fixes: https://tracker.ceph.com/issues/41032
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Due to pipe, sdiff return code was ignored. Also the compare functions
spent most of time in xxd, which was normally unnecessary.
Signed-off-by: Mykola Golub <mgolub@suse.com>
The "unmap" request is asynchronous, so wait for a short amount
of time for the "rbd-nbd" daemon process to exit.
Fixes: http://tracker.ceph.com/issues/39598
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This was used for API backwards compatibility testing, but now that
the C++ API will not remain stable, it serves no purpose.
Fixes: http://tracker.ceph.com/issues/39072
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
With discard_granularity set to alloc_size, we no longer get object
size alignment from blk_bio_discard_split().
This assumption is pretty deeply ingrained in krbd_data_pool.sh, so
make it explicit. For krbd_fallocate.sh, just fix the expectation.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The test case is issuing discards that span two objects: the tail of
the first object is truncated, the head of the second object is zeroed.
These discards aren't serial, so there is a race:
discard i ~ i + 1: truncate i, zero i + 1
discard i + 1 ~ i + 2: truncate i + 1, zero i + 2
can be executed as
truncate i + 1, zero i + 2, truncate i, zero i + 1
For object i + 1, the sequence ends up being truncate tail, then zero
head. This zero op is munged to truncate on the OSD, resulting in size
0 instead of OBJECT_SIZE / 2.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This better mimics the behavior of teuthology and tests rbd-mirror
daemon's ability to handle a pool deletion.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The test extracts the mon addresses from the monmap, but with the
recent v2 format change it extracted an invalid address.
Fixes: http://tracker.ceph.com/issues/38385
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The pool and namespace can now be specified as in a
<pool-name>[/<namespace-name>] format as positional
arguments.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
It is particularly useful when running multiple rbd-mirror instances
in Active-Passive or Active-Active mode.
Signed-off-by: Mykola Golub <mgolub@suse.com>
expect_fail incorrectly unset '-e' option and if a consequent test
failed it did not abort the execution. And two typos in the namespace
tests were not detected due to this.
Signed-off-by: Mykola Golub <mgolub@suse.com>
Sporadically the rbd-mirror fsx stress test would fail due to very
slow sync times due to overloaded clusters. Attempt to wait for all
images to be replicated before proceeding with the comparison.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
* check if geom_gate can be loaded before doing the actual tests
Otherwise continuing does not make sense.
Major reason for this problem is due to mismatch between
kernel and module versions.
* After FreeBSD kernevel 1200078 ggate resizing is possible
So set the flag that resizing can be tested
* Only sudo commands that really need sudo
rbd-ggate list is available in regular user mode
* be a bit more verbose during testing and list the test purpose
* list-mapped is an option in rbd-nbd, not (yet) in rbd-ggate
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
Avoid sporadic failures in combination with msgr-failures/many.yaml,
where assert_locked() might take over 10 seconds.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Since we only support Jewel and later releases, which both support
object-map and fast-diff, enabling/disabling object-map should always
enable/disable fast-diff.
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>