Since `--flags=locks` takes the mds_lock and dumps thousands of ops, it
may take a long time to complete for each individual MDS. The entire quiesce
set may time out (and all quiesce ops be killed) before we finish dumping ops.
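As a rough illustration (not the actual qa code; the MDS names, output paths,
and timeout value are hypothetical), the dumps could be bounded and run per
MDS in parallel so that one slow `--flags=locks` dump does not consume the
whole quiesce timeout:

    # Hedged sketch: dump in-flight ops from each MDS concurrently,
    # bounding each dump so a slow one fails fast instead of stalling.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    MDS_NAMES = ["a", "b", "c"]  # hypothetical MDS names

    def dump_ops(name: str) -> None:
        out = subprocess.check_output(
            ["ceph", "tell", f"mds.{name}", "ops", "--flags=locks"],
            timeout=60,  # hypothetical per-dump bound
        )
        with open(f"/tmp/mds.{name}.ops.json", "wb") as f:
            f.write(out)

    with ThreadPoolExecutor() as pool:
        list(pool.map(dump_ops, MDS_NAMES))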
Fixes: https://tracker.ceph.com/issues/65823
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/57275/head:
qa/fsx: use a specified sha1 to build the xfstest-dev
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Leonid Usov <leonid.usov@ibm.com>
We are currently conducting regular ceph-dencoder tests for backward compatibility.
However, we are omitting tests for forward compatibility.
This suite introduces tests against the ceph-object-corpus to address forward
compatibility issues that may arise.
The script will install the N-2 version and run it against the latest corpus
objects that we have, then install the N-1 through N versions and check them as well.
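A minimal sketch of the forward-compatibility check (the corpus checkout path
is hypothetical; the `archive/<version>/objects/<type>/*` layout follows the
ceph-object-corpus repository):

    # Hedged sketch: ask the installed (older) ceph-dencoder to decode
    # objects encoded by a newer version; a failure here indicates a
    # forward-compatibility break.
    import pathlib
    import subprocess

    CORPUS = pathlib.Path("ceph-object-corpus/archive")  # hypothetical path

    def check_version(version: str) -> None:
        for type_dir in (CORPUS / version / "objects").iterdir():
            for obj in type_dir.iterdir():
                subprocess.check_call(
                    ["ceph-dencoder", "type", type_dir.name,
                     "import", str(obj), "decode", "dump_json"],
                    stdout=subprocess.DEVNULL,
                )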
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
* refs/pull/57274/head:
mds: don't stall the asok thread for flush commands
qa/quiescer: relax some timing requirements in the quiescer
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
We're getting the following error while initializing 64MB disks
on WS 2019: "The disk is not large enough to support a GPT
partition style." For this reason, we'll use MBR instead.
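For illustration, a hedged sketch of how the test code might do this via
PowerShell (the disk number is hypothetical; `Initialize-Disk` is a standard
Windows cmdlet):

    # Hedged sketch: initialize a small disk with MBR instead of GPT.
    import subprocess

    def init_disk_mbr(disk_number: int) -> None:
        subprocess.check_call([
            "powershell.exe", "-Command",
            f"Initialize-Disk -Number {disk_number} -PartitionStyle MBR",
        ])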
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
We're adding a test that:
* maps a configurable number of images
* runs a specified test - we're reusing the ones from stress_test,
making just a few minor changes to allow running the same test
multiple times
* restarts the ceph-rbd Windows service
* waits for the images to be reconnected and refreshes the mount
information
* reruns the test
* repeats the above workflow for a specified number of times,
reusing the same images (a rough outline follows the list below)
This test ensures that:
* mounted images are still available after a service restart
* drive letters are retained
* the image content is retained
* there are no race conditions when connecting or disconnecting
a large number of images in parallel
* the driver is capable of mapping a specified number of images
simultaneously
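A rough outline of that workflow (service handling aside, the helper names
and readiness check are hypothetical, not the actual test code):

    # Hedged sketch: run the test, restart the ceph-rbd service, wait for
    # the images to reconnect, then rerun the test on the same images.
    import subprocess
    import time

    def wait_for_image(path: str, timeout: int = 120) -> None:
        # hypothetical readiness check: poll until the mapped disk is
        # readable again after the service restart
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                open(path, "rb").close()
                return
            except OSError:
                time.sleep(1)
        raise TimeoutError(path)

    def run_iterations(image_paths, run_test, iterations: int) -> None:
        for _ in range(iterations):
            for path in image_paths:
                run_test(path)
            subprocess.check_call(["powershell.exe", "-Command",
                                   "Restart-Service ceph-rbd"])
            for path in image_paths:
                wait_for_image(path)  # images must reconnect
            for path in image_paths:
                run_test(path)  # drive letters and content retained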
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
We're splitting the rbd-wnbd python test into separate files so
that the common code may easily be reused by other tests. This
also makes the code easier to read and maintain.
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
The single-major mapping scheme was introduced in 2014 and became the
default in 2017. It's getting increasingly difficult to build and,
more importantly, to boot a 10-year-old kernel with recent userspace
(systemd, etc.). If someone is still running such a kernel, it's
really unlikely that they would have the most recent rbd CLI tool
installed.
Fixes: https://tracker.ceph.com/issues/51845
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
as well as in:
- multisite tests (used for notification v2 migration tests)
- the qa suites running notifications
Enable lifecycle logs in notification tests: for the lc notification test cases,
this is needed after: 429967917b
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
RCA showed that it is not the NFS code that led to the warning, since the
warning occurred before the test cases started to execute. Later on, after
some discussion with Venky and Greg, it was found that there were some
clog changes made recently which lead to this warning being added to the
clog.
Digging further, it was found that the warning is generated when `mgr fail`
is run while there is no mgr available. The mgr is unavailable because, when
`setup_mgrs()` in class `MgrTestCase` stops the mgr daemons, sometimes the mgr
just crashes (`mgr handle_mgr_signal *** Got signal Terminated ***`), after
which `mgr fail` (again part of `setup_mgrs()`) is run and the `MGR_DOWN`
warning is generated.
This warning is only evident in the nfs suite because it is the only fs suite
that makes use of class `MgrTestCase`. To support my analysis, I ran about
eight jobs in teuthology and could not reproduce this warning. Since this is
not harming the NFS test cases' execution and the logs do mention that the mgr
daemon did get restarted (`INFO:tasks.cephadm.mgr.x:Restarting mgr.x
(starting--it wasn't running)...`), it is reasonable to conclude that ignoring
this warning is the simplest solution.
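A hedged example of what that fix could look like, as a teuthology-style
ignorelist entry (the exact suite yaml file and surrounding overrides are
hypothetical):

    overrides:
      ceph:
        log-ignorelist:
          - MGR_DOWN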
Fixes: https://tracker.ceph.com/issues/65265
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
* refs/pull/57192/head:
PendingReleaseNotes: add note on the client incompatibility health warning and feature bit
doc/cephfs: add client_mds_auth_caps client feature bit
doc/cephfs: add missing client feature bits
doc/cephfs: document MDS_CLIENTS_BROKEN_ROOTSQUASH health error
qa: add tests for MDS_CLIENTS_BROKEN_ROOTSQUASH
mds: raise health warning if client lacks feature for root_squash
mon/MDSMonitor: add note about missing metadata inclusion
mds: check relevant caps for fs include root_squash
mds: refactor out fs_name match in MDSAuthCaps
qa: test for root_squash with multiple caps
qa: pass kwargs to mount from remount
qa: simplify update_attrs and only update relevant keys
client: allow overriding client features
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Rishabh Dave <ridave@redhat.com>
* refs/pull/57166/head:
qa: make quiesce ops dump world readable
qa: use specific ops/cache dump file names
Reviewed-by: Leonid Usov <leonid.usov@ibm.com>
Test the case where the client has root_squash for one cap but not for
another. The fs without root_squash should not necessarily reject the client.
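A hedged illustration of such a setup (the client and fs names are
hypothetical):

    # Hedged sketch: grant one client caps on two filesystems, only one
    # of which uses root_squash.
    import subprocess

    subprocess.check_call(["ceph", "fs", "authorize", "a",
                           "client.testuser", "/", "rw", "root_squash"])
    subprocess.check_call(["ceph", "fs", "authorize", "b",
                           "client.testuser", "/", "rw"])
    # Access to fs "b" should still work even if the client would be
    # rejected (or flagged) for fs "a".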
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This sha1 is the latest master head and works well for our tests.
Fixes: https://tracker.ceph.com/issues/64572
Signed-off-by: Xiubo Li <xiubli@redhat.com>
wait_for_replay_complete() doesn't wait for image status to get
updated. This didn't matter previously because these tests are run on
two different pools and nothing else was following.
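For illustration, a hedged sketch of waiting for the status update itself
(the pool/image names and target state are hypothetical):

    # Hedged sketch: poll `rbd mirror image status` until the reported
    # state reflects the completed replay.
    import json
    import subprocess
    import time

    def wait_for_status(pool: str, image: str,
                        want: str = "up+replaying",
                        timeout: int = 60) -> None:
        deadline = time.time() + timeout
        while time.time() < deadline:
            out = subprocess.check_output(
                ["rbd", "mirror", "image", "status",
                 f"{pool}/{image}", "--format", "json"])
            if json.loads(out).get("state") == want:
                return
            time.sleep(1)
        raise TimeoutError(f"{pool}/{image} did not reach {want}")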
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
If a pool replayer is removed in an error state (e.g. after failing to
connect to the remote cluster), its callout should be removed as well.
Otherwise, the error would persist, causing "daemon health: ERROR"
status to be reported even after a new pool replayer is created and
started successfully.
Fixes: https://tracker.ceph.com/issues/65487
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
mgr/snap_schedule: restore yearly spec to lowercase y
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Rishabh Dave <ridave@redhat.com>
test_multifs_single_path_rootsquash was never run with vstart_runner.py
or with teuthology and is therefore full of bugs. Fix it to make sure it
runs fine.
Introduced-by: 1fda8ed2d4
Fixes: https://tracker.ceph.com/issues/65246
Signed-off-by: Rishabh Dave <ridave@redhat.com>