It's common during cluster setup for there to be periods with
degraded/recovering PGs. Ignore those errors.
Signed-off-by: Samuel Just <sjust@redhat.com>
Based on quincy-x.
```
$ cp -R qa/suites/upgrade/quincy-x/ qa/suites/upgrade/reef-x
$ git add qa/suites/upgrade/reef-x
$ git mv qa/suites/upgrade/reef-x/filestore-remove-check/1-ceph-install/quincy.yaml qa/suites/upgrade/reef-x/filestore-remove-check/1-ceph-install/reef.yaml
$ find qa/suites/upgrade/reef-x/ -type f -exec sed -i 's/quincy/reef/g' {} +
```
A note from rebase: changes from 05e24270a2efe85bcdceade87b0e91efcfca3001
have been pulled in.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
- remove upgrades from octopus
- stubs for completing upgrade to reef
Still missing the quincy-x upgrade tests.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
A basic test for ceph-nvmeof[1] where an
nvmeof initiator is created.
It requires a new task "nvmeof_gateway_cfg"
under cephadm which shares config information
between two remote hosts.
[1] https://github.com/ceph/ceph-nvmeof/
Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
as the scrub reservation changes had made it obsolete.
Note: this is not a matter of fixing the test; rather,
the tested functionality is no longer there.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* refs/pull/53431/head:
qa: add test cases to verify error reporting works as expected
mgr: fix some doc strings in object_format.py
mgr/tests: test returning error status works as expected
mgr: make object_format's Responder class capable of responding err status
mgr/nfs: report proper errno with err status
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
This is a simple sub-suite that has one job. Always schedule it on all supported distros.
Fixes: https://tracker.ceph.com/issues/43393
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
... when checking whether a rbd_support module command fails after
blocklisting the module's client.
In tests that check the recovery of the rbd_support module after its
client is blocklisted, the rbd_support module's client is
blocklisted using the `osd blocklist add` command. Next,
`osd blocklist ls` command is issued to confirm that the client is
blocklisted. An rbd_support module command is then issued and expected
to fail in order to verify that the blocklisting has affected the
rbd_support module's operations. Sometimes it was observed that before
this rbd_support module command reached the ceph-mgr, the rbd_support
module detected the blocklisting, recovered from it, and was able to
serve the command. To reduce the race window that occurs when trying to
verify that the rbd_support module's operation is affected by client
blocklisting, get rid of the `osd blocklist ls` command.
Fixes: https://tracker.ceph.com/issues/63673
Signed-off-by: Ramana Raja <rraja@redhat.com>
Generate a name that is shorter and easier to remember.
Also, write a simpler, faster and better helper method for generating
unique names. The method also has a shorter, more concise name,
making it easier to type and easier to read.
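A minimal sketch of such a helper (this is an illustration, not the actual helper added by this commit; the name `gen_name` and its parameters are assumptions):

```python
import random
import string

def gen_name(prefix, length=6):
    # A short random suffix keeps names unique enough for test fixtures
    # while staying easy to read, remember, and type.
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits,
                                    k=length))
    return f"{prefix}-{suffix}"

name = gen_name("subvol")
```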
Fixes: https://tracker.ceph.com/issues/63680
Signed-off-by: Rishabh Dave <ridave@redhat.com>
kernel 5.4 (Ubuntu 20.04) has the following missing commits:
- 5a9e2f5d5590 ceph: add ceph.{cluster_fsid/client_id} vxattrs
- 247b1f19dbeb ceph: add status debugfs file
fs suite relies on these debugfs entries to gather mount information
(client-id, addr/inst) which are required by some tests. In fs suite,
the distro kernel gets overridden by the testing kernel and therefore
even if Ubuntu 20.04 is chosen as the distro, the testing kernel is
installed. However, with smoke suite, the distro kernel is used and
the missing patches cause certain essential information gathering to
fail early on (client-id, etc..) causing the test to not even start
execution. PR #54515 fixes a bug in the client-id fetching path but
isn't complete due to the missing patches - details here:
https://tracker.ceph.com/issues/63488#note-8
But it's essential to have the smoke tests running since those tests
have lately uncovered bugs in the MDS (w/ distro kernels). In order
to benefit from those tests, this change ignores failures when
gathering mount information (which aren't used by the fs relevant
smoke tests). The tests (in the fs suite) that rely on this piece of
information would fail when run with the 20.04 distro kernel (but the
fs suite overrides it with the testing kernel).
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Writing the guest keyring to a file named "keyring" in the CWD will
overwrite build/keyring on the developer's machine, which will make
the cluster inoperable and also fail the test.
Fixes: https://tracker.ceph.com/issues/63506
Signed-off-by: Rishabh Dave <ridave@redhat.com>
We need to get more debug logs from bluestore to know what exactly
has happened to the extent map.
URL: https://tracker.ceph.com/issues/63586
Signed-off-by: Xiubo Li <xiubli@redhat.com>
The config setting is persisted in ceph.conf after the MDSs are started.
However, the test case fails the file system causing the active MDS to
restart and pick up the new config. When the file system is marked joinable,
then, if the MDS which was standby before the file system was marked failed
takes over as the rank, the updated settings are not used by this MDS.
In the failed test, merging directory fragments is disabled, but since
the config is set in ceph.conf, the (earlier standby) MDS which acquires
a rank uses the default merge size causing the dirfrag to merge and
thereby tripping the test.
Fixes: http://tracker.ceph.com/issues/57087
Signed-off-by: Venky Shankar <vshankar@redhat.com>
... instead of a simple counter.
This is preparation for the next commit, which will decouple
the "being reserved" state from the handling of scrub requests.
The planned changes to the scrub state machine will make
it harder to know when to clear the "being reserved" state.
The changes here would allow us to err on the side of caution,
i.e. trying to "un-count" a remote reservation even if it was not
actually reserved or was already deleted.
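The difference can be sketched in Python (a standalone illustration, not the OSD's actual C++ code; names are made up): a set keyed by reservation id makes "un-counting" idempotent, whereas a bare counter could underflow when releasing a reservation that was never granted or was already removed.

```python
class ReservationSet:
    """Track remote scrub reservations by id instead of a plain counter."""
    def __init__(self):
        self._held = set()

    def grant(self, res_id):
        self._held.add(res_id)

    def release(self, res_id):
        # discard() is a no-op for unknown ids, so releasing a reservation
        # that was never granted (or was already released) is safe.
        self._held.discard(res_id)

    def count(self):
        return len(self._held)

rs = ReservationSet()
rs.grant("pg1.scrub.42")
rs.release("pg1.scrub.42")
rs.release("pg1.scrub.42")   # double release: harmless with a set
rs.release("pg9.scrub.7")    # never granted: also harmless
```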
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Using
    `ceph tell $pgid [deep-]scrub`
to initiate an 'operator initiated' scrub, and
    `ceph tell $pgid schedule[-deep]-scrub`
to cause a 'periodic scrub' to be scheduled.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
The snapdiff test cases take too much time, sometimes hours,
which makes it very inconvenient to run general tests locally.
Just move them to a dedicated binary.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Replace the old tarball link with a URL to linux-6.5.11.tar.xz.
Fixes: https://tracker.ceph.com/issues/57655
Signed-off-by: Milind Changire <mchangir@redhat.com>
qa/cephadm: basic test for monitoring stack
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Reviewed-by: Redouane Kachach <rkachach@redhat.com>
Since we're adding a warning if any host is listed explicitly
in the placement of any service when removing the host,
we need to adjust the host drain test that removes a host
without the --force flag so that it does not have the explicit
hostname in the placement for the mon service.
Signed-off-by: Adam King <adking@redhat.com>
cephfs,mon: fs rename must require FS to be offline and refuse_client_session to be set
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
This was a gap in our testing in general, but I'm
adding it here right now specifically to use it
to test the "--rm-crush-entry" flag in a follow-up
commit.
Signed-off-by: Adam King <adking@redhat.com>
When working with large test groups (18 tests in this case), it gets
very tedious to debug and fix tests when all 18 have to be run again
for every mistake. A cheap fix for this is to split these 18 tests into
several classes.
But when modifications are made to the feature, all 18 tests need to be
exercised, and the previous solution forces the developer to initiate
all these test classes one by one.
The best of both worlds can be achieved if we split the tests into
groups but move all the related groups to a new file.
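The idea in miniature (class and test names here are hypothetical, not the actual CephFS QA classes): once the tests are split into classes, a single group can be run while debugging, and the whole file still exercises everything.

```python
import unittest

class TestRenameGroupA(unittest.TestCase):
    def test_one(self):
        self.assertEqual("fs1".replace("1", "2"), "fs2")

class TestRenameGroupB(unittest.TestCase):
    def test_two(self):
        self.assertIn("fs", "cephfs")

# While debugging, load and run only one group instead of all 18 tests;
# running the whole module still covers every class.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestRenameGroupA)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```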
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Add a FS command that enables users to swap names of two file systems in
a single PAXOS transaction. Add an option to this command that swaps
FSCIDs along with FS names. This command also updates the application
pool tags and fails when mirroring is enabled on either or both FSs.
Fixes: https://tracker.ceph.com/issues/58129
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Maintain the prefix_itr between calls to
SnapMapper::get_next_objects_to_trim() to prevent searching depleted
prefixes.
There are 8 distinct hash prefixes used for searching objects owned by
a given PG. On each call to SnapMapper::get_next_objects_to_trim() we
started from the first prefix, even after all objects mapped to it were
depleted. This means we would search 1 non-existing prefix after the
first prefix was depleted, 2 after the first two prefixes were
depleted, and so on, up to 7 non-existing prefixes after the first 7
prefixes were depleted.
This is a performance improvement PR only!
It maintains the existing behavior and does not try to fix/change any of the TRIM logic.
I added an extra step after the last object is trimmed: do a full scan
of the DB and return ENOENT only if no object is found.
This should make the new code no worse than the existing code, which
returns ENOENT after a full scan finds no object.
It should not impact performance with real-life snaps, as it should
only happen once per snap.
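A rough Python model of the idea (the real code is C++ inside SnapMapper; names here are illustrative): the prefix index persists across calls, so each call resumes at the first non-depleted prefix instead of re-probing depleted ones from the start.

```python
class TrimScanner:
    """Model of resuming object scans at the last useful hash prefix."""
    def __init__(self, objects_by_prefix):
        # e.g. {"00": [...], "20": [...]} - a few hash prefixes per PG
        self.objects_by_prefix = objects_by_prefix
        self.prefixes = sorted(objects_by_prefix)
        self.prefix_idx = 0  # persists between calls (the "prefix_itr")
        self.lookups = 0     # total prefixes probed, for illustration

    def get_next_objects_to_trim(self, max_objs):
        out = []
        while self.prefix_idx < len(self.prefixes) and len(out) < max_objs:
            self.lookups += 1
            bucket = self.objects_by_prefix[self.prefixes[self.prefix_idx]]
            while bucket and len(out) < max_objs:
                out.append(bucket.pop())
            if not bucket:
                self.prefix_idx += 1  # depleted: never probe it again
        return out

s = TrimScanner({"00": ["a"], "20": ["b"], "40": ["c"]})
batches = []
while True:
    batch = s.get_next_objects_to_trim(1)
    if not batch:
        break
    batches.append(batch)
```

Without the persistent `prefix_idx`, each call would restart at the first prefix and re-probe every already-depleted bucket.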
Added snap-mapper tests to the rados test suite.
Disabled osd_debug_trim_objects when running (SnapMapperTest,
prefix_itr) to prevent asserts (as this code does illegal inserts into
DELETED snaps).
Code beautifying.
Disabled the assert, as there is a corner case when we retrieve the
last valid object(s) in a snap:
The prefix_itr is advanced past the last valid value (as we completed a
full scan). If the OSD calls get_next_objects_to_trim() before the
retrieved object(s) were processed and removed from the SnapMapper DB,
they won't be found by the next call (as the prefix_itr is invalid).
The object(s) will be found in the second pass, which will seem as if
they were added after the trim was started (which is illegal) and will
trigger an ASSERT.
Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
refuse_client_session must be set for a CephFS before an attempt to
rename the CephFS can be made. Add a new test for this, and update current
tests (test_admin.py and test_volumes.py) accordingly.
Fixes: https://tracker.ceph.com/issues/63154
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Move tests for "ceph fs volume rename" command to a new class. This
makes it possible to run this group of tests in a single command.
This provides a convenient way to execute these tests, which is
necessary after the changes that have been made to the code for the
"ceph fs volume rename" command.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Reject the attempt to rename a CephFS if the CephFS is not offline.
Add new tests for this and update current tests (test_admin.py and
test_volumes.py) accordingly.
Fixes: https://tracker.ceph.com/issues/63154
Signed-off-by: Rishabh Dave <ridave@redhat.com>
The idea is to avoid the maintenance of duplicate code in both the journal
and snapshot test scripts.
Usage:
RBD_MIRROR_MODE=journal rbd_mirror.sh
Use the environment variable RBD_MIRROR_MODE to set the mode.
Available modes: snapshot | journal
Fixes: https://tracker.ceph.com/issues/54312
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
add --progress flag to git submodule update commands
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Ceph has lots of submodules that need to be cloned before building
binaries from the repository. Seeing the progress when these submodules
are being cloned is useful, especially when developers/users have a
network issue or a slow network.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
RGW daemons register in the servicemap by gid which allows multiple radosgw instances to share an auth key/identity. The daemon name is sent as part of the metadata. (84c265238b).
All other daemons register by the daemon name and the manager stores all daemon state information with daemon name as key. The 'config show' command looks up the daemon_state map with the daemon name the user mentions as key (for example: 'osd.0', 'client.rgw', 'mon.a').
Due to the change in RGW daemon registration, the key used for storing daemon state has become rgw.gid, and 'config show client.rgw' no longer works.
This change goes through the daemon metadata to look for the RGW daemon name when a user enters the config show command for an RGW daemon. Once the correct daemon is found, we retrieve the corresponding daemon key (rgw.gid) and use that to query the daemon_state map.
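The lookup can be sketched as follows (simplified Python; the real implementation lives in the mgr, and the field names and structure here are assumptions for illustration):

```python
def resolve_daemon_key(user_name, daemon_state):
    """Map a user-facing name to the key the mgr actually stores.

    Non-RGW daemons are keyed by their name directly; RGW daemons are
    keyed by 'rgw.<gid>', so we scan their metadata for the name.
    """
    if user_name in daemon_state:  # osd.0, mon.a, ...: direct hit
        return user_name
    for key, state in daemon_state.items():
        if key.startswith("rgw.") and \
                state.get("metadata", {}).get("id") == user_name:
            return key
    return None

daemon_state = {
    "osd.0": {"metadata": {}},
    "rgw.4135": {"metadata": {"id": "client.rgw.foo"}},  # keyed by gid
}
```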
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2011756
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
Dependencies listed in xfstests_dev.py for xfstests-dev project are
outdated. This leads the xfstests_dev.py based integration tests to
fail. Update this dependency list using README of xfstests-dev project.
Also, remove code that is no longer relevant (specifically, the
if-block that checks for and deals with 'python' and
'btrfs-progs-devel').
Fixes: https://tracker.ceph.com/issues/62556
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Previously, the valgrind_post() function used grep to find errors
in the valgrind logs.
Now it uses ValgrindScanner to raise better exceptions, with a
traceback and the exception kind, and to create a more detailed
summary in valgrind.yaml in the archive.
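Valgrind's XML output (`--xml=yes`) wraps each finding in an `<error>` element with a `<kind>` child, which is what lets a scanner classify findings instead of grepping. A minimal sketch of such a summary (ValgrindScanner itself is teuthology code; this is only an illustration):

```python
import xml.etree.ElementTree as ET

VALGRIND_XML = """\
<valgrindoutput>
  <error><kind>Leak_DefinitelyLost</kind></error>
  <error><kind>InvalidRead</kind></error>
  <error><kind>Leak_DefinitelyLost</kind></error>
</valgrindoutput>"""

def summarize(xml_text):
    # Count each error kind so the failure message can name the actual
    # problem (leak, invalid read, ...) rather than "grep matched".
    root = ET.fromstring(xml_text)
    summary = {}
    for err in root.iter("error"):
        kind = err.findtext("kind")
        summary[kind] = summary.get(kind, 0) + 1
    return summary

summary = summarize(VALGRIND_XML)
```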
Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
The optional "unit_test_scan" config uses Remote.run_unit_test()
to scan XML files generated by unit tests and throw better failure
messages (for s3tests and gtests run by workunit).
It also creates "unit_test_summary.yaml" with more exception details
from the XML files.
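gtest and s3tests both emit JUnit-style XML, where a failed case carries a `<failure>` child whose message can be lifted into the teuthology failure reason. A hedged sketch of that scan (the real Remote.run_unit_test() may differ):

```python
import xml.etree.ElementTree as ET

JUNIT_XML = """\
<testsuite name="s3tests" tests="2" failures="1">
  <testcase classname="s3tests.test_bucket" name="test_list">
    <failure message="expected 200, got 403">traceback...</failure>
  </testcase>
  <testcase classname="s3tests.test_bucket" name="test_put"/>
</testsuite>"""

def collect_failures(xml_text):
    # Pull each failed case's name and message out of the JUnit XML so
    # the job's failure reason names the test, not just "workunit failed".
    root = ET.fromstring(xml_text)
    failures = []
    for case in root.iter("testcase"):
        failure = case.find("failure")
        if failure is not None:
            failures.append({
                "test": f"{case.get('classname')}.{case.get('name')}",
                "message": failure.get("message"),
            })
    return failures

failures = collect_failures(JUNIT_XML)
```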
Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
Also use kafka binaries instead of building from source.
Fixes: https://tracker.ceph.com/issues/63205
Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
* refs/pull/53839/head:
qa: enhance test cases
mds: erase clients getting evicted from laggy_clients
mds: report clients laggy due to laggy OSDs only after checking any OSD is laggy
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Some of the smoke, orch and perf-basic tests are failing due
to POOL_APP_NOT_ENABLED health check failure. Add
POOL_APP_NOT_ENABLED to ignorelist for these tests.
Signed-off-by: Prashant D <pdhange@redhat.com>
Highlights of this commit include:
- splitting the rgw perf counters cache into two
caches for bucket-labeled and user-labeled op counters
- add config overrides to verify suite for CI
- add tenant label for op counters
- misc cleanup
- add docs for rgw metrics
Signed-off-by: Ali Maredia <amaredia@redhat.com>