The rook team relies on a daily CI system to validate
rook changes. It doesn't seem that the teuthology tests
are maintained, so it makes sense to remove them from the
rados suite.
By removing this symlink, rook test coverage will remain
in the orch suite, and coverage will only be removed from the
rados suite.
Workaround for: https://tracker.ceph.com/issues/58585
Signed-off-by: Laura Flores <lflores@redhat.com>
1. Setting frequent scrub status updates, to compensate for the removal
of some 'send updates' in PR#50283.
2. Switching back to using the wpq scheduler, as otherwise the number of
concurrent recovery operations is below what the test expects.
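Illustrative only (not part of the change itself): the two adjustments
amount to config overrides along the lines of the sketch below. The
osd_op_queue option and the wpq value are real; the scrub-status update
interval option name is an assumption.

    import subprocess

    def set_osd_opt(option, value):
        # Apply a cluster-wide OSD config override via the ceph CLI.
        subprocess.run(["ceph", "config", "set", "osd", option, value],
                       check=True)

    # Switch back to the wpq op queue (takes effect on OSD restart).
    set_osd_opt("osd_op_queue", "wpq")
    # Assumed option name for more frequent scrub status updates, in seconds.
    set_osd_opt("osd_stats_update_period_scrubbing", "5")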
Fixes: https://tracker.ceph.com/issues/61386
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Runs a boto script that re-uploads one part multiple times before
completing the multipart upload, and then checks for any orphans.
Original boto script contributed by Matt Benjamin
<mbenjami@redhat.com> on top of which modifications were made.
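A minimal boto3 sketch of the flow, with hypothetical endpoint, bucket
and key names (the actual qa script differs):

    import boto3

    s3 = boto3.client("s3", endpoint_url="http://localhost:8000",
                      aws_access_key_id="KEY", aws_secret_access_key="SECRET")
    bucket, key = "mp-bucket", "mp-object"
    s3.create_bucket(Bucket=bucket)

    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
    body = b"x" * (5 * 1024 * 1024)

    # Re-upload the same part number several times before completing; the
    # superseded uploads are what the orphan check looks for afterwards.
    for _ in range(3):
        part = s3.upload_part(Bucket=bucket, Key=key, PartNumber=1,
                              UploadId=mpu["UploadId"], Body=body)

    s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
        MultipartUpload={"Parts": [{"ETag": part["ETag"], "PartNumber": 1}]})
    # The suite then scans for leaked RADOS objects (e.g. via rgw-orphan-list)
    # and expects to find none.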
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
The qa e2e tests are failing because the script has not been adapted to
Cypress 10.
Fixes: https://tracker.ceph.com/issues/61519
Signed-off-by: Nizamudeen A <nia@redhat.com>
* refs/pull/50875/head:
mon/MDSMonitor: ignore extraneous up:boot messages
qa: add test case for mds sending multiple boot messages
qa: support checking for a log message that should not exist
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Initialize a standalone test for stretched clusters,
testing uneven-weight warnings and != 2 buckets
warnings.
Added a `wait_for_health_gone()` function to ceph-helpers.sh;
this function allows us to wait for a health condition to
disappear when doing standalone tests.
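The helper itself lives in ceph-helpers.sh (bash); its polling logic,
sketched here in Python purely for illustration:

    import json, subprocess, time

    def wait_for_health_gone(check, timeout=120, interval=5):
        # Poll 'ceph health detail' until the named health check disappears.
        deadline = time.time() + timeout
        while time.time() < deadline:
            out = subprocess.check_output(
                ["ceph", "health", "detail", "--format=json"])
            if check not in json.loads(out).get("checks", {}):
                return
            time.sleep(interval)
        raise RuntimeError(f"health check {check} still present after {timeout}s")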
Signed-off-by: Kamoltat <ksirivad@redhat.com>
The `cephadm version` command no longer bases the output on the
container images; rather, it uses a special python file added to the
zipapp during the build to report the version of cephadm (the
binary).
The other option was to preserve this behavior and add a new version
command or make it behave differently depending on what options were
provided. I discussed the options with AMK in person and we decided that
changing the tests was preferable.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
With a 1 sec. delay we may sometimes fail to get the correct quorum
size because the monitor hasn't updated in time.
With the following fix, we wait for quorum and re-check every few
seconds (3) until a timeout (30).
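A sketch of the retry loop (illustration only, in Python; the interval
and timeout match the commit text):

    import json, subprocess, time

    def wait_for_quorum_size(expected, timeout=30, interval=3):
        # Re-check the quorum periodically instead of trusting a single read.
        deadline = time.time() + timeout
        while time.time() < deadline:
            out = subprocess.check_output(
                ["ceph", "quorum_status", "--format=json"])
            if len(json.loads(out)["quorum_names"]) == expected:
                return
            time.sleep(interval)
        raise RuntimeError(f"quorum did not reach size {expected} in {timeout}s")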
Fixes: https://tracker.ceph.com/issues/52316
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
With mClock scheduler enabled, a small subset of config options related
to recovery limits are not allowed to be modified unless
osd_mclock_override_recovery_settings option is enabled. This override
option is disabled by default. The following options cannot be modified
without enabling the override option:
- osd_max_backfills
- osd_recovery_max_active[_(hdd|ssd)]
The above options are removed from the mon kv store, which effectively
restores them to their default values.
This was causing tests, for example
test_cluster_configuration.ClusterConfigurationTest, to fail since they
modify the recovery options and expect to verify the modified values.
Therefore, for tests, the osd_mclock_override_recovery_settings option is
enabled in vstart_runner.py so that current and future tests
are not affected.
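For illustration, the override the tests now rely on is equivalent to:

    import subprocess

    # Allow recovery/backfill limits to be modified while mClock is active.
    subprocess.run(["ceph", "config", "set", "osd",
                    "osd_mclock_override_recovery_settings", "true"], check=True)
    # With the override in place, changes like this are no longer discarded:
    subprocess.run(["ceph", "config", "set", "osd", "osd_max_backfills", "5"],
                   check=True)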
Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
This is a follow-up to PR: https://github.com/ceph/ceph/pull/48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interface to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsystem, and this
is logged.
Other changes:
1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
built-in profile defaults.
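A hedged illustration of the behaviour being exercised: an ephemeral
override of a built-in QoS parameter made via 'tell' is expected to be
stripped again (and logged), leaving the profile default in effect:

    import subprocess

    # Attempt an ephemeral override of a built-in mClock QoS parameter.
    subprocess.run(["ceph", "tell", "osd.0", "config", "set",
                    "osd_mclock_scheduler_client_res", "0.5"], check=True)
    # The effective value is expected to remain the built-in profile default.
    subprocess.run(["ceph", "tell", "osd.0", "config", "get",
                    "osd_mclock_scheduler_client_res"], check=True)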
Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
* refs/pull/49691/head:
qa: add test for opening a file via a hard link that is not in the same mds as the inode
mds: rdlock_path_xlock_dentry supports returning auth target inode
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Using the default pool size of 2 with random eio thrashing can cause
some of the objects to be marked as lost.
Fix the typo from 'osd default pool size: 3' to 'osd pool default size: 3'
so that we correctly get a pool size of 3.
Fixes: https://tracker.ceph.com/issues/49888
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
With the new mClock default profile, tests were failing with "Exiting
scrub checking -- not all pgs scrubbed" due to slower scrubs.
Changing the default profile to high_recovery_ops for testing purposes
will fix this issue.
Fixes: https://tracker.ceph.com/issues/61228
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
We have a few test suites that use 'override' in their yaml files
while the ceph.py task looks for 'overrides'; in that case
those config params don't take any effect.
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
* refs/pull/51386/head:
qa: ignore cluster warning when fs flag refuse_client_session is set
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/51251/head:
PendingReleaseNotes: add a note about deleting files from lost+found directory
qa: add checks that validate removal of entries from lost+found dir
mds: allow unlink operation under lost+found directory
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/43184/head:
qa: fix journal flush failure issue due to the MDS daemon crashes
qa: add test support for the alloc ino failing
mds: do not take the ino which has been used
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Running a file system scrub is recommended after running file system
data and metadata recovery, but running scrub isn't covered in tests.
Fixes: http://tracker.ceph.com/issues/59527
Signed-off-by: Venky Shankar <vshankar@redhat.com>
The qa tests are not client I/O centric and mostly focus on triggering
recovery/backfills and monitoring them for completion within a finite
amount of time. The same holds true for scrub operations.
Therefore, an mClock profile that optimizes background operations is a
better fit for qa-related tests, and osd_mclock_profile is globally
overridden to the 'high_recovery_ops' profile for the Rados suite as
it fits the requirement.
Also, many standalone tests expect recovery and scrub operations to
complete within a finite time. To ensure this, the osd_mclock_profile
option is set to 'high_recovery_ops' as part of the run_osd() function
in ceph-helpers.sh.
A subset of standalone tests explicitly used 'high_recovery_ops' profile.
Since the profile is now set as part of run_osd(), the earlier overrides
are redundant and therefore removed from the tests.
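For reference, the override amounts to the following (shown here via the
CLI for illustration):

    import subprocess

    # Favor background recovery/backfill/scrub over client I/O in qa runs.
    subprocess.run(["ceph", "config", "set", "osd",
                    "osd_mclock_profile", "high_recovery_ops"], check=True)
    # Confirm the effective profile on one OSD.
    subprocess.run(["ceph", "config", "show", "osd.0", "osd_mclock_profile"],
                   check=True)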
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Let's use the middle profile as the default.
Modify the standalone tests accordingly.
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Fix an issue where an overridden mClock recovery setting (set prior to
an osd restart) could be lost after an osd restart.
For example, consider that prior to an osd restart the option
'osd_max_backfills' was successfully set to a value different from the
mClock default. If the osd was restarted for some reason, the
boot-up sequence would incorrectly reset the backfill value to the
mClock default within the async local/remote reservers. This fix
ensures that no change is made if the current overridden value is
different from the mClock default.
Modify an existing standalone test to verify that the local and remote
async reservers are updated to the desired number of backfills under
normal conditions and also across osd restarts.
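A sketch of the check (the admin-socket command and output fields below
are assumptions, not taken from the change):

    import json, subprocess

    def backfill_reservations(osd_id):
        # Assumed admin-socket command for inspecting the async reservers.
        out = subprocess.check_output(
            ["ceph", "daemon", f"osd.{osd_id}", "dump_recovery_reservations"])
        return json.loads(out)

    # After overriding osd_max_backfills and restarting the OSD, the local and
    # remote reservers should still report the overridden limit, not the
    # mClock default (field names assumed).
    res = backfill_reservations(0)
    print(res["local_reservations"]["max_allowed"],
          res["remote_reservations"]["max_allowed"])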
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The mClock scheduler's cost model for HDDs/SSDs is modified and now
represents the cost of an IO in terms of bytes.
The cost parameters, namely osd_mclock_cost_per_io_usec_[hdd|ssd]
and osd_mclock_cost_per_byte_usec_[hdd|ssd], which represent the cost
of an IO in secs, are inaccurate and therefore removed.
The new model considers the following aspects of an osd to calculate
the cost of an IO:
- osd_mclock_max_capacity_iops_[hdd|ssd] (existing option)
The measured random write IOPS at 4 KiB block size. This is
measured during OSD boot-up using the OSD bench tool.
- osd_mclock_max_sequential_bandwidth_[hdd|ssd] (new config option)
The maximum sequential bandwidth of the underlying device.
For HDDs, 150 MiB/s is used, and for SSDs 750 MiB/s is used
in the cost calculation.
The following important changes are made to arrive at the overall
cost of an IO:
1. Represent QoS reservation and limit config parameters as a proportion:
The reservation and limit parameters are now set in terms of a
proportion of the OSD's max IOPS capacity. The earlier representation
was in terms of IOPS per OSD shard which required the user to perform
calculations before setting the parameter. Representing the
reservation and limit in terms of proportions is much more intuitive
and simpler for a user.
2. Cost per IO Calculation:
Using the above config options, osd_bandwidth_cost_per_io for the osd is
calculated and set. It is the ratio of the max sequential bandwidth and
the max random write iops of the osd. It is a constant and represents the
base cost of an IO in terms of bytes. This is added to the actual size of
the IO (in bytes) to represent the overall cost of the IO operation. See
mClockScheduler::calc_scaled_cost() and the worked example after this list.
3. Cost calculation in Bytes:
The settings for reservation and limit, expressed as a fraction of the
OSD's maximum IOPS capacity, are converted to bytes/sec before updating
the mClock server's ClientInfo structure. This is done for each OSD op
shard using osd_bandwidth_capacity_per_shard as shown below:
(res|lim) [bytes/sec] = (IOPS proportion) [unitless] *
                        osd_bandwidth_capacity_per_shard [bytes/sec]
The above result is updated within the mClock server's ClientInfo
structure for different op_scheduler_class operations. See
mClockScheduler::ClientRegistry::update_from_config().
The overall cost of an IO operation (in secs) is finally determined
during the tag calculations performed in the mClock server. See
crimson::dmclock::RequestTag::tag_calc() for more details.
4. Profile Allocations:
Optimize mClock profile allocations due to the change in the cost model
and lower recovery cost.
5. Modify standalone tests to reflect the change in the QoS config
parameter representation of reservation and limit options.
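A worked example of the calculations above; the 150 MiB/s figure comes
from the text, while the IOPS capacity, shard count and per-shard split
are illustrative assumptions:

    MiB = 1024 * 1024
    max_seq_bw = 150 * MiB   # osd_mclock_max_sequential_bandwidth_hdd
    max_iops = 315.0         # assumed measured osd_mclock_max_capacity_iops_hdd
    num_shards = 5           # assumed number of OSD op shards

    # (2) Base cost of an IO in bytes: ratio of sequential bandwidth to IOPS,
    # added to the actual IO size (cf. mClockScheduler::calc_scaled_cost()).
    bandwidth_cost_per_io = max_seq_bw / max_iops   # bytes
    overall_cost = bandwidth_cost_per_io + 4096     # for a 4 KiB IO

    # (3) Converting a reservation given as a proportion into bytes/sec.
    bandwidth_capacity_per_shard = max_seq_bw / num_shards
    client_res = 0.5 * bandwidth_capacity_per_shard  # bytes/sec

    print(round(bandwidth_cost_per_io), round(overall_cost), round(client_res))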
Fixes: https://tracker.ceph.com/issues/58529
Fixes: https://tracker.ceph.com/issues/59080
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
- test_nfs_export_creation_at_filepath:
ENOTDIR is raised instead of EINVAL, which is better
aligned with the nature of the failure
- test_nfs_export_creation_at_symlink:
ENOTDIR is raised instead of ENOENT since the code
can now check if the path is a symlink but won't follow
it.
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
It actually didn't test the invalid path but still ended with
ENOENT (which is expected in case the path is invalid) because the test
didn't create a fs; it failed with "FS nfs-cephfs not found",
which also raises ENOENT, and thus it always passed.
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
Class CapTester contains two distinct, immiscible groups of methods: one
that tests MON caps and another that tests MDS caps. When using CapTester
for the former purpose, the instantiation needs neither a mount object
nor the path where files for testing will be created, nor does it need
to run the method that creates files for testing rw permissions. When
using this class for the latter, the case is the exact opposite.
Create 2 separate classes for each of these purposes, and a class that
inherits from both of them so that instantiating the class becomes
as simple as it can be.
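A sketch of the intended layout (class and method names here are
illustrative, not necessarily those used in caps_helper.py):

    class MonCapTester:
        """Test MON caps; needs no mount object or test files."""

        def run_mon_cap_tests(self, moncap, keyring):
            ...

    class MdsCapTester:
        """Test MDS caps; owns the mount and the test-file setup."""

        def __init__(self, mount, path=''):
            self.mount = mount
            self.path = path

        def run_mds_cap_tests(self, perm):
            ...

    class CapTester(MonCapTester, MdsCapTester):
        """Convenience class for tests that need both groups of methods."""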
Signed-off-by: Rishabh Dave <ridave@redhat.com>
In the worst case the kclient will hold the dirty caps for 60 seconds,
and the MDS may also defer updating the directory rstat for 5 seconds,
which is one tick, or longer if it needs to wait for the mdlog to flush.
Fixes: https://tracker.ceph.com/issues/59349
Signed-off-by: Xiubo Li <xiubli@redhat.com>
When trying to fill the volume space by continuously filling multiple
files, and when flushing the dirty caps back to the MDS, the MDS will
skip updating the parent rstat within 'mds_dirstat_min_interval' to
avoid propagating more often than that. That means the quota changes
can't be broadcast to the clients in time.
So after waiting for 20 seconds, if we try to write to the existing
files, only the first file can successfully update the parent quota
realm in the MDS, but this won't increase the total size.
Fixes: https://tracker.ceph.com/issues/59349
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Move get_mon_cap_from_keyring() and get_fsnmes_from_moncap() from class
CapTester to the module namespace of caps_helper.py so that they can be
imported freely and reused by tests.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
This method checks whether the output of the command "ceph fs ls" for the
client ID it receives is the same as the output printed for client.admin.
Don't do so; limit the test to only checking whether "ceph fs ls --id
client.x -k keyring_file" prints the fs name for which client.x has
permissions.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Improvement #1:
CapTester.write_test_files() not only creates the test file but also
does the following for every mount object it receives as a parameter -
* carefully produces the path for the test file as per the parameters
received
* generates the unique data for each test file on a CephFS mount
* creates a data structure -- a list of lists -- that holds all this
information along with the mount object itself, for each mount object,
so that tests can be conducted at a later point
Untangle this mess of code by splitting this method into 3 separate
methods -
1. To produce the path for test file (as per user's need).
2. To generate the data that will be written into the test file.
3. To actually create the test file on CephFS.
Improvement #2:
Remove the internal data structure used for testing -- self.test_set --
and use separate class attributes to store all the data required for
testing instead of a tuple. This serves two purposes -
One, it makes it easy to manipulate all this data from helper methods
and during a debugging session, especially while using a PDB session.
And two, it makes it impossible to have multiple mounts/multiple "test
sets" within the same CapTester instance, for the sake of simplicity.
Users can instead create two CapTester instances if needed.
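A sketch of the reshaped helper (method names are illustrative and may
differ from the actual change):

    class CapTester:
        def __init__(self, mount, path=''):
            # One mount and one "test set" per instance, held in plain
            # attributes instead of a list of lists.
            self.mount = mount
            self.path = self._gen_test_file_path(path)
            self.data = self._gen_test_file_data()

        def _gen_test_file_path(self, path):
            # 1. Produce the test file path as per the caller's needs.
            return f"{path.rstrip('/')}/testfile" if path else "testfile"

        def _gen_test_file_data(self):
            # 2. Generate unique data for this instance's test file.
            return f"unique data {id(self)}"

        def write_test_file(self):
            # 3. Actually create the test file on the CephFS mount
            #    (a write_file method is assumed on the mount object).
            self.mount.write_file(self.path, self.data)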
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Inheriting CephFSTestCase in CapTester just for the methods assertEqual()
and assertIn() from class unittest.TestCase is odd and heavy-weight.
Don't inherit from CephFSTestCase; use plain assert statements instead.
Reference: https://github.com/ceph/ceph/pull/50882#discussion_r1160611549.
To avoid code duplication, a couple of similar methods have been added
instead.
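For instance, helpers along these lines (names illustrative) stand in for
the unittest assertions:

    def assert_equal(first, second):
        # Minimal replacement for unittest's assertEqual.
        assert first == second, f"{first!r} != {second!r}"

    def assert_in(member, container):
        # Minimal replacement for unittest's assertIn.
        assert member in container, f"{member!r} not found in {container!r}"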
Signed-off-by: Rishabh Dave <ridave@redhat.com>
The tuple was not meant to be passed as a whole but its individual
members are to be passed as a list of positional arguments.
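In other words (function and values are illustrative):

    def run_cap_tests(perm, mntpt, fsname):
        print(perm, mntpt, fsname)

    members = ("rw", "/volumes", "cephfs")
    run_cap_tests(*members)    # fixed: members become positional arguments
    # run_cap_tests(members)   # bug: the whole tuple passed as one argument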
Introduced-by: 87025d1585
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Separate `mon-stretch` from `mon`.
Rename `mon-stretched-cluster.sh` to
`mon-stretch-fail-recovery.sh`.
Isolating the stretch cluster tests will enable
developers to get results faster for stretch-cluster
related work.
Signed-off-by: Kamoltat <ksirivad@redhat.com>
qemu-utils is usually pre-installed but, due to what appears to be
a Ubuntu packaging bug, it's not upgraded when qemu-block-extra is
installed:
The following NEW packages will be installed:
qemu-block-extra
The following packages will be upgraded:
qemu-system-common qemu-system-data qemu-system-gui qemu-system-x86
However, the version of the block driver must match exactly the version
of the qemu-img tool, so the above leads to:
$ qemu-img convert -f qcow2 -O raw /home/ubuntu/cephtest/qemu/base.client.0.0.qcow2 rbd:rbd/client.0.0
Failed to initialize module: /usr/lib/x86_64-linux-gnu/qemu/block-rbd.so
Note: only modules from the same build can be loaded.
qemu: module block-block-rbd not found, do you want to install qemu-block-extra package?
qemu-img: Unknown protocol 'rbd'
Fixes: https://tracker.ceph.com/issues/59431
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
There are two types of snapshots that can be created on a snapshot-based
mirroring image - a normal snapshot (same as for a journal-based image)
and a mirror image snapshot. Until now the Dashboard allowed only the
mirror image snapshot; this PR intends to enable both types.
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
1e44d86b2 swapped this to a pg tell command which doesn't actually
need the primary specified. Drop the now unnecessary lookup.
Signed-off-by: Samuel Just <sjust@redhat.com>
Trying to add a feature where mon crush locations
can be set through the orchestrator using the mon
service spec. This is meant to be a test for that.
Signed-off-by: Adam King <adking@redhat.com>
mgr: Add one finisher thread per module
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>