This (already deprecated) module is removed as a side-effect of the
deprecation and removal of the `restful` module.
Fixes: https://tracker.ceph.com/issues/47066
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
We don't support balanced reads on EC pools. Additionally, the yaml
actually specifies 'balanced_reads' rather than 'balance_reads' and
therefore has no effect.
Signed-off-by: Samuel Just <sjust@redhat.com>
The OSD's IOPS capacity is used by the mClock scheduler to determine the
quantum of bandwidth allocated to the various operations on the OSD.
Prior to this commit, maybe_override_max_osd_capacity_for_qos() only
checked whether the measured IOPS capacity exceeded the higher threshold
defined by 'osd_mclock_iops_capacity_threshold_[hdd|ssd]' and, if so,
fell back to the last valid or the default IOPS capacity as defined by
osd_mclock_max_capacity_iops_[hdd|ssd].
It's quite possible for the reported IOPS to be unrealistically low. This
could be due to transient factors on the underlying device, or it could
indicate poor device health. Either way, the safer option is to fall back
to the last valid or the default IOPS setting for that OSD in order to
avoid cluster performance issues (slow or stalled ops) down the line.
Therefore, to handle this case, this commit introduces two additional
config options:
- osd_mclock_iops_capacity_low_threshold_hdd - set to 50 IOPS, and
- osd_mclock_iops_capacity_low_threshold_ssd - set to 1000 IOPS
If the measured IOPS capacity does not fall within the low and high
threshold range, the default or the last valid IOPS capacity is used
instead. The existing cluster log warning is modified accordingly to
convey the reason.
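A minimal Python sketch of the intended check (illustrative only, not the
actual C++ OSD code; the boundary handling and the example numbers below
are assumptions):

    def choose_iops_capacity(measured_iops, fallback_iops,
                             low_threshold, high_threshold):
        # fallback_iops is the last valid or default capacity
        # (osd_mclock_max_capacity_iops_[hdd|ssd]); the thresholds map to
        # osd_mclock_iops_capacity_[low_]threshold_[hdd|ssd].
        if low_threshold <= measured_iops <= high_threshold:
            return measured_iops
        # Unrealistically low or high measurement: keep the fallback and
        # let the caller raise the modified cluster log warning.
        return fallback_iops

    # Example for an HDD OSD, assuming low = 50 IOPS, high = 500 IOPS and
    # a default capacity of 315 IOPS:
    assert choose_iops_capacity(12, 315, 50, 500) == 315   # too low -> fallback
    assert choose_iops_capacity(280, 315, 50, 500) == 280  # plausible -> keep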
Additionally, for a couple of valgrind-related teuthology tests, the
cluster warning is added to the ignorelist since the reported IOPS can
be very low due to slowness.
Fixes: https://tracker.ceph.com/issues/67421
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
* refs/pull/59029/head:
qa: simplify postmerge construction
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Add a framework for various random options for debug bluestore.
Use the framework to select (see the sketch after the list):
- write_v1
- write_v2
- write_v1 / write_v2 selected at random
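An illustrative Python sketch of the selection (the option name
bluestore_write_v2 and the override shape are assumptions here, not the
actual framework code):

    import random

    def pick_bluestore_overrides(mode="random"):
        # Map the requested write path to a hypothetical conf override;
        # "random" picks one of the two paths at test setup time.
        if mode == "random":
            mode = random.choice(["write_v1", "write_v2"])
        return {"bluestore_write_v2": "true" if mode == "write_v2" else "false"}

    print(pick_bluestore_overrides("random"))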
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
and avoid errors when "clusternodes" is not defined.
Fixes: https://tracker.ceph.com/issues/67352
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
qa/suites/rados/verify/validater: increase heartbeat grace timeout
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Laura Flores <lflores@redhat.com>
Eventually, the PG_DEGRADED warning goes away and the cluster goes
back to a healthy state before the end of the test.
Fixes: https://tracker.ceph.com/issues/66922
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
Having debug 20 is impractical. It slows down execution and takes up disk
space, but gives little help in eventual debugging.
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Running these tests with thrashers on small clusters leads to many very
slow ops due to the cluster being overloaded. That has a tendency to
make some of the API tests time out and fail.
Fixes: https://tracker.ceph.com/issues/50371
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
Basically, when we deploy 3 MONs,
check that the connection scores are clean,
with a 60-second grace period.
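A rough Python sketch of such a check (the 'connection scores dump' admin
socket command exists; the JSON field names and the cleanliness criterion
used here are assumptions):

    import json
    import subprocess
    import time

    def connection_scores_clean(mon_id, grace=60):
        # Poll one mon's connection scores until they look clean or the
        # grace period expires.
        deadline = time.time() + grace
        while time.time() < deadline:
            out = subprocess.check_output(
                ["ceph", "daemon", f"mon.{mon_id}",
                 "connection", "scores", "dump"])
            dump = json.loads(out)
            peers = [p for r in dump.get("reports", [])
                     for p in r.get("peer_scores", [])]
            if peers and all(p.get("peer_alive") and p.get("peer_score") == 1.0
                             for p in peers):
                return True
            time.sleep(5)
        return False

    print(connection_scores_clean("a"))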
Fixes: https://tracker.ceph.com/issues/65695
Signed-off-by: Kamoltat <ksirivad@redhat.com>
Test the case where 2 DCs lose connection with each other
in a 3-AZ stretch cluster with a stretch pool enabled.
Check that the cluster is accessible and PGs are active+clean
after they reconnect.
Signed-off-by: Kamoltat <ksirivad@redhat.com>
Test the following new Ceph CLI commands:
`ceph osd pool stretch set`
`ceph osd pool stretch unset`
`ceph osd pool stretch show`
`qa/workunits/mon/mon-stretch-pool.sh`
creates the stretch cluster while performing input validation
for the CLI commands mentioned above.
`qa/tasks/stretch_cluster.py`
is in charge of setting a pool as a stretch pool and checks
whether that prevents PGs from going active when there are not
enough buckets available in the acting set of the PGs for them
to go active.
It also tests different MON failover scenarios after setting
the pool as stretch.
`qa/suites/rados/singleton/all/mon-stretch-pool.yaml`
brings the scripts together.
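A hedged Python sketch of driving these commands from a test (the argument
list passed to 'stretch set' below is a placeholder, not the definitive
interface):

    import subprocess

    POOL = "test_stretch_pool"  # hypothetical pool name

    def ceph(*args):
        # Run a ceph CLI command and return its output as text.
        return subprocess.check_output(["ceph", *args], text=True)

    # Mark the pool as stretch; the trailing values (bucket count/target,
    # barrier, crush rule, size, min_size) are assumed placeholders.
    ceph("osd", "pool", "stretch", "set", POOL, "2", "3", "datacenter",
         "replicated_rule_custom", "6", "3")

    # Show the pool's stretch values, then clear them again.
    print(ceph("osd", "pool", "stretch", "show", POOL))
    ceph("osd", "pool", "stretch", "unset", POOL)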
Fixes: https://tracker.ceph.com/issues/64802
Signed-off-by: Kamoltat <ksirivad@redhat.com>
Roles being set without overrides caused a "too many values to unpack (expected 1)" error.
Fixes: https://tracker.ceph.com/issues/66209
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
The last PR modified the suites to only check for the host thrasher.
This update fixes that issue by implementing different settings
with dedicated YAML files for host thrashing.
Fixes: https://tracker.ceph.com/issues/66657
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
IO is frozen when the injectfull command is sent as part of the test,
which causes the cleanup to hang, so we need to clear it.
Fixes: https://tracker.ceph.com/issues/59380
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
thrash-old-clients tests should only support N-3 releases. To fix this for
main, I have removed all releases < quincy and have added squid.
Also, we are fully switching to centos.9_stream packages/containers after
the centos.8_stream end of life, so I changed the distro from centos.8_stream
to centos.9_stream.
*** Note: If this commit is backported, it should be done in such a way that
only releases >= quincy reference centos.9_stream. For instance, if backporting to squid,
a reef/squid thrash test can reference centos.9_stream since both reef and
squid support it, but a pacific/squid test will have to take a different approach
since pacific does not support centos.9_stream.
Fixes: https://tracker.ceph.com/issues/66398
Signed-off-by: Laura Flores <lflores@ibm.com>
We are currently conducting regular ceph-dencoder tests for backward compatibility.
However, we are omitting tests for forward compatibility.
This suite introduces tests against the ceph-object-corpus to address forward
compatibility issues that may arise.
The script will install the N-2 version and run it against the latest corpus
objects that we have, then install the N-1 through N versions and check them as well.
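A simplified Python sketch of the forward-compatibility pass, assuming the
usual ceph-object-corpus layout archive/<version>/objects/<type>/<object>
and ceph-dencoder's documented decode invocation:

    import os
    import subprocess

    CORPUS = "ceph-object-corpus/archive"

    def check_corpus_version(version):
        # Try to decode every corpus object encoded by a newer release
        # with the older, installed ceph-dencoder; collect any failures.
        failures = []
        objdir = os.path.join(CORPUS, version, "objects")
        for typename in os.listdir(objdir):
            for obj in os.listdir(os.path.join(objdir, typename)):
                path = os.path.join(objdir, typename, obj)
                rc = subprocess.call(
                    ["ceph-dencoder", "type", typename,
                     "import", path, "decode", "dump_json"],
                    stdout=subprocess.DEVNULL)
                if rc != 0:
                    failures.append((typename, obj))
        return failures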
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
The OSD_DOWN cluster log warning is raised on rare occasions due to
the osd_heartbeat_grace timeout being exceeded. The warning is
soon cleared. Given the nature of the test (valgrind), the
grace timeout is increased to 160 secs to avoid generating the
warning.
Fixes: https://tracker.ceph.com/issues/65768
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Not every log entry with this error has the parentheses, so
these warnings were still causing the test to fail:
[ERR] [WRN] CEPHADM_STRAY_DAEMON: 2 stray daemon(s)... in cluster log
Signed-off-by: Adam King <adking@redhat.com>
* refs/pull/56997/head:
pybind/mgr: disable sqlite3/python autocommit
qa/tasks/mgr: add tests for sqlite autocommit
qa/tasks/vstart_runner: run daemons in foreground
qa/tasks/vstart_runner: add missing poll method
qa/suites/rados/mgr: add cli/devicehealth tasks
qa: reorganize mgr unit tests
qa: use position-independent link
qa: add missing terminating newline
pybind/mgr: add killpoint for sqlite3 database setup
mgr: allow specifying module option level
mon/MgrMonitor: promote standby when unsetting down flag
mon/MgrMonitor: only drop active if exists
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
The "stray daemon" that is getting logged about in this test is
from "stray daemon laundry.pid70383 on host smithi027 not managed by cephadm".
It seems the rados_api_tests is creating some additional "laundry" entity
during these tests that gets reported as an actual daemon in the mgr,
but cephadm is unaware of it, resulting in the warning. Originally
we thought to maybe add "laundry" itself to the ignorelist, but
without an additional patch that added extra logging for debug
purposes (which can't be merged) the log statement found in
the logs due to this problem will not say what daemon it found
to be stray. There will just be a generic warning about a stray
daemon. In a real cluster, a user would then check "ceph health detail"
to find out what daemon is stray, but the log scraper can't do this
and just fails the test due to the presence of the warning.
Signed-off-by: Adam King <adking@redhat.com>