Changes some of the tests in teuthology to make
the tests more deterministic.
Using `ceph osd set norecover` and
`ceph osd set nobackfill` when marking OSDs in
or out. This delays recovery and makes sure
the test cases get the chance to check that
events are actually popping up in the
progress module.
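The pattern looks roughly like this (a sketch only, not the
actual test code; _events_in_progress() is an illustrative
helper):

    def _osd_out_with_delayed_recovery(self, osd_id):
        # Hold back recovery/backfill so the progress module event
        # stays observable while the test inspects it.
        self.mgr_cluster.mon_manager.raw_cluster_cmd('osd', 'set', 'norecover')
        self.mgr_cluster.mon_manager.raw_cluster_cmd('osd', 'set', 'nobackfill')
        try:
            self.mgr_cluster.mon_manager.raw_cluster_cmd('osd', 'out', str(osd_id))
            # count the in/out events reported by the progress module
            self.wait_until_equal(lambda: self._events_in_progress(), 1,
                                  timeout=60)
        finally:
            # let recovery proceed once the event has been verified
            self.mgr_cluster.mon_manager.raw_cluster_cmd('osd', 'unset', 'nobackfill')
            self.mgr_cluster.mon_manager.raw_cluster_cmd('osd', 'unset', 'norecover')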
Removed test_osd_cannot_recover from
tasks/mgr/test_progress.py since it is no longer
a relevant test case: recovery will get
triggered regardless of whether the PG is unmoved.
Ignoring `OSDMAP_FLAGS` in teuthology
because using norecover and nobackfill
to delay the recovery process raises a
health warning, which would fail the
teuthology test.
Signed-off-by: Kamoltat <ksirivad@redhat.com>
query the python version before trying to test diskprediction_local
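a guard along these lines does the trick (a minimal sketch; the
exact minimum version required by the module's dependencies is an
assumption here):

    import sys
    import unittest

    @unittest.skipIf(sys.version_info < (3, 6),
                     'diskprediction_local needs a newer python')
    class TestDiskpredictionLocal(unittest.TestCase):
        def test_enable_module(self):
            pass  # the point here is the version guard above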
Fixes: https://tracker.ceph.com/issues/50196
Signed-off-by: Kefu Chai <kchai@redhat.com>
This will retain the debug log settings for all RADOS suites
that were previously symlinked to the 'objectstore'
directory. The next commit will reduce the debug log level
for the original 'objectstore' directory for the remainder
of tests.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This reverts commit a7994a0fdd.
A failed attempt at solving the issue is in PR #33272. Until we
find a clean solution for this, whitelisting the warning is
probably the best thing for now.
Fixes: http://tracker.ceph.com/issues/43943
Signed-off-by: Venky Shankar <vshankar@redhat.com>
in tasks/module_selftest.yaml, `TestModuleSelftest.test_telegraf()` is
called, but we fail to prepare a unix domain socket to which the
telegraf module can send stats. the telegraf module does not catch the
FileNotFoundError exception, so the exception is propagated to ceph-mgr
and is noticed by the test; hence the test is marked a failure whenever
telegraf is tested.
in this change,
* catch this exception, so it won't propagate to ceph-mgr
  (see the sketch below)
* whitelist the error message, so the test can pass
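the fix in the telegraf module follows this shape (illustrative
sketch, not the module's exact code):

    import logging
    import socket

    log = logging.getLogger(__name__)

    def send_to_telegraf(path, payload):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        try:
            sock.connect(path)
            sock.send(payload)
        except FileNotFoundError:
            # the socket path does not exist yet; log instead of
            # letting the exception propagate up to ceph-mgr
            log.warning('telegraf socket %s does not exist', path)
        finally:
            sock.close()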
Signed-off-by: Kefu Chai <kchai@redhat.com>
This moves dashboard.yaml from rados/mgr into a new, separate rados/dashboard
suite. The common elements it uses are moved from rados/mgr into qa/ and
replaced with symlinks.
Fixes: https://tracker.ceph.com/issues/41820
Signed-off-by: Nathan Cutler <ncutler@suse.com>
We're currently facing some issues with our integration
tests. Because of that we agreed on commenting questionable
suites out to be able to run all other suites on open pull
requests.
'test_health' and 'test_perf_counters' are commented out
because they led to issues in relation to
https://tracker.ceph.com/issues/41538
As soon as the issue has been fixed, we need to re-add
these two suites.
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
* refs/pull/29034/head:
doc/mgr/crash: document missing commands, options
qa/suites/rados/singleton/all/test-crash: whitelist RECENT_CRASH
qa/suites/rados/mgr/tasks/insights: whitelist RECENT_CRASH
qa/tasks/mgr/test_insights: crash module now rejects bad crash reports
mgr/telemetry: fix remote into crash do_ls()
mgr/crash: don't make these methods static
mgr/BaseMgrModule: handle unicode health detail strings
mgr/crash: verify timestamp is valid
qa/suites/mgr: whitelist RECENT_CRASH
mgr/crash: remove unused var
mgr/crash: remove unused import 'six'
qa/workunits/rados/test_crash: health check
mgr/crash: improve validation on post
mgr/crash: automatically prune old crashes after a year
mgr/crash: raise RECENT_CRASH warning for recent (new) crashes
mgr/crash: add 'crash ls-new'
mgr/crash: add option and serve infra
mgr/crash: keep copy of crashes in memory
mgr/pg_autoscaler: adjust style to match built-in tables
mgr/crash: make 'crash ls' a nice table with a NEW column
mgr/crash: nicely format 'crash info' output
mgr/crash: add 'crash archive <id>', 'crash archive-all' commands
Reviewed-by: Neha Ojha <nojha@redhat.com>
The mgr's libcephfs client gets evicted after the mgr fails over.
Whitelist the message.
Fixes: http://tracker.ceph.com/issues/40867
Signed-off-by: Sage Weil <sage@redhat.com>
This warning is caused by the recent changes to the volumes
module that cache the CephFS handles.
Commit 5c41e949af
Signed-off-by: Ricardo Dias <rdias@suse.com>
1. To be able to run the CLI without an external orchestrator.
2. To run the CLI in Teuthology.
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
The current solution fails on our CI system, as some outputs can
have more values and some parameters like 'w' can vary between
environments. It only worked before because it had been tested
solely in a vstart cluster environment.
With this commit, only the given attributes that we know will be
there are tested.
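In other words, the test asserts only a known subset of keys
instead of comparing the whole output (sketch; attribute names
are made up for illustration):

    def assert_known_attributes(output):
        # only check attributes that exist in every environment;
        # extra keys and environment-dependent values are ignored
        expected = {'pool_name', 'pg_num', 'size'}
        missing = expected - set(output.keys())
        assert not missing, 'missing attributes: %s' % missing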
Fixes: https://tracker.ceph.com/issues/37275
Signed-off-by: Stephan Müller <smueller@suse.com>
This splits out the collection of health and log data from the
/api/dashboard/health controller into /api/health/{full,minimal} and
/api/logs/all.
/health/full contains all the data (minus logs) that /dashboard/health
did, whereas /health/minimal contains only what is needed for the health
component to function. /logs/all contains exactly what the logs portion
of /dashboard/health did.
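The idea, reduced to its essence (field names here are made up for
illustration; the real controllers pull live cluster data):

    def collect_health_data():
        # stand-in for the shared data-collection code
        return {
            'health': {'status': 'HEALTH_OK'},
            'mon_status': {'quorum': [0, 1, 2]},
            'pools': [],  # large; only the full view needs it
            'osd_map': {'osds': []},
        }

    MINIMAL_FIELDS = {'health', 'mon_status', 'osd_map'}

    def health_full():
        return collect_health_data()

    def health_minimal():
        # keep only what the health component needs
        return {k: v for k, v in collect_health_data().items()
                if k in MINIMAL_FIELDS}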
By using /health/minimal, on a vstart cluster we pull ~1.4KB of data
every 5s, where we used to pull ~6KB; those numbers would get larger
with larger clusters. Once we split out log data, that will drop to
~0.4KB.
Fixes: http://tracker.ceph.com/issues/36675
Signed-off-by: Zack Cerza <zack@redhat.com>
Add options to mark OSDs in/out/down/reweight/lost/remove/destroy/create
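These actions map roughly onto the corresponding mon commands
(hedged sketch; the actual controller code differs, but the
`ceph osd ...` commands shown are the standard ones):

    import subprocess

    def osd_action(action, osd_id, weight=None):
        cmd = {
            'in':       ['ceph', 'osd', 'in', str(osd_id)],
            'out':      ['ceph', 'osd', 'out', str(osd_id)],
            'down':     ['ceph', 'osd', 'down', str(osd_id)],
            'reweight': ['ceph', 'osd', 'reweight', str(osd_id), str(weight)],
            'lost':     ['ceph', 'osd', 'lost', str(osd_id),
                         '--yes-i-really-mean-it'],
            'destroy':  ['ceph', 'osd', 'destroy', str(osd_id),
                         '--yes-i-really-mean-it'],
        }[action]
        subprocess.run(cmd, check=True)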
Fixes: http://tracker.ceph.com/issues/24270
Signed-off-by: Patrick Nawracay <pnawracay@suse.com>
This utilizes the recent feature in teuthology [1] to skip hidden files in
suites when building the job matrix.
The idea of this change is to enable referring to the top-level qa
directory in a position-independent way, such that copies of a suite
to another location do not break any symlinks.
[1] https://github.com/ceph/teuthology/pull/1185
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Add ability to list, set and unset cluster-wide OSD flags.
Flags can be listed and changed through the `/api/osd/flags` API
resource. A GET request retrieves the list; a PUT request updates
the flags (all at once). Flags not contained in the data of the PUT
are removed, additional ones are added. Note that PUT requests
require a JSON body with the data contained as the value of the
'flags' key, like so:
{"flags": ["flag1", "flag2", ...]}
Fixes: http://tracker.ceph.com/issues/24056
Signed-off-by: Patrick Nawracay <pnawracay@suse.com>