otherwise we have the following warning in the health report:
{"status":"HEALTH_WARN","checks":{"RECENT_MGR_MODULE_CRASH":{"severity":"HEALTH_WARN","summary":{"message":"1 mgr modules have recently crashed","count":1},"muted":false}},"mutes":[]}
and it does not disappear after the test waits for 30 seconds, so the
tasks.mgr.test_module_selftest.TestModuleSelftest test fails like:
2021-07-21T09:59:52.560 INFO:tasks.cephfs_test_runner:======================================================================
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:ERROR: test_module_commands (tasks.mgr.test_module_selftest.TestModuleSelftest)
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/mgr/test_module_selftest.py", line 201, in test_module_commands
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner: self.wait_for_health_clear(timeout=30)
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/ceph_test_case.py", line 172, in wait_for_health_clear
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner: self.wait_until_true(is_clear, timeout)
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/ceph_test_case.py", line 209, in wait_until_true
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner: raise TestTimeoutError("Timed out after {0}s and {1} retries".format(elapsed, retry_count))
2021-07-21T09:59:52.564 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 30s and 0 retries
in this change, the crash reports are nuked right after
we see the warning, so that we can have a clean health
report.
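
The ordering described above can be sketched as follows. This is a minimal
illustration and not the code from the change: `FakeCluster`,
`remove_crash_reports`, `clear_crash_warning` and `HealthTimeoutError` are
hypothetical names, and the in-memory cluster only imitates a warning that
stays raised while crash reports exist.

```python
import time


class HealthTimeoutError(RuntimeError):
    """Raised when a condition does not become true in time."""


def wait_until_true(condition, timeout, period=0.01):
    """Poll condition() until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while not condition():
        if time.monotonic() >= deadline:
            raise HealthTimeoutError(f"Timed out after {timeout}s")
        time.sleep(period)


class FakeCluster:
    """Hypothetical in-memory stand-in for a cluster's health state."""

    def __init__(self):
        self.crash_reports = ["mgr-module-crash"]

    def health_checks(self):
        # In this fake, the warning is raised while any crash report exists.
        return {"RECENT_MGR_MODULE_CRASH"} if self.crash_reports else set()

    def remove_crash_reports(self):
        self.crash_reports.clear()


def clear_crash_warning(cluster, timeout=30):
    # 1. Wait until the expected warning shows up.
    wait_until_true(
        lambda: "RECENT_MGR_MODULE_CRASH" in cluster.health_checks(), timeout)
    # 2. Remove the crash reports that keep the warning raised.
    cluster.remove_crash_reports()
    # 3. Only now can the health report be expected to become clean.
    wait_until_true(lambda: not cluster.health_checks(), timeout)
```

Without step 2, the final wait in this sketch would time out in the same way
as the traceback quoted above.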
Fixes: https://tracker.ceph.com/issues/51743
Signed-off-by: Kefu Chai <kchai@redhat.com>
"device_health_metrics" pool is gone -- .mgr pool is in.
I don't think the pool removal code in some test cases is necessary any
longer with recent changes to remove those warnings; so that code is
gone too.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
query the python version before trying to test diskprediction_local
Fixes: https://tracker.ceph.com/issues/50196
Signed-off-by: Kefu Chai <kchai@redhat.com>
since all module options are using the new-style config framework.
the migration was offered for the use case of upgrading from luminous to
mimic, but pacific can only be upgraded from octopus. the mimic monitors are
already able to populate the configurations to mgr, not to mention the octopus
monitors, so there is no need to migrate the options stored in the config-key
store anymore.
Signed-off-by: Kefu Chai <kchai@redhat.com>
the test for diskprediction_cloud was never enabled, and the cloud-based
service it used is not reachable anymore. let's just remove the dead
code.
Signed-off-by: Kefu Chai <kchai@redhat.com>
for better readability, and to spare developers from tracing back to the
top-level python package when referencing a submodule
Signed-off-by: Kefu Chai <kchai@redhat.com>
Introduced in 4872cc5aa3
`_ceph_set_module_option` also accepts `None`, not just strings.
Fixes: http://tracker.ceph.com/issues/40779
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
mgr/dashboard: Add separate option to config SSL port
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
There is a need to introduce this new config option because the MgrModule::get_module_option() and MgrModule::get_localized_module_option() methods will be refactored soon and will no longer support the default parameter. Instead, the default value must be configured in MODULE_OPTIONS. Currently server_port is misused for both cases, depending on whether SSL is enabled.
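
A rough sketch of what declaring defaults in MODULE_OPTIONS allows, assuming a
list of dicts with `name` and `default` keys. The `ssl_server_port` option
name, the port numbers, and the helpers `get_module_option` and `listen_port`
are illustrative assumptions, not the dashboard's actual code.

```python
# Assumed option declarations; names and default values are illustrative only.
MODULE_OPTIONS = [
    {"name": "ssl", "default": True},
    {"name": "server_port", "default": 8080},
    {"name": "ssl_server_port", "default": 8443},
]


def get_module_option(stored, name):
    """Return the stored value, else the default declared in MODULE_OPTIONS.

    Note that the caller no longer passes a default of its own.
    """
    if name in stored:
        return stored[name]
    for option in MODULE_OPTIONS:
        if option["name"] == name:
            return option.get("default")
    raise KeyError(f"unknown option: {name}")


def listen_port(stored):
    """Pick the port option that matches the SSL setting."""
    if get_module_option(stored, "ssl"):
        return get_module_option(stored, "ssl_server_port")
    return get_module_option(stored, "server_port")
```

With two separate options, each one carries a single fixed default, so the
lookup never needs to know whether SSL is enabled to decide what the default
should be.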
Fixes: https://tracker.ceph.com/issues/38331
Signed-off-by: Volker Theile <vtheile@suse.com>
When mgr/selftest/testkey = foo and mgr/selftest/x/testkey is not set,
then get_localized() should return foo.
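
The fallback rule can be sketched with a plain dict standing in for the
config store. The function signature below is an assumption made for
illustration; only the key layout is taken from the message above.

```python
def get_localized(store, module, mgr_id, key, default=None):
    """Look up mgr/<module>/<mgr_id>/<key>, falling back to mgr/<module>/<key>."""
    localized_key = f"mgr/{module}/{mgr_id}/{key}"
    if localized_key in store:
        return store[localized_key]
    # No per-daemon value is set: fall back to the module-wide value.
    return store.get(f"mgr/{module}/{key}", default)


store = {"mgr/selftest/testkey": "foo"}
value = get_localized(store, "selftest", "x", "testkey")
```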
Signed-off-by: Sage Weil <sage@redhat.com>
Separate the local and cloud predictors from the diskprediction plugin.
Devicehealth invokes the device prediction function according to the global
configuration "device_failure_prediction_mode".
Signed-off-by: Rick Chen <rick.chen@prophetstor.com>
This module is written by Rick Chen <rick.chen@prophetstor.com> and
provides both a built-in local predictor and a cloud mode that queries
a cloud service (provided by ProphetStor) to predict device failures.
Signed-off-by: Rick Chen <rick.chen@prophetstor.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Avoid need for each module to expose a self-test
command: they can just implement the method,
and then get it called via the selftest module.
As well as fewer LOC, this means that the self
test commands are not cluttering the interface
for end users, as they're invisible until
the selftest module is loaded.
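
The dispatch idea can be sketched as below. The class names and the
`run_module_self_test` method are hypothetical; the sketch only shows one
module calling a conventionally named method on another.

```python
class Module:
    """Hypothetical base class; modules override self_test() if they have one."""

    def self_test(self):
        raise NotImplementedError("module has no self-test")


class ExampleModule(Module):
    def self_test(self):
        # A real module would exercise its own internals here.
        return "example ok"


class SelfTest(Module):
    """Exposes a single entry point that runs any loaded module's self-test."""

    def __init__(self, modules):
        self.modules = modules  # maps module name to loaded module instance

    def run_module_self_test(self, name):
        if name not in self.modules:
            raise KeyError(f"module not loaded: {name}")
        return self.modules[name].self_test()


selftest = SelfTest({"example": ExampleModule(), "plain": Module()})
outcome = selftest.run_module_self_test("example")
```

Only `SelfTest` needs to register a user-visible command in such a design;
the tested modules just implement the method.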
Signed-off-by: John Spray <john.spray@redhat.com>
This is being done by passing native CPython objects
back and forth. It's safe because sub-interpreters in CPython
share memory allocation infrastructure and share the GIL.
With a view to PEP554, we limit inter-interpreter calls
to pickleable objects, so that this may be implemented
using byte-arrays in future.
This infrastructure should enable:
- the dashboard to display the status of other modules, for
example the set of progress indicators from `progress`
- dashboard and restful to share an underlying long running
job mechanism.
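
One way to picture the restriction to pickleable objects is to round-trip
arguments and results through `pickle` at the call boundary. This is a sketch
under that assumption, not the mechanism used here (which passes native
objects directly); `remote_call` and `Progress` are hypothetical names.

```python
import pickle
import threading


def remote_call(target, method_name, *args, **kwargs):
    """Call a method on another module, allowing only pickleable data across."""
    # Round-tripping through pickle rejects objects that could not be sent
    # as bytes, and hands the callee copies instead of shared references.
    args, kwargs = pickle.loads(pickle.dumps((args, kwargs)))
    result = getattr(target, method_name)(*args, **kwargs)
    return pickle.loads(pickle.dumps(result))


class Progress:
    """Hypothetical stand-in for a module that exposes progress events."""

    def events(self, prefix=""):
        return [{"id": 1, "message": prefix + "rebalancing", "progress": 0.5}]


events = remote_call(Progress(), "events", prefix="pg: ")
```

Passing something like a `threading.Lock()` as an argument fails inside
`pickle.dumps`, which is the kind of object the restriction is meant to keep
out of inter-module calls.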
Signed-off-by: John Spray <john.spray@redhat.com>
This Manager Module will send statistics and version information from
a Ceph cluster back to telemetry.ceph.com if the user has opted in to sending
this information.
Additionally, a user can indicate that the information may be made
public, which then allows other users to see it.
Signed-off-by: Wido den Hollander <wido@42on.com>
(cherry picked from commit 8f6137d162)
Telegraf is an agent for collecting and reporting metrics.
It has multiple inputs and can send data to various outputs,
for example InfluxDB or ElasticSearch.
This module works by using the socket_listener of Telegraf and can
send data over UDP, TCP and a local Unix Socket.
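
As an illustration of the UDP case, the sketch below formats one metric and
sends it as a datagram. It assumes the listener is configured to parse
InfluxDB line protocol; the measurement and tag names are made up, and the
formatter does not escape special characters.

```python
import socket


def format_line(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=value field=value [timestamp]."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    # Fields are written as floats to keep the sketch simple.
    field_part = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    line = f"{measurement}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


def send_udp(line, host, port):
    """Send a single newline-terminated line as one UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto((line + "\n").encode("utf-8"), (host, port))


line = format_line("ceph_cluster_stats", {"type_instance": "num_osd"}, {"value": 3})
```

TCP and Unix sockets would differ only in how the socket is created and
connected; the payload format stays the same.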
Signed-off-by: Wido den Hollander <wido@42on.com>
With this change, we avoid the disabling/enabling of the ceph-mgr module
being tested for each test function declared in each test case. Now
the ceph-mgr module being tested is disabled/enabled only once for each
test case.
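
The difference can be sketched with plain `unittest`: work placed in
`setUpClass` runs once per test case class, whereas `setUp` would run before
every test function. `FakeMgr` and `restart_module` are hypothetical
stand-ins for the real disable/enable step.

```python
import unittest


class FakeMgr:
    """Hypothetical stand-in that counts how often a module is restarted."""

    enable_calls = 0

    @classmethod
    def restart_module(cls, name):
        # Stands in for disabling and then re-enabling the module.
        cls.enable_calls += 1


class ModuleTestCase(unittest.TestCase):
    MODULE = "selftest"

    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class, not once per test function.
        FakeMgr.restart_module(cls.MODULE)

    def test_first(self):
        self.assertEqual(FakeMgr.enable_calls, 1)

    def test_second(self):
        self.assertEqual(FakeMgr.enable_calls, 1)
```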
Signed-off-by: Ricardo Dias <rdias@suse.com>
This was throwing IOError("Port 9283 not free on '::'",)
when trying to serve, since merging https://github.com/ceph/ceph/pull/19744
It's because the standbys (on the same node as the active) are
now trying to listen too.
Fixes: https://tracker.ceph.com/issues/22755
Signed-off-by: John Spray <john.spray@redhat.com>
Even though the selftest routine doesn't care about
the settings, we should set them to avoid emitting
nasty log/health messages when enabling the module.
Fixes: http://tracker.ceph.com/issues/22514
Signed-off-by: John Spray <john.spray@redhat.com>
Some extra coverage of the dashboard, including its standby
redirect mode and the publishing of URIs.
Also invoking the command_spam mode of the selftest module.
Signed-off-by: John Spray <john.spray@redhat.com>
The module self test commands give us a chance to
catch any other ceph changes that change something
that a module was relying on reading.
Signed-off-by: John Spray <john.spray@redhat.com>