A module option called CLUSTER_STATUS has two options: INSTALLED
and POST_INSTALLED. When CLUSTER_STATUS is INSTALLED, the
create-cluster wizard is shown after the initial login. After the
cluster creation is successful, this option is set to POST_INSTALLED.
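A minimal sketch of the idea, assuming the MgrModule set_module_option()
helper; the enum values come from this message, the surrounding plumbing
is illustrative:

```python
from enum import Enum

class ClusterStatus(Enum):
    INSTALLED = 'INSTALLED'            # show the wizard on first login
    POST_INSTALLED = 'POST_INSTALLED'  # cluster created; skip the wizard

def on_cluster_create_success(mgr):
    # Once cluster creation succeeds, flip the option so later logins
    # bypass the create-cluster wizard.
    mgr.set_module_option('CLUSTER_STATUS',
                          ClusterStatus.POST_INSTALLED.value)
```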
Also adds the e2e code for the Review section.
Fixes: https://tracker.ceph.com/issues/50336
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Signed-off-by: Nizamudeen A <nia@redhat.com>
qa/mgr/dashboard: add extra wait to test
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Fixes: https://tracker.ceph.com/issues/52480
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Introducing an APIVersion class to handle versioning for API endpoints and
making them backward compatible.
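A minimal sketch of such a class (method names assumed; the actual
dashboard implementation may differ):

```python
from typing import NamedTuple

class APIVersion(NamedTuple):
    major: int
    minor: int

    def compatible_with(self, requested: "APIVersion") -> bool:
        # Backward compatible: same major version and at least the
        # minor version the client asked for.
        return (self.major == requested.major
                and self.minor >= requested.minor)

# e.g. an endpoint declared at APIVersion(1, 1) can still serve a
# client that requested APIVersion(1, 0).
```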
- Rename the dashboard command to better reflect its behavior.
- Rename the '_radosgw_admin' method to 'send_rgwadmin_command' for consistency with
'send_mon_command' and move it to mgr_module.py.
- Cleanup: remove unneeded rgw settings.
- Better error handling and test coverage.
Fixes: https://tracker.ceph.com/issues/44605
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
Otherwise we have the following warning in the health report:
{"status":"HEALTH_WARN","checks":{"RECENT_MGR_MODULE_CRASH":{"severity":"HEALTH_WARN","summary":{"message":"1 mgr modules have recently crashed","count":1},"muted":false}},"mutes":[]}
and it does not disappear after the test waits for 30 seconds. The
tasks.mgr.test_module_selftest.TestModuleSelftest test then
fails like:
2021-07-21T09:59:52.560 INFO:tasks.cephfs_test_runner:======================================================================
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:ERROR: test_module_commands (tasks.mgr.test_module_selftest.TestModuleSelftest)
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/mgr/test_module_selftest.py", line 201, in test_module_commands
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner: self.wait_for_health_clear(timeout=30)
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/ceph_test_case.py", line 172, in wait_for_health_clear
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner: self.wait_until_true(is_clear, timeout)
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/ceph_test_case.py", line 209, in wait_until_true
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner: raise TestTimeoutError("Timed out after {0}s and {1} retries".format(elapsed, retry_count))
2021-07-21T09:59:52.564 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 30s and 0 retries
In this change, the crash reports are nuked right after
we see the warning, so that we can have a clean health
report.
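A minimal sketch of the approach, assuming the teuthology helpers named
in the traceback above and the real `ceph crash archive-all` command;
the exact test code may differ:

```python
def test_module_selftest_cleanup(self):
    # Trigger the mgr self-test crash (elided), then:
    self.wait_for_health("RECENT_MGR_MODULE_CRASH", timeout=30)
    # 'ceph crash archive-all' acknowledges every crash report, which
    # clears the RECENT_MGR_MODULE_CRASH warning from the health report.
    self.mgr_cluster.mon_manager.raw_cluster_cmd("crash", "archive-all")
    self.wait_for_health_clear(timeout=30)
```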
Fixes: https://tracker.ceph.com/issues/51743
Signed-off-by: Kefu Chai <kchai@redhat.com>
Changes some of the tests in teuthology to make
them more deterministic.
Uses `ceph osd set norecover` and
`ceph osd set nobackfill` when marking OSDs in
or out, as this delays recovery and makes
sure the test cases get the chance to check
that events are actually popping up in
the progress module (see the sketch below).
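A rough sketch of the pattern, assuming the teuthology `raw_cluster_cmd`
helper; the flags are the real `ceph osd set/unset` flags:

```python
def mark_osd_out_deterministically(self, osd_id):
    # Hold back recovery/backfill so the progress module has time to
    # surface an event before data movement starts.
    self.mgr_cluster.mon_manager.raw_cluster_cmd("osd", "set", "norecover")
    self.mgr_cluster.mon_manager.raw_cluster_cmd("osd", "set", "nobackfill")
    self.mgr_cluster.mon_manager.raw_cluster_cmd("osd", "out", str(osd_id))
    # ... assert that a progress event shows up here ...
    # Unset the flags to let recovery proceed and the event complete.
    self.mgr_cluster.mon_manager.raw_cluster_cmd("osd", "unset", "nobackfill")
    self.mgr_cluster.mon_manager.raw_cluster_cmd("osd", "unset", "norecover")
```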
Took out test_osd_cannot_recover from
tasks/mgr/test_progress.py since it is no longer
a relevant test case: recovery will get
triggered regardless of whether the PG is unmoved.
Ignoring `OSDMAP_FLAGS` in teuthology
because we are using norecover and nobackfill
to delay the recovery process; these flags
create a health warning that would otherwise
fail the teuthology test.
Signed-off-by: Kamoltat <ksirivad@redhat.com>
When you add a host in maintenance mode and then exit maintenance
mode, a 500 server error pops up, interrupting the whole
exit-maintenance process and leaving the host in an unknown/offline state.
It happened because I was setting the status of the host through
HostSpec(). With this change, I am using the enter_maintenance API of
the orchestrator to enable maintenance.
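A sketch of the fix's shape; the client object and its maintenance
wrappers are assumed names, not the exact dashboard API:

```python
def set_host_maintenance(orch_client, hostname: str, enter: bool) -> None:
    if enter:
        # Delegate to the orchestrator's enter_maintenance API instead of
        # mutating the host status via HostSpec(), which caused the 500.
        orch_client.hosts.enter_maintenance(hostname)
    else:
        orch_client.hosts.exit_maintenance(hostname)
```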
Fixes: https://tracker.ceph.com/issues/51218
Signed-off-by: Nizamudeen A <nia@redhat.com>
"device_health_metrics" pool is gone -- .mgr pool is in.
I don't think the pool removal code in some test cases is necessary any
longer with recent changes to remove those warnings; so that code is
gone too.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The global recovery event progress calculation only
takes into account PGs with `reported_epoch < start_epoch_of_event`,
but sometimes the PGs don't get moved before or after the creation
of the global recovery event. This can result in a bug
where the global event gets stuck forever, unless another
event specifically makes the stuck PGs move and update
their `reported_epoch`.
Therefore, we decided to disregard PGs that are in active+clean state
but have `reported_epoch < start_epoch_of_event`.
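A condensed sketch of the rule, with assumed field names rather than the
progress module's exact code:

```python
def pg_blocks_global_event(pg_stat: dict, start_epoch_of_event: int) -> bool:
    # An active+clean PG is already recovered: disregard it even when its
    # reported_epoch predates the event, so it cannot wedge the global
    # recovery event at less than 100%.
    state = pg_stat["state"]
    if "active" in state and "clean" in state:
        return False
    return pg_stat["reported_epoch"] < start_epoch_of_event
```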
Fixes: https://tracker.ceph.com/issues/49988
Signed-off-by: Kamoltat <ksirivad@redhat.com>
Adds the ability to create a host by specifying a network address and
also to create labels.
Fixes: https://tracker.ceph.com/issues/50318
Signed-off-by: Nizamudeen A <nia@redhat.com>
With mclock scheduler enabled, the recovery throughput is throttled based
on factors like the type of mclock profile enabled and the OSD capacity,
among others. Due to this, the recovery times may vary, and therefore the existing
timeout of 120 secs may not be sufficient.
To address the above, a new method called _is_inprogress_or_complete() is
introduced in the TestProgress class that checks if the event with the
specified 'id' is in progress by checking the 'progress' key of the
progress command response. This method also handles the corner case where
the event completes just before it's called.
The existing wait_until_true() method in the CephTestCase class is
modified to accept another function argument called "check_fn". This is
set to the _is_inprogress_or_complete() function described earlier in the
"test_turn_off_module" test that has been observed to fail due to the
reasons already described above. A retry mechanism of a maximum of 5
attempts is introduced after the first timeout is hit. This means that
the wait can extend up to a maximum of 600 secs (120 secs * 5) as long as
there is recovery progress reported by the 'ceph progress' command result.
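A simplified sketch of the modified wait loop (the real implementation
lives in qa/tasks/ceph_test_case.py; this version is illustrative):

```python
import time

def wait_until_true(condition, timeout, check_fn=None, period=5, retries=5):
    elapsed = 0
    retry_count = 0
    while not condition():
        if elapsed >= timeout:
            # Grant another timeout window only while check_fn() reports
            # the event is still in progress, up to `retries` times
            # (e.g. 120 secs * 5 = 600 secs of extra wait).
            if check_fn is not None and check_fn() and retry_count < retries:
                retry_count += 1
                elapsed = 0
            else:
                raise TimeoutError(
                    "Timed out after {0}s and {1} retries".format(
                        elapsed, retry_count))
        time.sleep(period)
        elapsed += period
```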
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Query the Python version before trying to test diskprediction_local.
Fixes: https://tracker.ceph.com/issues/50196
Signed-off-by: Kefu Chai <kchai@redhat.com>
Dashboard backend settings:
- Refactoring: now accepting more than one type of value.
- RGW_API_ACCESS_KEY & RGW_API_SECRET_KEY accept a string (backward compatibility: legacy behavior) as well as a dictionary of strings for connecting to multiple daemons (see the sketch below).
- Ease of use: deprecated mgr/dashboard/RGW_API_USER_ID, as it is not useful anymore (kept for backward compatibility).
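An illustrative sketch of the two accepted value shapes (daemon names
invented; this is not the dashboard's exact resolution code):

```python
# Legacy, single-daemon form: a plain string.
access_key = "LEGACY_ACCESS_KEY"

# Multi-daemon form: a dictionary of strings keyed by daemon name.
access_keys = {
    "rgw.daemon-a": "ACCESS_KEY_A",
    "rgw.daemon-b": "ACCESS_KEY_B",
}

def resolve_credential(value, daemon_name=None):
    # Dict: pick the daemon-specific credential.
    if isinstance(value, dict):
        return value[daemon_name]
    # String: legacy behavior, one credential shared by all daemons.
    return value
```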
UI/UX:
- Created a context component (shown only on RGW-related routes) for selecting the operating daemon.
- The daemon selector is only shown if there is more than one daemon running on the local cluster (to reduce cognitive load).
Fixes: https://tracker.ceph.com/issues/47375
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
This PR stores the JWT token in secure cookies instead of local storage.
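A minimal sketch of the cookie-based approach (CherryPy-style, since the
dashboard backend runs on CherryPy; cookie name and helper are assumed):

```python
import cherrypy

def set_token_cookie(token: str) -> None:
    # Store the JWT in a Secure, HttpOnly cookie so it is sent over
    # HTTPS only and is not readable from JavaScript, unlike localStorage.
    cookie = cherrypy.response.cookie
    cookie['token'] = token
    cookie['token']['secure'] = True
    cookie['token']['httponly'] = True
    cookie['token']['samesite'] = 'Strict'
```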
Fixes: https://tracker.ceph.com/issues/44591
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit 36703c63381e6723fff57266235f8230e6af1d92)
Fixes: https://tracker.ceph.com/issues/48355
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Implemented a user lockout mechanism that disables a user after 10 invalid login attempts. The attempt count is reset to 0 once the user successfully logs in before getting disabled. Once the user is disabled, an administrator has to manually enable the user, which also resets the number of attempts.
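A bare-bones sketch of the lockout rules (field and constant names
assumed; the dashboard's access-control code differs in detail):

```python
MAX_INVALID_ATTEMPTS = 10

def handle_login(user, password_ok: bool) -> bool:
    if not user.enabled:
        # A disabled user stays locked out until an administrator
        # re-enables the account (which also resets the counter).
        return False
    if password_ok:
        user.invalid_attempts = 0  # success before lockout resets the count
        return True
    user.invalid_attempts += 1
    if user.invalid_attempts >= MAX_INVALID_ATTEMPTS:
        user.enabled = False  # lock the account after 10 bad attempts
    return False
```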
Fixes: https://tracker.ceph.com/issues/40914
Signed-off-by: Nizamudeen A <nia@redhat.com>