The progress module disables the pg recovery event by default,
since the event is expensive and has interrupted other services
when OSDs are being marked in/out of the cluster.
To turn the event on manually:
ceph config set mgr mgr/progress/allow_pg_recovery_event true
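For verification from a script, a minimal sketch (assuming only the standard
ceph CLI is available; the helper name is hypothetical):

    import subprocess

    def set_allow_pg_recovery_event(enabled: bool) -> bool:
        """Flip the progress module option and read it back via the ceph CLI."""
        value = "true" if enabled else "false"
        subprocess.run(["ceph", "config", "set", "mgr",
                        "mgr/progress/allow_pg_recovery_event", value],
                       check=True)
        out = subprocess.run(["ceph", "config", "get", "mgr",
                              "mgr/progress/allow_pg_recovery_event"],
                             check=True, capture_output=True, text=True)
        return out.stdout.strip() == value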
Updated qa/tasks/mgr/test_progress.py to enable
the pg recovery event when testing the progress module.
Signed-off-by: Kamoltat <ksirivad@redhat.com>
This PR adds some visual hints for OSDs that are near full or full.
Fixes: https://tracker.ceph.com/issues/53334
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
mgr/dashboard: move NFS_GANESHA_SUPPORTED_FSALS to mgr_module.py
Importing from the nfs module throws an AttributeError because, as a side effect, the dashboard module ends up impersonating the nfs module.
https://gist.github.com/varshar16/61ac26426bbe5f5f562ebb14bcd0f548
mgr/dashboard: 'Create NFS export' form: list clusters from nfs module
mgr/dashboard: frontend+backend cleanups for NFS export
Removed all code and references related to daemons. Cleaned up the UI and adopted unit testing for the
nfs-export create form for the CEPHFS backend. Cleaned up the export list/get/create/set/delete endpoints.
mgr/dashboard: rm set-ganesha ref + update docs
Remove existing set-ganesha-clusters-rados-pool-namespace references as
they are no longer required. The NFS documentation in the dashboard docs is
also updated to reflect the current NFS implementation.
mgr/dashboard: add nfs-export e2e test coverage
mgr/dashboard: 'Create NFS export' form: remove RGW user id field.
- Improve bucket typeahead behavior.
- Increase version for bucket list endpoint.
- Some refactoring.
mgr/dashboard: 'Create NFS export' form: allow RGW backend only when default realm is selected.
When RGW multisite is configured, the NFS module can only handle buckets in the default realm.
mgr/dashboard: 'Create service' form: fix NFS service creation.
After https://github.com/ceph/ceph/pull/42073, NFS pool and namespace are not customizable.
mgr/dashboard: 'Create NFS export' form: add bucket validation.
- Allow only existing buckets.
- Refactoring:
- Moved bucket validator from bucket form to cd-validators.ts
- Split the bucket validator into two: a bucket name validator and a bucket existence validator (which checks either existence or non-existence).
mgr/dashboard: 'Create NFS export' form: path validation refactor: allow only existing paths.
Fixes: https://tracker.ceph.com/issues/46493
Fixes: https://tracker.ceph.com/issues/51479
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
A module option called CLUSTER_STATUS has two values: INSTALLED
and POST_INSTALLED. When CLUSTER_STATUS is INSTALLED, the
create-cluster wizard is shown after the initial login. After the cluster
creation is successful, this option is set to POST_INSTALLED.
Also includes the e2e tests for the Review section.
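A minimal sketch of the idea, with illustrative names only (the actual option
wiring lives in the dashboard module):

    from enum import Enum

    class ClusterStatus(Enum):
        INSTALLED = 'INSTALLED'            # show the create-cluster wizard on first login
        POST_INSTALLED = 'POST_INSTALLED'  # wizard completed; show the regular dashboard

    def show_cluster_wizard(status: str) -> bool:
        # The wizard is only offered while the cluster is still in INSTALLED state.
        return ClusterStatus(status) is ClusterStatus.INSTALLED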
Fixes: https://tracker.ceph.com/issues/50336
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Signed-off-by: Nizamudeen A <nia@redhat.com>
qa/mgr/dashboard: add extra wait to test
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Fixes: https://tracker.ceph.com/issues/52480
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Introducing an APIVersion class to handle versioning for API endpoints and make
them backward compatible (a rough sketch follows after the list below).
- Rename the dashboard command to better reflect its behavior.
- Rename the '_radosgw_admin' method to 'send_rgwadmin_command' for consistency with
'send_mon_command' and move it to mgr_module.py.
- Cleanup: remove unneeded rgw settings.
- Better error handling and test coverage.
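A rough sketch of the shape such a class can take (assumed here, not the exact
dashboard implementation); the MIME type mirrors the dashboard's versioned
Accept header:

    from typing import NamedTuple

    class APIVersion(NamedTuple):
        major: int
        minor: int

        def to_mime_type(self, subtype: str = 'json') -> str:
            # e.g. 'application/vnd.ceph.api.v1.0+json'
            return f'application/vnd.ceph.api.v{self.major}.{self.minor}+{subtype}'

        def supports(self, client: 'APIVersion') -> bool:
            # Same major version and at least the requested minor => compatible.
            return self.major == client.major and self.minor >= client.minor

    DEFAULT = APIVersion(1, 0)
    print(DEFAULT.to_mime_type())  # application/vnd.ceph.api.v1.0+json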
Fixes: https://tracker.ceph.com/issues/44605
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
Otherwise we have the following warning in the health report:
{"status":"HEALTH_WARN","checks":{"RECENT_MGR_MODULE_CRASH":{"severity":"HEALTH_WARN","summary":{"message":"1 mgr modules have recently crashed","count":1},"muted":false}},"mutes":[]}
and it does not disappear after the test waits for 30 seconds.
The tasks.mgr.test_module_selftest.TestModuleSelftest test then
fails like:
2021-07-21T09:59:52.560 INFO:tasks.cephfs_test_runner:======================================================================
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:ERROR: test_module_commands (tasks.mgr.test_module_selftest.TestModuleSelftest)
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/mgr/test_module_selftest.py", line 201, in test_module_commands
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner: self.wait_for_health_clear(timeout=30)
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/ceph_test_case.py", line 172, in wait_for_health_clear
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner: self.wait_until_true(is_clear, timeout)
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/ceph_test_case.py", line 209, in wait_until_true
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner: raise TestTimeoutError("Timed out after {0}s and {1} retries".format(elapsed, retry_count))
2021-07-21T09:59:52.564 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 30s and 0 retries
In this change, the crash reports are nuked right after
we see the warning, so that we can have a clean health
report.
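For illustration, one way to clear the crash reports from a script (the qa
test uses its own helpers; these are the standard `ceph crash` commands):

    import json
    import subprocess

    def nuke_crash_reports() -> None:
        """Remove all known crash reports so RECENT_MGR_MODULE_CRASH clears."""
        out = subprocess.run(["ceph", "crash", "ls", "--format", "json"],
                             check=True, capture_output=True, text=True).stdout
        for crash in json.loads(out or "[]"):
            subprocess.run(["ceph", "crash", "rm", crash["crash_id"]], check=True)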
Fixes: https://tracker.ceph.com/issues/51743
Signed-off-by: Kefu Chai <kchai@redhat.com>
Changes some of the teuthology tests to make
them more deterministic.
Uses:
`ceph osd set norecover` and
`ceph osd set nobackfill` when marking OSDs in
or out. This delays recovery and makes
sure the test cases get the chance to check
that events actually pop up in
the progress module.
Took out test_osd_cannot_recover from
tasks/mgr/test_progress.py since it is no longer
a relevant test case: recovery gets
triggered regardless of whether the pg has moved.
Ignoring `OSDMAP_FLAGS` in teuthology
because we are using norecover and nobackfill
to delay the recovery process, which
creates a health warning that would otherwise fail the
teuthology test.
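Roughly, the flag dance the tests rely on looks like this (illustrative sketch
only; the teuthology tests drive the cluster through their own helpers):

    import subprocess

    def ceph(*args: str) -> None:
        subprocess.run(["ceph", *args], check=True)

    def mark_osd_out_without_recovery(osd_id: int) -> None:
        ceph("osd", "set", "norecover")
        ceph("osd", "set", "nobackfill")
        ceph("osd", "out", str(osd_id))    # a progress event should appear now
        # ... assert here that the progress module reported an event ...
        ceph("osd", "unset", "nobackfill")
        ceph("osd", "unset", "norecover")  # let recovery proceed normally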
Signed-off-by: Kamoltat <ksirivad@redhat.com>
When you add a host in maintenance mode and then exit maintenance
mode, a 500 server error pops up, which interrupts the whole
exit-maintenance process and leaves the host in an unknown/offline state.
This happened because the host status was being set through
HostSpec(). With this change, the orchestrator's enter_maintenance API
is used to enable maintenance.
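Conceptually (hedged sketch; `orch` and the exact method names stand in for
the orchestrator client used by the dashboard and are assumptions here):

    def enter_maintenance(orch, hostname: str) -> None:
        # Let the orchestrator own the transition instead of writing the
        # status onto the HostSpec directly.
        orch.enter_host_maintenance(hostname)

    def exit_maintenance(orch, hostname: str) -> None:
        orch.exit_host_maintenance(hostname)  # no manual HostSpec status edits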
Fixes: https://tracker.ceph.com/issues/51218
Signed-off-by: Nizamudeen A <nia@redhat.com>
"device_health_metrics" pool is gone -- .mgr pool is in.
I don't think the pool removal code in some test cases is necessary any
longer with recent changes to remove those warnings; so that code is
gone too.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The global recovery event progress calculation only
takes into account pgs with `reported_epoch < start_epoch_of_event`,
but sometimes a pg doesn't move before or after the creation
of the global recovery event. This can result in a bug
where the global event gets stuck forever, unless another
event specifically makes the stuck pgs move and update
their `reported_epoch`.
Therefore, we decided to disregard pgs that are in the active+clean state
but have `reported_epoch < start_epoch_of_event`.
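The filtering rule boils down to something like this (field names are
illustrative, not the module's exact data structures):

    def pg_counts_toward_event(pg: dict, start_epoch_of_event: int) -> bool:
        stale = pg["reported_epoch"] < start_epoch_of_event
        active_clean = pg["state"] == "active+clean"
        # A pg that is already active+clean but never bumped its reported_epoch
        # would otherwise pin the global recovery event below 100% forever.
        return not (stale and active_clean)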
Fixes: https://tracker.ceph.com/issues/49988
Signed-off-by: Kamoltat <ksirivad@redhat.com>
Adds the ability to create a host by specifying a network address and to
create labels.
https://tracker.ceph.com/issues/50318
Signed-off-by: Nizamudeen A <nia@redhat.com>
With mclock scheduler enabled, the recovery throughput is throttled based
on factors like the type of mclock profile enabled, the OSD capacity among
others. Due to this the recovery times may vary and therefore the existing
timeout of 120 secs may not be sufficient.
To address the above, a new method called _is_inprogress_or_complete() is
introduced in the TestProgress class that checks if the event with the
specified 'id' is in progress by checking the 'progress' key of the
progress command response. This method also handles the corner case where
the event completes just before it's called.
The existing wait_until_true() method in the CephTestCase class is
modified to accept another function argument called "check_fn". This is
set to the _is_inprogress_or_complete() function described earlier in the
"test_turn_off_module" test that has been observed to fail due to the
reasons already described above. A retry mechanism of a maximum of 5
attempts is introduced after the first timeout is hit. This means that
the wait can extend up to a maximum of 600 secs (120 secs * 5) as long as
there is recovery progress reported by the 'ceph progress' command result.
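A simplified sketch of the retry logic (the real helpers live in
qa/tasks/ceph_test_case.py and qa/tasks/mgr/test_progress.py; this is not the
exact implementation):

    import time

    def wait_until_true(condition, timeout, period=5, check_fn=None, max_retries=5):
        retries = 0
        elapsed = 0
        while not condition():
            if elapsed >= timeout:
                # Only keep waiting if check_fn says progress is still being made.
                if check_fn and check_fn() and retries < max_retries:
                    retries += 1
                    elapsed = 0
                    continue
                raise TimeoutError(
                    f"Timed out after {elapsed}s and {retries} retries")
            time.sleep(period)
            elapsed += period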
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Query the Python version before trying to test diskprediction_local.
Fixes: https://tracker.ceph.com/issues/50196
Signed-off-by: Kefu Chai <kchai@redhat.com>
Dashboard backend settings:
- Refactoring: now accepting more than 1 type of value.
- RGW_API_ACCESS_KEY & RGW_API_SECRET_KEY accept a string (backward compatibility: legacy behavior) as well as a dictionary of strings for connecting to multiple daemons (see the example below).
- Ease of use: mgr/dashboard/RGW_API_USER_ID is deprecated: it is not useful anymore (kept for backward compatibility).
UI/UX:
- Created context component (to be shown only on rgw-related routes) for selecting operating daemon.
- Daemon selector only shown if there is more than 1 daemon running on a local cluster (to reduce cognitive load).
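For example, both of these value shapes are accepted for RGW_API_ACCESS_KEY
(the daemon names below are made up):

    legacy_value = "ACCESS_KEY"              # single string, legacy behavior
    multi_daemon_value = {                   # dictionary of strings, one per daemon
        "rgw.store1.daemon1": "ACCESS_KEY_1",
        "rgw.store1.daemon2": "ACCESS_KEY_2",
    }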
Fixes: https://tracker.ceph.com/issues/47375
Signed-off-by: Alfonso Martínez <almartin@redhat.com>