Merge pull request #42327 from zdover23/wip-doc-cephadm-troubleshooting-1-of-x-2021-07-15

doc/cephadm: rewrite troubleshooting 1 of x

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
zdover23 2021-07-15 01:35:14 +10:00 committed by GitHub
commit a0bbdb7725


@ -1,46 +1,70 @@
Troubleshooting
===============
Sometimes there is a need to investigate why a cephadm command failed or why
a specific service no longer runs properly.
You might need to investigate why a cephadm command failed
or why a certain service no longer runs properly.
As cephadm deploys daemons as containers, troubleshooting daemons is slightly
different. Here are a few tools and commands to help investigating issues.
Cephadm deploys daemons as containers. This means that
troubleshooting those containerized daemons might work
differently than you expect (and that is certainly true if
you expect this troubleshooting to work the way that
troubleshooting does when the daemons involved aren't
containerized).
Here are some tools and commands to help you troubleshoot
your Ceph environment.
.. _cephadm-pause:
Pausing or disabling cephadm
----------------------------
If something goes wrong and cephadm is doing behaving in a way you do
not like, you can pause most background activity with::
If something goes wrong and cephadm is behaving badly, you can
pause most of the Ceph cluster's background activity by running
the following command:
.. prompt:: bash #
ceph orch pause
This will stop any changes, but cephadm will still periodically check hosts to
refresh its inventory of daemons and devices. You can disable cephadm
completely with::
This stops all changes in the Ceph cluster, but cephadm will
still periodically check hosts to refresh its inventory of
daemons and devices. You can disable cephadm completely by
running the following commands:
.. prompt:: bash #
ceph orch set backend ''
ceph mgr module disable cephadm
This will disable all of the ``ceph orch ...`` CLI commands but the previously
deployed daemon containers will still continue to exist and start as they
did before.
These commands disable all of the ``ceph orch ...`` CLI commands.
All previously deployed daemon containers continue to exist and
will start as they did before you ran these commands.
Please refer to :ref:`cephadm-spec-unmanaged` for disabling individual
services.
See :ref:`cephadm-spec-unmanaged` for information on disabling
individual services.
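The pause and disable steps above can be reversed. The following is a minimal sketch, not part of this change: ``cephadm_restore`` and ``DRY_RUN`` are hypothetical names introduced here for illustration, and the sketch assumes that ``ceph orch resume``, ``ceph mgr module enable cephadm``, and ``ceph orch set backend cephadm`` are the appropriate inverse commands on your release. By default it only prints the commands so that they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: cephadm_restore is a hypothetical helper, not a Ceph command.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
cephadm_restore() {
    case "$1" in
        # Undo "ceph orch pause".
        resume) set -- "ceph orch resume" ;;
        # Undo the full disable: re-enable the module, then select the backend.
        enable) set -- "ceph mgr module enable cephadm" \
                       "ceph orch set backend cephadm" ;;
        *) echo "usage: cephadm_restore resume|enable" >&2; return 2 ;;
    esac
    for cmd in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            # Unquoted on purpose so the string splits into command and arguments.
            $cmd || return 1
        fi
    done
}
```

Run ``cephadm_restore resume`` to preview the command, and ``DRY_RUN=0 cephadm_restore resume`` to execute it.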
Per-service and per-daemon events
---------------------------------
In order to aid debugging failed daemon deployments, cephadm stores
events per service and per daemon. They often contain relevant information::
In order to help with the process of debugging failed daemon
deployments, cephadm stores events per service and per daemon.
These events often contain information relevant to
troubleshooting your Ceph cluster.
Listing service events
~~~~~~~~~~~~~~~~~~~~~~
To see the events associated with a certain service, run a
command of the following form:
.. prompt:: bash #
ceph orch ls --service_name=<service-name> --format yaml
for example:
This will return something in the following form:
.. code-block:: yaml
@ -58,10 +82,18 @@ for example:
- '2021-02-01T12:09:25.264584 service:alertmanager [ERROR] "Failed to apply: Cannot
place <AlertManagerSpec for service_name=alertmanager> on unknown_host: Unknown hosts"'
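When a service has many events, it can help to show only the failures. The following is a minimal sketch under stated assumptions: ``filter_error_events`` is a hypothetical helper name, and it assumes that each event entry carries a ``[ERROR]`` marker on its first line, as in the sample output above. A long event that wraps onto a second YAML line is only partly shown, because the filter works line by line.

```shell
#!/bin/sh
# Sketch only: filter_error_events is a hypothetical helper, not a Ceph command.
# Reads YAML on stdin and prints the event lines that contain "[ERROR]",
# with the leading indentation and list dash removed.
filter_error_events() {
    grep -F '[ERROR]' | sed -e 's/^[[:space:]]*-[[:space:]]*//'
}
```

For example, ``ceph orch ls --service_name=alertmanager --format yaml | filter_error_events`` would print only the error events for the ``alertmanager`` service.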
Or per daemon::
Listing daemon events
~~~~~~~~~~~~~~~~~~~~~
To see the events associated with a certain daemon, run a
command of the following form:
.. prompt:: bash #
ceph orch ps --service-name <service-name> --daemon-id <daemon-id> --format yaml
This will return something in the following form:
.. code-block:: yaml
daemon_type: mds
@ -190,7 +222,8 @@ Things users can do:
[root@mon1 ~]# ssh -F config -i ~/cephadm_private_key root@mon1
Verifying that the Public Key is Listed in the authorized_keys file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To verify that the public key is in the authorized_keys file, run the following commands::
[root@mon1 ~]# cephadm shell -- ceph cephadm get-pub-key > ~/ceph.pub