mirror of
https://github.com/ceph/ceph
synced 2025-03-25 11:48:05 +00:00
Merge pull request #42327 from zdover23/wip-doc-cephadm-troubleshooting-1-of-x-2021-07-15

doc/cephadm: rewrite troubleshooting 1 of x

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
commit a0bbdb7725
@@ -1,46 +1,70 @@
 Troubleshooting
 ===============

-Sometimes there is a need to investigate why a cephadm command failed or why
-a specific service no longer runs properly.
+You might need to investigate why a cephadm command failed
+or why a certain service no longer runs properly.

-As cephadm deploys daemons as containers, troubleshooting daemons is slightly
-different. Here are a few tools and commands to help investigating issues.
+Cephadm deploys daemons as containers. This means that
+troubleshooting those containerized daemons might work
+differently than you expect (and that is certainly true if
+you expect this troubleshooting to work the way that
+troubleshooting does when the daemons involved aren't
+containerized).
+
+Here are some tools and commands to help you troubleshoot
+your Ceph environment.

 .. _cephadm-pause:

 Pausing or disabling cephadm
 ----------------------------

-If something goes wrong and cephadm is doing behaving in a way you do
-not like, you can pause most background activity with::
+If something goes wrong and cephadm is behaving badly, you can
+pause most of the Ceph cluster's background activity by running
+the following command:

+.. prompt:: bash #
+
   ceph orch pause

-This will stop any changes, but cephadm will still periodically check hosts to
-refresh its inventory of daemons and devices. You can disable cephadm
-completely with::
+This stops all changes in the Ceph cluster, but cephadm will
+still periodically check hosts to refresh its inventory of
+daemons and devices. You can disable cephadm completely by
+running the following commands:

+.. prompt:: bash #
+
   ceph orch set backend ''
   ceph mgr module disable cephadm

-This will disable all of the ``ceph orch ...`` CLI commands but the previously
-deployed daemon containers will still continue to exist and start as they
-did before.
+These commands disable all of the ``ceph orch ...`` CLI commands.
+All previously deployed daemon containers continue to exist and
+will start as they did before you ran these commands.

-Please refer to :ref:`cephadm-spec-unmanaged` for disabling individual
-services.
+See :ref:`cephadm-spec-unmanaged` for information on disabling
+individual services.


 Per-service and per-daemon events
 ---------------------------------

-In order to aid debugging failed daemon deployments, cephadm stores
-events per service and per daemon. They often contain relevant information::
+In order to help with the process of debugging failed daemon
+deployments, cephadm stores events per service and per daemon.
+These events often contain information relevant to
+troubleshooting
+your Ceph cluster.
+
+Listing service events
+~~~~~~~~~~~~~~~~~~~~~~
+
+To see the events associated with a certain service, run a
+command of the following form:

+.. prompt:: bash #
+
   ceph orch ls --service_name=<service-name> --format yaml

-for example:
+This will return something in the following form:

 .. code-block:: yaml

@@ -58,10 +82,18 @@ for example:
   - '2021-02-01T12:09:25.264584 service:alertmanager [ERROR] "Failed to apply: Cannot
     place <AlertManagerSpec for service_name=alertmanager> on unknown_host: Unknown hosts"'

-Or per daemon::
+Listing daemon events
+~~~~~~~~~~~~~~~~~~~~~
+
+To see the events associated with a certain daemon, run a
+command of the following form:

+.. prompt:: bash #
+
   ceph orch ps --service-name <service-name> --daemon-id <daemon-id> --format yaml

+This will return something in the following form:
+
+.. code-block:: yaml
+
   daemon_type: mds
@@ -190,7 +222,8 @@ Things users can do:
   [root@mon1 ~]# ssh -F config -i ~/cephadm_private_key root@mon1

 Verifying that the Public Key is Listed in the authorized_keys file
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 To verify that the public key is in the authorized_keys file, run the following commands::

   [root@mon1 ~]# cephadm shell -- ceph cephadm get-pub-key > ~/ceph.pub

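
Reviewer note on the event strings: the sample service event in the diff above packs a timestamp, a subject, a level and a quoted message into one string. When many events have to be filtered (for example, to keep only ``[ERROR]`` entries), it may help to split them into fields. The following is a minimal Python sketch, not part of cephadm. The layout ``<timestamp> <kind>:<name> [<LEVEL>] "<message>"`` is an assumption inferred only from that single sample, and ``CephadmEvent`` and ``parse_event`` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional


class CephadmEvent(NamedTuple):
    """One parsed entry from an ``events`` list (hypothetical helper type)."""
    timestamp: str
    subject: str   # e.g. "service:alertmanager"
    level: str     # e.g. "ERROR"
    message: str


# Assumed layout, inferred only from the sample event in the diff:
#   <timestamp> <kind>:<name> [<LEVEL>] "<message>"
_EVENT_RE = re.compile(
    r'^(?P<timestamp>\S+)\s+(?P<subject>\S+)\s+'
    r'\[(?P<level>[A-Z]+)\]\s+"(?P<message>.*)"$'
)


def parse_event(raw: str) -> Optional[CephadmEvent]:
    """Return the parsed event, or None if the string does not fit the assumed layout."""
    # The YAML output wraps long strings across lines, so collapse
    # every run of whitespace (including newlines) into a single space first.
    normalized = " ".join(raw.split())
    match = _EVENT_RE.match(normalized)
    if match is None:
        return None
    return CephadmEvent(**match.groupdict())


if __name__ == "__main__":
    sample = (
        '2021-02-01T12:09:25.264584 service:alertmanager [ERROR] "Failed to apply: Cannot\n'
        '    place <AlertManagerSpec for service_name=alertmanager> on unknown_host: Unknown hosts"'
    )
    print(parse_event(sample))
```

Strings that do not fit the assumed layout yield ``None`` rather than raising, so a caller can skip entries written in a different form.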