Mirror of https://github.com/ceph/ceph (synced 2025-02-20 01:17:47 +00:00)
doc/rados: edit "troubleshooting-mon"
Edit the text in the "Initial Troubleshooting" section of
doc/rados/troubleshooting/troubleshooting-mon.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
parent 070b69046a
commit fa8129a746
@@ -17,59 +17,66 @@ consult the following troubleshooting information.
 Initial Troubleshooting
 =======================
 
-#. **Make sure that the monitors are running.**
+The first steps in the process of troubleshooting Ceph Monitors involve making
+sure that the Monitors are running and that they are able to communicate with
+the network and on the network. Follow the steps in this section to rule out
+the simplest causes of Monitor malfunction.
 
-   First, make sure that the monitor (*mon*) daemon processes (``ceph-mon``)
-   are running. Sometimes Ceph admins either forget to start the mons or
-   forget to restart the mons after an upgrade. Checking for this simple
-   oversight can save hours of painstaking troubleshooting. It is also
-   important to make sure that the manager daemons (``ceph-mgr``) are running.
-   Remember that typical cluster configurations provide one ``ceph-mgr`` for
-   each ``ceph-mon``.
+#. **Make sure that the Monitors are running.**
 
-   .. note:: Rook will not run more than two managers.
+   Make sure that the Monitor (*mon*) daemon processes (``ceph-mon``) are
+   running. It might be the case that the mons have not been restarted after an
+   upgrade. Checking for this simple oversight can save hours of painstaking
+   troubleshooting.
+
+   It is also important to make sure that the manager daemons (``ceph-mgr``)
+   are running. Remember that typical cluster configurations provide one
+   Manager (``ceph-mgr``) for each Monitor (``ceph-mon``).
 
-#. **Make sure that you can reach the monitor nodes.**
+   .. note:: In releases prior to v1.12.5, Rook will not run more than two
+      managers.
 
-   In certain rare cases, there may be ``iptables`` rules that block access to
-   monitor nodes or TCP ports. These rules might be left over from earlier
+#. **Make sure that you can reach the Monitor nodes.**
+
+   In certain rare cases, ``iptables`` rules might be blocking access to
+   Monitor nodes or TCP ports. These rules might be left over from earlier
    stress testing or rule development. To check for the presence of such
-   rules, SSH into the server and then try to connect to the monitor's ports
-   (``tcp/3300`` and ``tcp/6789``) using ``telnet``, ``nc``, or a similar
-   tool.
+   rules, SSH into each Monitor node and use ``telnet`` or ``nc`` or a similar
+   tool to attempt to connect to each of the other Monitor nodes on ports
+   ``tcp/3300`` and ``tcp/6789``.
 
-#. **Make sure that the ``ceph status`` command runs and receives a reply from the cluster.**
-
-   If the ``ceph status`` command does receive a reply from the cluster, then
-   the cluster is up and running. The monitors will answer to a ``status``
-   request only if there is a formed quorum. Confirm that one or more ``mgr``
-   daemons are reported as running. Under ideal conditions, all ``mgr``
-   daemons will be reported as running.
+#. **Make sure that the "ceph status" command runs and receives a reply from the cluster.**
+
+   If the ``ceph status`` command receives a reply from the cluster, then the
+   cluster is up and running. Monitors answer to a ``status`` request only if
+   there is a formed quorum. Confirm that one or more ``mgr`` daemons are
+   reported as running. In a cluster with no deficiencies, ``ceph status``
+   will report that all ``mgr`` daemons are running.
 
    If the ``ceph status`` command does not receive a reply from the cluster,
-   then there are probably not enough monitors ``up`` to form a quorum. The
-   ``ceph -s`` command with no further options specified connects to an
-   arbitrarily selected monitor. In certain cases, however, it might be
-   helpful to connect to a specific monitor (or to several specific monitors
+   then there are probably not enough Monitors ``up`` to form a quorum. If the
+   ``ceph -s`` command is run with no further options specified, it connects
+   to an arbitrarily selected Monitor. In certain cases, however, it might be
+   helpful to connect to a specific Monitor (or to several specific Monitors
    in sequence) by adding the ``-m`` flag to the command: for example, ``ceph
    status -m mymon1``.
 
 #. **None of this worked. What now?**
 
    If the above solutions have not resolved your problems, you might find it
-   helpful to examine each individual monitor in turn. Whether or not a quorum
-   has been formed, it is possible to contact each monitor individually and
+   helpful to examine each individual Monitor in turn. Even if no quorum has
+   been formed, it is possible to contact each Monitor individually and
    request its status by using the ``ceph tell mon.ID mon_status`` command
-   (here ``ID`` is the monitor's identifier).
+   (here ``ID`` is the Monitor's identifier).
 
-   Run the ``ceph tell mon.ID mon_status`` command for each monitor in the
+   Run the ``ceph tell mon.ID mon_status`` command for each Monitor in the
    cluster. For more on this command's output, see :ref:`Understanding
    mon_status
    <rados_troubleshoting_troubleshooting_mon_understanding_mon_status>`.
 
-   There is also an alternative method: SSH into each monitor node and query
-   the daemon's admin socket. See :ref:`Using the Monitor's Admin
+   There is also an alternative method for contacting each individual Monitor:
+   SSH into each Monitor node and query the daemon's admin socket. See
+   :ref:`Using the Monitor's Admin
    Socket<rados_troubleshoting_troubleshooting_mon_using_admin_socket>`.
 
 .. _rados_troubleshoting_troubleshooting_mon_using_admin_socket:
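
The reachability step in the revised text (connect to each of the other Monitor nodes on ``tcp/3300`` and ``tcp/6789`` with ``telnet`` or ``nc``) can also be scripted. The following is a minimal sketch and not part of this commit or of Ceph itself: the function names ``port_is_reachable`` and ``check_monitors`` are hypothetical, and the sketch only tests whether a TCP connection can be opened. It says nothing about whether the ``ceph-mon`` daemon behind the port is healthy.

```python
import socket

# The two Monitor ports named in the troubleshooting text.
MON_PORTS = (3300, 6789)


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: a refused connection, a timeout, or a name
    resolution failure all count as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_monitors(hosts, ports=MON_PORTS, timeout=3.0):
    """Map each (host, port) pair to True (reachable) or False."""
    return {
        (host, port): port_is_reachable(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

Run from each Monitor node in turn, a call such as ``check_monitors(["mymon1"])`` (with the cluster's own Monitor hostnames substituted) returns a dictionary keyed by ``(host, port)``. A ``False`` entry suggests that a firewall rule or a stopped daemon is worth investigating on that node.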