doc/rados: edit t-mon "common issues" (2 of x)

Edit the second part of the section "Most Common Monitor Issues" in
doc/rados/troubleshooting/troubleshooting-mon.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Zac Dover 2023-11-08 23:24:06 +10:00
parent db6fbc9653
commit 7dcfa9132c

--- a/doc/rados/troubleshooting/troubleshooting-mon.rst
+++ b/doc/rados/troubleshooting/troubleshooting-mon.rst
@@ -213,31 +213,38 @@ you should be seeing something similar to::
 troubleshooting the monitor, so check you ``ceph status`` again just to make
 sure. Proceed if the monitor is not yet in the quorum.

-**What if the state is ``probing``?**
+**What does it mean if a Monitor's state is ``probing``?**

-This means the monitor is still looking for the other monitors. Every time
-you start a monitor, the monitor will stay in this state for some time while
-trying to connect the rest of the monitors specified in the ``monmap``. The
-time a monitor will spend in this state can vary. For instance, when on a
-single-monitor cluster (never do this in production), the monitor will pass
-through the probing state almost instantaneously. In a multi-monitor
-cluster, the monitors will stay in this state until they find enough monitors
-to form a quorum |---| this means that if you have 2 out of 3 monitors down, the
-one remaining monitor will stay in this state indefinitely until you bring
-one of the other monitors up.
+If ``ceph health detail`` shows that a Monitor's state is
+``probing``, then the Monitor is still looking for the other Monitors. Every
+Monitor remains in this state for some time when it is started. When a
+Monitor has connected to the other Monitors specified in the ``monmap``, it
+ceases to be in the ``probing`` state. The amount of time that a Monitor is
+in the ``probing`` state depends upon the parameters of the cluster of which
+it is a part. For example, when a Monitor is a part of a single-monitor
+cluster (never do this in production), the monitor passes through the probing
+state almost instantaneously. In a multi-monitor cluster, the Monitors stay
+in the ``probing`` state until they find enough monitors to form a quorum
+|---| this means that if two out of three Monitors in the cluster are
+``down``, the one remaining Monitor stays in the ``probing`` state
+indefinitely until you bring one of the other monitors up.

-If you have a quorum the starting daemon should be able to find the
-other monitors quickly, as long as they can be reached. If your
-monitor is stuck probing and you have gone through with all the communication
-troubleshooting, then there is a fair chance that the monitor is trying
-to reach the other monitors on a wrong address. ``mon_status`` outputs the
-``monmap`` known to the monitor: check if the other monitor's locations
-match reality. If they don't, jump to
-`Recovering a Monitor's Broken monmap`_; if they do, then it may be related
-to severe clock skews amongst the monitor nodes and you should refer to
-`Clock Skews`_ first, but if that doesn't solve your problem then it is
-the time to prepare some logs and reach out to the community (please refer
-to `Preparing your logs`_ on how to best prepare your logs).
+If quorum has been established, then the Monitor daemon should be able to
+find the other Monitors quickly, as long as they can be reached. If a Monitor
+is stuck in the ``probing`` state and you have exhausted the procedures above
+that describe the troubleshooting of communications between the Monitors,
+then it is possible that the problem Monitor is trying to reach the other
+Monitors at a wrong address. ``mon_status`` outputs the ``monmap`` that is
+known to the monitor: determine whether the other Monitors' locations as
+specified in the ``monmap`` match the locations of the Monitors in the
+network. If they do not, see `Recovering a Monitor's Broken monmap`_.
+If the locations of the Monitors as specified in the ``monmap`` match the
+locations of the Monitors in the network, then the persistent
+``probing`` state could be related to severe clock skews amongst the monitor
+nodes. See `Clock Skews`_. If the information in `Clock Skews`_ does not
+bring the Monitor out of the ``probing`` state, then prepare your system logs
+and ask the Ceph community for help. See `Preparing your logs`_ for
+information about the proper preparation of logs.

 **What if state is ``electing``?**
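
The two checks described in the edited text (whether enough Monitors are up to form a quorum, and whether the addresses in a Monitor's ``monmap`` match the real network) can be sketched in Python. This is a rough illustration and not part of the commit. It assumes that quorum means a strict majority of the Monitors in the ``monmap``, and that the ``mon_status`` JSON carries a ``monmap.mons`` list whose entries have ``name`` and ``addr`` fields (with ``addr`` shaped like ``ip:port/nonce``). The helper names and the ``expected`` mapping are hypothetical.

```python
import json


def quorum_size(total_mons: int) -> int:
    """Smallest strict majority of the Monitors listed in the monmap (assumed rule)."""
    return total_mons // 2 + 1


def can_form_quorum(total_mons: int, up_mons: int) -> bool:
    """True if the Monitors that are up are enough to leave the probing state."""
    return up_mons >= quorum_size(total_mons)


def mismatched_mons(mon_status_json: str, expected: dict) -> dict:
    """Compare monmap addresses against the addresses you expect.

    `expected` is a hypothetical mapping of Monitor name -> "ip:port".
    Returns {name: (monmap_addr, expected_addr)} for every mismatch.
    The field names used here are assumptions about the mon_status output.
    """
    status = json.loads(mon_status_json)
    mismatches = {}
    for mon in status["monmap"]["mons"]:
        name = mon["name"]
        addr = mon["addr"].split("/")[0]  # drop the trailing "/nonce" part
        if name in expected and addr != expected[name]:
            mismatches[name] = (addr, expected[name])
    return mismatches
```

With three Monitors in the ``monmap`` and only one up, ``can_form_quorum(3, 1)`` is false, which corresponds to the case in which the remaining Monitor stays in ``probing`` indefinitely. A non-empty result from ``mismatched_mons`` would point toward the broken-monmap procedure rather than toward clock skew.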