mirror of https://github.com/ceph/ceph
31 lines
1.1 KiB
ReStructuredText
31 lines
1.1 KiB
ReStructuredText
|
==================================
|
||
|
Recovering from Monitor Failures
|
||
|
==================================
|
||
|
|
||
|
In production clusters, we recommend running the cluster with a minimum
|
||
|
of three monitors. The failure of a single monitor should not take down
|
||
|
the entire monitor cluster, provided a majority of the monitors remain
|
||
|
available. If the majority of nodes are available, the remaining nodes
|
||
|
will be able to form a quorum.
|
||
|
|
||
|
When you check your cluster's health, you may notice that a monitor
|
||
|
has failed. For example::
|
||
|
|
||
|
ceph health
|
||
|
HEALTH_WARN 1 mons down, quorum 0,2
|
||
|
|
||
|
For additional detail, you may check the cluster status::
|
||
|
|
||
|
ceph status
|
||
|
HEALTH_WARN 1 mons down, quorum 0,2
|
||
|
mon.b (rank 1) addr 192.168.106.220:6790/0 is down (out of quorum)
|
||
|
|
||
|
In most cases, you can simply restart the affected node.
|
||
|
For example::
|
||
|
|
||
|
service ceph -a restart {failed-mon}
|
||
|
|
||
|
If there are not enough monitors to form a quorum, the ``ceph``
|
||
|
command will block trying to reach the cluster. In this situation,
|
||
|
you need to get enough ``ceph-mon`` daemons running to form a quorum
|
||
|
before doing anything else with the cluster.
|