mirror of
https://github.com/ceph/ceph
synced 2025-01-25 12:34:46 +00:00
ade052e08a
Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>
385 lines
14 KiB
ReStructuredText
385 lines
14 KiB
ReStructuredText
.. _adding-and-removing-monitors:
|
|
|
|
==========================
|
|
Adding/Removing Monitors
|
|
==========================
|
|
|
|
When you have a cluster up and running, you may add or remove monitors
|
|
from the cluster at runtime. To bootstrap a monitor, see `Manual Deployment`_
|
|
or `Monitor Bootstrap`_.
|
|
|
|
.. _adding-monitors:
|
|
|
|
Adding Monitors
|
|
===============
|
|
|
|
Ceph monitors are lightweight processes that are the single source of truth
|
|
for the cluster map. You can run a cluster with 1 monitor but we recommend at least 3
|
|
for a production cluster. Ceph monitors use a variation of the
|
|
`Paxos`_ algorithm to establish consensus about maps and other critical
|
|
information across the cluster. Due to the nature of Paxos, Ceph requires
|
|
a majority of monitors to be active to establish a quorum (thus establishing
|
|
consensus).
|
|
|
|
It is advisable to run an odd number of monitors. An
|
|
odd number of monitors is more resilient than an
|
|
even number. For instance, with a two monitor deployment, no
|
|
failures can be tolerated and still maintain a quorum; with three monitors,
|
|
one failure can be tolerated; in a four monitor deployment, one failure can
|
|
be tolerated; with five monitors, two failures can be tolerated. This avoids
|
|
the dreaded *split brain* phenomenon, and is why an odd number is best.
|
|
In short, Ceph needs a majority of
|
|
monitors to be active (and able to communicate with each other), but that
|
|
majority can be achieved using a single monitor, or 2 out of 2 monitors,
|
|
2 out of 3, 3 out of 4, etc.
|
|
|
|
For small or non-critical deployments of multi-node Ceph clusters, it is
|
|
advisable to deploy three monitors, and to increase the number of monitors
|
|
to five for larger clusters or to survive a double failure. There is rarely
|
|
justification for seven or more.
|
|
|
|
Since monitors are lightweight, it is possible to run them on the same
|
|
host as OSDs; however, we recommend running them on separate hosts,
|
|
because `fsync` issues with the kernel may impair performance.
|
|
Dedicated monitor nodes also minimize disruption since monitor and OSD
|
|
daemons are not inactive at the same time when a node crashes or is
|
|
taken down for maintenance.
|
|
|
|
Dedicated
|
|
monitor nodes also make for cleaner maintenance by avoiding both OSDs and
|
|
a mon going down if a node is rebooted, taken down, or crashes.
|
|
|
|
.. note:: A *majority* of monitors in your cluster must be able to
|
|
reach each other in order to establish a quorum.
|
|
|
|
Deploy your Hardware
|
|
--------------------
|
|
|
|
If you are adding a new host when adding a new monitor, see `Hardware
|
|
Recommendations`_ for details on minimum recommendations for monitor hardware.
|
|
To add a monitor host to your cluster, first make sure you have an up-to-date
|
|
version of Linux installed (typically Ubuntu 16.04 or RHEL 7).
|
|
|
|
Add your monitor host to a rack in your cluster, connect it to the network
|
|
and ensure that it has network connectivity.
|
|
|
|
.. _Hardware Recommendations: ../../../start/hardware-recommendations
|
|
|
|
Install the Required Software
|
|
-----------------------------
|
|
|
|
For manually deployed clusters, you must install Ceph packages
|
|
manually. See `Installing Packages`_ for details.
|
|
You should configure SSH to a user with password-less authentication
|
|
and root permissions.
|
|
|
|
.. _Installing Packages: ../../../install/install-storage-cluster
|
|
|
|
|
|
.. _Adding a Monitor (Manual):
|
|
|
|
Adding a Monitor (Manual)
|
|
-------------------------
|
|
|
|
This procedure creates a ``ceph-mon`` data directory, retrieves the monitor map
|
|
and monitor keyring, and adds a ``ceph-mon`` daemon to your cluster. If
|
|
this results in only two monitor daemons, you may add more monitors by
|
|
repeating this procedure until you have a sufficient number of ``ceph-mon``
|
|
daemons to achieve a quorum.
|
|
|
|
At this point you should define your monitor's id. Traditionally, monitors
|
|
have been named with single letters (``a``, ``b``, ``c``, ...), but you are
|
|
free to define the id as you see fit. For the purpose of this document,
|
|
please take into account that ``{mon-id}`` should be the id you chose,
|
|
without the ``mon.`` prefix (i.e., ``{mon-id}`` should be the ``a``
|
|
on ``mon.a``).
|
|
|
|
#. Create the default directory on the machine that will host your
|
|
new monitor. ::
|
|
|
|
ssh {new-mon-host}
|
|
sudo mkdir /var/lib/ceph/mon/ceph-{mon-id}
|
|
|
|
#. Create a temporary directory ``{tmp}`` to keep the files needed during
|
|
this process. This directory should be different from the monitor's default
|
|
directory created in the previous step, and can be removed after all the
|
|
steps are executed. ::
|
|
|
|
mkdir {tmp}
|
|
|
|
#. Retrieve the keyring for your monitors, where ``{tmp}`` is the path to
|
|
the retrieved keyring, and ``{key-filename}`` is the name of the file
|
|
containing the retrieved monitor key. ::
|
|
|
|
ceph auth get mon. -o {tmp}/{key-filename}
|
|
|
|
#. Retrieve the monitor map, where ``{tmp}`` is the path to
|
|
the retrieved monitor map, and ``{map-filename}`` is the name of the file
|
|
containing the retrieved monitor map. ::
|
|
|
|
ceph mon getmap -o {tmp}/{map-filename}
|
|
|
|
#. Prepare the monitor's data directory created in the first step. You must
|
|
specify the path to the monitor map so that you can retrieve the
|
|
information about a quorum of monitors and their ``fsid``. You must also
|
|
specify a path to the monitor keyring::
|
|
|
|
sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename} --keyring {tmp}/{key-filename}
|
|
|
|
|
|
#. Start the new monitor and it will automatically join the cluster.
|
|
The daemon needs to know which address to bind to, via either the
|
|
``--public-addr {ip}`` or ``--public-network {network}`` argument.
|
|
For example::
|
|
|
|
ceph-mon -i {mon-id} --public-addr {ip:port}
|
|
|
|
.. _removing-monitors:
|
|
|
|
Removing Monitors
|
|
=================
|
|
|
|
When you remove monitors from a cluster, consider that Ceph monitors use
|
|
Paxos to establish consensus about the master cluster map. You must have
|
|
a sufficient number of monitors to establish a quorum for consensus about
|
|
the cluster map.
|
|
|
|
.. _Removing a Monitor (Manual):
|
|
|
|
Removing a Monitor (Manual)
|
|
---------------------------
|
|
|
|
This procedure removes a ``ceph-mon`` daemon from your cluster. If this
|
|
procedure results in only two monitor daemons, you may add or remove another
|
|
monitor until you have a number of ``ceph-mon`` daemons that can achieve a
|
|
quorum.
|
|
|
|
#. Stop the monitor. ::
|
|
|
|
service ceph -a stop mon.{mon-id}
|
|
|
|
#. Remove the monitor from the cluster. ::
|
|
|
|
ceph mon remove {mon-id}
|
|
|
|
#. Remove the monitor entry from ``ceph.conf``.
|
|
|
|
|
|
Removing Monitors from an Unhealthy Cluster
|
|
-------------------------------------------
|
|
|
|
This procedure removes a ``ceph-mon`` daemon from an unhealthy
|
|
cluster, for example a cluster where the monitors cannot form a
|
|
quorum.
|
|
|
|
|
|
#. Stop all ``ceph-mon`` daemons on all monitor hosts. ::
|
|
|
|
ssh {mon-host}
|
|
service ceph stop mon || stop ceph-mon-all
|
|
# and repeat for all mons
|
|
|
|
#. Identify a surviving monitor and log in to that host. ::
|
|
|
|
ssh {mon-host}
|
|
|
|
#. Extract a copy of the monmap file. ::
|
|
|
|
ceph-mon -i {mon-id} --extract-monmap {map-path}
|
|
# in most cases, that's
|
|
ceph-mon -i `hostname` --extract-monmap /tmp/monmap
|
|
|
|
#. Remove the non-surviving or problematic monitors. For example, if
|
|
you have three monitors, ``mon.a``, ``mon.b``, and ``mon.c``, where
|
|
only ``mon.a`` will survive, follow the example below::
|
|
|
|
monmaptool {map-path} --rm {mon-id}
|
|
# for example,
|
|
monmaptool /tmp/monmap --rm b
|
|
monmaptool /tmp/monmap --rm c
|
|
|
|
#. Inject the surviving map with the removed monitors into the
|
|
surviving monitor(s). For example, to inject a map into monitor
|
|
``mon.a``, follow the example below::
|
|
|
|
ceph-mon -i {mon-id} --inject-monmap {map-path}
|
|
# for example,
|
|
ceph-mon -i a --inject-monmap /tmp/monmap
|
|
|
|
#. Start only the surviving monitors.
|
|
|
|
#. Verify the monitors form a quorum (``ceph -s``).
|
|
|
|
#. You may wish to archive the removed monitors' data directory in
|
|
``/var/lib/ceph/mon`` in a safe location, or delete it if you are
|
|
confident the remaining monitors are healthy and are sufficiently
|
|
redundant.
|
|
|
|
.. _Changing a Monitor's IP address:
|
|
|
|
Changing a Monitor's IP Address
|
|
===============================
|
|
|
|
.. important:: Existing monitors are not supposed to change their IP addresses.
|
|
|
|
Monitors are critical components of a Ceph cluster, and they need to maintain a
|
|
quorum for the whole system to work properly. To establish a quorum, the
|
|
monitors need to discover each other. Ceph has strict requirements for
|
|
discovering monitors.
|
|
|
|
Ceph clients and other Ceph daemons use ``ceph.conf`` to discover monitors.
|
|
However, monitors discover each other using the monitor map, not ``ceph.conf``.
|
|
For example, if you refer to `Adding a Monitor (Manual)`_ you will see that you
|
|
need to obtain the current monmap for the cluster when creating a new monitor,
|
|
as it is one of the required arguments of ``ceph-mon -i {mon-id} --mkfs``. The
|
|
following sections explain the consistency requirements for Ceph monitors, and a
|
|
few safe ways to change a monitor's IP address.
|
|
|
|
|
|
Consistency Requirements
|
|
------------------------
|
|
|
|
A monitor always refers to the local copy of the monmap when discovering other
|
|
monitors in the cluster. Using the monmap instead of ``ceph.conf`` avoids
|
|
errors that could break the cluster (e.g., typos in ``ceph.conf`` when
|
|
specifying a monitor address or port). Since monitors use monmaps for discovery
|
|
and they share monmaps with clients and other Ceph daemons, the monmap provides
|
|
monitors with a strict guarantee that their consensus is valid.
|
|
|
|
Strict consistency also applies to updates to the monmap. As with any other
|
|
updates on the monitor, changes to the monmap always run through a distributed
|
|
consensus algorithm called `Paxos`_. The monitors must agree on each update to
|
|
the monmap, such as adding or removing a monitor, to ensure that each monitor in
|
|
the quorum has the same version of the monmap. Updates to the monmap are
|
|
incremental so that monitors have the latest agreed upon version, and a set of
|
|
previous versions, allowing a monitor that has an older version of the monmap to
|
|
catch up with the current state of the cluster.
|
|
|
|
If monitors discovered each other through the Ceph configuration file instead of
|
|
through the monmap, it would introduce additional risks because the Ceph
|
|
configuration files are not updated and distributed automatically. Monitors
|
|
might inadvertently use an older ``ceph.conf`` file, fail to recognize a
|
|
monitor, fall out of a quorum, or develop a situation where `Paxos`_ is not able
|
|
to determine the current state of the system accurately. Consequently, making
|
|
changes to an existing monitor's IP address must be done with great care.
|
|
|
|
|
|
Changing a Monitor's IP address (The Right Way)
|
|
-----------------------------------------------
|
|
|
|
Changing a monitor's IP address in ``ceph.conf`` only is not sufficient to
|
|
ensure that other monitors in the cluster will receive the update. To change a
|
|
monitor's IP address, you must add a new monitor with the IP address you want
|
|
to use (as described in `Adding a Monitor (Manual)`_), ensure that the new
|
|
monitor successfully joins the quorum; then, remove the monitor that uses the
|
|
old IP address. Then, update the ``ceph.conf`` file to ensure that clients and
|
|
other daemons know the IP address of the new monitor.
|
|
|
|
For example, lets assume there are three monitors in place, such as ::
|
|
|
|
[mon.a]
|
|
host = host01
|
|
addr = 10.0.0.1:6789
|
|
[mon.b]
|
|
host = host02
|
|
addr = 10.0.0.2:6789
|
|
[mon.c]
|
|
host = host03
|
|
addr = 10.0.0.3:6789
|
|
|
|
To change ``mon.c`` to ``host04`` with the IP address ``10.0.0.4``, follow the
|
|
steps in `Adding a Monitor (Manual)`_ by adding a new monitor ``mon.d``. Ensure
|
|
that ``mon.d`` is running before removing ``mon.c``, or it will break the
|
|
quorum. Remove ``mon.c`` as described on `Removing a Monitor (Manual)`_. Moving
|
|
all three monitors would thus require repeating this process as many times as
|
|
needed.
|
|
|
|
|
|
Changing a Monitor's IP address (The Messy Way)
|
|
-----------------------------------------------
|
|
|
|
There may come a time when the monitors must be moved to a different network, a
|
|
different part of the datacenter or a different datacenter altogether. While it
|
|
is possible to do it, the process becomes a bit more hazardous.
|
|
|
|
In such a case, the solution is to generate a new monmap with updated IP
|
|
addresses for all the monitors in the cluster, and inject the new map on each
|
|
individual monitor. This is not the most user-friendly approach, but we do not
|
|
expect this to be something that needs to be done every other week. As it is
|
|
clearly stated on the top of this section, monitors are not supposed to change
|
|
IP addresses.
|
|
|
|
Using the previous monitor configuration as an example, assume you want to move
|
|
all the monitors from the ``10.0.0.x`` range to ``10.1.0.x``, and these
|
|
networks are unable to communicate. Use the following procedure:
|
|
|
|
#. Retrieve the monitor map, where ``{tmp}`` is the path to
|
|
the retrieved monitor map, and ``{filename}`` is the name of the file
|
|
containing the retrieved monitor map. ::
|
|
|
|
ceph mon getmap -o {tmp}/{filename}
|
|
|
|
#. The following example demonstrates the contents of the monmap. ::
|
|
|
|
$ monmaptool --print {tmp}/{filename}
|
|
|
|
monmaptool: monmap file {tmp}/{filename}
|
|
epoch 1
|
|
fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
|
|
last_changed 2012-12-17 02:46:41.591248
|
|
created 2012-12-17 02:46:41.591248
|
|
0: 10.0.0.1:6789/0 mon.a
|
|
1: 10.0.0.2:6789/0 mon.b
|
|
2: 10.0.0.3:6789/0 mon.c
|
|
|
|
#. Remove the existing monitors. ::
|
|
|
|
$ monmaptool --rm a --rm b --rm c {tmp}/{filename}
|
|
|
|
monmaptool: monmap file {tmp}/{filename}
|
|
monmaptool: removing a
|
|
monmaptool: removing b
|
|
monmaptool: removing c
|
|
monmaptool: writing epoch 1 to {tmp}/{filename} (0 monitors)
|
|
|
|
#. Add the new monitor locations. ::
|
|
|
|
$ monmaptool --add a 10.1.0.1:6789 --add b 10.1.0.2:6789 --add c 10.1.0.3:6789 {tmp}/{filename}
|
|
|
|
monmaptool: monmap file {tmp}/{filename}
|
|
monmaptool: writing epoch 1 to {tmp}/{filename} (3 monitors)
|
|
|
|
#. Check new contents. ::
|
|
|
|
$ monmaptool --print {tmp}/{filename}
|
|
|
|
monmaptool: monmap file {tmp}/{filename}
|
|
epoch 1
|
|
fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
|
|
last_changed 2012-12-17 02:46:41.591248
|
|
created 2012-12-17 02:46:41.591248
|
|
0: 10.1.0.1:6789/0 mon.a
|
|
1: 10.1.0.2:6789/0 mon.b
|
|
2: 10.1.0.3:6789/0 mon.c
|
|
|
|
At this point, we assume the monitors (and stores) are installed at the new
|
|
location. The next step is to propagate the modified monmap to the new
|
|
monitors, and inject the modified monmap into each new monitor.
|
|
|
|
#. First, make sure to stop all your monitors. Injection must be done while
|
|
the daemon is not running.
|
|
|
|
#. Inject the monmap. ::
|
|
|
|
ceph-mon -i {mon-id} --inject-monmap {tmp}/{filename}
|
|
|
|
#. Restart the monitors.
|
|
|
|
After this step, migration to the new location is complete and
|
|
the monitors should operate successfully.
|
|
|
|
|
|
.. _Manual Deployment: ../../../install/manual-deployment
|
|
.. _Monitor Bootstrap: ../../../dev/mon-bootstrap
|
|
.. _Paxos: https://en.wikipedia.org/wiki/Paxos_(computer_science)
|