.. Mirror of https://github.com/ceph/ceph (synced 2025-01-28 22:14:02 +00:00), commit d83fa5352a:
   Edit doc/rados/operations/add-or-rm-osds.rst (2 of x).
   Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
   Signed-off-by: Zac Dover <zac.dover@proton.me>

======================
 Adding/Removing OSDs
======================

When a cluster is up and running, it is possible to add or remove OSDs.

Adding OSDs
===========

OSDs can be added to a cluster in order to expand the cluster's capacity and
resilience. Typically, an OSD is a Ceph ``ceph-osd`` daemon running on one
storage drive within a host machine. But if your host machine has multiple
storage drives, you may map one ``ceph-osd`` daemon for each drive on the
machine.

It's a good idea to check the capacity of your cluster so that you know when it
approaches its capacity limits. If your cluster has reached its ``near full``
ratio, then you should add OSDs to expand your cluster's capacity.

.. warning:: Do not add an OSD after your cluster has reached its ``full
   ratio``. OSD failures that occur after the cluster reaches its ``near full
   ratio`` might cause the cluster to exceed its ``full ratio``.

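One way to act on this advice is to compare raw utilization against the
``near full`` and ``full`` thresholds. The following is a minimal sketch and
not part of Ceph: the helper name ``classify_usage`` is hypothetical, and the
thresholds ``0.85`` and ``0.95`` are assumed defaults that should be checked
against the ratios reported by ``ceph osd dump`` on your cluster. The used and
total byte counts can be read from the output of ``ceph df``.

.. code-block:: bash

   # Hypothetical helper: classify raw usage against assumed thresholds.
   # Usage: classify_usage USED_BYTES TOTAL_BYTES [NEARFULL_RATIO] [FULL_RATIO]
   classify_usage() {
       awk -v u="$1" -v t="$2" -v n="${3:-0.85}" -v f="${4:-0.95}" 'BEGIN {
           # Refuse to divide by zero or by a negative total.
           if (t <= 0) { print "invalid"; exit 1 }
           r = u / t
           if (r >= f)      print "full"
           else if (r >= n) print "nearfull"
           else             print "ok"
       }'
   }

For example, ``classify_usage 900 1000`` prints ``nearfull``, which under the
assumptions above is the point at which adding OSDs should already be planned.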
Deploying your Hardware
-----------------------

If you are also adding a new host when adding a new OSD, see `Hardware
Recommendations`_ for details on minimum recommendations for OSD hardware. To
add an OSD host to your cluster, begin by making sure that an appropriate
version of Linux has been installed on the host machine and that all initial
preparations for your storage drives have been carried out. For details, see
`Filesystem Recommendations`_.

Next, add your OSD host to a rack in your cluster, connect the host to the
network, and ensure that the host has network connectivity. For details, see
`Network Configuration Reference`_.


.. _Hardware Recommendations: ../../../start/hardware-recommendations
.. _Filesystem Recommendations: ../../configuration/filesystem-recommendations
.. _Network Configuration Reference: ../../configuration/network-config-ref

Installing the Required Software
--------------------------------

If your cluster has been manually deployed, you will need to install Ceph
software packages manually. For details, see `Installing Ceph (Manual)`_.
Configure SSH for the appropriate user to have both passwordless authentication
and root permissions.

.. _Installing Ceph (Manual): ../../../install


Adding an OSD (Manual)
----------------------

The following procedure sets up a ``ceph-osd`` daemon, configures this OSD to
use one drive, and configures the cluster to distribute data to the OSD. If
your host machine has multiple drives, you may add an OSD for each drive on the
host by repeating this procedure.

As the following procedure will demonstrate, adding an OSD involves creating a
metadata directory for it, configuring a data storage drive, adding the OSD to
the cluster, and then adding it to the CRUSH map.

When you add the OSD to the CRUSH map, you will need to consider the weight you
assign to the new OSD. Since storage drive capacities increase over time, newer
OSD hosts are likely to have larger hard drives than the older hosts in the
cluster have and therefore might have greater weight as well.

.. tip:: Ceph works best with uniform hardware across pools. It is possible to
   add drives of dissimilar size and then adjust their weights accordingly.
   However, for best performance, consider a CRUSH hierarchy that has drives of
   the same type and size. It is better to add larger drives uniformly to
   existing hosts. This can be done incrementally, replacing smaller drives
   each time the new drives are added.

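To make the relationship between drive size and weight concrete, the sketch
below derives a weight from a drive's capacity. It assumes a common convention
that is not stated in this document: a CRUSH weight of ``1.0`` for each TiB of
capacity. The helper name ``crush_weight_from_bytes`` is hypothetical. Adjust
the divisor if your cluster follows a different convention.

.. code-block:: bash

   # Hypothetical helper: convert a drive size in bytes to a CRUSH weight,
   # assuming the convention of weight 1.0 per TiB (1024^4 bytes).
   crush_weight_from_bytes() {
       awk -v b="$1" 'BEGIN { printf "%.5f\n", b / (1024 ^ 4) }'
   }

For example, ``crush_weight_from_bytes 2199023255552`` prints ``2.00000`` for a
drive of exactly 2 TiB. The size of a drive in bytes can be obtained with a
tool such as ``lsblk -b``.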
#. Create the new OSD by running a command of the following form. If you opt
   not to specify a UUID in this command, the UUID will be set automatically
   when the OSD starts up. The OSD number, which is needed for subsequent
   steps, is found in the command's output:

   .. prompt:: bash $

      ceph osd create [{uuid} [{id}]]

   If the optional parameter ``{id}`` is specified, it will be used as the OSD
   ID. However, if the ID number is already in use, the command will fail.

   .. warning:: Explicitly specifying the ``{id}`` parameter is not
      recommended. IDs are allocated as an array, and any skipping of entries
      consumes extra memory. This memory consumption can become significant if
      there are large gaps or if clusters are large. By leaving the ``{id}``
      parameter unspecified, we ensure that Ceph uses the smallest ID number
      available and that these problems are avoided.

#. Create the default directory for your new OSD by running commands of the
   following form:

   .. prompt:: bash $

      ssh {new-osd-host}
      sudo mkdir /var/lib/ceph/osd/ceph-{osd-num}

#. If the OSD will be created on a drive other than the OS drive, prepare it
   for use with Ceph. Run commands of the following form:

   .. prompt:: bash $

      ssh {new-osd-host}
      sudo mkfs -t {fstype} /dev/{drive}
      sudo mount -o user_xattr /dev/{drive} /var/lib/ceph/osd/ceph-{osd-num}

#. Initialize the OSD data directory by running commands of the following form:

   .. prompt:: bash $

      ssh {new-osd-host}
      ceph-osd -i {osd-num} --mkfs --mkkey

   Make sure that the directory is empty before running ``ceph-osd``.

#. Register the OSD authentication key by running a command of the following
   form:

   .. prompt:: bash $

      ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring

   This presentation of the command has ``ceph-{osd-num}`` in the listed path
   because many clusters have the name ``ceph``. However, if your cluster name
   is not ``ceph``, then the string ``ceph`` in ``ceph-{osd-num}`` needs to be
   replaced with your cluster name. For example, if your cluster name is
   ``cluster1``, then the path in the command should be
   ``/var/lib/ceph/osd/cluster1-{osd-num}/keyring``.

#. Add the OSD to the CRUSH map by running a command of the following form.
   This allows the OSD to begin receiving data. The ``ceph osd crush add``
   command can add OSDs to the CRUSH hierarchy wherever you want. If you
   specify one or more buckets, the command places the OSD in the most specific
   of those buckets, and it moves that bucket underneath any other buckets that
   you have specified. **Important:** If you specify only the root bucket, the
   command will attach the OSD directly to the root, but CRUSH rules expect
   OSDs to be inside of hosts. If the OSDs are not inside hosts, the OSDs will
   likely not receive any data.

   .. prompt:: bash $

      ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]

   Note that there is another way to add a new OSD to the CRUSH map: decompile
   the CRUSH map, add the OSD to the device list, add the host as a bucket (if
   it is not already in the CRUSH map), add the device as an item in the host,
   assign the device a weight, recompile the CRUSH map, and set the CRUSH map.
   For details, see `Add/Move an OSD`_. This is rarely necessary with recent
   releases (this sentence was written the month that Reef was released).


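The steps that follow ``ceph osd create`` can be reviewed as a single sequence
before anything is run. The sketch below only prints the commands and executes
none of them. The function name ``print_add_osd_steps`` and its argument order
are hypothetical, the two-line ``ssh`` then command form used in the steps is
condensed into one line for printing, and placing the OSD under
``host={new-osd-host}`` is just one possible choice of CRUSH location.

.. code-block:: bash

   # Hypothetical dry-run helper: print the commands from the procedure above
   # for one OSD. Nothing is executed.
   # Usage: print_add_osd_steps HOST OSD_NUM FSTYPE DRIVE WEIGHT [CLUSTER_NAME]
   print_add_osd_steps() {
       host="$1"; num="$2"; fstype="$3"; drive="$4"; weight="$5"
       cluster="${6:-ceph}"    # cluster name used in the data directory path
       dir="/var/lib/ceph/osd/${cluster}-${num}"
       cat <<EOF
   ssh ${host} sudo mkdir ${dir}
   ssh ${host} sudo mkfs -t ${fstype} /dev/${drive}
   ssh ${host} sudo mount -o user_xattr /dev/${drive} ${dir}
   ssh ${host} ceph-osd -i ${num} --mkfs --mkkey
   ceph auth add osd.${num} osd 'allow *' mon 'allow rwx' -i ${dir}/keyring
   ceph osd crush add osd.${num} ${weight} host=${host}
   EOF
   }

For example, ``print_add_osd_steps node4 3 xfs sdb 1.0 cluster1`` prints the
six commands with ``/var/lib/ceph/osd/cluster1-3`` as the data directory, which
makes it easy to confirm the cluster-name substitution before running anything.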
.. _rados-replacing-an-osd:

Replacing an OSD
----------------

.. note:: If the procedure in this section does not work for you, try the
   instructions in the ``cephadm`` documentation:
   :ref:`cephadm-replacing-an-osd`.

Sometimes OSDs need to be replaced: for example, when a disk fails, or when an
administrator wants to reprovision OSDs with a new back end (perhaps when
switching from Filestore to BlueStore). Replacing an OSD differs from `Removing
the OSD`_ in that the replaced OSD's ID and CRUSH map entry must be kept intact
after the OSD is destroyed for replacement.


#. Make sure that it is safe to destroy the OSD:

   .. prompt:: bash $

      while ! ceph osd safe-to-destroy osd.{id} ; do sleep 10 ; done

#. Destroy the OSD:

   .. prompt:: bash $

      ceph osd destroy {id} --yes-i-really-mean-it

#. *Optional*: If the disk that you plan to use is not a new disk and has been
   used before for other purposes, zap the disk:

   .. prompt:: bash $

      ceph-volume lvm zap /dev/sdX

#. Prepare the disk for replacement by using the ID of the OSD that was
   destroyed in previous steps:

   .. prompt:: bash $

      ceph-volume lvm prepare --osd-id {id} --data /dev/sdX

#. Finally, activate the OSD:

   .. prompt:: bash $

      ceph-volume lvm activate {id} {fsid}

Alternatively, instead of carrying out the final two steps (preparing the disk
and activating the OSD), you can re-create the OSD by running a single command
of the following form:

.. prompt:: bash $

   ceph-volume lvm create --osd-id {id} --data /dev/sdX

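The ``while`` loop in the first step waits indefinitely. Where a bounded wait
is preferred, for example in a script, a small retry wrapper can be used
instead. The following is a minimal sketch; the helper name ``wait_until`` and
its arguments are hypothetical and not part of Ceph.

.. code-block:: bash

   # Hypothetical helper: rerun a command until it succeeds or the attempts
   # run out. Returns 0 on success and 1 if every attempt failed.
   # Usage: wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
   wait_until() {
       attempts="$1"
       delay="$2"
       shift 2
       i=1
       while [ "$i" -le "$attempts" ]; do
           if "$@"; then
               return 0
           fi
           # Sleep between attempts, but not after the last one.
           if [ "$i" -lt "$attempts" ]; then
               sleep "$delay"
           fi
           i=$((i + 1))
       done
       return 1
   }

With this helper, ``wait_until 60 10 ceph osd safe-to-destroy osd.{id}`` checks
up to 60 times at 10-second intervals and returns a non-zero status if the OSD
never becomes safe to destroy, so that a calling script can stop before the
``destroy`` step.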
Starting the OSD
----------------

After an OSD is added to Ceph, the OSD is in the cluster. However, until it is
started, the OSD is considered ``down`` and ``in``. The OSD is not running and
will be unable to receive data. To start an OSD, either run ``service ceph``
from your admin host or run a command of the following form to start the OSD
from its host machine:

.. prompt:: bash $

   sudo systemctl start ceph-osd@{osd-num}

After the OSD is started, it is considered ``up`` and ``in``.

Observing the Data Migration
----------------------------

After the new OSD has been added to the CRUSH map, Ceph begins rebalancing the
cluster by migrating placement groups (PGs) to the new OSD. To observe this
process by using the `ceph`_ tool, run the following command:

.. prompt:: bash $

   ceph -w

Or:

.. prompt:: bash $

   watch ceph status

The PG states will first change from ``active+clean`` to ``active, some
degraded objects`` and then return to ``active+clean`` when migration
completes. When you are finished observing, press Ctrl-C to exit.

.. _Add/Move an OSD: ../crush-map#addosd
.. _ceph: ../monitoring


Removing OSDs (Manual)
======================

It is possible to remove an OSD manually while the cluster is running: you
might want to do this in order to reduce the size of the cluster or when
replacing hardware. Typically, an OSD is a Ceph ``ceph-osd`` daemon running on
one storage drive within a host machine. Alternatively, if your host machine
has multiple storage drives, you might need to remove multiple ``ceph-osd``
daemons: one daemon for each drive on the machine.

.. warning:: Before you begin the process of removing an OSD, make sure that
   your cluster is not near its ``full ratio``. Otherwise the act of removing
   OSDs might cause the cluster to reach or exceed its ``full ratio``.


Taking the OSD ``out`` of the Cluster
-------------------------------------

OSDs are typically ``up`` and ``in`` before they are removed from the cluster.
Before the OSD can be removed from the cluster, the OSD must be taken ``out``
of the cluster so that Ceph can begin rebalancing and copying its data to other
OSDs. To take an OSD ``out`` of the cluster, run a command of the following
form:

.. prompt:: bash $

   ceph osd out {osd-num}


Observing the Data Migration
----------------------------

After the OSD has been taken ``out`` of the cluster, Ceph begins rebalancing
the cluster by migrating placement groups off the OSD that was marked ``out``.
To observe this process by using the `ceph`_ tool, run the following command:

.. prompt:: bash $

   ceph -w

The PG states will change from ``active+clean`` to ``active, some degraded
objects`` and will then return to ``active+clean`` when migration completes.
When you are finished observing, press Ctrl-C to exit.

.. note:: Under certain conditions, the action of taking ``out`` an OSD
   might lead CRUSH to encounter a corner case in which some PGs remain stuck
   in the ``active+remapped`` state. This problem sometimes occurs in small
   clusters with few hosts (for example, in a small testing cluster). To
   address this problem, mark the OSD ``in`` by running a command of the
   following form:

   .. prompt:: bash $

      ceph osd in {osd-num}

   After the OSD has come back to its initial state, do not mark the OSD
   ``out`` again. Instead, set the OSD's weight to ``0`` by running a command
   of the following form:

   .. prompt:: bash $

      ceph osd crush reweight osd.{osd-num} 0

   After the OSD has been reweighted, observe the data migration and confirm
   that it has completed successfully. The difference between marking an OSD
   ``out`` and reweighting the OSD to ``0`` has to do with the bucket that
   contains the OSD. When an OSD is marked ``out``, the weight of the bucket is
   not changed. But when an OSD is reweighted to ``0``, the weight of the
   bucket is updated (namely, the weight of the OSD is subtracted from the
   overall weight of the bucket). When operating small clusters, it can
   sometimes be preferable to use the above reweight command.


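The difference described in the note can be illustrated with a little
arithmetic. The sketch below sums the CRUSH weights of the OSDs in one host
bucket; the helper name ``bucket_weight`` and the example weights are
hypothetical and do not come from a real cluster.

.. code-block:: bash

   # Hypothetical helper: sum the CRUSH weights of the items in one bucket.
   # Usage: bucket_weight WEIGHT [WEIGHT...]
   bucket_weight() {
       printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.2f\n", total }'
   }

Consider a host bucket that holds three OSDs with a CRUSH weight of ``1.0``
each. Marking one OSD ``out`` leaves all three CRUSH weights in place, so
``bucket_weight 1.0 1.0 1.0`` still gives ``3.00``. Reweighting that OSD to
``0`` changes its CRUSH weight, so ``bucket_weight 1.0 1.0 0`` gives ``2.00``,
and CRUSH places data according to the reduced weight of the host.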
Stopping the OSD
----------------

After you take an OSD ``out`` of the cluster, the OSD might still be running.
In such a case, the OSD is ``up`` and ``out``. Before it is removed from the
cluster, the OSD must be stopped by running commands of the following form:

.. prompt:: bash $

   ssh {osd-host}
   sudo systemctl stop ceph-osd@{osd-num}

After the OSD has been stopped, it is ``down``.


Removing the OSD
----------------

The following procedure removes an OSD from the cluster map, removes the OSD's
authentication key, removes the OSD from the OSD map, and removes the OSD from
the ``ceph.conf`` file. If your host has multiple drives, it might be necessary
to remove an OSD from each drive by repeating this procedure.

#. Begin by having the cluster forget the OSD. This step removes the OSD from
   the CRUSH map, removes the OSD's authentication key, and removes the OSD
   from the OSD map. (The :ref:`purge subcommand <ceph-admin-osd>` was
   introduced in Luminous. For older releases, see :ref:`the procedure linked
   here <ceph_osd_purge_procedure_pre_luminous>`.):

   .. prompt:: bash $

      ceph osd purge {id} --yes-i-really-mean-it


#. Navigate to the host where the master copy of the cluster's
   ``ceph.conf`` file is kept:

   .. prompt:: bash $

      ssh {admin-host}
      cd /etc/ceph
      vim ceph.conf

#. Remove the OSD entry from your ``ceph.conf`` file (if such an entry
   exists)::

      [osd.1]
          host = {hostname}

#. Copy the updated ``ceph.conf`` file from the location on the host where the
   master copy of the cluster's ``ceph.conf`` is kept to the ``/etc/ceph``
   directory of the other hosts in your cluster.

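If you prefer not to edit ``ceph.conf`` by hand, the OSD entry can be filtered
out with a short script. The following is a minimal sketch that assumes a
plain INI-style file in which each section starts with a ``[name]`` header on
its own line; the helper name ``strip_osd_section`` is hypothetical. It writes
the result to standard output and leaves the original file untouched.

.. code-block:: bash

   # Hypothetical helper: print a ceph.conf with the [osd.N] section removed.
   # Usage: strip_osd_section OSD_NUM < ceph.conf > ceph.conf.new
   strip_osd_section() {
       awk -v section="[osd.$1]" '
           /^[ \t]*\[/ {
               # A section header: trim it and decide whether to skip
               # everything up to the next header.
               header = $0
               gsub(/^[ \t]+|[ \t]+$/, "", header)
               skip = (header == section)
           }
           !skip { print }
       '
   }

For example, ``strip_osd_section 1 < ceph.conf > ceph.conf.new`` writes a copy
without the ``[osd.1]`` section. Compare the two files with ``diff`` before
replacing the master copy and distributing it to the other hosts.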
.. _ceph_osd_purge_procedure_pre_luminous:

If your Ceph cluster is older than Luminous, you will be unable to use the
``ceph osd purge`` command. Instead, carry out the following procedure:

#. Remove the OSD from the CRUSH map so that it no longer receives data (for
   more details, see `Remove an OSD`_):

   .. prompt:: bash $

      ceph osd crush remove {name}

   Instead of running this command, you can edit the CRUSH map by hand:
   decompile the CRUSH map, remove the OSD from the device list, and either
   remove the device from the host bucket or (provided that the host bucket is
   in the CRUSH map and that you intend to remove the host) remove the host
   bucket itself. Then recompile the map and set it.


#. Remove the OSD authentication key:

   .. prompt:: bash $

      ceph auth del osd.{osd-num}

#. Remove the OSD:

   .. prompt:: bash $

      ceph osd rm {osd-num}

   For example:

   .. prompt:: bash $

      ceph osd rm 1

.. _Remove an OSD: ../crush-map#removeosd