

==============
Upgrading Ceph
==============

.. DANGER:: DATE: 01 NOV 2021.

   DO NOT UPGRADE TO CEPH PACIFIC FROM AN OLDER VERSION.

   A recently-discovered bug (https://tracker.ceph.com/issues/53062) can cause
   data corruption. This bug occurs during OMAP format conversion for
   clusters that are updated to Pacific. New clusters are not affected by this
   bug.

   The trigger for this bug is BlueStore's repair/quick-fix functionality. This
   bug can be triggered in two known ways:

   (1) manually, via ``ceph-bluestore-tool``, or
   (2) automatically, by an OSD if ``bluestore_fsck_quick_fix_on_mount`` is
       set to true.

   The fix for this bug is expected to be available in Ceph v16.2.7.

   DO NOT set ``bluestore_fsck_quick_fix_on_mount`` to true. If it is currently
   set to true in your configuration, immediately set it to false.

   DO NOT run ``ceph-bluestore-tool``'s repair/quick-fix commands.
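
The check described in the warning above could be scripted roughly as follows.
This is a minimal sketch, not an official procedure: the function name
``disable_quick_fix_on_mount`` is hypothetical, and the sketch assumes that the
option is managed through the monitors' central configuration database. A value
set in a local ``ceph.conf`` file is not visible to ``ceph config get`` and
must be checked by hand.

.. code-block:: bash

   # Hypothetical helper: make sure OSDs will not run the BlueStore
   # quick-fix on mount. Only the central config database is inspected.
   disable_quick_fix_on_mount() {
       local opt="bluestore_fsck_quick_fix_on_mount"
       local current
       # Read the value that currently applies to the "osd" section.
       current="$(ceph config get osd "$opt")" || return 1
       if [ "$current" = "true" ]; then
           # Turn the option off for all OSDs.
           ceph config set osd "$opt" false || return 1
           echo "$opt was true; set to false"
       else
           echo "$opt is already $current"
       fi
   }

Calling ``disable_quick_fix_on_mount`` on a node with an admin keyring prints
which of the two cases applied.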

Cephadm can safely upgrade Ceph from one bugfix release to the next. For
example, you can upgrade from v15.2.0 (the first Octopus release) to the next
point release, v15.2.1.

The automated upgrade process follows Ceph best practices. For example:

* The upgrade order starts with managers, monitors, then other daemons.
* Each daemon is restarted only after Ceph indicates that the cluster
  will remain available.

.. note::

   The Ceph cluster health status is likely to switch to
   ``HEALTH_WARN`` during the upgrade.

.. note::

   If a host in the cluster is offline, the upgrade is paused.

Starting the upgrade
====================

Before you use cephadm to upgrade Ceph, verify that all hosts are currently
online and that your cluster is healthy by running the following command:

.. prompt:: bash #

   ceph -s
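
For scripted upgrades, the health part of this check can be automated. The
following is a minimal sketch under stated assumptions: the function name
``cluster_is_healthy`` is hypothetical, and the sketch relies on ``ceph health``
printing the status word (for example ``HEALTH_OK``) at the start of its
output. It does not verify that every host is online; that still needs to be
confirmed separately, for example by inspecting ``ceph orch host ls``.

.. code-block:: bash

   # Hypothetical helper: succeed only if the cluster reports HEALTH_OK.
   cluster_is_healthy() {
       local health
       health="$(ceph health)" || return 1
       case "$health" in
           HEALTH_OK*)
               return 0
               ;;
           *)
               # Report the status on stderr and signal failure.
               echo "cluster not healthy: $health" >&2
               return 1
               ;;
       esac
   }

A wrapper script could then guard the upgrade with
``cluster_is_healthy && ceph orch upgrade start --ceph-version <version>``.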

To upgrade (or downgrade) to a specific release, run the following command:

.. prompt:: bash #

   ceph orch upgrade start --ceph-version <version>

For example, to upgrade to v15.2.1, run the following command:

.. prompt:: bash #

   ceph orch upgrade start --ceph-version 15.2.1

.. note::

   Starting with v16.2.6, the Docker Hub registry is no longer used. If you
   use Docker, you must point it to the image in the quay.io registry:

.. prompt:: bash #

   ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6

Monitoring the upgrade
======================

Determine (1) whether an upgrade is in progress and (2) which version the
cluster is upgrading to by running the following command:

.. prompt:: bash #

   ceph orch upgrade status
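
A script can poll this command to wait until the upgrade has finished. The
sketch below is illustrative only. The function names are hypothetical, and it
assumes that the command prints JSON containing an ``in_progress`` field; check
the output of your release before relying on it. If the ``ceph`` command itself
fails, the sketch treats that the same as "no upgrade in progress", which a
production script would want to handle separately.

.. code-block:: bash

   # Hypothetical helper: succeed while an upgrade is running.
   # Assumes the status output contains a line such as "in_progress": true.
   upgrade_in_progress() {
       ceph orch upgrade status | grep -Eq '"in_progress": *true'
   }

   # Hypothetical helper: block until no upgrade is in progress.
   # $1 is the polling interval in seconds (default: 30).
   wait_for_upgrade() {
       local interval="${1:-30}"
       while upgrade_in_progress; do
           sleep "$interval"
       done
   }

For example, ``wait_for_upgrade 60`` checks the status once per minute and
returns when the status no longer reports an upgrade in progress.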

Watching the progress bar during a Ceph upgrade
-----------------------------------------------

During the upgrade, a progress bar is visible in the ceph status output. It
looks like this:

.. code-block:: console

   # ceph -s

   [...]
     progress:
       Upgrade to docker.io/ceph/ceph:v15.2.1 (00h 20m 12s)
         [=======.....................] (time remaining: 01h 43m 31s)

Watching the cephadm log during an upgrade
------------------------------------------

Watch the cephadm log by running the following command:

.. prompt:: bash #

   ceph -W cephadm

Canceling an upgrade
====================

You can stop the upgrade process at any time by running the following command:

.. prompt:: bash #

   ceph orch upgrade stop

Potential problems
==================

There are a few health alerts that can arise during the upgrade process.
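
Whether one of these alerts is currently raised can be checked from a script.
The following is a minimal sketch: the function name ``has_health_alert`` is
hypothetical, and the sketch assumes that the alert code appears verbatim in
the output of ``ceph health detail``.

.. code-block:: bash

   # Hypothetical helper: succeed if the given health alert code
   # (for example UPGRADE_FAILED_PULL) appears in the detailed health output.
   has_health_alert() {
       local code="$1"
       # -w matches the code as a whole word, -q suppresses output.
       ceph health detail | grep -qw -- "$code"
   }

For example, ``has_health_alert UPGRADE_NO_STANDBY_MGR`` succeeds only while
that alert is present.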

UPGRADE_NO_STANDBY_MGR
----------------------

This alert (``UPGRADE_NO_STANDBY_MGR``) means that Ceph does not detect an
active standby manager daemon. In order to proceed with the upgrade, Ceph
requires an active standby manager daemon (which you can think of in this
context as "a second manager").

You can ensure that Cephadm is configured to run 2 (or more) managers by
running the following command:

.. prompt:: bash #

   ceph orch apply mgr 2  # or more

You can check the status of existing mgr daemons by running the following
command:

.. prompt:: bash #

   ceph orch ps --daemon-type mgr

If an existing mgr daemon has stopped, you can try to restart it by running the
following command:

.. prompt:: bash #

   ceph orch daemon restart <name>

UPGRADE_FAILED_PULL
-------------------

This alert (``UPGRADE_FAILED_PULL``) means that Ceph was unable to pull the
container image for the target version. This can happen if you specify a
version or container image that does not exist (e.g. "1.2.3"), or if the
container registry cannot be reached by one or more hosts in the cluster.

To cancel the existing upgrade and to specify a different target version, run
the following commands:

.. prompt:: bash #

   ceph orch upgrade stop
   ceph orch upgrade start --ceph-version <version>
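
The two commands above can be wrapped in a small function so that the new
upgrade is started only if the old one was stopped successfully. This is a
sketch; the function name ``restart_upgrade`` is hypothetical.

.. code-block:: bash

   # Hypothetical helper: cancel the current upgrade, then start a new one
   # that targets the version given as $1.
   restart_upgrade() {
       local version="$1"
       if [ -z "$version" ]; then
           echo "usage: restart_upgrade <version>" >&2
           return 2
       fi
       # Do not start the new upgrade if stopping the old one failed.
       ceph orch upgrade stop || return 1
       ceph orch upgrade start --ceph-version "$version"
   }

For example, ``restart_upgrade 15.2.1`` stops the failed upgrade and retries
with v15.2.1 as the target.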

Using customized container images
=================================

For most users, upgrading requires nothing more complicated than specifying the
Ceph version number to upgrade to. In such cases, cephadm locates the specific
Ceph container image to use by combining the ``container_image_base``
configuration option (default: ``docker.io/ceph/ceph``) with a tag of
``vX.Y.Z``.
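
The naming rule described above can be illustrated with a few lines of shell.
This is only a sketch of the rule as stated here, not cephadm's actual
implementation, and the function name ``image_for_version`` is hypothetical.

.. code-block:: bash

   # Hypothetical helper: build an image name from a version number.
   # $1 is the version, with or without a leading "v".
   # $2 is the image base (defaults to the documented default base).
   image_for_version() {
       local version="$1"
       local base="${2:-docker.io/ceph/ceph}"
       # Strip an optional leading "v" so that the tag is always vX.Y.Z.
       echo "${base}:v${version#v}"
   }

For example, ``image_for_version 15.2.1`` prints
``docker.io/ceph/ceph:v15.2.1``, and
``image_for_version 16.2.6 quay.io/ceph/ceph`` prints
``quay.io/ceph/ceph:v16.2.6``.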

It is also possible to upgrade to an arbitrary container image. For example,
the following command upgrades to a development build:

.. prompt:: bash #

   ceph orch upgrade start --image quay.io/ceph-ci/ceph:recent-git-branch-name

For more information about available container images, see :ref:`containers`.