
.. _CVE-2021-20288:

CVE-2021-20288: Unauthorized global_id reuse in cephx
=====================================================

* `NIST information page <https://nvd.nist.gov/vuln/detail/CVE-2021-20288>`_

Summary
-------

Ceph did not ensure that reconnecting or renewing clients presented
an existing ticket when reclaiming their global_id value.
An authenticated attacker could claim a global_id in
use by a different client and potentially disrupt
other cluster services.

Background
----------

Each authenticated client or daemon in Ceph is assigned a numeric
global_id identifier. That value is assumed to be unique across the
cluster.  When clients reconnect to the monitor (e.g., due to a
network disconnection) or renew their ticket, they are supposed to
present their old ticket to prove prior possession of their global_id
so that it can be reclaimed and thus remain constant over the lifetime
of that client instance.

Ceph was not correctly checking that the old ticket was valid, allowing
an arbitrary global_id to be reclaimed, even if it was in use by another
active client in the system.
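
The difference between insecure and secure reclaim can be illustrated with a
toy model.  The sketch below is not Ceph's implementation: ``ToyMonitor``,
``Ticket``, and the HMAC-based ticket format are hypothetical stand-ins for
cephx, and only the ``allow_insecure_reclaim`` flag is meant to mirror the
behavior of the ``auth_allow_insecure_global_id_reclaim`` option.

.. code-block:: python

   import hashlib
   import hmac
   import secrets
   from dataclasses import dataclass
   from typing import Optional


   @dataclass(frozen=True)
   class Ticket:
       """Toy ticket (hypothetical): binds a global_id to an expiry time."""
       global_id: int
       expires_at: int  # seconds on an arbitrary clock
       mac: bytes


   class ToyMonitor:
       """Toy monitor (hypothetical); real cephx tickets are more involved."""

       def __init__(self, allow_insecure_reclaim: bool, ttl: int = 72 * 3600):
           self._secret = secrets.token_bytes(32)
           self._next_id = 1000
           self.allow_insecure_reclaim = allow_insecure_reclaim
           self.ttl = ttl

       def _sign(self, global_id: int, expires_at: int) -> bytes:
           msg = f"{global_id}:{expires_at}".encode()
           return hmac.new(self._secret, msg, hashlib.sha256).digest()

       def _issue(self, global_id: int, now: int) -> Ticket:
           expires_at = now + self.ttl
           return Ticket(global_id, expires_at, self._sign(global_id, expires_at))

       def authenticate(self, now: int) -> Ticket:
           """First contact: hand out a fresh, unique global_id."""
           self._next_id += 1
           return self._issue(self._next_id, now)

       def reclaim(self, claimed_id: int, old_ticket: Optional[Ticket],
                   now: int) -> Ticket:
           """Reconnect or renew: the client asks to keep claimed_id."""
           valid = (
               old_ticket is not None
               and old_ticket.global_id == claimed_id
               and old_ticket.expires_at > now
               and hmac.compare_digest(
                   old_ticket.mac,
                   self._sign(old_ticket.global_id, old_ticket.expires_at),
               )
           )
           if not valid and not self.allow_insecure_reclaim:
               raise PermissionError("global_id reclaim without a valid ticket")
           # With insecure reclaim allowed, any claimed_id is accepted here,
           # which is the flaw described above.
           return self._issue(claimed_id, now)

In this model, a monitor created with ``allow_insecure_reclaim=True`` lets any
authenticated caller obtain a ticket for another client's global_id, while one
created with ``allow_insecure_reclaim=False`` rejects the request unless the
caller presents an unexpired ticket that the monitor itself issued for that
global_id.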

Attacker Requirements
---------------------

Any potential attacker must:

* have a valid authentication key for the cluster
* know or guess the global_id of another client
* run a modified version of the Ceph client code to reclaim another client's global_id
* construct appropriate client messages or requests to disrupt service or exploit
  Ceph daemon assumptions about global_id uniqueness

Impact
------

Confidentiality Impact
______________________

None

Integrity Impact
________________

Partial.  An attacker could potentially exploit assumptions around
global_id uniqueness to disrupt other clients' access or disrupt
Ceph daemons.

Availability Impact
___________________

High.  An attacker could potentially exploit assumptions around
global_id uniqueness to disrupt other clients' access or disrupt
Ceph daemons.

Access Complexity
_________________

High.  The client must make use of modified client code in order to
exploit specific assumptions in the behavior of other Ceph daemons.

Authentication
______________

Yes.  The attacker must be authenticated and have access to the
same services as the client it wishes to impersonate or disrupt.

Gained Access
_____________

Partial.  An attacker can partially impersonate another client.

Affected versions
-----------------

All versions of the Ceph monitor prior to the fixed versions listed below
fail to ensure that global_id reclaim attempts are authentic.

In addition, all user-space daemons and clients starting from Luminous v12.2.0
were failing to securely reclaim their global_id following commit a2eb6ae3fb57
("mon/monclient: hunt for multiple monitor in parallel").

All versions of the Linux kernel client properly authenticate.

Fixed versions
--------------

* Pacific v16.2.1 (and later)
* Octopus v15.2.11 (and later)
* Nautilus v14.2.20 (and later)


Fix details
-----------

#. Patched monitors now properly require that clients securely reclaim
   their global_id when the ``auth_allow_insecure_global_id_reclaim``
   option is set to ``false``.  Initially, by default, this option is set to
   ``true`` so that existing clients can continue to function without
   disruption until all clients have been upgraded.  When this option
   is set to ``false``, an unpatched client will not be able to reconnect
   to the cluster after an intermittent network disruption breaks
   its connection to a monitor, nor will it be able to renew its
   authentication ticket when it times out (by default, after 72 hours).

   Patched monitors raise the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED``
   health alert if ``auth_allow_insecure_global_id_reclaim`` is enabled.
   This health alert can be muted with::

     ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w

   Although it is not recommended, the alert can also be disabled with::

     ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false

#. Patched monitors can disconnect new clients right after they have
   authenticated (forcing them to reconnect and reclaim) in order to
   determine whether they securely reclaim global_ids.  This allows
   the cluster and users to discover quickly whether clients would be
   affected by requiring secure global_id reclaim: most clients will
   report an authentication error immediately.  This behavior can be
   disabled by setting ``auth_expose_insecure_global_id_reclaim`` to
   ``false``::

     ceph config set mon auth_expose_insecure_global_id_reclaim false

#. Patched monitors will raise the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` health
   alert for any clients or daemons that are not securely reclaiming their
   global_id.  These clients should be upgraded before disabling the
   ``auth_allow_insecure_global_id_reclaim`` option to avoid disrupting
   client access.

   By default (if ``auth_expose_insecure_global_id_reclaim`` has not
   been disabled), clients' failure to securely reclaim global_id will
   immediately be exposed and raise this health alert.
   However, if ``auth_expose_insecure_global_id_reclaim`` has been
   disabled, this alert will not be triggered for a client until it is
   forced to reconnect to a monitor (e.g., due to a network disruption)
   or the client renews its authentication ticket (by default, after
   72 hours).

#. The default time-to-live (TTL) for authentication tickets has been increased
   from 12 hours to 72 hours.  Because Ceph previously did not ensure that
   a client's prior ticket was valid when it reclaimed its global_id, a client
   could tolerate a network outage that lasted longer than the ticket TTL and still
   reclaim its global_id.  Once the cluster starts requiring secure global_id reclaim,
   a client that is disconnected for longer than the TTL may fail to reclaim its global_id,
   fail to reauthenticate, and be unable to continue communicating with the cluster
   until it is restarted.  The default TTL was increased to minimize the impact of this
   change on users.


Recommendations
---------------

#. Users should upgrade to a patched version of Ceph at their earliest
   convenience.

#. Users should upgrade any unpatched clients at their earliest
   convenience.  By default, these clients can be easily identified by
   checking the ``ceph health detail`` output for the
   ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` alert.

#. If not all clients can be upgraded immediately, the health alerts can be
   temporarily muted with::

     ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w  # 1 week
     ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w  # 1 week

#. After all clients have been updated and the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM``
   alert is no longer present, the cluster should be set to prevent insecure
   global_id reclaim with::

     ceph config set mon auth_allow_insecure_global_id_reclaim false
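
The check in the last step can also be scripted.  The following is a minimal
sketch, not an official tool: the helper names ``fetch_health`` and
``ready_to_enforce`` are hypothetical, and the code assumes that
``ceph health detail --format json`` returns an object whose ``checks`` key is
keyed by alert code (and that muted alerts are still listed there).

.. code-block:: python

   import json
   import subprocess

   RECLAIM_ALERT = "AUTH_INSECURE_GLOBAL_ID_RECLAIM"


   def fetch_health() -> dict:
       """Run the ceph CLI and parse its JSON health report."""
       result = subprocess.run(
           ["ceph", "health", "detail", "--format", "json"],
           check=True,
           capture_output=True,
           text=True,
       )
       return json.loads(result.stdout)


   def ready_to_enforce(health: dict) -> bool:
       """True if no client is currently flagged for insecure reclaim.

       Assumes the (unverified here) layout {"checks": {"<ALERT_CODE>": ...}}.
       """
       checks = health.get("checks", {})
       return RECLAIM_ALERT not in checks

On a node with an admin keyring, ``ready_to_enforce(fetch_health())`` returning
``True`` suggests that no insecure clients have been observed.  If
``auth_expose_insecure_global_id_reclaim`` has been disabled, unpatched clients
may not appear until they reconnect or renew their ticket, so the result is
only meaningful after at least one full ticket lifetime has passed.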