.. _CVE-2021-20288:

CVE-2021-20288: Unauthorized global_id reuse in cephx
=====================================================

* `NIST information page <https://nvd.nist.gov/vuln/detail/CVE-2021-20288>`_

Summary
-------

Ceph did not ensure that reconnecting or renewing clients presented an
existing ticket when reclaiming their global_id value. An attacker who
was able to authenticate could claim a global_id in use by a different
client and potentially disrupt other cluster services.

Background
----------

Each authenticated client or daemon in Ceph is assigned a numeric
global_id identifier. That value is assumed to be unique across the
cluster. When clients reconnect to the monitor (e.g., due to a
network disconnection) or renew their ticket, they are supposed to
present their old ticket to prove prior possession of their global_id
so that it can be reclaimed and thus remain constant over the lifetime
of that client instance.

Ceph was not correctly checking that the old ticket was valid, allowing
an arbitrary global_id to be reclaimed, even if it was in use by another
active client in the system.
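
To make the flaw concrete, the following Python sketch models global_id
assignment and reclaim. It is a hypothetical toy, not Ceph's implementation:
``Ticket``, ``ToyMonitor``, ``attempt_hijack``, and the starting global_id of
4100 are invented for illustration, and "presenting a valid old ticket" is
reduced to an object identity check instead of cephx cryptography. The model
also assumes the caller has already authenticated with a valid key.

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import Dict, Optional


   @dataclass
   class Ticket:
       """Hypothetical stand-in for a cephx ticket; it only records a global_id."""
       global_id: int


   @dataclass
   class ToyMonitor:
       """Toy model of global_id assignment and reclaim (not Ceph's code)."""
       secure_reclaim: bool
       next_id: int = 4100  # arbitrary first global_id, for illustration only
       issued: Dict[int, Ticket] = field(default_factory=dict)

       def authenticate(self, requested_id: Optional[int] = None,
                        old_ticket: Optional[Ticket] = None) -> Ticket:
           """Assumes the caller already proved it holds a valid cluster key."""
           if requested_id is None:
               gid = self.next_id  # fresh client: assign a new global_id
               self.next_id += 1
           else:
               if self.secure_reclaim and (
                       old_ticket is None
                       or self.issued.get(requested_id) is not old_ticket):
                   # Secure reclaim: only the holder of the ticket most recently
                   # issued for this global_id may reclaim it.
                   raise PermissionError(f"reclaim of global_id {requested_id} rejected")
               gid = requested_id  # insecure path: any requested id is honored
           ticket = Ticket(gid)
           self.issued[gid] = ticket
           return ticket


   def attempt_hijack(secure: bool) -> str:
       mon = ToyMonitor(secure_reclaim=secure)
       victim = mon.authenticate()
       try:
           # The attacker has a valid key but no ticket for the victim's global_id.
           stolen = mon.authenticate(requested_id=victim.global_id)
           return f"hijacked global_id {stolen.global_id}"
       except PermissionError:
           return "rejected"


   print(attempt_hijack(secure=False))  # hijacked global_id 4100
   print(attempt_hijack(secure=True))   # rejected

With ``secure_reclaim=False`` the toy monitor honors any requested global_id,
which corresponds to the unchecked reclaim described above. With
``secure_reclaim=True`` only the holder of the ticket most recently issued for
that global_id can reclaim it, so a legitimate client reconnecting with its own
ticket still keeps its global_id.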

Attacker Requirements
---------------------

Any potential attacker must:

* have a valid authentication key for the cluster
* know or guess the global_id of another client
* run a modified version of the Ceph client code to reclaim another client's global_id
* construct appropriate client messages or requests to disrupt service or exploit
  Ceph daemon assumptions about global_id uniqueness

Impact
------

Confidentiality Impact
______________________

None

Integrity Impact
________________

Partial. An attacker could potentially exploit assumptions around
global_id uniqueness to disrupt other clients' access or disrupt
Ceph daemons.

Availability Impact
___________________

High. An attacker could potentially exploit assumptions around
global_id uniqueness to disrupt other clients' access or disrupt
Ceph daemons.

Access Complexity
_________________

High. The attacker must use modified client code in order to
exploit specific assumptions in the behavior of other Ceph daemons.

Authentication
______________

Yes. The attacker must also be authenticated and have access to the
same services as the client it wishes to impersonate or disrupt.

Gained Access
_____________

Partial. An attacker can partially impersonate another client.

Affected versions
-----------------

All Ceph monitor versions prior to the fixed releases fail to ensure that
global_id reclaim attempts are authentic.

In addition, all user-space daemons and clients starting from Luminous v12.2.0
fail to securely reclaim their global_id, since commit a2eb6ae3fb57
("mon/monclient: hunt for multiple monitor in parallel").

All versions of the Linux kernel client properly authenticate.

Fixed versions
--------------

* Pacific v16.2.1 (and later)
* Octopus v15.2.11 (and later)
* Nautilus v14.2.20 (and later)
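
When auditing a cluster, the list above can be turned into a quick check. The
following Python helper is a sketch: ``has_global_id_fix`` and ``FIRST_FIXED``
are hypothetical names, it compares only the first numeric X.Y.Z found in the
input (so a longer banner such as the ``ceph version ...`` line printed by
``ceph --version`` also works), and it assumes that any release series newer
than Pacific already includes the fix, which the list above does not state
explicitly.

.. code-block:: python

   import re

   # First fixed release in each affected series, taken from the list above,
   # keyed by major version -> (minor, patch).
   FIRST_FIXED = {
       14: (2, 20),  # Nautilus v14.2.20
       15: (2, 11),  # Octopus v15.2.11
       16: (2, 1),   # Pacific v16.2.1
   }


   def has_global_id_fix(version: str) -> bool:
       """Return True if the first X.Y.Z found in `version` is a fixed release."""
       match = re.search(r"(\d+)\.(\d+)\.(\d+)", version)
       if match is None:
           raise ValueError(f"no X.Y.Z version found in {version!r}")
       major, minor, patch = (int(part) for part in match.groups())
       if major > max(FIRST_FIXED):
           return True   # ASSUMPTION: series newer than Pacific include the fix
       if major not in FIRST_FIXED:
           return False  # older series (e.g. Luminous, Mimic): no fixed release listed
       return (minor, patch) >= FIRST_FIXED[major]


   for v in ["14.2.19", "14.2.20", "15.2.10", "16.2.0", "16.2.1"]:
       print(v, has_global_id_fix(v))  # False, True, False, False, True in turn

Because both the monitors and user-space daemons and clients are affected, a
check like this is only meaningful when applied to every daemon and client
version in the cluster, not just the monitors.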

Fix details
-----------

#. Patched monitors now properly require that clients securely reclaim
   their global_id when the ``auth_allow_insecure_global_id_reclaim``
   option is ``false``. Initially, by default, this option is set to
   ``true`` so that existing clients can continue to function without
   disruption until all clients have been upgraded. When this option
   is set to ``false``, an unpatched client will not be able to reconnect
   to the cluster after an intermittent network disruption breaks its
   connection to a monitor, nor will it be able to renew its
   authentication ticket when that ticket times out (by default, after
   72 hours).

   Patched monitors raise the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED``
   health alert if ``auth_allow_insecure_global_id_reclaim`` is enabled.
   This health alert can be muted with::

     ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w

   Although it is not recommended, the alert can also be disabled with::

     ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false

#. Patched monitors can disconnect new clients right after they have
   authenticated (forcing them to reconnect and reclaim) in order to
   determine whether they securely reclaim global_ids. This allows
   the cluster and users to discover quickly whether clients would be
   affected by requiring secure global_id reclaim: most clients will
   report an authentication error immediately. This behavior can be
   disabled by setting ``auth_expose_insecure_global_id_reclaim`` to
   ``false``::

     ceph config set mon auth_expose_insecure_global_id_reclaim false

#. Patched monitors will raise the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` health
   alert for any clients or daemons that are not securely reclaiming their
   global_id. These clients should be upgraded before disabling the
   ``auth_allow_insecure_global_id_reclaim`` option to avoid disrupting
   client access.

   By default (if ``auth_expose_insecure_global_id_reclaim`` has not
   been disabled), a client's failure to securely reclaim its global_id is
   exposed immediately and raises this health alert.
   However, if ``auth_expose_insecure_global_id_reclaim`` has been
   disabled, this alert will not be triggered for a client until it is
   forced to reconnect to a monitor (e.g., due to a network disruption)
   or the client renews its authentication ticket (by default, after
   72 hours).

#. The default time-to-live (TTL) for authentication tickets has been increased
   from 12 hours to 72 hours. Because we previously were not ensuring that
   a client's prior ticket was valid when it reclaimed its global_id, a client
   could tolerate a network outage that lasted longer than the ticket TTL and still
   reclaim its global_id. Once the cluster starts requiring secure global_id reclaim,
   a client that is disconnected for longer than the TTL may fail to reclaim its global_id,
   fail to reauthenticate, and be unable to continue communicating with the cluster
   until it is restarted. The default TTL was increased to minimize the impact of this
   change on users.
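
The interaction between these options and the ticket TTL can be summarized as
a small decision model. This Python sketch is one reading of the items above,
not Ceph's monitor logic: ``MonOptions``, ``cluster_alerts``,
``unpatched_client``, and ``patched_client_at_risk`` are hypothetical names, and
the option defaults simply mirror the defaults the text describes.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Set

   OLD_TTL_HOURS = 12  # previous default ticket TTL (item 4)
   NEW_TTL_HOURS = 72  # increased default ticket TTL (item 4)


   @dataclass
   class MonOptions:
       # Option names come from the text; defaults follow its description.
       auth_allow_insecure_global_id_reclaim: bool = True
       auth_expose_insecure_global_id_reclaim: bool = True
       mon_warn_on_insecure_global_id_reclaim_allowed: bool = True


   def cluster_alerts(opts: MonOptions) -> Set[str]:
       """Item 1: warning raised while insecure reclaim is still allowed."""
       alerts = set()
       if (opts.auth_allow_insecure_global_id_reclaim
               and opts.mon_warn_on_insecure_global_id_reclaim_allowed):
           alerts.add("AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED")
       return alerts


   def unpatched_client(opts: MonOptions) -> str:
       """Items 1-3: what happens to a client that cannot reclaim securely."""
       if not opts.auth_allow_insecure_global_id_reclaim:
           return "cannot reconnect after a disruption or renew its ticket"
       if opts.auth_expose_insecure_global_id_reclaim:
           return "AUTH_INSECURE_GLOBAL_ID_RECLAIM raised immediately"
       return "AUTH_INSECURE_GLOBAL_ID_RECLAIM raised at next reconnect or renewal"


   def patched_client_at_risk(opts: MonOptions, outage_hours: float,
                              ttl_hours: float = NEW_TTL_HOURS) -> bool:
       """Item 4: once secure reclaim is required, an outage longer than the
       ticket TTL may leave even a patched client unable to reclaim."""
       secure_required = not opts.auth_allow_insecure_global_id_reclaim
       return secure_required and outage_hours > ttl_hours


   strict = MonOptions(auth_allow_insecure_global_id_reclaim=False)
   print(unpatched_client(MonOptions()))  # AUTH_INSECURE_GLOBAL_ID_RECLAIM raised immediately
   print(unpatched_client(strict))        # cannot reconnect after a disruption or renew its ticket
   print(patched_client_at_risk(strict, 24, ttl_hours=OLD_TTL_HOURS))  # True
   print(patched_client_at_risk(strict, 24))                           # False

The last two calls illustrate why the TTL was raised: with secure reclaim
required, a 24-hour outage would exceed the old 12-hour TTL but stays within
the new 72-hour default.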

Recommendations
---------------

#. Users should upgrade to a patched version of Ceph at their earliest
   convenience.

#. Users should upgrade any unpatched clients at their earliest
   convenience. By default, these clients can be easily identified by
   checking the ``ceph health detail`` output for the
   ``AUTH_INSECURE_GLOBAL_ID_RECLAIM`` alert.

#. If not all clients can be upgraded immediately, the health alerts can be
   temporarily muted with::

     ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w  # 1 week
     ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w  # 1 week

#. After all clients have been upgraded and the ``AUTH_INSECURE_GLOBAL_ID_RECLAIM``
   alert is no longer present, the cluster should be set to prevent insecure
   global_id reclaim with::

     ceph config set mon auth_allow_insecure_global_id_reclaim false
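
Before running the final command, the condition in the last recommendation can
be checked against machine-readable health output. This Python sketch assumes
that ``health`` is the parsed JSON from ``ceph health detail --format json`` and
that it contains a top-level ``checks`` object keyed by health-check code;
verify that layout against your release before relying on it.
``safe_to_require_secure_reclaim`` is a hypothetical helper name, and the sample
input is a trimmed, made-up illustration rather than real cluster output.

.. code-block:: python

   import json
   from typing import Any, Dict

   # Health-check code that must be absent before insecure reclaim is disabled.
   BLOCKING_CHECK = "AUTH_INSECURE_GLOBAL_ID_RECLAIM"


   def safe_to_require_secure_reclaim(health: Dict[str, Any]) -> bool:
       """Return True if no client is flagged for insecure global_id reclaim.

       ASSUMPTION: `health` is parsed `ceph health detail --format json`
       output with a top-level "checks" object keyed by health-check code.
       """
       checks = health.get("checks", {})
       # Exact key lookup: ..._RECLAIM_ALLOWED is a different, non-blocking alert.
       return BLOCKING_CHECK not in checks


   # Trimmed, made-up sample: only the "allowed" warning remains.
   sample = json.loads("""
   {
     "status": "HEALTH_WARN",
     "checks": {
       "AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED": {"severity": "HEALTH_WARN"}
     }
   }
   """)
   print(safe_to_require_secure_reclaim(sample))  # True

The lookup matches the code exactly, so the separate
``AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED`` alert, which remains raised while
insecure reclaim is enabled, does not block the change. A substring test would
wrongly treat that alert as evidence of an unpatched client.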