Handling a full Ceph file system
================================

When the usage of a RADOS cluster reaches its ``mon_osd_full_ratio``
(default 95% of capacity), the cluster is marked with the OSD full flag.
This flag causes most normal RADOS clients to pause all operations until
the condition is resolved (for example, by adding more capacity to the
cluster).

The file system has some special handling of the full flag, explained below.

Hammer and later
----------------

Since the hammer release, a full file system will lead to ENOSPC
results from:

* Data writes on the client
* Metadata operations other than deletes and truncates

Because the full condition may not be encountered until
data is flushed to disk (some time after a ``write`` call has already
returned successfully), the ENOSPC error may not be seen until the
application calls ``fsync`` or ``fclose`` (or equivalent) on the file handle.

Calling ``fsync`` reliably indicates whether the data made it to disk,
and returns an error if it did not.  ``fclose`` returns an error only if
buffered data happened to be flushed since the last write.  A successful
``fclose`` does not guarantee that the data made it to disk, and in a
full-space situation, buffered data may be discarded after an ``fclose``
if no space is available to persist it.

.. warning::
    If an application appears to be misbehaving on a full file system,
    check that it is performing ``fsync()`` calls as necessary to ensure
    data is on disk before proceeding.
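
The write-then-sync pattern described above can be sketched as follows.
This is a minimal illustration, not code from Ceph: the helper name
``write_durably`` is hypothetical, and it assumes the application wants to
treat ENOSPC as a recoverable condition rather than a fatal one. It uses
Python's ``os.write`` and ``os.fsync``, which wrap the corresponding system
calls.

```python
import errno
import os


def write_durably(path: str, data: bytes) -> bool:
    """Write data to path and fsync it (hypothetical helper).

    Returns True if fsync confirmed the data was persisted, and False
    if ENOSPC was reported at any step. Other errors propagate.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # A write can succeed here even when the cluster is full,
            # because the data may only be buffered at this point.
            written = os.write(fd, view)
            view = view[written:]
        # A deferred ENOSPC from flushing the data surfaces here.
        os.fsync(fd)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return False
        raise
    finally:
        os.close(fd)
```

A caller would check the return value before proceeding, for example before
deleting the source of a copy or acknowledging a request to a peer.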

Data writes may be cancelled by the client if they are in flight at the
time the OSD full flag is sent.  Clients update the ``osd_epoch_barrier``
when releasing capabilities on files affected by cancelled operations, in
order to ensure that these cancelled operations do not interfere with
subsequent access to the data objects by the MDS or other clients.  For
more on the epoch barrier mechanism, see :ref:`background_blocklisting_and_osd_epoch_barrier`.
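
As a rough mental model only, the barrier can be thought of as a
monotonically increasing OSD map epoch that a participant must have reached
before it touches the affected objects. The toy class below illustrates that
idea; the class and method names are hypothetical and this is not how the
Ceph client or MDS is implemented.

```python
class EpochBarrier:
    """Toy model of an OSD epoch barrier (illustrative only)."""

    def __init__(self) -> None:
        self.barrier = 0

    def raise_to(self, epoch: int) -> None:
        # The barrier only ever moves forward; older epochs are ignored.
        self.barrier = max(self.barrier, epoch)

    def may_access(self, osdmap_epoch: int) -> bool:
        # Access is safe only with an OSD map at least as new as the
        # barrier, so cancelled operations cannot race with new ones.
        return osdmap_epoch >= self.barrier
```

In this model, a client that cancelled writes at epoch 42 would call
``raise_to(42)``, and a participant holding only the epoch 41 map would have
to wait for a newer map before accessing the same objects.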

Legacy (pre-hammer) behavior
----------------------------

In versions of Ceph earlier than hammer, the MDS would ignore
the full status of the RADOS cluster, and any data writes from
clients would stall until the cluster ceased to be full.

There were two dangerous conditions to watch for with this behavior:

* If a client had pending writes to a file, then it was not possible
  for the client to release the file to the MDS for deletion: this could
  lead to difficulty clearing space on a full file system.
* If clients continued to create a large number of empty files, the
  resulting metadata writes from the MDS could lead to total exhaustion
  of space on the OSDs such that no further deletions could be performed.