ceph/doc/cephfs/disaster-recovery.rst
John Spray 82f9960162 doc/cephfs: make scary DR bits less prominent
I'm sure people will still find them, but let's at least
force people to click through one more time to get to the
commands that can damage your cluster.

Also, the ".. danger" directive at the top of the page
wasn't actually getting special formatting, so I changed
it to a ".. warning" which is red.

Signed-off-by: John Spray <john.spray@redhat.com>
2018-07-10 10:52:52 +01:00


Disaster recovery
=================

Metadata damage and repair
--------------------------

If a filesystem has inconsistent or missing metadata, it is considered
*damaged*. You may find out about damage from a health message, or in some
unfortunate cases from an assertion failure in a running MDS daemon.

Metadata damage can result either from data loss in the underlying RADOS
layer (e.g. multiple disk failures that lose all copies of a PG), or from
software bugs.

CephFS includes some tools that may be able to recover a damaged filesystem,
but using them safely requires a solid understanding of CephFS internals.
The documentation for these potentially dangerous operations is on a
separate page: :ref:`disaster-recovery-experts`.

Data pool damage (files affected by lost data PGs)
--------------------------------------------------

If a PG is lost in a *data* pool, then the filesystem will continue
to operate normally, but some parts of some files will simply
be missing (reads will return zeros).

Losing a data PG may affect many files. Because each file is split into many
objects, identifying which files are affected by the loss of particular PGs
requires a full scan over all object IDs that may exist within the size of
each file. This type of scan is useful for identifying which files need to be
restored from a backup.

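As a rough illustration of why this scan is expensive, the sketch below
enumerates the object names that could hold data for a single file. It
assumes the default simple layout (a stripe count of 1 and 4 MiB objects) and
the ``<inode in hex>.<object index as 8 hex digits>`` naming used for CephFS
data objects. ``object_names`` is a hypothetical helper written for this
example, not part of any Ceph tool.

.. code-block:: python

    import math


    def object_names(inode, file_size, object_size=4 * 1024 * 1024):
        """Yield the object names that may hold data for one file.

        Assumption: simple layout (stripe_count == 1), so byte offset N
        lives in object number N // object_size.
        """
        # Even an empty file is given one candidate object (index 0).
        count = max(1, math.ceil(file_size / object_size))
        for index in range(count):
            yield f"{inode:x}.{index:08x}"


    if __name__ == "__main__":
        # A 10 MiB file needs three 4 MiB objects.
        for name in object_names(0x10000000000, 10 * 1024 * 1024):
            print(name)

For a 10 MiB file with inode ``0x10000000000`` this yields three names,
``10000000000.00000000`` through ``10000000000.00000002``. Each name may map
to a different PG, so every one of them has to be checked against the lost
PGs.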

.. danger::

    The ``pg_files`` scan described below does not repair any metadata.
    When restoring a damaged file, *remove* it first and then replace it, so
    that the restored file gets a fresh inode. Do not overwrite damaged
    files in place.

If you know that objects have been lost from PGs, use the ``pg_files``
subcommand to scan for files that may have been damaged as a result:

::

    cephfs-data-scan pg_files <path> <pg id> [<pg id>...]

For example, if you have lost data from PGs 1.4 and 4.5, and you would like
to know which files under ``/home/bob`` might have been damaged:

::

    cephfs-data-scan pg_files /home/bob 1.4 4.5

The output will be a list of paths to potentially damaged files, one
per line.

Note that this command acts as a normal CephFS client to find all the
files in the filesystem and read their layouts, so the MDS must be
files in the filesystem and read their layouts, so the MDS must be
up and running.
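The remove-then-replace rule from the warning above can be sketched as a
small script that consumes the one-path-per-line output. This is only an
illustration under stated assumptions: that the reported paths are relative
to the filesystem root, that the filesystem is mounted at ``mount_point``,
and that ``backup_root`` holds a backup tree with the same layout. Both
function names and both directory parameters are hypothetical.

.. code-block:: python

    import os
    import shutil


    def parse_pg_files_output(text):
        """Return the non-empty lines of the scan output as a list of paths."""
        return [line for line in text.splitlines() if line]


    def restore_from_backup(cephfs_path, mount_point, backup_root):
        """Remove a damaged file, then copy its backup into place.

        Returns False (and touches nothing) when no backup copy exists.
        """
        relative = cephfs_path.lstrip("/")
        target = os.path.join(mount_point, relative)
        backup = os.path.join(backup_root, relative)
        if not os.path.isfile(backup):
            return False
        if os.path.lexists(target):
            # Remove first: the restored file must not reuse the old inode.
            os.remove(target)
        shutil.copy2(backup, target)
        return True

Removing the target before copying, rather than opening it for writing,
is what keeps the restore from overwriting the damaged file in place.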