Disaster recovery
=================

Metadata damage and repair
--------------------------

If a filesystem has inconsistent or missing metadata, it is considered
*damaged*. You may find out about damage from a health message, or in some
unfortunate cases from an assertion in a running MDS daemon.
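
One way to notice reported damage is to inspect the cluster health directly
(a minimal sketch; the exact wording of the damage-related health message
varies between Ceph releases):

::

    ceph status
    ceph health detail
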
Metadata damage can result either from data loss in the underlying RADOS
|
|
layer (e.g. multiple disk failures that lose all copies of a PG), or from
|
|
software bugs.
|
|
|
|
CephFS includes some tools that may be able to recover a damaged filesystem,
|
|
but to use them safely requires a solid understanding of CephFS internals.
|
|
The documentation for these potentially dangerous operations is on a
|
|
separate page: :ref:`disaster-recovery-experts`.
|
|
|
|
Data pool damage (files affected by lost data PGs)
--------------------------------------------------

If a PG is lost in a *data* pool, then the filesystem will continue
to operate normally, but some parts of some files will simply
be missing (reads will return zeros).

Losing a data PG may affect many files. Files are split into many objects,
so identifying which files are affected by loss of particular PGs requires
a full scan over all object IDs that may exist within the size of a file.
This type of scan may be useful for identifying which files require
restoring from a backup.

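As an illustration of why such a scan is needed: each file's data is striped
across RADOS objects named after the file's inode number and an object index,
so an individual object can be mapped to its PG, but there is no reverse
index from a PG back to files. The mount point, file path, and data pool name
below are assumptions for the example, not fixed values:

::

    # Hex inode number of a file on a CephFS mount (assumed at /mnt/cephfs);
    # its data objects are named <hex inode>.<hex object index>
    printf '%x\n' $(stat --format=%i /mnt/cephfs/home/bob/somefile)

    # Map one of the file's objects to a PG, assuming the data pool is
    # named "cephfs_data"
    ceph osd map cephfs_data 10000000abc.00000000
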
.. danger::

    This command does not repair any metadata, so when restoring files in
    this case you must *remove* the damaged file, and replace it in order
    to have a fresh inode. Do not overwrite damaged files in place.

If you know that objects have been lost from PGs, use the ``pg_files``
subcommand to scan for files that may have been damaged as a result:

::

    cephfs-data-scan pg_files <path> <pg id> [<pg id>...]

For example, if you have lost data from PGs 1.4 and 4.5, and you would like
to know which files under /home/bob might have been damaged:

::

    cephfs-data-scan pg_files /home/bob 1.4 4.5

The output will be a list of paths to potentially damaged files, one
per line.

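As a sketch of how that list might be used, assuming the reported paths are
valid on a local CephFS mount and that backups live under a hypothetical
``/backup`` tree mirroring the filesystem layout, each damaged file is
removed and then restored as a fresh copy, following the danger note above:

::

    cephfs-data-scan pg_files /home/bob 1.4 4.5 > damaged_files.txt

    while read -r path; do
        rm -f "$path"                    # remove the damaged file first
        cp -a "/backup$path" "$path"     # restore a fresh copy (new inode)
    done < damaged_files.txt
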
Note that this command acts as a normal CephFS client to find all the
files in the filesystem and read their layouts, so the MDS must be
up and running.

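Before starting a scan, it may be worth confirming that an active MDS is
available, for example:

::

    ceph mds stat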