.. _cephfs-disaster-recovery:

Disaster recovery
=================

Metadata damage and repair
--------------------------

If a file system has inconsistent or missing metadata, it is considered
*damaged*. You may find out about damage from a health message or, in some
unfortunate cases, from an assertion in a running MDS daemon.

Metadata damage can result either from data loss in the underlying RADOS
layer (e.g. multiple disk failures that lose all copies of a PG) or from
software bugs.

CephFS includes some tools that may be able to recover a damaged file system,
but using them safely requires a solid understanding of CephFS internals.
The documentation for these potentially dangerous operations is on a
separate page: :ref:`disaster-recovery-experts`.

Data pool damage (files affected by lost data PGs)
--------------------------------------------------

If a PG is lost in a *data* pool, then the file system will continue
to operate normally, but some parts of some files will simply
be missing (reads will return zeros).

Losing a data PG may affect many files. Files are split into many objects,
so identifying which files are affected by the loss of particular PGs
requires a full scan over all object IDs that may exist within the size of
a file. This type of scan may be useful for identifying which files need to
be restored from a backup.
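
To make the scan concrete: CephFS names each data object after the file's
inode number in hexadecimal, followed by the object index as eight hex digits
(for example ``10000000000.00000002``). The POSIX shell sketch below lists the
object names that may exist within a file's size. It assumes the default file
layout (stripe count 1, 4 MiB objects) and that the client reports the CephFS
inode number as ``st_ino``; ``list_objects`` and ``OBJECT_SIZE`` are
hypothetical names for illustration, not part of any Ceph tool.

```shell
#!/bin/sh
# Sketch only: list the RADOS object names that may back one CephFS file.
# ASSUMES the default file layout (stripe_count=1, object_size=stripe_unit=4 MiB)
# and that the inode number passed in is the CephFS inode number.
set -eu

OBJECT_SIZE=${OBJECT_SIZE:-4194304}  # hypothetical knob: layout object size in bytes

# list_objects INODE SIZE
#   INODE: the file's inode number in decimal
#   SIZE:  the file's size in bytes
# Prints one name per line: <inode in hex>.<object index as 8 hex digits>
list_objects() {
    ino=$1
    size=$2
    # Number of objects that fit within the file size: ceil(size / OBJECT_SIZE)
    count=$(( (size + OBJECT_SIZE - 1) / OBJECT_SIZE ))
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%x.%08x\n' "$ino" "$i"
        i=$(( i + 1 ))
    done
}
```

For a single file you might call
``list_objects "$(stat -c %i FILE)" "$(stat -c %s FILE)"`` (``stat -c`` is GNU
coreutils syntax) and pass each name to ``ceph osd map <data pool> <object>``
to see which PG it maps to. The ``pg_files`` subcommand automates this kind of
scan across a directory tree; the sketch only illustrates why the work grows
with file size.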

.. danger::

    The ``cephfs-data-scan pg_files`` command does not repair any metadata, so
    when restoring files in this case you must *remove* the damaged file and
    replace it, so that the replacement has a fresh inode. Do not overwrite
    damaged files in place.

If you know that objects have been lost from PGs, use the ``pg_files``
subcommand of ``cephfs-data-scan`` to scan for files that may have been
damaged as a result:

::

    cephfs-data-scan pg_files <path> <pg id> [<pg id>...]

For example, if you have lost data from PGs 1.4 and 4.5 and would like
to know which files under /home/bob might have been damaged:

::

    cephfs-data-scan pg_files /home/bob 1.4 4.5

The output will be a list of paths to potentially damaged files, one
per line.
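
Because damaged files must be removed and replaced rather than overwritten,
restoring them from a backup is a two-step operation per path. The POSIX shell
sketch below reads a saved ``pg_files`` output list and does this for each
path. It hypothetically assumes that the listed paths are relative to the file
system root, that the file system is mounted at ``$MOUNT``, and that an intact
copy lives at the same relative path under ``$BACKUP``;
``restore_listed_files`` is an illustrative name, not a Ceph tool.

```shell
#!/bin/sh
# Sketch only: restore files listed by `cephfs-data-scan pg_files` from a backup.
# ASSUMPTIONS (hypothetical, adjust to your setup):
#   - each listed path is relative to the file system root, e.g. /home/bob/a
#   - $MOUNT is where the file system is mounted, $BACKUP holds intact copies
set -eu

restore_listed_files() {
    list=$1  # file containing one path per line, as printed by pg_files
    while IFS= read -r path; do
        [ -n "$path" ] || continue  # skip blank lines
        if [ ! -f "$BACKUP$path" ]; then
            echo "no backup for $path, leaving it in place" >&2
            continue
        fi
        # Unlink the damaged file first: cp onto an existing file would
        # truncate and rewrite the same inode, i.e. overwrite it in place.
        rm -f -- "$MOUNT$path"
        # Copying to a now-absent path creates a new file with a fresh inode.
        cp -p -- "$BACKUP$path" "$MOUNT$path"
    done < "$list"
}
```

For example, save the scan output with
``cephfs-data-scan pg_files /home/bob 1.4 4.5 > damaged.txt``, set ``MOUNT``
and ``BACKUP``, then run ``restore_listed_files damaged.txt``. Paths without a
backup copy are reported on stderr and left untouched.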

Note that this command acts as a normal CephFS client to find all the
files in the file system and read their layouts, so the MDS must be
up and running.