doc/cephfs: recover file system after recovering monitor stores using OSDs

The steps are valid only to recover single active MDS file systems.

Partially-fixes: https://tracker.ceph.com/issues/51341
Signed-off-by: Ramana Raja <rraja@redhat.com>
parent 0509deb6a8
commit ffe5cfb687

@@ -170,6 +170,7 @@ Troubleshooting and Disaster Recovery

    Troubleshooting <troubleshooting>
    Disaster recovery <disaster-recovery>
    cephfs-journal-tool <cephfs-journal-tool>
    Recovering file system after monitor store loss <recover-fs-after-mon-store-loss>

.. raw:: html

doc/cephfs/recover-fs-after-mon-store-loss.rst (new file, 59 lines)
@@ -0,0 +1,59 @@

Recovering the file system after catastrophic Monitor store loss
================================================================

On rare occasions, all the monitor stores of a cluster may become corrupted
or lost. To recover the cluster in such a scenario, you need to rebuild the
monitor stores using the OSDs (see :ref:`mon-store-recovery-using-osds`) and
get the pools back intact (in the ``active+clean`` state). However, the
rebuilt monitor stores do not restore the file system maps ("FSMap"), so
additional steps are required to bring back the file system. The steps to
recover a file system with multiple active MDS daemons, or a cluster with
multiple file systems, have yet to be identified. Currently, only the steps
to recover a **single active MDS** file system, with no additional file
systems in the cluster, have been identified and tested. Briefly, the steps
are: stop the MDSs; recreate the FSMap with basic defaults; and allow the
MDSs to recover from the journal and metadata stored in the file system's
pools. The steps are described in more detail below.

First, stop all the MDSs of the cluster.
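
How to stop the MDS daemons depends on how the cluster was deployed, so the
commands below are only a rough sketch: on hosts whose daemons are managed
directly by systemd you might stop every MDS on that host via
``ceph-mds.target``, while on cephadm/orchestrator-managed clusters the
equivalent step is stopping the MDS service (the ``mds.<fs_name>`` service
name is an assumption and may differ in your deployment).

::

    # systemd-managed deployments: run on each host with an MDS daemon
    systemctl stop ceph-mds.target

    # cephadm/orchestrator-managed clusters (service name assumed)
    ceph orch stop mds.<fs_name>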

Verify that the MDSs have been stopped. Run the following command and check
that no active or standby MDS daemons are listed for the file system.

::

    ceph fs dump
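
As an aside, ``ceph mds stat`` prints a more compact summary of MDS daemon
states; it is merely a convenience, and the ``ceph fs dump`` output above
remains the authoritative check.

::

    ceph mds stat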

Recreate the file system using the recovered file system pools. The new FSMap
will have the file system's default settings. However, user-defined file
system settings such as ``standby_count_wanted``, ``required_client_features``,
and extra data pools are lost and need to be reapplied later.

::

    ceph fs new <fs_name> <metadata_pool> <data_pool> --force
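
Purely for illustration, assuming the file system is named ``cephfs`` and its
recovered pools are ``cephfs_metadata`` and ``cephfs_data`` (all three names
are hypothetical), the command would be:

::

    ceph fs new cephfs cephfs_metadata cephfs_data --force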

The file system cluster ID (fscid) of the file system will not be preserved.
This behaviour may not be desirable for certain applications (e.g., Ceph CSI)
that expect the file system to be unchanged across recovery. To fix this, pass
the desired fscid when recreating the file system.

::

    ceph fs new <fs_name> <metadata_pool> <data_pool> --fscid <fscid> --force
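
Continuing the hypothetical example, if the file system previously had fscid
``1`` (an assumed value; use the cluster's original fscid), the invocation
becomes:

::

    ceph fs new cephfs cephfs_metadata cephfs_data --fscid 1 --force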

Next, reset the file system. The command below marks the state of the file
system's rank 0 such that, when an MDS daemon eventually picks up rank 0, the
daemon reads the existing in-RADOS metadata and doesn't overwrite it.

::

    ceph fs reset <fs_name> --yes-i-really-mean-it
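
Optionally, before any MDS is restarted, the rank 0 journal can be given a
quick readability check with ``cephfs-journal-tool``. This is not part of the
recovery steps above, just an optional sanity check that assumes the metadata
pool is healthy.

::

    cephfs-journal-tool --rank=<fs_name>:0 journal inspect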

Restart the MDSs.
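
As with stopping them, how the MDS daemons are restarted depends on the
deployment; a minimal sketch mirroring the earlier stop commands (service
names are assumptions):

::

    # systemd-managed deployments: run on each host with an MDS daemon
    systemctl start ceph-mds.target

    # cephadm/orchestrator-managed clusters (service name assumed)
    ceph orch start mds.<fs_name>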

Check that the file system is no longer in a degraded state and that one of
the MDSs is active.

::

    ceph fs dump

Reapply any other custom file system settings.
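
What needs to be reapplied depends entirely on the original configuration.
Purely as an illustration for the hypothetical ``cephfs`` file system used
above, restoring a standby count, an extra data pool, and a required client
feature might look like the following (the values, pool name, and feature
choice are assumptions):

::

    ceph fs set cephfs standby_count_wanted 1
    ceph fs add_data_pool cephfs cephfs_data_extra
    ceph fs required_client_features cephfs add reply_encoding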

@@ -433,6 +433,8 @@ If there are any survivors, we can always :ref:`replace <adding-and-removing-mon

new one. After booting up, the new joiner will sync up with a healthy
peer, and once it is fully sync'ed, it will be able to serve the clients.

.. _mon-store-recovery-using-osds:

Recovery using OSDs
-------------------