From 0a5032138b8bb5eb0c13ea7234025e65d862b124 Mon Sep 17 00:00:00 2001
From: Venky Shankar
Date: Mon, 16 Sep 2019 14:56:22 +0530
Subject: [PATCH] doc: document mds journaling

Fixes: http://tracker.ceph.com/issues/41783
Signed-off-by: Venky Shankar
---
 doc/cephfs/index.rst          |  1 +
 doc/cephfs/mds-journaling.rst | 57 +++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)
 create mode 100644 doc/cephfs/mds-journaling.rst

diff --git a/doc/cephfs/index.rst b/doc/cephfs/index.rst
index 0f36ca1cb4f..d301b94d2a0 100644
--- a/doc/cephfs/index.rst
+++ b/doc/cephfs/index.rst
@@ -104,6 +104,7 @@ authentication keyring.
     Experimental Features
     CephFS Quotas
     Using Ceph with Hadoop
+    MDS Journaling
     cephfs-journal-tool
     File layouts
     Client eviction

diff --git a/doc/cephfs/mds-journaling.rst b/doc/cephfs/mds-journaling.rst
new file mode 100644
index 00000000000..c13bf43c706
--- /dev/null
+++ b/doc/cephfs/mds-journaling.rst
@@ -0,0 +1,57 @@
+MDS Journaling
+==============
+
+CephFS Metadata Pool
+--------------------
+
+CephFS uses a separate (metadata) pool for managing file metadata (inodes and
+dentries) in a Ceph File System. The metadata pool contains all the information
+about files in a Ceph File System, including the file system hierarchy.
+Additionally, CephFS maintains metadata related to other entities in a file
+system, such as file system journals, the open file table, the session map, etc.
+
+This document describes how Ceph Metadata Servers use and rely on journaling.
+
+CephFS MDS Journaling
+---------------------
+
+CephFS metadata servers stream a journal of metadata events into RADOS in the
+metadata pool prior to executing a file system operation. Active MDS daemon(s)
+manage metadata for files and directories in CephFS.
+
+CephFS uses journaling for a couple of reasons:
+
+#. Consistency: On an MDS failover, the journal events can be replayed to reach
+   a consistent file system state. Also, metadata operations that require
+   multiple updates to the backing store need to be journaled for crash
+   consistency (along with other consistency mechanisms such as locking, etc.).
+
+#. Performance: Journal updates are (mostly) sequential, hence updates to
+   journals are fast. Furthermore, updates can be batched into a single write,
+   thereby saving the disk seek time involved in updates to different parts of
+   a file. Having a large journal also helps a standby MDS warm its cache,
+   which indirectly helps during MDS failover.
+
+Each active metadata server maintains its own journal in the metadata pool.
+Journals are striped over multiple objects. Journal entries which are no longer
+required (deemed old) are trimmed by the metadata server.
+
+Journal Events
+--------------
+
+Apart from journaling file system metadata updates, CephFS journals various
+other events such as client session info and directory import/export state, to
+name a few. These events are used by the metadata server to reestablish correct
+state as required. For example, when a Ceph MDS restarts and replays its
+journal, a specific event type records that a client held a session with the
+MDS before the restart, and the MDS tries to reconnect that client.
+
+To examine the list of such events recorded in the journal, CephFS provides a
+command line utility `cephfs-journal-tool` which can be used as follows:
+
+::
+
+    cephfs-journal-tool --rank=<fs_name>:<mds_rank> event get list
+
+`cephfs-journal-tool` is also used to discover and repair a damaged Ceph File
+System. (See :doc:`/cephfs/cephfs-journal-tool` for more details.)
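+
+As a further illustration, a minimal sketch of examining a single rank's
+journal is shown below; the file system name ``cephfs`` and rank ``0`` are
+assumptions made for the example, not defaults, so substitute the values for
+the file system being examined:
+
+::
+
+    # example file system name (cephfs) and rank (0); adjust for your deployment
+    cephfs-journal-tool --rank=cephfs:0 journal inspect
+    cephfs-journal-tool --rank=cephfs:0 header get
+    cephfs-journal-tool --rank=cephfs:0 event get summary
+
+The first command checks the journal's integrity, the second prints the journal
+header (including its trimmed and write positions), and the third summarizes
+the recorded event types.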