================================== CephFS Dynamic Metadata Management ================================== Metadata operations usually take up more than 50 percent of all file system operations. Also the metadata scales in a more complex fashion when compared to scaling storage (which in turn scales I/O throughput linearly). This is due to the hierarchical and interdependent nature of the file system metadata. So in CephFS, the metadata workload is decoupled from data workload so as to avoid placing unnecessary strain on the RADOS cluster. The metadata is hence handled by a cluster of Metadata Servers (MDSs). CephFS distributes metadata across MDSs via `Dynamic Subtree Partitioning `__. Dynamic Subtree Partitioning ---------------------------- In traditional subtree partitioning, subtrees of the file system hierarchy are assigned to individual MDSs. This metadata distribution strategy provides good hierarchical locality, linear growth of cache and horizontal scaling across MDSs and a fairly good distribution of metadata across MDSs. .. image:: subtree-partitioning.svg The problem with traditional subtree partitioning is that the workload growth by depth (across a single MDS) leads to a hotspot of activity. This results in lack of vertical scaling and wastage of non-busy resources/MDSs. This led to the adoption of a more dynamic way of handling metadata: Dynamic Subtree Partitioning, where load intensive portions of the directory hierarchy from busy MDSs are migrated to non busy MDSs. This strategy ensures that activity hotspots are relieved as they appear and so leads to vertical scaling of the metadata workload in addition to horizontal scaling.