mirror of
https://github.com/ceph/ceph
synced 2025-01-26 04:55:30 +00:00
35 lines
1.7 KiB
ReStructuredText
35 lines
1.7 KiB
ReStructuredText
|
==================================
|
||
|
CephFS Dynamic Metadata Management
|
||
|
==================================
|
||
|
Metadata operations usually take up more than 50 percent of all
|
||
|
file system operations. Also the metadata scales in a more complex
|
||
|
fashion when compared to scaling storage (which in turn scales I/O
|
||
|
throughput linearly). This is due to the hierarchical and
|
||
|
interdependent nature of the file system metadata. So in CephFS,
|
||
|
the metadata workload is decoupled from data workload so as to
|
||
|
avoid placing unnecessary strain on the RADOS cluster. The metadata
|
||
|
is hence handled by a cluster of Metadata Servers (MDSs).
|
||
|
CephFS distributes metadata across MDSs via `Dynamic Subtree Partitioning <https://ceph.com/wp-content/uploads/2016/08/weil-mds-sc04.pdf>`__.
|
||
|
|
||
|
Dynamic Subtree Partitioning
|
||
|
----------------------------
|
||
|
In traditional subtree partitioning, subtrees of the file system
|
||
|
hierarchy are assigned to individual MDSs. This metadata distribution
|
||
|
strategy provides good hierarchical locality, linear growth of
|
||
|
cache and horizontal scaling across MDSs and a fairly good distribution
|
||
|
of metadata across MDSs.
|
||
|
|
||
|
.. image:: subtree-partitioning.svg
|
||
|
|
||
|
The problem with traditional subtree partitioning is that the workload
|
||
|
growth by depth (across a single MDS) leads to a hotspot of activity.
|
||
|
This results in lack of vertical scaling and wastage of non-busy resources/MDSs.
|
||
|
|
||
|
This led to the adoption of a more dynamic way of handling
|
||
|
metadata: Dynamic Subtree Partitioning, where load intensive portions
|
||
|
of the directory hierarchy from busy MDSs are migrated to non busy MDSs.
|
||
|
|
||
|
This strategy ensures that activity hotspots are relieved as they
|
||
|
appear and so leads to vertical scaling of the metadata workload in
|
||
|
addition to horizontal scaling.
|