mirror of https://github.com/ceph/ceph
142 lines
6.0 KiB
ReStructuredText
142 lines
6.0 KiB
ReStructuredText
=========================
|
|
Create a Ceph file system
|
|
=========================
|
|
|
|
Creating pools
|
|
==============
|
|
|
|
A Ceph file system requires at least two RADOS pools, one for data and one for metadata.
|
|
There are important considerations when planning these pools:
|
|
|
|
- We recommend configuring *at least* 3 replicas for the metadata pool,
|
|
as data loss in this pool can render the entire file system inaccessible.
|
|
Configuring 4 would not be extreme, especially since the metadata pool's
|
|
capacity requirements are quite modest.
|
|
- We recommend the fastest feasible low-latency storage devices (NVMe, Optane,
|
|
or at the very least SAS/SATA SSD) for the metadata pool, as this will
|
|
directly affect the latency of client file system operations.
|
|
- We strongly suggest that the CephFS metadata pool be provisioned on dedicated
|
|
SSD / NVMe OSDs. This ensures that high client workload does not adversely
|
|
impact metadata operations. See :ref:`device_classes` to configure pools this
|
|
way.
|
|
- The data pool used to create the file system is the "default" data pool and
|
|
the location for storing all inode backtrace information, which is used for hard link
|
|
management and disaster recovery. For this reason, all CephFS inodes
|
|
have at least one object in the default data pool. If erasure-coded
|
|
pools are planned for file system data, it is best to configure the default as
|
|
a replicated pool to improve small-object write and
|
|
read performance when updating backtraces. Separately, another erasure-coded
|
|
data pool can be added (see also :ref:`ecpool`) that can be used on an entire
|
|
hierarchy of directories and files (see also :ref:`file-layouts`).
|
|
|
|
Refer to :doc:`/rados/operations/pools` to learn more about managing pools. For
|
|
example, to create two pools with default settings for use with a file system, you
|
|
might run the following commands:
|
|
|
|
.. code:: bash
|
|
|
|
$ ceph osd pool create cephfs_data
|
|
$ ceph osd pool create cephfs_metadata
|
|
|
|
The metadata pool will typically hold at most a few gigabytes of data. For
|
|
this reason, a smaller PG count is usually recommended. 64 or 128 is commonly
|
|
used in practice for large clusters.
|
|
|
|
.. note:: The names of the file systems, metadata pools, and data pools can
|
|
only have characters in the set [a-zA-Z0-9\_-.].
|
|
|
|
Creating a file system
|
|
======================
|
|
|
|
Once the pools are created, you may enable the file system using the ``fs new`` command:
|
|
|
|
.. code:: bash
|
|
|
|
$ ceph fs new <fs_name> <metadata> <data> [--force] [--allow-dangerous-metadata-overlay] [<fscid:int>] [--recover] [--yes-i-really-really-mean-it] [<set>...]
|
|
|
|
This command creates a new file system with specified metadata and data pool.
|
|
The specified data pool is the default data pool and cannot be changed once set.
|
|
Each file system has its own set of MDS daemons assigned to ranks so ensure that
|
|
you have sufficient standby daemons available to accommodate the new file system.
|
|
|
|
.. note::
|
|
``--yes-i-really-really-mean-it`` may be used for some ``fs set`` commands
|
|
|
|
The ``--force`` option is used to achieve any of the following:
|
|
|
|
- To set an erasure-coded pool for the default data pool. Use of an EC pool for the
|
|
default data pool is discouraged. Refer to `Creating pools`_ for details.
|
|
- To set non-empty pool (pool already contains some objects) for the metadata pool.
|
|
- To create a file system with a specific file system's ID (fscid).
|
|
The --force option is required with --fscid option.
|
|
|
|
The ``--allow-dangerous-metadata-overlay`` option permits the reuse metadata and
|
|
data pools if it is already in-use. This should only be done in emergencies and
|
|
after careful reading of the documentation.
|
|
|
|
If the ``--fscid`` option is provided then this creates a file system with a
|
|
specific fscid. This can be used when an application expects the file system's ID
|
|
to be stable after it has been recovered, e.g., after monitor databases are
|
|
lost and rebuilt. Consequently, file system IDs don't always keep increasing
|
|
with newer file systems.
|
|
|
|
The ``--recover`` option sets the state of file system's rank 0 to existing but
|
|
failed. So when a MDS daemon eventually picks up rank 0, the daemon reads the
|
|
existing in-RADOS metadata and doesn't overwrite it. The flag also prevents the
|
|
standby MDS daemons to join the file system.
|
|
|
|
The ``set`` option allows to set multiple options supported by ``fs set``
|
|
atomically with the creation of the file system.
|
|
|
|
For example:
|
|
|
|
.. code:: bash
|
|
|
|
$ ceph fs new cephfs cephfs_metadata cephfs_data set max_mds 2 allow_standby_replay true
|
|
$ ceph fs ls
|
|
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
|
|
|
|
Once a file system has been created, your MDS(s) will be able to enter
|
|
an *active* state. For example, in a single MDS system:
|
|
|
|
.. code:: bash
|
|
|
|
$ ceph mds stat
|
|
cephfs-1/1/1 up {0=a=up:active}
|
|
|
|
Once the file system is created and the MDS is active, you are ready to mount
|
|
the file system. If you have created more than one file system, you will
|
|
choose which to use when mounting.
|
|
|
|
- `Mount CephFS`_
|
|
- `Mount CephFS as FUSE`_
|
|
- `Mount CephFS on Windows`_
|
|
|
|
.. _Mount CephFS: ../../cephfs/mount-using-kernel-driver
|
|
.. _Mount CephFS as FUSE: ../../cephfs/mount-using-fuse
|
|
.. _Mount CephFS on Windows: ../../cephfs/ceph-dokan
|
|
|
|
If you have created more than one file system, and a client does not
|
|
specify a file system when mounting, you can control which file system
|
|
they will see by using the ``ceph fs set-default`` command.
|
|
|
|
Adding a Data Pool to the File System
|
|
-------------------------------------
|
|
|
|
See :ref:`adding-data-pool-to-file-system`.
|
|
|
|
|
|
Using Erasure Coded pools with CephFS
|
|
=====================================
|
|
|
|
You may use Erasure Coded pools as CephFS data pools as long as they have overwrites enabled, which is done as follows:
|
|
|
|
.. code:: bash
|
|
|
|
ceph osd pool set my_ec_pool allow_ec_overwrites true
|
|
|
|
Note that EC overwrites are only supported when using OSDs with the BlueStore backend.
|
|
|
|
You may not use Erasure Coded pools as CephFS metadata pools, because CephFS metadata is stored using RADOS *OMAP* data structures, which EC pools cannot store.
|
|
|