==================
 Configuring Ceph
==================

When you start the Ceph service, the initialization process activates a series
of daemons that run in the background. The hosts in a typical Ceph cluster run
at least one of four daemons:

- Object Storage Device (``ceph-osd``)
- Monitor (``ceph-mon``)
- Metadata Server (``ceph-mds``)
- Ceph Gateway (``radosgw``)

For your convenience, each daemon has a series of default values (many are set
by ``ceph/src/common/config_opts.h``). You may override these settings with a
Ceph configuration file.

.. _ceph-conf-file:

The ceph.conf File
==================

When you start a Ceph cluster, each daemon looks for a ``ceph.conf`` file that
provides its configuration settings. For manual deployments, you need to create
a ``ceph.conf`` file to configure your cluster. For third-party tools that
create configuration files for you (*e.g.*, Chef), you may use the information
contained herein as a reference. The ``ceph.conf`` file defines:

- Cluster membership
- Host names
- Host addresses
- Paths to keyrings
- Paths to journals
- Paths to data
- Other runtime options

The default ``ceph.conf`` locations, in sequential order, are:

#. ``$CEPH_CONF`` (*i.e.,* the path following the ``$CEPH_CONF`` environment
   variable)
#. ``-c path/path`` (*i.e.,* the ``-c`` command line argument)
#. ``/etc/ceph/ceph.conf``
#. ``~/.ceph/config``
#. ``./ceph.conf`` (*i.e.,* in the current working directory)

The ``ceph.conf`` file uses an *ini* style syntax. You can add comments to the
``ceph.conf`` file by preceding them with a semicolon (;) or a pound sign (#).
For example:

.. code-block:: ini

    # <--A number (#) sign precedes a comment.
    ; A comment may be anything.
    # Comments always follow a semicolon (;) or a pound sign (#) on each line.
    # The end of the line terminates a comment.
    # We recommend that you provide comments in your configuration file(s).

.. _ceph-conf-settings:

ceph.conf Settings
==================

The ``ceph.conf`` file can configure all daemons in a cluster, or all daemons of
a particular type. To configure a series of daemons, place the settings under
the section for the processes that will receive the configuration, as follows:

``[global]``

:Description: Settings under ``[global]`` affect all daemons in a Ceph cluster.
:Example: ``auth supported = cephx``

``[osd]``

:Description: Settings under ``[osd]`` affect all ``ceph-osd`` daemons in the cluster.
:Example: ``osd journal size = 1000``

``[mon]``

:Description: Settings under ``[mon]`` affect all ``ceph-mon`` daemons in the cluster.
:Example: ``mon addr = 10.0.0.101:6789``

``[mds]``

:Description: Settings under ``[mds]`` affect all ``ceph-mds`` daemons in the cluster.
:Example: ``host = myserver01``

Global settings affect all instances of all daemons in the cluster. Use the
``[global]`` section for values that are common to all daemons in the cluster.
You can override each ``[global]`` setting by:

#. Changing the setting in a particular process type (*e.g.,* ``[osd]``,
   ``[mon]``, ``[mds]``).
#. Changing the setting in a particular process (*e.g.,* ``[osd.1]``).

Overriding a global setting affects all child processes, except those that you
specifically override.

A typical global setting involves activating authentication. For example:

.. code-block:: ini

    [global]
        # Enable authentication between hosts within the cluster.
        auth supported = cephx

You can specify settings that apply to a particular type of daemon. When you
specify settings under ``[osd]``, ``[mon]`` or ``[mds]`` without specifying a
particular instance, the setting applies to all OSDs, monitors or metadata
daemons respectively.

You may specify settings for particular instances of a daemon. You specify an
instance by entering its type, followed by a period (.) and the instance ID.
The instance ID for an OSD is always numeric, but it may be alphanumeric for
monitors and metadata servers.

.. code-block:: ini

    [osd.1]
        # settings affect osd.1 only.

    [mon.a]
        # settings affect mon.a only.

    [mds.b]
        # settings affect mds.b only.

.. _ceph-metavariables:

Metavariables
=============

Metavariables simplify cluster configuration dramatically. When a metavariable
is set in a configuration value, Ceph expands the metavariable into a concrete
value. Metavariables are very powerful when used within the ``[global]``,
``[osd]``, ``[mon]`` or ``[mds]`` sections of your configuration file. Ceph
metavariables are similar to Bash shell expansion.

Ceph supports the following metavariables:

``$cluster``

:Description: Expands to the cluster name. Useful when running multiple clusters on the same hardware.
:Example: ``/etc/ceph/$cluster.keyring``
:Default: ``ceph``

``$type``

:Description: Expands to one of ``mds``, ``osd``, or ``mon``, depending on the type of the current daemon.
:Example: ``/var/lib/ceph/$type``

``$id``

:Description: Expands to the daemon identifier. For ``osd.0``, this would be ``0``; for ``mds.a``, it would be ``a``.
:Example: ``/var/lib/ceph/$type/$cluster-$id``

``$host``

:Description: Expands to the host name of the current daemon.

``$name``

:Description: Expands to ``$type.$id``.
:Example: ``/var/run/ceph/$cluster-$name.asok``

.. _ceph-conf-common-settings:

Common Settings
===============

The `Hardware Recommendations`_ section provides some hardware guidelines for
configuring the cluster. It is possible for a single host to run multiple
daemons. For example, a single host with multiple disks or RAIDs may run one
``ceph-osd`` for each disk or RAID. Additionally, a single host may run both a
``ceph-mon`` and a ``ceph-osd`` daemon. Ideally, you will dedicate each host to
a particular type of process. For example, one host may run ``ceph-osd``
daemons, another host may run a ``ceph-mds`` daemon, and other hosts may run
``ceph-mon`` daemons.
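
As a minimal sketch of the multiple-daemon case, the fragment below places one
monitor and two OSDs on the same host. The host name ``node1`` is hypothetical,
and the monitor address is only an example value; substitute your own.

.. code-block:: ini

    ; Hypothetical host "node1" running one monitor and two OSDs.
    [mon.a]
        host = node1
        mon addr = 10.0.0.101:6789

    [osd.0]
        host = node1

    [osd.1]
        host = node1
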
Each host has a name identified by the ``host`` setting. Monitors also specify
a network address and port (i.e., domain name or IP address) identified by the
``mon addr`` setting. A basic configuration file will typically specify only
minimal settings for each instance of a daemon. For example:

.. code-block:: ini

    [mon.a]
        host = hostName
        mon addr = 150.140.130.120:6789

    [osd.0]
        host = hostName

.. _Hardware Recommendations: ../../install/hardware-recommendations

.. _ceph-network-config:

Networks
========

Monitors listen on port 6789 by default, while metadata servers and OSDs listen
on the first available port beginning at 6800. Ensure that you open port 6789 on
hosts that run a monitor daemon, and open one port beginning at port 6800 for
each OSD or metadata server that runs on the host. Ports are host-specific, so
you don't need to open more ports than the number of daemons running on that
host, other than potentially a few spares. You may consider opening a few
additional ports in case a daemon fails and restarts without releasing its
port, so that the restarted daemon can bind to a new port. If you set up
separate public and cluster networks, you may need to make entries for each
network. For example::

    iptables -A INPUT -m multiport -p tcp -s {ip-address}/{netmask} --dports 6789,6800:6810 -j ACCEPT

In our `hardware recommendations`_ section, we recommend having at least two
NICs, because Ceph can support two networks: a public (front-side) network, and
a cluster (back-side) network. Ceph functions just fine with a public network
only. You only need to specify the public and cluster network settings if you
use both public and cluster networks.

There are several reasons to consider operating two separate networks. First,
OSDs handle data replication for the clients. When OSDs replicate data more than
once, the network load between OSDs easily dwarfs the network load between
clients and the Ceph cluster.
This can introduce latency and create a performance problem. Second, while most
people are generally civil, a very tiny segment of the population likes to
engage in what's known as a Denial of Service (DoS) attack. When traffic between
OSDs gets disrupted, placement groups may no longer reflect an
``active + clean`` state, which may prevent users from reading and writing data.
A great way to defeat this type of attack is to maintain a completely separate
cluster network that doesn't connect directly to the internet.

To configure the networks, add the following options to the ``[global]`` section
of your ``ceph.conf`` file.

.. code-block:: ini

    [global]
        public network = {public-network-ip-address/netmask}
        cluster network = {cluster-network-ip-address/netmask}

To configure Ceph hosts to use the networks, you should set the following
options in the daemon instance sections of your ``ceph.conf`` file.

.. code-block:: ini

    [osd.0]
        public addr = {host-public-ip-address}
        cluster addr = {host-cluster-ip-address}

.. _hardware recommendations: ../../install/hardware-recommendations

.. _ceph-monitor-config:

Monitors
========

Ceph production clusters typically deploy with a minimum of three monitors to
ensure high availability should a monitor instance crash. An odd number of
monitors (3) ensures that the Paxos algorithm can determine which version of the
cluster map is the most recent from a quorum of monitors.

.. note:: You may deploy Ceph with a single monitor, but if the instance fails,
   the lack of a monitor may interrupt data service availability.

Ceph monitors typically listen on port ``6789``. For example:

.. code-block:: ini

    [mon.a]
        host = hostName
        mon addr = 150.140.130.120:6789

By default, Ceph expects that you will store a monitor's data under the
following path::

    /var/lib/ceph/mon/$cluster-$id

You must create the corresponding directory yourself.
With metavariables fully expressed and a cluster named "ceph", the foregoing
directory would evaluate to::

    /var/lib/ceph/mon/ceph-a

You may override this path using the ``mon data`` setting. We don't recommend
changing the default location. Create the default directory on your new monitor
host. ::

    ssh {new-mon-host}
    sudo mkdir /var/lib/ceph/mon/ceph-{mon-letter}

.. _ceph-osd-config:

OSDs
====

Ceph production clusters typically deploy OSDs where one host has one OSD
daemon running a filestore on one data disk. A typical deployment specifies a
journal size and whether the file store's extended attributes (XATTRs) use an
object map (i.e., when running on the ``ext4`` filesystem). For example:

.. code-block:: ini

    [osd]
        osd journal size = 10000
        # Enables the object map. Use only when running on ext4.
        filestore xattr use omap = true

    [osd.0]
        host = {hostname}

By default, Ceph expects that you will store an OSD's data under the following
path::

    /var/lib/ceph/osd/$cluster-$id

You must create the corresponding directory yourself. With metavariables fully
expressed and a cluster named "ceph", the foregoing directory would evaluate
to::

    /var/lib/ceph/osd/ceph-0

You may override this path using the ``osd data`` setting. We don't recommend
changing the default location. Create the default directory on your new OSD
host. ::

    ssh {new-osd-host}
    sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}

The ``osd data`` path ideally leads to a mount point on a hard disk that is
separate from the hard disk storing and running the operating system and
daemons. If the OSD is for a disk other than the OS disk, prepare it for use
with Ceph, and mount it to the directory you just created::

    ssh {new-osd-host}
    sudo mkfs -t {fstype} /dev/{disk}
    sudo mount -o user_xattr /dev/{disk} /var/lib/ceph/osd/ceph-{osd-number}

We recommend using the ``xfs`` file system or the ``btrfs`` file system when
running :command:`mkfs`.
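
For illustration only, the default OSD data path described earlier in this
section can be written out explicitly with metavariables. This sketch assumes
you place the setting under ``[osd]`` so that it applies to every OSD; because
the value matches the default, it changes nothing and only shows the syntax.

.. code-block:: ini

    [osd]
        ; Same value as the default; shown only to illustrate the syntax.
        ; For osd.0 in a cluster named "ceph" this expands to
        ; /var/lib/ceph/osd/ceph-0
        osd data = /var/lib/ceph/osd/$cluster-$id
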
By default, Ceph expects that you will store an OSD's journal under the
following path::

    /var/lib/ceph/osd/$cluster-$id/journal

Without performance optimization, Ceph stores the journal on the same disk as
the OSD's data. An OSD optimized for performance may use a separate disk to
store journal data (e.g., a solid state drive delivers high performance
journaling).

Ceph's default ``osd journal size`` is 0, so you will need to set this in your
``ceph.conf`` file. To size a journal, find the product of the
``filestore min sync interval`` and the expected throughput, and multiply the
product by two (2)::

    osd journal size = {2 * (expected throughput * filestore min sync interval)}

The expected throughput number should include the expected disk throughput
(i.e., sustained data transfer rate), and network throughput. For example,
a 7200 RPM disk will likely sustain approximately 100 MB/s. Taking the ``min()``
of the disk and network throughput should provide a reasonable expected
throughput. Some users just start off with a 10GB journal size (the setting is
expressed in megabytes). For example::

    osd journal size = 10000

.. _ceph-logging-and-debugging:

Logs / Debugging
================

Ceph is still on the leading edge, so you may encounter situations that require
modifying logging output and using Ceph's debugging. To activate Ceph's
debugging output (*i.e.*, ``dout()``), you may add ``debug`` settings to your
configuration. Ceph's logging levels operate on a scale of 1 to 20, where 1 is
terse and 20 is verbose. Subsystems common to each daemon may be set under
``[global]`` in your configuration file. Subsystems for particular daemons are
set under the daemon section in your configuration file (*e.g.*, ``[mon]``,
``[osd]``, ``[mds]``).
For example::

    [global]
        debug ms = 1

    [mon]
        debug mon = 20
        debug paxos = 20
        debug auth = 20

    [osd]
        debug osd = 20
        debug filestore = 20
        debug journal = 20
        debug monc = 20

    [mds]
        debug mds = 20
        debug mds balancer = 20
        debug mds log = 20
        debug mds migrator = 20

When your system is running well, choose appropriate logging levels and remove
unnecessary debugging settings to ensure your cluster runs optimally. Logging
debug output messages is relatively slow, and a waste of resources when
operating your cluster.

.. tip:: When debug output slows down your system, the latency can hide race
   conditions.

Each subsystem has a logging level for its output logs, and for its logs
in-memory. You may set different values for each of these subsystems by setting
a log file level and a memory level for debug logging. For example::

    debug {subsystem} = {log-level}/{memory-level}
    # for example
    debug mds log = 1/20

+--------------------+-----------+--------------+
| Subsystem          | Log Level | Memory Level |
+====================+===========+==============+
| ``default``        | 0         | 5            |
+--------------------+-----------+--------------+
| ``lockdep``        | 0         | 5            |
+--------------------+-----------+--------------+
| ``context``        | 0         | 5            |
+--------------------+-----------+--------------+
| ``crush``          | 1         | 5            |
+--------------------+-----------+--------------+
| ``mds``            | 1         | 5            |
+--------------------+-----------+--------------+
| ``mds balancer``   | 1         | 5            |
+--------------------+-----------+--------------+
| ``mds locker``     | 1         | 5            |
+--------------------+-----------+--------------+
| ``mds log``        | 1         | 5            |
+--------------------+-----------+--------------+
| ``mds log expire`` | 1         | 5            |
+--------------------+-----------+--------------+
| ``mds migrator``   | 1         | 5            |
+--------------------+-----------+--------------+
| ``buffer``         | 0         | 0            |
+--------------------+-----------+--------------+
| ``timer``          | 0         | 5            |
+--------------------+-----------+--------------+
| ``filer``          | 0         | 5            |
+--------------------+-----------+--------------+
| ``objecter``       | 0         | 0            |
+--------------------+-----------+--------------+
| ``rados``          | 0         | 5            |
+--------------------+-----------+--------------+
| ``rbd``            | 0         | 5            |
+--------------------+-----------+--------------+
| ``journaler``      | 0         | 5            |
+--------------------+-----------+--------------+
| ``objectcacher``   | 0         | 5            |
+--------------------+-----------+--------------+
| ``client``         | 0         | 5            |
+--------------------+-----------+--------------+
| ``osd``            | 0         | 5            |
+--------------------+-----------+--------------+
| ``optracker``      | 0         | 5            |
+--------------------+-----------+--------------+
| ``objclass``       | 0         | 5            |
+--------------------+-----------+--------------+
| ``filestore``      | 1         | 5            |
+--------------------+-----------+--------------+
| ``journal``        | 1         | 5            |
+--------------------+-----------+--------------+
| ``ms``             | 0         | 5            |
+--------------------+-----------+--------------+
| ``mon``            | 1         | 5            |
+--------------------+-----------+--------------+
| ``monc``           | 0         | 5            |
+--------------------+-----------+--------------+
| ``paxos``          | 0         | 5            |
+--------------------+-----------+--------------+
| ``tp``             | 0         | 5            |
+--------------------+-----------+--------------+
| ``auth``           | 1         | 5            |
+--------------------+-----------+--------------+
| ``finisher``       | 1         | 5            |
+--------------------+-----------+--------------+
| ``heartbeatmap``   | 1         | 5            |
+--------------------+-----------+--------------+
| ``perfcounter``    | 1         | 5            |
+--------------------+-----------+--------------+
| ``rgw``            | 1         | 5            |
+--------------------+-----------+--------------+
| ``hadoop``         | 1         | 5            |
+--------------------+-----------+--------------+
| ``asok``           | 1         | 5            |
+--------------------+-----------+--------------+
| ``throttle``       | 1         | 5            |
+--------------------+-----------+--------------+

Example ceph.conf
=================

.. literalinclude:: demo-ceph.conf
   :language: ini

.. _ceph-runtime-config:

Runtime Changes
===============

Ceph allows you to make changes to the configuration of a ``ceph-osd``,
``ceph-mon``, or ``ceph-mds`` daemon at runtime. This capability is quite
useful for increasing/decreasing logging output, enabling/disabling debug
settings, and even for runtime optimization. The following reflects runtime
configuration usage::

    ceph {daemon-type} tell {id or *} injectargs --{name} {value} [--{name} {value}]

Replace ``{daemon-type}`` with one of ``osd``, ``mon`` or ``mds``. You may apply
the runtime setting to all daemons of a particular type with ``*``, or specify
a specific daemon's ID (i.e., its number or letter). For example, to increase
debug logging for a ``ceph-osd`` daemon named ``osd.0``, execute the following::

    ceph osd tell 0 injectargs --debug_osd 20

In your ``ceph.conf`` file, you may use spaces when specifying a setting name.
When specifying a setting name on the command line, ensure that you use an
underscore (``_``) between terms (e.g., ``debug osd`` becomes ``debug_osd``).
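
Settings injected at runtime are held in memory only, so (as an assumption
about typical usage, not a step taken from this guide) a change you want to
keep across daemon restarts would also go in ``ceph.conf``. A minimal sketch of
the persistent counterpart of the ``osd.0`` command above, using the spaced
form of the setting name:

.. code-block:: ini

    [osd.0]
        ; Persistent equivalent of: ceph osd tell 0 injectargs --debug_osd 20
        debug osd = 20

Remember to lower the level again once you have finished debugging, since
verbose logging slows the cluster down.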