John Wilkins 4609639ba1 doc: Added "how to" for debug/logging config. Trimmed titles too.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-09-03 14:05:51 -07:00


==================
Configuring Ceph
==================
When you start the Ceph service, the initialization process activates a series
of daemons that run in the background. The hosts in a typical Ceph cluster run
at least one of four daemons:
- Object Storage Device (``ceph-osd``)
- Monitor (``ceph-mon``)
- Metadata Server (``ceph-mds``)
- Ceph Gateway (``radosgw``)
Each daemon has a series of default values, many of which are set in
``ceph/src/common/config_opts.h``. You may override these settings
with a Ceph configuration file.
The ceph.conf File
==================
When you start a Ceph cluster, each daemon looks for a ``ceph.conf`` file that
provides its configuration settings. For manual deployments, you need to create
a ``ceph.conf`` file to configure your cluster. For third party tools that
create configuration files for you (*e.g.*, Chef), you may use the information
contained herein as a reference. The ``ceph.conf`` file defines:
- Cluster membership
- Host names
- Host addresses
- Paths to keyrings
- Paths to journals
- Paths to data
- Other runtime options
Ceph looks for the ``ceph.conf`` file in the following default locations, in sequential order:
#. ``$CEPH_CONF`` (*i.e.,* the path given by the ``$CEPH_CONF`` environment variable)
#. ``-c path/path`` (*i.e.,* the ``-c`` command line argument)
#. ``/etc/ceph/ceph.conf``
#. ``~/.ceph/config``
#. ``./ceph.conf`` (*i.e.,* in the current working directory)
The ``ceph.conf`` file uses an *ini* style syntax. You can add comments to the
``ceph.conf`` file by preceding them with a semicolon (;) or a pound sign
(#). For example:
.. code-block:: ini
# <--A number (#) sign precedes a comment.
; A comment may be anything.
# Comments always follow a semi-colon (;) or a pound (#) on each line.
# The end of the line terminates a comment.
# We recommend that you provide comments in your configuration file(s).
ceph.conf Settings
==================
The ``ceph.conf`` file can configure all daemons in a cluster, or all daemons of
a particular type. To configure a series of daemons, place the settings under
the section for the processes that will receive the configuration, as follows:
``[global]``
:Description: Settings under ``[global]`` affect all daemons in a Ceph cluster.
:Example: ``auth supported = cephx``
``[osd]``
:Description: Settings under ``[osd]`` affect all ``ceph-osd`` daemons in the cluster.
:Example: ``osd journal size = 1000``
``[mon]``
:Description: Settings under ``[mon]`` affect all ``ceph-mon`` daemons in the cluster.
:Example: ``mon addr = 10.0.0.101:6789``
``[mds]``
:Description: Settings under ``[mds]`` affect all ``ceph-mds`` daemons in the cluster.
:Example: ``host = myserver01``
Global settings affect all instances of all daemons in the cluster. Use the ``[global]``
section for values that are common to all daemons in the cluster. You can override each
``[global]`` setting by:
#. Changing the setting for a particular process type (*e.g.,* ``[osd]``, ``[mon]``, ``[mds]``).
#. Changing the setting for a particular process (*e.g.,* ``[osd.1]``).
Overriding a global setting affects all child processes, except those that
you specifically override.
A typical global setting involves activating authentication. For example:
.. code-block:: ini
[global]
# Enable authentication between hosts within the cluster.
auth supported = cephx
You can specify settings that apply to a particular type of daemon. When you
specify settings under ``[osd]``, ``[mon]`` or ``[mds]`` without specifying a
particular instance, the setting will apply to all OSDs, monitors or metadata
daemons respectively.
You may specify settings for particular instances of a daemon. Specify
an instance by entering its type, followed by a period (.) and the
instance ID. The instance ID for an OSD is always numeric, but it may be
alphanumeric for monitors and metadata servers. For example:
.. code-block:: ini
[osd.1]
# settings affect osd.1 only.
[mon.a]
# settings affect mon.a only.
[mds.b]
# settings affect mds.b only.
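
As a minimal sketch of how these levels combine, the following fragment sets one
value under ``[global]``, overrides it for all OSDs, and overrides it again for a
single OSD. The ``debug ms`` values are arbitrary illustrations, not
recommendations.

.. code-block:: ini

    [global]
    # Applies to every daemon unless a more specific section overrides it.
    debug ms = 1

    [osd]
    # Overrides the [global] value for all ceph-osd daemons.
    debug ms = 5

    [osd.1]
    # Overrides the [osd] value for osd.1 only.
    debug ms = 20

With these settings, monitors and metadata servers would use ``1``, ``osd.1``
would use ``20``, and every other OSD would use ``5``.
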
Metavariables
=============
Metavariables simplify cluster configuration dramatically. When a metavariable
is set in a configuration value, Ceph expands the metavariable into a concrete
value. Metavariables are very powerful when used within the ``[global]``,
``[osd]``, ``[mon]`` or ``[mds]`` sections of your configuration file. Ceph
metavariables are similar to Bash shell expansion.
Ceph supports the following metavariables:
``$cluster``
:Description: Expands to the cluster name. Useful when running multiple clusters on the same hardware.
:Example: ``/etc/ceph/$cluster.keyring``
:Default: ``ceph``
``$type``
:Description: Expands to one of ``mds``, ``osd``, or ``mon``, depending on the type of the current daemon.
:Example: ``/var/lib/ceph/$type``
``$id``
:Description: Expands to the daemon identifier. For ``osd.0``, this would be ``0``; for ``mds.a``, it would be ``a``.
:Example: ``/var/lib/ceph/$type/$cluster-$id``
``$host``
:Description: Expands to the host name of the current daemon.
``$name``
:Description: Expands to ``$type.$id``.
:Example: ``/var/run/ceph/$cluster-$name.asok``
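
The following sketch shows how metavariables might appear in a configuration
file. The paths are the defaults and examples mentioned in this document;
setting them explicitly is for illustration only. The comments assume a cluster
named ``ceph`` and the daemons ``mon.a`` and ``osd.0``.

.. code-block:: ini

    [global]
    # Expands to /etc/ceph/ceph.keyring.
    keyring = /etc/ceph/$cluster.keyring

    [mon]
    # For mon.a, expands to /var/lib/ceph/mon/ceph-a.
    mon data = /var/lib/ceph/mon/$cluster-$id

    [osd]
    # For osd.0, expands to /var/lib/ceph/osd/ceph-0.
    osd data = /var/lib/ceph/osd/$cluster-$id

Because the values are expanded per daemon, a single line under ``[osd]`` can
serve every OSD in the cluster.
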
Common Settings
===============
The `Hardware Recommendations`_ section provides some hardware guidelines for
configuring the cluster. It is possible for a single host to run multiple
daemons. For example, a single host with multiple disks or RAIDs may run one
``ceph-osd`` for each disk or RAID. Additionally, a single host may run both a
``ceph-mon`` and a ``ceph-osd`` daemon. Ideally, you will dedicate
a host to a particular type of process. For example, one host may run
``ceph-osd`` daemons, another host may run a ``ceph-mds`` daemon, and other
hosts may run ``ceph-mon`` daemons.
Each host has a name identified by the ``host`` setting. Monitors also specify
a network address (*i.e.*, a domain name or IP address) and port, identified by
the ``mon addr`` setting. A basic configuration file will typically specify only
minimal settings for each instance of a daemon. For example:
.. code-block:: ini
[mon.a]
host = hostName
mon addr = 150.140.130.120:6789
[osd.0]
host = hostName
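
When a single host runs several daemons, each instance gets its own section with
the same ``host`` value. A sketch, assuming a hypothetical host named ``node1``
that runs a monitor and has two data disks:

.. code-block:: ini

    [mon.a]
    host = node1
    mon addr = 150.140.130.120:6789

    # One ceph-osd instance per data disk, both on the same host.
    [osd.0]
    host = node1

    [osd.1]
    host = node1
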
.. _Hardware Recommendations: ../../install/hardware-recommendations
Networks
========
Monitors listen on port 6789 by default, while metadata servers and OSDs listen
on the first available port beginning at 6800. Ensure that you open port 6789 on
hosts that run a monitor daemon, and open one port beginning at port 6800 for
each OSD or metadata server that runs on the host. Ports are host-specific, so
you don't need to open more ports than the number of daemons running on
that host. You may consider opening a few additional ports in case a daemon
fails and restarts without releasing its port, so that the restarted daemon
binds to a new port. If you set up separate public and cluster networks, you
may need to make entries for each network.
For example::
iptables -A INPUT -m multiport -p tcp -s {ip-address}/{netmask} --dports 6789,6800:6810 -j ACCEPT
In our `hardware recommendations`_ section, we recommend having at least two
NICs, because Ceph can support two networks: a public (front-side) network, and
a cluster (back-side) network. Ceph functions just fine with a public network
only. You only need to specify the public and cluster network settings if you
use both public and cluster networks.
There are several reasons to consider operating two separate networks. First,
OSDs handle data replication for the clients. When OSDs replicate data more than
once, the network load between OSDs easily dwarfs the network load between
clients and the Ceph cluster. This can introduce latency and create a
performance problem. Second, while most people are generally civil, a very tiny
segment of the population likes to engage in what's known as a Denial of Service
(DoS) attack. When traffic between OSDs gets disrupted, placement groups may no
longer reflect an ``active + clean`` state, which may prevent users from reading
and writing data. A great way to defeat this type of attack is to maintain a
completely separate cluster network that doesn't connect directly to the
internet.
To configure the networks, add the following options to the ``[global]`` section
of your ``ceph.conf`` file.

.. code-block:: ini

    [global]
    public network = {public-network-ip-address/netmask}
    cluster network = {cluster-network-ip-address/netmask}

To configure Ceph hosts to use the networks, you should set the following options
in the daemon instance sections of your ``ceph.conf`` file.

.. code-block:: ini

    [osd.0]
    public addr = {host-public-ip-address}
    cluster addr = {host-cluster-ip-address}

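
As an illustration, the fragment below assumes a hypothetical public network of
``192.168.0.0/24`` and a hypothetical cluster network of ``10.0.0.0/24``, with
``osd.0`` holding one address on each. It uses the ``public addr`` and
``cluster addr`` settings for the per-daemon addresses; substitute values that
match your own networks.

.. code-block:: ini

    [global]
    # Front-side network used by clients (hypothetical range).
    public network = 192.168.0.0/24
    # Back-side network used for traffic between OSDs (hypothetical range).
    cluster network = 10.0.0.0/24

    [osd.0]
    public addr = 192.168.0.10
    cluster addr = 10.0.0.10
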
.. _hardware recommendations: ../../install/hardware-recommendations
Monitors
========
Ceph production clusters typically deploy with a minimum of three monitors to
ensure high availability should a monitor instance crash. An odd number of
monitors (*e.g.*, three) ensures that the Paxos algorithm can determine which
version of the cluster map is the most recent from a quorum of monitors.
.. note:: You may deploy Ceph with a single monitor, but if the instance fails,
the lack of a monitor may interrupt data service availability.
Ceph monitors typically listen on port ``6789``. For example:
.. code-block:: ini
[mon.a]
host = hostName
mon addr = 150.140.130.120:6789
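
A three-monitor configuration might look like the following sketch. The host
names and addresses are hypothetical placeholders.

.. code-block:: ini

    [mon.a]
    host = mon-host-1
    mon addr = 10.0.0.101:6789

    [mon.b]
    host = mon-host-2
    mon addr = 10.0.0.102:6789

    [mon.c]
    host = mon-host-3
    mon addr = 10.0.0.103:6789

With three monitors, a majority (two of three) remains available after the loss
of any single monitor.
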
By default, Ceph expects that you will store a monitor's data under the following path::
/var/lib/ceph/mon/$cluster-$id
You must create the corresponding directory yourself. With metavariables fully
expressed and a cluster named "ceph", the foregoing directory would evaluate to::
/var/lib/ceph/mon/ceph-a
You may override this path using the ``mon data`` setting. We don't recommend
changing the default location. Create the default directory on your new monitor host. ::
ssh {new-mon-host}
sudo mkdir /var/lib/ceph/mon/ceph-{mon-letter}
OSDs
====
Ceph production clusters typically deploy OSDs where one host has one OSD daemon
running a filestore on one data disk. A typical deployment specifies a journal
size and whether the file store's extended attributes (XATTRs) use an
object map (i.e., when running on the ``ext4`` filesystem). For example:

.. code-block:: ini

    [osd]
    osd journal size = 10000
    # Enables the object map. Use only when running on ext4.
    filestore xattr use omap = true

    [osd.0]
    host = {hostname}

By default, Ceph expects that you will store an OSD's data with the following path::
/var/lib/ceph/osd/$cluster-$id
You must create the corresponding directory yourself. With metavariables fully
expressed and a cluster named "ceph", the foregoing directory would evaluate to::
/var/lib/ceph/osd/ceph-0
You may override this path using the ``osd data`` setting. We don't recommend
changing the default location. Create the default directory on your new OSD host. ::
ssh {new-osd-host}
sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}
The ``osd data`` path ideally leads to a mount point with a hard disk that is
separate from the hard disk storing and running the operating system and
daemons. If the OSD is for a disk other than the OS disk, prepare it for
use with Ceph, and mount it to the directory you just created::
ssh {new-osd-host}
sudo mkfs -t {fstype} /dev/{disk}
sudo mount -o user_xattr /dev/{disk} /var/lib/ceph/osd/ceph-{osd-number}
We recommend using the ``xfs`` file system or the ``btrfs`` file system when
running :command:`mkfs`.
By default, Ceph expects that you will store an OSD's journal with the following path::
/var/lib/ceph/osd/$cluster-$id/journal
Without performance optimization, Ceph stores the journal on the same disk as
the OSD's data. An OSD optimized for performance may use a separate disk to store
journal data (e.g., a solid state drive delivers high performance journaling).
Ceph's default ``osd journal size`` is 0, so you will need to set this in your
``ceph.conf`` file. To size the journal, find the product of the ``filestore
max sync interval`` and the expected throughput, and multiply the product by
two (2)::

    osd journal size = {2 * (expected throughput * filestore max sync interval)}

The expected throughput number should include the expected disk throughput
(i.e., sustained data transfer rate), and network throughput. For example,
a 7200 RPM disk will likely sustain approximately 100 MB/s. Taking the ``min()``
of the disk and network throughput should provide a reasonable expected
throughput. Some users simply start with a 10 GB journal size. For
example::
osd journal size = 10000
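
As a worked sketch of the sizing rule above, assume a disk that sustains about
100 MB/s, a network that sustains about 125 MB/s (roughly a 1 Gb/s link), and a
sync interval of 5 seconds. All three figures are assumptions chosen for
illustration. The expected throughput is ``min(100, 125) = 100`` MB/s, so the
journal size is ``2 * (100 * 5) = 1000`` MB:

.. code-block:: ini

    [osd]
    # 2 * (100 MB/s * 5 s) = 1000 MB; the setting is expressed in megabytes.
    osd journal size = 1000
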
Logs / Debugging
================
Ceph is still on the leading edge, so you may encounter situations that require
modifying logging output and using Ceph's debugging. To activate Ceph's
debugging output (*i.e.*, ``dout()``), you may add ``debug`` settings to your
configuration. Ceph's logging levels operate on a scale of 1 to 20, where 1 is
terse and 20 is verbose. Subsystems common to each daemon may be set under
``[global]`` in your configuration file. Subsystems for particular daemons are
set under the daemon section in your configuration file (*e.g.*, ``[mon]``,
``[osd]``, ``[mds]``). For example::
[global]
debug ms = 1
[mon]
debug mon = 20
debug paxos = 20
debug auth = 20
[osd]
debug osd = 20
debug filestore = 20
debug journal = 20
debug monc = 20
[mds]
debug mds = 20
debug mds balancer = 20
debug mds log = 20
debug mds migrator = 20
When your system is running well, choose appropriate logging levels and remove
unnecessary debugging settings to ensure your cluster runs optimally. Logging
debug output messages is relatively slow and wastes resources during normal
operation.
.. tip:: When debug output slows down your system, the latency can hide race conditions.
Each subsystem has a logging level for its output logs and a separate level for
its in-memory logs. You may set different values for each by specifying
a log file level and a memory level, separated by a forward slash (/). For example::

    debug {subsystem} = {log-level}/{memory-level}
    # for example
    debug mds log = 1/20

+--------------------+-----------+--------------+
| Subsystem          | Log Level | Memory Level |
+====================+===========+==============+
| ``default``        | 0         | 5            |
+--------------------+-----------+--------------+
| ``lockdep``        | 0         | 5            |
+--------------------+-----------+--------------+
| ``context``        | 0         | 5            |
+--------------------+-----------+--------------+
| ``crush``          | 1         | 5            |
+--------------------+-----------+--------------+
| ``mds``            | 1         | 5            |
+--------------------+-----------+--------------+
| ``mds balancer``   | 1         | 5            |
+--------------------+-----------+--------------+
| ``mds locker``     | 1         | 5            |
+--------------------+-----------+--------------+
| ``mds log``        | 1         | 5            |
+--------------------+-----------+--------------+
| ``mds log expire`` | 1         | 5            |
+--------------------+-----------+--------------+
| ``mds migrator``   | 1         | 5            |
+--------------------+-----------+--------------+
| ``buffer``         | 0         | 0            |
+--------------------+-----------+--------------+
| ``timer``          | 0         | 5            |
+--------------------+-----------+--------------+
| ``filer``          | 0         | 5            |
+--------------------+-----------+--------------+
| ``objecter``       | 0         | 0            |
+--------------------+-----------+--------------+
| ``rados``          | 0         | 5            |
+--------------------+-----------+--------------+
| ``rbd``            | 0         | 5            |
+--------------------+-----------+--------------+
| ``journaler``      | 0         | 5            |
+--------------------+-----------+--------------+
| ``objectcacher``   | 0         | 5            |
+--------------------+-----------+--------------+
| ``client``         | 0         | 5            |
+--------------------+-----------+--------------+
| ``osd``            | 0         | 5            |
+--------------------+-----------+--------------+
| ``optracker``      | 0         | 5            |
+--------------------+-----------+--------------+
| ``objclass``       | 0         | 5            |
+--------------------+-----------+--------------+
| ``filestore``      | 1         | 5            |
+--------------------+-----------+--------------+
| ``journal``        | 1         | 5            |
+--------------------+-----------+--------------+
| ``ms``             | 0         | 5            |
+--------------------+-----------+--------------+
| ``mon``            | 1         | 5            |
+--------------------+-----------+--------------+
| ``monc``           | 0         | 5            |
+--------------------+-----------+--------------+
| ``paxos``          | 0         | 5            |
+--------------------+-----------+--------------+
| ``tp``             | 0         | 5            |
+--------------------+-----------+--------------+
| ``auth``           | 1         | 5            |
+--------------------+-----------+--------------+
| ``finisher``       | 1         | 5            |
+--------------------+-----------+--------------+
| ``heartbeatmap``   | 1         | 5            |
+--------------------+-----------+--------------+
| ``perfcounter``    | 1         | 5            |
+--------------------+-----------+--------------+
| ``rgw``            | 1         | 5            |
+--------------------+-----------+--------------+
| ``hadoop``         | 1         | 5            |
+--------------------+-----------+--------------+
| ``asok``           | 1         | 5            |
+--------------------+-----------+--------------+
| ``throttle``       | 1         | 5            |
+--------------------+-----------+--------------+
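
For instance, to keep the log file level of two subsystems at the values shown
in the table while retaining more detail in memory, you might raise only the
memory level. This is a sketch; the memory level of ``20`` is an arbitrary
choice.

.. code-block:: ini

    [osd]
    # Format: {log-level}/{memory-level}
    debug osd = 0/20
    debug filestore = 1/20
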
Example ceph.conf
=================
.. literalinclude:: demo-ceph.conf
   :language: ini
Runtime Changes
===============
Ceph allows you to make changes to the configuration of a ``ceph-osd``,
``ceph-mon``, or ``ceph-mds`` daemon at runtime. This capability is quite
useful for increasing/decreasing logging output, enabling/disabling debug
settings, and even for runtime optimization. The general syntax for runtime
configuration is::
ceph {daemon-type} tell {id or *} injectargs --{name} {value} [--{name} {value}]
Replace ``{daemon-type}`` with one of ``osd``, ``mon`` or ``mds``. You may apply
the runtime setting to all daemons of a particular type with ``*``, or specify
a specific daemon's ID (i.e., its number or letter). For example, to increase
debug logging for a ``ceph-osd`` daemon named ``osd.0``, execute the following::
ceph osd tell 0 injectargs --debug_osd 20
In your ``ceph.conf`` file, you may use spaces when specifying a setting name.
When specifying a setting name on the command line, ensure that you use an
underscore (``_``) between terms (e.g., ``debug osd`` becomes ``debug_osd``).
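
A change made with ``injectargs`` applies to the running daemon only and is not
written to ``ceph.conf``. To keep the setting from the example above across
restarts, a sketch of the equivalent configuration file entry would be:

.. code-block:: ini

    [osd.0]
    # Same setting as --debug_osd 20 on the command line, written with a space.
    debug osd = 20
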