=====================================
 Configuring Monitor/OSD Interaction
=====================================

.. index:: heartbeat

After you have completed your initial Ceph configuration, you may deploy and run
Ceph. When you execute a command such as ``ceph health`` or ``ceph -s``, the
:term:`Ceph Monitor` reports on the current state of the :term:`Ceph Storage
Cluster`. The Ceph Monitor knows about the Ceph Storage Cluster by requiring
reports from each :term:`Ceph OSD Daemon`, and by receiving reports from Ceph
OSD Daemons about the status of their neighboring Ceph OSD Daemons. If the Ceph
Monitor doesn't receive reports, or if it receives reports of changes in the
Ceph Storage Cluster, the Ceph Monitor updates the status of the :term:`Ceph
Cluster Map`.

Ceph provides reasonable default settings for Ceph Monitor/Ceph OSD Daemon
interaction. However, you may override the defaults. The following sections
describe how Ceph Monitors and Ceph OSD Daemons interact for the purposes of
monitoring the Ceph Storage Cluster.

.. index:: heartbeat interval

OSDs Check Heartbeats
=====================

Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6
seconds. You can change the heartbeat interval by adding an
``osd heartbeat interval`` setting under the ``[osd]`` section of your Ceph
configuration file, or by setting the value at runtime. If a neighboring Ceph
OSD Daemon doesn't show a heartbeat within a 20-second grace period, the Ceph
OSD Daemon may consider the neighboring Ceph OSD Daemon ``down`` and report it
back to a Ceph Monitor, which will update the Ceph Cluster Map. You may change
this grace period by adding an ``osd heartbeat grace`` setting under the
``[osd]`` section of your Ceph configuration file, or by setting the value at
runtime.

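As a minimal sketch, the two settings described above could be overridden in
the Ceph configuration file as shown here. The values ``10`` and ``30`` are
illustrative assumptions only, not recommendations; the defaults are ``6`` and
``20``.

.. code-block:: ini

    [osd]
    # Illustrative value: check peer heartbeats every 10 seconds (default 6).
    osd heartbeat interval = 10
    # Illustrative value: wait 30 seconds without a heartbeat before
    # reporting a neighboring Ceph OSD Daemon down (default 20).
    osd heartbeat grace = 30
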
.. ditaa:: +---------+          +---------+
           |  OSD 1  |          |  OSD 2  |
           +---------+          +---------+
                |                    |
                |----+ Heartbeat     |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |       Check        |
                |     Heartbeat      |
                |------------------->|
                |                    |
                |<-------------------|
                |   Heart Beating    |
                |                    |
                |----+ Heartbeat     |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |       Check        |
                |     Heartbeat      |
                |------------------->|
                |                    |
                |----+ Grace         |
                |    | Period        |
                |<---+ Exceeded      |
                |                    |
                |----+ Mark          |
                |    | OSD 2         |
                |<---+ Down          |

.. index:: OSD down report

OSDs Report Down OSDs
=====================

By default, a Ceph OSD Daemon must report to the Ceph Monitors that another Ceph
OSD Daemon is ``down`` three times before the Ceph Monitors acknowledge that the
reported Ceph OSD Daemon is ``down``. You can change the minimum number of
``osd down`` reports by adding a ``mon osd min down reports`` setting
(``osd min down reports`` prior to v0.62) under the ``[mon]`` section of your
Ceph configuration file, or by setting the value at runtime. By default, only
one Ceph OSD Daemon is required to report another Ceph OSD Daemon ``down``. You
can change the number of Ceph OSD Daemons required to report a Ceph OSD Daemon
``down`` to a Ceph Monitor by adding a ``mon osd min down reporters`` setting
(``osd min down reporters`` prior to v0.62) under the ``[mon]`` section of your
Ceph configuration file, or by setting the value at runtime.

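The following fragment sketches how both thresholds might be raised so that a
single Ceph OSD Daemon cannot cause a peer to be marked ``down`` on its own.
The values ``5`` and ``2`` are illustrative assumptions; the defaults are ``3``
and ``1``.

.. code-block:: ini

    [mon]
    # Illustrative value: require five down reports (default 3).
    mon osd min down reports = 5
    # Illustrative value: require reports from two distinct
    # Ceph OSD Daemons (default 1).
    mon osd min down reporters = 2
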
.. ditaa:: +---------+     +---------+
           |  OSD 1  |     | Monitor |
           +---------+     +---------+
                |               |
                | OSD 2 Is Down |
                |-------------->|
                |               |
                | OSD 2 Is Down |
                |-------------->|
                |               |
                | OSD 2 Is Down |
                |-------------->|
                |               |
                |               |----------+ Mark
                |               |          | OSD 2
                |               |<---------+ Down

.. index:: peering failure

OSDs Report Peering Failure
===========================

If a Ceph OSD Daemon cannot peer with any of the Ceph OSD Daemons defined in its
Ceph configuration file (or the cluster map), it will ping a Ceph Monitor for
the most recent copy of the cluster map every 30 seconds. You can change the
Ceph Monitor heartbeat interval by adding an ``osd mon heartbeat interval``
setting under the ``[osd]`` section of your Ceph configuration file, or by
setting the value at runtime.

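For example, a Ceph OSD Daemon without peers could be told to ask a Ceph
Monitor for a new cluster map less often. This is only a sketch; the value
``60`` is an illustrative assumption, and the default is ``30``.

.. code-block:: ini

    [osd]
    # Illustrative value: ping a Ceph Monitor every 60 seconds
    # when no peers are available (default 30).
    osd mon heartbeat interval = 60
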
.. ditaa:: +---------+     +---------+     +-------+     +---------+
           |  OSD 1  |     |  OSD 2  |     | OSD 3 |     | Monitor |
           +---------+     +---------+     +-------+     +---------+
                |               |              |              |
                |  Request To   |              |              |
                |     Peer      |              |              |
                |-------------->|              |              |
                |<--------------|              |              |
                |    Peering                   |              |
                |                              |              |
                |  Request To                  |              |
                |     Peer                     |              |
                |----------------------------->|              |
                |                                             |
                |----+ OSD Monitor                            |
                |    | Heartbeat                              |
                |<---+ Interval Exceeded                      |
                |                                             |
                |          Failed to Peer with OSD 3          |
                |-------------------------------------------->|
                |<--------------------------------------------|
                |           Receive New Cluster Map           |

.. index:: OSD status

OSDs Report Their Status
========================

If a Ceph OSD Daemon doesn't report to a Ceph Monitor, the Ceph Monitor will
consider the Ceph OSD Daemon ``down`` after the ``mon osd report timeout``
elapses. A Ceph OSD Daemon sends a report to a Ceph Monitor within 5 seconds of
a reportable event, such as a failure, a change in placement group stats, a
change in ``up_thru``, or its own boot. You can change the Ceph OSD Daemon
minimum report interval by adding an ``osd mon report interval min`` setting
under the ``[osd]`` section of your Ceph configuration file, or by setting the
value at runtime. A Ceph OSD Daemon sends a report to a Ceph Monitor every 120
seconds irrespective of whether any notable changes occur. You can change the
Ceph Monitor report interval by adding an ``osd mon report interval max``
setting under the ``[osd]`` section of your Ceph configuration file, or by
setting the value at runtime.

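The three timers above might be set together as sketched below. The interval
values are illustrative assumptions (the defaults are ``5`` and ``120``), and
the minimum interval should remain less than the maximum. Placing ``mon osd
report timeout`` under ``[mon]`` is also an assumption based on the setting's
name; the value shown is its default.

.. code-block:: ini

    [osd]
    # Illustrative value: wait at least 10 seconds after a reportable
    # event before reporting (default 5).
    osd mon report interval min = 10
    # Illustrative value: report at least every 180 seconds even if
    # nothing has changed (default 120).
    osd mon report interval max = 180

    [mon]
    # Default value: declare a silent Ceph OSD Daemon down after 900 seconds.
    mon osd report timeout = 900
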
.. ditaa:: +---------+          +---------+
           |  OSD 1  |          | Monitor |
           +---------+          +---------+
                |                    |
                |----+ Report Min    |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |----+ Reportable    |
                |    | Event         |
                |<---+ Occurs        |
                |                    |
                |     Report To      |
                |      Monitor       |
                |------------------->|
                |                    |
                |----+ Report Max    |
                |    | Interval      |
                |<---+ Exceeded      |
                |                    |
                |     Report To      |
                |      Monitor       |
                |------------------->|
                |                    |
                |----+ Monitor       |
                |    | Fails         |
                |<---+               |
                                     +----+ Monitor OSD
                                     |    | Report Timeout
                                     |<---+ Exceeded
                                     |
                                     +----+ Mark
                                     |    | OSD 1
                                     |<---+ Down

Configuration Settings
======================

When modifying heartbeat settings, you should include them in the ``[global]``
section of your configuration file.

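A minimal sketch of that layout follows. Settings in ``[global]`` are read by
every daemon, so Ceph Monitors and Ceph OSD Daemons see the same heartbeat
values. The values shown are simply the documented defaults, written out for
illustration.

.. code-block:: ini

    [global]
    # Default values, shown only to illustrate placement in [global].
    osd heartbeat interval = 6
    osd heartbeat grace = 20
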
.. index:: monitor heartbeat

Monitor Settings
----------------

``mon osd min up ratio``

:Description: The minimum ratio of ``up`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``down``.

:Type: Double
:Default: ``.3``


``mon osd min in ratio``

:Description: The minimum ratio of ``in`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``out``.

:Type: Double
:Default: ``.3``


``mon osd laggy halflife``

:Description: The half-life, in seconds, over which laggy estimates decay.
:Type: Integer
:Default: ``60*60``


``mon osd laggy weight``

:Description: The weight for new samples in laggy estimation decay.
:Type: Double
:Default: ``0.3``


``mon osd adjust heartbeat grace``

:Description: If set to ``true``, Ceph will scale the heartbeat grace period
              based on laggy estimations.
:Type: Boolean
:Default: ``true``


``mon osd adjust down out interval``

:Description: If set to ``true``, Ceph will scale the down/out interval based
              on laggy estimations.
:Type: Boolean
:Default: ``true``


``mon osd auto mark in``

:Description: Ceph will mark any booting Ceph OSD Daemons as ``in``
              the Ceph Storage Cluster.

:Type: Boolean
:Default: ``false``


``mon osd auto mark auto out in``

:Description: Ceph will mark booting Ceph OSD Daemons that were automatically
              marked ``out`` of the Ceph Storage Cluster as ``in`` the cluster.

:Type: Boolean
:Default: ``true``


``mon osd auto mark new in``

:Description: Ceph will mark booting new Ceph OSD Daemons as ``in`` the
              Ceph Storage Cluster.

:Type: Boolean
:Default: ``true``


``mon osd down out interval``

:Description: The number of seconds Ceph waits before marking a Ceph OSD Daemon
              ``down`` and ``out`` if it doesn't respond.

:Type: 32-bit Integer
:Default: ``300``


``mon osd down out subtree limit``

:Description: The smallest :term:`CRUSH` unit type that Ceph will **not**
              automatically mark out. For instance, if set to ``host`` and if
              all OSDs of a host are down, Ceph will not automatically mark out
              these OSDs.

:Type: String
:Default: ``rack``


``mon osd report timeout``

:Description: The grace period in seconds before declaring
              unresponsive Ceph OSD Daemons ``down``.

:Type: 32-bit Integer
:Default: ``900``

``mon osd min down reporters``

:Description: The minimum number of Ceph OSD Daemons required to report a
              ``down`` Ceph OSD Daemon.

:Type: 32-bit Integer
:Default: ``1``


``mon osd min down reports``

:Description: The minimum number of times a Ceph OSD Daemon must report
              that another Ceph OSD Daemon is ``down``.

:Type: 32-bit Integer
:Default: ``3``

.. index:: OSD heartbeat

OSD Settings
------------

``osd heartbeat address``

:Description: A Ceph OSD Daemon's network address for heartbeats.
:Type: Address
:Default: The host address.


``osd heartbeat interval``

:Description: How often a Ceph OSD Daemon pings its peers (in seconds).
:Type: 32-bit Integer
:Default: ``6``


``osd heartbeat grace``

:Description: The elapsed time, in seconds, without a heartbeat after which
              the Ceph Storage Cluster considers a Ceph OSD Daemon ``down``.

:Type: 32-bit Integer
:Default: ``20``


``osd mon heartbeat interval``

:Description: How often, in seconds, the Ceph OSD Daemon pings a Ceph Monitor
              if it has no Ceph OSD Daemon peers.

:Type: 32-bit Integer
:Default: ``30``


``osd mon report interval max``

:Description: The maximum time in seconds that a Ceph OSD Daemon can wait before
              it must report to a Ceph Monitor.

:Type: 32-bit Integer
:Default: ``120``


``osd mon report interval min``

:Description: The minimum number of seconds a Ceph OSD Daemon may wait
              from startup or another reportable event before reporting
              to a Ceph Monitor.

:Type: 32-bit Integer
:Default: ``5``
:Valid Range: Should be less than ``osd mon report interval max``


``osd mon ack timeout``

:Description: The number of seconds to wait for a Ceph Monitor to acknowledge a
              request for statistics.

:Type: 32-bit Integer
:Default: ``30``