doc: update documentation for standby-replay

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly 2019-02-21 20:23:13 -08:00
parent 4fa4eda9ee
commit 77a69405f9
GPG Key ID: 3A2A7E25BEA8AADB
2 changed files with 26 additions and 142 deletions


@ -58,9 +58,10 @@ forms of the 'fail' command:
Managing failover
-----------------
If an MDS daemon stops communicating with the monitor, the monitor will
wait ``mds_beacon_grace`` seconds (default 15 seconds) before marking
the daemon as *laggy*.
If an MDS daemon stops communicating with the monitor, the monitor will wait
``mds_beacon_grace`` seconds (default 15 seconds) before marking the daemon as
*laggy*. If a standby is available, the monitor will immediately replace the
laggy daemon.
Each file system may specify a number of standby daemons to be considered
healthy. This number includes daemons in standby-replay waiting for a rank to
@ -76,148 +77,25 @@ Each file system may set the number of standby daemons wanted using:
Setting ``count`` to 0 will disable the health check.
Configuring standby daemons
---------------------------
Configuring standby-replay
--------------------------
There are four configuration settings that control how a daemon
will behave while in standby:
Each CephFS file system may be configured to add standby-replay daemons. These
standby daemons follow the active MDS's metadata journal to reduce failover
time in the event the active MDS becomes unavailable. Each active MDS may have
only one standby-replay daemon following it.
Configuring standby-replay on a file system is done using:
::
mds_standby_replay
mds_standby_for_name
mds_standby_for_rank
mds_standby_for_fscid
ceph fs set <fs name> allow_standby_replay <bool>
These may be set in the ceph.conf on the host where the MDS daemon
runs (as opposed to on the monitor). The daemon loads these settings
when it starts, and sends them to the monitor.
By default, if none of these settings are used, all MDS daemons
which do not hold a rank will be used as standbys for any rank.
The settings which associate a standby daemon with a particular
name or rank do not guarantee that the daemon will *only* be used
for that rank. They mean that when several standbys are available,
the associated standby daemon will be used. If a rank is failed,
and a standby is available, it will be used even if it is associated
with a different rank or named daemon.
mds_standby_replay
~~~~~~~~~~~~~~~~~~
If this is set to true, then the standby daemon will continuously read
the metadata journal of an up rank. This will give it
a warm metadata cache, and speed up the process of failing over
if the daemon serving the rank fails.
An up rank may only have one standby replay daemon assigned to it,
if two daemons are both set to be standby replay then one of them
will arbitrarily win, and the other will become a normal non-replay
standby.
Once a daemon has entered the standby replay state, it will only be
used as a standby for the rank that it is following. If another rank
fails, this standby replay daemon will not be used as a replacement,
even if no other standbys are available.
*Historical note:* In Ceph prior to v10.2.1, this setting is treated as true,
even when set to ``false``, whenever ``mds_standby_for_*`` is also set.
mds_standby_for_name
~~~~~~~~~~~~~~~~~~~~
Set this to make the standby daemon only take over a failed rank
if the last daemon to hold it matches this name.
mds_standby_for_rank
~~~~~~~~~~~~~~~~~~~~
Set this to make the standby daemon only take over the specified
rank. If another rank fails, this daemon will not be used to
replace it.
Use in conjunction with ``mds_standby_for_fscid`` to be specific
about which filesystem's rank you are targeting, if you have
multiple filesystems.
mds_standby_for_fscid
~~~~~~~~~~~~~~~~~~~~~
If ``mds_standby_for_rank`` is set, this is simply a qualifier to
say which filesystem's rank is referred to.
If ``mds_standby_for_rank`` is not set, then setting FSCID will
cause this daemon to target any rank in the specified FSCID. Use
this if you have a daemon that you want to use for any rank, but
only within a particular filesystem.
mon_force_standby_active
~~~~~~~~~~~~~~~~~~~~~~~~
This setting is used on monitor hosts. It defaults to true.
If it is false, then daemons configured with mds_standby_replay=true
will **only** become active if the rank/name that they have
been configured to follow fails. On the other hand, if this
setting is true, then a daemon configured with mds_standby_replay=true
may be assigned some other rank.
Examples
--------
These are example ceph.conf snippets. In practice you can either
copy a ceph.conf with all daemons' configuration to all your servers,
or you can have a different file on each server that contains just
that server's daemons' configuration.
Simple pair
~~~~~~~~~~~
Two MDS daemons 'a' and 'b' acting as a pair, where whichever one is not
currently assigned a rank will be the standby replay follower
of the other.
::
[mds.a]
mds standby replay = true
mds standby for rank = 0
[mds.b]
mds standby replay = true
mds standby for rank = 0
Floating standby
~~~~~~~~~~~~~~~~
Three MDS daemons 'a', 'b' and 'c', in a filesystem that has
``max_mds`` set to 2.
::
# No explicit configuration required: whichever daemon is
# not assigned a rank will go into 'standby' and take over
# for whichever other daemon fails.
Two MDS clusters
~~~~~~~~~~~~~~~~
With two filesystems, I have four MDS daemons, and I want two
to act as a pair for one filesystem and two to act as a pair
for the other filesystem.
::
[mds.a]
mds standby for fscid = 1
[mds.b]
mds standby for fscid = 1
[mds.c]
mds standby for fscid = 2
[mds.d]
mds standby for fscid = 2
Once set, the monitors will assign available standby daemons to follow the
active MDSs in that file system.
Once an MDS has entered the standby-replay state, it will only be used as a
standby for the rank that it is following. If another rank fails, this
standby-replay daemon will not be used as a replacement, even if no other
standbys are available. For this reason, it is advised that if standby-replay
is used then every active MDS should have a standby-replay daemon.
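
As a minimal sketch of the command described above, the following enables
standby-replay on a file system and then checks the result. The file system
name ``cephfs`` is an assumption for illustration; substitute the name of your
own file system. The exact output of the status command varies by release.

::

    # Assumption: the file system is named "cephfs".
    ceph fs set cephfs allow_standby_replay true

    # Inspect the MDS daemons; a daemon following an active rank
    # is expected to be reported in the standby-replay state.
    ceph fs status cephfs

    # Disable standby-replay again if it is no longer wanted.
    ceph fs set cephfs allow_standby_replay false

Because a standby-replay daemon only serves the rank it follows, it is worth
confirming after enabling the flag that there are enough standby daemons for
every active MDS to be followed.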


@ -428,6 +428,12 @@ These changes occurred between the Mimic and Nautilus releases.
``mds_recall_warning_decay_rate`` (default: 60s) sets the threshold
for this warning.
* The MDS mds_standby_for_*, mon_force_standby_active, and mds_standby_replay
configuration options have been obsoleted. Instead, the operator may now set
the new "allow_standby_replay" flag on the CephFS file system. This setting
causes standbys to become standby-replay for any available rank in the file
system.
* The Telegraf module for the Manager allows for sending statistics to
a Telegraf Agent over TCP, UDP or a UNIX Socket. Telegraf can then
send the statistics to databases like InfluxDB, ElasticSearch, Graphite