Merge pull request #38311 from anthonyeleven/anthonyeleven/fix-typo

doc/rados/operations: typo in stretch-mode.rst

Reviewed-by: Zac Dover <zac.dover@gmail.com>
zdover23 2020-12-02 04:03:26 +10:00 committed by GitHub
commit c0f1c56cf5


@@ -23,16 +23,15 @@ two or three data centers (or, in clouds, availability zones). With two
zones, we expect each site to hold a copy of the data, and for a third
site to have a tiebreaker monitor (this can be a VM or high-latency compared
to the main sites) to pick a winner if the network connection fails and both
-DCs remain alive. For three sites, we expect a a copy of the data and an equal
+DCs remain alive. For three sites, we expect a copy of the data and an equal
number of monitors in each site.
-Note, the standard Ceph configuration will survive MANY failures of
-the network or Data Centers, if you have configured it correctly, and it will
-never compromise data consistency -- if you bring back enough of the Ceph servers
-following a failure, it will recover. If you lose
-a data center and can still form a quorum of monitors and have all the data
-available (with enough copies to satisfy min_size, or CRUSH rules that will
-re-replicate to meet it), Ceph will maintain availability.
+Note that the standard Ceph configuration will survive MANY failures of the
+network or data centers and it will never compromise data consistency. If you
+bring back enough Ceph servers following a failure, it will recover. If you
+lose a data center, but can still form a quorum of monitors and have all the data
+available (with enough copies to satisfy pools' ``min_size``, or CRUSH rules
+that will re-replicate to meet it), Ceph will maintain availability.
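As an illustrative aside (the pool name is a placeholder, not from the file), checking whether a surviving site can still form a monitor quorum and satisfy ``min_size`` might look like::

$ ceph quorum_status
$ ceph osd pool get mypool min_size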
What can't it handle?
@@ -54,32 +53,32 @@ guarantees.
The second important category of failures is when you think you have data replicated
across data centers, but the constraints aren't sufficient to guarantee this.
For instance, you might have data centers A and B, and your CRUSH rule targets 3 copies
-and places a copy in each data center with a min_size of 2. The PG may go active with
+and places a copy in each data center with a ``min_size`` of 2. The PG may go active with
2 copies in site A and no copies in site B, which means that if you then lose site A you
have lost data and Ceph can't operate on it. This situation is surprisingly difficult
to avoid with standard CRUSH rules.
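For concreteness, the risky configuration described above might be set up roughly as follows (the pool name and values are illustrative only)::

$ ceph osd pool set mypool size 3        # three replicas targeted by the CRUSH rule
$ ceph osd pool set mypool min_size 2    # PGs may go active with only two replicas, possibly all in one site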
Stretch Mode
============
-The new stretch mode is designed to handle the 2-site case. (3 sites are
-just as susceptible to netsplit issues, but much more resilient to surprising
-data availability ones than 2-site clusters are.)
+The new stretch mode is designed to handle the 2-site case. Three sites are
+just as susceptible to netsplit issues, but are much more tolerant of
+component availability outages than 2-site clusters are.
To enter stretch mode, you must set the location of each monitor, matching
-your CRUSH map. For instance, to place mon.a in your first data center ::
+your CRUSH map. For instance, to place ``mon.a`` in your first data center ::
$ ceph mon set_location a datacenter=site1
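The same step would presumably be repeated for every monitor; a sketch, assuming five monitors and a separate location for the tiebreaker (the names and locations here are assumptions, not taken from the file)::

$ ceph mon set_location b datacenter=site1
$ ceph mon set_location c datacenter=site2
$ ceph mon set_location d datacenter=site2
$ ceph mon set_location e datacenter=site3    # tiebreaker, assumed here to live outside both main sites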
Next, generate a CRUSH rule which will place 2 copies in each data center. This
-will require editing the crush map directly::
+will require editing the CRUSH map directly::
$ ceph osd getcrushmap > crush.map.bin
$ crushtool -d crush.map.bin -o crush.map.txt
-Then edit the crush.map.txt file to add a new rule. Here
-there is only one other rule, so this is id 1, but you may need
-to use a different rule id. We also have two data center buckets
-named site1 and site2::
+Now edit the ``crush.map.txt`` file to add a new rule. Here
+there is only one other rule, so this is ID 1, but you may need
+to use a different rule ID. We also have two datacenter buckets
+named ``site1`` and ``site2``::
rule stretch_rule {
id 1
@@ -94,7 +93,7 @@ named site1 and site2::
step emit
}
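For reference, a complete rule of this shape usually takes two copies from each site; the following is an illustrative sketch rather than the verbatim contents of the file::

rule stretch_rule {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take site1
    step chooseleaf firstn 2 type host
    step emit
    step take site2
    step chooseleaf firstn 2 type host
    step emit
}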
-Finally, inject the crushmap to make the rule available to the cluster::
+Finally, inject the CRUSH map to make the rule available to the cluster::
$ crushtool -c crush.map.txt -o crush2.map.bin
$ ceph osd setcrushmap -i crush2.map.bin
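Before injecting a hand-edited map, a dry run can catch mistakes; one possible check (the rule ID and replica count follow the example above, the input range is arbitrary)::

$ crushtool -i crush2.map.bin --test --rule 1 --num-rep 4 --show-mappings --min-x 0 --max-x 9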
@@ -104,14 +103,13 @@ the instructions in `Changing Monitor Elections`_.
.. _Changing Monitor elections: ../change-mon-elections
-And lastly, tell the cluster to enter stretch mode. Here, ``mon.e`` is the
-tiebreaker and we are splitting across data centers ::
+And last, tell the cluster to enter stretch mode. Here, mon.e is the
+tiebreaker and we are splitting across datacenters ::
-$ ceph mon enable_stretch_mode e stretch_rule datacenter
+$ ceph mon enable_stretch_mode e stretch_rule data center
When stretch mode is enabled, the OSDs wlll only take PGs active when
-they peer across datacenters (or whatever other CRUSH bucket type
+they peer across data centers (or whatever other CRUSH bucket type
you specified), assuming both are alive. Pools will increase in size
from the default 3 to 4, expecting 2 copies in each site. OSDs will only
be allowed to connect to monitors in the same data center.
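One way to confirm the effect on an existing pool (the pool name is a placeholder; the expected size follows the text above)::

$ ceph osd pool ls detail             # lists size, min_size and crush_rule for each pool
$ ceph osd pool get mypool size       # expected to report 4 once stretch mode is active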
@@ -149,10 +147,11 @@ refuse, and it will not allow you to create EC pools once in stretch mode.
You must create your own CRUSH rule which provides 2 copies in each site, and
you must use 4 total copies with 2 in each site. If you have existing pools
with non-default size/min_size, Ceph will object when you attempt to
-enable_stretch_mode.
+enable stretch mode.
-Because it runs with min_size 1 when degraded, you should only use stretch mode
-with all-flash OSDs.
+Because it runs with ``min_size 1`` when degraded, you should only use stretch
+mode with all-flash OSDs. This minimizes the time needed to recover once
+connectivity is restored, and thus minimizes the potential for data loss.
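As a hedged illustration of a pool that satisfies the replicated-pool constraints above (the pool name and PG count are placeholders)::

$ ceph osd pool create stretch_pool 64 64 replicated stretch_rule
$ ceph osd pool set stretch_pool size 4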
Hopefully, future development will extend this feature to support EC pools and
running with more than 2 full sites.
@@ -178,4 +177,4 @@ recovered), you can invoke ::
This command should not be necessary; it is included to deal with
unanticipated situations. But you might wish to invoke it to remove
-the HEALTH_WARN state which recovery mode generates.
+the ``HEALTH_WARN`` state which recovery mode generates.
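To see whether that warning is currently raised, standard health commands suffice (the exact warning name may vary by release)::

$ ceph health detail
$ ceph -s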