Merge pull request #46270 from anthonyeleven/anthonyeleven/clarify-min-alloc-size

zdover23 2022-05-16 17:12:59 +10:00 committed by GitHub
commit 058a0ee89a
GPG Key ID: 4AEE18F83AFDEB23

@ -395,16 +395,13 @@ is created on an HDD, BlueStore will be initialized with the current value
of :confval:`bluestore_min_alloc_size_hdd`, and SSD OSDs (including NVMe devices)
with the value of :confval:`bluestore_min_alloc_size_ssd`.
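
Assuming the centralized configuration database (available since Mimic) is in use, one way to see which values a newly created OSD would pick up is to query both options at the ``osd`` level. This is a sketch: it does not reflect per-host or per-OSD overrides, nor settings made only in a local ``ceph.conf``.

.. prompt:: bash #

   ceph config get osd bluestore_min_alloc_size_hdd
   ceph config get osd bluestore_min_alloc_size_ssd
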

Through the Mimic release, the default values were 64KB and 16KB for rotational
(HDD) and non-rotational (SSD) media respectively. Octopus changed the default
for SSD (non-rotational) media to 4KB, and Pacific changed the default for HDD
(rotational) media to 4KB as well.

These changes were driven by space amplification experienced by Ceph RADOS
Gateway (RGW) deployments that host large numbers of small files
(S3/Swift objects).
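
The arithmetic behind this amplification can be sketched with a simplified model. It assumes each object's data is rounded up to whole allocation units of ``min_alloc_size`` bytes and ignores replication, metadata, compression, and how RGW lays objects out in RADOS. The function names are hypothetical.

.. code-block:: bash

   #!/usr/bin/env bash
   # Simplified model (assumption): data is rounded up to whole allocation
   # units; replication, metadata, compression and RGW's layout of objects
   # in RADOS are ignored. Function names are hypothetical.

   # Bytes allocated for <size> bytes of data with a <unit>-byte allocation unit.
   alloc_bytes() {
       local size=$1 unit=$2
       echo $(( (size + unit - 1) / unit * unit ))
   }

   # Allocated bytes divided by data bytes (integer; size must be non-zero).
   amplification() {
       local size=$1 unit=$2
       echo $(( $(alloc_bytes "$size" "$unit") / size ))
   }

   # Compare the old HDD (64KB), old SSD (16KB) and current (4KB) defaults
   # for a 1KB object.
   for unit in 65536 16384 4096; do
       echo "min_alloc_size=$unit: $(alloc_bytes 1024 "$unit") bytes allocated," \
            "about $(amplification 1024 "$unit")x"
   done

Under this model a 1KB object occupies 64KB with the old HDD default (about 64x) but only 4KB with the current default (about 4x), before replication multiplies both.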
For example, when an RGW client stores a 1KB S3 object, it is written to a
@ -446,12 +443,36 @@ the :confval:`bluestore_use_optimal_io_size_for_min_alloc_size`
option that enables automatic discovery of the appropriate value as each OSD is
created. Note that the use of ``bcache``, ``OpenCAS``, ``dmcrypt``,
``ATA over Ethernet``, ``iSCSI``, or other device layering / abstraction
technologies may confound the determination of appropriate values. OSD devices
deployed on top of VMware vSAN virtual volumes have also been reported to
sometimes present a ``rotational`` attribute that does not match the underlying
hardware.

We suggest inspecting such OSDs at startup via logs and admin sockets to ensure
that their behavior is appropriate. Note that automatic discovery may also not
work as desired with older kernels. You can check for this by examining the
presence and value of ``/sys/block/<drive>/queue/optimal_io_size``.
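
A minimal sketch of that check, reading ``rotational`` and ``optimal_io_size`` from a device's ``queue`` directory. The function names and the ``SYSFS_ROOT`` override (default ``/sys``) are hypothetical, added so the functions can be exercised against a copy of the tree; a missing file is reported as ``absent``. The note about which option applies reflects only the default HDD / SSD selection described above and may not hold for layered devices.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helpers: report what the kernel exposes for a block device.
   # SYSFS_ROOT (default /sys) is an assumed override for testing on a copy.

   # Print one attribute from <root>/block/<dev>/queue/, or "absent" if the
   # file does not exist or cannot be read.
   queue_attr() {
       local dev=$1 attr=$2
       local path="${SYSFS_ROOT:-/sys}/block/$dev/queue/$attr"
       if [ -r "$path" ]; then
           cat "$path"
       else
           echo "absent"
       fi
   }

   # Summarise the rotational flag and optimal I/O size for one device.
   describe_dev() {
       local dev=$1 rot opt kind
       rot=$(queue_attr "$dev" rotational)
       opt=$(queue_attr "$dev" optimal_io_size)
       case $rot in
           1) kind="rotational (HDD option applies by default)" ;;
           0) kind="non-rotational (SSD option applies by default)" ;;
           *) kind="rotational flag unknown" ;;
       esac
       echo "$dev: $kind, optimal_io_size=$opt"
   }

   # Example: describe_dev sda

Comparing this output against the hardware actually installed is one way to spot a layered device that misreports its media type.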

You may also inspect a given OSD:

.. prompt:: bash #

   ceph osd metadata osd.1701 | grep rotational


Space amplification caused by an unexpectedly large ``min_alloc_size`` may
manifest as an unusually high ratio of raw to stored data reported by
``ceph df``. ``ceph osd df`` may also report anomalously high ``%USE`` /
``VAR`` values when compared to other, ostensibly identical OSDs. A pool using
OSDs with mismatched ``min_alloc_size`` values may experience unexpected
balancer behavior as well.
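
As a rough sketch, the ratio can be computed from two byte counts read off ``ceph df``. Which fields to use, and their names in ``--format json`` output, vary by release, so the hypothetical function below simply takes the numbers as arguments. For a replicated pool of size N the expected ratio is roughly N, and for an erasure-coded pool with k data and m coding chunks it is roughly (k+m)/k; values well above that may point to allocation overhead, among other causes.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: raw bytes consumed divided by logical bytes stored,
   # printed with two decimals. Byte counts are read off `ceph df` by hand.
   raw_to_stored() {
       local raw=$1 stored=$2
       awk -v r="$raw" -v s="$stored" 'BEGIN { printf "%.2f\n", r / s }'
   }

   # Hypothetical numbers: 4 TiB raw used for 1 TiB stored in a 3x pool.
   ratio=$(raw_to_stored 4398046511104 1099511627776)
   echo "raw/stored = $ratio (roughly 3.00 expected for 3x replication)"

With these illustrative numbers the ratio is 4.00, about a third higher than 3x replication alone would explain.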

Note that this BlueStore attribute takes effect *only* at OSD creation; if
changed later, a given OSD's behavior will not change unless and until it is
destroyed and redeployed with the appropriate option value(s). Upgrading
to a later Ceph release will *not* change the value used by OSDs deployed
under older releases or with other settings.

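
As an illustrative sketch, assuming the centralized configuration database is in use, the option can be set before an OSD is destroyed and redeployed so that the new OSD picks it up. The value ``4096`` (4KB, matching the Pacific HDD default) is only an example, and the destroy / redeploy steps themselves depend on the deployment tool and are not shown.

.. prompt:: bash #

   ceph config set osd bluestore_min_alloc_size_hdd 4096
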
.. confval:: bluestore_min_alloc_size
.. confval:: bluestore_min_alloc_size_hdd
.. confval:: bluestore_min_alloc_size_ssd