mirror of
https://github.com/ceph/ceph
synced 2025-01-27 21:44:58 +00:00
c1bd02c978
Wip writeback throttling for cache tiering. This patch adds writeback throttling for cache tiering, similar to what the Linux kernel does for page cache writeback. A parameter 'cache_target_dirty_high_ratio' (default 0.6) is introduced as the high-speed flushing threshold, while 'cache_target_dirty_ratio' (default 0.4) remains the low-speed threshold. The flush speed is controlled by limiting the parallelism of flushing. The maximum parallelism at low speed is half of the parallelism at high speed. If at least one PG has a dirty ratio beyond the high threshold, full speed mode is entered; if no PG has a dirty ratio beyond the low threshold, idle mode is entered; in all other cases, slow speed mode is entered. Signed-off-by: Mingxin Liu <mingxinliu@ubuntukylin.com> Reviewed-by: Li Wang <liwang@ubuntukylin.com> Suggested-by: Nick Fisk <nick@fisk.me.uk> Tested-by: Kefu Chai <kchai@redhat.com>
185 lines
6.1 KiB
ReStructuredText
Cache pool
==========

Purpose
-------

Use a pool of fast storage devices (probably SSDs) as a
cache for an existing slower and larger pool.

Use a replicated pool as a front-end to service most I/O, and destage
cold data to a separate erasure coded pool that does not currently (and
cannot efficiently) handle the workload.

We should be able to create and add a cache pool to an existing pool
of data, and later remove it, without disrupting service or migrating
data around.

Use cases
---------

Read-write pool, writeback
~~~~~~~~~~~~~~~~~~~~~~~~~~

We have an existing data pool and put a fast cache pool "in front" of
it.  Writes go to the cache pool and are acknowledged immediately.  We
flush them back to the data pool based on the defined policy.

Read-only pool, weak consistency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We have an existing data pool and add one or more read-only cache
pools.  We copy data to the cache pool(s) on read.  Writes are
forwarded to the original data pool.  Stale data is expired from the
cache pools based on the defined policy.

This is likely only useful for specific applications with specific
data access patterns.  It may be a match for rgw, for example.


Interface
---------

Set up a read/write cache pool foo-hot for pool foo::

  ceph osd tier add foo foo-hot
  ceph osd tier cache-mode foo-hot writeback

Direct all traffic for foo to foo-hot::

  ceph osd tier set-overlay foo foo-hot

Set the target size and enable the tiering agent for foo-hot::

  ceph osd pool set foo-hot hit_set_type bloom
  ceph osd pool set foo-hot hit_set_count 1
  ceph osd pool set foo-hot hit_set_period 3600   # 1 hour
  ceph osd pool set foo-hot target_max_bytes 1000000000000  # 1 TB

Drain the cache in preparation for turning it off::

  ceph osd tier cache-mode foo-hot forward
  rados -p foo-hot cache-flush-evict-all

When the cache pool is finally empty, disable it::

  ceph osd tier remove-overlay foo
  ceph osd tier remove foo foo-hot

Read-only pools with lazy consistency::

  ceph osd tier add foo foo-east
  ceph osd tier cache-mode foo-east readonly
  ceph osd tier add foo foo-west
  ceph osd tier cache-mode foo-west readonly


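The writeback setup and teardown sequences above can be collected into a small helper for scripting. The following is a minimal sketch, not part of Ceph: the function names ``cache_tier_setup`` and ``cache_tier_teardown`` are hypothetical, and the helpers only build argument lists from the commands shown in this section without running them.

```python
def cache_tier_setup(base, cache, target_max_bytes, hit_set_period=3600):
    """Build the commands that attach `cache` to `base` as a writeback tier.

    Hypothetical helper: it only assembles the command lines listed in the
    Interface section; nothing is executed here.
    """
    return [
        ["ceph", "osd", "tier", "add", base, cache],
        ["ceph", "osd", "tier", "cache-mode", cache, "writeback"],
        ["ceph", "osd", "tier", "set-overlay", base, cache],
        ["ceph", "osd", "pool", "set", cache, "hit_set_type", "bloom"],
        ["ceph", "osd", "pool", "set", cache, "hit_set_count", "1"],
        ["ceph", "osd", "pool", "set", cache, "hit_set_period", str(hit_set_period)],
        ["ceph", "osd", "pool", "set", cache, "target_max_bytes", str(target_max_bytes)],
    ]


def cache_tier_teardown(base, cache):
    """Build the commands that drain `cache` and detach it from `base`."""
    return [
        ["ceph", "osd", "tier", "cache-mode", cache, "forward"],
        ["rados", "-p", cache, "cache-flush-evict-all"],
        ["ceph", "osd", "tier", "remove-overlay", base],
        ["ceph", "osd", "tier", "remove", base, cache],
    ]


if __name__ == "__main__":
    # Print the setup sequence for review instead of executing it.
    for argv in cache_tier_setup("foo", "foo-hot", 10**12):
        print(" ".join(argv))
```

Returning argument lists rather than running the commands keeps the sketch side-effect free; a caller could pass each list to ``subprocess.run`` after reviewing it.
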
Tiering agent
-------------

The tiering policy is defined as properties on the cache pool itself.

HitSet metadata
~~~~~~~~~~~~~~~

First, the agent requires HitSet information to be tracked on the
cache pool in order to determine which objects in the pool are being
accessed.  This is enabled with::

  ceph osd pool set foo-hot hit_set_type bloom
  ceph osd pool set foo-hot hit_set_count 1
  ceph osd pool set foo-hot hit_set_period 3600   # 1 hour

The supported HitSet types include 'bloom' (a bloom filter, the
default), 'explicit_hash', and 'explicit_object'.  The latter two
explicitly enumerate accessed objects and are less memory efficient.
They are there primarily for debugging and to demonstrate pluggability
for the infrastructure.  For the bloom filter type, you can additionally
define the false positive probability for the bloom filter (default is 0.05)::

  ceph osd pool set foo-hot hit_set_fpp 0.15

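To get a feel for the memory trade-off behind hit_set_fpp, the sketch below uses the textbook sizing formula for an optimally configured bloom filter, roughly ``-ln(p) / (ln 2)^2`` bits per inserted element. This is an assumption for illustration only: Ceph's own bloom filter implementation may size itself differently, and the function names are hypothetical.

```python
import math


def bloom_bits_per_element(fpp):
    """Bits per element for an ideal bloom filter with false positive rate `fpp`.

    Textbook estimate only; not taken from the Ceph implementation.
    """
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must be strictly between 0 and 1")
    return -math.log(fpp) / (math.log(2) ** 2)


def bloom_bytes(num_objects, fpp):
    """Approximate filter size in bytes for `num_objects` tracked objects."""
    return math.ceil(num_objects * bloom_bits_per_element(fpp) / 8)


if __name__ == "__main__":
    for fpp in (0.05, 0.15):
        print(fpp, round(bloom_bits_per_element(fpp), 2), bloom_bytes(1_000_000, fpp))
```

Under this estimate, raising the false positive probability from 0.05 to 0.15 lowers the cost from a little over 6 bits to just under 4 bits per tracked object, at the price of more objects being wrongly reported as accessed.
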
The hit_set_period and hit_set_count define, respectively, how much
time each HitSet should cover and how many such HitSets to store.
Binning accesses over time allows Ceph to independently determine
whether an object was accessed at least once and whether it was
accessed more than once over some time period ("age" vs
"temperature").  Note that the longer the period and the higher the
count, the more RAM will be consumed by the ceph-osd process.  In
particular, when the agent is active to flush or evict cache objects,
all hit_set_count HitSets are loaded into RAM.

Currently there is minimal benefit for hit_set_count > 1 since the
agent does not yet act intelligently on that information.

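The distinction between "age" and "temperature" can be illustrated with plain Python sets standing in for HitSets. This is a simplified model under stated assumptions, not the agent's actual computation: ``hit_sets`` is a hypothetical list ordered newest first, each entry covering one hit_set_period, and ``age_and_temperature`` is an illustrative name.

```python
def age_and_temperature(obj, hit_sets, period):
    """Estimate how recently and how often `obj` was accessed.

    hit_sets: list of sets of object names, newest first (illustrative model).
    period:   seconds covered by each HitSet (hit_set_period).
    Returns (age, temperature): age is the offset in seconds of the newest
    HitSet containing the object (None if it appears in none), temperature is
    the number of HitSets that contain it.
    """
    age = None
    temperature = 0
    for index, hit_set in enumerate(hit_sets):
        if obj in hit_set:
            temperature += 1
            if age is None:
                age = index * period
    return age, temperature


if __name__ == "__main__":
    sets = [{"a"}, {"a", "b"}, set()]
    for name in ("a", "b", "c"):
        print(name, age_and_temperature(name, sets, 3600))
```

With a single HitSet (hit_set_count of 1) the temperature can only be 0 or 1, which is one way to see why several bins are needed before the two measures carry different information.
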
Cache mode
~~~~~~~~~~

The most important policy is the cache mode::

  ceph osd tier cache-mode foo-hot writeback

The supported modes are 'none', 'writeback', 'forward', and
'readonly'.  Most installations want 'writeback', which will write
into the cache tier and only later flush updates back to the base
tier.  Similarly, any object that is read will be promoted into the
cache tier.

The 'forward' mode is intended for when the cache is being disabled
and needs to be drained.  No new objects will be promoted or written
to the cache pool unless they are already present.  A background
operation can then do something like::

  rados -p foo-hot cache-try-flush-evict-all
  rados -p foo-hot cache-flush-evict-all

to force all data to be flushed back to the base tier.

The 'readonly' mode is intended for read-only workloads that do not
require consistency to be enforced by the storage system.  Writes will
be forwarded to the base tier, but objects that are read will get
promoted to the cache.  No attempt is made by Ceph to ensure that the
contents of the cache tier(s) are consistent in the presence of object
updates.

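The behaviour of the four modes described above can be summarised as a small decision function. This is a deliberately simplified model, not how the OSD implements tiering: the function ``route`` and its return convention are hypothetical, and the treatment of 'none' as "always use the base tier" is an assumption.

```python
def route(mode, op, in_cache):
    """Return (tier, promote) for one operation under a given cache mode.

    mode:     'none', 'writeback', 'forward' or 'readonly'
    op:       'read' or 'write'
    in_cache: whether the object is already present in the cache tier
    tier is 'cache' or 'base'; promote is True when the object is brought
    into the cache tier as a side effect. Illustrative model only.
    """
    if op not in ("read", "write"):
        raise ValueError("unknown op: %r" % (op,))
    if mode == "none":
        # Assumption: with no cache mode, everything is served by the base tier.
        return ("base", False)
    if mode == "writeback":
        # Reads and writes both land in the cache tier.
        return ("cache", not in_cache)
    if mode == "forward":
        # Only objects already present are served from the cache.
        return ("cache", False) if in_cache else ("base", False)
    if mode == "readonly":
        if op == "write":
            return ("base", False)
        return ("cache", not in_cache)
    raise ValueError("unknown mode: %r" % (mode,))


if __name__ == "__main__":
    for mode in ("none", "writeback", "forward", "readonly"):
        print(mode, route(mode, "read", False), route(mode, "write", False))
```

Note that in the 'readonly' branch a write bypasses the cache even when a copy is present there, which is the source of the weak consistency described above.
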
Cache sizing
~~~~~~~~~~~~

The agent performs two basic functions: flushing (writing 'dirty'
cache objects back to the base tier) and evicting (removing cold and
clean objects from the cache).

The thresholds at which Ceph will flush or evict objects are specified
relative to a 'target size' of the pool.  For example::

  ceph osd pool set foo-hot cache_target_dirty_ratio .4
  ceph osd pool set foo-hot cache_target_dirty_high_ratio .6
  ceph osd pool set foo-hot cache_target_full_ratio .8

will begin flushing dirty objects when 40% of the pool is dirty, switch
to high-speed flushing when 60% of the pool is dirty, and begin
evicting clean objects when we reach 80% of the target size.

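The commit description at the top of this page explains how the two dirty ratios select a flush speed: full speed if any PG is beyond the high threshold, idle if no PG is beyond the low threshold, and slow speed otherwise, with slow speed limited to half the flush parallelism. A minimal sketch of that rule follows. The function names and the ``max_flush_ops`` parameter are hypothetical, and treating "beyond" as strictly greater than is an assumption.

```python
def flush_mode(pg_dirty_ratios, low=0.4, high=0.6):
    """Pick 'idle', 'low' or 'high' from the per-PG dirty ratios.

    low / high mirror cache_target_dirty_ratio and
    cache_target_dirty_high_ratio. Illustrative sketch only.
    """
    ratios = list(pg_dirty_ratios)
    if any(r > high for r in ratios):
        return "high"
    if any(r > low for r in ratios):
        return "low"
    return "idle"


def max_parallel_flushes(mode, max_flush_ops):
    """Flush parallelism for a mode; low speed gets half of high speed.

    max_flush_ops is a hypothetical upper bound on concurrent flushes.
    """
    if mode == "high":
        return max_flush_ops
    if mode == "low":
        return max_flush_ops // 2
    return 0


if __name__ == "__main__":
    for ratios in ([0.1, 0.2], [0.1, 0.5], [0.1, 0.7]):
        mode = flush_mode(ratios)
        print(ratios, mode, max_parallel_flushes(mode, 8))
```

A single heavily dirty PG is enough to enter high-speed mode in this model, even when every other PG is below the low threshold.
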
The target size can be specified either in terms of objects or bytes::

  ceph osd pool set foo-hot target_max_bytes 1000000000000  # 1 TB
  ceph osd pool set foo-hot target_max_objects 1000000       # 1 million objects

Note that if both limits are specified, Ceph will begin flushing or
evicting when either threshold is triggered.

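The "either threshold" rule can be sketched by computing how full the pool is against each configured limit and taking the larger value. This is an illustrative model: the function names are hypothetical, and treating a limit of 0 as "not set" is an assumption made for the example.

```python
def usage_fraction(used_bytes, used_objects,
                   target_max_bytes=0, target_max_objects=0):
    """Fullness relative to the target size, using whichever limit is tighter.

    A limit of 0 is treated as unset (assumption for this sketch).
    """
    fractions = []
    if target_max_bytes:
        fractions.append(used_bytes / target_max_bytes)
    if target_max_objects:
        fractions.append(used_objects / target_max_objects)
    return max(fractions) if fractions else 0.0


def should_evict(used_bytes, used_objects,
                 target_max_bytes=0, target_max_objects=0, full_ratio=0.8):
    """True once either limit reaches cache_target_full_ratio."""
    fraction = usage_fraction(used_bytes, used_objects,
                              target_max_bytes, target_max_objects)
    return fraction >= full_ratio


if __name__ == "__main__":
    # Half full by bytes, but 90% full by object count: eviction is triggered.
    print(should_evict(500_000_000_000, 900_000, 10**12, 10**6))
```

In the usage line, the byte limit alone would not trigger eviction, but the object limit does, so a pool holding many small objects can start evicting well before it reaches target_max_bytes.
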
Other tunables
~~~~~~~~~~~~~~

You can specify a minimum object age before a recently updated object is
flushed to the base tier::

  ceph osd pool set foo-hot cache_min_flush_age 600   # 10 minutes

You can specify the minimum age of an object before it will be evicted from
the cache tier::

  ceph osd pool set foo-hot cache_min_evict_age 1800   # 30 minutes

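The two age tunables act as eligibility filters on top of the sizing thresholds. The sketch below shows one plausible reading, assuming that only dirty objects are flushed, only clean objects are evicted, and that the caller supplies the relevant age in seconds; the function names are hypothetical and the exact clock the agent uses is not specified here.

```python
def may_flush(is_dirty, seconds_since_update, cache_min_flush_age=600):
    """A dirty object is flushable once it has gone unmodified long enough."""
    return is_dirty and seconds_since_update >= cache_min_flush_age


def may_evict(is_dirty, object_age_seconds, cache_min_evict_age=1800):
    """A clean object is evictable once it is older than the minimum age."""
    return (not is_dirty) and object_age_seconds >= cache_min_evict_age


if __name__ == "__main__":
    print(may_flush(True, 120))    # updated 2 minutes ago: too recent
    print(may_flush(True, 900))    # updated 15 minutes ago: eligible
    print(may_evict(False, 3600))  # clean and one hour old: eligible
```
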