doc/dev/cache-pool: describe the tiering agent
Signed-off-by: Sage Weil <sage@inktank.com>

parent 06bfdfc9dd
commit f1e3bc9a9b

Purpose
-------

Use a pool of fast storage devices (probably SSDs) and use it as a
cache for an existing larger pool.

Use a replicated pool as a front-end to service most I/O, and destage
cold data to a separate erasure coded pool that does not currently
(and cannot efficiently) handle the workload.

We should be able to create and add a cache pool to an existing pool
of data, and later remove it, without disrupting service or migrating
data around.

Use cases
---------

Read-write pool, writeback
~~~~~~~~~~~~~~~~~~~~~~~~~~

We have an existing data pool and put a fast cache pool "in front" of
it. Writes will go to the cache pool and immediately ack. We flush
them back to the data pool based on the defined policy.

Read-only pool, weak consistency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We have an existing data pool and add one or more read-only cache
pools. We copy data to the cache pool(s) on read. Writes are
forwarded to the original data pool. Stale data is expired from the
cache pools based on the defined policy.

This is likely only useful for specific applications with specific
data access patterns. It may be a match for rgw, for example.

Direct all traffic for foo to foo-hot::

  ceph osd tier set-overlay foo foo-hot

Set the target size and enable the tiering agent for foo-hot::

  ceph osd pool set foo-hot hit_set_type bloom
  ceph osd pool set foo-hot hit_set_count 1

Read-only pools with lazy consistency::

  ceph osd tier cache-mode foo-west readonly
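
Putting the pieces together, a minimal sketch of attaching a
writeback cache tier from scratch; this assumes the base pool foo
already exists, and the pg count of 128 is only illustrative::

  ceph osd pool create foo-hot 128             # fast pool, e.g. on SSDs
  ceph osd tier add foo foo-hot                # attach foo-hot as a tier of foo
  ceph osd tier cache-mode foo-hot writeback   # absorb reads and writes
  ceph osd tier set-overlay foo foo-hot        # direct client traffic for foo to foo-hot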

Tiering agent
-------------

The tiering policy is defined as properties on the cache pool itself.

HitSet metadata
~~~~~~~~~~~~~~~

First, the agent requires HitSet information to be tracked on the
cache pool in order to determine which objects in the pool are being
accessed. This is enabled with::

  ceph osd pool set foo-hot hit_set_type bloom
  ceph osd pool set foo-hot hit_set_count 1
  ceph osd pool set foo-hot hit_set_period 3600   # 1 hour

The supported HitSet types include 'bloom' (a bloom filter, the
default), 'explicit_hash', and 'explicit_object'. The latter two
explicitly enumerate accessed objects and are less memory efficient;
they exist primarily for debugging and to demonstrate the
pluggability of the infrastructure. For the bloom filter type, you
can additionally set the false positive probability (default 0.05)::

  ceph osd pool set foo-hot hit_set_fpp 0.15

The hit_set_count and hit_set_period define how much time each HitSet
should cover, and how many such HitSets to store. Binning accesses
over time allows Ceph to independently determine whether an object was
accessed at least once and whether it was accessed more than once over
some time period ("age" vs "temperature"). Note that the longer the
period and the higher the count, the more RAM will be consumed by the
ceph-osd process. In particular, when the agent is actively flushing
or evicting cache objects, all hit_set_count HitSets are loaded into RAM.
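
As an illustration (the numbers here are arbitrary), four 20-minute
HitSets together retain roughly 80 minutes of access history::

  ceph osd pool set foo-hot hit_set_count 4
  ceph osd pool set foo-hot hit_set_period 1200   # 4 x 20 min ~= 80 min of history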

Currently there is minimal benefit for hit_set_count > 1 since the
agent does not yet act intelligently on that information.

Cache mode
~~~~~~~~~~

The most important policy is the cache mode::

  ceph osd tier cache-mode foo-hot writeback

The supported modes are 'none', 'writeback', 'forward', and
'readonly'. Most installations want 'writeback', which will write
into the cache tier and only later flush updates back to the base
tier. Similarly, any object that is read will be promoted into the
cache tier.

The 'forward' mode is intended for when the cache is being disabled
and needs to be drained. No new objects will be promoted or written
to the cache pool unless they are already present. A background
operation can then do something like::

  rados -p foo-hot cache-try-flush-evict-all
  rados -p foo-hot cache-flush-evict-all

to force all data to be flushed back to the base tier.
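
Building on that, a sketch of the complete drain-and-remove sequence,
assuming foo-hot was configured as the overlay for foo as above::

  ceph osd tier cache-mode foo-hot forward   # stop promoting new objects
  rados -p foo-hot cache-flush-evict-all     # drain the remaining objects
  ceph osd tier remove-overlay foo           # send client traffic directly to foo again
  ceph osd tier remove foo foo-hot           # detach the cache tier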

The 'readonly' mode is intended for read-only workloads that do not
require consistency to be enforced by the storage system. Writes will
be forwarded to the base tier, but objects that are read will get
promoted to the cache. No attempt is made by Ceph to ensure that the
contents of the cache tier(s) are consistent in the presence of object
updates.
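
For example, a second read-only tier could sit alongside the foo-west
pool shown earlier (the foo-east pool here is hypothetical)::

  ceph osd tier add foo foo-east
  ceph osd tier cache-mode foo-east readonly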

Cache sizing
~~~~~~~~~~~~

The agent performs two basic functions: flushing (writing 'dirty'
cache objects back to the base tier) and evicting (removing cold and
clean objects from the cache).

The thresholds at which Ceph will flush or evict objects are specified
relative to a 'target size' of the pool. For example::

  ceph osd pool set foo-hot cache_target_dirty_ratio .4
  ceph osd pool set foo-hot cache_target_full_ratio .8

will begin flushing dirty objects when 40% of the pool is dirty and begin
evicting clean objects when we reach 80% of the target size.
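
Worked through with hypothetical numbers, assuming a 1 TB target
size, those ratios translate to::

  # target_max_bytes = 1000000000000 (1 TB, illustrative)
  #   flushing begins at ~0.4 * 1 TB = 400 GB of dirty data
  #   eviction begins at ~0.8 * 1 TB = 800 GB of data in the cache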

The target size can be specified either in terms of objects or bytes::

  ceph osd pool set foo-hot target_max_bytes 1000000000000   # 1 TB
  ceph osd pool set foo-hot target_max_objects 1000000       # 1 million objects

Note that if both limits are specified, Ceph will begin flushing or
evicting when either threshold is triggered.
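
To see how the cache pool is filling relative to these targets, the
standard usage commands can be consulted, for example::

  ceph df detail
  rados -p foo-hot df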

Other tunables
~~~~~~~~~~~~~~

You can specify a minimum object age before a recently updated object is
flushed to the base tier::

  ceph osd pool set foo-hot cache_min_flush_age 600   # 10 minutes

You can specify the minimum age of an object before it will be evicted from
the cache tier::

  ceph osd pool set foo-hot cache_min_evict_age 1800   # 30 minutes
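
The values set above can be read back with ceph osd pool get, though
which of these keys are supported may vary by Ceph version::

  ceph osd pool get foo-hot cache_min_flush_age
  ceph osd pool get foo-hot cache_min_evict_age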