doc/dev: remove cache-pool

Remove cache-pool, which is deprecated in Reef.

Signed-off-by: Zac Dover <zac.dover@proton.me>

parent: cdf5c2d5fc
commit: 073b0919ed

doc/dev
@@ -1,200 +0,0 @@
Cache pool
==========

Purpose
-------

Use a pool of fast storage devices (probably SSDs) and use it as a
cache for an existing slower and larger pool.

Use a replicated pool as a front-end to service most I/O, and destage
cold data to a separate erasure coded pool that does not currently (and
cannot efficiently) handle the workload.

We should be able to create and add a cache pool to an existing pool
of data, and later remove it, without disrupting service or migrating
data around.

Use cases
---------

Read-write pool, writeback
~~~~~~~~~~~~~~~~~~~~~~~~~~

We have an existing data pool and put a fast cache pool "in front" of
it.  Writes will go to the cache pool and immediately ack.  We flush
them back to the data pool based on the defined policy.

Read-only pool, weak consistency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We have an existing data pool and add one or more read-only cache
pools.  We copy data to the cache pool(s) on read.  Writes are
forwarded to the original data pool.  Stale data is expired from the
cache pools based on the defined policy.

This is likely only useful for specific applications with specific
data access patterns.  It may be a match for rgw, for example.
Interface
---------

Set up a read/write cache pool foo-hot for pool foo::

  ceph osd tier add foo foo-hot
  ceph osd tier cache-mode foo-hot writeback

Direct all traffic for foo to foo-hot::

  ceph osd tier set-overlay foo foo-hot
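To confirm that the tier and overlay are in place, one option (an added
check, not part of the original text, using standard Ceph commands) is to
inspect the pool metadata::

  ceph osd pool ls detail     # foo should list foo-hot as its tier and overlay
  ceph osd dump | grep pool   # shows per-pool tier and cache-mode settings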
Set the target size and enable the tiering agent for foo-hot::

  ceph osd pool set foo-hot hit_set_type bloom
  ceph osd pool set foo-hot hit_set_count 1
  ceph osd pool set foo-hot hit_set_period 3600              # 1 hour
  ceph osd pool set foo-hot target_max_bytes 1000000000000   # 1 TB
  ceph osd pool set foo-hot min_read_recency_for_promote 1
  ceph osd pool set foo-hot min_write_recency_for_promote 1
Drain the cache in preparation for turning it off::

  ceph osd tier cache-mode foo-hot forward
  rados -p foo-hot cache-flush-evict-all

When the cache pool is finally empty, disable it::

  ceph osd tier remove-overlay foo
  ceph osd tier remove foo foo-hot
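Before removing the tier, it is worth verifying that the cache pool really
is empty (a suggested check, not part of the original text)::

  rados -p foo-hot ls   # should return no objects once the cache has drained
  ceph df               # foo-hot should report zero objects stored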
Read-only pools with lazy consistency::

  ceph osd tier add foo foo-east
  ceph osd tier cache-mode foo-east readonly
  ceph osd tier add foo foo-west
  ceph osd tier cache-mode foo-west readonly
Tiering agent
-------------

The tiering policy is defined as properties on the cache pool itself.

HitSet metadata
~~~~~~~~~~~~~~~
First, the agent requires HitSet information to be tracked on the
cache pool in order to determine which objects in the pool are being
accessed.  This is enabled with::

  ceph osd pool set foo-hot hit_set_type bloom
  ceph osd pool set foo-hot hit_set_count 1
  ceph osd pool set foo-hot hit_set_period 3600   # 1 hour

The supported HitSet types include 'bloom' (a bloom filter, the
default), 'explicit_hash', and 'explicit_object'.  The latter two
explicitly enumerate accessed objects and are less memory efficient.
They are there primarily for debugging and to demonstrate pluggability
for the infrastructure.  For the bloom filter type, you can additionally
define the false positive probability for the bloom filter (default is 0.05)::

  ceph osd pool set foo-hot hit_set_fpp 0.15

The hit_set_count and hit_set_period define how much time each HitSet
should cover, and how many such HitSets to store.  Binning accesses
over time allows Ceph to independently determine whether an object was
accessed at least once and whether it was accessed more than once over
some time period ("age" vs "temperature").
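For example (an illustrative addition, not part of the original text),
keeping several shorter HitSets lets the agent tell an object read once an
hour ago apart from one read repeatedly over the last hour::

  ceph osd pool set foo-hot hit_set_count 4      # keep the 4 most recent HitSets
  ceph osd pool set foo-hot hit_set_period 900   # each HitSet covers 15 minutes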
``min_read_recency_for_promote`` defines how many HitSets to check for the
existence of an object when handling a read operation.  The result is used
to decide whether to promote the object asynchronously.  Its value should be
between 0 and ``hit_set_count``.  If it is set to 0, the object is always
promoted.  If it is set to 1, only the current HitSet is checked, and the
object is promoted only if it is found there.  For larger values, that many
of the most recent HitSets (the current one plus archived ones) are checked,
and the object is promoted if it is found in any of them.

A similar parameter, ``min_write_recency_for_promote``, applies to write
operations. ::

  ceph osd pool set {cachepool} min_read_recency_for_promote 1
  ceph osd pool set {cachepool} min_write_recency_for_promote 1

Note that the longer the ``hit_set_period`` and the higher the
``min_read_recency_for_promote``/``min_write_recency_for_promote``, the more
RAM the ceph-osd process consumes.  In particular, when the agent is active
and flushing or evicting cache objects, all ``hit_set_count`` HitSets are
loaded into RAM.
Cache mode
~~~~~~~~~~

The most important policy is the cache mode::

  ceph osd tier cache-mode foo-hot writeback

The supported modes are 'none', 'writeback', 'forward', and
'readonly'.  Most installations want 'writeback', which will write
into the cache tier and only later flush updates back to the base
tier.  Similarly, any object that is read will be promoted into the
cache tier.
The 'forward' mode is intended for when the cache is being disabled
and needs to be drained.  No new objects will be promoted or written
to the cache pool unless they are already present.  A background
operation can then do something like::

  rados -p foo-hot cache-try-flush-evict-all
  rados -p foo-hot cache-flush-evict-all

to force all data to be flushed back to the base tier.
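Note (an addition, not part of the original text): on later Ceph releases,
switching a tier into 'forward' mode is considered unusual and the command
may require an explicit confirmation flag::

  ceph osd tier cache-mode foo-hot forward --yes-i-really-mean-it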
The 'readonly' mode is intended for read-only workloads that do not
require consistency to be enforced by the storage system.  Writes will
be forwarded to the base tier, but objects that are read will get
promoted to the cache.  No attempt is made by Ceph to ensure that the
contents of the cache tier(s) are consistent in the presence of object
updates.
Cache sizing
~~~~~~~~~~~~

The agent performs two basic functions: flushing (writing 'dirty'
cache objects back to the base tier) and evicting (removing cold and
clean objects from the cache).

The thresholds at which Ceph will flush or evict objects are specified
relative to a 'target size' of the pool.  For example::

  ceph osd pool set foo-hot cache_target_dirty_ratio .4
  ceph osd pool set foo-hot cache_target_dirty_high_ratio .6
  ceph osd pool set foo-hot cache_target_full_ratio .8

will begin flushing dirty objects when 40% of the pool is dirty, flush
them more aggressively once 60% is dirty, and begin evicting clean
objects when we reach 80% of the target size.
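To make the arithmetic concrete (an illustrative addition): with a target
size of 1 TB, these ratios mean that flushing starts at roughly 400 GB of
dirty data, high-speed flushing at roughly 600 GB, and eviction of clean
objects once the cache holds roughly 800 GB in total.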
The target size can be specified either in terms of objects or bytes::

  ceph osd pool set foo-hot target_max_bytes 1000000000000   # 1 TB
  ceph osd pool set foo-hot target_max_objects 1000000       # 1 million objects

Note that if both limits are specified, Ceph will begin flushing or
evicting when either threshold is triggered.
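To watch how full the cache is relative to these targets (a suggested
check, not part of the original text), per-pool usage can be monitored
with::

  ceph df detail   # per-pool space and object counts, including the cache pool
  rados df         # alternative per-pool usage summary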
Other tunables
~~~~~~~~~~~~~~

You can specify a minimum object age before a recently updated object is
flushed to the base tier::

  ceph osd pool set foo-hot cache_min_flush_age 600   # 10 minutes

You can specify the minimum age of an object before it will be evicted from
the cache tier::

  ceph osd pool set foo-hot cache_min_evict_age 1800   # 30 minutes
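All of the tunables above can be read back the same way they are set (a
hedged addition using the standard ``ceph osd pool get`` command)::

  ceph osd pool get foo-hot cache_min_flush_age
  ceph osd pool get foo-hot cache_target_dirty_ratio
  ceph osd pool get foo-hot all   # dump all gettable options for the pool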
@@ -289,40 +289,6 @@ This seems complicated, but it gets us two valuable properties:
All clone operations will need to consider adjacent ``chunk_maps``
when adding or removing references.

Cache/Tiering
-------------

There already exists a cache/tiering mechanism based on whiteouts.
One goal here should ultimately be for this manifest machinery to
provide a complete replacement.

See ``cache-pool.rst``.

The manifest machinery already shares some code paths with the
existing cache/tiering code, mainly ``stat_flush``.

In no particular order, here's an incomplete list of things that need
to be wired up to provide feature parity:
* Online object access information: The osd already has pool configs
  for maintaining bloom filters which provide estimates of access
  recency for objects.  We probably need to modify this to permit
  hitset maintenance for a normal pool -- there are already
  ``CEPH_OSD_OP_PG_HITSET*`` interfaces for querying them.
* Tiering agent: The osd already has a background tiering agent which
  would need to be modified to instead flush and evict using
  manifests.
* Use existing features regarding the cache flush policy, such as:

  - hitset
  - age, ratio, bytes

* Add tiering-mode to ``manifest-tiering``

  - Writeback
  - Read-only


Data Structures
===============