Merge PR #31729 into master

* refs/pull/31729/head:
	qa: reduce cache size further
	mds: obsoleting 'mds_cache_size'

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
commit 2216c63ed5
@@ -216,6 +216,10 @@
 * The format of MDSs in `ceph fs dump` has changed.

+* The ``mds_cache_size`` config option is completely removed. Since luminous,
+  the ``mds_cache_memory_limit`` config option has been preferred to configure
+  the MDS's cache limits.
+
 * The ``pg_autoscale_mode`` is now set to ``on`` by default for newly
   created pools, which means that Ceph will automatically manage the
   number of PGs. To change this behavior, or to learn more about PG
@@ -69,10 +69,9 @@ performance is very different for workloads whose metadata fits within
 that cache.

 If your workload has more files than fit in your cache (configured using
-``mds_cache_memory_limit`` or ``mds_cache_size`` settings), then
-make sure you test it appropriately: don't test your system with a small
-number of files and then expect equivalent performance when you move
-to a much larger number of files.
+``mds_cache_memory_limit`` settings), then make sure you test it
+appropriately: don't test your system with a small number of files and then
+expect equivalent performance when you move to a much larger number of files.

 Do you need a file system?
 --------------------------
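A note on the doc advice above: it is easy to under-test this. As a rough, hypothetical sketch (the mount path, directory layout, and file count below are illustrative, not part of this commit), a metadata benchmark should create enough files to exceed the configured cache before measuring:

    import os

    # Hypothetical benchmark setup: create far more files than fit in the
    # MDS cache before measuring metadata performance.
    def make_file_set(root, count):
        for i in range(count):
            # Spread files over subdirectories so one directory doesn't dominate.
            d = os.path.join(root, "dir%03d" % (i % 100))
            os.makedirs(d, exist_ok=True)
            with open(os.path.join(d, "f%08d" % i), "w") as f:
                f.write("x")

    # With mds_cache_memory_limit at its 1 GiB default, a few hundred
    # thousand files is typically enough to overflow the cache.
    make_file_set("/mnt/cephfs/bench", 500_000)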
@@ -5,10 +5,9 @@ This section describes ways to limit MDS cache size.

 You can limit the size of the Metadata Server (MDS) cache by:

-* *A memory limit*: A new behavior introduced in the Luminous release. Use the `mds_cache_memory_limit` parameter. We recommend using memory limits instead of inode count limits.
-* *Inode count*: Use the `mds_cache_size` parameter. By default, limiting the MDS cache by inode count is disabled.
+* *A memory limit*: A new behavior introduced in the Luminous release. Use the `mds_cache_memory_limit` parameter.

-In addition, you can specify a cache reservation by using the `mds_cache_reservation` parameter for MDS operations. The cache reservation is limited as a percentage of the memory or inode limit and is set to 5% by default. The intent of this parameter is to have the MDS maintain an extra reserve of memory for its cache for new metadata operations to use. As a consequence, the MDS should in general operate below its memory limit because it will recall old state from clients in order to drop unused metadata in its cache.
+In addition, you can specify a cache reservation by using the `mds_cache_reservation` parameter for MDS operations. The cache reservation is limited as a percentage of the memory limit and is set to 5% by default. The intent of this parameter is to have the MDS maintain an extra reserve of memory for its cache for new metadata operations to use. As a consequence, the MDS should in general operate below its memory limit because it will recall old state from clients in order to drop unused metadata in its cache.

 The `mds_cache_reservation` parameter replaces the `mds_health_cache_threshold` in all situations except when an MDS node sends a health alert to the Monitors indicating the cache is too large. By default, `mds_health_cache_threshold` is 150% of the maximum cache size.
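To make the reservation concrete, here is a back-of-the-envelope sketch using the defaults quoted above (a 1 GiB limit with a 5% reservation); the variable names mirror the config options, but the calculation itself is only illustrative:

    # Defaults quoted above: 1 GiB memory limit, 5% cache reservation.
    mds_cache_memory_limit = 1 << 30
    mds_cache_reservation = 0.05

    # The MDS begins recalling client state and trimming once its cache
    # crosses the reserved watermark, not the hard limit itself.
    trim_watermark = mds_cache_memory_limit * (1.0 - mds_cache_reservation)
    print("start trimming above %.1f MiB" % (trim_watermark / 2**20))  # 972.8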
@@ -75,11 +75,11 @@ Message: "Client *name* failing to respond to cache pressure"
 Code: MDS_HEALTH_CLIENT_RECALL, MDS_HEALTH_CLIENT_RECALL_MANY
 Description: Clients maintain a metadata cache. Items (such as inodes) in the
 client cache are also pinned in the MDS cache, so when the MDS needs to shrink
-its cache (to stay within ``mds_cache_size`` or ``mds_cache_memory_limit``), it
-sends messages to clients to shrink their caches too. If the client is
-unresponsive or buggy, this can prevent the MDS from properly staying within
-its cache limits and it may eventually run out of memory and crash. This
-message appears if a client has failed to release more than
+its cache (to stay within ``mds_cache_memory_limit``), it sends messages to
+clients to shrink their caches too. If the client is unresponsive or buggy,
+this can prevent the MDS from properly staying within its cache limits and it
+may eventually run out of memory and crash. This message appears if a client
+has failed to release more than
 ``mds_recall_warning_threshold`` capabilities (decaying with a half-life of
 ``mds_recall_max_decay_rate``) within the last
 ``mds_recall_warning_decay_rate`` seconds.
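The "decaying with a half-life" phrasing describes an exponentially decaying counter: recalls the client fails to honor add to it, and the total bleeds off over time, so only sustained misbehavior trips ``mds_recall_warning_threshold``. A minimal Python sketch of the idea (the real implementation is Ceph's C++ DecayCounter; the half-life and event rate below are illustrative):

    class DecayingCounter:
        """Toy exponential-decay counter; Ceph's real one is DecayCounter (C++)."""
        def __init__(self, half_life):
            self.half_life = half_life
            self.value = 0.0
            self.last = 0.0

        def hit(self, now, amount=1.0):
            # Decay what has accumulated so far, then add the new event.
            elapsed = now - self.last
            self.value *= 0.5 ** (elapsed / self.half_life)
            self.value += amount
            self.last = now

    c = DecayingCounter(half_life=60.0)
    for t in range(0, 300, 10):   # one ignored recall every 10 seconds
        c.hit(float(t))
    print("decayed recall count: %.1f" % c.value)   # ~8.9, well below the raw 30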
@@ -126,6 +126,6 @@ Code: MDS_HEALTH_CACHE_OVERSIZED
 Description: The MDS is not succeeding in trimming its cache to comply with the
 limit set by the administrator. If the MDS cache becomes too large, the daemon
 may exhaust available memory and crash. By default, this message appears if
-the actual cache size (in inodes or memory) is at least 50% greater than
-``mds_cache_size`` (default 100000) or ``mds_cache_memory_limit`` (default
-1GB). Modify ``mds_health_cache_threshold`` to set the warning ratio.
+the actual cache size (in memory) is at least 50% greater than
+``mds_cache_memory_limit`` (default 1GB). Modify ``mds_health_cache_threshold``
+to set the warning ratio.
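Expressed as a check, the warning condition above amounts to a single comparison. A sketch (the authoritative test is ``MDCache::cache_overfull()``, shown later in this diff):

    def cache_oversized(cache_bytes, limit_bytes, mds_health_cache_threshold=1.5):
        # The default mds_health_cache_threshold of 1.5 means: warn once the
        # cache is at least 50% over mds_cache_memory_limit.
        return cache_bytes > limit_bytes * mds_health_cache_threshold

    print(cache_oversized(1600 * 2**20, 1 << 30))   # True: 1600 MiB > 1536 MiB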
@@ -5,9 +5,8 @@
 ``mds cache memory limit``

 :Description: The memory limit the MDS should enforce for its cache.
-              Administrators should use this instead of ``mds cache size``.
 :Type: 64-bit Integer Unsigned
-:Default: ``1073741824``
+:Default: ``1G``

 ``mds cache reservation``
@@ -18,14 +17,6 @@
 :Type: Float
 :Default: ``0.05``

-``mds cache size``
-
-:Description: The number of inodes to cache. A value of 0 indicates an
-              unlimited number. It is recommended to use
-              ``mds_cache_memory_limit`` to limit the amount of memory the MDS
-              cache uses.
-:Type: 32-bit Integer
-:Default: ``0``
-
 ``mds cache mid``
@@ -38,9 +38,9 @@ specific clients as misbehaving, you should investigate why they are doing so.

 Generally it will be the result of

-#. Overloading the system (if you have extra RAM, increase the "mds cache size"
-   config from its default 100000; having a larger active file set than your MDS
-   cache is the #1 cause of this!).
+#. Overloading the system (if you have extra RAM, increase the
+   "mds cache memory limit" config from its default 1GiB; having a larger active
+   file set than your MDS cache is the #1 cause of this!).

 #. Running an older (misbehaving) client.
@@ -144,7 +144,7 @@ These sections include:
           the Ceph Storage Cluster, and override the same setting in
           ``global``.

-:Example: ``mds_cache_size = 10G``
+:Example: ``mds_cache_memory_limit = 10G``

 ``client``
@@ -38,15 +38,18 @@ class TestClientLimits(CephFSTestCase):
         :param use_subdir: whether to put test files in a subdir or use root
         """

-        cache_size = open_files/2
+        # Set the MDS cache memory limit to a low value that will make the
+        # MDS ask the client to trim the caps.
+        cache_memory_limit = "1K"

-        self.set_conf('mds', 'mds cache size', cache_size)
+        self.set_conf('mds', 'mds_cache_memory_limit', cache_memory_limit)
         self.set_conf('mds', 'mds_recall_max_caps', open_files/2)
         self.set_conf('mds', 'mds_recall_warning_threshold', open_files)
         self.fs.mds_fail_restart()
         self.fs.wait_for_daemons()

         mds_min_caps_per_client = int(self.fs.get_config("mds_min_caps_per_client"))
+        mds_max_caps_per_client = int(self.fs.get_config("mds_max_caps_per_client"))
         mds_recall_warning_decay_rate = self.fs.get_config("mds_recall_warning_decay_rate")
         self.assertTrue(open_files >= mds_min_caps_per_client)
@@ -87,7 +90,7 @@ class TestClientLimits(CephFSTestCase):
             num_caps = self.get_session(mount_a_client_id)['num_caps']
             if num_caps <= mds_min_caps_per_client:
                 return True
-            elif num_caps < cache_size:
+            elif num_caps <= mds_max_caps_per_client:
                 return True
             else:
                 return False
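Note the predicate's shape: since ``mds_min_caps_per_client`` is no larger than ``mds_max_caps_per_client``, the two accepting branches collapse to one comparison. A standalone sketch of the new check (the helper name is ours, not the test's):

    def caps_trimmed(num_caps, mds_min_caps_per_client, mds_max_caps_per_client):
        # Trimming has succeeded once the client's cap count falls to or below
        # the per-client maximum; the min-caps branch is subsumed by it because
        # mds_min_caps_per_client <= mds_max_caps_per_client.
        return num_caps <= mds_max_caps_per_client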
@@ -7557,11 +7557,6 @@ std::vector<Option> get_mds_options() {
     .set_description("interval in seconds between heap releases")
     .set_flag(Option::FLAG_RUNTIME),

-    Option("mds_cache_size", Option::TYPE_INT, Option::LEVEL_ADVANCED)
-    .set_default(0)
-    .set_description("maximum number of inodes in MDS cache (<=0 is unlimited)")
-    .set_long_description("This tunable is no longer recommended. Use mds_cache_memory_limit."),
-
     Option("mds_cache_memory_limit", Option::TYPE_SIZE, Option::LEVEL_BASIC)
     .set_default(1*(1LL<<30))
     .set_description("target maximum memory usage of MDS cache")
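``Option::TYPE_SIZE`` is what lets the remaining limit be written with the human-readable suffixes used throughout this commit (``1K``, ``100M``, ``2G``). A Python approximation of that parsing, for illustration only (not Ceph's actual parser):

    def parse_size(s):
        # IEC-style suffixes, as in this commit's examples.
        units = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}
        s = s.strip().upper()
        if s and s[-1] in units:
            return int(float(s[:-1]) * units[s[-1]])
        return int(s)

    assert parse_size("1G") == 1 * (1 << 30)    # matches set_default(1*(1LL<<30))
    assert parse_size("100M") == 100 * (1 << 20)
    assert parse_size("1K") == 1024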
@@ -147,7 +147,6 @@ MDCache::MDCache(MDSRank *m, PurgeQueue &purge_queue_) :
     (g_conf()->mds_dir_max_commit_size << 20) :
     (0.9 *(g_conf()->osd_max_write_size << 20));

-  cache_inode_limit = g_conf().get_val<int64_t>("mds_cache_size");
   cache_memory_limit = g_conf().get_val<Option::size_t>("mds_cache_memory_limit");
   cache_reservation = g_conf().get_val<double>("mds_cache_reservation");
   cache_health_threshold = g_conf().get_val<double>("mds_health_cache_threshold");
@@ -212,8 +211,6 @@ MDCache::~MDCache()

 void MDCache::handle_conf_change(const std::set<std::string>& changed, const MDSMap& mdsmap)
 {
-  if (changed.count("mds_cache_size"))
-    cache_inode_limit = g_conf().get_val<int64_t>("mds_cache_size");
   if (changed.count("mds_cache_memory_limit"))
     cache_memory_limit = g_conf().get_val<Option::size_t>("mds_cache_memory_limit");
   if (changed.count("mds_cache_reservation"))
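The pattern here is worth spelling out: ``handle_conf_change`` receives the set of keys that changed at runtime and re-reads only those, so dropping ``mds_cache_size`` also drops one tracked key (see the ``get_tracked_conf_keys`` hunk below). A hypothetical Python mirror of the same observer shape:

    class CacheConfigObserver:
        """Hypothetical mirror of the C++ observer above, for illustration."""
        TRACKED_KEYS = ("mds_cache_memory_limit", "mds_cache_reservation")

        def __init__(self, conf):
            self.conf = conf
            self.cache_memory_limit = conf["mds_cache_memory_limit"]
            self.cache_reservation = conf["mds_cache_reservation"]

        def handle_conf_change(self, changed):
            # Re-read only the keys that actually changed, as the MDS does.
            if "mds_cache_memory_limit" in changed:
                self.cache_memory_limit = self.conf["mds_cache_memory_limit"]
            if "mds_cache_reservation" in changed:
                self.cache_reservation = self.conf["mds_cache_reservation"]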
@@ -232,7 +229,6 @@ void MDCache::handle_conf_change(const std::set<std::string>& changed, const MDSMap& mdsmap)

 void MDCache::log_stat()
 {
-  mds->logger->set(l_mds_inode_max, cache_inode_limit ? : INT_MAX);
   mds->logger->set(l_mds_inodes, lru.lru_get_size());
   mds->logger->set(l_mds_inodes_pinned, lru.lru_get_num_pinned());
   mds->logger->set(l_mds_inodes_top, lru.lru_get_top());
@@ -186,16 +186,12 @@ class MDCache {
   explicit MDCache(MDSRank *m, PurgeQueue &purge_queue_);
   ~MDCache();

-  uint64_t cache_limit_inodes(void) {
-    return cache_inode_limit;
-  }
   uint64_t cache_limit_memory(void) {
     return cache_memory_limit;
   }
   double cache_toofull_ratio(void) const {
-    double inode_reserve = cache_inode_limit*(1.0-cache_reservation);
     double memory_reserve = cache_memory_limit*(1.0-cache_reservation);
-    return fmax(0.0, fmax((cache_size()-memory_reserve)/memory_reserve, cache_inode_limit == 0 ? 0.0 : (CInode::count()-inode_reserve)/inode_reserve));
+    return fmax(0.0, (cache_size()-memory_reserve)/memory_reserve);
   }
   bool cache_toofull(void) const {
     return cache_toofull_ratio() > 0.0;
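The simplification is easiest to see numerically: the ratio is now driven purely by the memory reservation. A Python transliteration of the new ``cache_toofull_ratio`` (the input values are illustrative):

    def cache_toofull_ratio(cache_bytes, memory_limit, reservation=0.05):
        # Post-commit logic: only the memory reservation matters; the old
        # inode-count term is gone along with mds_cache_size.
        memory_reserve = memory_limit * (1.0 - reservation)
        return max(0.0, (cache_bytes - memory_reserve) / memory_reserve)

    # 1000 MiB in use against a 1 GiB limit: just over the 972.8 MiB reserve.
    print("%.3f" % cache_toofull_ratio(1000 * 2**20, 1 << 30))   # 0.028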
@@ -204,7 +200,7 @@ class MDCache {
     return mempool::get_pool(mempool::mds_co::id).allocated_bytes();
   }
   bool cache_overfull(void) const {
-    return (cache_inode_limit > 0 && CInode::count() > cache_inode_limit*cache_health_threshold) || (cache_size() > cache_memory_limit*cache_health_threshold);
+    return cache_size() > cache_memory_limit*cache_health_threshold;
   }

   void advance_stray() {
@@ -1269,7 +1265,6 @@ class MDCache {
   void finish_uncommitted_fragment(dirfrag_t basedirfrag, int op);
   void rollback_uncommitted_fragment(dirfrag_t basedirfrag, frag_vec_t&& old_frags);

-  uint64_t cache_inode_limit;
   uint64_t cache_memory_limit;
   double cache_reservation;
   double cache_health_threshold;
@@ -3176,7 +3176,6 @@ void MDSRank::create_logger()
   mds_plb.add_u64_counter(l_mds_dir_commit, "dir_commit", "Directory commit");
   mds_plb.add_u64_counter(l_mds_dir_split, "dir_split", "Directory split");
   mds_plb.add_u64_counter(l_mds_dir_merge, "dir_merge", "Directory merge");
-  mds_plb.add_u64(l_mds_inode_max, "inode_max", "Max inodes, cache size");
   mds_plb.add_u64(l_mds_inodes_pinned, "inodes_pinned", "Inodes pinned");
   mds_plb.add_u64(l_mds_inodes_expired, "inodes_expired", "Inodes expired");
   mds_plb.add_u64(l_mds_inodes_with_caps, "inodes_with_caps",
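For context on this hunk: ``create_logger`` registers named gauges and counters against enum keys, and with no inode limit left the ``inode_max`` gauge has nothing meaningful to report, so both the registration here and the enum entry below are removed together. A toy Python registry in the same spirit (hypothetical, for illustration only):

    class PerfCounters:
        """Toy registry in the spirit of mds_plb above."""
        def __init__(self):
            self.gauges = {}

        def add_u64(self, key, name, description):
            # Register a gauge under an enum-like key, as add_u64 does.
            self.gauges[key] = {"name": name, "desc": description, "value": 0}

        def set(self, key, value):
            self.gauges[key]["value"] = value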
@@ -3638,7 +3637,6 @@ const char** MDSRankDispatcher::get_tracked_conf_keys() const
     "mds_cache_memory_limit",
     "mds_cache_mid",
     "mds_cache_reservation",
-    "mds_cache_size",
     "mds_cache_trim_decay_rate",
     "mds_cap_revoke_eviction_timeout",
     "mds_dump_cache_threshold_file",
@@ -51,7 +51,6 @@ enum {
   l_mds_dir_commit,
   l_mds_dir_split,
   l_mds_dir_merge,
-  l_mds_inode_max,
   l_mds_inodes,
   l_mds_inodes_top,
   l_mds_inodes_bottom,
@@ -262,10 +262,9 @@
 ;debug mds = 20
 ;debug journaler = 20

-# The number of inodes to cache.
-# Type: 32-bit Integer
-# (Default: 100000)
-;mds cache size = 250000
+# The memory limit the MDS should enforce for its cache.
+# (Default: 1G)
+;mds cache memory limit = 2G

 ;[mds.alpha]
 ;  host = alpha
@@ -207,7 +207,7 @@ usage=$usage"\t-N, --not-new: reuse existing cluster config (default)\n"
 usage=$usage"\t--valgrind[_{osd,mds,mon,rgw}] 'toolname args...'\n"
 usage=$usage"\t--nodaemon: use ceph-run as wrapper for mon/osd/mds\n"
 usage=$usage"\t--redirect-output: only useful with nodaemon, directs output to log file\n"
-usage=$usage"\t--smallmds: limit mds cache size\n"
+usage=$usage"\t--smallmds: limit mds cache memory limit\n"
 usage=$usage"\t-m ip:port\t\tspecify monitor address\n"
 usage=$usage"\t-k keep old configuration files\n"
 usage=$usage"\t-x enable cephx (on by default)\n"
@@ -1242,7 +1242,8 @@ if [ "$smallmds" -eq 1 ]; then
     wconf <<EOF
 [mds]
 	mds log max segments = 2
-	mds cache size = 10000
+	# Default 'mds cache memory limit' is 1GiB, and here we set it to 100MiB.
+	mds cache memory limit = 100M
 EOF
 fi