Merge PR #31729 into master

* refs/pull/31729/head:
	qa: reduce cache size further
	mds: obsoleting 'mds_cache_size'

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly 2019-12-11 09:44:27 -08:00
commit 2216c63ed5
15 changed files with 36 additions and 57 deletions

View File

@@ -216,6 +216,10 @@
* The format of MDSs in `ceph fs dump` has changed.
* The ``mds_cache_size`` config option is completely removed. Since luminous,
the ``mds_cache_memory_limit`` config option has been preferred to configure
the MDS's cache limits.
* The ``pg_autoscale_mode`` is now set to ``on`` by default for newly
created pools, which means that Ceph will automatically manage the
number of PGs. To change this behavior, or to learn more about PG

View File

@@ -69,10 +69,9 @@ performance is very different for workloads whose metadata fits within
that cache.
If your workload has more files than fit in your cache (configured using
``mds_cache_memory_limit`` or ``mds_cache_size`` settings), then
make sure you test it appropriately: don't test your system with a small
number of files and then expect equivalent performance when you move
to a much larger number of files.
``mds_cache_memory_limit`` settings), then make sure you test it
appropriately: don't test your system with a small number of files and then
expect equivalent performance when you move to a much larger number of files.
Do you need a file system?
--------------------------

View File

@@ -5,10 +5,9 @@ This section describes ways to limit MDS cache size.
You can limit the size of the Metadata Server (MDS) cache by:
* *A memory limit*: A new behavior introduced in the Luminous release. Use the `mds_cache_memory_limit` parameter. We recommend using memory limits instead of inode count limits.
* *Inode count*: Use the `mds_cache_size` parameter. By default, limiting the MDS cache by inode count is disabled.
* *A memory limit*: A new behavior introduced in the Luminous release. Use the `mds_cache_memory_limit` parameter.
In addition, you can specify a cache reservation by using the `mds_cache_reservation` parameter for MDS operations. The cache reservation is expressed as a percentage of the memory or inode limit and is set to 5% by default. The intent of this parameter is to have the MDS maintain an extra reserve of memory for its cache for new metadata operations to use. As a consequence, the MDS should in general operate below its memory limit because it will recall old state from clients in order to drop unused metadata in its cache.
In addition, you can specify a cache reservation by using the `mds_cache_reservation` parameter for MDS operations. The cache reservation is expressed as a percentage of the memory limit and is set to 5% by default. The intent of this parameter is to have the MDS maintain an extra reserve of memory for its cache for new metadata operations to use. As a consequence, the MDS should in general operate below its memory limit because it will recall old state from clients in order to drop unused metadata in its cache.
The `mds_cache_reservation` parameter replaces the `mds_health_cache_threshold` in all situations except when MDS nodes send a health alert to the Monitors indicating the cache is too large. By default, `mds_health_cache_threshold` is 150% of the maximum cache size.
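
To make the interplay of these options concrete, here is a minimal Python sketch of the arithmetic, mirroring the simplified cache_toofull_ratio()/cache_overfull() checks that appear in the MDCache changes further down this page. It is an illustration using the default values quoted above, not the MDS implementation itself.

# Sketch of how mds_cache_memory_limit, mds_cache_reservation and
# mds_health_cache_threshold interact (defaults: 1 GiB, 0.05, 1.5).
GiB = 1 << 30

def cache_toofull_ratio(cache_size, memory_limit=1 * GiB, reservation=0.05):
    # The MDS starts recalling client state once usage exceeds the
    # reserved target, i.e. memory_limit * (1 - reservation).
    memory_reserve = memory_limit * (1.0 - reservation)
    return max(0.0, (cache_size - memory_reserve) / memory_reserve)

def cache_overfull(cache_size, memory_limit=1 * GiB, health_threshold=1.5):
    # MDS_HEALTH_CACHE_OVERSIZED corresponds to exceeding
    # memory_limit * health_threshold (1 GiB * 150% by default).
    return cache_size > memory_limit * health_threshold

print(cache_toofull_ratio(1.2 * GiB))  # > 0: the MDS tries to trim its cache
print(cache_overfull(1.6 * GiB))       # True: the health warning fires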

View File

@@ -75,11 +75,11 @@ Message: "Client *name* failing to respond to cache pressure"
Code: MDS_HEALTH_CLIENT_RECALL, MDS_HEALTH_CLIENT_RECALL_MANY
Description: Clients maintain a metadata cache. Items (such as inodes) in the
client cache are also pinned in the MDS cache, so when the MDS needs to shrink
its cache (to stay within ``mds_cache_size`` or ``mds_cache_memory_limit``), it
sends messages to clients to shrink their caches too. If the client is
unresponsive or buggy, this can prevent the MDS from properly staying within
its cache limits and it may eventually run out of memory and crash. This
message appears if a client has failed to release more than
its cache (to stay within ``mds_cache_memory_limit``), it sends messages to
clients to shrink their caches too. If the client is unresponsive or buggy,
this can prevent the MDS from properly staying within its cache limits and it
may eventually run out of memory and crash. This message appears if a client
has failed to release more than
``mds_recall_warning_threshold`` capabilities (decaying with a half-life of
``mds_recall_max_decay_rate``) within the last
``mds_recall_warning_decay_rate`` seconds.
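
As a rough illustration of the decaying counter behind this warning: each capability the MDS recalls but the client does not release bumps a counter that decays exponentially with the configured half-life, and the warning is raised once that counter passes the threshold. The sketch below is a hedged example; the parameter values are assumptions chosen for the demonstration, and it is not Ceph's actual DecayCounter implementation.

class DecayingCounter:
    """Exponentially decaying counter, illustrative only."""
    def __init__(self, half_life):
        self.half_life = half_life
        self.value = 0.0

    def hit(self, amount, dt):
        # Decay what accumulated so far, then add the newly unreleased caps.
        self.value *= 0.5 ** (dt / self.half_life)
        self.value += amount
        return self.value

# Example: a client ignores 1000 recalled caps every 10 seconds.
unreleased = DecayingCounter(half_life=60)  # stand-in for mds_recall_max_decay_rate
warning_threshold = 5000                    # stand-in for mds_recall_warning_threshold
for _ in range(12):
    if unreleased.hit(1000, dt=10) > warning_threshold:
        print("MDS_HEALTH_CLIENT_RECALL would be reported")
        break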
@@ -126,6 +126,6 @@ Code: MDS_HEALTH_CACHE_OVERSIZED
Description: The MDS is not succeeding in trimming its cache to comply with the
limit set by the administrator. If the MDS cache becomes too large, the daemon
may exhaust available memory and crash. By default, this message appears if
the actual cache size (in inodes or memory) is at least 50% greater than
``mds_cache_size`` (default 100000) or ``mds_cache_memory_limit`` (default
1GB). Modify ``mds_health_cache_threshold`` to set the warning ratio.
the actual cache size (in memory) is at least 50% greater than
``mds_cache_memory_limit`` (default 1GB). Modify ``mds_health_cache_threshold``
to set the warning ratio.

View File

@@ -5,9 +5,8 @@
``mds cache memory limit``
:Description: The memory limit the MDS should enforce for its cache.
Administrators should use this instead of ``mds cache size``.
:Type: 64-bit Integer Unsigned
:Default: ``1073741824``
:Default: ``1G``
``mds cache reservation``
@@ -18,14 +17,6 @@
:Type: Float
:Default: ``0.05``
``mds cache size``
:Description: The number of inodes to cache. A value of 0 indicates an
unlimited number. It is recommended to use
``mds_cache_memory_limit`` to limit the amount of memory the MDS
cache uses.
:Type: 32-bit Integer
:Default: ``0``
``mds cache mid``

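Note that the default above changes from the raw byte count ``1073741824`` to the equivalent ``1G``: size-typed Ceph options use binary (IEC) units, which is why the two spellings denote the same value. A small sketch of the conversion, as an illustration of what the suffixes mean rather than Ceph's actual option parser:

# Illustration of the size suffixes used for mds_cache_memory_limit
# ("1G", "2G", "100M"): binary (IEC) units, so 1G == 1073741824 bytes.
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}

def parse_size(text):
    text = text.strip().upper().rstrip("IB")  # tolerate "GiB"-style spellings
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)

assert parse_size("1G") == 1073741824   # matches the old numeric default
assert parse_size("100M") == 100 * (1 << 20)
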
View File

@@ -38,9 +38,9 @@ specific clients as misbehaving, you should investigate why they are doing so.
Generally it will be the result of
#. Overloading the system (if you have extra RAM, increase the "mds cache size"
config from its default 100000; having a larger active file set than your MDS
cache is the #1 cause of this!).
#. Overloading the system (if you have extra RAM, increase the
"mds cache memory limit" config from its default 1GiB; having a larger active
file set than your MDS cache is the #1 cause of this!).
#. Running an older (misbehaving) client.
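
The first item above boils down to raising ``mds cache memory limit`` when spare RAM is available. As a minimal, hedged sketch of doing that in a vstart/qa-style environment, reusing the ``set_conf`` and restart helpers visible in the qa change later on this page (on a real cluster the option would instead be set in ceph.conf or through the monitor config database):

# Hedged sketch for a CephFSTestCase-style environment; helper names are
# taken from the qa test shown below, not a production procedure.
def raise_mds_cache_limit(testcase, new_limit="8G"):
    testcase.set_conf('mds', 'mds_cache_memory_limit', new_limit)
    # Restart the MDS daemons, as the qa test does after changing the option.
    testcase.fs.mds_fail_restart()
    testcase.fs.wait_for_daemons()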

View File

@@ -144,7 +144,7 @@ These sections include:
the Ceph Storage Cluster, and override the same setting in
``global``.
:Example: ``mds_cache_size = 10G``
:Example: ``mds_cache_memory_limit = 10G``
``client``

View File

@@ -38,15 +38,18 @@ class TestClientLimits(CephFSTestCase):
:param use_subdir: whether to put test files in a subdir or use root
"""
cache_size = open_files/2
# Set MDS cache memory limit to a low value that will make the MDS
# ask the client to trim the caps.
cache_memory_limit = "1K"
self.set_conf('mds', 'mds cache size', cache_size)
self.set_conf('mds', 'mds_cache_memory_limit', cache_memory_limit)
self.set_conf('mds', 'mds_recall_max_caps', open_files/2)
self.set_conf('mds', 'mds_recall_warning_threshold', open_files)
self.fs.mds_fail_restart()
self.fs.wait_for_daemons()
mds_min_caps_per_client = int(self.fs.get_config("mds_min_caps_per_client"))
mds_max_caps_per_client = int(self.fs.get_config("mds_max_caps_per_client"))
mds_recall_warning_decay_rate = self.fs.get_config("mds_recall_warning_decay_rate")
self.assertTrue(open_files >= mds_min_caps_per_client)
@@ -87,7 +90,7 @@ class TestClientLimits(CephFSTestCase):
num_caps = self.get_session(mount_a_client_id)['num_caps']
if num_caps <= mds_min_caps_per_client:
return True
elif num_caps < cache_size:
elif num_caps <= mds_max_caps_per_client:
return True
else:
return False

View File

@@ -7557,11 +7557,6 @@ std::vector<Option> get_mds_options() {
.set_description("interval in seconds between heap releases")
.set_flag(Option::FLAG_RUNTIME),
Option("mds_cache_size", Option::TYPE_INT, Option::LEVEL_ADVANCED)
.set_default(0)
.set_description("maximum number of inodes in MDS cache (<=0 is unlimited)")
.set_long_description("This tunable is no longer recommended. Use mds_cache_memory_limit."),
Option("mds_cache_memory_limit", Option::TYPE_SIZE, Option::LEVEL_BASIC)
.set_default(1*(1LL<<30))
.set_description("target maximum memory usage of MDS cache")

View File

@@ -147,7 +147,6 @@ MDCache::MDCache(MDSRank *m, PurgeQueue &purge_queue_) :
(g_conf()->mds_dir_max_commit_size << 20) :
(0.9 *(g_conf()->osd_max_write_size << 20));
cache_inode_limit = g_conf().get_val<int64_t>("mds_cache_size");
cache_memory_limit = g_conf().get_val<Option::size_t>("mds_cache_memory_limit");
cache_reservation = g_conf().get_val<double>("mds_cache_reservation");
cache_health_threshold = g_conf().get_val<double>("mds_health_cache_threshold");
@@ -212,8 +211,6 @@ MDCache::~MDCache()
void MDCache::handle_conf_change(const std::set<std::string>& changed, const MDSMap& mdsmap)
{
if (changed.count("mds_cache_size"))
cache_inode_limit = g_conf().get_val<int64_t>("mds_cache_size");
if (changed.count("mds_cache_memory_limit"))
cache_memory_limit = g_conf().get_val<Option::size_t>("mds_cache_memory_limit");
if (changed.count("mds_cache_reservation"))
@@ -232,7 +229,6 @@ void MDCache::handle_conf_change(const std::set<std::string>& changed, const MDSMap& mdsmap)
void MDCache::log_stat()
{
mds->logger->set(l_mds_inode_max, cache_inode_limit ? : INT_MAX);
mds->logger->set(l_mds_inodes, lru.lru_get_size());
mds->logger->set(l_mds_inodes_pinned, lru.lru_get_num_pinned());
mds->logger->set(l_mds_inodes_top, lru.lru_get_top());

View File

@@ -186,16 +186,12 @@ class MDCache {
explicit MDCache(MDSRank *m, PurgeQueue &purge_queue_);
~MDCache();
uint64_t cache_limit_inodes(void) {
return cache_inode_limit;
}
uint64_t cache_limit_memory(void) {
return cache_memory_limit;
}
double cache_toofull_ratio(void) const {
double inode_reserve = cache_inode_limit*(1.0-cache_reservation);
double memory_reserve = cache_memory_limit*(1.0-cache_reservation);
return fmax(0.0, fmax((cache_size()-memory_reserve)/memory_reserve, cache_inode_limit == 0 ? 0.0 : (CInode::count()-inode_reserve)/inode_reserve));
return fmax(0.0, (cache_size()-memory_reserve)/memory_reserve);
}
bool cache_toofull(void) const {
return cache_toofull_ratio() > 0.0;
@@ -204,7 +200,7 @@ class MDCache {
return mempool::get_pool(mempool::mds_co::id).allocated_bytes();
}
bool cache_overfull(void) const {
return (cache_inode_limit > 0 && CInode::count() > cache_inode_limit*cache_health_threshold) || (cache_size() > cache_memory_limit*cache_health_threshold);
return cache_size() > cache_memory_limit*cache_health_threshold;
}
void advance_stray() {
@@ -1269,7 +1265,6 @@ class MDCache {
void finish_uncommitted_fragment(dirfrag_t basedirfrag, int op);
void rollback_uncommitted_fragment(dirfrag_t basedirfrag, frag_vec_t&& old_frags);
uint64_t cache_inode_limit;
uint64_t cache_memory_limit;
double cache_reservation;
double cache_health_threshold;

View File

@@ -3176,7 +3176,6 @@ void MDSRank::create_logger()
mds_plb.add_u64_counter(l_mds_dir_commit, "dir_commit", "Directory commit");
mds_plb.add_u64_counter(l_mds_dir_split, "dir_split", "Directory split");
mds_plb.add_u64_counter(l_mds_dir_merge, "dir_merge", "Directory merge");
mds_plb.add_u64(l_mds_inode_max, "inode_max", "Max inodes, cache size");
mds_plb.add_u64(l_mds_inodes_pinned, "inodes_pinned", "Inodes pinned");
mds_plb.add_u64(l_mds_inodes_expired, "inodes_expired", "Inodes expired");
mds_plb.add_u64(l_mds_inodes_with_caps, "inodes_with_caps",
@@ -3638,7 +3637,6 @@ const char** MDSRankDispatcher::get_tracked_conf_keys() const
"mds_cache_memory_limit",
"mds_cache_mid",
"mds_cache_reservation",
"mds_cache_size",
"mds_cache_trim_decay_rate",
"mds_cap_revoke_eviction_timeout",
"mds_dump_cache_threshold_file",

View File

@@ -51,7 +51,6 @@ enum {
l_mds_dir_commit,
l_mds_dir_split,
l_mds_dir_merge,
l_mds_inode_max,
l_mds_inodes,
l_mds_inodes_top,
l_mds_inodes_bottom,

View File

@@ -262,10 +262,9 @@
;debug mds = 20
;debug journaler = 20
# The number of inodes to cache.
# Type: 32-bit Integer
# (Default: 100000)
;mds cache size = 250000
# The memory limit the MDS should enforce for its cache.
# (Default: 1G)
;mds cache memory limit = 2G
;[mds.alpha]
; host = alpha

View File

@@ -207,7 +207,7 @@ usage=$usage"\t-N, --not-new: reuse existing cluster config (default)\n"
usage=$usage"\t--valgrind[_{osd,mds,mon,rgw}] 'toolname args...'\n"
usage=$usage"\t--nodaemon: use ceph-run as wrapper for mon/osd/mds\n"
usage=$usage"\t--redirect-output: only useful with nodaemon, directs output to log file\n"
usage=$usage"\t--smallmds: limit mds cache size\n"
usage=$usage"\t--smallmds: limit mds cache memory limit\n"
usage=$usage"\t-m ip:port\t\tspecify monitor address\n"
usage=$usage"\t-k keep old configuration files\n"
usage=$usage"\t-x enable cephx (on by default)\n"
@@ -1242,7 +1242,8 @@ if [ "$smallmds" -eq 1 ]; then
wconf <<EOF
[mds]
mds log max segments = 2
mds cache size = 10000
# Default 'mds cache memory limit' is 1GiB, and here we set it to 100MiB.
mds cache memory limit = 100M
EOF
fi