27 Commits

Author SHA1 Message Date
Jeff Layton
5ab91d53a1 qa: remove REQUIRE_KCLIENT_REMOTE
Nothing references this variable anymore since commit 2df7caae4b (qa:
remove obsolete test).

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2021-10-19 11:07:09 -04:00
Patrick Donnelly
0e6f238ce1
Merge PR #37618 into master
* refs/pull/37618/head:
	mds: throttle cap acquisition via readdir

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2020-11-06 20:51:06 -08:00
Kotresh HR
c0de657d3f mds: throttle cap acquisition via readdir
A trivial "find" command on a large directory hierarchy will cause the
client to receive caps significantly faster than it will release. The
MDS will try to have the client reduce its caps below the
mds_max_caps_per_client limit but the recall throttles prevent it from
catching up to the pace of acquisition. This patch solves the problem by
throttling readdir requests from the client.

A readdir request is throttled when two conditions hold together: the
number of caps acquired is greater than a certain percentage of
mds_max_caps_per_client (default 10%), and cap acquisition via readdir
amounts to a certain percentage of mds_max_caps_per_client (default 50%).
When both conditions are met, the readdir request is retried after
'mds_cap_acquisition_throttle_retry_request_timeout' seconds (default
0.5).

Fixes: https://tracker.ceph.com/issues/47307
Signed-off-by: Kotresh HR <khiremat@redhat.com>
2020-10-22 18:56:43 +05:30
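
The throttle condition described in c0de657d3f can be sketched as follows. This is only an illustration based on a literal reading of the commit message, not the MDS implementation: the function name, the parameter names, and the exact comparison operators are assumptions, and the real option names and semantics may differ.

```python
# Hypothetical sketch of the readdir throttle decision from the commit message.
RETRY_TIMEOUT_SECONDS = 0.5  # default retry delay quoted in the commit message


def should_throttle_readdir(caps_acquired, readdir_caps, max_caps_per_client,
                            caps_ratio=0.10, readdir_ratio=0.50):
    """Return True if a readdir request should be deferred and retried later."""
    # Condition 1: caps acquired exceed a percentage of the per-client maximum.
    over_caps = caps_acquired > caps_ratio * max_caps_per_client
    # Condition 2: caps acquired via readdir reach a percentage of the maximum.
    over_readdir = readdir_caps >= readdir_ratio * max_caps_per_client
    return over_caps and over_readdir


# Example with an illustrative limit of 1000 caps per client.
print(should_throttle_readdir(600, 600, 1000))
print(should_throttle_readdir(50, 600, 1000))
```

With these illustrative numbers the first call meets both conditions and the second fails the first one, so only the first request would be retried after `RETRY_TIMEOUT_SECONDS`.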
Xiubo Li
def177ff3b qa/tasks: switch to _kill_background() helper to terminate the daemons
Fixes: https://tracker.ceph.com/issues/46883
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2020-10-21 04:01:35 -04:00
Patrick Donnelly
f4fc138849
qa: add tests for mds_min_caps_working_set
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2020-08-06 19:17:03 -07:00
Patrick Donnelly
edc5c14d1c
qa: use config_set/config_get
It's simpler and does not require MDS restarts.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2020-08-06 19:17:03 -07:00
Patrick Donnelly
ac6c150eb0
qa: do not append file names to dirname
Otherwise the files generated are not actually under the sub-directory!
This is correcting a confusing aspect of the test infrastructure but
doesn't actually require any changes to the tests.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2020-08-06 19:17:02 -07:00
Xiubo Li
5c24d91327 qa/tasks/cephfs: add mount_wait() support to simplify the code
In most cases we should wait for the mountpoint to become ready. This
matters especially for FUSE mountpoints, which can take a few seconds
to become ready.

Fixes: https://tracker.ceph.com/issues/44044
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2020-04-14 07:47:04 -04:00
Kefu Chai
162be92106 qa/tasks/cephfs: cast mds_recall_warning_decay_rate to float
This change should address the following failure:
```
2020-04-05T15:14:23.088 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2020-04-05T15:14:23.088 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_tchaikov_ceph_wip-qa-py3/qa/tasks/cephfs/test_client_limits.py", line 110, in test_client_pin_mincaps
2020-04-05T15:14:23.089 INFO:tasks.cephfs_test_runner:    self._test_client_pin(True, 200)
2020-04-05T15:14:23.089 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_tchaikov_ceph_wip-qa-py3/qa/tasks/cephfs/test_client_limits.py", line 71, in _test_client_pin
2020-04-05T15:14:23.090 INFO:tasks.cephfs_test_runner:    self.wait_for_health("MDS_CLIENT_RECALL", mds_recall_warning_decay_rate*2)
2020-04-05T15:14:23.091 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_tchaikov_ceph_wip-qa-py3/qa/tasks/ceph_test_case.py", line 152, in wait_for_health
2020-04-05T15:14:23.091 INFO:tasks.cephfs_test_runner:    self.wait_until_true(seen_health_warning, timeout)
2020-04-05T15:14:23.092 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_tchaikov_ceph_wip-qa-py3/qa/tasks/ceph_test_case.py", line 193, in wait_until_true
2020-04-05T15:14:23.093 INFO:tasks.cephfs_test_runner:    if elapsed >= timeout:
2020-04-05T15:14:23.093 INFO:tasks.cephfs_test_runner:TypeError: unorderable types: int() >= str()
```

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-07 21:51:22 +08:00
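
The traceback in 162be92106 is consistent with the setting arriving as a string and being multiplied by 2 before it is used as a timeout. The sketch below illustrates that failure mode and the cast that avoids it. It assumes the value is read as a string; `wait_timeout` is a hypothetical helper, not a function from the test suite.

```python
def wait_timeout(decay_rate_setting):
    """Convert a setting that may arrive as a string into a numeric timeout."""
    # Casting first makes the multiplication arithmetic rather than repetition.
    return float(decay_rate_setting) * 2


raw = "2.5"                 # assumed: the config value is returned as text
print(repr(raw * 2))        # string repetition, not a number
print(wait_timeout(raw))    # numeric timeout

elapsed = 3
try:
    elapsed >= raw * 2      # comparing int with str raises TypeError on Python 3
except TypeError as exc:
    print("TypeError:", exc)
```

Multiplying a string by an integer repeats the string, so the uncast value stays a `str` and the later comparison against an integer fails.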
Kefu Chai
2089bf04b9 qa/tasks: use "a // b" instead of "a / b"
Use floor division for expressions where the value is expected to be an
integer, because in Python 3 `a / b` returns a float.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-07 20:33:47 +08:00
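
A short illustration of the difference addressed by 2089bf04b9. The variable names are made up for the example.

```python
total_files = 7
clients = 2

true_quotient = total_files / clients     # float in Python 3
floor_quotient = total_files // clients   # int, rounded down

print(type(true_quotient).__name__, true_quotient)
print(type(floor_quotient).__name__, floor_quotient)

# Integer-only APIs such as range() reject the float result.
try:
    range(true_quotient)
except TypeError as exc:
    print("TypeError:", exc)

files_per_client = list(range(floor_quotient))
```

Code that passes a quotient to `range()`, uses it as an index, or formats it as a count needs `//` to keep its Python 2 behavior.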
Patrick Donnelly
59f641e295
qa: reduce cache size further
1M isn't low enough to trigger recall/trimming.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-12-10 13:24:50 -08:00
Ramana Raja
aeaef1b4c5 mds: obsoleting 'mds_cache_size'
Remove last bits of support for 'mds_cache_size'.
'mds_cache_memory_limit' is preferred.

Fixes: https://tracker.ceph.com/issues/41951
Signed-off-by: Ramana Raja <rraja@redhat.com>
2019-12-02 14:51:25 +05:30
Patrick Donnelly
1071f73c76
qa: use skipTest method instead of exception
This is the recommended method to skip a test according to [1]. It also lets us
avoid an unnecessary import.

[1] https://docs.python.org/2/library/unittest.html#unittest.TestCase.skipTest

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-04-24 09:38:52 -07:00
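
For reference, a minimal example of the `skipTest` method that 1071f73c76 switches to. The test class and its attributes are hypothetical and only show the call pattern.

```python
import unittest


class ExampleClientTest(unittest.TestCase):
    # Hypothetical attributes standing in for real environment checks.
    required_clients = 2
    available_clients = 1

    def test_needs_two_clients(self):
        if self.available_clients < self.required_clients:
            # Marks the test as skipped; no exception class needs importing.
            self.skipTest("need at least %d clients" % self.required_clients)
        self.assertTrue(True)


suite = unittest.TestLoader().loadTestsFromTestCase(ExampleClientTest)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped), result.wasSuccessful())
```

A skipped test is recorded in `result.skipped` and does not count as a failure.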
Yan, Zheng
8e81bd74c5 qa/cephfs: relax min_caps_per_client check
Newer kernel clients proactively release caps, so the caps count can go
below mds_min_caps_per_client.

Fixes: http://tracker.ceph.com/issues/38270
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
2019-03-07 21:32:20 +08:00
Patrick Donnelly
c0b3a11484
mds: simplify recall warnings
Instead of a timeout and complicated decisions about whether the client is
releasing caps in an expeditious fashion, just use a DecayCounter that tracks
the number of caps we've recalled. This counter is decremented whenever the
client releases caps. If the counter passes a threshold, then we raise the
warning.

Similar reworking is done for the steady-state recall of client caps. Another
release DecayCounter is added so we can tell when the client is not releasing
any more caps.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-01-31 12:07:54 -08:00
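
The idea in c0b3a11484 can be illustrated with a small exponentially decaying counter. This is a sketch under stated assumptions, not the MDS's DecayCounter: the class name, the half-life parameterization, the explicit `now` argument, and the threshold value are all chosen for the example.

```python
import math


class DecayCounterSketch:
    """Counter whose value halves every `half_life` seconds (assumed model)."""

    def __init__(self, half_life, now=0.0):
        self.half_life = half_life
        self.value = 0.0
        self.last = now

    def _decay(self, now):
        elapsed = now - self.last
        if elapsed > 0:
            self.value *= math.pow(0.5, elapsed / self.half_life)
            self.last = now

    def hit(self, now, amount=1.0):
        """Decay to `now`, then add `amount` (negative for released caps)."""
        self._decay(now)
        self.value = max(0.0, self.value + amount)
        return self.value

    def get(self, now):
        self._decay(now)
        return self.value


WARN_THRESHOLD = 800.0  # hypothetical warning threshold

recalled = DecayCounterSketch(half_life=10.0)
recalled.hit(0.0, 1000)                    # MDS recalls 1000 caps
print(recalled.get(0.0) > WARN_THRESHOLD)  # over threshold: warning would fire
recalled.hit(1.0, -900)                    # client releases 900 caps
print(recalled.get(1.0) > WARN_THRESHOLD)  # back under threshold
```

Because the counter both decays with time and is decremented on release, a client that keeps returning caps stays under the threshold without any explicit timeout bookkeeping.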
Patrick Donnelly
30aaa884bf
qa: test mds_max_caps_per_client conf
Test that the MDS will not let a client sit above mds_max_caps_per_client caps.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-01-29 15:16:31 -08:00
Patrick Donnelly
ef46216d8d
mds: recall caps incrementally
As with trimming, use DecayCounters to throttle the number of caps we recall,
both globally and per-session.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-01-29 15:16:30 -08:00
Patrick Donnelly
67ca6cd229
mds: obsolete MDSMap option configs
These configs were used for initialization but it is more appropriate to
require setting these file system attributes via `ceph fs set`. This is similar
to what was already done with max_mds. There are new variables added for `fs
set` where missing.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-12-13 18:30:52 -08:00
Jeff Layton
3321cc7b37 mds: fold mds_revoke_cap_timeout into mds_session_timeout
Right now, we have two different timeout settings -- one for when the
client is just not responding at all (mds_session_timeout), and one for
when the client is otherwise responding but isn't returning caps in a
timely fashion (mds_revoke_cap_timeout).

The default settings on them are equivalent (60s), but only the
mds_session_timeout is communicated via the mdsmap. The
mds_revoke_cap_timeout is known only to the MDS. Neither timeout results
in anything other than warnings in the current codebase.

There is a third setting (mds_session_autoclose) that is also
communicated via the mdsmap. Exceeding that value (default of 300s)
could eventually result in the client being blacklisted from the
cluster. The code to implement that doesn't exist yet, however.

The current codebase doesn't do any real sanity checking of these
timeouts, so the potential for admins to get them wrong is rather high.
It's hard to concoct a use-case where we'd want to warn about these
events at different intervals.

Simplify this by just removing the mds_revoke_cap_timeout setting, and
replace its use in the code with the mds_session_timeout. With that, the
client can at least determine when warnings might start showing up in
the MDS' logs.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-11-14 07:27:01 -05:00
Patrick Donnelly
b37c7f7db7
qa: relax cap expected value check
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-09-29 08:48:14 -07:00
Patrick Donnelly
538834171f
mds: cap client recall to min caps per client
Fixes: http://tracker.ceph.com/issues/21575

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-09-28 15:55:57 -07:00
Patrick Donnelly
06c94de584
mds: support limiting cache by memory
This introduces two config parameters:

    mds_cache_memory_limit: Sets the soft maximum of the cache to the given
    byte count. (Like mds_cache_size, this doesn't actually limit the maximum
    size of the cache. It just dictates the steady-state size.)

    mds_cache_reservation: This replaces mds_health_cache_threshold everywhere
    except the Beacon heartbeat sent to the mons. The idea here is to specify a
    reservation of memory (5% by default) for operations and the MDS tries to
    always maintain that reservation. So, the MDS will recall caps from clients
    when it begins dipping into its reservation of memory.

mds_cache_size still limits the cache by inode count but now defaults to 0
(i.e. unlimited). The new preferred way of specifying cache limits is by memory
size. The default is 1GB.

Fixes: http://tracker.ceph.com/issues/20594
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1464976

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-09-12 20:02:41 -07:00
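
One way to read the reservation described in 06c94de584 is that recall begins once cache usage exceeds the limit minus the reserved fraction. The sketch below works through that arithmetic. It is an interpretation of the commit message, not the MDS code: the function names are hypothetical and the 1 GiB figure is only an illustrative reading of "1GB".

```python
def recall_threshold(cache_memory_limit, cache_reservation=0.05):
    """Bytes of cache usage above which the reservation is being consumed."""
    return cache_memory_limit * (1.0 - cache_reservation)


def should_recall_caps(cache_bytes_used, cache_memory_limit,
                       cache_reservation=0.05):
    """True once usage dips into the reserved portion of the limit."""
    return cache_bytes_used > recall_threshold(cache_memory_limit,
                                               cache_reservation)


limit = 1 << 30  # illustrative 1 GiB limit
print(int(recall_threshold(limit)))
print(should_recall_caps(900 * 1024 * 1024, limit))
print(should_recall_caps(1000 * 1024 * 1024, limit))
```

Since the limit is a soft maximum, usage above the threshold triggers recall rather than a hard failure.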
Patrick Donnelly
ced01a2335
qa: fix wait for wrong health message
Fixes: http://tracker.ceph.com/issues/20805

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-07-27 14:40:05 -07:00
Patrick Donnelly
f8e0571982
qa: fix MDS_CLIENT_RECALL copy error
Fixes: http://tracker.ceph.com/issues/20682

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-07-18 16:06:20 -07:00
Yan, Zheng
e4844706b0 qa/cephfs: don't use int() to convert string of float point number
Fixes: http://tracker.ceph.com/issues/20582
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
2017-07-13 15:55:22 +08:00
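
The problem named in the subject of e4844706b0 can be reproduced in a few lines. `parse_seconds` is a hypothetical helper used only for this illustration.

```python
def parse_seconds(raw):
    """Parse a setting that may be written as '60' or '2.5'."""
    return float(raw)


raw = "2.5"
try:
    int(raw)  # int() does not accept a string with a fractional part
except ValueError as exc:
    print("ValueError:", exc)

print(parse_seconds(raw))
print(int(parse_seconds(raw)))  # truncate explicitly if an integer is needed
```

Parsing with `float()` first accepts both integer and fractional strings, and any truncation then becomes an explicit choice.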
Sage Weil
25717f7e84 qa/tasks/ceph_test_case.py: update health check helpers
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-12 12:52:03 -04:00
Sage Weil
c01f2ee0e2 move ceph-qa-suite dirs into qa/
2016-12-14 11:29:55 -06:00