mirror of
https://github.com/ceph/ceph
synced 2025-01-03 17:42:36 +00:00
45caf26140
The "session ls" and "session evict" are now "client ls" and "client evict" (the old ones are still there for backwards compatibility). The automatic client eviction now emits cluster logs that call the client by its friendly name (usually the hostname). Signed-off-by: John Spray <john.spray@redhat.com>
146 lines
5.0 KiB
ReStructuredText
===============================
Ceph filesystem client eviction
===============================

When a filesystem client is unresponsive or otherwise misbehaving, it
may be necessary to forcibly terminate its access to the filesystem. This
process is called *eviction*.

Evicting a CephFS client prevents it from communicating further with MDS
daemons and OSD daemons. If a client was doing buffered IO to the filesystem,
any un-flushed data will be lost.

Clients may either be evicted automatically (if they fail to communicate
promptly with the MDS), or manually (by the system administrator).

The client eviction process applies to clients of all kinds, including
FUSE mounts, kernel mounts, nfs-ganesha gateways, and any process using
libcephfs.

Automatic client eviction
=========================

There are two situations in which a client may be evicted automatically:

On an active MDS daemon, if a client has not communicated with the MDS for
over ``mds_session_autoclose`` seconds (300 seconds by default), then it
will be evicted automatically.

During MDS startup (including on failover), the MDS passes through a
state called ``reconnect``. During this state, it waits for all the
clients to connect to the new MDS daemon. If any clients fail to do
so within the time window (``mds_reconnect_timeout``, 45 seconds by default)
then they will be evicted.

A warning message is sent to the cluster log if either of these situations
arises.

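The two automatic eviction conditions above can be sketched as simple
predicates. This is only an illustration of the timing rules as described
in the text, not the MDS implementation: the function names are hypothetical,
and treating the limits as strict "greater than" comparisons is an assumption.

```python
# Default values quoted in the text above.
MDS_SESSION_AUTOCLOSE = 300  # seconds of silence tolerated by an active MDS
MDS_RECONNECT_TIMEOUT = 45   # seconds allowed for clients to reconnect


def should_autoclose(idle_seconds, autoclose=MDS_SESSION_AUTOCLOSE):
    """Return True if a client has been silent for over `autoclose` seconds."""
    return idle_seconds > autoclose


def missed_reconnect(reconnected, elapsed_seconds, timeout=MDS_RECONNECT_TIMEOUT):
    """Return True if the reconnect window has passed without a reconnect."""
    return (not reconnected) and elapsed_seconds > timeout


if __name__ == "__main__":
    print(should_autoclose(301))         # silent for longer than the default limit
    print(missed_reconnect(False, 60))   # never reconnected, window has passed
```

Either condition alone is enough for eviction, and each is governed by its
own setting, so tuning one does not affect the other.
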
Manual client eviction
======================

Sometimes, the administrator may want to evict a client manually. This
could happen if a client has died and the administrator does not
want to wait for its session to time out, or it could happen if
a client is misbehaving and the administrator does not have access to
the client node to unmount it.

It is useful to inspect the list of clients first:

::

    ceph tell mds.0 client ls

    [
        {
            "id": 4305,
            "num_leases": 0,
            "num_caps": 3,
            "state": "open",
            "replay_requests": 0,
            "completed_requests": 0,
            "reconnecting": false,
            "inst": "client.4305 172.21.9.34:0/422650892",
            "client_metadata": {
                "ceph_sha1": "ae81e49d369875ac8b569ff3e3c456a31b8f3af5",
                "ceph_version": "ceph version 12.0.0-1934-gae81e49 (ae81e49d369875ac8b569ff3e3c456a31b8f3af5)",
                "entity_id": "0",
                "hostname": "senta04",
                "mount_point": "/tmp/tmpcMpF1b/mnt.0",
                "pid": "29377",
                "root": "/"
            }
        }
    ]

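When the session list is long, it can help to filter it by script. The
following is a minimal sketch that looks up session IDs by hostname in the
JSON printed by ``client ls``. The helper name ``find_client_ids`` is
hypothetical, and the sample data is a trimmed copy of the output shown above.

```python
import json

# Trimmed copy of the `client ls` output shown above.
SAMPLE_CLIENT_LS = """
[
    {
        "id": 4305,
        "state": "open",
        "inst": "client.4305 172.21.9.34:0/422650892",
        "client_metadata": {
            "hostname": "senta04",
            "mount_point": "/tmp/tmpcMpF1b/mnt.0"
        }
    }
]
"""


def find_client_ids(client_ls_json, hostname):
    """Return the IDs of all sessions whose client_metadata.hostname matches."""
    sessions = json.loads(client_ls_json)
    return [
        session["id"]
        for session in sessions
        # Use .get() so sessions lacking metadata are skipped, not fatal.
        if session.get("client_metadata", {}).get("hostname") == hostname
    ]


if __name__ == "__main__":
    print(find_client_ids(SAMPLE_CLIENT_LS, "senta04"))
```

A host may have more than one mount, so the helper returns a list rather
than a single ID.
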
Once you have identified the client you want to evict, you can
evict it using its unique ID or other attributes that identify it:

::

    # These all work
    ceph tell mds.0 client evict id=4305
    ceph tell mds.0 client evict client_metadata.=4305

Advanced: Un-blacklisting a client
==================================

Ordinarily, a blacklisted client may not reconnect to the servers: it
must be unmounted and then mounted anew.

However, in some situations it may be useful to permit a client that
was evicted to attempt to reconnect.

Because CephFS uses the RADOS OSD blacklist to control client eviction,
CephFS clients can be permitted to reconnect by removing them from
the blacklist:

::

    ceph osd blacklist ls
    # ... identify the address of the client ...
    ceph osd blacklist rm <address>

Doing this may put data integrity at risk if other clients have accessed
files that the blacklisted client was doing buffered IO to. It is also not
guaranteed to result in a fully functional client -- the best way to get
a fully healthy client back after an eviction is to unmount the client
and do a fresh mount.

If you are trying to reconnect clients in this way, you may also
find it useful to set ``client_reconnect_stale`` to true in the
FUSE client, to prompt the client to try to reconnect.

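As a rough sketch of the "identify the address" step, the address can be
taken from the ``inst`` field of an earlier ``client ls`` listing. This
assumes that the blacklist entry uses the same address string that appears
in ``inst``; confirm it against ``ceph osd blacklist ls`` before removing
anything. The helper names are hypothetical, and the command is only built
as an argument list, not executed.

```python
def client_address(inst):
    """Extract the address part of an `inst` string from `client ls`.

    Example input: "client.4305 172.21.9.34:0/422650892"
    """
    name, _, address = inst.partition(" ")
    if not name.startswith("client.") or not address:
        raise ValueError("unexpected inst format: {!r}".format(inst))
    return address


def unblacklist_command(address):
    """Build (but do not run) the `ceph osd blacklist rm` command."""
    return ["ceph", "osd", "blacklist", "rm", address]


if __name__ == "__main__":
    addr = client_address("client.4305 172.21.9.34:0/422650892")
    print(" ".join(unblacklist_command(addr)))
```
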
Advanced: Configuring blacklisting
==================================

If you are experiencing frequent client evictions due to slow
client hosts or an unreliable network, and you cannot fix the underlying
issue, then you may want to ask the MDS to be less strict.

It is possible to respond to slow clients by simply dropping their
MDS sessions, while permitting them to re-open sessions and
to continue talking to OSDs. To enable this mode, set
``mds_session_blacklist_on_timeout`` to false on your MDS nodes.

For the equivalent behaviour on manual evictions, set
``mds_session_blacklist_on_evict`` to false.

Note that if blacklisting is disabled, then evicting a client will
only have an effect on the MDS you send the command to. On a system
with multiple active MDS daemons, you would need to send an
eviction command to each active daemon. When blacklisting is enabled
(the default), sending an eviction command to just a single
MDS is sufficient, because the blacklist propagates it to the others.

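With blacklisting disabled, the per-daemon evictions described above could
be generated along the following lines. This is a sketch only: the helper
name ``evict_commands`` is hypothetical, the list of active ranks must be
supplied by the caller, and addressing each daemon as ``mds.<rank>`` is
assumed from the ``mds.0`` form used in the earlier examples. The commands
are built as argument lists and not executed.

```python
def evict_commands(active_ranks, client_id):
    """Build one `client evict` command per active MDS rank."""
    return [
        ["ceph", "tell", "mds.{}".format(rank),
         "client", "evict", "id={}".format(client_id)]
        for rank in active_ranks
    ]


if __name__ == "__main__":
    # Two active MDS daemons, evicting the client from the earlier listing.
    for command in evict_commands([0, 1], 4305):
        print(" ".join(command))
```
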
Advanced options
================

``mds_blacklist_interval`` - this setting controls how long, in seconds,
entries remain in the blacklist.