ceph/doc/cephfs/eviction.rst
John Spray 9f53eeb88d doc: add cephfs ENOSPC and eviction information
Adding this at this time to give us a sensible place
to talk about the epoch barrier stuff.  The eviction
stuff will probably get simplified once we add a mon-side
eviction command that handles blacklisting and MDS session
eviction in one go.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-12-16 20:55:24 +00:00


Ceph filesystem client eviction
===============================

When a filesystem client is unresponsive or otherwise misbehaving, it
may be necessary to forcibly terminate its access to the filesystem. This
process is called *eviction*.

Eviction is deliberately thorough and involves several separate steps, in order
to protect against data inconsistency resulting from misbehaving clients.

OSD blacklisting
----------------

First, prevent the client from performing any more data operations by *blacklisting*
it at the RADOS level. You may be familiar with this concept as *fencing* in other
storage systems.

Identify the client to evict from the MDS session list:

::

    # ceph daemon mds.a session ls
    [
        { "id": 4117,
          "num_leases": 0,
          "num_caps": 1,
          "state": "open",
          "replay_requests": 0,
          "reconnecting": false,
          "inst": "client.4117 172.16.79.251:0\/3271",
          "client_metadata": { "entity_id": "admin",
              "hostname": "fedoravm.localdomain",
              "mount_point": "\/home\/user\/mnt"}}]


In this case the ``fedoravm`` client has address ``172.16.79.251:0/3271`` (the ``\/``
in the JSON output is an escaped ``/``), so we blacklist it as follows:

::

    # ceph osd blacklist add 172.16.79.251:0/3271
    blacklisting 172.16.79.251:0/3271 until 2014-12-09 13:09:56.569368 (3600 sec)

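
The lookup earlier in this section, from hostname to session ``id`` and client
address, can be scripted when the session list is long. The following is a
minimal sketch rather than part of Ceph: the helper name ``find_session`` is
hypothetical, and the field names are assumed to match the ``session ls``
output shown above.

.. code-block:: python

    import json

    # Sample ``session ls`` output, copied from the listing above.
    SESSION_LS = r'''
    [
        { "id": 4117,
          "num_leases": 0,
          "num_caps": 1,
          "state": "open",
          "replay_requests": 0,
          "reconnecting": false,
          "inst": "client.4117 172.16.79.251:0\/3271",
          "client_metadata": { "entity_id": "admin",
              "hostname": "fedoravm.localdomain",
              "mount_point": "\/home\/user\/mnt"}}]
    '''


    def find_session(session_ls_json, hostname):
        """Return (session id, client address) for the session on ``hostname``."""
        for session in json.loads(session_ls_json):
            if session["client_metadata"].get("hostname") == hostname:
                # "inst" is "<entity name> <address>"; the JSON decoder has
                # already turned the escaped "\/" back into "/".
                _name, addr = session["inst"].split(" ", 1)
                return session["id"], addr
        raise LookupError("no session for host %r" % hostname)


    print(find_session(SESSION_LS, "fedoravm.localdomain"))
    # prints (4117, '172.16.79.251:0/3271')

The returned address is in the form passed to ``ceph osd blacklist add``, and the
returned ID is the value that ``session evict`` expects.
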

OSD epoch barrier
-----------------

Although the evicted client is now marked as blacklisted in the central (mon) copy of
the OSD map, you must still ensure that this OSD map update has propagated to all
daemons involved in subsequent filesystem I/O. To do this, use the ``osdmap barrier``
MDS admin socket command.

First read the latest OSD epoch:

::

    # ceph osd dump
    epoch 12
    fsid fd61ca96-53ff-4311-826c-f36b176d69ea
    created 2014-12-09 12:03:38.595844
    modified 2014-12-09 12:09:56.619957
    ...

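
When scripting this step, the epoch can be pulled out of the dump text. This is
a minimal sketch that assumes the ``epoch N`` line appears as in the output
above; the helper name ``parse_osd_epoch`` is hypothetical.

.. code-block:: python

    def parse_osd_epoch(osd_dump_text):
        """Return the epoch from the first ``epoch N`` line of ``ceph osd dump``."""
        for line in osd_dump_text.splitlines():
            fields = line.split()
            if len(fields) == 2 and fields[0] == "epoch":
                return int(fields[1])
        raise ValueError("no epoch line found")


    # Sample text, copied from the ``ceph osd dump`` output above.
    OSD_DUMP = """\
    epoch 12
    fsid fd61ca96-53ff-4311-826c-f36b176d69ea
    created 2014-12-09 12:03:38.595844
    modified 2014-12-09 12:09:56.619957
    """

    print(parse_osd_epoch(OSD_DUMP))  # prints 12
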

In this case the epoch is 12. Now ask the MDS to barrier on this epoch:

::

    # ceph daemon mds.a osdmap barrier 12

MDS session eviction
--------------------

Finally, it is safe to evict the client's MDS session, such that any capabilities it held
may be issued to other clients. The ID here is the ``id`` attribute from the ``session ls``
output:

::

    # ceph daemon mds.a session evict 4117

That's it! The client has now been evicted, and any resources it had locked will
now be available for other clients.

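
The order of the three steps matters: the epoch must be read only after the
blacklist entry has been added, and the session must be evicted only after the
barrier. The sketch below strings the commands from this page together in that
order. It is illustrative only: ``evict_client`` and the ``run`` callable (which
takes an argument list and returns the command's output as text) are
hypothetical, and a stand-in ``fake_run`` is used so that no cluster is needed.

.. code-block:: python

    def evict_client(run, mds_name, session_id, client_addr):
        """Run the three eviction steps in order; return the barrier epoch."""
        daemon = "mds.%s" % mds_name
        # 1. Fence the client at the RADOS level.
        run(["ceph", "osd", "blacklist", "add", client_addr])
        # 2. Read the epoch only now, so that it is a post-blacklist
        #    epoch, then ask the MDS to barrier on it.
        epoch = None
        for line in run(["ceph", "osd", "dump"]).splitlines():
            fields = line.split()
            if len(fields) == 2 and fields[0] == "epoch":
                epoch = int(fields[1])
                break
        if epoch is None:
            raise ValueError("no epoch line in osd dump output")
        run(["ceph", "daemon", daemon, "osdmap", "barrier", str(epoch)])
        # 3. Only after the barrier, evict the MDS session.
        run(["ceph", "daemon", daemon, "session", "evict", str(session_id)])
        return epoch


    def fake_run(argv):
        """Stand-in for a real command runner: echo the command, fake the dump."""
        print("# " + " ".join(argv))
        return "epoch 12\n" if argv == ["ceph", "osd", "dump"] else ""


    evict_client(fake_run, "a", 4117, "172.16.79.251:0/3271")

A real ``run`` might be a thin wrapper around ``subprocess.check_output``; that
wiring is left out here.
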

Background: OSD epoch barrier
-----------------------------

The purpose of the barrier is to ensure that when we hand out any
capabilities which might allow touching the same RADOS objects, the
clients receiving those capabilities have a sufficiently recent
OSD map that they do not race with cancelled operations (from ENOSPC) or
blacklisted clients (from evictions).

More specifically, the cases where we set an epoch barrier are:

* Client eviction (where the client is blacklisted and other clients
  must wait for a post-blacklist epoch to touch the same objects).
* OSD map full flag handling in the client (where the client may
  cancel some OSD ops from a pre-full epoch, so other clients must
  wait until the full epoch or later before touching the same objects).
* MDS startup, because we don't persist the barrier epoch, so must
  assume that the latest OSD map is always required after a restart.

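
These cases can be pictured as updates to a single value held by the MDS. The
following toy model is not the MDS implementation: the class and method names
are hypothetical, and it assumes that the barrier only ever moves forward.

.. code-block:: python

    class BarrierModel:
        """Toy model of the single, global OSD epoch barrier kept by an MDS."""

        def __init__(self, latest_osd_epoch):
            # MDS startup: the barrier is not persisted, so assume the
            # latest OSD map is required.
            self.barrier = latest_osd_epoch

        def set_barrier(self, epoch):
            # Assumption: a request for an older epoch never lowers the barrier.
            self.barrier = max(self.barrier, epoch)


    mds = BarrierModel(latest_osd_epoch=10)
    mds.set_barrier(12)  # e.g. eviction: a post-blacklist epoch
    mds.set_barrier(11)  # older epoch, leaves the barrier unchanged
    print(mds.barrier)   # prints 12
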

Note that this is a global value for simplicity: we could maintain this on
a per-inode basis. We don't, because:

* It would be more complicated.
* It would use an extra 4 bytes of memory for every inode.
* It would not be much more efficient: almost always everyone has the latest
  OSD map anyway, so in most cases everyone will breeze through this barrier
  rather than waiting.
* We only do this barrier in very rare cases, so any benefit from per-inode
  granularity would only very rarely be seen.

The epoch barrier is transmitted along with all capability messages, and
instructs the receiver of the message to avoid sending any more RADOS
operations to OSDs until it has seen this OSD epoch. This mainly applies
to clients (doing their data writes directly to files), but also applies
to the MDS because things like file size probing and file deletion are
done directly from the MDS.

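
The receiver-side rule can be sketched as a small model: operations are held
back while the receiver's OSD map epoch is older than the barrier, and released
once a new enough map arrives. This is an illustration under stated
assumptions, not the client or MDS code; ``ReceiverModel`` and its methods are
hypothetical names.

.. code-block:: python

    class ReceiverModel:
        """Toy model of a capability-message receiver (client or MDS)."""

        def __init__(self, osd_epoch):
            self.osd_epoch = osd_epoch  # newest OSD map epoch seen so far
            self.barrier = 0            # highest epoch barrier received so far
            self.pending = []           # RADOS ops held back by the barrier
            self.sent = []              # RADOS ops released to the OSDs

        def handle_caps(self, epoch_barrier):
            # Every capability message carries the current epoch barrier.
            self.barrier = max(self.barrier, epoch_barrier)

        def handle_osd_map(self, epoch):
            self.osd_epoch = max(self.osd_epoch, epoch)
            self._flush()

        def submit(self, op):
            self.pending.append(op)
            self._flush()

        def _flush(self):
            # Release held ops only once the barrier epoch has been seen.
            if self.osd_epoch >= self.barrier:
                self.sent.extend(self.pending)
                self.pending = []


    client = ReceiverModel(osd_epoch=11)
    client.handle_caps(epoch_barrier=12)
    client.submit("write obj1")
    print(client.sent)  # prints []
    client.handle_osd_map(12)
    print(client.sent)  # prints ['write obj1']

A receiver that already has epoch 12 or later never waits, which matches the
common case described above.
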