If we get a CLOSED message on a session, remove/kick any requests on
that session before tearing it down. Otherwise, we get a crash like:
2013-09-26 03:51:44.704446 7f4d35a46700 10 client.4111 kick_requests for mds.0
2013-09-26 03:51:45.014156 7f4d35a46700 -1 ./include/xlist.h: In function 'xlist<T>::~xlist() [with T = MetaRequest*]' thread 7f4d35a46700 time 2013-09-26 03:51:44.751908
./include/xlist.h: 69: FAILED assert(_size == 0)
ceph version 0.61.5 (8ee10dc4bb73bdd918873f29c70eedc3c7ef1979)
1: (MetaSession::~MetaSession()+0x425) [0x4e0105]
2: (Client::_closed_mds_session(MetaSession*)+0x116) [0x48a696]
3: (Client::handle_client_session(MClientSession*)+0x2bb) [0x48bf5b]
4: (Client::ms_dispatch(Message*)+0x56b) [0x4bfa0b]
5: (DispatchQueue::entry()+0x3f1) [0x621b31]
6: (DispatchQueue::DispatchThread::entry()+0xd) [0x6191bd]
7: (()+0x7851) [0x7f4d3c168851]
8: (clone()+0x6d) [0x7f4d3b09d90d]
Note that this can happen if we fail to reconnect to an MDS during its
reconnect interval. If that happens, we probably have inodes in our
cache with no caps and things are generally not going to work very well.
This is but one step in improving the situation.
Separate out the two methods since they share little/no behavior.
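
The ordering described above (drain the session's requests, then destroy the session) can be sketched as follows. This is only an illustration, not the Client code: RequestSketch, SessionSketch, kick_and_unlink_requests() and close_session() are hypothetical names, and the destructor assert merely mimics the xlist check seen in the backtrace.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-ins; these are NOT the real MetaRequest/MetaSession.
struct RequestSketch {
  int tid = 0;
  bool kicked = false;
};

struct SessionSketch {
  std::list<RequestSketch*> requests;

  // Mimics the invariant behind the failed assert in the backtrace:
  // a session must not be destroyed while requests are still linked to it.
  ~SessionSketch() { assert(requests.empty()); }
};

// Unlink every request from the session and mark it so the caller can
// resend or fail it later. Returns the number of requests kicked.
inline int kick_and_unlink_requests(SessionSketch& s) {
  int n = 0;
  while (!s.requests.empty()) {
    RequestSketch* r = s.requests.front();
    s.requests.pop_front();
    r->kicked = true;
    ++n;
  }
  return n;
}

// Handle a CLOSED message: drain first, then tear the session down.
inline int close_session(SessionSketch* s) {
  int n = kick_and_unlink_requests(*s);
  delete s;  // safe: the request list is empty at this point
  return n;
}
```

Deleting the session without the drain step would trip the destructor assert, which is the failure mode shown in the trace.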
Signed-off-by: Sage Weil <sage@inktank.com>
Compilation fails with this error:
>In file included from /usr/local/include/fuse/fuse.h:19:0,
> from client/fuse_ll.cc:17:
>/usr/local/include/fuse/fuse_common.h:474:4: error: #error only API
>version 30 or greater is supported
Update FUSE_USE_VERSION from 26 to 30.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
A previous patch makes the MDS send a notification to clients when an inode
is deleted. On receiving such a notification, we invalidate any
dentry linked to the deleted inode. If there is no other reference to
the inode, the inode gets trimmed.
For the cephfs fuse client, we use fuse_lowlevel_notify_inval_entry() or
fuse_lowlevel_notify_delete() to notify the kernel to trim the deleted
inode. (This is not completely reliable because we play unlink/link
tricks when handling MDS replies, so it is difficult to keep the user space
cache and kernel dcache in sync.)
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Fun facts:
- fpp = false positive probability
- fpp is a function of insert count only
- at .1% fpp, we pay about 2 bytes per insert
- at 1-2% fpp, we pay about 1 byte per insert
- at 15% fpp, we pay about .5 bytes per insert
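
The byte costs above are close to the textbook estimate for an optimally tuned Bloom filter, bits per insert = -ln(fpp) / (ln 2)^2. The sketch below uses that standard formula as an assumption; it is not taken from the Ceph bloom filter code, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Textbook estimate for an optimally tuned Bloom filter:
//   bits per insert = -ln(fpp) / (ln 2)^2
// Returned in bytes. This is the standard formula, not Ceph's code.
inline double bloom_bytes_per_insert(double fpp) {
  assert(fpp > 0.0 && fpp < 1.0);
  const double ln2 = std::log(2.0);
  return -std::log(fpp) / (ln2 * ln2) / 8.0;
}

// Inverse: expected fpp for a given budget in bytes per insert.
inline double bloom_fpp_for_bytes(double bytes_per_insert) {
  assert(bytes_per_insert > 0.0);
  const double ln2 = std::log(2.0);
  return std::exp(-bytes_per_insert * 8.0 * ln2 * ln2);
}
```

Under this estimate, fpp values of 0.001, 0.015 and 0.15 cost roughly 1.8, 1.1 and 0.5 bytes per insert, in line with the figures listed.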
Signed-off-by: Sage Weil <sage@inktank.com>
Add new safe_read_file() and safe_write_file() to update files atomically.
They replace the original OSD::read_meta() and OSD::write_meta(), on which they are based.
Used by read_superblock() and write_superblock().
Used by write_version_stamp() and version_stamp_is_valid().
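
One common way to update a file atomically on POSIX is to write a temporary file, fsync it, and rename it over the target. The sketch below shows that pattern under stated assumptions: atomic_write_sketch(), read_file_sketch() and the ".tmp" suffix are hypothetical and do not reproduce the actual safe_write_file()/safe_read_file() from this patch.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Capture errno, release the fd (if any), remove the temp file.
static inline int fail_and_cleanup(int fd, const std::string& tmp) {
  int err = -errno;
  if (fd >= 0)
    ::close(fd);
  ::unlink(tmp.c_str());
  return err;
}

// Hypothetical helper: write data to "<path>.tmp", fsync it, then rename
// it over <path>. Returns 0 on success or a negative errno value.
inline int atomic_write_sketch(const std::string& path, const std::string& data) {
  const std::string tmp = path + ".tmp";
  int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -errno;

  size_t off = 0;
  while (off < data.size()) {
    ssize_t r = ::write(fd, data.data() + off, data.size() - off);
    if (r < 0) {
      if (errno == EINTR)
        continue;
      return fail_and_cleanup(fd, tmp);
    }
    off += static_cast<size_t>(r);
  }

  // The bytes must be durable before the rename publishes them.
  if (::fsync(fd) < 0)
    return fail_and_cleanup(fd, tmp);
  if (::close(fd) < 0)
    return fail_and_cleanup(-1, tmp);

  // rename() replaces the destination atomically on POSIX filesystems.
  if (::rename(tmp.c_str(), path.c_str()) < 0)
    return fail_and_cleanup(-1, tmp);
  return 0;
}

// Hypothetical helper: read a whole file into *out.
inline int read_file_sketch(const std::string& path, std::string* out) {
  assert(out != nullptr);
  int fd = ::open(path.c_str(), O_RDONLY);
  if (fd < 0)
    return -errno;
  out->clear();
  char buf[4096];
  for (;;) {
    ssize_t r = ::read(fd, buf, sizeof(buf));
    if (r < 0) {
      if (errno == EINTR)
        continue;
      int err = -errno;
      ::close(fd);
      return err;
    }
    if (r == 0)
      break;
    out->append(buf, static_cast<size_t>(r));
  }
  ::close(fd);
  return 0;
}
```

A reader sees either the old content or the new content, never a partial write. A more careful version would also fsync the containing directory after the rename.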
Fixes: #6422
Signed-off-by: David Zafman <david.zafman@inktank.com>
Based on a dialog with Sam (as published at http://dachary.org/?p=2320).
* Remove PGBackend-h.rst because PGBackend.h is now in master.
* Fix typos caught by ispell
* Update recovery links to point to PGBackend recover methods
* Work around formatting warning
developer_notes.rst:3: WARNING: Duplicate explicit target name:
"erasurecodepluginexample" which should be legitimate.
Signed-off-by: Loic Dachary <loic@dachary.org>
* Update to the current state of the ghobject implementation and the fact
that they encode the shard_t. Although the pool also contains the shard
id, it is less relevant to understanding the implementation.
* Update with the erasure code plugin infrastructure and the example
plugin now in master.
* Move jerasure to a separate page to be expanded and link it from the
toc
* Kill the notes on partial reads and writes, as these will probably not be
implemented in the near future. Kill some of the other notes because they
are no longer relevant.
* Add a definition for "chunk rank"
* Reword, update schemas, fix typos.
Signed-off-by: Loic Dachary <loic@dachary.org>
Intuition differs regarding the sort order of the ghobject shard and
generation. Document the rationale for the chosen sort order.
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
The get_filestore_key* methods are changed to just call the
corresponding hobject methods instead of providing an identical
implementation.
Reviewed-by: David Zafman <david.zafman@inktank.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
In order to make this happen, we make the switch to generate the complete
transaction in the generic copy code and save it into the Callback. Then
in finish_copy() we just take that transaction and prepend it to the existing
transaction.
With that change, and by making use of the existing CopyCallback data,
we no longer need to access the CopyOp from the OpContext, so we can remove it.
Hurray, the pipelines are now independent!
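
The hand-off described above can be sketched with toy types: the generic copy side builds a complete transaction and stores it in the callback, and the caller later prepends it to its own transaction. All names here (TxnSketch, CopyCallbackSketch, generic_copy(), finish_copy_sketch()) and the op names are hypothetical; the real classes carry far more state.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a transaction: an ordered list of op names.
struct TxnSketch {
  std::vector<std::string> ops;

  // Put all of other's ops ahead of the ones already queued here.
  void prepend(const TxnSketch& other) {
    ops.insert(ops.begin(), other.ops.begin(), other.ops.end());
  }
};

// Hypothetical callback: the generic copy code fills in results_txn and
// never touches the caller's context directly.
struct CopyCallbackSketch {
  TxnSketch results_txn;
  int result = 0;

  void complete(int r, const TxnSketch& t) {
    result = r;
    results_txn = t;
  }
};

// Generic copy side: build the complete transaction and hand it over.
inline void generic_copy(CopyCallbackSketch& cb) {
  TxnSketch t;
  t.ops = {"touch", "write", "setattrs"};  // illustrative op names only
  cb.complete(0, t);
}

// Caller side: analogous to finish_copy() prepending the saved transaction.
inline void finish_copy_sketch(const CopyCallbackSketch& cb, TxnSketch& ctx_txn) {
  ctx_txn.prepend(cb.results_txn);
}
```

Because the copy side only talks to the callback, it needs no pointer back to the caller's context, which is the decoupling this change is after.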
Signed-off-by: Greg Farnum <greg@inktank.com>
Standby replay may fall behind and get -ENOENT when reading the
journal. Return -EAGAIN in this case, which makes the MDS respawn itself.
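
The error translation can be pictured as a small mapping function. This is a sketch only: standby_replay_read_result() is a hypothetical name, and the real change lives inside the MDS journal replay path.

```cpp
#include <cerrno>

// Hypothetical helper: translate a journal read result for a daemon in
// standby replay. -ENOENT (the standby fell behind) becomes -EAGAIN so
// the caller can restart; every other result passes through unchanged.
inline int standby_replay_read_result(int r, bool standby_replay) {
  if (standby_replay && r == -ENOENT)
    return -EAGAIN;
  return r;
}
```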
fixes: #5458
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
This is incomplete and unfortunately unusable in its current state:
- it would only set USES_TMAP for old encoded object_info_t and tmapput,
but would NOT set it for tmapup
- a config option turned that off by default.
That means that the mds conversion from tmap -> omap won't be able to use
this because any existing cluster has tmap objects without the USES_TMAP
flag set. And we don't want to unconditionally try a tmap->omap conversion
on omap operations because there are lots of existing librados users out
there that will be negatively impacted by this.
Instead, the MDS will need to handle this conversion on the client side by
reading either tmap or omap objects and explicitly rewriting the content
with omap (while truncating the tmap data away).
The auto-conversion function was added in v0.44.
Signed-off-by: Sage Weil <sage@inktank.com>
ISDIRTY will query whether the dirty flag is set on an object. UNDIRTY
will explicitly clear it. Note that a user doing so will likely interfere
with the caching code.
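
As a rough picture of the two operations, the dirty state can be modeled as one bit in a per-object flags word. The type, the bit value and the method names below are hypothetical and are not the actual object_info_t layout or librados API.

```cpp
#include <cstdint>

// Hypothetical bit position; the real flag value is not shown here.
const uint32_t FLAG_DIRTY_SKETCH = 1u << 0;

// Hypothetical stand-in for per-object metadata.
struct ObjectInfoSketch {
  uint32_t flags = 0;

  // A write marks the object dirty.
  void write() { flags |= FLAG_DIRTY_SKETCH; }

  // ISDIRTY: query whether the dirty flag is set.
  bool is_dirty() const { return (flags & FLAG_DIRTY_SKETCH) != 0; }

  // UNDIRTY: explicitly clear the dirty flag.
  void undirty() { flags &= ~FLAG_DIRTY_SKETCH; }
};
```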
Signed-off-by: Sage Weil <sage@inktank.com>
We implement enough of the CopyFromCallback that CopyOp no longer needs
a direct reference to the OpContext, so we remove it and replace all
references with calls to cop->cb->complete().
Signed-off-by: Greg Farnum <greg@inktank.com>
We'll start using it in the next commit; eventually we can use the interfaces
we're putting there to replace our link to the OpContext.
Signed-off-by: Greg Farnum <greg@inktank.com>