Commit Graph

29179 Commits

Author SHA1 Message Date
Sage Weil
721f1703a8 client: remove requests from closed MetaSession
If we get a CLOSED message on a session, remove/kick any requests on
that session before tearing it down.  Otherwise, we get a crash like

2013-09-26 03:51:44.704446 7f4d35a46700 10 client.4111 kick_requests for mds.0
2013-09-26 03:51:45.014156 7f4d35a46700 -1 ./include/xlist.h: In function 'xlist<T>::~xlist() [with T = MetaRequest*]' thread 7f4d35a46700 time 2013-09-26 03:51:44.751908
./include/xlist.h: 69: FAILED assert(_size == 0)

 ceph version 0.61.5 (8ee10dc4bb73bdd918873f29c70eedc3c7ef1979)
 1: (MetaSession::~MetaSession()+0x425) [0x4e0105]
 2: (Client::_closed_mds_session(MetaSession*)+0x116) [0x48a696]
 3: (Client::handle_client_session(MClientSession*)+0x2bb) [0x48bf5b]
 4: (Client::ms_dispatch(Message*)+0x56b) [0x4bfa0b]
 5: (DispatchQueue::entry()+0x3f1) [0x621b31]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x6191bd]
 7: (()+0x7851) [0x7f4d3c168851]
 8: (clone()+0x6d) [0x7f4d3b09d90d]

Note that this can happen if we fail to reconnect to an MDS during its
reconnect interval.  If that happens, we probably have inodes in our
cache with no caps and things are generally not going to work very well.
This is but one step in improving the situation.

Separate out the two methods since they share little/no behavior.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-02 14:42:43 -07:00
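
The failure mode and fix described in 721f1703a8 can be modelled in a few lines. This is a minimal Python sketch, not the Ceph C++ client: `Session`, `Client`, `kick_requests`, `closed_session`, and the `unsent` queue are hypothetical stand-ins, and the assertion in `teardown()` plays the role of the `assert(_size == 0)` check in the `xlist` destructor.

```python
class Session:
    """Hypothetical stand-in for MetaSession: owns a list of in-flight requests."""

    def __init__(self, mds):
        self.mds = mds
        self.requests = []

    def teardown(self):
        # Mirrors the xlist destructor check: the list must be empty by now.
        assert len(self.requests) == 0, "requests still attached to session"


class Client:
    def __init__(self):
        self.sessions = {}
        self.unsent = []  # assumed holding area for requests removed from a session

    def open_session(self, mds):
        self.sessions[mds] = Session(mds)
        return self.sessions[mds]

    def kick_requests(self, session):
        # Detach every request from the session, preserving order.
        while session.requests:
            self.unsent.append(session.requests.pop(0))

    def closed_session(self, mds):
        session = self.sessions.pop(mds)
        self.kick_requests(session)  # the step that must happen before teardown
        session.teardown()


client = Client()
s = client.open_session(0)
s.requests.extend(["getattr", "lookup"])
client.closed_session(0)  # no assertion failure: requests were removed first
```

Without the `kick_requests()` call in `closed_session()`, `teardown()` would trip its assertion, which is the analogue of the crash in the backtrace.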
majianpeng
63f5814855 ceph: Update FUSE_USE_VERSION from 26 to 30.
Compilation failed with this error:
>In file included from /usr/local/include/fuse/fuse.h:19:0,
>                 from client/fuse_ll.cc:17:
>/usr/local/include/fuse/fuse_common.h:474:4: error: #error only API
>version 30 or greater is supported
Update FUSE_USE_VERSION from 26 to 30.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
2013-10-02 14:42:43 -07:00
Yan, Zheng
f8a947d920 client: trim deleted inode
The previous patch makes the MDS send a notification to clients when an
inode is deleted. On receiving such a notification, we invalidate any
dentry linked to the deleted inode. If there is no other reference to
the inode, the inode gets trimmed.

For the cephfs fuse client, we use fuse_lowlevel_notify_inval_entry() or
fuse_lowlevel_notify_delete() to notify the kernel to trim the deleted
inode. (This is not completely reliable because we play unlink/link
tricks when handling MDS replies; it is difficult to keep the user space
cache and the kernel dcache in sync.)

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-10-02 14:42:42 -07:00
Loic Dachary
d3ba8da597 Merge pull request #682 from ceph/wip-copying
sync up COPYING and debian/copyright
2013-10-02 14:36:41 -07:00
Sage Weil
65ae9b8aeb COPYING: fix URL
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-02 14:30:19 -07:00
Sage Weil
11461cbeef debian/copyright: sync up with COPYING
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-02 14:18:07 -07:00
Sage Weil
1a56fe9935 COPYING: add Packaging: section
Again, debian-specific, but who cares.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-02 14:16:19 -07:00
Sage Weil
e70ea84cb9 COPYING: add debian-style headers
This may not be necessary here, but it makes this identical to the
debian/copyright file, which is a win.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-02 14:15:13 -07:00
Sage Weil
fea12e21e8 COPYING: fix formatting
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-02 14:14:48 -07:00
Sage Weil
a2e175bf10 COPYING: make note of common/bloom_filter.hpp (boost) license
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-02 14:09:13 -07:00
Sage Weil
f31d691275 common/bloom_filter: fix whitespace
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-02 14:09:12 -07:00
Sage Weil
fdb8b0d8ff common/bloom_filter: test behavior of sequences of bloom filters
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-02 14:09:12 -07:00
Sage Weil
f1584fb05c common/bloom_filter: unit tests
Fun facts:

- fpp = false positive probability
- fpp is a function of insert count only
- at .1% fpp, we pay about 2 bytes per insert
- at 1-2% fpp, we pay about 1 byte per insert
- at 15% fpp, we pay about .5 bytes per insert

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-02 14:09:12 -07:00
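
The per-insert costs listed in f1584fb05c are consistent with the textbook sizing formula for an optimally configured Bloom filter, bits per insert = -ln(fpp) / (ln 2)^2. The sketch below assumes that formula applies here; it is not taken from common/bloom_filter, and `bytes_per_insert` is a hypothetical helper.

```python
import math


def bytes_per_insert(fpp):
    """Bytes of filter space per inserted element for an optimally sized Bloom filter."""
    bits = -math.log(fpp) / (math.log(2) ** 2)
    return bits / 8


for fpp in (0.001, 0.01, 0.02, 0.15):
    print(f"fpp={fpp:.3f}: {bytes_per_insert(fpp):.2f} bytes per insert")
```

Under this assumption the cost depends only on the target fpp, so total filter size scales linearly with the expected insert count.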
Alfredo Deza
8fb7a47601 Merge pull request #678 from ceph/wip-5981
ceph-disk: make initial journal files 0 bytes
2013-10-02 11:13:43 -07:00
Sage Weil
348890232d Merge pull request #649 from ceph/wip-6422
#6422

Reviewed-by: Sage Weil <sage@inktank.com>
2013-10-02 10:51:29 -07:00
athanatos
b822373afd Merge pull request #620 from dachary/wip-erasure-doc
ErasureCode: doc updates
2013-10-02 10:40:09 -07:00
Sage Weil
58d0a1f0df Merge pull request #677 from ceph/wip-store-tool
wip-store-tool: Few patches to ceph_test_store_tool

Reviewed-by: Sage Weil <sage@inktank.com>
2013-10-02 10:16:21 -07:00
Sage Weil
73409ef5ae Merge pull request #680 from dmick/next
mon/PGMap.cc: don't output header for pg dump_stuck if nothing stuck

Reviewed-by: Sage Weil <sage@inktank.com>
2013-10-02 10:14:15 -07:00
David Zafman
8835ef8f98 common, os, osd: Use common functions for safe file reading and writing
Add new safe_read_file() and safe_write_file() to update files atomically.
Use them instead of the original OSD::read_meta() and OSD::write_meta(), on which they are based.
Used by read_superblock() and write_superblock().
Used by write_version_stamp() and version_stamp_is_valid().

Fixes: #6422

Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-10-02 10:11:43 -07:00
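
The usual way to update a small file atomically, as 8835ef8f98 sets out to do, is to write a temporary file in the same directory, sync it, and rename it over the destination. The Python sketch below illustrates that general technique only; the function names are borrowed from the commit message, but the signatures and behavior shown are assumptions, not the Ceph C++ implementation.

```python
import os
import tempfile


def safe_write_file(base, name, data):
    """Write data to base/name so readers see either the old or the new content."""
    path = os.path.join(base, name)
    fd, tmp = tempfile.mkstemp(dir=base, prefix=name + ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # get the bytes on disk before the rename
        os.replace(tmp, path)     # atomic rename over the destination
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def safe_read_file(base, name, max_len):
    """Read at most max_len bytes from base/name."""
    with open(os.path.join(base, name), "rb") as f:
        return f.read(max_len)
```

A fuller version would also fsync the containing directory so that the rename itself survives a crash.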
David Zafman
c0cbd9aa5e osd: In read_meta() leave an extra byte in buffer to nul terminate
Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-10-02 10:10:15 -07:00
Loic Dachary
238a303cff ErasureCode: update PGBackend description
Based on a dialog with Sam ( as published at http://dachary.org/?p=2320 ).

* Remove PGBackend-h.rst because PGBackend.h is now in master.

* Fix typos caught by ispell

* Update recovery links to point to PGBackend recover methods

* Work around formatting warning
  developer_notes.rst:3: WARNING: Duplicate explicit target name:
  "erasurecodepluginexample" which should be legitimate.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-10-02 15:58:01 +02:00
Loic Dachary
ff4887324a ErasureCode: doc updates
* Update to the current state of the ghobject implementation and the fact
  that they encode the shard_t. Although the pool also contains the shard
  id, it is less relevant to understanding the implementation.

* Update with the erasure code plugin infrastructure and the example
  plugin now in master.

* Move jerasure to a separate page to be expanded and link it from the
  toc

* Kill the partial read and write notes as they will probably not be
  implemented in the near future. Kill some of the notes because they
  are no longer relevant.

* Add a definition for "chunk rank"

* Reword, update schemas, fix typos.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-10-02 15:02:53 +02:00
Loic Dachary
e69baee07d Merge pull request #652 from dachary/wip-ghobjects
common: ghobject sort order & get_filestore_key
2013-10-02 00:24:32 -07:00
Loic Dachary
d1c1f3eb90 common: document ghobject sort order rationale
Intuition differs regarding the sort order of the ghobject shard and
generation. Document the rationale for the chosen sort order.

Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
2013-10-02 09:17:09 +02:00
Loic Dachary
16fbdcdf9f common: ghobject get_filestore_key* use hobject counterpart
The get_filestore_key* methods are changed to just call the
corresponding hobject methods instead of providing an identical
implementation.

Reviewed-by: David Zafman <david.zafman@inktank.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
2013-10-02 09:15:53 +02:00
Dan Mick
2d7dced184 mon/PGMap.cc: don't output header for pg dump_stuck if nothing stuck
Formatted output is already correct (no header)

Fixes: #4577
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-10-01 22:23:24 -07:00
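
The plain-text behavior described in 2d7dced184 amounts to emitting the header only when there is at least one row. A minimal Python sketch follows; `format_stuck_pgs` and the column names are illustrative assumptions, not the actual mon/PGMap.cc output format.

```python
def format_stuck_pgs(rows):
    """Return plain-text output lines; emit the header only when there are rows."""
    if not rows:
        return []
    columns = ("pgid", "state", "up", "acting")  # illustrative column set
    lines = ["\t".join(columns)]
    for pg in rows:
        lines.append("\t".join(str(pg[c]) for c in columns))
    return lines
```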
Greg Farnum
d29be45319 ReplicatedPG: rename finish_copy -> finish_copyfrom
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-01 21:52:11 -07:00
Greg Farnum
a96b12f03a ReplicatedPG: copy: use CopyCallback instead of CopyOp in OpContext
In order to make this happen, we make the switch to generate the complete
transaction in the generic copy code and save it into the Callback. Then
in finish_copy() we just take that transaction and prepend it to the existing
transaction.
With that change, and by making use of the existing CopyCallback data,
we no longer need to access the CopyOp from the OpContext, so we can remove it.
Hurray, the pipelines are now independent!

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-01 21:52:11 -07:00
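
The decoupling described in a96b12f03a can be sketched as follows. This is a toy Python model under stated assumptions: the class and function names echo the commit message, but the transaction representation (a list of tuples) and all signatures are hypothetical, not the ReplicatedPG C++ code.

```python
class CopyCallback:
    """Hypothetical callback that receives the transaction built by the copy code."""

    def __init__(self):
        self.result = None
        self.transaction = []

    def complete(self, result, transaction):
        self.result = result
        self.transaction = transaction


def generic_copy(src_data, temp_oid, callback):
    # The copy code builds the complete transaction and hands it to the
    # callback; it never touches the caller's operation context.
    txn = [("write", temp_oid, src_data), ("rename", temp_oid, "dest")]
    callback.complete(0, txn)


def finish_copy(op_transaction, callback):
    # Prepend the saved copy transaction to the operation's existing one.
    return callback.transaction + op_transaction


cb = CopyCallback()
generic_copy(b"payload", "temp_0", cb)
final = finish_copy([("setattr", "dest", "mtime")], cb)
```

Because `generic_copy()` only sees the callback, the copy pipeline and the operation pipeline share no state beyond what the callback carries.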
Yan, Zheng
d2cb2bf6ba mds: return -EAGAIN if standby replay falls behind
Standby replay may fall behind and get -ENOENT when reading the
journal. Return -EAGAIN in this case, which makes the MDS respawn itself.

fixes: #5458

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-10-01 21:10:56 -07:00
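
The error mapping in d2cb2bf6ba can be sketched as a small function. This is a Python illustration only; `replay_read_result` and its parameters are hypothetical and do not correspond to a function in the MDS source.

```python
import errno


def replay_read_result(r, standby_replay):
    """Map a journal read result to the code the replay loop should return."""
    if standby_replay and r == -errno.ENOENT:
        # Standby replay fell behind the journal: signal a retry so the
        # daemon respawns instead of treating this as a hard failure.
        return -errno.EAGAIN
    return r
```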
Sage Weil
fbeabccaf0 os/FileStore: report errors from _crc_load_... and _crc_save
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 21:07:49 -07:00
Sage Weil
895939f093 Merge pull request #671 from ceph/wip-tmap
remove tmap->omap auto-upgrade

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-10-01 18:01:04 -07:00
Joao Eduardo Luis
dfea81e77a ceph_test_store_tool: add 'set prefix key' feature
Allow reading from a file.  See --help for more info.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-10-02 01:30:19 +01:00
Joao Eduardo Luis
398249a05f test: test_store_tool: optionally output value crc when listing keys
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-10-02 01:28:22 +01:00
Joao Eduardo Luis
18fcd91319 test: test_store_tool: add 'crc <prefix> <key>' command
Returns the CRC of contents for a given key with a given prefix.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-10-02 01:28:22 +01:00
Sage Weil
84c028674a rados: add 'tmap-to-omap' command
Explicitly convert tmap object data to omap keys.  Removes the old tmap
content at the same time.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 17:21:21 -07:00
Sage Weil
20974dc052 rados: make 'tmap dump' gracefully handle non-tmap data
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 17:21:21 -07:00
Sage Weil
a9e5323586 osd: remove magical tmap -> omap conversion
This is incomplete and unfortunately unusable in its current state:

 - it would only set USES_TMAP for old encoded object_info_t and tmapput,
   but would NOT set it for tmapup
 - a config option turned that off by default.

That means that the mds conversion from tmap -> omap won't be able to use
this because any existing cluster has tmap objects without the USES_TMAP
flag set.  And we don't want to unconditionally try a tmap->omap conversion
on omap operations because there are lots of existing librados users out
there that will be negatively impacted by this.

Instead, the MDS will need to handle this conversion on the client side by
reading either tmap or omap objects and explicitly rewriting the content
with omap (while truncating the tmap data away).

The auto-conversion function was added in v0.44.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 17:21:21 -07:00
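
The client-side conversion outlined in a9e5323586 (read either representation, then rewrite as omap while truncating the tmap data) can be modelled with a toy object. The Python sketch below is an assumption-laden illustration: `RadosObject`, `read_keys`, and `convert_to_omap` are hypothetical, and tmap content is modelled as an already-decoded dict rather than its real on-disk encoding.

```python
class RadosObject:
    """Toy object: `tmap` models key/value pairs packed into the data payload,
    `omap` models the object's native key/value store."""

    def __init__(self, tmap=None, omap=None):
        self.tmap = dict(tmap or {})
        self.omap = dict(omap or {})


def read_keys(obj):
    # Accept either representation; omap entries win on conflict.
    return {**obj.tmap, **obj.omap}


def convert_to_omap(obj):
    # Rewrite the content as omap and truncate the tmap data away.
    obj.omap = read_keys(obj)
    obj.tmap = {}
    return obj


legacy = RadosObject(tmap={"dentry_a": b"1", "dentry_b": b"2"})
before = read_keys(legacy)
convert_to_omap(legacy)
```

After conversion, `read_keys()` returns the same mapping as before, but it now comes entirely from omap.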
Sage Weil
1db0a572c1 Merge pull request #675 from ceph/wip-osd-dirty
osd: add a dirty flag for objects.

Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-10-01 17:15:25 -07:00
Sage Weil
0d610926d7 osd: add ISDIRTY, UNDIRTY rados operations
ISDIRTY will query whether the dirty flag is set on an object.  UNDIRTY
will explicitly clear it.  Note that a user doing so will likely run afoul
of the caching code.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 17:04:44 -07:00
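
The dirty-flag semantics from 0d610926d7 and the related commits in this series can be sketched with a bitmask. This Python model is illustrative only: the bit value of `FLAG_DIRTY` and the `ObjectInfo` class are assumptions, not the definitions in osd/osd_types.

```python
FLAG_DIRTY = 1 << 0  # bit value chosen for illustration only


class ObjectInfo:
    """Toy stand-in for object_info_t carrying a flags bitmask."""

    def __init__(self):
        self.flags = 0

    def make_writeable(self):
        # Preparing an object for a write marks it dirty.
        self.flags |= FLAG_DIRTY

    def is_dirty(self):
        # Models the ISDIRTY query.
        return bool(self.flags & FLAG_DIRTY)

    def undirty(self):
        # Models the UNDIRTY operation: explicitly clear the flag.
        self.flags &= ~FLAG_DIRTY
```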
Greg Farnum
da1b9b6c10 ReplicatedPG: copy: implement CopyFromCallback::finish, remove CopyOp::ctx
We implement enough of the CopyFromCallback that CopyOp no longer needs
a direct reference to the OpContext, so we remove it and replace all
references with calls to cop->cb->complete().

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-01 16:53:35 -07:00
Greg Farnum
613841a670 ReplicatedPG: copy: add CopyCallback pointer to CopyOp, and set it up
We'll start using it in the next commit; eventually we can use the interfaces
we're putting there to replace our link to the OpContext.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-01 16:51:33 -07:00
Greg Farnum
0b472766f1 ReplicatedPG: copy: start defining CopyCallback structures
Outline the basic interfaces we're going to use, and implement
the more obvious ones.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-01 16:50:56 -07:00
Sage Weil
dcd475dd57 osdc/Objecter: fix return value for copy_get
We should return the return code even when we don't have an encoding error!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 16:48:00 -07:00
Greg Farnum
1784ef96f4 ReplicatedPG: copy: split up the transaction generation from the PG management
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-01 16:29:46 -07:00
Greg Farnum
010ff3759e ReplicatedPG: copy: specify the temp_oid in the caller
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-01 16:29:44 -07:00
Greg Farnum
1ae8ef28e7 ReplicatedPG: copy: take an ObjectContextRef in start_copy and use that
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-10-01 16:28:51 -07:00
Sage Weil
7e3084eb17 osd/ReplicatedPG: mark objects dirty in make_writeable()
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 16:24:20 -07:00
Sage Weil
d42d2b97cf osd/osd_types: object_info_t::get_flag_string()
Stop adding these ad-hoc to the operator<<.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 16:23:21 -07:00
Sage Weil
a0ed9c2004 osd/osd_types: add object_info_t::FLAG_DIRTY
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 16:19:48 -07:00
Dan Mick
f97772aa03 Merge pull request #673 from liewegas/wip-usage
make rbd, rados bad command errors more friendly
2013-10-01 16:19:09 -07:00