Commit Graph

29000 Commits

Author SHA1 Message Date
Loic Dachary
ff4887324a ErasureCode: doc updates
* Update to the current state of the ghobject implementation and the fact
  that they encode the shard_t. Although the pool also contains the shard
  id, it is less relevant to understand the implementation.

* Update with the erasure code plugin infrastructure and the example
  plugin now in master.

* Move jerasure to a separate page to be expanded and link it from the
  toc

* Kill the partial read and write notes, as they will probably not be
  implemented in the near future. Kill some of the notes because they
  are no longer relevant.

* Add a definition for "chunk rank"

* Reword, update schemas, fix typos.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-10-02 15:02:53 +02:00
Loic Dachary
e69baee07d Merge pull request #652 from dachary/wip-ghobjects
common: ghobject sort order & get_filestore_key
2013-10-02 00:24:32 -07:00
Loic Dachary
d1c1f3eb90 common: document ghobject sort order rationale
Intuition differs regarding the sort order of the ghobject shard and
generation. Document the rationale for the chosen sort order.

Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
2013-10-02 09:17:09 +02:00
Loic Dachary
16fbdcdf9f common: ghobject get_filestore_key* use hobject counterpart
The get_filestore_key* methods are changed to just call the
corresponding hobject methods instead of providing an identical
implementation.

Reviewed-by: David Zafman <david.zafman@inktank.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
2013-10-02 09:15:53 +02:00
Yan, Zheng
d2cb2bf6ba mds: return -EAGAIN if standby replay falls behind
Standby replay may fall behind and get -ENOENT when reading the
journal. Return -EAGAIN in this case, which makes the MDS respawn itself.

fixes: #5458

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-10-01 21:10:56 -07:00
Sage Weil
fbeabccaf0 os/FileStore: report errors from _crc_load_... and _crc_save
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 21:07:49 -07:00
Sage Weil
895939f093 Merge pull request #671 from ceph/wip-tmap
remove tmap->omap auto-upgrade

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-10-01 18:01:04 -07:00
Sage Weil
84c028674a rados: add 'tmap-to-omap' command
Explicitly convert tmap object data to omap keys.  Removes the old tmap
content at the same time.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 17:21:21 -07:00
Sage Weil
20974dc052 rados: make 'tmap dump' gracefully handle non-tmap data
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 17:21:21 -07:00
Sage Weil
a9e5323586 osd: remove magical tmap -> omap conversion
This is incomplete and unfortunately unusable in its current state:

 - it would only set USES_TMAP for old encoded object_info_t and tmapput,
   but would NOT set it for tmapup
 - a config option turned that off by default.

That means that the mds conversion from tmap -> omap won't be able to use
this because any existing cluster has tmap objects without the USES_TMAP
flag set.  And we don't want to unconditionally try a tmap->omap conversion
on omap operations because there are lots of existing librados users out
there that will be negatively impacted by this.

Instead, the MDS will need to handle this conversion on the client side by
reading either tmap or omap objects and explicitly rewriting the content
with omap (while truncating the tmap data away).

The auto-conversion function was added in v0.44.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 17:21:21 -07:00
Sage Weil
1db0a572c1 Merge pull request #675 from ceph/wip-osd-dirty
osd: add a dirty flag for objects.

Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-10-01 17:15:25 -07:00
Sage Weil
0d610926d7 osd: add ISDIRTY, UNDIRTY rados operations
ISDIRTY will query whether the dirty flag is set on an object.  UNDIRTY
will explicitly clear it.  Note that a user doing so will likely run amok
with the caching code.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 17:04:44 -07:00
Sage Weil
dcd475dd57 osdc/Objecter: fix return value for copy_get
We should return the return code even when we don't have an encoding error!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 16:48:00 -07:00
Sage Weil
7e3084eb17 osd/ReplicatedPG: mark objects dirty in make_writeable()
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 16:24:20 -07:00
Sage Weil
d42d2b97cf osd/osd_types: object_info_t::get_flag_string()
Stop adding these ad-hoc to the operator<<.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 16:23:21 -07:00
Sage Weil
a0ed9c2004 osd/osd_types: add object_info_t::FLAG_DIRTY
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 16:19:48 -07:00
Dan Mick
f97772aa03 Merge pull request #673 from liewegas/wip-usage
make rbd, rados bad command errors more friendly
2013-10-01 16:19:09 -07:00
Sage Weil
ece11f4a82 Merge remote-tracking branch 'gh/next'
Conflicts:
	PendingReleaseNotes
2013-10-01 16:01:24 -07:00
Sage Weil
9b7a2ae329 crush: invalidate rmap on create (and thus decode)
If we have an existing CrushWrapper object and decode from a bufferlist,
reset build_rmaps so that they get rebuilt.

Remove the build_rmaps() call in decode that was useless on a redecode
(because have_rmaps == true in that case and it did nothing).

Fixes: #6442
Backport: dumpling, maybe cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-10-01 15:53:42 -07:00
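The rmap fix in 9b7a2ae329 follows a common cache-invalidation pattern: a lazily built reverse map must be marked stale whenever the underlying data is replaced. What follows is a minimal Python sketch of that pattern, not Ceph's C++ code. The class `NameMap` and its methods are hypothetical stand-ins for CrushWrapper; only `have_rmaps` and `build_rmaps` echo names from the commit message.

```python
class NameMap:
    """Hypothetical id -> name map with a lazily built reverse (name -> id) map."""

    def __init__(self):
        self.names = {}          # forward map: id -> name
        self.rmap = {}           # reverse map: name -> id
        self.have_rmaps = False  # True only while rmap matches names

    def build_rmaps(self):
        # Rebuild only when the cached reverse map is stale.
        if self.have_rmaps:
            return
        self.rmap = {name: ident for ident, name in self.names.items()}
        self.have_rmaps = True

    def decode(self, encoded):
        # Replacing the forward map invalidates the cached reverse map,
        # so reset the flag instead of calling build_rmaps() here.
        self.names = dict(encoded)
        self.have_rmaps = False

    def get_id(self, name):
        self.build_rmaps()
        return self.rmap[name]
```

Without the reset in `decode()`, a second decode on the same object would keep serving lookups from the reverse map built for the first one.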
Sage Weil
011bff3405 osd/osd_types: bump encoding from 11 -> 12
Meant to do this in a1b82f2a56 or
d421b66293 but forgot!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 15:32:59 -07:00
Sage Weil
62cc39866a rbd: be helpful with invalid command
$ rbd asdf
rbd: error parsing command 'asdf'; -h or --help for usage

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 15:29:52 -07:00
Sage Weil
8e33d331e3 rados: do not dump usage on invalid command
I hate this; it makes it impossible to see that there was an error message.

We made this same change a while back with rbd.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 15:29:52 -07:00
Alfredo Deza
26235e4984 Merge pull request #666 from dmick/wip-6384
ceph.in: clean up error message when missing required parameter
2013-10-01 15:28:29 -07:00
Sage Weil
0459dc4f46 Merge pull request #670 from ceph/wip-osd-whiteout
osd: add basic whiteout infrastructure

Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-10-01 14:18:52 -07:00
Sage Weil
d421b66293 osd/osd_types: convert object_info_t::uses_tmap to a flag
Treat the second encoded bool as bits 9-16 of a (now) 16-bit flags field,
and use bit 9 (what used to be set by the uses_tmap bool) as FLAG_USES_TMAP.

No encoding compatibility change.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 14:17:59 -07:00
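The encoding trick described in d421b66293 can be illustrated with a small Python sketch. It assumes that bits are counted from 1 (so bit 9 is `1 << 8`), that each old bool occupied one encoded byte, and that `lost` maps to the lowest bit. The helper names `encode_flags` and `decode_flags` are hypothetical and are not Ceph functions.

```python
FLAG_LOST = 1 << 0        # assumed: bit 1, carried in the first encoded byte
FLAG_USES_TMAP = 1 << 8   # bit 9, the lowest bit of the second encoded byte

def encode_flags(flags):
    """Split a 16-bit flags value into (low byte, high byte)."""
    return flags & 0xFF, (flags >> 8) & 0xFF

def decode_flags(low_byte, high_byte):
    """Recombine the two encoded bytes into one 16-bit flags value."""
    return (high_byte << 8) | low_byte

# An old encoder wrote two bools: lost=False, uses_tmap=True -> bytes (0, 1).
old_lost, old_uses_tmap = 0, 1
flags = decode_flags(old_lost, old_uses_tmap)
```

Because an old `True` bool is the byte value 1, it lands exactly on `FLAG_USES_TMAP` when read as the high byte, which is why no encoding compatibility change is needed under these assumptions.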
Sage Weil
1aa606711e osd/ReplicatedPG: update all find_object_context() users to handle whiteouts
In each case, we treat the whiteout as if we got an ENOENT.

We do not change the semantics of bool exists to avoid breaking lots of
potentially fragile code.  We are only interested in changing the
user-visible behavior of the object, not the way it is internally stored
or managed.

This will likely be refined as we grow actual users for whiteouts in the
pool caching code.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 14:17:59 -07:00
Sage Weil
ea65b5a5d3 osd/osd_types: add WHITEOUT flag to object_info_t
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 14:17:58 -07:00
Sage Weil
a1b82f2a56 osd/osd_types: replace bool lost with a flags field
This is more generic.  We could also fold uses_tmap flag into here,
but the encoding change for that is non-trivial.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 14:17:58 -07:00
Dan Mick
1bdc3f7034 Add unit_to_bytesize test for 'k' on input; continues fix for #4612
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-10-01 13:22:11 -07:00
Sage Weil
399f1d53f7 Merge pull request #669 from ceph/wip-6443
ReplicatedPG: don't bless C_OSD_SendMessageOnConn

Reviewed-by: Sage Weil <sage@inktank.com>
2013-10-01 12:40:36 -07:00
Samuel Just
334f655c27 ReplicatedPG: don't bless C_OSD_SendMessageOnConn
C_OSD_SendMessageOnConn doesn't need to lock the pg.
Canceling it resulted in a leaked message.

Fixes: 6443
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-10-01 12:22:02 -07:00
Sage Weil
a9df335b12 msgr: debug delay_thread join
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 12:04:42 -07:00
Dan Mick
b43bc1a0b0 Use 'k' when printing 'kilo'; accept either 'K' or 'k' as input
Fixes: #4612
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-10-01 10:50:08 -07:00
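The behavior described in b43bc1a0b0 (print 'k', accept 'K' or 'k') can be sketched in Python. This is an illustration only: `parse_size` and `format_size` are hypothetical names, and the use of binary multiples and of the M, G, and T suffixes is an assumption rather than something the commit message states.

```python
# Output suffixes, largest first; lowercase 'k' is used for kilo.
UNITS = [("T", 2**40), ("G", 2**30), ("M", 2**20), ("k", 2**10)]

# Input accepts every output suffix, plus uppercase 'K' as an alias for 'k'.
INPUT_MULTIPLIERS = {suffix: size for suffix, size in UNITS}
INPUT_MULTIPLIERS["K"] = 2**10

def parse_size(text):
    """Parse '4k' or '4K' into a byte count; a bare number is already bytes."""
    text = text.strip()
    if text and text[-1] in INPUT_MULTIPLIERS:
        return int(text[:-1]) * INPUT_MULTIPLIERS[text[-1]]
    return int(text)

def format_size(nbytes):
    """Format a byte count with the largest suffix that divides it evenly."""
    for suffix, size in UNITS:
        if nbytes >= size and nbytes % size == 0:
            return f"{nbytes // size}{suffix}"
    return str(nbytes)
```

Keeping a single canonical output suffix while accepting both spellings on input means older scripts that pass 'K' keep working.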
Sage Weil
cb43abda89 Merge pull request #659 from ceph/wip-objecter-notier
Wip objecter notier

Reviewed-by: Sage Weil <sage@inktank.com>
2013-10-01 10:41:42 -07:00
Josh Durgin
c415d46e01 Merge pull request #668 from liewegas/wip-cache-stall
osdc/ObjectCacher: limit writeback IOs generated while holding lock
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-10-01 10:24:10 -07:00
Gregory Farnum
bf4234c0be Merge pull request #663 from ceph/wip-cancel-copy
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-10-01 10:23:17 -07:00
Sage Weil
3d062c2a23 rbd: fix cli test
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 10:02:40 -07:00
Sage Weil
cce990efc8 osdc/ObjectCacher: limit writeback IOs generated while holding lock
While analyzing a log from Mike Dawson I saw a long stall while librbd's
objectcacher was starting lots (many hundreds) of IOs.  Limit the amount of
time we spend doing this at a time to allow IO replies to be processed so
that the cache remains responsive.

I'm not sure this warrants a tunable (which we would need to add for both
libcephfs and librbd).

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 09:28:29 -07:00
Yehuda Sadeh
055e31359a rgw: quiet down warning message
Fixes: #6123
We don't want to know about failing to read region map info
if it's not found, only if it failed with some other error. In
any case it's just a warning.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-10-01 09:22:00 -07:00
Sage Weil
75b94ba295 osd/ReplicatedPG: fix iterator corruption in cancel_copy_ops()
The cancel_copy() method removes the entry from copy_ops.  Move the
iterator forward before calling.

Fixes segfault when thrashing osds with a copy-from workload.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 08:56:12 -07:00
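The hazard fixed in 75b94ba295 is removing an entry from a container while a loop is still walking it. Python has no direct equivalent of advancing a C++ iterator before erasing, so the sketch below swaps in the closest analogue: iterating over a snapshot of the keys. The class `CopyOps` is hypothetical; only `copy_ops`, `cancel_copy`, and `cancel_copy_ops` echo names from the commit message.

```python
class CopyOps:
    """Hypothetical holder for in-flight copy operations."""

    def __init__(self):
        self.copy_ops = {}   # object id -> in-flight copy operation
        self.cancelled = []  # ids cancelled so far, in order

    def cancel_copy(self, oid):
        # Like the method in the commit message, this removes the entry.
        del self.copy_ops[oid]
        self.cancelled.append(oid)

    def cancel_copy_ops_buggy(self):
        # Mutates the dict while iterating over it; with more than one
        # entry Python raises RuntimeError on the next loop step.
        for oid in self.copy_ops:
            self.cancel_copy(oid)

    def cancel_copy_ops(self):
        # Take a snapshot of the keys so removal cannot disturb the loop.
        for oid in list(self.copy_ops):
            self.cancel_copy(oid)
```

Python reports the problem as an exception, whereas the C++ version described in the commit showed up as a segfault under OSD thrashing.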
Josh Durgin
4f3487a400 Merge pull request #664 from ceph/wip-6445
Wip 6445
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-09-30 21:09:24 -07:00
Dan Mick
3452aadd81 ceph_argparse.py: clean up error reporting when required param missing
Treat "need 1, got 0" as a special case, and change the message to
"missing required parameter <x>".  Also, when failing for that reason,
print the command concise description and its helptext.

Fixes: #6384
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-09-30 21:09:16 -07:00
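The special case described in 3452aadd81 can be sketched as follows. This is not the code from ceph_argparse.py: the function `describe_arg_count_error`, its parameters, and the fallback message format are hypothetical, chosen only to show the "need 1, got 0" branch.

```python
def describe_arg_count_error(param_name, needed, got, concise, helptext):
    """Build an error message for a wrong argument count (hypothetical helper)."""
    if needed == 1 and got == 0:
        # Special case: name the missing parameter, then show the command's
        # concise description and its help text.
        return "\n".join([
            f"missing required parameter {param_name}",
            concise,
            helptext,
        ])
    # Generic fallback for every other count mismatch.
    return f"{param_name}: need {needed}, got {got}"


# Made-up command used purely for illustration.
message = describe_arg_count_error(
    "<key>", 1, 0, "demo set <key>", "set a demo key")
```

Printing the concise description next to the error tells the user what to type instead of only what went wrong.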
Josh Durgin
988373baaf Merge pull request #665 from ceph/wip-6444
Wip 6444
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-09-30 21:07:42 -07:00
Guangliang Zhao
409aba6ff5 rbd.cc: add readonly option for "rbd map"
Currently the device can only be mapped rw (the default). This
patch only handles the user space side, because the
kernel part has already been completed.

Signed-off-by: Guangliang Zhao <guangliang@unitedstack.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-09-30 20:34:53 -07:00
Yehuda Sadeh
b032931dc7 PendingReleaseNotes: update regarding librados change
Fix related to issue #6444

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-09-30 16:30:25 -07:00
Sage Weil
56711370c3 Merge pull request #660 from ceph/wip-fs-crc
sloppy / opportunistic CRC tracking in the filestore

Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-09-30 16:21:29 -07:00
Sage Weil
b245ca151b os/FileStore: add sloppy crc tracking
Opportunistically track CRCs for data we write and verify it for data
we read.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-30 16:21:17 -07:00
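The idea in b245ca151b, tracking CRCs opportunistically on write and checking them on read, can be sketched in Python. This is an illustration under stated assumptions, not the FileStore implementation: the class `SloppyCrcStore`, the tiny block size, and the policy of dropping the CRC of a partially overwritten block are all hypothetical.

```python
import zlib

BLOCK_SIZE = 4  # assumed tiny block size so the example is easy to follow


class SloppyCrcStore:
    """Hypothetical in-memory store that tracks CRCs for fully written blocks."""

    def __init__(self):
        self.data = bytearray()
        self.crcs = {}  # block index -> CRC32 of that block

    def write(self, offset, buf):
        end = offset + len(buf)
        if len(self.data) < end:
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = buf
        for block in range(offset // BLOCK_SIZE, (end - 1) // BLOCK_SIZE + 1):
            start = block * BLOCK_SIZE
            if offset <= start and start + BLOCK_SIZE <= end:
                # Block fully covered by this write: record its CRC.
                self.crcs[block] = zlib.crc32(self.data[start:start + BLOCK_SIZE])
            else:
                # Partially overwritten: forget the CRC rather than recompute.
                self.crcs.pop(block, None)

    def read(self, offset, length):
        end = offset + length
        for block, crc in self.crcs.items():
            start = block * BLOCK_SIZE
            # Verify only tracked blocks that the read covers completely.
            if offset <= start and start + BLOCK_SIZE <= end:
                if zlib.crc32(self.data[start:start + BLOCK_SIZE]) != crc:
                    raise IOError(f"crc mismatch in block {block}")
        return bytes(self.data[offset:end])
```

The tracking is "sloppy" in the sense that missing a CRC is always acceptable: blocks without a recorded value are simply not verified.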
Yehuda Sadeh
8912462f0c rgw: drop async pool create completion reference
Fixes: #6444
Backport: dumpling
If pool creation fails (e.g., due to -EEXIST) then we leak the
completion object. Earlier we couldn't just drop the reference, as
librados had already removed the internal completion object. This fix
drops the completion reference even if we got an error, which is now
possible.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-09-30 15:44:25 -07:00
Yehuda Sadeh
46057925a8 librados: pool async create / delete does not delete completion handle
Backport: dumpling
The pool async delete / create functions used to delete the internal
completion object. However, the caller still holds the allocated completion
object, which it can't drop a reference to (as it'd try to deallocate
the already freed internal object). This fix removes the internal object
deletion. A following commit will fix a related leak (#6444) by having
the application (radosgw) drop the reference even if it got an error.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-09-30 15:40:02 -07:00
Alfredo Deza
fac4a897f9 Merge pull request #662 from dmick/next
Invoke python with /usr/bin/env python instead of directly
2013-09-30 15:28:54 -07:00