22918 Commits

Author SHA1 Message Date
Sage Weil
07b36992da mds: move from EXCL to SYNC if nobody wants to write
We were moving to MIX even if nobody wanted to write. That is not useful:
if we only want to read, SYNC lets us cache those reads. SYNC is also a
friendlier state to be in, all else being equal.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-07 10:18:40 -10:00
Sam Lang
636048db61 mds/locker: Add debugging for excl->mix trans
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-12-07 10:18:40 -10:00
Sam Lang
fa5a46c75e test/libcephfs: Add a test for validating caps
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-12-07 10:18:35 -10:00
Sam Lang
10bf150990 client: Add routine to get caps of file/fd
In order to properly validate the client capabilities,
we need to be able to access them from libcephfs.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-12-07 10:18:26 -10:00
Josh Durgin
efc6614883 librbd: change internal order parameter to pass-by-value
It doesn't change in any of these places.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-07 10:37:51 -08:00
Josh Durgin
57d5c69985 librbd: clean up after errors in create
Split format 1 and 2 image creation into separate functions for better
readability. Format 2 requires more error handling.

Fixes: #2677
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-07 10:37:51 -08:00
Josh Durgin
c1bf2291e8 librbd: bump version for new functions
copy2, clone2, and create3 are new.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-07 10:37:51 -08:00
Joao Eduardo Luis
bc6f726825 mon: PGMonitor: erase entries from 'creating_pgs_by_osd' when set is empty
This patch avoids sending empty MOSDPGCreate's every tick.

Fixes: #3571

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-12-07 04:16:15 -08:00
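
The cleanup described in this commit can be sketched roughly as below. This is a minimal illustration, not the PGMonitor code: `CreatingByOsd`, `remove_creating_pg`, `messages_per_tick`, and the use of plain integers for OSD and PG ids are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

// Hypothetical stand-in for 'creating_pgs_by_osd':
// OSD id -> ids of PGs that are still being created on that OSD.
using CreatingByOsd = std::map<int, std::set<int>>;

// Remove one PG from an OSD's set and erase the map entry once the set
// becomes empty. Returns true if the OSD entry itself was erased.
bool remove_creating_pg(CreatingByOsd& by_osd, int osd, int pg) {
  auto it = by_osd.find(osd);
  if (it == by_osd.end())
    return false;
  it->second.erase(pg);
  if (it->second.empty()) {
    by_osd.erase(it);
    return true;
  }
  return false;
}

// One create message per map entry on each tick. Because empty sets are
// erased above, no entry here should ever be empty.
std::size_t messages_per_tick(const CreatingByOsd& by_osd) {
  for (const auto& entry : by_osd)
    assert(!entry.second.empty());
  return by_osd.size();
}
```

Erasing the key together with its last element keeps the per-tick loop from ever visiting an OSD that has nothing left to create.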
Sage Weil
f81d720766 doc/install/os-recommendations: fix syncfs notes
For argonaut, squeeze and wheezy lack syncfs.

For bobtail, only older kernels are problematic; we don't depend on glibc
support.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-07 04:10:02 -08:00
Sage Weil
4d43c86389 doc: fix bobtail version in os-recommendations
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-07 04:09:16 -08:00
Joao Eduardo Luis
e1c27fe178 mon: Monitor: rework 'paxos' to a list instead of a vector
After adding the gv patches, during Monitor::recovered_leader() we started
waking up contexts following the order of the 'paxos' vector. However,
given that the mdsmon has a forgotten dependency on the osdmon paxos
machine, we ended up in a situation in which we proposed a value
through the osdmon before creating a new pending value (but by being
active, the mdsmon would go through with it nonetheless).

This is easily fixed by making sure that the mdsmon callbacks are only
woken *after* the osdmon has been taken care of.

Fixes: #3495

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-12-07 04:04:14 -08:00
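
The ordering idea in this commit might be sketched as follows. This is only an illustration under stated assumptions: `ServiceList` and `order_after` are hypothetical names, services are represented by strings, and the real Monitor code may establish the order differently.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical: services are woken front to back in list order.
using ServiceList = std::list<std::string>;

// Move 'dependent' so that it directly follows 'dependency'.
// std::list::splice relinks the node in place, so no element is copied
// and iterators to other elements stay valid.
void order_after(ServiceList& services,
                 const std::string& dependency,
                 const std::string& dependent) {
  auto dep = std::find(services.begin(), services.end(), dependency);
  auto dnt = std::find(services.begin(), services.end(), dependent);
  assert(dep != services.end() && dnt != services.end());
  services.splice(std::next(dep), services, dnt);
}
```

A list makes this kind of reordering cheap, which is one plausible reason to prefer it over a vector when the wake-up order matters.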
Sage Weil
58f6798f3d Merge branch 'testing' into next
2012-12-07 04:00:22 -08:00
Sage Weil
533f847ce2 Merge remote-tracking branch 'gh/wip_doc'
2012-12-07 03:25:04 -08:00
Samuel Just
27071f3bc2 OSD: store current pg epoch in info and load at that epoch
Prior to split, this did not matter.  With split, however, it's
crucial that a pg go through advance_pg() for the map causing
the split.  During operation, a PG lags the OSD superblock
epoch.  If the OSD dies after the OSD epoch passes the split
but before the pg epoch passes the split, the PG will be
reloaded at the OSD epoch and won't see the split operation.
After that point, the PG collection might contain objects that
should have been split into a child.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-06 22:53:07 -08:00
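
The failure mode in this commit comes down to which epoch a PG is loaded at. A minimal sketch, assuming a hypothetical helper `sees_split` and plain unsigned integers as epochs (neither is taken from the OSD code):

```cpp
#include <cassert>

using epoch_t = unsigned;  // assumption: epochs modeled as plain integers

// Hypothetical helper: a PG loaded at 'load_epoch' replays the maps in
// (load_epoch, osd_epoch]. It processes the split only if the map at
// 'split_epoch' falls inside that range.
bool sees_split(epoch_t load_epoch, epoch_t osd_epoch, epoch_t split_epoch) {
  assert(load_epoch <= osd_epoch);
  return load_epoch < split_epoch && split_epoch <= osd_epoch;
}
```

For example, with a split at epoch 10, an OSD at epoch 12, and a PG that last persisted epoch 8, `sees_split(8, 12, 10)` is true, while loading the PG at the OSD epoch gives `sees_split(12, 12, 10)`, which is false. That second case is the missed split the commit message describes.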
Samuel Just
9f169ac0f5 OSD: account for split in project_pg_history
split causes a new interval.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-06 22:53:07 -08:00
Samuel Just
15d899370f PG: update info.last_update_started in split_into
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-06 22:53:07 -08:00
Samuel Just
338f3688b0 OSDMonitor: require --allow-experimental-feature to increase pg_num
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-06 22:53:06 -08:00
Samuel Just
fb738506f6 PG: set child up/acting in split_into
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-06 22:53:06 -08:00
Samuel Just
3f412e88fa OSD: do _remove_pg in add_newly_split_pg if pool is gone
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-06 22:52:54 -08:00
Samuel Just
19e6861daf osd/: dirty info and log on child during split
Otherwise, the log may not get written out.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-06 22:51:56 -08:00
Samuel Just
9835e19015 osd/: mark info.stats as invalid after split, fix in scrub
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-06 22:51:56 -08:00
Samuel Just
5f8a3634c4 PG: split ops for child objects into child
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-06 22:51:56 -08:00
Samuel Just
9981bee565 OSD: add initial split support
PGs are split after updating to the map on which they split.
OSD::activate_map populates the set of currently "splitting"
pgs.  Messages for those pgs are delayed until the split
is complete.  We add the newly split children to pg_map
once the transaction populating their on-disk state completes.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-06 22:51:52 -08:00
Samuel Just
58890cfad5 librados: watch() should set the WRITE flag on the op
This caused a bug where the watch operation bypassed the is_degraded()
check in the write path and the repop got sent to the replica where the
replica crashed due to the is_missing() assert in sub_op_modify.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-06 16:56:41 -08:00
Samuel Just
f2914af52e HashIndex: fix list_by_hash handling of next->is_max()
get_path_str() should not handle hobject_t::get_max().  get_path_str()
now asserts that the passed object is not max and the callers now check
for is_max().  This caused HashIndex.cc to incorrectly scan an entire
collection before returning no objects rather than scanning the top
level and returning no objects.  It did not actually cause list_by_hash
to return an incorrect answer, however.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-06 16:56:25 -08:00
Dan Mick
0c01094972 rbd: remove block-by-block messages when exporting
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-12-06 15:58:19 -08:00
John Wilkins
ef24f5318c doc: Change per doc request.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-12-06 14:20:00 -08:00
Josh Durgin
ca1a4db457 release: add note about 'ceph osd create' syntax
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-06 12:25:39 -08:00
Sam Lang
214c7a1705 client: Allow cap release timeout to be configured
The delay for releasing an inode's capability is
hardcoded to 5 seconds.  This patch takes the timeout
value from a config parameter, which currently defaults
to 5 seconds.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-12-06 05:30:17 -08:00
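
Replacing a hardcoded delay with a configurable one that keeps the old default could look like the sketch below. The `Config` type, the option name `example_cap_release_delay`, and the function `cap_release_delay` are hypothetical and chosen only for illustration; they are not the names used by the client.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical config store: option name -> value in seconds.
using Config = std::map<std::string, double>;

// Default equal to the previously hardcoded delay.
constexpr double kDefaultCapReleaseDelaySec = 5.0;

// Return the configured delay, falling back to the default when the
// (hypothetical) option is not set.
std::chrono::duration<double> cap_release_delay(const Config& conf) {
  auto it = conf.find("example_cap_release_delay");
  double secs = (it != conf.end()) ? it->second : kDefaultCapReleaseDelaySec;
  assert(secs >= 0.0);
  return std::chrono::duration<double>(secs);
}
```

Keeping the default at 5 seconds means existing deployments behave as before unless the option is set explicitly.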
Sage Weil
0a137d76bd mkcephfs: fix fs_type assignment typo
Reported-by: Matthew Via <via@matthewvia.info>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-06 05:27:41 -08:00
Sage Weil
4c31598e0a upstart: fix radosgw upstart job
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-06 05:26:11 -08:00
Sage Weil
47266cdaec upstart: rename ceph -> ceph-all
This avoids a conflict with the sysvinit job.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-06 05:26:01 -08:00
Dan Mick
0d2e885823 Merge branch 'next'
2012-12-05 18:19:09 -08:00
Dan Mick
3e98d1af4d Merge branch 'testing' into next
2012-12-05 18:18:41 -08:00
Dan Mick
b7b724299e rbd: update manpage for import/export
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-12-05 18:17:35 -08:00
Dan Mick
7f906b5afd Merge branch 'next'
Pull in fixes for 3567 and 3524
2012-12-05 17:39:17 -08:00
Dan Mick
e9653f27de librbd: hold AioCompletion lock while modifying global state
C_AioRead::finish needs to add in each chunk of a partial read
request to the 'partial' map in the AioCompletion's state
(in destriper, of type StripedReadResult).  That map is global
and must be protected from simultaneous access.  Use the
AioCompletion lock; could create a separate lock if contention is an
issue.

Fixes: #3567
Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit a55700cc0a)
2012-12-05 17:38:05 -08:00
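
The locking pattern in this commit, guarding a shared map with the lock of the object that owns it, can be sketched as below. `ReadCompletion`, `add_partial`, and `assemble` are hypothetical names for illustration; the sketch uses `std::mutex` rather than librbd's own lock type, and it assumes each chunk offset is reported once.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a completion that collects the chunks of a
// partial (striped) read, possibly from several threads.
class ReadCompletion {
 public:
  // Record one finished chunk. The lock is held for the whole map update.
  void add_partial(std::uint64_t offset, const std::string& data) {
    std::lock_guard<std::mutex> guard(lock_);
    bool inserted = partial_.emplace(offset, data).second;
    assert(inserted);  // assumption: each offset is reported once
    (void)inserted;
  }

  // Concatenate the chunks in offset order, under the same lock.
  std::string assemble() const {
    std::lock_guard<std::mutex> guard(lock_);
    std::string out;
    for (const auto& kv : partial_)
      out += kv.second;
    return out;
  }

 private:
  mutable std::mutex lock_;
  std::map<std::uint64_t, std::string> partial_;
};
```

Reusing the existing per-completion lock keeps the change small; a dedicated lock for the map would only be worth adding if contention showed up, as the commit message notes.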
Dan Mick
b2ccf11d3a librbd: handle parent change while async I/Os are in flight
During a test_librbd_fsx run including flatten, ImageCtx->parent
was being dereferenced while null.  Between the time the parent
overlap is calculated and the time the guard+write completes
with ENOENT and submits the copyup+write, the parent image
could have changed (by resize) or been made irrelevant (by
child flatten) such that the parent overlap is now incorrect.

Handle "no parent" by just sending the copyup+write; the copyup
part will be a no-op.  Move to WRITE_FLAT state in this case
because there's no more child to deal with.

Handle "overlap changed" by recalculating overlap before
reading parent data; if none is left, don't read, but rather
just clear m_object_image_extents, in which case the copyup
will again be a no-op because it will be of zero length.
However we still have a parent, so stay in WRITE_COPYUP state
and come back through as usual.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Fixes: #3524
(cherry picked from commit 41e16a3b40)
2012-12-05 17:38:05 -08:00
Dan Mick
64ecc87057 Striper: use local variable inside if() that tested it
Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 917a6f2963)
2012-12-05 17:38:05 -08:00
Dan Mick
a55700cc0a librbd: hold AioCompletion lock while modifying global state
C_AioRead::finish needs to add in each chunk of a partial read
request to the 'partial' map in the AioCompletion's state
(in destriper, of type StripedReadResult).  That map is global
and must be protected from simultaneous access.  Use the
AioCompletion lock; could create a separate lock if contention is an
issue.

Fixes: #3567
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-12-05 17:05:18 -08:00
Dan Mick
41e16a3b40 librbd: handle parent change while async I/Os are in flight
During a test_librbd_fsx run including flatten, ImageCtx->parent
was being dereferenced while null.  Between the time the parent
overlap is calculated and the time the guard+write completes
with ENOENT and submits the copyup+write, the parent image
could have changed (by resize) or been made irrelevant (by
child flatten) such that the parent overlap is now incorrect.

Handle "no parent" by just sending the copyup+write; the copyup
part will be a no-op.  Move to WRITE_FLAT state in this case
because there's no more child to deal with.

Handle "overlap changed" by recalculating overlap before
reading parent data; if none is left, don't read, but rather
just clear m_object_image_extents, in which case the copyup
will again be a no-op because it will be of zero length.
However we still have a parent, so stay in WRITE_COPYUP state
and come back through as usual.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Fixes: #3524
2012-12-05 17:05:18 -08:00
Dan Mick
917a6f2963 Striper: use local variable inside if() that tested it
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-12-05 17:05:18 -08:00
Josh Durgin
930bb55006 Merge branch 'next'
2012-12-05 15:55:35 -08:00
Josh Durgin
2a5549cc0c qa: add script for running xfstests in a vm
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-05 15:54:49 -08:00
Yehuda Sadeh
2779325596 rgw: fix rgw_tools get_obj()
The original implementation broke whenever data exceeded
the chunk size. Also, don't cache objects that exceed the
chunk size, as the cache is not designed for them.
Increase the chunk size to 512k.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-12-05 14:33:34 -08:00
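
Reading an object that may be larger than one chunk, and caching only objects that fit in a single chunk, can be sketched as follows. `read_chunk`, `read_whole_object`, and `should_cache` are hypothetical helpers that model the object as an in-memory string; they are not the rgw_tools API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical backend read: returns up to 'max_len' bytes of 'object'
// starting at 'ofs', or an empty string at or past the end.
std::string read_chunk(const std::string& object, std::size_t ofs,
                       std::size_t max_len) {
  if (ofs >= object.size())
    return std::string();
  return object.substr(ofs, max_len);
}

// Read the whole object by looping over chunks instead of assuming that
// a single read returns everything.
std::string read_whole_object(const std::string& object,
                              std::size_t chunk_size) {
  assert(chunk_size > 0);
  std::string out;
  for (;;) {
    std::string chunk = read_chunk(object, out.size(), chunk_size);
    out += chunk;
    if (chunk.size() < chunk_size)
      break;  // short read: the end of the object was reached
  }
  return out;
}

// Cache only objects that fit in a single chunk.
bool should_cache(std::size_t object_size, std::size_t chunk_size) {
  return object_size <= chunk_size;
}
```

With a 512k chunk size, `should_cache(512 * 1024, 512 * 1024)` holds, while an object one byte larger would be read in two chunks and left out of the cache.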
Josh Durgin
cb19e994f2 doc: ceph osd create takes a uuid, not an osd id
This was updated by 36e7b077a7, but
accidentally reverted in later changes.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-05 12:50:24 -08:00
Samuel Just
993ff14357 PG: add split_into to populate child members
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-05 11:34:19 -08:00
Samuel Just
6e67a27f89 osd/: splitting a pg now triggers a new interval
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-05 11:34:18 -08:00
Samuel Just
36c0fd220e PrioritizedQueue: allow caller to get items removed by removed_by_filter
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-05 11:34:18 -08:00
Samuel Just
b6c49b484a mon/OSDMonitor: enable split in Monitor
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-05 11:34:18 -08:00