22804 Commits

Author SHA1 Message Date
Sage Weil
e4d0aeace1 Merge remote-tracking branch 'gh/wip-filestore2' into next
Reviewed-by: Sam Just <sam.just@inktank.com>
2012-12-10 14:34:07 -08:00
Samuel Just
788992bbf5 config_opts.h: adjust recovery defaults
osd max backfills: 5 was too low for a default; 10
 seems to work better in testing.  The message
 priority system should minimize disruption of
 push and pull operations anyway.

osd recovery max chunk: 1MB was too small for a
 default.  8MB is reasonable for a single push
 and will allow us to recover an rbd block in
 one push rather than 4, reducing client io
 latency during log-based recovery.

osd recovery op priority: 10 rather than 30 will
 further reduce the client io latency impact of
 push and pull operations.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-10 13:53:10 -08:00
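The chunk-size reasoning in the commit above can be checked with a little arithmetic. The sketch below assumes a 4 MB rbd object (the usual default, but not stated in the commit message); pushes_needed and MB are hypothetical names for illustration, not Ceph code.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t MB = 1024 * 1024;

// Hypothetical helper: number of pushes needed to recover one object
// when each push carries at most max_chunk bytes (ceiling division).
constexpr uint64_t pushes_needed(uint64_t object_size, uint64_t max_chunk) {
  return (object_size + max_chunk - 1) / max_chunk;
}
```

Under that assumption, a 1 MB chunk needs four pushes per object and an 8 MB chunk needs one, which matches the "one push rather than 4" figure in the message.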
Sage Weil
45865285e7 Merge remote-tracking branch 'gh/wip-3559' into next
Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-10 12:55:14 -08:00
Sage Weil
333b3f43b5 mon: fix leak of pool op reply data
We pass a pointer because it is an optional argument, but we shouldn't
put the bufferlist on the heap or else we have to manage its life
cycle, and that's fragile (and previously broken).

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-08 21:44:54 -08:00
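The pattern described in the commit above (an optional out-argument passed by pointer, with the buffer itself owned by the caller's stack frame) might look roughly like this. This is a minimal sketch, not the monitor code: ReplyData stands in for Ceph's bufferlist, and handle_pool_op and run_pool_op are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the reply buffer (Ceph uses bufferlist).
using ReplyData = std::string;

// The reply is optional, so it is passed as a pointer that may be null.
// The callee only fills it in; it never allocates or frees it.
int handle_pool_op(int op, ReplyData *reply_data = nullptr) {
  if (reply_data)
    *reply_data = "result-of-op-" + std::to_string(op);
  return 0;
}

// The caller keeps the buffer on the stack, so its lifetime ends with
// the scope and there is nothing to delete (and nothing to leak).
ReplyData run_pool_op(int op) {
  ReplyData reply;
  int r = handle_pool_op(op, &reply);
  assert(r == 0);
  return reply;
}
```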
Sage Weil
f66fe7783e os/JournalingObjectStore: simplify op_submitting sanity check
A list is overkill; just use a seq and make sure it increments to ensure
the op_submit_finish calls are in order.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-08 09:32:47 -08:00
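A sequence-based ordering check of the kind the commit above describes could be sketched as follows. The class and method names are hypothetical, and the rule that each finish must be exactly one greater than the previous one is an assumption; the commit only says the seq must increment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sanity checker: remembers only the last finished seq
// instead of keeping a list of every op being submitted.
class SubmitOrderChecker {
  uint64_t last_finished = 0;

public:
  // Returns true if seq is the next expected value and records it;
  // returns false (and records nothing) on an out-of-order finish.
  bool finish(uint64_t seq) {
    if (seq != last_finished + 1)
      return false;
    last_finished = seq;
    return true;
  }

  uint64_t last() const { return last_finished; }
};
```

In real code the out-of-order case would more likely be an assert than a return value; the boolean is used here only to keep the sketch easy to exercise.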
Sage Weil
a88b584933 os/JournalingObjectStore: remove unused ops_submitting
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-08 09:32:47 -08:00
Sage Weil
ad4158d1ab os/JournalingObjectStore: drop now-useless max_applying_seq
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-08 09:32:47 -08:00
Sage Weil
d9dce4e927 filestore: simplify op quiescing
The delicate balancing with op_apply_start() and the fact that it can
block was making it very hard to determine how long commit_start() should
wait, since requests in the workqueue threads could call op_apply_start()
in any order.  For example,

 threadA: gets osr1 from wq
 threadA: gets osr2 from wq
 threadA: dequeue seq 11 from osr1, op_apply_start
 threadC: commit_start on 11
 threadA: op_apply_finish on seq 11
 threadC: commit_started, commit_finish
 threadB: dequeue seq 10 from osr2
   <failed assert, badness>

Instead, rip out all this code, and use the ThreadPool pause() method to
quiesce operations.  Keep some of the (now unnecessary) fields around
for sanity checks (blocked, open_ops, max_applying_seq, etc.).

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-08 09:32:47 -08:00
Sage Weil
25ea06969f osd: make pool_stat_t encoding backward compatible with v0.41 and older
In particular, this is the encoding that is used in precise.

Fixes: #3212
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-08 09:24:38 -08:00
Sage Weil
81e567c90d Merge remote-tracking branch 'gh/wip-ceph-test' into next
2012-12-08 09:18:21 -08:00
Sage Weil
e227c70945 crush/CrushWrapper: do not crash if you move an item with no current home
This will let us take an existing orphan and place it somewhere.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-08 09:17:51 -08:00
Joao Eduardo Luis
1acb691008 mon: Elector: init elector before each election
Fixes: #3587

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-12-08 09:16:02 -08:00
Sage Weil
42d21937fb Merge branch 'testing' into next
2012-12-08 09:12:21 -08:00
Sage Weil
f3029833c3 init-ceph: =, not ==
Reported-by: v@alan.lt
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-08 09:12:07 -08:00
Dan Mick
8816b39aad debian: add ceph.postinst to remove /etc/init/ceph.conf on update
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-07 23:11:39 -08:00
Samuel Just
fc58299eea PG: remove last_epoch_started asserts in proc_primary_info
These asserts are valid for a uniform cluster, but they won't hold
for a replica running a version without the info.last_epoch_started
patch.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 0756052cff)
2012-12-07 22:36:36 -08:00
Yehuda Sadeh
81fdea135c auth: set default auth_client_required
Fixes: #3578
Set auth_client_required to default to "cephx, none".

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-12-07 22:33:31 -08:00
Peter Reiher
a3908a6898 auth: changed order of test for legacy and new authentication
Changed order of test for legacy and new configuration options
in several places.

Signed-off-by: Peter Reiher <reiher@inktank.com>
2012-12-07 22:33:27 -08:00
Yehuda Sadeh
907da185a8 auth: improve logging
Add some logging around failure cases.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-12-07 22:32:59 -08:00
Dan Mick
8355733027 rbd: use ExportContext for progress, not cerr
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-07 16:41:04 -08:00
Sage Weil
07b36992da mds: move from EXCL to SYNC if nobody wants to write
We were moving to MIX even if nobody wanted to write; that is not
useful, since if we only want to read, SYNC will let us cache those reads.
SYNC is also a more friendly place (all things equal) to be.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-07 10:18:40 -10:00
Sam Lang
636048db61 mds/locker: Add debugging for excl->mix trans
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-12-07 10:18:40 -10:00
Sam Lang
fa5a46c75e test/libcephfs: Add a test for validating caps
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-12-07 10:18:35 -10:00
Sam Lang
10bf150990 client: Add routine to get caps of file/fd
In order to properly validate the client capabilities,
we need to be able to access them from libcephfs.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-12-07 10:18:26 -10:00
Josh Durgin
efc6614883 librbd: change internal order parameter to pass-by-value
It doesn't change in any of these places.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-07 10:37:51 -08:00
Josh Durgin
57d5c69985 librbd: clean up after errors in create
Split format 1 and 2 image creation into separate functions for better
readability. Format 2 requires more error handling.

Fixes: #2677
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-07 10:37:51 -08:00
Josh Durgin
c1bf2291e8 librbd: bump version for new functions
copy2, clone2, and create3 are new.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-07 10:37:51 -08:00
Joao Eduardo Luis
bc6f726825 mon: PGMonitor: erase entries from 'creating_pgs_by_osd' when set is empty
This patch avoids sending empty MOSDPGCreate's every tick.

Fixes: #3571

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-12-07 04:16:15 -08:00
Joao Eduardo Luis
e1c27fe178 mon: Monitor: rework 'paxos' to a list instead of a vector
After adding the gv patches, during Monitor::recovered_leader() we started
waking up contexts following the order of the 'paxos' vector. However,
given that the mdsmon has a forgotten dependency on the osdmon paxos
machine, we ended up in a situation in which we proposed a value
through the osdmon before creating a new pending value (but by being
active, the mdsmon would go through with it nonetheless).

This is easily fixed by making sure that the mdsmon callbacks are only
woken *after* the osdmon has been taken care of.

Fixes: #3495

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-12-07 04:04:14 -08:00
Sage Weil
58f6798f3d Merge branch 'testing' into next
2012-12-07 04:00:22 -08:00
Samuel Just
58890cfad5 librados: watch() should set the WRITE flag on the op
This caused a bug where the watch operation bypassed the is_degraded()
check in the write path and the repop got sent to the replica where the
replica crashed due to the is_missing() assert in sub_op_modify.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-06 16:56:41 -08:00
Samuel Just
f2914af52e HashIndex: fix list_by_hash handling of next->is_max()
get_path_str() should not handle hobject_t::get_max().  get_path_str()
now asserts that the passed object is not max and the callers now check
for is_max().  This caused HashIndex.cc to incorrectly scan an entire
collection before returning no objects rather than scanning the top
level and returning no objects.  It did not actually cause list_by_hash
to return an incorrect answer, however.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-06 16:56:25 -08:00
Dan Mick
0c01094972 rbd: remove block-by-block messages when exporting
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-12-06 15:58:19 -08:00
Josh Durgin
ca1a4db457 release: add note about 'ceph osd create' syntax
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-06 12:25:39 -08:00
Sam Lang
214c7a1705 client: Allow cap release timeout to be configured
The delay for releasing an inode's capability is
hardcoded to 5 seconds.  This patch takes the timeout
value from a config parameter, which defaults presently
to 5 seconds.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-12-06 05:30:17 -08:00
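Taking a previously hardcoded delay from configuration, as the commit above describes, might be sketched like this. The option name "cap_release_delay", the map-based config, and the function name are all hypothetical; the commit does not name the real parameter. Only the 5-second default comes from the message.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical config lookup: returns the configured cap release delay
// in seconds, falling back to the former hardcoded value of 5 seconds.
double get_cap_release_delay(const std::map<std::string, std::string> &conf) {
  const double default_delay = 5.0;
  auto it = conf.find("cap_release_delay");  // hypothetical option name
  if (it == conf.end())
    return default_delay;
  return std::stod(it->second);
}
```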
Sage Weil
0a137d76bd mkcephfs: fix fs_type assignment typo
Reported-by: Matthew Via <via@matthewvia.info>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-06 05:27:41 -08:00
Sage Weil
4c31598e0a upstart: fix radosgw upstart job
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-06 05:26:11 -08:00
Sage Weil
47266cdaec upstart: rename ceph -> ceph-all
This avoids a conflict with the sysvinit job.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-06 05:26:01 -08:00
Dan Mick
3e98d1af4d Merge branch 'testing' into next
2012-12-05 18:18:41 -08:00
Dan Mick
b7b724299e rbd: update manpage for import/export
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-12-05 18:17:35 -08:00
Dan Mick
e9653f27de librbd: hold AioCompletion lock while modifying global state
C_AioRead::finish needs to add in each chunk of a partial read
request to the 'partial' map in the AioCompletion's state
(in destriper, of type StripedReadResult).  That map is global
and must be protected from simultaneous access.  Use the
AioCompletion lock; could create a separate lock if contention is an
issue.

Fixes: #3567
Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit a55700cc0a)
2012-12-05 17:38:05 -08:00
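The locking fix described in the commit above amounts to guarding a shared map with the completion's own lock. The sketch below illustrates that idea only; Completion, add_partial_result, and assemble are hypothetical names, std::mutex stands in for Ceph's lock type, and the real 'partial' map lives in StripedReadResult rather than directly in the completion.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for an async read completion whose state is
// shared by every per-chunk callback of the same request.
struct Completion {
  std::mutex lock;
  std::map<uint64_t, std::string> partial;  // offset -> chunk data

  // Called once per chunk, possibly from different threads; the lock
  // is held for the whole map modification.
  void add_partial_result(uint64_t off, const std::string &chunk) {
    std::lock_guard<std::mutex> l(lock);
    partial[off] = chunk;
  }

  // Concatenates the chunks in offset order, also under the lock.
  std::string assemble() {
    std::lock_guard<std::mutex> l(lock);
    std::string out;
    for (const auto &kv : partial)
      out += kv.second;
    return out;
  }
};
```

Reusing the existing lock keeps the change small; as the message notes, a dedicated lock would be the next step if contention showed up.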
Dan Mick
b2ccf11d3a librbd: handle parent change while async I/Os are in flight
During a test_librbd_fsx run including flatten, ImageCtx->parent
was being dereferenced while null.  Between the time the parent
overlap is calculated and the time the guard+write completes
with ENOENT and submits the copyup+write, the parent image
could have changed (by resize) or been made irrelevant (by
child flatten) such that the parent overlap is now incorrect.

Handle "no parent" by just sending the copyup+write; the copyup
part will be a no-op.  Move to WRITE_FLAT state in this case
because there's no more child to deal with.

Handle "overlap changed" by recalculating overlap before
reading parent data; if none is left, don't read, but rather
just clear m_object_image_extents, in which case the copyup
will again be a no-op because it will be of zero length.
However we still have a parent, so stay in WRITE_COPYUP state
and come back through as usual.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Fixes: #3524
(cherry picked from commit 41e16a3b40)
2012-12-05 17:38:05 -08:00
Dan Mick
64ecc87057 Striper: use local variable inside if() that tested it
Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 917a6f2963)
2012-12-05 17:38:05 -08:00
Dan Mick
a55700cc0a librbd: hold AioCompletion lock while modifying global state
C_AioRead::finish needs to add in each chunk of a partial read
request to the 'partial' map in the AioCompletion's state
(in destriper, of type StripedReadResult).  That map is global
and must be protected from simultaneous access.  Use the
AioCompletion lock; could create a separate lock if contention is an
issue.

Fixes: #3567
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-12-05 17:05:18 -08:00
Dan Mick
41e16a3b40 librbd: handle parent change while async I/Os are in flight
During a test_librbd_fsx run including flatten, ImageCtx->parent
was being dereferenced while null.  Between the time the parent
overlap is calculated and the time the guard+write completes
with ENOENT and submits the copyup+write, the parent image
could have changed (by resize) or been made irrelevant (by
child flatten) such that the parent overlap is now incorrect.

Handle "no parent" by just sending the copyup+write; the copyup
part will be a no-op.  Move to WRITE_FLAT state in this case
because there's no more child to deal with.

Handle "overlap changed" by recalculating overlap before
reading parent data; if none is left, don't read, but rather
just clear m_object_image_extents, in which case the copyup
will again be a no-op because it will be of zero length.
However we still have a parent, so stay in WRITE_COPYUP state
and come back through as usual.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Fixes: #3524
2012-12-05 17:05:18 -08:00
Dan Mick
917a6f2963 Striper: use local variable inside if() that tested it
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-12-05 17:05:18 -08:00
Josh Durgin
2a5549cc0c qa: add script for running xfstests in a vm
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-05 15:54:49 -08:00
Samuel Just
a83d13a3b7 OSD: ignore queries on now deleted pools
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-05 11:33:26 -08:00
Greg Farnum
4cdc30b943 Merge remote-tracking branch 'origin/wip-mds' into next
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2012-12-04 16:48:19 -08:00
Sage Weil
3ef741ac2d Merge branch 'wip-filestore' into next
Reviewed-by: Sam Just <sam.just@inktank.com>
2012-12-04 15:05:18 -08:00