22098 Commits

Author SHA1 Message Date
Samuel Just
8d27edae03 FileJournal: rename queue_lock to finisher_lock
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:10 -07:00
Samuel Just
1a5b6263ed FileJournal: write_cond is not used
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:10 -07:00
Samuel Just
c6c8fce46b FileJournal: break writeq locking from queue_lock
This prevents the relatively long process of queueing
finishers from preventing op submission.

In submit_entry, we no longer check for full before placing
the write in the writeq: committed_thru should work anyway,
and we don't want to grab the required lock.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:10 -07:00
Samuel Just
2646a8fe06 Throttle: reduce lock hold periods
Previously, we tended to dump a lot of log output under
the Throttle lock.  The log level for most log statements
has been reduced to 10.

Additionally, count and max are now atomic_t and can be
read without the Throttle lock.

Finally, most of the perf counter manipulations have been
moved outside of the lock.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:10 -07:00
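
A minimal sketch of the idea in 2646a8fe06, assuming standard C++ atomics in place of Ceph's atomic_t. The class name ThrottleSketch and its exact methods are hypothetical and are not taken from the commit; the point is only that count and max can be read without the lock, while updates that must check both still take it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical illustration, not the Ceph Throttle class.
class ThrottleSketch {
  std::mutex lock;                 // guards check-then-update sequences
  std::atomic<int64_t> count{0};   // readable without the lock
  std::atomic<int64_t> max;        // readable without the lock

public:
  explicit ThrottleSketch(int64_t m) : max(m) {}

  // Lock-free reads: no logging or counter work happens under the mutex.
  int64_t get_current() const { return count.load(); }
  int64_t get_max() const { return max.load(); }

  // Takes c units if they fit under max; returns false otherwise.
  bool get_or_fail(int64_t c) {
    assert(c >= 0);
    std::lock_guard<std::mutex> l(lock);
    if (count.load() + c > max.load())
      return false;
    count += c;
    return true;
  }

  // Returns c units to the throttle.
  void put(int64_t c) {
    assert(c >= 0);
    std::lock_guard<std::mutex> l(lock);
    count -= c;
  }
};
```

In this sketch, callers that only want to report the current level call get_current() or get_max() and never contend with writers for the mutex.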
Samuel Just
bc91f9dd72 os: instrument submit lock, apply lock, queue_lock, write_lock
Adds Mutex perfcounter tracking to mutexes of interest.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:10 -07:00
Samuel Just
2ed667ae9a FileStore: add op_throttle_lock
Avoid using op_tp lock for the op throttle.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:10 -07:00
Samuel Just
542e1344b6 FileStore: don't lock op_tp in queue_op
Neither caller of queue_op can race.
1) in queue_transactions, already under submit lock
2) in _journaled_ahead, journal finisher is single threaded

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:10 -07:00
Samuel Just
a8ac453a82 perf_counters: add dec()
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:09 -07:00
Samuel Just
9601b29132 JournalingFileStore: move apply/commit sequencing to apply_manager
Syncing the filestore requires a stable commit point (i.e., all ops
up to applied_seq must have been applied).  Previously, we used
journal_lock to atomically block new applies while waiting for
the remaining ones to finish.  This creates unnecessary contention.
We now use apply_manager to manage that state atomically with its
own lock.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:09 -07:00
Samuel Just
1d9f5d27d6 JournalingFileStore: create submit_manager to order op submission
Previously, we ensured op ordering by queueing for journal and
the op queue under the journal lock.  All that is required is
that obtaining an op sequence, queueing for journal, and
(for parallel) queueing for application to the fs are done
atomically.  To that end, submit_manager now handles op submission.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:09 -07:00
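
The ordering requirement described in 1d9f5d27d6 can be sketched as follows. This is an illustration under stated assumptions, not the actual submit_manager: the class name SubmitManagerSketch, the submit() method, and the callback shape are all hypothetical. It shows only that taking a sequence number and queueing the op happen under one dedicated lock, so ops are queued in sequence order.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical illustration, not the Ceph submit_manager.
class SubmitManagerSketch {
  std::mutex lock;       // dedicated lock, separate from any journal lock
  uint64_t op_seq = 0;   // last sequence number handed out

public:
  // Assigns the next sequence number and runs enqueue(seq) while still
  // holding the lock, so assignment and queueing form one atomic step.
  template <typename Enqueue>
  uint64_t submit(Enqueue enqueue) {
    std::lock_guard<std::mutex> l(lock);
    assert(op_seq != UINT64_MAX);  // sequence numbers must not wrap
    uint64_t seq = ++op_seq;
    enqueue(seq);  // e.g. push onto the journal queue and the op queue
    return seq;
  }
};
```

A caller would pass a callback that pushes the op onto whichever queues apply (for example a std::vector standing in for the journal queue); because the callback runs under the lock, two concurrent submitters cannot interleave their sequence assignment and queueing.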
Samuel Just
117ac901ac JournalingObjectStore: remove force_commit, no longer needed
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:09 -07:00
Samuel Just
2d180e7b89 JournalingObjectStore: whitespace fix
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:09 -07:00
Samuel Just
c2c912b99e FileStore: remove trigger_commit
This is no longer used.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:09 -07:00
Samuel Just
5326c2212a JournalingFileStore: pass -1 as the alignment if unimportant
Previously, data_align began at 0 and remained that way if no
transaction contained a large data segment.  This 0 was propagated
to prepare_single_write, which padded out most of a page to ensure
that the bl started with 0 alignment.  Passing -1 will ensure that
we don't prepad these small segments.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:09 -07:00
Samuel Just
f7727dd598 FileStore: next_finish is not used
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:09 -07:00
Samuel Just
a268afa117 test/bench: add tp bench
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:09 -07:00
Samuel Just
e814d8fbe7 test/bench: small io benchmarker
Precreates objects and does writes to random offsets within
random objects.

Includes rados, filestore, and vanilla fs variants.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:09 -07:00
Samuel Just
fe2814e4f6 Mutex: Instrument Mutex with perfcounter for Lock() wait
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-10-30 13:31:09 -07:00
Sage Weil
184a676e64 msg/SimpleMessenger: start accepter in ready()
Start the accepter thread when the first dispatcher is ready.  This ensures
that there will be someone around to verify authorizers for incoming
connections, and means we have a bit less failure noise on the monitors
as a result.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-30 13:21:48 -07:00
Sage Weil
c830a9b241 mon: separate pre- and post-fork init
Do most init pre-fork, then do the last little bit (start up messenger,
bootstrap) post-fork.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-30 13:21:48 -07:00
Sage Weil
5dd5471643 msg/Pipe: fix seq # fix
02f6262f47 got this all wrong (though it
worked by accident).

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-30 13:21:48 -07:00
Sage Weil
1db4bd9fc7 osd: verify authorizers for heartbeat dispatcher
This was broken with the fixed messenger behavior with missing
verify_authorizer methods in 100fcca3cb.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-30 13:21:48 -07:00
Josh Durgin
a12bc435ce doc: fix typo in cinder upstart config name
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-30 12:34:19 -07:00
John Wilkins
06c62c5217 doc: Added syntax fixes to Peter's session authentication doc.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-10-30 11:20:51 -07:00
Sage Weil
402e1f5319 ceph-disk-prepare: poke kernel into refreshing partition tables
Prod the kernel to refresh the partition table after we create one.  The
partprobe program is packaged with parted, which we already use, so this
introduces no new dependency.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-30 10:40:58 -07:00
Sage Weil
2e32a0ee2d ceph-disk-prepare: fix journal partition creation
The end value needs to have + to indicate it is relative to wherever the
start is.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-30 10:40:58 -07:00
Sage Weil
8921fc7c7b ceph-disk-prepare: assume parted failure means no partition table
If the disk has no valid label we get an error like

  Error: /dev/sdi: unrecognised disk label

Assume any error we get is that and go with an id label of 1.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-30 10:40:58 -07:00
Sage Weil
a4db58fc11 msg/Pipe: whitespace cleanup
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-30 10:00:54 -07:00
Sage Weil
02f6262f47 msg/Pipe: only randomize start seq #'s if MSG_AUTH feature is present
The kernel client expects seq #'s to start at 1 or else it is unhappy.
So, only randomize these values if the MSG_AUTH feature is present--that is
the only time it matters anyway.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-30 10:00:42 -07:00
Sage Weil
3a48cbf245 doc: update fs recommendations
More forceful about recommending XFS.  More warning about using btrfs in
production deployments.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-30 09:46:25 -07:00
Sage Weil
1a236e16ac cephx: don't check signature if MSG_AUTH feature isn't present
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-29 15:48:15 -07:00
Sage Weil
56bce3ba26 auth: include features in cephx SessionHandler
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-29 15:47:45 -07:00
Peter Reiher
100fcca3cb Fixed problem with checking authorizer in accept().
Signed-off-by: Peter Reiher <reiher@inktank.com>
2012-10-29 14:47:14 -07:00
Dan Mick
5324d2d94a librbd: Fix 32-bit compilation errors
Switch size_t in clip_io to uint64_t; it's just easier, and the
alternative would be to limit 32-bit builds to sizes <= 4GB

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-29 14:16:14 -07:00
Peter Reiher
343a687dab Merge branch 'master' of github.com:ceph/ceph
2012-10-29 12:47:18 -07:00
Peter Reiher
2157bcbf65 Temporary patch to a problem in Pipe related to monitor initialization.
Signed-off-by: Peter Reiher <reiher@inktank.com>
2012-10-29 12:42:29 -07:00
Sage Weil
4ce9da3b87 Merge branch 'wip-oc-neg'
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-29 12:37:08 -07:00
Sage Weil
b9eccdf8ba osd: make pool_snap_info_t encoding backward compatible
Way back in fc869dee1e (v0.42) when we redid
the osd type encoding we forgot to make this conditionally encode the old
format for old clients.  In particular, this means that kernel clients
will fail to decode the osdmap if there is a rados pool with a pool-level
snapshot defined.

Fixes: #3290
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-29 11:05:59 -07:00
Gary Lowell
7239e806cc dep-report.sh: ceph package dependency report.
This script searches the ceph build area for dependent header files
and libraries to attempt to identify ceph package dependencies.
2012-10-29 09:55:33 -07:00
Sam Lang
1638f62668 client: Fix ref counting double free with hardlink
Performing a hard link through the libcephfs interface causes
a double free on shutdown, due to the Client::link call decrementing
the parent (of the target) directory's inode.  This fix removes the
put_inode(dir) call, to match the behavior of Client::ll_link.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-10-29 08:58:36 -07:00
Sam Lang
49ca7d50f9 test: Functional test for hardlink/unmount pattern
This test currently breaks on libcephfs as reported
in #3367.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-10-29 10:40:42 -05:00
Sage Weil
84c7a34b51 osdc/ObjectCacher: remove dead locking code
This is unused, and mostly broken in that there is no cleanup when there
is a failure.  Also, the support in the OSD has been largely removed.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-27 13:56:24 -07:00
Dan Mick
17c8589a19 librbd: clip requests past end-of-image.
Rename check_io to clip_io, which can modify the passed-in length
to clamp it to the device size.  This is expected behavior for
block-device emulation.

Call clip_io in rbd_write(); need to return clipped length there,
even though aio_write() is calling clip_io() as well (for the
direct path).

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-26 20:35:45 -07:00
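
A minimal sketch of the clipping behavior described in 17c8589a19, using uint64_t for the length as in 5324d2d94a. The function name clip_io_sketch, its parameter list, and the -EINVAL return for a request that starts at or past the end of the image are assumptions for illustration; the real clip_io signature is not shown in this log.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Hypothetical stand-in for clip_io: may shrink *len so that
// [off, off + *len) stays inside an image of image_size bytes.
int clip_io_sketch(uint64_t image_size, uint64_t off, uint64_t *len) {
  assert(len != nullptr);

  // Assumption: a request starting at or past the end is rejected.
  if (off >= image_size)
    return -EINVAL;

  // Clamp a request that runs past the end. Comparing against
  // image_size - off avoids overflow in off + *len.
  if (*len > image_size - off)
    *len = image_size - off;

  return 0;
}
```

A caller such as a write path would use the possibly reduced *len as the number of bytes actually written, which matches the commit's note that rbd_write() needs to return the clipped length.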
Sage Weil
86de1faa2c librbd: size max objects based on actual image object order size
This has to happen after we open the image.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 17:12:44 -07:00
caleb miles
07e7bc3b3d rgw_cache: change call signature to overwrite rgw_rados put_obj_meta()
Signed-off-by: caleb miles <caleb.miles@inktank.com>
2012-10-26 17:04:54 -07:00
Yan, Zheng
3384431b6d mds: Fix SnapRealm differ check in CInode::encode_inodestat()
When checking if inode's SnapRealm is different from readdir
SnapRealm, we should use find_snaprealm() to get inode's SnapRealm.
Without this fix, I got lots of "ceph_add_cap: couldn't find snap
realm 100" from kernel client.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2012-10-26 15:58:34 -07:00
Sage Weil
eafe0a8acb mds: allow try_eval to eval replica locks
Allow try_eval(MDSCacheObject*, int mask) to eval locks on replica objects
so that they don't get stuck in an unstable state.  The eval(CInode*, mask)
handles the non-auth already.  For the dentry case, call eval_any(), which
handles the non-auth case, instead of directly calling simple_eval(), which
does not.

Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 15:48:52 -07:00
Yan, Zheng
f0c2e12cae mds: Send mdsdir as base inode for rejoins
Stray dir inodes are no longer base inodes; they are in the mdsdir,
and the mdsdir is the base inode.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2012-10-26 15:43:33 -07:00
Yan, Zheng
ceeebaf4a4 mds: Fix stray check in Migrator::export_dir()
Commit f8110c (Allow export subtrees in other MDS' stray directory)
made the "directory in stray" check always return false. This is
because the directory in question is a grandchild of mdsdir.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2012-10-26 15:43:07 -07:00
Yan, Zheng
d2ac024a09 mds: fix stray migration/reintegration check in handle_client_rename
The stray migration/reintegration generates a source path that will
be rooted in a (possibly remote) MDS's MDSDIR; adjust the check in
handle_client_rename()

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2012-10-26 15:41:29 -07:00