Commit Graph

22162 Commits

Author SHA1 Message Date
Sage Weil
4ce9da3b87 Merge branch 'wip-oc-neg'
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-29 12:37:08 -07:00
Sage Weil
b9eccdf8ba osd: make pool_snap_info_t encoding backward compatible
Way back in fc869dee1e (v0.42) when we redid
the osd type encoding we forgot to make this conditionally encode the old
format for old clients.  In particular, this means that kernel clients
will fail to decode the osdmap if there is a rados pool with a pool-level
snapshot defined.

Fixes: #3290
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-29 11:05:59 -07:00
Gary Lowell
7239e806cc dep-report.sh: ceph package dependency report.
This script searches the ceph build area for dependent header files and
libraries to attempt to identify ceph package dependencies.
2012-10-29 09:55:33 -07:00
Sam Lang
1638f62668 client: Fix ref counting double free with hardlink
Performing a hard link through the libcephfs interface causes
a double free on shutdown, due to the Client::link call decrementing
the parent (of the target) directory's inode.  This fix removes the
put_inode(dir) call, to match the behavior of Client::ll_link.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-10-29 08:58:36 -07:00
Sam Lang
49ca7d50f9 test: Functional test for hardlink/unmount pattern
This test currently breaks on libcephfs as reported
in #3367.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-10-29 10:40:42 -05:00
Sage Weil
84c7a34b51 osdc/ObjectCacher: remove dead locking code
This is unused, and mostly broken in that there is no cleanup when there
is a failure.  Also, the support in the OSD has been largely removed.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-27 13:56:24 -07:00
Dan Mick
17c8589a19 librbd: clip requests past end-of-image.
Rename check_io to clip_io, which can modify the passed-in length
to clamp it to the device size.  This is expected behavior for
block-device emulation.

Call clip_io in rbd_write(); need to return clipped length there,
even though aio_write() is calling clip_io() as well (for the
direct path).

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-26 20:35:45 -07:00
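
The clipping described in 17c8589a19 can be illustrated with a minimal sketch. This is not the librbd code: ImageSketch and clip_io_sketch are hypothetical names, and returning -EINVAL for a request that starts past end-of-image is an assumption made for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Hypothetical stand-in for the image state; only the size matters here.
struct ImageSketch {
  uint64_t size;  // image size in bytes
};

// Clamp *len so that [off, off + *len) stays inside the image.
// Returns 0 on success, or -EINVAL (an assumed convention) when the
// request starts past end-of-image.
int clip_io_sketch(const ImageSketch& img, uint64_t off, uint64_t* len) {
  if (off > img.size)
    return -EINVAL;
  // Compare against the remaining space to avoid overflow in off + *len.
  if (*len > img.size - off)
    *len = img.size - off;
  return 0;
}
```

Because the length is modified in place, a synchronous write wrapper can report the clipped byte count to its caller, which is the behavior the message asks of rbd_write().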
Sage Weil
86de1faa2c librbd: size max objects based on actual image object order size
This has to happen after we open the image.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 17:12:44 -07:00
caleb miles
07e7bc3b3d rgw_cache: change call signature to overwrite rgw_rados put_obj_meta()
Signed-off-by: caleb miles <caleb.miles@inktank.com>
2012-10-26 17:04:54 -07:00
Yan, Zheng
3384431b6d mds: Fix SnapRealm differ check in CInode::encode_inodestat()
When checking if inode's SnapRealm is different from readdir
SnapRealm, we should use find_snaprealm() to get inode's SnapRealm.
Without this fix, I got lots of "ceph_add_cap: couldn't find snap
realm 100" from kernel client.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2012-10-26 15:58:34 -07:00
Sage Weil
eafe0a8acb mds: allow try_eval to eval replica locks
Allow try_eval(MDSCacheObject*, int mask) to eval locks on replica objects
so that they don't get stuck in an unstable state.  The eval(CInode*, mask)
handles the non-auth already.  For the dentry case, call eval_any(), which
handles the non-auth case, instead of directly calling simple_eval(), which
does not.

Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 15:48:52 -07:00
Yan, Zheng
f0c2e12cae mds: Send mdsdir as base inode for rejoins
Stray dir inodes are no longer base inodes; they are in the mdsdir,
and the mdsdir is the base inode.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2012-10-26 15:43:33 -07:00
Yan, Zheng
ceeebaf4a4 mds: Fix stray check in Migrator::export_dir()
Commit f8110c (Allow export subtrees in other MDS' stray directory)
makes the "directory in stray" check always return false. This is
because the directory in question is a grandchild of mdsdir.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2012-10-26 15:43:07 -07:00
Yan, Zheng
d2ac024a09 mds: fix stray migration/reintegration check in handle_client_rename
The stray migration/reintegration generates a source path that will
be rooted in a (possibly remote) MDS's MDSDIR; adjust the check in
handle_client_rename()

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2012-10-26 15:41:29 -07:00
Peter Reiher
1b258764bc Merge branch 'master' of github.com:ceph/ceph
2012-10-26 15:32:48 -07:00
Sage Weil
2f09d47d21 mon: fix leading error string from 'ceph report'
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 14:55:43 -07:00
John Wilkins
31284f74fe Merge branch 'master' of https://github.com/ceph/ceph
2012-10-26 14:49:00 -07:00
John Wilkins
9cea18129a doc: updated front page graphic.
fixes: #3412

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-10-26 14:45:08 -07:00
Noah Watkins
6aab4af7eb Merge branch 'wip-java-cephfs'
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Reviewed-by: Joe Buck <joe.buck@inktank.com>
2012-10-26 14:38:34 -07:00
Jim Schutt
65ed99be85 PG: Do not discard op data too early
Under a sustained cephfs write load where the offered load is higher
than the storage cluster write throughput, a backlog of replication ops
that arrive via the cluster messenger builds up.  The client message
policy throttler, which should be limiting the total write workload
accepted by the storage cluster, is unable to prevent it, for any
value of osd_client_message_size_cap, under such an overload condition.

The root cause is that op data is released too early, in op_applied().

If instead the op data is released at op deletion, then the limit
imposed by the client policy throttler applies over the entire
lifetime of the op, including commits of replication ops.  That
makes the policy throttler an effective means for an OSD to
protect itself from a sustained high offered load, because it can
effectively limit the total, cluster-wide resources needed to process
in-progress write ops.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
2012-10-26 14:31:23 -07:00
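
The lifetime argument in 65ed99be85 can be sketched as follows. This is an illustration only, not Ceph's Throttle or OSD op types: ByteThrottleSketch, OpSketch, and admit_op are hypothetical names. The point shown is that the byte budget is returned when the op is destroyed, not when it is applied.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical byte-budget throttle (not Ceph's Throttle class).
class ByteThrottleSketch {
 public:
  explicit ByteThrottleSketch(uint64_t cap) : cap_(cap) {}
  // Reserve n bytes if they fit under the cap; otherwise refuse.
  bool try_get(uint64_t n) {
    if (n > cap_ - used_) return false;
    used_ += n;
    return true;
  }
  void put(uint64_t n) { used_ -= n; }
  uint64_t used() const { return used_; }
 private:
  uint64_t cap_;
  uint64_t used_ = 0;
};

// Hypothetical op that holds its budget until it is destroyed.
class OpSketch {
 public:
  OpSketch(ByteThrottleSketch& t, uint64_t bytes) : throttle_(t), bytes_(bytes) {}
  OpSketch(const OpSketch&) = delete;
  OpSketch& operator=(const OpSketch&) = delete;
  ~OpSketch() { throttle_.put(bytes_); }  // budget returned at op deletion
  void on_applied() {}                    // deliberately releases nothing
 private:
  ByteThrottleSketch& throttle_;
  uint64_t bytes_;
};

// Admit an op only if its payload fits under the cap.
std::unique_ptr<OpSketch> admit_op(ByteThrottleSketch& t, uint64_t bytes) {
  if (!t.try_get(bytes)) return nullptr;
  return std::make_unique<OpSketch>(t, bytes);
}
```

With a cap of 100 bytes, a second 60-byte op is refused while the first is still alive, even after on_applied() has run on the first; it is admitted only once the first op has been destroyed.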
Noah Watkins
047f58dbcf java: use unique directory in test
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
0a1e0b792a java: add tests for double mounting
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
be94fb42d1 java: add AlreadyMounted exception
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
13f76dfa92 java: remove deprecated ceph_shutdown
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
16a4c92d20 java: clean-up in finalize()
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
d88c60c61e java: enable ceph_release
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
5c91428063 java: enable ceph_unmount
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
7e7e06f4c9 java: mkdirs returns IOException
For example, CephFileAlreadyExistsException may be returned if mkdirs is
called to create a directory already present.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
9c9c247dce java: log listdir contents in java client
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:16 -07:00
Noah Watkins
4a5abc60f7 java: remove tabs to fix formatting
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:30:27 -07:00
Noah Watkins
1c45775a83 java: add O_WRONLY open flag
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:30:27 -07:00
Noah Watkins
712bfa59b9 java: add FileAlreadyExists exception
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:30:27 -07:00
Sage Weil
1de33053ae osdc/ObjectCacher: handle zero bufferheads on read
Interpret a zero bufferhead as zeros in _readx().

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:56:05 -07:00
Sage Weil
94a84d2908 osdc/ObjectCacher: add ZERO bufferheads from map_read()
When we add a bufferhead with zeros to the Object data map, use the new
zero type instead of allocating actual zeros.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:56:05 -07:00
Sage Weil
fde7fe6840 osdc/ObjectCacher: add zero bufferhead state
Wired up, but not yet used.

Treat these as clean.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:55:47 -07:00
Sage Weil
4fb6a00357 test_librbd_fsx: sleep before exit
This gives the log time to flush to disk.  Kludgey!

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:33:53 -07:00
Sage Weil
45946c2fe9 osdc/ObjectCacher: some extra debugging
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:33:53 -07:00
Sage Weil
fdf556a31e osdc/ObjectCacher: fill in zero buffers in map_read() on miss if complete
If we know we have the complete object in cache, fill in zero buffers
when we miss.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
Sage Weil
9dc887d660 osdc/ObjectCacher: improve debug output for readx()
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
Sage Weil
b9b1be6d68 osdc/ObjectCacher: set complete flag when we observe ENOENT
If we observe an ENOENT on a read, set the complete flag.  Any dirty
buffers we have will still be in memory, even if the writes are in flight,
because the TX state remains pinned until the writes commit.  Writes cannot
proceed faster than reads, even though reads may proceed faster than
writes.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
Sage Weil
94d2b91d5b osdc/ObjectCacher: clear complete on trim, release
Clear the complete flag when we are discarding buffers.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
Sage Weil
ab56e41997 osdc/ObjectCacher: add complete flag
This is set when we know we have *all* the data for this object.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
Sage Weil
f3db940f05 osdc/ObjectCacher: refresh iterator in read apply loop
The p iterator points to the next bh, but try_merge_bh() at the end of the
loop might merge that into our result and invalidate the iterator.  Fix
this by repeating the lookup on each pass through the loop.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
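
The pattern behind f3db940f05 (repeat the lookup on each pass instead of holding a "next" iterator across a merge) can be sketched on a plain std::map. This is not the ObjectCacher code: ExtentMap, merge_right, and merge_range are hypothetical names, and extents stand in for bufferheads.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

using ExtentMap = std::map<uint64_t, uint64_t>;  // start offset -> length

// Absorb every contiguous right-hand neighbor into the extent at `it`.
// Each erase invalidates any iterator a caller saved to that neighbor.
void merge_right(ExtentMap& m, ExtentMap::iterator it) {
  for (auto next = std::next(it);
       next != m.end() && it->first + it->second == next->first;
       next = std::next(it)) {
    it->second += next->second;
    m.erase(next);
  }
}

// Visit extents starting in [pos, end), merging as we go.  The iterator
// is looked up again on every pass, so an erase inside merge_right()
// can never leave the loop holding a dangling "next" iterator.
void merge_range(ExtentMap& m, uint64_t pos, uint64_t end) {
  while (pos < end) {
    auto it = m.lower_bound(pos);  // fresh lookup each pass
    if (it == m.end() || it->first >= end) break;
    merge_right(m, it);
    pos = it->first + it->second;  // resume after the (possibly grown) extent
  }
}
```

The extra lookup costs O(log n) per pass, which trades a little speed for not depending on which iterators a merge happens to invalidate.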
Sage Weil
e287296584 osdc/ObjectCacher: do read completions after assimilating read result
Wait until we have applied the entire read result to the cache before we
trigger any read completion events.  This is a cleaner and safer approach
since we can be sure that the callback won't get blocked again on data we
have but haven't applied yet.  It also fixes a crash I just observed where
the completion did a read, called trim(), and invalidated/destroyed the
iterator/bh p was referencing.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
Sage Weil
9407046dc7 osdc/ObjectCacher: do not close objects explicitly
Let the trimmer do that.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
Sage Weil
8920f417bf osdc/ObjectCacher: make trim() trim Objects
Pull unpinned objects off the LRU in trim().  This never happens currently
due to all the explicit calls to close_object()...

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
Sage Weil
f241e22f52 osdc/ObjectCacher: check lru_is_expireable() in can_close()
We assert that if can_close(), the Object isn't pinned in the LRU.  This
assumes we did our get/put refcounting properly, such that the pins are
at least as restrictive as can_close().

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:44 -07:00
Sage Weil
cd8e991af8 osdc/ObjectCacher: add LRU for Object
Incomplete; we aren't trimming yet.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:44 -07:00
Sage Weil
46897fd4ff osdc/ObjectCacher: take Object ref for bh writes
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:44 -07:00
Sage Weil
21dc0e0262 osdc/ObjectCacher: take refs for inflight lock ops
These are all dead/unused; should probably just rip out this code!

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:44 -07:00