22098 Commits

Author SHA1 Message Date
Peter Reiher
1b258764bc Merge branch 'master' of github.com:ceph/ceph 2012-10-26 15:32:48 -07:00
Sage Weil
2f09d47d21 mon: fix leading error string from 'ceph report'
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 14:55:43 -07:00
John Wilkins
31284f74fe Merge branch 'master' of https://github.com/ceph/ceph 2012-10-26 14:49:00 -07:00
John Wilkins
9cea18129a doc: updated front page graphic.
fixes: #3412

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-10-26 14:45:08 -07:00
Noah Watkins
6aab4af7eb Merge branch 'wip-java-cephfs'
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Reviewed-by: Joe Buck <joe.buck@inktank.com>
2012-10-26 14:38:34 -07:00
Jim Schutt
65ed99be85 PG: Do not discard op data too early
Under a sustained cephfs write load where the offered load is higher
than the storage cluster write throughput, a backlog of replication ops
that arrive via the cluster messenger builds up.  The client message
policy throttler, which should be limiting the total write workload
accepted by the storage cluster, is unable to prevent it, for any
value of osd_client_message_size_cap, under such an overload condition.

The root cause is that op data is released too early, in op_applied().

If instead the op data is released at op deletion, then the limit
imposed by the client policy throttler applies over the entire
lifetime of the op, including commits of replication ops.  That
makes the policy throttler an effective means for an OSD to
protect itself from a sustained high offered load, because it can
effectively limit the total, cluster-wide resources needed to process
in-progress write ops.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
2012-10-26 14:31:23 -07:00
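The lifetime argument in the commit above can be illustrated with a small sketch. This is not the Ceph code: `ByteThrottle`, `Op`, and `admit` are hypothetical names, and the budget is a plain counter. It only shows the idea of holding throttle budget until the op is destroyed rather than giving it back when the op is applied.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical byte budget standing in for the client message throttler.
class ByteThrottle {
public:
  explicit ByteThrottle(std::size_t cap) : cap_(cap) {}

  // Reserve n bytes if the cap allows it.
  bool try_take(std::size_t n) {
    if (used_ + n > cap_) return false;
    used_ += n;
    return true;
  }

  // Return n previously reserved bytes.
  void put(std::size_t n) {
    assert(n <= used_);
    used_ -= n;
  }

  std::size_t used() const { return used_; }

private:
  std::size_t cap_;
  std::size_t used_ = 0;
};

// Hypothetical op that keeps its share of the budget for its whole lifetime.
class Op {
public:
  Op(ByteThrottle& t, std::size_t bytes) : throttle_(t), bytes_(bytes) {}
  Op(const Op&) = delete;
  Op& operator=(const Op&) = delete;

  // Budget is returned at op deletion, not earlier.
  ~Op() { throttle_.put(bytes_); }

  // Being applied does not release anything in this sketch.
  void applied() {}

private:
  ByteThrottle& throttle_;
  std::size_t bytes_;
};

// Returns a new op if the budget allows it, otherwise nullptr.
std::unique_ptr<Op> admit(ByteThrottle& t, std::size_t bytes) {
  if (!t.try_take(bytes)) return nullptr;
  return std::make_unique<Op>(t, bytes);
}
```

With this shape, new work is refused while earlier ops are still alive, even after they have been applied.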
Noah Watkins
047f58dbcf java: use unique directory in test
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
0a1e0b792a java: add tests for double mounting
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
be94fb42d1 java: add AlreadyMounted exception
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
13f76dfa92 java: remove deprecated ceph_shutdown
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
16a4c92d20 java: clean-up in finalize()
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
d88c60c61e java: enable ceph_release
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
5c91428063 java: enable ceph_unmount
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
7e7e06f4c9 java: mkdirs returns IOException
For example, CephFileAlreadyExistsException may be returned if mkdirs is
called to create a directory already present.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:20 -07:00
Noah Watkins
9c9c247dce java: log listdir contents in java client
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:58:16 -07:00
Noah Watkins
4a5abc60f7 java: remove tabs to fix formatting
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:30:27 -07:00
Noah Watkins
1c45775a83 java: add O_WRONLY open flag
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:30:27 -07:00
Noah Watkins
712bfa59b9 java: add FileAlreadyExists exception
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 13:30:27 -07:00
Sage Weil
1de33053ae osdc/ObjectCacher: handle zero bufferheads on read
Interpret a zero bufferhead as zeros in _readx().

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:56:05 -07:00
Sage Weil
94a84d2908 osdc/ObjectCacher: add ZERO bufferheads from map_read()
When we add a bufferhead with zeros to the Object data map, use the new
zero type instead of allocating actual zeros.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:56:05 -07:00
Sage Weil
fde7fe6840 osdc/ObjectCacher: add zero bufferhead state
Wired up, but not yet used.

Treat these as clean.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:55:47 -07:00
Sage Weil
4fb6a00357 test_librbd_fsx: sleep before exit
This gives the log time to flush to disk.  Kludgey!

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:33:53 -07:00
Sage Weil
45946c2fe9 osdc/ObjectCacher: some extra debugging
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:33:53 -07:00
Sage Weil
fdf556a31e osdc/ObjectCacher: fill in zero buffers in map_read() on miss if complete
If we know we have the complete object in cache, fill in zero buffers
when we miss.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
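A rough sketch of how a complete flag can turn a cache miss into a zero-filled result, as this commit and the related complete-flag commits describe. The types below (`CachedObject`, `Lookup`) are hypothetical and block-granular for brevity; the real ObjectCacher works on bufferheads and byte extents.

```cpp
#include <cstdint>
#include <map>
#include <string>

enum class Lookup { Hit, Zero, Miss };

// Hypothetical cached object, keyed by block number.
struct CachedObject {
  bool complete = false;                        // cache holds *all* data
  std::map<std::uint64_t, std::string> blocks;  // block -> contents

  // A read that returned ENOENT tells us nothing exists beyond what we hold.
  void on_read_enoent() { complete = true; }

  // Discarding buffers means we can no longer claim to hold everything.
  void trim() {
    blocks.clear();
    complete = false;
  }

  // On a miss, a complete object can answer with zeros instead of a read.
  Lookup lookup(std::uint64_t block) const {
    if (blocks.count(block)) return Lookup::Hit;
    return complete ? Lookup::Zero : Lookup::Miss;
  }
};
```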
Sage Weil
9dc887d660 osdc/ObjectCacher: improve debug output for readx()
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
Sage Weil
b9b1be6d68 osdc/ObjectCacher: set complete flag when we observe ENOENT
If we observe an ENOENT on a read, set the complete flag.  Any dirty
buffers we have will still be in memory, even if the writes are in flight,
because the TX state remains pinned until the writes commit.  Writes cannot
proceed faster than reads, even though reads may proceed faster than
writes.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
Sage Weil
94d2b91d5b osdc/ObjectCacher: clear complete on trim, release
Clear the complete flag when we are discarding buffers.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
Sage Weil
ab56e41997 osdc/ObjectCacher: add complete flag
This is set when we know we have *all* the data for this object.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
Sage Weil
f3db940f05 osdc/ObjectCacher: refresh iterator in read apply loop
The p iterator points to the next bh, but try_merge_bh() at the end of the
loop might merge that into our result and invalidate the iterator.  Fix
this by repeating the lookup on each pass through the loop.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
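The iterator problem described above is a general one with node-based containers, and can be sketched with a `std::map`. The names here (`Extents`, `merge_right`, `coalesce`) are made up for illustration and are not the ObjectCacher functions. The point is that the loop does a fresh lookup on each pass instead of carrying an iterator to the next element across a merge that may erase it.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

using Extents = std::map<std::uint64_t, std::uint64_t>;  // offset -> length

// Absorb every right-hand neighbour that directly touches the extent at pos.
// Erasing a neighbour invalidates any iterator that pointed at it.
void merge_right(Extents& m, Extents::iterator pos) {
  for (auto next = std::next(pos);
       next != m.end() && pos->first + pos->second == next->first;
       next = std::next(pos)) {
    pos->second += next->second;
    m.erase(next);
  }
}

// Walk the extents in [start, end), merging as we go.
// Returns the number of extents left in the map.
std::size_t coalesce(Extents& m, std::uint64_t start, std::uint64_t end) {
  std::uint64_t cur = start;
  while (true) {
    auto p = m.lower_bound(cur);  // repeat the lookup on every pass
    if (p == m.end() || p->first >= end) break;
    merge_right(m, p);            // may erase what used to be "next"
    cur = p->first + p->second;   // p itself is still valid
  }
  return m.size();
}
```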
Sage Weil
e287296584 osdc/ObjectCacher: do read completions after assimilating read result
Wait until we have applied the entire read result to the cache before we
trigger any read completion events.  This is a cleaner and safer approach
since we can be sure that the callback won't get blocked again on data we
have but haven't applied yet.  It also fixes a crash I just observed where
the completion did a read, called trim(), and invalidated/destroyed the
iterator/bh p was referencing.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
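One way to picture the ordering this commit describes: collect the completions while the read result is being applied, and run them only afterwards. The `Cache` type below is a hypothetical, single-threaded stand-in, not the ObjectCacher itself.

```cpp
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical cache: block number -> value, plus callbacks waiting on blocks.
struct Cache {
  std::map<int, int> data;
  std::map<int, std::vector<std::function<void()>>> waiters;

  // Apply the entire read result first, then fire completions.
  void apply_read(const std::map<int, int>& result) {
    std::vector<std::function<void()>> ready;
    for (const auto& kv : result) {
      data[kv.first] = kv.second;
      auto w = waiters.find(kv.first);
      if (w != waiters.end()) {
        for (auto& cb : w->second) ready.push_back(std::move(cb));
        waiters.erase(w);
      }
    }
    // No iterators into data or waiters are live at this point, so a
    // callback may read other blocks or discard cache contents safely.
    for (auto& cb : ready) cb();
  }
};
```

A callback waiting on one block can then see data for other blocks from the same result, and can clear the cache without disturbing the apply loop.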
Sage Weil
9407046dc7 osdc/ObjectCacher: do not close objects explicitly
Let the trimmer do that.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
Sage Weil
8920f417bf osdc/ObjectCacher: make trim() trim Objects
Pull unpinned objects off the LRU in trim().  This never happens currently
due to all the explicit calls to close_object()...

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:45 -07:00
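A minimal sketch of trimming unpinned entries from the cold end of an LRU, assuming a plain `std::list` and a per-object reference count. `Obj` and this `trim` signature are illustrative only and do not match the ObjectCacher interfaces.

```cpp
#include <cstddef>
#include <list>

// Hypothetical object record; refs > 0 means the object is pinned.
struct Obj {
  int id;
  int refs;
};

// Drop unpinned objects from the cold end until at most max remain.
// The front of the list is the hottest entry, the back the coldest.
// Returns the number of objects removed.
std::size_t trim(std::list<Obj>& lru, std::size_t max) {
  std::size_t removed = 0;
  auto it = lru.end();
  while (lru.size() > max && it != lru.begin()) {
    --it;
    if (it->refs == 0) {
      it = lru.erase(it);  // erase returns the element after the erased one
      ++removed;
    }
  }
  return removed;
}
```

Pinned objects are skipped, so the list can stay above `max` when too many entries are still referenced.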
Sage Weil
f241e22f52 osdc/ObjectCacher: check lru_is_expireable() in can_close()
We assert that if can_close(), the Object isn't pinned in the LRU.  This
assumes we did our get/put refcounting properly, such that the pins are
at least as restrictive as can_close().

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:44 -07:00
Sage Weil
cd8e991af8 osdc/ObjectCacher: add LRU for Object
Incomplete; we aren't trimming yet.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:44 -07:00
Sage Weil
46897fd4ff osdc/ObjectCacher: take Object ref for bh writes
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:44 -07:00
Sage Weil
21dc0e0262 osdc/ObjectCacher: take refs for inflight lock ops
These are all dead/unused; should probably just rip out this code!

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:44 -07:00
Sage Weil
57e18a74fe osdc/ObjectCacher: take Object ref when there are buffers
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:44 -07:00
Sage Weil
a34a8b8bae osdc/ObjectCacher: add ref count to Object
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:44 -07:00
Sage Weil
24d07e8727 osdc/ObjectCacher: rename lru_* -> bh_lru_*
We'll be adding LRUs for objects, too.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-26 11:31:44 -07:00
Sage Weil
57a4cbbfb2 librbd: fix race in AioCompletions that are still being built
When caching is enabled, it is possible for the io completion to happen
faster than we call ->finish_adding_requests() (e.g., on cache read).
When that happens, the final read request completion doesn't see a
pending_count == 0 and thus doesn't do all the final buffer construction
that is necessary to return correct data.  In particular, users will see
zeroed buffers.  test_librbd_fsx is turning this up consistently after
several thousand ops with an image size of ~100MB and cloning disabled.

This was introduced with the extra logic added here with striping.

Fix this by making a separate flag to indicate the completion is under
construction, and make sure we call complete() when both pending_count==0
and building==false.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-26 11:30:38 -07:00
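The fix described above can be sketched as a completion that fires only when both conditions hold. This is a simplified, single-threaded illustration: the `Completion` struct and the `done` guard are assumptions made for the example, and the real code would need to protect this state with a lock.

```cpp
// Hypothetical stand-in for an aggregate I/O completion.
struct Completion {
  int pending_count = 0;
  bool building = true;  // true while sub-requests are still being added
  bool done = false;
  int completions_fired = 0;

  void add_request() { ++pending_count; }

  // A sub-request finished; this may happen before building is over.
  void complete_request() {
    --pending_count;
    maybe_complete();
  }

  // The caller has added every sub-request it is going to add.
  void finish_adding_requests() {
    building = false;
    maybe_complete();
  }

private:
  // Complete only when nothing is pending AND construction has finished.
  void maybe_complete() {
    if (pending_count == 0 && !building && !done) {
      done = true;
      ++completions_fired;
    }
  }
};
```

Without the `building` check, the first sub-request to finish early would see `pending_count == 0` and complete the whole operation before the remaining requests had been added.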
Peter Reiher
a67afa4b09 Merge branch 'wip-msgauth4'
Conflicts:
	src/common/config_opts.h
	Added a couple of options related to session authentication, accepted new values for option from master
2012-10-26 09:25:15 -07:00
Noah Watkins
e572b4b4cf Merge branch 'wip-client-unmount'
Signed-off-by: Noah Watkins <noah.watkins@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-10-26 09:07:30 -07:00
Peter Reiher
ffb8c605a8 Various cleanup changes to session authentication code.
Signed-off-by: Peter Reiher <reiher@inktank.com>
2012-10-26 08:57:29 -07:00
Noah Watkins
67bc92aa54 client: add ceph_release, ceph_shutdown
Notes that ceph_shutdown() is now deprecated.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 08:38:26 -07:00
Noah Watkins
f1eef53200 client: double mount returns -EISCONN
Change error code from -EDOM to -EISCONN when mounting an already
mounted ceph_mount_info instance.  The current convention is to return
-ENOTCONN when using the libcephfs interface in an unmounted state.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-10-26 08:38:26 -07:00
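The error-code convention described above, sketched with a hypothetical `MountHandle` type rather than the real `ceph_mount_info`. Only the choice of `-EISCONN` and `-ENOTCONN` comes from the commit message; the method names are invented for the example.

```cpp
#include <cerrno>

// Hypothetical handle that tracks whether it is mounted.
struct MountHandle {
  bool mounted = false;

  // Mounting twice is reported as "already connected".
  int mount() {
    if (mounted) return -EISCONN;
    mounted = true;
    return 0;
  }

  // Operations on an unmounted handle are reported as "not connected".
  int unmount() {
    if (!mounted) return -ENOTCONN;
    mounted = false;
    return 0;
  }

  // Stand-in for any call that requires a mounted handle.
  int stat_root() const { return mounted ? 0 : -ENOTCONN; }
};
```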
Sage Weil
1152656c62 Merge branch 'wip-mds'
Reviewed-by: Greg Farnum <greg@inktank.com>
2012-10-25 20:49:25 -07:00
Sage Weil
c9ca3c997c client: do not reset session state on reopened sessions
We can have a sequence on the MDS like:

 - queue REQUEST_CLOSE to journal
 - force_open, queue open to journal
 - request_close acked, do nothing
 - force_open acked, send OPEN

In this case, the MDS never actually closed the session, and all of the
state remained valid.  The client, however, gets a spurious OPEN
message and resets the session state.

Fix this by not resetting that state.

A nicer fix might be to not send the second OPEN at all, but that would
require a REOPENING state on the MDS which is more complicated; this is
good enough.  Also, that approach might not give the client an
appropriate opportunity to say "um, no..." and resend the
REQUEST_CLOSE.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-25 20:49:04 -07:00
Sage Weil
3153ec740d mds: fix handling of cache_expire export
During export, between the warning stage and the final notify, we may
get cache expire messages because the replicas are sending to both us
and the new auth.  This check should look for >= WARNING so that it
includes the EXPORTING states as well as the portion of WARNING after
we heard from that replica.  This aligns the conditional with the
following assert such that they are properly mutually exclusive.

Fixes: #1527
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-25 20:49:04 -07:00
Sage Weil
4ac45200f1 mds: do not mark closed connections disposable
These will get reused when the client reconnects.  If we are going to
clean these up, we need a different strategy.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-25 20:49:03 -07:00
Sage Weil
ad839c70dc mds: use connection on closed sessions in force_open_sessions
If we have a Connection*, use it.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-25 20:49:03 -07:00