When we add a bufferhead with zeros to the Object data map, use the new
zero type instead of allocating actual zeros.
Signed-off-by: Sage Weil <sage@inktank.com>
If we observe an ENOENT on a read, set the complete flag. Any dirty
buffers we have will still be in memory, even if the writes are in flight,
because the TX state remains pinned until the writes commit. Writes cannot
proceed faster than reads, even though reads may proceed faster than
writes.
Signed-off-by: Sage Weil <sage@inktank.com>
The p iterator points to the next bh, but try_merge_bh() at the end of the
loop might merge that into our result and invalidate the iterator. Fix
this by repeating the lookup on each pass through the loop.
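
A minimal sketch of the pattern, not the ObjectCacher code itself: Extent,
ExtentMap, try_merge_right() and coalesce() are hypothetical stand-ins for
the bufferhead map and try_merge_bh(). It shows why an iterator held across
the merge is unsafe and how looking the position up again from its key on
each pass avoids it.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical stand-in for a bufferhead: a byte range [start, start + len).
struct Extent {
    uint64_t len;
};

using ExtentMap = std::map<uint64_t, Extent>;  // keyed by start offset

// Merge the extent at `start` with its right neighbour if the two touch.
// The neighbour is erased, which invalidates any iterator pointing at it.
// Returns true if a merge happened.
bool try_merge_right(ExtentMap& m, uint64_t start) {
    auto cur = m.find(start);
    assert(cur != m.end());
    auto next = std::next(cur);
    if (next == m.end() || cur->first + cur->second.len != next->first)
        return false;
    cur->second.len += next->second.len;
    m.erase(next);
    return true;
}

// Coalesce every run of touching extents. No iterator is kept across a
// merge; the next position is looked up again from its key on each pass.
void coalesce(ExtentMap& m) {
    if (m.empty())
        return;
    uint64_t pos = m.begin()->first;
    for (;;) {
        if (try_merge_right(m, pos))
            continue;                 // same extent, now longer; try again
        auto p = m.upper_bound(pos);  // fresh lookup, never a stale iterator
        if (p == m.end())
            break;
        pos = p->first;
    }
}
```

The repeated lookup costs O(log n) per pass, which is the price of not
having to reason about which iterators a merge may have destroyed.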
Signed-off-by: Sage Weil <sage@inktank.com>
Wait until we have applied the entire read result to the cache before we
trigger any read completion events. This is a cleaner and safer approach
since we can be sure that the callback won't get blocked again on data we
have but haven't applied yet. It also fixes a crash I just observed where
the completion did a read, called trim(), and invalidated/destroyed the
iterator/bh p was referencing.
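
The ordering described above can be sketched as follows. This is an
illustration under assumptions, not the actual ObjectCacher code: ReadCache,
its data and waiters members, and apply_read_result() are hypothetical
names. Callbacks are collected while the result is applied and only run
once no iterator into the cache is live.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cache: offset -> bytes, plus waiters keyed by offset.
struct ReadCache {
    using Callback = std::function<void(ReadCache&)>;

    std::map<int, std::string> data;
    std::multimap<int, Callback> waiters;

    // Apply the whole read result first, then fire the waiters.
    void apply_read_result(const std::map<int, std::string>& result) {
        std::vector<Callback> ready;
        for (const auto& kv : result) {
            data[kv.first] = kv.second;
            auto range = waiters.equal_range(kv.first);
            for (auto w = range.first; w != range.second; ++w)
                ready.push_back(std::move(w->second));
            waiters.erase(range.first, range.second);
        }
        // No iterator into `data` or `waiters` is live here, so a callback
        // may read, trim, or otherwise modify the cache safely.
        for (auto& cb : ready)
            cb(*this);
    }
};
```

A callback registered for offset 0 then sees every extent of the same
result already in place, and may clear the cache without affecting the
loop that applied it.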
Signed-off-by: Sage Weil <sage@inktank.com>
Pull unpinned objects off the LRU in trim(). This never happens currently
due to all the explicit calls to close_object()...
Signed-off-by: Sage Weil <sage@inktank.com>
We assert that if can_close(), the Object isn't pinned in the LRU. This
assumes we did our get/put refcounting properly, such that the pins are
at least as restrictive as can_close().
Signed-off-by: Sage Weil <sage@inktank.com>
When caching is enabled, it is possible for the io completion to happen
faster than we call ->finish_adding_requests() (e.g., on cache read).
When that happens, the final read request completion doesn't see a
pending_count == 0 and thus doesn't do all the final buffer construction
that is necessary to return correct data. In particular, users will see
zeroed buffers. test_librbd_fsx is turning this up consistently after
several thousand ops with an image size of ~100MB and cloning disabled.
This was introduced with the extra logic added here with striping.
Fix this by making a separate flag to indicate the completion is under
construction, and make sure we call complete() when both pending_count==0
and building==false.
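
A single-threaded sketch of the rule, for illustration only. The struct
name AioCompletionSketch and the method names add_request() and
complete_request() are assumptions; the real completion is protected by a
lock, which is omitted here.

```cpp
#include <cassert>

// Hypothetical model of an aggregate I/O completion.
struct AioCompletionSketch {
    int  pending_count = 0;  // sub-requests still outstanding
    bool building = true;    // true while the submitter is adding requests
    bool completed = false;

    void add_request() {
        assert(building);
        ++pending_count;
    }

    // Called when one sub-request finishes. This may run before
    // finish_adding_requests(), e.g. when the read is served from cache.
    void complete_request() {
        assert(pending_count > 0);
        if (--pending_count == 0 && !building)
            complete();
    }

    // Called once by the submitter after the last add_request().
    void finish_adding_requests() {
        assert(building);
        building = false;
        if (pending_count == 0)
            complete();
    }

    void complete() {
        assert(!completed);  // final buffer assembly must run exactly once
        completed = true;
    }
};
```

With this shape, whichever of the last complete_request() and
finish_adding_requests() runs second is the one that calls complete().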
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Change error code from -EDOM to -EISCONN when mounting an already
mounted ceph_mount_info instance. The current convention is to return
-ENOTCONN when using the libcephfs interface in an unmounted state.
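
The convention can be illustrated with a toy state check. MountSketch and
its methods are hypothetical and are not the libcephfs API; only the two
errno values come from the description above.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for a mount handle.
struct MountSketch {
    bool mounted = false;

    int mount() {
        if (mounted)
            return -EISCONN;   // already mounted
        mounted = true;
        return 0;
    }

    // Any operation that needs a mounted filesystem.
    int stat_root() const {
        if (!mounted)
            return -ENOTCONN;  // not mounted yet
        return 0;
    }
};

// Usage example.
inline void mount_sketch_demo() {
    MountSketch m;
    assert(m.stat_root() == -ENOTCONN);
    assert(m.mount() == 0);
    assert(m.mount() == -EISCONN);
    assert(m.stat_root() == 0);
}
```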
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
We can have a sequence on the MDS like:
- queue REQUEST_CLOSE to journal
- force_open, queue open to journal
- request_close acked, do nothing
- force_open acked, send OPEN
In this case, the MDS never actually closed the session, and all of the
state remained valid. The client, however, gets a spurious OPEN
message and resets the session state.
Fix this by not resetting that state.
A nicer fix might be to not send the second OPEN at all, but that would
require a REOPENING state on the MDS which is more complicated; this is
good enough. Also, that approach might not give the client an
appropriate opportunity to say "um, no..." and resend the
REQUEST_CLOSE.
Signed-off-by: Sage Weil <sage@inktank.com>
During export, between the warning stage and the final notify, we may
get cache expire messages because the replicas are sending to both us
and the new auth. This check should look for >= WARNING so that it
includes the EXPORTING states as well as the portion of WARNING after
we heard from that replica. This aligns the conditional with the
following assert such that they are properly mutually exclusive.
Fixes: #1527
Signed-off-by: Sage Weil <sage@inktank.com>
These will get reused when the client reconnects. If we are going to
clean these up, we need a different strategy.
Signed-off-by: Sage Weil <sage@inktank.com>
There is one case where populate_obc_watchers gets called when the object
is missing: during a revert. And in that case we *should* do the populate,
since all that is getting reverted is the object version.
Fixes: #3405
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
warning: mds/MDS.cc:1586:27: ignoring return value of ‘char* getcwd(char*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]
Signed-off-by: Sage Weil <sage@inktank.com>
This effectively reverts faddb80c42
which prevented vstart.sh from being used in an environment where
CEPH_BIN pointed to a make install target.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Previously we would explicitly STAT the object to see if it exists before
sending the write to the OSD. Instead, send the write optimistically, and
assert that the object already exists. This avoids an extra round trip in
the optimistic/common case, but makes the existence check in the
first-write case more expensive because we send the data payload along.
Signed-off-by: Sage Weil <sage@inktank.com>
Add a guard operation for writes that asserts that the object already
exists. To avoid requiring new functionality on the OSD side, implement
this by including a STAT operation, and discard the results on the
client side.
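
A toy model of the idea, assuming a compound operation that runs its ops in
order and fails as a whole on the first error. ObjectStore, Op, execute()
and guarded_write() are hypothetical names and this is not the OSD or
librados code.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>
#include <vector>

enum class OpCode { Stat, Write };

struct Op {
    OpCode code;
    std::string payload;  // used by Write only
};

using ObjectStore = std::map<std::string, std::string>;  // oid -> contents

// Run the ops in order against one object. The first failure aborts the
// whole compound operation and leaves the store untouched.
int execute(ObjectStore& store, const std::string& oid,
            const std::vector<Op>& ops) {
    auto it = store.find(oid);
    bool exists = (it != store.end());
    std::string staged = exists ? it->second : std::string();
    for (const Op& op : ops) {
        switch (op.code) {
        case OpCode::Stat:
            if (!exists)
                return -ENOENT;  // guard fails, later ops never run
            break;               // stat result itself is discarded
        case OpCode::Write:
            staged = op.payload;
            exists = true;
            break;
        }
    }
    if (exists)
        store[oid] = staged;     // commit only after every op succeeded
    return 0;
}

// Write that succeeds only if the object already exists: a STAT guard
// placed in front of the WRITE.
int guarded_write(ObjectStore& store, const std::string& oid,
                  const std::string& data) {
    std::vector<Op> ops;
    ops.push_back(Op{OpCode::Stat, ""});
    ops.push_back(Op{OpCode::Write, data});
    return execute(store, oid, ops);
}
```

A guarded write to a missing object returns -ENOENT and creates nothing; the
same call against an existing object replaces its contents.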
Signed-off-by: Sage Weil <sage@inktank.com>