If we race with, e.g., truncate and are in bh_write_commit but the oset
is already clean, we should not call the flush callback (again).
This is reproduced by:
- kludging slow osd replies into the code (e.g., a 2-second delay)
- mounting ceph-fuse with --client-oc-max-dirty-age 1
- running:
  dd if=/dev/zero of=mnt/foo count=1
  sleep 1
  truncate --size 0 mnt/foo
-> crash
Signed-off-by: Sage Weil <sage@inktank.com>
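A minimal sketch of the guard described above, assuming a simplified object set that only tracks how many bytes are dirty or in flight. ObjectSet, dirty_or_tx and flush_set_callback are illustrative stand-ins here, not the actual ObjectCacher code.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Hypothetical stand-in for an object set: only the count of bytes that
// are dirty or being written back matters for this example.
struct ObjectSet {
  long dirty_or_tx = 0;
};

// Sketch of a write-commit handler; returns true if it fired the flush
// callback. The early return is the point: if a racing truncate already
// cleaned the set (and ran the callback), a late commit must not run it again.
bool bh_write_commit(ObjectSet &oset, long committed,
                     const std::function<void(ObjectSet &)> &flush_set_callback) {
  if (oset.dirty_or_tx == 0)
    return false;  // already clean: nothing left to report
  oset.dirty_or_tx -= std::min(committed, oset.dirty_or_tx);
  if (oset.dirty_or_tx == 0) {
    flush_set_callback(oset);  // this commit is what made the set clean
    return true;
  }
  return false;
}
```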
We are careful to clear this reference when processing it.
Add an assert here. There's no way we can get 2 quick replies because
of the kick-back below.
Signed-off-by: Sage Weil <sage@inktank.com>
If we initiate IO (success == false) but have no waiter, we need to
delete the OSDRead; otherwise nothing ever frees it.
This affects libcephfs/ceph-fuse, but not librbd, which does no readahead.
Signed-off-by: Sage Weil <sage@inktank.com>
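The ownership rule above can be sketched as follows, assuming a simplified readx-style entry point where has_waiter models whether the caller supplied a completion. The OSDRead type, the waiting_reads list and the live_reads counter are hypothetical and exist only to make the leak visible.

```cpp
#include <cassert>
#include <vector>

int live_reads = 0;  // instrumentation for this example only

// Hypothetical stand-in for a pending read request.
struct OSDRead {
  OSDRead() { ++live_reads; }
  ~OSDRead() { --live_reads; }
};

std::vector<OSDRead *> waiting_reads;  // reads to retry when their IO completes

// Sketch: returns true if the read was served from cache.
bool readx(OSDRead *rd, bool cache_hit, bool has_waiter) {
  if (cache_hit) {
    delete rd;  // fully served; the request is done
    return true;
  }
  // IO was initiated (success == false).
  if (has_waiter) {
    waiting_reads.push_back(rd);  // retried, and freed, once data arrives
  } else {
    delete rd;  // e.g. readahead: nobody will retry it, so free it now
  }
  return false;
}
```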
Without this check, 'rbd mv foo' crashed trying to use a NULL char* as
a string.
Reported-by: Andrey Korolyov <andrey@xdel.ru>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
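Constructing a std::string from a null char* is undefined behavior, which is why the check matters. A minimal sketch of the pattern, with a hypothetical helper name, might look like:

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical helper: copy an optional C-string argument into a
// std::string, reporting a usage error instead of dereferencing NULL.
int get_dest_name(const char *arg, std::string &out) {
  if (arg == nullptr)
    return -EINVAL;  // e.g. "rbd mv foo" with no destination given
  out = arg;
  return 0;
}
```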
Start ref count at 0; get_snap_realm() will increment it after alloc.
Fix the ref drop order so that the xlist is empty.
Signed-off-by: Sage Weil <sage@inktank.com>
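A sketch of the "start at 0, let the getter take the first ref" convention, assuming a simplified registry keyed by inode number. The types and the registry are illustrative, not the actual Client code; the realm is freed only when the last ref is dropped.

```cpp
#include <cassert>
#include <map>

// Hypothetical, simplified snap realm.
struct SnapRealm {
  int ino;
  int nref = 0;  // starts at 0: get_snap_realm() takes the first ref
  explicit SnapRealm(int i) : ino(i) {}
};

std::map<int, SnapRealm *> snap_realms;

SnapRealm *get_snap_realm(int ino) {
  SnapRealm *&r = snap_realms[ino];  // inserts nullptr if absent
  if (!r)
    r = new SnapRealm(ino);  // allocated with nref == 0 ...
  ++r->nref;                 // ... so this is the only increment for a new realm
  return r;
}

void put_snap_realm(SnapRealm *r) {
  assert(r->nref > 0);
  if (--r->nref == 0) {
    snap_realms.erase(r->ino);
    delete r;
  }
}
```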
get_caps() has a confusing out-arg called "got" that really reports which
caps we *have*; it only takes a ref on the *need* cap, so that is the only
one we should put explicitly (CEPH_CAP_FILE_RD). The _write() method
already does this properly, but _read() did not.
Fixes: #3470
Signed-off-by: Sage Weil <sage@inktank.com>
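The asymmetry between the "need" ref and the reported caps can be sketched as below. The cap bit values, the Inode layout and the single-bit `need` are assumptions for illustration; real Ceph uses the CEPH_CAP_FILE_* constants and a richer get_caps() signature.

```cpp
#include <cassert>
#include <map>

// Hypothetical cap bits standing in for the CEPH_CAP_FILE_* constants.
constexpr int CAP_FILE_RD = 1 << 0;
constexpr int CAP_FILE_CACHE = 1 << 1;

struct Inode {
  int issued = CAP_FILE_RD | CAP_FILE_CACHE;  // caps granted by the MDS
  std::map<int, int> cap_refs;                // refs held per cap bit
};

// Sketch: takes a ref on `need` only (assumed to be a single bit);
// `*have` merely reports what is issued and carries no references.
int get_caps(Inode &in, int need, int *have) {
  if ((in.issued & need) != need)
    return -1;  // real code would wait for the caps
  ++in.cap_refs[need];
  *have = in.issued;
  return 0;
}

void put_cap_ref(Inode &in, int cap) {
  assert(in.cap_refs[cap] > 0);  // putting an unreferenced cap is a bug
  --in.cap_refs[cap];
}
```

Putting every bit in `have` would trip the assert in put_cap_ref() for CAP_FILE_CACHE, which is the kind of imbalance the fix avoids.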
1) use right snap id when forming parent spec to search for children
2) add test case for "unprotect with extant children"
Signed-off-by: Dan Mick <dan.mick@inktank.com>
We already kicked waiters for requests, but we need to kick waiters on
open too (e.g., a client trying to mount).
Signed-off-by: Sage Weil <sage@inktank.com>
'rbd ls' on format-2 images looped over the first 64 entries when more
than 64 were present. The key name passed to the omap layer must always
contain the prefix, and the next-chunk statement inside the loop was
missing the call that adds the prefix.
Also, add a test for listing 100 images, format 1 and 2.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
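A sketch of chunked listing over a sorted key space, showing why the cursor must be a full key. The omap is modeled with std::map, and the "name_" prefix and helper names are assumptions for illustration, not the actual cls_rbd code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an omap: keys kept in sorted order.
using Omap = std::map<std::string, std::string>;

// Return up to `max` keys that sort strictly after `start_after`.
std::vector<std::string> omap_get_keys(const Omap &m, const std::string &start_after,
                                       std::size_t max) {
  std::vector<std::string> keys;
  for (auto it = m.upper_bound(start_after); it != m.end() && keys.size() < max; ++it)
    keys.push_back(it->first);
  return keys;
}

const std::string NAME_PREFIX = "name_";  // hypothetical directory key prefix

// List image names in chunks. Passing the bare name ("img63" sorts before
// every "name_..." key) as the cursor would restart from the first key forever.
std::vector<std::string> list_images(const Omap &dir, std::size_t chunk) {
  assert(chunk > 0);
  std::vector<std::string> names;
  std::string last_read = NAME_PREFIX;
  while (true) {
    std::vector<std::string> keys = omap_get_keys(dir, last_read, chunk);
    for (const auto &k : keys) {
      if (k.compare(0, NAME_PREFIX.size(), NAME_PREFIX) != 0)
        return names;  // walked past the prefixed range
      names.push_back(k.substr(NAME_PREFIX.size()));
    }
    if (keys.size() < chunk)
      return names;
    last_read = NAME_PREFIX + names.back();  // the fix: re-add the prefix
  }
}
```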
Fixes: #3400
Removed a few lines of code that prematurely created the head
part of the final object (before creating the manifest).
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Fixes: #3401
The problem was that put_obj_meta() assumed the object was going to be
reset, so it reset the object unconditionally. That assumption does not
hold for the immutable multipart upload parts.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
If the part that we're reading is corrupted and we end up
reading zero bytes, we need to exit, otherwise we'd just
loop forever.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
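A minimal sketch of a read loop that stops when no progress is made, assuming a hypothetical reader callback that returns bytes read, 0 when nothing could be read, or a negative error code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical reader: returns bytes read, 0 if nothing could be read,
// or a negative error code.
using ReadFn = std::function<long(std::size_t off, std::size_t len, std::string &out)>;

// Sketch: read up to `total` bytes of a (possibly corrupt) part.
long read_part(const ReadFn &read, std::size_t total, std::string &dst) {
  std::size_t off = 0;
  while (off < total) {
    std::string chunk;
    long r = read(off, total - off, chunk);
    if (r < 0)
      return r;
    if (r == 0)
      break;  // no progress possible: bail out instead of looping forever
    dst += chunk;
    off += static_cast<std::size_t>(r);
  }
  return static_cast<long>(off);
}
```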
Users have been seeing failures where rbd rm is half-done, possibly
because of outstanding watches on the rbd_header object. In that state,
rbd_children no longer contains the child, but other pieces remain;
remove considers this a failure.
Fix: test for ENOENT from remove_child, treat it as an ignorable error,
and carry on. Simulate this in copy.sh by removing the rbd_children
object altogether, which also results in an ENOENT return from
remove_child.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
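The error-handling rule can be sketched as below. The two step callbacks are hypothetical stand-ins for the real librbd removal steps; each returns 0 or a negative errno.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Sketch of a removal path that tolerates a missing child link.
int remove_image(const std::function<int()> &remove_child,
                 const std::function<int()> &remove_rest) {
  int r = remove_child();
  if (r < 0 && r != -ENOENT)
    return r;  // a real failure
  // -ENOENT: the rbd_children entry is already gone (e.g. an earlier,
  // half-finished rm), so continue and remove the remaining pieces.
  return remove_rest();
}
```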
Previously, we asserted that a log entry with a divergent
prior_version must be a clone. Consider the following
case:
6'11(6'2) m foo
7'12(6'3) m bar
7'13(7'12) m bar
If this is merged with:
6'11(6'2) m foo
8'12(6'4) m baz
we will hit the assert. The correct behavior is simply to remove
the object as in the clone case.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Otherwise, we end up leaving snap hardlinks in the snapshot
index directories. This eventually results in an EEXIST error
when we attempt to re-link the clone into place during
recovery.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
If the given device is already mounted at the target location, do not
mount --move it again, which would create duplicate entries in /etc/mtab
and the kernel mount table.
Signed-off-by: Sage Weil <sage@inktank.com>
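One way to detect the "already mounted here" case is to scan the mount table. The sketch below assumes input text in /proc/mounts format (device, mount point, fstype, options, ...); it does not decode octal escapes such as "\040" in paths, and it is not the actual ceph-disk logic.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Sketch: is `dev` already mounted at `path`, according to `mounts`?
bool is_mounted_at(const std::string &mounts, const std::string &dev,
                   const std::string &path) {
  std::istringstream in(mounts);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string d, p;
    if ((fields >> d >> p) && d == dev && p == path)
      return true;  // already in place: skip the mount --move
  }
  return false;
}
```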
Previously the snap_trimmer would continuously requeue itself until the
end of scrub. This degrades performance and fills up logs for No Good
Reason.
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Prod the kernel to refresh the partition table after we create one. The
partprobe program is packaged with parted, which we already use, so this
introduces no new dependency.
Signed-off-by: Sage Weil <sage@inktank.com>
If the disk has no valid label, we get an error like
Error: /dev/sdi: unrecognised disk label
Assume any error we get is that one, and go with an id label of 1.
Signed-off-by: Sage Weil <sage@inktank.com>
Way back in fc869dee1e (v0.42), when we redid the osd type encoding, we
forgot to make this conditionally encode the old format for old clients.
In particular, this means that kernel clients
will fail to decode the osdmap if there is a rados pool with a pool-level
snapshot defined.
Fixes: #3290
Signed-off-by: Sage Weil <sage@inktank.com>
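The general pattern of choosing an encoding from the peer's feature bits, rather than always emitting the newest format, can be sketched as follows. The feature bit, the struct-version byte and the payload layout are all hypothetical; this is not the actual pg_pool_t encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical feature bit: "peer understands the new pool encoding".
constexpr std::uint64_t FEATURE_NEW_POOL_ENCODING = 1ULL << 0;

// Sketch: encode for a specific peer instead of unconditionally.
std::vector<std::uint8_t> encode_pool(std::uint64_t peer_features,
                                      std::uint32_t snap_count, std::uint8_t flags) {
  std::vector<std::uint8_t> bl;
  const bool new_fmt = (peer_features & FEATURE_NEW_POOL_ENCODING) != 0;
  bl.push_back(static_cast<std::uint8_t>(new_fmt ? 2 : 1));  // struct version
  for (int i = 0; i < 4; ++i)  // little-endian snap count, shared by both formats
    bl.push_back(static_cast<std::uint8_t>(snap_count >> (8 * i)));
  if (new_fmt)
    bl.push_back(flags);  // a field old (e.g. kernel) decoders do not expect
  return bl;
}
```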
There is one case where populate_obc_watchers gets called when the object
is missing: during a revert. And in that case we *should* do the populate,
since all that is getting reverted is the object version.
Fixes: #3405
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
Turn these into asserts. The only two callers are create_object_context()
and get_object_context(), and they only get called when the object is no
longer missing.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Bug #3142 appears to be caused by the following sequence:
- object X is missing on primary and replica
- [assert-ver,watch], notify, and unwatch requests come in and get deferred
- object is recovered on primary: !missing, create_object_context
- populate_obc_watchers() does nothing, since the object is still degraded
- notify happens now (odd but ok?)
- replica is recovered: !degraded
- watch is skipped because of a bad assert
- unwatch trips up on an assert because populate_obc_watchers never ran
Fix this by populating the obc watcher when !missing, not when
!degraded. This conditional dates back to Sam's original watch/notify
cleanup in October 2011.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
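A simplified sketch of the changed gate, as described in this commit: watchers are populated once the object is present locally, whether or not replicas are still degraded. The types and the helper names are hypothetical, and the unwatch assert is only analogous to the one that tripped in #3142.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified types.
struct ObjectInfo {
  std::set<std::string> watchers;  // persisted watcher list
};
struct ObjectContext {
  bool watchers_populated = false;
  std::set<std::string> watchers;
};

// Gate on !missing; gating on !degraded could skip this while replicas
// were still recovering, leaving nothing for a later unwatch.
void maybe_populate_watchers(ObjectContext &obc, const ObjectInfo &oi, bool missing) {
  if (missing)
    return;
  obc.watchers = oi.watchers;
  obc.watchers_populated = true;
}

void unwatch(ObjectContext &obc, const std::string &client) {
  assert(obc.watchers_populated);  // analogous to the assert hit in #3142
  obc.watchers.erase(client);
}
```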
Accept other delimiters instead of just ','. Currently, "foo.com, bar.com"
fails because of the space after the comma. This patch fixes that and
makes all delimiter characters interchangeable.
Signed-off-by: Sage Weil <sage@inktank.com>
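Treating every delimiter character as interchangeable amounts to splitting on any run of characters from a set. The sketch below assumes the set ", ;\t"; the commit does not say which characters the real parser accepts, and the function name is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split on any run of delimiter characters (the set is an assumption).
std::vector<std::string> split_hosts(const std::string &s,
                                     const std::string &delims = ", ;\t") {
  std::vector<std::string> out;
  std::string::size_type pos = s.find_first_not_of(delims);
  while (pos != std::string::npos) {
    std::string::size_type end = s.find_first_of(delims, pos);
    out.push_back(s.substr(pos, end - pos));  // substr clamps when end == npos
    pos = s.find_first_not_of(delims, end);
  }
  return out;
}
```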
Previously, the messenger would queue messages for a destination that
didn't exist when you were a server; that changed a while back with the
wip-msgr merge (circa v0.52). The result is that when we force open
client sessions and queue messages, they are dropped on the floor, and the
client (when it does connect) gets confusing messages from the MDS.
Instead, explicitly queue and send these messages. Also, *always* send
via the Connection* instead of the inst.
Fixes: #2681
Signed-off-by: Sage Weil <sage@inktank.com>