Fixes: #3400
Removed a few lines of code that prematurely created the head
part of the final object (before creating the manifest).
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Fixes: #3401
The problem was that put_obj_meta() was assuming object is going
to be reset, so it was resetting the object anyway. This is not
true when dealing with the immutable multipart upload parts.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
If the part that we're reading is corrupted and we end up
reading zero bytes, we need to exit, otherwise we'd just
loop forever.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Users have been seeing failures where rbd rm is half-done; could be
because of outstanding watches on the rbd_header object. The state
is that rbd_children no longer contains the child, but other pieces
remain; remove considers this a failure.
Fix: test for ENOENT from remove_child, and treat that as an ignorable
error and drive on. Simulate this in copy.sh by removing the
rbd_children object altogether, which also results in ENOENT return
from remove_child.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Previously, we asserted that a log entry with a divergent
prior_version must be a clone. Consider the following
case:
6'11(6'2) m foo
7'12(6'3) m bar
7'13(7'12) m bar
If this is merged with:
6'11(6'2) m foo
8'12(6'4) m baz
we will hit the assert. The correct behavior is simply to remove
the object as in the clone case.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Otherwise, we end up leaving snap hardlinks in the snapshot
index directories. This eventually results in an EEXIST error
when we attempt to re-link the clone into place during
recovery.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
If the given device is already mounted at the target location, do not
mount --move it again and create a bunch of dup entries in the /etc/mtab
and kernel mount table.
Signed-off-by: Sage Weil <sage@inktank.com>
Previously the snap_trimmer would continuously requeue itself until the
end of scrub. This degrades performance and fills up logs for No Good
Reason.
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Prod the kernel to refresh the partition table after we create one. The
partprobe program is packaged with parted, which we already use, so this
introduces no new dependency.
Signed-off-by: Sage Weil <sage@inktank.com>
If the disk has no valid label we get an error like
Error: /dev/sdi: unrecognised disk label
Assume any error we get is that and go with an id label of 1.
Signed-off-by: Sage Weil <sage@inktank.com>
Way back in fc869dee1e (v0.42) when we redid
the osd type encoding we forgot to make this conditionally encode the old
format for old clients. In particular, this means that kernel clients
will fail to decode the osdmap if there is a rados pool with a pool-level
snapshot defined.
Fixes: #3290
Signed-off-by: Sage Weil <sage@inktank.com>
There is one case where populate_obc_watchers gets called when the object
is missing: during a revert. And in that case we *should* do the populate,
since all that is getting reverted is the object version.
Fixes: #3405
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
Turn these into asserts. The only two callers are create_object_context()
and get_object_context(), and they only get called when the object is no
longer missing.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Bug #3142 appears to be caused by the following sequence:
- object X missing on primary and replica
- [assert-ver,watch], notify, unwatch requests come in, get deferred
- object is recovered on primary, !missing, create_object_context
- populate_obc_watchers() does nothing, since still degraded
- notify happens now (odd but ok?)
- replica recovered, !degraded
- watch skips bc of bad assert
- unwatch trips up on an assert because populate_obc_watchers never
ran
Fix this by populating the obc watcher when !missing, not when
!degraded. This conditional dates back to Sam's original watch/notify
cleanup in October 2011.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Instead of just ,. Currently "foo.com, bar.com" will fail because of the
space after the comma. This patches fixes that, and makes all delim
chars interchangeable.
Signed-off-by: Sage Weil <sage@inktank.com>
Previously, the messenger would queue messages for a destination that
didn't exist when you were a server; that changed a while back with the
wip-msgr merge (circa v0.52). The result is that when we force open
client sessions and queue messages, they are dropped on the floor and the
client--when it does connect--gets confusing stuff from the MDS.
Instead, explicitly queue and send these messages. Also, *always* send
via the Connection* instead of the inst.
Fixes: #2681
Signed-off-by: Sage Weil <sage@inktank.com>
CID 728080 (#1 of 1): Incorrect sizeof expression (BAD_SIZEOF)
Taking the size of pointer parameter "layout" is suspicious.
At (2): Non-static class member field "layout.fl_stripe_unit" is not initialized in this constructor nor in any functions that it calls.
At (4): Non-static class member field "layout.fl_stripe_count" is not initialized in this constructor nor in any functions that it calls.
At (6): Non-static class member field "layout.fl_object_size" is not initialized in this constructor nor in any functions that it calls.
At (8): Non-static class member field "layout.fl_cas_hash" is not initialized in this constructor nor in any functions that it calls.
At (10): Non-static class member field "layout.fl_object_stripe_unit" is not initialized in this constructor nor in any functions that it calls.
At (12): Non-static class member field "layout.fl_unused" is not initialized in this constructor nor in any functions that it calls.
CID 717206 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
At (14): Non-static class member field "layout.fl_pg_pool" is not initialized in this constructor nor in any functions that it calls.
Signed-off-by: Sage Weil <sage@inktank.com>
At (2): Non-static class member "readdir_offset" is not initialized in this constructor nor in any functions that it calls.
At (4): Non-static class member "readdir_end" is not initialized in this constructor nor in any functions that it calls.
At (6): Non-static class member "readdir_num" is not initialized in this constructor nor in any functions that it calls.
CID 717207 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
At (8): Non-static class member "tid" is not initialized in this constructor nor in any functions that it calls.
Signed-off-by: Sage Weil <sage@inktank.com>
CID 727992 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
At (2): Non-static class member "tag_timeout" is not initialized in this constructor nor in any functions that it calls.
Signed-off-by: Sage Weil <sage@inktank.com>
Fixes: #3296
Specifically, is host name string already has ':', then
don't try to append theport (swift auth).
backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Previously, ceph-disk-* would only let you use a journal that was a
file inside the OSD data directory. With this, you can do:
ceph-disk-prepare /dev/sdb /dev/sdb
to put the journal as a second partition on the same disk as the OSD
data (might save some file system overhead), or, more interestingly:
ceph-disk-prepare /dev/sdb /dev/sdc
which makes it create a new partition on /dev/sdc to use as the
journal. Size of the partition is decided by $osd_journal_size.
/dev/sdc must be a GPT-format disk. Multiple OSDs may share the same
journal disk (using separate partitions); this way, a single fast SSD
can serve as journal for multiple spinning disks.
The second use case currently requires parted, so a Recommends: for
parted has been added to Debian packaging.
Closes: #3078Closes: #3079
Signed-off-by: Tommi Virtanen <tv@inktank.com>
- drop xattr warning; this is not an issue with the leveldb stuff.
- the ext3 vs xattr discussion was somewhat inaccurate. also, no longer
relevant.
Signed-off-by: Sage Weil <sage@inktank.com>