The _merge_old_entry structure had trouble distinguishing between the
following cases:
missing: foo, 1,1
merge_old_entry modify 1,1 0,0
merge_old_entry modify 1,2 1,1
and
merge_old_entry modify 1,2 1,1
In the first case, we should end up with foo removed from missing
at the end. In the second, we need foo added to missing at 1,1.
It's far simpler to present all of the divergent entries for a single
object at once.
Signed-off-by: Samuel Just <sam.just@inktank.com>
In an effort to reduce fragmentation, prefix every rbd write with
a CEPH_OSD_OP_SETALLOCHINT osd op with an expected_write_size value set
to the object size (1 << order). Backwards compatibility is taken care
of on the osd side.
"The CEPH_OSD_OP_SETALLOCHINT hint is durable, in that it's enough to
do it once. The reason every rbd write is prefixed is that rbd doesn't
explicitly create objects and relies on writes creating them
implicitly, so there is no place to stick a single hint op into. To
get around that we decided to prefix every rbd write with a hint (just
like write and setattr ops, hint op will create an object implicitly if
it doesn't exist)."
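
As a rough illustration of the hint value described above, the sketch below
computes the object size from an image order. The helper name
expected_write_size_for_order is hypothetical and not taken from librbd; only
the (1 << order) relationship comes from this message.

```cpp
#include <cstdint>

// Hypothetical helper: the hint value is the rbd object size, 1 << order.
// The caller is assumed to pass order < 64; a larger shift is undefined.
constexpr uint64_t expected_write_size_for_order(unsigned order) {
  return uint64_t{1} << order;
}
```

For example, an order of 22 corresponds to 4 MiB objects.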
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Add a new config option, filestore_max_alloc_hint_size, to cap
SETALLOCHINT hint size. The value is in bytes; the default is
1 megabyte.
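
A minimal sketch of the capping behaviour, assuming the cap is applied as a
simple minimum. The names cap_alloc_hint and kDefaultMaxAllocHintSize are
illustrative and do not come from the filestore code.

```cpp
#include <cstdint>

// Assumed default from the text above: 1 megabyte, expressed in bytes.
constexpr uint64_t kDefaultMaxAllocHintSize = 1ULL << 20;

// Hypothetical helper: clamp a requested hint to the configured maximum.
constexpr uint64_t cap_alloc_hint(uint64_t requested,
                                  uint64_t max_hint = kDefaultMaxAllocHintSize) {
  return requested < max_hint ? requested : max_hint;
}
```

Under this assumption, a 4 MiB hint from rbd would be reduced to 1 MiB with
the default setting, while smaller hints pass through unchanged.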
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Introduce XfsFileStoreBackend class, currently the only filestore
backend implementing SETALLOCHINT op. This commit adds a build-time
dependency on libxfs as xfs-specific ioctl (XFS_IOC_FSSETXATTR /
XFS_XFLAG_EXTSIZE) is used to implement the new set_alloc_hint()
method.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Refactor FS detection checks in FileStore::_detect_fs() so that they
look the same as the ones in FileStore::mkfs(). This is in preparation
for adding XfsFileStoreBackend class.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
This is primarily for librbd/krbd's benefit and is supposed to combat
fragmentation:
"... knowing that rbd images have a 4m size, librbd can pass a hint
that will let the osd do the xfs allocation size ioctl on new files so
that they are allocated in 1m or 4m chunks. We've seen cases where
users with rbd workloads have very high levels of fragmentation in xfs
and this would mitigate that and probably have a pretty nice
performance benefit."
SETALLOCHINT is considered advisory, so our backwards compatibility
mechanism here is to set FAILOK flag for all SETALLOCHINT ops.
xfs is hooked up in the subsequent commits.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Check that rados put immediately followed by rados get retrieves exactly
the same content.
http://tracker.ceph.com/issues/7423 refs #7423
Signed-off-by: Loic Dachary <loic@dachary.org>
When reading from a replicated pool, trying to read more than the object
size results in a short read that does not go beyond the object size. In
erasure coded pools, objects are padded and the read will return more
bytes than the object actually contains.
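
The short-read behaviour described for replicated pools can be sketched as
follows. This only models the expected length of a read; the function name
short_read_length is hypothetical and the sketch is not the fix itself.

```cpp
#include <cstdint>

// Number of bytes a read should return when it must not go beyond the
// logical object size (the replicated-pool behaviour described above).
constexpr uint64_t short_read_length(uint64_t object_size,
                                     uint64_t offset,
                                     uint64_t requested) {
  return offset >= object_size
             ? 0  // reading at or past the end returns nothing
             : (requested < object_size - offset ? requested
                                                 : object_size - offset);
}
```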
http://tracker.ceph.com/issues/7423 fixes #7423
Signed-off-by: Loic Dachary <loic@dachary.org>
In the event that mod_desc.bl contains pointers into a large
message buffer, we'd otherwise end up keeping around the entire
MOSDECSubOpWrite which created each log entry.
Fixes: #7539
Signed-off-by: Samuel Just <sam.just@inktank.com>
The !tracking_enabled branch actually had a leak which was unreachable
since the caller does the check for tracking_enabled.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Otherwise, clear_data on MOSDOp will leave essentially
all of the buffers intact. This is a problem since the
OpTracker mechanism relies on being able to keep the message
around without keeping around the data.
Signed-off-by: Samuel Just <sam.just@inktank.com>
This was broken by 40bdcb88. The 'acting' array had
the up_primary and acting_primary appended.
Fixes: #7572
Signed-off-by: John Spray <john.spray@inktank.com>
CID 717359 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member "bucket_exists" is not
initialized in this constructor nor in any functions that it calls.
Set bucket_exists to false in req_state::req_state().
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
CID 1160848 (#1 of 1): Uninitialized scalar variable (UNINIT)
uninit_use: Using uninitialized value "best".
Initialize 'best' to -1 (per the code logic it will always be set to at
least 0) to silence Coverity.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Fix type handling in dump_stuck_pg_stats. If the type doesn't match a
known PGMap::STUCK_* type, print a message and return directly from
the function.
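
The early-return pattern can be sketched roughly as below. The enum, the
accepted strings and both function names are stand-ins chosen for
illustration and are assumptions, not the actual PGMap interface.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Stand-in for the PGMap::STUCK_* values (assumed set, for illustration).
enum class StuckType { Inactive, Unclean, Stale };

// Returns true and sets *out only for a recognised name;
// on failure *out is left untouched.
inline bool parse_stuck_type(const std::string& name, StuckType* out) {
  assert(out != nullptr);
  if (name == "inactive") *out = StuckType::Inactive;
  else if (name == "unclean") *out = StuckType::Unclean;
  else if (name == "stale") *out = StuckType::Stale;
  else return false;
  return true;
}

// Hypothetical caller: bail out before the value could be read uninitialized.
inline bool dump_stuck_sketch(const std::string& name, std::ostream& os) {
  StuckType t;
  if (!parse_stuck_type(name, &t)) {
    os << "unknown stuck type: " << name << "\n";
    return false;
  }
  os << "dumping stuck type " << static_cast<int>(t) << "\n";
  return true;
}
```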
CID 1030132 (#2 of 2): Uninitialized scalar variable (UNINIT)
uninit_use_in_call: Using uninitialized value "stuck_type" when calling
"PGMap::dump_stuck(ceph::Formatter *, PGMap::StuckPG, utime_t) const"
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
CID 716921 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "dir" to function
"operator <<(std::ostream &, CDir &)", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
CID 716990 (#1 of 1): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "cur" when calling
"MDCache::replicate_inode(CInode *, int, ceph::bufferlist &)"
Add assert to check for return value from get_inode() as done in other places.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
CID 1135931 (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable "ondisk" going out of scope leaks the storage it
points to.
CID 1135932 (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable "onreadable" going out of scope leaks the storage
it points to.
CID 1135933 (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable "onreadable_sync" going out of scope leaks the
storage it points to.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
CID 1188154 (#2 of 2): Resource leak (RESOURCE_LEAK)
overwrite_var: Overwriting "op" in "op = rados_create_read_op()" leaks
the storage that "op" points to.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
CID 1160833 (#3 of 3): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable "ioctx" going out of scope leaks the storage
it points to.
CID 1160835 (#3 of 3): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable "ioctx" going out of scope leaks the storage
it points to.
CID 1188156 (#5 of 5): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable "ioctx" going out of scope leaks the storage
it points to.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
CID 1188126 (#1 of 1): Unchecked return value (CHECKED_RETURN)
2. check_return: Calling function "ObjectStore::stat(coll_t,
ghobject_t const &, stat *, bool)" without checking return value
(as is done elsewhere 8 out of 9 times).
3. unchecked_value: No check of the return value of "this->store->stat(
coll_t(this->cid), hoid, &buf, false)".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
CID 1188131 (#1 of 1): Division or modulo by zero (DIVIDE_BY_ZERO)
divide_by_zero: In expression "lower_sum * 1000000UL / total", division
by expression "total" which may be zero has undefined behavior
Add a check for a non-zero total.
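
A minimal sketch of such a guard, using the expression from the Coverity
report. The function name scaled_ratio is hypothetical, and returning 0 when
total is zero is an assumption; the actual fallback in the patch may differ.

```cpp
#include <cstdint>

// Guarded form of "lower_sum * 1000000UL / total".
// Assumption: a zero total yields 0 rather than dividing.
constexpr uint64_t scaled_ratio(uint64_t lower_sum, uint64_t total) {
  return total == 0 ? 0 : lower_sum * 1000000UL / total;
}
```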
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
CID 1188145 (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable "cb" going out of scope leaks the storage it points to.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>