57594 Commits

Author SHA1 Message Date
Sage Weil
718d6c2ab1 Merge pull request #11009 from liewegas/wip-bluestore-keys
os/bluestore: make onode keys more efficient (and sort correctly)

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
2016-09-08 09:02:07 -05:00
Kefu Chai
7d7a984846 cmake: make py3 a nice-to-have
python3 is not a hard requirement to build ceph, so make it optional.
add an option named "WITH_PYTHON3" which accepts ON, OFF, or CHECK.

Fixes: http://tracker.ceph.com/issues/17103
Signed-off-by: Kefu Chai <kchai@redhat.com>
2016-09-08 21:49:45 +08:00
Sage Weil
2313079cb4 Merge pull request #11017 from ifed01/wip-bluestore-cleanup
os/bluestore: remove some copy-pastes

Reviewed-by: Sage Weil <sage@redhat.com>
2016-09-08 08:35:47 -05:00
Igor Fedotov
ccd9b5ea1e os/bluestore: remove some copy-pastes
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-09-08 13:22:08 +00:00
Igor Fedotov
d1ab24b041 Merge pull request #11011 from liewegas/wip-bluestore-inline-bl
os/bluestore: fix a few memory utilization leaks and wasters

Reviewed-by: Igor Fedotov <ifedotov@mirantis.com>
2016-09-08 16:17:00 +03:00
Wido den Hollander
b8a530e3d7 mds: Set mds_snap_max_uid to 4294967294
Since version 2.6, the Linux kernel has supported 32-bit UIDs,
so the limit is no longer 65536.

By setting this to a higher default value we make sure that all users
will be allowed to create snapshots in the future by default.

Signed-off-by: Wido den Hollander <wido@42on.com>
2016-09-08 14:49:25 +02:00
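The new default is one below the largest 32-bit unsigned value. The following is a minimal sketch of how such a limit might be applied, assuming the check is a simple upper bound on the caller's UID. The names `kSnapMaxUid` and `may_create_snapshot` are hypothetical and are not taken from the MDS code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// New default value quoted in the commit title above.
constexpr std::uint64_t kSnapMaxUid = 4294967294ULL;

// The value is exactly one below the largest 32-bit unsigned integer.
static_assert(kSnapMaxUid == std::numeric_limits<std::uint32_t>::max() - 1ULL,
              "default should be UINT32_MAX - 1");

// Hypothetical helper (an assumption, not the MDS implementation):
// a UID may create snapshots when it does not exceed the configured maximum.
inline bool may_create_snapshot(std::uint64_t uid,
                                std::uint64_t max_uid = kSnapMaxUid) {
  return uid <= max_uid;
}
```

Under the old 65536 limit, a user with UID 70000 would be rejected by this check; with the new default the same user passes.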
Jianpeng Ma
7d1d689a62 os/bluestore/BlueFS: For logs of rocksdb & bluefs only use directio.
Currently bluefs_buffered_io controls whether writes are buffered or use
directio. However, the logs of rocksdb & bluefs only need directio,
whether bluefs_buffered_io is true or false.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
2016-09-08 19:40:41 +08:00
John Spray
88996fb876 Merge pull request #10957 from ukernel/wip-17172
client: properly set inode number of created inode in replay request

Reviewed-by: John Spray <john.spray@redhat.com>
2016-09-08 10:30:45 +01:00
Kefu Chai
b1f417f25d Merge pull request #9936 from onyb/wip-ceph-disk-py3
ceph-disk: Compatibility fixes for Python 3

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2016-09-08 17:15:43 +08:00
xie xingguo
a776f5a68c build: drop dryrun of autogen.sh from run-cmake-check.sh script
Introduced by https://github.com/ceph/ceph/pull/11007.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2016-09-08 14:14:22 +08:00
Sage Weil
265730aa5a os/FuseStore: only flush if dirty
No need to rewrite the object unless it was changed.

This partially fixes truncate.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 18:14:16 -04:00
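The "only flush if dirty" idea in the commit above can be sketched with a dirty flag. This is an illustration under stated assumptions, not the FuseStore code: the class `CachedObject` and its members are hypothetical, and a string copy stands in for rewriting the object.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical in-memory copy of an object held open by a FUSE-like layer.
class CachedObject {
 public:
  // Modify the cached contents and remember that they changed.
  void write(std::string data) {
    data_ = std::move(data);
    dirty_ = true;
  }

  // Rewrite the stored object only if it changed since the last flush.
  // Returns true when a rewrite actually happened.
  bool flush() {
    if (!dirty_) return false;
    backing_ = data_;  // stand-in for rewriting the object in the store
    ++rewrites_;
    dirty_ = false;
    return true;
  }

  int rewrites() const { return rewrites_; }
  const std::string& stored() const { return backing_; }

 private:
  std::string data_;
  std::string backing_;
  bool dirty_ = false;
  int rewrites_ = 0;
};
```

Calling `flush()` twice after a single `write()` rewrites the object once; the second call returns false.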
Sage Weil
b6870a618b qa/workunits/objectstore: fix test_fuse.sh
We switched from @ to # a while back.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 18:14:15 -04:00
Sage Weil
662136f76a os/bluestore: do not waste memory on cached encoded blobs
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 16:46:57 -04:00
Sage Weil
9035883c33 buffer: implement buffer::list::reserve(n)
Make sure we have N bytes of append_buffer reserved. On
a new or cleared list, this allocates exactly that much
runway, allowing us to control memory usage.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 16:45:29 -04:00
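A rough sketch of what a `reserve(n)` on an append buffer looks like, assuming a single contiguous buffer. `AppendList` is a hypothetical class built on `std::vector<char>`, not `buffer::list` itself. Note one difference from the commit message: `std::vector::reserve` only guarantees at least the requested capacity, not exactly that much.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a list with a single append buffer.
class AppendList {
 public:
  // Ensure at least n bytes can be appended without another allocation.
  void reserve(std::size_t n) {
    if (unused() < n) buf_.reserve(buf_.size() + n);
  }

  // Append len bytes starting at data.
  void append(const char* data, std::size_t len) {
    buf_.insert(buf_.end(), data, data + len);
  }

  // Drop the contents and release the append buffer's storage.
  void clear() { std::vector<char>().swap(buf_); }

  std::size_t size() const { return buf_.size(); }
  std::size_t unused() const { return buf_.capacity() - buf_.size(); }

 private:
  std::vector<char> buf_;
};
```

Reserving up front on a new or cleared list lets the caller choose the allocation size instead of relying on a default growth policy.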
Sage Weil
9cd042a566 os/bluestore/BlueFS: fix Dir memory leak
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 15:58:26 -04:00
Sage Weil
dbe23c94c0 os/bluestore/BlueFS: do not start racing async compaction
Compaction is triggered from sync_metadata.  If one compaction is
in progress and another thread also calls sync_metadata, do not
trigger a second async compaction!

Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 15:29:30 -04:00
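One common way to keep a second caller from starting a compaction that is already running is an atomic "in progress" flag. The sketch below assumes that approach; `CompactionGate` is a hypothetical name, and BlueFS may well use a lock or a different mechanism.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical guard: at most one async compaction may be in flight.
class CompactionGate {
 public:
  // Returns true for exactly one caller until finish() is called.
  bool try_begin() {
    bool expected = false;
    return in_progress_.compare_exchange_strong(expected, true);
  }

  // Called when the running compaction completes.
  void finish() { in_progress_.store(false); }

 private:
  std::atomic<bool> in_progress_{false};
};
```

A thread entering the sync path would call `try_begin()` and skip triggering compaction when it returns false.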
Sage Weil
b9227e3f80 unittest_bluestore_types: resurrect blob and extent_map unit tests
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 15:26:06 -04:00
Sage Weil
b39df47fb3 os/bluestore: bits for unit tests
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 15:26:06 -04:00
Sage Weil
0ac8ab9dc8 os/bluestore: remove faulted debug hackery
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 15:26:06 -04:00
Sage Weil
b53f405e38 os/bluestore: small put_ref cleanup
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 14:30:22 -04:00
Sage Weil
f8abb628a4 Merge pull request #10978 from chhabaramesh/extent_alloc
os/bluestore: Hint based allocation in bitmap Allocator

Reviewed-by: Sage Weil <sage@redhat.com>
2016-09-07 13:00:42 -05:00
Kefu Chai
4508448889 Merge pull request #11007 from liewegas/wip-autotools-must-die
remove autotools

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Ali Maredia <amaredia@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2016-09-08 01:06:07 +08:00
Sage Weil
ef12b5132b os/bluestore: encode shard id in single byte
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 13:00:29 -04:00
Sage Weil
df17cc9be0 os/bluestore: fix key+name sort wonkiness
We want to unconditionally start with the name or key, *then* do < > or =
(and if not =, a trailing name).

Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 13:00:29 -04:00
xie xingguo
55c8143ba9 os/bluestore: drop unnecessary separator between fields
We have fixed length/order for integer fields and use !
to terminate string fields, so there is no need to use
any extra separators, which is simpler as well as faster.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2016-09-07 13:00:29 -04:00
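The layout described in the commit above (fixed-width integers, `!`-terminated strings, no extra separators) can be sketched as follows. This is an illustration only: the field names `pool`, `name` and `snap` are hypothetical, integers are assumed to be big-endian so that byte order matches numeric order, and strings are assumed not to contain `!` (a real encoder would need escaping).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append a 32-bit integer as 4 big-endian bytes (fixed length, sortable).
inline void append_u32(std::string& key, std::uint32_t v) {
  for (int shift = 24; shift >= 0; shift -= 8)
    key.push_back(static_cast<char>((v >> shift) & 0xff));
}

// Append a string field followed by its '!' terminator.
// Assumption: s contains no '!' characters.
inline void append_str(std::string& key, const std::string& s) {
  key += s;
  key.push_back('!');
}

// Hypothetical key: integer, string, integer, with nothing in between.
inline std::string make_key(std::uint32_t pool, const std::string& name,
                            std::uint32_t snap) {
  std::string key;
  append_u32(key, pool);
  append_str(key, name);
  append_u32(key, snap);
  return key;
}
```

Because each integer field has a known width and each string field carries its own terminator, a decoder always knows where the next field starts.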
xie xingguo
18c348de66 os/bluestore: remove never reachable asserts
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2016-09-07 13:00:19 -04:00
xie xingguo
420150d9dc os/bluestore: don't dirty onode if its size is already at desired offset
for truncate operation.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2016-09-07 13:00:19 -04:00
Sage Weil
c953d27fe6 os/bluestore: tunable prealloc size for ExtentMap inline_bl
Otherwise we eat 4KB for every Onode.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 12:41:45 -04:00
Sage Weil
b0d1f865fb buffer: clear append_buffer on clear()
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 12:41:45 -04:00
Sage Weil
bc0de8bc7a Merge pull request #11008 from liewegas/wip-libaio
test/objectstore/CMakeLists.txt: fix libaio conditional
2016-09-07 11:41:27 -05:00
Sage Weil
578668f128 test/objectstore/CMakeLists.txt: fix libaio conditional
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 12:40:53 -04:00
Sage Weil
fba798dcad remove autotools
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 11:50:14 -04:00
Sage Weil
68cf9d82c0 Merge pull request #10963 from liewegas/wip-bluestore-sharded-extent-map
os/bluestore: shard extent map

Reviewed-by: Allen Samuels <allen.samuels@sandisk.com>
2016-09-07 10:36:34 -05:00
Sage Weil
fad3d99853 os/bluestore: assert shared blob cache cleared on split
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 11:26:07 -04:00
Sage Weil
2d8a145d02 os/bluestore: dump some stats after fsck
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 11:26:07 -04:00
Sage Weil
f69af0b885 os/bluestore: instrument onode reshard events
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 11:26:07 -04:00
Sage Weil
3fb6c5c18c os/bluestore: instrument transaction count
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 11:26:07 -04:00
Sage Weil
e152e97ce1 os/bluestore: instrument big/small writes
Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 11:26:06 -04:00
Sage Weil
2df9aa8e79 os/bluestore: make blob_t unused helpers use logical length
These were taking min_alloc_size, but this can change
across mounts; better to use the logical blob length
instead (that's what we want anyway!).

Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 11:26:06 -04:00
Sage Weil
dcc58c9b93 os/bluestore: use block_size for allocator unit
We need to handle objects written during previous mounts
that may have had a smaller min_alloc_size.  Use
block_size, which is a safe lower bound.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 11:26:06 -04:00
Sage Weil
6e251cfd47 os/bluestore: fix fsck used_block bitmap
This has to be block_size bits because min_alloc_size
can vary over mounts.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 11:26:06 -04:00
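The two commits above both rely on the same observation: an extent written under a smaller min_alloc_size need not be aligned to a larger one, but it is still aligned to block_size. A small sketch with hypothetical helper names and example sizes (4 KiB blocks, 64 KiB min_alloc_size), chosen for illustration only:

```cpp
#include <cassert>
#include <cstdint>

// True when value is a multiple of unit.
constexpr bool is_aligned(std::uint64_t value, std::uint64_t unit) {
  return unit != 0 && value % unit == 0;
}

// Number of bitmap bits needed to track a device at the given granularity.
constexpr std::uint64_t bitmap_bits(std::uint64_t device_size,
                                    std::uint64_t unit) {
  return (device_size + unit - 1) / unit;
}
```

For example, an extent at offset 12288 written when min_alloc_size was 4096 is not a multiple of 65536, so a bitmap with one bit per 65536 bytes cannot describe it exactly, while one with one bit per 4096-byte block can.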
Sage Weil
7f35725fdb os/bluestore: optimize compress_extent_map
Only examine the range we just wrote to (and to the left
and right).

Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 11:26:06 -04:00
Sage Weil
ec58b21acd os/bluestore: make nid and blobid allocation less racy
We could bump the _max value for a TransContext in its
prepare state, have it wait for a long time on IO, and
let another txc allocate and commit something with
an id higher than the previous max.

Fix this first by pushing the max ids into the
TransContext where we can deal with them at commit time,
and then making _kv_sync_thread bump the committed
max in a safe way.

Note that this will need to change if/when we do
these commits in parallel.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 11:26:05 -04:00
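The fix described above can be sketched roughly like this: each transaction remembers the highest id it allocated, and a single commit thread raises the committed maximum when a committing transaction needs it. `NidAllocator`, `on_commit` and the preallocation step are hypothetical names and simplifications, not the BlueStore code.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>

struct TransContext {
  std::uint64_t last_nid = 0;  // highest id allocated by this transaction
};

class NidAllocator {
 public:
  explicit NidAllocator(std::uint64_t prealloc) : prealloc_(prealloc) {}

  // May be called by any thread that is preparing a transaction.
  std::uint64_t allocate(TransContext& txc) {
    std::uint64_t id = ++last_;
    txc.last_nid = std::max(txc.last_nid, id);
    return id;
  }

  // Called only from the single commit thread. Returns true when a new
  // maximum has to be written out together with this transaction.
  bool on_commit(const TransContext& txc) {
    if (txc.last_nid <= committed_max_) return false;
    committed_max_ = txc.last_nid + prealloc_;
    return true;
  }

  std::uint64_t committed_max() const { return committed_max_; }

 private:
  std::atomic<std::uint64_t> last_{0};
  std::uint64_t committed_max_ = 0;
  std::uint64_t prealloc_;
};
```

Because the maximum is only raised at commit time, a transaction that stalls on IO can no longer leave a stale maximum behind a later transaction with higher ids. As the commit message notes, this relies on commits being serialized.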
Sage Weil
4fbb1efd1d os/bluestore: shard extent map
Rewrote much of the persistence of onode metadata.  The
highlights:

 - extents and blobs stored together (the blob with the
   first referencing extent).
 - extents sharded across multiple k/v keys
 - if a blob is referenced from multiple blobs, it's
   stored in the onode key (called a "spanning blob").
 - when we clone a blob we copy the metadata, but mark
   it shared and put (just) the ref_map on the underlying
   blocks in a shared_blob key.  at this point we also
   assign a globally unique id (sbid = shared blob id)
   so the key has a unique name.
 - we instantiate a SharedBlob in memory regardless of
   whether we need to load the ref_map (which is only
   needed for deallocations!).  the BufferSpace is
   attached to this SharedBlob so we get unified caching
   across clones.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-09-07 11:26:05 -04:00
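To illustrate the sharding idea in the commit above, the sketch below maps extents to shards by logical offset and finds blobs that would need to live in the onode key. It assumes a "spanning blob" is one referenced by extents in more than one shard; that reading, the `Extent` struct and both function names are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical extent: where it starts logically and which blob it uses.
struct Extent {
  std::uint64_t logical_offset;
  int blob_id;
};

// shard_starts holds the sorted starting offsets of each shard.
// Returns the index of the shard that covers the given offset.
inline std::size_t shard_of(const std::vector<std::uint64_t>& shard_starts,
                            std::uint64_t offset) {
  auto it = std::upper_bound(shard_starts.begin(), shard_starts.end(), offset);
  if (it == shard_starts.begin()) return 0;
  return static_cast<std::size_t>(it - shard_starts.begin()) - 1;
}

// Blobs referenced from more than one shard.
inline std::set<int> spanning_blobs(
    const std::vector<Extent>& extents,
    const std::vector<std::uint64_t>& shard_starts) {
  std::map<int, std::set<std::size_t>> shards_per_blob;
  for (const Extent& e : extents)
    shards_per_blob[e.blob_id].insert(shard_of(shard_starts, e.logical_offset));

  std::set<int> result;
  for (const auto& entry : shards_per_blob)
    if (entry.second.size() > 1) result.insert(entry.first);
  return result;
}
```

With shards starting at offsets 0 and 65536, a blob used by extents at 32768 and 65536 is referenced from both shards, so it is reported as spanning.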
Kefu Chai
b307a64330 Merge pull request #10973 from wjwithagen/wip-wjw-freebsd-cmake-excludes-2
cmake: FreeBSD specific excludes in CMakeLists.txt

Reviewed-by: Kefu Chai <kchai@redhat.com>
2016-09-07 23:19:47 +08:00
Ramesh Chander
3c06717ac6 Hint argument in alloc_blocks + test case changes
Signed-off-by: Ramesh Chander <Ramesh.Chander@sandisk.com>
2016-09-07 08:05:48 -07:00
Ramesh Chander
89afb56f7d remove wrap argument and handle in wrappers
Signed-off-by: Ramesh Chander <Ramesh.Chander@sandisk.com>
2016-09-07 08:05:48 -07:00
Ramesh Chander
2eed1ef196 hint in extent_alloc code
Signed-off-by: Ramesh Chander <Ramesh.Chander@sandisk.com>
2016-09-07 08:05:48 -07:00
John Spray
6594022072 Merge pull request #10996 from jcsp/wip-16973
mds: log path with CDir damage messages

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2016-09-07 15:43:14 +01:00
Sage Weil
4fe06e9509 Merge pull request #10928 from stiopaa1/rbd_mirror_imagesyncthrottler
rbd_mirror/ImageSyncThrottler: move struct to .cc

Reviewed-by: Sage Weil <sage@redhat.com>
2016-09-07 09:23:02 -05:00