python3 is not a hard requirement to build ceph, so make it optional.
add an option named "WITH_PYTHON3" which accepts ON, OFF, or CHECK.
Fixes: http://tracker.ceph.com/issues/17103
Signed-off-by: Kefu Chai <kchai@redhat.com>
Since kernel version 2.6, the Linux kernel supports 32-bit integers,
and thus the limit is no longer 65536.
By setting this to a higher default value we make sure that all users
will be allowed to create snapshots in the future by default.
Signed-off-by: Wido den Hollander <wido@42on.com>
Currently bluefs_buffered_io controls whether buffered or direct I/O
is used for writes. But for the logs of RocksDB and BlueFS, only
direct I/O is needed, regardless of whether bluefs_buffered_io is
true or false.
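A minimal sketch of the intended policy, not the actual BlueFS code: log files always get O_DIRECT, while data writes honor the buffered-io toggle. `write_flags()` is a hypothetical helper, and O_DIRECT is Linux-specific.

```cpp
#include <fcntl.h>

// Compute open(2) flags for a write: logs bypass the page cache
// unconditionally; other writes follow the buffered-io setting.
int write_flags(bool is_log, bool buffered_io) {
  int flags = O_WRONLY | O_CREAT;
  if (is_log || !buffered_io)
    flags |= O_DIRECT;   // logs always use direct I/O
  return flags;
}
```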
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Make sure we have N bytes of append_buffer reserved. On
a new or cleared list, this allocates exactly that much
runway, allowing us to control memory usage.
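The reserve idea can be sketched with a simple byte-vector-backed append list; the names are illustrative and not the actual bufferlist API (a real allocator may round the capacity up).

```cpp
#include <cstddef>
#include <vector>

struct AppendList {
  std::vector<char> append_buffer;

  // Ensure at least n bytes of unused runway are allocated, so
  // subsequent appends need not grow the allocation.
  void reserve(size_t n) {
    append_buffer.reserve(append_buffer.size() + n);
  }

  // Bytes of allocated but unused space.
  size_t runway() const {
    return append_buffer.capacity() - append_buffer.size();
  }
};
```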
Signed-off-by: Sage Weil <sage@redhat.com>
Compaction is triggered from sync_metadata. If one compaction is
in progress and another thread also calls sync_metadata, do not
trigger a second async compaction!
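The dedup logic can be sketched with an atomic in-progress flag (illustrative, not the actual BlueFS implementation): only the caller that flips the flag queues a compaction, and concurrent callers skip.

```cpp
#include <atomic>

struct Compactor {
  std::atomic<bool> compacting{false};
  int started = 0;   // stand-in counter for queued async compactions

  void sync_metadata() {
    bool expected = false;
    // Only the first caller wins the compare-exchange and starts a
    // compaction; anyone racing with it sees the flag set and skips.
    if (compacting.compare_exchange_strong(expected, true))
      ++started;
  }

  void compaction_done() { compacting.store(false); }
};
```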
Signed-off-by: Sage Weil <sage@redhat.com>
We have fixed length/order for integer fields and use !
to terminate string fields, so there is no need to use
any extra separators, which is simpler as well as faster.
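A toy illustration of the scheme: fixed-width hex for integers (so lexical key order matches numeric order) and `!` terminating string fields. The helper name is hypothetical, and a real encoder would also escape strings that may themselves contain `!`.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Encode an integer field at fixed width followed by a string field
// terminated by '!'; no other separators are needed.
std::string encode_key(uint64_t id, const std::string& name) {
  char hex[17];
  snprintf(hex, sizeof(hex), "%016llx", (unsigned long long)id);
  return std::string(hex) + name + "!";
}
```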
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
These were taking min_alloc_size, but this can change
across mounts; better to use the logical blob length
instead (that's what we want anyway!).
Signed-off-by: Sage Weil <sage@redhat.com>
We need to handle objects written during previous mounts
that may have had a smaller min_alloc_size. Use
block_size, which is a safe lower bound.
Signed-off-by: Sage Weil <sage@redhat.com>
We could bump the _max value for a TransContext in its
prepare state, have it wait for a long time on IO, and
let another txc allocate and commit something with
an id higher than the previous max.
Fix this first by pushing the max ids into the
TransContext where we can deal with them at commit time,
and then making _kv_sync_thread bump the committed
max in a safe way.
Note that this will need to change if/when we do
these commits in parallel.
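The safe bump at commit time can be sketched as a forward-only CAS loop (illustrative names, not the actual BlueStore code): a txc that stalled on IO cannot drag the committed max backwards, because the committed value only ever moves up.

```cpp
#include <atomic>
#include <cstdint>

// Raise the committed max to txc_max if (and only if) it is larger.
void bump_committed_max(std::atomic<uint64_t>& committed,
                        uint64_t txc_max) {
  uint64_t cur = committed.load();
  // On CAS failure, cur is reloaded and the bound is re-checked, so
  // the value is monotonically non-decreasing under concurrency.
  while (txc_max > cur &&
         !committed.compare_exchange_weak(cur, txc_max)) {
  }
}
```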
Signed-off-by: Sage Weil <sage@redhat.com>
Rewrote much of the persistence of onode metadata. The
highlights:
- extents and blobs stored together (the blob with the
first referencing extent).
- extents sharded across multiple k/v keys
- if a blob is referenced from multiple shards, it's
stored in the onode key (called a "spanning blob").
- when we clone a blob we copy the metadata, but mark
it shared and put (just) the ref_map on the underlying
blocks in a shared_blob key. at this point we also
assign a globally unique id (sbid = shared blob id)
so the key has a unique name.
- we instantiate a SharedBlob in memory regardless of
whether we need to load the ref_map (which is only
needed for deallocations!). the BufferSpace is
attached to this SharedBlob so we get unified caching
across clones.
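The layout above can be sketched with illustrative types; these are simplified stand-ins, not the actual Ceph structures.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

struct SharedBlob {                   // one per shared_blob key, named by sbid
  uint64_t sbid = 0;                  // globally unique shared blob id
  std::map<uint64_t,int> ref_map;     // refs on underlying blocks; loaded
                                      // lazily (only needed to deallocate)
  // the BufferSpace hangs off this object, giving unified caching
  // across clones
};

struct Blob {
  bool shared = false;                // set when the blob is cloned
  std::shared_ptr<SharedBlob> shared_blob;  // instantiated in memory even
                                            // if the ref_map isn't loaded
};

struct Extent {
  uint64_t logical_offset;
  uint64_t length;
  std::shared_ptr<Blob> blob;         // blob stored with its first
                                      // referencing extent
};

struct Onode {
  // blobs referenced from multiple shards live here ("spanning blobs");
  // the extents themselves are sharded across multiple k/v keys
  std::vector<std::shared_ptr<Blob>> spanning_blobs;
};
```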
Signed-off-by: Sage Weil <sage@redhat.com>