Commit Graph

54178 Commits

Author SHA1 Message Date
Igor Fedotov
e2f6a66ded os/bluestore: Enables CoW for cloning in bluestore for the store test
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:40:49 -04:00
Igor Fedotov
c7ed3aa7a2 os/bluestore: Fixes Bnode serialization/deserialization and removes legacy Bnode::ref_map
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:40:49 -04:00
Igor Fedotov
b0981b3391 ceph_test_objectstore: extends SimpleObjectTest with a case where writes land on neighboring csum blocks, to check for a potential alignment issue
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:40:49 -04:00
Igor Fedotov
5361cb887f os/bluestore: Removes legacy block_size retrieval
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:40:49 -04:00
Sage Weil
326bb0f865 os/bluestore: use WriteContext and do_alloc_write for _do_write_small
Kill some mostly-duplicated code

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:49 -04:00
Sage Weil
eb977e6ba3 os/bluestore: consolidate WriteContext items into a write_item
Also include b_off in there.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:49 -04:00
Sage Weil
3d49c2eb57 os/bluestore: avoid unnecessary write_onode calls
_wctx_finish callers always write the onode; we only need to worry about
our changes to the bnode.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:49 -04:00
Sage Weil
b0cabb78db os/bluestore: drop unused _pad_* methods
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:49 -04:00
Sage Weil
24578bc8f8 os/bluestore: drop unused _pad_zeros args
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:49 -04:00
Sage Weil
a9a5e63d99 os/bluestore: fix offset skew check
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:48 -04:00
Sage Weil
ea1a787c65 os/bluestore: ~0x -> ~
e.g., 0x432da000~1000 instead of 0x432da000~0x1000

I think it's sufficiently clear that the value after ~ has the same
base as the first value, and it's easier to read, with less text.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:48 -04:00
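A minimal sketch of the new notation, assuming a plain ostream helper (the function name is hypothetical, not the code this commit touches): both offset and length are printed in hex, with the 0x prefix appearing only once.

  #include <cstdint>
  #include <iostream>
  #include <sstream>
  #include <string>

  // Hypothetical helper illustrating the new offset~length style: both
  // values are hex, but the 0x prefix appears only once, at the front.
  static std::string pretty_extent(uint64_t offset, uint64_t length) {
    std::ostringstream ss;
    ss << "0x" << std::hex << offset << "~" << length;
    return ss.str();
  }

  int main() {
    // Prints "0x432da000~1000" rather than the old "0x432da000~0x1000".
    std::cout << pretty_extent(0x432da000, 0x1000) << "\n";
  }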
Igor Fedotov
183db05a35 compressor: Extends the decompressor interface to be able to provide the compressed data length.
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:40:48 -04:00
Sage Weil
fe6aaca1db os/bluestore: compress on write
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:48 -04:00
Sage Weil
502473a95c os/bluestore: do not partially deallocate compressed blobs
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:48 -04:00
Sage Weil
931264ecab os/bluestore: _do_write_big: limit size of blobs based on compression mode
We may want to compress in smaller chunks based on hints/policy.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:48 -04:00
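A minimal sketch of the chunking idea, with hypothetical names; it only shows the cap on blob length when compression will be applied.

  #include <algorithm>
  #include <cstdint>

  // Hypothetical policy helper: when a big write will be compressed, cap
  // blob length at a smaller target so each blob compresses as its own
  // chunk; otherwise use the normal maximum blob size.
  uint64_t max_blob_len(bool will_compress,
                        uint64_t max_blob_size,        // normal cap
                        uint64_t comp_max_blob_size) { // cap when compressing
    return will_compress ? std::min(max_blob_size, comp_max_blob_size)
                         : max_blob_size;
  }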
Sage Weil
53b73328dd os/bluestore: track new compression config options
Class-wide Compressor, compression mode, and options.  For now these are
global, although later we'll do them per-Collection so they can be pool-
specific.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:48 -04:00
Sage Weil
a7c9c84eac os/bluestore/bluestore_types: add length to the compression_header_t
Snappy fails to decompress if there are extra zeros in the input buffer.
So, store the length explicitly in the header to avoid feeding them into
the decompressor.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:48 -04:00
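A minimal sketch of the idea behind this fix, assuming a simplified header struct and read path; the field layout and names do not match the real bluestore compression header.

  #include <cassert>
  #include <cstdint>
  #include <cstring>
  #include <vector>

  // Simplified stand-in for an on-disk compression header: record the
  // exact compressed payload length so trailing allocation padding
  // (zeros) is never fed to the decompressor.
  struct compression_header {
    uint8_t  type;    // compressor id
    uint32_t length;  // exact compressed payload length
  };

  // Hypothetical read path: hand only header.length bytes of the
  // (possibly zero-padded) blob to the decompressor.
  std::vector<char> payload_for_decompression(const std::vector<char>& raw) {
    compression_header h;
    assert(raw.size() >= sizeof(h));
    std::memcpy(&h, raw.data(), sizeof(h));
    assert(raw.size() >= sizeof(h) + h.length);
    return std::vector<char>(raw.begin() + sizeof(h),
                             raw.begin() + sizeof(h) + h.length);
  }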
Sage Weil
c4f1facaa5 os/bluestore: fix BufferSpace::read()
- we weren't reading from 'clean' buffers
- restructured the loop a bit while chasing another bug (but it ended up
  being in the caller)

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:48 -04:00
Sage Weil
6977d28863 librados: add COMPRESSIBLE and INCOMPRESSIBLE alloc hints
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:48 -04:00
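A hedged usage sketch, assuming the C++ librados API exposes these hints through LIBRADOS_ALLOC_HINT_FLAG_* constants and the set_alloc_hint2() variant that takes a flags argument; check the librados headers for your release.

  #include <rados/librados.hpp>
  #include <string>

  // Sketch: hint that an object's contents are already compressed, so the
  // backend should not spend cycles compressing it again.  Assumes
  // set_alloc_hint2() and LIBRADOS_ALLOC_HINT_FLAG_INCOMPRESSIBLE exist in
  // this librados version.
  int write_incompressible(librados::IoCtx& ioctx, const std::string& oid,
                           librados::bufferlist& bl) {
    librados::ObjectWriteOperation op;
    op.set_alloc_hint2(4 * 1024 * 1024,   // expected object size
                       4 * 1024 * 1024,   // expected write size
                       LIBRADOS_ALLOC_HINT_FLAG_INCOMPRESSIBLE);
    op.write_full(bl);
    return ioctx.operate(oid, &op);
  }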
Sage Weil
9aec0a7a5a compressor: add a get_type() method to Compressor interface
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:47 -04:00
Sage Weil
53b699edda os/bluestore: fix _do_read cached vs read result assembly
We weren't handling the case of

 read block 0~300
 cache block 100~100

where the result is read(head) + cached + read(tail). Restructure the
loop to handle this.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:47 -04:00
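A minimal sketch of the assembly the fix restores, assuming bare byte intervals and a single cached interval; blob and checksum handling are left out.

  #include <algorithm>
  #include <cstdint>
  #include <vector>

  struct interval { uint64_t off, len; };

  // Hypothetical helper: given a requested range and one cached interval,
  // return the pieces that still need a device read -- the head before the
  // cached data and the tail after it.
  std::vector<interval> uncached_pieces(interval want, interval cached) {
    std::vector<interval> out;
    uint64_t want_end = want.off + want.len;
    uint64_t c_begin = std::max(want.off, cached.off);
    uint64_t c_end = std::min(want_end, cached.off + cached.len);
    if (c_begin >= c_end) {              // no overlap: read the whole range
      out.push_back(want);
      return out;
    }
    if (want.off < c_begin)
      out.push_back({want.off, c_begin - want.off});   // head
    if (c_end < want_end)
      out.push_back({c_end, want_end - c_end});        // tail
    return out;
  }

  // The case from the commit message: reading 0~300 with 100~100 cached
  // yields a head read 0~100 and a tail read 200~100.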
Sage Weil
d315a21be9 os/bluestore: fix _do_read read out of buffer cache
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:47 -04:00
Sage Weil
32c6ba129d os/bluestore: fix up _set_csum helper
- make it thread-safe
- call during mount

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:47 -04:00
Igor Fedotov
f4c8d845e6 os/store_test: Fixes dump_mismatch_bl to avoid asserting on a length mismatch; starts using it for BufferCacheTest
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:40:47 -04:00
Sage Weil
fb45f389de os/bluestore: use bdev_block_size instead of min_alloc_size for allocators
min_alloc_size is more dynamic; we just need the block size unit here.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:47 -04:00
Ramesh Chander
8185f2d356 os/bluestore: min_alloc_size options for different media types
Signed-off-by: Ramesh Chander <Ramesh.Chander@sandisk.com>
2016-06-01 11:40:47 -04:00
Igor Fedotov
6148e1e74a os/bluestore: Fixes duplicate blob move when cloning
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:38:54 -04:00
Sage Weil
8b417f346a os/bluestore: avoid passing overlapping allocated/released sets to fm
BitmapFreelistManager doesn't like overlapping allocated+released sets
when the debug option is enabled, because it does a read to verify the
op is valid and that may not have been applied to the kv store yet.

This makes bluestore ObjectStore/StoreTest.SimpleCloneTest/2 pass with
bluestore_clone_cow = false and bluestore_freelist_type = bitmap.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:54 -04:00
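One plausible shape for the fix, sketched with plain block-index sets rather than bluestore's extent maps: drop anything a transaction both allocates and releases, since the net freelist state for those blocks is unchanged either way.

  #include <cstdint>
  #include <set>

  // Hedged sketch (not the bluestore code): before handing a transaction's
  // allocated and released sets to the freelist manager, remove blocks that
  // appear in both.  Whether such a block was allocated-then-released or
  // released-then-reallocated, its freelist state ends up where it started,
  // and the debug read-back check never sees an overlapping update.
  void drop_overlap(std::set<uint64_t>& allocated,
                    std::set<uint64_t>& released) {
    for (auto it = allocated.begin(); it != allocated.end(); ) {
      auto r = released.find(*it);
      if (r != released.end()) {
        released.erase(r);
        it = allocated.erase(it);
      } else {
        ++it;
      }
    }
  }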
Sage Weil
7c04c21574 os/bluestore/BitmapFreelistManager: drop newline on hex dumps
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:54 -04:00
Sage Weil
46522cf0d2 buffer: add no-newline hexdump option
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:53 -04:00
Sage Weil
7f6174e9d6 os/bluestore/BitmapFreelistManager: use hex
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:53 -04:00
Sage Weil
c97578070e os/bluestore: drop warning
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:53 -04:00
Sage Weil
0b80659a0f ceph_test_objectstore: fix BufferCacheReadTest
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:53 -04:00
Sage Weil
617b606c66 os/bluestore: _dump_onode crcs in hex
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:53 -04:00
Sage Weil
226a686279 os/bluestore: remove obsolete tail cache
The buffer cache will cover this in a much more general way.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:53 -04:00
Igor Fedotov
7b72f5a74d os/bluestore: Fixes improper length calculation in BufferSpace::read and adds a simplified test case to highlight an issue when appending to an existing blob
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:38:53 -04:00
Sage Weil
0a99cbfa2f os/bluestore: drop min_alloc_size locals
We have this in the class, now.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:53 -04:00
Sage Weil
a19cf11a66 os/bluestore: fix min_alloc_size global
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:52 -04:00
Sage Weil
a806af54b8 os/bluestore: release partial extents
Use the blob put_ref helper so that we can deallocate blobs partially
(instead of always waiting until they are completely unused).

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:52 -04:00
Sage Weil
8bdf2d906c os/bluestore: only write into a blob region that is allocated
We're only worried about direct writes and wal overwrites; the other write
paths are to freshly allocated blobs.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:52 -04:00
Sage Weil
282947e29d os/bluestore/bluestore_types: blob_t: add tracking for released extents
We reference count which parts of the blob are used (by lextents), but
currently we only release our space back to the system when all references
go away.  That is a problem if the blob is large (say, 4MB), and we, say,
truncate off most (but not all) of it.

Unfortunately, we can't simply deallocate anything that doesn't have a
reference, because the logical refs are on byte boundaries, and allocation
happens in larger units (min_alloc_size).  A one byte logical punch_hole
might be responsible for the release of a larger block of storage.

To resolve this, we keep track of which portions of the blob have been
released by poisoning the offset in the extents vector.  We expect that
this vector will almost always be short, so we do not bother with an
indexed structure, since iterating a blob offset to determine if it is
still allocated is likely faster.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:52 -04:00
Sage Weil
75d1083cb6 os/bluestore/bluestore_types: add poison offset to pextent_t
This is a "magic" offset that we can use to indicate an invalid extent
(vs, say, an extent at offset 0 that might clobber real data if it were
used).

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:52 -04:00
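A minimal sketch of the poison-offset idea across these two commits, with illustrative names (not the real bluestore_pextent_t / blob_t): a sentinel offset marks a released extent while its length keeps later blob offsets stable.

  #include <cstdint>
  #include <limits>
  #include <vector>

  struct pextent {
    // Sentinel that can never be a real disk offset, so a released extent
    // is distinguishable from a valid extent that happens to start at 0.
    static constexpr uint64_t INVALID_OFFSET =
        std::numeric_limits<uint64_t>::max();
    uint64_t offset = INVALID_OFFSET;
    uint64_t length = 0;
    bool is_valid() const { return offset != INVALID_OFFSET; }
  };

  // Walk the (usually short) extent vector to decide whether a blob offset
  // is still backed by allocated space; released pieces keep their length
  // but carry the poisoned offset.
  bool blob_offset_allocated(const std::vector<pextent>& extents,
                             uint64_t b_off) {
    uint64_t pos = 0;
    for (const auto& e : extents) {
      if (b_off < pos + e.length)
        return e.is_valid();
      pos += e.length;
    }
    return false;   // past the end of the blob
  }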
Sage Weil
e7dc9a8b90 os/bluestore: remove dead _txc_release
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:52 -04:00
Sage Weil
499d7f6da0 os/bluestore: only direct write into unused blob space
We can only do a direct write into an already-allocated blob once, if that
range hasn't yet been used.  Once it has been used, it is much too complex
to keep track of when all references to it have committed to disk before
reusing it, so we don't try to handle that case at all.

Since the range has never been used, we can assert that there are no
references to it.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:52 -04:00
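A minimal sketch of the policy described above; is_unused() is a hypothetical stand-in for the blob's never-written tracking (see the sketch after the blob_t unused commit further down this page).

  #include <cstdint>

  // Hedged sketch of the write-path decision: write directly into
  // allocated-but-never-used blob space exactly once; once a range has been
  // used, tracking when every reference to it has committed to disk before
  // reuse is too complex, so such overwrites go through the WAL instead.
  enum class write_strategy { direct, wal_overwrite };

  template <typename Blob>
  write_strategy choose_strategy(const Blob& b, uint64_t b_off, uint64_t b_len) {
    return b.is_unused(b_off, b_len) ? write_strategy::direct
                                     : write_strategy::wal_overwrite;
  }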
Sage Weil
82ed3ecf9c os/bluestore: mark used range on partial blob writes
- writing into unreferenced blob space
- wal blob writes

both need to update the blob used map.  The full-blob write path generates
blobs that are always full, so no change is needed there.  New partial
blob creations need to indicate which parts aren't yet used.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:51 -04:00
Sage Weil
f63d207914 os/bluestore/bluestore_types: add blob_t unused
Keep track of which ranges of this blob have *never* been used.  We do
this as a negative so that the common case of a fully-written blob is an
empty set.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:51 -04:00
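A minimal sketch of the unused-range bookkeeping from this commit and the "mark used range" commit above, with illustrative names; the real blob_t encodes this differently. A fully written blob keeps an empty map, so the common case stays cheap.

  #include <cstdint>
  #include <map>

  struct unused_tracker {
    std::map<uint64_t, uint64_t> unused;  // offset -> length never yet written

    // A freshly allocated partial blob starts with its whole extent unused.
    explicit unused_tracker(uint64_t blob_len) {
      if (blob_len)
        unused[0] = blob_len;
    }

    // True if [off, off+len) lies entirely inside a never-written range.
    bool is_unused(uint64_t off, uint64_t len) const {
      auto p = unused.upper_bound(off);
      if (p == unused.begin())
        return false;
      --p;
      return off >= p->first && off + len <= p->first + p->second;
    }

    // Partial blob writes (direct or wal) consume their range from the set.
    void mark_used(uint64_t off, uint64_t len) {
      uint64_t end = off + len;
      auto p = unused.lower_bound(off);
      if (p != unused.begin())
        --p;                                // previous range may straddle off
      while (p != unused.end() && p->first < end) {
        uint64_t u_off = p->first, u_end = p->first + p->second;
        if (u_end <= off) {                 // entirely before the write
          ++p;
          continue;
        }
        p = unused.erase(p);
        if (u_off < off)
          unused[u_off] = off - u_off;      // keep the piece left of the write
        if (u_end > end)
          unused[end] = u_end - end;        // keep the piece right of the write
      }
    }
  };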
Sage Weil
07d1e43abf unittest_bluestore_types: benchmark different csum methods
crc32c wins on my laptop.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:51 -04:00
Sage Weil
6e39180829 unittest_bluestore_types: run csum tests on all algorithms
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:51 -04:00
Sage Weil
7b3126eea3 os/bluestore/bluestore_types: blob_t: add xxhash64
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:51 -04:00
Sage Weil
7a92f42ffb common/Checksummer: add xxhash64
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:51 -04:00