Commit Graph

54285 Commits

Author SHA1 Message Date
Sage Weil
d315a21be9 os/bluestore: fix _do_read read out of buffer cache
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:47 -04:00
Sage Weil
32c6ba129d os/bluestore: fix up _set_csum helper
- make it thread-safe
- call during mount

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:47 -04:00
Igor Fedotov
f4c8d845e6 os/store_test: Fixes dump_mismatch_bl to avoid assert on lengths mismatch. Starts using it for BufferCacheTest
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:40:47 -04:00
Sage Weil
fb45f389de os/bluestore: use bdev_block_size instead of min_alloc_size for allocators
min_alloc_size is more dynamic; we just need the block size unit here.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:40:47 -04:00
Ramesh Chander
8185f2d356 os/bluestore: min_alloc_size options for different media types
Signed-off-by: Ramesh Chander <Ramesh.Chander@sandisk.com>
2016-06-01 11:40:47 -04:00
Igor Fedotov
6148e1e74a os/bluestore: Fixes duplicate blob move when cloning
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:38:54 -04:00
Sage Weil
8b417f346a os/bluestore: avoid passing overlapping allocated/released sets to fm
BitmapFreelistManager doesn't like overlapping allocated+released sets
when the debug option is enabled, because it does a read to verify the
op is valid and that may not have been applied to the kv store yet.

This makes bluestore ObjectStore/StoreTest.SimpleCloneTest/2 pass with
bluestore_clone_cow = false and bluestore_freelist_type = bitmap.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:54 -04:00
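The overlap handling described in 8b417f346a can be sketched roughly as follows. This is a minimal illustration, not the BlueStore code: block ranges are modelled as Python sets of block indices, and `split_overlap` is a hypothetical helper name. The idea is that a block both allocated and released within one transaction ends in the state it started in, so it can be dropped from both sets before they reach the freelist manager.

```python
from typing import Set, Tuple


def split_overlap(allocated: Set[int], released: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return (allocated, released) with blocks that appear in both removed."""
    overlap = allocated & released
    return allocated - overlap, released - overlap


# Blocks 12 and 13 were allocated and released in the same transaction.
alloc, rel = split_overlap({10, 11, 12, 13}, {12, 13, 14})
print(sorted(alloc), sorted(rel))
```

After the call the two sets are disjoint, so a verifying read against the kv store never sees a block that the same transaction touched twice.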
Sage Weil
7c04c21574 os/bluestore/BitmapFreelistManager: drop newline on hex dumps
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:54 -04:00
Sage Weil
46522cf0d2 buffer: add no-newline hexdump option
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:53 -04:00
Sage Weil
7f6174e9d6 os/bluestore/BitmapFreelistManager: use hex
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:53 -04:00
Sage Weil
c97578070e os/bluestore: drop warning
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:53 -04:00
Sage Weil
0b80659a0f ceph_test_objectstore: fix BufferCacheReadTest
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:53 -04:00
Sage Weil
617b606c66 os/bluestore: _dump_onode crcs in hex
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:53 -04:00
Sage Weil
226a686279 os/bluestore: remove obsolete tail cache
The buffer cache will cover this in a much more general way.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:53 -04:00
Igor Fedotov
7b72f5a74d os/bluestore: Fixes improper length calculation in BufferSpace::read + adds simplified test case to highlight an issue for append to existing blob
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:38:53 -04:00
Sage Weil
0a99cbfa2f os/bluestore: drop min_alloc_size locals
We have this in the class, now.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:53 -04:00
Sage Weil
a19cf11a66 os/bluestore: fix min_alloc_size global
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:52 -04:00
Sage Weil
a806af54b8 os/bluestore: release partial extents
Use the blob put_ref helper so that we can deallocate blobs partially
(instead of always waiting until they are completely unused).

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:52 -04:00
Sage Weil
8bdf2d906c os/bluestore: only write into a blob region that is allocated
We're only worried about direct writes and wal overwrites; the other write
paths are to freshly allocated blobs.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:52 -04:00
Sage Weil
282947e29d os/bluestore/bluestore_types: blob_t: add tracking for released extents
We reference count which parts of the blob are used (by lextents), but
currently we only release our space back to the system when all references
go away.  That is a problem if the blob is large (say, 4MB), and we, say,
truncate off most (but not all) of it.

Unfortunately, we can't simply deallocate anything that doesn't have a
reference, because the logical refs are on byte boundaries, and allocation
happens in larger units (min_alloc_size).  A one byte logical punch_hole
might be responsible for the release of a larger block of storage.

To resolve this, we keep track of which portions of the blob have been
released by poisoning the offset in the extents vector.  We expect that
this vector will almost always be short, so we do not bother with an
indexed structure, since iterating over the vector to determine whether a
blob offset is still allocated is likely faster.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:52 -04:00
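The mismatch between byte-granular references and allocation units in 282947e29d can be illustrated with a small sketch. It assumes references are given as (offset, length) pairs that lie inside the blob; `releasable_units` is a hypothetical name and the sizes are deliberately tiny. An allocation unit may be released only when no reference touches any byte in it.

```python
from typing import List, Tuple


def releasable_units(blob_len: int, min_alloc_size: int,
                     refs: List[Tuple[int, int]]) -> List[int]:
    """Return indices of allocation units that no (offset, length) ref touches."""
    n_units = (blob_len + min_alloc_size - 1) // min_alloc_size
    referenced = [False] * n_units
    for off, length in refs:
        if length == 0:
            continue
        first = off // min_alloc_size
        last = (off + length - 1) // min_alloc_size
        for unit in range(first, last + 1):
            referenced[unit] = True
    return [unit for unit in range(n_units) if not referenced[unit]]


# 16-byte blob, 4-byte allocation units; unit 1 is held only by byte 4.
before = releasable_units(16, 4, [(0, 4), (4, 1), (8, 8)])
# Punching out that single byte frees the whole 4-byte unit.
after = releasable_units(16, 4, [(0, 4), (8, 8)])
print(before, after)
```

Removing one logical byte makes a full allocation unit releasable, which is the case the commit message describes.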
Sage Weil
75d1083cb6 os/bluestore/bluestore_types: add poison offset to pextent_t
This is a "magic" offset that we can use to indicate an invalid extent
(vs, say, an extent at offset 0 that might clobber real data if it were
used).

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:52 -04:00
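A poisoned extent as described in 75d1083cb6 might be modelled like this. The sentinel value, the `PExtent` class and `device_offset` are assumptions made for illustration; the actual constant and type layout live in bluestore_types and are not shown here. The sketch walks the extents vector linearly, as the previous commit suggests, and treats a poisoned entry as "not allocated" while an extent at device offset 0 stays valid.

```python
from dataclasses import dataclass
from typing import List, Optional

POISON_OFFSET = 2**64 - 1  # assumed sentinel: never a usable device offset


@dataclass
class PExtent:
    offset: int  # device offset, or POISON_OFFSET if this range was released
    length: int

    def is_valid(self) -> bool:
        return self.offset != POISON_OFFSET


def device_offset(extents: List[PExtent], blob_off: int) -> Optional[int]:
    """Map a blob offset to a device offset; None if released or out of range."""
    pos = 0
    for ext in extents:
        if blob_off < pos + ext.length:
            return ext.offset + (blob_off - pos) if ext.is_valid() else None
        pos += ext.length
    return None


extents = [
    PExtent(0x10000, 0x1000),
    PExtent(POISON_OFFSET, 0x1000),  # released middle portion
    PExtent(0, 0x1000),              # a real extent at device offset 0
]
print(device_offset(extents, 0x800), device_offset(extents, 0x1800))
```

Using a dedicated sentinel rather than 0 keeps a released range from ever being mistaken for real storage at the start of the device.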
Sage Weil
e7dc9a8b90 os/bluestore: remove dead _txc_release
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:52 -04:00
Sage Weil
499d7f6da0 os/bluestore: only direct write into unused blob space
We can only do a direct write into an already-allocated blob once, if that
range hasn't yet been used.  Once it has been used, it is much too complex
to keep track of when all references to it have committed to disk before
reusing it, so we don't try to handle that case at all.

Since the range has never been used, we can assert that there are no
references to it.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:52 -04:00
Sage Weil
82ed3ecf9c os/bluestore: mark used range on partial blob writes
- writing into unreferenced blob space
- wal blob writes

both need to update the blob used map.  Full blob writes generate
blobs that are always full, so no change is needed there.  New partial
blob creations need to indicate which parts aren't yet used.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:51 -04:00
Sage Weil
f63d207914 os/bluestore/bluestore_types: add blob_t unused
Keep track of which ranges of this blob have *never* been used.  We do
this as a negative so that the common case of a fully-written blob is an
empty set.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:51 -04:00
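The "negative" bookkeeping in f63d207914 can be sketched as a list of never-used ranges, empty for a fully written blob. The names `is_unused` and `mark_used` are hypothetical, and the sketch assumes the ranges are disjoint and coalesced. A write path could consult `is_unused` before writing directly into already-allocated space and call `mark_used` afterwards.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) within the blob


def is_unused(unused: List[Range], off: int, length: int) -> bool:
    """True if [off, off+length) lies entirely inside one never-used range."""
    return any(u_off <= off and off + length <= u_off + u_len
               for u_off, u_len in unused)


def mark_used(unused: List[Range], off: int, length: int) -> List[Range]:
    """Return the unused ranges with [off, off+length) subtracted."""
    end = off + length
    result: List[Range] = []
    for u_off, u_len in unused:
        u_end = u_off + u_len
        if end <= u_off or u_end <= off:
            result.append((u_off, u_len))        # no overlap, keep as is
            continue
        if u_off < off:
            result.append((u_off, off - u_off))  # piece left of the write
        if end < u_end:
            result.append((end, u_end - end))    # piece right of the write
    return result


# 0x4000-byte blob whose first 0x1000 bytes were written at creation.
unused = [(0x1000, 0x3000)]
after_write = mark_used(unused, 0x2000, 0x1000)
print(after_write)
```

Once every range has been written the list is empty, so the common fully-written case costs nothing to store.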
Sage Weil
07d1e43abf unittest_bluestore_types: benchmark different csum methods
crc32c wins on my laptop.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:51 -04:00
Sage Weil
6e39180829 unittest_bluestore_types: run csum tests on all algorithms
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:51 -04:00
Sage Weil
7b3126eea3 os/bluestore/bluestore_types: blob_t: add xxhash64
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:51 -04:00
Sage Weil
7a92f42ffb common/Checksummer: add xxhash64
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:51 -04:00
Sage Weil
5fd42e5073 os/bluestore: drop old Checksummer
blob_t uses it directly via the static methods.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:51 -04:00
Sage Weil
8b1f9ac9ec os/bluestore: use blob_t csum methods
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:51 -04:00
Sage Weil
0bc8066dc7 os/bluestore/bluestore_types: simpler {calc,verify}_csum methods
This keeps the CSUM_* definitions local to blob_t, and avoids passing
arguments around.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:50 -04:00
Sage Weil
d4f4fa0312 os/bluestore: defer csum calculations sometimes
When we are doing a partial chunk overwrite, we need to defer the csum_data
update.  Otherwise, another write in the same transaction might need to
read part of the chunk, not find the data in the buffer cache, read it
from disk, and fail the csum check.

This patch defers the calculation until after we've built the transaction
and are about to commit to the kv store.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:50 -04:00
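The effect of deferring the checksum update in d4f4fa0312 can be shown with a deliberately simplified model. Everything here is an assumption for illustration: `Blob`, `Txn`, the 4-byte chunk size, and `zlib.crc32` standing in for whichever checksum the blob actually uses. Stored checksums keep matching the on-disk bytes while the transaction is being built, and are recomputed only at commit.

```python
import errno
import zlib
from typing import List, Set, Tuple

CHUNK = 4  # assumed checksum chunk size, tiny for illustration


class Blob:
    def __init__(self, size: int) -> None:
        self.disk = bytearray(size)  # committed bytes
        self.csum: List[int] = [zlib.crc32(bytes(CHUNK))] * (size // CHUNK)

    def read_disk(self, chunk: int) -> bytes:
        """Read one chunk from 'disk' and verify it against the stored csum."""
        start = chunk * CHUNK
        data = bytes(self.disk[start:start + CHUNK])
        if zlib.crc32(data) != self.csum[chunk]:
            raise OSError(errno.EIO, "csum mismatch")
        return data


class Txn:
    def __init__(self, blob: Blob) -> None:
        self.blob = blob
        self.pending: List[Tuple[int, bytes]] = []  # writes not yet applied

    def write(self, off: int, payload: bytes) -> None:
        # Queue the (non-empty) write; the stored csum is left untouched.
        self.pending.append((off, payload))

    def commit(self) -> None:
        dirty: Set[int] = set()
        for off, payload in self.pending:
            self.blob.disk[off:off + len(payload)] = payload
            dirty.update(range(off // CHUNK, (off + len(payload) - 1) // CHUNK + 1))
        for chunk in dirty:  # deferred csum calculation
            start = chunk * CHUNK
            self.blob.csum[chunk] = zlib.crc32(bytes(self.blob.disk[start:start + CHUNK]))
        self.pending.clear()


blob = Blob(8)
txn = Txn(blob)
txn.write(0, b"ab")
mid_txn = blob.read_disk(0)  # still verifies: csum matches the old bytes
txn.write(2, b"cd")
txn.commit()
print(mid_txn, blob.read_disk(0))
```

Had `write` updated the checksum immediately, the mid-transaction read of the old bytes would have failed verification, which is the failure the commit message describes.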
Sage Weil
f6151f7697 doc/dev/bluestore: update based on Igor's feedback
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:50 -04:00
Igor Fedotov
fe2ed4e9b0 os/bluestore: Fixes some issues when using Buffer Cache from _do_read and improves test coverage
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:38:50 -04:00
Igor Fedotov
4464180d6b os/bluestore: Fixes invalid assert in Buffer::truncate
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:38:50 -04:00
Igor Fedotov
d0acbd08ae test/objectstore: Adds trivial test case to verify buffer cache use in bluestore
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:38:50 -04:00
Igor Fedotov
1a0e9754db Adds cached buffer processing for _do_read
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:38:50 -04:00
Sage Weil
f492eda0da os/bluestore: add a very simple (incomplete) buffer cache
Attach it to each onode.

There is no trimming yet.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:50 -04:00
Sage Weil
8f0414b60c os/bluestore: make tail cache a bit smarter
This is really a stop-gap.  Since we are doing reads in the pre-commit
write path, we need to have some sort of buffer cache so that a sequence
of writes in the same transaction can remain coherent (the second write
must "read" the first write in order to fill out the chunk).

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:49 -04:00
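The coherence requirement in 8f0414b60c, where a second write must "read" the first, can be sketched with a per-object cache keyed by chunk index. The `Onode` class below is a hypothetical stand-in, not the BlueStore type, and it handles only writes that stay inside one chunk. Reads consult the cache before falling back to committed data.

```python
from typing import Dict

CHUNK = 4  # assumed chunk size, tiny for illustration


class Onode:
    def __init__(self, disk: bytes) -> None:
        self.disk = bytearray(disk)        # committed data
        self.cache: Dict[int, bytes] = {}  # chunk index -> pending contents

    def read_chunk(self, chunk: int) -> bytes:
        """Prefer pending data from the cache, else read committed bytes."""
        if chunk in self.cache:
            return self.cache[chunk]
        start = chunk * CHUNK
        return bytes(self.disk[start:start + CHUNK])

    def write(self, off: int, payload: bytes) -> None:
        """Partial write within one chunk: read, merge, keep result cached."""
        chunk = off // CHUNK
        assert payload and (off + len(payload) - 1) // CHUNK == chunk, \
            "sketch handles one non-empty write per chunk"
        merged = bytearray(self.read_chunk(chunk))
        rel = off - chunk * CHUNK
        merged[rel:rel + len(payload)] = payload
        self.cache[chunk] = bytes(merged)


onode = Onode(b"........")
onode.write(0, b"ab")
onode.write(2, b"cd")  # fills out the chunk using the first write's data
print(onode.read_chunk(0), bytes(onode.disk))
```

Without the cache, the second write would merge against the stale committed bytes and lose the first write.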
Sage Weil
977881adb6 os/bluestore: dump tail_bl state in _dump_onode
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:49 -04:00
Igor Fedotov
f46ad54f45 os/bluestore: Adds lacking methods in bluestore_compression_header_t to fix encoder test build
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:38:49 -04:00
Igor Fedotov
2942f36f36 os/bluestore: add decompressor call to read path
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:38:49 -04:00
Igor Fedotov
bbaa788f01 os/bluestore/bluestore_types: add bluestore_compression_header_t
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:38:49 -04:00
Igor Fedotov
a305d6b5a3 compressor: Refactor to allow bufferlist::iterator as an input
Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
2016-06-01 11:38:49 -04:00
Sage Weil
9cfb096300 os/bluestore: new write path
- simplified wal_op_t.  we still have overlays in there, although that
  might need to get removed soon too.
- init_csum cleanup
- totally new write path

Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:49 -04:00
Sage Weil
ef99f9446a os/bluestore: verify blob ref_maps during fsck
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:48 -04:00
Sage Weil
aa77a05ad3 os/bluestore: return EIO on csum verification error
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:48 -04:00
Sage Weil
e6a7e9d2e5 os/bluestore: simplify _verify_csum
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:48 -04:00
Sage Weil
62b779968a os/bluestore: cleanup _read_extent_sparse
Signed-off-by: Sage Weil <sage@redhat.com>
2016-06-01 11:38:48 -04:00