BitmapFreelistManager doesn't like overlapping allocated+released sets
when the debug option is enabled, because it does a read to verify the
op is valid and that may not have been applied to the kv store yet.
This makes bluestore ObjectStore/StoreTest.SimpleCloneTest/2 pass with
bluestore_clone_cow = false and bluestore_freelist_type = bitmap.
Signed-off-by: Sage Weil <sage@redhat.com>
Use the blob put_ref helper so that we can deallocate blobs partially
(instead of always waiting until they are completely unused).
Signed-off-by: Sage Weil <sage@redhat.com>
We're only worried about direct writes and wal overwrites; the other write
paths are to freshly allocated blobs.
Signed-off-by: Sage Weil <sage@redhat.com>
We reference count which parts of the blob are used (by lextents), but
currently we only release our space back to the system when all references
go away. That is a problem if the blob is large (say, 4MB) and we
truncate off most (but not all) of it.
Unfortunately, we can't simply deallocate anything that doesn't have a
reference, because the logical refs are on byte boundaries, and allocation
happens in larger units (min_alloc_size). A one byte logical punch_hole
might be responsible for the release of a larger block of storage.
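
To illustrate the granularity mismatch, here is a minimal sketch, not the
actual put_ref helper. It assumes a hypothetical UnitRefs type that counts
referenced bytes per allocation unit; the names and the per-unit counter are
illustrative assumptions only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: referenced byte count per allocation unit of a blob.
struct UnitRefs {
  uint32_t unit;                // stand-in for min_alloc_size
  std::vector<uint64_t> bytes;  // referenced bytes in each allocation unit

  UnitRefs(uint32_t unit_size, uint32_t blob_len)
    : unit(unit_size), bytes((blob_len + unit_size - 1) / unit_size, 0) {}

  // Take a logical reference on [off, off+len).
  void get(uint32_t off, uint32_t len) {
    for (uint32_t end = off + len; off < end; ) {
      uint32_t i = off / unit;
      uint32_t n = std::min(end, (i + 1) * unit) - off;
      bytes[i] += n;
      off += n;
    }
  }

  // Drop a logical reference on [off, off+len); return the indices of
  // allocation units that no longer have any referenced bytes.
  std::vector<uint32_t> put(uint32_t off, uint32_t len) {
    std::vector<uint32_t> freed;
    for (uint32_t end = off + len; off < end; ) {
      uint32_t i = off / unit;
      uint32_t n = std::min(end, (i + 1) * unit) - off;
      assert(bytes[i] >= n);
      bytes[i] -= n;
      if (bytes[i] == 0) freed.push_back(i);
      off += n;
    }
    return freed;
  }
};
```

With a 4096-byte unit, dropping the last referenced byte in a unit frees the
whole unit, while dropping part of a still-referenced unit frees nothing.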
To resolve this, we keep track of which portions of the blob have been
released by poisoning the offset in the extents vector. We expect that
this vector will almost always be short, so we do not bother with an
indexed structure, since iterating to determine whether a blob offset is
still allocated is likely faster.
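
A rough sketch of the poisoned-extent idea, under stated assumptions: Extent,
Blob, INVALID_OFFSET, release() and is_allocated() are hypothetical names
chosen for illustration and are not the real BlueStore types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical "magic" offset marking a released (invalid) extent.
constexpr uint64_t INVALID_OFFSET = ~uint64_t(0);

struct Extent {
  uint64_t offset;  // device offset, or INVALID_OFFSET once released
  uint32_t length;  // length in bytes
  bool is_valid() const { return offset != INVALID_OFFSET; }
};

struct Blob {
  std::vector<Extent> extents;  // pieces of the blob, in blob-offset order

  // Poison extent i: its length still occupies blob address space, but it
  // no longer points at any storage.
  void release(std::size_t i) { extents[i].offset = INVALID_OFFSET; }

  // Linear walk over the (expected short) vector to check whether blob
  // offset `off` is still backed by allocated storage.
  bool is_allocated(uint64_t off) const {
    uint64_t pos = 0;
    for (const Extent& e : extents) {
      if (off < pos + e.length) return e.is_valid();
      pos += e.length;
    }
    return false;  // past the end of the blob
  }
};
```

Keeping the poisoned extent in place preserves the blob offsets of the
extents that follow it, so no separate index is needed.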
Signed-off-by: Sage Weil <sage@redhat.com>
This is a "magic" offset that we can use to indicate an invalid extent
(vs, say, an extent at offset 0 that might clobber real data if it were
used).
Signed-off-by: Sage Weil <sage@redhat.com>
We can only do a direct write into an already-allocated blob once, if that
range hasn't yet been used. Once it has been used, it is much too complex
to keep track of when all references to it have committed to disk before
reusing it, so we don't try to handle that case at all.
Since the range has never been used, we can assert that there are no
references to it.
Signed-off-by: Sage Weil <sage@redhat.com>
- writing into unreferenced blob space
- wal blob writes
both need to update the blob used map. Full blob writes generate
blobs that are always full, so no change is needed there. New partial
blob creations need to indicate which parts aren't yet used.
Signed-off-by: Sage Weil <sage@redhat.com>
Keep track of which ranges of this blob have *never* been used. We do
this as a negative so that the common case of a fully-written blob is an
empty set.
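
As a sketch of the negative representation, assuming a hypothetical UnusedMap
that stores never-written ranges as offset -> length (the type and method
names are illustrative, not the actual blob fields):

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical sketch: ranges of a blob that have never been written.
struct UnusedMap {
  std::map<uint32_t, uint32_t> unused;  // offset -> length, non-overlapping

  // Record that [off, off+len) has never been used (caller keeps ranges
  // disjoint in this sketch).
  void add_unused(uint32_t off, uint32_t len) { unused[off] = len; }

  // True only if [off, off+len) lies entirely inside one unused range.
  bool is_unused(uint32_t off, uint32_t len) const {
    auto p = unused.upper_bound(off);
    if (p == unused.begin()) return false;
    --p;
    return off + len <= p->first + p->second;
  }

  // Mark [off, off+len) as used, splitting the containing unused range.
  // A range can only make this transition once.
  void mark_used(uint32_t off, uint32_t len) {
    assert(is_unused(off, len));
    auto p = std::prev(unused.upper_bound(off));
    uint32_t start = p->first;
    uint32_t end = p->first + p->second;
    unused.erase(p);
    if (start < off) unused[start] = off - start;
    if (off + len < end) unused[off + len] = end - (off + len);
  }
};
```

A fully written blob never calls add_unused(), so its map stays empty and
costs nothing to encode.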
Signed-off-by: Sage Weil <sage@redhat.com>
When we are doing a partial chunk overwrite, we need to defer the csum_data
update. Otherwise, another write in the same transaction might need to
read part of the chunk, not find the data in the buffer cache, read it
from disk, and fail the csum check.
This patch defers the calculation until after we've built the transaction
and are about to commit to the kv store.
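
The deferral could look roughly like the following sketch. Blob, TransContext,
toy_csum() and finalize_csums() are hypothetical stand-ins, and the checksum
is a toy placeholder rather than the real csum algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical blob: `data` stands in for on-disk plus cached content.
struct Blob {
  uint32_t chunk_size;
  std::vector<uint8_t> data;
  std::vector<uint32_t> csum;  // one checksum per chunk
};

// Toy checksum, a placeholder for the real csum function.
inline uint32_t toy_csum(const uint8_t* p, std::size_t n) {
  uint32_t s = 0;
  for (std::size_t i = 0; i < n; ++i) s = s * 31 + p[i];
  return s;
}

struct TransContext {
  std::set<std::pair<Blob*, uint32_t>> deferred;  // (blob, chunk index)

  // Apply a write, but only remember which chunks need a new checksum.
  void write(Blob& b, uint32_t off, const std::vector<uint8_t>& buf) {
    assert(!buf.empty() && off + buf.size() <= b.data.size());
    std::copy(buf.begin(), buf.end(), b.data.begin() + off);
    uint32_t last = (off + buf.size() - 1) / b.chunk_size;
    for (uint32_t c = off / b.chunk_size; c <= last; ++c)
      deferred.insert(std::make_pair(&b, c));
  }

  // Called once the transaction is built, just before the kv commit.
  void finalize_csums() {
    for (const auto& d : deferred) {
      Blob* b = d.first;
      uint32_t c = d.second;
      b->csum[c] = toy_csum(&b->data[c * b->chunk_size], b->chunk_size);
    }
    deferred.clear();
  }
};
```

Several writes to the same chunk within one transaction then leave the stored
checksum untouched until finalize_csums() runs once at the end.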
Signed-off-by: Sage Weil <sage@redhat.com>
This is really a stop-gap. Since we are doing reads in the pre-commit
write path, we need to have some sort of buffer cache so that a sequence
of writes in the same transaction can remain coherent (the second write
must "read" the first write in order to fill out the chunk).
Signed-off-by: Sage Weil <sage@redhat.com>
- simplified wal_op_t; we still have overlays in there, although they
might need to be removed soon too.
- init_csum cleanup
- totally new write path
Signed-off-by: Sage Weil <sage@redhat.com>