mirror of
https://github.com/ceph/ceph
synced 2025-01-20 10:01:45 +00:00
Merge pull request #35338 from myoungwon/wip-doc-manifet-snap
doc/dev/osd_internals/manifest.rst: add information about clone snap refcounting Reviewed-by: Myoungwon Oh <myoungwon.oh@samsung.com> Reviewed-by: Samuel Just <sjust@redhat.com>
This commit is contained in:
commit
942c73bf07
@@ -218,6 +218,76 @@ we may want to exploit.
The dedup-tool needs to be updated to use LIST_SNAPS to discover
clones as part of leak detection.

An important question is how we deal with the fact that many clones
will frequently have references to the same backing chunks at the same
offset. In particular, make_writeable will generally create a clone
that shares the same object_manifest_t references with the exception
of any extents modified in that transaction. The metadata that
commits as part of that transaction must therefore map onto the same
refcount as before, because otherwise we would first have to increment
refcounts on the backing objects (or risk a reference to a dead object).

Thus, we introduce a simple convention: consecutive clones which
share a reference at the same offset share the same refcount. This
means that a write that invokes make_writeable may decrease refcounts,
but not increase them. This has some consequences for removing clones.
Consider the following sequence ::

  write foo [0, 1024)
  flush foo ->
    head: [0, 512) aaa, [512, 1024) bbb
    refcount(aaa)=1, refcount(bbb)=1
  snapshot 10
  write foo [0, 512) ->
    head: [512, 1024) bbb
    10  : [0, 512) aaa, [512, 1024) bbb
    refcount(aaa)=1, refcount(bbb)=1
  flush foo ->
    head: [0, 512) ccc, [512, 1024) bbb
    10  : [0, 512) aaa, [512, 1024) bbb
    refcount(aaa)=1, refcount(bbb)=1, refcount(ccc)=1
  snapshot 20
  write foo [0, 512) (same contents as the original write)
    head: [512, 1024) bbb
    20  : [0, 512) ccc, [512, 1024) bbb
    10  : [0, 512) aaa, [512, 1024) bbb
    refcount(aaa)=?, refcount(bbb)=1
  flush foo
    head: [0, 512) aaa, [512, 1024) bbb
    20  : [0, 512) ccc, [512, 1024) bbb
    10  : [0, 512) aaa, [512, 1024) bbb
    refcount(aaa)=?, refcount(bbb)=1, refcount(ccc)=1

What should the refcount for aaa be at the end? By the rule
above, it should be two, since the two aaa refs are not
contiguous. However, consider removing clone 20 ::

  initial:
    head: [0, 512) aaa, [512, 1024) bbb
    20  : [0, 512) ccc, [512, 1024) bbb
    10  : [0, 512) aaa, [512, 1024) bbb
    refcount(aaa)=2, refcount(bbb)=1, refcount(ccc)=1
  trim 20
    head: [0, 512) aaa, [512, 1024) bbb
    10  : [0, 512) aaa, [512, 1024) bbb
    refcount(aaa)=?, refcount(bbb)=1, refcount(ccc)=0

At this point, our rule dictates that refcount(aaa) is 1, because
clone 10 and head are now consecutive and share aaa at the same offset.
This means that removing 20 needs to check for refs held by the clones
on either side, which become adjacent and may then match.

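As a rough check of the numbers above, the following C++ sketch counts refcounts under the consecutive-clone convention for the state before and after trimming clone 20. `ChunkMap` and `count_refs` are hypothetical names used only for illustration; the chunk map is simplified to an offset-to-chunk-id map, assuming fixed-size extents, rather than the real object_manifest_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a clone's chunk map:
// extent offset -> id of the backing chunk. Assumes every extent has the
// same length, so the offset alone identifies an extent.
using ChunkMap = std::map<uint64_t, std::string>;

// Count references per backing chunk under the convention that consecutive
// clones sharing a reference at the same offset share one refcount.
// `clones` is ordered oldest -> newest, with head last.
std::map<std::string, int> count_refs(const std::vector<ChunkMap>& clones) {
  std::map<std::string, int> refs;
  for (std::size_t i = 0; i < clones.size(); ++i) {
    for (const auto& [off, chunk] : clones[i]) {
      if (i > 0) {
        const ChunkMap& prev = clones[i - 1];
        auto it = prev.find(off);
        // Same chunk at the same offset in the previous clone: this ref is
        // part of the same run and has already been counted.
        if (it != prev.end() && it->second == chunk) continue;
      }
      ++refs[chunk];  // start of a new run: one more reference
    }
  }
  return refs;
}
```

With the clones ordered 10, 20, head, `count_refs` yields aaa=2, bbb=1, ccc=1; with clone 20 removed it yields aaa=1 and bbb=1 (no ccc), matching the values derived above.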
See osd_types.h:object_manifest_t::calc_refs_to_drop_on_removal
for the logic implementing this rule.

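The neighbor-only check can be sketched roughly as follows. This is a hypothetical illustration of the rule, not the actual `calc_refs_to_drop_on_removal`; `ChunkMap`, `chunk_at` and `refs_to_drop_on_removal` are made-up names, and the chunk map is again simplified to an offset-to-chunk-id map with fixed-size extents. It inspects only the clone being removed and the clones immediately before and after it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified chunk map: extent offset -> backing chunk id
// (assumes all extents have the same length).
using ChunkMap = std::map<uint64_t, std::string>;

// Chunk id at `off` in `m`, or an empty string if `m` is null or has none.
static std::string chunk_at(const ChunkMap* m, uint64_t off) {
  if (!m) return {};
  auto it = m->find(off);
  return it == m->end() ? std::string{} : it->second;
}

// Refs to drop when removing clone `victim`, given only its neighbors:
// `prev` is the next older clone, `next` the next newer clone or head
// (either may be nullptr). Illustrative sketch, not Ceph's implementation.
std::vector<std::string> refs_to_drop_on_removal(const ChunkMap* prev,
                                                 const ChunkMap& victim,
                                                 const ChunkMap* next) {
  std::vector<std::string> drop;
  // 1) A ref in the victim that neither neighbor holds at the same offset
  //    is a run of its own, so its refcount goes away with the victim.
  for (const auto& [off, chunk] : victim) {
    if (chunk_at(prev, off) != chunk && chunk_at(next, off) != chunk)
      drop.push_back(chunk);
  }
  // 2) Where prev and next hold the same chunk but the victim did not, the
  //    two runs become consecutive and merge into one refcount.
  if (prev && next) {
    for (const auto& [off, chunk] : *prev) {
      if (chunk_at(next, off) == chunk && chunk_at(&victim, off) != chunk)
        drop.push_back(chunk);
    }
  }
  return drop;
}
```

For the example, removing clone 20 with neighbors 10 and head returns ccc (held only by 20) followed by one aaa (the two aaa runs merge), which is exactly the difference between the refcounts before and after the trim.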
This seems complicated, but it gets us two valuable properties:

1) The refcount change from make_writeable will not block on
   incrementing a ref.
2) We don't need to load the object_manifest_t for every clone
   to determine how to handle removing one -- just the ones
   immediately preceding and succeeding it.

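Property 1 follows from the convention: the metadata committed with a write only ever removes references. The hypothetical sketch below computes the refs dropped by a write to head. `ChunkMap` and `refs_to_drop_on_write` are illustrative names, not Ceph's make_writeable, and the chunk map is assumed to be a simple offset-to-chunk-id map with fixed-size extents. The function has no path that increments a refcount; new references, such as ccc in the example, are only added later when the object is flushed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified chunk map: extent offset -> backing chunk id.
using ChunkMap = std::map<uint64_t, std::string>;

// Refs to drop when a write to head touches the extents at `written`
// (assumed to be distinct offsets). `prev` is the newest existing clone, or
// nullptr; `new_clone` is true if make_writeable snapshots head first.
std::vector<std::string> refs_to_drop_on_write(
    const ChunkMap* prev, const ChunkMap& head, bool new_clone,
    const std::vector<uint64_t>& written) {
  std::vector<std::string> drop;
  // A new clone copies head's map unchanged, so every ref head gives up on
  // the written extents is still held, in the same run, by that clone.
  if (new_clone) return drop;
  for (uint64_t off : written) {
    auto it = head.find(off);
    if (it == head.end()) continue;  // head held no chunk at this offset
    // Still held in the same run by the newest clone: nothing to drop.
    if (prev) {
      auto p = prev->find(off);
      if (p != prev->end() && p->second == it->second) continue;
    }
    drop.push_back(it->second);  // head's ref ended a run of its own
  }
  return drop;
}
```

In the example, the write after snapshot 20 creates a clone, so nothing is dropped; the same write without an intervening snapshot would drop ccc, while a write to [512, 1024) would drop nothing because clone 10 still shares bbb.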
All clone operations will need to consider adjacent chunk_maps
when adding or removing references.

Cache/Tiering
-------------