* Update to the current state of the ghobject implementation and the fact
that they encode the shard_t. Although the pool also contains the shard
id, it is less relevant to understanding the implementation.
* Update with the erasure code plugin infrastructure and the example
plugin now in master.
* Move jerasure to a separate page to be expanded and link it from the
toc
* Kill the partial read and write notes, as partial reads and writes will
probably not be implemented in the near future. Kill some of the other
notes because they are no longer relevant.
* Add a definition for "chunk rank"
* Reword, update schemas, fix typos.
Signed-off-by: Loic Dachary <loic@dachary.org>
Intuition differs regarding the sort order of the ghobject shard and
generation. Document the rationale for the chosen sort order.
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
The get_filestore_key* methods are changed to just call the
corresponding hobject methods instead of providing an identical
implementation.
Reviewed-by: David Zafman <david.zafman@inktank.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
Standby replay may fall behind and get -ENOENT when reading the
journal. Return -EAGAIN in this case, which makes the MDS respawn itself.
Fixes: #5458
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
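The error translation described above can be sketched roughly as follows. This is a minimal illustration, not the MDS code: the function name and the standby_replay parameter are hypothetical.

```cpp
#include <cerrno>

// Hypothetical helper (not an actual MDS function): translate the result of
// a journal read. A standby-replay daemon that has fallen behind may see
// -ENOENT; reporting -EAGAIN instead lets the caller treat it as "retry",
// which per the commit message leads to the MDS respawning itself.
int translate_journal_read_error(int r, bool standby_replay) {
  if (standby_replay && r == -ENOENT)
    return -EAGAIN;
  return r;  // every other result is passed through unchanged
}
```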
This is incomplete and unfortunately unusable in its current state:
- it would only set USES_TMAP for old encoded object_info_t and tmapput,
but would NOT set it for tmapup
- a config option turned that off by default.
That means that the mds conversion from tmap -> omap won't be able to use
this because any existing cluster has tmap objects without the USES_TMAP
flag set. And we don't want to unconditionally try a tmap->omap conversion
on omap operations because there are lots of existing librados users out
there that will be negatively impacted by this.
Instead, the MDS will need to handle this conversion on the client side by
reading either tmap or omap objects and explicitly rewriting the content
with omap (while truncating the tmap data away).
The auto-conversion function was added in v0.44.
Signed-off-by: Sage Weil <sage@inktank.com>
ISDIRTY will query whether the dirty flag is set on an object. UNDIRTY
will explicitly clear it. Note that a user doing so will likely run afoul
of the caching code.
Signed-off-by: Sage Weil <sage@inktank.com>
If we have an existing CrushWrapper object and decode from a bufferlist,
reset have_rmaps so that the rmaps get rebuilt.
Remove the build_rmaps() call in decode, which was useless on a redecode
(because have_rmaps == true in that case and it did nothing).
Fixes: #6442
Backport: dumpling, maybe cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
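The invalidate-on-decode pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the CrushWrapper code: the NameMap class, its members, and the lookup() method are hypothetical; only the have_rmaps / build_rmaps names come from the commit message.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for an object that keeps a lazily built reverse map.
class NameMap {
 public:
  // Decoding replaces the forward map, so any cached reverse map is stale.
  // Clearing have_rmaps forces a rebuild on the next lookup.
  void decode(const std::map<int, std::string>& encoded) {
    names_ = encoded;
    have_rmaps_ = false;
  }

  // Look up an id by name; returns false if the name is unknown.
  bool lookup(const std::string& name, int* id) {
    build_rmaps();
    std::map<std::string, int>::const_iterator it = rmap_.find(name);
    if (it == rmap_.end())
      return false;
    *id = it->second;
    return true;
  }

 private:
  // Rebuild the reverse map only when it has been invalidated.
  void build_rmaps() {
    if (have_rmaps_)
      return;
    rmap_.clear();
    for (std::map<int, std::string>::const_iterator p = names_.begin();
         p != names_.end(); ++p)
      rmap_[p->second] = p->first;
    have_rmaps_ = true;
  }

  std::map<int, std::string> names_;
  std::map<std::string, int> rmap_;
  bool have_rmaps_ = false;
};
```

Without the reset in decode(), a second decode on the same object would keep serving lookups from the reverse map built for the first one.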
I hate this; it makes it impossible to see that there was an error message.
We made this same change a while back with rbd.
Signed-off-by: Sage Weil <sage@inktank.com>
Treat the second encoded bool as bits 9-16 of a (now) 16-bit flags field,
and use bit 9 (what used to be set by the use_tmap bool) as FLAG_USES_TMAP.
No encoding compatibility change.
Signed-off-by: Sage Weil <sage@inktank.com>
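Why this needs no encoding change can be sketched as follows. The sketch assumes each bool is encoded as a single byte (0 or 1) and that the 16-bit field is little-endian; the function names are hypothetical and the first bool is left unnamed because the message does not say what it held.

```cpp
#include <cstdint>

// "Bit 9" counting from 1 is bit index 8 counting from 0.
constexpr uint16_t FLAG_USES_TMAP = 1u << 8;

// Old layout (assumed): two bools, one byte each, written back to back.
void encode_two_bools(bool first, bool uses_tmap, uint8_t out[2]) {
  out[0] = first ? 1 : 0;
  out[1] = uses_tmap ? 1 : 0;
}

// New interpretation: the same two bytes read as one little-endian
// 16-bit flags field. The second byte lands in bits 9-16.
uint16_t decode_flags(const uint8_t in[2]) {
  return static_cast<uint16_t>(in[0] | (in[1] << 8));
}
```

Under these assumptions, bytes written by the old encoder with uses_tmap == true decode to a flags value with FLAG_USES_TMAP set, so old and new encodings are byte-identical.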
In each case, we treat the whiteout as if we got an ENOENT.
We do not change the semantics of bool exists to avoid breaking lots of
potentially fragile code. We are only interested in changing the
user-visible behavior of the object, not the way it is internally stored
or managed.
This will likely be refined as we grow actual users for whiteouts in the
pool caching code.
Signed-off-by: Sage Weil <sage@inktank.com>
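The user-visible behavior described above can be sketched as follows. The struct and function are hypothetical simplifications, not the OSD code: the point is only that exists keeps its stored meaning while a whiteout is reported to the client as -ENOENT.

```cpp
#include <cerrno>

// Hypothetical, simplified view of per-object state.
struct ObjectState {
  bool exists;    // internal bookkeeping: semantics unchanged
  bool whiteout;  // object is stored, but marks a logical deletion
};

// Hypothetical check performed before serving a client operation.
// A whiteout is still stored (exists == true) but is reported to the
// user exactly like a missing object.
int check_visible(const ObjectState& o) {
  if (!o.exists || o.whiteout)
    return -ENOENT;
  return 0;
}
```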
This is more generic. We could also fold uses_tmap flag into here,
but the encoding change for that is non-trivial.
Signed-off-by: Sage Weil <sage@inktank.com>
C_OSD_SendMessageOnConn doesn't need to lock the pg.
Canceling it resulted in a leaked message.
Fixes: #6443
Signed-off-by: Samuel Just <sam.just@inktank.com>
While analyzing a log from Mike Dawson I saw a long stall while librbd's
objectcacher was starting lots (many hundreds) of IOs. Limit the amount of
time we spend doing this at a time to allow IO replies to be processed so
that the cache remains responsive.
I'm not sure this warrants a tunable (which we would need to add for both
libcephfs and librbd).
Signed-off-by: Sage Weil <sage@inktank.com>
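The idea of bounding the time spent starting IOs in one pass can be sketched as follows. This is not the objectcacher code: the function name, the queue of callables, and the budget parameter are assumptions made for illustration (the commit itself deliberately does not add a tunable).

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>

// Hypothetical sketch: start queued IOs until the queue is empty or the
// time budget for this pass is used up. Whatever is left stays queued so
// the caller can process replies before starting the next pass.
std::size_t start_ios_with_budget(
    std::deque<std::function<void()> >& pending,
    std::chrono::steady_clock::duration budget) {
  const std::chrono::steady_clock::time_point deadline =
      std::chrono::steady_clock::now() + budget;
  std::size_t started = 0;
  while (!pending.empty()) {
    pending.front()();  // start one IO
    pending.pop_front();
    ++started;
    // Check the clock after each start so at least one IO always proceeds.
    if (std::chrono::steady_clock::now() >= deadline)
      break;
  }
  return started;
}
```

Checking the deadline after each start, rather than before, guarantees forward progress even with a very small budget.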
Fixes: #6123
We don't want to report a failure to read the region map info
if it is simply not found, only if the read failed with some other
error. In any case it's just a warning.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The cancel_copy() method removes the entry from copy_ops. Move the
iterator forward before calling.
Fixes segfault when thrashing osds with a copy-from workload.
Signed-off-by: Sage Weil <sage@inktank.com>
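The iterator fix described above follows a common pattern, sketched here with a hypothetical tracker type. Only the copy_ops and cancel_copy names come from the commit message; the map's key and value types are placeholders.

```cpp
#include <map>

// Hypothetical stand-in for the object that tracks in-flight copy operations.
struct CopyTracker {
  std::map<int, int> copy_ops;  // op id -> placeholder state

  // Cancelling an op removes its entry, which invalidates any iterator
  // that still points at that entry.
  void cancel_copy(int id) { copy_ops.erase(id); }

  void cancel_all() {
    std::map<int, int>::iterator p = copy_ops.begin();
    while (p != copy_ops.end()) {
      int id = p->first;
      ++p;              // advance first, while p is still valid
      cancel_copy(id);  // erases the entry p used to point at
    }
  }
};
```

For std::map, erasing an element invalidates only iterators to that element, so advancing before the call keeps the loop iterator valid.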
Treat "need 1, got 0" as a special case, and change the message to
"missing required parameter <x>". Also, when failing for that reason,
print the command concise description and its helptext.
Fixes: #6384
Signed-off-by: Dan Mick <dan.mick@inktank.com>
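The special case described above can be sketched as follows. This is purely illustrative: the function name is hypothetical, the wording of the generic fallback message is an assumption, and the sketch does not reproduce the actual command validation code or the printing of the description and helptext.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: build the error text for an argument-count mismatch.
std::string describe_arg_count_error(const std::string& name,
                                     std::size_t need, std::size_t got) {
  // "need 1, got 0" means a single required parameter was left out,
  // so say that directly instead of reporting counts.
  if (need == 1 && got == 0)
    return "missing required parameter " + name;
  // Assumed wording for the generic case, for illustration only.
  return name + ": need " + std::to_string(need) + ", got " +
         std::to_string(got);
}
```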
Until now, the device could only be set to rw (the default) when
mapping. This patch only handles the user space side, because the
kernel part has already been completed.
Signed-off-by: Guangliang Zhao <guangliang@unitedstack.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Fixes: #6444
Backport: dumpling
If pool creation fails (e.g., due to -EEXIST) then we leak the
completion object. Earlier we couldn't just drop the reference, as
librados had already removed the internal completion object. This fix
drops the completion reference even if we got an error, which is now
possible.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
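The caller-side fix described above can be sketched as follows. None of this is the librados or radosgw API: the Completion type, submit_pool_create(), and create_pool() are hypothetical stand-ins, and the live counter exists only so the sketch can show that nothing leaks.

```cpp
#include <cerrno>

// Hypothetical completion object owned by the caller.
struct Completion {
  static int live;  // number of completions currently allocated
  Completion() { ++live; }
  ~Completion() { --live; }
  void release() { delete this; }  // drop the caller's reference
};
int Completion::live = 0;

// Hypothetical async submit: fails immediately if the pool already exists.
// It does not free anything the caller still holds.
int submit_pool_create(Completion* c, bool already_exists) {
  (void)c;
  return already_exists ? -EEXIST : 0;
}

int create_pool(bool already_exists) {
  Completion* c = new Completion;
  int r = submit_pool_create(c, already_exists);
  if (r < 0) {
    c->release();  // the fix: drop the reference on the error path too
    return r;
  }
  // ... a real caller would wait for the completion here ...
  c->release();
  return 0;
}
```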
Backport: dumpling
The pool async delete / create function used to delete the internal
completion object. However, the caller still holds the allocated
completion object, which it can't drop a reference to (as it would try
to deallocate the already freed internal object). This fix removes the
internal object deletion; a following commit will fix a related leak
(#6444) by having the application (radosgw) drop the reference even if
it got an error.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>