Previously, ScrubMap::objects was always sorted bitwise (nibblewise
before the comparator change was made). It didn't matter because we
always scrubbed whole hash values. Now, we need the objects in the
objectstore ordering because we may be missing objects at the tail of
the scanned range and need them to show up at the tail of the
ScrubMap::objects mapping. We don't need to do anything else to handle
the upgrade process since the actual objects *in* the map were
determined by the objectstore ordering.
Signed-off-by: Samuel Just <sjust@redhat.com>
Previously, we needed to scrub all objects, including clones, within
a single hash value mainly to ensure that _scrub had access to all
clones of a single object at the same time. Instead, just avoid
letting a head or snapdir object be a chunk boundary (see the comment
in the commit for details).
Signed-off-by: Samuel Just <sjust@redhat.com>
We were encoding the message with the sending client's
features, which makes no sense: we need to encode with
the recipient's features so that it can decode the
message.
The simplest way to fix this is to rip out the bizarre
msg_bl handling code and simply keep a decoded Message
reference, and encode it when we send.
We encode the encapsulated message with the intersection
of the target mon's features and the sending client's
features. This probably doesn't matter, but it's
conceivable that there is some feature-dependent
behavior in the message encode/decode that is important.
Fixes: http://tracker.ceph.com/issues/17365
Signed-off-by: Sage Weil <sage@redhat.com>
m_require_lock_on_read should be cleared while holding owner_lock.
For safety, also check that exclusive_lock is not null.
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
It's possible for the watch/notify message to be duplicated, resulting
in two concurrent block_requests() calls.
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Currently fadvise_flags is only used to decide whether a buffered
write is necessary, so there is no need to keep it in the
WriteContext: we have already pre-computed and stored the buffered
field instead.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
When we added this way back in d4f4fa0312,
we did not have our own buffer cache, and were relying
on the cache at the BlockDevice layer. In that case,
we would have the problem of a partial wal overwrite
followed by another partial write that needed to read
the rest of the chunk.
However, now we have our own cache, and any data we write
in the _do_write_small() wal path will go into the cache,
which means we will never read the old data off of
disk and need the old csum values.
Remove this now-unnecessary kludge!
Signed-off-by: Sage Weil <sage@redhat.com>
Write N bytes of garbage to the kv store on startup. With rocksdb,
this ensures that our log files are preallocated.
This option needs to match up with the rocksdb tunables so that it
is enough data to start recycling log files. For now, start with
128MB.
Signed-off-by: Sage Weil <sage@redhat.com>