It so happens that it's not safe to assume the monmap will be in an
empty state upon decoding.
It turns out the MonClient will reuse the MonMap instance when decoding
the map just received from the monitors. Should the monitors be on an
older version that does not support 'mon_info', this field will not be
decoded (after all, there's no field to decode from); but by this time,
the MonClient will already have built a monmap, which may have
populated 'mon_info' with temporary mon names from 'mon initial
members'.
Given the existing entries in 'mon_info', and the conflicting entries in
'mon_addr', we would end up asserting in 'sanitize_mons()'. This becomes
a non-issue if 'mon_info' is empty, as was unfortunately presumed.
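A minimal sketch of the failure mode, assuming a heavily simplified map type. `SimpleMonMap`, its two decode functions and `consistent()` are hypothetical stand-ins, not Ceph's MonMap API; `consistent()` only mimics the spirit of the check in 'sanitize_mons()'. It shows how decoding a legacy encoding into a reused instance keeps stale 'mon_info' entries unless the fields the old encoding omits are reset first.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, heavily simplified stand-in for MonMap. The real class,
// its decode() and sanitize_mons() live in Ceph and differ in detail.
struct SimpleMonMap {
  std::map<std::string, std::string> mon_addr;  // mon name -> address
  std::map<std::string, int> mon_info;          // mon name -> extra info

  // Decodes an encoding from an older monitor that carries no
  // 'mon_info' field. This version presumes the object starts out
  // empty, so any 'mon_info' entries already present survive.
  void decode_legacy_assuming_empty(
      const std::map<std::string, std::string>& addrs) {
    mon_addr = addrs;
  }

  // Safer variant: reset every field the old encoding may omit before
  // filling in what is actually on the wire.
  void decode_legacy_reset_first(
      const std::map<std::string, std::string>& addrs) {
    mon_info.clear();
    mon_addr = addrs;
  }

  // Simplified check in the spirit of sanitize_mons(): every 'mon_info'
  // entry must name a monitor that is also present in 'mon_addr'.
  bool consistent() const {
    for (const auto& entry : mon_info) {
      if (mon_addr.count(entry.first) == 0) {
        return false;
      }
    }
    return true;
  }
};
```

If an instance was first seeded from 'mon initial members' with a temporary name (the name "noname-a" in the test is purely illustrative), the first decode leaves that name in 'mon_info' with no matching 'mon_addr' entry, while the second decode yields a consistent map.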
Fixes: http://tracker.ceph.com/issues/18265
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
For CephFS, a 4M buffer can only encode about 5k dentries, which is
too small for directories with more entries than that.
Fixes: http://tracker.ceph.com/issues/18314
Signed-off-by: Yan, Zheng <zyan@redhat.com>
qa: fixed script to schedule rados and other suites with --subset option
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
89fd030bf9 switched them to show up
as reads to avoid logging them, but we still pipeline them with
reconnects. Thus, also force them to be rwordered.
Fixes: http://tracker.ceph.com/issues/18310
Signed-off-by: Samuel Just <sjust@redhat.com>
client: set metadata["root"] from mount method when it's called with …
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: John Spray <john.spray@redhat.com>
os/bluestore: include modified objects in flush list even if onode unchanged
Reviewed-by: Igor Fedotov <ifedotov@mirantis.com>
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
...before sending a tell command. Otherwise osd.2 might
start without osd.1, the I/O unblocks, and the tell fails
because osd.1 is still down.
Fixes: http://tracker.ceph.com/issues/18303
Signed-off-by: Sage Weil <sage@redhat.com>
OSD split transactions look something like:

    mkcoll new
    split old
    ...
    omap_rmkey_range old
    omap_setkeys old
    omap_setkeys new

The last part splits the log into two pieces. The
problem is that the omap_rmkey_range needs to wait for old
omap transactions to flush, and those are linked to the
old onode, but split clears the cache. The result is
that we don't wait, omap_rmkey_range leaves some recent pg
log keys behind, and on OSD restart we get an error because
the object doesn't belong to the (old) collection.
Fix this by preserving objects in the old collection and
only clearing out objects that are moving to the newly
split collections. The preserved objects include the
pgmeta object that we care about.
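A minimal sketch of the selective eviction, assuming a toy per-collection cache. `CachedOnode`, `CollectionCache` and both split functions are hypothetical, not BlueStore's structures. The sketch also assumes an object moves to the child collection when the low `bits` bits of its hash equal the child's match value; `has_pending_omap` merely stands in for the in-flight omap transactions that a later flush must still be able to wait for.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-collection onode cache; names and layout are
// illustrative only, not BlueStore's actual structures.
struct CachedOnode {
  uint32_t hash;          // object hash used for placement
  bool has_pending_omap;  // stands in for in-flight omap txns
};

struct CollectionCache {
  std::map<std::string, CachedOnode> onodes;

  // Old behavior: drop every cached onode on split, including objects
  // that stay put (such as pgmeta), losing their flush state.
  void split_clear_all() { onodes.clear(); }

  // Fixed behavior: evict only objects that move to the child
  // collection, i.e. whose low `bits` hash bits equal `child_match`.
  void split_keep_staying(uint32_t bits, uint32_t child_match) {
    const uint32_t mask = (bits >= 32) ? 0xffffffffu : ((1u << bits) - 1u);
    for (auto it = onodes.begin(); it != onodes.end();) {
      if ((it->second.hash & mask) == child_match) {
        it = onodes.erase(it);  // moving: drop from the parent's cache
      } else {
        ++it;                   // staying: keep it, flush state intact
      }
    }
  }
};
```

With a one-bit split and child match 1, objects with odd hashes leave the cache while even-hash objects, including a pgmeta entry given hash 0 here for illustration, keep their cached state.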
(Note that we are one step closer to preserving the
cache contents across the split, but not quite there
yet: at this point we don't have all of the destination
collections. A change in the ObjectStore interface is
probably needed to keep that from being extremely awkward.)
Signed-off-by: Sage Weil <sage@redhat.com>
/home/jenkins/workspace/ceph-master/src/include/str_list.h:99:10: warning: moving a local object in a return statement prevents copy elision [-Wpessimizing-move]
return std::move(str_vec);
^
/home/jenkins/workspace/ceph-master/src/include/str_list.h:99:10: note: remove std::move call here
return std::move(str_vec);
^~~~~~~~~~ ~
1 warning generated.
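A sketch of the pattern the compiler asks for, assuming a simple whitespace-splitting helper. `split_words` is hypothetical; the actual body of the str_list.h function is not shown here. It returns the local vector by name instead of wrapping it in `std::move`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper in the spirit of the one in str_list.h: split a
// string on whitespace and return the pieces by value.
std::vector<std::string> split_words(const std::string& s) {
  std::vector<std::string> str_vec;
  std::istringstream in(s);
  std::string word;
  while (in >> word) {
    str_vec.push_back(word);
  }
  // Return the local by name. `return std::move(str_vec);` would make
  // the return expression an xvalue rather than the name of a local,
  // which rules out NRVO (copy elision) and triggers -Wpessimizing-move.
  // A plain return lets the compiler elide the copy, or at worst move
  // the local implicitly.
  return str_vec;
}
```

The plain `return str_vec;` is never worse than the explicit move: since C++11, a returned local named in the return statement is treated as an rvalue when elision does not happen.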
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
We use the onode flush list so that we can ->flush() as
a barrier before doing any read/modify/write. For
example, omap_rmkeyrange will flush before reading which
keys to erase, to ensure that any previous inserts have
been applied to the db so that we see and remove them.
However, some omap operations don't update the onode
itself, which means write_onode() doesn't get called and
we aren't put on this list.
Add a note_modified_object() helper that can be called
instead of write_onode() for those cases. That way we
get on the list and flush() works as expected.
We could have resolved this by just putting ourselves on
the dirty onode list, but in practice every OSD op writes
omap keys to the pgmeta object, and there is no need to
touch the onode key in that case, so doing so would be a
big regression.
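A minimal sketch of the split between the two helpers, assuming a toy transaction context. `TxnSketch` and its `onodes` and `modified` sets are hypothetical stand-ins for BlueStore's transaction state; only the helper names `write_onode()` and `note_modified_object()` come from the text above. A full onode update both rewrites the onode key and joins the flush list, while an omap-only change joins the flush list alone.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified transaction context; the names are
// illustrative only and do not mirror BlueStore's real types.
struct TxnSketch {
  std::set<std::string> onodes;    // objects whose onode key is rewritten
  std::set<std::string> modified;  // objects a later flush() waits on

  // Full onode update: rewrite the onode key and join the flush list.
  void write_onode(const std::string& oid) {
    onodes.insert(oid);
    modified.insert(oid);
  }

  // omap-only change: join the flush list so flush() still acts as a
  // barrier, but skip the (costly, unnecessary) onode key rewrite.
  void note_modified_object(const std::string& oid) {
    modified.insert(oid);
  }
};
```

Under these assumptions, an op that only writes omap keys to the pgmeta object calls `note_modified_object()`, so a subsequent omap_rmkeyrange can still flush on it without the transaction re-encoding the pgmeta onode on every op.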
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ramesh Chander <Ramesh.Chander@sandisk.com>
zone block offset related test case
Signed-off-by: Ramesh Chander <Ramesh.Chander@sandisk.com>