If we have epoch X and find out we died as of epoch Y, we still want to
request X+1. Among other things, this fixes a 'stall' if Y happens to be
the most recent map published and no new maps are generated because we will
never get anything back from our subscription.
This makes this osdmap_subscribe() caller match every other caller by
passing in current epoch + 1.
Fixes: #8002
Signed-off-by: Sage Weil <sage@inktank.com>
This is useful only for debugging. The encoded contents of a message are
dumped to the log on message send. This is useful when valgrind is
triggering warnings about uninitialized memory in messages because the
call chain will indicate which message type is to blame, whereas the
usual writer thread context does not tell us any useful information.
Signed-off-by: Sage Weil <sage@inktank.com>
We should not respond to checks for map versions when we are in the
probing or electing states or else clients will get incorrect results when
they ask what the latest map version is.
Fixes: #7997
Signed-off-by: Sage Weil <sage@inktank.com>
This ensures that they get new maps before an op which requires them (that
they would then request from the monitor).
Signed-off-by: Greg Farnum <greg@inktank.com>
The hit_set transactions may include both a modify of the new hit_set and
deletion of an old one, spanning the backfill boundary, and we may end up
sending a backfill target a blank transaction that does not correctly
remove the old object. Later it will notice the stray object and
throw an assertion.
Fix this by skipping hit_set_persist() if any of the backfill targets are
still working on the very first hash value in the PG (which is where all
of the hit_set objects live). This is coarse but simple.
Another solution would be to send separate ops for the trim/deletion and
new hit_set update, but that is a bit more complex and a bit more
runtime overhead (twice the messages).
Fixes: #7983
Signed-off-by: Sage Weil <sage@inktank.com>
This reintroduces the same semantics that were in place in dumpling prior
to the refactoring of the cap/command matching code.
We haven't added this requirement to auth read-write operations as that
would have the potential to break a lot of well-configured keyrings once
the users upgraded, without any significant gain -- we assume that if
they have set 'rw' caps on a given entity, they are indeed expecting said
entity to be sort-of-privileged entities with regard to monitor access.
Fixes: #7919
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The following
./ceph osd pool create data-cache 8 8
./ceph osd tier add data data-cache
./ceph osd tier cache-mode data-cache writeback
./ceph osd tier set-overlay data data-cache
./rados -p data create foo
./rados -p data stat foo
results in
error stat-ing data/foo: No such file or directory
even though foo exists in the data-cache pool, as it should. STAT
checks for (exists && !is_whiteout()), but the whiteout flag isn't
cleared on CREATE as it is on WRITE and WRITEFULL. The problem is
that, for newly created 0-sized cache pool objects, CREATE handler in
do_osd_ops() doesn't get a chance to queue OP_TOUCH, and so the logic
in prepare_transaction() considers CREATE to be a read and therefore
doesn't clear whiteout. Fix it by allowing CREATE handler to queue
OP_TOUCH at all times, mimicking WRITE and WRITEFULL behaviour.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
When getting a REJECT from a backfill target, tell already GRANTed targets to
go back to RepNotRecovering state by sending a REJECT to them.
Fixes: #7922
Signed-off-by: David Zafman <david.zafman@inktank.com>
Fixes: #7978
We tried to move to the next placement rule, but we were already at the
last one, so we ended up looping forever.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
This will make the OSD randomly reject backfill reservation requests. This
exercises the failure code paths but does not break overall behavior
because the primary will back off and retry later.
This should help us reproduce #7922.
Signed-off-by: Sage Weil <sage@inktank.com>
Create a custom profile with ruleset-failure-domain=osd. (The default
ruleset-failure-domain=host won't do because this script assumes and
works only if all osds are on the same host.) While at it, set k and m
explicitly to avoid troubles in the future.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Unmap rbd images when stopping the whole cluster. Not doing so results
in images that cannot be unmapped until the same cluster is brought
back up. Issue a warning if we failed to unmap all images.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Command tracing here doesn't bring any value and simply pollutes the
terminal, as the script always runs to completion.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Set ruleset-failure-domain=osd so that
./ceph osd pool create ecpool 12 12 erasure
./rados --pool ecpool put SOMETHING /etc/group
works by default. When using a vstart cluster the default failure
domain (host) won't work because all OSDs are in "localhost".
Signed-off-by: Loic Dachary <loic@dachary.org>
If we shut down, clear out all of the lockdep state. This ensures that if
we start up again on another cct, we will not be confused by old type ids
and dependency state.
Possibly contributed to #7965.
Signed-off-by: Sage Weil <sage@inktank.com>
If we have already registered a cct for lockdep, do not accept another one.
We already check that the cct matches when we shut down. This we will run
for the life span of a single cct and no longer.
Fixes: #7965
Signed-off-by: Sage Weil <sage@inktank.com>
When we make an existing pool a tier, we start copying the snap metadata
from the base tier. That includes removed_snaps. In order for the OSD
to recognize that this value is changing for the first time, we need to
set snap_epoch, or else the OSD doesn't update it's in-memory PGPool
with removed snaps and we eventually hit an assertion failure because
PGPool::cached_remove_snaps is incorrect (e.g., empty).
Fix this by bumping snap_epoch when we add the new tier.
Fixes: #7915
Signed-off-by: Sage Weil <sage@inktank.com>