I revved this message and forgot to set the compat version correctly,
preventing post-change (e.g., bobtail) OSDs from talking to pre-change
(e.g., argonaut) monitors. This was in b64641c.
Signed-off-by: Sage Weil <sage@inktank.com>
Bending over backwards hasn't made Coverity happy. We'll just ignore it
there.
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c517fde269)
We should never consider old 'acks' from monitors in a new election. We
usually discard them, but we did not when an election expired, because
this code did not foresee monitors changing ranks between elections.
That cannot happen if we specify the monmap during the monitor's mkfs,
but it may happen when relying on 'mon initial peers'.
Keeping the stale acks triggered an assertion after fixing bug #3252.
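As a rough illustration only (this is not the patch itself), the rule can be sketched with a toy elector. ToyElector, its fields, and the epoch check are all hypothetical names and simplifications; the real election code tracks more state.

```cpp
#include <cassert>
#include <set>

// Toy elector, not the real Ceph class: names and fields are hypothetical.
struct ToyElector {
  unsigned epoch = 1;      // current election epoch
  std::set<int> acked_me;  // ranks that acked us during this epoch

  // Every path that begins a new election (including an expired one) goes
  // through here, so acks recorded under the old ranks are dropped.
  void start_election() {
    ++epoch;
    acked_me.clear();
  }

  // Returns true if the ack was counted, false if it was stale.
  bool handle_ack(int from_rank, unsigned ack_epoch) {
    assert(from_rank >= 0);  // ranks are non-negative in this sketch
    if (ack_epoch != epoch)
      return false;          // ack belongs to an earlier election
    acked_me.insert(from_rank);
    return true;
  }
};
```

The point of the sketch is that the ack set is reset in one place, rather than by each caller that can restart an election.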
Backport: argonaut
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Whenever we update the monmap we should bootstrap, in order to reset the
monitor's on-going activities and re-probe.
Not doing so contributed to bug #3252, during which we entered an infinite
election cycle. This can only happen when we rely on 'mon initial peers';
specifying a monmap during the monitor's mkfs should not trigger this
bug.
Fixes: #3252
Backport: argonaut
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
We cannot propose until they all recover.
Fixes: #3260
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Cast to (unsigned long) when checking for magic values, so
real ptrs don't get sign-extended. Avoids triggering
assert(inq == &local_queue) failure.
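A minimal sketch of the failure mode, assuming small integer sentinels are stored in the same slot as real pointers. The MAGIC_* names and both helper functions are hypothetical, and the example works on raw integer values so it does not depend on where the allocator places memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sentinel codes stored in the same slot as real pointers.
enum : std::uintptr_t { MAGIC_CONNECT = 1, MAGIC_ACCEPT, MAGIC_RESET, MAGIC_NUM };

// Buggy form: a signed comparison. An address with the top bit set turns
// negative, compares below MAGIC_NUM, and is mistaken for a sentinel.
inline bool is_magic_signed(std::uintptr_t raw) {
  assert(raw != 0);  // 0 is neither a sentinel nor a usable pointer here
  return static_cast<std::intptr_t>(raw) <
         static_cast<std::intptr_t>(MAGIC_NUM);
}

// Fixed form: an unsigned comparison matches only the small sentinels.
inline bool is_magic_unsigned(std::uintptr_t raw) {
  assert(raw != 0);
  return raw < MAGIC_NUM;
}
```

With a value such as 0xff...f000 the signed check reports a sentinel while the unsigned check does not, which is the misclassification the cast avoids.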
Fixes: #3251
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
In collection_list_range, pass an empty vector into
collection_list_partial. collection_list_partial stops
listing when the output vector exceeds the specified max.
If this happens before we hit the end of the range,
collection_list_range will spin forever.
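The loop shape can be sketched with a toy model. list_partial and list_range below are hypothetical stand-ins over a sorted vector of ints, not the real ObjectStore calls, and the stop condition is simplified to "output has reached max".

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Toy paged listing: appends items >= start to *out, stops once *out holds
// max entries, and reports where to resume via *next (INT_MAX at the end).
void list_partial(const std::vector<int>& sorted, int start, std::size_t max,
                  std::vector<int>* out, int* next) {
  auto it = std::lower_bound(sorted.begin(), sorted.end(), start);
  while (it != sorted.end() && out->size() < max) {
    out->push_back(*it);
    ++it;
  }
  *next = (it == sorted.end()) ? INT_MAX : *it;
}

// Collects all items in [start, end). The page vector is created empty on
// every iteration; if the accumulated result were passed instead, it would
// already be full after the first call and no later call could advance.
std::vector<int> list_range(const std::vector<int>& sorted, int start, int end,
                            std::size_t max_per_call) {
  assert(max_per_call > 0);  // a zero page size could never make progress
  std::vector<int> result;
  int next = start;
  while (next < end) {
    std::vector<int> page;  // fresh and empty for each call
    list_partial(sorted, next, max_per_call, &page, &next);
    for (int v : page) {
      if (v >= end)
        return result;      // walked past the end of the range
      result.push_back(v);
    }
  }
  return result;
}
```

For example, listing [2, 9) from {1, 2, 3, 5, 8, 13} two entries at a time takes two calls and yields 2, 3, 5, 8.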
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Mike Ryan <mike.ryan@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Check that allow_all() returns false when 'allow *' is not specified.
This would have caught #3228.
Add tests for the output operators as well.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
OSD_CAP_ANY is not a mask. Treating it as one made any allowance
equivalent to 'allow *'.
Fixes: #3228
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
OSD_CAP_ANY is not a flag, but a value (0xff) that will always
be true when treated as a mask with a non-zero rwxa_t.
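A small sketch of the difference. The 0xff value comes from the text above; the R/W/X bit values and the two helper names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint8_t rwxa_t;

// R/W/X bit values are assumed for illustration; 0xff for ANY is the value
// given in the commit message.
static const rwxa_t OSD_CAP_R = 0x01;
static const rwxa_t OSD_CAP_W = 0x02;
static const rwxa_t OSD_CAP_X = 0x04;
static const rwxa_t OSD_CAP_ANY = 0xff;

// Buggy: ANY used as a mask. Any non-zero allowance shares a bit with 0xff.
inline bool allow_all_as_mask(rwxa_t allow) {
  return (allow & OSD_CAP_ANY) != 0;
}

// Fixed: ANY compared as a value. Only the exact 'allow *' value matches.
inline bool allow_all_as_value(rwxa_t allow) {
  return allow == OSD_CAP_ANY;
}

// Usage: a read-only cap passes the mask check but not the value check.
inline void check_caps() {
  assert(allow_all_as_mask(OSD_CAP_R));
  assert(!allow_all_as_value(OSD_CAP_R));
  assert(allow_all_as_value(OSD_CAP_ANY));
}
```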
Don't duplicate the rwxa_t output operator in the OSDCapSpec output
operator, just use it.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This moves the shutdown of the messenger outside of the client
to be able to handle error cases more appropriately.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Add perf counters tracking the number of inbound pushes along with the
amount of data in each request.
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Previously, we considered backfill_pos degraded in order to delay
ops since a write to backfill_pos could generate a snap before
backfill_pos, and we assume that (0, backfill_pos) is fully
backfilled. This is a problem since it's possible that
backfill_pos is a valid object, but not one that currently exists.
For example, it might have been deleted since last_backfill was
last changed. Instead, we will explicitly delay ops on
backfill_pos in waiting_for_backfill_pos.
This error resulted in #2691 since wait_for_degraded_object also
attempts to recover the object. At this point, the primary would
attempt to recover the object, find that it isn't there, and put
it in the missing set with need=0,0. Eventually, recover_primary
attempts to recover that object, finds that it has been deleted
in the log, and asserts.
Signed-off-by: Samuel Just <sam.just@inktank.com>
When we encounter a divergent log entry, we put the
object into the missing set at the prior_version
for the divergent event. Unfortunately, the event
at prior_version might have been trimmed, leaving
the missing set with an item whose need is prior to
log_tail. Thus, last_complete also ends up being
prior to log_tail.
Caused #3208.
There is another bug related to this one: #3213.
Signed-off-by: Samuel Just <sam.just@inktank.com>