all_participants is only updated on add_interval and
cannot be rebuilt from the intervals vector alone, since
we only keep the smallest subset of peers that we need
to probe. It also wasn't being encoded, decoded, or
dumped.
Signed-off-by: Sage Weil <sage@redhat.com>
If we are crossing an osdmap gap, clear our old past
intervals since the information is stale: if the maps
were trimmed, someone else took the pg clean after
everything we know about.
This avoids an assert later in the past intervals check.
Signed-off-by: Sage Weil <sage@redhat.com>
We have a last_epoch_started value in pg_info_t; store
the corresponding last_interval_started value alongside
it.
Signed-off-by: Sage Weil <sage@redhat.com>
When we get a pg create from the mon we don't get
PastIntervals with it; generate it from scratch as
needed.
Signed-off-by: Sage Weil <sage@redhat.com>
These will track last_epoch_{started,clean} but match
the first epoch in the interval instead of the epoch when
the event happened. We didn't end up needing this now, but
I suspect it will be useful in the future.
Signed-off-by: Sage Weil <sage@redhat.com>
The bounds are based on last_epoch_clean, which can
happen at any point during an interval (usually not the
beginning!). Instead of trying to ensure that the
PastIntervals include the oldest interval, just ensure
that they go at least as far back as last_epoch_clean.
This means that we might have *more* intervals, but given
that all we ever do is *clear* past_intervals when we
go clean, I don't think there is much value in trying
to assert more.
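
A minimal sketch of the relaxed check described above. The
names IntervalBounds and covers_since_clean are hypothetical
and are not taken from the tree; the real check may look
different.

```cpp
#include <cassert>
#include <cstdint>

using epoch_t = std::uint32_t;

// Hypothetical stand-in for the epoch range covered by PastIntervals.
struct IntervalBounds {
  epoch_t first;  // oldest epoch covered
  epoch_t last;   // newest epoch covered
};

// Relaxed check: coverage only has to reach back as far as
// last_epoch_clean. Older, extra intervals are tolerated.
inline bool covers_since_clean(const IntervalBounds& b,
                               epoch_t last_epoch_clean) {
  return b.first <= last_epoch_clean;
}
```

With this shape, bounds starting at epoch 5 satisfy a
last_epoch_clean of 12 even though they go back further than
strictly needed.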
Signed-off-by: Sage Weil <sage@redhat.com>
We may have no intervals but still be non-empty (have a first and
last), because during that period there were no osds. This easily
happens when a pool is created before osds are up.
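
Roughly, the state in question looks like the sketch below.
PastIntervalsSketch and its members are hypothetical
simplifications, and deciding emptiness from the bounds alone
is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using epoch_t = std::uint32_t;

// Hypothetical, simplified model; not the real PastIntervals class.
struct IntervalSketch {
  epoch_t first;
  epoch_t last;
};

struct PastIntervalsSketch {
  epoch_t first = 0;  // start of the covered epoch range
  epoch_t last = 0;   // end of the covered epoch range
  std::vector<IntervalSketch> intervals;  // only intervals that had osds

  // Emptiness is decided by the bounds, not by the intervals vector,
  // so a range with no osds is still non-empty.
  bool empty() const { return first == 0 && last == 0; }
};
```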
Signed-off-by: Sage Weil <sage@redhat.com>
Because we are no longer explicitly tracking all past intervals,
I think it is a good idea to log these more aggressively. This will
give us potentially vital information when debugging peering problems
(says someone who just debugged a peering problem in which the past
interval information provided a vital clue).
Signed-off-by: Sage Weil <sage@redhat.com>
Removes the asserts relating to past_intervals state.
They don't really make sense with the new representation;
we'll add new tests for that next.
Signed-off-by: Samuel Just <sjust@redhat.com>
c7d92d1d3f introduced this back when
the acting set could contain incomplete peers during backfill. That
hasn't been true since dumpling. Now, any interval where the acting
set contains an incomplete peer cannot possibly go active. Thus, it
can't change last_epoch_started or history.last_epoch_started. Thus,
even though choose_acting omits incomplete peers, the answer can't
change.
Signed-off-by: Samuel Just <sjust@redhat.com>