In the case of a replicated pool, the pg will transition to "peered"
rather than "active", where it can perform backfill to bring itself up
to min_size peers. To that end, most operations now require
is_peered rather than is_active (OSDOps being the primary exception).
Also, rather than using the query_epoch on the activation message as the
activation epoch (for last_epoch_started) on the replica, we instead
use the last_epoch_started in the info sent by the primary. This
allows the primary to not advance last_epoch_started past the last
known actual activation. This will prevent later peering epochs from
requiring the last_update from a peered epoch to go active (that
last_update might be divergent).
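
A minimal sketch of the rule described above, not the actual PG code. All
type and function names here (PGInfoSketch, state_after_peering, and so on)
are hypothetical; only min_size, query_epoch and last_epoch_started come
from the text.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified model; these names do not mirror the real classes.
using epoch_t = std::uint32_t;

enum class PGState { Peered, Active };

struct PGInfoSketch {
  epoch_t last_epoch_started = 0;
};

// With at least min_size peers the pg can go active; otherwise it only peers.
inline PGState state_after_peering(std::size_t acting_size,
                                   std::size_t min_size) {
  return acting_size >= min_size ? PGState::Active : PGState::Peered;
}

// Primary side: advance last_epoch_started only on an actual activation.
inline void primary_finish_peering(PGInfoSketch& info, PGState state,
                                   epoch_t current_epoch) {
  if (state == PGState::Active)
    info.last_epoch_started = current_epoch;
}

// Replica side: adopt the value carried in the primary's info instead of
// the query_epoch on the activation message.
inline void replica_on_activate(PGInfoSketch& local,
                                const PGInfoSketch& from_primary,
                                epoch_t /*query_epoch, intentionally unused*/) {
  local.last_epoch_started = from_primary.last_epoch_started;
}
```

In this model a pg that only reaches "peered" keeps its old
last_epoch_started, and the replica inherits that same value.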
Fixes: #7862
Signed-off-by: Samuel Just <sam.just@inktank.com>
Otherwise, we might later go peered (not active) and not distribute
a non-0 last_epoch_started. This should be safe as the log recipient
will have a last_update reflecting that interval as well.
Signed-off-by: Samuel Just <sjust@redhat.com>
waiting_for_peered now holds ops until peering completes (activation,
not necessarily state active). waiting_for_active now holds
specifically MOSDOp blocked on:
- scrub
- replay
- state active
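
The queueing rule above can be sketched roughly as follows. This is an
illustration under assumptions, not the OSD implementation: OpSketch,
PGQueuesSketch, the boolean flags and the "ready" queue are all
hypothetical; only waiting_for_peered and waiting_for_active are names
from the text.

```cpp
#include <deque>
#include <string>

// Hypothetical op record; is_osd_op stands in for "this is an MOSDOp".
struct OpSketch {
  std::string name;
  bool is_osd_op;
};

struct PGQueuesSketch {
  bool peered = false;          // peering has completed (activation)
  bool active = false;          // pg is in state active
  bool scrub_blocked = false;   // op would conflict with scrub
  bool replay_blocked = false;  // pg is still in replay

  std::deque<OpSketch> waiting_for_peered;
  std::deque<OpSketch> waiting_for_active;
  std::deque<OpSketch> ready;   // hypothetical: ops that may proceed now

  void enqueue(const OpSketch& op) {
    if (!peered) {
      // Every op waits until peering completes.
      waiting_for_peered.push_back(op);
    } else if (op.is_osd_op &&
               (!active || scrub_blocked || replay_blocked)) {
      // Only MOSDOps wait on scrub, replay, or state active.
      waiting_for_active.push_back(op);
    } else {
      ready.push_back(op);
    }
  }
};
```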
Signed-off-by: Samuel Just <sam.just@inktank.com>
crush: new straw2 bucket
Reviewed-by: Joao Eduardo Luis <joao@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
As far as I can tell, the posix_fadvise() distinction between WONTNEED
(POSIX_FADV_DONTNEED) and NOREUSE is subtle: one says I won't access the
data, and the other says I will access it one more time and then not
access it. That is, the distinction is about time. This thread seems to
confirm this interpretation:
https://lkml.org/lkml/2011/6/27/44
Since we are attaching hints to the IO operations themselves, this
distinction doesn't make much sense for us. (Backends should be careful
about which hint they use; or rather, they should use WONTNEED *after*
doing the IO since NOREUSE is presently a no-op in Linux.)
However, we want to make a totally different distinction:
 WONTNEED - nobody will access this -> drop it from the cache
 NOCACHE  - *I* won't access this again -> don't let me affect your
            caching decisions or the working set you're maintaining
            for other clients.
The NOCACHE name is made-up and distinct from NOREUSE only so that it is
different from POSIX and doesn't introduce confusion for people familiar
with the POSIX meaning. Perhaps a more accurate name would be IWONTNEED
but that is only one character apart and too error-prone IMO.
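
A rough sketch of how a backend might act on the WONTNEED hint after the
IO, as suggested above. The flag names and values (HINT_WONTNEED,
HINT_NOCACHE) and both helper functions are hypothetical and are not the
actual op flags; pwrite() and posix_fadvise() are the standard POSIX
calls, and a Linux-like system is assumed.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// Hypothetical hint bits, for illustration only.
enum : std::uint32_t {
  HINT_WONTNEED = 1u << 0,  // nobody will access this -> drop from cache
  HINT_NOCACHE  = 1u << 1,  // *I* won't access this again
};

// Returns the posix_fadvise() advice to issue once the IO has completed,
// or -1 if no advice should be issued.
inline int advice_after_io(std::uint32_t hints) {
  if (hints & HINT_WONTNEED)
    return POSIX_FADV_DONTNEED;
  // NOCACHE is deliberately left unmapped in this sketch: dropping the
  // pages could hurt other clients that still use them, and the text
  // above does not say how a backend should honor it.
  return -1;
}

// Write first, then advise: the hint describes the data just written.
inline ssize_t write_with_hints(int fd, const void* buf, std::size_t len,
                                off_t off, std::uint32_t hints) {
  ssize_t r = ::pwrite(fd, buf, len, off);
  if (r < 0)
    return r;
  int advice = advice_after_io(hints);
  if (advice >= 0) {
    // Best effort; a failed hint does not fail the write.
    (void)::posix_fadvise(fd, off, static_cast<off_t>(r), advice);
  }
  return r;
}
```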
Signed-off-by: Sage Weil <sage@redhat.com>
We convert old entries anyway, so this just complicates everything. The
only use that was kept is the one needed for the conversion function.
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
When reading a bucket replicalog entry, if it doesn't exist, fall back to
the old key and convert it to the new one. When updating an entry, do the
same if the entry does not exist.
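
A minimal sketch of that fallback, using an in-memory map in place of the
real store. KVStoreSketch, read_with_fallback and append_to_entry are
hypothetical names, and removing the old key during conversion is an
assumption of this sketch.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the replica log's key/value storage.
using KVStoreSketch = std::map<std::string, std::string>;

// Look up new_key; if it is missing, fall back to old_key and convert the
// entry so that later lookups find it under new_key.
inline bool read_with_fallback(KVStoreSketch& store,
                               const std::string& new_key,
                               const std::string& old_key,
                               std::string* out) {
  auto it = store.find(new_key);
  if (it != store.end()) {
    *out = it->second;
    return true;
  }
  it = store.find(old_key);
  if (it == store.end())
    return false;               // no entry under either key
  *out = it->second;
  store.erase(it);              // assumption: conversion drops the old key
  store[new_key] = *out;
  return true;
}

// Update path: convert any old entry first, then apply the change.
inline void append_to_entry(KVStoreSketch& store,
                            const std::string& new_key,
                            const std::string& old_key,
                            const std::string& suffix) {
  std::string current;
  read_with_fallback(store, new_key, old_key, &current);
  store[new_key] = current + suffix;
}
```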
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
The RBD_FEATURES environment variable was not being exported to
the Python and C++ integration tests. This resulted in the same
test cases being run multiple times instead of testing different
RBD features.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Fixes: #8251
Previously we were indexing the replica log only by bucket name, even
though we were provided with the bucket instance id. Fix that, and also
add an option to revert to the old behavior. For radosgw-admin, pass
--replicalog-index-by-instance=false. For the replica log REST API, use
the new index-by-instance HTTP param (defaults to true).
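
The key selection can be sketched as below. The function name and the
"name:instance" key layout are assumptions made for illustration; the
actual key format is not given here.

```cpp
#include <string>

// Hypothetical helper: choose the replica log index key for a bucket.
// index_by_instance defaults to true, matching the new default behavior.
inline std::string replica_log_index_key(const std::string& bucket_name,
                                         const std::string& instance_id,
                                         bool index_by_instance = true) {
  if (!index_by_instance)
    return bucket_name;                    // old behavior: name only
  return bucket_name + ":" + instance_id;  // assumed separator
}
```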
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>