We don't want to seal HitSets just because we're writing a
snapshot to disk; sealing potentially shrinks the in-memory HitSet,
which we want to keep adding stuff to!
Signed-off-by: Greg Farnum <greg@inktank.com>
Any time we persist a hit_set object, take the opportunity to remove any
old ones that we don't want any more.
Note that this means if the admin decreases the number of objects to track,
we won't remove the excess ones until the next time we persist something. We
also don't clean up if the HitSet tracking is disabled entirely.
Signed-off-by: Sage Weil <sage@inktank.com>
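A minimal sketch of the trim-on-persist behavior described above, assuming a simple oldest-first history of archived HitSet objects. ArchivedHitSet, HitSetArchive, persist() and hit_set_count are hypothetical names for illustration, not Ceph's actual types. Because trimming only happens inside persist(), lowering hit_set_count has no effect until the next persist.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical record for one archived HitSet object (not Ceph's real type).
struct ArchivedHitSet {
  uint64_t begin;   // start of the interval this HitSet covered
  uint64_t end;     // end of the interval
  std::string oid;  // name of the on-disk hit_set object
};

// Hypothetical archive of persisted HitSets, oldest first.
class HitSetArchive {
 public:
  // Record a newly persisted HitSet and, in the same step, trim the oldest
  // entries so at most hit_set_count remain. Returns the object names the
  // caller would delete. Callers only invoke this while tracking is enabled,
  // so nothing is cleaned up when tracking is turned off entirely.
  std::deque<std::string> persist(const ArchivedHitSet& hs, size_t hit_set_count) {
    history_.push_back(hs);
    std::deque<std::string> removed;
    while (history_.size() > hit_set_count) {
      removed.push_back(history_.front().oid);
      history_.pop_front();
    }
    return removed;
  }

  size_t size() const { return history_.size(); }

 private:
  std::deque<ArchivedHitSet> history_;
};
```

Usage: with three objects archived under a limit of 3, lowering the limit to 2 removes nothing until the next call to persist(), which then drops the two oldest entries at once.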
This lets us put PGLS in a compound operation. Nothing does that yet, but
this would allow it.
Despite appearances, this is not a protocol change and does not require
a feature bit for clients: using the osd_ops vector mechanism stores all
the data in the same places as before; it just fills in some of the
already-decoded-but-empty data structures in the MOSDOpReply header.
<Greg note:> We may need a feature bit to let clients know they can send
compound PG ops to OSDs, though? Or perhaps it can be covered by
hitset op support.
Signed-off-by: Sage Weil <sage@inktank.com>
Add pool properties to control what type of HitSet we want to use, along with
some (mostly generic) parameters.
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Track metadata about the currently accumulating HitSet as well as
previously archived ones in the pg_info_t. This will not scale well for
extremely long histories, but does let us avoid explicitly sharing this
metadata during recovery or other normal update activity.
Signed-off-by: Sage Weil <sage@inktank.com>
Track a set of hash values, either explicitly or using a bloom_filter. Hide
the implementation and allow us to transparently encode and decode.
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
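A minimal sketch of hiding the HitSet representation behind one interface, assuming only insert/contains operations on 32-bit hash values. The class names are hypothetical and do not match Ceph's HitSet classes; the bloom filter is a toy using double hashing with a hypothetical mixing constant, and encoding/decoding is left out to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Hypothetical interface: callers never see which representation is used.
class HitSetImpl {
 public:
  virtual ~HitSetImpl() = default;
  virtual void insert(uint32_t hash) = 0;
  virtual bool contains(uint32_t hash) const = 0;  // may report false positives
};

// Exact representation: remembers every hash value explicitly.
class ExplicitHashHitSet : public HitSetImpl {
 public:
  void insert(uint32_t hash) override { hashes_.insert(hash); }
  bool contains(uint32_t hash) const override { return hashes_.count(hash) > 0; }

 private:
  std::set<uint32_t> hashes_;
};

// Approximate representation: a small bloom filter with k probes per hash.
// No false negatives; false positives are possible.
class BloomHitSet : public HitSetImpl {
 public:
  BloomHitSet(size_t bits, unsigned k) : bits_(bits, false), k_(k) {}
  void insert(uint32_t hash) override {
    for (unsigned i = 0; i < k_; ++i) bits_[probe(hash, i)] = true;
  }
  bool contains(uint32_t hash) const override {
    for (unsigned i = 0; i < k_; ++i)
      if (!bits_[probe(hash, i)]) return false;
    return true;
  }

 private:
  // Double hashing: probe i is h1 + i * h2 (mod number of bits).
  size_t probe(uint32_t hash, unsigned i) const {
    uint32_t h1 = hash;
    uint32_t h2 = (hash * 0x9e3779b1u) | 1u;  // hypothetical mixing, forced odd
    return (h1 + i * h2) % bits_.size();
  }
  std::vector<bool> bits_;
  unsigned k_;
};

// Wrapper that hides the chosen implementation from callers.
class HitSet {
 public:
  explicit HitSet(std::unique_ptr<HitSetImpl> impl) : impl_(std::move(impl)) {}
  void insert(uint32_t hash) { impl_->insert(hash); }
  bool contains(uint32_t hash) const { return impl_->contains(hash); }

 private:
  std::unique_ptr<HitSetImpl> impl_;
};
```

The trade-off is the usual one: the explicit set is exact but grows with the number of distinct objects, while the bloom filter has a fixed size at the cost of occasional false positives.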
This makes it easier to create repops correctly, and should help
prevent bugs like the one we remove here in process_copy_op (we were
serializing on the wrong object!)
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
There are times when users may need to make sure the client has the
latest osdmap, for example after sending a mon command modifying
pool properties.
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
The hashing is dependent on pool properties; capture (more of) it in a
method instead of having it in OSDMap.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
The hash value, if provided, becomes the ps (placement seed) portion of the
pg_t, skipping any hashing of the object name (or locator key).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Instead of hashing the object name or key, we allow the hash position to be
provided explicitly.
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
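A minimal sketch of how an explicitly provided hash could stand in for the hashed object name as the raw placement seed, which is then folded into [0, pg_num). stable_mod() follows the logic of Ceph's ceph_stable_mod(); hash_name() uses std::hash purely as a hypothetical stand-in for the pool's configured object hash (Ceph defaults to rjenkins), and raw_ps()/actual_ps() are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Same logic as Ceph's ceph_stable_mod(): bmask is the smallest value of the
// form 2^n - 1 that is >= pg_num - 1.
static uint32_t stable_mod(uint32_t x, uint32_t b, uint32_t bmask) {
  if ((x & bmask) < b) return x & bmask;
  return x & (bmask >> 1);
}

// Hypothetical stand-in for the pool's object-name hash.
static uint32_t hash_name(const std::string& name) {
  return static_cast<uint32_t>(std::hash<std::string>{}(name));
}

// If an explicit hash is supplied it becomes the raw seed and the name (or
// locator key) is never hashed; otherwise hash the name as usual.
uint32_t raw_ps(const std::string& name, std::optional<uint32_t> explicit_hash) {
  return explicit_hash ? *explicit_hash : hash_name(name);
}

// Map a raw seed to an actual PG in a pool with pg_num PGs.
uint32_t actual_ps(uint32_t raw, uint32_t pg_num, uint32_t pg_num_mask) {
  return stable_mod(raw, pg_num, pg_num_mask);
}
```

For example, with pg_num = 12 the mask is 15; a raw seed of 13 falls outside the pool, so it folds to 13 & 7 = 5, while a raw seed of 11 maps to itself.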
This way we can choose the compatv based on whether we've actually
encoded any new information.
Signed-off-by: Greg Farnum <greg@inktank.com>
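A minimal sketch of choosing compat_v based on whether new data was actually encoded, assuming a hypothetical struct whose v2 adds one optional field. The byte layout here is invented for illustration; Ceph's real encoding macros also record a length so old decoders can skip unknown trailing data, which this sketch models by having the old decoder simply ignore trailing bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical versioned struct: v1 carried only `a`; v2 may also carry `b`.
struct Info {
  uint32_t a = 0;
  std::optional<uint32_t> b;  // field added in v2
};

static void put_u32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

static uint32_t get_u32(const std::vector<uint8_t>& in, size_t pos) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(in[pos + i]) << (8 * i);
  return v;
}

// Layout: [struct_v][compat_v][a][has_b][b?]. We always write the newest
// struct_v, but only demand compat_v = 2 when v2-only data is present.
std::vector<uint8_t> encode(const Info& info) {
  std::vector<uint8_t> out;
  out.push_back(2);               // struct_v: the version we wrote
  out.push_back(info.b ? 2 : 1);  // compat_v: oldest decoder that can read it
  put_u32(out, info.a);
  out.push_back(info.b ? 1 : 0);  // v2 presence flag for b
  if (info.b) put_u32(out, *info.b);
  return out;
}

// Models an old v1-only decoder: refuses anything with compat_v > 1,
// otherwise reads `a` and ignores whatever follows.
bool decode_v1(const std::vector<uint8_t>& in, uint32_t& a) {
  if (in.size() < 6 || in[1] > 1) return false;
  a = get_u32(in, 2);
  return true;
}
```

The benefit is that old decoders keep working for every encoding that doesn't actually use the new field, instead of being locked out just because the encoder is newer.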
Return to caller at the end of each PG. This allows the caller to look at
the [pg_]hash_position and get something meaningful.
If there are no objects in the PG, we skip it so that every callback has
*some* data (unless the pool is totally empty!). So the real difference
here is that we don't move on to the next PG just to reach max_entries.
This gives the client some data sooner, but may mean more callbacks into
client code.
Signed-off-by: Sage Weil <sage@inktank.com>
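A minimal sketch of the per-PG listing behavior described above, assuming PGs are modeled as in-memory vectors of object names. Cursor and list_chunk() are hypothetical names, not the Objecter or PGLS code. Each call returns at most max_entries objects but never crosses a PG boundary, and empty PGs are skipped so a call only comes back empty once the pool is exhausted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical listing cursor: current PG and offset within it.
struct Cursor {
  size_t pg = 0;
  size_t pos = 0;
};

// Return up to max_entries objects from the current PG only. When the PG is
// finished, advance the cursor to the start of the next PG so the caller's
// position is meaningful between calls.
std::vector<std::string> list_chunk(const std::vector<std::vector<std::string>>& pgs,
                                    Cursor& c, size_t max_entries) {
  std::vector<std::string> out;
  // Skip exhausted or empty PGs so every non-final call has some data.
  while (c.pg < pgs.size() && c.pos >= pgs[c.pg].size()) {
    ++c.pg;
    c.pos = 0;
  }
  if (c.pg >= pgs.size()) return out;  // pool fully listed (or empty)

  const std::vector<std::string>& objs = pgs[c.pg];
  while (out.size() < max_entries && c.pos < objs.size()) out.push_back(objs[c.pos++]);

  if (c.pos >= objs.size()) {  // end of this PG: return instead of moving on
    ++c.pg;
    c.pos = 0;
  }
  return out;
}
```

With PGs {a, b}, {}, {c, d, e} and max_entries = 10, the first call returns only {a, b} rather than continuing into later PGs to fill the quota; the second skips the empty PG and returns {c, d, e}.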
The pgid field is used to store the pg the op mapped to. We were just
setting it directly for PGLS. Instead, fill in a new base_pgid, and copy that
to pgid in recalc_op_target(), the same way we do when we map an object
name to a PG.
In particular, we take this opportunity to map a raw pgid to an actual
pgid. This means the base_pgid could come from a raw hash value (although
it doesn't, yet).
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
We don't use preferred placements any more, so this will
make it easier to start dropping references to it in new code.
Signed-off-by: Sage Weil <sage@inktank.com>
Fixes: #6678
We don't want to allow regular users to write to secondary zones,
otherwise we'd end up with data inconsistencies.
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>