Switch to use pointers for the out parameters instead of references.
These functions are still just pointing at the front of the generated
lists for the "primary" params, but now that all their callers respect
these outputs we can add programmatic leader assignment with just these
two functions.
Signed-off-by: Greg Farnum <greg@inktank.com>
And use pointers instead of references for out params.
Now pg_to_up_acting_osds and pg_to_acting_osds can plug in to this slightly
more real implementation, instead of making up their own. (We are still
just using the first member anyway, but we're about to plug it into
the bottom layer of functions.)
Signed-off-by: Greg Farnum <greg@inktank.com>
So that this works with future CRUSH changes, we copy the map and clear
out the primary_temp, then compare its output with the real map's output. If
they match, remove the primary_temp from the real map.
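A minimal sketch of the comparison described above, assuming a toy map in
which the primary is the primary_temp entry if one exists and otherwise
the first acting member. ToyMap, primary_of() and
remove_redundant_primary_temps() are hypothetical names for illustration,
not the OSDMap API.

```cpp
#include <map>
#include <vector>

// Hypothetical stand-in for the real map; only what the sketch needs.
struct ToyMap {
  std::map<int, std::vector<int>> acting;  // pg -> acting set (e.g. from CRUSH)
  std::map<int, int> primary_temp;         // pg -> overriding primary

  // primary_temp wins if present; otherwise the first acting member, or -1.
  int primary_of(int pg) const {
    auto t = primary_temp.find(pg);
    if (t != primary_temp.end())
      return t->second;
    auto a = acting.find(pg);
    if (a == acting.end() || a->second.empty())
      return -1;
    return a->second.front();
  }
};

// Copy the map, clear primary_temp in the copy, and drop every real
// primary_temp entry whose result matches the copy's. Returns the number
// of entries removed.
int remove_redundant_primary_temps(ToyMap& real) {
  ToyMap scratch = real;
  scratch.primary_temp.clear();
  int removed = 0;
  for (auto it = real.primary_temp.begin(); it != real.primary_temp.end();) {
    if (scratch.primary_of(it->first) == real.primary_of(it->first)) {
      it = real.primary_temp.erase(it);
      ++removed;
    } else {
      ++it;
    }
  }
  return removed;
}
```

Comparing outputs rather than re-deriving the primary by hand means the
check does not need to know how the map picks a primary.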
Signed-off-by: Greg Farnum <greg@inktank.com>
Bring our OSDMap encoding into the modern Ceph world! :) This is
fairly straightforward, but has a few rough edges:
Previously we had a "struct_v" which went at the beginning of the
OSDMap encoding, and then later on an ev "extended version" which
was used to store the more-frequently-changed OSDMap pieces. There
was no size information stored explicitly to let clients skip this,
but osd maps were always encoded into their own bufferlist before
being sent to clients, which had the same effect.
We now use the modern ENCODE_START three times:
1) for the overall OSDMap encoding,
2) for the client-usable portion of the map,
3) for the "extended" portion of the map
This will let us independently rev everything, which may come in
useful if we want to (for instance) add a "monitor" portion to the
map that the OSDs don't care about. It also makes adding new
client information a lot easier since older clients will still
be able to decode the map as a whole.
We may want to merge this OSDMAP_ENC feature with one of the others
we are creating during this cycle, since they're all very closely
related. That will also let us protect more naturally against old
clients getting a map they need to understand but can't (because
we only need the new map features-to-come when used with erasure-encoded
PGs, etc).
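To illustrate the three nested sections described above, here is a toy
sketch. It does not use the real ENCODE_START macro; it assumes a
simplified layout of one version byte, one compat byte and a 32-bit
little-endian length in front of each payload. All helper names
(wrap_section, read_section, skip_section, encode_map,
decode_client_part) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

using Buf = std::string;

void put_u8(Buf& b, std::uint8_t v) { b.push_back(static_cast<char>(v)); }
void put_u32(Buf& b, std::uint32_t v) {
  for (int i = 0; i < 4; ++i)
    b.push_back(static_cast<char>((v >> (8 * i)) & 0xff));
}
std::uint8_t get_u8(const Buf& b, std::size_t& off) {
  if (off >= b.size()) throw std::out_of_range("truncated buffer");
  return static_cast<std::uint8_t>(b[off++]);
}
std::uint32_t get_u32(const Buf& b, std::size_t& off) {
  std::uint32_t v = 0;
  for (int i = 0; i < 4; ++i)
    v |= static_cast<std::uint32_t>(get_u8(b, off)) << (8 * i);
  return v;
}

// Prefix a payload with [version][compat][length].
Buf wrap_section(std::uint8_t version, std::uint8_t compat, const Buf& payload) {
  Buf out;
  put_u8(out, version);
  put_u8(out, compat);
  put_u32(out, static_cast<std::uint32_t>(payload.size()));
  out += payload;
  return out;
}

struct Section { std::uint8_t version; std::uint8_t compat; Buf payload; };

// Read one section; refuse it only if its compat version is too new.
Section read_section(const Buf& b, std::size_t& off, std::uint8_t understood) {
  Section s;
  s.version = get_u8(b, off);
  s.compat = get_u8(b, off);
  std::uint32_t len = get_u32(b, off);
  if (s.compat > understood) throw std::runtime_error("section too new");
  if (len > b.size() - off) throw std::out_of_range("truncated section");
  s.payload = b.substr(off, len);
  off += len;
  return s;
}

// Skip a section without interpreting it, using only its length.
void skip_section(const Buf& b, std::size_t& off) {
  get_u8(b, off);
  get_u8(b, off);
  std::uint32_t len = get_u32(b, off);
  if (len > b.size() - off) throw std::out_of_range("truncated section");
  off += len;
}

// Outer section wrapping a client section and an extended section.
Buf encode_map(const Buf& client, const Buf& extended) {
  return wrap_section(1, 1, wrap_section(1, 1, client) + wrap_section(1, 1, extended));
}

// A client reads its own section and skips the extended one.
Buf decode_client_part(const Buf& encoded) {
  std::size_t off = 0;
  Section outer = read_section(encoded, off, 1);
  std::size_t in = 0;
  Section client = read_section(outer.payload, in, 1);
  skip_section(outer.payload, in);
  return client.payload;
}
```

Because every section carries its own length, a reader can step over a
section it does not care about, and each section's version can be bumped
without touching the other two.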
Signed-off-by: Greg Farnum <greg@inktank.com>
Use our pointer calling conventions instead of a reference for the
new version of the function.
Right now we're just setting the primaries equal to the first member
of up and acting (or -1 if none), but very shortly we'll modify our
private OSDMap functions to export them based on the contents of temp_primary.
While in general anybody querying for the mapping information will
need to pay attention to who the primary is as well, we have lots
of callers who will need real code changes to do so. To serve them,
we keep a version that does not export the primary, but asserts
that the primary matches the first entry in its list.
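A rough sketch of the two variants, assuming the primary is simply the
first member of each set (or -1 if the set is empty). The function names
and the raw_up/raw_acting inputs are hypothetical and do not match the
real signatures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical new-style call: all outputs are pointer out-params.
void toy_pg_to_up_acting(const std::vector<int>& raw_up,
                         const std::vector<int>& raw_acting,
                         std::vector<int>* up, int* up_primary,
                         std::vector<int>* acting, int* acting_primary) {
  *up = raw_up;
  *up_primary = raw_up.empty() ? -1 : raw_up.front();
  *acting = raw_acting;
  *acting_primary = raw_acting.empty() ? -1 : raw_acting.front();
}

// Variant for callers that have not been converted yet: it does not
// export the primaries, but asserts that each one is the first entry of
// its list, so an unconverted caller fails loudly if that ever changes.
void toy_pg_to_up_acting_no_primary(const std::vector<int>& raw_up,
                                    const std::vector<int>& raw_acting,
                                    std::vector<int>* up,
                                    std::vector<int>* acting) {
  int up_primary = -1;
  int acting_primary = -1;
  toy_pg_to_up_acting(raw_up, raw_acting,
                      up, &up_primary, acting, &acting_primary);
  assert(up_primary == (up->empty() ? -1 : up->front()));
  assert(acting_primary == (acting->empty() ? -1 : acting->front()));
}
```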
Signed-off-by: Greg Farnum <greg@inktank.com>
This function does not use (and never has used!) the raw vector, so remove it
and stop using a name which implies it is doing any sort of conversion.
Signed-off-by: Greg Farnum <greg@inktank.com>
These were the same except for a call to _raw_to_up_osds(). Move the
existing pg_to_up_acting_osds into a private function taking a pointer,
only fill in the up vector if it's a non-NULL pointer, and call it via
the obvious header implementations.
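The shape of this refactor, sketched with a toy class. ToyMapper, its
members and map_impl() are hypothetical; the real private function has a
different signature. The toy assumes "up" is the raw set minus down OSDs
and "acting" is a temp override if one is set, else the up set.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

class ToyMapper {
public:
  ToyMapper(std::vector<int> raw, std::vector<int> down, std::vector<int> temp)
    : raw_(std::move(raw)), down_(std::move(down)), temp_(std::move(temp)) {}

  // Thin inline wrappers: the only difference is whether "up" is wanted.
  void pg_to_acting_osds(std::vector<int>* acting) const {
    map_impl(nullptr, acting);
  }
  void pg_to_up_acting_osds(std::vector<int>* up,
                            std::vector<int>* acting) const {
    map_impl(up, acting);
  }

private:
  // Single shared implementation; "up" is filled only if non-null.
  void map_impl(std::vector<int>* up, std::vector<int>* acting) const {
    std::vector<int> computed_up;
    for (int osd : raw_) {
      if (std::find(down_.begin(), down_.end(), osd) == down_.end())
        computed_up.push_back(osd);
    }
    if (temp_.empty())
      *acting = computed_up;
    else
      *acting = temp_;
    if (up)
      *up = computed_up;
  }

  std::vector<int> raw_;   // raw mapping
  std::vector<int> down_;  // OSDs currently down
  std::vector<int> temp_;  // optional acting override
};
```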
Signed-off-by: Greg Farnum <greg@inktank.com>
This was used only by SyntheticClient, and that wants get_pg_acting_primary()
anyway. Delete the easily-misused get_pg_primary() and switch.
Signed-off-by: Greg Farnum <greg@inktank.com>
Some of these look like what you should use for mapping and they absolutely
are not suitable for that. Make it clearer.
Signed-off-by: Greg Farnum <greg@inktank.com>
We've been using the role returned from this to determine if we're
the primary or not. Don't.
This is mostly about removing a few asserts; while in there I also
redirected some calls to use static dereference instead of going through
the osdmap lookup path.
Signed-off-by: Greg Farnum <greg@inktank.com>
When run in a shared environment (as opposed to a machine created only
for the purpose of running this test), it is important to clean up
leftovers to avoid polluting the /tmp space. Create a common temporary
directory for all tmp files.
Signed-off-by: Loic Dachary <loic@dachary.org>
snap/clone promotion, flush, and other goodies
This is now passing the thrashing with both cache and snap ops:
sage-2014-01-13_15:45:26-rados:thrash-wip-cache-snap-testing-basic-plana
Reviewed-by: Samuel Just <sam.just@inktank.com>
find_object_context() has all the logic to choose a particular clone given
a logical snap. In the trim case, we want none of that: we just need to
pull the obc for a specific clone instance. Note that this changes
none of the failure cases (previously we asserted r == 0).
Signed-off-by: Sage Weil <sage@inktank.com>
We were fabricating an object_info_t correctly and writing it to disk, but
it was not reflected by the in-memory ObjectContext. If something came
along quickly (like backfill) and tried to use it, the info would be
invalid.
Fix this by fabricating it in the obc and copying it to the new_obs for
the update.
Fixes: #7122
Signed-off-by: Sage Weil <sage@inktank.com>
Previously, if a snap was deleted but the clone was still there because we
hadn't trimmed it yet, we would still return the data. Instead, return ENOENT
unconditionally (even if the clone has not been removed yet). This makes the
behavior from the client perspective more predictable and consistent.
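A toy sketch of the new read behavior, assuming a pool that tracks
removed snap ids separately from the clones still on disk. ToyPool and
read_snap() are hypothetical names, not the OSD code path.

```cpp
#include <cerrno>
#include <map>
#include <set>
#include <string>

// Hypothetical pool state: deleted snap ids, plus clone data that may
// still be on disk because the trimmer has not reached it yet.
struct ToyPool {
  std::set<unsigned> removed_snaps;
  std::map<unsigned, std::string> clones;  // snap id -> clone data
};

// Returns 0 and fills *out on success, -ENOENT otherwise.
int read_snap(const ToyPool& pool, unsigned snapid, std::string* out) {
  // Check for deletion first, so an untrimmed clone is never returned.
  if (pool.removed_snaps.count(snapid))
    return -ENOENT;
  auto it = pool.clones.find(snapid);
  if (it == pool.clones.end())
    return -ENOENT;
  *out = it->second;
  return 0;
}
```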
Signed-off-by: Sage Weil <sage@inktank.com>
This reliably returns ENODEV due to the test at the finish of flush. Not
because we are actually racing with trim, though: the trimmer doesn't run
at all. I believe it captures the important property, though. Namely:
we should not write a promoted object that is "behind" the snap trimmer's
progress. The fact that we are in front of it (the trimmer hasn't started
yet) should not matter since the object is logically deleted anyway.
We probably want to make the OSD return ENODEV on read in the normal case
when you try to access a clone that is pending trimming.
Signed-off-by: Sage Weil <sage@inktank.com>
If the object no longer exists (for example, because the snap trimmer just
killed it) clean up the flush state without trying to mark the object
clean.
Signed-off-by: Sage Weil <sage@inktank.com>
If we are promoting a clone and realize that the object is no longer
defined for any snaps, abort the copy and delete any temp object.
If the defined snaps have changed, make sure they are updated in memory
so that on promote completion the snapshot metadata is correct.
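The decision can be sketched roughly as follows, assuming the clone's
snaps and the set of removed snaps are both available as plain id
collections. PromoteDecision and check_promote() are hypothetical names
for illustration only.

```cpp
#include <set>
#include <vector>

enum class PromoteAction { Abort, Proceed };

struct PromoteDecision {
  PromoteAction action;
  std::vector<unsigned> snaps;  // snaps the clone is still defined for
};

// Filter out removed snaps. If nothing is left, the promote should be
// aborted (and any temp object deleted by the caller); otherwise carry
// the filtered list forward so the completion sees current metadata.
PromoteDecision check_promote(const std::vector<unsigned>& clone_snaps,
                              const std::set<unsigned>& removed) {
  std::vector<unsigned> live;
  for (unsigned s : clone_snaps) {
    if (!removed.count(s))
      live.push_back(s);
  }
  if (live.empty())
    return {PromoteAction::Abort, {}};
  return {PromoteAction::Proceed, live};
}
```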
Signed-off-by: Sage Weil <sage@inktank.com>
Previously the caller was generating a temp object name and passing it
down in several different ways. Instead, generate one when we realize
that we need it, and store it in *one* place (CopyResults), where
the completions can get at the information.
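A minimal sketch of the idea, assuming the name is only needed once the
copy turns out to require a temp object. ToyCopyResults and ToyCopier are
hypothetical stand-ins; the field and method names are not taken from
the real CopyResults.

```cpp
#include <string>

// Hypothetical results object: the single place the temp name lives.
struct ToyCopyResults {
  std::string temp_oid;  // empty until a temp object is needed
};

class ToyCopier {
public:
  // Called as copy data arrives. The name is generated lazily, at most
  // once, and stored in the results rather than passed in by the caller.
  void handle_chunk(ToyCopyResults& results, bool needs_temp) {
    if (needs_temp && results.temp_oid.empty())
      results.temp_oid = "temp_" + std::to_string(next_temp_id_++);
  }

  // A completion reads the name straight from the results.
  static bool used_temp(const ToyCopyResults& results) {
    return !results.temp_oid.empty();
  }

private:
  unsigned next_temp_id_ = 0;
};
```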
Signed-off-by: Sage Weil <sage@inktank.com>