Instead of hashing the object name or key, we allow the hash position to be
provided explicitly.
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
This way we can set the compatv preferentially depending on whether
we've actually encoded new information or not.
Signed-off-by: Greg Farnum <greg@inktank.com>
Return to caller at the end of each PG. This allows the caller to look at
the [pg_]hash_position and get something meaningful.
If there are no objects in the PG, we skip it so that every callback has
*some* data (unless the pool is totally empty!). So the real difference
here is that we don't move on to the next PG just to reach max_entries.
This gives the client some data sooner, but may mean more callbacks into
client code.
Signed-off-by: Sage Weil <sage@inktank.com>
The pgid field is used to store the pg the op mapped to. We were just
setting it directly for PGLS. Instead, fill in a new base_pgid, and copy that
to pgid in recalc_op_target(), the same way we do when we map an object
name to a PG.
In particular, we take this opportunity to map a raw pgid to an actual
pgid. This means the base_pg could come from a raw hash value (although
it doesn't, yet).
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
We don't use preferred placements any more, so this will
make it easier to start dropping references to it in new code.
Signed-off-by: Sage Weil <sage@inktank.com>
Fixes: #6678
We don't want to allow regular users to write to secondary zones,
otherwise we'd end up with data inconsistencies.
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Formerly, if sentries is empty, we skip missing. In general,
we need to continue adding items from missing until we get
to next (returned from collection_list_partial) to avoid
missing any objects.
Fixes: #6633
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
Split may cause holes such that head != tail and yet
log.empty().
Fixes: #6722
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
Due to split, there may be a hole at newhead.
Fixes: #6722
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
There might be two concurrent rollback ops each of which
adds snap x to snaps_in_use. Between when the first
completes and the second completes, snap x may be removed
since the first would have removed snap x from snaps_in_use.
Using sharedptr_registry here avoids this by ensuring that
the snap won't be removed from snaps_in_use until all refs
are gone.
This patch also adds size() to sharedptr_registry.
Fixes: #6719
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
If one requests JSON output, the progress message pollutes the output;
don't do that, send it to stderr instead
Signed-off-by: Dan Mick <dan.mick@inktank.com>
In case of a replay, a missing destination directory indicates that
the destination object and directory have been removed by a later
transaction. Thus, we need to remove the src object and return
0.
Fixes: #6714
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
galois_create_split_w8_tables() takes no parameter, remove '8' passed
to the function in one case.
osd/ErasureCodePluginJerasure/galois.c: In function 'galois_w32_region_multiply':
osd/ErasureCodePluginJerasure/galois.c:696:5: warning: call to function 'galois_create_split_w8_tables' without a real prototype [-Wunprototyped-calls]
In file included from osd/ErasureCodePluginJerasure/galois.c:53:0:
osd/ErasureCodePluginJerasure/galois.h:71:12: note: 'galois_create_split_w8_tables' was declared here
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
If we get a peering message for an old map we don't have, we
can throwit out: the sending OSD will learn about the newer
maps and update itself accordingly, and we don't have the
information to know if the message is valid. This situation
can only happen if the sender was down for a long enough time
to create a map gap and its PGs have not yet advanced from
their boot-up maps to the current ones, so we can rely on it
Fixes: #6712
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
I really don't know why I added this... Ops can be discarded from the
waiting_for_pg queue if we aren't primary simply because there must have
been an exchange of peering events before subops will be sent within a
particular epoch. Thus, any events in the waiting_for_pg queue must be
client ops which should only be seen by the primary. Peering events, on
the other hand, should only be discarded if we are in a new interval,
and that check might as well be performed in the peering wq.
Fixes: #6681
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Otherwise, if last_backfill_started is a snapdir, we will fail to send a
transaction for a client IO creating the head object and removing the
snapdir object. The result will be that head will eventually be
backfilled, but the snapdir object will erroneously not be removed.
Fixes: #6685
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
The ro and rw options were added in linux 3.7. To be compatible with
older kernels, don't specify rw. The default will probably always be
rw, so this should not present any problems in the future.
Reported-by: nicolasc <nicolas.canceill@surfsara.nl>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This test attempts to generate a random number of holes within a
particular range, but may fail because hole placement is random.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(manually merged after next branch rebuilt)
OSDMonitor: be a little nicer about letting users do pg splitting
Reviewed-by: David Zafman <david.zafman@inktank.com>
We were previously blocking pg splits whenever pg creations were in-
progress, but we only really need to avoid splitting any pgs which are
currently being created. Let the user set a different pg_num if there
are no creating PGs on the pool in question.
Fixes: #6673, take two
Signed-off-by: Greg Farnum <greg@inktank.com>
404 is not actually a problem to clients like radosgw-agent, but 400
implies something about the request was incorrect.
Backport: dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
See the included unit test update. Consider:
1) x = lookup_or_create(1, 1)
2) remove(1)
3) y = lookup_or_create(1, 2)
4) x.reset()
5) z = lookup(1)
The bug is that z will be null since x.reset() caused the
cleanup callback to remove y's key value from contents.
To fix this, contents also records the pointer value for
the weak_ptr. The removal callback only removes the
key from contents if it matches the ptr in contents.
This should work since the pointer passed to the removal
callback must be unique up to that point since it has
not yet been deleted.
This allowed a pg removal -> pg recreation -> pg removal
sequence to cause the second pg removal entry to be
erroneously cleared by the first pg removal's destructor
as it finally made its way through the removal queue.
Fixes: #5951
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
For new SubQueues `cur` is not intialized, so front/pop_front will freak
out. I honestly I have no idea how this hasn't been seen, but it was
being triggered frequently on OSX.
Fixes: #6686
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
This assert assumes that if olog.head != log.head, olog contains
a log entry at log.head, which may not be true since pg splitting
might have left the log with arbitrary holes.
Related: 0c2769d332
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
e22347df38 added a bad goto; just free
explicitly instead.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Noah Watkins <noahwatkins@gmail.com>