These should all be const. The remaining reference parameters
will be converted to pointers in another commit.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
The librados namespace was not specified, which required including source
files to add a using-namespace declaration themselves. This fixes that.
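For illustration, a caller can now name the types with the librados:: prefix
instead of pulling in the whole namespace; the function below is a
hypothetical example, not code from this change:

    #include <rados/librados.hpp>

    // Hypothetical caller: fully qualified librados types, no
    // "using namespace librados;" needed in the including source file.
    int connect_cluster(librados::Rados &cluster)
    {
      int r = cluster.init(nullptr);         // default client id
      if (r < 0)
        return r;
      r = cluster.conf_read_file(nullptr);   // default ceph.conf search path
      if (r < 0)
        return r;
      return cluster.connect();
    }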
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Otherwise importing into another pool when the default pool, rbd,
doesn't exist results in an error trying to open the rbd pool.
Reported-by: Sébastien Han <han.sebastien@gmail.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
There's no need to set the default pool in set_pool_image_name - this
is done later, in a way that doesn't ignore --pool if --dest-pool
is not specified.
This means --pool and --image can be used with import, just like
the rest of the commands. Without this change, --dest and --dest-pool
had to be used, and --pool would be silently ignored for rbd import.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
If two clients created a snapshot at the same time, the one with the
higher snapshot id might be created first, so the lower snapshot id
would be added to the snapshot context and the snapshot seq would be
set to the lower one.
Instead of allowing this to happen, return -ESTALE if the snapshot id
is lower than the currently stored snapshot sequence number. On the
client side, get a new id and retry if this error is encountered.
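A rough sketch of the intended behavior, with simplified stand-in types
(SnapContext here is not the real Ceph struct, and allocate_snap_id is an
assumed helper):

    #include <cerrno>
    #include <cstdint>
    #include <vector>

    // Stand-in for the snapshot context kept with the image header.
    struct SnapContext {
      uint64_t seq = 0;                // highest snapshot id seen so far
      std::vector<uint64_t> snaps;     // snapshot ids, newest first
    };

    // Server side: never let the snapshot sequence move backwards.
    int snap_add(SnapContext &snapc, uint64_t snap_id)
    {
      if (snap_id <= snapc.seq)
        return -ESTALE;                // a concurrent snapshot already took a higher id
      snapc.snaps.insert(snapc.snaps.begin(), snap_id);
      snapc.seq = snap_id;
      return 0;
    }

    // Client side: on -ESTALE, allocate a fresh (higher) id and retry.
    template <typename AllocFn>
    int create_snap(SnapContext &snapc, AllocFn allocate_snap_id)
    {
      int r;
      do {
        r = snap_add(snapc, allocate_snap_id());
      } while (r == -ESTALE);
      return r;
    }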
Backport: argonaut
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
CID 716902: Non-array delete for scalars (DELETE_ARRAY)
At (15): Deleting array variable "buf" with non-array delete in "delete buf".
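The general rule the fix follows (the function and variable names below are
only illustrative):

    #include <cstddef>

    void example(std::size_t len)
    {
      char *buf = new char[len];   // allocated with array new
      // ... fill and use buf ...
      delete[] buf;                // must use array delete; plain "delete buf" is undefined behavior
    }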
Signed-off-by: Sage Weil <sage@inktank.com>
* a clone's size can't be overridden
* note which commands require format 2
* clarify details of copy
* add examples for cloning
* add pool to map example for consistency
* fix a couple warnings and re-sync man page with rst
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This chooses whether to use the original format (supported by krbd)
or the new format (which supports layering).
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
If the following sequence of events occurred,
a clone could be created of an unprotected snapshot:
1. A: begin clone - check that snap foo is protected
2. B: rbd unprotect snap foo
3. B: check that all pools have no clones of foo
4. B: unprotect snap foo
5. A: finish creating clone of foo, add it as a child
To stop this from happening, check at the beginning and end of
cloning that the parent snapshot is protected. If it is not,
or checking protection status fails (possibly because the parent
snapshot was removed), remove the clone and return an error.
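A sketch of that double check, using stand-in interfaces rather than the
real librbd calls:

    #include <cerrno>

    // Stand-in interfaces; not the real librbd types.
    struct ParentSnap {
      virtual int protection_status(bool *is_protected) = 0;
      virtual ~ParentSnap() = default;
    };
    struct CloneOp {
      virtual int create_and_add_child() = 0;
      virtual int remove() = 0;
      virtual ~CloneOp() = default;
    };

    // Check protection both before and after creating the clone.
    int do_clone(ParentSnap &parent, CloneOp &clone)
    {
      bool is_protected = false;
      int r = parent.protection_status(&is_protected);
      if (r < 0 || !is_protected)
        return (r < 0) ? r : -EINVAL;

      r = clone.create_and_add_child();
      if (r < 0)
        return r;

      // The snapshot may have been unprotected (or removed) while we worked.
      r = parent.protection_status(&is_protected);
      if (r < 0 || !is_protected) {
        clone.remove();               // roll back the half-created clone
        return (r < 0) ? r : -EINVAL;
      }
      return 0;
    }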
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
These iterate over all pools and check for children of a
particular snapshot.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Should fix bug #2761.
If we are already pushing soid, recovery_ops will only be decremented once for
all current pushes, so only increment recovery_ops if we are not currently
pushing it.
This bug causes us to leak a recovery op and get stuck in backfill.
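Roughly, with stand-in names for the per-PG bookkeeping (pushing,
recovery_ops):

    #include <set>
    #include <string>

    // Stand-ins for the recovery bookkeeping; soid is a string here for brevity.
    struct RecoveryState {
      std::set<std::string> pushing;   // objects we are already pushing
      int recovery_ops = 0;
    };

    void start_push(RecoveryState &st, const std::string &soid)
    {
      // recovery_ops is decremented only once when all pushes for soid
      // complete, so count a new op only if soid wasn't already being pushed.
      if (st.pushing.insert(soid).second)
        st.recovery_ops++;
    }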
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reorder the snapdir logic and ctx->at_version adjustments so they happen
before we fill in the object_info_t, user versions, and the rest of the
per-op state. Adjust at_version after appending each log entry, so that
it points to the next position/version we will write at, culminating in
the actual user event.
The user log entry contains the request id, which will be used
by replay ops to put themselves in the correct place in the
waiting_for_commit/ack maps. Thus, the repop needs to be tagged
with the same version as the log entry with the request id.
Thus, the request id bearing log entry should be the last in
the log entry vector.
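In sketch form, with simplified stand-ins for eversion_t, the log entry, and
the repop:

    #include <cassert>
    #include <cstdint>
    #include <vector>

    struct eversion_t { uint64_t epoch = 0, version = 0; };
    struct LogEntry  { eversion_t version; bool has_reqid = false; };
    struct RepOp     { eversion_t version; };

    void tag_repop(RepOp &repop, const std::vector<LogEntry> &log_entries)
    {
      assert(!log_entries.empty());
      // The entry carrying the client request id must be last, so the
      // version a replayed op waits on matches the version of the repop.
      assert(log_entries.back().has_reqid);
      repop.version = log_entries.back().version;
    }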
This should fix #3072, wherein a replay that should wait on
the repop tagged as version '36 would instead wait on '35.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
If requeue is false, we won't have cleared out waiting_for_ondisk; adjust
the assert placement accordingly. Also, make sure we handle the requeue
and !op case properly (although I'm not sure offhand if/when it would
come up).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
If we don't wait for the callback, the finisher may clean up the callback
context before the callback is actually invoked, causing a
use-after-free error.
This fixes #3048.
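The pattern, sketched with standard threading primitives rather than the
actual Finisher/Context types; here the caller owns the context and only
tears it down after wait() returns:

    #include <condition_variable>
    #include <mutex>

    // Stand-in for a completion the caller must wait on before cleanup.
    struct WaitableContext {
      std::mutex lock;
      std::condition_variable cond;
      bool done = false;

      void finish() {                 // called from the finisher thread
        std::lock_guard<std::mutex> l(lock);
        done = true;
        cond.notify_all();
      }
      void wait() {                   // called by the owner before freeing state
        std::unique_lock<std::mutex> l(lock);
        cond.wait(l, [this] { return done; });
      }
    };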
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
If the mon session drops, we get an EAGAIN callback, which we already
correctly ignored. (Clean this up and comment so it's clearer what is
going on.)
Fix ms_handle_connect() to resubmit those requests.
Noticed while fixing #3049.
Signed-off-by: Sage Weil <sage@inktank.com>
If our map get_version check needs to be retried, tell the
is_latest_map() callers instead of returning 0 ("no").
Fixes: #3049
Signed-off-by: Sage Weil <sage@inktank.com>
We should requeue the dups along with the originals. This avoids
situations where, after requeue, the dups are reordered with respect to
each other. For example:
- client sends A, B, C
- osd receives A
- connection drops
- client sends A', B', C'
- osd puts A' in waiting_for_ondisk, starts B' and C'
- on_change() requeues everything
Final queue order (before this patch) is
A, B', C', A'
After this patch, the resulting queue order is
A, A', B', C'
Or somewhat more generally, it might be:
A, A', B, B', B'', C', C'', D'', ....
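A simplified sketch of the requeue rule (Op, reqid, and the containers are
stand-ins for the real OSD structures):

    #include <iterator>
    #include <list>
    #include <map>
    #include <string>

    struct Op { std::string reqid; };

    void requeue_with_dups(std::list<Op> &op_queue,
                           std::map<std::string, std::list<Op>> &waiting_dups,
                           const Op &op)
    {
      op_queue.push_front(op);                    // the original goes back first...
      auto it = waiting_dups.find(op.reqid);
      if (it != waiting_dups.end()) {
        // ...immediately followed by its dups, preserving their relative order.
        op_queue.insert(std::next(op_queue.begin()),
                        it->second.begin(), it->second.end());
        waiting_dups.erase(it);
      }
    }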
Fixes (another source of): #2947
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
We don't shut down all threads, and the surviving ones fight with
exit()'s teardown. Kludge until we have a clean shutdown process.
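Presumably the kludge amounts to something like the following (assumed here,
not spelled out in this message): bypass exit()'s global teardown entirely.

    #include <unistd.h>

    // Skip atexit handlers and static destructors that surviving
    // threads would otherwise race with during process teardown.
    void shutdown_now(int status)
    {
      _exit(status);
    }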
Signed-off-by: Sage Weil <sage@inktank.com>
Showing the current state and saying it is stuck doesn't tell you how it
is stuck (e.g. stuck unclean, stuck inactive, etc.). Also include the
stuck duration.
Fixes: #2876
Signed-off-by: Sage Weil <sage@inktank.com>
This is a fallback for when a user wishes to delete ALL benchmark files
matching a particular prefix. In the fast case, a metadata file tells us
enough to quickly delete the files in parallel. This is the slow case,
where each file's name must be checked against the prefix.
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
This intelligently removes objects from a rados or rest benchmark run by
using parameters from the metadata file.
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Store metadata for each benchmark run so that the objects can be
efficiently removed at a later point.
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Per #2477, objects created during a rados or rest write benchmark are
automatically cleaned up after the test. They can optionally be left in
place.
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Lossless peers (osd<->osd, mds<->mds, mon<->mon) never reset sessions
to each other. In the osd and mds cases, there is no need to check for
session resets. More significantly, these checks can trigger with an
unfortunate sequence of socket failures. In particular,
- A sends connect request to B
- B accepts, increments connect_seq, then has a socket failure
before telling A
- A reconnects, still with connect_seq == 0
- B sees connect_seq == 0 and thinks there was a reset
This warrants a closer look in the fs client <-> mds case, but for now,
in the cluster-internal communications, it is moot, since reset
detection is unnecessary.
In the monitor case, we do need to check for resets because the peers
reuse the same entity_addr_t's (nonce==0), which means that a daemon
restart is effectively a reset. In that case, use a different policy
that continues to check for resets.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
We currently prefer up osds, and then pull sequentially from peer_info
(strays we know about at the time). This adds an additional preference
for the current acting, which means we can avoid changes to acting when
they are largely useless.
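A rough sketch of that preference order (not the real acting-set calculation
code; eligibility checks are omitted and the names are illustrative):

    #include <cstddef>
    #include <set>
    #include <vector>

    std::vector<int> choose_acting(const std::vector<int> &up,
                                   const std::vector<int> &current_acting,
                                   const std::vector<int> &known_strays,
                                   std::size_t size)
    {
      std::vector<int> want;
      std::set<int> seen;
      auto take = [&](const std::vector<int> &src) {
        for (int osd : src) {
          if (want.size() >= size)
            break;
          if (seen.insert(osd).second)
            want.push_back(osd);
        }
      };
      take(up);              // prefer up osds
      take(current_acting);  // then keep the existing acting set stable
      take(known_strays);    // only then fall back to other strays
      return want;
    }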
In particular, I observed that we chose [5,3] and later (when recovery
completed) chose [5,1] because we had since heard about an eligible stray
on 1. That switch was basically a waste...
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>