Assert they are called only once per machine per election epoch. Fix the
recovered_peon() caller to do that.
Signed-off-by: Sage Weil <sage@inktank.com>
With out-of-tree builds, vstart.sh needs CEPH_BIN to be set, and
needs to look for init-ceph in CEPH_BIN rather than just ./init-ceph.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Using vstart.sh -n uses ceph-authtool to generate the keyring file
in ./keyring. The vstart.sh script then writes out the ceph.conf
with a keyring option in the [client] section, so when the monitors
start, they can't find a keyring file. This commit puts the keyring in
the [global] section.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Require the MON_GV feature when
- we see the ondisk feature is set on bootup
- we enable the ondisk feature
This means that once we form a quorum with the feature and enable it on
disk, there is no going back; we won't be able to talk to old monitors
without the feature, and a downgrade won't be possible.
Hopefully, in practice, any monitors with old code will be up at the time
we are upgrading, such that the quorum will not include the feature and we
won't make the transition. Otherwise, if they are down, and the remaining
nodes have the feature and enable it, and the old code starts up, it won't
be able to join until it is upgraded to the new code as well.
Signed-off-by: Sage Weil <sage@inktank.com>
This is a marker that future versions will use to know whether they can
safely convert the monitor data to the new format. If the GV feature is
not present, they will refuse to convert.
Also set the ondisk GV feature at the same time.
Signed-off-by: Sage Weil <sage@inktank.com>
If the target has the NULLROUTE feature, use a new encoding that explicitly
indicates whether a message follows. If the feature is absent, use the
old encoding. The mon is responsible for not trying to send a null reply
if the target does not have the feature.
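
A minimal sketch of the feature-gated encoding described above. The layout
here (a 64-bit id followed by a one-byte "message follows" flag) and the
name encode_routed_reply are hypothetical and only illustrate the idea; this
is not the actual wire format.

```python
import struct


def encode_routed_reply(session_tid, msg, has_nullroute):
    """Encode a routed reply; msg is bytes, or None for a null reply.

    Hypothetical layout: little-endian u64 id, then (new encoding only)
    a one-byte flag saying whether a message follows, then the message.
    """
    out = struct.pack("<Q", session_tid)
    if has_nullroute:
        # New encoding: say explicitly whether a message follows.
        out += struct.pack("<B", 1 if msg is not None else 0)
        if msg is not None:
            out += msg
    else:
        # Old encoding: a message always follows, so the caller must
        # never attempt a null reply for a target without the feature.
        if msg is None:
            raise ValueError("null reply requires the NULLROUTE feature")
        out += msg
    return out
```

Raising on the old path mirrors the rule that the sender, not the encoder,
is responsible for never producing a null reply for an old target.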
Signed-off-by: Sage Weil <sage@inktank.com>
Or maybe it was a spello, or a thinko, or something. In any case
I'm pretty sure Josh intended to call the function he added in
commit 78d6a60ca, and not the non-existent "test_import_args".
Signed-off-by: Alex Elder <elder@inktank.com>
(cherry picked from commit ed43d4de12)
The locker (entity_name_t) will be different each time the rbd
command line tool is run, so 'lock remove' is always breaking a lock.
Fixes: #2556
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
* no longer need to wait for watch timeout since #2948 was fixed
* use --format 2 instead of --new-format
* add test_cls_rbd to run-rbd-tests script
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Update the librbd locking api to make more sense:
* Add an optional tag to shared locking
* only make shared vs exclusive different functions in the user-visible api
* return a list of structs instead of a set of pairs
* fix incorrect range checking in the C api
* rename locks to lockers to be consistent with the generic locking class
* rename other_locker parameter to client, to match the list_lockers usage
Fixes: #2952
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This will be used by librbd to grab lock info along with
the rest of its header information in a single request.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
These should all be const. The remaining reference parameters
will be converted to pointers in another commit.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
librados namespace was not specified, hence required including
source files to add using namespace. This fixes it.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Otherwise importing into another pool when the default pool, rbd,
doesn't exist results in an error trying to open the rbd pool.
Reported-by: Sébastien Han <han.sebastien@gmail.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
There's no need to set the default pool in set_pool_image_name - this
is done later, in a way that doesn't ignore --pool if --dest-pool
is not specified.
This means --pool and --image can be used with import, just like
the rest of the commands. Without this change, --dest and --dest-pool
had to be used, and --pool would be silently ignored for rbd import.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
If two clients created a snapshot at the same time, the one with the
higher snapshot id might be created first, so the lower snapshot id
would be added to the snapshot context and the snapshot seq would be
set to the lower one.
Instead of allowing this to happen, return -ESTALE if the snapshot id
is lower than the currently stored snapshot sequence number. On the
client side, get a new id and retry if this error is encountered.
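
The check and retry described above can be sketched as follows. SnapStore,
create_snap, alloc_id and max_retries are hypothetical names for a toy model
of the stored header and the client; they are not the real class method or
librbd code.

```python
import errno


class SnapStore:
    """Toy stand-in for the stored snapshot state of one image."""

    def __init__(self):
        self.snap_seq = 0
        self.snaps = []

    def add_snap(self, snap_id):
        # Reject ids lower than the stored snapshot sequence number.
        if snap_id < self.snap_seq:
            return -errno.ESTALE
        self.snaps.append(snap_id)
        self.snap_seq = snap_id
        return 0


def create_snap(store, alloc_id, max_retries=5):
    """Client side: fetch a new id and retry while the store says ESTALE."""
    for _ in range(max_retries):
        snap_id = alloc_id()
        r = store.add_snap(snap_id)
        if r != -errno.ESTALE:
            return r, snap_id
    return -errno.ESTALE, None
```

The retry bound is an assumption of this sketch; the commit message only
says that the client gets a new id and retries.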
Backport: argonaut
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
CID 716902: Non-array delete for scalars (DELETE_ARRAY)
At (15): Deleting array variable "buf" with non-array delete in "delete buf".
Signed-off-by: Sage Weil <sage@inktank.com>
* a clone's size can't be overridden
* note which commands require format 2
* clarify details of copy
* add examples for cloning
* add pool to map example for consistency
* fix a couple warnings and re-sync man page with rst
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This chooses whether to use the original (supported by krbd)
or the new (supports layering) format.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
If the following sequence of events occurred,
a clone could be created of an unprotected snapshot:
1. A: begin clone - check that snap foo is protected
2. B: rbd unprotect snap foo
3. B: check that all pools have no clones of foo
4. B: unprotect snap foo
5. A: finish creating clone of foo, add it as a child
To stop this from happening, check at the beginning and end of
cloning that the parent snapshot is protected. If it is not,
or checking protection status fails (possibly because the parent
snapshot was removed), remove the clone and return an error.
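
A minimal sketch of the check-before-and-after pattern described above. The
callables is_protected, create_child and remove_child, and the
ProtectionError type, are hypothetical placeholders rather than librbd
functions.

```python
class ProtectionError(Exception):
    """Raised when the parent snapshot is not (or no longer) protected."""


def clone_checked(parent_snap, create_child, remove_child, is_protected):
    # First check: refuse to start cloning an unprotected snapshot.
    if not is_protected(parent_snap):
        raise ProtectionError("parent snapshot is not protected")

    child = create_child(parent_snap)

    # Second check: an unprotect may have raced with the clone. A failure
    # to read the status (e.g. the snapshot was removed) counts as
    # unprotected.
    try:
        still_protected = is_protected(parent_snap)
    except Exception:
        still_protected = False

    if not still_protected:
        remove_child(child)
        raise ProtectionError("parent snapshot was unprotected during clone")
    return child
```

In the racing sequence listed above, the second check at step 5 sees the
unprotect from step 4, so the half-created clone is removed instead of being
left attached to an unprotected snapshot.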
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
These iterate over all pools and check for children of a
particular snapshot.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Default to .3. Setting to 0 effectively turns this off.
Also make OSDMap::osd_xinfo_t decode into a float to simplify the
arithmetic conversions.
Signed-off-by: Sage Weil <sage@inktank.com>
Scale the down/out interval the same way we do the heartbeat grace, so that
we give laggy osds a bit longer to recover.
See #3047.
Signed-off-by: Sage Weil <sage@inktank.com>
Add a configurable halflife for the laggy probability and duration and
apply it at the time those values are used to adjust the heartbeat grace
period. Both are multiplied together, so it doesn't matter which you
think is being decayed (the probability or the interval).
Default to an hour.
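
As an illustration, a halflife decay applied to the product of the two
values might look like the sketch below. The function names, the elapsed
argument, and the treatment of a non-positive halflife as "no decay" are
assumptions of this sketch, not the monitor's actual code or option names.

```python
def decay_factor(elapsed, halflife=3600.0):
    """Return the fraction of a value remaining after `elapsed` seconds."""
    if halflife <= 0:
        # Assumption for this sketch: a non-positive halflife disables decay.
        return 1.0
    return 0.5 ** (elapsed / halflife)


def decayed_grace_adjustment(laggy_probability, laggy_interval,
                             elapsed, halflife=3600.0):
    # The two values are multiplied, so decaying the product is the same
    # as decaying either factor alone.
    return decay_factor(elapsed, halflife) * laggy_probability * laggy_interval
```

With the one-hour default, laggy history recorded an hour ago contributes
half as much to the adjustment as it did when it was recorded.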
Signed-off-by: Sage Weil <sage@inktank.com>
If, based on historical behavior, an observed osd failure is likely to be
due to unresponsiveness and not the daemon stopping, scale the heartbeat
grace period accordingly:
grace' = grace + laggy_probability * laggy_interval
This will avoid fruitlessly marking OSDs down and generating additional
map update overhead when the cluster is overloaded and potentially
struggling to keep up with map updates. See #3045.
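
A small worked example of the formula above, with a hypothetical helper
name and made-up input values:

```python
def scaled_grace(grace, laggy_probability, laggy_interval):
    """grace' = grace + laggy_probability * laggy_interval"""
    return grace + laggy_probability * laggy_interval


# An OSD with no laggy history keeps the base grace period.
print(scaled_grace(20.0, 0.0, 30.0))

# An OSD that was laggy half the time, for about 30 seconds each time,
# gets 15 extra seconds before it is marked down.
print(scaled_grace(20.0, 0.5, 30.0))
```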
Signed-off-by: Sage Weil <sage@inktank.com>
Currently we only trigger a failure on receipt of a failure report. Move
the checks into a helper and check during tick() too, so that we will
trigger failures even when the thresholds are not met at failure report
time. This is rarely true now, but will be true once we locally scale the
grace period.
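
A minimal sketch of the structure described above: one helper holds the
threshold check and is called both when a report arrives and from tick().
The FailureTracker class and its single time-based threshold are
hypothetical simplifications; the real checks involve more state.

```python
class FailureTracker:
    def __init__(self, grace):
        self.grace = grace
        self.pending = {}   # osd id -> time the osd was reported failed since
        self.down = set()

    def _check_failure(self, osd, now):
        """Shared helper: mark the osd down once the grace has elapsed."""
        if now - self.pending[osd] >= self.grace:
            self.down.add(osd)
            del self.pending[osd]
            return True
        return False

    def report_failure(self, osd, failed_since, now):
        self.pending.setdefault(osd, failed_since)
        return self._check_failure(osd, now)

    def tick(self, now):
        # Re-check reports whose threshold was not met when they arrived.
        for osd in list(self.pending):
            self._check_failure(osd, now)
```

If the grace period is longer than the time already elapsed when the report
arrives, only the tick() path can trigger the failure later, which is the
case the commit prepares for.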
Signed-off-by: Sage Weil <sage@inktank.com>