Six copies are replaced with one, with an added option to check status
automatically. This should probably be used in a few places where the
return code is ignored.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
  tasks:
  ...
  - ceph.wait_for_mon_quorum: [a, b]
  ...
will block until the mon quorum consists of exactly [a, b]. This is
compared directly to the relevant field from 'ceph quorum_status'
which has the alphanumeric names only.
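
The blocking behavior described above can be sketched roughly as follows. This is only an illustration, not the task's actual code: get_quorum_names is a hypothetical callable standing in for reading the relevant field of 'ceph quorum_status', and the timeout, the polling interval, and the order-insensitive comparison are all assumptions.

```python
import time


def wait_for_mon_quorum(get_quorum_names, expected, timeout=60.0,
                        interval=1.0, sleep=time.sleep,
                        clock=time.monotonic):
    """Block until get_quorum_names() returns exactly the expected names.

    get_quorum_names is a hypothetical callable; in a real task it would
    parse the quorum member names out of 'ceph quorum_status'.
    """
    # Assumption: order does not matter, so compare sorted name lists.
    want = sorted(expected)
    deadline = clock() + timeout
    while True:
        if sorted(get_quorum_names()) == want:
            return
        if clock() >= deadline:
            raise RuntimeError('timed out waiting for quorum %r' % want)
        sleep(interval)
```

Passing sleep and clock as parameters is only there so the loop can be exercised without real delays.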
Signed-off-by: Sage Weil <sage@inktank.com>
Often we want to build a test collection that substitutes different
sequences of tasks into a parallel/sequential construction. However, the
yaml combination that happens when generating jobs is not smart enough to
substitute some fragment into a deeply-nested piece of yaml.
Instead, make these sequences top-level entries in the config dict, and
reference them. For example:
  tasks:
  - install:
  - ceph:
  - parallel:
    - workload
    - upgrade-sequence
  workload:
    workunit:
    - something
  upgrade-sequence:
    install.restart: [osd.0, osd.1]
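
One way to picture the lookup: a bare string inside a parallel or sequential list is treated as the name of a top-level entry in the config dict. The sketch below assumes exactly that and nothing more; resolve_entries is a hypothetical helper name, not a function from the code.

```python
def resolve_entries(entries, config):
    """Replace string entries with the top-level config section they name."""
    resolved = []
    for entry in entries:
        if isinstance(entry, str):
            # A bare string is a reference to a top-level key in the
            # job config (assumed behavior, see the lead-in).
            resolved.append(config[entry])
        else:
            # Anything else is taken to be an inline task definition.
            resolved.append(entry)
    return resolved


config = {
    'workload': {'workunit': ['something']},
    'upgrade-sequence': {'install.restart': ['osd.0', 'osd.1']},
}
tasks = resolve_entries(['workload', 'upgrade-sequence'], config)
```

Because the sequences live at the top level, the yaml combination step only has to replace a whole top-level key rather than reach into a nested list.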
Signed-off-by: Sage Weil <sage@inktank.com>
Instead of relying on hardcoded values, obtain the max-skew default from
'ceph-mon --show-config-value mon_clock_drift_allowed' to match the mon's
expectation.
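
A minimal sketch of reading that default, assuming the command prints a single numeric value on stdout. The helper name get_default_max_skew and the injectable run parameter are illustrative and not taken from the code.

```python
import subprocess


def get_default_max_skew(run=subprocess.check_output):
    """Ask ceph-mon for its mon_clock_drift_allowed value, as a float."""
    out = run(['ceph-mon', '--show-config-value',
               'mon_clock_drift_allowed'])
    # check_output returns bytes; a test double may return str instead.
    if isinstance(out, bytes):
        out = out.decode('utf-8')
    # Assumption: the output is just the value followed by a newline.
    return float(out.strip())
```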
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sometimes the thing we're talking to is slow to start, or to register the
command we are running. Loop in that case, at least for a while.
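
The loop could look something like the sketch below. NotReadyError is a hypothetical exception standing in for "daemon not up yet" or "command not registered yet", and the number of tries and the delay are assumptions.

```python
import time


class NotReadyError(Exception):
    """Hypothetical error: the target is not ready for this command yet."""


def run_with_retries(command, tries=5, delay=1.0, sleep=time.sleep):
    """Call command() until it succeeds, giving up after `tries` attempts."""
    for attempt in range(1, tries + 1):
        try:
            return command()
        except NotReadyError:
            if attempt == tries:
                # Out of attempts: let the caller see the last failure.
                raise
            sleep(delay)
```

Only the "not ready" case is retried; any other error propagates immediately, so real failures are not hidden.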
Signed-off-by: Sage Weil <sage@inktank.com>
If 'max-skew' is not defined, it defaults to 0.05; if it is defined,
however, it must override whatever is in the config.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
This will make the CLI do every mon command twice and make sure they both
succeed. This catches problems with mon command idempotency faster than
waiting for random failures to trigger.
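
To illustrate why doubling each command exposes idempotency bugs, here is a small sketch. run_twice and the in-memory create_strict command are hypothetical stand-ins, not the CLI's actual implementation.

```python
def run_twice(run, args):
    """Run the same command two times and require both to return 0."""
    for attempt in (1, 2):
        rc = run(args)
        if rc != 0:
            raise RuntimeError('attempt %d of %r failed with rc %d'
                               % (attempt, args, rc))


# Hypothetical non-idempotent command: creating an existing name fails.
created = set()


def create_strict(args):
    name = args[-1]
    if name in created:
        return 1
    created.add(name)
    return 0


try:
    run_twice(create_strict, ['create', 'foo'])
    caught = None
except RuntimeError as e:
    caught = str(e)
```

Here the second attempt fails deterministically, which is the kind of problem that would otherwise only appear when a command happens to be resent.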
Added sequential task and parallel task.
Changed _run_one_task to run_one_task (now called by new tasks too).
Fixes: #4969
Signed-off-by: Warren Usui <warren.usui@inktank.com>
Fix for #5494, although the description is bad. Instead of adding a
wait, fix the code used to detect whether the guest is back up. The
previous code appeared to assume only one machine and broke
when it was waiting for multiple machines if the guests did not
come up within 10 seconds of each other.
Make nuke skip the normal steps if the machine is a VPS, since we
just destroy them when they get unlocked.
Instead of getting downburst options from ~/.teuthology.yaml, get
them from the yaml given to teuthology for the test/task.
Fix an error that kept all of the default downburst values
from taking effect if any of them were set via a yaml.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Warren Usui <warren.usui@inktank.com>
Occasionally we don't wait long enough for the osd to start and
mark itself up. Keep trying until flush succeeds.
Fixes: #5431
Signed-off-by: Sage Weil <sage@inktank.com>
A very simple change. Just touch the file first (to create it if it
doesn't yet exist, so the delete doesn't error out) and then delete
it before pushing the keys to the file. This should keep the
id_rsa.pub and id_rsa files from getting messed up by previous
runs that were interrupted or failed (or when those files already
exist for some other reason). This appears to be what was breaking the
ceph-deploy nightlies.
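
The same touch-then-delete idea, shown as a local Python sketch rather than the remote commands the task actually runs. write_fresh and the temporary directory are illustrative assumptions.

```python
import os
import tempfile


def write_fresh(path, data):
    """Write data to path, discarding anything left by an earlier run."""
    # "touch": opening in append mode creates the file if it is missing,
    # so the delete below cannot fail on a nonexistent file.
    with open(path, 'a'):
        pass
    os.remove(path)
    with open(path, 'w') as f:
        f.write(data)


workdir = tempfile.mkdtemp()
key_path = os.path.join(workdir, 'id_rsa.pub')
write_fresh(key_path, 'stale key\n')   # simulates a leftover from a failed run
write_fresh(key_path, 'new key\n')
with open(key_path) as f:
    contents = f.read()
```

After the second call the file holds only the new key, regardless of what the earlier run left behind.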
Signed-off-by: Sandon Van Ness <sandon@inktank.com>