In some rare cases (mainly CentOS/RHEL), the guest does not
come up right after being created with downburst; it gets a
kernel panic at boot. Usually just turning it off and then
back on again is enough, but to be on the safe side I figured
it should be re-created instead. This ensures you don't get
hung jobs from a guest that didn't come up correctly.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
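A minimal sketch of the idea, assuming a destroy/create style
downburst CLI and a caller-supplied reachability check (the
actual task code differs):

    import subprocess

    def recreate_guest(name, guest_is_up, attempts=3):
        """Destroy and re-create a downburst guest instead of
        power-cycling it (illustrative sketch only)."""
        for _ in range(attempts):
            # ignore errors if the guest was never fully created
            subprocess.call(['downburst', 'destroy', name])
            subprocess.check_call(['downburst', 'create', name])
            if guest_is_up(name):  # e.g. an ssh probe
                return True
        return False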
For some reason lock_many() has a description but lock()
does not. This was useful in my testing of unlocking and
re-locking VPS machines to destroy.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Figuring out which machine output is coming from when things
are being executed on multiple machines can be a huge pain.
This prints the IP in the logs so you can easily see where one
machine's output stops and another's begins.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
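A rough sketch of the effect, assuming each remote's IP is known
when its output is logged (not the actual orchestra code):

    import logging

    logging.basicConfig(level=logging.INFO, format='%(message)s')
    log = logging.getLogger('teuthology.run')

    def log_remote_output(host_ip, output):
        # prefix every line with the originating host's IP so
        # interleaved output from several machines stays readable
        for line in output.splitlines():
            log.info('[%s] %s', host_ip, line)

    log_remote_output('10.0.0.5', 'starting osd\nosd up')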
In order to make IP addresses less likely to change and to allow
a smaller DHCP pool to be used, I generated static MAC addresses
for all the vpm entries in the DB. I also added the correct
primary (eth0) MAC address for all the other types of machines,
to keep things standardized and so there is another place where
we have this information.
Without this fix, going through a few tests would exhaust the
DHCP pool, which at the time was around 460 IP addresses for
virtual machines and has since been upped to ~690.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Warren Usui <warren.usui@inktank.com>
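A small sketch of one way to derive stable MACs; the 52:54:00
prefix and the hashing scheme are assumptions, not how the DB
entries were actually generated:

    import hashlib

    def static_mac(hostname, prefix='52:54:00'):
        # derive a stable, locally administered MAC from the hostname
        digest = hashlib.md5(hostname.encode()).hexdigest()
        return '%s:%s:%s:%s' % (prefix, digest[0:2], digest[2:4],
                                digest[4:6])

    print(static_mac('vpm001'))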
Fix for #5494, although its description is misleading. Instead
of adding a wait, the code used to detect whether the guest is
back up is fixed. The previous code appeared to assume only one
machine and broke when waiting for multiple machines if the
guests did not come up within 10 seconds of each other.
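Roughly the shape of the fix, as a sketch: poll every guest
independently instead of assuming a single machine (the
reachability check is caller-supplied here):

    import time

    def wait_for_guests(hosts, is_up, timeout=300, interval=10):
        # track each machine separately so guests that boot far
        # apart in time are still all waited for
        pending = set(hosts)
        deadline = time.time() + timeout
        while pending and time.time() < deadline:
            pending = {h for h in pending if not is_up(h)}
            if pending:
                time.sleep(interval)
        return not pending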
Make nuke skip its normal steps if the machine is a VPS, as we
just destroy those when they get unlocked.
Get the downburst options from the yaml given to teuthology for
the test/task instead of from ~/.teuthology.yaml.
Also fixed an error that kept all the default downburst values
from taking effect if any of them were set via a yaml.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Warren Usui <warren.usui@inktank.com>
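A minimal sketch of the per-key merge that fixes the defaults
bug; the option names here are illustrative:

    def downburst_config(task_config):
        # apply defaults key by key, so setting one option in the
        # yaml no longer discards the defaults for the others
        defaults = {'ram': '1.9G', 'disk': '30G', 'cpus': 1}
        merged = dict(defaults)
        merged.update(task_config.get('downburst', {}))
        return merged

    print(downburst_config({'downburst': {'ram': '4G'}}))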
Occasionally we don't wait long enough for the osd to start and
mark itself up. Keep trying until flush succeeds.
Fixes: #5431
Signed-off-by: Sage Weil <sage@inktank.com>
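A hedged sketch of the retry; the exact command teuthology issues
may differ from the ceph tell call used here:

    import subprocess
    import time

    def flush_pg_stats(osd_id, attempts=30, delay=1):
        # keep retrying until the osd has started and marked
        # itself up, so the flush can succeed
        for _ in range(attempts):
            if subprocess.call(['ceph', 'tell', 'osd.%d' % osd_id,
                                'flush_pg_stats']) == 0:
                return
            time.sleep(delay)
        raise RuntimeError('osd.%d never came up' % osd_id)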
A very simple change. Just touch the file first (to create it
if it doesn't yet exist, so the delete doesn't error out) and
then delete it before pushing the keys to the file. This should
keep the id_rsa.pub and id_rsa files from getting messed up by
previous runs that were interrupted or failed (or by those files
existing for some other reason). This appears to be what was
causing the breakage in the ceph-deploy nightlies.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
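The idea, sketched; the .ssh path is an assumption and the real
task runs the equivalent commands on the remote host:

    import os

    def reset_key_file(path='/home/ubuntu/.ssh/id_rsa'):
        # touch: make sure the file exists so the delete can't fail
        open(path, 'a').close()
        # then remove any stale contents from interrupted runs
        os.remove(path)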
The usage doc string for a task is tedious to write and
hard to keep reconciled with the code as defaults are changed.
args.py includes a helper to put it all in one place.
Signed-off-by: Samuel Just <sam.just@inktank.com>
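A hypothetical sketch of the kind of helper this describes; the
real names and signature in args.py may differ:

    def argify(defaults):
        # keep a task's defaults and its usage text in one place
        def wrap(task):
            def wrapped(ctx, config):
                merged = dict(defaults)
                merged.update(config or {})
                return task(ctx, merged)
            wrapped.__doc__ = (task.__doc__ or '') + \
                '\nDefaults: %r' % (defaults,)
            return wrapped
        return wrap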
Instead of going through the trouble of adding/removing lines
from authorized_keys, which has all our normal keys in it, push
keys to the unused authorized_keys2 file. This makes the key
management significantly simpler, as that file can just be wiped
out each time instead of worrying about preserving its contents.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
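A sketch of the simpler management this allows; the path and the
write_file stand-in are assumptions:

    def push_keys(write_file, public_keys):
        # replace the otherwise-unused file wholesale each run;
        # nothing in it needs to be preserved
        write_file('/home/ubuntu/.ssh/authorized_keys2',
                   '\n'.join(public_keys) + '\n')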
- use a separate pool for each client
- create pool at start, destroy pool at end
- use all clients, if not explicitly specified
Signed-off-by: Sage Weil <sage@inktank.com>
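A sketch of the per-client pool lifecycle; the pool naming and
the rados invocations are assumptions, not the task's actual
code:

    import contextlib
    import subprocess

    @contextlib.contextmanager
    def pool_for_client(client):
        # give each client its own pool, created at the start of
        # the task and destroyed at the end
        pool = 'pool_%s' % client
        subprocess.check_call(['rados', 'mkpool', pool])
        try:
            yield pool
        finally:
            subprocess.check_call(
                ['rados', 'rmpool', pool, pool,
                 '--yes-i-really-really-mean-it'])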