Commit Graph

91 Commits

Author SHA1 Message Date
Sandon Van Ness
a7f87f3a3a Longer timeout after sync/reboot.
With only a 5 second sleep via ssh and python, a race condition
sometimes occurred in which the machine was considered
back up before the reboot command had completed.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-12-11 18:07:43 -08:00
Sage Weil
1b80f4aa1c nuke: ignore exceptions while issuing reboot command
I'm seeing failed tasks (and nuke) leak machines.  It looks like we are
getting an exception on the '... reboot -f -n' command when we should be
ignoring it and waiting for the machine to restart.

For example:
   http://qa-proxy.ceph.com/teuthology/sage-2013-12-08_19:25:06-rados:thrash-wip-tier-foo-basic-plana/136321/teuthology.log

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-09 11:42:12 -08:00
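
A minimal sketch of the behaviour described in 1b80f4aa1c, not the code from nuke.py: an exception raised while sending the reboot command is logged and ignored so the caller can go on to wait for the machine to restart. The helper name issue_reboot and the run_reboot callable are hypothetical.

```python
import logging

log = logging.getLogger(__name__)


def issue_reboot(run_reboot):
    """Call run_reboot() and report whether it returned cleanly.

    run_reboot is a hypothetical zero-argument callable that sends the
    reboot command to one machine.
    """
    try:
        run_reboot()
    except Exception as e:
        # The connection may drop as the machine goes down, so a failure
        # here is logged and ignored rather than propagated.
        log.info('ignoring exception during reboot command: %s', e)
        return False
    return True
```

The caller would then wait for the machine to come back regardless of the return value.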
Sage Weil
c0a4327513 nuke: fix sync before reboot timeout
If you do 'timeout 5 sync' and sync hangs, timeout will block trying to
kill it.

Instead, just background sync, wait a few seconds, and reboot.  This means
we wait a few seconds even if sync returns immediately, but who cares!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 17:42:23 -08:00
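
The approach in c0a4327513 can be sketched as follows, assuming a local subprocess rather than the remote shell command that nuke actually issues. The function name run_after_background and its parameters are hypothetical.

```python
import subprocess
import time


def run_after_background(background_cmd, final_cmd, wait_seconds=5.0):
    """Start background_cmd, wait a fixed time, then run final_cmd.

    Returns the exit status of final_cmd. background_cmd is never waited
    on, so it cannot block the caller if it hangs.
    """
    subprocess.Popen(
        background_cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Fixed delay, paid even when background_cmd finishes immediately.
    time.sleep(wait_seconds)
    return subprocess.call(final_cmd)
```

In the situation the commit describes, background_cmd would correspond to sync and final_cmd to the forced reboot; the trade-off is the unconditional delay that the commit message accepts.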
Sage Weil
704b72eb0b nuke: remove old log arg to nuke_one call
Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-17 09:20:51 -07:00
Zack Cerza
ded5c219fc Remove needless arg from list_locks()
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-10-16 16:20:24 -05:00
Zack Cerza
59d14b8b4b Make nuke use its own logger
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-10-16 16:20:24 -05:00
Sage Weil
d79552ecc8 nuke: fix import
I think broken by f28a7ebc2c3662a8a9b8eebf09c47b6c710e4d26

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-16 13:32:45 -07:00
Zack Cerza
494c3b1fbe Make verbosity propagate correctly to modules
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-10-11 17:10:57 -05:00
Zack Cerza
f5729051ab Fix a circular import
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-10-11 12:48:55 -05:00
Zack Cerza
f28a7ebc2c Move imports to top-level
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-10-11 12:48:55 -05:00
Zack Cerza
01b81b7820 Move monkey patching to __init__.py
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-10-10 19:15:01 -05:00
Zack Cerza
e4753215ea PEP-8
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-10-10 19:09:34 -05:00
Zack Cerza
974898597f Move teuthology-lock's arg parsing to scripts/
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-10-10 19:09:34 -05:00
Zack Cerza
7ce4dfd97a Move teuthology-nuke's arg parsing to scripts/
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-10-10 19:09:34 -05:00
Zack Cerza
4902bfabf3 PEP-8 cleanup
Signed-off-by: Zack Cerza <zack.cerza@inktank.com>
2013-10-10 19:09:34 -05:00
Sage Weil
e17928d588 nuke: s/run_name/name/
This matches an existing argument (with the same meaning) and
avoids an error like

2013-10-01T17:20:35.395 CRITICAL:root:  File "/var/lib/teuthworker/teuthology-master/virtualenv/bin/teuthology", line 9, in <module>
    load_entry_point('teuthology==0.0.1', 'console_scripts', 'teuthology')()
  File "/home/teuthworker/teuthology-master/teuthology/run.py", line 235, in main
    nuke(ctx, log, ctx.lock)
  File "/home/teuthworker/teuthology-master/teuthology/nuke.py", line 391, in nuke
    if ctx.run_name:

2013-10-01T17:20:35.395 CRITICAL:root:AttributeError: 'Namespace' object has no attribute 'run_name'

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 20:57:53 -07:00
Zack Cerza
efd5ccbc4b Merge pull request #118 from ceph/wip-nukeskip
Check description of machines before nuking when -a is passed
2013-10-01 16:40:44 -07:00
Sandon Van Ness
6b248e80a6 Check description of machines before nuking when -a is passed
When teuthology-nuke is run with --archive/-a to kill and nuke
machines from an archive folder, it blindly nukes all the
targets it grabs from the config.yaml in the archive dir. This
change checks the description of locked machines and makes sure
the run name is in the description. If not, it removes the target
from the list passed to nuke().

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2013-10-01 16:12:31 -07:00
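
The check described in 6b248e80a6 could look roughly like this. It is a sketch under assumptions: the function name, the shape of the targets mapping, and the descriptions mapping (host name to lock description) are all hypothetical and not taken from nuke.py.

```python
def filter_targets_by_run(targets, descriptions, run_name):
    """Keep only targets whose lock description mentions run_name.

    targets:      dict of host name -> host key (hypothetical shape)
    descriptions: dict of host name -> lock description, or None
    """
    kept = {}
    for host, key in targets.items():
        description = descriptions.get(host) or ''
        if run_name in description:
            kept[host] = key
    return kept
```

A machine with no description, or one locked for a different run, is dropped from the result and so would not be nuked.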
Sage Weil
3e31c49344 nuke: make half-hearted attempt to sync before reboot
We don't want to block on sync for fear of a hung kernel
mount.  However, we can give it a try and wait a few seconds
to get what we can.

This fixes a problem where our recent modifications to the
sudoers file are lost, with a 0 byte file left in its place,
because the task fails and we do a reboot -f -n.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-10-01 09:57:34 -07:00
Zack Cerza
21765ce4f8 Move 'import os' to inside main()
This is necessary because of the monkey-patching.
2013-09-26 14:03:44 -05:00
Zack Cerza
a2c9bdc7ba Fix undefined name errors
(cherry picked from commit f59497ef2214f29d5995435d83766c7994e8f2cd)
2013-09-26 14:01:17 -05:00
Sage Weil
25bc62dec1 nuke: add missing import os
$ teuthology-nuke  -a . -r -u
Traceback (most recent call last):
  File "/home/ubuntu/bin/teuthology-nuke", line 9, in <module>
    load_entry_point('teuthology==0.0.1', 'console_scripts', 'teuthology-nuke')()
  File "/home/ubuntu/teuthology/teuthology/nuke.py", line 343, in main
    ifn = os.path.join(ctx.archive, 'info.yaml')
UnboundLocalError: local variable 'os' referenced before assignment

Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-25 13:42:03 -07:00
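
One common way to hit the UnboundLocalError shown in 25bc62dec1 is an import statement placed inside a function after the first use of the name. The sketch below assumes that cause; the listing does not show the code in nuke.py, and both function names are hypothetical.

```python
import os  # the module-level import does not help broken_main below


def broken_main(archive):
    # The 'import os' further down makes 'os' a local name for the whole
    # function body, so this line reads a local that is not yet bound.
    path = os.path.join(archive, 'info.yaml')
    import os
    return path


def fixed_main(archive):
    # Importing before first use binds the local name in time.
    import os
    return os.path.join(archive, 'info.yaml')
```

Calling broken_main raises UnboundLocalError, while fixed_main returns the joined path.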
tamil
eb4c575f54 made help more readable
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
2013-09-12 15:03:10 -07:00
tamil
40d6c60f13 feature # 5942. Added examples to teuthology binaries help page
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
2013-09-11 17:13:22 -07:00
Sage Weil
5acc57f5ad remove basedir/testdir distinction
We should never run with a conflicting testdir in the basedir, and the
code to do this is confusing and buggy.  Go back to a single testdir and
simple checks.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-10 10:53:41 -07:00
Sage Weil
dcbf50b86c nuke: get pid, owner from info.yaml (if present)
Fall back to the old files if info.yaml is missing.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 10:02:27 -07:00
Zack Cerza
3981a8f1af Never use 'except:' without specifying an Exception.
2013-08-30 11:10:05 -05:00
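
A short illustration of why a bare 'except:' is risky in Python; it is a generic sketch with hypothetical function names, not code from the commit. A bare clause also catches SystemExit and KeyboardInterrupt, while 'except Exception:' lets them propagate.

```python
def swallow_everything(func):
    try:
        func()
    except:  # bare except: also catches SystemExit and KeyboardInterrupt
        return 'swallowed'
    return 'ok'


def swallow_errors_only(func):
    try:
        func()
    except Exception:  # SystemExit and KeyboardInterrupt pass through
        return 'swallowed'
    return 'ok'
```

With a function that raises SystemExit, the first helper hides the exit request and the second lets it reach the caller.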
Sage Weil
86caebbed7 nuke: clean up stray firmware.git locks
These get lost occasionally and cause all firmware.git updates to
fail when the kernel task runs.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-23 09:00:47 -07:00
Sage Weil
0985f8c386 nuke: killall ceph-disk, too
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-18 12:31:11 -07:00
Sandon Van Ness
d54932cbc8 Fix VM issues.
Fix of #5494 although bad description. Instead of adding a wait
the code used to detect if the guest was back up is fixed. The
previous code appeared to assume only one machine and broke
when it was waiting for multiple machines if the guests did not
come up within 10 seconds of each other.

Make nuke not do the normal stuff if the machine is a VPS as we
just destroy them when they get unlocked.

Get downburst options from the yaml given to teuthology for the
test/task instead of from ~/.teuthology.yaml.

Fixed an error that would make all the default downburst values
not take effect if any of them were set via a yaml.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Warren Usui <warren.usui@inktank.com>
2013-07-03 19:07:35 -07:00
Warren Usui
a4994e3bde Support added for running scheduled tasks on virtual machines.
This included:
    A.) Changes made so that full path names on some files were used
        (scheduled tasks started in different home directories).
    B.) Changes to ensure tasks come up on the beanstalkc queue properly.
    C.) Finding and inserting the libvirt equivalent code for vm machines
        in order to simulate ipmi actions.
    D.) Fix host key code, report valgrind issue more clearly.
    E.) Some message and downburst call changes.

    Fix #4988
    Fix #5122
    Signed-off-by: Warren Usui <warren.usui@inktank.com>
2013-06-07 19:32:15 -07:00
Josh Durgin
d7fe5c0a34 nuke: don't require noipmi in ctx
This is called from run.py too, which won't have ctx.noipmi.
The default of using ipmi is fine for now for run.py.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-09 18:21:05 -07:00
Sage Weil
f7c8c27c7b Merge branch 'next'
2013-05-06 21:31:36 -07:00
Sam Lang
980973dc55 task/install.py: Allow installation of non-ceph
Generalizes the install task to specify a "project" which defaults to
'ceph', but can be configured to install different project packages,
for example:

install:
  project: samba
  extra_packages: samba

The default install task uses 'ceph' as the project, and relies on an
existing set of defined packages to install.  For other projects, the
packages to be installed must be specified with the extra_packages
field.  Multiple install tasks can be specified:

install:
install:
  project: samba
  extra_packages: samba

Which installs ceph packages and then samba packages.

Also, cleanup in nuke.py so that nuke and install use the same list of
packages when doing the remove steps.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-05-06 17:37:25 -07:00
Sam Lang
9e6f7b126b nuke.py: Allow ipmi power cycling to be skipped
Some nodes don't have ipmi setup.  Allow nuke to
skip the ipmi checking if -i (--noipmi) is specified.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-05-01 17:18:16 -07:00
Sage Weil
4a6e3b97e3 install, nuke: explicitly purge /var/lib/ceph
The packages won't do this anymore.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-22 15:22:38 -07:00
Warren Usui
0c75c6b1f7 Added el6 install functionality for CentOS systems.
install_packages, remove_packages and remove_sources are now the
installation and removal functions used by teuthology.  Debian
references have been removed outside of tasks/install.py.  CentOS
functionality parallel to Debian has been added to tasks/install.py,
and el6 references have been added to nuke.py, task/ceph-fuse.py and
task/install.py.

Some files created by CentOS are removed with rm -fr.  This should
be changed once the installation/removal rpm procedure is implemented.

Signed-off-by: Warren Usui <warren.usui@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-03-14 16:25:18 -07:00
Warren Usui
01a40cfbf1 Use service instead of initctl to restart rsyslog.
This change is needed to make sure teuthology works on CentOS when the
-a option is specified.

Signed-off-by: Warren Usui <warren.usui@inktank.com>
2013-03-13 18:37:25 -07:00
Sage Weil
bee8dffc34 nuke: blow away /home/ubuntu/cephtest too
(along with /tmp/cephtest)

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-25 17:54:49 -08:00
Sage Weil
d8021a1aa0 nuke: sudo for killall
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-22 10:51:51 -08:00
Josh Durgin
a862d8bf77 Fix unused vars, unused imports, and aliasing
Found by pyflakes

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-21 14:47:00 -08:00
Sage Weil
3f7c9bcaa4 move the install to a separate task.
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-18 15:06:52 -08:00
Sage Weil
d1d36241b7 ceph: use default data, keyring locations
This required reordering the cluster setup so that we do the ceph-osd
--mkfs --mkkey prior to gathering keys and initializing the monitors.

Also, run daemons as root.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-18 13:39:05 -08:00
Sage Weil
a54200d444 nuke: tolerate failed dpkg --configure -a/apt-get -f install
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-18 13:39:05 -08:00
Sage Weil
149be93639 nuke: dpkg --configure -a and apt-get -f install
Installing debs means we are more likely to hit a case where we interrupt
apt/dpkg.  Try to mop up as best we can in nuke.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-18 13:39:04 -08:00
Sage Weil
3400ea39ba nuke: whitespace
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-18 13:39:04 -08:00
Sage Weil
28116db6a0 nuke: remove librados, librbd
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-18 13:39:04 -08:00
Sander Pool
c525e1061b Install ceph debs and use installed debs
The ceph task installs ceph using the debian
packages now, and all invocations of binaries installed
in {tmpdir}/binary/usr/local/bin/ are replaced with
the use of the binaries installed in standard locations
by the debs.

Author:    Sander Pool <sander.pool@inktank.com>
Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-02-18 13:39:03 -08:00
Sage Weil
d790eeb451 nuke: testrados -> ceph_test_rados
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-18 13:38:54 -08:00
Josh Durgin
ed3c3615c3 nuke: don't try unmount if we're rebooting everything anyway
This can cause issues when unmount hangs. Our automatic runs reboot
everything unconditionally, so this caused a bunch of unnecessary hangs
when an fs was accidentally rendered un-unmountable.
2013-02-05 23:31:39 -08:00