I think this is what is going on...
Traceback (most recent call last):
File "/var/lib/teuthworker/teuthology-master/teuthology/contextutil.py", line 27, in nested
yield vars
File "/var/lib/teuthworker/teuthology-master/teuthology/task/ceph.py", line 1158, in task
yield
File "/var/lib/teuthworker/teuthology-master/teuthology/run_tasks.py", line 25, in run_tasks
manager = _run_one_task(taskname, ctx=ctx, config=config)
File "/var/lib/teuthworker/teuthology-master/teuthology/run_tasks.py", line 14, in _run_one_task
return fn(**kwargs)
File "/var/lib/teuthworker/teuthology-master/teuthology/task/dump_stuck.py", line 93, in task
manager.kill_osd(id_)
File "/var/lib/teuthworker/teuthology-master/teuthology/task/ceph_manager.py", line 665, in kill_osd
if 'powercycle' in self.config and self.config['powercycle']:
TypeError: argument of type 'NoneType' is not iterable
Linux doesn't guarantee device names (/dev/sdb, etc.)
are always mapped to the same disk. Instead of assigning
nominal devices to osds, we map devices by their wwn
(/dev/disk/by-id/wwn-*) to an osd (both data and journal).
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This patch defines a RemoteConsole class associated
with each Remote class instance, allowing
power cycling a target through ipmi.
Fixes/Implements #3782.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Teuthology uses /tmp/cephtest/ as the scratch test directory for
a run. This patch replaces /tmp/cephtest/ everywhere with a
per-run directory: {basedir}/{rundir} where {basedir} is a directory
configured in .teuthology.yaml (/tmp/cephtest if not specified),
and {rundir} is the name of the run, as given in --name. If no name
is specified, {user}-{timestamp} is used.
To get the old behavior (/tmp/cephtest), set test_path: /tmp/cephtest
in .teuthology.yaml.
This change was modivated by #3782, which requires a test dir that
survives across reboots, but also resolves#3767.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This patch includes minor fixes to the teuthology
python code for syntax errors found by running
check-syntax.sh (which runs pyflakes on each file).
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Tests scenario where merge_old_entry encounters a divergent
entry where the prior_version is prior to log_tail. This
is a problem since it will go into the missing set, but won't
be re-added to the missing set during read_log() if the node
restarts prior to recovering the object.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Thrasher can now with configurable frequency test min_size by
taking down all but one osd, waiting, killing that osd and bringing
back the others, and verifying that the cluster goes clean.
Signed-off-by: Samuel Just <sam.just@inktank.com>
We need to wait for peering to either complete, or block because it is
waiting for another PG. _Then_ look at all the PG states and compare the
mon values with what we get from qeurying the OSDs directly.
Clean now also means the correct number of replicas, whereas recovered
means we have done all the work we can do given the replicas/osds we have.
For example, degraded and clean are now mutually exclusive.
Also move away from 'till'.
I'm not sure what the old code was trying to do, but I'm pretty sure it
wasn't doing it correctly.. a .1 chance_down was killing an OSD for me
virtually every time.
Stopping ceph-osd doesn't make it out (immediately). Prevent monitor
from doing this after a delay too so we can keep our notion of what is
up/down/in/out accurate.