If the user feeds in a yaml with targets, the worker will launch the job
but fail with
2013-09-03T11:18:34.333 CRITICAL:root:AssertionError: You cannot specify targets in a config file when using the --lock option
Just strip them out before scheduling. This eases my personal workflow
where I have a test I'm running manually against some prelocked machines
but also want to schedule it.
Signed-off-by: Sage Weil <sage@inktank.com>
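A minimal sketch of the stripping step, assuming the job yaml has already been parsed into a Python dict. `strip_targets` is a hypothetical name used for illustration, not teuthology's actual function.

```python
import copy


def strip_targets(job_config):
    """Return a copy of a parsed job config with any 'targets' removed.

    Hypothetical helper: dropping prelocked targets lets a worker that
    runs with --lock pick and lock its own machines.
    """
    cleaned = copy.deepcopy(job_config)  # leave the caller's dict untouched
    cleaned.pop('targets', None)         # no-op if there were no targets
    return cleaned
```

The same yaml can then be run by hand against prelocked machines and also be scheduled without editing it.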
This makes a run with --lock hang when connecting, for some
reason. E.g.,
$ teuthology -v a.yaml --lock
...
INFO:teuthology.task.internal:Opening connections...
DEBUG:teuthology.task.internal:connecting to ubuntu@plana06.front.sepia.ceph.com
<hangs>
No clue what is going on here, but this fixes it!
Signed-off-by: Sage Weil <sage@inktank.com>
Previously, a collection was a directory like this:
mycollection/
mycollection/facet1/
mycollection/facet1/1a.yaml
mycollection/facet1/1b.yaml
mycollection/facet2/
mycollection/facet2/2a.yaml
mycollection/facet3/
mycollection/facet3/3a.yaml
mycollection/facet3/3b.yaml
and this would expand to
1a + 2a + 3a
1a + 2a + 3b
1b + 2a + 3a
1b + 2a + 3b
The fixed directory depth and requirement for a subdir even
when there is only 1 item is annoying. Instead, allow an
arbitrary directory structure, with the following rules:
- a .yaml file is taken as-is (duh); other files are still
  ignored
- a directory is normally just a way to organize files. We
  recursively descend and build a list of what we find.
- a directory with a '%' file in it is special:
  - take the product of every item in the dir (much like
    we did before)
- a directory with a '+' file in it is special:
  - concatenate everything in the dir into one job
Note that this is equivalent to the previous structure if we
do:
for facet in mycollection/* ; do touch $facet/% ; done
We can clean up slightly by taking any dir with only one yaml
file in it and replacing the dir with the bare .yaml.
Once this is done, we can reorganize directories however we
like.
Signed-off-by: Sage Weil <sage@inktank.com>
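The expansion rules above can be sketched as a small recursive function. To keep it self-contained, this sketch assumes the collection is modelled as nested dicts (a dict is a directory, a string is a file) rather than walking the real filesystem; `build_jobs` is a hypothetical name, and each job is returned as a list of yaml paths.

```python
import itertools


def build_jobs(name, node):
    """Expand a collection tree into a list of jobs (lists of yaml paths)."""
    if isinstance(node, str):
        # A file: .yaml files are taken as-is, anything else is ignored.
        return [[name]] if name.endswith('.yaml') else []
    children = []
    for child in sorted(node):
        if child in ('%', '+'):
            continue  # marker files only change how the dir is combined
        jobs = build_jobs(name + '/' + child, node[child])
        if jobs:
            children.append(jobs)
    if '%' in node:
        # Product: one job per combination, one pick from each child.
        return [list(itertools.chain.from_iterable(combo))
                for combo in itertools.product(*children)]
    if '+' in node:
        # Concatenate everything under this directory into a single job.
        return [[frag for jobs in children for job in jobs for frag in job]]
    # Plain directory: just collect whatever the children produce.
    return [job for jobs in children for job in jobs]
```

With a `%` file at the top of `mycollection` and plain facet dirs below it, this yields the four combinations listed earlier (1a + 2a + 3a through 1b + 2a + 3b), in sorted-name order.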
Added code to the s3tests task to extract
multi-region info so that that data
can be added to the S3TEST_CONF file
used to run S3 tests.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The rgw task deletes the region info
from the config structure. The s3tests
task needs this info, so we persist
it by sticking it in the ctx object.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Persist the hostname and port number used
by the radosgw-agent http server.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Moving a helper function into a more general
location so that it can be used by other
classes.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Teuthology doesn't care about os_type for baremetal (ATM). This
change makes it so you can run tests that have been switched over
to run on multiple distros (on vms) on baremetal as well, while all
non-ubuntu tests will be skipped (to avoid running the same test
multiple times on baremetal).
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Adding the option:
exclude_arch:
or
exclude_os_type:
in the ceph-qa-suite yaml allows tests to be skipped for certain
types of hardware or distros.
Example:
exclude_arch: armv7l
This will make said test not run on arm machines.
exclude_os_type: rhel
This would make multi-distro tests skip a specific test on RHEL.
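A minimal sketch of the skip check, assuming `exclude_arch` and `exclude_os_type` each hold a single string as in the examples above, and that the machine's arch and os_type are already known. `should_skip` and its parameters are hypothetical.

```python
def should_skip(job_config, arch, os_type):
    """Return True if the job excludes this machine's arch or distro.

    Assumes exclude_arch / exclude_os_type each hold a single string.
    """
    excluded_arch = job_config.get('exclude_arch')
    excluded_os = job_config.get('exclude_os_type')
    # Guard against None so an unknown arch/os_type never matches a
    # missing exclusion by accident.
    if excluded_arch is not None and excluded_arch == arch:
        return True
    if excluded_os is not None and excluded_os == os_type:
        return True
    return False
```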
Alter the code to use the 'check_status=True'
option in rgwadmin() rather than following the
call with 'assert not err'. Should make the
tests a bit more clear and result in a more
useful error (throwing the call stack rather than
just reporting that 'assert not err' failed).
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
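To illustrate the difference, here is a minimal local sketch of a command wrapper with a `check_status` flag. It runs commands through `subprocess` on the local host, whereas teuthology's `rgwadmin()` runs `radosgw-admin` remotely; `run_admin` is a hypothetical stand-in, not that function.

```python
import subprocess


def run_admin(args, check_status=False):
    """Run a command and return (exit_status, stdout).

    With check_status=True a non-zero exit raises CalledProcessError, so
    the failure surfaces with a traceback naming the command and status
    instead of a bare 'assert not err' at the call site.
    """
    proc = subprocess.run(args, stdout=subprocess.PIPE,
                          universal_newlines=True)
    if check_status and proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, args,
                                            output=proc.stdout)
    return proc.returncode, proc.stdout
```

The old pattern `err, out = run_admin(cmd); assert not err` becomes `err, out = run_admin(cmd, check_status=True)`.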
The rgw.py task was extended to dynamically
assign port numbers to radosgateways.
This patch extends the radosgw-admin task
to use those ports rather than making out-dated
assumptions of port numbering.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Enable multi-region calls and tests only if
the configuration has specified a
radosgw-agent task.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Do not make the domain root pool the same
as the zone root pool. That causes sync issues.
Also, clarify a logging message.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
When doing a lock-many, do not lock any of the vpms when downburst errors
occur. Made error messages more accurate, and removed a destroy_if_vm call
because the destroy was already called in unlock. Changed some print
messages to be log.info displays.
Fixes: #5957
Signed-off-by: Warren Usui <warren.usui@inktank.com>
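The all-or-nothing behaviour can be sketched as follows, assuming a guest only counts as held once its creation succeeds. `lock_many`, `create_guest` and `unlock` are hypothetical callables standing in for the downburst and lock-server calls.

```python
def lock_many(names, create_guest, unlock):
    """Create a guest for every name, or end up holding none of them.

    create_guest(name) returns True on success; on the first failure every
    guest created so far is unlocked and an empty list is returned.
    """
    created = []
    for name in names:
        if not create_guest(name):
            for done in created:
                unlock(done)  # roll back the partial lock
            return []
        created.append(name)
    return created
```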
Made grub execution conditional and not done when ARM.
Use ctx parameter to change machine type to tala.
Fix kernel assignments when running ARM systems.
Fixes: #5000
Signed-off-by: Warren Usui <warren.usui@inktank.com>
Adding tests for ticket #5604 to test
user propagation via the radosgw-agent.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Enable multi-region calls and tests only if
the configuration has specified a
radosgw-agent task.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Because it relies on the lock server which is presumably not available
since check-locks was set to False. It matters when using teuthology
on a minimal installation.
http://tracker.ceph.com/issues/5946 fixes #5946
Signed-off-by: Loic Dachary <loic@dachary.org>
Refactored the radosgw-agent.py code so that it
is structured more like existing teuthology
tasks.
Additionally, added code to enable:
using the override field in YAML files,
specifying which radosgw-agent github branch
to check out, and having the YAML file
specify one of the following: a full sync,
an incremental sync, or the starting of the
test radosgw-agent server (previously the
server was always started by this task).
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
In the example config, the region root and
zone root were pointing to the same pool,
which is not a best practice. Updated the
example to show them pointing to different pools.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
When pulling region info from the config
structure, if the region info isn't there,
log a more helpful message.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Extend the s3readwrite.py task so that the
creation and deletion of users for the s3readwrite
tests can be specified independently, with both
defaulting to true.
This is needed for tests that will create a user and
data in one execution and read it in another.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Extend the rados pool configuration options to
specify all pools (if desired).
Also, reordered zone and region configuration
so that they're configured (per client) in
this order: zone, region, set default region.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
The log_data and log_metadata are made configurable
via the YAML file and default to false
(meaning neither data nor metadata operations are
logged).
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Go to the master zone in the master region for radosgw-admin
operations. Trigger metadata sync. Other fixes.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Enable s3readwrite task to have the branch to
download specified and for overrides to be
incorporated into the config at run-time.
Code based on the s3tests.py task.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
I saw
2013-08-03T12:56:26.641 DEBUG:teuthology.orchestra.run:Running [10.214.131.28]: 'sudo killall -9 smbd'
2013-08-03T12:56:26.727 DEBUG:teuthology.orchestra.run:Running [10.214.131.28]: 'sudo lsof /home/ubuntu/cephtest/93695/mnt.0'
2013-08-03T12:56:26.830 INFO:teuthology.orchestra.run.out:[10.214.131.28]: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
2013-08-03T12:56:26.830 INFO:teuthology.orchestra.run.out:[10.214.131.28]: smbd 12381 root cwd DIR 0,0 0 1 /home/ubuntu/cephtest/93695/mnt.0
which makes me think we just need to wait a moment before
attempting the umount?
Signed-off-by: Sage Weil <sage@inktank.com>
The rgw task was failing to check for a None object
when parsing user info in the case where there were
config options set for the client that did not include
user info (e.g. valgrind: ).
Correcting a bug where specifying
a rgw server for a client but not specifying
a system user would throw an exception.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The log_data and log_metadata are made configurable
via the YAML file and default to false
(meaning neither data nor metadata operations are
logged).
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
- Read ceph.conf from stored copy that includes overrides
- Get system users and keys from cluster instead of reading other
tasks' yaml, which may not be complete.
- Put zone info extraction from the cluster into utility functions,
since it'll be useful for other tests later.
- Work with more than one agent on a single host
- Accept more than one client to run, like almost every other task
- Rename target to dest for consistency with radosgw-agent
- Don't make everything one large function
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This pulls access data out of the rgw task and off disk,
and then downloads, sets up, and runs an rgw sync agent
in test mode.
Signed-off-by: Greg Farnum <greg@inktank.com>
This makes --lock-many work when --machine-type vps is passed.
Before, it wasn't handled correctly and guests were not created.
Now it creates the guests and gives back to the user the list-targets
output for said guests.
teuthology-lock --lock-many 4 --machine-type vps --os-type centos
This fixes issue #5836
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Alfredo Deza <alfredo@deza.pe>
On Debian wheezy, the mount output uses device-by-label, which makes
our normal method of checking whether a device is mounted not work.
Since VMs will always use vda as their boot device, we just
remove it from devs if it's in there so we don't attempt to zap
vda.
I also added a strip() to remove the last blank entry that was
always getting added to the devs list on all machines. Example:
devs=['/dev/sda', '/dev/sdb', '/dev/sdc', '/dev/sdd', '']
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Alfredo Deza <alfredo@deza.pe>
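A minimal sketch of both fixes, assuming the device list arrives as newline-separated command output (one device path per line). `scratch_devices` and `is_vm` are hypothetical names. Splitting text that ends in a newline on `'\n'` is what produced the trailing `''` entry shown above.

```python
def scratch_devices(ls_output, is_vm):
    """Turn newline-separated device output into a list of zappable devs."""
    # strip() drops the trailing newline, so no empty '' entry is produced.
    devs = ls_output.strip().splitlines()
    if is_vm and '/dev/vda' in devs:
        # VMs always boot from vda; never try to zap it.
        devs.remove('/dev/vda')
    return devs
```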
Fixes a bug where an rgw client without
a system user specified would cause teuthology
to error out.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
By separating out the user creation from
generating the region/zone info, we can generate
users for RGW tests that run against the default
pools.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
A 'user create' call was being passed to radosgw-admin
with '--secret-key' instead of the valid '--secret'
which was causing a random secret to be generated,
which was causing subsequent tests to fail.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
fastcgi_sock dir needs to exist before radosgw starts, and apache-execed radosgw needs an explicit keyring argument.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Just a simple change to reconnect to SSH after running
ceph-qa-chef to get around things like ulimit changes.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
packages are missing (the old code skipped 'Nothing to do' messages, but these
cases are still errors).
Fixes: #5803
Signed-off-by: Warren Usui <warren.usui@inktank.com>
Reviewed-by: Sandon Van Ness
Needed some more changes to allow for the case of creating VMs
manually with teuthology-lock instead of letting teuthology handle
it in internal.py with lock_machines(). Just some additional checks
to fall back to defaults when ctx.config is non-existent (which causes an
AttributeError).
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Warren Usui <warren.usui@inktank.com>
Teuthology got updated to use --os-type and os_type in yaml
instead of --vm-type. I added this to teuthology but forgot
to update teuthology-lock as well for manually creating vms.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Only radosgw needs this option, and each one will be different, so
remove it from the ceph.conf template.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
The clients are pretty regularly reporting busy on unmount when
samba runs above them. This will hopefully give us some info about why.
Signed-off-by: Greg Farnum <greg@inktank.com>
Since getting the ostype is used multiple places I made a
function for it and modified the existing code to use
said function. I also added tests for the function.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Due to bug #5716, pools need to start with a '.' at present.
Updating the examples to follow this convention.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The post-yield code in create_dirs needed to
be tweaked to correctly delete the {tdir}/apache
directory (if it exists) on each client.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Take client<->zone/region and the associated pools from ceph.conf, so
we don't have to invent a new format to specify it.
General region info is added to a new configuration section in the rgw
task. Each client is assumed to be a different zone, and a system user
is created with the key specified in the yaml, so it can be passed to
later task configuration as well. This isn't strictly necessary, but
avoids having to look up this info in later tasks through something
like radosgw-admin.
Ports are allocated automatically because there's no obvious mapping
from host to client in the task configuration. Later tests can get the
endpoints desired by reading the region map.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Six copies are replaced with one, with an added option to check status
automatically. This should probably be used in a few places where the
return code is ignored.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
In some cases tests fail or nuke fails and the guest is
not properly destroyed. This will look to see if it gets
an error due to the guest already existing or its disks
existing and will re-create the guest.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Just to allow the create to still work in case the os
volume is fairly large (takes a while to resize) and in
case the host machine is bogged down due to disk I/O.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Use os_type instead of vm_type for more generic naming
for when we start re-imaging bare metal. Also added an
os_version dictionary with the default distro versions
we want, overriding downburst's defaults.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
tasks:
...
- ceph.wait_for_mon_quorum: [a, b]
...
will block until the mon quorum consists of exactly [a, b]. This is
compared directly to the relevant field from 'ceph quorum_status'
which has the alphanumeric names only.
Signed-off-by: Sage Weil <sage@inktank.com>
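A polling sketch of the wait, assuming the JSON printed by `ceph quorum_status` carries the mon names in a `quorum_names` list. `get_status` is a hypothetical callable returning that JSON text, and `interval`/`timeout` are illustrative additions (the task described above simply blocks). Names are compared order-insensitively here, which is a choice made for this sketch.

```python
import json
import time


def wait_for_mon_quorum(get_status, expected, interval=1.0, timeout=300):
    """Poll until the quorum consists of exactly the expected mon names."""
    want = sorted(expected)
    deadline = time.monotonic() + timeout
    while True:
        names = json.loads(get_status())['quorum_names']
        if sorted(names) == want:
            return
        if time.monotonic() >= deadline:
            raise RuntimeError('quorum is %s, expected %s' % (names, want))
        time.sleep(interval)
```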
Often we want to build a test collection that substitutes different
sequences of tasks into a parallel/sequential construction. However, the
yaml combination that happens when generating jobs is not smart enough to
substitute some fragment into a deeply-nested piece of yaml.
Instead, make these sequences top-level entries in the config dict, and
reference them. For example:
tasks:
- install:
- ceph:
- parallel:
  - workload
  - upgrade-sequence
workload:
  workunit:
    - something
upgrade-sequence:
  install.restart: [osd.0, osd.1]
Signed-off-by: Sage Weil <sage@inktank.com>
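A sketch of how such references might be resolved, assuming the yaml has been loaded into a dict and that each top-level entry is either a single task dict or a list of task dicts. `resolve_sequences` is a hypothetical name; each result is normalized to `(task_name, task_config)` pairs.

```python
def resolve_sequences(config, names):
    """Resolve names listed under 'parallel' to top-level config entries.

    Each entry is normalized to a list of (task_name, task_config) pairs so
    that a single-task dict and a list of tasks are handled the same way.
    """
    sequences = []
    for name in names:
        if name not in config:
            raise KeyError('parallel entry %r has no top-level definition'
                           % name)
        entry = config[name]
        tasks = entry if isinstance(entry, list) else [entry]
        sequences.append([item for task in tasks for item in task.items()])
    return sequences
```

Because the sequences live at the top level, a collection fragment can replace `workload` or `upgrade-sequence` wholesale through the normal yaml merge.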
Instead of relying on hardcoded values, obtain the max-skew default from
'ceph-mon --show-config-value mon_clock_drift_allowed' to match the mon's
expectation.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sometimes the thing we're talking to is slow to start, or to register the
command we are running. Loop in that case, at least for a while.
Signed-off-by: Sage Weil <sage@inktank.com>
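A minimal retry loop of the kind described, assuming the command is wrapped in a zero-argument callable that raises while the target is not ready. `run_with_retries`, `tries`, `delay` and `retry_on` are hypothetical names and defaults, not teuthology's.

```python
import time


def run_with_retries(attempt, tries=10, delay=1.0, retry_on=(Exception,)):
    """Call attempt() until it succeeds, giving up after `tries` calls.

    Errors listed in retry_on are treated as "not ready yet"; the last one
    is re-raised once the attempts run out.
    """
    for i in range(tries):
        try:
            return attempt()
        except retry_on:
            if i == tries - 1:
                raise
            time.sleep(delay)
```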
If not defined, this defaults to 0.05; if 'max-skew' is defined, however,
it must override whatever is in the config.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
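One reading of the two clock-skew commits above is the precedence "explicit max-skew, then the mon's own setting, then 0.05". This sketch assumes that order; `resolve_max_skew` is hypothetical, and `mon_value` stands for the text printed by `ceph-mon --show-config-value mon_clock_drift_allowed` (running that command is left out).

```python
def resolve_max_skew(task_config, mon_value):
    """Pick the clock-skew threshold in seconds.

    task_config: the task's yaml config, which may contain 'max-skew'.
    mon_value: output of the ceph-mon config query, or None/'' if unknown.
    """
    if task_config.get('max-skew') is not None:
        return float(task_config['max-skew'])  # explicit setting wins
    if mon_value:
        return float(mon_value)  # match what the mon itself expects
    return 0.05  # fallback when nothing else is defined
```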
teuthology-suite and teuthology-schedule will now take --worker instead of
--branch. The branch is set by setting teuthology_branch in the
yaml used to schedule the job.
The teuthology branches are assumed to be in ~/teuthology-$branch
of whatever user is running the workers.
This will make the CLI do every mon command twice and make sure they both
succeed. This catches problems with mon command idempotency faster than
waiting for random failures to trigger them.
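The idea can be sketched independently of the CLI, assuming a hypothetical `send_command(cmd)` that returns `(return_code, output)`. A command that is not idempotent typically fails on the second run, for example with an "already exists" error.

```python
def run_idempotent(send_command, cmd):
    """Send a mon command twice and require both runs to succeed."""
    results = [send_command(cmd) for _ in range(2)]
    for attempt, (ret, out) in enumerate(results, 1):
        if ret != 0:
            raise RuntimeError('%r failed on attempt %d (ret=%d): %s'
                               % (cmd, attempt, ret, out))
    return results[-1]
```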
Added sequential task and parallel task.
Changed _run_one_task to run_one_task (now called by new tasks too).
Fixes: #4969
Signed-off-by: Warren Usui <warren.usui@inktank.com>
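A sketch of the two new tasks, assuming each sub-task is a `(name, config)` pair handed to a `run_one_task(name, config)` callable as the commit describes. This version uses plain threads for the parallel case; teuthology itself may use a different concurrency mechanism.

```python
import threading


def run_sequential(run_one_task, tasks):
    """Run (name, config) pairs one after another, stopping on an error."""
    for name, config in tasks:
        run_one_task(name, config)


def run_parallel(run_one_task, tasks):
    """Run (name, config) pairs concurrently and re-raise a failure."""
    errors = []

    def worker(name, config):
        try:
            run_one_task(name, config)
        except Exception as exc:  # collected so the caller can re-raise it
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(name, config))
               for name, config in tasks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    if errors:
        raise errors[0]
```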
We already install btrfs-tools and xfsprogs with ceph-qa-chef.
Doing it here was just causing problems on non-ubuntu
distros, and I see no point in keeping it now.