==================================================
`Teuthology` -- The Ceph integration test runner
==================================================

The Ceph project needs automated tests. Because Ceph is a highly
distributed system, and has active kernel development, its testing
requirements are quite different from e.g. typical LAMP web
applications. Nothing out there seemed to handle our requirements,
so we wrote our own framework, called `Teuthology`.


Overview
========

Teuthology runs a given set of Python functions (`tasks`), with an SSH
connection to every host participating in the test. The SSH connection
uses `Paramiko <http://www.lag.net/paramiko/>`__, a native Python
client for the SSH2 protocol, and this allows us to e.g. run multiple
commands inside a single SSH connection, to speed up test
execution. Tests can use `gevent <http://www.gevent.org/>`__ to
perform actions concurrently or in the background.
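
As a minimal sketch of that gevent pattern (this is not Teuthology's
own API; the function and host names are purely illustrative)::

    import gevent

    def check_host(name):
        # placeholder for per-host work; a real test would run
        # commands over that host's SSH connection
        print('checking %s' % name)

    # run the per-host checks concurrently and wait for all of them
    jobs = [gevent.spawn(check_host, h) for h in ['host07', 'host08', 'host09']]
    gevent.joinall(jobs)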


Build
=====

Teuthology uses several Python packages that are not in the standard
library. To make the dependencies easier to get right, we use a
`virtualenv` to manage them. To get started, ensure you have the
``virtualenv`` and ``pip`` programs installed; e.g. on Debian/Ubuntu::

    sudo apt-get install python-dev python-virtualenv python-pip libevent-dev

and then run::

    ./bootstrap

You can run Teuthology's internal unit tests with::

    ./virtualenv/bin/nosetests


Test configuration
==================

An integration test run takes three items of configuration:

- ``targets``: what hosts to run on; this is a dictionary mapping
  hosts to ssh host keys, like:
  "username@hostname.example.com: ssh-rsa long_hostkey_here"
- ``roles``: how to use the hosts; this is a list of lists, where each
  entry lists all the roles to be run on a single host; for example, a
  single entry might say ``[mon.1, osd.1]``
- ``tasks``: how to set up the cluster and what tests to run on it;
  see below for examples

The format for this configuration is `YAML <http://yaml.org/>`__, a
structured data format that is still human-readable and editable.

For example, a full config for a test run that sets up a three-machine
cluster, mounts Ceph via ``ceph-fuse``, and leaves you at an interactive
Python prompt for manual exploration (and enabling you to SSH in to
the nodes & use the live cluster ad hoc), might look like this::

    roles:
    - [mon.0, mds.0, osd.0]
    - [mon.1, osd.1]
    - [mon.2, client.0]
    targets:
      ubuntu@host07.example.com: ssh-rsa host07_ssh_key
      ubuntu@host08.example.com: ssh-rsa host08_ssh_key
      ubuntu@host09.example.com: ssh-rsa host09_ssh_key
    tasks:
    - ceph:
    - ceph-fuse: [client.0]
    - interactive:

The number of entries under ``roles`` and ``targets`` must match.

Note the colon after every task name in the ``tasks`` section.

You need to be able to SSH in to the listed targets without
passphrases, and the remote user needs to have passphraseless `sudo`
access. Note that the ssh keys at the end of the ``targets`` entries
are the public ssh keys for the hosts. On Ubuntu, these are located
at ``/etc/ssh/ssh_host_rsa_key.pub``.

If you save the above file as ``example.yaml``, you can run
teuthology on it with::

    ./virtualenv/bin/teuthology example.yaml

You can also pass the ``-v`` option, for more verbose execution. See
``teuthology --help`` for more.


Multiple config files
---------------------

You can pass multiple files as arguments to ``teuthology``. Each one
will be read as a config file, and their contents will be merged. This
allows you to e.g. share definitions of what a "simple 3 node cluster"
is. The source tree comes with ``roles/3-simple.yaml``, so we could
skip the ``roles`` section in the above ``example.yaml`` and then
run::

    ./virtualenv/bin/teuthology roles/3-simple.yaml example.yaml
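
A roles-only file is just a YAML file whose top level contains nothing
but a ``roles`` section. As a sketch (the actual contents of
``roles/3-simple.yaml`` may differ; check the file in the source tree),
it could look like::

    roles:
    - [mon.0, mds.0, osd.0]
    - [mon.1, osd.1]
    - [mon.2, client.0]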


Reserving target machines
-------------------------

Before locking machines will work, you must create a ``.teuthology.yaml``
file in your home directory that sets a lock_server, i.e.::

    lock_server: http://host.example.com:8080/lock

Teuthology automatically locks nodes for you if you specify the
``--lock`` option. Without this option, you must specify machines to
run on in a ``targets.yaml`` file, and lock them using
teuthology-lock.
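
Such a ``targets.yaml`` is just another config file to merge in, using
the same format as the ``targets`` section shown earlier (the host
names and keys here are placeholders)::

    targets:
      ubuntu@host07.example.com: ssh-rsa host07_ssh_key
      ubuntu@host08.example.com: ssh-rsa host08_ssh_key
      ubuntu@host09.example.com: ssh-rsa host09_ssh_key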

Note that the default owner of a machine is ``USER@HOST``.
You can override this with the ``--owner`` option when running
teuthology or teuthology-lock.

With teuthology-lock, you can also add a description, so you can
remember which tests you were running on them. This can be done when
locking or unlocking machines, or as a separate action with the
``--update`` option. To lock 3 machines and set a description, run::

    ./virtualenv/bin/teuthology-lock --lock-many 3 --desc 'test foo'

If machines become unusable for some reason, you can mark them down::

    ./virtualenv/bin/teuthology-lock --update --status down machine1 machine2

To see the status of all machines, use the ``--list`` option. This can
be restricted to particular machines as well::

    ./virtualenv/bin/teuthology-lock --list machine1 machine2


Tasks
=====

A task is a Python module in the ``teuthology.task`` package, with a
callable named ``task``. It gets the following arguments:

- ``ctx``: a context that is available through the lifetime of the
  test run, and has useful attributes such as ``cluster``, letting the
  task access the remote hosts. Tasks can also store their internal
  state here. (TODO beware namespace collisions.)
- ``config``: the data structure after the colon in the config file,
  e.g. for the above ``ceph-fuse`` example, it would be a list like
  ``["client.0"]``.
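
As a minimal sketch, a task that does nothing but report its config
(the logging is illustrative; only the module location and the
``task(ctx, config)`` signature come from the description above) might
look like::

    import logging

    log = logging.getLogger(__name__)

    def task(ctx, config):
        """
        Example task: do nothing except log the config it was given.
        """
        log.info('example task running with config %r', config)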

Tasks can be simple functions, called once in the order they are
listed in ``tasks``. But sometimes, it makes sense for a task to be
able to clean up after itself; for example, unmounting the filesystem
after a test run. A task callable that returns a Python `context
manager
<http://docs.python.org/library/stdtypes.html#typecontextmanager>`__
will have the manager added to a stack, and the stack will be unwound
at the end of the run. This means the cleanup actions are run in
reverse order, both on success and failure. A nice way of writing
context managers is the ``contextlib.contextmanager`` decorator; look
for that string in the existing tasks to see examples, and note where
they use ``yield``.
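
A hedged sketch of a cleanup-aware task, written with the decorator
mentioned above (the setup and teardown bodies are placeholders),
could look like::

    import contextlib
    import logging

    log = logging.getLogger(__name__)

    @contextlib.contextmanager
    def task(ctx, config):
        log.info('setting up with config %r', config)
        try:
            # the test run continues past this yield; the cleanup
            # below runs when the stack of context managers unwinds
            yield
        finally:
            log.info('cleaning up')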


Troubleshooting
===============

Sometimes when a bug triggers, instead of automatic cleanup, you want
to explore the system as is. Adding a top-level::

    interactive-on-error: true

to a config file for ``teuthology`` will make that possible. With that
option, any *task* that fails will have the ``interactive`` task
called after it. This means that before any cleanup happens, you get a
chance to inspect the system -- both through Teuthology and via extra
SSH connections -- and the cleanup runs only when you choose to let it
proceed. Just exit the interactive Python session to continue the
cleanup.
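
Because config files are merged (see "Multiple config files" above),
one convenient way to do this is to keep the option in a small file of
its own and pass it along with the rest of your config; the file name
here is only an example::

    echo 'interactive-on-error: true' > interactive-on-error.yaml
    ./virtualenv/bin/teuthology example.yaml interactive-on-error.yaml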

TODO: this only catches exceptions *between* the tasks. If a task
calls multiple subtasks, e.g. with ``contextutil.nested``, those
cleanups *will* be performed. Later on, we can let tasks communicate
the subtasks they wish to invoke to the top-level runner, avoiding
this issue.


Test Sandbox Directory
======================

Teuthology currently places most test files and mount points in a sandbox
directory, defaulting to ``/tmp/cephtest/{rundir}``. The ``{rundir}`` is
the name of the run (as given by ``--name``), or ``user@host-timestamp``
if no name is specified. To change the location of the sandbox directory,
the following options can be specified in ``$HOME/.teuthology.yaml``::

    base_test_dir: <directory>

The ``base_test_dir`` option sets the base directory to use for the
individual run directories. If not specified, this defaults to
``/tmp/cephtest``.

::

    test_path: <directory>

The ``test_path`` option sets the complete path to use for the test
directory. This allows for the old behavior, where ``/tmp/cephtest`` was
used as the sandbox directory.
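
For example, a ``$HOME/.teuthology.yaml`` that both points at a lock
server and moves the sandbox (the values are placeholders) might
contain::

    lock_server: http://host.example.com:8080/lock
    base_test_dir: /scratch/cephtest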


VIRTUAL MACHINE SUPPORT
=======================

Teuthology also supports virtual machines, which can function like
physical machines but differ in the following ways:

VPSHOST:

A new entry, vpshost, has been added to the teuthology database of
available machines. For physical machines, this value is null. For
virtual machines, this entry is the name of the physical machine that
the virtual machine resides on.

There are fixed "slots" for virtual machines that appear in the teuthology
database. These slots have a machine type of vps and can be locked like
any other machine. The existence of a vpshost field is how teuthology
knows whether a database entry represents a physical or a virtual
machine.

DOWNBURST:

When a virtual machine is locked, downburst is run on that machine to
install a new image. This allows the user to choose which OS is installed
on the newly created virtual machine. Currently the default is Ubuntu
(precise). A different OS can be selected using the ``--vm-type`` option
of teuthology-lock.
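
As a hedged example (assuming ``--vm-type`` combines with the locking
options shown earlier; check ``teuthology-lock --help`` for the exact
syntax and accepted values), locking three vps slots with a different
OS might look like::

    ./virtualenv/bin/teuthology-lock --lock-many 3 --vm-type centos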

When a virtual machine is unlocked, downburst destroys the image on the
machine.

Temporary yaml files are used to downburst a virtual machine. A typical
yaml file will look like this::

    downburst:
      cpus: 1
      disk-size: 30G
      distro: centos
      networks:
      - {source: front}
      ram: 4G

These values are used by downburst to create the virtual machine.

HOST KEYS:

Because teuthology reinstalls each virtual machine with a fresh image,
a new host key is generated. After locking, once a connection is
established to the new machine, teuthology-lock with the ``--list`` or
``--list-targets`` options will display the new keys. When vps machines
are locked using the ``--lock-many`` option, a message is displayed
indicating that ``--list-targets`` should be run later.

CEPH-QA-CHEF:

Once teuthology starts after a new vm is installed, teuthology
checks for the existence of ``/ceph-qa-ready``. If this file is not
present, ceph-qa-chef is run when teuthology first comes up.

ASSUMPTIONS:

It is assumed that downburst is on the user's ``PATH``.