ceph/README.rst

==================================================
 `Teuthology` -- The Ceph integration test runner
==================================================

The Ceph project needs automated tests. Because Ceph is a highly
distributed system, and has active kernel development, its testing
requirements are quite different from e.g. typical LAMP web
applications. Nothing out there seemed to handle our requirements,
so we wrote our own framework, called `Teuthology`.


Overview
========

Teuthology runs a given set of Python functions (`tasks`), with an SSH
connection to every host participating in the test. The SSH connection
uses `Paramiko <http://www.lag.net/paramiko/>`__, a native Python
client for the SSH2 protocol, and this allows us to e.g. run multiple
commands inside a single SSH connection, to speed up test
execution. Tests can use `gevent <http://www.gevent.org/>`__ to
perform actions concurrently or in the background.


Build
=====

Teuthology uses several Python packages that are not in the standard
library. To make the dependencies easier to get right, we use a
`virtualenv` to manage them. To get started, ensure you have the
``virtualenv`` and ``pip`` programs installed; e.g. on Debian/Ubuntu::

	sudo apt-get install python-dev python-virtualenv python-pip libevent-dev

and then run::

	./bootstrap

You can run Teuthology's internal unit tests with::

	./virtualenv/bin/nosetests


Test configuration
==================

An integration test run takes three items of configuration:

- ``targets``: what hosts to run on; this is a dictionary mapping
  hosts to ssh host keys, like:
  "username@hostname.example.com: ssh-rsa long_hostkey_here"
- ``roles``: how to use the hosts; this is a list of lists, where each
  entry lists all the roles to be run on a single host; for example, a
  single entry might say ``[mon.1, osd.1]``
- ``tasks``: how to set up the cluster and what tests to run on it;
  see below for examples

The format for this configuration is `YAML <http://yaml.org/>`__, a
structured data format that is still human-readable and editable.

For example, a full config for a test run that sets up a three-machine
cluster, mounts Ceph via ``ceph-fuse``, and leaves you at an interactive
Python prompt for manual exploration (and enabling you to SSH in to
the nodes & use the live cluster ad hoc), might look like this::

	roles:
	- [mon.0, mds.0, osd.0]
	- [mon.1, osd.1]
	- [mon.2, client.0]
	targets:
	  ubuntu@host07.example.com: ssh-rsa host07_ssh_key 
	  ubuntu@host08.example.com: ssh-rsa host08_ssh_key
	  ubuntu@host09.example.com: ssh-rsa host09_ssh_key
	tasks:
	- ceph:
	- ceph-fuse: [client.0]
	- interactive:

The number of entries under ``roles`` and ``targets`` must match.

Note the colon after every task name in the ``tasks`` section.

You need to be able to SSH in to the listed targets without
passphrases, and the remote user needs to have passphraseless `sudo`
access. Note that the ssh keys at the end of the ``targets``
entries are the public ssh keys for the hosts. 
On Ubuntu, these are located at /etc/ssh/ssh_host_rsa_key.pub

If you'd save the above file as ``example.yaml``, you could run
teuthology on it by saying::

	./virtualenv/bin/teuthology example.yaml

You can also pass the ``-v`` option, for more verbose execution. See
``teuthology --help`` for more.


Multiple config files
---------------------

You can pass multiple files as arguments to ``teuthology``. Each one
will be read as a config file, and their contents will be merged. This
allows you to e.g. share definitions of what a "simple 3 node cluster"
is. The source tree comes with ``roles/3-simple.yaml``, so we could
skip the ``roles`` section in the above ``example.yaml`` and then
run::

	./virtualenv/bin/teuthology roles/3-simple.yaml example.yaml


Reserving target machines
-------------------------

Before locking machines will work, you must create a .teuthology.yaml
file in your home directory that sets a lock_server, i.e.::

	lock_server: http://host.example.com:8080/lock

Teuthology automatically locks nodes for you if you specify the
``--lock`` option. Without this option, you must specify machines to
run on in a ``targets.yaml`` file, and lock them using
teuthology-lock.

Note that the default owner of a machine is ``USER@HOST``.
You can override this with the ``--owner`` option when running
teuthology or teuthology-lock.

With teuthology-lock, you can also add a description, so you can
remember which tests you were running on them. This can be done when
locking or unlocking machines, or as a separate action with the
``--update`` option. To lock 3 machines and set a description, run::

	./virtualenv/bin/teuthology-lock --lock-many 3 --desc 'test foo'

If machines become unusable for some reason, you can mark them down::

	./virtualenv/bin/teuthology-lock --update --status down machine1 machine2

To see the status of all machines, use the ``--list`` option. This can
be restricted to particular machines as well::

	./virtualenv/bin/teuthology-lock --list machine1 machine2


Tasks
=====

A task is a Python module in the ``teuthology.task`` package, with a
callable named ``task``. It gets the following arguments:

- ``ctx``: a context that is available through the lifetime of the
  test run, and has useful attributes such as ``cluster``, letting the
  task access the remote hosts. Tasks can also store their internal
  state here. (TODO beware namespace collisions.)
- ``config``: the data structure after the colon in the config file,
  e.g. for the above ``ceph-fuse`` example, it would be a list like
  ``["client.0"]``.

Tasks can be simple functions, called once in the order they are
listed in ``tasks``. But sometimes, it makes sense for a task to be
able to clean up after itself; for example, unmounting the filesystem
after a test run. A task callable that returns a Python `context
manager
<http://docs.python.org/library/stdtypes.html#typecontextmanager>`__
will have the manager added to a stack, and the stack will be unwound
at the end of the run. This means the cleanup actions are run in
reverse order, both on success and failure. A nice way of writing
context managers is the ``contextlib.contextmanager`` decorator; look
for that string in the existing tasks to see examples, and note where
they use ``yield``.


Troubleshooting
===============

Sometimes when a bug triggers, instead of automatic cleanup, you want
to explore the system as is. Adding a top-level::

	interactive-on-error: true

as a config file for ``teuthology`` will make that possible. With that
option, any *task* that fails, will have the ``interactive`` task
called after it. This means that before any cleanup happens, you get a
chance to inspect the system -- both through Teuthology and via extra
SSH connections -- and the cleanup completes only when you choose so.
Just exit the interactive Python session to continue the cleanup.

TODO: this only catches exceptions *between* the tasks. If a task
calls multiple subtasks, e.g. with ``contextutil.nested``, those
cleanups *will* be performed. Later on, we can let tasks communicate
the subtasks they wish to invoke to the top-level runner, avoiding
this issue.

Test Sandbox Directory
======================

Teuthology currently places most test files and mount points in a sandbox
directory, defaulting to /tmp/cephtest/{rundir}.  The {rundir} is the name
of the run (as given by --name) or if no name is specified, user@host-timestamp
is used.  To change the location of the sandbox directory, the following
options can be specified in $HOME/.teuthology.yaml:

	base_test_dir: <directory>

The ``base_test_dir`` option will set the base directory to use for the individual
run directories.  If not specified, this defaults to: ``/tmp/cephtest``.

	test_path: <directory>

The ``test_path`` option will set the complete path to use for the test directory.
This allows for the old behavior, where ``/tmp/cephtest`` was used as the sandbox
directory.
First draft of documentation. 2011-06-06 21:22:49 +00:00			`==================================================`
			`Teuthology` -- The Ceph integration test runner
			`==================================================`

			`The Ceph project needs automated tests. Because Ceph is a highly`
			`distributed system, and has active kernel development, its testing`
			`requirements are quite different from e.g. typical LAMP web`
			`applications. Nothing out there seemed to handle our requirements,`
			so we wrote our own framework, called `Teuthology`.


			`Overview`
			`========`

			Teuthology runs a given set of Python functions (`tasks`), with an SSH
			`connection to every host participating in the test. The SSH connection`
			uses `Paramiko <http://www.lag.net/paramiko/>`__, a native Python
			`client for the SSH2 protocol, and this allows us to e.g. run multiple`
			`commands inside a single SSH connection, to speed up test`
			execution. Tests can use `gevent <http://www.gevent.org/>`__ to
			`perform actions concurrently or in the background.`


			`Build`
			`=====`

			`Teuthology uses several Python packages that are not in the standard`
			`library. To make the dependencies easier to get right, we use a`
			`virtualenv` to manage them. To get started, ensure you have the
			``virtualenv`` and ``pip`` programs installed; e.g. on Debian/Ubuntu::

Added python-dev to list of required packages. 2012-05-24 16:37:46 +00:00			`sudo apt-get install python-dev python-virtualenv python-pip libevent-dev`
First draft of documentation. 2011-06-06 21:22:49 +00:00
			`and then run::`

			`./bootstrap`

Move orchestra to teuthology.orchestra so there's just one top-level package. 2011-09-13 21:53:02 +00:00			`You can run Teuthology's internal unit tests with::`
First draft of documentation. 2011-06-06 21:22:49 +00:00
Move orchestra to teuthology.orchestra so there's just one top-level package. 2011-09-13 21:53:02 +00:00			`./virtualenv/bin/nosetests`
First draft of documentation. 2011-06-06 21:22:49 +00:00

			`Test configuration`
			`==================`

			`An integration test run takes three items of configuration:`

Make targets a dictionary mapping hosts to ssh host keys. 2011-07-14 23:47:29 +00:00			- ``targets``: what hosts to run on; this is a dictionary mapping
			`hosts to ssh host keys, like:`
			`"username@hostname.example.com: ssh-rsa long_hostkey_here"`
First draft of documentation. 2011-06-06 21:22:49 +00:00			- ``roles``: how to use the hosts; this is a list of lists, where each
			`entry lists all the roles to be run on a single host; for example, a`
			single entry might say ``[mon.1, osd.1]``
			- ``tasks``: how to set up the cluster and what tests to run on it;
			`see below for examples`

			The format for this configuration is `YAML <http://yaml.org/>`__, a
			`structured data format that is still human-readable and editable.`

			`For example, a full config for a test run that sets up a three-machine`
update doc to be ceph-fuse task (instead of cfuse) 2012-09-18 20:50:36 +00:00			cluster, mounts Ceph via ``ceph-fuse``, and leaves you at an interactive
First draft of documentation. 2011-06-06 21:22:49 +00:00			`Python prompt for manual exploration (and enabling you to SSH in to`
			`the nodes & use the live cluster ad hoc), might look like this::`

			`roles:`
			`- [mon.0, mds.0, osd.0]`
			`- [mon.1, osd.1]`
			`- [mon.2, client.0]`
			`targets:`
Edit to README.bst to note that the ssh keys needed in the targets entries are the hosts public ssh key and not the users defined in the same entry 2012-10-22 18:21:07 +00:00			`ubuntu@host07.example.com: ssh-rsa host07_ssh_key`
			`ubuntu@host08.example.com: ssh-rsa host08_ssh_key`
			`ubuntu@host09.example.com: ssh-rsa host09_ssh_key`
First draft of documentation. 2011-06-06 21:22:49 +00:00			`tasks:`
			`- ceph:`
update doc to be ceph-fuse task (instead of cfuse) 2012-09-18 20:50:36 +00:00			`- ceph-fuse: [client.0]`
First draft of documentation. 2011-06-06 21:22:49 +00:00			`- interactive:`

			The number of entries under ``roles`` and ``targets`` must match.

			Note the colon after every task name in the ``tasks`` section.

			`You need to be able to SSH in to the listed targets without`
			passphrases, and the remote user needs to have passphraseless `sudo`
fixing the tense of my README change 2012-10-22 18:33:44 +00:00			access. Note that the ssh keys at the end of the ``targets``
Edit to README.bst to note that the ssh keys needed in the targets entries are the hosts public ssh key and not the users defined in the same entry 2012-10-22 18:21:07 +00:00			`entries are the public ssh keys for the hosts.`
fixing the tense of my README change 2012-10-22 18:33:44 +00:00			`On Ubuntu, these are located at /etc/ssh/ssh_host_rsa_key.pub`
First draft of documentation. 2011-06-06 21:22:49 +00:00
			If you'd save the above file as ``example.yaml``, you could run
			`teuthology on it by saying::`

			`./virtualenv/bin/teuthology example.yaml`

			You can also pass the ``-v`` option, for more verbose execution. See
			``teuthology --help`` for more.


			`Multiple config files`
			`---------------------`

			You can pass multiple files as arguments to ``teuthology``. Each one
			`will be read as a config file, and their contents will be merged. This`
			`allows you to e.g. share definitions of what a "simple 3 node cluster"`
			is. The source tree comes with ``roles/3-simple.yaml``, so we could
			skip the ``roles`` section in the above ``example.yaml`` and then
			`run::`

			`./virtualenv/bin/teuthology roles/3-simple.yaml example.yaml`


			`Reserving target machines`
			`-------------------------`

Update readme for locking. 2011-07-07 19:16:10 +00:00			`Before locking machines will work, you must create a .teuthology.yaml`
			`file in your home directory that sets a lock_server, i.e.::`
First draft of documentation. 2011-06-06 21:22:49 +00:00
Update readme for locking. 2011-07-07 19:16:10 +00:00			`lock_server: http://host.example.com:8080/lock`
First draft of documentation. 2011-06-06 21:22:49 +00:00
Update readme for locking. 2011-07-07 19:16:10 +00:00			`Teuthology automatically locks nodes for you if you specify the`
			``--lock`` option. Without this option, you must specify machines to
			run on in a ``targets.yaml`` file, and lock them using
			`teuthology-lock.`

			Note that the default owner of a machine is ``USER@HOST``.
			You can override this with the ``--owner`` option when running
			`teuthology or teuthology-lock.`

			`With teuthology-lock, you can also add a description, so you can`
			`remember which tests you were running on them. This can be done when`
			`locking or unlocking machines, or as a separate action with the`
			``--update`` option. To lock 3 machines and set a description, run::

			`./virtualenv/bin/teuthology-lock --lock-many 3 --desc 'test foo'`

			`If machines become unusable for some reason, you can mark them down::`

			`./virtualenv/bin/teuthology-lock --update --status down machine1 machine2`

			To see the status of all machines, use the ``--list`` option. This can
			`be restricted to particular machines as well::`

			`./virtualenv/bin/teuthology-lock --list machine1 machine2`
First draft of documentation. 2011-06-06 21:22:49 +00:00

			`Tasks`
			`=====`

			A task is a Python module in the ``teuthology.task`` package, with a
			callable named ``task``. It gets the following arguments:

			- ``ctx``: a context that is available through the lifetime of the
			test run, and has useful attributes such as ``cluster``, letting the
			`task access the remote hosts. Tasks can also store their internal`
			`state here. (TODO beware namespace collisions.)`
			- ``config``: the data structure after the colon in the config file,
update doc to be ceph-fuse task (instead of cfuse) 2012-09-18 20:50:36 +00:00			e.g. for the above ``ceph-fuse`` example, it would be a list like
First draft of documentation. 2011-06-06 21:22:49 +00:00			``["client.0"]``.

			`Tasks can be simple functions, called once in the order they are`
			listed in ``tasks``. But sometimes, it makes sense for a task to be
			`able to clean up after itself; for example, unmounting the filesystem`
			after a test run. A task callable that returns a Python `context
			`manager`
			<http://docs.python.org/library/stdtypes.html#typecontextmanager>`__
			`will have the manager added to a stack, and the stack will be unwound`
			`at the end of the run. This means the cleanup actions are run in`
			`reverse order, both on success and failure. A nice way of writing`
			context managers is the ``contextlib.contextmanager`` decorator; look
			`for that string in the existing tasks to see examples, and note where`
			they use ``yield``.
Add interactive-on-error, to pause and explore on error. Closes: http://tracker.newdream.net/issues/1291 2011-08-09 22:42:17 +00:00

			`Troubleshooting`
			`===============`

			`Sometimes when a bug triggers, instead of automatic cleanup, you want`
			`to explore the system as is. Adding a top-level::`

			`interactive-on-error: true`

			as a config file for ``teuthology`` will make that possible. With that
			option, any task that fails, will have the ``interactive`` task
			`called after it. This means that before any cleanup happens, you get a`
Fix typo in README Signed-off-by: David Zafman <david.zafman@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 2012-10-06 02:25:48 +00:00			`chance to inspect the system -- both through Teuthology and via extra`
Add interactive-on-error, to pause and explore on error. Closes: http://tracker.newdream.net/issues/1291 2011-08-09 22:42:17 +00:00			`SSH connections -- and the cleanup completes only when you choose so.`
			`Just exit the interactive Python session to continue the cleanup.`

			`TODO: this only catches exceptions between the tasks. If a task`
			calls multiple subtasks, e.g. with ``contextutil.nested``, those
			`cleanups will be performed. Later on, we can let tasks communicate`
			`the subtasks they wish to invoke to the top-level runner, avoiding`
			`this issue.`
Replace /tmp/cephtest/ with configurable path Teuthology uses /tmp/cephtest/ as the scratch test directory for a run. This patch replaces /tmp/cephtest/ everywhere with a per-run directory: {basedir}/{rundir} where {basedir} is a directory configured in .teuthology.yaml (/tmp/cephtest if not specified), and {rundir} is the name of the run, as given in --name. If no name is specified, {user}-{timestamp} is used. To get the old behavior (/tmp/cephtest), set test_path: /tmp/cephtest in .teuthology.yaml. This change was modivated by #3782, which requires a test dir that survives across reboots, but also resolves #3767. Signed-off-by: Sam Lang <sam.lang@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> 2013-01-23 20:37:39 +00:00
			`Test Sandbox Directory`
			`======================`

			`Teuthology currently places most test files and mount points in a sandbox`
			`directory, defaulting to /tmp/cephtest/{rundir}. The {rundir} is the name`
			`of the run (as given by --name) or if no name is specified, user@host-timestamp`
			`is used. To change the location of the sandbox directory, the following`
			`options can be specified in $HOME/.teuthology.yaml:`

			`base_test_dir: <directory>`

			The ``base_test_dir`` option will set the base directory to use for the individual
			run directories. If not specified, this defaults to: ``/tmp/cephtest``.

			`test_path: <directory>`

			The ``test_path`` option will set the complete path to use for the test directory.
			This allows for the old behavior, where ``/tmp/cephtest`` was used as the sandbox
			`directory.`