mirror of
https://github.com/ceph/ceph
synced 2025-02-19 17:08:05 +00:00
doc/dev: Teuthology guide PR#37949 grammar edit
This PR improves the wording of the technical information added to the documentation in PR#37949. This is the second in a series of two PRs; the series is dedicated to testing a workflow in which developers add technical information to the documentation and then technical writers improve its presentation. Signed-off-by: Zac Dover <zac.dover@gmail.com>
parent
6baa74d8e7
commit
07a9bf4d55
@ -3,59 +3,66 @@
|
||||
Analyzing and Debugging A Teuthology Job
|
||||
========================================
|
||||
|
||||
For scheduling an integration test please refer to, `Scheduling Test Run`_.
|
||||
To learn more about how to schedule an integration test, refer to `Scheduling
|
||||
Test Run`_.
|
||||
|
||||
Once a teuthology run is successfully completed, we can access the results using
|
||||
pulpito dashboard, which looks like:
|
||||
When a teuthology run has been completed successfully, use the `pulpito`_ dashboard
|
||||
to view the results::
|
||||
|
||||
http://pulpito.front.sepia.ceph.com/<job-name>/<job-id>/
|
||||
http://pulpito.front.sepia.ceph.com/<job-name>/<job-id>/
|
||||
|
||||
or via sshing into teuthology server::
|
||||
.. _pulpito: https://pulpito.ceph.com
|
||||
|
||||
or ssh into the teuthology server::
|
||||
|
||||
ssh <username>@teuthology.front.sepia.ceph.com
|
||||
|
||||
and accessing `teuthology archives`_, for example::
|
||||
|
||||
nano /a/teuthology-2021-01-06_07:01:02-rados-master-distro-basic-smithi/
|
||||
and access `teuthology archives`_, like this for example:
|
||||
|
||||
.. note:: This would require Sepia lab access. To know how to request it, see:
|
||||
.. prompt:: bash $
|
||||
|
||||
nano /a/teuthology-2021-01-06_07:01:02-rados-master-distro-basic-smithi/
|
||||
|
||||
.. note:: This requires you to have access to the Sepia lab. To learn how to
|
||||
request access to the Sepia lab, see:
|
||||
https://ceph.github.io/sepia/adding_users/
|
||||
|
||||
On pulpito, the jobs in red specify either a failed or dead job.
|
||||
Here, a job is combination of daemons and configurations that are formed using
|
||||
On pulpito, jobs in red specify either a failed job or a dead job.
|
||||
A job is a combination of daemons and configurations that are formed using
|
||||
`qa/suites`_ yaml fragments.
|
||||
Taking these configurations, teuthology runs the tasks that are present in
|
||||
Teuthology uses these configurations and runs the tasks that are present in
|
||||
`qa/tasks`_, which are commands used for setting up the test environment and
|
||||
testing Ceph's components.
|
||||
These tasks help us in covering large subset of usecase scenarios and hence
|
||||
exposing the bugs which were uncaught by `make check`_ testing.
|
||||
These tasks cover a large subset of use cases and help to
|
||||
expose the bugs that aren't caught by `make check`_ testing.
|
||||
|
||||
.. _make check: ../tests-integration-testing-teuthology-intro/#make-check
|
||||
|
||||
A job failure hence might be because of:
|
||||
A job failure might be caused by one or more of the following reasons:
|
||||
|
||||
* environment setup(`testing on varied systems<https://github.com/ceph/ceph/tree/master/qa/distros/supported>_`):
|
||||
* environment setup (`testing on varied
|
||||
systems <https://github.com/ceph/ceph/tree/master/qa/distros/supported>`_):
|
||||
testing compatibility with stable releases for supported versions.
|
||||
|
||||
* permutation of config values: for instance, qa/suites/rados/thrash ensures to
|
||||
test Ceph under stressful workload, so that we be able to catch corner case
|
||||
bugs.
|
||||
The final setup config yaml that would be used for testing can be accessed
|
||||
at::
|
||||
* permutation of config values: for instance, `qa/suites/rados/thrash
|
||||
<https://github.com/ceph/ceph/tree/master/qa/suites/rados/thrash>`_ ensures
|
||||
that thrashing tests run against Ceph under stressful workloads, so that we
|
||||
are able to catch corner-case bugs. The final setup config yaml used for
|
||||
testing can be accessed at::
|
||||
|
||||
/a/<job-name>/<job-id>/orig.config.yaml
|
||||
|
||||
More details about config.yaml can be found on `detailed test config`_
|
||||
More details about config.yaml can be found at `detailed test config`_
|
||||
|
||||
Triaging the cause of failure
|
||||
------------------------------
|
||||
|
||||
To triage a job failure, open the teuthology log for it using either(from the
|
||||
pulpito page):
|
||||
To triage a job failure, open the teuthology log for it using either the job
|
||||
name or the job id (from the pulpito page):
|
||||
|
||||
http://qa-proxy.ceph.com/<job-name>/<job-id>/teuthology.log
|
||||
http://qa-proxy.ceph.com/<job-name>/<job-id>/teuthology.log
|
||||
|
||||
and then opening log file with signature as:
|
||||
Open the log file:
|
||||
|
||||
/a/<job-name>/<job-id>/teuthology.log
|
||||
|
||||
@ -63,50 +70,55 @@ for example in our case::
|
||||
|
||||
nano /a/teuthology-2021-01-06_07:01:02-rados-master-distro-basic-smithi/5759282/teuthology.log
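The locations mentioned so far all follow the same ``<job-name>/<job-id>`` pattern. The following is a minimal Python sketch that derives them for one job. The function name ``job_locations`` and the dictionary keys are hypothetical; the URL and path templates are the ones quoted in this guide.

```python
def job_locations(job_name: str, job_id: str) -> dict:
    """Build the pulpito URL, log URL, and archive paths for one job.

    The templates are copied from this guide; the function itself is only
    an illustrative helper, not part of teuthology.
    """
    return {
        "pulpito": f"http://pulpito.front.sepia.ceph.com/{job_name}/{job_id}/",
        "log_url": f"http://qa-proxy.ceph.com/{job_name}/{job_id}/teuthology.log",
        "log_path": f"/a/{job_name}/{job_id}/teuthology.log",
        "config_path": f"/a/{job_name}/{job_id}/orig.config.yaml",
    }


example = job_locations(
    "teuthology-2021-01-06_07:01:02-rados-master-distro-basic-smithi",
    "5759282",
)
for label, location in example.items():
    print(f"{label}: {location}")
```

The ``/a/`` paths are only reachable from the teuthology server, so they still require Sepia lab access.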
|
||||
|
||||
Generally, a job failure is recorded in teuthology log as a Traceback which gets
|
||||
added to job summary.
|
||||
While analyzing a job failure, we generally start looking for ``Traceback``
|
||||
keyword and further see the call stack and logs that might had lead to failure.
|
||||
Most of the time, traceback will also be including the failing command.
|
||||
A job failure is recorded in the teuthology log as a Traceback and is
|
||||
added to the job summary.
|
||||
|
||||
To analyze a job failure, locate the ``Traceback`` keyword and examine the call
|
||||
stack and logs for issues that caused the failure. Usually the traceback
|
||||
will include the command that failed.
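Because teuthology logs can be very large, it can help to scan for the ``Traceback`` keyword programmatically instead of paging through the file. The sketch below is an illustration only: the names ``find_tracebacks`` and ``scan_log`` and the default of five context lines are assumptions, not part of teuthology.

```python
from typing import Iterable, List, Tuple


def find_tracebacks(lines: Iterable[str], context: int = 5) -> List[Tuple[int, List[str]]]:
    """Return (line_number, lines) for every line that contains 'Traceback'.

    Each hit carries the matching line plus up to `context` following lines.
    The input is streamed, so the whole log is never held in memory.
    """
    hits: List[Tuple[int, List[str]]] = []
    remaining = 0
    for number, line in enumerate(lines, start=1):
        if "Traceback" in line:
            hits.append((number, [line.rstrip("\n")]))
            remaining = context
        elif remaining > 0:
            hits[-1][1].append(line.rstrip("\n"))
            remaining -= 1
    return hits


def scan_log(path: str, context: int = 5) -> None:
    """Print every traceback found in the log file at `path`."""
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with open(path, encoding="utf-8", errors="replace") as log:
        for number, block in find_tracebacks(log, context):
            print(f"--- Traceback at line {number} ---")
            print("\n".join(block))
```

On the teuthology server, ``scan_log("/a/<job-name>/<job-id>/teuthology.log")`` prints each traceback with a few lines of context, which is often enough to see the failing command.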
|
||||
|
||||
.. note:: The teuthology logs are deleted every once in a while. If you are
|
||||
unable to access example link, please feel free to refer any other case from
|
||||
http://pulpito.front.sepia.ceph.com/
|
||||
unable to access the example link, please feel free to refer to any other
|
||||
case from http://pulpito.front.sepia.ceph.com/
|
||||
|
||||
Reporting the Issue
|
||||
-------------------
|
||||
|
||||
Once the cause of failure is triaged, and is something which might not be
|
||||
related to the developer's code change, this indicates that it might be a
|
||||
generic failure for the upstream branch (in our case octopus), in which case, we
|
||||
look for related failure keywords on https://tracker.ceph.com/.
|
||||
If a similar issue has been reported via a tracker.ceph.com ticket, please add
|
||||
any relevant feedback to it. Otherwise, please create a new tracker ticket for
|
||||
it. If you are not familiar with the cause of failure, someone else will look at
|
||||
it.
|
||||
After you have triaged the cause of the failure and you have determined that the
|
||||
failure was not caused by the developer's code change, this might indicate a
|
||||
known failure for the upstream branch (in our case, the upstream branch is
|
||||
octopus). If the failure was not caused by a developer's code change, go to
|
||||
https://tracker.ceph.com and look for tracker issues related to the failure by using keywords spotted in the failure under investigation.
|
||||
|
||||
If a similar issue has been reported via a tracker.ceph.com ticket, add to it a
|
||||
link to the new test run and any relevant feedback. If you don't find a ticket
|
||||
referring to an issue similar to the one that you have discovered, create a new
|
||||
tracker ticket for it. If you are not familiar with the cause of failure, ask
|
||||
one of the team members for help.
|
||||
|
||||
Debugging an issue using interactive-on-error
|
||||
---------------------------------------------
|
||||
|
||||
To investigate an issue, the first step would be to try to reproduce it, for
|
||||
that purpose. For this purpose you can run a job similar to the failed job,
|
||||
using `interactive-on-error`_ mode in teuthology::
|
||||
It is important to be able to reproduce an issue when investigating its cause.
|
||||
Run a job similar to the failed job, using the `interactive-on-error`_ mode in
|
||||
teuthology::
|
||||
|
||||
ideepika@teuthology:~/teuthology$ ./virtualenv/bin/teuthology -v --lock --block $<your-config-yaml> --interactive-on-error
|
||||
|
||||
we can either have a `custom config.yaml`_ or use the one from failed job; for
|
||||
which copy the ``orig.config.yaml`` to your local dir and change the `testing
|
||||
priority`_ accordingly, which would look like::
|
||||
For this job, use either `custom config.yaml`_ or the yaml file from
|
||||
the failed job. If you intend to use the yaml file from the failed job, copy
|
||||
``orig.config.yaml`` to your local dir and change the `testing priority`_
|
||||
accordingly, like so::
|
||||
|
||||
ideepika@teuthology:~/teuthology$ cp /a/teuthology-2021-01-06_07:01:02-rados-master-distro-basic-smithi/5759282/orig.config.yaml test.yaml
|
||||
ideepika@teuthology:~/teuthology$ ./virtualenv/bin/teuthology -v --lock --block test.yaml --interactive-on-error
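Changing the testing priority in the copied file can also be scripted. The sketch below assumes that the copied ``orig.config.yaml`` carries the priority as a top-level ``priority:`` key; this is an assumption about the file layout, so check your copy before relying on it. The function name ``set_priority`` is hypothetical.

```python
import re


def set_priority(yaml_text: str, priority: int) -> str:
    """Return yaml_text with its top-level 'priority:' line set to `priority`.

    Assumes a top-level 'priority:' key (an assumption, not confirmed by
    this guide). If no such line exists, one is appended. Indented
    (nested) 'priority:' keys are left untouched.
    """
    new_line = f"priority: {priority}"
    pattern = re.compile(r"^priority:.*$", flags=re.MULTILINE)
    if pattern.search(yaml_text):
        return pattern.sub(new_line, yaml_text, count=1)
    if not yaml_text.strip():
        return new_line + "\n"
    return yaml_text.rstrip("\n") + "\n" + new_line + "\n"


original = "name: example-run\npriority: 70\ntasks: []\n"
print(set_priority(original, 100))
```

A plain text substitution is used here so that the rest of the file, including comments and key order, stays byte-for-byte identical to the failed job's configuration.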
|
||||
|
||||
|
||||
Teuthology will then lock the machines required by the ``config.yaml``, when
|
||||
their is job failure, which halts at an interactive python session which let's
|
||||
us investigate the ctx values and the targets via sshing into them, once we have
|
||||
In the event of job failure, teuthology will lock the machines required by
|
||||
``config.yaml``. Teuthology will halt at an interactive python session.
|
||||
By sshing into the targets, we can investigate their ctx values. After we have
|
||||
investigated the system, we can manually terminate the session and let
|
||||
teuthology cleanup.
|
||||
teuthology clean the session up.
|
||||
|
||||
Suggested Resources
|
||||
--------------------
|
||||
@ -121,3 +133,4 @@ Suggested Resources
|
||||
.. _interactive-on-error: https://docs.ceph.com/projects/teuthology/en/latest/detailed_test_config.html#troubleshooting
|
||||
.. _custom config.yaml: https://docs.ceph.com/projects/teuthology/en/latest/detailed_test_config.html#test-configuration
|
||||
.. _testing priority: ../tests-integration-testing-teuthology-intro/#testing-priority
|
||||
.. _thrash: https://github.com/ceph/ceph/tree/master/qa/suites/rados/thrash
|
||||
|
@ -4,8 +4,8 @@ Testing - Integration Tests - Introduction
|
||||
==========================================
|
||||
|
||||
Ceph has two types of tests: :ref:`make check <make-check>` tests and
|
||||
integration tests. When a test requires multiple machines, root access or lasts
|
||||
for a longer time (for example, to simulate a realistic Ceph deployment), it is
|
||||
integration tests. When a test requires multiple machines, root access, or lasts
|
||||
for a long time (for example, to simulate a realistic Ceph workload), it is
|
||||
deemed to be an integration test. Integration tests are organized into "suites",
|
||||
which are defined in the `ceph/qa sub-directory`_ and run with the
|
||||
``teuthology-suite`` command.
|
||||
@ -39,10 +39,10 @@ branch and the stable branches). Traditionally, these tests are called "the
|
||||
nightlies" because the Ceph core developers used to live and work in
|
||||
the same time zone and from their perspective the tests were run overnight.
|
||||
|
||||
The results of the nightlies are published at http://pulpito.ceph.com/. The
|
||||
developer nick shows in the test results URL and in the first column of the
|
||||
Pulpito dashboard. The results are also reported on the `ceph-qa mailing list
|
||||
<https://ceph.com/irc/>`_ for analysis.
|
||||
The results of nightly test runs are published at http://pulpito.ceph.com/
|
||||
under the user ``teuthology``. The developer nick appears in the URL of the
|
||||
test results and in the first column of the Pulpito dashboard. The results are
|
||||
also reported on the `ceph-qa mailing list <https://ceph.com/irc/>`_.
|
||||
|
||||
Testing Priority
|
||||
----------------
|
||||
@ -79,10 +79,9 @@ Job priority should be selected based on the following recommendations:
|
||||
* **200 <= Priority < 1000:** Use this priority for large test runs that can
|
||||
be done over the course of a week.
|
||||
|
||||
In case you don't know how many jobs would be triggered by ``teuthology-suite``
|
||||
command, use ``--dry-run`` to get a count first and then issue
|
||||
``teuthology-suite`` command again, this time without ``--dry-run`` and with
|
||||
``-p`` and an appropriate number as an argument to it.
|
||||
To learn how many jobs the ``teuthology-suite`` command will trigger, use the
|
||||
``--dry-run`` flag. If you are happy with the number of jobs, issue the ``teuthology-suite`` command again without
|
||||
``--dry-run`` and with ``-p`` and an appropriate number as an argument.
|
||||
|
||||
To skip the priority check, use ``--force-priority``. In order to be sensitive
|
||||
to the runs of other developers who also need to do testing, please use it in
|
||||
@ -92,14 +91,14 @@ Suites Inventory
|
||||
----------------
|
||||
|
||||
The ``suites`` directory of the `ceph/qa sub-directory`_ contains all the
|
||||
integration tests, for all the Ceph components.
|
||||
integration tests for all the Ceph components.
|
||||
|
||||
`ceph-deploy <https://github.com/ceph/ceph/tree/master/qa/suites/ceph-deploy>`_
|
||||
install a Ceph cluster with ``ceph-deploy`` (`ceph-deploy man page`_)
|
||||
|
||||
`dummy <https://github.com/ceph/ceph/tree/master/qa/suites/dummy>`_
|
||||
get a machine, do nothing and return success (commonly used to
|
||||
verify the integration testing infrastructure works as expected)
|
||||
verify that the integration testing infrastructure works as expected)
|
||||
|
||||
`fs <https://github.com/ceph/ceph/tree/master/qa/suites/fs>`_
|
||||
test CephFS mounted using FUSE
|
||||
@ -145,20 +144,20 @@ teuthology-describe-tests
|
||||
``teuthology-describe`` was added to the `teuthology framework`_ to facilitate
|
||||
documentation and better understanding of integration tests.
|
||||
|
||||
The upshot is that tests can be documented by embedding ``meta:``
|
||||
annotations in the yaml files used to define the tests. The results can be
|
||||
seen in the `teuthology-desribe usecases`_
|
||||
Tests can be documented by embedding ``meta:`` annotations in the yaml files
|
||||
used to define the tests. The results can be seen in the `teuthology-desribe
|
||||
usecases`_
|
||||
|
||||
Since this is a new feature, many yaml files have yet to be annotated.
|
||||
Developers are encouraged to improve the documentation, in terms of both
|
||||
coverage and quality.
|
||||
Developers are encouraged to improve the coverage and the quality of the
|
||||
documentation.
|
||||
|
||||
How integration tests are run
|
||||
-----------------------------
|
||||
|
||||
Given that - as a new Ceph developer - you will typically not have access
|
||||
to the `Sepia lab`_, you may rightly ask how you can run the integration
|
||||
tests in your own environment.
|
||||
As a new Ceph developer you will probably not have access to the `Sepia lab`_.
|
||||
You might however be able to run some integration tests in your own
|
||||
environment. Ask members from the relevant team how to do this.
|
||||
|
||||
One option is to set up a teuthology cluster on bare metal. Though this is a
|
||||
non-trivial task, it `is` possible. Here are `some notes
|
||||
|
@ -53,13 +53,8 @@ teuthology. This procedure explains how to run tests using teuthology.
|
||||
|
||||
ssh <username>@teuthology.front.sepia.ceph.com
|
||||
|
||||
This requires that you have access to the Sepia lab. Learn about requesting
|
||||
access here:
|
||||
|
||||
https://ceph.github.io/sepia/adding_users/
|
||||
|
||||
#. Install teuthology in a virtual environment and activate that virtual
|
||||
environment. Follow the relevant instructions in `Running Your First Test`_.
|
||||
This requires Sepia lab access. To request access to the Sepia lab, see:
|
||||
https://ceph.github.io/sepia/adding_users/
|
||||
|
||||
#. Run the ``teuthology-suite`` command:
|
||||
|
||||
@ -112,6 +107,11 @@ teuthology. This procedure explains how to run tests using teuthology.
|
||||
|
||||
|
||||
|
||||
Other frequently used/useful options are ``-d`` (or ``--distro``),
|
||||
``--distroversion``, ``--filter-out``, ``--timeout``, ``--flavor``, ``--rerun``,
|
||||
``-l`` (for limiting the number of jobs), ``-n`` (for how many times the job will
|
||||
run). Run ``teuthology-suite --help`` to read descriptions of these and other
|
||||
options.
|
||||
|
||||
.. _teuthology_testing_qa_changes:
|
||||
|
||||
@ -124,12 +124,14 @@ rebuild the binaries before you re-run tests. If you make changes only in
|
||||
You just have to make sure to tell the ``teuthology-suite`` command to use a
|
||||
separate branch for running the tests.
|
||||
|
||||
The separate branch can be passed to the command by using ``--suite-repo`` and
|
||||
``--suite-branch``. The first option (``--suite-repo``) accepts the link to the GitHub fork where your PR branch exists and the second option (``--suite-branch``) accepts the name of the PR branch.
|
||||
If you made changes only in ``qa/``
|
||||
(https://github.com/ceph/ceph/tree/master/qa), you do not need to rebuild the
|
||||
binaries. You can use existing binaries that are built periodically for master and other stable branches and run your test changes against them.
|
||||
Your branch with the qa changes can be tested by passing two extra arguments to the ``teuthology-suite`` command: (1) ``--suite-repo``, specifying your ceph repo, and (2) ``--suite-branch``, specifying your branch name.
|
||||
|
||||
For example, if you want to make changes in ``qa/`` after testing ``branch-x``
|
||||
(which shows up in ceph-ci as a branch named ``wip-username-branch-x``), you
|
||||
can do so by running following command:
|
||||
(for which the ceph-ci branch is ``wip-username-branch-x``), run the following
|
||||
command:
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
@ -191,28 +193,26 @@ Viewing Tests Results
|
||||
Pulpito Dashboard
|
||||
*****************
|
||||
|
||||
Once the teuthology job is scheduled, the status/results for test run could
|
||||
be checked from https://pulpito.ceph.com/.
|
||||
It could be used for quickly checking out job logs, their status, etc.
|
||||
After the teuthology job is scheduled, the status and results of the test run
|
||||
can be checked at https://pulpito.ceph.com/.
|
||||
|
||||
Teuthology Archives
|
||||
*******************
|
||||
|
||||
Once the tests have finished running, the log for the job can be obtained by
|
||||
clicking on job ID at the Pulpito page for your tests. It's more convenient to
|
||||
download the log and then view it rather than viewing it in an internet browser
|
||||
since these logs can easily be up to size of 1 GB. It is easier to
|
||||
ssh into the teuthology machine again (``teuthology.front.sepia.ceph.com``), and
|
||||
access the following path::
|
||||
After the tests have finished running, the log for the job can be obtained by
|
||||
clicking on the job ID at the Pulpito page associated with your tests. It's
|
||||
more convenient to download the log and then view it rather than viewing it in
|
||||
an internet browser since these logs can easily be up to 1 GB in size. It is
|
||||
easier to ssh into the teuthology machine (``teuthology.front.sepia.ceph.com``)
|
||||
and access the following path::
|
||||
|
||||
/ceph/teuthology-archive/<test-id>/<job-id>/teuthology.log
|
||||
|
||||
For example, for above test ID path is::
|
||||
For example: for the above test ID, the path is::
|
||||
|
||||
/ceph/teuthology-archive/teuthology-2019-12-10_05:00:03-smoke-master-testing-basic-smithi/4588482/teuthology.log
|
||||
|
||||
This way the log can be viewed remotely without having to wait too
|
||||
much.
|
||||
This method can be used to view the log more quickly than would be possible through a browser.
|
||||
|
||||
.. note:: To access archives more conveniently, ``/a/`` has been symbolically
|
||||
linked to ``/ceph/teuthology-archive/``. For instance, to access the previous
|
||||
@ -222,13 +222,15 @@ much.
|
||||
|
||||
Killing Tests
|
||||
-------------
|
||||
Sometimes a teuthology job might not complete running for several minutes or
|
||||
even hours after tests that were trigged have completed running and other
|
||||
times wrong set of tests can be triggered is filter wasn't chosen carefully.
|
||||
To save resource it's better to termniate such a job. Following is the command
|
||||
to terminate a job::
|
||||
``teuthology-kill`` can be used to kill jobs that have been running
|
||||
unexpectedly for several hours, or when developers want to terminate tests
|
||||
before they complete.
|
||||
|
||||
teuthology-kill -r teuthology-2019-12-10_05:00:03-smoke-master-testing-basic-smithi
|
||||
Here is the command that terminates jobs:
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
teuthology-kill -r teuthology-2019-12-10_05:00:03-smoke-master-testing-basic-smithi
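The value passed to ``-r`` is the same run name that appears as the first path segment of the Pulpito URL for the run. The sketch below shows one way to extract it and assemble the command as an argument list. The helper names are hypothetical, and the URL layout is assumed to match the ``<job-name>/<job-id>`` pattern quoted in this guide.

```python
from typing import List
from urllib.parse import urlparse


def run_name_from_pulpito_url(url: str) -> str:
    """Return the first path segment of a Pulpito URL (the run name)."""
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    if not segments:
        raise ValueError(f"no run name found in URL: {url!r}")
    return segments[0]


def kill_command(run_name: str) -> List[str]:
    """Build the argument list for terminating a run with teuthology-kill."""
    # Only the -r flag shown in this guide is used.
    return ["teuthology-kill", "-r", run_name]


url = (
    "http://pulpito.front.sepia.ceph.com/"
    "teuthology-2019-12-10_05:00:03-smoke-master-testing-basic-smithi/4588482/"
)
print(" ".join(kill_command(run_name_from_pulpito_url(url))))
```

The resulting list can be handed to ``subprocess.run`` on the teuthology server. Printing it first makes it easy to confirm that the right run is about to be terminated.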
|
||||
|
||||
Let's call the argument passed to ``-r`` the test ID. It can be found
|
||||
easily in the link to the Pulpito page for the tests you triggered. For
|
||||
|