Commit Graph

170 Commits

Author SHA1 Message Date
Neha Ojha
df7adbf387 qa/tasks/ceph_manager.py: remove redundant quorum status logging
2020-10-21T03:42:45.985 INFO:teuthology.orchestra.run.smithi114:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph quorum_status
2020-10-21T03:42:58.574 INFO:teuthology.orchestra.run.smithi114.stdout:{"election_epoch":1650,"quorum":[0,2],"quorum_names":["a","c"],"quorum_leader_name":"a","quorum_age":0,"features":{"quorum_con":"4540138297136906239","quorum_mon":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific","elector-pinging"]},"monmap":{"epoch":1,"fsid":"807c36f1-9e85-4fa3-81fc-95915ab50584","modified":"2020-10-21T00:34:48.421341Z","created":"2020-10-21T00:34:48.421341Z","min_mon_release":16,"min_mon_release_name":"pacific","election_strategy":3,"disallowed_leaders":"","features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific","elector-pinging"],"optional":[]},"mons":[{"rank":0,"name":"a","public_addrs":{"addrvec":[{"type":"v1","addr":"172.21.15.114:6789","nonce":0}]},"addr":"172.21.15.114:6789/0","public_addr":"172.21.15.114:6789/0","priority":0,"weight":0},{"rank":1,"name":"b","public_addrs":{"addrvec":[{"type":"v1","addr":"172.21.15.133:6789","nonce":0}]},"addr":"172.21.15.133:6789/0","public_addr":"172.21.15.133:6789/0","priority":0,"weight":0},{"rank":2,"name":"c","public_addrs":{"addrvec":[{"type":"v1","addr":"172.21.15.114:6790","nonce":0}]},"addr":"172.21.15.114:6790/0","public_addr":"172.21.15.114:6790/0","priority":0,"weight":0}]}}
2020-10-21T03:42:58.589 INFO:tasks.mon_thrash.ceph_manager:quorum_status is {"election_epoch":1650,"quorum":[0,2],"quorum_names":["a","c"],"quorum_leader_name":"a","quorum_age":0,"features":{"quorum_con":"4540138297136906239","quorum_mon":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific","elector-pinging"]},"monmap":{"epoch":1,"fsid":"807c36f1-9e85-4fa3-81fc-95915ab50584","modified":"2020-10-21T00:34:48.421341Z","created":"2020-10-21T00:34:48.421341Z","min_mon_release":16,"min_mon_release_name":"pacific","election_strategy":3,"disallowed_leaders":"","features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific","elector-pinging"],"optional":[]},"mons":[{"rank":0,"name":"a","public_addrs":{"addrvec":[{"type":"v1","addr":"172.21.15.114:6789","nonce":0}]},"addr":"172.21.15.114:6789/0","public_addr":"172.21.15.114:6789/0","priority":0,"weight":0},{"rank":1,"name":"b","public_addrs":{"addrvec":[{"type":"v1","addr":"172.21.15.133:6789","nonce":0}]},"addr":"172.21.15.133:6789/0","public_addr":"172.21.15.133:6789/0","priority":0,"weight":0},{"rank":2,"name":"c","public_addrs":{"addrvec":[{"type":"v1","addr":"172.21.15.114:6790","nonce":0}]},"addr":"172.21.15.114:6790/0","public_addr":"172.21.15.114:6790/0","priority":0,"weight":0}]}}

Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-10-27 21:14:54 +00:00
Changcheng Liu
dbdcb2535d common: remove log_early configuration option
After deciding to always enable tracking log in early phase, there's no
need to keep "log_early" option here and remove it directly.

Suggested-by: Kefu Chai <kefu@redhat.com>
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
2020-10-19 14:30:28 +08:00
Neha Ojha
e7eddec5a0 qa/tasks/ceph_manager.py: remove redundant check in raw_cluster_cmd_result
Fixes 530982129e. The check for cephadm is no
longer needed since it was moved to run_cluster_cmd.

Fixes: https://tracker.ceph.com/issues/47239
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-09-15 17:56:33 +00:00
Kefu Chai
eda90040ad qa: always use subprocess.{DEVNULL,check_output}
no need to check for their existence, and prepare a replacement.
because we've migrated to python3. and we only support python3.6 and up.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-09-03 13:09:16 +08:00
Sage Weil
dfd01d7653 blacklist -> blocklist
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-08-24 19:53:08 +00:00
Rishabh Dave
530982129e qa: add method run ceph cluster command with better interface
This new method should allow better control on the process launched by
the passed command. This is achieved by allowing arguments provided by
teuthology.orchestra.run.run().

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2020-08-21 22:16:21 +05:30
Kefu Chai
a7f18e46b9 qa/tasks/{ceph,ceph_manager}: drop py2 support
Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-07-05 10:58:28 +08:00
Patrick Donnelly
af4d4ee6f1
Merge PR #35522 into master
* refs/pull/35522/head:
	vstart_runner: set default values of stdout and stderr to None

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2020-06-24 11:34:04 -07:00
Kefu Chai
2fa726b88c qa/tasks: flake8 fixes
Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-06-23 23:00:56 +08:00
Rishabh Dave
cc8f15818a vstart_runner: set default values of stdout and stderr to None
Not doing so leads to tests run successfully with vstart_runner.py but
crash when triggered with teuthology since the default values of these
variables there is None.

Fixes: https://tracker.ceph.com/issues/45815
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2020-06-17 14:42:53 +05:30
Neha Ojha
4deba4e8bd qa/tasks/ceph_manager.py: dump more useful info before failing
Add helpers that dump information only about PGs that haven't reached
the desired state when we fail. Previously we dumped the output of
"ceph pg dump" before failing, which prints a lot of unnecessary information
about PGs that are not responsible for the failure, making debugging harder.

Also, try to make the failure messages distinct.

Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-06-11 15:22:04 +00:00
Kefu Chai
6bc09c5041 qa/tasks/ceph_manager.py: do not return a filter
as the caller might want to `len(manager.get_osd_status()['raw'])`, and
`len()` does not accept a `filter` object.

also, the filtered osd statuses are printed out using `self.log()`, so
we should materialize the `filter` object before sending it to logging
facility. otherwise we will have something like:
```
2020-04-08T02:58:37.001 INFO:tasks.ceph.ceph_manager.ceph:<filter object at 0x7f5a080e1518>
```
in the logging message.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-08 11:02:29 +08:00
Kefu Chai
8bfe977854 qa/tasks: use StringIO for capturing string output
see d8d44ed156

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-07 21:51:22 +08:00
Kefu Chai
9ca45bd942 qa/tasks: do not random.choice(a_view)
use `random.sample()` instead of `random.choice(list(a_view))` for better performance.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-07 20:33:47 +08:00
Kefu Chai
d7258ea7fd qa/tasks: use next(iter(..)) for accessing first element in a view
in python2, dict.values() and dict.keys() return lists. but in python3,
they return views, which cannot be indexed directly using an integer index.

there are three use cases when we access these views in python3:

1. get the first element
2. get all the elements and then *might* want to access them by index
3. get the first element assuming there is only a single element in
   the view
4. iterate thru the view

in the 1st case, we cannot assume the number of elements, so to be
python3 compatible, we should use `next(iter(a_dict))` instead.

in the 2nd case, in this change, the view is materialized using
`list(a_dict)`.

in the 3rd case, we can just continue using the short hand of
```py
(first_element,) = a_dict.keys()
```
to unpack the view. this works in both python2 and python3.

in the 4th case, the existing code works in both python2 and python3, as
both list and view can be iterated using `iter`, and `len` works as
well.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-07 20:33:47 +08:00
Kefu Chai
9039db5962
Merge pull request #33805 from tchaikov/wip-44500
qa/tasks/ceph_manager: capture stderr for COT

Reviewed-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
2020-03-10 21:26:29 +08:00
Kefu Chai
d8d44ed156 qa/tasks/ceph_manager: use StringIO for capturing COT output
there are couple factors we should consider when choosing between
BytesIO and StringIO:

- if the producer is producing binary
- if we are expecting binary
- if the layers in between them are doing the decoding/encoding
  automatically.

in our case, the producer is either the ChannelFile instances returned
by paramiko.SSHClient or subprocess.CompletedProcess insances returned
by subprocess.run(). the former are file-like objects opened in "r" mode,
but their contents are decoded with utf-8 when reading if
ChannelFile.FLAG_BINARY is not specified. that's why we always try to
add this flag in orchestra/run.py when collecting the stdout and stderr
from paramiko.SSHClient after executing a command.

back in python2, this works just fine. as we don't differentiate bytes
from str by then.

but in python3, we have to make a decision. in the case of
ceph-objectstore-tool (COT for short), it does not produce binary and
we don't check its output with binary, so, if neither Remote.run() nor
LocalRemote.run() decodes/encodes for us, it's fine.

so it boils down to `copy_to_log()`:

i think we we should respect the consumer's expectation, and only decode
the output if a StringIO is passed in as stdout or stderr.

as we always log the output with logging we could either set
`ChannelFile.FLAG_BINARY` depending on the type of `capture` or not.
if it's not set, paramiko will return str (bytes) on python2, and str on
python3. if it's not set paramiko will return str (bytes) on python2,
and bytes on python3.

if there is non-ASCII in the output, logging will bail fail with
`UnicodeDecodeError` exception. and paramiko throws the same exception
when trying to decode for us if `ChannelFile.FLAG_BINARY` is not
specified.

so to ensure that we always have logging messages no matter if the
producer follows the rule of "use StringIO if you only emit text" or
not, we have to use `ChannelFile.FLAG_BINARY`, and force paramiko
to send us the bytes. but we still have the luxury to use StringIO
and do the decode when the caller asks for str explicitly. that'd save
the pain of using `str.decode()` or `six.ensure_str()` everywhere
even if we can assure that the program does not write binary.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-03-09 10:47:48 +08:00
Kefu Chai
78308f7207 qa/tasks/ceph_manager: capture stderr for COT
as we are expecting the error message written to stderr, and we need to
check for the error messages in it.

this change addresses the regression introduced by
204ceee156

Fixes: https://tracker.ceph.com/issues/44500
Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-03-08 14:43:13 +08:00
Sage Weil
96220c0c05 qa/tasks/cephadm: put bootstrap config etc directly in /etc/ceph
This puts the conf and keyring in /etc/ceph earlier rather than later,
making them useful for debugging a live system *during* bootstrap.  It's
also less code.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-07 15:18:45 -06:00
Thomas Bechtold
46e22c422b qa: Enable basic mypy support for qa/ directory
A first step to do more automatic code checks on the qa/
directory. This is useful while transitioning to python3.

Also use log_exc to top-level to not run into:

error: Argument 1 to "log_exc" has incompatible type
  "Callable[[OSDThrasher], Any]"; expected "OSDThrasher"

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2020-03-05 06:54:56 +01:00
Kyr Shatskyy
4c992baf25 qa/tasks/ceph_manager: ensure str for py3 compat
Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
2020-03-04 13:09:17 +08:00
Kyr Shatskyy
e46eb8348e qa/tasks: fix imports for py3 compatibility
Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
2020-03-04 13:09:16 +08:00
Kyr Shatskyy
204ceee156 qa/tasks/ceph_manager: get rid of CStringIO for py3
Use io.BytesIO instead cStringIO.StringIO for py3 compatibility

Signed-off-by: Kefu Chai <kchai@redhat.com>
Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
2020-03-04 13:09:16 +08:00
Sage Weil
b66f5df514 Merge PR #32986 into master
* refs/pull/32986/head:
	qa/tasks/ceph_manager: fix movement of cot exports with cephadm

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-02-01 10:47:56 -06:00
Sage Weil
d8a7c73a48 Merge PR #32987 into master
* refs/pull/32987/head:
	qa/tasks/ceph_manager: make fix_pgp_num behave when no pool is found

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-31 17:40:23 -06:00
Sage Weil
42768600d4 qa/tasks/ceph_manager: fix movement of cot exports with cephadm
I think this will finally work...

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-31 17:26:10 -06:00
Sage Weil
8c87110b54 qa/tasks/ceph_manager: add --log-early to raw_cluster_cmd
This is harmless if logging is low, but adds useful info when it is turned
up.

Hunting bug https://tracker.ceph.com/issues/43914

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-30 10:36:28 -06:00
Sage Weil
7d0a789b1b qa/tasks/ceph_manager: make fix_pgp_num behave when no pool is found
Fixes:

2020-01-30T04:41:24.697 INFO:tasks.thrashosds.thrasher:fixing pg num pool None
2020-01-30T04:41:24.698 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 1070, in wrapper
    return func(self)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 1200, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 768, in fix_pgp_num
    if self.ceph_manager.set_pool_pgpnum(pool, force):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 2088, in set_pool_pgpnum
    assert isinstance(pool_name, six.string_types)
AssertionError

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-30 08:32:56 -06:00
Sage Weil
9a4dd1fb3d qa/tasks/ceph_manager: fix chmod on log dir during pg export copy
With cephadm, we should chmod both /var/log/ceph and /var/log/ceph/$fsid.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-28 13:54:18 -06:00
Sage Weil
29d3eaa1a3 qa/tasks/ceph_manager: kludge around /var/log/ceph permissions
The ceph.py task normally makes these permissive.  But a package upgrade
can reset the permissions so that we can't read and write the temp
export files.  (We put them in these dirs now because it's alreadly
mapped out of cephadm containers to the host.)

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-23 21:25:08 -06:00
Sage Weil
089e97c270 Merge PR #32725 into master
* refs/pull/32725/head:
	qa/tasks/ceph_manager: fix revive_osd path
	qa/tasks/ceph_manager: fix shell osd for ceph-objectstore-tool commands

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2020-01-20 22:20:25 -06:00
Sage Weil
3005d24001 qa/tasks/ceph_manager: fix revive_osd path
This was broken since it was introduced in b02e2f6cf2
a year and a half ago...

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-20 11:23:13 -06:00
Sage Weil
9600152593 qa/tasks/ceph_manager: fix post-osd-kill pg peered check
This was asserting that all PGs are active or peered, but that assertion
could fail if the concurrent workload created a new pool.

Switch to a loop that checks several times for the condition to be true.

Fixes: https://tracker.ceph.com/issues/43656
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-20 09:47:36 -06:00
Sage Weil
0aee4e5f04 qa/tasks/ceph_manager: fix shell osd for ceph-objectstore-tool commands
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-19 15:32:25 -06:00
Sage Weil
0bdaf4b953 qa/tasks/ceph_manager: fix ceph-objectstore-tool calls
Pass the correct paths based on whether this is the importing or exporting
OSD.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-17 17:08:00 -06:00
Sage Weil
878e27ee0d qa/tasks/ceph_manager: fix admin_socket remote when using cephadm
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-17 17:07:40 -06:00
Sage Weil
81c735fc9c qa/tasks/ceph_manager: --no-mon-config to ceph-objectstore-tool
The config is currently fetched at osd.admin, so the keyring is not
found.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-17 17:07:40 -06:00
Sage Weil
7fefcdb6d3 qa/tasks/ceph_manager: fix filestore split command
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-17 17:07:40 -06:00
Sage Weil
60c2b8e800 qa/tasks/ceph_manager: fix import line
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-17 17:07:39 -06:00
Sage Weil
2739ac4a60 qa/tasks/ceph_manager: enable ceph-objectstore-tool via cephadm
- drop support for keyvaluestore
- leave a few paths non-cephadm specific (filestore, upgrade workaround)

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-15 07:53:01 -06:00
Kefu Chai
e67034b95a qa/tasks/ceph_manager: do not pick a pool is there is no pools
random.choice(seq) raises IndexError if seq is empty. we cannot ensure
there is always one or more pools in the cluster while using pool
related thrasher. so skip the thrasher action if there is no pools at
that moment.

Fixes: https://tracker.ceph.com/issues/43412
Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-01-09 10:26:32 +08:00
Kefu Chai
4c6a5798c2
Merge pull request #32222 from toabctl/qa-flake8-py3
qa: Run flake8 on python2 and python3

Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-12-24 10:47:07 +08:00
Sage Weil
4e57785e74 qa/tasks/ceph_manager: asok commands via cephadm shell
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-23 13:59:26 -06:00
Sage Weil
3268ec7ac8 Merge PR #32252 into master
* refs/pull/32252/head:
	qa/cephfs/begin: libaio-devel on el8
	qa/tasks: nosetests -> python -m nose
	qa/tasks/rbd_fio: fio 2.21 -> 3.16
	src/test/cli-integration/rbd/snap-diff.t: python -> python
	qa/workunits: use nose 3
	qa/tasks/cbt: install python3 deps
	qa/tasks/ceph_manager.py: do not use python to write a file
	test/pybind/test_rados: execute takes a bytes (not str) payload
	qa/packages/packages: python[3]-ceph is no more
	qa: use python3 for venvs etc
	packaging: remove python3-ipaddres, as it is part of the stdlib in py3
	qa/packages: python-ceph -> python3-ceph
	qa/distros: centos7 -> centos8, rhel7 -> rhel8
	spec: remove _python_buildid in favor of python3_pkgversion macro
	spec: remove python2 packages and conditions
	debian: remove python >= 2.7 requirement
	debian: add mgr python versions
	debian: explicitly set PYTHON2=OFF to prevent picking up python2 interpreter
	debian: update control file to use python3 dependency names
	debian: remove all python2 overrides and declarations
	debian: remove all python2 install files

Reviewed-by: Alfredo Deza <adeza@redhat.com>
2019-12-17 15:23:27 -06:00
Sage Weil
d4f4a2cbd8 qa/tasks/ceph_manager.py: do not use python to write a file
/usr/bin/python dne on el8, /usr/bin/python3 dne on el7.  But
all we need to do is write a file--we can do that with tee.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-13 12:44:42 -06:00
Thomas Bechtold
bdcc94a1d1 qa: Run flake8 on python2 and python3
To be able to catch problems with python2 *and* python3, run flake8
with both versions. From the flake8 homepage:

It is very important to install Flake8 on the correct version of
Python for your needs. If you want Flake8 to properly parse new
language features in Python 3.5 (for example), you need it to be
installed on 3.5 for Flake8 to understand those features. In many
ways, Flake8 is tied to the version of Python on which it runs.

Also fix the problems with python3 on the way.
Note: This requires now the six module for teuthology. But this is
already an install_require in teuthology itself.

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2019-12-13 09:24:20 +01:00
Thomas Bechtold
0127cd1e88 qa: Enable flake8 tox and fix failures
There were a couple of problems found by flake8 in the qa/
directory (most of them fixed now). Enabling flake8 during the usual
check runs hopefully avoids adding new issues in the future.

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2019-12-12 10:21:01 +01:00
Sage Weil
51ecc1b922 qa/tasks: ceph-daemon -> cephadm throughput var names and comments
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 19:14:09 -06:00
Patrick Donnelly
e8368d61be
Merge PR #29421 into master
* refs/pull/29421/head:
	qa/cephfs: add tests for ACLs
	qa/cephfs: allow running tests from xfstests-dev
	qa/tasks: add methods to get monitor's sockets
	qa/cephfs: don't crash if mountpoint dir is already deleted
	vstart_runner.py: set omit_sudo's default value to False
	qa/vstart_runner.py: fix get_keyring_path()
	qa/cephfs: don't abort if mountpoint is already present
	qa/cephfs: allow specifying mountpoint for kernel mounts
	qa/cephfs: allow specifying mountpoints for FUSE mounts
	qa/vstart_runner.py: allow specifying mountpoint for local FUSE mounts
	qa/mount.py: allow setting mountpoint
	qa/vstart_runner.py: add a method to create a temporary directory

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-12-05 13:25:03 -08:00
Patrick Donnelly
fcc9bf10a1
Merge PR #31428 into master
* refs/pull/31428/head:
	qa/tasks: Fixed AttributeError: can't set attribute
	qa/tasks: drop/update name from Thrasher

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-12-04 14:55:17 -08:00