After deciding to always enable tracking log in early phase, there's no
need to keep "log_early" option here and remove it directly.
Suggested-by: Kefu Chai <kefu@redhat.com>
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
no need to check for their existence, and prepare a replacement.
because we've migrated to python3. and we only support python3.6 and up.
Signed-off-by: Kefu Chai <kchai@redhat.com>
This new method should allow better control on the process launched by
the passed command. This is achieved by allowing arguments provided by
teuthology.orchestra.run.run().
Signed-off-by: Rishabh Dave <ridave@redhat.com>
* refs/pull/35522/head:
vstart_runner: set default values of stdout and stderr to None
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Not doing so leads to tests run successfully with vstart_runner.py but
crash when triggered with teuthology since the default values of these
variables there is None.
Fixes: https://tracker.ceph.com/issues/45815
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Add helpers that dump information only about PGs that haven't reached
the desired state when we fail. Previously we dumped the output of
"ceph pg dump" before failing, which prints a lot of unnecessary information
about PGs that are not responsible for the failure, making debugging harder.
Also, try to make the failure messages distinct.
Signed-off-by: Neha Ojha <nojha@redhat.com>
as the caller might want to `len(manager.get_osd_status()['raw'])`, and
`len()` does not accept a `filter` object.
also, the filtered osd statuses are printed out using `self.log()`, so
we should materialize the `filter` object before sending it to logging
facility. otherwise we will have something like:
```
2020-04-08T02:58:37.001 INFO:tasks.ceph.ceph_manager.ceph:<filter object at 0x7f5a080e1518>
```
in the logging message.
Signed-off-by: Kefu Chai <kchai@redhat.com>
in python2, dict.values() and dict.keys() return lists. but in python3,
they return views, which cannot be indexed directly using an integer index.
there are three use cases when we access these views in python3:
1. get the first element
2. get all the elements and then *might* want to access them by index
3. get the first element assuming there is only a single element in
the view
4. iterate thru the view
in the 1st case, we cannot assume the number of elements, so to be
python3 compatible, we should use `next(iter(a_dict))` instead.
in the 2nd case, in this change, the view is materialized using
`list(a_dict)`.
in the 3rd case, we can just continue using the short hand of
```py
(first_element,) = a_dict.keys()
```
to unpack the view. this works in both python2 and python3.
in the 4th case, the existing code works in both python2 and python3, as
both list and view can be iterated using `iter`, and `len` works as
well.
Signed-off-by: Kefu Chai <kchai@redhat.com>
there are couple factors we should consider when choosing between
BytesIO and StringIO:
- if the producer is producing binary
- if we are expecting binary
- if the layers in between them are doing the decoding/encoding
automatically.
in our case, the producer is either the ChannelFile instances returned
by paramiko.SSHClient or subprocess.CompletedProcess insances returned
by subprocess.run(). the former are file-like objects opened in "r" mode,
but their contents are decoded with utf-8 when reading if
ChannelFile.FLAG_BINARY is not specified. that's why we always try to
add this flag in orchestra/run.py when collecting the stdout and stderr
from paramiko.SSHClient after executing a command.
back in python2, this works just fine. as we don't differentiate bytes
from str by then.
but in python3, we have to make a decision. in the case of
ceph-objectstore-tool (COT for short), it does not produce binary and
we don't check its output with binary, so, if neither Remote.run() nor
LocalRemote.run() decodes/encodes for us, it's fine.
so it boils down to `copy_to_log()`:
i think we we should respect the consumer's expectation, and only decode
the output if a StringIO is passed in as stdout or stderr.
as we always log the output with logging we could either set
`ChannelFile.FLAG_BINARY` depending on the type of `capture` or not.
if it's not set, paramiko will return str (bytes) on python2, and str on
python3. if it's not set paramiko will return str (bytes) on python2,
and bytes on python3.
if there is non-ASCII in the output, logging will bail fail with
`UnicodeDecodeError` exception. and paramiko throws the same exception
when trying to decode for us if `ChannelFile.FLAG_BINARY` is not
specified.
so to ensure that we always have logging messages no matter if the
producer follows the rule of "use StringIO if you only emit text" or
not, we have to use `ChannelFile.FLAG_BINARY`, and force paramiko
to send us the bytes. but we still have the luxury to use StringIO
and do the decode when the caller asks for str explicitly. that'd save
the pain of using `str.decode()` or `six.ensure_str()` everywhere
even if we can assure that the program does not write binary.
Signed-off-by: Kefu Chai <kchai@redhat.com>
as we are expecting the error message written to stderr, and we need to
check for the error messages in it.
this change addresses the regression introduced by
204ceee156
Fixes: https://tracker.ceph.com/issues/44500
Signed-off-by: Kefu Chai <kchai@redhat.com>
This puts the conf and keyring in /etc/ceph earlier rather than later,
making them useful for debugging a live system *during* bootstrap. It's
also less code.
Signed-off-by: Sage Weil <sage@redhat.com>
A first step to do more automatic code checks on the qa/
directory. This is useful while transitioning to python3.
Also use log_exc to top-level to not run into:
error: Argument 1 to "log_exc" has incompatible type
"Callable[[OSDThrasher], Any]"; expected "OSDThrasher"
Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
This is harmless if logging is low, but adds useful info when it is turned
up.
Hunting bug https://tracker.ceph.com/issues/43914
Signed-off-by: Sage Weil <sage@redhat.com>
Fixes:
2020-01-30T04:41:24.697 INFO:tasks.thrashosds.thrasher:fixing pg num pool None
2020-01-30T04:41:24.698 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 1070, in wrapper
return func(self)
File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 1200, in _do_thrash
self.choose_action()()
File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 768, in fix_pgp_num
if self.ceph_manager.set_pool_pgpnum(pool, force):
File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 2088, in set_pool_pgpnum
assert isinstance(pool_name, six.string_types)
AssertionError
Signed-off-by: Sage Weil <sage@redhat.com>
The ceph.py task normally makes these permissive. But a package upgrade
can reset the permissions so that we can't read and write the temp
export files. (We put them in these dirs now because it's alreadly
mapped out of cephadm containers to the host.)
Signed-off-by: Sage Weil <sage@redhat.com>
This was asserting that all PGs are active or peered, but that assertion
could fail if the concurrent workload created a new pool.
Switch to a loop that checks several times for the condition to be true.
Fixes: https://tracker.ceph.com/issues/43656
Signed-off-by: Sage Weil <sage@redhat.com>
random.choice(seq) raises IndexError if seq is empty. we cannot ensure
there is always one or more pools in the cluster while using pool
related thrasher. so skip the thrasher action if there is no pools at
that moment.
Fixes: https://tracker.ceph.com/issues/43412
Signed-off-by: Kefu Chai <kchai@redhat.com>
* refs/pull/32252/head:
qa/cephfs/begin: libaio-devel on el8
qa/tasks: nosetests -> python -m nose
qa/tasks/rbd_fio: fio 2.21 -> 3.16
src/test/cli-integration/rbd/snap-diff.t: python -> python
qa/workunits: use nose 3
qa/tasks/cbt: install python3 deps
qa/tasks/ceph_manager.py: do not use python to write a file
test/pybind/test_rados: execute takes a bytes (not str) payload
qa/packages/packages: python[3]-ceph is no more
qa: use python3 for venvs etc
packaging: remove python3-ipaddres, as it is part of the stdlib in py3
qa/packages: python-ceph -> python3-ceph
qa/distros: centos7 -> centos8, rhel7 -> rhel8
spec: remove _python_buildid in favor of python3_pkgversion macro
spec: remove python2 packages and conditions
debian: remove python >= 2.7 requirement
debian: add mgr python versions
debian: explicitly set PYTHON2=OFF to prevent picking up python2 interpreter
debian: update control file to use python3 dependency names
debian: remove all python2 overrides and declarations
debian: remove all python2 install files
Reviewed-by: Alfredo Deza <adeza@redhat.com>
/usr/bin/python dne on el8, /usr/bin/python3 dne on el7. But
all we need to do is write a file--we can do that with tee.
Signed-off-by: Sage Weil <sage@redhat.com>
To be able to catch problems with python2 *and* python3, run flake8
with both versions. From the flake8 homepage:
It is very important to install Flake8 on the correct version of
Python for your needs. If you want Flake8 to properly parse new
language features in Python 3.5 (for example), you need it to be
installed on 3.5 for Flake8 to understand those features. In many
ways, Flake8 is tied to the version of Python on which it runs.
Also fix the problems with python3 on the way.
Note: This requires now the six module for teuthology. But this is
already an install_require in teuthology itself.
Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
There were a couple of problems found by flake8 in the qa/
directory (most of them fixed now). Enabling flake8 during the usual
check runs hopefully avoids adding new issues in the future.
Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
* refs/pull/29421/head:
qa/cephfs: add tests for ACLs
qa/cephfs: allow running tests from xfstests-dev
qa/tasks: add methods to get monitor's sockets
qa/cephfs: don't crash if mountpoint dir is already deleted
vstart_runner.py: set omit_sudo's default value to False
qa/vstart_runner.py: fix get_keyring_path()
qa/cephfs: don't abort if mountpoint is already present
qa/cephfs: allow specifying mountpoint for kernel mounts
qa/cephfs: allow specifying mountpoints for FUSE mounts
qa/vstart_runner.py: allow specifying mountpoint for local FUSE mounts
qa/mount.py: allow setting mountpoint
qa/vstart_runner.py: add a method to create a temporary directory
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/31428/head:
qa/tasks: Fixed AttributeError: can't set attribute
qa/tasks: drop/update name from Thrasher
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>