Commit Graph

110179 Commits

Author SHA1 Message Date
Jason Dillaman
cb7b91dc02 rbd-mirror: unlink from remote snapshot if required
If a previous remote snapshot was synced but the unlink failed,
ensure we retry the unlink so that the remote can cleanup the unused
snapshot.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2020-04-09 10:01:15 -04:00
Jason Dillaman
281af0de86 rbd-mirror: prune unnecessary non-primary mirror snapshots
Once a non-primary snapshot is no longer required for syncing, delete it
from the image.

Fixes: https://tracker.ceph.com/issues/44105
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2020-04-09 10:00:28 -04:00
Jason Dillaman
cb8187c0dd rbd-mirror: propagate full snap-seq mapping in non-primary snapshots
Previously only newly created user snapshots were included in the
non-primary snapshot snap-seq mapping table. However, we need to
retain a full history of the mapping table if we want to be able to
prune non-primary snapshots.

Failovers are a special case since we won't have a valid snap seq mapping
so it will need to be rebuilt. Luckily, both sides should be read-only
in the previous state so we can use the snapshot names to find matches.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2020-04-09 10:00:28 -04:00
Jason Dillaman
6bf2132cf0 rbd-mirror: ignore non-primary read-only state for remote images
snapshot-based mirroring needs to be able to potentially delete a
demotion snapshot during the unlink process. Previously, these
snapshots have been left while the read-only error was ignored.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2020-04-09 10:00:28 -04:00
Jason Dillaman
0ca7817ece rbd: fix missing space when listing non-primary mirror snapshots
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2020-04-09 10:00:28 -04:00
Jason Dillaman
a3acdbd069 librbd: fixed race condition on demotion of snapshot-based mirrored image
A pending refresh could occur after setting the non-primary feature flag but
before the creation of the demotion snapshot. This would prevent the snapshot
from being created and would leave the image in a half-primary state.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2020-04-09 10:00:28 -04:00
Jason Dillaman
0102ce8870 librbd: store mirror peer uuids in non-primary demoted snapshots
This will allow a remote rbd-mirror process to have a snapshot to use for
delta sync operations during failover.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2020-04-09 10:00:28 -04:00
Jason Dillaman
eed00eb179 librbd: additional debug logs for mirror snapshot unlink peer
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2020-04-09 10:00:28 -04:00
Jason Dillaman
6a342bb5e0 test/rbd-mirror: fix gmock warnings during snapshot-based replayer tests
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2020-04-09 10:00:28 -04:00
Lenz Grimmer
c5dd11b7ec
Merge pull request #34411 from votdev/issue_44589
mgr/dashboard: lint error on plugins/debug.py

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Kiefer Chang <kiefer.chang@suse.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
2020-04-09 13:25:11 +02:00
Lenz Grimmer
0b18f60706
Merge pull request #34452 from rhcs-dashboard/fix-44923-master
mgr/dashboard: use FQDN for failover redirection

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
2020-04-09 12:20:41 +02:00
Ilya Dryomov
4b04f2ba83
Merge pull request #34471 from idryomov/wip-rbd-fio-rstrip
qa/tasks/rbd_fio: unbreak after the conversion from StringIO

Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-04-09 11:37:14 +02:00
Mykola Golub
978a2e364f
Merge pull request #34408 from dillaman/wip-44727
rbd-mirror: improved replication statistics

Reviewed-by: Mykola Golub <mgolub@suse.com>
2020-04-09 10:13:57 +03:00
Kefu Chai
8c63b26fe8
Merge pull request #34264 from tchaikov/wip-qa-py3
qa/tasks: be python3 compatible

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2020-04-09 10:08:44 +08:00
Kefu Chai
3f253a9a83
Merge pull request #34284 from liewegas/followon-34266-cleanup
mgr/DaemonServer: add missing ceph_abort

Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-04-09 08:39:27 +08:00
Casey Bodley
b26a525eae
Merge pull request #34414 from yuvalif/add_timeout_to_http_client
rgw/http: add timeout to http client

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2020-04-08 15:35:11 -04:00
Jan Fajerski
4fc07f4aae
Merge pull request #34463 from jan--f/c-v-batch-filter-check-lvs-before-access
ceph-volume/batch: check lvs list before access
2020-04-08 18:07:06 +02:00
Jan Fajerski
647e43ba31
Merge pull request #34472 from jan--f/c-v-noninteractive-batch-idempotency-all-filtered
ceph-volume/batch: return success when all devices are filtered
2020-04-08 17:14:24 +02:00
Jason Dillaman
8a18a7fc71 rbd-mirror: fixed race condition with snapshot sync and shutdown
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2020-04-08 09:45:39 -04:00
Jan Fajerski
cb3eade3d0 ceph-volume/batch: return success when all devices are filtered
batch should only return an error if some (but not all) devices are
filtered. When only some devices are filtered the resulting osd layout
could look very different from what a user expects. If all devices are
filtered just return success.

Fixes: https://tracker.ceph.com/issues/44994

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
2020-04-08 15:22:26 +02:00
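
The rule stated in this commit message could be sketched in Python roughly as follows. This is only an illustration of the described behaviour: `batch_exit_code` and its arguments are hypothetical names, not ceph-volume's actual code.

```python
def batch_exit_code(requested, usable):
    """Return 0 for success and 1 for error.

    Hypothetical helper illustrating the rule from the commit message;
    it is not part of ceph-volume.
    """
    filtered = [dev for dev in requested if dev not in usable]
    if not filtered:
        return 0  # nothing was filtered, the layout is what the user asked for
    if not usable:
        return 0  # every device was filtered, so there is nothing to do
    return 1  # only some devices were filtered, the layout may surprise the user


all_usable = batch_exit_code(["/dev/sda", "/dev/sdb"], ["/dev/sda", "/dev/sdb"])
all_filtered = batch_exit_code(["/dev/sda", "/dev/sdb"], [])
partly_filtered = batch_exit_code(["/dev/sda", "/dev/sdb"], ["/dev/sda"])
```

Returning success when every device is filtered is what lets a repeated, non-interactive run behave idempotently.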
Kefu Chai
83c632099b mgr/telegraf: catch FileNotFoundError exception
in tasks/module_selftest.yaml, `TestModuleSelftest.test_telegraf()` is
called, but we fail to prepare a unix domain socket to which the telegraf
module can send stats. the telegraf module does not catch the
FileNotFoundError exception, so the exception is propagated to ceph-mgr
and is found by the test, hence the test is marked as a failure whenever
telegraf is tested.

in this change,

* catch this exception, so it won't be caught by ceph-mgr
* whitelist the error message, so the test can pass

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-08 21:07:07 +08:00
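
A minimal sketch of the idea, assuming a sender that writes to a unix domain socket: connecting to a path that does not exist raises `FileNotFoundError`, which the sender can catch instead of letting it escape. `send_stats` and the socket path are hypothetical names for illustration, not the telegraf module's real code.

```python
import socket


def send_stats(path, payload):
    """Try to send payload to a unix domain socket; return True on success.

    Hypothetical helper: a missing socket file is reported to the caller
    instead of being raised.
    """
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.connect(path)
            sock.sendall(payload)
        return True
    except FileNotFoundError:
        # The socket was never created, so there is nobody to send stats to.
        return False


sent = send_stats("/nonexistent/telegraf.sock", b"ceph_stat value=1\n")
```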
Kefu Chai
f28a5fef3b qa/tasks/openssl_keys.py: sort cert configs before creating certs
we cannot rely on the order in which items are arranged in a dict, as the
order varies from one Python version to another. in Python2, it happens to work,
and the self-signed cert is always added first. but in Python3,
it does not, and an exception is thrown:
```
teuthology.exceptions.ConfigError: ssl: ca root not found for
certificate rgw.client.0
```

in this change, before creating certs, the settings are reordered so
that the self-signed ones are created first.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-08 21:07:07 +08:00
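
One way to picture the reordering described above, as a sketch under assumptions: each cert config is taken to be a dict in which a `ca` key names the signing certificate, and a config without that key is treated as self-signed. Both the key name and `order_cert_configs` are assumptions made for this example rather than the task's actual implementation.

```python
def order_cert_configs(config):
    """Return (name, settings) pairs with self-signed certs first.

    Assumption: a settings dict that has no 'ca' key is self-signed.
    sorted() is stable, so the relative order within each group is kept.
    """
    return sorted(config.items(), key=lambda item: "ca" in item[1])


config = {
    "rgw.client.0": {"ca": "root"},  # signed by "root"
    "root": {},                      # self-signed
}
ordered_names = [name for name, _ in order_cert_configs(config)]
```

With this ordering the root certificate already exists by the time `rgw.client.0` looks up its CA.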
Kefu Chai
8e093e5328
Merge pull request #34398 from rzarzynski/wip-crimson-outdata-to-pglog
crimson/osd: record op's outdata and rval in pg log

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-04-08 21:03:31 +08:00
Kefu Chai
02497d9cc4
Merge pull request #34466 from tchaikov/wip-cmake-get-git-version
cmake: check $top_srcdir/.git directly

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2020-04-08 20:29:08 +08:00
Sebastian Wagner
333439f2b2
Merge pull request #34220 from mgfritch/cephadm-nfs-container-image
mgr/cephadm: allow config for an nfs `container_image`

Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>
2020-04-08 13:46:00 +02:00
Mykola Golub
dbb8495662
Merge pull request #34422 from dillaman/wip-44938
rbd: ignore tx-only mirror peers when adding new peers

Reviewed-by: Mykola Golub <mgolub@suse.com>
2020-04-08 14:23:51 +03:00
Lenz Grimmer
887d5bb044
Merge pull request #34058 from rhcs-dashboard/44228-fix-frontend-services-subscription-errors
mgr/dashboard: fix errors related to frontend service subscriptions.

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Tiago Melo <tmelo@suse.com>
2020-04-08 13:17:10 +02:00
Ilya Dryomov
c3f4f1d660 qa/tasks/rbd_fio: unbreak after the conversion from StringIO
Fix a bad typo in commit db7ae8eff6 ("qa/tasks/rbd_fio: get rid of
StringIO for py3").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-04-08 12:24:51 +02:00
Radoslaw Zarzynski
7dc579c5e4 crimson/osd: record op's outdata and rval in pg log.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2020-04-08 12:22:30 +02:00
Radoslaw Zarzynski
aae6e4c67f osd: pg_log_entry_t::set_op_returns() takes const reference now.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2020-04-08 12:22:30 +02:00
Kefu Chai
a05d7179e2 cmake: check $top_srcdir/.git directly
in 0437adc33a, we stop right before
reaching $top_srcdir, but we should stop at its parent directory.

in this change, instead of trying to be smart and to walk all the way
up to the root directory or $top_srcdir, we just check $top_srcdir/.git
directly, as we just know it's there or it does not exist at all.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-08 18:03:37 +08:00
Jan Fajerski
c6dcf3161b ceph-volume/batch: check lvs list before access
Fixes: https://tracker.ceph.com/issues/44989

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
2020-04-08 10:41:56 +02:00
Volker Theile
45076ed13a mgr/dashboard: lint error on plugins/debug.py
Make pylint for Python 3.8 and older versions happy.

Fixes: https://tracker.ceph.com/issues/44589

Signed-off-by: Volker Theile <vtheile@suse.com>
2020-04-08 09:56:18 +02:00
Kefu Chai
e9f9e74f93
Merge pull request #34229 from Yan-waller/wip-walle-fixsparsereadlength
osd/PrimaryLogPG: fix SPARSE_READ stat

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-04-08 15:42:26 +08:00
Kefu Chai
ae9247b7a0
Merge pull request #34342 from ideepika/fixes-44862
mon: calculate min_size on osd pool set size

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-04-08 15:39:48 +08:00
Kefu Chai
c07e915dac
Merge pull request #34219 from yanghonggang/bluefs-tool
os/bluestore: Don't pollute old journal when add new device

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
2020-04-08 15:34:16 +08:00
Kefu Chai
e9796c4409
Merge pull request #34143 from tchaikov/wip-mgr-disable-dne-module
mon/MgrMonitor: show different error message when disabling a dne module

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
2020-04-08 15:32:49 +08:00
Kefu Chai
b8cac4f109
Merge pull request #34366 from SUSE/wip-mgr-fix-python-traceback
mgr/PyModule: fix missing tracebacks in handle_pyerror()

Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-04-08 15:29:34 +08:00
Kefu Chai
3311063916
Merge pull request #34337 from majianpeng/throttle-remove-lock
common/Throttle: Don't lock for atomic type update.

Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-04-08 15:28:04 +08:00
Kefu Chai
9748350a79
Merge pull request #34381 from rhcs-dashboard/fix-44721-master
rpm: add python3-saml as install dependency

Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-04-08 15:25:22 +08:00
Kefu Chai
6734a8c589
Merge pull request #34409 from adamemerson/wip-namespace-osd
osd: build without `using namespace` declarations in headers

Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-04-08 15:20:22 +08:00
Kefu Chai
293c8b39c5
Merge pull request #34460 from majianpeng/cmakefile-fix
cmake: remove duplicated code.

Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-04-08 15:07:00 +08:00
Kefu Chai
e4cad106cd
Merge pull request #34451 from tchaikov/wip-standalone-pgid
qa/standalone/scrub: s/$(pgid)/${pgid}/

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-04-08 12:50:47 +08:00
Kefu Chai
77ec9ce88d qa/tasks/ceph_objectstore_tool.py: use str.startswit
in Python3, string module does not offer `string.find()` anymore, let's
use `str.find()` method instead.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-08 12:32:56 +08:00
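
For illustration only, the Python 3 spelling uses the method on the string itself; the sample text below is made up and does not come from the test.

```python
import string

line = "osd.0 is up"

# Python 2 allowed string.find(line, "up"); the function is gone in Python 3.
has_module_find = hasattr(string, "find")

# The str method works in both Python 2 and Python 3.
index = line.find("up")
found = index != -1
```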
Kefu Chai
a7602a8449
Merge pull request #34245 from rzarzynski/wip-bug-24995
mgr: synchronize ClusterState's health and mon_status.

Reviewed-by: Tim Serong <tserong@suse.com>
2020-04-08 11:55:59 +08:00
Jianpeng Ma
9d76123fdf cmake: remove duplicated code.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
2020-04-08 11:08:02 +08:00
Kefu Chai
523c623b28 test/rgw: upload using a NamedTemporaryFile
boto tries to figure out the MIME type of a file by its name, if
the file-like object has a "name" attribute. in Python2, the "name"
is always "<fdopen>", fortunately. in Python3, however, `TemporaryFile` also
has a "name", which is its fd, and it is an integer now. so we get the following
error when sending a `TemporaryFile` using
`upload_part_from_file()`:
```
2020-04-08T02:25:34.660 INFO:tasks.rgw_multisite_tests:Traceback (most recent call last):
2020-04-08T02:25:34.661 INFO:tasks.rgw_multisite_tests:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-py3/virtualenv/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
2020-04-08T02:25:34.661 INFO:tasks.rgw_multisite_tests:    self.test(*self.arg)
2020-04-08T02:25:34.662 INFO:tasks.rgw_multisite_tests:  File "/home/teuthworker/src/github.com_tchaikov_ceph_wip-qa-py3/qa/tasks/rgw_multi/tests_ps.py", line 2567, in test_ps_creation_triggers
2020-04-08T02:25:34.662 INFO:tasks.rgw_multisite_tests:    uploader.upload_part_from_file(fp, 1)
2020-04-08T02:25:34.663 INFO:tasks.rgw_multisite_tests:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-py3/virtualenv/lib/python3.5/site-packages/boto/s3/multipart.py", line 260, in upload_part_from_file
2020-04-08T02:25:34.663 INFO:tasks.rgw_multisite_tests:    query_args=query_args, size=size)
2020-04-08T02:25:34.664 INFO:tasks.rgw_multisite_tests:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-py3/virtualenv/lib/python3.5/site-packages/boto/s3/key.py", line 1293, in set_contents_from_file
2020-04-08T02:25:34.664 INFO:tasks.rgw_multisite_tests:    chunked_transfer=chunked_transfer, size=size)
2020-04-08T02:25:34.664 INFO:tasks.rgw_multisite_tests:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-py3/virtualenv/lib/python3.5/site-packages/boto/s3/key.py", line 750, in send_file
2020-04-08T02:25:34.665 INFO:tasks.rgw_multisite_tests:    chunked_transfer=chunked_transfer, size=size)
2020-04-08T02:25:34.665 INFO:tasks.rgw_multisite_tests:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_wip-py3/virtualenv/lib/python3.5/site-packages/boto/s3/key.py", line 920, in _send_file_internal
2020-04-08T02:25:34.666 INFO:tasks.rgw_multisite_tests:    self.content_type = mimetypes.guess_type(self.path)[0]
2020-04-08T02:25:34.666 INFO:tasks.rgw_multisite_tests:  File "/usr/lib/python3.5/mimetypes.py", line 289, in guess_type
2020-04-08T02:25:34.667 INFO:tasks.rgw_multisite_tests:    return _db.guess_type(url, strict)
2020-04-08T02:25:34.667 INFO:tasks.rgw_multisite_tests:  File "/usr/lib/python3.5/mimetypes.py", line 114, in guess_type
2020-04-08T02:25:34.667 INFO:tasks.rgw_multisite_tests:    scheme, url = urllib.parse.splittype(url)
2020-04-08T02:25:34.668 INFO:tasks.rgw_multisite_tests:  File "/usr/lib/python3.5/urllib/parse.py", line 881, in splittype
2020-04-08T02:25:34.668 INFO:tasks.rgw_multisite_tests:    match = _typeprog.match(url)
2020-04-08T02:25:34.669 INFO:tasks.rgw_multisite_tests:TypeError: expected string or bytes-like object
```

to address this issue, in this change, a `NamedTemporaryFile` is used
instead of `TemporaryFile`. the former does have a "name" which is a
`str`.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-08 11:02:29 +08:00
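
The difference can be seen with the standard library alone. This is a small sketch, not the test's code: it only inspects the `name` attribute that the commit message talks about and passes it to `mimetypes.guess_type()`, the call that fails in the traceback. The `.txt` suffix is an arbitrary choice for the example.

```python
import mimetypes
import tempfile

with tempfile.NamedTemporaryFile(suffix=".txt") as named:
    # NamedTemporaryFile exposes a real filesystem path as a str.
    named_is_str = isinstance(named.name, str)
    guessed = mimetypes.guess_type(named.name)[0]

with tempfile.TemporaryFile() as anon:
    # Under Python 3 on POSIX this is typically the file descriptor (an int),
    # which is what mimetypes.guess_type() cannot handle.
    anon_name_type = type(anon.name).__name__
```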
Kefu Chai
6fba221605 qa/standalone/scrub: s/$(pgid)/${pgid}/
to address the test failures like
```
2020-04-07T15:44:58.693 INFO:tasks.workunit.client.0.smithi049.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:498: TEST_auto_repair_bluestore_failed:  ceph pg dump pgs
2020-04-07T15:44:58.694 INFO:tasks.workunit.client.0.smithi049.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:498: TEST_auto_repair_bluestore_failed:  pgid
2020-04-07T15:44:58.694 INFO:tasks.workunit.client.0.smithi049.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh: line 498: pgid: command not found
```

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-08 11:02:29 +08:00
Kefu Chai
6bc09c5041 qa/tasks/ceph_manager.py: do not return a filter
as the caller might want to `len(manager.get_osd_status()['raw'])`, and
`len()` does not accept a `filter` object.

also, the filtered osd statuses are printed out using `self.log()`, so
we should materialize the `filter` object before sending it to logging
facility. otherwise we will have something like:
```
2020-04-08T02:58:37.001 INFO:tasks.ceph.ceph_manager.ceph:<filter object at 0x7f5a080e1518>
```
in the logging message.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-08 11:02:29 +08:00
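
The behaviour described in this commit message is easy to reproduce in plain Python 3. The status strings below are invented for the example, and wrapping the result in `list()` is one possible way to materialize it; the commit itself may do this differently.

```python
statuses = ["osd.0 up", "osd.1 down", "osd.2 up"]

# In Python 3, filter() returns a lazy iterator, not a list.
lazy = filter(lambda s: s.endswith("up"), statuses)
try:
    len(lazy)
    len_failed = False
except TypeError:
    len_failed = True

# Materializing the iterator makes len() and readable logging work again.
up = list(filter(lambda s: s.endswith("up"), statuses))
count = len(up)
```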
Kefu Chai
e4690c6a66
Merge pull request #34368 from majianpeng/msg-remove-unsued-code
msg, common/Throttle: remove unused code.

Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-04-08 09:59:51 +08:00