6190 Commits

Author SHA1 Message Date
xie xingguo
023524a26d osd/PeeringState: restart peering on any previous down acting member coming back
One of our customers wants to verify the data safety of Ceph while scaling
the cluster up, and the test case looks like:
- keep checking the status of a specified pg, whose up set is [1, 2, 3]
- add more osds: up [1, 2, 3] -> up [1, 4, 5], acting = [1, 2, 3], backfill_targets = [4, 5],
  pg is remapped
- stop osd.2: up [1, 4, 5], acting = [1, 3], backfill_targets = [4, 5], pg is undersized
- restart osd.2: acting stays unchanged because osd.2 belongs to neither the current up set
  nor the acting set, leaving the pg pinned undersized for a long time until all
  backfill targets complete

It does not pose any critical problem -- we'll end up getting that pg back into active + clean,
except that the long-lived DEGRADED warnings keep bothering our customer, who cares about data
safety more than anything else.

The right way to achieve the above goal is for:

	boost::statechart::result PeeringState::Active::react(const MNotifyRec& notevt)

to check whether the newly booted node could be validly chosen for the acting set and
request a new temp mapping. The new temp mapping would then trigger a real interval change
that will get rid of the DEGRADED warning.
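
A toy model of the scenario above, as a rough sketch only. It is not the Ceph implementation: `PGState`, `prior_acting_down`, and `should_request_new_temp_mapping` are hypothetical names, and the real check has to account for much more (pg logs, last_update, etc.).

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PGState:
    """Hypothetical, heavily simplified view of one pg."""
    up: List[int]
    acting: List[int]
    backfill_targets: Set[int]
    pool_size: int
    # OSDs that were acting members earlier and then went down (assumption).
    prior_acting_down: Set[int] = field(default_factory=set)


def should_request_new_temp_mapping(pg: PGState, booted_osd: int) -> bool:
    """True when a rebooted OSD could plausibly rejoin the acting set."""
    undersized = len(pg.acting) < pg.pool_size
    was_acting = booted_osd in pg.prior_acting_down
    already_acting = booted_osd in pg.acting
    return undersized and was_acting and not already_acting


# State after "stop osd.2" in the test case above.
pg = PGState(up=[1, 4, 5], acting=[1, 3], backfill_targets={4, 5},
             pool_size=3, prior_acting_down={2})
print(should_request_new_temp_mapping(pg, 2))
```

In this model, osd.2 coming back qualifies for a new temp mapping, while an unrelated OSD booting does not.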

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
2020-02-21 17:52:52 +08:00
Sage Weil
0eef82165b Merge PR #33110 into master
* refs/pull/33110/head:
	qa/distros: rhel and centos: whitelist cephadm logrotate selinux denial

Reviewed-by: Boris Ranto <branto@redhat.com>
2020-02-06 13:13:48 -06:00
Sage Weil
b91636dcfe qa/distros: rhel and centos: whitelist cephadm logrotate selinux denial
This is fixed in RHEL 8.1.1 (and by extension centos/rhel 8.2+).

No fix for el 7 yet.

Partially-fixes: https://tracker.ceph.com/issues/43703
Signed-off-by: Sage Weil <sage@redhat.com>
2020-02-06 08:52:52 -06:00
Tatjana Dehler
4515ab32fa
Merge pull request #32546 from votdev/issue_43089_passwd_cmplx_config
mgr/dashboard: Make password policy check configurable

Reviewed-by: Stephan Müller <smueller@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Reviewed-by: Tiago Melo <tmelo@suse.com>
2020-02-06 09:44:48 +01:00
Kefu Chai
edca34ea67 qa/tasks/cephadm: test "orchestrator host ls"
Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-02-06 09:53:17 +08:00
Kefu Chai
062e0365ea qa/tasks: drop test_cephadm_orchestrator.py
this test will end with a failure like

```
2020-01-30T18:15:15.870 INFO:tasks.ceph.mgr.x.smithi042.stderr:Warning: Permanently added 'smithi042.front.sepia.ceph.com,172.21.15.42' (ECDSA) to the list of known hosts.
2020-01-30T18:15:15.925 INFO:tasks.ceph.mgr.x.smithi042.stderr:Permission denied, please try again.
2020-01-30T18:15:15.932 INFO:tasks.ceph.mgr.x.smithi042.stderr:Permission denied, please try again.
2020-01-30T18:15:15.939 INFO:tasks.ceph.mgr.x.smithi042.stderr:root@smithi042.front.sepia.ceph.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
```

because mgr is not able to establish an ssh connection to that host as "root".
please note, the teuthology worker acts as the "ubuntu" account on the
test node, and by default, "root" does not have its pubkey. actually,
`qa/tasks/cephadm.py` does push the pubkey to all the managed hosts before
testing cephadm.

since `qa/tasks/cephadm.py` is a better test for cephadm, and
suites/rados/cephadm already covers cephadm, let's just drop this one.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-02-06 09:53:17 +08:00
Sage Weil
5948bd5545 Merge PR #32946 into master
* refs/pull/32946/head:
	qa/suites/rados: improve valgrind leak check
	common/ceph_context: add an asok command to deliberately leak memory

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-02-05 16:47:21 -06:00
Sage Weil
2611a73d57 Merge PR #33055 into master
* refs/pull/33055/head:
	qa/tasks/mgr/test_orchestrator_cli: support multiple DriveGroups

Reviewed-by: Joshua Schmid <jschmid@suse.de>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
2020-02-05 16:09:12 -06:00
Kefu Chai
5c5e1105bb
Merge pull request #33026 from liewegas/wip-el81
qa/distros: add rhel/centos 8.1

Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-02-05 23:44:15 +08:00
Casey Bodley
8dbb92e4a1
Merge pull request #32996 from cbodley/wip-rgw-put-multipart-stripe
rgw: MultipartObjectProcessor supports stripe size > chunk size

Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
2020-02-05 08:42:50 -05:00
Kefu Chai
22da7813ef
Merge pull request #33018 from mgfritch/cephadm-docker-disabled
qa/workunits/cephadm/test_cephadm.sh: skip docker when service is disabled

Reviewed-by: Sage Weil <sage@redhat.com>
2020-02-05 20:46:16 +08:00
Abhishek L
7c1a690560
Merge pull request #30684 from theanalyst/rgw/qa/rgw-admin-user-stats
qa: radosgw_admin: validate a simple user stats output

Reviewed-By: Casey Bodley <cbodley@redhat.com>
2020-02-04 17:21:25 +01:00
Kiefer Chang
c9b12e2b66
qa/tasks/mgr/test_orchestrator_cli: support multiple DriveGroups
The create_osds interface in Orchestrator supports multiple named DriveGroups
since https://github.com/ceph/ceph/pull/32972. Adapt the test to that
change.

Fixes: https://tracker.ceph.com/issues/43945
Signed-off-by: Kiefer Chang <kiefer.chang@suse.com>
2020-02-04 14:31:21 +08:00
Sage Weil
e17ffa6c11 Merge PR #32977 into master
* refs/pull/32977/head:
	qa/workunits/cephadm/test_cephadm.sh: add missing monitoring tests
	cephadm: simplify Monitoring.components structure
	cephadm: add proper tox type for monitoring components

Reviewed-by: Patrick Seidensal <pseidensal@suse.com>
2020-02-03 16:28:04 -06:00
Sage Weil
ebca44ccaa qa/suites/rados: improve valgrind leak check
Verify we can detect leak in the osd, mon, and mgr independently.  Also
include a negative test (no leaks).

Signed-off-by: Sage Weil <sage@redhat.com>
2020-02-03 10:25:39 -06:00
Casey Bodley
d486b5bc45 qa/rgw: test with non-default rgw-obj-stripe-size
each job will select one of the striping strategies at random

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2020-02-03 11:24:52 -05:00
Michael Fritch
4535216267
qa/workunits/cephadm/test_cephadm.sh: add missing monitoring tests
add tests for node-exporter, prometheus, and grafana

Signed-off-by: Michael Fritch <mfritch@suse.com>
2020-02-02 21:08:02 -07:00
Patrick Donnelly
29d850fb7e
Merge PR #32570 into master
* refs/pull/32570/head:
	cephfs-shell: Add tests for setxattr, getxattr and listxattr
	cephfs-shell: Add listxattr command
	cephfs-shell: Add getxattr command
	cephfs-shell: Add setxattr command
	doc: Update about extended attributes

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2020-02-02 06:56:50 -08:00
Patrick Donnelly
3a6f107331
Merge PR #31633 into master
* refs/pull/31633/head:
	cephfs-shell: Instead of assert use stat for tests in rmdir
	cephfs-shell: Add function for common rmdir test code
	cephfs-shell: Add rmdir test for non empty directory
	cephfs-shell: Add rmdir -p test for non empty directory
	cephfs-shell: Add rmdir -p test for non existing dir
	cephfs-shell: Add rmdir -p test to delete all dirs in given path
	cephfs-shell: Add rmdir -p test for root directory with empty directories
	cephfs-shell: Add rmdir test for valid file
	cephfs-shell: Add rmdir test for invalid directory
	cephfs-shell: Add rmdir test for valid directory
	cephfs-shell: Fix rmdir '-p' issues

Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2020-02-02 06:52:23 -08:00
Sage Weil
b66f5df514 Merge PR #32986 into master
* refs/pull/32986/head:
	qa/tasks/ceph_manager: fix movement of cot exports with cephadm

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-02-01 10:47:56 -06:00
Ramana Raja
b7768eca2a
Merge pull request #32030 from vshankar/wip-mgr-volumes-clone
mgr/volumes: clone from snapshot
2020-02-01 13:17:51 +05:30
Sage Weil
d8a7c73a48 Merge PR #32987 into master
* refs/pull/32987/head:
	qa/tasks/ceph_manager: make fix_pgp_num behave when no pool is found

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-31 17:40:23 -06:00
Sage Weil
a5d848d206 Merge PR #32989 into master
* refs/pull/32989/head:
	qa/tasks/ceph_manager: add --log-early to raw_cluster_cmd

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-31 17:40:13 -06:00
Sage Weil
42768600d4 qa/tasks/ceph_manager: fix movement of cot exports with cephadm
I think this will finally work...

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-31 17:26:10 -06:00
Michael Fritch
386a9eb89c
qa/workunits/cephadm/test_cephadm.sh: skip docker when service is disabled
Signed-off-by: Michael Fritch <mfritch@suse.com>
2020-01-31 10:27:07 -07:00
Sage Weil
d8c674757f qa/distros: add rhel/centos 8.1
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-31 07:36:21 -06:00
Volker Theile
3684d24a43 mgr/dashboard: Make password policy check configurable
Fixes: https://tracker.ceph.com/issues/43089

Signed-off-by: Volker Theile <vtheile@suse.com>
2020-01-31 11:28:17 +01:00
Venky Shankar
b5970ff80d test: add subvolume clone tests
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2020-01-31 05:09:14 -05:00
Sage Weil
a7988dfd3f Merge PR #32988 into master
* refs/pull/32988/head:
	qa/tasks/ceph: only re-request scrub on unscrubbed pgs

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-30 21:56:05 -06:00
Sage Weil
2954c607b7 Merge PR #32958 into master
* refs/pull/32958/head:
	qa/suites/rados/singleton/all/lost-unfound*: whitelist SLOW_OPS

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-30 11:01:24 -06:00
Sage Weil
f10cc22c60 Merge PR #32961 into master
* refs/pull/32961/head:
	qa/standalone/osd/osd-bench: debug bluestore

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-30 10:42:17 -06:00
Sage Weil
32a36f9c75 Merge PR #32968 into master
* refs/pull/32968/head:
	qa/suites/rados/verify: debug monc = 20

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-30 10:42:05 -06:00
Sage Weil
8c87110b54 qa/tasks/ceph_manager: add --log-early to raw_cluster_cmd
This is harmless if logging is low, but adds useful info when it is turned
up.
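
As a rough illustration of the idea (not the actual `raw_cluster_cmd` code), the flag can simply be placed ahead of the subcommand arguments. The helper name `with_log_early` and the bare `ceph` prefix are assumptions made for this sketch.

```python
from typing import List


def with_log_early(args: List[str]) -> List[str]:
    """Build a ceph CLI argv with --log-early before the subcommand (sketch)."""
    return ["ceph", "--log-early", *args]


print(with_log_early(["osd", "dump", "--format=json"]))
```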

Hunting bug https://tracker.ceph.com/issues/43914

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-30 10:36:28 -06:00
Sage Weil
1dc2a8a09e qa/tasks/ceph: only re-request scrub on unscrubbed pgs
If we haven't scrubbed everything, we occasionally re-request scrub in case
the request was missed by the OSD (this can happen).  But we were
re-requesting scrub on ALL pgs, and if they are done in a
semi-deterministic order and are slow, then we may never get to the final
ones.
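
A minimal sketch of the selection step, assuming each pg exposes a last-scrub timestamp and that "unscrubbed" means "not scrubbed since the check started". The function name and data shape are hypothetical, not taken from `qa/tasks/ceph.py`.

```python
from typing import Dict, List


def pgs_needing_scrub(last_scrub_stamps: Dict[str, float],
                      check_started_at: float) -> List[str]:
    """Return only the pgs whose last scrub predates the start of the check."""
    return sorted(
        pgid for pgid, stamp in last_scrub_stamps.items()
        if stamp < check_started_at
    )


stamps = {"1.0": 100.0, "1.1": 250.0, "1.2": 90.0}
# Only these pgs would get a repeated scrub request.
print(pgs_needing_scrub(stamps, check_started_at=200.0))
```

Re-requesting only this subset keeps already-scrubbed pgs from being queued again ahead of the remaining ones.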

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-30 10:22:49 -06:00
Sage Weil
7d0a789b1b qa/tasks/ceph_manager: make fix_pgp_num behave when no pool is found
Fixes:

2020-01-30T04:41:24.697 INFO:tasks.thrashosds.thrasher:fixing pg num pool None
2020-01-30T04:41:24.698 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 1070, in wrapper
    return func(self)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 1200, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 768, in fix_pgp_num
    if self.ceph_manager.set_pool_pgpnum(pool, force):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 2088, in set_pool_pgpnum
    assert isinstance(pool_name, six.string_types)
AssertionError
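
The traceback shows `set_pool_pgpnum` being reached with `pool` set to None. A minimal sketch of the kind of guard that avoids this, under the assumption that skipping the action is acceptable when no pool exists; `fix_pgp_num_sketch` and its callback parameter are hypothetical and not the actual change:

```python
from typing import Callable, Optional


def fix_pgp_num_sketch(pool: Optional[str],
                       set_pool_pgpnum: Callable[[str], bool]) -> bool:
    """Skip the pgp_num fix-up when no pool was found (sketch)."""
    if pool is None:
        # Nothing to fix; avoid tripping the string-type assertion downstream.
        return False
    return set_pool_pgpnum(pool)


calls = []
print(fix_pgp_num_sketch(None, lambda name: calls.append(name) or True))
print(fix_pgp_num_sketch("rbd", lambda name: calls.append(name) or True))
```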

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-30 08:32:56 -06:00
Sage Weil
88c49d483a Merge PR #32969 into master
* refs/pull/32969/head:
	qa/suites/rados/cephadm: explicitly test many distros

Reviewed-by: Nathan Cutler <ncutler@suse.com>
2020-01-30 08:28:25 -06:00
Patrick Donnelly
2931433cd1
Merge PR #32854 into master
* refs/pull/32854/head:
	qa: fix testing kernel branch link

Reviewed-by: Sage Weil <sage@redhat.com>
2020-01-30 06:25:25 -08:00
Sage Weil
68d3c86106 Merge PR #32972 into master
* refs/pull/32972/head:
	python-common/ceph/deployment/translate: use 'prepare' instead of 'batch' for trivial case
	qa/tasks/cephadm: pass short dev name to osd prepare
	mgr/cephadm: fix detection of just-created OSDs
	mgr/cephadm: properly indent raise conditions
	mgr/cephadm: add warning to other orchestrators
	mgr/cephadm: separate acceptance criterias for Devices
	mgr/cephadm: fix typos
	mgr/cephadm: move utils in test/utils.py
	mgr/ssh: increase disk size to 20G
	drivegroups: add support for drivegroups + tests
	mgr/orch_cli: allow multiple drivegroups
	drivegroups: translate disk spec to ceph-volume call

Reviewed-by: Jan Fajerski <jfajerski@suse.com>
Reviewed-by: Joshua Schmid <jschmid@suse.de>
2020-01-30 07:01:47 -06:00
David Zafman
6bb36f862f
Merge pull request #32945 from dzafman/wip-43864
test: Update pg log test for new trimming behavior

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-29 16:03:44 -08:00
Varsha Rao
e601242afe cephfs-shell: Add tests for setxattr, getxattr and listxattr
Signed-off-by: Varsha Rao <varao@redhat.com>
2020-01-30 01:16:27 +05:30
Sage Weil
b119fc5f18 qa/suites/rados/cephadm: explicitly test many distros
Include _latest.yaml in a few cases here to be a bit future-proof.

cephadm-smoke/ is *just* a cephadm bring-up, and includes el7.  cephadm/
installs packages and runs a real workload.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-29 13:41:59 -06:00
Sage Weil
1ce0b70cc0 Merge PR #32943 into master
* refs/pull/32943/head:
	qa/tasks/ceph_manager: fix chmod on log dir during pg export copy

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-29 10:34:09 -06:00
Sage Weil
8e3eb592b0 qa/suites/rados/verify: debug monc = 20
Hunting https://tracker.ceph.com/issues/43882

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-29 09:53:41 -06:00
Sage Weil
b99e506a3f qa/standalone/osd/osd-bench: debug bluestore
Looking for https://tracker.ceph.com/issues/43888

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-29 07:43:41 -06:00
Sage Weil
c227b5d831 Merge PR #32940 into master
* refs/pull/32940/head:
	qa: remove rados/basic/tasks/rgw_snaps.yml

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
2020-01-29 07:23:34 -06:00
Sage Weil
f4156aea10 qa/suites/rados/singleton/all/lost-unfound*: whitelist SLOW_OPS
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-29 07:11:15 -06:00
David Zafman
e18519ad09 test: Update pg log test for new trimming behavior
Fixes: https://tracker.ceph.com/issues/43864

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-28 15:23:45 -08:00
Sage Weil
f026a1c9f6 qa/tasks/cephadm: pass short dev name to osd prepare
Zap needs a full path, but create/prepare needs only the VG/LV
name if it is an existing LV.

We'll make c-v more friendly later.
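
A small sketch of the conversion, assuming the full path has the form `/dev/<vg>/<lv>` and that the short name is what remains after stripping the `/dev/` prefix. `short_dev_name` and the example device names are hypothetical.

```python
def short_dev_name(dev: str) -> str:
    """Turn '/dev/<vg>/<lv>' into '<vg>/<lv>'; leave other strings alone."""
    prefix = "/dev/"
    return dev[len(prefix):] if dev.startswith(prefix) else dev


full_path = "/dev/vg_nvme/lv_1"        # form assumed for zap
print(short_dev_name(full_path))       # form assumed for prepare
```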

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-28 14:21:53 -06:00
Sage Weil
9a4dd1fb3d qa/tasks/ceph_manager: fix chmod on log dir during pg export copy
With cephadm, we should chmod both /var/log/ceph and /var/log/ceph/$fsid.
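
A minimal sketch of the directory list this implies, assuming the fsid is known only for cephadm deployments. `log_dirs_to_chmod` is a hypothetical helper and the fsid value is a placeholder.

```python
from typing import List, Optional


def log_dirs_to_chmod(fsid: Optional[str] = None,
                      base: str = "/var/log/ceph") -> List[str]:
    """List the log directories that need their mode changed (sketch)."""
    dirs = [base]
    if fsid:
        # cephadm keeps per-cluster logs in a subdirectory named after the fsid.
        dirs.append(f"{base}/{fsid}")
    return dirs


print(log_dirs_to_chmod())
print(log_dirs_to_chmod(fsid="00000000-0000-0000-0000-000000000000"))
```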

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-28 13:54:18 -06:00
Ali Maredia
7cf2af6e5c qa: remove rados/basic/tasks/rgw_snaps.yml
rgw_snaps tasks should not be running in the rados suite.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2020-01-28 14:29:27 -05:00