1296 Commits

Author SHA1 Message Date
Sage Weil
123338acc3 qa/suites/rados/ssh: only install ceph-daemon for packaged mode
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-03 02:26:18 +00:00
Sage Weil
3342436177 tasks/ceph2: add support for packaged ceph-daemon
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-02 21:19:09 +00:00
Sage Weil
61ba2d7b66 Merge PR #31677 into master
* refs/pull/31677/head:
	qa/standalone/ceph-helpers.sh: remove osd down check
	qa/standalone/ceph-helpers.sh: destroy_osd: mark osd down
	osd: add osd_fast_shutdown option (default true)

Reviewed-by: Sébastien Han <seb@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-11-25 08:54:45 -06:00
Sage Weil
a7542dcf6b Merge PR #31502 into master
* refs/pull/31502/head:
	qa/tasks/ceph2: get ceph-daemon from same place as ceph
	qa/tasks/ceph2: use safe_while
	qa/tasks/ceph2: pull image using sha1
	qa/tasks/ceph2: docker needs quay.io/ prefix for image name
	qa/workunits/rados/test_python: make sure rbd pool exists
	qa/suites/rados/ssh: new tests!
	qa/tasks/ceph2: pull ceph-ci/ceph:$branch
	qa/tasks/ceph2: register_daemons after pods start
	qa/tasks/ceph2: fix conf
	qa/tasks/ceph2: add restart
	qa/tasks/ceph2: pass ceph-daemon path to DaemonState
	qa/tasks/ceph2: tolerate no mdss or 1 mgr
	qa/tasks/ceph: replace wait_for_osds_up with manager.wait_for_all_osds_up
	qa/tasks/ceph: wait-until-healthy
	qa/tasks/ceph2: set up managers
	qa/tasks/ceph2: use seed ceph.conf
	qa/tasks/ceph: healthy: use manager helpers (instead of teuthology/misc ones)
	qa/tasks/ceph2: name mds daemons
	qa/tasks/ceph2: fix osd ordering
	qa/tasks/ceph2: start up mdss
	qa/tasks/ceph2: set up daemon handles and use them to stop
	qa/tasks/ceph2: make it multicluster-aware
	qa/tasks/ceph2: can bring up mon, mgr, osds!
	qa/tasks/ceph2: basic task to bring up cluster with ceph-daemon and ssh

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-11-22 15:28:17 -06:00
Ilya Dryomov
6c7a23b343
Merge pull request #31771 from idryomov/wip-krbd-read-only-test
qa: update krbd_blkroset.t and add krbd_get_features.t

Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2019-11-22 12:43:13 +01:00
Sage Weil
71bc236588 Merge PR #31747 into master
* refs/pull/31747/head:
	qa/suites/rados/singleton-nomsgr/all/balancer: whitelist PG_AVAILABILITY

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-11-21 11:49:35 -06:00
Sage Weil
3d9686405c qa/suites/rados/ssh: new tests!
Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-21 10:46:54 -06:00
Ilya Dryomov
80528fcb6c qa: add krbd_get_features.t test
Run it together with krbd_blkroset.t.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-11-21 14:40:41 +01:00
Sage Weil
82c2320fbb qa/suites/rados/singleton-nomsgr/all/balancer: whitelist PG_AVAILABILITY
Balancer triggers peering, which may make PGs briefly go inactive--when
they possibly haven't been active yet.  E.g.,

    "PG_AVAILABILITY": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "Reduced data availability: 3 pgs inactive, 3 pgs peering",
            "count": 6
        },
        "detail": [
            {
                "message": "pg 2.6 is stuck peering since forever, current state peering, last acting [2,0]"
            },
            {
                "message": "pg 2.1c is stuck peering since forever, current state peering, last acting [2,1]"
            },
            {
                "message": "pg 2.7a is stuck peering since forever, current state peering, last acting [2,0]"
            }
        ]
    }

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-19 20:38:08 -06:00
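To illustrate what whitelisting PG_AVAILABILITY means for a report like the one quoted in 82c2320fbb, here is a minimal Python sketch. It assumes a health report shaped like that sample (detail trimmed). The helper name `unexpected_health_checks` is hypothetical and not part of the Ceph QA code, and the real suites may apply their whitelist differently, for example by matching cluster log lines.

```python
import json


def unexpected_health_checks(health_checks, whitelist):
    """Return the names of health checks that are not whitelisted.

    Hypothetical helper for illustration only.
    """
    return sorted(name for name in health_checks if name not in whitelist)


# Sample modelled on the report quoted in the commit message (detail trimmed).
sample = json.loads("""
{
    "PG_AVAILABILITY": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "Reduced data availability: 3 pgs inactive, 3 pgs peering",
            "count": 6
        }
    }
}
""")

# With the whitelist entry, the transient warning is ignored.
print(unexpected_health_checks(sample, whitelist={"PG_AVAILABILITY"}))
# Without it, the same warning would count as a failure.
print(unexpected_health_checks(sample, whitelist=set()))
```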
Sage Weil
31b7816e94 qa/suites/rados/thrash-old-clients: skip TestClsRbd.mirror
Older versions have this test and fail it.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-19 20:11:55 -06:00
Kefu Chai
153311a196 qa/suites/rados: whitelist health warnings
in cephtool/test.sh, we

ceph fs set cephfs inline_data {1,0}

so the health check fails when the test ends, like

mon.a (mon.0) 3498 : cluster [WRN] Health check failed: 1 filesystem
with deprecated feature inline_data (FS_INLINE_DATA_DEPRECATED)" in
cluster log

so, before we remove the test, we need to whitelist this warning

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-11-18 22:23:08 +08:00
Sage Weil
cf352c3ac0 osd: add osd_fast_shutdown option (default true)
If we get a SIGINT or SIGTERM or are deleted from the OSDMap, do a fast
shutdown by exiting immediately.  This has a few important benefits:

 - We immediately stop responding (binding) to any sockets, which means
   other OSDs will immediately decide we are down (and dead!).  This
   minimizes IO interruption.
 - We avoid the complex "clean" shutdown process, which is historically a
   source of bugs.

In reality, the only purpose of the "clean" shutdown is to try to tear down
everything in memory so we can do memory leak checking with valgrind.  Set
this option to false for valgrind QA runs so we can still do that.

Note that with the new read leases in octopus, we rely on the default
behavior that an ECONNREFUSED is taken to mean that the OSD is fully dead,
so that we don't have to wait for any leases to time out.  This works in
sane environments with normal IP networks, but that behavior could
conceivably be a bad idea if there are some weird network shenanigans
going on.  If osd_fast_fail_on_connection_refused were disabled, then this
fast shutdown procedure might be *worse* than the clean shutdown because
we would have to wait for the heartbeat timeout.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-15 09:31:50 -06:00
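The trade-off described in cf352c3ac0 can be summarised as a small decision table. The Python sketch below only models the reasoning in that commit message and is not Ceph code. The function name and the returned labels are hypothetical, and the two boolean parameters merely mirror the osd_fast_shutdown and osd_fast_fail_on_connection_refused options named in the message.

```python
def how_peers_notice_shutdown(fast_shutdown, fast_fail_on_connection_refused):
    """Model how quickly other OSDs decide a stopping OSD is down.

    Hypothetical illustration of the commit message's reasoning.
    """
    if not fast_shutdown:
        # The slower "clean" teardown, kept for valgrind leak checking.
        return "clean shutdown path"
    if fast_fail_on_connection_refused:
        # Sockets close at once; ECONNREFUSED is taken to mean the OSD is dead.
        return "immediate (connection refused)"
    # Without fast-fail, peers must wait for the heartbeat timeout.
    return "delayed (heartbeat timeout)"


for fast in (True, False):
    for refuse in (True, False):
        print(fast, refuse, how_peers_notice_shutdown(fast, refuse))
```

The third branch is the case the commit message warns about: fast shutdown with fast-fail disabled could be worse than the clean shutdown.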
Sergio de Carvalho
2650ebe8af rgw: improvements to SSE-KMS with Vault
* add 'rgw crypt vault prefix' config setting to allow restricting
  secret space in Vault where RGW can retrieve keys from
* refuse Vault token file if permissions are too open
* improve concatenation of URL paths to avoid constructing an invalid
  URL (missing or double '/')
* doc: clarify SSE-KMS keys must be 256-bit long and base64 encoded,
  document Vault policies and tokens, plus other minor doc improvements
* qa: check SHA256 signature of Vault zip download
* qa: fix teuthology tests broken by previous PR which made SSE-KMS
  backend default to Barbican

Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com>
Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>
2019-11-12 13:51:25 +00:00
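Two items in 2650ebe8af, joining URL paths without a missing or doubled '/' and refusing a token file whose permissions are too open, can be sketched in Python as follows. This is not the RGW implementation. Both function names are hypothetical, and treating any group or other permission bit as "too open" is an assumption, since the commit message does not state the exact rule.

```python
import os
import stat


def join_url_path(*parts):
    """Join URL pieces with exactly one '/' between them (hypothetical helper)."""
    # Strip slashes at the edges of each piece and drop empty pieces.
    pieces = [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(pieces)


def token_file_too_open(path):
    """Return True if group or others have any access to the file.

    Assumption: "too open" means any group/other permission bit is set.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


# The host name and path segments below are placeholders.
print(join_url_path("http://vault.example:8200/", "/v1/secret/", "mykey"))
print(join_url_path("http://vault.example:8200", "v1/secret", "mykey"))
```

Both calls print the same URL, which is the property the commit message asks for.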
Patrick Donnelly
19a08227fb
Merge PR #30890 into master
* refs/pull/30890/head:
	mgr: invoke plugin shutdown on SIG{TERM,INT} signals.
	mgr/volumes: guard volume delete by waiting for pending ops
	mgr/volumes: cleanup libcephfs handles when stopping
	Revert "qa/suites/rados/mgr/tasks/module_selftest: whitelist mgr client getting backlisted"

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-11-08 10:43:46 -08:00
Casey Bodley
7f068cb5b7
Merge pull request #31414 from cbodley/wip-qa-rgw-more-crypto-backend
qa/rgw: use 'testing' kms backend for other rgw subsuites

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
2019-11-07 14:15:29 -05:00
Sage Weil
5def1df5e8 Merge PR #31064 into master
* refs/pull/31064/head:
	test: Test balancer module commands
	mgr: Improve balancer module status
	mgr: Release GIL before calling OSDMap::calc_pg_upmaps()

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-11-07 10:57:56 -06:00
Lenz Grimmer
fe8f786b6e
mgr/dashboard: add missing test_orchestrator suite (#31198)
mgr/dashboard: add missing test_orchestrator suite

Reviewed-by: Kiefer Chang <kiefer.chang@suse.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
2019-11-06 14:35:43 +00:00
Casey Bodley
d5863f5c2b qa/rgw: use 'testing' kms backend for other rgw subsuites
resolves test failures under rgw/{multifs,thrash,website} similar to
https://github.com/ceph/ceph/pull/30940

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2019-11-05 09:02:50 -05:00
Sebastian Wagner
157fb06fac mgr/orchestrator: check for DEVICE_{IDENT|FAULT}_ON
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
2019-11-05 13:02:29 +01:00
Casey Bodley
ad4ff5f948 qa/rgw: use 'testing' kms backend for multisite tests
a missing piece from https://github.com/ceph/ceph/pull/30940

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2019-11-04 12:49:07 -05:00
Tatjana Dehler
8244028ca2 mgr/dashboard: add missing test_orchestrator suite
Fixes: https://tracker.ceph.com/issues/42244
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
2019-11-04 13:49:29 +01:00
Patrick Donnelly
ced05b9eb3
Merge PR #31206 into master
* refs/pull/31206/head:
	qa: test fs:upgrade when running upgrade suite

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-10-31 13:08:34 -07:00
Ilya Dryomov
f41de0bec1
Merge pull request #31265 from idryomov/wip-krbd-unmap-msgr1
qa/suites/krbd: run unmap subsuite with msgr1 only

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-10-31 15:45:19 +01:00
Ilya Dryomov
5011cc926c qa/suites/krbd: run unmap subsuite with msgr1 only
pre-single-major.yaml kernel doesn't have any of the monitor client
fixes that came in 4.6.  If the connection is closed, it closes the
session and retries only after 10 seconds.  On top of that, there is
nothing to prevent it from picking the same monitor when reconnecting.
This means that when given both v1 and v2 ports (which look like two
different monitors), it is susceptible to mount_timeout (60 seconds):

  $ sudo rbd map img
  rbd: sysfs write failed
  In some cases useful info is found in syslog - try "dmesg | tail".
  rbd: map failed: (5) Input/output error

  [  822.242313] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
  [  832.265494] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
  [  842.296175] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
  [  852.326924] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
  [  862.357611] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
  [  872.388373] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
  [  882.676136] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)

Unlike newer kernels that return ETIMEDOUT, it returns EIO.

Newer kernels are much more aggressive about retries and will pick
a different monitor when reconnecting, hence they are always able to
establish the session in time.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-30 19:51:55 +01:00
Patrick Donnelly
9dc07d8096
qa: add tests for CephFS admin commands
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-10-30 11:44:26 -07:00
Ilya Dryomov
9c17ca0aa7
Merge pull request #31023 from idryomov/wip-krbd-udev-enumerate-retry
krbd: retry on transient errors from udev_enumerate_scan_devices()

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2019-10-29 11:40:45 +01:00
Patrick Donnelly
094df5c3f0
qa: test fs:upgrade when running upgrade suite
Sometimes this suite breaks because it's not usually tested when upgrade
suites are modified.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-10-28 20:41:29 -07:00
Patrick Donnelly
eb00dcd660
Merge PR #31063 into master
* refs/pull/31063/head:
	qa: disable too few PG warning during Mimic deploy

Reviewed-by: Nathan Cutler <ncutler@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2019-10-28 20:37:31 -07:00
Sage Weil
d927374bb4 Merge PR #31168 into master
* refs/pull/31168/head:
	ceph-daemon: try py2 import before py3
	qa/suites/rados/singleton-nomsgr/ceph-daemon: make sure python3 is installed
	qa/standalone/test_ceph_damon.sh: test with python2 and python3
	mgr/ssh: python, not python3
	ceph-daemon: python, not python3
	ceph-daemon: os.makedirs
	ceph-daemon: configparser is ConfigParser on py2
	ceph-daemon: avoid py3-isms

Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Alfredo Deza <adeza@redhat.com>
2019-10-28 14:59:43 -05:00
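The commits merged in PR #31168 mention trying the py2 import before the py3 one and that configparser is ConfigParser on py2. A common way to write such a fallback is sketched below. It is a generic pattern and not the actual ceph-daemon source, and the section and key names are placeholders.

```python
try:
    # Python 2 module name, tried first.
    from ConfigParser import ConfigParser
except ImportError:
    # Python 3 module name.
    from configparser import ConfigParser

# Use only calls that exist under both module names.
cp = ConfigParser()
cp.add_section("global")
cp.set("global", "example_key", "example_value")  # placeholder key and value
print(cp.get("global", "example_key"))
```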
Sage Weil
9fe9653c8c qa/suites/rados/singleton-nomsgr/ceph-daemon: make sure python3 is installed
Centos7 doesn't have it by default.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-28 12:15:47 -05:00
Kefu Chai
674bd8a9e6
Merge pull request #30434 from smithfarm/wip-41820
qa: enable dashboard tests to be run with "--suite rados/dashboard"

Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-10-27 09:18:16 +08:00
Ilya Dryomov
b7a0e2adcb qa: add script to stress udev_enumerate_scan_devices()
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-25 22:05:38 +02:00
David Zafman
3a0e2c8ff1 test: Test balancer module commands
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-10-24 18:56:19 -07:00
Patrick Donnelly
8fb4e4c1e7
qa: disable too few PG warning during Mimic deploy
Mimic will raise this warning when we use 8 PGs for CephFS metadata/data
pools.

Fixes: fc88e6c6c5
Fixes: https://tracker.ceph.com/issues/42434
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-10-24 15:12:43 -07:00
Venky Shankar
a7994a0fdd Revert "qa/suites/rados/mgr/tasks/module_selftest: whitelist mgr client getting backlisted"
This reverts commit 0060f1c5b8.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
2019-10-24 03:34:44 -04:00
Sage Weil
bf09a04d22 Merge PR #31094 into master
* refs/pull/31094/head:
	ceph-daemon: remove redundant --privileged
	test_ceph_daemon: test unit, enter, shell
	ceph-daemon: drop exec
	ceph-daemon: fix exit code for run, shell, enter, exec
	ceph-daemon: allow optional command for 'enter'
	ceph-daemon: fix LANG for 'enter' command
	ceph-daemon: allow shell to take optional command
	qa/suites/rados/singleton-nomsgr/ceph-daemon: run test_ceph_daemon.sh
	qa/standalone/test_ceph_daemon.sh: add new functional tests
	test_ceph_daemon.sh: use newer image
	ceph-daemon: unconditionally enable and start crash unit
	ceph-daemon: fix crash unit cleanup
	ceph-daemon: include 'crash' unit/item in 'ls' output
	ceph-daemon: fix 'ls'
	mgr/orchestrator: s/sdd/ssd/
	mgr/ssh: remove stdout/stderr kludges
	ceph-daemon: fix ceph-volume command to write stdout to stdout

Reviewed-by: Sebastian Wagner <swagner@suse.com>
2019-10-23 19:46:06 -05:00
Sage Weil
adf22b9e59 Merge PR #31054 into master
* refs/pull/31054/head:
	qa/suites/upgrade/*-x-singleton: suppress TOO_FEW_PGS warning

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-10-23 15:20:43 -05:00
Sage Weil
47777b9c0d qa/suites/rados/singleton-nomsgr/ceph-daemon: run test_ceph_daemon.sh
Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-23 15:08:55 -05:00
Casey Bodley
250a65e045
Merge pull request #30997 from cbodley/wip-qa-rgw-objectstores
qa/rgw: drop some objectstore types

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2019-10-23 11:37:32 -04:00
Casey Bodley
604db96bbb
Merge pull request #28421 from pritha-srivastava/wip-rgw-omap-offload
rgw: add cls_queue and cls_rgw_gc for omap offload

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2019-10-23 09:59:28 -04:00
Sage Weil
9f912c2158 qa/suites/upgrade/*-x-singleton: suppress TOO_FEW_PGS warning
Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-22 15:53:05 -05:00
Patrick Donnelly
581be7595b
Merge PR #30971 into master
* refs/pull/30971/head:
	qa: whitelist "Error recovering journal" for cephfs-data-scan

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-10-21 21:18:36 -07:00
Nathan Cutler
493ee6d78f qa: enable dashboard tests to be run with "--suite rados/dashboard"
This moves dashboard.yaml from rados/mgr into a new, separate rados/dashboard
suite. The common elements it uses are moved from rados/mgr into qa/ and
replaced with symlinks.

Fixes: https://tracker.ceph.com/issues/41820
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2019-10-21 12:31:51 +02:00
Ilya Dryomov
a80185d02c
Merge pull request #30965 from idryomov/wip-krbd-udev-socket-overrun
krbd: avoid udev netlink socket overrun

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-10-21 11:00:46 +02:00
Casey Bodley
0e76d40aa1 test/rgw: run ceph_test_rgw_gc_log in rgw verify suite
since it requires a running ceph cluster, it can't run in 'make check'
as a unittest. add it to the rgw/verify suite instead

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2019-10-19 13:28:18 +05:30
Ilya Dryomov
898c113f93 qa: add script to test udev event reaping
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-18 21:56:30 +02:00
Casey Bodley
85a37896b8 qa/rgw: drop some objectstore types
use the subset of objectstore configurations from .qa/objectstore_cephfs
instead of .qa/objectstore

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2019-10-18 13:20:20 -04:00
Ilya Dryomov
340d6f61b3
Merge pull request #30978 from idryomov/wip-krbd-modprobe
krbd: modprobe before calling build_map_buf()

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-10-18 11:11:07 +02:00
Kefu Chai
df8bb8b8f6
Merge pull request #30646 from shyukri/wip-qa-mgr-balancer
qa/mgr/balancer: Add cram based test for altering target_max_misplaced_ratio setting

Reviewed-by: Jan Fajerski <jfajerski@suse.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Nathan Cutler <ncutler@suse.com>
2019-10-18 15:43:33 +08:00
Ilya Dryomov
286bdbfe24 krbd: modprobe before calling build_map_buf()
Otherwise add_key() in set_kernel_secret() fails as if running against
an ancient kernel and we fall back to secret= in options for the first
image being mapped on the machine.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-17 16:52:43 +02:00