539 Commits

Author SHA1 Message Date
Kefu Chai
7b0973e33b
Merge pull request #32624 from alimaredia/wip-s3-tests-python-3
qa/suites: use s3-tests with python3 support

Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-01-17 23:47:51 +08:00
Ali Maredia
0ed77d1c1a qa: remove s3readwrite & s3roundtrip tasks
As a result of the s3-tests python3 port, the
s3readwrite & s3roundtrip testing files were
deleted

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2020-01-13 15:46:31 -05:00
Kefu Chai
44fb077978 qa: whitelist FS_DEGRADED
`admin_socket_output --all` sends "respawn" to mds, so FS_DEGRADED is
raised when the mds restarts.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-01-08 22:20:35 +08:00
Patrick Donnelly
02b3883dd0
Merge PR #32363 into master
* refs/pull/32363/head:
	qa: add .qa link

Reviewed-by: Sage Weil <sage@redhat.com>
2020-01-06 12:18:12 -08:00
Sage Weil
5ec92e79a2 Merge PR #32232 into master
* refs/pull/32232/head:
	qa: no need to exclude ceph-mgr-diskprediction-cloud from package list to be installed
	qa/packages: do not install ceph-mgr-diskprediction-cloud by default
	ceph.spec.in: add runtime deps for mgr-diskprediction-cloud

Reviewed-by: Sage Weil <sage@redhat.com>
2019-12-24 08:17:35 -06:00
Sage Weil
dbc3e83ce5 Merge PR #32407 into master
* refs/pull/32407/head:
	qa/suites/rados: move cephadm_orchestrator to el8

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-12-23 17:55:41 -06:00
Sage Weil
5d8635d0d3 Merge PR #32377 into master
* refs/pull/32377/head:
	qa/suites/rados/thrash-old-clients: configure mons in terms of addrvecs
	qa/suites/rados/thrash-old-clients: hammer: fix package list
	qa/tasks/cephadm: set .conf to cluster config object
	qa/tasks/cephadm: archive /var/log/ceph logs too (not just cluster dir)
	qa/tasks/cephadm: client keyring
	qa/tasks/cephadm: setup thrashers ctx item
	qa/tasks/ceph_manager: asok commands via cephadm shell
	qa/suites/rados/thrash-old-clients: stick to el7
	qa/tasks/cephadm: check cluster log; support log-whitelist
	qa/suites/rados/thrash-old-clienets: python-foo to python3-foo
	qa/suites/rados/thrash-old-clients: add new exclude_packages
	qa/suites/rados/thrash-old-clients: use cephadm
	mon/ConfigMonitor: make legacy mon addr/port parseable by legacy code

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-12-23 17:54:30 -06:00
Sage Weil
c14234efd9 qa/suites/rados/thrash-old-clients: configure mons in terms of addrvecs
This is more explicit.  More importantly, the 'mon update' command
can't handle an "ip:port"; it wants either a CIDR, bare IP, or addrvec.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-23 16:35:09 -06:00
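The commit above (c14234efd9) distinguishes address forms: CIDR, bare IP, and addrvec are accepted, while "ip:port" is not. A minimal sketch of that distinction follows. It is not Ceph's parser: `classify_mon_addr` is a hypothetical helper, the sample addresses are made up, and it recognizes only the bracketed addrvec form.

```python
import ipaddress


def classify_mon_addr(value: str) -> str:
    """Classify a monitor address string (illustrative, not Ceph's parser).

    Returns one of 'addrvec', 'cidr', 'ip', or 'unsupported'.
    """
    text = value.strip()
    # Assumption: an addrvec looks like "[v2:IP:PORT,v1:IP:PORT]".
    if text.startswith("[") and text.endswith("]") and ("v1:" in text or "v2:" in text):
        return "addrvec"
    if "/" in text:
        try:
            ipaddress.ip_network(text, strict=False)
            return "cidr"
        except ValueError:
            return "unsupported"
    try:
        ipaddress.ip_address(text)
        return "ip"
    except ValueError:
        # A plain "ip:port" string ends up here.
        return "unsupported"


if __name__ == "__main__":
    for sample in ("10.0.0.1", "10.0.0.0/24",
                   "[v2:10.0.0.1:3300,v1:10.0.0.1:6789]", "10.0.0.1:6789"):
        print(sample, "->", classify_mon_addr(sample))
```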
Sage Weil
365fa583d0 qa/suites/rados/thrash-old-clients: hammer: fix package list
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-23 16:35:09 -06:00
Sage Weil
74cf76e9e9 qa/suites/rados: move cephadm_orchestrator to el8
The python3-remoto dependency does not exist on 18.04 (or any ubuntu or
debian AFAICS).

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-23 14:24:58 -06:00
Sage Weil
2ae3f66263 qa/suites/rados/cephadm: enable mgr debugging
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-23 10:01:35 -06:00
Sage Weil
a3a1e3e8ac qa/suites/rados/thrash-old-clients: stick to el7
Old distros don't have packages for bionic.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-21 19:53:38 -06:00
Sage Weil
50294ecc65 qa/suites/rados/thrash-old-clienets: python-foo to python3-foo
Jewel and Hammer don't have python-rgw tho.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-21 19:53:38 -06:00
Sage Weil
efc3df1160 qa/suites/rados/thrash-old-clients: add new exclude_packages
Due to 6f5fb95408

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-21 19:53:38 -06:00
Sage Weil
1c7bd1d1f9 qa/suites/rados/thrash-old-clients: use cephadm
- deploy cluster with cephadm so we can run a octopus+ cluster and also

  install client packages that are ancient.
- move client.2 back onto the third node, since packages no longer
  conflict.
- test on centos 7.x (i picked 6), since the old releases all built on
  that release.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-21 19:53:38 -06:00
Sage Weil
d96b6fd1c5 qa/quites/rados/singleton-flat/valgrind-leaks: specify centos8
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-21 09:03:55 -06:00
Sage Weil
47350be466 qa/suites/rados: test cephadm on centos and ubuntu both
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-21 09:03:55 -06:00
Samuel Just
bdc3ed252e
Merge pull request #32381 from athanatos/sjust/wip-read-from-replica-py2
osd: propagate mlcod to replicas and fix problems with read from replica

Reviewed-by: Sage Weil <sage@redhat.com>
2019-12-20 13:07:56 -08:00
Sage Weil
4a566bdc2d Merge PR #32362 into master
* refs/pull/32362/head:
	qa/packages/packages: el8 has granular -debuginfo
	qa/tasks/cbt: include py2 deps on ubuntu for now
	src/test: misc python -> python3
	qa/suites/rados/singleton-flag/valgrind-leaks: run on latest centos
	qa/workunits: env python -> env python3
	qa/suites/rados/dashboard/tasks/dashboard: whitelist OSDMAP_FLAGS
	qa/suites/rados/verify: ping to specific centos
	qa/workunits/rados/test_envlibrados_rocksdb: enable el8 PowerTools
	qa/workunits/rados/test_pool_access.sh: python -> python3
	mgr/crash: fix signature for py3

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
2019-12-20 09:29:47 -06:00
Sage Weil
c87a76096e qa/suites/rados/singleton-flag/valgrind-leaks: run on latest centos
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-20 07:17:10 -06:00
Sage Weil
fb60e71a23 qa/suites/rados/dashboard/tasks/dashboard: whitelist OSDMAP_FLAGS
"2019-12-19T20:42:43.020748+0000 mon.b (mon.0) 2771 : cluster [WRN] Health check failed: noout flag(s) set (OSDMAP_FLAGS)" in cluster log

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-20 07:17:10 -06:00
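The whitelisting in the commit above (fb60e71a23) can be pictured as regex filtering over cluster log lines. The sketch below is an assumption about the general idea, not teuthology's implementation: `unexpected_warnings` is a hypothetical helper, and the sample line is the one quoted in the commit message.

```python
import re


def unexpected_warnings(log_lines, whitelist):
    """Return [WRN]/[ERR] lines that match no whitelist pattern.

    `whitelist` is a list of regular expressions (hypothetical format).
    """
    patterns = [re.compile(p) for p in whitelist]
    flagged = []
    for line in log_lines:
        # Only warnings and errors are candidates for failing a run.
        if "[WRN]" not in line and "[ERR]" not in line:
            continue
        if any(p.search(line) for p in patterns):
            continue
        flagged.append(line)
    return flagged


SAMPLE = ("2019-12-19T20:42:43.020748+0000 mon.b (mon.0) 2771 : cluster [WRN] "
          "Health check failed: noout flag(s) set (OSDMAP_FLAGS)")

if __name__ == "__main__":
    print(unexpected_warnings([SAMPLE], [r"\(OSDMAP_FLAGS\)"]))
    print(unexpected_warnings([SAMPLE], []))
```

With the pattern in place the sample line is ignored. With an empty whitelist the same line is reported.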
Sage Weil
344ff7f0ef qa/suites/rados/verify: ping to specific centos
The simple

 os_type: centos

in valgrind.yaml doesn't pick a particular centos, and we end up with
the teuthology default (currently 7.6).

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-20 07:17:10 -06:00
Sage Weil
2ba0ff7117 qa/suites/rados/thrash[-erasure-code]: add misc -{localized,balanced}.yaml jobs
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-19 17:35:36 -08:00
Patrick Donnelly
4562823a19
qa: add .qa link
Continuation of 716db6e2fd.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-12-19 14:31:09 -08:00
Sage Weil
ad1a21712a Merge PR #32355 into master
* refs/pull/32355/head:
	qa/suites/rados/perf: run on ubuntu

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-12-19 16:20:45 -06:00
Sage Weil
922ba8bb4e qa/suites/rados/thrash-old-clients: centos -> ubuntu
We can't upgrade packages from el7 to el8, so do this on ubuntu 18.04.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-19 12:16:00 -06:00
Sage Weil
4dcb7dcc6a qa/suites/rados/perf: run on ubuntu
pdsh and collectl packages don't seem to exist on el8.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-19 12:03:10 -06:00
Kefu Chai
4148ff42b5 qa: no need to exclude ceph-mgr-diskprediction-cloud from package list to be installed
Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-12-17 21:52:18 +08:00
Tatjana Dehler
8d1869a2bd mgr/dashboard: reactivate dashboard test suites
Reactivate the dashboard test suites that were commented
out in https://github.com/ceph/ceph/pull/30864 because
https://tracker.ceph.com/issues/41538 has been resolved
in the meantime.

Fixes: https://tracker.ceph.com/issues/42652
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
2019-12-13 09:11:33 +01:00
Ali Maredia
73d9131839 qa: add force-branch to suites running s3readwrite & s3roundtrip tasks
Signed-off-by: Ali Maredia <amaredia@redhat.com>
2019-12-12 16:09:07 -05:00
Sage Weil
6c4541bb28 qa/tasks/ceph2 -> cephadm
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 19:14:24 -06:00
Sage Weil
cd1c05acbb mgr/ssh -> mgr/cephadm
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 19:14:24 -06:00
Sage Weil
137fa64e12 qa: rename ceph-daemon tests -> cephadm
Also move the workunit to a better location.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 19:14:09 -06:00
Sage Weil
7d63071d4e mgr/ssh,qa/tasks/ceph2: fix mode to be cephadm-package (vs root)
At the same time align the option names with ceph2.py, yay.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 19:14:09 -06:00
Sage Weil
c8750b7066 files,rpm,deb: rename ceph-daemon -> cephadm
This is just renaming the files and adjusting the packages.  Lots of
cleanup to do still.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 19:14:09 -06:00
Jason Dillaman
19272432a2 mgr/dashboard: fix broken rbd-mirror dashboard API test case
Fixes: https://tracker.ceph.com/issues/42512
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2019-12-05 09:32:41 -05:00
Sage Weil
123338acc3 qa/suites/rados/ssh: only install ceph-daemon for packaged mode
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-03 02:26:18 +00:00
Sage Weil
3342436177 tasks/ceph2: add support for packaged ceph-daemon
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-02 21:19:09 +00:00
Sage Weil
61ba2d7b66 Merge PR #31677 into master
* refs/pull/31677/head:
	qa/standalone/ceph-helpers.sh: remove osd down check
	qa/standalone/ceph-helpers.sh: destroy_osd: mark osd down
	osd: add osd_fast_shutdown option (default true)

Reviewed-by: Sébastien Han <seb@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-11-25 08:54:45 -06:00
Sage Weil
a7542dcf6b Merge PR #31502 into master
* refs/pull/31502/head:
	qa/tasks/ceph2: get ceph-daemon from same place as ceph
	qa/tasks/ceph2: use safe_while
	qa/tasks/ceph2: pull image using sha1
	qa/tasks/ceph2: docker needs quay.io/ prefix for image name
	qa/workunits/rados/test_python: make sure rbd pool exists
	qa/suites/rados/ssh: new tests!
	qa/tasks/ceph2: pull ceph-ci/ceph:$branch
	qa/tasks/ceph2: register_daemons after pods start
	qa/tasks/ceph2: fix conf
	qa/tasks/ceph2: add restart
	qa/tasks/ceph2: pass ceph-daemon path to DaemonState
	qa/tasks/ceph2: tolerate no mdss or 1 mgr
	qa/tasks/ceph: replace wait_for_osds_up with manager.wait_for_all_osds_up
	qa/tasks/ceph: wait-until-healthy
	qa/tasks/ceph2: set up managers
	qa/tasks/ceph2: use seed ceph.conf
	qa/tasks/ceph: healthy: use manager helpers (instead of teuthology/misc ones)
	qa/tasks/ceph2: name mds daemons
	qa/tasks/ceph2: fix osd ordering
	qa/tasks/ceph2: start up mdss
	qa/tasks/ceph2: set up daemon handles and use them to stop
	qa/tasks/ceph2: make it multicluster-aware
	qa/tasks/ceph2: can bring up mon, mgr, osds!
	qa/tasks/ceph2: basic task to bring up cluster with ceph-daemon and ssh

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-11-22 15:28:17 -06:00
Sage Weil
71bc236588 Merge PR #31747 into master
* refs/pull/31747/head:
	qa/suites/rados/singleton-nomsgr/all/balancer: whitelist PG_AVAILABILITY

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-11-21 11:49:35 -06:00
Sage Weil
3d9686405c qa/suites/rados/ssh: new tests!
Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-21 10:46:54 -06:00
Sage Weil
82c2320fbb qa/suites/rados/singleton-nomsgr/all/balancer: whitelist PG_AVAILABILITY
Balancer triggers peering, which may make PGs briefly go inactive--when
they possibly haven't been active yet.  E.g.,

    "PG_AVAILABILITY": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "Reduced data availability: 3 pgs inactive, 3 pgs peering",
            "count": 6
        },
        "detail": [
            {
                "message": "pg 2.6 is stuck peering since forever, current state peering, last acting [2,0]"
            },
            {
                "message": "pg 2.1c is stuck peering since forever, current state peering, last acting [2,1]"
            },
            {
                "message": "pg 2.7a is stuck peering since forever, current state peering, last acting [2,0]"
            }
        ]
    }

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-19 20:38:08 -06:00
Sage Weil
31b7816e94 qa/suites/rados/thrash-old-clients: skip TestClsRbd.mirror
Older versions have this test and fail it.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-19 20:11:55 -06:00
Kefu Chai
153311a196 qa/suites/rados: whitelist health warnings
in cephtool/test.sh, we

ceph fs set cephfs inline_data {1,0}

so the health check fails when the test ends, like

"mon.a (mon.0) 3498 : cluster [WRN] Health check failed: 1 filesystem
with deprecated feature inline_data (FS_INLINE_DATA_DEPRECATED)" in
cluster log

so, before we remove the test, we need to whitelist this warning

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-11-18 22:23:08 +08:00
Sage Weil
cf352c3ac0 osd: add osd_fast_shutdown option (default true)
If we get a SIGINT or SIGTERM or are deleted from the OSDMap, do a fast
shutdown by exiting immediately.  This has a few important benefits:

 - We immediately stop responding (binding) to any sockets, which means
   other OSDs will immediately decide we are down (and dead!).  This
   minimizes IO interruption.
 - We avoid the complex "clean" shutdown process, which is historically a
   source of bugs.

In reality, the only purpose of the "clean" shutdown is to try to tear down
everything in memory so we can do memory leak checking with valgrind.  Set
this option to false for valgrind QA runs so we can still do that.

Note that with the new read leases in octopus, we rely on the default
behavior that an ECONNREFUSED is taken to mean that the OSD is fully dead,
so that we don't have to wait for any leases to time out.  This works in
sane environments with normal IP networks, but that behavior could
conceivably be a bad idea if there are some weird network shenanigans
going on.  If osd_fast_fail_on_connection_refused were disabled, then this
fast shutdown procedure might be *worse* than the clean shutdown because
we would have to wait for the heartbeat timeout.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-15 09:31:50 -06:00
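The fast-versus-clean shutdown choice in the commit above (cf352c3ac0) can be sketched for a generic daemon. This is not the OSD code. All names here (`make_shutdown_handler`, `install`, `clean_teardown`) are hypothetical, and the `fast_shutdown` flag only stands in for the `osd_fast_shutdown` option.

```python
import os
import signal


def make_shutdown_handler(fast_shutdown, clean_teardown, exit_fn=os._exit):
    """Build a signal handler for a hypothetical daemon.

    fast_shutdown=True  -> exit immediately, skipping teardown.
    fast_shutdown=False -> run the teardown first (e.g. for leak checking).
    exit_fn is injectable so the handler can be exercised without exiting.
    """
    def handler(signum, frame):
        if not fast_shutdown:
            clean_teardown()
        exit_fn(0)
    return handler


def install(fast_shutdown, clean_teardown):
    """Register the handler for SIGINT and SIGTERM."""
    handler = make_shutdown_handler(fast_shutdown, clean_teardown)
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handler)
```

Exiting through `os._exit` skips interpreter cleanup, which loosely mirrors the "exit immediately" behavior the commit describes. The clean path exists here only so that a leak checker would have something to observe.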
Patrick Donnelly
19a08227fb
Merge PR #30890 into master
* refs/pull/30890/head:
	mgr: invoke plugin shutdown on SIG{TERM,INT} signals.
	mgr/volumes: guard volume delete by waiting for pending ops
	mgr/volumes: cleanup libcephfs handles when stopping
	Revert "qa/suites/rados/mgr/tasks/module_selftest: whitelist mgr client getting backlisted"

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-11-08 10:43:46 -08:00
Sage Weil
5def1df5e8 Merge PR #31064 into master
* refs/pull/31064/head:
	test: Test balancer module commands
	mgr: Improve balancer module status
	mgr: Release GIL before calling OSDMap::calc_pg_upmaps()

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-11-07 10:57:56 -06:00
Lenz Grimmer
fe8f786b6e
mgr/dashboard: add missing test_orchestrator suite (#31198)
mgr/dashboard: add missing test_orchestrator suite

Reviewed-by: Kiefer Chang <kiefer.chang@suse.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
2019-11-06 14:35:43 +00:00
Sebastian Wagner
157fb06fac mgr/orchestrator: check for DEVICE_{IDENT|FAULT}_ON
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
2019-11-05 13:02:29 +01:00