62 Commits

Author SHA1 Message Date
Greg Farnum
d02625331c Merge remote-tracking branch 'origin/master' into wip-stretch-mode 2020-09-14 02:32:19 +00:00
Sage Weil
c7244e7aad misc language changes: whitelist -> ignore etc
Signed-off-by: Sage Weil <sage@newdream.net>
2020-08-24 19:53:08 +00:00
Sage Weil
2ee9365d0b qa: log-whitelist -> log-ignorelist
Signed-off-by: Sage Weil <sage@newdream.net>
2020-08-24 19:53:08 +00:00
Greg Farnum
39d71f7841 test: add a mon_election directory to the rados and upgrade suites
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2020-07-08 04:26:03 +00:00
Kefu Chai
a2d1eaeb3f qa/suites/*/rados_cls_all.yaml: load all classes
the intention to add the whitelist was to test "sdk" class, but if we
add new classes to the list, and add tests exercising them, the tests
fail if we fail to update these `rados_cls_all.yaml` accordingly.

so in this change, the list is now '*' which allows OSD to load all
classes found in the specified directory

Fixes: https://tracker.ceph.com/issues/45113
Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-20 18:58:35 +08:00
Sage Weil
7c19c1534b qa/suites/rados/verify/validater/valgrind: tolerate SLOW_OPS
Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-17 19:32:42 -05:00
Sage Weil
baeb051910 qa/suites/rados/verify/validater/valgrind: less bluestore logging
Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-17 19:32:42 -05:00
Sage Weil
4fda9d50f0 qa/suites/rados/verify/validater: increase heartbeat grace
Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-17 19:32:42 -05:00
Sage Weil
0bd14ab080 Revert "qa/suites/rados/verify: debug_ms = 1, osd_heartbeat_grace = 60"
This reverts commit 4f742f200df6c91db87bfee1109c37fad3c0548b.

This was in the wrong file.. see valgrind.yaml

Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-17 19:32:42 -05:00
Sage Weil
12105ed9d7 Revert "qa/suites/rados/verify/validator/valgrind: debug refs = 5"
This reverts commit 65e81e6eb4f136bf21b67e5de10ab49f028f9e95.

This slows things down too much with valgrind.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-17 19:32:42 -05:00
Sage Weil
40a7bcea70 qa/suites/rados/verify: set ALLOW_TIMEOUTS for workunits
When running under valgrind (and thrashing) things can be slow.  Tell
tests in case they need to tolerate timeouts.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-17 18:24:12 -05:00
Sage Weil
4f742f200d qa/suites/rados/verify: debug_ms = 1, osd_heartbeat_grace = 60
The rados api tests are failing WatchNotify because the OSDs are so
heavily lagged.. in large part due to the high debug level of debug_ms=20
and debug_osd=25.  Reduce that.

Also increase the heartbeat grace so slow valgrind-y osds don't get marked
down.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-11 06:57:52 -05:00
Sage Weil
1400b35858 qa/suites/rados/verify/tasks/mon_recovery: whitelist SLOW_OPS
The mon can see slow ops when thrashing.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-01 07:58:11 -06:00
Sridhar Seshasayee
e527067666 qa: Whitelist 'slow request' within a bunch of tests
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2020-02-24 19:59:56 +05:30
Sage Weil
8e3eb592b0 qa/suites/rados/verify: debug monc = 20
Hunting https://tracker.ceph.com/issues/43882

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-29 09:53:41 -06:00
Sage Weil
344ff7f0ef qa/suites/rados/verify: pin to a specific centos
The simple

 os_type: centos

in valgrind.yaml doesn't pick a particular centos, and we end up with
the teuthology default (currently 7.6).

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-20 07:17:10 -06:00
Sage Weil
cf352c3ac0 osd: add osd_fast_shutdown option (default true)
If we get a SIGINT or SIGTERM or are deleted from the OSDMap, do a fast
shutdown by exiting immediately.  This has a few important benefits:

 - We immediately stop responding (binding) to any sockets, which means
   other OSDs will immediately decide we are down (and dead!).  This
   minimizes IO interruption.
 - We avoid the complex "clean" shutdown process, which is historically a
   source of bugs.

In reality, the only purpose of the "clean" shutdown is to try to tear down
everything in memory so we can do memory leak checking with valgrind.  Set
this option to false for valgrind QA runs so we can still do that.

Note that with the new read leases in octopus, we rely on the default
behavior that an ECONNREFUSED is taken to mean that the OSD is fully dead,
so that we don't have to wait for any leases to time out.  This works in
sane environments with normal IP networks, but that behavior could
conceivably be a bad idea if there are some weird network shenanigans
going on.  If osd_fast_fail_on_connection_refused were disabled, then this
fast shutdown procedure might be *worse* than the clean shutdown because
we would have to wait for the heartbeat timeout.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-15 09:31:50 -06:00
Sage Weil
71d74aa8c6 qa: more tries for mon tell when injecting msgr failures
With failure injection, the default of 2 tries isn't quite enough.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-11 14:16:42 -05:00
David Zafman
fdf93add0b
Merge pull request #30714 from dzafman/wip-41743
test: Ignore OSD_SLOW_PING_TIME* if injecting socket failures

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-10-04 18:28:48 -07:00
David Zafman
ded58ef91d test: Ignore OSD_SLOW_PING_TIME* if injecting socket failures
Fixes: https://tracker.ceph.com/issues/41743

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-10-03 09:09:10 -07:00
Sage Weil
52d706c75f qa/suites/rados/verify: whitelist MON_DOWN when using valgrind
Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-29 10:27:01 -05:00
Sage Weil
e79dc454db qa/suites: disable valgrind leak checks on ceph-mgr
We've disabled the "clean" shutdown in ceph-mgr due to
https://tracker.ceph.com/issues/38621

Until then, no valgrind leak checks!

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-07 13:03:28 -06:00
Josh Durgin
d45f18119b qa/suites: remove mon kv backend options
rocksdb is the default, leveldb is not recommended at this point, so drop it.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2019-02-08 16:58:44 -05:00
Sage Weil
65e81e6eb4 qa/suites/rados/verify/validator/valgrind: debug refs = 5
If we detect a leak, let's include logging so we can find it.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-02-07 12:10:34 -06:00
Sage Weil
d518eb6cac qa/msgr: move msgr facet into generic re-usable dir
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:38 -06:00
Sage Weil
03908113b4 qa/suites: valgrind ceph-mgr too
Signed-off-by: Sage Weil <sage@redhat.com>
2018-11-09 08:52:07 -06:00
Casey Bodley
d897b92878 osd: remove statelog from osd_class_load_list config
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2018-09-19 10:32:55 -04:00
Sage Weil
44de03d5e6 qa/suites: test pg merging
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-07 12:09:05 -05:00
Patrick Donnelly
b39f9d06dc
qa: fix symlinks indirectly pointing at qa to .qa
Building on the previous commit.

Command used:

$ find suites/ -type l -and -not -name .qa -execdir ~/fix.sh {} \;

fix.sh:
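    #!/bin/bash
    # Rewrite the symlink "$1" so that it goes through the .qa helper link.

    link="$(readlink "$1")"

    echo "$link"
    dirlink="$(dirname "$link")"
    baselink="$(basename "$link")"

    # Walk up the link target until it refers to the top-level qa directory.
    while true; do
        echo "$dirlink"
        if [ "$dirlink" -ef ~/ceph/qa ]; then
            ln -nsf ".qa/$baselink" "$1"
            exit
        else
            baselink="$(basename "$dirlink")/$baselink"
            dirlink="$(dirname "$dirlink")"
            # Stop once the target has been reduced to the current directory.
            if [ "$dirlink" -ef . ]; then
                break
            fi
        fi
    done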

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-06-26 11:48:38 -07:00
Patrick Donnelly
716db6e2fd
qa: add .qa helper link
This utilizes the recent feature in teuthology [1] to skip hidden files in
suites when building the job matrix.

The idea of this change is to enable referring to the top-level qa directory in a
position-independent way such that copies of a suite to another location do not
break any symlinks.

[1] https://github.com/ceph/teuthology/pull/1185

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-06-26 11:33:48 -07:00
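
The position-independent linking described in the commit above can be sketched in a scratch directory. This is an illustration under stated assumptions, not the repository's actual layout: the directory names, the `fixed-2.yaml` contents, and the choice to chain each level's `.qa` to `../.qa` are all hypothetical.

```shell
#!/bin/bash
# Hypothetical layout in a scratch directory; all names are illustrative.
root="$(mktemp -d)"
mkdir -p "$root/qa/clusters" "$root/qa/suites/rados/verify"
echo "roles: []" > "$root/qa/clusters/fixed-2.yaml"

# Assumed scheme: the top-level helper points at qa itself,
# and every level below chains upward through its parent's helper.
ln -s . "$root/qa/.qa"
ln -s ../.qa "$root/qa/suites/.qa"
ln -s ../.qa "$root/qa/suites/rados/.qa"
ln -s ../.qa "$root/qa/suites/rados/verify/.qa"

# A suite file refers to shared content through the helper link.
ln -s .qa/clusters/fixed-2.yaml "$root/qa/suites/rados/verify/fixed-2.yaml"

# Copy the suite to a different depth, preserving symlinks.
# The new parent directories get their own helper links.
mkdir -p "$root/qa/suites/extra/nested"
ln -s ../.qa "$root/qa/suites/extra/.qa"
ln -s ../.qa "$root/qa/suites/extra/nested/.qa"
cp -a "$root/qa/suites/rados/verify" "$root/qa/suites/extra/nested/verify"

# The copied link still resolves, although it now sits one level deeper.
resolved="$(cat "$root/qa/suites/extra/nested/verify/fixed-2.yaml")"
echo "$resolved"
rm -rf "$root"
```

Because the link target starts with `.qa/` rather than a fixed number of `../` components, the copy does not need its links rewritten.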
Kefu Chai
c237d0befb qa/suites/rados/verify: remove random-distro$
the distro specified by random-distro$ will be overwritten by the one
specified by valgrind.yaml. and teuthology-suite will give

KeyError: '16.04 not a centos version or codename'

when scheduling a suite involving the facets above. also, i think there
is not much value in running valgrind/lockdep with different distros.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-05-17 19:11:14 +08:00
Sage Weil
664af17b30
Merge pull request #21932 from yuriw/wip-yuriw-add-dollar-rgw
tests/qa: Adding $ distro mix - rgw

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2018-05-15 16:15:05 -05:00
Casey Bodley
7da0fe2832
Merge pull request #21680 from cbodley/wip-rm-replica-log
rgw: remove all traces of cls replica_log

Reviewed-by: Orit Wasserman <owasserm@redhat.com>
2018-05-10 10:26:55 -04:00
Yuri Weinstein
c79a74a33c tests/qa: adding rados/.. dirs
Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
2018-05-08 16:00:05 -07:00
Casey Bodley
f9ee48caa2 rgw: remove all traces of cls replica_log
replica log was for the old radosgw sync agent, which was replaced with
multisite v2 in jewel. no sense in continuing to maintain and test it

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2018-04-26 11:40:11 -04:00
Sage Weil
e331311b87 qa/suites/rados/verify/tasks/rados_api_tests: whitelist OBJECT_MISPLACED
The api tests do some splits, which can move data.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-25 10:33:52 -05:00
Kefu Chai
cdcbd47e1e qa/suite: whitelist PG_AVAILABILITY in rados_api_tests.yaml
pg will be created when increasing pgp-num and pg-num. so at that
moment, PG_AVAILABILITY is reported. so whitelist it in all tests which
run rados/test.sh. that script exercises ceph_test_rados_api_list.

Fixes: http://tracker.ceph.com/issues/23763
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-04-24 10:16:12 +08:00
Gregory Farnum
6d2e4c9b7b
Merge pull request #19973 from liewegas/wip-peering-fast-dispatch
osd: fast dispatch of peering events and pg_map + osd sharded wq refactor

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2018-04-06 11:48:11 -07:00
Joao Eduardo Luis
3997eed4db qa: enable mon osdmap pruning on 'rados/' suites
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
2018-04-06 04:18:23 +01:00
Sage Weil
26f00dd67c qa/suites: mon warn on pool no app = false for api tests
Among other things, the list.cc tests set pg_num which waits for cluster
healthy.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:58 -05:00
Kefu Chai
f5f2ced624 mgr/PGMap: drop REQUEST_{SLOW,STUCK} HEALTH_WARNs in mimic
SLOW_OPS unifies both of them since mimic

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-11-23 17:41:47 +08:00
Kefu Chai
4a1f2a5c78 qa: silence SLOW_OPS,PENDING_CREATING_PGS warnings
this is an intermediate step to deprecate REQUEST_SLOW warnings.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-11-23 13:59:42 +08:00
Sage Weil
d8dead1aaf qa/suites/rados: remove luminous tests
- snapdir conversion (at-end) stuff
- merge luminous-specific collections that avoided the above back
into their normal locations

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-28 23:10:32 -04:00
Sage Weil
41e5a85308 qa/suites/rados/verify/validater/valgrind: whitelist PG_
Peering might be slow due to valgrind.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-12 14:18:59 -04:00
Sage Weil
f683d2d374 qa/suites: change fixed-2.yaml users to get 4 openstack disks
Follow-up for 4203c4f88785d8149235dd34d37f87e471084d71

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-07 11:56:33 -04:00
Kefu Chai
d12c51ca91 qa/suites: escape the parenthesis of the whitelist text
so we can avoid the warnings like

grep: Unmatched ( or \(

because we pass the whitelisted string to `egrep -v "$1"` directly.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-08-01 21:54:44 +08:00
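
The effect of escaping can be reproduced in isolation. The sketch below is illustrative only: the log line and patterns are made up, and `grep -E` stands in for `egrep`; it is not the QA helper itself.

```shell
#!/bin/bash
# Hypothetical log line, for illustration only.
line='cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)'

# An unbalanced "(" is an invalid extended regex: grep reports an error
# and exits with a status greater than 1.
printf '%s\n' "$line" | grep -E -v '(OSD_DOWN' >/dev/null 2>&1
bad_status=$?

# Escaped parentheses are matched literally, so -v drops the line.
kept="$(printf '%s\n' "$line" | grep -E -v '\(OSD_DOWN\)')"

echo "unbalanced pattern exit status: $bad_status"
echo "lines kept with escaped pattern: '${kept}'"
```

With the escaped pattern the whitelisted line is filtered out and nothing is kept, which is the intended behavior of a whitelist entry.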
Sage Weil
e398fd4ee4 qa/suites: more whitelisting
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 09:31:24 -04:00
Sage Weil
326019a466 qa/suites/rados: whitelist various tests
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-25 22:29:07 -04:00
John Spray
343e1a4281 qa: update whitelist for "wrongly marked me down"
Signed-off-by: John Spray <john.spray@redhat.com>
2017-07-24 14:54:46 +01:00
Sage Weil
960f00071f qa/suites: disable mon crush smoke test with valgrind
Valgrind runs itself on forked children, and does its cleanup when they
complete, and this is slow... slow enough that it frequently makes the
test time out.

Valgrind lets you ignore child *processes* that you exec, but I can't
find a way to skip forked children in the same address space.

Work around this by skipping this validation when running under valgrind.

Fixes: http://tracker.ceph.com/issues/20602
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-14 11:51:47 -04:00