26 Commits

Author SHA1 Message Date
Sage Weil
07badf051d qa/suites/rados/multimon: whitelist SLOW_OPS while thrashing mons
The mons may have slow ops.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-18 16:44:23 -05:00
Sage Weil
fd00136eb3 qa/suites/rados/multimon: skew clocks 2s (< paxos lease)
If the leader is the one with the accurate clock, it can still
form quorum, but if the leader has the skewed clock, all other mons appear
skewed from its perspective and no quorum is formed.  This leads to
intermittent failures, depending on the non-deterministic teuthology
deployment order and how the mon IPs sort.

Fix by reducing the skew.  This is enough skew to trigger a warning, but
not enough that it will break quorum.  This ensures that the parts of the
teuthology test that issue random mon commands won't fail (e.g., 'ceph osd
dump').

Fixes: http://tracker.ceph.com/issues/40112
Signed-off-by: Sage Weil <sage@redhat.com>
2019-06-03 10:49:02 -05:00
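The reasoning in fd00136eb3 (a skew large enough to warn but smaller than the paxos lease) can be checked with a little arithmetic. The sketch below is a simplified model, not how the monitors actually evaluate skew. The two thresholds are assumptions standing in for the `mon_clock_drift_allowed` and `mon_lease` settings; their values here are illustrative and should be checked against the release in use.

```shell
#!/bin/bash
# Simplified model of the skew argument; all values in milliseconds.
# The two thresholds are ASSUMED values, not taken from the commit.
drift_allowed_ms=50    # assumed warning threshold (mon_clock_drift_allowed)
lease_ms=5000          # assumed paxos lease length (mon_lease)
skew_ms=2000           # the 2 s skew chosen by the commit

warns=no
breaks_quorum=no

# A skew above the allowed drift should raise a clock skew warning.
[ "$skew_ms" -gt "$drift_allowed_ms" ] && warns=yes

# In this model, a skew at or beyond the lease is what risks quorum.
[ "$skew_ms" -ge "$lease_ms" ] && breaks_quorum=yes

echo "skew=${skew_ms}ms warns=$warns breaks_quorum=$breaks_quorum"
```

Under these assumed thresholds, a 2 s skew lands in the window the commit aims for: it warns without threatening quorum.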
Sage Weil
8d137b9345 qa/suites/rados/multimon: create_rbd_pool: false
Signed-off-by: Sage Weil <sage@redhat.com>
2019-05-30 15:43:48 -05:00
Sage Weil
1991faaafe qa/suites/rados/multimon: no osds when skewing clock
Sometimes the clock skew prevents the mon quorum from making progress and
processing the osd boot messages.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-05-30 13:05:54 -05:00
Josh Durgin
d45f18119b qa/suites: remove mon kv backend options
rocksdb is the default, leveldb is not recommended at this point, so drop it.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2019-02-08 16:58:44 -05:00
Sage Weil
af435783b4 qa/suites/rados/multimon/tasks/mon_recovery: whitelist PG_AVAILABILITY
The mgr creates a pool for device health, and mons may be thrashing and
make peering slow.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-11 09:43:07 -06:00
Sage Weil
d518eb6cac qa/msgr: move msgr facet into generic re-usable dir
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:38 -06:00
Patrick Donnelly
b39f9d06dc qa: fix symlinks indirectly pointing at qa to .qa
Building on the previous commit.

Command used:

$ find suites/ -type l -and -not -name .qa -execdir ~/fix.sh {} \;

fix.sh:
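    #!/bin/bash
    # Rewrite the symlink "$1" so that it reaches the qa directory through .qa.

    link="$(readlink "$1")"

    echo "$link"
    dirlink="$(dirname "$link")"
    baselink="$(basename "$link")"

    # Walk up the link target until a prefix resolves to ~/ceph/qa.
    while true; do
        echo "$dirlink"
        if [ "$dirlink" -ef ~/ceph/qa ]; then
            ln -nsf ".qa/$baselink" "$1"
            exit
        else
            baselink="$(basename "$dirlink")/$baselink"
            dirlink="$(dirname "$dirlink")"
            # Stop at the current directory, or at / for an absolute target.
            if [ "$dirlink" -ef . ] || [ "$dirlink" = / ]; then
                break
            fi
        fi
    done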

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-06-26 11:48:38 -07:00
Patrick Donnelly
716db6e2fd qa: add .qa helper link
This utilizes the recent feature in teuthology [1] to skip hidden files in
suites when building the job matrix.

The idea of this change is to allow referring to the top-level qa directory in
a position-independent way, so that copying a suite to another location does
not break any symlinks.

[1] https://github.com/ceph/teuthology/pull/1185

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-06-26 11:33:48 -07:00
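The position-independent linking described in 716db6e2fd can be tried out in a scratch directory. This is a minimal sketch under assumptions: the directory names and the exact chaining scheme (a top-level `.qa` pointing at the qa directory itself, and each subdirectory's `.qa` pointing at its parent's `.qa`) are hypothetical and are not taken from the commit.

```shell
#!/bin/bash
# Hypothetical layout built in a temporary directory, not the real ceph tree.
set -e
top="$(mktemp -d)"
mkdir -p "$top/qa/suites/rados/multimon" "$top/qa/objectstore"

# Assumed anchor: the top-level .qa link points at the qa directory itself.
ln -s . "$top/qa/.qa"

# Assumed chaining: each directory below links to its parent's .qa.
for d in suites suites/rados suites/rados/multimon; do
    ln -s ../.qa "$top/qa/$d/.qa"
done

# A suite can now reach a top-level directory without counting "../" hops.
ln -s .qa/objectstore "$top/qa/suites/rados/multimon/objectstore"

# Resolve both paths physically and compare them.
resolved="$(cd "$top/qa/suites/rados/multimon/objectstore" && pwd -P)"
expected="$(cd "$top/qa/objectstore" && pwd -P)"
echo "resolved: $resolved"
echo "expected: $expected"
```

Because the link target starts with `.qa/` rather than a fixed number of `../` components, the same link text works at any depth that has a `.qa` link of its own.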
David Zafman
918921ab2f test: Need to escape parens in log-whitelist for grep
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-05-21 09:47:59 -07:00
Yuri Weinstein
9f2c485942 tests/qa: adding rados/.. dirs
Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
2018-05-11 14:03:15 -07:00
Kefu Chai
acc08559ce qa/suites: whitelist SLOW_OPS
Fixes: http://tracker.ceph.com/issues/23495
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-04-10 19:25:47 +08:00
Sage Weil
ef7eaa48be qa/suites/rados: fewer msgr failures
500 is a bit much... e.g., enough to hit timeouts forming mon quorum.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-03-15 08:39:28 -05:00
Sage Weil
431d1482ff qa/suites/rados/thrash: extend mgr beacon grace when many msgr failures injected
Fixes: http://tracker.ceph.com/issues/21147
Signed-off-by: Sage Weil <sage@redhat.com>
2017-11-29 10:29:52 -06:00
Sage Weil
12007044b1 qa/suites/rados/multimon/tasks/mon_lock_with_skew: whitelist PG_
Default pool pgs not up because mons too broken for OSDs to peer.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-12 14:15:15 -04:00
Sage Weil
ad23d7dc1f qa/suites/rados/multimon: whitelist mgr down vs clock skew test
Clock skew might make us fail the mgr.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-11 13:42:02 -04:00
Kefu Chai
d12c51ca91 qa/suites: escape the parenthesis of the whitelist text
so we can avoid the warnings like

grep: Unmatched ( or \(

because we pass the whitelisted string to `egrep -v "$1"` directly.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-08-01 21:54:44 +08:00
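The grep warning mentioned in d12c51ca91 can be reproduced outside teuthology. In this sketch the log line and both whitelist entries are hypothetical; the unescaped entry is deliberately unbalanced, since that is the case that makes the extended regular expression invalid.

```shell
#!/bin/bash
# Hypothetical log line and whitelist entries, for illustration only.
log_line='cluster [WRN] Health check failed: 1 slow ops (SLOW_OPS)'

unescaped='(SLOW_OPS'      # unbalanced "(" is an ERE syntax error
escaped='\(SLOW_OPS\)'     # escaped parentheses match literally

# grep exits with status 2 on a pattern error, so nothing is filtered.
printf '%s\n' "$log_line" | grep -E -v "$unescaped" 2>/dev/null
bad_status=$?

# With escaped parentheses the line matches, and -v drops it (status 1).
remaining="$(printf '%s\n' "$log_line" | grep -E -v "$escaped")"
good_status=$?

echo "unescaped exit status: $bad_status"
echo "escaped exit status: $good_status, remaining: '$remaining'"
```

`grep -E` is used here as the equivalent of the `egrep` call quoted in the commit message.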
Sage Weil
93de19adcf qa: whitelist health warnings
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-12 12:52:03 -04:00
Sage Weil
63f97ddcf6 qa/suites/rados: whitelist health warnings
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-12 12:52:02 -04:00
Sage Weil
5f5f370925 qa/suites/rados: switch require-luminous facet to use full_sequential_finally
This lets us run multiple cleanup steps right before ceph
teardown.

Note that we drop the facet from multimon/ because it
doesn't factor out cluster creation before this step
properly.  That's fine because the require_luminous
cleanup shouldn't be related to the multimon tests.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-05-05 13:39:14 -04:00
Sage Weil
4857f51e68 qa/suites/rados: expand other collections with no-require-luminous
Signed-off-by: Sage Weil <sage@redhat.com>
2017-04-14 11:45:05 -04:00
Sage Weil
14e7d6351a Merge pull request #14198 from liewegas/wip-fs
qa/suites: drop 'fs' facet, and add 'objectstore' facet where missing

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-04-08 13:21:03 -05:00
Sage Weil
271a7588b5 qa/suites: run mgr daemon(s)
Everything but upgrade/, which will be slightly tricky.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-29 11:39:26 -04:00
Sage Weil
73981ad807 qa/suites: remove 'fs' facet from all tests
The objectstore facet now covers bluestore, filestore(xfs),
and filestore(btrfs).

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-28 11:57:21 -04:00
Adam C. Emerson
750ad8340c common: Unskew clock
In preparation to deglobalizing CephContext, remove the CephContext*
parameter to ceph_clock_now() and ceph::real_clock::now() that carries
a configurable offset.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
2016-12-22 13:55:37 -05:00
Sage Weil
c01f2ee0e2 move ceph-qa-suite dirs into qa/
2016-12-14 11:29:55 -06:00