Commit Graph

33 Commits

Author SHA1 Message Date
Patrick Donnelly
50c39dc007
qa: split fs begin task
To allow switching to cephadm task.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2022-02-02 10:44:35 -05:00
Patrick Donnelly
83d252cc30 qa: fold frag confs into conf/mds.yaml
These overrides are standard for all configurations. The config to
enable fragmentation is also long removed.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-26 07:14:38 -07:00
Patrick Donnelly
ec1b82fd24
qa: skip exit-on-first-failure option for valgrind on ubuntu
The valgrind version is too old.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-03-03 09:30:21 -08:00
Patrick Donnelly
5faf0ee0f3
mds,qa: exit instead of respawn under valgrind
valgrind can't handle execve of /proc/self/exe:

    2021-02-27T05:52:37.813 INFO:tasks.ceph.mds.d.smithi073.stderr:==00:01:03:20.556 41218== execve(0x18546740(/proc/self/exe), 0x18546670, 0x133ef310) failed, errno 2
    2021-02-27T05:52:37.813 INFO:tasks.ceph.mds.d.smithi073.stderr:==00:01:03:20.556 41218== EXEC FAILED: I can't recover from execve() failing, so I'm dying.
    2021-02-27T05:52:37.813 INFO:tasks.ceph.mds.d.smithi073.stderr:==00:01:03:20.556 41218== Add more stringent tests in PRE(sys_execve), or work out how to recover.

So configure the MDS to just exit so it can be restarted by QA infra (the
daemon watchdog).

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-03-03 09:30:21 -08:00
Patrick Donnelly
1d85c9d535
qa: ignore all slow request warnings
Generalize the ignorelist for:

    2021-02-27T05:54:27.644 INFO:teuthology.orchestra.run.smithi002.stdout:2021-02-27T05:20:24.513041+0000 mds.d (mds.0) 1 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 183.680676 secs

From: /ceph/teuthology-archive/pdonnell-2021-02-26_23:40:39-fs-wip-pdonnell-testing-20210226.181017-distro-basic-smithi/5917580/teuthology.log

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-03-03 09:30:21 -08:00
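
A minimal sketch of what generalizing the ignorelist means for commit 1d85c9d535 above. It assumes that ignorelist entries behave like extended regular expressions applied to warning lines in the cluster log. The pattern `slow requests?`, the `filter_log` helper, and the second log line are hypothetical and only serve the illustration; they are not taken from the commit.

```shell
#!/bin/bash
# Cluster log line quoted in the commit message (leading timestamps trimmed).
line='mds.d (mds.0) 1 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 183.680676 secs'
# Hypothetical unrelated warning, used only to show that it is not filtered out.
other='cluster [WRN] some other warning (hypothetical)'

# Hypothetical generalized entry: matches any "slow request" / "slow requests" text.
ignore_pattern='slow requests?'

# Keep warning/error lines, then drop those matching the ignorelist entry.
# "|| true" keeps the exit status clean when nothing is left after filtering.
filter_log() {
    grep -E '\[(ERR|WRN|SEC)\]' | grep -Ev "$ignore_pattern" || true
}

remaining="$(printf '%s\n%s\n' "$line" "$other" | filter_log)"
echo "$remaining"
```

A broad pattern like this also hides slow request warnings that might point at a real problem, which is presumably an accepted trade-off for these QA runs.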
Patrick Donnelly
dcac1dbe62
qa: add new mds beacon grace mon config
Otherwise the mons don't observe it.

Fixes: https://tracker.ceph.com/issues/49507
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-03-03 09:30:21 -08:00
Patrick Donnelly
6093b3a581
qa: run fs:verify on all distros
It's believed this is no longer a problem now that we use tcmalloc.

Fixes: https://tracker.ceph.com/issues/49391
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-02-25 13:27:24 -08:00
Sage Weil
dc64ccf063 qa/suites: do not use notcmalloc flavor
teuthology now knows how to run valgrind against a tcmalloc binary

Signed-off-by: Sage Weil <sage@newdream.net>
2021-02-18 10:26:28 -06:00
Patrick Donnelly
7f449dd09f
qa: merge multimds:verify with fs:verify
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Fixes: https://tracker.ceph.com/issues/48121
2021-01-07 12:55:25 -08:00
Patrick Donnelly
36d731c6f3
qa: only run valgrind on cephfs daemons
OSD valgrind slows things down too much to the point where some tasks
fail to complete.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-01-07 12:55:24 -08:00
Xiubo Li
0422673b61 qa/cephfs: add session_timeout option support
When the MDS revokes the Fwbl caps, the clients need to flush
the dirty data back to the OSDs, but the flush may overload the
OSDs and slow them down, so it may take more than 60 seconds to
finish. The MDS daemons will then report the WRN messages.

For the teuthology test cases, let's just increase the timeout
value to make it work.

Fixes: https://tracker.ceph.com/issues/47565
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2020-10-23 14:27:37 +08:00
Sage Weil
2ee9365d0b qa: log-whitelist -> log-ignorelist
Signed-off-by: Sage Weil <sage@newdream.net>
2020-08-24 19:53:08 +00:00
Patrick Donnelly
1fc33c54f8
qa: specify random distros in multimds
Note: the name is important so that kclient mount can override the
distro setting.

Fixes: https://tracker.ceph.com/issues/43968
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2020-02-05 12:36:50 -08:00
Patrick Donnelly
2cdb2972cd
qa: define centos version for fs:verify
Otherwise it uses the teuthology default of 7.6.

Fixes: https://tracker.ceph.com/issues/43516
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2020-01-07 13:20:00 -08:00
Sage Weil
cf352c3ac0 osd: add osd_fast_shutdown option (default true)
If we get a SIGINT or SIGTERM or are deleted from the OSDMap, do a fast
shutdown by exiting immediately.  This has a few important benefits:

 - We immediately stop responding (binding) to any sockets, which means
   other OSDs will immediately decide we are down (and dead!).  This
   minimizes IO interruption.
 - We avoid the complex "clean" shutdown process, which is historically a
   source of bugs.

In reality, the only purpose of the "clean" shutdown is to try to tear down
everything in memory so we can do memory leak checking with valgrind.  Set
this option to false for valgrind QA runs so we can still do that.

Note that with the new read leases in octopus, we rely on the default
behavior that an ECONNREFUSED is taken to mean that the OSD is fully dead,
so that we don't have to wait for any leases to time out.  This works in
sane environments with normal IP networks, but that behavior could
conceivably be a bad idea if there are some weird network shenanigans
going on.  If osd_fast_fail_on_connection_refused were disabled, then this
fast shutdown procedure might be *worse* than the clean shutdown because
we would have to wait for the heartbeat timeout.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-15 09:31:50 -06:00
Patrick Donnelly
7b520755ce
qa: extend MDS heartbeat grace for valgrind
Valgrind makes the MDS slowwwww. The newish mds_heartbeat_grace config allows
us to keep sending beacons to the mons even if the internal heartbeat is slow.
This avoids the laggy messages which are useful to grep for unrelated messaging
issues.

Fixes: http://tracker.ceph.com/issues/38723
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-03-13 09:18:32 -07:00
Sage Weil
e79dc454db qa/suites: disable valgrind leak checks on ceph-mgr
We've disabled the "clean" shutdown in ceph-mgr due to
https://tracker.ceph.com/issues/38621

Until then, no valgrind leak checks!

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-07 13:03:28 -06:00
Sage Weil
03908113b4 qa/suites: valgrind ceph-mgr too
Signed-off-by: Sage Weil <sage@redhat.com>
2018-11-09 08:52:07 -06:00
Patrick Donnelly
73fa0efcbb
qa: create common conf for all cephfs suites
This will be followed by removing common CephFS configurations in the
ceph.conf.template in teuthology.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-07-04 13:08:10 -07:00
Patrick Donnelly
b39f9d06dc
qa: fix symlinks indirectly pointing at qa to .qa
Building on the previous commit.

Command used:

    $ find suites/ -type l -and -not -name .qa -execdir ~/fix.sh {} \;

fix.sh:
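    #!/bin/bash
    # Rewrite symlink "$1" so that it points through the .qa helper link.

    link="$(readlink "$1")"

    echo "$link"
    dirlink="$(dirname "$link")"
    baselink="$(basename "$link")"

    # Walk up the link target's directories until one resolves to qa/.
    while true; do
        echo "$dirlink"
        if [ "$dirlink" -ef ~/ceph/qa ]; then
            ln -nsf ".qa/$baselink" "$1"
            exit
        else
            baselink="$(basename "$dirlink")/$baselink"
            dirlink="$(dirname "$dirlink")"
            # Give up once the remaining path resolves to the current directory.
            if [ "$dirlink" -ef . ]; then
                break
            fi
        fi
    done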

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-06-26 11:48:38 -07:00
Patrick Donnelly
716db6e2fd
qa: add .qa helper link
This utilizes the recent feature in teuthology [1] to skip hidden files in
suites when building the job matrix.

The idea of this change is to enable referring to the top-level qa directory in a
position-independent way such that copies of a suite to another location do not
break any symlinks.

[1] https://github.com/ceph/teuthology/pull/1185

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-06-26 11:33:48 -07:00
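
A minimal sketch of the position-independence idea from commit 716db6e2fd above. The directory layout, the file names, and the choice to point each `.qa` link one level up are assumptions made for this example in a scratch directory; they are not copied from the Ceph tree.

```shell
#!/bin/bash
set -eu

# Hypothetical layout in a scratch directory.
work="$(mktemp -d)"
mkdir -p "$work/qa/cephfs" "$work/qa/suites/fs/verify" "$work/qa/suites/a/b"
echo "common conf" > "$work/qa/cephfs/conf.yaml"

# One hidden helper link per directory level, so that .qa always
# resolves to the top-level qa directory.
ln -s ..     "$work/qa/suites/.qa"
ln -s ../.qa "$work/qa/suites/fs/.qa"
ln -s ../.qa "$work/qa/suites/fs/verify/.qa"
ln -s ../.qa "$work/qa/suites/a/.qa"
ln -s ../.qa "$work/qa/suites/a/b/.qa"

# The suite refers to shared content through .qa rather than ../../../
ln -s .qa/cephfs/conf.yaml "$work/qa/suites/fs/verify/conf.yaml"

# Copy the suite one level deeper; symlinks are copied as-is (cp -a).
cp -a "$work/qa/suites/fs/verify" "$work/qa/suites/a/b/verify"

# Both the original and the copy still resolve to the shared file.
cat "$work/qa/suites/fs/verify/conf.yaml"
cat "$work/qa/suites/a/b/verify/conf.yaml"
```

A link written as `../../../cephfs/conf.yaml` would resolve correctly only at the original depth and would break in the deeper copy, which is the situation the helper link is meant to avoid.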
Sage Weil
d0732fc96f qa/cephfs: test ec data pool
Signed-off-by: Sage Weil <sage@redhat.com>
2017-10-23 21:11:24 -05:00
Patrick Donnelly
9d348ad8c9
qa: add health whitelist for all fs sub-suites
Fixes: http://tracker.ceph.com/issues/20892

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-08-03 14:01:28 -07:00
Sage Weil
960f00071f qa/suites: disable mon crush smoke test with valgrind
Valgrind runs itself on forked children, and does its cleanup when they
complete, and this is slow... slow enough that it frequently makes the
test time out.

Valgrind lets you ignore child *processes* that you exec, but I can't
find a way to skip forked children in the same address space.

Work around this by skipping this validation when running under valgrind.

Fixes: http://tracker.ceph.com/issues/20602
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-14 11:51:47 -04:00
Sage Weil
c7893283cd do all valgrind runs on centos
We are fighting two issues with valgrind on ubuntu (xenial, yakkety,
and z):

	http://tracker.ceph.com/issues/18126
	http://tracker.ceph.com/issues/20360

Revert this when it is fixed.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-30 09:33:18 -04:00
Greg Farnum
7d33e98bd3 qa: do not restrict valgrind runs to centos
This reverts 693bd23851, which was
added in response to http://tracker.ceph.com/issues/18126. But
we updated the Ubuntu packages in sepia so it should be good to go.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2017-06-23 16:25:16 -04:00
Sage Weil
aa76cf7488 Revert "qa: do not restrict valgrind runs to centos"
This reverts commit 5923961465.

See http://tracker.ceph.com/issues/20360

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-20 17:14:52 -04:00
Greg Farnum
5923961465 qa: do not restrict valgrind runs to centos
This reverts 693bd23851, which was
added in response to http://tracker.ceph.com/issues/18126. But
we updated the Ubuntu packages in sepia so it should be good to go.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2017-05-31 08:37:19 -07:00
John Spray
6369120d63 qa/suites: don't use btrfs for cephfs testing
This change happened a while back, but it got rolled back
when the generic objectstore/ dir had its filestore
entry split out into xfs and btrfs in 208675af.

Signed-off-by: John Spray <john.spray@redhat.com>
2017-04-24 11:19:55 +01:00
John Spray
131d1bd570 qa: add log whitelists for MDS health messages
Now that we send these to the cluster log, we must
whitelist them in the tests that exercise those
unhealthy states.

Fixes: http://tracker.ceph.com/issues/19551
Signed-off-by: John Spray <john.spray@redhat.com>
2017-04-14 05:47:43 -04:00
Sage Weil
73981ad807 qa/suites: remove 'fs' facet from all tests
The objectstore facet now covers bluestore, filestore(xfs),
and filestore(btrfs).

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-28 11:57:21 -04:00
John Spray
76b73befd9 qa: remove simple functional tests from multimds
These were running so few ops that they weren't
giving any meaningful exercise to a multimds
system beyond what we're already covering in
the fs suite.

Signed-off-by: John Spray <john.spray@redhat.com>
2017-02-07 13:51:47 +00:00
Sage Weil
c01f2ee0e2 move ceph-qa-suite dirs into qa/
2016-12-14 11:29:55 -06:00