the whitelist was added to test the "sdk" class, but if we add new
classes, and add tests exercising them, the tests fail unless we also
update these `rados_cls_all.yaml` files accordingly.
so in this change, the list is now '*', which allows the OSD to load all
classes found in the specified directory.
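
A minimal sketch of the wildcard semantics described above, written in Python rather than the OSD's actual C++. The function name `class_allowed` and the space-separated list format are assumptions made for illustration; this is not the OSD's real implementation.

```python
def class_allowed(name: str, load_list: str) -> bool:
    """Return True if `name` may be loaded under the given load list.

    `load_list` is assumed to be a space-separated string of class
    names; a '*' entry is treated as "allow everything".
    """
    entries = load_list.split()
    return "*" in entries or name in entries


# With an explicit list, a newly added class is rejected until the
# list is updated by hand.
print(class_allowed("sdk", "sdk hello"))
print(class_allowed("newclass", "sdk hello"))

# With '*', every class found in the directory is accepted.
print(class_allowed("newclass", "*"))
```

The point of the wildcard is that the QA configuration no longer has to track the set of classes under test.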
Fixes: https://tracker.ceph.com/issues/45113
Signed-off-by: Kefu Chai <kchai@redhat.com>
This reverts commit 4f742f200df6c91db87bfee1109c37fad3c0548b.
This was in the wrong file; see valgrind.yaml.
Signed-off-by: Sage Weil <sage@redhat.com>
This reverts commit 65e81e6eb4f136bf21b67e5de10ab49f028f9e95.
This slows things down too much with valgrind.
Signed-off-by: Sage Weil <sage@redhat.com>
When running under valgrind (and thrashing) things can be slow. Tell
tests in case they need to tolerate timeouts.
Signed-off-by: Sage Weil <sage@redhat.com>
The rados api tests are failing WatchNotify because the OSDs are so
heavily lagged, in large part due to the high debug levels of debug_ms=20
and debug_osd=25. Reduce those.
Also increase the heartbeat grace so slow valgrind-y osds don't get marked
down.
Signed-off-by: Sage Weil <sage@redhat.com>
The simple

  os_type: centos

in valgrind.yaml doesn't pick a particular centos, and we end up with
the teuthology default (currently 7.6).
Signed-off-by: Sage Weil <sage@redhat.com>
If we get a SIGINT or SIGTERM or are deleted from the OSDMap, do a fast
shutdown by exiting immediately. This has a few important benefits:
- We immediately stop responding (binding) to any sockets, which means
other OSDs will immediately decide we are down (and dead!). This
minimizes IO interruption.
- We avoid the complex "clean" shutdown process, which is historically a
source of bugs.
In reality, the only purpose of the "clean" shutdown is to try to tear down
everything in memory so we can do memory leak checking with valgrind. Set
this option to false for valgrind QA runs so we can still do that.
Note that with the new read leases in octopus, we rely on the default
behavior that an ECONNREFUSED is taken to mean that the OSD is fully dead,
so that we don't have to wait for any leases to time out. This works in
sane environments with normal IP networks, but that behavior could
conceivably be a bad idea if there are some weird network shenanigans
going on. If osd_fast_fail_on_connection_refused were disabled, then this
fast shutdown procedure might be *worse* than the clean shutdown because
we would have to wait for the heartbeat timeout.
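
The two shutdown paths described above can be sketched roughly as follows. This is an illustrative Python model, not the OSD code: `make_shutdown_handler`, the `fast_shutdown` flag, and the `clean_teardown` / `exit_now` callbacks are hypothetical names introduced here.

```python
import signal
from typing import Callable, List


def make_shutdown_handler(
    fast_shutdown: bool,
    clean_teardown: Callable[[], None],
    exit_now: Callable[[int], None],
) -> Callable[[int, object], None]:
    """Build a signal handler that exits at once or tears down first."""

    def handler(signum: int, frame: object) -> None:
        if fast_shutdown:
            # Exit immediately: sockets close, so peers get
            # ECONNREFUSED and can treat this daemon as dead.
            exit_now(0)
            return
        # Slow path, kept for leak checking: free everything in
        # memory first, then exit.
        clean_teardown()
        exit_now(0)

    return handler


# Record what each mode would do instead of really exiting.
events: List[str] = []
fast = make_shutdown_handler(
    True, lambda: events.append("teardown"), lambda c: events.append("exit")
)
clean = make_shutdown_handler(
    False, lambda: events.append("teardown"), lambda c: events.append("exit")
)
fast(signal.SIGTERM, None)
clean(signal.SIGTERM, None)
print(events)
```

In a real daemon the handler would be registered with `signal.signal` for SIGINT and SIGTERM, and `exit_now` would be something like `os._exit`; a valgrind run would build the handler with the flag set to false.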
Signed-off-by: Sage Weil <sage@redhat.com>
We've disabled the "clean" shutdown in ceph-mgr due to
https://tracker.ceph.com/issues/38621
Until that is fixed, no valgrind leak checks!
Signed-off-by: Sage Weil <sage@redhat.com>
This utilizes the recent feature in teuthology [1] to skip hidden files in
suites when building the job matrix.
The idea of this change is to enable referring to the top-level qa directory in a
position-independent way such that copies of a suite to another location do not
break any symlinks.
[1] https://github.com/ceph/teuthology/pull/1185
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
the distro specified by random-distro$ will be overwritten by the one
specified by valgrind.yaml, and teuthology-suite will give
  KeyError: '16.04 not a centos version or codename'
when scheduling a suite involving the facets above. also, i think there
is not much value in running valgrind/lockdep with different distros.
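
One plausible way the error above arises can be sketched in Python. Everything here is an assumption for illustration: the `KNOWN_VERSIONS` table, the fragment contents, and the helper names are hypothetical and do not reproduce teuthology's code.

```python
# Hypothetical table of known versions per distro (illustrative only).
KNOWN_VERSIONS = {
    "centos": {"7.6"},
    "ubuntu": {"16.04"},
}


def merge_fragments(*fragments: dict) -> dict:
    """Merge YAML-like fragments; later fragments override earlier keys."""
    merged: dict = {}
    for fragment in fragments:
        merged.update(fragment)
    return merged


def validate_distro(job: dict) -> None:
    """Raise KeyError if the version does not belong to the distro."""
    os_type, os_version = job["os_type"], job["os_version"]
    if os_version not in KNOWN_VERSIONS[os_type]:
        raise KeyError(f"{os_version} not a {os_type} version or codename")


# A randomly chosen distro facet, then the valgrind facet on top.
random_distro = {"os_type": "ubuntu", "os_version": "16.04"}
valgrind = {"os_type": "centos"}  # overrides the type but not the version

job = merge_fragments(random_distro, valgrind)
try:
    validate_distro(job)
except KeyError as err:
    print("KeyError:", err)
```

The merged job ends up with a centos `os_type` and an ubuntu `os_version`, which is the mismatch the error message complains about.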
Signed-off-by: Kefu Chai <kchai@redhat.com>
replica log was for the old radosgw sync agent, which was replaced with
multisite v2 in jewel. no sense in continuing to maintain and test it
Signed-off-by: Casey Bodley <cbodley@redhat.com>
PGs are created when pg-num and pgp-num are increased, so
PG_AVAILABILITY is reported at that moment. whitelist it in all tests
which run rados/test.sh, since that script exercises ceph_test_rados_api_list.
Fixes: http://tracker.ceph.com/issues/23763
Signed-off-by: Kefu Chai <kchai@redhat.com>
- snapdir conversion (at-end) stuff
- merge luminous-specific collections that avoided the above back
into their normal locations
Signed-off-by: Sage Weil <sage@redhat.com>
so we can avoid warnings like
  grep: Unmatched ( or \(
because we pass the whitelisted string to `egrep -v "$1"` directly.
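
The failure mode can be reproduced in miniature with Python's `re` module standing in for `egrep`. The two regex dialects differ in detail, and the helper `filter_whitelisted` and the sample log lines are made up for this sketch.

```python
import re


def filter_whitelisted(lines, whitelist):
    """Drop lines matching any whitelist pattern, like `egrep -v`."""
    patterns = [re.compile(entry) for entry in whitelist]
    return [ln for ln in lines if not any(p.search(ln) for p in patterns)]


log = ["overall HEALTH_WARN", "(PG_AVAILABILITY) reduced", "real error"]

# An unescaped '(' opens a group that is never closed, so the
# pattern is rejected before any matching happens.
try:
    filter_whitelisted(log, ["(PG_AVAILABILITY"])
except re.error as err:
    print("bad pattern:", err)

# Escaping the parentheses makes them literal characters.
print(filter_whitelisted(log, [r"\(PG_AVAILABILITY\)", "overall HEALTH_"]))
```

Because whitelist entries are used as patterns rather than fixed strings, any literal parenthesis in them has to be escaped.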
Signed-off-by: Kefu Chai <kchai@redhat.com>
Valgrind runs itself on forked children, and does its cleanup when they
complete, and this is slow... slow enough that it frequently makes the
test time out.
Valgrind lets you ignore child *processes* that you exec, but I can't
find a way to skip forked children in the same address space.
Work around this by skipping this validation when running under valgrind.
Fixes: http://tracker.ceph.com/issues/20602
Signed-off-by: Sage Weil <sage@redhat.com>