* refs/pull/32232/head:
qa: no need to exclude ceph-mgr-diskprediction-cloud from package list to be installed
qa/packages: do not install ceph-mgr-diskprediction-cloud by default
ceph.spec.in: add runtime deps for mgr-diskprediction-cloud
Reviewed-by: Sage Weil <sage@redhat.com>
* refs/pull/32377/head:
qa/suites/rados/thrash-old-clients: configure mons in terms of addrvecs
qa/suites/rados/thrash-old-clients: hammer: fix package list
qa/tasks/cephadm: set .conf to cluster config object
qa/tasks/cephadm: archive /var/log/ceph logs too (not just cluster dir)
qa/tasks/cephadm: client keyring
qa/tasks/cephadm: setup thrashers ctx item
qa/tasks/ceph_manager: asok commands via cephadm shell
qa/suites/rados/thrash-old-clients: stick to el7
qa/tasks/cephadm: check cluster log; support log-whitelist
qa/suites/rados/thrash-old-clients: python-foo to python3-foo
qa/suites/rados/thrash-old-clients: add new exclude_packages
qa/suites/rados/thrash-old-clients: use cephadm
mon/ConfigMonitor: make legacy mon addr/port parseable by legacy code
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
This is more explicit. More importantly, the 'mon update' command
can't handle an "ip:port"; it wants either a CIDR, bare IP, or addrvec.
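For illustration (the addresses here are hypothetical), the accepted forms look
roughly like:
    10.2.0.0/16                                # CIDR
    10.2.0.101                                 # bare IP
    [v2:10.2.0.101:3300,v1:10.2.0.101:6789]    # addrvec
whereas an "ip:port" such as 10.2.0.101:6789 is rejected.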
Signed-off-by: Sage Weil <sage@redhat.com>
- deploy the cluster with cephadm so we can run an octopus+ cluster and also
install ancient client packages; see the sketch after this list.
- move client.2 back onto the third node, since packages no longer
conflict.
- test on centos 7.x (I picked 7.6), since the old releases were all built
on that release.
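Roughly, the suite fragment takes this shape (the branch and package names here
are illustrative, not the exact lists the suite uses):
    tasks:
    - install:
        branch: hammer
        exclude_packages:
          - ceph-mgr        # illustrative: packages that don't exist for old releases
          - python3-rados
    - cephadm: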
Signed-off-by: Sage Weil <sage@redhat.com>
The simple
os_type: centos
in valgrind.yaml doesn't pick a particular centos, and we end up with
the teuthology default (currently 7.6).
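One way to pin it is to set the version explicitly in the fragment; a minimal
sketch, assuming teuthology's os_version key (the version chosen here is
illustrative):
    os_type: centos
    os_version: "7.6"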
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/31502/head:
qa/tasks/ceph2: get ceph-daemon from same place as ceph
qa/tasks/ceph2: use safe_while
qa/tasks/ceph2: pull image using sha1
qa/tasks/ceph2: docker needs quay.io/ prefix for image name
qa/workunits/rados/test_python: make sure rbd pool exists
qa/suites/rados/ssh: new tests!
qa/tasks/ceph2: pull ceph-ci/ceph:$branch
qa/tasks/ceph2: register_daemons after pods start
qa/tasks/ceph2: fix conf
qa/tasks/ceph2: add restart
qa/tasks/ceph2: pass ceph-daemon path to DaemonState
qa/tasks/ceph2: tolerate no mdss or 1 mgr
qa/tasks/ceph: replace wait_for_osds_up with manager.wait_for_all_osds_up
qa/tasks/ceph: wait-until-healthy
qa/tasks/ceph2: set up managers
qa/tasks/ceph2: use seed ceph.conf
qa/tasks/ceph: healthy: use manager helpers (instead of teuthology/misc ones)
qa/tasks/ceph2: name mds daemons
qa/tasks/ceph2: fix osd ordering
qa/tasks/ceph2: start up mdss
qa/tasks/ceph2: set up daemon handles and use them to stop
qa/tasks/ceph2: make it multicluster-aware
qa/tasks/ceph2: can bring up mon, mgr, osds!
qa/tasks/ceph2: basic task to bring up cluster with ceph-daemon and ssh
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
The balancer triggers peering, which may make PGs briefly go inactive, possibly
before they have ever been active. E.g.,
"PG_AVAILABILITY": {
"severity": "HEALTH_WARN",
"summary": {
"message": "Reduced data availability: 3 pgs inactive, 3 pgs peering",
"count": 6
},
"detail": [
{
"message": "pg 2.6 is stuck peering since forever, current state peering, last acting [2,0]"
},
{
"message": "pg 2.1c is stuck peering since forever, current state peering, last acting [2,1]"
},
{
"message": "pg 2.7a is stuck peering since forever, current state peering, last acting [2,0]"
}
]
}
Signed-off-by: Sage Weil <sage@redhat.com>
In cephtool/test.sh, we run
    ceph fs set cephfs inline_data {1,0}
so the health check fails when the test ends, like
    "mon.a (mon.0) 3498 : cluster [WRN] Health check failed: 1 filesystem
    with deprecated feature inline_data (FS_INLINE_DATA_DEPRECATED)" in
    cluster log
So, until we remove that test, we need to whitelist this warning; see the
sketch below.
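A whitelist entry for this warning in a qa suite fragment would look roughly
like the following (a sketch of the usual log-whitelist convention; which
fragment carries it and its exact contents are up to the suite):
    overrides:
      ceph:
        log-whitelist:
          - \(FS_INLINE_DATA_DEPRECATED\)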
Signed-off-by: Kefu Chai <kchai@redhat.com>
If we get a SIGINT or SIGTERM or are deleted from the OSDMap, do a fast
shutdown by exiting immediately. This has a few important benefits:
- We immediately stop responding to (and binding) any sockets, which means
other OSDs will immediately decide we are down (and dead!). This
minimizes IO interruption.
- We avoid the complex "clean" shutdown process, which is historically a
source of bugs.
In reality, the only purpose of the "clean" shutdown is to try to tear down
everything in memory so we can do memory leak checking with valgrind. Set
this option to false for valgrind QA runs so we can still do that.
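Assuming the new option is named osd_fast_shutdown (a guess based on the
behavior described here), the valgrind override would be a conf fragment along
these lines:
    overrides:
      ceph:
        conf:
          osd:
            osd fast shutdown: false   # keep the clean shutdown so valgrind can check for leaks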
Note that with the new read leases in octopus, we rely on the default
behavior that an ECONNREFUSED is taken to mean the OSD is fully dead,
so we don't have to wait for any leases to time out. This works in
sane environments with normal IP networks, but that behavior could
conceivably be a bad idea if there are some weird network shenanigans
going on. If osd_fast_fail_on_connection_refused were disabled, then this
fast shutdown procedure might be *worse* than the clean shutdown because
we would have to wait for the heartbeat timeout.
Signed-off-by: Sage Weil <sage@redhat.com>