The dnsmasq package on centos 8.0 is broken, see
https://tracker.ceph.com/issues/43744
For now, run this test on ubuntu.
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/31232/head:
test: test case for openfiletable MAX_ITEMS_PER_OBJ value verification
mds/OpenFileTable: match MAX_ITEMS_PER_OBJ to osd_deep_scrub_large_omap_object_key_threshold
Reviewed-by: Zheng Yan <zyan@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
The cores will make teuthology fail the job--and we don't want them for
this test, where we are deliberately causing crashes.
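A minimal sketch of one generic way to keep a deliberately crashed process from leaving core files: lower RLIMIT_CORE to 0 before starting it. This is an assumption for illustration, not necessarily how the teuthology change does it, and `disable_core_dumps` is a hypothetical helper.

```python
import resource


def disable_core_dumps() -> None:
    """Set soft and hard RLIMIT_CORE to 0 for this process (hypothetical helper).

    The limit is inherited across fork/exec, so daemons started afterwards
    should not write core files when a test crashes them on purpose
    (assuming a plain core_pattern; a piped core_pattern handler may apply
    its own policy).
    """
    # An unprivileged process may lower its hard limit but not raise it again,
    # so this cannot be undone later in the same process.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))


if __name__ == "__main__":
    disable_core_dumps()
    print(resource.getrlimit(resource.RLIMIT_CORE))  # (0, 0)
```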
Fixes: https://tracker.ceph.com/issues/43653
Signed-off-by: Sage Weil <sage@redhat.com>
the hadoop branch rel/release-2.8.5 fails to build with:
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:37 min
[INFO] Finished at: 2020-01-14T13:09:02Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (create-parallel-tests-dirs) on project hadoop-aws: An Ant BuildException has occured: Unable to create javax script engine for javascript
Signed-off-by: Casey Bodley <cbodley@redhat.com>
this was added to test that admin apis forward relevant requests to the
master zone, but radosgw_admin_rest.py tries to create an admin user
with 'radosgw-admin user create'. this fails with:
Please run the command on master zone. Performing this operation on
non-master zone leads to inconsistent metadata between zones
Are you sure you want to go ahead? (requires --yes-i-really-mean-it)
Signed-off-by: Casey Bodley <cbodley@redhat.com>
* refs/pull/32232/head:
qa: no need to exclude ceph-mgr-diskprediction-cloud from package list to be installed
qa/packages: do not install ceph-mgr-diskprediction-cloud by default
ceph.spec.in: add runtime deps for mgr-diskprediction-cloud
Reviewed-by: Sage Weil <sage@redhat.com>
* refs/pull/32377/head:
qa/suites/rados/thrash-old-clients: configure mons in terms of addrvecs
qa/suites/rados/thrash-old-clients: hammer: fix package list
qa/tasks/cephadm: set .conf to cluster config object
qa/tasks/cephadm: archive /var/log/ceph logs too (not just cluster dir)
qa/tasks/cephadm: client keyring
qa/tasks/cephadm: setup thrashers ctx item
qa/tasks/ceph_manager: asok commands via cephadm shell
qa/suites/rados/thrash-old-clients: stick to el7
qa/tasks/cephadm: check cluster log; support log-whitelist
qa/suites/rados/thrash-old-clients: python-foo to python3-foo
qa/suites/rados/thrash-old-clients: add new exclude_packages
qa/suites/rados/thrash-old-clients: use cephadm
mon/ConfigMonitor: make legacy mon addr/port parseable by legacy code
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
This is more explicit. More importantly, the 'mon update' command
can't handle an "ip:port"; it wants either a CIDR, bare IP, or addrvec.
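A hypothetical Python helper showing how the plain address forms can be told apart with the standard `ipaddress` module. Per the message above, 'mon update' accepts a CIDR, a bare IP or an addrvec but not "ip:port", so a caller could use a check like this to catch the unsupported form early. This is a sketch, not the monitor's own parser.

```python
import ipaddress


def classify_mon_addr(value: str) -> str:
    """Return 'cidr', 'ip' or 'ip:port' for a plain address string (hypothetical helper).

    Addrvec syntax such as "[v2:1.2.3.4:3300,v1:1.2.3.4:6789]" is out of scope,
    and an IPv6 address with a port must be bracketed, e.g. "[::1]:3300".
    """
    if "/" in value:
        ipaddress.ip_network(value, strict=False)  # ValueError if malformed
        return "cidr"
    try:
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        pass
    host, sep, port = value.rpartition(":")
    if sep and port.isdigit():
        ipaddress.ip_address(host.strip("[]"))  # ValueError if malformed
        return "ip:port"
    raise ValueError(f"unrecognized address: {value!r}")


for v in ("10.0.0.0/24", "10.0.0.1", "10.0.0.1:6789"):
    print(v, "->", classify_mon_addr(v))
```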
Signed-off-by: Sage Weil <sage@redhat.com>
The OS image was changed in a9ee4bcf24 from Xenial to Bionic,
but the Bionic image path is incorrect.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
- deploy cluster with cephadm so we can run an octopus+ cluster and also
install client packages that are ancient.
- move client.2 back onto the third node, since packages no longer
conflict.
- test on centos 7.x (I picked 7.6), since the old releases were all built
for that release.
Signed-off-by: Sage Weil <sage@redhat.com>
The simple
os_type: centos
in valgrind.yaml doesn't pick a particular centos, and we end up with
the teuthology default (currently 7.6).
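A toy model of the behaviour described: if a job sets only os_type, the version comes from a scheduler-wide default instead of from the job. The helper and the defaults table are hypothetical. Pinning is assumed to be done with an os_version key next to os_type.

```python
from typing import Mapping, Optional, Tuple


def resolve_os(job: Mapping[str, str],
               defaults: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (os_type, os_version) for a job (hypothetical helper).

    An explicit os_version in the job wins; otherwise the per-distro
    default applies, which can change underneath the suite over time.
    """
    os_type = job["os_type"]
    return os_type, job.get("os_version", defaults.get(os_type))


# "7.6" is the teuthology default for centos quoted in the message above.
DEFAULTS = {"centos": "7.6"}
print(resolve_os({"os_type": "centos"}, DEFAULTS))                       # ('centos', '7.6')
print(resolve_os({"os_type": "centos", "os_version": "8.0"}, DEFAULTS))  # ('centos', '8.0')
```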
Signed-off-by: Sage Weil <sage@redhat.com>
- This is an ancient swift version
- The tempest tests are newer and should provide similar coverage
- It somehow broke with the py3 transition
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/32278/head:
qa/suites/rgw: disable valgrind for tests that require py2/ubuntu
qa/suites/rgw: disable remaining ragweed test
qa/suites/rgw: pin swift tests to py2/ubuntu
qa/suites/rgw: ragweed on ubuntu
qa/suites: run s3tests on ubuntu
Reviewed-by: Casey Bodley <cbodley@redhat.com>
We cannot do a traditional upgrade (install old package, start cluster,
install new package, ...) because nautilus is el7-only and octopus is
el8-only.
So, do these tests on ubuntu.
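The constraint can be stated as a set intersection: a traditional package upgrade needs a distro that both releases were built for. The table encodes only what the message says (nautilus on el7, octopus on el8, and, as inferred from the conclusion, ubuntu for both, with no version given). The helper name is hypothetical.

```python
from typing import Dict, Set

# Distro families each release ships packages for, per the message above.
BUILT_FOR: Dict[str, Set[str]] = {
    "nautilus": {"el7", "ubuntu"},
    "octopus": {"el8", "ubuntu"},
}


def package_upgrade_distros(old: str, new: str) -> Set[str]:
    """Distros where 'install old packages, then upgrade to new' is possible."""
    return BUILT_FOR[old] & BUILT_FOR[new]


print(sorted(package_upgrade_distros("nautilus", "octopus")))  # ['ubuntu']
```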
Signed-off-by: Sage Weil <sage@redhat.com>
- Ensure the download code for all tasks running
s3-tests is consistent.
- Simplify download code to only use the config
variable 'force-branch' for the branch being
cloned.
- Make ceph-master the force-branch for all
suites using s3-tests.
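A sketch of the simplified branch selection, assuming the task config is a plain mapping in which only 'force-branch' is consulted. The fallback to 'ceph-master' when the key is absent is an assumption added for illustration (the message instead has every suite set it explicitly). The helper names and the repository URL are hypothetical.

```python
from typing import Any, List, Mapping

# Assumed fallback; the message makes ceph-master the value every suite sets.
DEFAULT_BRANCH = "ceph-master"


def s3tests_branch(config: Mapping[str, Any]) -> str:
    """Branch to clone: 'force-branch' if set, else the assumed default."""
    return config.get("force-branch") or DEFAULT_BRANCH


def clone_command(config: Mapping[str, Any], repo: str, dest: str) -> List[str]:
    """argv for cloning s3-tests at the selected branch (hypothetical helper)."""
    return ["git", "clone", "--branch", s3tests_branch(config), repo, dest]


print(clone_command({"force-branch": "ceph-master"},
                    "https://example.com/s3-tests.git", "/tmp/s3-tests"))
```

Keeping a single config key means every task that runs s3-tests resolves the branch the same way, which is the consistency the first bullet asks for.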
Fixes: https://tracker.ceph.com/issues/43077
Signed-off-by: Ali Maredia <amaredia@redhat.com>
* refs/pull/29421/head:
qa/cephfs: add tests for ACLs
qa/cephfs: allow running tests from xfstests-dev
qa/tasks: add methods to get monitor's sockets
qa/cephfs: don't crash if mountpoint dir is already deleted
vstart_runner.py: set omit_sudo's default value to False
qa/vstart_runner.py: fix get_keyring_path()
qa/cephfs: don't abort if mountpoint is already present
qa/cephfs: allow specifying mountpoint for kernel mounts
qa/cephfs: allow specifying mountpoints for FUSE mounts
qa/vstart_runner.py: allow specifying mountpoint for local FUSE mounts
qa/mount.py: allow setting mountpoint
qa/vstart_runner.py: add a method to create a temporary directory
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* Minor improvements to Vault documentation
* Add teuthology tests for Transit secrets engine
* Add unit tests for KV secrets engine, minor improvements to Transit
secrets engine
* use string_view::npos instead of string::npos
Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com>
Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>
* refs/pull/31502/head:
qa/tasks/ceph2: get ceph-daemon from same place as ceph
qa/tasks/ceph2: use safe_while
qa/tasks/ceph2: pull image using sha1
qa/tasks/ceph2: docker needs quay.io/ prefix for image name
qa/workunits/rados/test_python: make sure rbd pool exists
qa/suites/rados/ssh: new tests!
qa/tasks/ceph2: pull ceph-ci/ceph:$branch
qa/tasks/ceph2: register_daemons after pods start
qa/tasks/ceph2: fix conf
qa/tasks/ceph2: add restart
qa/tasks/ceph2: pass ceph-daemon path to DaemonState
qa/tasks/ceph2: tolerate no mdss or 1 mgr
qa/tasks/ceph: replace wait_for_osds_up with manager.wait_for_all_osds_up
qa/tasks/ceph: wait-until-healthy
qa/tasks/ceph2: set up managers
qa/tasks/ceph2: use seed ceph.conf
qa/tasks/ceph: healthy: use manager helpers (instead of teuthology/misc ones)
qa/tasks/ceph2: name mds daemons
qa/tasks/ceph2: fix osd ordering
qa/tasks/ceph2: start up mdss
qa/tasks/ceph2: set up daemon handles and use them to stop
qa/tasks/ceph2: make it multicluster-aware
qa/tasks/ceph2: can bring up mon, mgr, osds!
qa/tasks/ceph2: basic task to bring up cluster with ceph-daemon and ssh
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
The balancer triggers peering, which may make PGs briefly go inactive,
possibly before they have ever been active. E.g.,
    "PG_AVAILABILITY": {
      "severity": "HEALTH_WARN",
      "summary": {
        "message": "Reduced data availability: 3 pgs inactive, 3 pgs peering",
        "count": 6
      },
      "detail": [
        {
          "message": "pg 2.6 is stuck peering since forever, current state peering, last acting [2,0]"
        },
        {
          "message": "pg 2.1c is stuck peering since forever, current state peering, last acting [2,1]"
        },
        {
          "message": "pg 2.7a is stuck peering since forever, current state peering, last acting [2,0]"
        }
      ]
    }
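"Since forever" in these messages means the PG has not been active since it was created, which is why a brief peering blip trips the check here. A sketch that extracts such PGs from the health output follows. The enclosing {"checks": {...}} wrapper is an assumption about the full JSON shape, and the helper is hypothetical.

```python
import json
import re
from typing import List, Tuple

# The excerpt above, wrapped in an assumed {"checks": {...}} envelope.
HEALTH = json.loads("""
{"checks": {"PG_AVAILABILITY": {
  "severity": "HEALTH_WARN",
  "summary": {"message": "Reduced data availability: 3 pgs inactive, 3 pgs peering", "count": 6},
  "detail": [
    {"message": "pg 2.6 is stuck peering since forever, current state peering, last acting [2,0]"},
    {"message": "pg 2.1c is stuck peering since forever, current state peering, last acting [2,1]"},
    {"message": "pg 2.7a is stuck peering since forever, current state peering, last acting [2,0]"}
  ]}}}
""")

STUCK_RE = re.compile(r"^pg (\S+) is stuck (\w+) since forever")


def never_active_pgs(health: dict) -> List[Tuple[str, str]]:
    """Return (pgid, stuck state) for PGs reported 'stuck ... since forever'."""
    check = health.get("checks", {}).get("PG_AVAILABILITY", {})
    out = []
    for item in check.get("detail", []):
        m = STUCK_RE.match(item["message"])
        if m:
            out.append((m.group(1), m.group(2)))
    return out


print(never_active_pgs(HEALTH))
# [('2.6', 'peering'), ('2.1c', 'peering'), ('2.7a', 'peering')]
```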
Signed-off-by: Sage Weil <sage@redhat.com>
in cephtool/test.sh, we run
    ceph fs set cephfs inline_data {1,0}
so the health check fails when the test ends, like
    "mon.a (mon.0) 3498 : cluster [WRN] Health check failed: 1 filesystem with deprecated feature inline_data (FS_INLINE_DATA_DEPRECATED)" in cluster log
so, before we remove the test, we need to whitelist this warning.
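A rough Python model of how a log whitelist suppresses such a line: cluster-log warnings and errors fail the run unless some whitelist regex matches them. The pattern below is an illustrative assumption (the message does not quote the one added), and the function approximates rather than reproduces teuthology's check.

```python
import re
from typing import Iterable, List

# Assumed whitelist entry; patterns are treated as regular expressions.
LOG_WHITELIST = [r"FS_INLINE_DATA_DEPRECATED"]


def unexpected_warnings(lines: Iterable[str], whitelist: List[str]) -> List[str]:
    """Return [WRN]/[ERR] cluster-log lines that match no whitelist pattern."""
    patterns = [re.compile(p) for p in whitelist]
    return [
        line for line in lines
        if ("[WRN]" in line or "[ERR]" in line)
        and not any(p.search(line) for p in patterns)
    ]
```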
Signed-off-by: Kefu Chai <kchai@redhat.com>
If we get a SIGINT or SIGTERM or are deleted from the OSDMap, do a fast
shutdown by exiting immediately. This has a few important benefits:
- We immediately stop responding (binding) to any sockets, which means
other OSDs will immediately decide we are down (and dead!). This
minimizes IO interruption.
- We avoid the complex "clean" shutdown process, which is historically a
source of bugs.
In reality, the only purpose of the "clean" shutdown is to try to tear down
everything in memory so we can do memory leak checking with valgrind. Set
this option to false for valgrind QA runs so we can still do that.
Note that with the new read leases in octopus, we rely on the default
behavior that an ECONNREFUSED is taken to mean that the OSD is fully dead,
so that we don't have to wait for any leases to time out. This works in
sane environments with normal IP networks, but that behavior could
conceivably be a bad idea if there are some weird network shenanigans
going on. If osd_fast_fail_on_connection_refused were disabled, then this
fast shutdown procedure might be *worse* than the clean shutdown because
we would have to wait for the heartbeat timeout.
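A toy Python model of the two shutdown paths. It assumes a daemon with a single boolean switch, here the hypothetical `fast_shutdown` (the message does not name the real option). On SIGINT/SIGTERM the fast path exits at once and lets the kernel release sockets and memory. The clean path runs every teardown step so that a leak checker such as valgrind has something meaningful to check. It is not OSD code.

```python
import os
import signal
from typing import Callable, List


class Daemon:
    """Toy daemon illustrating fast vs. clean shutdown (hypothetical, not OSD code)."""

    def __init__(self, fast_shutdown: bool = True,
                 exit_fn: Callable[[int], None] = os._exit) -> None:
        self.fast_shutdown = fast_shutdown  # a valgrind run would set this False
        self.exit_fn = exit_fn              # injectable so the sketch is testable
        self.cleanup_steps: List[Callable[[], None]] = []

    def install(self) -> None:
        signal.signal(signal.SIGTERM, self.handle_signal)
        signal.signal(signal.SIGINT, self.handle_signal)

    def handle_signal(self, signum: int, frame: object) -> None:
        if self.fast_shutdown:
            # Exit without tearing anything down. The kernel closes our sockets,
            # so new connection attempts from peers are refused right away.
            self.exit_fn(0)
            return
        # "Clean" path: undo setup in reverse order so nothing appears leaked.
        for step in reversed(self.cleanup_steps):
            step()
        self.exit_fn(0)
```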
Signed-off-by: Sage Weil <sage@redhat.com>
* add 'rgw crypt vault prefix' config setting to allow restricting the
secret space in Vault from which RGW can retrieve keys
* refuse Vault token file if permissions are too open
* improve concatenation of URL paths to avoid constructing an invalid
URL (missing or double '/')
* doc: clarify SSE-KMS keys must be 256 bits long and base64 encoded,
document Vault policies and tokens, plus other minor doc improvements
* qa: check SHA256 signature of Vault zip download
* qa: fix teuthology tests broken by previous PR which made SSE-KMS
backend default to Barbican
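Minimal sketches of two of the checks listed, under stated assumptions. "Too open" is taken to mean any group or other permission bit on the token file, and URL parts are joined with exactly one '/'. The helper names are hypothetical, and this is Python, not the RGW implementation.

```python
import os
import stat
import tempfile


def check_token_file(path: str) -> None:
    """Raise PermissionError if group or others have any access (assumed policy)."""
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(
            f"{path}: mode {stat.S_IMODE(mode):o} is too open; use 600 or stricter")


def join_url(base: str, path: str) -> str:
    """Join two URL parts with exactly one '/' between them."""
    if not base or not path:
        return base or path
    return base.rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    print(join_url("http://vault:8200/", "/v1/secret/data"))  # http://vault:8200/v1/secret/data
    with tempfile.NamedTemporaryFile() as f:
        os.chmod(f.name, 0o600)
        check_token_file(f.name)  # accepted
        os.chmod(f.name, 0o644)
        try:
            check_token_file(f.name)
        except PermissionError as e:
            print("refused:", e)
```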
Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com>
Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>
resolves test failures under rgw/{multifs,thrash,website} similar to
https://github.com/ceph/ceph/pull/30940
Signed-off-by: Casey Bodley <cbodley@redhat.com>
* refs/pull/31206/head:
qa: test fs:upgrade when running upgrade suite
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
pre-single-major.yaml kernel doesn't have any of the monitor client
fixes that came in 4.6. If the connection is closed, it closes the
session and retries only after 10 seconds. On top of that, there is
nothing to prevent it from picking the same monitor when reconnecting.
This means that when given both v1 and v2 ports (which look like two
different monitors), it is susceptible to mount_timeout (60 seconds):
    $ sudo rbd map img
    rbd: sysfs write failed
    In some cases useful info is found in syslog - try "dmesg | tail".
    rbd: map failed: (5) Input/output error
    [ 822.242313] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
    [ 832.265494] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
    [ 842.296175] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
    [ 852.326924] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
    [ 862.357611] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
    [ 872.388373] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
    [ 882.676136] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
Unlike newer kernels that return ETIMEDOUT, it returns EIO.
Newer kernels are much more aggressive about retries and will pick
a different monitor when reconnecting, hence they are always able to
establish the session in time.
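The timestamps in the dmesg output bear this out. Every attempt targets the same v2 address (port 3300), attempts are roughly 10 seconds apart, and the seven of them span about 60 seconds, i.e. the whole mount_timeout. A small script that computes those figures from the quoted lines:

```python
import re

DMESG = """\
[ 822.242313] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
[ 832.265494] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
[ 842.296175] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
[ 852.326924] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
[ 862.357611] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
[ 872.388373] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
[ 882.676136] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
"""

# "[ seconds.micros] libceph: monN ip:port ..."
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\] libceph: mon\d+ (\S+) ")

timestamps = []
addresses = set()
for line in DMESG.splitlines():
    m = LINE_RE.match(line)
    if m:
        timestamps.append(float(m.group(1)))
        addresses.add(m.group(2))

gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
print(f"{len(timestamps)} attempts to {sorted(addresses)}, "
      f"gaps {min(gaps):.1f}-{max(gaps):.1f} s, "
      f"span {timestamps[-1] - timestamps[0]:.1f} s")
# 7 attempts to ['172.21.15.132:3300'], gaps 10.0-10.3 s, span 60.4 s
```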
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
krbd: retry on transient errors from udev_enumerate_scan_devices()
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
* refs/pull/31063/head:
qa: disable too few PG warning during Mimic deploy
Reviewed-by: Nathan Cutler <ncutler@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
* refs/pull/31168/head:
ceph-daemon: try py2 import before py3
qa/suites/rados/singleton-nomsgr/ceph-daemon: make sure python3 is installed
qa/standalone/test_ceph_daemon.sh: test with python2 and python3
mgr/ssh: python, not python3
ceph-daemon: python, not python3
ceph-daemon: os.makedirs
ceph-daemon: configparser is ConfigParser on py2
ceph-daemon: avoid py3-isms
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Alfredo Deza <adeza@redhat.com>
qa: enable dashboard tests to be run with "--suite rados/dashboard"
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Mimic will raise the too few PGs warning when we use 8 PGs for the CephFS
metadata/data pools.
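A back-of-the-envelope sketch of why 8 PGs can trip the check. It rests on assumptions worth verifying against the Mimic source: the warning is taken to compare PG replicas per in OSD against mon_pg_warn_min_per_osd, assumed here to default to 30, and the pool sizes and OSD count below are made up for illustration.

```python
from typing import Iterable, Tuple


def pg_replicas_per_osd(pools: Iterable[Tuple[int, int]], num_in_osds: int) -> int:
    """Sum of pg_num * size over (pg_num, size) pools, divided by in OSDs."""
    return sum(pg_num * size for pg_num, size in pools) // num_in_osds


def too_few_pgs(pools: Iterable[Tuple[int, int]], num_in_osds: int,
                min_per_osd: int = 30) -> bool:
    # Assumed rule: warn when the per-OSD figure is non-zero but below the minimum.
    per_osd = pg_replicas_per_osd(pools, num_in_osds)
    return 0 < per_osd < min_per_osd


# Hypothetical test cluster: CephFS metadata and data pools, 8 PGs each,
# 3 replicas, on 8 OSDs -> (8*3 + 8*3) / 8 = 6 PG replicas per OSD.
print(pg_replicas_per_osd([(8, 3), (8, 3)], 8), too_few_pgs([(8, 3), (8, 3)], 8))
# 6 True
```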
Fixes: fc88e6c6c5
Fixes: https://tracker.ceph.com/issues/42434
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/31094/head:
ceph-daemon: remove redundant --privileged
test_ceph_daemon: test unit, enter, shell
ceph-daemon: drop exec
ceph-daemon: fix exit code for run, shell, enter, exec
ceph-daemon: allow optional command for 'enter'
ceph-daemon: fix LANG for 'enter' command
ceph-daemon: allow shell to take optional command
qa/suites/rados/singleton-nomsgr/ceph-daemon: run test_ceph_daemon.sh
qa/standalone/test_ceph_daemon.sh: add new functional tests
test_ceph_daemon.sh: use newer image
ceph-daemon: unconditionally enable and start crash unit
ceph-daemon: fix crash unit cleanup
ceph-daemon: include 'crash' unit/item in 'ls' output
ceph-daemon: fix 'ls'
mgr/orchestrator: s/sdd/ssd/
mgr/ssh: remove stdout/stderr kludges
ceph-daemon: fix ceph-volume command to write stdout to stdout
Reviewed-by: Sebastian Wagner <swagner@suse.com>
This moves dashboard.yaml from rados/mgr into a new, separate rados/dashboard
suite. The common elements it uses are moved from rados/mgr into qa/ and
replaced with symlinks.
Fixes: https://tracker.ceph.com/issues/41820
Signed-off-by: Nathan Cutler <ncutler@suse.com>
since it requires a running ceph cluster, it can't run in 'make check'
as a unittest. add it to the rgw/verify suite instead
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Otherwise add_key() in set_kernel_secret() fails as if running against
an ancient kernel and we fall back to secret= in options for the first
image being mapped on the machine.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>