EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder can overload OSDs and cause
heartbeat timeouts. Tests in test_envlibrados_for_rocksdb also generate slow
requests on OSDs. Use osd_client_message_cap to prevent this.
Since this option is disabled by default, this may be a good way to exercise it.
Fixes: https://tracker.ceph.com/issues/49064
Signed-off-by: Neha Ojha <nojha@redhat.com>
* refs/pull/39039/head:
src/test/cli/monmaptool: adjust for new monmap features
qa/tasks/cephadm: allow custom git_url for cephadm_branch pull
qa/suites/rados/upgrade: include pacific-x
qa/suites/upgrade/pacific-x/parallel
qa/suites: some clean up for quincy
mon: updates for quincy
mon: update for quincy ondisk features
script: add pacific
doc/dev/release-checklist: we tagged v17.0.0
ceph-volume: change to quincy
include/ceph_features: retire MON_SINGLE_PAXOS
include/ceph_features: define FEATURE_SERVER_QUINCY
mon/MgrMonitor: add always_on_modules for quincy
add feature/release name quincy
kickoff v17 quincy
doc/dev/release-checklists: uncheck everything!
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Normally the git_url is git://git.ceph.com/ceph-ci.git, which mirrors
upstream ceph-ci.git. However, the release branches aren't present there.
Allow a custom git_url so we can pull these from the main ceph.git.
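The fallback described above could look roughly like the following. This is a minimal sketch, not the code in qa/tasks/cephadm.py: the helper name `resolve_git_url` and the shape of the `config` dict are assumptions made for illustration.

```python
# Default taken from the commit message; release branches are not mirrored there.
DEFAULT_GIT_URL = "git://git.ceph.com/ceph-ci.git"


def resolve_git_url(config):
    """Return the repository to pull the cephadm branch from.

    `config` is a hypothetical task-config dict. A non-empty `git_url`
    key overrides the ceph-ci default; otherwise the default is used.
    """
    return config.get("git_url") or DEFAULT_GIT_URL
```

With no `git_url` in the config the ceph-ci mirror is used; a suite that needs a release branch can point `git_url` at another repository.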
Signed-off-by: Sage Weil <sage@newdream.net>
thrash_cache_writeback_proxy_none tests have been failing consistently. Some investigation
shows that the writeback overlay tests are responsible for it. Instead of removing these
cache tiering tests entirely, we'll disable them for now.
Related to: https://tracker.ceph.com/issues/46323
Signed-off-by: Neha Ojha <nojha@redhat.com>
- remove upgrades from nautilus
- stubs for completing upgrade to quincy
Still missing the pacific-x upgrade tests.
Signed-off-by: Sage Weil <sage@newdream.net>
The current bionic version triggers a podman/conmon bug that
truncates output, affecting both cephadm bootstrap when 'mgr dump' is
large, and teuthology 'pg dump' when it is large.
See https://tracker.ceph.com/issues/48993
Signed-off-by: Sage Weil <sage@newdream.net>
Link directly to the distro version... no need to use _latest here since
it obscures the podman vs docker difference.
Signed-off-by: Sage Weil <sage@newdream.net>
Older cephadm is not smart enough to not combine --cap-add=SYS_PTRACE
and --privileged, which some versions of podman cannot handle.
For upgrades, leave off the allow_ptrace behavior since we may be starting
on one of those old versions.
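To illustrate the flag conflict, here is a minimal sketch of argument construction that never emits both options together. The function name and its parameters are hypothetical and are not taken from cephadm itself.

```python
def container_security_args(privileged, allow_ptrace):
    """Build the security-related container arguments.

    Hypothetical helper: --privileged already implies the ptrace
    capability, so --cap-add=SYS_PTRACE is only added when the
    container is not privileged. The two flags are never combined.
    """
    args = []
    if privileged:
        args.append("--privileged")
    elif allow_ptrace:
        args.append("--cap-add=SYS_PTRACE")
    return args
```

An older cephadm that lacks this kind of check would pass both flags, which is why the upgrade tests leave allow_ptrace off.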
See also https://tracker.ceph.com/issues/46429
Fixes: https://tracker.ceph.com/issues/48142
Signed-off-by: Sage Weil <sage@newdream.net>
this will provide a more detailed output, like
```yaml
...snip...
service_type: node-exporter
service_name: node-exporter
placement:
host_pattern: '*'
status:
created: '2021-01-18T11:21:56.024810Z'
last_refresh: '2021-01-18T11:23:24.477672Z'
running: 0
size: 1
events:
- "2021-01-18T11:23:09.602644Z service:node-exporter [ERROR] \"Failed while placing\
\ node-exporter.ubuntuon ubuntu: cephadm exited with an error code: 1, stderr:Deploy\
\ daemon node-exporter.ubuntu ...\nVerifying port 9100 ...\nTraceback (most recent\
\ call last):\n File \"<stdin>\", line 7274, in <module>\n File \"<stdin>\", line\
\ 1563, in _default_image\n File \"<stdin>\", line 3698, in command_deploy\n File\
\ \"<stdin>\", line 2338, in deploy_daemon\n File \"<stdin>\", line 1961, in create_daemon_dirs\n\
AssertionError\""
...snip...
```
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
The progress module can be turned off and on with
the commands 'progress off' and 'progress on'.
Also refactor the teuthology test suite
to prevent possible future bugs.
fixes: https://tracker.ceph.com/issues/47238
Signed-off-by: kamoltat <ksirivad@redhat.com>
Introduce a "scheduler" directory under the rados:perf tree to allow perf
suite to specify tests with the default scheduler (WPQ) and also with
the dmClock scheduler. One specification also overrides the number of
shards (1) and the number of threads per shard (16) to test with apart from
the default settings. This allows testing and performance benchmarking
with the new proposal to use one shard and multiple threads per shard with
the dmClock scheduler.
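As a rough sketch of what such a specification carries, the helper below builds a teuthology-style override dict. The helper itself is hypothetical, and the option names (`osd_op_queue`, `osd_op_num_shards`, `osd_op_num_threads_per_shard`) and the scheduler value are assumptions that should be checked against the yaml files added by this change.

```python
def scheduler_overrides(scheduler, shards=None, threads_per_shard=None):
    """Return a hypothetical overrides dict for one scheduler variant.

    Shard and thread counts are only set when given, so the default
    variant leaves them at the OSD's built-in values.
    """
    conf = {"osd_op_queue": scheduler}
    if shards is not None:
        conf["osd_op_num_shards"] = shards
    if threads_per_shard is not None:
        conf["osd_op_num_threads_per_shard"] = threads_per_shard
    return {"overrides": {"ceph": {"conf": {"osd": conf}}}}


# The one-shard, sixteen-thread variant described in the message.
# "mclock_scheduler" is an assumed value for the dmClock scheduler.
single_shard = scheduler_overrides("mclock_scheduler", shards=1, threads_per_shard=16)
```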
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Add the possibility to assign the flags ['noup',
'nodown', 'noin', 'noout'] to single OSDs.
Fixes: https://tracker.ceph.com/issues/40739
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
Since it uses cephadm, at the moment it makes sense to run it as a part of
rados/cephadm. This gives better test coverage for developers and has exposed
bugs such as https://tracker.ceph.com/issues/45421 and
https://tracker.ceph.com/issues/47709. We can always restructure this later.
Signed-off-by: Neha Ojha <nojha@redhat.com>
tasks/cephadm.py gained RGW support very recently and
I'm now facing a dilemma:
* Either we set the upgrade start to 15.2.4 and thus
no longer upgrade from an old version, or
* Disable RGW upgrade for now.
I think doing both would be optimal, but for now, let's
disable RGW, in order to keep the coverage for everything
else.
Fixes: https://tracker.ceph.com/issues/46157
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
in this test, older ceph clients are installed on el7, but the ceph
cluster is deployed using cephadm, which in turn pulls ceph container
images built using the ceph being tested on el8.
since we've dropped the build of master on el7, there is no need to
verify if ceph package is available if cephadm is used for deploying the
cluster.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Include test case
Configurable by setting mon_osd_warn_num_repaired (default 10)
Ignore new health warning with random eio injection test
Fixes: https://tracker.ceph.com/issues/41564
Signed-off-by: David Zafman <dzafman@redhat.com>
This reverts commit a7994a0fdd.
Failed attempt at solving the issue is in PR #33272. Until we
find a clean solution for this, whitelisting the warning is
probably the best thing for now.
Fixes: http://tracker.ceph.com/issues/43943
Signed-off-by: Venky Shankar <vshankar@redhat.com>
The balancer was turned on by default in
d4fbaf7ea9, as a result of which we might see
PG_AVAILABILITY health warnings when pg-upmap-items are applied.
Fixes: https://tracker.ceph.com/issues/45619
Signed-off-by: Neha Ojha <nojha@redhat.com>
Add a telemetry component in order to give the user the
possibility to configure the telemetry module in a more
guided fashion. The component offers broader explanations,
shows a preview of the generated report and asks the user
to accept the license before enabling the module.
Fixes: https://tracker.ceph.com/issues/43956
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
Since we've changed mon_osd_initial_require_min_compat_client to
luminous in 986c271b75, we can remove jewel from
mix.
Fixes: https://tracker.ceph.com/issues/45242
Signed-off-by: Neha Ojha <nojha@redhat.com>
the intention to add the whitelist was to test "sdk" class, but if we
add new classes to the list, and add tests exercising them, the tests
fail if we fail to update the `rados_cls_all.yaml` files accordingly.
so in this change, the list is now '*' which allows OSD to load all
classes found in the specified directory
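The effect of switching the list to '*' can be sketched as below. This assumes the load list is a whitespace-separated set of class names in which '*' acts as a wildcard; the function and the class name `newcls` are hypothetical and only illustrate the matching rule.

```python
def class_allowed(name, load_list):
    """Return True if object class `name` may be loaded.

    Assumption: `load_list` is a whitespace-separated list of class
    names, and a '*' entry permits every class.
    """
    entries = load_list.split()
    return "*" in entries or name in entries
```

With an explicit list such as `"sdk"`, a newly added class is rejected until the yaml is updated; with `"*"` it loads without any further change.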
Fixes: https://tracker.ceph.com/issues/45113
Signed-off-by: Kefu Chai <kchai@redhat.com>
in tasks/module_selftest.yaml, `TestModuleSelftest.test_telegraf()` is
called. but we fail to prepare a unix domain socket to which the telegraf
module can send stats. and telegraf module does not catch
FileNotFoundError exception, so the exception is propagated to ceph-mgr
and is found by the test, hence the test is marked a failure whenever
telegraf is tested.
in this change,
* catch this exception, so it won't be caught by ceph-mgr
* whitelist the error message, so the test can pass
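The failure mode and the fix can be sketched as follows. This is not the telegraf module's code: the function name, the stream socket type, and the boolean return value are assumptions chosen to show how a missing unix domain socket surfaces as FileNotFoundError and how catching it keeps the error from reaching the caller.

```python
import socket


def send_stats(socket_path, payload):
    """Try to send `payload` to a unix domain socket at `socket_path`.

    Returns True on success and False when the socket file does not
    exist, instead of letting FileNotFoundError escape to the caller.
    """
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.connect(socket_path)
            sock.sendall(payload)
        return True
    except FileNotFoundError:
        # No listener has created the socket yet; report and carry on.
        return False
```

In the test environment no telegraf instance creates the socket, so the call takes the `except` branch. A real module would presumably log an error there, which is why the message also needs to be whitelisted.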
Signed-off-by: Kefu Chai <kchai@redhat.com>
qa/suites/rados/cephadm/upgrade: start from v15.2.0
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>
* refs/pull/34105/head:
Merge PR #34042 into octopus
Merge PR #33959 into octopus
Merge PR #34067 into octopus
mgr/DaemonServer: add explicit check that acting matches for merge
Merge pull request #34040 from dillaman/wip-44396-partial-fix
Merge PR #34098 into octopus
mgr/rook: list rgw services
mgr/rook: tolerate timestamps that are None
mgr/orch: add 'subcluster' property to RGWSpec
mgr/rook: do not create radosgw pools
mgr/rook: refactor apply/add for rgw
Merge PR #34082 into octopus
Merge PR #34068 into octopus
cephadm: relabel /etc/ganesha mount
Merge PR #34046 into octopus
Merge PR #34092 into octopus
Merge pull request #33719 from ukernel/wip-44416
rbd-mirror: leader watcher should not cancel get locker if locker is invalid
rbd-mirror: snapshot sync request needs to check for interruption
librbd: request exclusive lock when moving to trash
rbd-mirror: basic integration with sync throttling
rbd-mirror: don't prematurely finish snapshot replay loop
rbd-mirror: pass InstanceWatcher to snapshot Replayer
doc/releases/octopus.rst: add note about ec recovery below min_size
mgr/cephadm: configure rgw_frontends for rgw service
cephadm: switch grafana image to the ceph repo
Merge PR #34034 into octopus
qa/suites/rados/cephadm/upgrade: update starting version
Merge PR #33540 into octopus
Merge PR #34023 into octopus
Merge PR #34044 into octopus
Merge PR #34030 into octopus
doc/orchestrator: update rgw creation
mgr/cephadm: clean up client.crash.* container_image settings after upgrade
cephadm: make add-repo --release and --version independent
cephadm: env over last used
mgr/orch: accept port and ssl flags to 'apply rgw'
mgr/orch: 'ceph upgrade ...' -> 'ceph orch upgrade ...'
cephadm: fall back to default for infer_image
cephadm: remove outdated check
cephadm: consolidate default image logic
remove ceph_test_rados_watch_notify
python-common/ceph/deployment/service_spec: add ssl to RGWSpec
cephadm: only infer image for shell, run, inspect-image, pull, ceph-volume
mgr/test_orchestrator: fix service filtering when using dummy data
mgr/dashboard: fix adding/removing host errors
mgr/rook: fix 'orch ps' for osds
qa: fix all the fsx.sh-invoking yaml files to install dependencies
mds: pass proper MutationImpl::LockOp to Locker::wrlock_start()
Reviewed-by: Kiefer Chang <kiefer.chang@suse.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
When running under valgrind (and thrashing) things can be slow. Tell
tests in case they need to tolerate timeouts.
Signed-off-by: Sage Weil <sage@redhat.com>
In any environment it is helpful to have SYS_PTRACE so that you can
gdb attach or strace a daemon.
Leave this off by default so that the container is more secure.
Enable this in teuthology and vstart.
Signed-off-by: Sage Weil <sage@redhat.com>
we have set osd_min_pg_log_entries to 2 (good) but not osd_pg_log_trim_min
which defaults to 100. Thus, even on those tests we're only rarely vulnerable.
Reset osd_min_pg_log_entries to 0 to make sure we really
would keep a minimal pg log in hand.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
The rados api tests are failing WatchNotify because the OSDs are so
heavily lagged, in large part due to the high debug level of debug_ms=20
and debug_osd=25. Reduce that.
Also increase the heartbeat grace so slow valgrind-y osds don't get marked
down.
Signed-off-by: Sage Weil <sage@redhat.com>
This version understands how to apply a mgr spec like '2;host=x' with a
semicolon. This particular test build does.
Signed-off-by: Sage Weil <sage@redhat.com>
In addition to logging slow ops in mon and osd specific log files,
re-introduce logging the same information along with slow op type
details to cluster logs as well. The objective is to make debugging
slow ops easier.
Modify the log whitelisting string to "slow request" within qa suites in
order to make the search for the new warning log message within the
cluster log successful. This should not cause any issue as it's a
substring of the earlier string.
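The substring argument can be checked with a small sketch. The earlier pattern and both sample log lines below are invented placeholders, not text copied from the suites or from a real cluster log; only the new pattern "slow request" comes from this change.

```python
import re


def is_whitelisted(line, patterns):
    """Return True if any whitelist pattern matches the log line."""
    return any(re.search(pattern, line) for pattern in patterns)


OLD_PATTERN = "slow requests"   # hypothetical earlier whitelist string
NEW_PATTERN = "slow request"    # string used after this change

# Hypothetical log lines, for illustration only.
SAMPLE_LINES = [
    "3 slow requests are blocked",
    "slow request on osd.0",
]

matches_new = [is_whitelisted(line, [NEW_PATTERN]) for line in SAMPLE_LINES]
matches_old = [is_whitelisted(line, [OLD_PATTERN]) for line in SAMPLE_LINES]
```

Because the shorter pattern is contained in the longer one, every line the old pattern matched is still matched, while lines carrying only the new wording are now covered as well.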
Fixes: https://tracker.ceph.com/issues/43975
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
this test will end with a failure like
```
2020-01-30T18:15:15.870 INFO:tasks.ceph.mgr.x.smithi042.stderr:Warning: Permanently added 'smithi042.front.sepia.ceph.com,172.21.15.42' (ECDSA) to the list of known hosts.
2020-01-30T18:15:15.925 INFO:tasks.ceph.mgr.x.smithi042.stderr:Permission denied, please try again.
2020-01-30T18:15:15.932 INFO:tasks.ceph.mgr.x.smithi042.stderr:Permission denied, please try again.
2020-01-30T18:15:15.939 INFO:tasks.ceph.mgr.x.smithi042.stderr:root@smithi042.front.sepia.ceph.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
```
because mgr is not able to establish an ssh connection to that host with "root".
please note, the teuthology worker is acting using the "ubuntu" account on the
test node, and by default, "root" does not have its pubkey. and actually
`qa/tasks/cephadm.py` does push the pubkey to all the managed hosts before
testing cephadm.
since `qa/tasks/cephadm.py` is a better test for cephadm, let's just
drop this one.
as suites/rados/cephadm already covers cephadm
Signed-off-by: Kefu Chai <kchai@redhat.com>
Include _latest.yaml in a few cases here to be a bit future-proof.
cephadm-smoke/ is *just* a cephadm bring-up, and includes el7. cephadm/
installs packages and runs a real workload.
Signed-off-by: Sage Weil <sage@redhat.com>
For some reason the requests library has trouble connecting from
ubuntu 18.04. I reproduced this locally on my 18.04 desktop, although
there it fails on the first API request instead of the last (as in QA).
In any case, this appears to be a client library problem.
Fixes: https://tracker.ceph.com/issues/43720
Signed-off-by: Sage Weil <sage@redhat.com>