Commit Graph

8441 Commits

Author SHA1 Message Date
Kefu Chai
6f58a26281
Merge pull request #27465 from tchaikov/wip-38219
ceph-monstore-tool: use a large enough paxos/{first,last}_committed

Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-06-16 09:38:45 +08:00
Sage Weil
091a32e130
Merge PR #41844 into master
* refs/pull/41844/head:
	qa/suites/orch/cephadm/dashboard: remove remaining bits

Reviewed-by: Michael Fritch <mfritch@suse.com>
2021-06-15 15:42:26 -04:00
Patrick Donnelly
174b8ad30b
Merge PR #41840 into master
* refs/pull/41840/head:
	qa: update cli syntax to conventional

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
2021-06-15 10:34:18 -07:00
Patrick Donnelly
03674f5197
Merge PR #41821 into master
* refs/pull/41821/head:
	qa: specify distro for fs:bugs

Reviewed-by: Rishabh Dave <ridave@redhat.com>
2021-06-15 10:33:38 -07:00
Patrick Donnelly
8cb34b3849
Merge PR #41771 into master
* refs/pull/41771/head:
	qa: update scrub start code to use comma sep scrubopts

Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-06-15 10:32:06 -07:00
Sage Weil
ebb5a3f0bc
qa/suites/orch/cephadm/dashboard: remove remaining bits
Signed-off-by: Sage Weil <sage@newdream.net>
2021-06-14 13:00:45 -05:00
Patrick Donnelly
a402b23c84
qa: update cli syntax to conventional
This was using an obscure syntax that worked at one time and wasn't
documented (AFAIK).

Fixes: https://tracker.ceph.com/issues/51182
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-14 10:21:43 -07:00
Kefu Chai
75b91d49b8
Merge pull request #39624 from sebastian-philipp/mypy-812
src,qa: Upgrade to mypy 0.901

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-06-14 22:53:02 +08:00
Sage Weil
9074e87611
Merge PR #41827 into master
* refs/pull/41827/head:
	qa: move dashboard e2e from cephadm -> rados suite

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
2021-06-14 09:11:04 -04:00
Patrick Donnelly
7e320919f7
Merge PR #41482 into master
* refs/pull/41482/head:
	qa: remove obsolete deactivate routines

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2021-06-13 19:56:34 -07:00
Patrick Donnelly
6a095654f4
Merge PR #41422 into master
* refs/pull/41422/head:
	qa/tasks/cephfs/test_sessionmap: reap connections immediately
	msg/async: configurable threshold for reaping dead connections

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-13 19:52:48 -07:00
Patrick Donnelly
0441b3d60f
Merge PR #41403 into master
* refs/pull/41403/head:
	mgr/volumes: Add config to insert delay at the beginning of the clone

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-13 19:51:52 -07:00
Sage Weil
ac05b3568f
qa: move dashboard e2e from cephadm -> rados suite
This test fails ~20% of the time.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-06-12 07:52:54 -05:00
Patrick Donnelly
b24608daa2
qa: choose victim pg from rbd pool
Right now scrub_test picks any pg in ceph. Unfortunately, it picked the
.mgr pool's only pg in [1]:

	2021-05-16T11:36:35.035 DEBUG:teuthology.orchestra.run.smithi049:> adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage rados --cluster ceph --pool rbd setomapval main.db-journal.0000000000000000 key val

Instead, only pick a pg in the rbd pool.

[1] /ceph/teuthology-archive/kchai-2021-05-16_11:19:39-rados-wip-kefu-testing-2021-05-16-1043-distro-basic-smithi/6117396/teuthology.log

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-11 20:07:22 -07:00
Patrick Donnelly
d6c66f3fa6
qa,pybind/mgr: allow disabling .mgr pool
This is mostly for testing: a lot of tests assume that there are no
existing pools. These tests relied on a config to turn off creating the
"device_health_metrics" pool which generally exists for any new Ceph
cluster. It would be better to make these tests tolerant of the new .mgr
pool but clearly there's a lot of these. So just convert the config to
make it work.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-11 19:35:17 -07:00
Patrick Donnelly
71d2c81d41
qa: add upgrade test for devicehealth
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-11 19:35:17 -07:00
Patrick Donnelly
0d9032771c
qa: fix api test failures
"device_health_metrics" pool is gone -- .mgr pool is in.

I don't think the pool removal code in some test cases is necessary any
longer with recent changes to remove those warnings; so that code is
gone too.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-11 19:35:17 -07:00
Kefu Chai
7513b24aa5
Merge pull request #40480 from kamoltat/wip-ksirivad-fix-bug-49988
pybind/mgr/progress: Disregard unreported pgs

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2021-06-12 08:37:35 +08:00
Patrick Donnelly
95f0e9c959
Merge PR #39505 into master
* refs/pull/39505/head:
	qa: test nowsync option in kernel client workflows
	qa: deep merge top level overrides for fuse/kclient

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2021-06-11 17:10:41 -07:00
Patrick Donnelly
07bafc1b23
Merge PR #41683 into master
* refs/pull/41683/head:
	qa: update RHEL to 8.4

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-06-11 17:07:06 -07:00
Patrick Donnelly
8eeb1455ee
qa: specify distro for fs:bugs
Fixes: https://tracker.ceph.com/issues/51184
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-11 16:25:50 -07:00
Kefu Chai
7afd38f846
tasks/ceph_manager: ignore EACCES when waiting for quorum
mon_tick_interval is 5 seconds by default. monitors update their
rotating keys every mon_tick_interval. before monitors form a
quorum, the auth requests from clients are put into the wait list.
these requests are re-enqueued once the monitors form a quorum. but
there is a small window of mon_tick_interval before they are able
to serve the auth requests, even after they claim to be able to
serve requests. if these re-enqueued requests happen to be served
in this window, and if authx is enabled, they will be greeted with
errors like

handle_auth_bad_method server allowed_methods [2] but i only support [2]

in the case of ceph cli, the error would look like:

[errno 13] RADOS permission denied (error connecting to the cluster)

so, to address this issue, the EACCES error is ignored when waiting
for a quorum.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-10 20:29:50 +08:00
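The behaviour described in 7afd38f846 can be sketched roughly as below. This is an illustration only, not the code from tasks/ceph_manager: `wait_for_quorum`, `run_status`, and `CommandFailed` are hypothetical names, and the only detail taken from the message is that the CLI fails with errno 13 (EACCES) during the window.

```python
import errno
import time


class CommandFailed(Exception):
    """Hypothetical error carrying the exit status of a failed CLI call."""

    def __init__(self, exitstatus):
        super().__init__(f"command failed with status {exitstatus}")
        self.exitstatus = exitstatus


def wait_for_quorum(run_status, tries=10, sleep=0.0):
    """Poll run_status() until it succeeds, tolerating EACCES.

    run_status is a hypothetical callable that returns the quorum
    membership or raises CommandFailed.
    """
    for _ in range(tries):
        try:
            return run_status()
        except CommandFailed as e:
            # Only the transient "permission denied" is ignored;
            # any other failure is re-raised immediately.
            if e.exitstatus != errno.EACCES:
                raise
        time.sleep(sleep)
    raise TimeoutError("monitors did not form a quorum in time")
```

Ignoring one specific errno, rather than every failure, keeps real errors visible while the monitors are still refreshing their rotating keys.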
Kefu Chai
3908c1f4cd
tasks/ceph_manager: use safe_while() to refactor the wait for quorum
for better readability

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-10 20:29:50 +08:00
Patrick Donnelly
26605723cf
qa: update cephfs-shell distro to ubuntu 20.04
18.04 is no longer built.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-09 16:51:49 -07:00
Sage Weil
64281bb394
Merge PR #41229 into master
* refs/pull/41229/head:
	qa/suites/upgrade/pacific-x/stress-split: add

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
2021-06-09 15:55:25 -04:00
Neha Ojha
c88bfc8bdd
Merge pull request #41782 from sseshasa/wip-fix-standalone-test
qa/standalone: Use osd op queue = wpq in activate_osd() within ceph-helpers.sh.

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-06-09 08:28:05 -07:00
Kamoltat
4b00f1c2bd
pybind/mg/progress: Disregard unreported pgs
The global recovery event progress calculations only
take into account pgs with `reported_epoch < start_epoch_of_event`,
but sometimes the pgs don't get moved before or after the creation
of the global recovery event. This might result in a bug
where the global event gets stuck forever unless there is another
event that specifically makes the stuck pgs move and update
their `reported_epoch`.

Therefore, we decided to disregard pgs that are in active+clean state
but have `reported_epoch < start_epoch_of_event`.

Fixes: https://tracker.ceph.com/issues/49988

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2021-06-09 15:11:32 +00:00
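One plausible reading of the rule in 4b00f1c2bd is sketched below. It is not the progress module's code: the function name, the dict keys, and the choice to drop stale active+clean pgs from the total are all assumptions made for illustration.

```python
def global_recovery_progress(pgs, start_epoch):
    """Return the fraction of tracked pgs that are active+clean.

    pgs is a list of dicts with the hypothetical keys "state"
    (e.g. "active+clean") and "reported_epoch" (int).
    """
    total = len(pgs)
    done = 0
    for pg in pgs:
        clean = "active" in pg["state"] and "clean" in pg["state"]
        if pg["reported_epoch"] < start_epoch:
            # Stale report: the pg has not reported since the event began.
            if clean:
                # Disregard it entirely so it cannot hold the event open.
                total -= 1
            continue
        if clean:
            done += 1
    # With nothing left to track, treat the event as complete.
    return done / total if total else 1.0
```

Under this reading, a pg that was already active+clean before the event started no longer counts against completion.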
Patrick Donnelly
0f505dc299
qa: update scrub start code to use comma sep scrubopts
The documentation specifies this in [1] and yet we were using (I
believe) an older syntax:

    ceph tell mds.foo:0 scrub start / recursive force

instead of

    ceph tell mds.foo:0 scrub start / recursive,force

Oddly the former works at least as recently as in [2]:

    2021-06-03T07:11:42.071 DEBUG:teuthology.orchestra.run.smithi025:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub start / recursive force
    ...
    2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout:{
    2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout:    "return_code": 0,
    2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout:    "scrub_tag": "cf7a74b2-3eb2-4657-9274-ea504b1ebf8f",
    2021-06-03T07:11:42.269 INFO:teuthology.orchestra.run.smithi025.stdout:    "mode": "asynchronous"
    2021-06-03T07:11:42.269 INFO:teuthology.orchestra.run.smithi025.stdout:}

[1] https://docs.ceph.com/en/latest/cephfs/scrub/
[2] /ceph/teuthology-archive/pdonnell-2021-06-03_03:40:33-fs-wip-pdonnell-testing-20210603.020013-distro-basic-smithi/6148097/teuthology.log

Fixes: https://tracker.ceph.com/issues/51146
See-also: https://tracker.ceph.com/issues/51145
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-09 07:23:05 -07:00
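The two invocations quoted in 0f505dc299 differ only in how the scrub options are joined. A minimal sketch of building the documented comma-separated form, assuming a hypothetical helper `scrub_start_args` that is not part of the Ceph qa code:

```python
def scrub_start_args(path, scrubopts):
    """Build the argument list for 'scrub start' with comma-separated options.

    scrubopts is a list such as ["recursive", "force"]; an empty list
    yields no options argument at all.
    """
    args = ["scrub", "start", path]
    if scrubopts:
        # Join options into one argument, e.g. "recursive,force".
        args.append(",".join(scrubopts))
    return args
```

Passing the options as a list and joining them in one place avoids mixing the space-separated and comma-separated forms across tests.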
Sebastian Wagner
1f6b4744b5
qa: Upgrade to mypy 0.901
mypy 0.9 now requires stub packages

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
2021-06-09 12:53:21 +02:00
Sebastian Wagner
3fab28a55f
src,qa: Upgrade to mypy 0.812
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
2021-06-09 12:39:58 +02:00
Sridhar Seshasayee
94826eaadc
qa/standalone: Use osd op queue = wpq in activate_osd()
This change is a follow-up to commit b6e9c0903d that set the scheduler to wpq in
run_osd() and run_osd_filestore(). In addition, activate_osd() too has to
set the scheduler type to 'wpq' in order to be consistent and avoid test
failures.

The above is a temporary measure until all the standalone tests are
modified to run well with the mclock_scheduler.

Fixes: https://tracker.ceph.com/issues/51074
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-06-09 15:02:58 +05:30
Ernesto Puerta
6465b9a254
Merge pull request #41123 from rhcs-dashboard/host-addr-and-labels
mgr/dashboard: Include Network address and labels on Host Creation form

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
2021-06-09 10:23:34 +02:00
Sage Weil
b18427da4b
Merge PR #41509 into master
* refs/pull/41509/head:
	common/cmdparse: fix CephBool validation for tell commands
	mgr/nfs: fix 'nfs export create' argument order
	common/cmdparse: emit proper json
	mon/MonCommands: add -- seperator to example
	qa/tasks/cephfs/test_nfs: fix export create test
	mgr: make mgr commands compat with pre-quincy mon
	doc/_ext/ceph_commands: handle non-positional args in docs
	mgr: fix reweight-by-utilization cephbool flag
	mon/MonCommands: convert some CephChoices to CephBool
	mgr/k8sevents: fix help strings
	pybind/mgr/mgr_module: fix help desc formatting
	mgr/orchestrator: clean up 'orch {daemon add,apply} rgw' args
	mgr/orchestrator: add end_positional to a few methods
	mgr/orchestrator: reformat a few methods
	pybind/ceph_argparse: stop parsing when we run out of positional args
	pybind/ceph_argparse: remove dead code
	pybind/mgr/mgr_module: infer non-positional args
	pybind/mgr/mgr_module: add separator for non-positional args
	command/cmdparse: use -- to separate positional from non-positional args
	pybind/ceph_argparse: adjust help text for non-positional args
	pybind/ceph_argparse: track a 'positional' property on cli args

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-06-07 10:02:52 -04:00
Nizamudeen A
7c1df692f2
mgr/dashboard: Include Network address and labels on Host Creation form
Add the ability to create a host by specifying its network address, and
to create labels.

https://tracker.ceph.com/issues/50318
Signed-off-by: Nizamudeen A <nia@redhat.com>
2021-06-07 14:47:09 +05:30
Patrick Donnelly
84ae38594d
qa: test nowsync option in kernel client workflows
Fixes: https://tracker.ceph.com/issues/49341
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-04 19:15:12 -07:00
Patrick Donnelly
88f74dbfa6
qa: deep merge top level overrides for fuse/kclient
This allows for array/dict configs like mntopts to accumulate changes
from multiple yaml fragments.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-04 19:15:12 -07:00
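The accumulation described in 88f74dbfa6 can be illustrated with a small recursive merge. This is a standalone sketch under stated assumptions, not the helper the qa code uses: the name `deep_merge` and the rule that lists are concatenated are illustrative choices, and `opt_b` is a made-up option name.

```python
import copy


def deep_merge(base, override):
    """Recursively merge override into base without mutating either.

    Dicts are merged key by key, lists are concatenated, and any other
    value in override replaces the value in base.
    """
    if isinstance(base, dict) and isinstance(override, dict):
        merged = copy.deepcopy(base)
        for key, value in override.items():
            if key in base:
                merged[key] = deep_merge(base[key], value)
            else:
                merged[key] = copy.deepcopy(value)
        return merged
    if isinstance(base, list) and isinstance(override, list):
        # Lists accumulate instead of replacing each other.
        return copy.deepcopy(base) + copy.deepcopy(override)
    return copy.deepcopy(override)


# Two hypothetical yaml fragments, already parsed into dicts.
first = {"kclient": {"mntopts": ["nowsync"]}}
second = {"kclient": {"mntopts": ["opt_b"], "debug": True}}
merged = deep_merge(first, second)
```

With a shallow merge, the second fragment's `mntopts` would replace the first; here both entries survive in `merged["kclient"]["mntopts"]`.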
Sage Weil
8683cccd06
qa/tasks/cephfs/test_nfs: fix export create test
Everything after --readonly is non-positional.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-06-04 16:56:17 -04:00
Kefu Chai
104f054cee
qa/suites/orch/rook/smoke: stop testing on ubuntu 18.04
even though rook does not really install ceph packages on the host
directly (it uses the ceph container image), teuthology insists on
checking the existence of debian packages by querying the shaman server
when it sees a teuthology facet file which includes:

os_type: ubuntu
os_version: "18.04"

but since we've stopped building ubuntu/bionic packages, teuthology
just complains when we are scheduling test suites which are composed
from facets in qa/suites/orch/rook/smoke.

in this change, the ubuntu_18.04.yaml is dropped because ubuntu/bionic
does not really increase the test coverage of ceph. it helps to test
the rook and container runtime though.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-05 01:11:14 +08:00
Sage Weil
c8c5071dcd
qa/tasks/cephfs/test_sessionmap: reap connections immediately
We have to reap connections promptly for this test to work.

This test was broken indirectly by d51d80b323, which moved the counter
decrement to reap time instead of mark_down/stop time.

The reaping is asynchronous, so allow for a delay in the count change.

Fixes: https://tracker.ceph.com/issues/50622
Signed-off-by: Sage Weil <sage@newdream.net>
2021-06-04 11:02:29 -04:00
Kefu Chai
dba26fc7a8
Merge pull request #41652 from tchaikov/wip-qa-asock-or
qa/tasks/admin_socket: support "foo || bar" as command

Reviewed-by: Samuel Just <sjust@redhat.com>
2021-06-04 13:50:38 +08:00
Patrick Donnelly
a12db7941b
Merge PR #41499 into master
* refs/pull/41499/head:
	qa/tasks/mds_thrash: fix thrash iteration never skip

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-03 13:33:27 -07:00
Patrick Donnelly
a52712f955
Merge PR #41443 into master
* refs/pull/41443/head:
	test: update log-ignorelist for fs:mirror test

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-03 13:23:17 -07:00
Patrick Donnelly
4e1f812461
Merge PR #39910 into master
* refs/pull/39910/head:
	test: Add test for mgr hang when osd is full
	mgr: Set client_check_pool_perm to false
	mds: Add full caps to avoid osd full check

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-03 13:22:23 -07:00
Patrick Donnelly
b9bf490974
qa: update RHEL to 8.4
Fixes: https://tracker.ceph.com/issues/51082
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-03 12:44:35 -07:00
Neha Ojha
11252f6117
Merge pull request #41308 from sseshasa/wip-osd-benchmark-for-mclock
osd: Run osd bench test to override default max osd capacity for mclock

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2021-06-03 08:39:22 -07:00
Radoslaw Zarzynski
cec7c15f19
qa: use dump_metrics as alternative of get_heap_property
"get_heap_property *" asock commands are exposed to operators
to check the tcmalloc internals for understanding the performance
of the memory subsystem. but crimson uses the builtin seastar allocator,
which is not backed by tcmalloc. instead, we can dump the metrics using
the "dump_metrics" asock command, which is only available from
crimson-osd.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-03 14:24:23 +08:00
Kefu Chai
83e4edcd80
qa/tasks/admin_socket: support "foo || bar" as command
so we can cater to the needs of different implementations of osd, i.e.,
classic osd and crimson osd. they offer different sets of asock commands.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-06-03 14:23:46 +08:00
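The "foo || bar" form in 83e4edcd80 amounts to trying alternatives in order until one works. A rough sketch follows, assuming a hypothetical `run` callable that raises `RuntimeError` when the daemon does not know the command; this is not the admin_socket task's actual interface.

```python
def run_first_available(command, run):
    """Try each '||'-separated alternative until one succeeds.

    run is a hypothetical callable that executes one asock command and
    raises RuntimeError if the command is not supported.
    """
    last_error = None
    for alternative in (part.strip() for part in command.split("||")):
        try:
            return run(alternative)
        except RuntimeError as e:
            # Remember the failure and fall through to the next alternative.
            last_error = e
    raise last_error
```

A test written as "foo || bar" can then target both classic osd and crimson osd without knowing in advance which command set is available.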
Patrick Donnelly
5871240363
Merge PR #41635 into master
* refs/pull/41635/head:
	qa: increase fragmentation to improve uniform distribution

Reviewed-by: Ramana Raja <rraja@redhat.com>
2021-06-02 08:18:22 -07:00
Sridhar Seshasayee
328271d587
qa/tasks: Enhance wait_until_true() to check & retry recovery progress
With mclock scheduler enabled, the recovery throughput is throttled based
on factors like the type of mclock profile enabled, the OSD capacity among
others. Due to this the recovery times may vary and therefore the existing
timeout of 120 secs may not be sufficient.

To address the above, a new method called _is_inprogress_or_complete() is
introduced in the TestProgress Class that checks if the event with the
specified 'id' is in progress by checking the 'progress' key of the
progress command response. This method also handles the corner case where
the event completes just before it's called.

The existing wait_until_true() method in the CephTestCase Class is
modified to accept another function argument called "check_fn". This is
set to the _is_inprogress_or_complete() function described earlier in the
"test_turn_off_module" test that has been observed to fail due to the
reasons already described above. A retry mechanism of a maximum of 5
attempts is introduced after the first timeout is hit. This means that
the wait can extend up to a maximum of 600 secs (120 secs * 5) as long as
there is recovery progress reported by the 'ceph progress' command result.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-06-02 14:19:48 +05:30
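The retry scheme described in 328271d587 can be sketched as below. This is a simplified standalone function, not the CephTestCase method: the `sleep` and `max_retries` parameters and the use of `TimeoutError` are assumptions added so the sketch runs on its own.

```python
import time


def wait_until_true(condition, timeout, check_fn=None, period=5,
                    max_retries=5, sleep=time.sleep):
    """Wait for condition() to become true, extending the wait on progress.

    When the timeout is hit, check_fn() (if given) is consulted; if it
    reports progress and retries remain, the timer is reset.
    """
    elapsed = 0
    retries = 0
    while not condition():
        if elapsed >= timeout:
            if check_fn is not None and check_fn() and retries < max_retries:
                # Progress is still being made: start another timeout window.
                elapsed = 0
                retries += 1
                continue
            raise TimeoutError(
                f"timed out after {elapsed}s and {retries} retries")
        sleep(period)
        elapsed += period
    return retries
```

Resetting the timer only while `check_fn` reports progress bounds the total wait, so a recovery that has genuinely stalled still fails.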
Yuval Lifshitz
679ddf5d11
Merge pull request #41026 from TRYTOBE8TME/wip-rgw-rabbitmq
qa/tasks: Adding RabbitMQ task for bucket notification tests
2021-06-02 07:47:39 +03:00