Commit Graph

8602 Commits

Author SHA1 Message Date
Patrick Donnelly
2cd3494771 qa: update mds_pre_upgrade to no longer stop standbys
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:54 -07:00
Patrick Donnelly
8e0b9bcad6 qa: update mds_pre_upgrade to disable standby-replay
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:54 -07:00
Patrick Donnelly
295971b9c6 qa: add tests for compat manipulation and upgrade
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 16:28:54 -07:00
Patrick Donnelly
5ae7b9202b Merge PR #42513 into master
* refs/pull/42513/head:
	qa: multifs already enabled as default

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-30 14:03:36 -07:00
Patrick Donnelly
c99a5e56a6 Merge PR #42201 into master
* refs/pull/42201/head:
	qa: fold frag confs into conf/mds.yaml

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
2021-07-30 14:00:19 -07:00
Sridhar Seshasayee
464e9ea6c0 qa/standalone/misc: ver-health.sh: Increase wait_for_health_string() timeout
Modified test cases:

1. ver-health.sh:
  a. TEST_check_version_health_1():
    To avoid intermittent timeouts observed in wait_for_health_string(),
    increase the wait time to 20 secs.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
33d2a2c93b qa/standalone/scrub: Force a subset of scrub tests to use "wpq" scheduler
The following tests in the test files mentioned below use the
"osd_scrub_sleep" option to introduce delays during scrubbing to help
determine scrubbing states, validate reservations during scrubbing etc..
This works when using the "wpq" scheduler.

But when the "mclock_scheduler" is enabled, the "osd_scrub_sleep" is
disabled and overridden to 0. This is done to delegate the scheduling of
the background scrubs to the "mclock_scheduler" based on the set QoS
parameters. Due to this, the checks to verify the scrub states,
reservations etc. fail since the window to check them is very short
due to scrubs completing very quickly. This affects a small subset of
scrub tests mentioned below,

1. osd-scrub-dump.sh -> TEST_recover_unexpected()
2. osd-scrub-repair.sh -> TEST_auto_repair_bluestore_tag()
3. osd-scrub-test.sh -> TEST_scrub_abort(), TEST_deep_scrub_abort()

Only for the above tests, until there's a reliable way to query scrub
states with "--osd-scrub-sleep" set to 0, the "osd_op_queue" config
option is set to "wpq".

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
f658ff3511 qa/standalone/erasure-code: Modify erasure-code tests for mclock scheduler
Modified test cases:

1. test-erasure-eio.sh:
  a. Test_ec_backfill_unfound():
    - Set osd_mclock_profile to high_recovery_ops profile.
    - Increase the wait for backfill_unfound timeout to 240 secs.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
bdf36cf045 qa/standalone/osd-backfill: Modify backfill tests for mclock scheduler
Modified test cases:

1. osd-backfill-prio.sh:
  Set osd_op_queue = wpq for all tests since the mclock doesn't
  consider recovery priority as part of its scheduling algorithm.

2. osd-backfill-space.sh:
  Set osd_mclock_profile to high_recovery_ops and increase the wait
  for backfills timeout to 1200 secs for the following tests:
  - TEST_backfill_test_simple()
  - TEST_backfill_test_multi()
  - TEST_backfill_test_sametarget()
  - TEST_backfill_multi_partial()
  - TEST_ec_backfill_simple()
  - TEST_ec_backfill_multi()
  - SKIP_TEST_ec_backfill_multi_partial()
  - SKIP_TEST_ec_backfill_multi_partial()

3. osd-backfill-stats:
  - TEST_backfill_ec_down_all_out():
   Set osd_mclock_profile to high_recovery_ops and increase the wait
   for recovery timeout to 240 secs.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
2c577040cb qa/standalone/osd: Modify osd tests for mclock scheduler
Modified test cases:
1. osd-recovery-prio.sh:
   Set osd_op_queue = wpq for all tests since mclock
   doesn't consider recovery priority as part of its
   scheduling algorithm.

2. osd-recovery-stats.sh:
   a. TEST_recovery_undersized():
     - Set osd_mclock_profile to high_recovery_ops profile.
     - Increase wait for recovery timeout to 300 secs.

3. osd-rep-recov-eio.sh:
   a. TEST_rep_backfill_unfound():
     - Set osd_mclock_profile to high_recovery_ops profile.
     - Increase wait for backfill_unfound to 360 secs.

4. repeer-on-acting-back.sh:
   a. TEST_repeer_on_down_act():
     - Set osd_mclock_profile to high_recovery_ops profile.
       (To improve the test duration)

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
5a85a6a035 qa/standalone: Modify ceph-helpers.sh tests for mclock scheduler.
List of changes:

1. Remove the enforcement to use osd_op_queue=wpq when an osd is brought
   up in the following functions:
   - run_osd()
   - run_osd_filestore() and
   - activate_osd()

2. New functions:
   - get_op_scheduler() - Get the current osd_op_queue for an osd.

3. Modified test cases:
   - test_run_osd() - Add check for osd_max_backfill count.
     The mclock scheduler overrides the count to 1000.

4. New test cases:
   - test_activate_osd_after_mark_down()
   - test_get_op_scheduler()

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Brad Hubbard
434b325c40
Merge pull request #42442 from badone/wip-insights-reports-non-persistent-storage
Don't persist report data

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-07-29 09:19:32 +10:00
Patrick Donnelly
665b36de4e Merge PR #42349 into master
* refs/pull/42349/head:
	mon/MDSMonitor: propose if FSMap struct_v is too old
	mon/MDSMonitor: give a proper error message if FSMap struct_v is too old
	mds/FSMap: use DECODE_OLDEST to gate FSMap version
	qa: add tests for fs dump of epoch and trimming
	qa: add file system support for dumping epoch
	mon/MDSMonitor: return mon_mds_force_trim_to even if equal to current epoch
	mon: add debugging for trimming methods
	mon: fix debug spacing
	qa: add nofs upgrade suite

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
2021-07-28 10:45:08 -07:00
Patrick Donnelly
4f0f51e4cb Merge PR #41025 into master
* refs/pull/41025/head:
	qa: wait pgs to be clean before using the pools
	qa: ignore PG_RECOVERY_FULL and PG_DEGRADED for mds-full
	qa: wait more time since there have many more pgs than before
	qa: do not multiple the full ratio twice
	qa: do not raise for kclient for _fsync test
	qa: use the pg autoscale mode to calcuate the pg_num
	qa: set the object_size to 1M
	qa: move the is_full() to parent class

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-28 10:34:12 -07:00
Patrick Donnelly
5ddaa36d17 qa: add tests for fs dump of epoch and trimming
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-28 07:07:05 -07:00
Patrick Donnelly
ee899d9a44 qa: add file system support for dumping epoch
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-28 07:07:05 -07:00
Patrick Donnelly
9941188116 qa: add nofs upgrade suite
This adds an upgrade suite to ensure that a Ceph cluster without a
CephFS file system does not blow up on upgrade (in particular, that the
MDSMonitor does not trip). This was developed to potentially reproduce
tracker 51673 but the actual cause for that issue was an old encoding
for the MDSMap which was obsoleted in Pacific. You must create a cluster
older than the FSMap (~Hammer or Infernalis) to reproduce. In any case,
this upgrade suite may be useful in the future so let's keep it!

Related-to: https://tracker.ceph.com/issues/51673
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-28 07:07:05 -07:00
Xiubo Li
361ee535dd qa: multifs already enabled as default
Since pacific already mark multifs enabled as defaut.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-07-28 13:56:10 +08:00
Sage Weil
ecc0b4597b Merge PR #42406 into master
* refs/pull/42406/head:
	mgr/nfs: remove unused 'realm' arg for 'nfs export create rgw'
	doc/mgr/rook: update title
	doc/mgr/nfs: reference customizing ingress
	doc/mgr/nfs: add section for manual ganesha config; reframe
	doc/mgr/nfs: document ingress in more detail
	doc/mgr/nfs: typo
	doc/mgr/nfs: add note about incomplete ingress
	qa/suites/orch/cephadm: add rgw nfs export test
	mgr/cephadm: ingress: tolerate no daemons
	mgr/nfs: add --squash option to 'nfs export create rgw ...'
	mgr/nfs: use bucket owner creds for rgw bucket export
	mgr/cephadm: use new CEPH_IMAGE_TYPES for all daemons using ceph container image
	qa/tasks/python: simple task to run python code
	doc/mgr/nfs: revisions
	mgr/nfs/export: nicer exceptions on cap update

Reviewed-by: Varsha Rao <varao@redhat.com>
2021-07-27 14:11:56 -04:00
Xiubo Li
a448d1c3ee qa: wait pgs to be clean before using the pools
Or in some use cases, like the mds-full tests, we will hit the
"PG_AVAILABILITY" warning.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-07-27 09:54:03 +08:00
Xiubo Li
3456ff2628 qa: ignore PG_RECOVERY_FULL and PG_DEGRADED for mds-full
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-07-27 09:48:56 +08:00
Xiubo Li
999c787ac6 qa: wait more time since there have many more pgs than before
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-07-27 09:48:56 +08:00
Xiubo Li
ba3833a622 qa: do not multiple the full ratio twice
The cluster has already multiple the full ratio before returning
the "max_avail".

Fixes: https://tracker.ceph.com/issues/50984
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-07-27 09:48:56 +08:00
Xiubo Li
a96ee41908 qa: do not raise for kclient for _fsync test
For kclient, the write() will return -ENOSPC instead of the fsync().

Fixes: https://tracker.ceph.com/issues/45434
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-07-27 09:48:56 +08:00
Xiubo Li
c1cea71299 qa: use the pg autoscale mode to calcuate the pg_num
Setting the pg_num to 8 is too small that some osds maybe not covered by the
pools, some osds maybe overloaded. Remove the hardcodeing pg_num here and let
the pg autoscale mode to calculate it as needed, and at the same time set the
pg_num_min to 64 to avoid the pg_num to small.

If ec pool is used, for the test cases most datas will go to the ec pool and
the primary replicated pool will store a small amount of metadata for all the
files only, so set the target size ratio to 0.05 should be enough.

Fixes: https://tracker.ceph.com/issues/45434
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-07-27 09:48:56 +08:00
Xiubo Li
c7837484d9 qa: set the object_size to 1M
Set the object_size to 1MB to make the objects destributed more even
among the OSDs.

Fixes: https://tracker.ceph.com/issues/45434
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-07-27 09:48:56 +08:00
Xiubo Li
f4288f2a9b qa: move the is_full() to parent class
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-07-27 09:48:56 +08:00
Sage Weil
f8f7b86571 Merge PR #42292 into master
* refs/pull/42292/head:
	qa/suites/upgrade: log_to_journald=false

Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-07-26 19:23:26 -04:00
Sage Weil
ac63ab6125 Merge PR #42489 into master
* refs/pull/42489/head:
	qa/suites/upgrade/pacific-x/stress-split: do not avoid_pacific_features

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-07-26 19:23:03 -04:00
Sage Weil
cd089ee74e qa/suites/orch/cephadm: add rgw nfs export test
Signed-off-by: Sage Weil <sage@newdream.net>
2021-07-26 16:23:17 -04:00
Sage Weil
45737fe95a qa/tasks/python: simple task to run python code
Signed-off-by: Sage Weil <sage@newdream.net>
2021-07-26 16:23:06 -04:00
Patrick Donnelly
83d252cc30 qa: fold frag confs into conf/mds.yaml
These overrides are standard for all configurations. The config to
enable fragmentation is also long removed.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-26 07:14:38 -07:00
Neha Ojha
c9ad86e9c5
Merge pull request #42438 from tchaikov/wip-qa-test_module_selftest
qa/tasks/mgr: clean crash reports before waiting for clean

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2021-07-23 15:39:14 -07:00
Patrick Donnelly
0e71ea4a13
Merge PR #42106 into master
* refs/pull/42106/head:
	mds: create file system with specific ID

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-23 11:15:33 -07:00
Patrick Donnelly
4ee63174e6
Merge PR #42431 into master
* refs/pull/42431/head:
	cmake: add "mypy" back to tox envlist of "qa""
	qa/tasks/vstart_runner: add optional "sudo" param to _run_python()

Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-07-22 07:56:40 -07:00
Milind Changire
6edebf8a1f
Merge pull request #42329 from vshankar/wip-cephfs-mirror-dir-remove-registery
cephfs-mirror: record directory path cancel in DirRegistry

Reviewed-by: Milind Changire <mchangir@redhat.com>
2021-07-22 18:20:59 +05:30
Brad Hubbard
32d1cca2d9 qa/tasks/mgr/test_insights: Remove test for persistent checks
This test makes no sense if we are no longer persisting the store.

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
2021-07-22 15:02:01 +10:00
Kefu Chai
ffbc3164d4 cmake: add "mypy" back to tox envlist of "qa""
This reverts commit 286e46578d.

since 0017df2006 has been merged, let's
add mypy back.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-22 10:09:21 +08:00
Kefu Chai
0017df2006 qa/tasks/vstart_runner: add optional "sudo" param to _run_python()
to silence mypy warnings like:

tasks/vstart_runner.py:691: error: Definition of "_run_python" in base class "LocalCephFSMount" is incompatible with definition in base class "CephFSMount"
tasks/vstart_runner.py:705: error: Definition of "_run_python" in base class "LocalCephFSMount" is incompatible with definition in base class "CephFSMount"

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-22 10:08:27 +08:00
Neha Ojha
c9f8846b7f
Merge pull request #41907 from kamoltat/wip-ksirivad-progress-time-interval
pybind/mgr/progress: introduce 5 second sleep interval

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2021-07-21 16:53:38 -07:00
Casey Bodley
e3a6377099
Merge pull request #42196 from cbodley/wip-qa-rgw-rm-cephadm
qa/rgw: remove rgw_cephadm.yaml from rgw/singleton suite

Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
2021-07-21 13:14:35 -04:00
Casey Bodley
255293bd80
Merge pull request #42317 from cbodley/wip-39657
rgw multisite: metadata sync treats all errors as 'transient' for retry

Reviewed-by: Shilpa Jagannath <smanjara@redhat.com>
2021-07-21 13:13:54 -04:00
Casey Bodley
1acee2ab76
Merge pull request #42361 from cbodley/wip-49747
qa/rgw: add failing tempest test to blocklist

Reviewed-by: Ali Maredia <amaredia@redhat.com>
2021-07-21 13:10:30 -04:00
Kefu Chai
ec8a40b08f qa/tasks/mgr: clean crash reports before waiting for clean
otherwise we have following warning in health report

{"status":"HEALTH_WARN","checks":{"RECENT_MGR_MODULE_CRASH":{"severity":"HEALTH_WARN","summary":{"message":"1 mgr modules have recently crashed","count":1},"muted":false}},"mutes":[]}

and it does not disappear after the test waits for 30 seconds.
and the tasks.mgr.test_module_selftest.TestModuleSelftest test
fails like:

2021-07-21T09:59:52.560 INFO:tasks.cephfs_test_runner:======================================================================
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:ERROR: test_module_commands (tasks.mgr.test_module_selftest.TestModuleSelftest)
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/mgr/test_module_selftest.py", line 201, in
test_mo
dule_commands
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner:    self.wait_for_health_clear(timeout=30)
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/ceph_test_case.py", line 172, in
wait_for_health_c
lear
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner:    self.wait_until_true(is_clear, timeout)
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/ceph_test_case.py", line 209, in
wait_until_true
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner:    raise TestTimeoutError("Timed out after {0}s and {1} retries".format(elapsed, retry_count))
2021-07-21T09:59:52.564 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 30s and 0 retries

in this change, the crash reports are nuked right after
we see the warning, so that we can have a clean health
report.

Fixes: https://tracker.ceph.com/issues/51743
Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-21 22:46:18 +08:00
Neha Ojha
2c528248df
Merge pull request #42410 from ronen-fr/wip-ronenf-standalone-repair
qa/standalone: fixing the timings when waiting for deep-scrub to start

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-21 06:57:41 -07:00
Sebastian Wagner
8e513c6c9b
Merge pull request #42430 from tchaikov/wip-cmake-qa-drop-mypy
cmake: drop "mypy" from tox envlist of "qa"

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
2021-07-21 08:58:18 +02:00
Kefu Chai
056065c7e6
Merge pull request #42429 from neha-ojha/wip-51638-cleanup
qa/*/test_envlibrados_for_rocksdb.sh: remove OS specific configuration

Reviewed-by: David Galloway <dgallowa@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-07-21 11:24:40 +08:00
Kefu Chai
286e46578d cmake: drop "mypy" from tox envlist of "qa"
this change partially reverts 81305b0da9,
otherwise we have following errors:

tasks/vstart_runner.py:691: error: Definition of "_run_python" in base class "LocalCephFSMount" is incompatible with definition in base class "CephFSMount"
tasks/vstart_runner.py:705: error: Definition of "_run_python" in base class "LocalCephFSMount" is incompatible with definition in base class "CephFSMount"

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-07-21 11:09:20 +08:00
Kefu Chai
dc1a8a8b0e
Merge pull request #41929 from sebastian-philipp/fix-qa-tox
qa: Various make check fixes

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-07-21 00:36:59 +08:00
Sebastian Wagner
cb553909d3
Merge pull request #41280 from sebastian-philipp/test_cephadm-stdin
qa/workunits/test_cephadm: Also test stdin

Reviewed-by: Michael Fritch <mfritch@suse.com>
2021-07-20 17:33:20 +02:00