RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2025-01-08 20:21:33 +00:00

Author	SHA1	Message	Date
Ronen Friedman	10909c3cba	osd/scrub: update the stand-alone tests to check 'scrub scheduling' entries Analyzing and verifying the relevant entries in 'pg query' and 'pg dump' output. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>	2021-11-05 17:07:57 +02:00
Ronen Friedman	52e9fa16ef	tests: modify osd-scrub-repair to match PR #43239 changes PR #43239 has modified ECBackend::get_hash_info() behavior. Modified the standalone scrub test to match. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>	2021-10-20 06:42:51 +00:00
Zack Cerza	b57539dc94	Revert "qa: support isal ec test for aarch64" This commit has been causing scheduled jobs to request e.g. aarch64 smithi machines, which don't exist. The dispatcher then tries to find them forever, requiring the dispatcher to be killed and restarted. The queue will sit idle until someone notices the problem. Signed-off-by: Zack Cerza <zack@redhat.com>	2021-10-12 12:53:58 -06:00
Dai Zhiwei	eaa385f3da	qa: support isal ec test for aarch64 modified: qa/standalone/erasure-code/test-erasure-code-plugins.sh new file: qa/suites/rados/thrash-erasure-code-isa/arch/aarch64.yaml Signed-off-by: Dai Zhiwei <daizhiwei3@huawei.com>	2021-10-08 14:37:25 +08:00
Aishwarya Mathuria	1b4e416f81	osd/scrub: Add scrub duration to pg dump stats Addition of a new column, SCRUB_DURATION, to the pg stats that stores the time taken for a PG scrub. Fixes: https://tracker.ceph.com/issues/52605 Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>	2021-10-01 13:27:27 +05:30
Neha Ojha	e273418bbb	Merge pull request #42604 from sseshasa/wip-skip-osd-benchmark osd: Add config option to skip running the osd benchmark during init and update documentation. Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Neha Ojha <nojha@redhat.com>	2021-09-08 11:03:09 -07:00
Sridhar Seshasayee	f539bedc96	qa/standalone: Add standalone test to validate osd-mclock-skip-benchmark option Add a standalone test - test_activate_osd_skip_benchmark() in ceph-helpers.sh that exercises the osd-mclock-skip-benchmark option. Fixes: https://tracker.ceph.com/issues/52025 Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>	2021-09-01 14:19:03 +05:30
Igor Fedotov	0b0f8ef12f	qa/osd-bluefs-volume-ops: reproduce bluefs migrate bug Reproduces: https://tracker.ceph.com/issues/40434 Signed-off-by: Igor Fedotov <ifedotov@suse.com>	2021-08-31 16:23:22 +03:00
Sridhar Seshasayee	464e9ea6c0	qa/standalone/misc: ver-health.sh: Increase wait_for_health_string() timeout Modified test cases: 1. ver-health.sh: a. TEST_check_version_health_1(): To avoid intermittent timeouts observed in wait_for_health_string(), increase the wait time to 20 secs. Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>	2021-07-30 18:16:00 +05:30
Sridhar Seshasayee	33d2a2c93b	qa/standalone/scrub: Force a subset of scrub tests to use "wpq" scheduler The following tests in the test files mentioned below use the "osd_scrub_sleep" option to introduce delays during scrubbing to help determine scrubbing states, validate reservations during scrubbing etc. This works when using the "wpq" scheduler. But when the "mclock_scheduler" is enabled, the "osd_scrub_sleep" is disabled and overridden to 0. This is done to delegate the scheduling of the background scrubs to the "mclock_scheduler" based on the set QoS parameters. Due to this, the checks to verify the scrub states, reservations etc. fail since the window to check them is very short due to scrubs completing very quickly. This affects a small subset of scrub tests mentioned below, 1. osd-scrub-dump.sh -> TEST_recover_unexpected() 2. osd-scrub-repair.sh -> TEST_auto_repair_bluestore_tag() 3. osd-scrub-test.sh -> TEST_scrub_abort(), TEST_deep_scrub_abort() Only for the above tests, until there's a reliable way to query scrub states with "--osd-scrub-sleep" set to 0, the "osd_op_queue" config option is set to "wpq". Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>	2021-07-30 18:16:00 +05:30
Sridhar Seshasayee	f658ff3511	qa/standalone/erasure-code: Modify erasure-code tests for mclock scheduler Modified test cases: 1. test-erasure-eio.sh: a. Test_ec_backfill_unfound(): - Set osd_mclock_profile to high_recovery_ops profile. - Increase the wait for backfill_unfound timeout to 240 secs. Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>	2021-07-30 18:16:00 +05:30
Sridhar Seshasayee	bdf36cf045	qa/standalone/osd-backfill: Modify backfill tests for mclock scheduler Modified test cases: 1. osd-backfill-prio.sh: Set osd_op_queue = wpq for all tests since the mclock doesn't consider recovery priority as part of its scheduling algorithm. 2. osd-backfill-space.sh: Set osd_mclock_profile to high_recovery_ops and increase the wait for backfills timeout to 1200 secs for the following tests: - TEST_backfill_test_simple() - TEST_backfill_test_multi() - TEST_backfill_test_sametarget() - TEST_backfill_multi_partial() - TEST_ec_backfill_simple() - TEST_ec_backfill_multi() - SKIP_TEST_ec_backfill_multi_partial() - SKIP_TEST_ec_backfill_multi_partial() 3. osd-backfill-stats: - TEST_backfill_ec_down_all_out(): Set osd_mclock_profile to high_recovery_ops and increase the wait for recovery timeout to 240 secs. Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>	2021-07-30 18:16:00 +05:30
Sridhar Seshasayee	2c577040cb	qa/standalone/osd: Modify osd tests for mclock scheduler Modified test cases: 1. osd-recovery-prio.sh: Set osd_op_queue = wpq for all tests since mclock doesn't consider recovery priority as part of its scheduling algorithm. 2. osd-recovery-stats.sh: a. TEST_recovery_undersized(): - Set osd_mclock_profile to high_recovery_ops profile. - Increase wait for recovery timeout to 300 secs. 3. osd-rep-recov-eio.sh: a. TEST_rep_backfill_unfound(): - Set osd_mclock_profile to high_recovery_ops profile. - Increase wait for backfill_unfound to 360 secs. 4. repeer-on-acting-back.sh: a. TEST_repeer_on_down_act(): - Set osd_mclock_profile to high_recovery_ops profile. (To improve the test duration) Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>	2021-07-30 18:16:00 +05:30
Sridhar Seshasayee	5a85a6a035	qa/standalone: Modify ceph-helpers.sh tests for mclock scheduler. List of changes: 1. Remove the enforcement to use osd_op_queue=wpq when an osd is brought up in the following functions: - run_osd() - run_osd_filestore() and - activate_osd() 2. New functions: - get_op_scheduler() - Get the current osd_op_queue for an osd. 3. Modified test cases: - test_run_osd() - Add check for osd_max_backfill count. The mclock scheduler overrides the count to 1000. 4. New test cases: - test_activate_osd_after_mark_down() - test_get_op_scheduler() Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>	2021-07-30 18:16:00 +05:30
Neha Ojha	2c528248df	Merge pull request #42410 from ronen-fr/wip-ronenf-standalone-repair qa/standalone: fixing the timings when waiting for deep-scrub to start Reviewed-by: Neha Ojha <nojha@redhat.com> Reviewed-by: Sridhar Seshasayee <sseshasa@redhat.com>	2021-07-21 06:57:41 -07:00
Ronen Friedman	ed45acee34	qa/standalone: fixing the timings when waiting for deep-scrub to start initiate_and_fetch_state() initiates a scrub, then polls the published PG state looking for 'scrubbing'. Calling flush_pg_stats() as part of the polling process might cause the scrub and the following recovery to be missed altogether. Note: this polling mechanism is definitely not robust. Will be redesigned in the future. Fixes: https://tracker.ceph.com/issues/51581 Signed-off-by: Ronen Friedman <rfriedma@redhat.com>	2021-07-20 08:57:37 +03:00
Sage Weil	01c006c2de	Merge PR #42041 into master * refs/pull/42041/head: mgr/restful: ignore min/max_size test/crush: drop min/max_size refs qa/workunits/mon/pool_ops: remove test for min/max_size check qa: scrub a few remaining mentions of ruleset qa/standalone/mon/osd-*: fix tests PendingReleaseNotes: note min/max_size removal mgr/dashboard: remove max/min_size and ruleset mon/OSDMonitor: fix calls to CrushTester crush: eliminate min_size and max_size test/cli/crushtool: renumber rulesets in test maps crushtool: require min/max or num-rep for --test crush: remove last traces of 'ruleset' test/cli/crushtool: use 'id' instead of 'ruleset' in crush inputs crushtool: take --min-rep and --max-rep explicitly crush/CrushTester: drop --ruleset doc: scrub 'ruleset' from docs src/erasure-code: rule, not ruleset mon/OSDMonitor: remove check_crush_rule() callers mon/OSDMonitor: rule, not ruleset crushtool: remove check for overlapped rules crush/CrushWrapper: get_osd_pool_default_crush_replicated_ruleset -> rule crush: remove find_rule() mon/OSDMonitor: use pool's crush rule directly osd/OSDMap: drop checks for ruleset == ruleid osd/OSDMap: use pool's crush rule_id directly mon/PGMap: use pool's crush_rule directly mon/OSDMonitor: remove crush ruleset->rule rewrite Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Avan Thakkar <athakkar@redhat.com>	2021-07-14 14:38:59 -04:00
Sridhar Seshasayee	84cab65e3a	qa/standalone: Add missing teardowns to a subset of osd-scrub-repair tests Tests identified with missing teardown within osd-scrub-repair.sh: 1. TEST_periodic_scrub_replicated() 2. TEST_scrub_warning() 3. TEST_request_scrub_priority() Centralize setup and teardown within the run() function for all the tests. Fixes: https://tracker.ceph.com/issues/51580 Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>	2021-07-08 13:31:42 +05:30
Sridhar Seshasayee	a96c34f0ee	qa/standalone: Add missing teardowns to a subset of osd tests The following files and tests in them did not tear down the cluster after a test completed. 1. osd/osd-force-create.sh 2. osd/osd-reuse-id.sh 3. osd/pg-split-merge.sh This wouldn't cause issues if the tests are run individually. But when running all the tests in the files mentioned above, it could introduce unexpected test failures down the line. For example, multiple tests may create pools with the same name and if they are not cleaned up properly, this could result in unexpected failures in a subsequent test. Fixes: https://tracker.ceph.com/issues/51580 Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>	2021-07-08 13:28:31 +05:30
Sage Weil	4fc7c3093c	qa/standalone/mon/osd-*: fix tests Signed-off-by: Sage Weil <sage@newdream.net>	2021-07-07 10:31:57 -04:00
Patrick Donnelly	d6c66f3fa6	qa,pybind/mgr: allow disabling .mgr pool This is mostly for testing: a lot of tests assume that there are no existing pools. These tests relied on a config to turn off creating the "device_health_metrics" pool which generally exists for any new Ceph cluster. It would be better to make these tests tolerant of the new .mgr pool but clearly there's a lot of these. So just convert the config to make it work. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2021-06-11 19:35:17 -07:00
Sridhar Seshasayee	94826eaadc	qa/standalone: Use osd op queue = wpq in activate_osd() This change is a follow-up to commit `b6e9c0903d` that set the scheduler to wpq in run_osd() and run_osd_filestore(). In addition, activate_osd() too has to set the scheduler type to 'wpq' in order to be consistent and avoid test failures. The above is a temporary measure until all the standalone tests are modified to run well with the mclock_scheduler. Fixes: https://tracker.ceph.com/issues/51074 Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>	2021-06-09 15:02:58 +05:30
Ronen Friedman	d6eb3e3a3c	test: recovery_scrub: do not display 'repair' status on auto-repair deep-scrub A new test: auto_repair_bluestore_tag. Based on auto_repair_bluestore_basic. Sets auto-repair, starts a periodic deep-scrub, then verifies that the PG state, while scrubbing, is 'scrubbing+deep' and not 'scrubbing+deep+repair'. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>	2021-05-18 17:43:28 +03:00
Neha Ojha	b6e9c0903d	qa/standalone: use osd op queue = wpq mclock_scheduler is now the default and some of these tests need to be modified to run well with it. Continue using wpq until https://tracker.ceph.com/issues/50574 is addressed. Signed-off-by: Neha Ojha <nojha@redhat.com>	2021-05-06 17:54:38 +00:00
Loïc Dachary	7fe0ac7c11	qa: verify the benefits of mempool cacheline optimization There already is a test to verify the mempool sharding works, in the sense that it uses at least half of the variables available to count the number of allocated objects and their total size. This new test verifies that, with sharding, object counting is at least twice as fast as without sharding. It also collects cacheline contention data with the perf c2c tool. The manual analysis of this data shows the optimization gain is indeed related to cacheline contention. Fixes: https://tracker.ceph.com/issues/49896 Signed-off-by: Loïc Dachary <loic@dachary.org>	2021-04-30 12:11:13 +08:00
Ilya Dryomov	7eb9c5ddb2	Merge branch 'master' into wip-unauthorized-gids Sync up with master up to commit `3d8e73b266` ("Merge pull request #40731 from tchaikov/wip-yamlize-options"). Specifically, bring in src/common/options.cc yamlization and move new auth-related options into src/common/options/global.yaml.in. Conflicts: src/common/options.cc src/common/options/global.yaml.in Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2021-04-13 15:42:06 +02:00
Sage Weil	dcd90a1c8d	Merge PR #40626 into master * refs/pull/40626/head: qa/suites/rados/objectstore: separate store_test tests qa/standalone: split osd/ into 2 directories Reviewed-by: Josh Durgin <jdurgin@redhat.com>	2021-04-12 22:38:49 -04:00
Sage Weil	0f65e5cffa	qa/standalone: split osd/ into 2 directories The whole osd/ directory takes 3 hours to run. Of that, about half is osd-backfill*: 2021-04-05T20:38:55.932 INFO:tasks.workunit:Running workunit osd/osd-backfill-prio.sh... 2021-04-05T20:47:27.184 INFO:tasks.workunit:Running workunit osd/osd-backfill-recovery-log.sh... 2021-04-05T20:55:59.497 INFO:tasks.workunit:Running workunit osd/osd-backfill-space.sh... 2021-04-05T21:48:47.549 INFO:tasks.workunit:Running workunit osd/osd-backfill-stats.sh... 2021-04-05T22:17:09.197 INFO:tasks.workunit:Running workunit osd/osd-bench.sh... Signed-off-by: Sage Weil <sage@newdream.net>	2021-04-12 09:59:17 -05:00
Ronen Friedman	b8045f7b18	Revert "test: Add test for scrub parallelism" This reverts commit `dd63577ab3`. As `08c3ede084` (the tested functionality) is reverted. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>	2021-04-07 08:37:03 +03:00
Sage Weil	72c4fc75ad	qa/standalone: default to disable insecure global id reclaim Signed-off-by: Sage Weil <sage@newdream.net>	2021-04-06 17:29:23 -04:00
Prashant D	92eb39ee6f	crush/CrushCompiler: print weight with uniform precision Fixes: https://tracker.ceph.com/issues/48508 Signed-off-by: Prashant D <pdhange@redhat.com>	2021-03-29 14:44:49 +11:00
David Zafman	eec821b6e5	test: osd-recovery-scrub.sh: Test fails if no scrubs happened for a recovering pg Change TEST_recovery_scrub_2 to create more objects and use osd_recovery_sleep to prevent recovery from finishing before we start to scrub. Verify that at least 1 scrub was started while the pg was recovering. Fixes: https://tracker.ceph.com/issues/49779 Signed-off-by: David Zafman <dzafman@redhat.com>	2021-03-14 16:19:46 -07:00
David Zafman	a4fd1d650e	Revert "qa/standalone/scrub/osd-recovery-scrub: fix unnoticed recovery state" This reverts commit `1323bdb839`. The test needs to scrub while recovery is in progress, so catching recovery from the logs after the fact isn't the proper setup. We can use osd_recovery_sleep config. Signed-off-by: David Zafman <dzafman@redhat.com>	2021-03-13 11:40:55 -08:00
David Zafman	dd63577ab3	test: Add test for scrub parallelism Signed-off-by: David Zafman <dzafman@redhat.com>	2021-03-05 11:41:26 -08:00
Sage Weil	5e197a21e6	Merge PR #39455 into master * refs/pull/39455/head: doc/man/8/ceph: document --max option src/test/osd/safe-to-destroy: adjust test ceph: print command output to stdout even on error mgr/DaemonServer: include details in 'osd ok-to-stop' output mgr: add --max <n> to 'osd ok-to-stop' command mgr: relax osd ok-to-stop condition on degraded pgs Reviewed-by: Neha Ojha <nojha@redhat.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Kefu Chai <kchai@redhat.com>	2021-02-27 10:15:27 -05:00
Sage Weil	33dee7d7bf	crush/CrushWrapper: update shadow trees on update_item() insert_item() already does this, but update_item did not. Fixes: https://tracker.ceph.com/issues/48065 Signed-off-by: Sage Weil <sage@newdream.net>	2021-02-22 14:21:04 -06:00
Sage Weil	722f57dee1	mgr: add --max <n> to 'osd ok-to-stop' command Given an initial (set of) osd(s), provide up to N OSDs that can be stopped together without making PGs become unavailable. This can be used to quickly identify large(r) batches of OSDs that can be stopped together to (for example) upgrade. Signed-off-by: Sage Weil <sage@newdream.net>	2021-02-20 09:53:51 -05:00
Kefu Chai	8dc097ff46	qa/standalone/mon/misc: verify that len(monmap.features.persistent) == 9 in `beb62c029a`, FEATURE_QUINCY was added to ceph::features::mon::get_persistent(), so update the test accordingly. Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-01-30 22:45:20 +08:00
Sage Weil	7bbc92eda3	mon: updates for quincy Signed-off-by: Sage Weil <sage@newdream.net>	2021-01-28 13:29:28 -06:00
Neha Ojha	5c11f40c12	Merge pull request #38856 from dzafman/wip-48789 test: Fix osd-scrub-snaps.sh to handle DB format change Reviewed-by: Ronen Friedman <rfriedma@redhat.com> Reviewed-by: Neha Ojha <nojha@redhat.com>	2021-01-15 16:27:59 -08:00
Neha Ojha	6fc9166af4	Merge pull request #38726 from ronen-fr/wip-ronenf-48720 qa/standalone/scrub/osd-recovery-scrub: handle primary change when waiting for scrub Reviewed-by: David Zafman <dzafman@redhat.com>	2021-01-15 13:46:30 -08:00
David Zafman	af9befb0f4	test: Fix osd-scrub-snaps.sh to handle DB format change Caused by: `f9c95fa7fc` Fixes: https://tracker.ceph.com/issues/48789 Signed-off-by: David Zafman <dzafman@redhat.com>	2021-01-15 10:35:30 -08:00
David Zafman	4814648155	test: osd-recovery-prio.sh replace sleep with wait for both PGs recovering fixes: https://tracker.ceph.com/issues/48842 Signed-off-by: David Zafman <dzafman@redhat.com>	2021-01-11 17:30:00 -08:00
Ronen Friedman	1323bdb839	qa/standalone/scrub/osd-recovery-scrub: fix unnoticed recovery state The 'recovering' state is transitory. Existing code looks for it by polling 'pg stat', missing from time to time. New version searches the tails of the relevant OSDs' logs. Fixes: https://tracker.ceph.com/issues/48719 Signed-off-by: Ronen Friedman <rfriedma@redhat.com>	2021-01-04 13:29:41 +02:00
Ronen Friedman	bb848cfd90	qa/standalone/ceph-helpers.sh: log meaningful PIDs for run_in_background() While the relevant comment says: '# Execute the command and prepend the output with its pid' the actual PID logged is the same for all background processes, which isn't very helpful. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>	2020-12-28 10:47:02 +02:00
Ronen Friedman	445db7f171	qa/standalone/scrub/osd-recovery-scrub: handle a Primary change Stop waiting for a scrub to happen if the Primary for the target PG changes. Fixes: https://tracker.ceph.com/issues/48720 Signed-off-by: Ronen Friedman <rfriedma@redhat.com>	2020-12-28 10:42:41 +02:00
Ronen Friedman	dff7faaf3c	qa/standalone/scrub/osd-scrub-snaps.sh: fix Python print syntax Fixes: https://tracker.ceph.com/issues/48690 Signed-off-by: Ronen Friedman <rfriedma@redhat.com>	2020-12-21 16:52:27 +02:00
Kefu Chai	694ed23e9d	qa/standalone/misc/ver-health.sh: include the bootup-time in my test bed, it takes 11 seconds to boot the 3 OSDs and to restart one of them, this fails the test. so we need to take the time into consideration. in this change, the delay is added to the total "warn_older_version_delay", so the monitor does not start sending warning earlier than expected. Signed-off-by: Kefu Chai <kchai@redhat.com>	2020-12-11 16:14:03 +08:00
Kefu Chai	4bcfa139ab	mon/HealthMonitor: use timespan for mon_warn_older_version_delay for better user experience Signed-off-by: Kefu Chai <kchai@redhat.com>	2020-12-11 16:12:47 +08:00
Kefu Chai	1f5406a752	src/*: do not pass cct to ceph_version_to_str() in `e5b1ae5554`, a new option named "debug_version_for_testing" is introduced to override the version so we can test version check. in crimson, we have two families of shared functions. - one of them is used by alien store. they are compiled with -DWITH_SEASTAR and -DWITH_ALIEN, to enable the shim code between seastar and POSIX thread. - another is used by crimson in general. where no lock is allowed. currently, we use the "crimson" and "ceph" namespace to differentiate these two families of functions, so they can colocate in the same executable without violating the ODR. see src/include/common_fwd.h for more details. the functions defined in src/common/version.cc are also shared by alien store and crimson code. and because we have different implementations of `CephContext` in crimson and in classic OSD (i.e. alienstore), we have to have different implementations of this function as well, if we follow the same approach. but since these functions are very simple and are non-blocking, there is not much value in differentiating them, it is better to inject the test settings using environment variable instead of using ceph option subsystem. in this change, "ceph_debug_version_for_testing" environment variable is checked instead, so that crimson and alienstore can share the same compilation unit of version.cc. and "debug_version_for_testing" option is removed. Signed-off-by: Kefu Chai <kchai@redhat.com>	2020-12-10 18:26:39 +08:00


590 Commits