Test: osd-recovery-space.sh extends the wait time for "recovery toofull"
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
pg-split-merge uses the ceph daemon command to check the merge,
but it doesn't use the asok path, which causes the check to
return incorrect output. Change the command to use the asok path.
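An illustrative sketch of the change (the exact command the test runs may differ; get_asok_path() is the ceph-helpers.sh helper assumed here):

    # Addressing the daemon by name may not resolve to the standalone
    # cluster's admin socket:
    ceph daemon osd.0 status

    # Addressing the admin socket by its path queries the intended daemon:
    ceph daemon $(get_asok_path osd.0) status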
Fixes: https://tracker.ceph.com/issues/65737
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
The osd-recovery-space test involves writing objects and expecting to receive
the "toofull" flag.
If we don't wait long enough, we might check the "toofull" flag before all objects
have finished writing, in which case the "toofull" status has not been set yet.
The change extends the waiting time and also adds checks of the return code
from the status wait.
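A minimal sketch of the intended wait, inside a TEST_ function (hypothetical, not the exact test code):

    # Poll the pg states for "toofull" with a longer timeout, and let the
    # wait's failure fail the test instead of being silently ignored.
    timeout=300
    until ceph pg dump pgs 2>/dev/null | grep -q toofull ; do
        timeout=$((timeout - 1))
        test $timeout -gt 0 || return 1
        sleep 1
    done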
Fixes: https://tracker.ceph.com/issues/44510
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
This command will allow us to clear the OSD_TOO_MANY_REPAIRS alert
by setting the shard repair count to 0. This helps in cases where
the alert was a false positive, or where the underlying condition has
since cleared at the disk level. Often, zeroing out the repair count is
better than muting the alert or restarting the OSD.
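Assumed invocation of the new command (shown as a sketch):

    # Reset the repaired-reads counter on osd.0 so the
    # OSD_TOO_MANY_REPAIRS warning can clear without a restart or mute.
    ceph tell osd.0 clear_shards_repaired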
Fixes: https://tracker.ceph.com/issues/54182
Co-authored-by: David Zafman <dzafman@redhat.com>
Signed-off-by: Daniel Radjenovic <dradjenovic@digitalocean.com>
When creating a new pool, the current code picks the divergent osd from
the first pg in the "pg dump pgs" output; that pg can be in "unknown" status,
which means up_primary = -1, and that will fail the test.
We need to wait until the first pg is active+clean.
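A minimal sketch, assuming the wait_for_clean() helper from ceph-helpers.sh:

    # Let the pgs settle before reading up_primary from the dump, so the
    # first pg is active+clean rather than unknown.
    wait_for_clean || return 1
    ceph pg dump pgs | head -2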
Fixes: https://tracker.ceph.com/issues/56034
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
The qa tests are not client I/O centric and mostly focus on triggering
recovery/backfills and monitoring them for completion within a finite amount
of time. The same holds true for scrub operations.
Therefore, an mClock profile that optimizes background operations is a
better fit for qa related tests. The osd_mclock_profile is globally
overridden to the 'high_recovery_ops' profile for the Rados suite, as
it fits this requirement.
Also, many standalone tests expect recovery and scrub operations to
complete within a finite time. To ensure this, the osd_mclock_profile
option is set to 'high_recovery_ops' as part of the run_osd() function
in ceph-helpers.sh.
A subset of standalone tests explicitly used 'high_recovery_ops' profile.
Since the profile is now set as part of run_osd(), the earlier overrides
are redundant and therefore removed from the tests.
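For reference, the per-test override now folded into run_osd() looked roughly like this (illustrative sketch, not the exact test code):

    # Explicit per-OSD override, now redundant because run_osd() applies
    # the profile to every standalone OSD by default:
    run_osd $dir 0 --osd_mclock_profile=high_recovery_ops || return 1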
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Set the osd_mclock_override_recovery_settings option to true for tests that
modify recovery/backfill configuration options. This prevents the cluster
warning from being logged when modifying recovery/backfill limits.
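A sketch of the pattern (illustrative values; osd_max_backfills stands in for whatever limit a given test tweaks):

    # Allow the test to change mClock-capped recovery/backfill limits
    # without the cluster logging a warning about the override.
    ceph config set osd osd_mclock_override_recovery_settings true
    ceph config set osd osd_max_backfills 4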
Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Update the allocation file when we expand a device.
Add the expanded space to the allocator and then force an update to the allocation file.
There is also a new standalone test case for expansion.
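A rough sketch of the flow such a test exercises (assumed commands and paths; the real test may differ):

    # With the OSD stopped, grow the backing file and let BlueFS/BlueStore
    # pick up the extra space, which should now refresh the allocation file.
    truncate -s +2G $dir/0/block
    ceph-bluestore-tool --path $dir/0 bluefs-bdev-expand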
Fixes: https://tracker.ceph.com/issues/53699
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
Modified test cases (see the sketch after this list):
1. osd-recovery-prio.sh:
Set osd_op_queue = wpq for all tests since mclock
doesn't consider recovery priority as part of its
scheduling algorithm.
2. osd-recovery-stats.sh:
a. TEST_recovery_undersized():
- Set osd_mclock_profile to high_recovery_ops profile.
- Increase wait for recovery timeout to 300 secs.
3. osd-rep-recov-eio.sh:
a. TEST_rep_backfill_unfound():
- Set osd_mclock_profile to high_recovery_ops profile.
- Increase wait for backfill_unfound to 360 secs.
4. repeer-on-acting-back.sh:
a. TEST_repeer_on_down_act():
- Set osd_mclock_profile to high_recovery_ops profile.
(To improve the test duration)
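Illustrative forms of the overrides referenced above (sketch; the actual tests may pass these differently):

    # osd-recovery-prio.sh: fall back to the WeightedPriorityQueue scheduler
    run_osd $dir 0 --osd_op_queue=wpq || return 1
    # other tests: keep mClock but favour background recovery work
    run_osd $dir 0 --osd_mclock_profile=high_recovery_ops || return 1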
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The following files, and the tests in them, did not tear down the
cluster after a test completed.
1. osd/osd-force-create.sh
2. osd/osd-reuse-id.sh
3. osd/pg-split-merge.sh
This wouldn't cause issues if the tests were run individually. But when
running all the tests in the files mentioned above, it could introduce
unexpected test failures down the line. For example, multiple tests may
create pools with the same name, and if they are not cleaned up properly, this
could result in unexpected failures in a subsequent test.
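One possible shape of the cleanup, assuming the setup()/teardown() helpers from ceph-helpers.sh (sketch only):

    function TEST_example() {
        local dir=$1
        setup $dir || return 1
        # ... test body: create pools, run osds, make assertions ...
        teardown $dir || return 1
    }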
Fixes: https://tracker.ceph.com/issues/51580
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Test that the osd doesn't crash when it gets a bad incremental osdmap.
Related-to: https://tracker.ceph.com/issues/46443
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
- Include a test case
- Configurable by setting mon_osd_warn_num_repaired (default 10)
- Ignore the new health warning in the random eio injection test
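A sketch of adjusting the threshold (set globally here as an assumption about which daemon consumes it):

    # Raise the number of repaired reads an OSD may report before the
    # OSD_TOO_MANY_REPAIRS health warning is raised.
    ceph config set global mon_osd_warn_num_repaired 30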
Fixes: https://tracker.ceph.com/issues/41564
Signed-off-by: David Zafman <dzafman@redhat.com>
It is possible for the pg dump to not be the latest when we check for newprimary
in _common_test(). This is because mgr_stats_period is 5 seconds, and we may not
have fetched the latest stats just yet. This causes the test to look at the same
stats before and after wait_for_clean.
Fixes: https://tracker.ceph.com/issues/43807 (2)
Signed-off-by: Neha Ojha <nojha@redhat.com>
Mon might fail to share the newest map with any of the up osds, e.g.
due to an injected broken pipe. Since we don't have any client
activity during the osd-markdown tests, osds might be unaware of
the map changes made through the CLI. Make sure osds have pulled the
newest map down before we test their reaction.
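One possible way to check this from a test (sketch; assumes the get_asok_path() helper, jq, and that the osd status output reports the expected fields):

    # Wait (bounded) until osd.0 reports the cluster's latest osdmap epoch.
    latest=$(ceph osd dump -f json | jq .epoch)
    for i in $(seq 60) ; do
        seen=$(ceph daemon $(get_asok_path osd.0) status | jq .newest_map)
        test "$seen" -ge "$latest" && break
        sleep 1
    done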
Fixes: https://tracker.ceph.com/issues/44662
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Adds the option `mon_allow_pool_size_one`, which is disabled by default
to ensure pools are not configured without replicas.
If the user still wants to use pool size 1, they will have to change the
value of `mon_allow_pool_size_one` to true and then pass the flag
`--yes-i-really-mean-it` to the cli command:
Example:
`ceph osd pool set test size 1 --yes-i-really-mean-it`
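The full two-step flow then looks roughly like this (sketch; the option is set globally here for illustration):

    # 1. Opt in to single-replica pools cluster-wide.
    ceph config set global mon_allow_pool_size_one true
    # 2. Explicitly confirm the risky change on the pool itself.
    ceph osd pool set test size 1 --yes-i-really-mean-it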
Fixes: https://tracker.ceph.com/issues/44025
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
One of our customers wants to verify the data safety of Ceph while scaling
the cluster up, and the test case looks like:
- keep checking the status of a specified pg, whose up set is [1, 2, 3]
- add more osds: up [1, 2, 3] -> up [1, 4, 5], acting = [1, 2, 3], backfill_targets = [4, 5],
  pg is remapped
- stop osd.2: up [1, 4, 5], acting = [1, 3], backfill_targets = [4, 5], pg is undersized
- restart osd.2: acting will stay unchanged, as 2 belongs to neither the current up nor acting set,
  hence leaving the corresponding pg stuck undersized for a long time until all backfill
  targets complete
It does not pose any critical problem -- we'll end up getting that pg back into active + clean --
except that the long-lived DEGRADED warnings keep bothering our customer, who cares about data
safety more than anything else.
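A rough reproduction sketch using the standalone helpers (hypothetical; $pgid is the pg being watched, osd ids as in the scenario above):

    ceph pg $pgid query | jq '.state, .up, .acting'   # up [1,2,3], acting [1,2,3]
    # ... add osds 4 and 5 so the pg remaps: up becomes [1,4,5] ...
    kill_daemons $dir TERM osd.2 || return 1          # pg goes undersized
    activate_osd $dir 2 || return 1                   # osd.2 returns, but is not
                                                      # pulled back into acting
    ceph pg $pgid query | jq .state                   # stays undersized until
                                                      # backfill to 4,5 completes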
The right way to achieve the above goal is for:
boost::statechart::result PeeringState::Active::react(const MNotifyRec& notevt)
to check whether the newly booted node could be validly chosen for the acting set and
request a new temp mapping. The new temp mapping would then trigger a real interval change
that will get rid of the DEGRADED warning.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
To avoid confusion, fix the function names in osd-backfill-space.sh to reflect
how they actually work.
Fixes: https://tracker.ceph.com/issues/43592
Signed-off-by: David Zafman <dzafman@redhat.com>
Treat backfill_toofull as a warning condition because it can resolve itself.
Includes a test case for PG_BACKFILL_FULL
Includes a test case for recovery_toofull / PG_RECOVERY_FULL
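For instance, what the new test cases can assert (sketch):

    # With backfill/recovery blocked on a full OSD, the corresponding
    # warnings should appear in the health output.
    ceph health detail | grep PG_BACKFILL_FULL
    ceph health detail | grep PG_RECOVERY_FULL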
Fixes: https://tracker.ceph.com/issues/39555
Signed-off-by: David Zafman <dzafman@redhat.com>