The qa tests are not client I/O centric; they mostly focus on triggering
recovery/backfills and monitoring them for completion within a finite
amount of time. The same holds true for scrub operations.
An mClock profile that prioritizes background operations is therefore a
better fit for qa related tests, so osd_mclock_profile is globally
overridden to the 'high_recovery_ops' profile for the Rados suite.
Also, many standalone tests expect recovery and scrub operations to
complete within a finite time. To ensure this, the osd_mclock_profile
option is set to 'high_recovery_ops' as part of the run_osd() function
in ceph-helpers.sh.
A subset of standalone tests explicitly set the 'high_recovery_ops'
profile. Since the profile is now set as part of run_osd(), those
earlier overrides are redundant and have been removed from the tests.
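As a rough illustration of the run_osd() change, a simplified sketch
(the surrounding helper code in ceph-helpers.sh is elided and the
ceph-osd invocation below is illustrative):

    # Inside run_osd() (sketch): favor background ops so recovery,
    # backfill and scrub complete within the tests' timeouts.
    local ceph_args="$CEPH_ARGS"
    ceph_args+=" --osd_mclock_profile=high_recovery_ops"
    CEPH_ARGS="$ceph_args" ceph-osd -i "$id" --osd-data "$dir/$id" || return 1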
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Set the osd_mclock_override_recovery_settings option to true for tests
that modify recovery/backfill configuration options. This prevents the
cluster warning that would otherwise be logged when the recovery/backfill
limits are modified.
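For example, a test can enable the override before touching the limits
(a sketch; the option values below are illustrative):

    # Allow recovery/backfill limits to be changed without triggering
    # the mClock-related cluster warning.
    ceph config set osd osd_mclock_override_recovery_settings true
    ceph config set osd osd_max_backfills 4
    ceph config set osd osd_recovery_max_active 4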
Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
This commit has been causing scheduled jobs to request e.g. aarch64
smithi machines, which don't exist. The dispatcher then tries to find
them forever and has to be killed and restarted, and the queue sits idle
until someone notices the problem.
Signed-off-by: Zack Cerza <zack@redhat.com>
modified: qa/standalone/erasure-code/test-erasure-code-plugins.sh
new file: qa/suites/rados/thrash-erasure-code-isa/arch/aarch64.yaml
Signed-off-by: Dai Zhiwei <daizhiwei3@huawei.com>
Modified test cases:
1. test-erasure-eio.sh:
   a. TEST_ec_backfill_unfound():
      - Set osd_mclock_profile to the high_recovery_ops profile.
      - Increase the wait for backfill_unfound timeout to 240 seconds.
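A hedged sketch of the longer wait (the test's actual polling helper may
differ; 'ceph pg ls' output is grepped here purely for illustration):

    # Wait up to 240 seconds for a PG to reach the backfill_unfound state.
    timeout=240
    until ceph pg ls | grep -q 'backfill_unfound'; do
        (( timeout-- > 0 )) || return 1
        sleep 1
    done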
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The changes to the way EC/ReplicatedBackend communicate read
errors had a side effect of making the first eio on the object in
TEST_rados_get_subread_eio_shard_[01] repair itself, depending
on the timing of the killed osd recovering. The test should
be improved to actually test that behavior at some point.
Signed-off-by: Samuel Just <sjust@redhat.com>
Change run_osd() to default to the bluestore objectstore
Use run_osd_filestore() to use the non-default objectstore
Fix inject_eio to handle any objectstore if the config option is prefixed
with the objectstore type (see the sketch after the list below)
Remaining tests using filestore:
  osd-pool-create.sh TEST_pool_create_rep_expected_num_objects
    Test filestore directory creation
  qa/standalone/osd/osd-dup.sh TEST_filestore_to_bluestore
    Obvious
  qa/standalone/osd/osd-rep-recov-eio.sh TEST_rep_read_unfound
    Requires data digest in object info
  qa/standalone/scrub/osd-scrub-repair.sh multiple tests
    Erasure code pools append mode for filestore is tested
  qa/standalone/special/ceph_objectstore_tool.py
    Test code verifies COT by directly examining filestore contents
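A hedged sketch of the objectstore-aware inject_eio idea (the real helper
in ceph-helpers.sh takes more arguments and covers more cases; the option
names follow the existing <objectstore>_debug_inject_read_err pattern):

    # Pick the read-error injection option based on the OSD's objectstore
    # so the same helper works for both bluestore and filestore.
    inject_eio() {
        local objectstore=$1   # "bluestore" or "filestore"
        local osd_id=$2
        ceph tell osd."$osd_id" injectargs \
            "--${objectstore}_debug_inject_read_err=true" || return 1
    }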
Fixes: https://tracker.ceph.com/issues/39162
Signed-off-by: David Zafman <dzafman@redhat.com>
* refs/pull/26894/head:
qa/standalone/erasure-code/test-erasure-code: adjust test to avoid m=0
erasure-code: ensure m >= 1
mon/OSDMonitor: set ec min_size to k + min(1, m - 1)
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
_DD is k=2 m=0, which we don't allow. Switch it to cDD.
I confess I don't fully understand why this was _DD to begin with, but
I'm pretty sure the mapping is there to control the order of results so that
it can be mapped to the CRUSH rule output sanely, and the coding portion
is not relevant to the test.
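For context, the mapping string comes from the lrc erasure-code plugin,
where 'D' marks a data chunk and other characters mark chunks that are
not data. A hedged sketch of a profile using the new mapping (the
parameters here are illustrative, not the test's exact profile):

    # cDD: two data chunks plus one coding chunk (k=2, m=1), whereas
    # _DD would have produced k=2, m=0, which is no longer allowed.
    ceph osd erasure-code-profile set lrc-cdd \
        plugin=lrc \
        mapping=cDD \
        layers='[ [ "cDD", "" ] ]'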
Signed-off-by: Sage Weil <sage@redhat.com>
Bluestore caused grep to fail with "grep: memory exhausted" due to the
size of the "block" storage file.
Fixes: http://tracker.ceph.com/issues/38678
Signed-off-by: David Zafman <dzafman@redhat.com>
'rbd pool init' now does IO. Drop the pool, or change the pool size to 1.
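For instance (a hedged sketch; the pool name and pg count are
illustrative), a single-OSD helper cluster can keep the rbd pool usable
by shrinking it before init:

    # With only one OSD up, 'rbd pool init' IO only completes if size=1.
    ceph osd pool create rbd 8
    ceph osd pool set rbd size 1   # newer releases also need --yes-i-really-mean-it
    rbd pool init rbd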
Fixes: http://tracker.ceph.com/issues/38585
Signed-off-by: Sage Weil <sage@redhat.com>
Also:
- Do not print **offset** until specified
- Count missing objects correctly (used to be primary's local missing)
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
In an EC recovery read, if reading the object's attrs failed or returned
errors, we erase the attrs we have read and try to read them again from
the remaining shards. This lets the primary osd obtain the object's
attrs correctly and avoids an assert.
Signed-off-by: xiaofei cui <cuixiaofei@sangfor.com>
When multiple objects are in flight for the same ReadOp, swap() on the
map<hobject_t, read_request_t> would remove requests for all objects.
We just want to replace the requests for the single object we're
dealing with in send_all_remaining_reads().
This prevents crashing trying to look up rop.to_read[hoid] when another
object in the same ReadOp gets an EIO and tries to send more requests.
Test this by using osd-recovery-max-single-start to bundle multiple
reads into one ReadOp. Save and restore CEPH_ARGS so custom settings
are reset for each test.
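A hedged sketch of the save/restore pattern (the test function name and
option values are illustrative):

    TEST_ec_multiple_objects_per_readop() {   # hypothetical, for illustration
        local dir=$1
        local saved_args=$CEPH_ARGS
        # Bundle multiple recovery reads into one ReadOp for this test.
        CEPH_ARGS+=" --osd-recovery-max-single-start=3 --osd-recovery-max-active=3"
        # ... set up the pool, inject EIO, and exercise recovery here ...
        # Restore the defaults so later tests are unaffected.
        CEPH_ARGS=$saved_args
    }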
Fixes: http://tracker.ceph.com/issues/23195 (the 2nd crash there)
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Discount shards that already returned EIO, and use minimum_to_decode()
to request just what is necessary to recover or read the originally
requested extents of the object.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
1235810c2a allowed recovery to use
multiple passes of reads to handle EIO, but the end condition for
checking whether we finished reading requires the full data to be
decodable (this is what get_want_to_read_shards returns).
Normally this is just a loss of efficiency, since when there is only
one object the subsequent read works and grabs all the data
necessary. The crash comes from having multiple objects in the same
ReadOp - in this case the sequence of events is:
- start recovery of two objects (osd_recovery_max_single_start > 1)
- read object a shard 3
- read object b shard 3
- fail minimum_to_decode because shard 3 can't reconstruct all of object a
- re-read all of object a, marking more reads in progress
- fail minimum_to_decode because shard 3 can't reconstruct all of object b
- skip re-reading object because there are now reads in progress
- finish reading k shards of object a
- still fail minimum_to_decode for object b, so no extra data was read
- send_all_remaining_reads tries to lookup object b in ReadOp object
- crash dereferencing to_read[object b], since this was cleared after handling the original object b read reply
This patch fixes the immediate inefficiency and crash by only checking
for the missing shards that were requested, rather than the entire
object, for recovery reads.
Fixes: http://tracker.ceph.com/issues/23195 (first crash)
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Some tests, like osd-backfill-stats.sh, use delete_pool() but do not
define it. The function is currently defined separately in individual
standalone tests, so it is simpler to consolidate it in
ceph-helpers.sh.
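The consolidated helper is essentially a thin wrapper; a minimal sketch
(the version in ceph-helpers.sh may carry extra bookkeeping):

    # Shared delete_pool() for the standalone tests.
    delete_pool() {
        local poolname=$1
        ceph osd pool delete "$poolname" "$poolname" --yes-i-really-really-mean-it
    }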
Signed-off-by: Kefu Chai <kchai@redhat.com>
wait_for_clean() can miss the new pool if it races with pool create.
Fixes: http://tracker.ceph.com/issues/20465
Signed-off-by: David Zafman <dzafman@redhat.com>
/bin/bash is a Linuxism. Other operating systems install bash to
different paths. Use /usr/bin/env in shebangs to find bash.
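Concretely, the shebang change is:

    # Before (Linux-specific):
    #!/bin/bash
    # After (portable):
    #!/usr/bin/env bash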
Signed-off-by: Alan Somers <asomers@gmail.com>
- stop running via make check
- add teuthology yamls to run them
- disable ceph_objectstore_tool.py for now (too slow for make check, and
we can't use vstart in teuthology via a package install)
- drop cephtool tests since those are already covered by other teuthology
tests
- leave a handful of (fast!) ceph-helpers tests in make check as minimal
  integration tests.
Signed-off-by: Sage Weil <sage@redhat.com>