The test performs shallow scrubs, intentionally using small chunk
sizes to allow dump commands time to check specific details.
Following commit ffda64119f
(PR#44749), shallow-scrub chunk sizes are controlled by a separate
configuration parameter. This PR fixes the test to use the
correct parameter.
An additional minor change is an adjustment to the test loop sleep time:
it is now reduced, to guarantee that a dump followed by a counter
increase is performed at more or less the scrub frequency.
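For illustration, a minimal sketch of the corrected test setup, assuming
the option names (osd_shallow_scrub_chunk_min/max) introduced by PR#44749:
```
# run the test OSD with small shallow-scrub chunks
# (option names assumed from PR#44749)
run_osd $dir 0 --osd_shallow_scrub_chunk_min=3 \
               --osd_shallow_scrub_chunk_max=3
```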
Fixes: https://tracker.ceph.com/issues/58797
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Using the existing common default chunk size for scrubs that are
not deep scrubs is wasteful: it yields a high ratio of inter-OSD
messages per chunk, while the actual OSD work per chunk is minimal.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
The QoS parameters (res, wgt, lim) of the mClock profiles are not allowed
to be modified by users via commands like "config set" or the admin socket.
handle_conf_change() rejects changes to any built-in mClock profile
at the mClock scheduler. But the config subsystem still showed the change
as applied for the built-in mClock profile QoS parameters. This misled
users into thinking that the change was made at the mClock scheduler when
that was not the case.
The above issue is a result of the config "levels" used by the config
subsystem. The initial built-in QoS params are set at the CONF_DEFAULT
level. This allows the user to modify the built-in QoS params using the
"config set" command, which sets values at the CONF_MON level, which has a
higher priority than the CONF_DEFAULT level. The new value is persisted in
the mon store, and therefore the config subsystem shows the change when the
"config show" command is issued.
To prevent the above, this commit restores the defaults set for the
built-in profiles by removing the new config changes from the mon store.
This brings the original defaults back into effect and maintains a
consistent view of the built-in profile across all levels.
To accomplish this, the mClock scheduler is provided with additional
information, such as the OSD id, the shard id and a pointer to the
MonClient, through which the mon store command to remove the option is
executed.
A standalone test is added to verify that built-in params cannot be
modified and the original profile params are retained.
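A minimal sketch of the behavior being verified, assuming the standard
mClock QoS keys (e.g. osd_mclock_scheduler_client_wgt) and CLI:
```
# try to modify a QoS param while a built-in profile is active
ceph config set osd.0 osd_mclock_scheduler_client_wgt 6
# the OSD now removes the key from the mon store again, so
# 'config show' keeps reporting the built-in profile's value
ceph config show osd.0 osd_mclock_scheduler_client_wgt
```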
Fixes: https://tracker.ceph.com/issues/57533
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Set osd_mclock_override_recovery_settings option to true for tests that
modify recovery/backfill configuration options. This prevents logging of
the cluster warning when modifying recovery/backfill limits.
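For example (a sketch; option and command names as used elsewhere in this
series):
```
# allow modifying recovery/backfill limits under mclock
# without raising the cluster warning
ceph config set osd osd_mclock_override_recovery_settings true
ceph config set osd osd_max_backfills 4
```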
Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
- Consolidate all mclock standalone tests under
qa/standalone/misc/mclock-config.sh.
- Revert existing tests in ceph-helpers.sh that verified the earlier hard
override of recovery/backfill limits.
- Add new tests to verify the procedure for changing the recovery/backfill
limits with the mclock scheduler.
Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Problem:
--mon-initial-members does nothing except cause the monmap to populate
``removed_ranks``, because of the way we start
monitors in standalone tests: ``run_mon $dir $id ..`` is invoked
for each mon. Regardless of --mon-initial-members=a,b,c, if
we set --mon-host=$MONA,$MONB,$MONC (which we do in every single test),
every time we run a monitor (e.g., run mon.b) it will pre-build
our monmap with
```
noname-a=mon.noname-a addrs v2:127.0.0.1:7127/0,
b=mon.b addrs v2:127.0.0.1:7128/0,
noname-c=mon.noname-c addrs v2:127.0.0.1:7129/0,
```
Now, with --mon-initial-members=a,b,c we are telling the
monmap that the initial members should be named
a,b,c, of which only `b` matches. So
``MonMap::set_initial_members`` removes
noname-a and noname-c, which
populates `removed_ranks`.
Solution:
remove all instances of --mon-initial-members
in the standalone tests, as it has no impact on
the nature of the tests themselves.
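A hedged sketch of the resulting change in the standalone tests:
```
# before: populates removed_ranks via MonMap::set_initial_members
run_mon $dir b --mon-initial-members=a,b,c --mon-host=$MONA,$MONB,$MONC
# after: the monmap is built from --mon-host alone
run_mon $dir b --mon-host=$MONA,$MONB,$MONC
```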
Fixes: https://tracker.ceph.com/issues/58132
Signed-off-by: Kamoltat <ksirivad@redhat.com>
This is based on two commits:
* 7bbc92eda3 and
* 6b22d47863, which seems to be
a fixup to the former.
In contrast to them, in `OSDMonitor::create_initial()` I also updated
`newmap.require_osd_release` to pacific when
`mon_debug_no_require_reef` and `mon_debug_no_require_quincy` are set.
Please take an extra look at that during the review.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
The test (in the standalone/scrub suite) verifies that the scrubber
detects (and issues a cluster-log error) whenever a mapping entry
("SNA_") is missing in the SnapMapper DB.
Specifically, here the entry is corrupted - shortened as per
https://tracker.ceph.com/issues/56147.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Fix no-scrub & nodeep-scrub related code to match requirements:
- deep scrubs should be allowed to execute when no-scrub is set;
- some initiated scrubs (i.e. not periodic ones) might be changed
from the requested 'deep' to 'shallow'.
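A sketch of the required behavior, using the standard CLI:
```
ceph osd set noscrub          # blocks shallow scrubs only
ceph pg deep-scrub $pgid      # an initiated deep scrub still runs
ceph osd set nodeep-scrub     # only this flag blocks deep scrubs
```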
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Create the initial mClock QoS params at CONF_DEFAULT level using
set_val_default(). This allows switching to a custom profile on a
running OSD and to make necessary changes to the desired QoS params.
Note that switching to the 'custom' profile and then subsequently changing
the QoS params using "config set osd.n ..." happens at a higher level, i.e.
at CONF_MON.
But when switching back to a built-in profile, the restored built-in values
won't take effect, since CONF_DEFAULT < CONF_MON. For the values to take
effect, the config keys created as part of the 'custom' profile must be
removed from the ConfigMonitor store after switching back to a built-in
profile.
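A sketch of the full sequence described above (profile and key names per
the mClock documentation):
```
# switch to the custom profile and tune a QoS param (set at CONF_MON)
ceph config set osd.0 osd_mclock_profile custom
ceph config set osd.0 osd_mclock_scheduler_client_wgt 4
# switch back to a built-in profile...
ceph config set osd.0 osd_mclock_profile high_client_ops
# ...and remove the leftover key so the defaults take effect again
ceph config rm osd.0 osd_mclock_scheduler_client_wgt
```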
- Added a couple of standalone tests to exercise the scenario.
- Updated the mClock configuration document and the mClock internal
documentation, fixing a couple of typos relating to the best-effort weights.
- Added new sections to the mClock configuration document outlining the
steps to switch between the built-in and custom profile and vice-versa.
Fixes: https://tracker.ceph.com/issues/55153
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Modify test_activate_osd() to get the type of scheduler in use and then
verify the value of osd_max_backfills. This is because the mclock scheduler
overrides this option to 1000 upon OSD initialization.
The test earlier used to pass because the OSD daemon was killed but not
marked down, and upon being brought up, the wait-for-OSD-up check passed
quickly even though the OSD still didn't have the latest config values.
Now, upon killing the OSD, the osd_fast_shutdown sequence notifies the
mon (see PR: https://github.com/ceph/ceph/pull/44807) and the OSD is marked
down and dead. Upon bringing it up, the wait-for-OSD-up check takes longer,
which is sufficient for the config values to be updated. This
results in the correct values being read from the config 'Values' map.
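A hedged sketch of the adjusted check (get_asok_path is a ceph-helpers.sh
helper; the jq parsing is an assumption):
```
# determine the active scheduler via the OSD's admin socket
scheduler=$(ceph --admin-daemon $(get_asok_path osd.$id) \
    config get osd_op_queue | jq -r '.osd_op_queue')
if [ "$scheduler" = "mclock_scheduler" ]; then
    # mclock overrides osd_max_backfills to 1000 on OSD init
    expected_backfills=1000
fi
```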
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Add the snaptrim duration to the json formatted output of the pg dump
stats. Define methods for a PG to set the snaptrim begin time and then to
calculate the total time spent to trim all the objects for the snaps in
the snap_trimq for the PG.
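For example, the new field can be read from the formatted dump (the jq
path is an assumption and may vary by release):
```
ceph pg dump --format=json | jq '.pg_map.pg_stats[0].snaptrim_duration'
```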
Tests:
- Librados C and C++ API tests to verify the time spent for a snaptrim
operation on a PG. These tests use the self-managed snaps APIs.
- Standalone tests to verify snaptrim duration using rados pool snaps.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Add a new column, OBJECTS_TRIMMED, to the pg dump stats that shows the
number of objects trimmed when a snap is removed.
When a pg splits, the stats from the parent pg are copied to the child
pg. In such a case, reset objects_trimmed to 0 for the child pg
(see PeeringState::split_into()). Otherwise, incorrect stats would be
shown for the child pg after the split operation.
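As with snaptrim_duration above, a sketch of reading the new field (jq
path assumed):
```
ceph pg dump --format=json | jq '.pg_map.pg_stats[0].objects_trimmed'
```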
Tests:
- Librados C and C++ API tests to verify the number of objects trimmed
during snaptrim operation. These tests use the self-managed snaps APIs.
- Standalone tests to verify objects trimmed using rados pool snaps.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Fix the expected log message to match the scrub code, by removing
the redundant part.
Fixes: https://tracker.ceph.com/issues/54458
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Update the allocation file when we expand a device:
add the expanded space to the allocator and then force an update to the
allocation file.
There is also a new standalone test case for expand.
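The path exercised by the new test is roughly the following (standard
ceph-bluestore-tool command; the OSD must be stopped first):
```
# after growing the underlying device/LV:
ceph-bluestore-tool bluefs-bdev-expand --path $dir/$id
# the expanded space is added to the allocator and the
# allocation file is force-updated
```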
Fixes: https://tracker.ceph.com/issues/53699
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
osd/scrub (& qa/standalone): test for scrub behavior when no-scrub is set but no-deep-scrub is not
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Matan Breizman <Matan.Brz@gmail.com>
Following PR#43244, 'ceph tell pg deep_scrub' now sets both the
deep-scrub and the "regular" scrub time-stamps. This necessitated a
modification to TEST_scrub_warning, as more PGs in this test are now late
for their regular scrubbing.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
A bug (https://tracker.ceph.com/issues/52901 - now fixed) resulted in
this combination of conditions leaving the PG in "scrubbing" state
forever. That bug was fixed by PR#43521. The patch here adds a
test to detect the (now fixed) wrong behavior.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Make osd-scrub-dump test ignore the 'scrubbing' that might be late to disappear
from the modified (PR #43403) 'pg dump' output.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
PR #43239 has modified ECBackend::get_hash_info() behavior.
Modified the standalone scrub test to match.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
This commit has been causing scheduled jobs to request e.g. aarch64
smithi machines, which don't exist. The dispatcher then tries to find
them forever, requiring the dispatcher to be killed and restarted. The
queue will sit idle until someone notices the problem.
Signed-off-by: Zack Cerza <zack@redhat.com>
modified: qa/standalone/erasure-code/test-erasure-code-plugins.sh
new file: qa/suites/rados/thrash-erasure-code-isa/arch/aarch64.yaml
Signed-off-by: Dai Zhiwei <daizhiwei3@huawei.com>
Addition of a new column, SCRUB_DURATION, to the pg stats that stores the time taken for a PG scrub.
Fixes: https://tracker.ceph.com/issues/52605
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
Add a standalone test - test_activate_osd_skip_benchmark() in ceph-helpers.sh
that exercises the osd-mclock-skip-benchmark option.
Fixes: https://tracker.ceph.com/issues/52025
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Modified test cases:
1. ver-health.sh:
a. TEST_check_version_health_1():
To avoid intermittent timeouts observed in wait_for_health_string(),
increase the wait time to 20 secs.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The following tests in the test files mentioned below use the
"osd_scrub_sleep" option to introduce delays during scrubbing to help
determine scrubbing states, validate reservations during scrubbing, etc.
This works when using the "wpq" scheduler.
But when the "mclock_scheduler" is enabled, the "osd_scrub_sleep" is
disabled and overridden to 0. This is done to delegate the scheduling of
the background scrubs to the "mclock_scheduler" based on the set QoS
parameters. Due to this, the checks to verify the scrub states,
reservations etc. fail since the window to check them is very short
due to scrubs completing very quickly. This affects a small subset of
scrub tests mentioned below:
1. osd-scrub-dump.sh -> TEST_recover_unexpected()
2. osd-scrub-repair.sh -> TEST_auto_repair_bluestore_tag()
3. osd-scrub-test.sh -> TEST_scrub_abort(), TEST_deep_scrub_abort()
Only for the above tests, until there's a reliable way to query scrub
states with "--osd-scrub-sleep" set to 0, the "osd_op_queue" config
option is set to "wpq".
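i.e. something along these lines in the affected tests (run_osd per
ceph-helpers.sh):
```
# pin the test to wpq so that osd_scrub_sleep takes effect
run_osd $dir 0 --osd_op_queue=wpq --osd_scrub_sleep=5
```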
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Modified test cases:
1. test-erasure-eio.sh:
a. TEST_ec_backfill_unfound():
- Set osd_mclock_profile to high_recovery_ops profile.
- Increase the wait for backfill_unfound timeout to 240 secs.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Modified test cases:
1. osd-backfill-prio.sh:
Set osd_op_queue = wpq for all tests, since mclock doesn't
consider recovery priority as part of its scheduling algorithm.
2. osd-backfill-space.sh:
Set osd_mclock_profile to high_recovery_ops and increase the wait
for backfills timeout to 1200 secs for the following tests:
- TEST_backfill_test_simple()
- TEST_backfill_test_multi()
- TEST_backfill_test_sametarget()
- TEST_backfill_multi_partial()
- TEST_ec_backfill_simple()
- TEST_ec_backfill_multi()
- SKIP_TEST_ec_backfill_multi_partial()
3. osd-backfill-stats.sh:
- TEST_backfill_ec_down_all_out():
Set osd_mclock_profile to high_recovery_ops and increase the wait
for recovery timeout to 240 secs.
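The profile override used by these tests is a one-liner, e.g.:
```
# prioritize recovery/backfill while the test waits for completion
ceph config set osd osd_mclock_profile high_recovery_ops
```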
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>