fix: resolve inconsistent judgment of osd_pg_stat_report_interval_max
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Matan Breizman <Matan.Brz@gmail.com>
as the scrub reservation changes had made it obsolete.
Note: this is not a matter of fixing the test, but rather
that the functionality it tested no longer exists.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
osd_pg_stat_report_interval_max was previously used as either a maximum
time in seconds or a maximum number of epochs. Instead, separate it into
two config options and adjust
PeeringState::prepare_stats_for_publish to check both.
Additionally, this commit removes a superfluous check in
PeeringState::Active::react(const AdvMap&) and calls publish_stats_to_osd
unconditionally, as the other callers in PeeringState do.
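A minimal sketch of the dual check, using illustrative names and plain
std::chrono types rather than the actual PeeringState code:
```
#include <chrono>
#include <cstdint>

// Illustrative only: publish PG stats when either the elapsed time or the
// number of epochs since the last publish exceeds its own configured
// maximum (the two separate config options described above).
bool should_publish_stats(std::chrono::seconds since_last_publish,
                          uint64_t epochs_since_last_publish,
                          std::chrono::seconds max_seconds,
                          uint64_t max_epochs)
{
  return since_last_publish >= max_seconds ||
         epochs_since_last_publish >= max_epochs;
}
```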
Fixes: https://tracker.ceph.com/issues/63520
Signed-off-by: zhangjianwei2 <zhangjianwei2@cmss.chinamobile.com>
... instead of a simple counter.
This is in preparation for the next commit, which will decouple
the "being reserved" state from the handling of scrub requests.
The planned changes to the scrub state machine will make
it harder to know when to clear the "being reserved" state.
The changes here allow us to err on the side of caution,
i.e. to try to "un-count" a remote reservation even if it was never
actually reserved or was already deleted.
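For illustration, a minimal sketch of why a set (rather than a plain
counter) makes the "un-count" operation safe to call unconditionally;
the names here are hypothetical, not the actual scrub code:
```
#include <cstddef>
#include <cstdint>
#include <set>

struct RemoteReservations {
  std::set<uint64_t> reserved;   // hypothetical key type for the PGs

  void add(uint64_t pgid) { reserved.insert(pgid); }

  // Safe to call even if the PG was never reserved or was already removed:
  // erasing a missing element is a no-op, whereas blindly decrementing a
  // counter could drive it negative.
  void remove(uint64_t pgid) { reserved.erase(pgid); }

  std::size_t count() const { return reserved.size(); }
};
```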
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Use
  ceph tell $pgid [deep-]scrub
to initiate an 'operator initiated' scrub, and
  ceph tell $pgid schedule[-deep]-scrub
to schedule a 'periodic' scrub.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
from ScrubQueue::select_pg_and_scrub().
This clears the path to moving some ScrubQueue methods into
OsdScrub, starting here with the CPU load tracker.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Ceph status fails to report the pool application warning if
the pool is empty. Report the pool application warning
even if the pool has 0 objects stored in it.
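A minimal sketch of the intended condition (illustrative only, not the
actual health-check code):
```
#include <cstdint>

// Illustrative only: warn for any pool without an application enabled,
// regardless of how many objects the pool currently stores.
bool pool_needs_app_warning(bool has_application_enabled,
                            uint64_t num_objects)
{
  (void)num_objects;  // previously the warning was skipped for empty pools
  return !has_application_enabled;
}
```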
Add POOL_APP_NOT_ENABLED cluster warnings to the log-ignorelist
to fix the rados suite.
Fixes: https://tracker.ceph.com/issues/57097
Signed-off-by: Prashant D <pdhange@redhat.com>
qa/standalone/osd/divergent-prior.sh: Divergent test 3 with pg_autoscale_mode on, pick divergent osd
Reviewed-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
When creating a new pool, the current code picks the divergent osd
from the first pg listed in "pg dump pgs"; that pg can still be in
"unknown" state, which means up_primary = -1, and that will fail the test.
We need to wait until the first pg is active+clean.
Fixes: https://tracker.ceph.com/issues/56034
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
1. Setting frequent scrub status updates, to compensate for the removal
of some 'send updates' in PR#50283.
2. Switching back to using the wpq scheduler, as otherwise the number of
concurrent recovery operations is below what the test expects.
Fixes: https://tracker.ceph.com/issues/61386
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Add an initial standalone test for stretched clusters,
testing uneven-weight warnings and warnings about
having != 2 buckets.
Added a `wait_for_health_gone()` function to ceph-helpers.sh;
it allows us to wait for a health condition to
disappear when running standalone tests.
Signed-off-by: Kamoltat <ksirivad@redhat.com>
This is a follow-up to PR: https://github.com/ceph/ceph/pull/48703.
This commit also considers changes made ephemerally, using either the
'daemon' or the 'tell' interface, to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsystem, and the
removal is logged.
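A rough sketch of the idea, using a stand-in config type; rm_val() on the
real config subsystem is what the commit refers to, but everything else
here is illustrative:
```
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Stand-in for the config subsystem, not Ceph's ConfigProxy. Ephemeral
// overrides (set via the 'daemon' or 'tell' interface) are modeled as map
// entries; removing an entry lets the built-in mClock default show through.
struct FakeConfig {
  std::map<std::string, std::string> ephemeral_overrides;

  // Mirrors the idea of rm_val(): returns true if an override existed.
  bool rm_val(const std::string& key) {
    return ephemeral_overrides.erase(key) > 0;
  }
};

void drop_ephemeral_qos_overrides(FakeConfig& conf,
                                  const std::vector<std::string>& qos_keys)
{
  for (const auto& key : qos_keys) {
    if (conf.rm_val(key)) {
      std::cout << "reverted ephemeral override of " << key
                << " to the built-in mClock profile value\n";
    }
  }
}
```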
Other changes:
1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
built-in profile defaults.
Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The qa tests are not client I/O centric and mostly focus on triggering
recovery/backfills and monitoring them for completion within a finite amount
of time. The same holds true for scrub operations.
Therefore, an mClock profile that optimizes background operations is a
better fit for qa related tests. The osd_mclock_profile is therefore
globally overridden to the 'high_recovery_ops' profile for the Rados suite,
as it fits the requirement.
Also, many standalone tests expect recovery and scrub operations to
complete within a finite time. To ensure this, the osd_mclock_profile
option is set to 'high_recovery_ops' as part of the run_osd() function
in ceph-helpers.sh.
A subset of standalone tests explicitly used 'high_recovery_ops' profile.
Since the profile is now set as part of run_osd(), the earlier overrides
are redundant and therefore removed from the tests.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Let's use the middle profile as the default.
Modify the standalone tests accordingly.
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Fix an issue where an overridden mClock recovery setting (set prior to
an osd restart) could be lost after the restart.
For example, consider that prior to an osd restart, the option
'osd_max_backfills' was successfully set to a value different from the
mClock default. If the osd was restarted for some reason, the
boot-up sequence was incorrectly resetting the backfill value to the
mClock default within the async local/remote reservers. This fix
ensures that no change is made if the current overridden value is
different from the mClock default.
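A minimal sketch of the restart behavior being fixed (hypothetical helper,
not the actual reserver update code):
```
#include <cstdint>

// Illustrative only: push the mClock default into the local/remote async
// reservers only when the current value has not been overridden.
void maybe_apply_mclock_backfill_default(uint64_t current_max_backfills,
                                         uint64_t mclock_default,
                                         uint64_t& local_reserver_limit,
                                         uint64_t& remote_reserver_limit)
{
  if (current_max_backfills != mclock_default) {
    // An operator override (e.g. a value set before the osd restart):
    // keep it instead of clobbering it with the profile default.
    return;
  }
  local_reserver_limit = mclock_default;
  remote_reserver_limit = mclock_default;
}
```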
Modify an existing standalone test to verify that the local and remote
async reservers are updated to the desired number of backfills under
normal conditions and also across osd restarts.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The mClock scheduler's cost model for HDDs/SSDs is modified and now
represents the cost of an IO in terms of bytes.
The cost parameters, namely osd_mclock_cost_per_io_usec_[hdd|ssd]
and osd_mclock_cost_per_byte_usec_[hdd|ssd], which represent the cost
of an IO in microseconds, are inaccurate and therefore removed.
The new model considers the following aspects of an osd to calculate
the cost of an IO:
- osd_mclock_max_capacity_iops_[hdd|ssd] (existing option)
  The measured random write IOPS at 4 KiB block size. This is
  measured during OSD boot-up using the OSD bench tool.
- osd_mclock_max_sequential_bandwidth_[hdd|ssd] (new config option)
  The maximum sequential bandwidth of the underlying device.
  For HDDs, 150 MiB/s is used in the cost calculation, and for
  SSDs, 750 MiB/s.
The following important changes are made to arrive at the overall
cost of an IO,
1. Represent the QoS reservation and limit config parameters as proportions:
The reservation and limit parameters are now set in terms of a
proportion of the OSD's max IOPS capacity. The earlier representation
was in terms of IOPS per OSD shard which required the user to perform
calculations before setting the parameter. Representing the
reservation and limit in terms of proportions is much more intuitive
and simpler for a user.
2. Cost per IO Calculation:
Using the above config options, osd_bandwidth_cost_per_io for the osd is
calculated and set. It is the ratio of the max sequential bandwidth to
the max random write IOPS of the osd. It is a constant and represents the
base cost of an IO in terms of bytes. This is added to the actual size of
the IO (in bytes) to give the overall cost of the IO operation. See
mClockScheduler::calc_scaled_cost() and the worked sketch after this list.
3. Cost calculation in Bytes:
The settings for reservation and limit, expressed as a fraction of the OSD's
maximum IOPS capacity, are converted to bytes/sec before updating the
mClock server's ClientInfo structure. This is done for each OSD op shard
using osd_bandwidth_capacity_per_shard, as shown below:
  (res|lim) [bytes/sec] = (IOPS proportion) [unitless] *
                          osd_bandwidth_capacity_per_shard [bytes/sec]
The above result is updated within the mClock server's ClientInfo
structure for different op_scheduler_class operations. See
mClockScheduler::ClientRegistry::update_from_config().
The overall cost of an IO operation (in secs) is finally determined
during the tag calculations performed in the mClock server. See
crimson::dmclock::RequestTag::tag_calc() for more details.
4. Profile Allocations:
Optimize mClock profile allocations due to the change in the cost model
and lower recovery cost.
5. Modify standalone tests to reflect the change in the QoS config
parameter representation of reservation and limit options.
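A worked sketch of the cost arithmetic with made-up example numbers; the
shard count, the per-shard capacity formula and the helper names are
assumptions here, not the exact implementation (see
mClockScheduler::calc_scaled_cost() and
mClockScheduler::ClientRegistry::update_from_config() for the real code):
```
#include <iostream>

int main()
{
  // Example OSD characteristics (made-up numbers):
  const double max_seq_bw_bytes = 750.0 * 1024 * 1024;  // SSD: 750 MiB/s
  const double max_rand_write_iops = 25000.0;           // from OSD bench
  const unsigned num_op_shards = 5;                     // assumed shard count

  // Base cost of any IO, in bytes: sequential bandwidth divided by the
  // random-write IOPS capacity (osd_bandwidth_cost_per_io).
  const double cost_per_io = max_seq_bw_bytes / max_rand_write_iops;

  // Overall cost of a 4 KiB write: base cost plus the actual IO size.
  const double io_size = 4096.0;
  const double scaled_cost = cost_per_io + io_size;

  // Reservation/limit are given as a proportion of the IOPS capacity and
  // converted to bytes/sec per op shard before being handed to mClock.
  const double bw_capacity_per_shard = max_seq_bw_bytes / num_op_shards;
  const double reservation_proportion = 0.3;  // example profile value
  const double res_bytes_per_sec =
      reservation_proportion * bw_capacity_per_shard;

  std::cout << "cost_per_io (bytes): " << cost_per_io << "\n"
            << "scaled cost of a 4 KiB IO (bytes): " << scaled_cost << "\n"
            << "reservation (bytes/sec per shard): " << res_bytes_per_sec
            << "\n";
  return 0;
}
```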
Fixes: https://tracker.ceph.com/issues/58529
Fixes: https://tracker.ceph.com/issues/59080
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Separate `mon-stretch` from `mon`.
Renamed `mon-stretched-cluster.sh` to
`mon-stretch-fail-recovery.sh`.
Isolating the stretch-cluster tests will enable
developers to get results faster for stretch-cluster
related work.
Signed-off-by: Kamoltat <ksirivad@redhat.com>
1e44d86b2 swapped this to a pg tell command which doesn't actually
need the primary specified. Drop the now unnecessary lookup.
Signed-off-by: Samuel Just <sjust@redhat.com>
osd/PeeringState: Add logs around can_serve_replica_read() / last_complete_ondisk()
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
The test performs shallow scrubs, intentionally using small chunk
sizes to allow the dump commands time to check specific details.
Following commit ffda64119f
(PR#44749), shallow scrub chunks are controlled by a separate
configuration parameter. This PR fixes the test to use the
correct parameter.
An additional minor change is an adjustment to the test loop sleep time:
it is now reduced, to guarantee that a dump followed by a counter
increase is performed at roughly the scrub frequency.
Fixes: https://tracker.ceph.com/issues/58797
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Generally, it's more portable not to rely on specific system files being
readable. Specifically, container environments may not have an fstab.
Instead, just generate another random file.
Signed-off-by: Samuel Just <sjust@redhat.com>
Using the existing common default chunk size for scrubs that are
not deep scrubs is wasteful: it yields a high ratio of inter-OSD messages
per chunk, while the actual OSD work per chunk is minimal.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
The QoS parameters (res, wgt, lim) of mClock profiles are not allowed to
be modified by users using commands like "config set" or via the admin
socket: handle_conf_change() does not allow changes to any built-in mClock
profile at the mClock scheduler. But the config subsystem still showed the
change as expected for the built-in mClock profile QoS parameters. This
misled the user into thinking that the change was made at the mClock
server when it was not the case.
The above issue is the result of the config "levels" used by the config
subsystem. The initial built-in QoS params are set at the CONF_DEFAULT
level. This allows the user to modify the built-in QoS params using the
"config set" command, which sets values at the CONF_MON level, which has
a higher priority than CONF_DEFAULT. The new value is persisted in the
mon store, and therefore the config subsystem shows the change when the
"config show" command is issued.
To prevent the above, this commit adds changes to restore the defaults set
for the built-in profiles by removing the new config changes from the MON
store. This causes the original defaults to come back into effect and
maintains a consistent view of the built-in profile across all levels.
To accomplish this, the mClock scheduler is provided with additional
information such as the OSD id, the shard id and a pointer to the
MonClient, which is used to execute the Mon store command that removes
the option.
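A hypothetical sketch of the command that would be sent to the mon to drop
such an override; the JSON shape mirrors "ceph config rm", but the helper
itself is illustrative, not the actual scheduler code:
```
#include <iostream>
#include <sstream>
#include <string>

// Illustrative only: build a "config rm" mon command that removes an
// operator override of a built-in profile QoS parameter, so the built-in
// default becomes visible again at all config levels.
std::string build_config_rm_command(int osd_id, const std::string& option)
{
  std::ostringstream cmd;
  cmd << "{\"prefix\": \"config rm\", \"who\": \"osd." << osd_id
      << "\", \"name\": \"" << option << "\"}";
  return cmd.str();
}

int main()
{
  // Example: revert an attempted change to a QoS reservation parameter.
  std::cout << build_config_rm_command(0, "osd_mclock_scheduler_client_res")
            << std::endl;
  return 0;
}
```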
A standalone test is added to verify that built-in params cannot be
modified and the original profile params are retained.
Fixes: https://tracker.ceph.com/issues/57533
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Set osd_mclock_override_recovery_settings option to true for tests that
modify recovery/backfill configuration options. This prevents logging of
the cluster warning when modifying recovery/backfill limits.
Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
- Consolidate all mclock standalone tests under
qa/standalone/misc/mclock-config.sh.
- Revert existing tests in ceph-helpers.sh that verified the earlier hard
override of recovery/backfill limits.
- Add new tests to verify the procedure to change the recovery/backfill
limits with mclock scheduler.
Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Problem:
--mon-initial-members does nothing but cause the monmap
to populate ``removed_ranks``, because of the way we start
monitors in standalone tests: ``run_mon $dir $id ..``
on each mon. Regardless of --mon-initial-members=a,b,c, if
we set --mon-host=$MONA,$MONB,$MONC (which we do in every single test),
every time we run a monitor (e.g., run mon.b) it will pre-build
our monmap with
```
noname-a=mon.noname-a addrs v2:127.0.0.1:7127/0,
b=mon.b addrs v2:127.0.0.1:7128/0,
noname-c=mon.noname-c addrs v2:127.0.0.1:7129/0,
```
Now, with --mon-initial-members=a,b,c we are telling the
monmap that the initial members should be named
a,b,c, of which only `b` is a match. What
``MonMap::set_initial_members`` does is remove
noname-a and noname-c, which
populates ``removed_ranks``.
Solution:
Remove all instances of --mon-initial-members
in the standalone tests, as it has no impact on
the nature of the tests themselves.
Fixes: https://tracker.ceph.com/issues/58132
Signed-off-by: Kamoltat <ksirivad@redhat.com>
This is based on two commits:
* 7bbc92eda3 and
* 6b22d47863, which seems to be
a fixup of the former.
In contrast to them, in `OSDMonitor::create_initial()` I also updated
`newmap.require_osd_release` to pacific when both
`mon_debug_no_require_reef` and `mon_debug_no_require_quincy` are set.
Please take an extra look at that during the review.
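A minimal sketch of the gating logic described above (illustrative only,
not the actual OSDMonitor code):
```
// Illustrative only: when both debug flags are set, fall back past
// quincy to pacific for the initial require_osd_release.
enum class Release { pacific, quincy, reef };

Release pick_initial_require_osd_release(bool mon_debug_no_require_reef,
                                         bool mon_debug_no_require_quincy)
{
  if (!mon_debug_no_require_reef) {
    return Release::reef;
  }
  if (!mon_debug_no_require_quincy) {
    return Release::quincy;
  }
  return Release::pacific;
}
```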
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
The test (in the standalone/scrub suite) verifies that the scrubber
detects (and issues a cluster-log error) whenever a mapping entry
("SNA_") is missing in the SnapMapper DB.
Specifically, here the entry is corrupted (shortened), as per
https://tracker.ceph.com/issues/56147.
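For illustration, a minimal sketch of the kind of check the scrubber
performs (hypothetical names, not the actual SnapMapper/scrubber code):
```
#include <iostream>
#include <set>
#include <string>

// Illustrative only: report an error for every expected "SNA_"-prefixed
// mapping key that is missing from the SnapMapper DB.
void check_snap_mappings(const std::set<std::string>& expected_keys,
                         const std::set<std::string>& present_keys)
{
  for (const auto& key : expected_keys) {
    if (present_keys.count(key) == 0) {
      std::cerr << "scrub error: missing SnapMapper entry " << key << "\n";
    }
  }
}
```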
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>