This is a follow-up to PR: https://github.com/ceph/ceph/pull/48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interface to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsystem, and the
removal is logged.
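For illustration, either of the following ephemeral overrides (the
option name and value are only examples) is now detected and removed
while a built-in profile is active:

  ceph tell osd.0 config set osd_mclock_scheduler_client_res 0.5
  ceph daemon osd.0 config set osd_mclock_scheduler_client_res 0.5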
Other changes:
1. Add a standalone test to exercise the fix.
2. Add a documentation note on the outcome of an attempt to modify
built-in profile defaults.
Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Let's use the middle profile as the default.
Modify the standalone tests accordingly.
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Fix an issue where an overridden mClock recovery setting (set prior to
an osd restart) could be lost after an osd restart.
For example, consider that prior to an osd restart, the option
'osd_max_backfills' was successfully set to a value different from the
mClock default. If the osd was restarted for some reason, the
boot-up sequence incorrectly reset the backfill value to the
mClock default within the async local/remote reservers. This fix
ensures that no change is made if the currently overridden value is
different from the mClock default.
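An illustrative sequence (values are examples; overrides of the
recovery/backfill limits under mClock are assumed to be permitted via
osd_mclock_override_recovery_settings):

  ceph config set osd osd_mclock_override_recovery_settings true
  ceph config set osd osd_max_backfills 8
  # restart an osd; afterwards the reservers should still honor:
  ceph config show osd.0 osd_max_backfills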
Modify an existing standalone test to verify that the local and remote
async reservers are updated to the desired number of backfills under
normal conditions and also across osd restarts.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The mClock scheduler's cost model for HDDs/SSDs is modified and now
represents the cost of an IO in terms of bytes.
The cost parameters, namely osd_mclock_cost_per_io_usec_[hdd|ssd]
and osd_mclock_cost_per_byte_usec_[hdd|ssd], which represent the cost
of an IO in microseconds, are inaccurate and therefore removed.
The new model considers the following aspects of an osd to calculate
the cost of an IO:
- osd_mclock_max_capacity_iops_[hdd|ssd] (existing option)
The measured random write IOPS at a 4 KiB block size. This is
measured during OSD boot-up using the OSD bench tool.
- osd_mclock_max_sequential_bandwidth_[hdd|ssd] (new config option)
The maximum sequential bandwidth of the underlying device.
For HDDs, 150 MiB/s is considered, and for SSDs, 750 MiB/s is
considered in the cost calculation.
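For reference, these inputs can be inspected on a running cluster
(a sketch; reported values will vary):

  ceph config show osd.0 osd_mclock_max_capacity_iops_hdd
  ceph config show osd.0 osd_mclock_max_sequential_bandwidth_hdd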
The following important changes are made to arrive at the overall
cost of an IO:
1. Represent the QoS reservation and limit config parameters as proportions:
The reservation and limit parameters are now set in terms of a
proportion of the OSD's max IOPS capacity. The earlier representation
was in terms of IOPS per OSD shard, which required the user to perform
calculations before setting the parameter. Representing the
reservation and limit as proportions is much more intuitive and
simpler for a user.
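For example (illustrative value; a profile that honors user-set QoS
params, e.g. 'custom', is assumed to be in effect):

  # reserve 40% of the OSD's max IOPS capacity for client ops
  ceph config set osd.0 osd_mclock_scheduler_client_res 0.4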
2. Cost per IO Calculation:
Using the above config options, osd_bandwidth_cost_per_io for the osd
is calculated and set. It is the ratio of the max sequential bandwidth
to the max random write IOPS of the osd. It is a constant and
represents the base cost of an IO in terms of bytes. This is added to
the actual size of the IO (in bytes) to represent the overall cost of
the IO operation. See mClockScheduler::calc_scaled_cost().
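As a worked example with assumed numbers (the 150 MiB/s HDD bandwidth
default and a measured capacity of 315 IOPS):

  osd_bandwidth_cost_per_io = 157286400 / 315 ~= 499322 bytes
  overall cost of a 4 KiB IO ~= 499322 + 4096 = 503418 bytes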
3. Cost calculation in Bytes:
The settings for reservation and limit in terms of a fraction of the
OSD's maximum IOPS capacity are converted to bytes/sec before updating
the mClock server's ClientInfo structure. This is done for each OSD op
shard using osd_bandwidth_capacity_per_shard as shown below:
   (res|lim)  =  (IOPS proportion) * osd_bandwidth_capacity_per_shard
  (bytes/sec)       (unitless)               (bytes/sec)
The above result is updated within the mClock server's ClientInfo
structure for different op_scheduler_class operations. See
mClockScheduler::ClientRegistry::update_from_config().
The overall cost of an IO operation (in secs) is finally determined
during the tag calculations performed in the mClock server. See
crimson::dmclock::RequestTag::tag_calc() for more details.
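For instance, assuming the 150 MiB/s HDD bandwidth default and 5 op
shards (an assumed osd_op_num_shards_hdd value):

  osd_bandwidth_capacity_per_shard = 157286400 / 5 = 31457280 bytes/sec
  reservation of 0.4 => 0.4 * 31457280 = 12582912 bytes/sec per shard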
4. Profile Allocations:
Optimize mClock profile allocations due to the change in the cost model
and lower recovery cost.
5. Modify standalone tests to reflect the change in the QoS config
parameter representation of reservation and limit options.
Fixes: https://tracker.ceph.com/issues/58529
Fixes: https://tracker.ceph.com/issues/59080
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The QoS parameters (res, wgt, lim) of mClock profiles are not allowed to
be modified by users using commands like "config set" or via admin socket.
handle_conf_change() does not allow changes to any built-in mClock
profile at the mClock scheduler. But the config subsystem still showed
the change for the built-in mClock profile QoS parameters. This misled
users into thinking that the change had been made at the mClock server
when that was not the case.
The above issue is the result of the config "levels" used by the
config subsystem. The initial built-in QoS params are set at the
CONF_DEFAULT level. This allows the user to modify the built-in QoS
params using the "config set" command, which sets values at the
CONF_MON level, which has a higher priority than the CONF_DEFAULT
level. The new value is persisted in the mon store, and therefore the
config subsystem shows the change when the "config show" command is
issued.
To prevent the above, this commit adds changes to restore the defaults
set for the built-in profiles by removing the new config changes from
the MON store. This causes the original defaults to come back into
effect and maintains a consistent view of the built-in profile across
all levels.
To accomplish this, the mClock scheduler is provided with additional
information like the OSD id, shard id and a pointer to the MonClient
using which the Mon store command to remove the option is executed.
A standalone test is added to verify that built-in params cannot be
modified and the original profile params are retained.
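With the fix, an attempt such as the following (option and value are
examples) is reverted by the OSD, and "config show" reports the
built-in default again:

  ceph config set osd.0 osd_mclock_scheduler_client_wgt 6
  ceph config show osd.0 osd_mclock_scheduler_client_wgt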
Fixes: https://tracker.ceph.com/issues/57533
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
- Consolidate all mclock standalone tests under
qa/standalone/misc/mclock-config.sh.
- Revert existing tests in ceph-helpers.sh that verified the earlier hard
override of recovery/backfill limits.
- Add new tests to verify the procedure to change the recovery/backfill
limits with mclock scheduler.
Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Create the initial mClock QoS params at CONF_DEFAULT level using
set_val_default(). This allows switching to a custom profile on a
running OSD and making the necessary changes to the desired QoS params.
Note that switching to the 'custom' profile and then subsequently
changing the QoS params using "config set osd.n ..." will be at a
higher level, i.e. at CONF_MON.
But when switching back to a built-in profile, the new values won't
take effect since CONF_DEFAULT < CONF_MON. For the values to take
effect, the config keys created as part of the 'custom' profile must
be removed from the ConfigMonitor store after switching back to a
built-in profile.
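The expected sequence is therefore along these lines (option names
and values are examples):

  ceph config set osd.0 osd_mclock_profile custom
  ceph config set osd.0 osd_mclock_scheduler_client_res 0.5
  # switch back to a built-in profile...
  ceph config set osd.0 osd_mclock_profile high_client_ops
  # ...and remove the 'custom' keys for the built-in defaults to apply
  ceph config rm osd.0 osd_mclock_scheduler_client_res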
- Added a couple of standalone tests to exercise the scenario.
- Fixed a couple of typos relating to the best-effort weights in the
  mClock configuration document and the mClock internal documentation.
- Added new sections to the mClock configuration document outlining
  the steps to switch between the built-in and custom profiles and
  vice-versa.
Fixes: https://tracker.ceph.com/issues/55153
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Add the snaptrim duration to the json formatted output of the pg dump
stats. Define methods for a PG to set the snaptrim begin time and then
to calculate the total time spent trimming all the objects for the
snaps in the PG's snap_trimq.
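The new field can be inspected, for example, with (assuming the json
key is named snaptrim_duration):

  ceph pg dump pgs --format=json-pretty | grep snaptrim_duration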
Tests:
- Librados C and C++ API tests to verify the time spent for a snaptrim
operation on a PG. These tests use the self-managed snaps APIs.
- Standalone tests to verify snaptrim duration using rados pool snaps.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Add a new column, OBJECTS_TRIMMED, to the pg dump stats that shows the
number of objects trimmed when a snap is removed.
When a pg splits, the stats from the parent pg are copied to the
child pg. In such a case, reset objects_trimmed to 0 for the child pg
(see PeeringState::split_into()). Otherwise, incorrect stats would be
shown for a child pg after the split operation.
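Similarly, the new stat can be checked with, for example:

  ceph pg dump pgs --format=json-pretty | grep objects_trimmed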
Tests:
- Librados C and C++ API tests to verify the number of objects trimmed
during snaptrim operation. These tests use the self-managed snaps APIs.
- Standalone tests to verify objects trimmed using rados pool snaps.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Modified test cases:
1. ver-health.sh:
a. TEST_check_version_health_1():
To avoid intermittent timeouts observed in wait_for_health_string(),
increase the wait time to 20 secs.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Given an initial (set of) osd(s), provide up to N OSDs that can be
stopped together without making PGs unavailable.
This can be used to quickly identify large(r) batches of OSDs that can be
stopped together to (for example) upgrade.
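A hedged usage sketch (the exact flag name may differ):

  # starting from osd.1, report up to 20 osds safe to stop together
  ceph osd ok-to-stop 1 --max 20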
Signed-off-by: Sage Weil <sage@newdream.net>
in my test bed, it takes 11 seconds to boot the 3 OSDs and to restart
one of them; this fails the test.
so we need to take that time into consideration. in this change, the
delay is added to the total "warn_older_version_delay", so the monitor
does not start sending warnings earlier than expected.
Signed-off-by: Kefu Chai <kchai@redhat.com>
in e5b1ae5554, a new option named
"debug_version_for_testing" is introduced to override the version so
we can test the version check.
in crimson, we have two families of shared functions:
- one of them is used by alien store. they are compiled with
-DWITH_SEASTAR and -DWITH_ALIEN, to enable the shim code between
seastar and POSIX threads.
- another is used by crimson in general, where no lock is allowed.
currently, we use the "crimson" and "ceph" namespace to differentiate
these two families of functions, so they can colocate in the same
executable without violating the ODR. see src/include/common_fwd.h for
more details.
the functions defined in src/common/version.cc are also shared by
alien store and crimson code. and because we have different
implementations of `CephContext` in crimson and in classic OSD (i.e.
alienstore), we would have to have different implementations of these
functions as well if we followed the same approach. but since these
functions are very simple and non-blocking, there is not much value in
differentiating them; it is better to inject the test settings using
an environment variable instead of the ceph option subsystem.
in this change, the "ceph_debug_version_for_testing" environment
variable is checked instead, so that crimson and alienstore can share
the same compilation unit of version.cc. and the
"debug_version_for_testing" option is removed.
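A hedged usage sketch (the value is arbitrary; a binary only reflects
the override if it reports the version via ceph_version_to_str()):

  env ceph_debug_version_for_testing=v0.0.1 ceph-osd --version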
Signed-off-by: Kefu Chai <kchai@redhat.com>
Make sure PGs peer (simply flushing state to mon isn't enough).
Fixes: https://tracker.ceph.com/issues/43721
Signed-off-by: Sage Weil <sage@redhat.com>
We need to account for CRUSH_ITEM_NONE entries in the EC PG acting
set.
Fixes: https://tracker.ceph.com/issues/43151
Signed-off-by: Sage Weil <sage@redhat.com>
Helpers to decide when it is safe to stop a mon, add a mon that is
not started, or remove a mon. (Adding and starting a mon would always
be safe, but it takes time to sync, so it's not really possible to do
quickly.)
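These helpers surface as mon commands along the lines of (mon ids are
examples):

  ceph mon ok-to-stop a
  ceph mon ok-to-add-offline
  ceph mon ok-to-rm b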
Signed-off-by: Sage Weil <sage@redhat.com>
/bin/bash is a Linuxism. Other operating systems install bash to
different paths. Use /usr/bin/env in shebangs to find bash.
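For example:

  -#!/bin/bash
  +#!/usr/bin/env bash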
Signed-off-by: Alan Somers <asomers@gmail.com>
- stop running via make check
- add teuthology yamls to run them
- disable ceph_objectstore_tool.py for now (too slow for make check, and
  we can't use vstart in teuthology via a package install)
- drop cephtool tests since those are already covered by other teuthology
tests
- leave a handful of (fast!) ceph-helpers tests for make check for minimal
integration tests.
Signed-off-by: Sage Weil <sage@redhat.com>