25 Commits

Author SHA1 Message Date
Sridhar Seshasayee
a86ead953d osd: Add snaptrim duration to pg dump stats.
Add the snaptrim duration to the json formatted output of the pg dump
stats. Define methods for a PG to set the snaptrim begin time and then to
calculate the total time spent to trim all the objects for the snaps in
the snap_trimq for the PG.

Tests:
  - Librados C and C++ API tests to verify the time spent for a snaptrim
    operation on a PG. These tests use the self-managed snaps APIs.
  - Standalone tests to verify snaptrim duration using rados pool snaps.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2022-03-16 00:33:24 +05:30
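
The begin-time and duration bookkeeping described in a86ead953d can be sketched as follows. This is only an illustration: `SnapTrimTimer`, `set_begin()` and `set_end()` are hypothetical names, and the clock is an assumption, not Ceph's actual PG code.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the per-PG bookkeeping; not Ceph's actual types.
struct SnapTrimTimer {
  using clock = std::chrono::steady_clock;

  clock::time_point begin{};   // recorded when trimming of the snap_trimq starts
  double duration_secs = 0.0;  // value that would be reported in the stats

  // Remember when the PG started trimming.
  void set_begin(clock::time_point now) { begin = now; }

  // Compute the total time spent once all objects have been trimmed.
  void set_end(clock::time_point now) {
    assert(now >= begin);
    duration_secs = std::chrono::duration<double>(now - begin).count();
  }
};
```

Passing the time points in explicitly, rather than calling `clock::now()` inside the methods, keeps the sketch easy to exercise with fixed values.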
Sridhar Seshasayee
00249dc0cc mon, osd: Add objects trimmed to pg dump stats.
Add a new column, OBJECTS_TRIMMED, to the pg dump stats that shows the
number of objects trimmed when a snap is removed.

When a pg splits, the stats from the parent pg are copied to the child
pg. In such a case, reset objects_trimmed to 0 for the child pg
(see PeeringState::split_into()). Otherwise, incorrect stats would be
shown for the child pg after the split operation.

Tests:
 - Librados C and C++ API tests to verify the number of objects trimmed
   during snaptrim operation. These tests use the self-managed snaps APIs.
 - Standalone tests to verify objects trimmed using rados pool snaps.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2022-03-16 00:30:56 +05:30
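
The split handling described in 00249dc0cc might look roughly like this. `PgStats` and `split_child_stats()` are hypothetical names for illustration. The real logic lives in PeeringState::split_into() and also divides other counters, which this sketch does not attempt.

```cpp
#include <cstdint>

// Hypothetical, heavily reduced stats record; not Ceph's pg_stat_t.
struct PgStats {
  std::int64_t num_objects = 0;
  std::int64_t objects_trimmed = 0;
};

// Derive the child's stats from the parent's on a split.
// The child starts as a copy, but it has not trimmed anything itself,
// so objects_trimmed is reset to 0 instead of being inherited.
PgStats split_child_stats(const PgStats& parent) {
  PgStats child = parent;
  child.objects_trimmed = 0;
  return child;
}
```

Without the reset, the child pg would report the parent's trim count as its own after the split.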
Sridhar Seshasayee
464e9ea6c0 qa/standalone/misc: ver-health.sh: Increase wait_for_health_string() timeout
Modified test cases:

1. ver-health.sh:
  a. TEST_check_version_health_1():
    To avoid intermittent timeouts observed in wait_for_health_string(),
    increase the wait time to 20 secs.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sage Weil
722f57dee1 mgr: add --max <n> to 'osd ok-to-stop' command
Given an initial (set of) osd(s), provide up to N OSDs that can be
stopped together without making PGs become unavailable.

This can be used to quickly identify large(r) batches of OSDs that can be
stopped together to (for example) upgrade.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-02-20 09:53:51 -05:00
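
One way to picture the `--max` behaviour from 722f57dee1 is a greedy extension of the initial set. This is a sketch under stated assumptions, not the mgr implementation. `Pg`, `ok_to_stop()` and `grow_stop_set()` are hypothetical, and "available" is simplified to "at least `min_size` acting OSDs remain".

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified PG: its acting OSD ids and the minimum number
// of them that must stay up for the PG to remain available.
struct Pg {
  std::vector<int> acting;
  std::size_t min_size;
};

// True if every PG keeps at least min_size acting OSDs with `stopped` down.
bool ok_to_stop(const std::vector<Pg>& pgs, const std::set<int>& stopped) {
  for (const auto& pg : pgs) {
    std::size_t alive = 0;
    for (int osd : pg.acting) {
      if (stopped.count(osd) == 0) ++alive;
    }
    if (alive < pg.min_size) return false;
  }
  return true;
}

// Greedily extend the initial set with candidates, up to `max` OSDs in total.
// Returns an empty set if even the initial set is not safe to stop.
std::set<int> grow_stop_set(const std::vector<Pg>& pgs, std::set<int> stop,
                            const std::vector<int>& candidates,
                            std::size_t max) {
  if (!ok_to_stop(pgs, stop)) return {};
  for (int osd : candidates) {
    if (stop.size() >= max) break;
    if (stop.count(osd) != 0) continue;
    stop.insert(osd);
    if (!ok_to_stop(pgs, stop)) stop.erase(osd);  // undo an unsafe addition
  }
  return stop;
}
```

A greedy pass is not guaranteed to find the largest possible batch, and the result depends on candidate order. It only illustrates how a safe set can be grown from the initial OSDs.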
Kefu Chai
694ed23e9d qa/standalone/misc/ver-health.sh: include the bootup-time
in my test bed, it takes 11 seconds to boot the 3 OSDs and to restart
one of them, this fails the test.

so we need to take the time into consideration. in this change, the
delay is added to the total "warn_older_version_delay", so the monitor
does not start sending warning earlier than expected.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-12-11 16:14:03 +08:00
Kefu Chai
4bcfa139ab mon/HealthMonitor: use timespan for mon_warn_older_version_delay
for better user experience

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-12-11 16:12:47 +08:00
Kefu Chai
1f5406a752 src/*: do not pass cct to ceph_version_to_str()
in e5b1ae5554, a new option named
"debug_version_for_testing" is introduced to override the version so
we can test version check.

in crimson, we have two families of shared functions.

- one of them is used by alien store. they are compiled with
  -DWITH_SEASTAR and -DWITH_ALIEN, to enable the shim code between
  seastar and POSIX thread.
- the other is used by crimson in general, where no lock is allowed.

currently, we use the "crimson" and "ceph" namespace to differentiate
these two families of functions, so they can colocate in the same
executable without violating the ODR. see src/include/common_fwd.h for
more details.

the functions defined in src/common/version.cc are also shared by
alien store and crimson code. and because we have different
implementations of `CephContext` in crimson and in classic OSD (i.e.
alienstore), we have to have different implementations of this function
as well, if we follow the same approach. but since these functions are
very simple and are non-blocking, there is not much value in
differentiating them. it is better to inject the test settings using an
environment variable instead of using the ceph option subsystem.

in this change, "ceph_debug_version_for_testing" environment variable is
checked instead, so that crimson and alienstore can share the same
compilation unit of version.cc. and "debug_version_for_testing" option
is removed.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-12-10 18:26:39 +08:00
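
The environment-variable override described in 1f5406a752 can be sketched like this. Only the variable name `ceph_debug_version_for_testing` comes from the commit message. The function name, its signature, and the choice to treat an empty value as unset are assumptions made for illustration.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: return the version to report.
// If the test-only environment variable is set and non-empty, it overrides
// the built-in version. No CephContext or config option is needed, so the
// same compilation unit can serve both crimson and alienstore.
std::string version_for_report(const std::string& built_in) {
  const char* override_version = std::getenv("ceph_debug_version_for_testing");
  if (override_version != nullptr && *override_version != '\0') {
    return override_version;
  }
  return built_in;
}
```

Because the lookup does not depend on a `CephContext`, callers no longer need to pass one in, which matches the change to `ceph_version_to_str()` named in the commit title.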
David Zafman
0a0ed890c2 test: Improve version checking test, to improve reliability
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-16 18:30:14 -08:00
David Zafman
870bde04a5 test: Changes based on code review comments
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-11 15:31:26 -08:00
David Zafman
93373746f5 osd test: Delay reporting until mon_warn_older_version_delay has passed
Move release notes description to 16.0.0 and update
Update documentation

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-11 15:10:11 -08:00
David Zafman
9d988c3dbc test: Simple test case for version health warning
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-11 15:10:11 -08:00
David Zafman
587cd64207 Merge pull request #32342 from dzafman/wip-43126
mon: Improvements to slow heartbeat health messages

Reviewed-by: Sage Weil <sage@redhat.com>
2020-02-25 17:42:00 -08:00
Sage Weil
76ea774c10 qa/standalone/misc/ok-to-stop: improve test
Make sure PGs peer (simply flushing state to mon isn't enough).

Fixes: https://tracker.ceph.com/issues/43721
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-20 13:24:30 -06:00
David Zafman
886475b5fe mon: Improvements to slow heartbeat health messages
Include crush parentage for each osd

Fixes: https://tracker.ceph.com/issues/43126

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-14 18:06:44 +00:00
Sage Weil
66690ea314 mgr/DaemonServer: fix 'osd ok-to-stop' for EC pools
We need to account for CRUSH_ITEM_NONE entries in the EC PG acting set.

Fixes: https://tracker.ceph.com/issues/43151
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-05 14:31:24 -06:00
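
The EC fix in 66690ea314 comes down to not counting empty slots as live shards. A minimal sketch, assuming the acting set is a vector of OSD ids in which missing shards hold the placeholder value: `live_shards()` is a hypothetical helper, and `ITEM_NONE` mirrors the value of `CRUSH_ITEM_NONE` from the CRUSH headers.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Mirrors CRUSH_ITEM_NONE (0x7fffffff): marks an EC shard with no OSD mapped.
constexpr int ITEM_NONE = 0x7fffffff;

// Count the shards that would still be served if `stopping` went down.
// Slots holding ITEM_NONE are already missing and must not be counted.
std::size_t live_shards(const std::vector<int>& acting,
                        const std::set<int>& stopping) {
  std::size_t live = 0;
  for (int osd : acting) {
    if (osd == ITEM_NONE) continue;          // hole in the EC acting set
    if (stopping.count(osd) != 0) continue;  // OSD we intend to stop
    ++live;
  }
  return live;
}
```

Comparing this count, rather than the raw size of the acting set, against the pool's minimum avoids treating a PG with holes as healthier than it is.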
David Zafman
6d2e4cb109 test: Allow fractional milliseconds to make test possible
Fixes: https://tracker.ceph.com/issues/41689

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-06 11:23:52 -07:00
David Zafman
5f83a6158b osd doc mon mgr: To milliseconds for config value, user input and threshold out
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-04 17:13:32 +00:00
David Zafman
4fb42ea27e test: Add basic test for network ping tracking
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-08-26 15:25:34 +00:00
Sage Weil
aa33a26e32 mon/MDSMonitor: add 'mds ok-to-stop' command
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-01 14:58:50 -05:00
Sage Weil
cfba0acc01 mon: add 'mon ok-to-{stop,add-offline,rm}' commands
Helpers to decide when it is safe to stop a mon, add a mon that is
not started, or remove a mon.  (Adding and starting a mon would always
be safe, but it takes time to sync, so it's not really possible to do
quickly.)

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-01 11:05:52 -05:00
Kefu Chai
30b5b4627c Merge pull request #16494 from asomers/bin_bash
misc: Fix bash path in shebangs

Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-08-27 10:14:14 +08:00
David Zafman
2a679a36de qa: Add support for specifying sub-tests with run-standalone.sh
Fix test-ceph-helpers.sh to pass additional arguments on

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
Alan Somers
3aae5ca6fd scripts: fix bash path in shebangs
/bin/bash is a Linuxism.  Other operating systems install bash to
different paths.  Use /usr/bin/env in shebangs to find bash.

Signed-off-by: Alan Somers <asomers@gmail.com>
2017-07-27 13:24:26 -06:00
Sage Weil
cabad62242 qa/standalone/ceph-helpers: factor rbd pool create out of run_mon
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:50 -04:00
Sage Weil
71ea171604 qa: move ceph-helpers and misc src/test/*.sh tests to qa/standalone
- stop running via make check
- add teuthology yamls to run them
- disable ceph_objectstore_tool.py for now (too slow for make check, and
we can't use vstart in teuthology via a package install)
- drop cephtool tests since those are already covered by other teuthology
tests
- leave a handful of (fast!) ceph-helpers tests for make check for minimal
integration tests.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:49 -04:00