Commit Graph

29 Commits

Author SHA1 Message Date
Daniel R
d8bf71a8fc Split cluster health state by plus sign
PR #226
2023-01-24 17:42:34 -05:00
Daniel R
50874e99af revert health_status_interp to gauge 2022-10-12 18:03:20 -04:00
Daniel R
ae64dae6f8 add a comment indicating gauge deprecation 2022-10-11 11:58:42 -04:00
Daniel R
b9af3ab29f bugfixes; stop defaulting map flags to 0 in the constmetric 2022-10-07 14:53:46 -04:00
Daniel R
1a741d7606 introduce constmetrics for osdmap flags 2022-10-07 14:17:29 -04:00
Daniel R
c3a3d581aa migrate health checks from gauges to constmetrics 2022-10-06 16:20:24 -04:00
Daniel R
362cb4b8dd fix health unit tests 2022-10-06 16:20:19 -04:00
Daniel R
5dd16fe875 migrate pool_usage.go to constmetrics 2022-10-06 16:20:10 -04:00
Tyler Brekke
957b06df91 Add user to exporter for use with rbd/rgw commands 2022-08-25 15:20:57 -07:00
Tyler Brekke
19a3cd5c7e Add rbd-mirror health status 2022-08-24 14:23:23 -07:00
Xavier Villaneau
ae09ffe3fe Add hostname label to ceph_crash_reports 2022-06-16 13:20:29 -04:00
Xavier Villaneau
2faa6cb82d Fix comments and docstring in getCrashLs 2022-06-15 17:04:04 -04:00
Xavier Villaneau
3141fef319 Use JSON output from ceph crash ls instead of plain output 2022-06-15 17:04:04 -04:00
Xavier Villaneau
adf792c3e8 Use ConstMetrics for ceph_crash_reports
Makes the code simpler since we're not tracking state anymore.
Also rewrote the tests to be more in line with the rest.
2022-06-15 17:04:04 -04:00
Xavier Villaneau
74c89af225 Implement new gauge counting crash reports
New metric: `ceph_crash_reports` which counts the entries returned by
`ceph crash ls` by daemon name and archival status.

This is not the same as `ceph_new_crash_reports` which is the value of
the `RECENT_CRASH` health check, and that only counts the non-archived
errors of the past two weeks. The new metric counts errors as long as
they are not purged (which is done after 1 year by default).
2022-06-15 17:04:04 -04:00
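
As an illustration of the counting described in 74c89af225 (and the later switch to JSON output in 3141fef319), here is a minimal sketch that groups crash entries by daemon name and archival status. It is not the exporter's actual code. The JSON field names `entity_name` and `archived`, and the names `crashEntry`, `crashKey`, and `countCrashes`, are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// crashEntry models one element of the JSON crash listing.
// Field names are assumptions for this sketch; an entry is treated
// as archived when its "archived" field is present and non-empty.
type crashEntry struct {
	Entity   string `json:"entity_name"`
	Archived string `json:"archived"`
}

// crashKey is the (daemon name, archival status) pair used as a label set.
type crashKey struct {
	Entity   string
	Archived bool
}

// countCrashes parses a JSON array of crash entries and counts them
// per daemon name and archival status.
func countCrashes(raw []byte) (map[crashKey]int, error) {
	var entries []crashEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, err
	}
	counts := make(map[crashKey]int)
	for _, e := range entries {
		counts[crashKey{Entity: e.Entity, Archived: e.Archived != ""}]++
	}
	return counts, nil
}

func main() {
	// Hypothetical sample data, not real cluster output.
	sample := []byte(`[
		{"entity_name": "osd.0"},
		{"entity_name": "osd.0", "archived": "2022-06-01 00:00:00.000000"},
		{"entity_name": "mgr.a"}
	]`)
	counts, err := countCrashes(sample)
	if err != nil {
		panic(err)
	}
	for k, n := range counts {
		fmt.Printf("entity=%s archived=%t count=%d\n", k.Entity, k.Archived, n)
	}
}
```

Because the counts are recomputed from the listing on every call, nothing has to be tracked between scrapes, which matches the motivation given in adf792c3e8 for moving to ConstMetrics.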
AKYD
763e5ecd21 Normalize ceph-ansible version format 2022-05-25 11:49:04 +03:00
Joshua Baergen
ebd166be2d ceph: Support the Octopus+ mgrmap format. 2022-04-12 08:52:04 -06:00
Joshua Baergen
4e0f8910a4 Add missing tests for Octopus+ osdmap format.
In TestClusterHealthCollector, test all supported versions by default,
and split the osdmap tests for Nautilus vs. Octopus+. There were a
number of tests that included an osdmap that didn't need it, and the
osdmap was removed from them so that version-specific testing would not
be required.
2022-04-12 08:52:01 -06:00
haoyixing
407248ce1d feat: add misplaced ratio metric
Misplaced ratio equals misplaced_objects divided by misplaced_total, not misplaced_objects / num_objects.
So add a separate metric to show the misplaced ratio.

Signed-off-by: haoyixing <haoyixing@kuaishou.com>
2022-03-29 18:38:15 -07:00
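
The ratio described in 407248ce1d can be sketched as a small helper. This is an illustration only, not the exporter's implementation. The function name `misplacedRatio` and the choice to return 0 when the total is zero are assumptions made for this example.

```go
package main

import "fmt"

// misplacedRatio returns misplacedObjects / misplacedTotal.
// It returns 0 when misplacedTotal is 0 to avoid dividing by zero
// (an assumption of this sketch).
func misplacedRatio(misplacedObjects, misplacedTotal float64) float64 {
	if misplacedTotal == 0 {
		return 0
	}
	return misplacedObjects / misplacedTotal
}

func main() {
	// Hypothetical values: 25 misplaced objects out of a misplaced total of 100.
	fmt.Println(misplacedRatio(25, 100))
}
```

The denominator is misplaced_total, not num_objects, which is why the commit adds a separate metric for the ratio.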
Kyle
917a468065 update deps and reduce a warn to debug 2022-03-29 17:44:50 -07:00
Kyle
1d7bac531d update license headers 2022-03-23 14:02:21 -07:00
Kyle
4d817f487d fix staticcheck errors 2022-03-23 12:24:28 -07:00
Kyle
d6b67a77c3 removed down osd duplicate filtering 2022-03-22 12:59:51 -07:00
Kyle
3a0b289eda filter duplicate OSD nodes for down health check and fix health tests 2022-03-21 15:28:20 -07:00
Kyle
b806cf51bb remove pre-nautilus health check code 2022-03-21 14:52:34 -07:00
Kyle
df7435b259 add DAEMON_OLD_VERSION health check, update readme, remove makefile 2022-03-21 13:56:19 -07:00
Kyle
2122a3331f support flattened osdmap format added in octopus 2022-03-16 14:13:57 -07:00
Xavier Villaneau
6f83fdd300 Restructure so that tests do not depend on go-ceph
- `ceph.Conn` interface no longer depends on go-ceph/rados,
  now defines its own `PoolStat` structure for our use.
- New separate `rados` package that implements the interface
- Merged `mocks` package into `ceph` to avoid circular import
2022-02-24 15:57:00 -05:00
Kyle
566f1fa5d3 a ton of refactoring 2022-02-23 15:43:46 -08:00