Commit Graph

311 Commits

Author SHA1 Message Date
Daniel R
362cb4b8dd fix health unit tests 2022-10-06 16:20:19 -04:00
Daniel R
5dd16fe875 migrate pool_usage.go to constmetrics 2022-10-06 16:20:10 -04:00
Tyler Brekke
16a6427a0f
Merge pull request #223 from digitalocean/tbrekke/ceph-user
Add user to exporter for use with rbd/rgw commands
2022-08-26 07:37:35 -07:00
Tyler Brekke
957b06df91 Add user to exporter for use with rbd/rgw commands 2022-08-25 15:20:57 -07:00
Tyler Brekke
63c7e97a32
Merge pull request #222 from digitalocean/tbrekke/rbd-mirror
Add rbd-mirror health status
2022-08-24 14:24:28 -07:00
Tyler Brekke
19a3cd5c7e Add rbd-mirror health status 2022-08-24 14:23:23 -07:00
Xavier Villaneau
ae09ffe3fe Add hostname label to ceph_crash_reports 2022-06-16 13:20:29 -04:00
Xavier Villaneau
2faa6cb82d Fix comments and docstring in getCrashLs 2022-06-15 17:04:04 -04:00
Xavier Villaneau
3141fef319 Use JSON output from ceph crash ls instead of plain output 2022-06-15 17:04:04 -04:00
Xavier Villaneau
adf792c3e8 Use ConstMetrics for ceph_crash_reports
Makes the code simpler since we're not tracking state anymore.
Also rewrote the tests to be more in line with the rest.
2022-06-15 17:04:04 -04:00
Xavier Villaneau
74c89af225 Implement new gauge counting crash reports
New metric: `ceph_crash_reports` which counts the entries returned by
`ceph crash ls` by daemon name and archival status.

This is not the same as `ceph_new_crash_reports` which is the value of
the `RECENT_CRASH` health check, and that only counts the non-archived
errors of the past two weeks. The new metric counts errors as long as
they are not purged (which is done after 1 year by default).
2022-06-15 17:04:04 -04:00
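The counting described in commit 74c89af225 can be sketched roughly as follows. This is a minimal illustration, not the exporter's actual code: the `crashEntry` and `crashKey` types, the `countCrashes` function, and the assumed JSON shape (an `entity_name` field, plus an `archived` field that is non-empty only for archived crashes) are all hypothetical names and assumptions introduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// crashEntry holds the fields this sketch ASSUMES appear in each entry of
// the JSON crash listing. The field names are an assumption, not confirmed
// by the commit message.
type crashEntry struct {
	EntityName string `json:"entity_name"`
	Archived   string `json:"archived"` // assumed non-empty once a crash is archived
}

// crashKey groups crashes by daemon name and archival status (hypothetical type).
type crashKey struct {
	Daemon   string
	Archived bool
}

// countCrashes parses a JSON crash listing and counts entries per
// (daemon, archived) pair.
func countCrashes(raw []byte) (map[crashKey]int, error) {
	var entries []crashEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, err
	}
	counts := make(map[crashKey]int)
	for _, e := range entries {
		counts[crashKey{Daemon: e.EntityName, Archived: e.Archived != ""}]++
	}
	return counts, nil
}

func main() {
	// Made-up sample data for illustration only.
	sample := []byte(`[
		{"entity_name": "osd.0"},
		{"entity_name": "osd.0", "archived": "2022-06-01 00:00:00.000000"},
		{"entity_name": "mgr.a"}
	]`)
	counts, err := countCrashes(sample)
	if err != nil {
		panic(err)
	}
	for key, n := range counts {
		fmt.Printf("daemon=%s archived=%t count=%d\n", key.Daemon, key.Archived, n)
	}
}
```

Each map entry would correspond to one labelled sample of a gauge such as `ceph_crash_reports`; because the counts are recomputed from the listing on every pass, no state needs to be tracked between scrapes.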
Joshua Baergen
56bd79f4be
Merge pull request #216 from AKYD/update_version_validation
Normalize ceph-ansible version format
2022-05-25 11:21:46 -06:00
AKYD
763e5ecd21 Normalize ceph-ansible version format 2022-05-25 11:49:04 +03:00
Joshua Baergen
43df70b181
Merge pull request #215 from digitalocean/fix-pacific-mgrs
ceph: Support the Octopus+ mgrmap format; improve multi-version testing.
2022-04-12 11:20:41 -06:00
Joshua Baergen
ebd166be2d ceph: Support the Octopus+ mgrmap format. 2022-04-12 08:52:04 -06:00
Joshua Baergen
4e0f8910a4 Add missing tests for Octopus+ osdmap format.
In TestClusterHealthCollector, test all supported versions by default,
and split the osdmap tests for Nautilus vs. Octopus+. There were a
number of tests that included an osdmap that didn't need it, and the
osdmap was removed from them so that version-specific testing would not
be required.
2022-04-12 08:52:01 -06:00
Kyle
e54e159791 add docker build and push action 2022-03-30 11:41:21 -07:00
Kyle
602a178af1
Merge pull request #213 from Rethan/feat-misplaced-ratio
feat: add misplaced ratio metric
2022-03-29 18:44:01 -07:00
haoyixing
407248ce1d feat: add misplaced ratio metric
The misplaced ratio equals misplaced_objects divided by misplaced_total, not misplaced_objects / num_objects.
So add a separate metric to show the misplaced ratio.

Signed-off-by: haoyixing <haoyixing@kuaishou.com>
2022-03-29 18:38:15 -07:00
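The ratio described in commit 407248ce1d can be sketched as below. The function name `misplacedRatio` is hypothetical, and returning 0 when the denominator is 0 is an assumption of this sketch rather than something the commit message states.

```go
package main

import "fmt"

// misplacedRatio divides misplaced_objects by misplaced_total, as the
// commit message describes. The zero-denominator guard is an assumption
// made here to avoid producing NaN or Inf.
func misplacedRatio(misplacedObjects, misplacedTotal float64) float64 {
	if misplacedTotal == 0 {
		return 0
	}
	return misplacedObjects / misplacedTotal
}

func main() {
	// Made-up values: 25 misplaced objects out of a misplaced total of 100.
	fmt.Println(misplacedRatio(25, 100))
}
```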
Kyle
52ecf44451
Merge pull request #211 from digitalocean/4.0-dev
v4.0.0
2022-03-29 18:21:14 -07:00
Kyle
917a468065 update deps and reduce a warn to debug 2022-03-29 17:44:50 -07:00
Kyle
ce4e3993c4 update build workflow 2022-03-29 12:41:44 -07:00
Kyle
1d7bac531d update license headers 2022-03-23 14:02:21 -07:00
Kyle
00c0dacc02
Merge pull request #210 from digitalocean/more-new-stuff
4.0-rc1
2022-03-23 13:31:49 -07:00
Kyle
cf432402f5 update README 2022-03-23 13:26:33 -07:00
Kyle
4d817f487d fix staticcheck errors 2022-03-23 12:24:28 -07:00
Kyle
ef01452f1c reload tls cert whenever it is requested 2022-03-23 11:43:13 -07:00
Kyle
d6b67a77c3 removed down osd duplicate filtering 2022-03-22 12:59:51 -07:00
Kyle
5e7fae5d5a add TLS support 2022-03-22 10:40:40 -07:00
Kyle
3a0b289eda filter duplicate OSD nodes for down health check and fix health tests 2022-03-21 15:28:20 -07:00
Kyle
b806cf51bb remove pre-nautilus health check code 2022-03-21 14:52:34 -07:00
Kyle
df7435b259 add DAEMON_OLD_VERSION health check, update readme, remove makefile 2022-03-21 13:56:19 -07:00
Kyle
e0d8ba4d6f
Merge pull request #209 from digitalocean/update-dockerfile
update Go to 1.18 and Docker image to focal
2022-03-18 10:19:18 -07:00
Kyle
64e5410753 target nautilus and update workflows to use Go 1.18 2022-03-18 10:16:08 -07:00
Kyle
11cca676b8 update Go to 1.18 and Docker image to focal 2022-03-18 10:04:51 -07:00
Kyle
b5897900cb
Merge pull request #207 from digitalocean/multiple-version-support
* add version parser and IsAtLeast constraint
* refactor package structure
* restructure tests to remove go-ceph requirement
* split CI into build and test
* support flattened osdmap format added in octopus
2022-03-17 12:15:50 -07:00
Kyle
2122a3331f support flattened osdmap format added in octopus 2022-03-16 14:13:57 -07:00
Xavier Villaneau
94efb30be1 CI: Split build and tests into separate workflows 2022-02-24 15:57:00 -05:00
Xavier Villaneau
6f83fdd300 Restructure so that tests do not depend on go-ceph
- `ceph.Conn` interface no longer depends on go-ceph/rados,
  now defines its own `PoolStat` structure for our use.
- New separate `rados` package that implements the interface
- Merged `mocks` package into `ceph` to avoid circular import
2022-02-24 15:57:00 -05:00
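The restructuring in commit 6f83fdd300 follows a common Go pattern: collectors depend on a small interface with its own plain data types, so tests can supply a fake without linking the native client library. The sketch below illustrates that pattern only. The method `GetPoolStats`, the `PoolStat` field, and the `fakeConn` and `unfoundObjects` names are hypothetical and are not taken from the repository.

```go
package main

import "fmt"

// PoolStat is a locally defined stats structure, so the interface does not
// need to import a client library. The field shown is a hypothetical example.
type PoolStat struct {
	ObjectsUnfound uint64
}

// Conn is the interface collectors would depend on (method set assumed here).
type Conn interface {
	GetPoolStats(pool string) (*PoolStat, error)
}

// fakeConn is an in-memory implementation suitable for tests.
type fakeConn struct {
	stats map[string]PoolStat
}

func (f fakeConn) GetPoolStats(pool string) (*PoolStat, error) {
	s, ok := f.stats[pool]
	if !ok {
		return nil, fmt.Errorf("pool %q not found", pool)
	}
	return &s, nil
}

// unfoundObjects is a stand-in for collector logic that only sees Conn.
func unfoundObjects(c Conn, pool string) (uint64, error) {
	s, err := c.GetPoolStats(pool)
	if err != nil {
		return 0, err
	}
	return s.ObjectsUnfound, nil
}

func main() {
	conn := fakeConn{stats: map[string]PoolStat{"rbd": {ObjectsUnfound: 3}}}
	n, err := unfoundObjects(conn, "rbd")
	if err != nil {
		panic(err)
	}
	fmt.Println(n)
}
```

A real implementation backed by the cluster would live in a separate package and satisfy the same interface, which is what lets the test code build without the native dependency.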
Kyle
566f1fa5d3 a ton of refactoring 2022-02-23 15:43:46 -08:00
Kyle
13e97cd25d introduce version parser and IsAtLeast constraint 2022-02-22 16:00:42 -08:00
Kyle
4e84633fc0 allow different collectors by ceph version 2022-02-16 10:00:05 -08:00
Xavier Villaneau
5ea59b00fd ci: Use GitHub Actions to run tests 2022-02-14 15:17:42 -05:00
Kyle
8fe2bcc648 fix pg states test case 2022-02-14 11:01:25 -08:00
Kyle
cfe7dc2df3 use t.Run for table driven health test 2022-02-14 10:46:31 -08:00
Kyle
625f1fe8cf update Go and go-ceph 2022-02-11 13:44:55 -08:00
Matt1360
e8ea7d7e66
Merge pull request #204 from digitalocean/repair-counter
collectors/health: add repair state checking
2021-12-21 15:07:04 -04:00
Matt Vandersomething
47b7ae2ed6 collectors/health: add repair state checking 2021-12-21 14:24:47 -04:00
Alexandre Marangone
8aa2b4127d
Merge pull request #203 from digitalocean/amarangone/STORSYS-347
health: add osds_too_many_repair gauge
2021-12-02 09:45:22 -08:00
Alex Marangone
ef8b362842 health: add osds_too_many_repair gauge 2021-12-02 09:34:23 -08:00