350 Commits

Author SHA1 Message Date
Joshua Baergen 0808753eb9 rados: Handle commands that return 'inf' as a float value
For now, this appears to be limited to read balancer fields, which
show up in OSD dumps and pool details.
2024-09-23 11:40:39 -06:00
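
The commit above does not show how the 'inf' values are handled, so the following is only a sketch of one possible approach, not the exporter's actual code. It assumes the command output contains a bare `inf` token where a JSON number is expected, which `encoding/json` rejects. The names `quoteInf`, `infFloat`, `balancerInfo`, `parseScore` and the `score` field are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// bareInf matches an unquoted inf or -inf used as an object value.
// Caveat: a plain regexp cannot tell JSON strings from structure, so a string
// value that happens to contain `: inf,` would also be rewritten.
var bareInf = regexp.MustCompile(`(:\s*)(-?inf)(\s*[,}\]])`)

// quoteInf turns bare inf tokens into JSON strings so the document parses.
func quoteInf(raw []byte) []byte {
	return bareInf.ReplaceAll(raw, []byte(`${1}"${2}"${3}`))
}

// infFloat is a float64 that also accepts the strings "inf" and "-inf".
type infFloat float64

// UnmarshalJSON accepts both plain JSON numbers and quoted inf values.
func (f *infFloat) UnmarshalJSON(b []byte) error {
	if string(b) == "null" {
		return nil
	}
	v, err := strconv.ParseFloat(strings.Trim(string(b), `"`), 64)
	if err != nil {
		return err
	}
	*f = infFloat(v)
	return nil
}

// balancerInfo is a hypothetical stand-in for a read balancer field.
type balancerInfo struct {
	Score infFloat `json:"score"`
}

// parseScore decodes a document whose score may be a number or bare inf.
func parseScore(raw []byte) (float64, error) {
	var info balancerInfo
	if err := json.Unmarshal(quoteInf(raw), &info); err != nil {
		return 0, err
	}
	return float64(info.Score), nil
}

func main() {
	for _, doc := range []string{`{"score": 1.25}`, `{"score": inf}`} {
		v, err := parseScore([]byte(doc))
		fmt.Println(v, err)
	}
}
```

Quoting the token first and parsing it in a custom type keeps the infinity as a real float value instead of silently mapping it to zero.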
Joshua Baergen f87851df15
Merge pull request #248 from digitalocean/dependabot/go_modules/google.golang.org/protobuf-1.33.0
Bump google.golang.org/protobuf from 1.26.0 to 1.33.0
2024-03-14 10:12:44 -06:00
dependabot[bot] 9c553daf71
Bump google.golang.org/protobuf from 1.26.0 to 1.33.0
Bumps google.golang.org/protobuf from 1.26.0 to 1.33.0.

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-13 21:18:14 +00:00
Joshua Baergen 52874e51e3
Merge pull request #246 from digitalocean/rm-old-health-stats
monitors: Remove stats that have been dead since Luminous
2023-12-18 06:32:51 -07:00
Joshua Baergen c01b5fb37e monitors: Remove stats that have been dead since Luminous
Ceph commit reference e170405fd873723bec6ce691afad82641bab2ef1
2023-12-15 15:50:21 -07:00
Joshua Baergen 545836d5da
Merge pull request #245 from digitalocean/ceph_exporter-fail-if-no-connect
Fail to start if any rados connection fails on startup
2023-11-23 09:50:40 -07:00
Joshua Baergen a257136ac9 Fail to start if any rados connection fails on startup
Not instantiating an exporter and logging instead of failing to start
when we can't connect to one of the configured clusters is not the
behaviour we want; a user may think that ceph_exporter is monitoring all
of the requested clusters (since it's up and responds to metrics
requests) when in fact data is missing.
2023-11-23 09:46:50 -07:00
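
The behaviour described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the exporter's real startup code: `connectFunc`, `connectAll` and `fakeConnect` are hypothetical names, and the connection itself is faked so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// connectFunc stands in for whatever opens a rados connection to one cluster.
type connectFunc func(cluster string) error

// connectAll returns an error as soon as any configured cluster cannot be
// reached, instead of logging the failure and carrying on with the rest.
func connectAll(clusters []string, connect connectFunc) error {
	for _, c := range clusters {
		if err := connect(c); err != nil {
			return fmt.Errorf("connecting to cluster %q: %w", c, err)
		}
	}
	return nil
}

// fakeConnect returns a connectFunc that fails only for the cluster named bad.
func fakeConnect(bad string) connectFunc {
	return func(cluster string) error {
		if cluster == bad {
			return errors.New("connection timed out")
		}
		return nil
	}
}

func main() {
	clusters := []string{"ceph", "backup"}
	if err := connectAll(clusters, fakeConnect("backup")); err != nil {
		// A real binary would log this and exit with a non-zero status here.
		fmt.Println("refusing to start:", err)
		return
	}
	fmt.Println("all clusters connected, serving metrics")
}
```

Failing early makes the missing cluster visible to whoever starts the process, rather than leaving a gap in the metrics that looks like a healthy exporter.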
Matt Vandermeulen e858f2efc1
Merge pull request #244 from digitalocean/dependabot/go_modules/gopkg.in/yaml.v3-3.0.0
Bump gopkg.in/yaml.v3 from 3.0.0-20200313102051-9f266ea9e77c to 3.0.0
2023-08-30 11:59:29 -03:00
dependabot[bot] 0767c99e88
Bump gopkg.in/yaml.v3 from 3.0.0-20200313102051-9f266ea9e77c to 3.0.0
Bumps gopkg.in/yaml.v3 from 3.0.0-20200313102051-9f266ea9e77c to 3.0.0.

---
updated-dependencies:
- dependency-name: gopkg.in/yaml.v3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-08-30 14:55:07 +00:00
Tyler Brekke b6e2dc79ab
Merge pull request #243 from digitalocean/ceph-osd-metadata
osd: Add new collector for osd metadata
2023-08-25 08:05:42 -07:00
Tyler Brekke 3c403081b5 osd: Add new collector for osd metadata
scrape created_at, ceph_version_when_created, and osd_objectstore
from ceph osd metadata
2023-08-24 17:37:43 -07:00
Alexandre Marangone 2df38cb776
Merge pull request #242 from digitalocean/label-swap
ceph/osd: fix root/rack labels
2023-05-23 08:36:02 -07:00
Alex Marangone 2fb4e78d4a ceph/osd: fix root/rack labels 2023-05-23 08:06:31 -07:00
Joshua Baergen 76ec6f81de
Merge pull request #241 from digitalocean/oldest-inactive-internal-poll
osd: Internally poll PG dump for oldest active PG tracking
2023-05-09 11:00:02 -06:00
Joshua Baergen 5b487fad9c osd: Internally poll PG dump for oldest active PG tracking
Without this, the granularity of the oldest active PG is based on
external scrape frequency, and an unlucky sequence of scrapes could see
the same PG inactive two scrapes in a row even though it was active in
between.

Preferably, we would update this even more often than 10 seconds, but PG
dumps can take a while.
2023-05-08 16:33:13 -06:00
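
To illustrate why internal polling helps, here is a rough sketch of a tracker that is fed by a ticker rather than by scrapes. It is an assumption about the general shape of such code, not the collector's implementation: `pgTracker`, `observe`, `oldestAge` and `run` are hypothetical names, times are plain Unix seconds for simplicity, and the PG dump is abstracted as a function returning the IDs of inactive PGs.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// pgTracker remembers when each PG was first seen inactive.
type pgTracker struct {
	mu            sync.Mutex
	inactiveSince map[string]int64 // PG ID -> Unix seconds of first sighting
}

func newPGTracker() *pgTracker {
	return &pgTracker{inactiveSince: make(map[string]int64)}
}

// observe records one poll result: new inactive PGs start their clock, and
// PGs that are no longer inactive are forgotten.
func (t *pgTracker) observe(nowSec int64, inactive []string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	seen := make(map[string]bool, len(inactive))
	for _, pg := range inactive {
		seen[pg] = true
		if _, ok := t.inactiveSince[pg]; !ok {
			t.inactiveSince[pg] = nowSec
		}
	}
	for pg := range t.inactiveSince {
		if !seen[pg] {
			delete(t.inactiveSince, pg)
		}
	}
}

// oldestAge returns how long, in seconds, the longest-inactive PG has been
// inactive, or 0 if none are tracked.
func (t *pgTracker) oldestAge(nowSec int64) int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	var oldest int64
	for _, since := range t.inactiveSince {
		if age := nowSec - since; age > oldest {
			oldest = age
		}
	}
	return oldest
}

// run polls dump on a fixed interval until ctx is cancelled, independently
// of how often the metrics endpoint is scraped.
func (t *pgTracker) run(ctx context.Context, interval time.Duration, dump func() ([]string, error)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			inactive, err := dump()
			if err != nil {
				continue // keep the previous view; real code would log this
			}
			t.observe(time.Now().Unix(), inactive)
		}
	}
}

func main() {
	t := newPGTracker()
	t.observe(100, []string{"1.a"}) // inactive
	t.observe(110, nil)             // active again between scrapes
	t.observe(120, []string{"1.a"}) // inactive once more
	fmt.Println("oldest inactive age:", t.oldestAge(120))
}
```

With only the polls at 100 and 120, the PG would appear to have been inactive for 20 seconds; the intermediate poll at 110 resets its clock.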
Matt1360 e6a0a46acf
Merge pull request #238 from digitalocean/go-1.20
go: update envs to use go 1.20
2023-04-05 14:04:30 -03:00
Matt Vandersomething 6965835059
go: update envs to use go 1.20 2023-04-05 13:57:18 -03:00
Matt1360 4cbf5d8732
Merge pull request #237 from digitalocean/gha-on-tag
gha: add build/push on tag update
2023-04-05 13:48:13 -03:00
Matt Vandersomething 45bf35c039
gha: add build/push on tag update 2023-04-05 13:43:44 -03:00
Daniel R b1548c0d96
establish only 1 rados connection (#235) 2023-03-23 16:35:12 -04:00
Daniel R 46b06f317f
Fix timeouts and use goroutines for collectors/commands (#234)
* rados: timeouts on Mon/Mgr command & connections

* rados: remove unneeded timeouts

* make all collectors async

* fix osd collector

* only add 1 in waitgroups

* ceph: don't pass waitgroups to collectors

* monitors.go: use errgroup instead of waitgroup

* rados: add comment, pass arg & close channel
2023-03-14 14:00:33 -04:00
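
The general pattern behind the commit above, running collectors concurrently under a shared timeout, might look roughly like this. It is a sketch only: the commit mentions errgroup for monitors.go, while this example uses the standard library's `sync.WaitGroup` to stay self-contained, and `collector`, `collectAll`, `okCollector`, `slowCollector` and `demo` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// collector is a hypothetical unit of work that honours context cancellation.
type collector func(ctx context.Context) error

// collectAll runs every collector in its own goroutine under one timeout and
// returns each collector's error (nil on success) keyed by name.
func collectAll(parent context.Context, timeout time.Duration, collectors map[string]collector) map[string]error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs = make(map[string]error, len(collectors))
	)
	for name, c := range collectors {
		wg.Add(1) // one Add per goroutine started
		go func(name string, c collector) {
			defer wg.Done()
			err := c(ctx)
			mu.Lock()
			errs[name] = err
			mu.Unlock()
		}(name, c)
	}
	wg.Wait()
	return errs
}

// okCollector finishes immediately.
func okCollector(ctx context.Context) error { return nil }

// slowCollector simulates a command that outlives the timeout.
func slowCollector(ctx context.Context) error {
	select {
	case <-time.After(time.Second):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demo runs one fast and one slow collector with a 50ms budget.
func demo() map[string]error {
	return collectAll(context.Background(), 50*time.Millisecond, map[string]collector{
		"health": okCollector,
		"slow":   slowCollector,
	})
}

func main() {
	errs := demo()
	fmt.Println("health:", errs["health"])
	fmt.Println("slow:", errs["slow"])
}
```

A slow collector then costs at most the timeout instead of stalling the whole scrape, provided each collector actually checks its context.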
Alexandre Marangone 30bb895860
Merge pull request #232 from digitalocean/collectorInitFix
Collector init fix and version refactor
2023-02-22 06:45:05 -08:00
Alexandre Marangone 06e78e98ed
Merge pull request #233 from digitalocean/metricsDoc
doc: add metrics list and desc
2023-02-21 07:11:21 -08:00
Alex Marangone 6f6df03e8c doc: add metrics list and desc 2023-02-21 07:10:56 -08:00
Alex Marangone 52ad633440 move to collector interface to avoid ugly switch 2023-02-14 12:24:34 -08:00
Alex Marangone 2235817fc4 move common mocking to a func 2023-02-14 11:36:09 -08:00
Alex Marangone abbe4444ef update test to support version being passed in Collect() 2023-02-14 11:17:51 -08:00
Alex Marangone ba15bf50a3 pass version to collectors when calling Collect() 2023-02-14 11:10:54 -08:00
Alex Marangone 69edc55596 exporter: do not reinitialize collectors on every collect
We store all the collectors in a map of string in order to
dynamically load/unload the rbd mirror collector
2023-02-14 09:27:21 -08:00
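
A minimal sketch of the idea in the commit above: collectors are built once, kept in a map keyed by name, and only the rbd mirror entry is added or removed at runtime. The `collector` interface and the version argument are borrowed from the neighbouring commits; `namedCollector`, `exporter`, `setRBDMirror` and the map keys are hypothetical, and locking is omitted for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// collector is a hypothetical interface; the version is passed on each call.
type collector interface {
	Collect(version string) []string
}

// namedCollector is a trivial collector used only for this illustration.
type namedCollector struct{ name string }

func (c namedCollector) Collect(version string) []string {
	return []string{c.name + " collected for " + version}
}

// exporter keeps its collectors for its whole lifetime instead of rebuilding
// them on every collection. Not safe for concurrent use as written.
type exporter struct {
	collectors map[string]collector
}

func newExporter() *exporter {
	return &exporter{collectors: map[string]collector{
		"health": namedCollector{name: "health"},
	}}
}

// setRBDMirror loads or unloads the rbd mirror collector; repeated calls with
// the same value leave the existing instance untouched.
func (e *exporter) setRBDMirror(enabled bool) {
	_, loaded := e.collectors["rbd_mirror"]
	switch {
	case enabled && !loaded:
		e.collectors["rbd_mirror"] = namedCollector{name: "rbd_mirror"}
	case !enabled && loaded:
		delete(e.collectors, "rbd_mirror")
	}
}

// collect runs every loaded collector in a stable, name-sorted order.
func (e *exporter) collect(version string) []string {
	names := make([]string, 0, len(e.collectors))
	for name := range e.collectors {
		names = append(names, name)
	}
	sort.Strings(names)
	var out []string
	for _, name := range names {
		out = append(out, e.collectors[name].Collect(version)...)
	}
	return out
}

func main() {
	e := newExporter()
	e.setRBDMirror(true)
	for _, line := range e.collect("example-version") {
		fmt.Println(line)
	}
}
```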
Matt1360 49e6345cb6
Merge pull request #230 from digitalocean/fix-license-date-futureproof
github-actions/license: make license year future-proof
2023-02-09 11:27:45 -04:00
Vaibhav Bhembre 3cb7ca9409
github-actions/license: make license year future-proof 2023-02-09 10:23:36 -05:00
Daniel R d8bf71a8fc
Split cluster health state by plus sign
PR #226
2023-01-24 17:42:34 -05:00
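
As a small illustration of splitting a combined state string on the plus sign, consider the sketch below. The function name `splitState` and the sample input are hypothetical; the commit does not show which states are involved.

```go
package main

import (
	"fmt"
	"strings"
)

// splitState breaks a combined state such as "first+second" into its parts.
// An empty input yields no parts rather than a single empty string.
func splitState(state string) []string {
	if state == "" {
		return nil
	}
	return strings.Split(state, "+")
}

func main() {
	fmt.Println(splitState("first+second"))
}
```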
Daniel R a52902054e
Update Github Actions
Update Github Actions to use ubuntu 20.04

* Update run_tests.yml

* Update run_build.yml
2023-01-24 17:39:04 -05:00
Daniel R fec95971a1
Merge pull request #224 from digitalocean/digitalocean/STORSYS-524/replaces-gauges-with-constmetrics
Replaces gauges with constmetrics
2022-10-13 14:15:37 -04:00
Daniel R 50874e99af revert health_status_interp to gauge 2022-10-12 18:03:20 -04:00
Daniel R ae64dae6f8 add a comment indicating gauge deprecation 2022-10-11 11:58:42 -04:00
Daniel R b9af3ab29f bugfixes; stop defaulting map flags to 0 in the constmetric 2022-10-07 14:53:46 -04:00
Daniel R 1a741d7606 introduce constmetrics for osdmap flags 2022-10-07 14:17:29 -04:00
Daniel R c3a3d581aa migrate health checks from gauges to constmetrics 2022-10-06 16:20:24 -04:00
Daniel R 362cb4b8dd fix health unit tests 2022-10-06 16:20:19 -04:00
Daniel R 5dd16fe875 migrate pool_usage.go to constmetrics 2022-10-06 16:20:10 -04:00
Tyler Brekke 16a6427a0f
Merge pull request #223 from digitalocean/tbrekke/ceph-user
Add user to exporter for use with rbd/rgw commands
2022-08-26 07:37:35 -07:00
Tyler Brekke 957b06df91 Add user to exporter for use with rbd/rgw commands 2022-08-25 15:20:57 -07:00
Tyler Brekke 63c7e97a32
Merge pull request #222 from digitalocean/tbrekke/rbd-mirror
Add rbd-mirror health status
2022-08-24 14:24:28 -07:00
Tyler Brekke 19a3cd5c7e Add rbd-mirror health status 2022-08-24 14:23:23 -07:00
Xavier Villaneau ae09ffe3fe Add `hostname` label to `ceph_crash_reports` 2022-06-16 13:20:29 -04:00
Xavier Villaneau 2faa6cb82d Fix comments and docstring in getCrashLs 2022-06-15 17:04:04 -04:00
Xavier Villaneau 3141fef319 Use JSON output from `ceph crash ls` instead of plain output 2022-06-15 17:04:04 -04:00
Xavier Villaneau adf792c3e8 Use ConstMetrics for ceph_crash_reports
Makes the code simpler since we're not tracking state anymore.
Also rewrote the tests to be more in-line with the rest.
2022-06-15 17:04:04 -04:00
Xavier Villaneau 74c89af225 Implement new gauge counting crash reports
New metric: `ceph_crash_reports` which counts the entries returned by
`ceph crash ls` by daemon name and archival status.

This is not the same as `ceph_new_crash_reports` which is the value of
the `RECENT_CRASH` health check, and that only counts the non-archived
errors of the past two weeks. The new metric counts errors as long as
they are not purged (which is done after 1 year by default).
2022-06-15 17:04:04 -04:00
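
Counting crash entries by daemon name and archival status, as the commit above describes, could be sketched like this. The JSON field names `entity_name` and `archived` are assumptions about the output of `ceph crash ls` in JSON form, and `crashEntry`, `crashKey` and `countCrashes` are hypothetical names; an entry is treated as archived when its `archived` field is non-empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// crashEntry holds the fields this sketch assumes are present in each entry.
type crashEntry struct {
	Entity   string `json:"entity_name"`
	Archived string `json:"archived"` // assumed empty or absent if not archived
}

// crashKey is the label pair the count is grouped by.
type crashKey struct {
	Daemon   string
	Archived bool
}

// countCrashes groups crash entries by daemon name and archival status.
func countCrashes(raw []byte) (map[crashKey]int, error) {
	var entries []crashEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, err
	}
	counts := make(map[crashKey]int)
	for _, e := range entries {
		counts[crashKey{Daemon: e.Entity, Archived: e.Archived != ""}]++
	}
	return counts, nil
}

func main() {
	raw := []byte(`[
		{"entity_name": "osd.0"},
		{"entity_name": "osd.0", "archived": "2022-06-01 12:00:00.000000"},
		{"entity_name": "mgr.a"}
	]`)
	counts, err := countCrashes(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for key, n := range counts {
		fmt.Printf("daemon=%s archived=%t count=%d\n", key.Daemon, key.Archived, n)
	}
}
```

Because the counts are recomputed from the listing each time, no state has to be carried between collections, which matches the move to ConstMetrics in the later commit.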