Joshua Baergen
0808753eb9
rados: Handle commands that return 'inf' as a float value
...
For now, this appears to be limited to read balancer fields, which
show up in OSD dumps and pool details.
2024-09-23 11:40:39 -06:00
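The `inf` handling described in 0808753eb9 above can be sketched as follows. This is a minimal illustration, not the commit's actual code: it assumes the command output contains a bare `inf` token where a JSON number is expected (which `encoding/json` rejects), and the names `quoteBareInf`, `looseFloat`, `poolScore`, and the `score` field are all hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"regexp"
	"strconv"
)

// bareInf matches an unquoted inf or -inf used as an object value.
// Assumption: the token only ever appears right after a colon.
var bareInf = regexp.MustCompile(`(:\s*)(-?inf)(\s*[,}\]])`)

// quoteBareInf turns bare inf tokens into JSON strings so that the
// standard decoder accepts the document.
func quoteBareInf(raw []byte) []byte {
	return bareInf.ReplaceAll(raw, []byte(`${1}"${2}"${3}`))
}

// looseFloat accepts a JSON number or a quoted value such as "inf".
type looseFloat float64

func (f *looseFloat) UnmarshalJSON(b []byte) error {
	if string(b) == "null" {
		return nil // leave the zero value in place
	}
	v, err := strconv.ParseFloat(string(bytes.Trim(b, `"`)), 64)
	if err != nil {
		return err
	}
	*f = looseFloat(v)
	return nil
}

// poolScore is a hypothetical payload shape used only for this sketch.
type poolScore struct {
	Score looseFloat `json:"score"`
}

// parseScore decodes a payload that may contain a bare inf value.
func parseScore(raw []byte) (float64, error) {
	var p poolScore
	if err := json.Unmarshal(quoteBareInf(raw), &p); err != nil {
		return 0, err
	}
	return float64(p.Score), nil
}

func main() {
	v, err := parseScore([]byte(`{"score": inf}`))
	fmt.Println(v, err)
}
```

The regular expression is deliberately narrow; a string value that happens to contain `: inf,` would also be rewritten, so a real implementation may need a stricter approach.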
Joshua Baergen
f87851df15
Merge pull request #248 from digitalocean/dependabot/go_modules/google.golang.org/protobuf-1.33.0
...
Bump google.golang.org/protobuf from 1.26.0 to 1.33.0
2024-03-14 10:12:44 -06:00
dependabot[bot]
9c553daf71
Bump google.golang.org/protobuf from 1.26.0 to 1.33.0
...
Bumps google.golang.org/protobuf from 1.26.0 to 1.33.0.
---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
2024-03-13 21:18:14 +00:00
Joshua Baergen
52874e51e3
Merge pull request #246 from digitalocean/rm-old-health-stats
...
monitors: Remove stats that have been dead since Luminous
2023-12-18 06:32:51 -07:00
Joshua Baergen
c01b5fb37e
monitors: Remove stats that have been dead since Luminous
...
Ceph commit reference e170405fd873723bec6ce691afad82641bab2ef1
2023-12-15 15:50:21 -07:00
Joshua Baergen
545836d5da
Merge pull request #245 from digitalocean/ceph_exporter-fail-if-no-connect
...
Fail to start if any rados connection fails on startup
2023-11-23 09:50:40 -07:00
Joshua Baergen
a257136ac9
Fail to start if any rados connection fails on startup
...
When we can't connect to one of the configured clusters, logging the
error and skipping that exporter instead of failing to start is not the
behaviour we want: a user may think that ceph_exporter is monitoring all
of the requested clusters (since it's up and responds to metrics
requests) when in fact data is missing.
2023-11-23 09:46:50 -07:00
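The fail-fast startup behaviour from a257136ac9 above could look roughly like this. It is a sketch under stated assumptions: `dialFunc`, `connectAll`, and `errUnreachable` are hypothetical names, and the dialer is a stand-in for whatever actually opens a rados connection.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errUnreachable is a hypothetical error used by the demo dialer.
var errUnreachable = errors.New("cluster unreachable")

// dialFunc is a stand-in for the code that opens a rados connection.
type dialFunc func(cluster string) error

// connectAll returns an error as soon as any configured cluster cannot
// be reached, instead of logging the failure and carrying on.
func connectAll(clusters []string, dial dialFunc) error {
	for _, c := range clusters {
		if err := dial(c); err != nil {
			return fmt.Errorf("connecting to cluster %q: %w", c, err)
		}
	}
	return nil
}

func main() {
	dial := func(cluster string) error {
		if cluster == "backup" {
			return errUnreachable
		}
		return nil
	}
	if err := connectAll([]string{"ceph", "backup"}, dial); err != nil {
		// A real binary would exit non-zero here rather than serve
		// metrics for only some of the requested clusters.
		fmt.Fprintln(os.Stderr, "startup aborted:", err)
	}
}
```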
Matt Vandermeulen
e858f2efc1
Merge pull request #244 from digitalocean/dependabot/go_modules/gopkg.in/yaml.v3-3.0.0
...
Bump gopkg.in/yaml.v3 from 3.0.0-20200313102051-9f266ea9e77c to 3.0.0
2023-08-30 11:59:29 -03:00
dependabot[bot]
0767c99e88
Bump gopkg.in/yaml.v3 from 3.0.0-20200313102051-9f266ea9e77c to 3.0.0
...
Bumps gopkg.in/yaml.v3 from 3.0.0-20200313102051-9f266ea9e77c to 3.0.0.
---
updated-dependencies:
- dependency-name: gopkg.in/yaml.v3
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-08-30 14:55:07 +00:00
Tyler Brekke
b6e2dc79ab
Merge pull request #243 from digitalocean/ceph-osd-metadata
...
osd: Add new collector for osd metadata
2023-08-25 08:05:42 -07:00
Tyler Brekke
3c403081b5
osd: Add new collector for osd metadata
...
scrape created_at, ceph_version_when_created, and osd_objectstore
from ceph osd metadata
2023-08-24 17:37:43 -07:00
Alexandre Marangone
2df38cb776
Merge pull request #242 from digitalocean/label-swap
...
ceph/osd: fix root/rack labels
2023-05-23 08:36:02 -07:00
Alex Marangone
2fb4e78d4a
ceph/osd: fix root/rack labels
2023-05-23 08:06:31 -07:00
Joshua Baergen
76ec6f81de
Merge pull request #241 from digitalocean/oldest-inactive-internal-poll
...
osd: Internally poll PG dump for oldest active PG tracking
2023-05-09 11:00:02 -06:00
Joshua Baergen
5b487fad9c
osd: Internally poll PG dump for oldest active PG tracking
...
Without this, the granularity of the oldest active PG is based on
external scrape frequency, and an unlucky sequence of scrapes could see
the same PG inactive two scrapes in a row even though it was active in
between.
Ideally, we would poll even more often than every 10 seconds, but PG
dumps can take a while.
2023-05-08 16:33:13 -06:00
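The internal polling idea in 5b487fad9c above can be illustrated with a small tracker that is updated on its own timer rather than on each external scrape. This is a sketch only: `inactiveTracker`, its methods, and the `fetch` callback are hypothetical, and timestamps are plain Unix seconds to keep the example easy to follow.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// inactiveTracker remembers when each PG was first seen inactive.
type inactiveTracker struct {
	mu    sync.Mutex
	since map[string]int64 // PG ID -> Unix time first seen inactive
}

func newInactiveTracker() *inactiveTracker {
	return &inactiveTracker{since: map[string]int64{}}
}

// observe records one poll. PGs that are still inactive keep their
// original timestamp; PGs that went active are forgotten, so a later
// inactive period starts counting from scratch.
func (t *inactiveTracker) observe(now int64, inactive []string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	next := make(map[string]int64, len(inactive))
	for _, pg := range inactive {
		if first, ok := t.since[pg]; ok {
			next[pg] = first
		} else {
			next[pg] = now
		}
	}
	t.since = next
}

// oldestInactiveSeconds returns the longest current inactivity, or 0.
func (t *inactiveTracker) oldestInactiveSeconds(now int64) int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	var oldest int64
	for _, first := range t.since {
		if age := now - first; age > oldest {
			oldest = age
		}
	}
	return oldest
}

// run polls fetch at a fixed interval until ctx is cancelled. A failed
// poll leaves the previous state untouched.
func (t *inactiveTracker) run(ctx context.Context, interval time.Duration, fetch func() ([]string, error)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if inactive, err := fetch(); err == nil {
				t.observe(time.Now().Unix(), inactive)
			}
		}
	}
}

func main() {
	t := newInactiveTracker()
	t.observe(100, []string{"1.a"})
	t.observe(110, nil) // the PG was active in between
	t.observe(120, []string{"1.a"})
	fmt.Println(t.oldestInactiveSeconds(125))
}
```

A scrape handler would then only read `oldestInactiveSeconds`, so its granularity no longer depends on how often the exporter is scraped.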
Matt1360
e6a0a46acf
Merge pull request #238 from digitalocean/go-1.20
...
go: update envs to use go 1.20
2023-04-05 14:04:30 -03:00
Matt Vandersomething
6965835059
go: update envs to use go 1.20
2023-04-05 13:57:18 -03:00
Matt1360
4cbf5d8732
Merge pull request #237 from digitalocean/gha-on-tag
...
gha: add build/push on tag update
2023-04-05 13:48:13 -03:00
Matt Vandersomething
45bf35c039
gha: add build/push on tag update
2023-04-05 13:43:44 -03:00
Daniel R
b1548c0d96
establish only 1 rados connection (#235)
2023-03-23 16:35:12 -04:00
Daniel R
46b06f317f
Fix timeouts and use goroutines for collectors/commands (#234)
...
* rados: timeouts on Mon/Mgr command & connections
* rados: remove unneeded timeouts
* make all collectors async
* fix osd collector
* only add 1 in waitgroups
* ceph: don't pass waitgroups to collectors
* monitors.go: use errgroup instead of waitgroup
* rados: add comment, pass arg & close channel
2023-03-14 14:00:33 -04:00
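The combination of command timeouts and asynchronous collectors listed in 46b06f317f above can be sketched with the standard library alone. Note the substitution: the commit mentions `errgroup` (from `golang.org/x/sync`), while this example uses `sync.WaitGroup` and an error channel to stay dependency-free. `runWithTimeout` and `collectAll` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

type result struct {
	out []byte
	err error
}

// runWithTimeout runs cmd in its own goroutine and gives up after timeout.
func runWithTimeout(timeout time.Duration, cmd func() ([]byte, error)) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	// Buffered so the goroutine can still send after a timeout.
	done := make(chan result, 1)
	go func() {
		out, err := cmd()
		done <- result{out, err}
	}()
	select {
	case r := <-done:
		return r.out, r.err
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

// collectAll runs every collector concurrently and returns one of the
// errors encountered, or nil if all of them succeeded.
func collectAll(timeout time.Duration, collectors map[string]func() ([]byte, error)) error {
	var wg sync.WaitGroup
	errs := make(chan error, len(collectors))
	for name, c := range collectors {
		wg.Add(1)
		go func(name string, c func() ([]byte, error)) {
			defer wg.Done()
			if _, err := runWithTimeout(timeout, c); err != nil {
				errs <- fmt.Errorf("%s: %w", name, err)
			}
		}(name, c)
	}
	wg.Wait()
	close(errs)
	return <-errs // nil when the closed channel is empty
}

func main() {
	ok := func() ([]byte, error) { return []byte("{}"), nil }
	err := collectAll(time.Second, map[string]func() ([]byte, error){
		"health": ok,
		"osd":    ok,
	})
	fmt.Println("error:", err)
}
```

A command that times out keeps running in its goroutine until it returns; the sketch only stops waiting for it.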
Alexandre Marangone
30bb895860
Merge pull request #232 from digitalocean/collectorInitFix
...
Collector init fix and version refactor
2023-02-22 06:45:05 -08:00
Alexandre Marangone
06e78e98ed
Merge pull request #233 from digitalocean/metricsDoc
...
doc: add metrics list and desc
2023-02-21 07:11:21 -08:00
Alex Marangone
6f6df03e8c
doc: add metrics list and desc
2023-02-21 07:10:56 -08:00
Alex Marangone
52ad633440
move to collector interface to avoid ugly switch
2023-02-14 12:24:34 -08:00
Alex Marangone
2235817fc4
move common mocking to a func
2023-02-14 11:36:09 -08:00
Alex Marangone
abbe4444ef
update test to support version being passed in Collect()
2023-02-14 11:17:51 -08:00
Alex Marangone
ba15bf50a3
pass version to collectors when calling Collect()
2023-02-14 11:10:54 -08:00
Alex Marangone
69edc55596
exporter: do not reinitialize collectors on every collect
...
We store all the collectors in a map keyed by string so that the
rbd mirror collector can be loaded and unloaded dynamically
2023-02-14 09:27:21 -08:00
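The collector map from 69edc55596 above, together with the neighbouring commits that introduce a collector interface and pass the version into `Collect()`, might be sketched like this. All type and method names here are hypothetical, and the collectors return plain strings instead of real metrics.

```go
package main

import (
	"fmt"
	"sort"
)

// collector is a hypothetical interface; the version is passed on every
// call rather than stored at construction time.
type collector interface {
	Collect(version string) []string
}

// staticCollector is a toy collector that emits a single sample.
type staticCollector struct{ name string }

func (c staticCollector) Collect(version string) []string {
	return []string{c.name + "@" + version}
}

// exporter builds its collectors once and keeps them in a map so that
// optional ones can be added or removed by key.
type exporter struct {
	collectors map[string]collector
}

func newExporter() *exporter {
	return &exporter{collectors: map[string]collector{
		"health": staticCollector{"health"},
		"osd":    staticCollector{"osd"},
	}}
}

// setRBDMirror loads or unloads the optional collector without
// reinitializing the others.
func (e *exporter) setRBDMirror(enabled bool) {
	if !enabled {
		delete(e.collectors, "rbdMirror")
		return
	}
	if _, ok := e.collectors["rbdMirror"]; !ok {
		e.collectors["rbdMirror"] = staticCollector{"rbdMirror"}
	}
}

// collect calls every registered collector in a stable order.
func (e *exporter) collect(version string) []string {
	names := make([]string, 0, len(e.collectors))
	for name := range e.collectors {
		names = append(names, name)
	}
	sort.Strings(names)
	var out []string
	for _, name := range names {
		out = append(out, e.collectors[name].Collect(version)...)
	}
	return out
}

func main() {
	e := newExporter()
	e.setRBDMirror(true)
	fmt.Println(e.collect("17.2.6"))
}
```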
Matt1360
49e6345cb6
Merge pull request #230 from digitalocean/fix-license-date-futureproof
...
github-actions/license: make license year future-proof
2023-02-09 11:27:45 -04:00
Vaibhav Bhembre
3cb7ca9409
github-actions/license: make license year future-proof
2023-02-09 10:23:36 -05:00
Daniel R
d8bf71a8fc
Split cluster health state by plus sign
...
PR #226
2023-01-24 17:42:34 -05:00
Daniel R
a52902054e
Update GitHub Actions
...
Update GitHub Actions to use ubuntu 20.04
* Update run_tests.yml
* Update run_build.yml
2023-01-24 17:39:04 -05:00
Daniel R
fec95971a1
Merge pull request #224 from digitalocean/digitalocean/STORSYS-524/replaces-gauges-with-constmetrics
...
Replaces gauges with constmetrics
2022-10-13 14:15:37 -04:00
Daniel R
50874e99af
revert health_status_interp to gauge
2022-10-12 18:03:20 -04:00
Daniel R
ae64dae6f8
add a comment indicating gauge deprecation
2022-10-11 11:58:42 -04:00
Daniel R
b9af3ab29f
bugfixes; stop defaulting map flags to 0 in the constmetric
2022-10-07 14:53:46 -04:00
Daniel R
1a741d7606
introduce constmetrics for osdmap flags
2022-10-07 14:17:29 -04:00
Daniel R
c3a3d581aa
migrate health checks from gauges to constmetrics
2022-10-06 16:20:24 -04:00
Daniel R
362cb4b8dd
fix health unit tests
2022-10-06 16:20:19 -04:00
Daniel R
5dd16fe875
migrate pool_usage.go to constmetrics
2022-10-06 16:20:10 -04:00
Tyler Brekke
16a6427a0f
Merge pull request #223 from digitalocean/tbrekke/ceph-user
...
Add user to exporter for use with rbd/rgw commands
2022-08-26 07:37:35 -07:00
Tyler Brekke
957b06df91
Add user to exporter for use with rbd/rgw commands
2022-08-25 15:20:57 -07:00
Tyler Brekke
63c7e97a32
Merge pull request #222 from digitalocean/tbrekke/rbd-mirror
...
Add rbd-mirror health status
2022-08-24 14:24:28 -07:00
Tyler Brekke
19a3cd5c7e
Add rbd-mirror health status
2022-08-24 14:23:23 -07:00
Xavier Villaneau
ae09ffe3fe
Add `hostname` label to `ceph_crash_reports`
2022-06-16 13:20:29 -04:00
Xavier Villaneau
2faa6cb82d
Fix comments and docstring in getCrashLs
2022-06-15 17:04:04 -04:00
Xavier Villaneau
3141fef319
Use JSON output from `ceph crash ls` instead of plain output
2022-06-15 17:04:04 -04:00
Xavier Villaneau
adf792c3e8
Use ConstMetrics for ceph_crash_reports
...
Makes the code simpler since we're not tracking state anymore.
Also rewrote the tests to be more in-line with the rest.
2022-06-15 17:04:04 -04:00
Xavier Villaneau
74c89af225
Implement new gauge counting crash reports
...
New metric: `ceph_crash_reports` which counts the entries returned by
`ceph crash ls` by daemon name and archival status.
This is not the same as `ceph_new_crash_reports`, which is the value of
the `RECENT_CRASH` health check and only counts the non-archived
crashes of the past two weeks. The new metric counts crashes as long as
they are not purged (which is done after 1 year by default).
2022-06-15 17:04:04 -04:00
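The counting behind `ceph_crash_reports` in 74c89af225 above (entries grouped by daemon name and archival status) can be sketched as follows. This assumes JSON output in which each entry carries an `entity_name` field and an `archived` field that is present only for archived crashes; those field names and the helper names are assumptions for illustration, not taken from the commit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// crashEntry holds the two fields this sketch cares about. The field
// names are assumptions about the JSON shape.
type crashEntry struct {
	EntityName string          `json:"entity_name"`
	Archived   json.RawMessage `json:"archived"`
}

// crashKey is the label pair the counts are grouped by.
type crashKey struct {
	Daemon   string
	Archived bool
}

// countCrashes groups crash entries by daemon and archival status.
func countCrashes(raw []byte) (map[crashKey]int, error) {
	var entries []crashEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, err
	}
	counts := make(map[crashKey]int)
	for _, e := range entries {
		// Any non-null "archived" value marks the entry as archived.
		archived := len(e.Archived) > 0 && string(e.Archived) != "null"
		counts[crashKey{Daemon: e.EntityName, Archived: archived}]++
	}
	return counts, nil
}

func main() {
	raw := []byte(`[
		{"entity_name": "osd.1"},
		{"entity_name": "osd.1", "archived": "2022-06-01 00:00:00"},
		{"entity_name": "mgr.a"}
	]`)
	counts, err := countCrashes(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for k, n := range counts {
		fmt.Printf("daemon=%s archived=%t count=%d\n", k.Daemon, k.Archived, n)
	}
}
```

Because the counts are recomputed from the full listing each time, nothing has to be remembered between collections, which fits the later move to ConstMetrics in adf792c3e8.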