351 Commits

Author SHA1 Message Date
alin.hrapciuc@gmail.com c9e65abd6f Fix units for latency 2021-01-14 16:41:28 +02:00
AKYD 9be0cd69d3 Use IEC instead of SI units 2021-01-14 09:46:59 +02:00
Joshua Baergen 0ee6da60a0
Merge pull request #189 from digitalocean/nautilus-pg-upmap-items
collectors/osd: Export the total number of items in the pg-upmap table.
2020-12-18 06:14:19 -07:00
Joshua Baergen cb429a2b91 collectors/osd: Export the total number of items in the pg-upmap table. 2020-12-17 16:31:34 -07:00
Joshua Baergen f7011dbe78
Merge pull request #188 from digitalocean/nautilus-misplaced_objects
collectors/health: Fix ceph_misplaced_objects on Nautilus.
2020-12-11 08:33:27 -07:00
Joshua Baergen b4b94a0844 collectors/health: Fix ceph_misplaced_objects on Nautilus.
Nautilus no longer reports misplaced objects as a health status, but it
is available in the pgmap data. For consistency, let's get the degraded
object count from there as well.
2020-12-10 14:54:05 -07:00
Joshua Baergen 8a1f51881f
Merge pull request #187 from shminjs/feat-add-mon-down-metric
Add new gauge to show the count of mon in down state.
2020-12-04 06:58:31 -07:00
shimin b01931d0c4 Add new gauge to show the count of mon in down state.
When a monitor is down, the administrator should be notified urgently.

Signed-off-by: shminjs <shminjs@outlook.com>
2020-12-04 19:58:41 +08:00
Joshua Baergen 38c5cc7360
Merge pull request #185 from Rethan/feat_add_osd_full_ratio
feat: add osd full/nearfull/backfillfull ratio
2020-11-06 07:14:59 -07:00
haoyixing 5ef76dbd19 update osd_test
Signed-off-by: haoyixing <haoyixing@kuaishou.com>
2020-11-06 18:03:16 +08:00
haoyixing c60444dcef feat: add osd full/nearfull/backfillfull ratio
Add new gauge to show osd full/nearfull/backfillfull ratio.
Not only do we need to know whether an OSD is full or not, we
also want to know what the exact full ratio is for a cluster.
For many clusters which have different full ratios set, this
should be meaningful.

Signed-off-by: haoyixing <haoyixing@kuaishou.com>
2020-11-06 16:21:30 +08:00
Yue Zhu 4906d5b866
Merge pull request #184 from digitalocean/yzhu/fix-osd-collector
Use MgrCommand for "osd df", "osd perf" and "pg dump pgs_brief"
2020-10-29 22:44:33 -04:00
Yue Zhu 1778243b17 Remove go get because we use go mod 2020-10-29 21:26:41 -04:00
Yue Zhu 477ce579f7 Update Travis 2020-10-29 15:14:31 -04:00
Yue Zhu 252eb6604a Use MgrCommand for "osd df", "osd perf" and "pg dump pgs_brief" 2020-10-29 13:44:43 -04:00
Yue Zhu 238f39a71b
Refactoring for nautilus branch (#183)
* Run -race for go test

* Upgrade go 1.15.3

* Extract a function to create rados connection

* Convert to go module; update dependencies

* Use environment variables to pass in parameters

* Make rados connection short lived

* Use float64 for JSON number in cluster_usage.go

* Use mocks to replace the NoopConn

* Add "-tags nautilus" for "go test" and "go build"

* Update readme

* Update go mod
2020-10-28 14:42:52 -04:00
Max Kuznetsov c9552f0f9f
Merge pull request #180 from syhpoon/BLOCK-2615
fix rbd-mirror daemon format in Nautilus
2020-10-07 16:09:02 -04:00
Max Kuznetsov 77d7977809 fix rbd-mirror daemon format in Nautilus 2020-10-07 15:55:58 -04:00
Max Kuznetsov de55efabeb
Merge pull request #179 from syhpoon/BLOCK-2559
add metric to capture the number of inconsistent pgs
2020-09-16 11:32:38 -04:00
Max Kuznetsov b75904e25f add metric to capture the number of inconsistent pgs 2020-09-16 11:27:27 -04:00
Max Kuznetsov 21fb2c4d24
Merge pull request #177 from syhpoon/BLOCK-2553
rbd_mirror_up: switch to const metric
2020-09-15 11:46:03 -04:00
Max Kuznetsov dc6728deb0 rbd_mirror_up: switch to const metric 2020-09-15 11:26:30 -04:00
Max Kuznetsov a5c61bc5ba
Merge pull request #175 from syhpoon/BLOCK-2553
add new metric: rbd_mirror_up
2020-09-14 13:37:22 -04:00
Max Kuznetsov 6056ab259b add new metric: rbd_mirror_up 2020-09-14 13:06:18 -04:00
Yue Zhu 0cf8d3e787
Merge pull request #173 from digitalocean/pool-unfound-objects-nautilus
nautilus: add pool metric unfound_objects_total
2020-09-10 16:39:57 -04:00
Yue Zhu 0242f191c2 Add pool metric unfound_objects_total
(cherry picked from commit 170a989eed)
2020-09-10 16:29:05 -04:00
Joshua Baergen 2c01ccf251
Merge pull request #170 from digitalocean/go-ceph-0.5.0
vendor: Update go-ceph to 0.5.0 for proper Nautilus support.
2020-08-28 14:25:52 -06:00
Joshua Baergen 627e904280 vendor: Update go-ceph to 0.5.0 for proper Nautilus support.
According to the upstream docs, the version of go-ceph being used by
this branch did not officially support Nautilus, though we haven't seen
any problems thus far.
2020-08-28 14:20:26 -06:00
Joshua Baergen 090bb6a09d collectors: Remove the unused performPGQuery() and supporting code. 2020-08-28 08:56:45 -06:00
Yue Zhu eef3214528
Merge pull request #167 from digitalocean/yzhu/remove-pg-recovery-metrics-nautilus
nautilus: remove PG recovery metrics
2020-08-06 17:58:58 -04:00
Yue Zhu a400d2a53d Remove PG recovery metrics
(cherry picked from commit 974b187636)
2020-08-05 17:51:37 -04:00
Joshua Baergen e4eeee00cd
Merge pull request #165 from digitalocean/nautilus-op-timeout
osd: set default timeout 30 seconds for Mon/OSD ops
2020-07-15 10:34:36 -06:00
Yue Zhu dcb73397dc osd: set default timeout 30 seconds for Mon/OSD ops 2020-07-15 10:29:03 -06:00
Joshua Baergen 25869fdd0e
Merge pull request #164 from digitalocean/nautilus-pool-crush-root
[nautilus] pool: Eliminate bad assumption about crush rule ordering.
2020-07-13 15:22:26 -06:00
Joshua Baergen ee46165576 pool: Eliminate bad assumption about crush rule ordering.
We had assumed that the "take" step came first, but that's not
necessarily the case.
2020-07-13 15:19:00 -06:00
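The fix described in the commit above can be illustrated with a minimal sketch: scan every step of a CRUSH rule for the "take" operation instead of reading the first step. The `crushStep` type and `crushRoot` function are hypothetical names, not taken from the repository; the `op` and `item_name` fields are assumed to follow the JSON layout of `ceph osd crush rule dump`.

```go
package main

import "fmt"

// crushStep is a hypothetical subset of one step in a CRUSH rule dump.
// The JSON field names are assumptions based on "ceph osd crush rule dump".
type crushStep struct {
	Op       string `json:"op"`
	ItemName string `json:"item_name"`
}

// crushRoot returns the item name of the first "take" step, wherever it
// appears in the rule. It reports false if the rule has no "take" step.
func crushRoot(steps []crushStep) (string, bool) {
	for _, s := range steps {
		if s.Op == "take" {
			return s.ItemName, true
		}
	}
	return "", false
}

func main() {
	// Erasure-coded rules commonly place tuning steps before "take".
	steps := []crushStep{
		{Op: "set_chooseleaf_tries"},
		{Op: "set_choose_tries"},
		{Op: "take", ItemName: "default"},
		{Op: "emit"},
	}
	root, ok := crushRoot(steps)
	fmt.Println(root, ok)
}
```

Returning a boolean alongside the name lets the caller decide how to label a pool whose rule has no "take" step, rather than silently using an empty root.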
Joshua Baergen 695b597efe
Merge pull request #162 from digitalocean/nautilus-pool-crush-root
[nautilus] pool: Include the crush root as a label on pool metrics.
2020-07-13 12:15:34 -06:00
Joshua Baergen c0a214d0d8 pool: Include the crush root as a label on pool metrics.
This makes it fairly simple to match up pools with OSDs, for example, in
queries.
2020-07-13 10:32:38 -06:00
Joshua Baergen a44fe3557b
Merge pull request #159 from digitalocean/used_bytes_nautilus
pool_usage: Fix the used-bytes metric for Nautilus.
2020-07-13 09:47:14 -06:00
Joshua Baergen 4c90529931 pool_usage: Fix the used-bytes metric for Nautilus.
bytes_used went from object-bytes-used (thus raw/expansion) in
Luminous to raw-bytes-used in Nautilus. The new stored value is the
closest thing to the previous bytes_used.
2020-07-11 09:06:35 -06:00
Joshua Baergen 5ea99da9a9
Merge pull request #157 from baergj/used_bytes_nautilus
Tweak the raw used metric on Nautilus; fix expansion factor computation.
2020-07-10 15:07:54 -06:00
Joshua Baergen 32de4574a5 pool: Fix logic of falling back to replicated expansion in computation of ExpansionFactor.
The code was assuming that the command would succeed with empty output
if a non-EC profile was given, when in fact it fails. Re-work the code
to properly handle this case.
2020-07-10 14:58:12 -06:00
Joshua Baergen dc4541a398 pool: Fix "osd erasure-code-profile get" syntax. 2020-07-10 14:58:12 -06:00
Joshua Baergen 1ae9a8fdd1 pool_usage: Use the maximum of bytes_used and stored_raw for raw used bytes.
These are two different computations of the raw usage value. Be
pessimistic and report the higher of the two.
2020-07-10 14:58:12 -06:00
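As a rough illustration of the "report the higher of the two" rule in the commit above, the following sketch takes both raw usage values and returns the larger. The `poolStats` type and `rawUsedBytes` function are hypothetical; only the `bytes_used` and `stored_raw` names come from the commit message.

```go
package main

import (
	"fmt"
	"math"
)

// poolStats is a hypothetical subset of per-pool usage statistics.
// Only the two field names are taken from the commit message.
type poolStats struct {
	BytesUsed float64 `json:"bytes_used"`
	StoredRaw float64 `json:"stored_raw"`
}

// rawUsedBytes is pessimistic: of the two raw usage computations,
// it reports whichever is higher.
func rawUsedBytes(s poolStats) float64 {
	return math.Max(s.BytesUsed, s.StoredRaw)
}

func main() {
	s := poolStats{BytesUsed: 100, StoredRaw: 120}
	fmt.Println(rawUsedBytes(s))
}
```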
Joshua Baergen 5d1cf3f385
Merge pull request #154 from anthonyeleven/anthonyeleven/nautilus
Capture new percent_used pool metric and copy over mgr metrics from the Luminous branch
2020-07-10 12:21:46 -06:00
Yue Zhu e7c6973903
Merge pull request #156 from yuezhu/nautilus
Add gauge for incomplete PGs
2020-07-07 22:23:53 -04:00
Yue Zhu 0aa50cf2d9 Add gauge for incomplete PGs 2020-07-07 18:42:47 -04:00
Anthony D'Atri 1c1799c1cd copy over mgr stats from the luminous branch 2020-06-29 15:31:48 -07:00
Anthony D'Atri cb57bbe3fd typo 2020-06-26 12:48:33 -07:00
Anthony D'Atri 07b0c39a02 Capture new percent_used pool metric 2020-06-26 12:19:59 -07:00
Cody Breedlove 5d050f19bb
Merge pull request #152 from digitalocean/cbreedlove/CEPH-114
Add expansion factor to Nautilus
2020-06-18 09:35:51 -04:00