Commit Graph

356 Commits

Author SHA1 Message Date
Vaibhav Bhembre
e7660edf8a collector/health: add stuck request metric 2019-05-23 12:54:47 -04:00
Vaibhav Bhembre
8cc817b251
Merge pull request #117 from digitalocean/add_pool_info_collector
exporter: register pool info collector
2019-05-17 14:08:59 -04:00
Vaibhav Bhembre
4cd78e49f0 exporter: register pool info collector 2019-05-17 14:03:29 -04:00
Vaibhav Bhembre
2a9db14d39
Merge pull request #116 from digitalocean/add_pool_info
collectors: add pool info collector
2019-05-17 13:44:11 -04:00
Vaibhav Bhembre
ce59e8446f collectors/pool: add new pool test 2019-05-17 13:28:20 -04:00
Vaibhav Bhembre
0c551664a5 vendor github.com/ceph/go-ceph/rados 2019-05-16 18:38:39 -04:00
Vaibhav Bhembre
0241ef7863 collectors/pool: add pool info collector 2019-05-16 18:28:26 -04:00
Vaibhav Bhembre
53db874aa7
Merge pull request #112 from tbregolin/export-osdmap-flags
Export OSD cluster map flags
2019-02-25 16:20:09 -05:00
Thomás S. Bregolin
eb4e8483b8 health: export OSD cluster map flags 2019-02-17 03:06:34 +00:00
Vaibhav Bhembre
f91b1241dc
Merge pull request #110 from digitalocean/luminous_pg_down
health: add stats for down PGs
2018-11-22 11:25:46 -05:00
Vaibhav Bhembre
31bb74f8ea health: add stats for down PGs 2018-11-22 11:22:11 -05:00
Vaibhav Bhembre
3efbebdb4f
Merge pull request #108 from digitalocean/add-recovery-backfill-stats
luminous: add recovery/backfill stats
2018-11-01 12:50:28 -04:00
Vaibhav Bhembre
6e7bdd9c3e luminous: add recovery/backfill stats 2018-11-01 12:46:18 -04:00
Vaibhav Bhembre
d0ed675640
Merge pull request #107 from digitalocean/fix-degraded-misplaced-count
health: fix degraded/misplaced object count
2018-10-23 17:38:12 -04:00
Vaibhav Bhembre
595bd8c641 health: fix degraded/misplaced object count 2018-10-23 16:56:39 -04:00
Ryan Roemmich
b311e48b4e
Merge pull request #100 from ralfonso/patch-1
Update README.md to add additional flags
2018-10-18 13:19:28 -07:00
Vaibhav Bhembre
6aba5522bd
Merge pull request #103 from jan--f/reset-some-metric-labels
collectors: Reset metric vectors; pools and daemons can vanish
2018-09-19 11:16:27 -04:00
Jan Fajerski
4fc19fbc03 collectors: Reset metric vectors; pools and daemons can vanish
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
2018-09-13 14:19:35 +02:00
Vaibhav Bhembre
7814098640
Merge pull request #101 from benyanke/patch-1
Bump Copyright
2018-09-10 12:31:42 -04:00
Ben Yanke
f3a9fa3195
Update README.md 2018-09-10 11:24:25 -05:00
Ben Yanke
213d8453b5
Bump Copyright 2018-09-05 20:42:09 -05:00
Ryan Roemmich
0e8962eb82
Update README.md
Add additional flags.
2018-09-04 12:56:25 -07:00
ssobolewski
415d296c31
Ssobolewski/run rgw stats in background (#97)
* RGW GC stat collection can take a long time if there is a very large backlog

* Use a const for background interval

* Minor change per code review
2018-08-10 13:43:02 -06:00
ssobolewski
dc6ab9c636
Optionally collect RGW GC task stats (#94)
* Optionally collect RGW GC task stats

* Minor changes per code-review, add some additional tests to squeeze out extra coverage
2018-08-01 07:37:07 -06:00
Vaibhav Bhembre
ae0f874abb
Merge pull request #90 from jan--f/add-active-pgs-luminous
health: add active_pg metric
2018-07-09 09:48:59 -04:00
Vaibhav Bhembre
5df7451281
Merge pull request #89 from jan--f/backport-80-terminate-process
Terminate exporter process if maximum open files exceeded
2018-07-09 09:36:42 -04:00
Jan Fajerski
a42a258a28 Add constant for tcp keepalive period
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
2018-07-09 11:08:14 +02:00
Jan Fajerski
2436d00967 health: add active_pg metric
This metric allows graphing active vs. inactive PGs, i.e. the number of
PGs Ceph is serving vs. the number of currently unavailable PGs.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
2018-07-09 10:18:47 +02:00
Vaibhav Bhembre
cd84a7b9db
Merge pull request #93 from digitalocean/ceph_osd_down_destroyed
osd: add metrics for down and destroyed OSD
2018-06-25 10:47:19 -04:00
Vaibhav Bhembre
6aed4c4f74
Merge pull request #92 from digitalocean/luminous_latency_skew_support
monitors: add back clock skew and latency metric support
2018-06-25 10:46:34 -04:00
Vaibhav Bhembre
aa5abdc470 osd: add metrics for down and destroyed OSD 2018-06-24 13:20:58 -04:00
Vaibhav Bhembre
6f290751c9 monitors: add back clock skew and latency metric support 2018-06-23 19:09:07 -04:00
ssobolewski
9a39cc64ed
Add tracking of scrub/deep scrub on a per osd basis (#91)
* Add tracking of scrub/deep scrub on a per osd basis

* Changes per code review comments/discussion
2018-06-14 12:40:46 -06:00
Tim Serong
cd9aa031a8 Terminate exporter process if maximum open files exceeded
This is somewhat of a workaround for the exporter becoming
perpetually blocked when it runs out of file descriptors if
the cluster is down for too long, as mentioned in:

  https://github.com/digitalocean/ceph_exporter/issues/60#issuecomment-319396108

The problem is that if the MONs are down for long enough,
each time prometheus scrapes the metrics, another socket is
opened, but these block forever.  If the cluster comes back
up before we run out of FDs, the blocked requests recover.
If the cluster *doesn't* come back up before we run out of
FDs, the blocked requests never recover.

This commit causes ceph exporter to terminate if it runs
out of file descriptors, which IMO is better than blocking
forever -- it'll be a noisier failure, and also if you're
running ceph_exporter via systemd, systemd will then
automatically trigger a service restart.

Signed-off-by: Tim Serong <tserong@suse.com>
(cherry picked from commit bb1ad364b5)
2018-06-12 10:49:04 +02:00
Vaibhav Bhembre
ccd6b7135b
Merge pull request #88 from digitalocean/fix_stuck_requests
health: add visibility for stuck requests
2018-05-10 13:52:14 -04:00
Vaibhav Bhembre
c887505a42 health: add visibility for stuck requests 2018-05-10 13:04:52 -04:00
Vaibhav Bhembre
1b91b5bf2d
Merge pull request #87 from digitalocean/add_slow_request_osd
health: capture slow request per osd
2018-05-10 12:11:50 -04:00
Vaibhav Bhembre
219fb69bde health: capture slow request per osd 2018-05-10 11:55:39 -04:00
Vaibhav Bhembre
afd5a2c4bf update travis go to 1.9.2 2018-01-23 21:04:24 +05:30
Vaibhav Bhembre
5f702bc55f
Merge pull request #76 from digitalocean/luminous-switch-json
luminous: move health stats to be extracted from status json
2017-11-14 16:09:58 -05:00
Vaibhav Bhembre
9c40ffc620 luminous: pick correct value from health status after compat warning is removed 2017-11-13 15:27:20 -05:00
Vaibhav Bhembre
96785e82b1 luminous: move luminous stats to be extracted from json 2017-11-13 15:26:20 -05:00
Vaibhav Bhembre
c41f6ae85c
Merge pull request #59 from AcalephStorage/multi-stage-build
Dockerfile with multi-stage build
2017-11-10 16:47:08 -05:00
Vaibhav Bhembre
3828da2b24
Merge pull request #71 from tserong/wip-add-promhttp-to-vendor
Add promhttp to vendor directory
2017-11-10 16:34:04 -05:00
Tim Serong
66a684fbce Add promhttp to vendor directory
This adds promhttp (required by 4c70969) to the vendor directory.

Signed-off-by: Tim Serong <tserong@suse.com>
2017-09-23 17:16:34 +02:00
Vaibhav Bhembre
ac79cf743b Merge pull request #69 from utkarshmani1997/feature_issue#68_exporter
update /metrics handler to promhttp.Handler()
2017-09-14 08:12:40 -04:00
utkarshmani1997
4c70969940 update /metrics handler to promhttp.Handler() 2017-09-14 17:09:52 +05:30
Vaibhav Bhembre
f870fc3254 Merge pull request #67 from fitschn/fix-metrics-recovery-client-io-luminous
health: expose client io and recovery io for luminous
2017-09-02 11:43:51 -04:00
Vaibhav Bhembre
bb62aa1b76 Merge pull request #66 from fitschn/fix-metrics-slow-request-luminous
health: expose slow requests for luminous
2017-08-23 21:55:48 -04:00
Mathias Nohr
b2f520b9aa health: expose client io and recovery io for luminous
An update to ceph luminous breaks the metrics for recovery and client io, because of the new output format for ceph status.

The updates are still compatible with older versions of ceph.
2017-08-23 15:22:26 +02:00