Commit Graph

351 Commits

Author SHA1 Message Date
Vaibhav Bhembre 0c551664a5 vendor github.com/ceph/go-ceph/rados 2019-05-16 18:38:39 -04:00
Vaibhav Bhembre 0241ef7863 collectors/pool: add pool info collector 2019-05-16 18:28:26 -04:00
Vaibhav Bhembre 53db874aa7
Merge pull request #112 from tbregolin/export-osdmap-flags
Export OSD cluster map flags
2019-02-25 16:20:09 -05:00
Thomás S. Bregolin eb4e8483b8 health: export OSD cluster map flags 2019-02-17 03:06:34 +00:00
Vaibhav Bhembre f91b1241dc
Merge pull request #110 from digitalocean/luminous_pg_down
health: add stats for down PGs
2018-11-22 11:25:46 -05:00
Vaibhav Bhembre 31bb74f8ea health: add stats for down PGs 2018-11-22 11:22:11 -05:00
Vaibhav Bhembre 3efbebdb4f
Merge pull request #108 from digitalocean/add-recovery-backfill-stats
luminous: add recovery/backfill stats
2018-11-01 12:50:28 -04:00
Vaibhav Bhembre 6e7bdd9c3e luminous: add recovery/backfill stats 2018-11-01 12:46:18 -04:00
Vaibhav Bhembre d0ed675640
Merge pull request #107 from digitalocean/fix-degraded-misplaced-count
health: fix degraded/misplaced object count
2018-10-23 17:38:12 -04:00
Vaibhav Bhembre 595bd8c641 health: fix degraded/misplaced object count 2018-10-23 16:56:39 -04:00
Ryan Roemmich b311e48b4e
Merge pull request #100 from ralfonso/patch-1
Update README.md to add additional flags
2018-10-18 13:19:28 -07:00
Vaibhav Bhembre 6aba5522bd
Merge pull request #103 from jan--f/reset-some-metric-labels
collectors: Reset metric vectors; pools and daemons can vanish
2018-09-19 11:16:27 -04:00
Jan Fajerski 4fc19fbc03 collectors: Reset metric vectors; pools and daemons can vanish
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
2018-09-13 14:19:35 +02:00
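The stale-series problem this commit addresses can be illustrated with a minimal sketch. It does not use the Prometheus client library; `gaugeVec`, `collect`, and the pool names are hypothetical stand-ins, and the actual change in the collectors may look different. The idea is that the vector is cleared at the start of every collection, so a pool that has been deleted stops being reported.

```go
package main

import (
	"fmt"
	"sort"
)

// gaugeVec is a simplified, hypothetical stand-in for a labelled metric
// vector: one value per label (here, one per pool name).
type gaugeVec struct {
	values map[string]float64
}

func newGaugeVec() *gaugeVec { return &gaugeVec{values: map[string]float64{}} }

// set records the value for one label.
func (g *gaugeVec) set(label string, v float64) { g.values[label] = v }

// reset drops every recorded label.
func (g *gaugeVec) reset() { g.values = map[string]float64{} }

// labels returns the currently recorded labels in sorted order.
func (g *gaugeVec) labels() []string {
	out := make([]string, 0, len(g.values))
	for l := range g.values {
		out = append(out, l)
	}
	sort.Strings(out)
	return out
}

// collect clears the vector, then records only the pools that exist now.
// Without the reset, a pool deleted between two collections would keep
// its last value in the vector indefinitely.
func collect(g *gaugeVec, pools map[string]float64) {
	g.reset()
	for name, used := range pools {
		g.set(name, used)
	}
}

func main() {
	g := newGaugeVec()
	collect(g, map[string]float64{"rbd": 10, "cephfs": 5})
	collect(g, map[string]float64{"rbd": 12}) // "cephfs" was deleted in between
	fmt.Println(g.labels())                   // prints [rbd]
}
```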
Vaibhav Bhembre 7814098640
Merge pull request #101 from benyanke/patch-1
Bump Copyright
2018-09-10 12:31:42 -04:00
Ben Yanke f3a9fa3195
Update README.md 2018-09-10 11:24:25 -05:00
Ben Yanke 213d8453b5
Bump Copyright 2018-09-05 20:42:09 -05:00
Ryan Roemmich 0e8962eb82
Update README.md
Add additional flags.
2018-09-04 12:56:25 -07:00
ssobolewski 415d296c31
Ssobolewski/run rgw stats in background (#97)
* RGW GC stat collection can take a long time if there is a very large backlog

* Use a const for background interval

* Minor change per code review
2018-08-10 13:43:02 -06:00
ssobolewski dc6ab9c636
Optionally collect RGW GC task stats (#94)
* Optionally collect RGW GC task stats

* Minor changes per code-review, add some additional tests to squeeze out extra coverage
2018-08-01 07:37:07 -06:00
Vaibhav Bhembre ae0f874abb
Merge pull request #90 from jan--f/add-active-pgs-luminous
health: add active_pg metric
2018-07-09 09:48:59 -04:00
Vaibhav Bhembre 5df7451281
Merge pull request #89 from jan--f/backport-80-terminate-process
Terminate exporter process if maximum open files exceeded
2018-07-09 09:36:42 -04:00
Jan Fajerski a42a258a28 Add constant for tcp keepalive period
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
2018-07-09 11:08:14 +02:00
Jan Fajerski 2436d00967 health: add active_pg metric
This metric makes it possible to graph active vs. inactive PGs, i.e. the
number of PGs ceph is serving vs. currently unavailable PGs.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
2018-07-09 10:18:47 +02:00
Vaibhav Bhembre cd84a7b9db
Merge pull request #93 from digitalocean/ceph_osd_down_destroyed
osd: add metrics for down and destroyed OSD
2018-06-25 10:47:19 -04:00
Vaibhav Bhembre 6aed4c4f74
Merge pull request #92 from digitalocean/luminous_latency_skew_support
monitors: add back clock skew and latency metric support
2018-06-25 10:46:34 -04:00
Vaibhav Bhembre aa5abdc470 osd: add metrics for down and destroyed OSD 2018-06-24 13:20:58 -04:00
Vaibhav Bhembre 6f290751c9 monitors: add back clock skew and latency metric support 2018-06-23 19:09:07 -04:00
ssobolewski 9a39cc64ed
Add tracking of scrub/deep scrub on a per osd basis (#91)
* Add tracking of scrub/deep scrub on a per osd basis

* Changes per code review comments/discussion
2018-06-14 12:40:46 -06:00
Tim Serong cd9aa031a8 Terminate exporter process if maximum open files exceeded
This is somewhat of a workaround for the exporter becoming
perpetually blocked when it runs out of file descriptors if
the cluster is down for too long, as mentioned in:

  https://github.com/digitalocean/ceph_exporter/issues/60#issuecomment-319396108

The problem is that if the MONs are down for long enough,
each time prometheus scrapes the metrics, another socket is
opened, but these block forever.  If the cluster comes back
up before we run out of FDs, the blocked requests recover.
If the cluster *doesn't* come back up before we run out of
FDs, the blocked requests never recover.

This commit causes ceph exporter to terminate if it runs
out of file descriptors, which IMO is better than blocking
forever -- it'll be a noisier failure, and also if you're
running ceph_exporter via systemd, systemd will then
automatically trigger a service restart.

Signed-off-by: Tim Serong <tserong@suse.com>
(cherry picked from commit bb1ad364b5)
2018-06-12 10:49:04 +02:00
Vaibhav Bhembre ccd6b7135b
Merge pull request #88 from digitalocean/fix_stuck_requests
health: add visibility for stuck requests
2018-05-10 13:52:14 -04:00
Vaibhav Bhembre c887505a42 health: add visibility for stuck requests 2018-05-10 13:04:52 -04:00
Vaibhav Bhembre 1b91b5bf2d
Merge pull request #87 from digitalocean/add_slow_request_osd
health: capture slow request per osd
2018-05-10 12:11:50 -04:00
Vaibhav Bhembre 219fb69bde health: capture slow request per osd 2018-05-10 11:55:39 -04:00
Vaibhav Bhembre afd5a2c4bf update travis go to 1.9.2 2018-01-23 21:04:24 +05:30
Vaibhav Bhembre 5f702bc55f
Merge pull request #76 from digitalocean/luminous-switch-json
luminous: move health stats to be extracted from status json
2017-11-14 16:09:58 -05:00
Vaibhav Bhembre 9c40ffc620 luminous: pick correct value from health status after compat warning is removed 2017-11-13 15:27:20 -05:00
Vaibhav Bhembre 96785e82b1 luminous: move luminous stats to be extracted from json 2017-11-13 15:26:20 -05:00
Vaibhav Bhembre c41f6ae85c
Merge pull request #59 from AcalephStorage/multi-stage-build
Dockerfile with multi-stage build
2017-11-10 16:47:08 -05:00
Vaibhav Bhembre 3828da2b24
Merge pull request #71 from tserong/wip-add-promhttp-to-vendor
Add promhttp to vendor directory
2017-11-10 16:34:04 -05:00
Tim Serong 66a684fbce Add promhttp to vendor directory
This adds promhttp (required by 4c70969) to the vendor directory.

Signed-off-by: Tim Serong <tserong@suse.com>
2017-09-23 17:16:34 +02:00
Vaibhav Bhembre ac79cf743b Merge pull request #69 from utkarshmani1997/feature_issue#68_exporter
update /metrics handler to promhttp.Handler()
2017-09-14 08:12:40 -04:00
utkarshmani1997 4c70969940 update /metrics handler to promhttp.Handler() 2017-09-14 17:09:52 +05:30
Vaibhav Bhembre f870fc3254 Merge pull request #67 from fitschn/fix-metrics-recovery-client-io-luminous
health: expose client io and recovery io for luminous
2017-09-02 11:43:51 -04:00
Vaibhav Bhembre bb62aa1b76 Merge pull request #66 from fitschn/fix-metrics-slow-request-luminous
health: expose slow requests for luminous
2017-08-23 21:55:48 -04:00
Mathias Nohr b2f520b9aa health: expose client io and recovery io for luminous
An update to ceph luminous breaks the metrics for recovery and client io because of the new output format of ceph status.

The changes remain compatible with older versions of ceph.
2017-08-23 15:22:26 +02:00
Mathias Nohr 8ba20e8cb3 osd: expose slow requests for luminous
Extract slow request information from health checks in ceph luminous. This remains compatible with older versions of ceph.

Additional luminous metrics, e.g. stuck PGs, can be added in the same way.
2017-08-23 11:39:48 +02:00
Vaibhav Bhembre f2a4404e28 Merge pull request #64 from tserong/remove-no-pools-error
pool: remove error log when no pools present
2017-08-18 09:48:20 -04:00
Tim Serong ac1d8a93e2 pool: remove error log when no pools present
If there are no pools created yet (which can be true after a ceph
cluster is initially created, but before any pools are present),
there'll be a log every minute "[ERROR] failed collecting pool
usage metrics: no pools found in the cluster to report stats on".

IMO that's not actually an error, it's just the way things are,
so doesn't really need to be reported.

Signed-off-by: Tim Serong <tserong@suse.com>
2017-08-18 15:22:28 +10:00
Hunter Nield 5121d03b6f Update Dockerfile with multistage build
- Gets the image size down to around 60MB
- Bumps up the base image to 16.04
2017-06-13 12:58:18 +08:00
Vaibhav Bhembre 80aa3ff0d0 Merge pull request #53 from skloeckner/master
Added docker-compose for quick test environment and brief description…
2017-05-02 12:03:24 -04:00