ceph_exporter

Commit Graph

Author	SHA1	Message	Date
Joshua Baergen	c01b5fb37e	monitors: Remove stats that have been dead since Luminous. Ceph commit reference: e170405fd873723bec6ce691afad82641bab2ef1	2023-12-15 15:50:21 -07:00
Tyler Brekke	3c403081b5	osd: Add new collector for osd metadata. Scrape created_at, ceph_version_when_created, and osd_objectstore from ceph osd metadata	2023-08-24 17:37:43 -07:00
Alex Marangone	2fb4e78d4a	ceph/osd: fix root/rack labels	2023-05-23 08:06:31 -07:00
Joshua Baergen	5b487fad9c	osd: Internally poll PG dump for oldest active PG tracking. Without this, the granularity of the oldest active PG is based on external scrape frequency, and an unlucky sequence of scrapes could see the same PG inactive two scrapes in a row even though it was active in between. Preferably, we would update this even more often than 10 seconds, but PG dumps can take a while.	2023-05-08 16:33:13 -06:00
Daniel R	46b06f317f	Fix timeouts and use goroutines for collectors/commands (#234) * rados: timeouts on Mon/Mgr command & connections * rados: remove unneeded timeouts * make all collectors async * fix osd collector * only add 1 in waitgroups * ceph: don't pass waitgroups to collectors * monitors.go: use errgroup instead of waitgroup * rados: add comment, pass arg & close channel	2023-03-14 14:00:33 -04:00
Alex Marangone	52ad633440	move to collector interface to avoid ugly switch	2023-02-14 12:24:34 -08:00
Alex Marangone	2235817fc4	move common mocking to a func	2023-02-14 11:36:09 -08:00
Alex Marangone	abbe4444ef	update test to support version being passed in Collect()	2023-02-14 11:17:51 -08:00
Alex Marangone	ba15bf50a3	pass version to collectors when calling Collect()	2023-02-14 11:10:54 -08:00
Alex Marangone	69edc55596	exporter: do not reinitialize collectors on every collect. We store all the collectors in a map of string in order to dynamically load/unload the rbd mirror collector	2023-02-14 09:27:21 -08:00
Daniel R	d8bf71a8fc	Split cluster health state by plus sign PR #226	2023-01-24 17:42:34 -05:00
Daniel R	50874e99af	revert health_status_interp to gauge	2022-10-12 18:03:20 -04:00
Daniel R	ae64dae6f8	add a comment indicating gauge deprecation	2022-10-11 11:58:42 -04:00
Daniel R	b9af3ab29f	bugfixes; stop defaulting map flags to 0 in the constmetric	2022-10-07 14:53:46 -04:00
Daniel R	1a741d7606	introduce constmetrics for osdmap flags	2022-10-07 14:17:29 -04:00
Daniel R	c3a3d581aa	migrate health checks from gauges to constmetrics	2022-10-06 16:20:24 -04:00
Daniel R	362cb4b8dd	fix health unit tests	2022-10-06 16:20:19 -04:00
Daniel R	5dd16fe875	migrate pool_usage.go to constmetrics	2022-10-06 16:20:10 -04:00
Tyler Brekke	957b06df91	Add user to exporter for use with rbd/rgw commands	2022-08-25 15:20:57 -07:00
Tyler Brekke	19a3cd5c7e	Add rbd-mirror health status	2022-08-24 14:23:23 -07:00
Xavier Villaneau	ae09ffe3fe	Add `hostname` label to `ceph_crash_reports`	2022-06-16 13:20:29 -04:00
Xavier Villaneau	2faa6cb82d	Fix comments and docstring in getCrashLs	2022-06-15 17:04:04 -04:00
Xavier Villaneau	3141fef319	Use JSON output from `ceph crash ls` instead of plain output	2022-06-15 17:04:04 -04:00
Xavier Villaneau	adf792c3e8	Use ConstMetrics for ceph_crash_reports. Makes the code simpler since we're not tracking state anymore. Also rewrote the tests to be more in-line with the rest.	2022-06-15 17:04:04 -04:00
Xavier Villaneau	74c89af225	Implement new gauge counting crash reports. New metric: `ceph_crash_reports` which counts the entries returned by `ceph crash ls` by daemon name and archival status. This is not the same as `ceph_new_crash_reports` which is the value of the `RECENT_CRASH` health check, and that only counts the non-archived errors of the past two weeks. The new metric counts errors as long as they are not purged (which is done after 1 year by default).	2022-06-15 17:04:04 -04:00
AKYD	763e5ecd21	Normalize ceph-ansible version format	2022-05-25 11:49:04 +03:00
Joshua Baergen	ebd166be2d	ceph: Support the Octopus+ mgrmap format.	2022-04-12 08:52:04 -06:00
Joshua Baergen	4e0f8910a4	Add missing tests for Octopus+ osdmap format. In TestClusterHealthCollector, test all supported versions by default, and split the osdmap tests for Nautilus vs. Octopus+. There were a number of tests that included an osdmap that didn't need it, and the osdmap was removed from them so that version-specific testing would not be required.	2022-04-12 08:52:01 -06:00
haoyixing	407248ce1d	feat: add misplaced ratio metric. Misplaced ratio equals misplaced_objects divided by misplaced_total, not misplaced_objects / num_objects. So add a separate metric to show misplaced ratio. Signed-off-by: haoyixing <haoyixing@kuaishou.com>	2022-03-29 18:38:15 -07:00
Kyle	917a468065	update deps and reduce a warn to debug	2022-03-29 17:44:50 -07:00
Kyle	1d7bac531d	update license headers	2022-03-23 14:02:21 -07:00
Kyle	4d817f487d	fix staticcheck errors	2022-03-23 12:24:28 -07:00
Kyle	d6b67a77c3	removed down osd duplicate filtering	2022-03-22 12:59:51 -07:00
Kyle	3a0b289eda	filter duplicate OSD nodes for down health check and fix health tests	2022-03-21 15:28:20 -07:00
Kyle	b806cf51bb	remove pre-nautilus health check code	2022-03-21 14:52:34 -07:00
Kyle	df7435b259	add DAEMON_OLD_VERSION health check, update readme, remove makefile	2022-03-21 13:56:19 -07:00
Kyle	2122a3331f	support flattened osdmap format added in octopus	2022-03-16 14:13:57 -07:00
Xavier Villaneau	6f83fdd300	Restructure so that tests do not depend on go-ceph - `ceph.Conn` interface no longer depends on go-ceph/rados, now defines its own `PoolStat` structure for our use. - New separate `rados` package that implements the interface - Merged `mocks` package into `ceph` to avoid circular import	2022-02-24 15:57:00 -05:00
Kyle	566f1fa5d3	a ton of refactoring	2022-02-23 15:43:46 -08:00

39 Commits