RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2025-03-25 11:48:05 +00:00

Author	SHA1	Message	Date
Benoît Knecht	653c3f6682	monitoring: Fix "10% OSDs down" alert description The alert was triggered when less than 90% of OSDs were _up_, but then the description took that value and described it as the percentage of OSDs being _down_. So with 12% of OSDs down, the alert description would read: `88% or 88 of 100 OSDs are down (>=10%).` which can be panic-inducing. This commit changes the alert expression to actually compute the ratio of OSDs being down, which makes the correct value appear in the description. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-05-06 18:49:26 +02:00
Volker Theile	e197e4d7f4	monitoring: alert for pool fill up broken Fixes: https://tracker.ceph.com/issues/44991 Signed-off-by: Volker Theile <vtheile@suse.com>	2020-04-08 15:02:45 +02:00
Volker Theile	a5ade11a31	Merge pull request #34239 from p-se/wip-pse-fix-false-root-vol-full-alert monitoring: root volume full alert fires false positives Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Jan Fajerski <jfajerski@suse.com> Reviewed-by: Volker Theile <vtheile@suse.com>	2020-04-06 14:17:17 +02:00
Patrick Seidensal	6935dc5592	monitoring: alert for prediction of disk and pool fill up broken Fixes: https://tracker.ceph.com/issues/44776 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2020-03-27 13:44:28 +01:00
Patrick Seidensal	f8e347f771	monitoring: root volume full alert fires false positives Fixes: https://tracker.ceph.com/issues/44780 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2020-03-27 11:06:08 +01:00
James Cheng	1b980ef88c	monitoring: Fix pool capacity incorrect Signed-off-by: James Cheng <james59988@gmail.com>	2020-02-18 19:19:13 +08:00
Aleksei Zakharov	4eb58f7ccc	monitoring/grafana,prometheus: add per-pool pg states support Signed-off-by: Aleksei Zakharov <zaharov@selectel.ru>	2020-01-29 17:28:36 +03:00
Patrick Seidensal	fb51c589b5	monitoring: add details to Prometheus' alerts Fixes: https://tracker.ceph.com/issues/43764 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2020-01-24 14:21:31 +01:00
Jan Fajerski	e098536acc	Merge pull request #32325 from Kriechi/fix-42982 monitoring: fix prometheus alert for full pools	2020-01-20 10:42:36 +01:00
Thomas Kriechbaumer	9abddc0dd3	monitoring: fix prometheus alert for full pools The existing alert (introduced via https://tracker.ceph.com/issues/24977) already triggers when still 50% of storage space are available. Fixes: https://tracker.ceph.com/issues/42982 Signed-off-by: Thomas Kriechbaumer <thomas@kriechbaumer.name>	2019-12-18 15:04:51 +01:00
Patrick Seidensal	d262adeb21	monitoring: fix indentation of ceph default alerts Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2019-11-18 12:40:55 +01:00
Patrick Seidensal	e923af3430	monitoring: wait before firing osd full alert Fixes: https://tracker.ceph.com/issues/42862 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2019-11-18 12:39:27 +01:00
Volker Theile	8e6838c740	monitoring: SNMP OID per every Prometheus alert rule Use the Ceph enterprise OID 50495 (https://www.iana.org/assignments/enterprise-numbers/enterprise-numbers) and create OIDs for every Prometheus alert rule according to the schema at https://github.com/SUSE/prometheus-webhook-snmp/blob/master/README.md. Example OID: 1.3.6.1.4.1.50495.15.1.2.2.1 All alert rule OIDs are located below the object identifier 15 (15 for p which is the first character of prometheus). Check out the MIB at https://github.com/SUSE/prometheus-webhook-snmp/blob/master/PROMETHEUS-ALERT-CEPH-MIB.txt for more details. Signed-off-by: Volker Theile <vtheile@suse.com>	2019-05-28 09:59:50 +02:00
Jan Fajerski	c0e58bd8ae	monitoring: add a few prometheus alerts Alerts are from https://github.com/SUSE/DeepSea/blob/SES5/srv/salt/ceph/monitoring/prometheus/files/ses_default_alerts.yml but updated for the mgr module and node_exporter >= 0.15. Signed-off-by: Jan Fajerski <jfajerski@suse.com>	2019-04-26 11:21:39 +02:00

14 Commits
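
The arithmetic behind commit 653c3f6682 ("10% OSDs down" alert description) can be illustrated with a small sketch. This is not the actual Prometheus rule from the repository: the function names, the description wording, and the exact thresholds (`< 90` for the old rule, `>= 10` for the new one) are assumptions made for illustration, based only on the commit message above.

```python
# Hypothetical illustration of the bug described in commit 653c3f6682.
# None of these names come from the Ceph repository; the real alert is a
# Prometheus rule, not Python.

def old_alert(up: int, total: int) -> tuple[bool, str]:
    """Assumed pre-fix behaviour: the expression yields the percentage of
    OSDs that are UP, but the description labels that value as 'down'."""
    value = 100 * up / total           # percentage of OSDs up
    fires = value < 90                 # fires when fewer than 90% are up
    return fires, f"{value:.0f}% of OSDs are down (>=10%)"

def new_alert(up: int, total: int) -> tuple[bool, str]:
    """Assumed post-fix behaviour: the expression yields the percentage of
    OSDs that are DOWN, so the description reports the right number."""
    value = 100 * (total - up) / total  # percentage of OSDs down
    fires = value >= 10                 # fires when at least 10% are down
    return fires, f"{value:.0f}% of OSDs are down (>=10%)"

if __name__ == "__main__":
    # The scenario from the commit message: 88 of 100 OSDs up, 12 down.
    print(old_alert(88, 100))
    print(new_alert(88, 100))
```

In the scenario from the commit message (88 of 100 OSDs up), both variants fire, but only the second reports 12% rather than 88% as the share of OSDs that are down.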