Benoît Knecht
653c3f6682
monitoring: Fix "10% OSDs down" alert description
...
The alert was triggered when less than 90% of OSDs were _up_, but then the
description took that value and described it as the percentage of OSDs being
_down_. So with 12% of OSDs down, the alert description would read:
```
88% or 88 of 100 OSDs are down (>=10%).
```
which can be panic-inducing.
This commit changes the alert expression to actually compute the ratio of OSDs
being down, which makes the correct value appear in the description.
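The arithmetic behind the fix can be sketched as follows. This is a minimal illustration, not the actual Prometheus expression from the commit: the helper names `percent_up`, `percent_down`, and `describe` are hypothetical, and the description template is copied from the example output above.

```python
def percent_up(up: int, total: int) -> float:
    """Share of OSDs that are up, in percent (what the old expression produced)."""
    return 100.0 * up / total


def percent_down(up: int, total: int) -> float:
    """Share of OSDs that are down, in percent (what the description needs)."""
    return 100.0 * (total - up) / total


def describe(percent: float, count: int, total: int) -> str:
    """Render a value with the alert's description template (assumed wording)."""
    return f"{percent:.0f}% or {count} of {total} OSDs are down (>=10%)."


total, up = 100, 88  # 12 of 100 OSDs are down

# Old behaviour: the "up" value is fed into a template that talks about "down".
old = describe(percent_up(up, total), up, total)

# Fixed behaviour: the expression itself yields the "down" value.
new = describe(percent_down(up, total), total - up, total)

# Both formulations fire under the same condition; only the reported value differs.
assert (percent_up(up, total) < 90) == (percent_down(up, total) >= 10)

print(old)
print(new)
```

Computing the down ratio in the expression keeps the firing condition unchanged while making the value substituted into the description match its wording.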
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-05-06 18:49:26 +02:00
Volker Theile
e197e4d7f4
monitoring: alert for pool fill up broken
...
Fixes: https://tracker.ceph.com/issues/44991
Signed-off-by: Volker Theile <vtheile@suse.com>
2020-04-08 15:02:45 +02:00
Volker Theile
a5ade11a31
Merge pull request #34239 from p-se/wip-pse-fix-false-root-vol-full-alert
...
monitoring: root volume full alert fires false positives
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Jan Fajerski <jfajerski@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
2020-04-06 14:17:17 +02:00
Patrick Seidensal
6935dc5592
monitoring: alert for prediction of disk and pool fill up broken
...
Fixes: https://tracker.ceph.com/issues/44776
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2020-03-27 13:44:28 +01:00
Patrick Seidensal
f8e347f771
monitoring: root volume full alert fires false positives
...
Fixes: https://tracker.ceph.com/issues/44780
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2020-03-27 11:06:08 +01:00
James Cheng
1b980ef88c
monitoring: Fix pool capacity incorrect
...
Signed-off-by: James Cheng <james59988@gmail.com>
2020-02-18 19:19:13 +08:00
Aleksei Zakharov
4eb58f7ccc
monitoring/grafana,prometheus: add per-pool pg states support
...
Signed-off-by: Aleksei Zakharov <zaharov@selectel.ru>
2020-01-29 17:28:36 +03:00
Patrick Seidensal
fb51c589b5
monitoring: add details to Prometheus' alerts
...
Fixes: https://tracker.ceph.com/issues/43764
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2020-01-24 14:21:31 +01:00
Jan Fajerski
e098536acc
Merge pull request #32325 from Kriechi/fix-42982
...
monitoring: fix prometheus alert for full pools
2020-01-20 10:42:36 +01:00
Thomas Kriechbaumer
9abddc0dd3
monitoring: fix prometheus alert for full pools
...
The existing alert (introduced via
https://tracker.ceph.com/issues/24977 ) already triggers while 50% of
the storage space is still available.
Fixes: https://tracker.ceph.com/issues/42982
Signed-off-by: Thomas Kriechbaumer <thomas@kriechbaumer.name>
2019-12-18 15:04:51 +01:00
Patrick Seidensal
d262adeb21
monitoring: fix indentation of ceph default alerts
...
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2019-11-18 12:40:55 +01:00
Patrick Seidensal
e923af3430
monitoring: wait before firing osd full alert
...
Fixes: https://tracker.ceph.com/issues/42862
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2019-11-18 12:39:27 +01:00
Volker Theile
8e6838c740
monitoring: SNMP OID per every Prometheus alert rule
...
Use the Ceph enterprise OID 50495 (https://www.iana.org/assignments/enterprise-numbers/enterprise-numbers ) and create OIDs for every Prometheus alert rule according to the schema at https://github.com/SUSE/prometheus-webhook-snmp/blob/master/README.md .
Example OID:
1.3.6.1.4.1.50495.15.1.2.2.1
All alert rule OIDs are located below the object identifier 15 (15 stands for p, the first character of prometheus). See the MIB at https://github.com/SUSE/prometheus-webhook-snmp/blob/master/PROMETHEUS-ALERT-CEPH-MIB.txt for more details.
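The layout of the example OID can be checked with a short sketch. It assumes only what is stated above (enterprise number 50495, subtree 15) plus the standard `1.3.6.1.4.1` private-enterprises prefix. The helper names are hypothetical, and the meaning of the components after 15 is defined by the MIB and not interpreted here.

```python
from typing import Tuple

# iso.org.dod.internet.private.enterprises
ENTERPRISES_PREFIX = (1, 3, 6, 1, 4, 1)
CEPH_ENTERPRISE = 50495   # Ceph enterprise number, as stated in the commit message
PROMETHEUS_SUBTREE = 15   # subtree holding all Prometheus alert rule OIDs

ALERT_BASE = ENTERPRISES_PREFIX + (CEPH_ENTERPRISE, PROMETHEUS_SUBTREE)


def parse_oid(oid: str) -> Tuple[int, ...]:
    """Split a dotted OID string into its integer components."""
    return tuple(int(part) for part in oid.split("."))


def is_ceph_alert_oid(oid: str) -> bool:
    """True if the OID lies strictly below 1.3.6.1.4.1.50495.15."""
    parts = parse_oid(oid)
    return len(parts) > len(ALERT_BASE) and parts[:len(ALERT_BASE)] == ALERT_BASE


def alert_suffix(oid: str) -> Tuple[int, ...]:
    """Return the rule-specific components that follow the common base."""
    if not is_ceph_alert_oid(oid):
        raise ValueError(f"not a Ceph Prometheus alert OID: {oid}")
    return parse_oid(oid)[len(ALERT_BASE):]


example = "1.3.6.1.4.1.50495.15.1.2.2.1"
print(is_ceph_alert_oid(example))
print(alert_suffix(example))
```

A check like this could be used to validate that every OID assigned to a rule stays inside the shared subtree.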
Signed-off-by: Volker Theile <vtheile@suse.com>
2019-05-28 09:59:50 +02:00
Jan Fajerski
c0e58bd8ae
monitoring: add a few prometheus alerts
...
Alerts are from
https://github.com/SUSE/DeepSea/blob/SES5/srv/salt/ceph/monitoring/prometheus/files/ses_default_alerts.yml
but updated for the mgr module and node_exporter >= 0.15.
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
2019-04-26 11:21:39 +02:00