ceph/prometheus at ec05d8743226c4126e56b3d801208b51a889c895 - ceph

alerts	monitoring: Fix "10% OSDs down" alert description	2020-05-06 18:49:26 +02:00
README.md	…

Benoît Knecht 653c3f6682 monitoring: Fix "10% OSDs down" alert description

The alert was triggered when less than 90% of OSDs were _up_, but then the
description took that value and described it as the percentage of OSDs being
_down_. So with 12% of OSDs down, the alert description would read:

```
88% or 88 of 100 OSDs are down (>=10%).
```

which can be panic-inducing.

This commit changes the alert expression to actually compute the ratio of OSDs
being down, which makes the correct value appear in the description.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>

2020-05-06 18:49:26 +02:00
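
The fix described in the commit message can be sketched as a rule like the one below. This is an illustration, not the rule shipped in the repository. The group name, the simplified description template, and the "before" expression in the comment are assumptions. It also assumes that `ceph_osd_up` is 1 for an OSD that is up and 0 for one that is down.

```yaml
groups:
  - name: osd-example   # hypothetical group name, for illustration only
    rules:
      - alert: 10% OSDs down
        # Before the fix, the expression measured the share of OSDs that
        # were up, roughly:
        #   sum(ceph_osd_up) / count(ceph_osd_up) * 100 < 90
        # so $value was e.g. 88 while the description called it "down".
        # Computing the share of OSDs that are down makes $value match
        # the wording of the description.
        expr: count(ceph_osd_up == 0) / count(ceph_osd_up) * 100 >= 10
        annotations:
          description: '{{ $value | humanize }}% of OSDs are down (>= 10%).'
```

With 12 of 100 OSDs down, the expression evaluates to 12, so the description reports 12% down rather than 88%.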

README.md

Prometheus related bits

Alerts

The monitoring/prometheus/alerts directory contains Prometheus alert rules that provide a reasonable set of default alerts for a Ceph cluster. Copy the rules file to wherever the rule_files stanza of your Prometheus configuration points.
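
As a minimal sketch of the placement described in the Alerts section, the Prometheus configuration could reference the copied rules as follows. The directory `/etc/prometheus/alerts/` is a hypothetical location, not one prescribed by this repository.

```yaml
# prometheus.yml (excerpt)
rule_files:
  # Hypothetical path: point this at the directory you copied the
  # alert rules into. Entries in rule_files may be globs.
  - /etc/prometheus/alerts/*.yml
```

Running `promtool check rules` on the copied file before reloading Prometheus is a quick way to catch syntax errors.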