The MTU mismatch warning was also being fired for NICs that are in the down state. This PR intends to fix this issue.
Fixes: https://tracker.ceph.com/issues/52028
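A minimal sketch of the intended behaviour, not the actual alert rule: only interfaces that are up take part in the MTU comparison. The `nics` records, the field names and the "most common MTU" baseline are assumptions made for illustration.

```python
from collections import Counter


def mtu_mismatches(nics):
    """Return names of 'up' NICs whose MTU differs from the most common MTU.

    NICs in the down state are ignored entirely, so they can neither
    trigger the warning nor influence the baseline MTU.
    """
    up = [n for n in nics if n["state"] == "up"]
    if not up:
        return []
    # Assumption: the baseline is the MTU used by most 'up' interfaces.
    common_mtu = Counter(n["mtu"] for n in up).most_common(1)[0][0]
    return [n["name"] for n in up if n["mtu"] != common_mtu]


# Hypothetical sample data.
nics = [
    {"name": "eth0", "mtu": 9000, "state": "up"},
    {"name": "eth1", "mtu": 9000, "state": "up"},
    {"name": "eth2", "mtu": 1500, "state": "up"},
    {"name": "eth3", "mtu": 1500, "state": "down"},  # ignored: link is down
]
print(mtu_mismatches(nics))  # ['eth2']
```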
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
run-promtool-unittests was failing because of differences in floating-point values in some complex calculations. This PR intends to simplify those calculations and fix this issue.
Fixes: https://tracker.ceph.com/issues/49952
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
This PR intends to add unit tests for the Prometheus rules using promtool. To run the tests, run the 'run-promtool-unittests.sh' script.
Fixes: https://tracker.ceph.com/issues/45415
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
mgr/dashboard: prometheus alerting: add some leeway for packet drops and errors
Reviewed-by: Stephan Müller <smueller@suse.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
This PR intends to alert the user if a specific network is configured with a custom MTU.
Fixes: https://tracker.ceph.com/issues/48748
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
SLOW_OPS is triggered by the op tracker and generates a health
alert, but health checks do not create metrics for Prometheus
to use as alert triggers. This change adds a SLOW_OPS metric
and provides a simple means to extend this to other relevant
health checks in the future.
If extracting the value from the health check message fails,
we log an error and remove the metric from the metric set. In
addition, the metric description has changed to better reflect
the scenarios where SLOW_OPS can be triggered.
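The failure handling described above could look roughly like the sketch below. This is not the code from this change: the `metrics` dict, the `slow_ops` key, the function name and the message layout (a leading integer count) are all assumptions made for illustration.

```python
import logging
import re

log = logging.getLogger("slow_ops_sketch")


def update_slow_ops(metrics, message):
    """Set metrics['slow_ops'] from a health check message.

    Assumption: the message starts with the number of slow ops.
    If no count can be extracted, log an error and drop the metric
    rather than exporting a stale value.
    """
    match = re.match(r"\s*(\d+)\s", message)
    if match is None:
        log.error("unable to extract a SLOW_OPS count from: %r", message)
        metrics.pop("slow_ops", None)
        return
    metrics["slow_ops"] = int(match.group(1))


metrics = {}
update_slow_ops(metrics, "12 slow ops, oldest one blocked for 35 sec")
print(metrics)  # {'slow_ops': 12}
update_slow_ops(metrics, "unexpected message text")
print(metrics)  # {}
```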
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
The alert was triggered when less than 90% of OSDs were _up_, but then the
description took that value and described it as the percentage of OSDs being
_down_. So with 12% of OSDs down, the alert description would read:
```
88% or 88 of 100 OSDs are down (>=10%).
```
which can be panic-inducing.
This commit changes the alert expression to actually compute the ratio of OSDs
being down, which makes the correct value appear in the description.
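As a worked example of the arithmetic only (the function and variable names are hypothetical, and this is not the alert expression itself), with 88 of 100 OSDs up:

```python
def down_percent(total, up):
    """Percentage of OSDs that are down."""
    return 100 * (total - up) / total


total, up = 100, 88

old_value = 100 * up / total         # 88.0: the "up" ratio, mislabelled as down
new_value = down_percent(total, up)  # 12.0: the actual "down" ratio

print(f"{new_value:.0f}% or {total - up} of {total} OSDs are down (>=10%).")
# 12% or 12 of 100 OSDs are down (>=10%).
```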
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>