15 Commits

Author SHA1 Message Date
Vallari Agrawal 7994fea436
monitoring: add 2 new nvmeof alerts
Add NVMeoFMissingListener and NVMeoFZeroListenerSubsystem
alerts to prometheus_alerts.libsonnet.

Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
2024-11-11 17:23:04 +05:30
Christian Rohmann 810c706868 Add multi-cluster support (showMultiCluster=True) to alerts
Following PR https://github.com/ceph/ceph/pull/55495, which fixed the
dashboard for multiple clusters storing their metrics in a single
Prometheus instance, this PR addresses the same issues for alerts.

Fixes: https://tracker.ceph.com/issues/64321
Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
2024-10-21 11:53:10 +05:30
Paul Cuzner f1573b76f3 ceph-mixins: Add nvmeof alerts
Signed-off-by: Paul Cuzner <pcuzner@ibm.com>
2024-02-27 09:51:04 +13:00
Guillaume Abrioux 76d8e0bbbf monitoring: add new alerts
This adds new hardware monitoring alerts.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
2024-01-25 14:43:30 +00:00
Nizamudeen A a5027e37ec mgr/dashboard: fix broken alert generator
Currently the alert generator is broken if you try to run
`tox -ealerts-fix`. I fixed it and ran the command, which also built a
new JSON file.

Signed-off-by: Nizamudeen A <nia@redhat.com>
2023-10-13 12:42:50 +05:30
Juan Miguel Olmo b7b7ef90f4
Merge pull request #50132 from aruniiird/add-rbd-mirror-mon-alerts
ceph-mixin: Add RBD Mirror monitoring alerts
2023-10-10 13:37:01 +02:00
Josh Soref 73479a1e05 dashboard: fix spelling errors
* access
* availability
* dashboard
* depth
* dimless
* evaluation
* executing
* existing
* facts
* gigabytes
* idempotent
* independent
* initial
* inventory
* managed
* must not
* notification
* notifications
* orchestrator
* previously
* promises
* purging
* queried
* repetitive
* split
* subdirectories
* tenant
* the
* timestamp
* transformed
* unavailable
* visibility
* yourself

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2023-08-09 11:14:20 -04:00
Arun Kumar Mohan 5c21134064 ceph-mixin: add RBD Mirror monitoring alerts
Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>
2023-08-09 12:19:04 +05:30
Arun Kumar Mohan e9d803d608 ceph-mixin: fix manually edited 'prometheus_alerts.yml' file
The 'prometheus_alerts.yml' file should not be edited directly.
Changes should be added to the 'prometheus_alerts.libsonnet' file
(and/or any other appropriate lib/jsonnet files) and generated
using the 'make generate' command.

Adding all the changes to 'prometheus_alerts.libsonnet' file and
building/generating the prometheus_alerts YAML file.

PS: all the changes seen in the 'prometheus_alerts.yml' file are due
to the rearrangement of lines. The file's content remains the same.

Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>
2023-08-09 12:19:04 +05:30
Pere Diaz Bou 5a2b7c25b6 mgr/prometheus: expose daemon health metrics
Until now, daemon health metrics were stored without being used. One of
the most helpful metrics there is SLOW_OPS for OSDs and MONs. This commit
tries to expose it to provide fine-grained metrics for finding
troublesome OSDs, instead of relying on a lone cluster-wide health check
for slow ops.

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
2022-12-20 09:44:49 +01:00
Christian Kugler 4aecdad350
ceph-mixin: Add Prometheus Alert for Degraded Bond
Currently there is no alert for a misconfigured or failed network interface
card that is part of a network bond.

This could lead to redundancy and performance being degraded unnoticed.

To solve this, I use node exporter metrics to look at the number of total peers
of the bond and the ones that are active. If the numbers differ, something is up
and should be looked at.

Fixes: https://tracker.ceph.com/issues/57962
Signed-off-by: Christian Kugler <syphdias+git@gmail.com>
2022-11-02 14:48:57 +01:00
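
The degraded-bond check described in commit 4aecdad350 can be sketched as follows. This is only an illustration of the comparison logic in Python, not the alert rule the commit adds: the `BondSample` type and `degraded_bonds` function are hypothetical names, and the field names merely stand in for the node exporter values the message refers to (total peers of the bond and active peers). The commit message does not name the exact metrics, so none are assumed here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BondSample:
    """One observation of a network bond (hypothetical type for illustration)."""
    host: str
    bond: str
    total_peers: int   # number of interfaces configured in the bond
    active_peers: int  # number of those interfaces currently active


def degraded_bonds(samples: Iterable[BondSample]) -> List[BondSample]:
    """Return the bonds whose active peer count differs from the total."""
    return [s for s in samples if s.total_peers != s.active_peers]


if __name__ == "__main__":
    samples = [
        BondSample("node1", "bond0", total_peers=2, active_peers=2),
        BondSample("node2", "bond0", total_peers=2, active_peers=1),
    ]
    for s in degraded_bonds(samples):
        print(f"{s.host}/{s.bond}: {s.active_peers} of {s.total_peers} peers active")
```

Comparing the two counts for inequality, rather than checking for zero active peers, flags a bond as soon as it loses any member, which is the case the commit message says would otherwise go unnoticed.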
Arthur Outhenin-Chalandre f744a93ef1
Merge pull request #47707 from bosc0/fix_alert
Ceph-mixin: Fix CephNodeNetworkPacket alerts
2022-08-30 12:49:23 +02:00
Aswin Toni 351e1ac639 ceph-mixin: fix CephNodeNetworkPacket alerts
Signed-off-by: Aswin Toni <aswin.toni@cern.ch>
2022-08-23 15:26:52 +02:00
Aswin Toni 35183140f6 ceph-mixin: fix config inheritance
Signed-off-by: Aswin Toni <aswin.toni@cern.ch>
2022-08-18 16:21:36 +02:00
Aswin Toni 5cdc1c62c5 prometheus: add multicluster support to alerts
Signed-off-by: Aswin Toni <aswin.toni@cern.ch>
2022-08-17 12:08:56 +02:00