RepoMirrors/ceph - ceph

Commit Graph

Author	SHA1	Message	Date
Vallari Agrawal	7994fea436	monitoring: add 2 new nvmeof alerts Add NVMeoFMissingListener and NVMeoFZeroListenerSubsystem alerts to prometheus_alerts.libsonnet. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>	2024-11-11 17:23:04 +05:30
Christian Rohmann	810c706868	Add multi-cluster support (showMultiCluster=True) to alerts Following PR https://github.com/ceph/ceph/pull/55495 fixing the dashboard in regards to multiple clusters storing their metrics in a single Prometheus instance, this PR addresses the issues for alerts. Fixes: https://tracker.ceph.com/issues/64321 Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>	2024-10-21 11:53:10 +05:30
Paul Cuzner	f1573b76f3	ceph-mixins: Add nvmeof alerts Signed-off-by: Paul Cuzner <pcuzner@ibm.com>	2024-02-27 09:51:04 +13:00
Guillaume Abrioux	76d8e0bbbf	monitoring: add new alerts This adds new hardware monitoring alerts. Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>	2024-01-25 14:43:30 +00:00
Nizamudeen A	a5027e37ec	mgr/dashboard: fix broken alert generator Currently the alert generator is broken if you try to run `tox -ealerts-fix`. I fixed it and ran the command and it built a new json file as well. Signed-off-by: Nizamudeen A <nia@redhat.com>	2023-10-13 12:42:50 +05:30
Juan Miguel Olmo	b7b7ef90f4	Merge pull request #50132 from aruniiird/add-rbd-mirror-mon-alerts ceph-mixin: Add RBD Mirror monitoring alerts	2023-10-10 13:37:01 +02:00
Josh Soref	73479a1e05	dashboard: fix spelling errors * access * availability * dashboard * depth * dimless * evaluation * executing * existing * facts * gigabytes * idempotent * independent * initial * inventory * managed * must not * notification * notifications * orchestrator * previously * promises * purging * queried * repetitive * split * subdirectories * tenant * the * timestamp * transformed * unavailable * visibility * yourself Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>	2023-08-09 11:14:20 -04:00
Arun Kumar Mohan	5c21134064	ceph-mixin: add RBD Mirror monitoring alerts Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>	2023-08-09 12:19:04 +05:30
Arun Kumar Mohan	e9d803d608	ceph-mixin: fix manually edited 'prometheus_alerts.yml' file File 'prometheus_alerts.yml' file should not be edited directly. The changes should be added to 'prometheus_alerts.libsonnet' file (and/or any other appropriate lib/jsonnet files) and generated using 'make generate' command. Adding all the changes to 'prometheus_alerts.libsonnet' file and building/generating the prometheus_alerts YAML file. PS: all the changes seen in 'prometheus_alerts.yml' file is due to the re-arrangement of lines. The file remains same. Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>	2023-08-09 12:19:04 +05:30
Pere Diaz Bou	5a2b7c25b6	mgr/prometheus: expose daemon health metrics Until now daemon health metrics were stored without being used. One of the most helpful metrics there is SLOW_OPS with respect to OSDs and MONs which this commit tries to expose to bring fine grained metrics to find troublesome OSDs instead of having a lone healthcheck of slow ops in the whole cluster. Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>	2022-12-20 09:44:49 +01:00
Christian Kugler	4aecdad350	ceph-mixin: Add Prometheus Alert for Degraded Bond Currently there is no alert for a network interface card to be misconfigured or failed which is part of a network bond. This could lead to redundancies and performance being degraded unnoticed. To solve this, I use node exporter metrics to look at the number of total peers of the bond and the ones that are active. If the numbers differ, something is up and should be looked at. Fixes: https://tracker.ceph.com/issues/57962 Signed-off-by: Christian Kugler <syphdias+git@gmail.com>	2022-11-02 14:48:57 +01:00
Arthur Outhenin-Chalandre	f744a93ef1	Merge pull request #47707 from bosc0/fix_alert Ceph-mixin: Fix CephNodeNetworkPacket alerts	2022-08-30 12:49:23 +02:00
Aswin Toni	351e1ac639	ceph-mixin: fix CephNodeNetworkPacket alerts Signed-off-by: Aswin Toni <aswin.toni@cern.ch>	2022-08-23 15:26:52 +02:00
Aswin Toni	35183140f6	ceph-mixin: fix config inheritance Signed-off-by: Aswin Toni <aswin.toni@cern.ch>	2022-08-18 16:21:36 +02:00
Aswin Toni	5cdc1c62c5	prometheus: add multicluster support to alerts Signed-off-by: Aswin Toni <aswin.toni@cern.ch>	2022-08-17 12:08:56 +02:00

15 Commits