RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2025-01-02 17:12:31 +00:00

Author	SHA1	Message	Date
Juan Miguel Olmo	b7b7ef90f4	Merge pull request #50132 from aruniiird/add-rbd-mirror-mon-alerts ceph-mixin: Add RBD Mirror monitoring alerts	2023-10-10 13:37:01 +02:00
Josh Soref	73479a1e05	dashboard: fix spelling errors * access * availability * dashboard * depth * dimless * evaluation * executing * existing * facts * gigabytes * idempotent * independent * initial * inventory * managed * must not * notification * notifications * orchestrator * previously * promises * purging * queried * repetitive * split * subdirectories * tenant * the * timestamp * transformed * unavailable * visibility * yourself Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>	2023-08-09 11:14:20 -04:00
Arun Kumar Mohan	5c21134064	ceph-mixin: add RBD Mirror monitoring alerts Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>	2023-08-09 12:19:04 +05:30
Pere Diaz Bou	8e07fbd2ea	Merge pull request #48843 from rhcs-dashboard/expose_slow_ops mgr/prometheus: expose daemon health metrics Reviewed-by: Anthony D Atri <anthony.datri@gmail.com> Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2022-12-20 12:25:32 +01:00
Pere Diaz Bou	5a2b7c25b6	mgr/prometheus: expose daemon health metrics Until now daemon health metrics were stored without being used. One of the most helpful metrics there is SLOW_OPS with respect to OSDs and MONs which this commit tries to expose to bring fine grained metrics to find troublesome OSDs instead of having a lone healthcheck of slow ops in the whole cluster. Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>	2022-12-20 09:44:49 +01:00
Nizamudeen A	3f1c1b6376	Merge pull request #48526 from rhcs-dashboard/fix-cephPoolGrowth-alert mgr/dashboard: Fix CephPoolGrowthWarning alert Reviewed-by: Pegonzal <NOT@FOUND> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2022-11-29 18:29:01 +05:30
Aashish Sharma	97189b66af	mgr/dashboard: Fix CephPoolGrowthWarning alert Prometheus reports an error - many-to-many matching not allowed: matching labels must be unique on one side for CephPoolGrowthWarning if we have same pool ids on two different instances. Fixes: https://tracker.ceph.com/issues/58017 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2022-11-22 11:55:41 +05:30
Christian Kugler	4aecdad350	ceph-mixin: Add Prometheus Alert for Degraded Bond Currently there is no alert for a misconfigured or failed network interface card that is part of a network bond. This could lead to redundancy and performance being degraded unnoticed. To solve this, I use node exporter metrics to compare the total number of peers in the bond with the number that are active. If the numbers differ, something is wrong and should be looked at. Fixes: https://tracker.ceph.com/issues/57962 Signed-off-by: Christian Kugler <syphdias+git@gmail.com>	2022-11-02 14:48:57 +01:00
Aswin Toni	351e1ac639	ceph-mixin: fix CephNodeNetworkPacket alerts Signed-off-by: Aswin Toni <aswin.toni@cern.ch>	2022-08-23 15:26:52 +02:00
Aswin Toni	5cdc1c62c5	prometheus: add multicluster support to alerts Signed-off-by: Aswin Toni <aswin.toni@cern.ch>	2022-08-17 12:08:56 +02:00
Anthony D'Atri	9b65974468	monitoring/ceph-mixin: clean up prometheus_alerts.yml Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>	2022-07-28 19:17:51 -07:00
Ernesto Puerta	a98c2475c6	Merge pull request #45254 from travisn/prometheus-rules-typos prometheus: Spell check the alert descriptions Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Laura Flores <lflores@redhat.com> Reviewed-by: Michael Fritch <mfritch@suse.com> Reviewed-by: Nizamudeen A <nia@redhat.com> Reviewed-by: sunilangadi2 <NOT@FOUND> Reviewed-by: Travis Nielsen <tnielsen@redhat.com>	2022-04-04 13:46:00 +02:00
Travis Nielsen	9cca95b16a	prometheus: spell check the alert descriptions Signed-off-by: Travis Nielsen <tnielsen@redhat.com>	2022-03-30 17:38:43 -06:00
Aashish Sharma	49d6068463	mgr/dashboard: fix promtool test for mtu alert Fixes: https://tracker.ceph.com/issues/55004 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2022-03-28 13:39:38 +02:00
Nizamudeen A	27592b7561	cephadm: change shared_folder directory for prometheus and grafana After https://github.com/ceph/ceph/pull/44059 the monitoring/prometheus and monitoring/grafana/dashboards directories are changed to monitoring/ceph-mixins. That broke the shared_folders in the cephadm bootstrap script. Changed all the instances of monitoring/prometheus and monitoring/grafana/dashboards to monitoring/ceph-mixins Also, renaming all the instances of prometheus_alerts.yaml to prometheus_alerts.yml. Fixes: https://tracker.ceph.com/issues/54176 Signed-off-by: Nizamudeen A <nia@redhat.com>	2022-02-07 16:34:37 +05:30
Arthur Outhenin-Chalandre	98236e3a1d	mgr/dashboard: monitoring: refactor into ceph-mixin A mixin is a way to bundle dashboards, Prometheus rules and alerts into a jsonnet package. Shifting to a mixin will allow easier integration with the monitoring automation that some users may use. This commit moves `/monitoring/grafana/dashboards` and `/monitoring/prometheus` to `/monitoring/ceph-mixin`. The Prometheus alerts were also converted to Jsonnet in an automated way (from yaml to json to jsonnet). This commit minimises any change made to the generated files and should change neither the dashboards nor the Prometheus alerts. In the future some configuration will also be added to jsonnet to add more functionality to the dashboards or alerts (e.g. multi-cluster support). Fixes: https://tracker.ceph.com/issues/53374 Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-02-03 13:08:20 +01:00

16 Commits
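
The degraded-bond commit (4aecdad350) describes its check only in words: compare the total number of peers in each bond with the number that are active, and flag any bond where the two differ. A minimal sketch of that comparison follows. It is not the alert rule from the repository. The function name `degraded_bonds` and the two dictionaries are hypothetical stand-ins for per-bond values that would come from node exporter metrics.

```python
from typing import Dict, List


def degraded_bonds(total_peers: Dict[str, int],
                   active_peers: Dict[str, int]) -> List[str]:
    """Return the names of bonds whose active peer count differs from the total.

    Both arguments are hypothetical inputs mapping a bond name to a count.
    A bond with no entry in active_peers is treated as having zero active
    peers (an assumption made for this sketch).
    """
    degraded = []
    for bond, total in sorted(total_peers.items()):
        active = active_peers.get(bond, 0)
        if active != total:
            degraded.append(bond)
    return degraded


if __name__ == "__main__":
    # Illustrative values only, not data from a real cluster.
    total = {"bond0": 2, "bond1": 2}
    active = {"bond0": 2, "bond1": 1}
    print(degraded_bonds(total, active))
```

With the illustrative values above, only `bond1` is reported, because one of its two peers is inactive.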