RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2025-01-02 09:02:34 +00:00

Author	SHA1	Message	Date
Aashish Sharma	2e54c9a01e	mgr/dashboard: Add a new chart for replication delta per shard in rgw sync overview grafana dashboard Fixes: https://tracker.ceph.com/issues/66994 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2024-07-17 15:49:15 +05:30
Nizamudeen A	d11b25ed0f	Merge pull request #56014 from badone/wip-tracker-63591-pyyaml-cython_sources install-deps: Update Pyyaml version Reviewed-by: Ankush Behl <cloudbehl@gmail.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2024-05-21 12:00:53 -04:00
Aashish Sharma	1622ad8f76	mgr/dashboard: fix cluster filter typo in multi-cluster-overview grafana dashboard Fixes: https://tracker.ceph.com/issues/65760 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2024-05-02 17:12:26 +05:30
Aashish Sharma	fce7c520f3	Merge pull request #56575 from cloudbehl/ceph-cluster-json-update monitoring/ceph-mixin: Add cluster variable to ceph-cluster.json Reviewed-by: Aashish Sharma <aasharma@redhat.com>	2024-05-02 15:34:50 +05:30
Nizamudeen A	a8d01fff00	Merge pull request #55495 from frittentheke/issue_64321 monitoring/ceph-mixin: Cleanup of variables, queries and tests (to fix showMultiCluster=True) Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Ankush Behl <cloudbehl@gmail.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2024-05-02 13:55:37 +05:30
Adam King	c6871bbaf5	monitoring/ceph-mixin: set NVMeoFMaxGatewaysPerGroup to 4 Recommendation from the nvmeof team Signed-off-by: Adam King <adking@redhat.com>	2024-04-22 08:48:15 -04:00
Christian Rohmann	090b8e17f1	Cleanup of variables, queries and tests to enable showMultiCluster=True Rendering the dashboards with showMultiCluster=True allows for them to work with multiple clusters storing their metrics in a single Prometheus instance. This works via the cluster label and that functionality already existed. This just fixes some inconsistencies in applying the label filters. Additionally this contains updates to the tests to have them succeed with both configurations and avoid the introduction of regressions with regard to multiCluster in the future. There also are some consistency cleanups here and there: * `datasource` was not used consistently * `cluster` label_values are determined from `ceph_health_status` * `job` template and filters on this label were removed to align multi cluster support solely via the `cluster` label * `ceph_hosts` filter now uses label_values from any ceph_metadata metric to not show all instance values, but only those of hosts with some Ceph component / daemon. * Enable showMultiCluster=True since `cluster` label is now always present, via https://github.com/ceph/ceph/pull/54964 Improves: https://tracker.ceph.com/issues/64321 Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>	2024-04-22 08:29:37 +02:00
Aashish Sharma	c2f4aa7887	mgr/dashboard: replace deprecated table panel in grafana with a newer table panel Fixes: https://tracker.ceph.com/issues/65174 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2024-04-02 12:22:53 +05:30
cloudbehl	2df3ce1902	monitoring/ceph-mixin: Add cluster variable to ceph-cluster.json Fixes: https://tracker.ceph.com/issues/65218 Signed-off-by: cloudbehl <cloudbehl@gmail.com>	2024-03-29 13:29:33 +05:30
Brad Hubbard	7863d297ea	install-deps: Update Pyyaml version Move to 6.0.1 to overcome https://github.com/yaml/pyyaml/issues/601 Fixes: https://tracker.ceph.com/issues/63591 Signed-off-by: Brad Hubbard <bhubbard@redhat.com>	2024-03-07 14:13:11 +10:00
Nizamudeen A	278daa4ba2	Merge pull request #55574 from ceph/feature-multi-cluster-management-monitoring mgr/dashboard: introduce multi cluster management and monitoring in ceph dashboard Reviewed-by: Nizamudeen A <nia@redhat.com>	2024-03-06 10:26:29 +05:30
Nizamudeen A	b8811c844f	mgr/dashboard: introduce multi-cluster overview page https://tracker.ceph.com/issues/64530 Signed-off-by: Nizamudeen A <nia@redhat.com> Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2024-03-05 19:05:37 +05:30
Aashish Sharma	d9c92e562d	Merge pull request #55510 from pcuzner/add-nvmeof-alerts ceph-mixin: Update mixin to include alerts for the nvmeof gateway(s) Reviewed-by: Aashish Sharma <aasharma@redhat.com>	2024-02-29 10:36:58 +05:30
Aashish Sharma	6e5efb626f	mgr/dashboard: replace piechart plugin charts with native pie chart panel Fixes: https://tracker.ceph.com/issues/64579 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2024-02-27 14:20:31 +05:30
Paul Cuzner	c2534a6dba	ceph-mixins: Add test cases for nvmeof alerts Signed-off-by: Paul Cuzner <pcuzner@ibm.com>	2024-02-27 09:51:11 +13:00
Paul Cuzner	e7d25482d1	ceph-mixins: nvmeof alerts added Signed-off-by: Paul Cuzner <pcuzner@ibm.com>	2024-02-27 09:51:11 +13:00
Paul Cuzner	f1573b76f3	ceph-mixins: Add nvmeof alerts Signed-off-by: Paul Cuzner <pcuzner@ibm.com>	2024-02-27 09:51:04 +13:00
Paul Cuzner	feb1e69034	ceph-mixins: Add vars to support nvmeof alerts Signed-off-by: Paul Cuzner <pcuzner@ibm.com>	2024-02-26 11:15:15 +13:00
Aashish Sharma	495f669faf	mgr/dashboard: Add a manage clusters page to the multi-cluster nav to list/connect/disconnect/edit clusters in multi-cluster setup Fixes: https://tracker.ceph.com/issues/64530 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2024-02-22 10:42:01 +05:30
Aashish Sharma	a85baa89da	Merge pull request #55314 from cloudbehl/rgw-dashboard-json mgr/dashboard: Fixing RGW graph panels Reviewed-by: Aashish Sharma <aasharma@redhat.com>	2024-02-13 12:00:33 +05:30
Aashish Sharma	a572a0c167	mgr/dashboard: Add RGW per user/bucket panels in grafana Fixes: https://tracker.ceph.com/issues/64359 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2024-02-09 21:14:27 +05:30
Aashish Sharma	65e6714720	mgr/dashboards: add generated json files Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2024-02-07 14:27:31 +05:30
Guillaume Abrioux	76d8e0bbbf	monitoring: add new alerts This adds new hardware monitoring alerts. Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>	2024-01-25 14:43:30 +00:00
cloudbehl	191fda84b3	mgr/dashboard: Fixing RGW graph panels - Fixing grafana panels for rgw dashboards - Fixing RGW overview dashboard queries fixes https://tracker.ceph.com/issues/64177 Signed-off-by: cloudbehl <cloudbehl@gmail.com>	2024-01-25 18:41:03 +05:30
Aashish Sharma	2573426f54	mgr/dashboard: upgrade from old 'graph' type panels to the new 'timeseries' panel The graph panel type is deprecated, and disappears after Grafana v9.1 (current version is 10.0) to prevent more old type panels being created. These should be migrated to the timeseries panel type, to avoid potential problems with future Grafana versions. Fixes: https://tracker.ceph.com/issues/61720 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2023-12-22 11:19:40 +05:30
Nizamudeen A	a42e286fc0	Merge pull request #54355 from nobuto-m/info-rbd-stats-pools mgr/dashboard: info on why RBD graphs are empty Reviewed-by: Ankush Behl <cloudbehl@gmail.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2023-11-30 13:38:54 +05:30
Aashish Sharma	39fea8f71c	Merge pull request #51340 from Javlopez/feature/12087-upgrade-and-generate-grafana-dashboards monitoring: add new dashboards Fixes: https://tracker.ceph.com/issues/63592 Reviewed-by: Aashish Sharma <aasharma@redhat.com>	2023-11-20 11:33:07 +05:30
Aashish Sharma	70d8c5b565	Merge pull request #53650 from rhcs-dashboard/fix-62969-main mgr/dashboard: Show the OSDs Out and Down panels as red whenever an OSD is in Out or Down state in Ceph Cluster grafana dashboard Reviewed-by: Nizamudeen A <nia@redhat.com>	2023-11-17 11:24:45 +05:30
Nobuto Murata	9c026fa18c	mgr/dashboard: info on why RBD graphs are empty Those RBD IO statistics graphs are empty out of the box and it's on purpose. Instead of giving an impression that those graphs are broken, point users to documentation explaining the optional steps to enable those statistics. https://docs.ceph.com/en/latest/mgr/prometheus/#rbd-io-statistics Signed-off-by: Nobuto Murata <nobuto.murata@canonical.com>	2023-11-06 15:50:50 +09:00
Aashish Sharma	88d0a9f45d	Merge pull request #53807 from rhcs-dashboard/fix-63088-main mgr/dashboard: Consider null values as zero in grafana panels Reviewed-by: Nizamudeen A <nia@redhat.com>	2023-10-25 13:01:03 +05:30
Javier	f0e8565b49	monitoring: update libsonnet files for generate ceph-cluster.json add ceph-cluster.libsonnet file to generate ceph-cluster.json Fixes: https://tracker.ceph.com/issues/61443 Signed-off-by: Javier <sjavierlopez@gmail.com>	2023-10-20 18:07:33 -06:00
Nizamudeen A	a5027e37ec	mgr/dashboard: fix broken alert generator Currently the alert generator is broken if you try to run `tox -ealerts-fix`. I fixed it and ran the command and it built a new json file as well. Signed-off-by: Nizamudeen A <nia@redhat.com>	2023-10-13 12:42:50 +05:30
Aashish Sharma	a29e6a8673	mgr/dashboard: Show the OSD's Out and Down panels as red whenever an OSD is in Out or Down state in Ceph Cluster grafana dashboard Fixes: https://tracker.ceph.com/issues/62969 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2023-10-11 11:46:03 +05:30
Juan Miguel Olmo	b7b7ef90f4	Merge pull request #50132 from aruniiird/add-rbd-mirror-mon-alerts ceph-mixin: Add RBD Mirror monitoring alerts	2023-10-10 13:37:01 +02:00
Aashish Sharma	6f3f58cb8e	mgr/dashboard: Consider null values as zero in grafana panels After upgrading from RHCS4 to RHCS5, some of the grafana charts broke. This is because RHCS5 does not generate a metric if its value is zero; as a result, the null value from that metric breaks the grafana charts or graphs. This PR fixes the above-mentioned issue. Fixes: https://tracker.ceph.com/issues/63088 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2023-10-04 12:31:42 +05:30
Nizamudeen A	b5bf9d70cb	Merge pull request #52150 from paulreece42/wip-grafana-quorum-fix monitoring: grafana mons out of quorum should be count - sum Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2023-09-21 12:36:21 +05:30
Josh Soref	73479a1e05	dashboard: fix spelling errors * access * availability * dashboard * depth * dimless * evaluation * executing * existing * facts * gigabytes * idempotent * independent * initial * inventory * managed * must not * notification * notifications * orchestrator * previously * promises * purging * queried * repetitive * split * subdirectories * tenant * the * timestamp * transformed * unavailable * visibility * yourself Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>	2023-08-09 11:14:20 -04:00
Arun Kumar Mohan	5c21134064	ceph-mixin: add RBD Mirror monitoring alerts Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>	2023-08-09 12:19:04 +05:30
Arun Kumar Mohan	e9d803d608	ceph-mixin: fix manually edited 'prometheus_alerts.yml' file The 'prometheus_alerts.yml' file should not be edited directly. The changes should be added to the 'prometheus_alerts.libsonnet' file (and/or any other appropriate lib/jsonnet files) and generated using the 'make generate' command. Adding all the changes to the 'prometheus_alerts.libsonnet' file and building/generating the prometheus_alerts YAML file. PS: all the changes seen in the 'prometheus_alerts.yml' file are due to the re-arrangement of lines. The file remains the same. Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>	2023-08-09 12:19:04 +05:30
Arun Kumar Mohan	86d040e2fc	ceph-mixin: fix ceph-mixin setup Made the following changes to files: Makefile: Add needed 'tox' target to generate alert files Now we can do 'make generate' OR 'make test' to generate all the yaml files (and run tests) alerts.jsonnet: Added an 'import' line to include 'config.libsonnet' file. This fixes the errors in generating 'prometheus_alerts.yml' file tox.ini: Added all the existing 'alerts-' targets to 'envlist' Added the missing 'alerts-test' target to 'testenv' Added 'jsonnet' to 'allowlist_externals', which prevents a deprecation warning A minor spell correction lint-jsonnet.sh: Made errors more verbose. Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>	2023-08-09 12:19:04 +05:30
Paul Reece	6ff02381a3	monitoring: grafana mons out of quorum should be count - sum not count / sum For example, with 3 mons total, all in quorum, original will do 3/3 = 1, showing 1 out of quorum (likely typo fix) Fixes: https://tracker.ceph.com/issues/61923 Signed-off-By: Paul Reece <paulreece42@gmail.com> fixing case sensitive Signed-off-by: Paul Reece <paulreece42@gmail.com>	2023-08-07 16:16:18 +00:00
Aashish Sharma	3063c8a4fb	Merge pull request #48783 from rhcs-dashboard/fix-grafana-stat-panel mgr/dashboard: Replace vonage-status-panel with native grafana stat panel Reviewed-by: Nizamudeen A <nia@redhat.com>	2023-02-08 18:34:30 +05:30
Pere Diaz Bou	8e07fbd2ea	Merge pull request #48843 from rhcs-dashboard/expose_slow_ops mgr/prometheus: expose daemon health metrics Reviewed-by: Anthony D Atri <anthony.datri@gmail.com> Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2022-12-20 12:25:32 +01:00
Pere Diaz Bou	5a2b7c25b6	mgr/prometheus: expose daemon health metrics Until now daemon health metrics were stored without being used. One of the most helpful metrics there is SLOW_OPS with respect to OSDs and MONs which this commit tries to expose to bring fine grained metrics to find troublesome OSDs instead of having a lone healthcheck of slow ops in the whole cluster. Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>	2022-12-20 09:44:49 +01:00
Aashish Sharma	3e08b81b40	mgr/dashboard: Replace vonage-status-panel with native grafana stat panel Fixes: https://tracker.ceph.com/issues/58295 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2022-12-16 10:51:47 +05:30
Kefu Chai	34e2e33870	*: s/whitelist_externals/allowlist_externals/ as allowlist_externals was introduced in tox v4.0. see `5e33fda1a4` , but this option was backported to 3.18 as an alias of whitelist_externals, so we don't need to specify the minversion to 4.0 in this change. as we started using tox 4.0 and up (v4.0.2 in specific). tox complains and fails like: alerts-lint: failed with promtool is not allowed, use allowlist_externals to allow it alerts-lint: FAIL code 1 (9.25 seconds) see https://tox.wiki/en/latest/faq.html#tox-4-removed-tox-ini-keys and https://tox.wiki/en/latest/config.html#allowlist_externals it'd be nice to use a more inclusive language also. so, in this change, s/whitelist_externals/allowlist_externals/ in all tox.ini in this project. Signed-off-by: Kefu Chai <tchaikov@gmail.com>	2022-12-08 15:07:00 +08:00
Nizamudeen A	3f1c1b6376	Merge pull request #48526 from rhcs-dashboard/fix-cephPoolGrowth-alert mgr/dashboard: Fix CephPoolGrowthWarning alert Reviewed-by: Pegonzal <NOT@FOUND> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2022-11-29 18:29:01 +05:30
Aashish Sharma	97189b66af	mgr/dashboard: Fix CephPoolGrowthWarning alert Prometheus reports an error - many-to-many matching not allowed: matching labels must be unique on one side for CephPoolGrowthWarning if we have same pool ids on two different instances. Fixes: https://tracker.ceph.com/issues/58017 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2022-11-22 11:55:41 +05:30
Tatjana Dehler	08352b6540	ceph-mixin: fix ceph_hosts variable Only use `instance` to query for hostnames in single-cluster-mode. Consider the cluster matcher only in multi-cluster-mode. In this case the query will look like: `"label_values({cluster=~\"$cluster\"}, instance)"`. Fixes: https://tracker.ceph.com/issues/57987 Signed-off-by: Tatjana Dehler <tdehler@suse.com>	2022-11-11 16:35:05 +01:00
Christian Kugler	4aecdad350	ceph-mixin: Add Prometheus Alert for Degraded Bond Currently there is no alert for a network interface card to be misconfigured or failed which is part of a network bond. This could lead to redundancies and performance being degraded unnoticed. To solve this, I use node exporter metrics to look at the number of total peers of the bond and the ones that are active. If the numbers differ, something is up and should be looked at. Fixes: https://tracker.ceph.com/issues/57962 Signed-off-by: Christian Kugler <syphdias+git@gmail.com>	2022-11-02 14:48:57 +01:00

86 Commits