RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2025-01-03 09:32:43 +00:00

Author	SHA1	Message	Date
Ernesto Puerta	458ad48024	Merge pull request #40715 from pcuzner/pool-overview-enhancement mgr/dashboard:include compression stats on pool dashboard Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2021-05-05 18:08:58 +02:00
Paul Cuzner	81788b1f21	mgr/dashboard:include compression stats on pool dashboard This is a replacement dashboard configuration for the pool overview page. It provides a cluster wide view of capacity consumed and compression effectiveness, and breaks this down by each pool within the configuration. Fixes: https://tracker.ceph.com/issues/50226 Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2021-05-03 12:26:06 +12:00
Ernesto Puerta	381685f17f	Merge pull request #40072 from wornet-mwo/dashboard--grafana-hostname-corrections mgr/dashboard: Fixed name clash when hostname similar to another Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: p-se <NOT@FOUND>	2021-04-29 19:40:57 +02:00
Michael Wodniok	e97e27ebdb	dashboard: Fixed name clash when hostname similar to anaother Fixes: #49769 Signed-off-by: Michael Wodniok <wodniok@wor.net>	2021-04-27 08:42:59 +02:00
Malcolm Holmes	382e293656	monitoring/grafana: Remove erroneous elements in hosts-overview Grafana dashboard The hosts-overview Grafana dashboard json file contains a repeated element, making it invalid JSON. Some JSON parsers handle this. However, this prevents Jsonnet from parsing the dashboard, which prevents the deployment of this dashboard via Jsonnet. Fixes: https://tracker.ceph.com/issues/50410 Signed-off-by: Malcolm Holmes <mdh@odoko.co.uk>	2021-04-17 23:11:48 +01:00
Aashish Sharma	8d2f39e6c5	mgr/dashboard:Simplify some complex calculations in test_alerts.yml run-promtool-unittests is failing with difference in floating point values in some complex calculations. This PR intends to simplify those calculations and fix this issue. Fixes: https://tracker.ceph.com/issues/49952 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2021-03-25 12:05:07 +05:30
Aashish Sharma	53a5816ded	mgr/dashboard:test prometheus rules through promtool This PR intends to add unit testing for prometheus rules using promtool. To run the tests run 'run-promtool-unittests.sh' file. Fixes: https://tracker.ceph.com/issues/45415 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2021-03-08 10:16:22 +05:30
Ernesto Puerta	dff5b78d3b	Merge pull request #39462 from rhcs-dashboard/fix-alerts-mtuMismatch mgr/dashboard: fix MTU Mismatch alert Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2021-02-17 14:14:17 +01:00
Ernesto Puerta	e2d73297cf	Merge pull request #38030 from p-se/prom-alert-package-drops-leeway mgr/dashboard: prometheus alerting: add some leeway for package drops and errors Reviewed-by: Stephan Müller <smueller@suse.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2021-02-16 20:45:44 +01:00
Patrick Seidensal	9ac248b0c3	mgr/dashboard: prometheus alerting: add some leeway for package drops and errors (1%) Fixes: https://tracker.ceph.com/issues/48201 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2021-02-16 14:43:00 +01:00
Aashish Sharma	8527489b91	mgr/dashboard:fix MTU Mismatch alert This PR intends to fix the expression used for MTU Mismatch alert in prometheus Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2021-02-15 10:13:39 +05:30
Aashish Sharma	06cc0d8743	mgr/dashboard: trigger alert if some nodes have a MTU different than the median value This PR intends to alert a user if a specific network is configured with a custom MTU Fixes: https://tracker.ceph.com/issues/48748 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2021-01-22 11:20:13 +05:30
Alfonso Martínez	9441fda4dc	mgr/dashboard/monitoring: upgrade Grafana version due to CVE-2020-13379 Fixes: https://tracker.ceph.com/issues/48685 Signed-off-by: Alfonso Martínez <almartin@redhat.com>	2021-01-07 16:53:26 +01:00
Kefu Chai	30487c755c	Merge pull request #38282 from vosdev/ceph-pool-alert mgr/prometheus: Fix 'pool filling up' with >50% usage Reviewed-by: Patrick Seidensal <pseidensal@suse.com>	2020-12-12 12:10:44 +08:00
Daniël Vos	79568d51c6	mgr/prometheus: Fix 'pool filling up' with >50% usage Fixes: https://tracker.ceph.com/issues/48354 Signed-off-by: Daniël Vos <danielvos@outlook.com>	2020-12-01 16:31:09 +01:00
haoyixing	0e7e036aa7	doc/dev: use http://docs.ceph.com/en/latest/ instead of /docs/master/ for docs Several links under http://docs.ceph.com/docs/master/ were unable to access. Change them to http://docs.ceph.com/en/lastest so we can access them directly. Signed-off-by: haoyixing <haoyixing@kuaishou.com>	2020-11-24 12:49:47 +08:00
Paul Cuzner	2010432b50	mgr/prometheus: Add healthcheck metric for SLOW_OPS SLOW_OPS is triggered by op tracker, and generates a health alert but healthchecks do not create metrics for prometheus to use as alert triggers. This change adds SLOW_OPS metric, and provides a simple means to extend to other relevant health checks in the future If the extract of the value from the health check message fails we log an error and remove the metric from the metric set. In addition the metric description has changed to better reflect the scenarios where SLOW_OPS can be triggered. Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2020-11-02 15:30:49 +13:00
Seena Fallah	0fd28f646c	monitoring: Use null yaxes min for OSD read latency According to seriesOverrides that negative-Y for read param there shouldn't be a minimum for yaxes Signed-off-by: Seena Fallah <seenafallah@gmail.com>	2020-10-12 19:56:18 +03:30
Patrick Seidensal	fe64b9d176	mgr/dashboard: Fix many-to-many issue in host-details dashboard The labels on one side do not match the labels of the other side, where a label_replace is used. The fix uses the same label_replace on the missing side. Fixes: https://tracker.ceph.com/issues/47334 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2020-09-07 12:37:40 +02:00
Avan Thakkar	f039e5585d	mgr/dashboard: cpu stats incorrectly displayed Fixes: https://tracker.ceph.com/issues/46683 Signed-off-by: Avan Thakkar <athakkar@redhat.com>	2020-07-23 11:57:32 +05:30
pcuzner	0021dd278b	Merge pull request #35610 from pcuzner/wip-grafana-container monitoring: add grafana container build file	2020-07-06 13:06:55 +12:00
Lenz Grimmer	399521d66b	Merge pull request #34532 from rhcs-dashboard/wip-45068-fix-parse-error mgr/dashboard: Prometheus query error in the metrics of Pools, OSDs and RBD images Reviewed-by: Alfonso Martínez <almartin@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Volker Theile <vtheile@suse.com>	2020-06-30 10:50:59 +02:00
Paul Cuzner	3c813729dc	monitoring:add grafama container build file This commit provides the Makefile to create the ceph-grafana containers for nautilus, octopus and master releases. Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2020-06-17 17:20:45 +12:00
Kiefer Chang	b963b7fbe9	monitoring: fixing some issues in RBD detail dashboard - Exchange read/write legends in The `I/O Bytes per second` panel. - Rename `I/O Bytes per second` to `Throughput`. - Rename `IOPS Count` to just `IOPS`. - Remove instance name from legends. - Fixes typos: `Averange` -> `Average`. Fixes: https://tracker.ceph.com/issues/45735 Signed-off-by: Kiefer Chang <kiefer.chang@suse.com>	2020-05-28 14:49:31 +08:00
Alfonso Martínez	cf4ff7d2f0	mgr/dashboard: grafana panels for rgw multisite sync performance * RGW sync perf. counters are now exposed through grafana panels. * Sync Performance tab is only shown if rgw realm is detected. * Prometheus module: added metrics suitable for prometheus consumption (from existing ones, not replacing for backward compatibility). Fixes: https://tracker.ceph.com/issues/45310 Signed-off-by: Alfonso Martínez <almartin@redhat.com>	2020-05-22 13:36:10 +02:00
Benoît Knecht	653c3f6682	monitoring: Fix "10% OSDs down" alert description The alert was triggered when less than 90% of OSDs were _up_, but then the description took that value and described it as the percentage of OSDs being _down_. So with 12% of OSDs down, the alert description would read: ``` 88% or 88 of 100 OSDs are down (>=10%). ``` which can be panic-inducing. This commit changes the alert expression to actually compute the ratio of OSDs being down, which makes the correct value appear in the description. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-05-06 18:49:26 +02:00
Lenz Grimmer	9334471340	Merge pull request #33991 from SchoolGuy/monitoring/rbd-image-details mgr/dashboard/grafana: Add rbd-image details dashboard Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Jan Fajerski <jfajerski@suse.com> Reviewed-by: Laura Paduano <lpaduano@suse.com> Reviewed-by: Patrick Seidensal <pnawracay@suse.com> Reviewed-by: Volker Theile <vtheile@suse.com>	2020-05-04 09:59:53 +02:00
Enno Gotthold	dfb1e0020e	mgr/dashboard: Remove additional unneeded steps for the metrics calculation Signed-off-by: Enno Gotthold <egotthold@suse.de>	2020-04-28 13:34:16 +02:00
Ernesto Puerta	3fd804f10b	monitoring: fix decimal precision in Grafana % Set decimal precision to 2 positions for charts using percentunits. Fixes: https://tracker.ceph.com/issues/45183 Signed-off-by: Ernesto Puerta <epuertat@redhat.com>	2020-04-22 13:39:16 +02:00
Avan Thakkar	47b515c094	mgr/dashboard: Prometheus query error in the metrics of Pools, OSDs and RBD images Fixes: https://tracker.ceph.com/issues/45068 Signed-off-by: Avan Thakkar <athakkar@redhat.com>	2020-04-21 23:03:09 +05:30
Volker Theile	e197e4d7f4	monitoring: alert for pool fill up broken Fixes: https://tracker.ceph.com/issues/44991 Signed-off-by: Volker Theile <vtheile@suse.com>	2020-04-08 15:02:45 +02:00
Volker Theile	a5ade11a31	Merge pull request #34239 from p-se/wip-pse-fix-false-root-vol-full-alert monitoring: root volume full alert fires false positives Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Jan Fajerski <jfajerski@suse.com> Reviewed-by: Volker Theile <vtheile@suse.com>	2020-04-06 14:17:17 +02:00
Lenz Grimmer	b6ad9a804b	Merge pull request #34240 from krig/grafana-dashboards-fixes mgr/dashboard: Repair broken grafana panels Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Stephan Müller <smueller@suse.com>	2020-04-06 10:55:20 +02:00
Patrick Seidensal	6935dc5592	monitoring: alert for prediction of disk and pool fill up broken Fixes: https://tracker.ceph.com/issues/44776 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2020-03-27 13:44:28 +01:00
Kristoffer Grönlund	b7abaab5bd	dashboard: Convert FQDN to hostname in grafana panels The $ceph_hosts variable contained the FQDN for hosts while the instance label created by ceph only has the hostname. Fixes: https://tracker.ceph.com/issues/44784 Signed-off-by: Kristoffer Grönlund <kgronlund@suse.com>	2020-03-27 12:33:15 +01:00
Kristoffer Grönlund	136d21e21d	dashboard: Resolve FQDN / hostname mismatch in hosts overview panel In the AVG Disk Utilization panel, the result is calculated by combining the output of node_disk_io_time_seconds_total with the output of ceph_disk_occupation. However, the first vector encodes the instance label with the full FQDN while the ceph label only contains the hostname:port. In order for these to match correctly, the domain name and port has to be stripped from the labels. Fixes: https://tracker.ceph.com/issues/44784 Signed-off-by: Kristoffer Grönlund <kgronlund@suse.com>	2020-03-27 12:33:09 +01:00
Kristoffer Grönlund	8b61b8d3d7	dashboard: Use exported_instance to identify OSDs When moving to LVM-based ceph-volume setups, several grafana dashboards stopped working. The problem is that (device, instance) no longer results in unique labels which causes errors like: "many-to-many matching not allowed: matching labels must be unique on one side" Fixes: https://tracker.ceph.com/issues/44784 Signed-off-by: Kristoffer Grönlund <kgronlund@suse.com>	2020-03-27 12:33:01 +01:00
Kristoffer Grönlund	4444333243	dashboard: AVG RAM Utilization panel always showed "N/A" The references to `$osd_hosts` etc. were encoded as `[[osd_hosts]]` in the PromQL expression divisor, and the panel always displayed N/A as the result of the query. Replacing the `[[...]]` with `$...` makes the expression work again. Fixes: https://tracker.ceph.com/issues/44784 Signed-off-by: Kristoffer Grönlund <kgronlund@suse.com>	2020-03-27 12:32:52 +01:00
Patrick Seidensal	f8e347f771	monitoring: root volume full alert fires false positives Fixes: https://tracker.ceph.com/issues/44780 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2020-03-27 11:06:08 +01:00
Kefu Chai	a12f9f19e0	Merge pull request #32749 from james58899/fix-capacity monitoring: Fix pool capacity incorrect Reviewed-by: Jan Fajerski <jfajerski@suse.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com>	2020-03-27 16:13:29 +08:00
Enno Gotthold	9707cb30cb	mgr/dashboard: Add grafana chart for rbd image details Fixes: https://tracker.ceph.com/issues/44623 Signed-off-by: Enno Gotthold <egotthold@suse.de> This dashboard will per default be empty as the already existing dashboard with the summary for all rbd images.	2020-03-26 08:21:30 +01:00
Alfonso Martínez	1f0cddfafc	monitoring: fix RGW grafana chart 'Average GET/PUT Latencies' Fixes: https://tracker.ceph.com/issues/44538 Signed-off-by: Alfonso Martínez <almartin@redhat.com>	2020-03-10 12:05:26 +01:00
Patrick Seidensal	1794b55e64	monitoring: restore lost `pool full` alert Fixes: https://tracker.ceph.com/issues/44366 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2020-03-02 11:43:03 +01:00
James Cheng	1b980ef88c	monitoring: Fix pool capacity incorrect Signed-off-by: James Cheng <james59988@gmail.com>	2020-02-18 19:19:13 +08:00
Avan Thakkar	dd8cb9d2d6	mgr/dashboard: UI fixes Fixes: https://tracker.ceph.com/issues/42914 Signed-off-by: Avan Thakkar <athakkar@redhat.com>	2020-02-10 22:57:57 +05:30
Aleksei Zakharov	a37cf380ad	mgr/grafana: sum pg states for cluster Also, revert table formatting. Signed-off-by: Aleksei Zakharov <zaharov@selectel.ru>	2020-01-29 17:28:36 +03:00
Aleksei Zakharov	4eb58f7ccc	monitoring/grafana,prometheus: add per-pool pg states support Signed-off-by: Aleksei Zakharov <zaharov@selectel.ru>	2020-01-29 17:28:36 +03:00
Patrick Seidensal	fb51c589b5	monitoring: add details to Prometheus' alerts Fixes: https://tracker.ceph.com/issues/43764 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2020-01-24 14:21:31 +01:00
Jan Fajerski	e098536acc	Merge pull request #32325 from Kriechi/fix-42982 monitoring: fix prometheus alert for full pools	2020-01-20 10:42:36 +01:00
Bryan Stillwell	8eafb09acb	Switch spelling of utilization Prefer the non-British spelling of utilization since that's what the majority of the code base seems to use. Signed-off-by: Bryan Stillwell <bstillwell@godaddy.com>	2020-01-07 16:57:36 -07:00

84 Commits