RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2024-12-28 14:34:13 +00:00

Author	SHA1	Message	Date
Volker Theile	a5ade11a31	Merge pull request #34239 from p-se/wip-pse-fix-false-root-vol-full-alert monitoring: root volume full alert fires false positives Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Jan Fajerski <jfajerski@suse.com> Reviewed-by: Volker Theile <vtheile@suse.com>	2020-04-06 14:17:17 +02:00
Lenz Grimmer	b6ad9a804b	Merge pull request #34240 from krig/grafana-dashboards-fixes mgr/dashboard: Repair broken grafana panels Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Stephan Müller <smueller@suse.com>	2020-04-06 10:55:20 +02:00
Patrick Seidensal	6935dc5592	monitoring: alert for prediction of disk and pool fill up broken Fixes: https://tracker.ceph.com/issues/44776 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2020-03-27 13:44:28 +01:00
Kristoffer Grönlund	b7abaab5bd	dashboard: Convert FQDN to hostname in grafana panels The $ceph_hosts variable contained the FQDN for hosts while the instance label created by ceph only has the hostname. Fixes: https://tracker.ceph.com/issues/44784 Signed-off-by: Kristoffer Grönlund <kgronlund@suse.com>	2020-03-27 12:33:15 +01:00
Kristoffer Grönlund	136d21e21d	dashboard: Resolve FQDN / hostname mismatch in hosts overview panel In the AVG Disk Utilization panel, the result is calculated by combining the output of node_disk_io_time_seconds_total with the output of ceph_disk_occupation. However, the first vector encodes the instance label with the full FQDN while the ceph label only contains the hostname:port. In order for these to match correctly, the domain name and port have to be stripped from the labels. Fixes: https://tracker.ceph.com/issues/44784 Signed-off-by: Kristoffer Grönlund <kgronlund@suse.com>	2020-03-27 12:33:09 +01:00
Kristoffer Grönlund	8b61b8d3d7	dashboard: Use exported_instance to identify OSDs When moving to LVM-based ceph-volume setups, several grafana dashboards stopped working. The problem is that (device, instance) no longer results in unique labels which causes errors like: "many-to-many matching not allowed: matching labels must be unique on one side" Fixes: https://tracker.ceph.com/issues/44784 Signed-off-by: Kristoffer Grönlund <kgronlund@suse.com>	2020-03-27 12:33:01 +01:00
Kristoffer Grönlund	4444333243	dashboard: AVG RAM Utilization panel always showed "N/A" The references to `$osd_hosts` etc. were encoded as `[[osd_hosts]]` in the PromQL expression divisor, and the panel always displayed N/A as the result of the query. Replacing the `[[...]]` with `$...` makes the expression work again. Fixes: https://tracker.ceph.com/issues/44784 Signed-off-by: Kristoffer Grönlund <kgronlund@suse.com>	2020-03-27 12:32:52 +01:00
Patrick Seidensal	f8e347f771	monitoring: root volume full alert fires false positives Fixes: https://tracker.ceph.com/issues/44780 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2020-03-27 11:06:08 +01:00
Kefu Chai	a12f9f19e0	Merge pull request #32749 from james58899/fix-capacity monitoring: Fix pool capacity incorrect Reviewed-by: Jan Fajerski <jfajerski@suse.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com>	2020-03-27 16:13:29 +08:00
Alfonso Martínez	1f0cddfafc	monitoring: fix RGW grafana chart 'Average GET/PUT Latencies' Fixes: https://tracker.ceph.com/issues/44538 Signed-off-by: Alfonso Martínez <almartin@redhat.com>	2020-03-10 12:05:26 +01:00
Patrick Seidensal	1794b55e64	monitoring: restore lost `pool full` alert Fixes: https://tracker.ceph.com/issues/44366 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2020-03-02 11:43:03 +01:00
James Cheng	1b980ef88c	monitoring: Fix pool capacity incorrect Signed-off-by: James Cheng <james59988@gmail.com>	2020-02-18 19:19:13 +08:00
Avan Thakkar	dd8cb9d2d6	mgr/dashboard: UI fixes Fixes: https://tracker.ceph.com/issues/42914 Signed-off-by: Avan Thakkar <athakkar@redhat.com>	2020-02-10 22:57:57 +05:30
Aleksei Zakharov	a37cf380ad	mgr/grafana: sum pg states for cluster Also, revert table formatting. Signed-off-by: Aleksei Zakharov <zaharov@selectel.ru>	2020-01-29 17:28:36 +03:00
Aleksei Zakharov	4eb58f7ccc	monitoring/grafana,prometheus: add per-pool pg states support Signed-off-by: Aleksei Zakharov <zaharov@selectel.ru>	2020-01-29 17:28:36 +03:00
Patrick Seidensal	fb51c589b5	monitoring: add details to Prometheus' alerts Fixes: https://tracker.ceph.com/issues/43764 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2020-01-24 14:21:31 +01:00
Jan Fajerski	e098536acc	Merge pull request #32325 from Kriechi/fix-42982 monitoring: fix prometheus alert for full pools	2020-01-20 10:42:36 +01:00
Bryan Stillwell	8eafb09acb	Switch spelling of utilization Prefer the non-British spelling of utilization since that's what the majority of the code base seems to use. Signed-off-by: Bryan Stillwell <bstillwell@godaddy.com>	2020-01-07 16:57:36 -07:00
Thomas Kriechbaumer	9abddc0dd3	monitoring: fix prometheus alert for full pools The existing alert (introduced via https://tracker.ceph.com/issues/24977) already triggers while 50% of the storage space is still available. Fixes: https://tracker.ceph.com/issues/42982 Signed-off-by: Thomas Kriechbaumer <thomas@kriechbaumer.name>	2019-12-18 15:04:51 +01:00
Lenz Grimmer	11a1708e19	mgr/dashboard: grafana charts match time picker selection. (#31964) mgr/dashboard: grafana charts match time picker selection. Reviewed-by: Jan Fajerski <jfajerski@suse.com> Reviewed-by: Laura Paduano <lpaduano@suse.com> Reviewed-by: Patrick Seidensal <pnawracay@suse.com>	2019-12-03 17:09:00 +00:00
Alfonso Martínez	5ba114330e	mgr/dashboard: grafana charts match time picker selection. Fixes: https://tracker.ceph.com/issues/43097 Signed-off-by: Alfonso Martínez <almartin@redhat.com>	2019-12-03 14:15:10 +01:00
Ernesto Puerta	1182073f0c	mgr/dashboard,grafana: remove shortcut menu Remove shortcut menu (links) and add check in grafana CI script. Fixes: https://tracker.ceph.com/issues/43091 Signed-off-by: Ernesto Puerta <epuertat@redhat.com>	2019-12-03 10:21:35 +01:00
Patrick Seidensal	d262adeb21	monitoring: fix indentation of ceph default alerts Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2019-11-18 12:40:55 +01:00
Patrick Seidensal	e923af3430	monitoring: wait before firing osd full alert Fixes: https://tracker.ceph.com/issues/42862 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2019-11-18 12:39:27 +01:00
Radu Toader	3beaf63761	mgr/dashboard: fix grafana dashboards Fixes: https://tracker.ceph.com/issues/42542 Sort order was wrong for some dashboards; fixed empty / buggy Top 3 clients IOPS by pool / Throughput - in Pools Overall performance; fixed Avg utilization Multiple series found - in Host Overall performance; fixed invalid dimensions for plot - in OSD Overall performance. Signed-off-by: Radu Toader <radu.m.toader@gmail.com>	2019-10-30 11:03:03 +02:00
Volker Theile	8e6838c740	monitoring: SNMP OID per every Prometheus alert rule Use the Ceph enterprise OID 50495 (https://www.iana.org/assignments/enterprise-numbers/enterprise-numbers) and create OIDs for every Prometheus alert rule according to the schema at https://github.com/SUSE/prometheus-webhook-snmp/blob/master/README.md. Example OID: 1.3.6.1.4.1.50495.15.1.2.2.1 All alert rule OIDs are located below the object identifier 15 (15 for p which is the first character of prometheus). Check out the MIB at https://github.com/SUSE/prometheus-webhook-snmp/blob/master/PROMETHEUS-ALERT-CEPH-MIB.txt for more details. Signed-off-by: Volker Theile <vtheile@suse.com>	2019-05-28 09:59:50 +02:00
Jan Fajerski	e7a4437fdc	monitoring: update Grafana dashboards Fix various panels that used outdated metric names, clunky or unnecessary label_replace calls. Also unify the style of many panels. Fixes: http://tracker.ceph.com/issues/39652 Signed-off-by: Jan Fajerski <jfajerski@suse.com>	2019-05-14 13:47:55 +02:00
Jan Fajerski	c0e58bd8ae	monitoring: add a few prometheus alerts Alerts are from https://github.com/SUSE/DeepSea/blob/SES5/srv/salt/ceph/monitoring/prometheus/files/ses_default_alerts.yml but updated for the mgr module and node_exporter >= 0.15. Signed-off-by: Jan Fajerski <jfajerski@suse.com>	2019-04-26 11:21:39 +02:00
Jan Fajerski	287e209351	monitoring/grafana: fix typo in README Signed-off-by: Jan Fajerski <jfajerski@suse.com>	2019-04-16 14:19:51 +02:00
Neha Gupta	739fdbad37	mgr/dashboard: Fixed performance details context for host list row selection Fixes: http://tracker.ceph.com/issues/37854 Signed-off-by: Neha Gupta <gnehapk@gmail.com>	2019-01-18 13:36:49 +09:00
Jason Dillaman	f4ac899950	monitoring/grafana: new RBD overview dashboard page This page pulls RBD stats from the Nautilus prometheus exporter. Signed-off-by: Jason Dillaman <dillaman@redhat.com>	2019-01-11 16:41:46 -05:00
Boris Ranto	1ade714910	cmake: Support grafana dashboard installation We are currently hosting the grafana dashboards in our repo but we do not install them. This patch adds the cmake support. Signed-off-by: Boris Ranto <branto@redhat.com>	2018-10-25 17:09:02 +02:00
Lenz Grimmer	94aefee3b0	Merge pull request #24314 from rhcs-dashboard/dashboards mgr/dashboard: Grafana dashboard updates and additions Reviewed-by: Boris Ranto <branto@redhat.com>	2018-10-19 12:42:23 +02:00
Paul Cuzner	a848411bd8	MGR/dashboard: make grafana datasource selectable Grafana dashboard updated to use a templating variable for the datasource Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	a99618ce41	MGR/dashboard: make grafana datasource selectable Grafana dashboard updated to use a templating variable for the datasource Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	b64289ca3d	MGR/dashboard: make grafana datasource selectable Grafana dashboard updated to use a templating variable for the datasource Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	5432470914	MGR/dashboard: make grafana datasource selectable Grafana dashboard updated to use a templating variable for the datasource Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	bc5eea09c8	MGR/dashboard: make grafana datasource selectable Grafana dashboard updated to use a templating variable for the datasource Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	ba1a3b3a09	MGR/dashboard: make grafana datasource selectable Grafana dashboard updated to use a templating variable for the datasource Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	f97fee3a83	MGR/dashboard: make grafana datasource selectable Grafana dashboard updated to use a templating variable for the datasource Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	02b5414d19	MGR/dashboard: make grafana datasource selectable Grafana dashboard updated to use a templating variable for the datasource Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	7c04098e68	MGR/dashboard: make grafana datasource selectable Grafana dashboard updated to use a templating variable for the datasource Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	2c346efd12	Fix linewidth issue in pools overview dashboard Linewidth was set to two, but the idea is that a linewidth of >1 is reserved for eye-catcher plot lines like maximums Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	b84f0ce45f	Refresh of the dashboards Fixes some minor anomalies and tested against node_exporter 0.15 and 0.16 Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	7d97bb28a8	Updated requirements information Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	0e655f8400	Added new Overview dashboards These new dashboard definitions provide the high level views for the hosts in the cluster and the OSDs. Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	4292a7a357	Screenshots added for all dashboards Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	3c7c32f2ed	Add Host level details dashboard The host-details.json file provides a view of host level metrics. The panels are arranged in two rows: Overview (CPU/RAM/Network related stats) and OSD Performance (OSD physical drive stats). The Overview row is shown by default. Click on the OSD Performance row to show the remaining graphs. Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
Paul Cuzner	a0d9325c4d	Document the current state of the dashboards Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:26:08 +13:00
Paul Cuzner	8ebf2ede7f	Initial grafana dashboard definitions Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-10-09 08:23:39 +13:00
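
The FQDN / hostname mismatch described in commit 136d21e21d can be illustrated outside of PromQL. The following is a minimal Python sketch, not the dashboard's actual query: the function name `short_hostname`, the sample label values and the sample numbers are all hypothetical, and it assumes instance labels are hostnames or FQDNs with an optional `:port` suffix (never IP addresses). It shows why two series only line up once both the domain name and the port are stripped.

```python
def short_hostname(instance: str) -> str:
    """Reduce an instance label to a bare hostname.

    Assumption: labels are hostnames or FQDNs with an optional ':port'
    suffix, never IP addresses.
    """
    host = instance.split(":", 1)[0]  # drop ':port' if present
    return host.split(".", 1)[0]      # drop the domain part if present


# Hypothetical sample series, keyed by their raw instance label.
disk_io_time = {"ceph-node1.example.com": 0.42}  # FQDN-style label
disk_occupation = {"ceph-node1:9283": "osd.0"}   # hostname:port-style label

# The raw labels share no key, so a direct join would be empty.
assert not (disk_io_time.keys() & disk_occupation.keys())

# Re-key both sides by the normalized hostname so they can be matched.
io_by_host = {short_hostname(k): v for k, v in disk_io_time.items()}
osd_by_host = {short_hostname(k): v for k, v in disk_occupation.items()}

joined = {
    host: (io_by_host[host], osd_by_host[host])
    for host in io_by_host.keys() & osd_by_host.keys()
}
print(joined)
```

In the dashboards themselves this normalization is done on the query side; the sketch only mirrors the idea of making the join key identical on both sides before matching.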

52 Commits
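
Commit 4444333243 replaces `[[...]]` variable references with `$...` in a PromQL expression. As a rough illustration of that rewrite, here is a minimal Python sketch. The helper name `modernize_variables` and the sample expression (including the metric name `some_metric`) are hypothetical, and the pattern only handles the plain `[[name]]` form; anything else is left untouched.

```python
import re

# Matches the plain [[name]] form only; other bracketed forms are not touched.
_BRACKET_VAR = re.compile(r"\[\[(\w+)\]\]")


def modernize_variables(expr: str) -> str:
    """Rewrite every [[name]] reference in expr as $name."""
    # In a replacement string '$' is literal and '\1' is the captured name.
    return _BRACKET_VAR.sub(r"$\1", expr)


# Hypothetical expression; 'some_metric' is a placeholder, not a real metric.
old_expr = 'sum(some_metric{instance=~"[[osd_hosts]]"})'
new_expr = modernize_variables(old_expr)
print(new_expr)
```

A helper like this could be run over the `expr` fields of a dashboard JSON file, but each rewritten query would still need to be checked in Grafana.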