RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2024-12-19 01:46:00 +00:00

Author	SHA1	Message	Date
Josh Soref	73479a1e05	dashboard: fix spelling errors * access * availability * dashboard * depth * dimless * evaluation * executing * existing * facts * gigabytes * idempotent * independent * initial * inventory * managed * must not * notification * notifications * orchestrator * previously * promises * purging * queried * repetitive * split * subdirectories * tenant * the * timestamp * transformed * unavailable * visibility * yourself Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>	2023-08-09 11:14:20 -04:00
Aashish Sharma	3063c8a4fb	Merge pull request #48783 from rhcs-dashboard/fix-grafana-stat-panel mgr/dashboard: Replace vonage-status-panel with native grafana stat panel Reviewed-by: Nizamudeen A <nia@redhat.com>	2023-02-08 18:34:30 +05:30
Pere Diaz Bou	8e07fbd2ea	Merge pull request #48843 from rhcs-dashboard/expose_slow_ops mgr/prometheus: expose daemon health metrics Reviewed-by: Anthony D Atri <anthony.datri@gmail.com> Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2022-12-20 12:25:32 +01:00
Pere Diaz Bou	5a2b7c25b6	mgr/prometheus: expose daemon health metrics Until now daemon health metrics were stored without being used. One of the most helpful metrics there is SLOW_OPS with respect to OSDs and MONs which this commit tries to expose to bring fine grained metrics to find troublesome OSDs instead of having a lone healthcheck of slow ops in the whole cluster. Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>	2022-12-20 09:44:49 +01:00
Aashish Sharma	3e08b81b40	mgr/dashboard: Replace vonage-status-panel with native grafana stat panel Fixes: https://tracker.ceph.com/issues/58295 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2022-12-16 10:51:47 +05:30
Kefu Chai	34e2e33870	*: s/whitelist_externals/allowlist_externals/ as allowlist_externals was introduced in tox v4.0. see `5e33fda1a4` , but this option was backported to 3.18 as an alias of whitelist_externals, so we don't need to specify the minversion to 4.0 in this change. as we started using tox 4.0 and up (v4.0.2 in specific). tox complains and fails like: alerts-lint: failed with promtool is not allowed, use allowlist_externals to allow it alerts-lint: FAIL code 1 (9.25 seconds) see https://tox.wiki/en/latest/faq.html#tox-4-removed-tox-ini-keys and https://tox.wiki/en/latest/config.html#allowlist_externals it'd be nice to use a more inclusive language also. so, in this change, s/whitelist_externals/allowlist_externals/ in all tox.ini in this project. Signed-off-by: Kefu Chai <tchaikov@gmail.com>	2022-12-08 15:07:00 +08:00
Nizamudeen A	3f1c1b6376	Merge pull request #48526 from rhcs-dashboard/fix-cephPoolGrowth-alert mgr/dashboard: Fix CephPoolGrowthWarning alert Reviewed-by: Pegonzal <NOT@FOUND> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2022-11-29 18:29:01 +05:30
Aashish Sharma	97189b66af	mgr/dashboard: Fix CephPoolGrowthWarning alert Prometheus reports an error - many-to-many matching not allowed: matching labels must be unique on one side for CephPoolGrowthWarning if we have same pool ids on two different instances. Fixes: https://tracker.ceph.com/issues/58017 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2022-11-22 11:55:41 +05:30
Tatjana Dehler	08352b6540	ceph-mixin: fix ceph_hosts variable Use only `instance` to query for hostnames in single-cluster mode. Consider the cluster matcher only in multi-cluster mode. In this case the query will look like: `"label_values({cluster=~\"$cluster\"}, instance)"`. Fixes: https://tracker.ceph.com/issues/57987 Signed-off-by: Tatjana Dehler <tdehler@suse.com>	2022-11-11 16:35:05 +01:00
Christian Kugler	4aecdad350	ceph-mixin: Add Prometheus Alert for Degraded Bond Currently there is no alert for a network interface card to be misconfigured or failed which is part of a network bond. This could lead to redundancies and performance being degraded unnoticed. To solve this, I use node exporter metrics to look at the number of total peers of the bond and the ones that are active. If the numbers differ, something is up and should be looked at. Fixes: https://tracker.ceph.com/issues/57962 Signed-off-by: Christian Kugler <syphdias+git@gmail.com>	2022-11-02 14:48:57 +01:00
zdover23	23aa2be306	Merge pull request #47305 from zdover23/wip-doc-2022-07-25-pr4600-cleanup doc/monitoring: add min vers of apps in mon stack Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com>	2022-09-13 13:44:43 +10:00
Nizamudeen A	d84a03e989	Merge pull request #47700 from s0nea/wip-rgw-overview-labels monitoring/ceph-mixin: add RGW host to label info Reviewed-by: MrFreezeex <NOT@FOUND> Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>	2022-09-09 17:36:40 +05:30
Tatjana Dehler	15fa97d49d	monitoring/ceph-mixin: add RGW host to label info Add the missing information about the RGW instance to the labels of the "Average GET/PUT Latencies" panel on the "RGW Overview" dashboard. Fixes: https://tracker.ceph.com/issues/57166 Signed-off-by: Tatjana Dehler <tdehler@suse.com>	2022-09-06 16:19:19 +02:00
Zac Dover	367695f5b0	doc/monitoring: add min vers of apps in mon stack https://tracker.ceph.com/issues/45447 This PR adds recommended versions of grafana and prometheus and alert manager. This PR is a second attempt at getting the information in the following PR into the docs: https://github.com/ceph/ceph/pull/46000/files Himadri Maheshwari deserves the credit for the work in this commit. Signed-off-by: Zac Dover <zac.dover@gmail.com> Signed-off-by: Himadri Maheshwari <himadri.maheshwari7915@gmail.com>	2022-09-05 07:36:52 +10:00
Arthur Outhenin-Chalandre	f744a93ef1	Merge pull request #47707 from bosc0/fix_alert Ceph-mixin: Fix CephNodeNetworkPacket alerts	2022-08-30 12:49:23 +02:00
Arthur Outhenin-Chalandre	4909e795c9	Merge pull request #47669 from MrFreezeex/jb-path ceph-mixin: fix PATH issues with jsonnet-bundler	2022-08-30 08:35:04 +02:00
Aswin Toni	351e1ac639	ceph-mixin: fix CephNodeNetworkPacket alerts Signed-off-by: Aswin Toni <aswin.toni@cern.ch>	2022-08-23 15:26:52 +02:00
Tatjana Dehler	42ff9370a0	monitoring/ceph-mixin: add entries to envlist Add the missing entries `jsonnet-bundler-install` and `jsonnet-bundler-update` to envlist. Signed-off-by: Tatjana Dehler <tdehler@suse.com>	2022-08-19 12:08:56 +02:00
Aswin Toni	35183140f6	ceph-mixin: fix config inheritance Signed-off-by: Aswin Toni <aswin.toni@cern.ch>	2022-08-18 16:21:36 +02:00
Arthur Outhenin-Chalandre	d46e14c71b	ceph-mixin: fix PATH issues with jsonnet-bundler In `4a3afcf`, the $PATH is set for the test, but we cannot set multiple properties with a single `set_property()` cmake command. We fix that by adding the installation path of jsonnet-bundler (CMAKE_CURRENT_BINARY_DIR) to the $PATH used for every tox test. Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch> Co-Authored-By: Kefu Chai <tchaikov@gmail.com>	2022-08-18 13:43:34 +02:00
Aswin Toni	2e0e684fc2	ceph-mixin: Remove jsonnet building Signed-off-by: Aswin Toni <aswin.toni@cern.ch>	2022-08-17 12:08:56 +02:00
Aswin Toni	5cdc1c62c5	prometheus: add multicluster support to alerts Signed-off-by: Aswin Toni <aswin.toni@cern.ch>	2022-08-17 12:08:56 +02:00
Kefu Chai	4a3afcf277	cmake: set $PATH for tests using jsonnet tools otherwise they would not be able to find executables installed into ${CMAKE_CURRENT_BINARY_DIR}. Signed-off-by: Kefu Chai <tchaikov@gmail.com>	2022-08-16 10:53:29 +08:00
Nizamudeen A	e9d361f621	Merge pull request #47334 from s0nea/wip-osd-objectstore-types-fix monitoring/ceph-mixin: OSD overview typo fix Reviewed-by: MrFreezeex <NOT@FOUND> Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2022-08-01 13:47:03 +05:30
Anthony D'Atri	9b65974468	monitoring/ceph-mixin: clean up prometheus_alerts.yml Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>	2022-07-28 19:17:51 -07:00
Tatjana Dehler	8faaca2082	monitoring/ceph-mixin: OSD overview typo fix Correct a wrongly set bracket on ceph-dashboard -> OSD Overview -> OSD Objectstore Types resulting in a parser error. Fixes: https://tracker.ceph.com/issues/56948 Signed-off-by: Tatjana Dehler <tdehler@suse.com>	2022-07-28 15:15:32 +02:00
Arthur Outhenin-Chalandre	37add644d1	ceph-mixin: remove timepicker override in every dashboards Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-24 11:54:26 +02:00
Arthur Outhenin-Chalandre	5db37300fd	ceph-mixin: rationalize local helper functions to utils Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-24 11:50:49 +02:00
Arthur Outhenin-Chalandre	0b7cc6bc99	ceph-mixin: fix typos Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-18 10:02:54 +02:00
Arthur Outhenin-Chalandre	c8f086c182	ceph-mixin: fix test with rate and label changes Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-17 09:42:29 +02:00
Arthur Outhenin-Chalandre	3b6356c872	ceph-mixin: don't add cluster matcher if showcluster is disabled Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-17 09:41:21 +02:00
Arthur Outhenin-Chalandre	fd4f484d22	ceph-mixin: refactor the structure of _config and utils Before this refactor we couldn't override the config externally. Now the _config is correctly propagated and not only taken from the config.libsonnet file. Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-16 15:26:56 +02:00
Arthur Outhenin-Chalandre	4595e9af23	ceph-mixin: fix makefile dashboards dependency Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-16 15:26:55 +02:00
Arthur Outhenin-Chalandre	faeea8d165	ceph-mixin: fix linting issue and add cluster template support Fix most of the issues reported by dashboards-linter: - Add matcher/template for job (and also cluster) - use $__rate_interval everywhere This change also converts all irate functions to rate, as most uses of irate were not actually correct. When using irate on a graph, for instance, you can easily miss some of the metric values, because irate only takes the last two values and the query step can be quite large if you want a graph covering a few hours, a day, or more. Fixes: https://tracker.ceph.com/issues/55003 Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch> ceph-mixin: add config with matchers and tags Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-16 15:26:53 +02:00
Arthur Outhenin-Chalandre	1452311a9b	ceph-mixin: rewrite promql queries to multiline Fixes: https://tracker.ceph.com/issues/55005 Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-04-27 17:55:52 +02:00
Ernesto Puerta	a98c2475c6	Merge pull request #45254 from travisn/prometheus-rules-typos prometheus: Spell check the alert descriptions Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Laura Flores <lflores@redhat.com> Reviewed-by: Michael Fritch <mfritch@suse.com> Reviewed-by: Nizamudeen A <nia@redhat.com> Reviewed-by: sunilangadi2 <NOT@FOUND> Reviewed-by: Travis Nielsen <tnielsen@redhat.com>	2022-04-04 13:46:00 +02:00
Ernesto Puerta	2d1c480f5a	Merge pull request #45583 from p-se/monitoring-alert-mtu-group-by-devices mgr/dashboard: Compare values of MTU alert by device Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com> Reviewed-by: p-se <NOT@FOUND>	2022-04-01 11:11:30 +02:00
Travis Nielsen	9cca95b16a	prometheus: spell check the alert descriptions Signed-off-by: Travis Nielsen <tnielsen@redhat.com>	2022-03-30 17:38:43 -06:00
Aashish Sharma	9719cc795e	mgr/dashboard: Pool overall performance shows multiple entries of same pool in pool overview This PR intends to fix this issue Fixes:https://tracker.ceph.com/issues/54513 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2022-03-28 18:25:25 +05:30
Aashish Sharma	49d6068463	mgr/dashboard: fix promtool test for mtu alert Fixes: https://tracker.ceph.com/issues/55004 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2022-03-28 13:39:38 +02:00
Patrick Seidensal	3821548a37	mgr/dashboard: Compare values of MTU alert by device Fixes: https://tracker.ceph.com/issues/55004 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2022-03-28 13:38:15 +02:00
Rishabh Dave	a6f5efb620	monitoring: mention PyYAML only once in requirements Following error occurs while running "sudo install-deps.sh" - ERROR: Double requirement given: PyYAML==6.0 (from -r requirements-lint.txt (line 5)) (already in pyyaml (from -r requirements-alerts.txt (line 1)), name='PyYAML') PyYAML is mentioned twice as a requirement. It is mentioned once in both the following files - monitoring/ceph-mixin/requirements-lint.txt monitoring/ceph-mixin/requirements-alerts.txt These requirements were added in commits `44d3e4c264` and `4750ac0d77`. Fixes: https://tracker.ceph.com/issues/54185 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2022-02-08 11:19:15 +05:30
Nizamudeen A	27592b7561	cephadm: change shared_folder directory for prometheus and grafana After https://github.com/ceph/ceph/pull/44059 the monitoring/prometheus and monitoring/grafana/dashboards directories are changed to monitoring/ceph-mixins. That broke the shared_folders in the cephadm bootstrap script. Changed all the instances of monitoring/prometheus and monitoring/grafana/dashboards to monitoring/ceph-mixins Also, renaming all the instances of prometheus_alerts.yaml to prometheus_alerts.yml. Fixes: https://tracker.ceph.com/issues/54176 Signed-off-by: Nizamudeen A <nia@redhat.com>	2022-02-07 16:34:37 +05:30
Arthur Outhenin-Chalandre	8ff1e6b399	monitoring: build jsonnet/jb only for testing Build jsonnet and jb in the test so that we can build ceph without internet access and still be able to run the tests needed for monitoring using jsonnet tools. Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-02-03 13:08:37 +01:00
Arthur Outhenin-Chalandre	ecaf9070ae	spec: debian: monitoring: build jsonnet from source to use 0.18.0 As this new version is recently released it's still not in every distro we use. We now build jsonnet from source so that we can use this new version of jsonnet. This commit could be reverted later on when the new version would be available everywhere. Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-02-03 13:08:36 +01:00
Arthur Outhenin-Chalandre	98236e3a1d	mgr/dashboard: monitoring: refactor into ceph-mixin Mixin is a way to bundle dashboards, prometheus rules and alerts into a jsonnet package. Shifting to mixin will allow easier integration with monitoring automation that some users may use. This commit moves `/monitoring/grafana/dashboards` and `/monitoring/prometheus` to `/monitoring/ceph-mixin`. Prometheus alerts were also converted to Jsonnet in an automated way (from yaml to json to jsonnet). This commit minimises changes to the generated files and should change neither the dashboards nor the Prometheus alerts. In the future some configuration will also be added to jsonnet to add more functionality to the dashboards or alerts (e.g. multi-cluster). Fixes: https://tracker.ceph.com/issues/53374 Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-02-03 13:08:20 +01:00

46 Commits
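
Commit faeea8d165 above switches the dashboards from irate to rate on the grounds that irate only looks at the last two samples in a window. The sketch below illustrates that reasoning on made-up data. It is not code from the repository and not Prometheus's actual implementation: the helper names `simple_rate` and `simple_irate` and the sample values are hypothetical, and counter resets and Prometheus's window extrapolation are deliberately ignored.

```python
from typing import List, Tuple

# One scraped sample: (timestamp in seconds, cumulative counter value).
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase between the first and last sample.

    Simplified stand-in for PromQL rate(): no counter-reset handling,
    no extrapolation to the window boundaries.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def simple_irate(samples: List[Sample]) -> float:
    """Per-second increase between the last two samples only.

    Simplified stand-in for PromQL irate().
    """
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical window: a burst of activity in the first 30 s, then idle.
window: List[Sample] = [
    (0.0, 0.0),
    (15.0, 1500.0),
    (30.0, 3000.0),
    (45.0, 3000.0),
    (60.0, 3000.0),
]

print(simple_rate(window))   # 50.0: the burst is averaged over the window
print(simple_irate(window))  # 0.0: the burst is invisible to the last two samples
```

With a large query step, each plotted point covers a window like this one, so the irate-style calculation can drop the burst entirely while the rate-style calculation still reflects it.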