RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2024-12-18 17:37:38 +00:00

Author	SHA1	Message	Date
Tatjana Dehler	42ff9370a0	monitoring/ceph-mixin: add entries to envlist Add the missing entries `jsonnet-bundler-install` and `jsonnet-bundler-update` to envlist. Signed-off-by: Tatjana Dehler <tdehler@suse.com>	2022-08-19 12:08:56 +02:00
Aswin Toni	2e0e684fc2	ceph-mixin: Remove jsonnet building Signed-off-by: Aswin Toni <aswin.toni@cern.ch>	2022-08-17 12:08:56 +02:00
Aswin Toni	5cdc1c62c5	prometheus: add multicluster support to alerts Signed-off-by: Aswin Toni <aswin.toni@cern.ch>	2022-08-17 12:08:56 +02:00
Kefu Chai	4a3afcf277	cmake: set $PATH for tests using jsonnet tools otherwise they would not be able to find executables installed into ${CMAKE_CURRENT_BINARY_DIR}. Signed-off-by: Kefu Chai <tchaikov@gmail.com>	2022-08-16 10:53:29 +08:00
Nizamudeen A	e9d361f621	Merge pull request #47334 from s0nea/wip-osd-objectstore-types-fix monitoring/ceph-mixin: OSD overview typo fix Reviewed-by: MrFreezeex <NOT@FOUND> Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2022-08-01 13:47:03 +05:30
Anthony D'Atri	9b65974468	monitoring/ceph-mixin: clean up prometheus_alerts.yml Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>	2022-07-28 19:17:51 -07:00
Tatjana Dehler	8faaca2082	monitoring/ceph-mixin: OSD overview typo fix Correct a wrongly set bracket on ceph-dashboard -> OSD Overview -> OSD Objectstore Types resulting in a parser error. Fixes: https://tracker.ceph.com/issues/56948 Signed-off-by: Tatjana Dehler <tdehler@suse.com>	2022-07-28 15:15:32 +02:00
Arthur Outhenin-Chalandre	37add644d1	ceph-mixin: remove timepicker override in every dashboards Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-24 11:54:26 +02:00
Arthur Outhenin-Chalandre	5db37300fd	ceph-mixin: rationalize local helper functions to utils Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-24 11:50:49 +02:00
Arthur Outhenin-Chalandre	0b7cc6bc99	ceph-mixin: fix typos Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-18 10:02:54 +02:00
Arthur Outhenin-Chalandre	c8f086c182	ceph-mixin: fix test with rate and label changes Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-17 09:42:29 +02:00
Arthur Outhenin-Chalandre	3b6356c872	ceph-mixin: don't add cluster matcher if showcluster is disabled Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-17 09:41:21 +02:00
Arthur Outhenin-Chalandre	fd4f484d22	ceph-mixin: refactor the structure of _config and utils Before this refactor we couldn't override the config externally. Now the _config is correctly propagated and not only taken from the config.libsonnet file. Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-16 15:26:56 +02:00
Arthur Outhenin-Chalandre	4595e9af23	ceph-mixin: fix makefile dashboards dependency Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-16 15:26:55 +02:00
Arthur Outhenin-Chalandre	faeea8d165	ceph-mixin: fix linting issue and add cluster template support Fix most of the issues reported by dashboards-linter: - Add matcher/template for job (and also cluster) - use $__rate_interval everywhere This change also converts all the irate functions to rate, as most of the irate calls were not actually used correctly. When using irate on a graph, for instance, you can easily miss some of the metric values, as irate only takes the last two values and the query steps can be quite large if you want a graph covering a few hours, a day, or more. Fixes: https://tracker.ceph.com/issues/55003 Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch> ceph-mixin: add config with matchers and tags Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-05-16 15:26:53 +02:00
Arthur Outhenin-Chalandre	1452311a9b	ceph-mixin: rewrite promql queries to multiline Fixes: https://tracker.ceph.com/issues/55005 Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-04-27 17:55:52 +02:00
Aashish Sharma	2877920f58	mgr/dashboard: upgrade grafana pie-chart and vonage-status-panel versions Fixes:https://tracker.ceph.com/issues/55195 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2022-04-06 15:24:41 +05:30
Ernesto Puerta	8721bd6c5d	monitoring/grafana: fix version Fixes: https://tracker.ceph.com/issues/55172 Signed-off-by: Ernesto Puerta <epuertat@redhat.com>	2022-04-04 13:52:43 +02:00
Ernesto Puerta	a98c2475c6	Merge pull request #45254 from travisn/prometheus-rules-typos prometheus: Spell check the alert descriptions Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Laura Flores <lflores@redhat.com> Reviewed-by: Michael Fritch <mfritch@suse.com> Reviewed-by: Nizamudeen A <nia@redhat.com> Reviewed-by: sunilangadi2 <NOT@FOUND> Reviewed-by: Travis Nielsen <tnielsen@redhat.com>	2022-04-04 13:46:00 +02:00
David Galloway	b4910a6627	Merge pull request #45739 from rhcs-dashboard/fix-55155-master grafana/Makefile: don't push to docker	2022-04-01 13:30:05 -04:00
Ernesto Puerta	7e6309fac3	grafana/Makefile: don't push to docker Fixes: https://tracker.ceph.com/issues/55155 Signed-off-by: Ernesto Puerta <epuertat@redhat.com>	2022-04-01 11:44:43 +02:00
Ernesto Puerta	2d1c480f5a	Merge pull request #45583 from p-se/monitoring-alert-mtu-group-by-devices mgr/dashboard: Compare values of MTU alert by device Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com> Reviewed-by: p-se <NOT@FOUND>	2022-04-01 11:11:30 +02:00
Ernesto Puerta	87f494eda0	Merge pull request #45578 from rhcs-dashboard/fix-grafana-build mgr/dashboard: remove transition-through-oci image workaround in grafana build Reviewed-by: Dan Mick <dmick@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com>	2022-03-31 19:58:29 +02:00
Travis Nielsen	9cca95b16a	prometheus: spell check the alert descriptions Signed-off-by: Travis Nielsen <tnielsen@redhat.com>	2022-03-30 17:38:43 -06:00
Ernesto Puerta	043f7953d8	Merge pull request #45335 from rhcs-dashboard/fix-54513-master mgr/dashboard: Pool overall performance shows multiple entries of same pool in pool overview Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com> Reviewed-by: sunilangadi2 <NOT@FOUND>	2022-03-30 14:05:38 +02:00
Aashish Sharma	9719cc795e	mgr/dashboard: Pool overall performance shows multiple entries of same pool in pool overview This PR intends to fix this issue Fixes:https://tracker.ceph.com/issues/54513 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2022-03-28 18:25:25 +05:30
Aashish Sharma	49d6068463	mgr/dashboard: fix promtool test for mtu alert Fixes: https://tracker.ceph.com/issues/55004 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2022-03-28 13:39:38 +02:00
Patrick Seidensal	3821548a37	mgr/dashboard: Compare values of MTU alert by device Fixes: https://tracker.ceph.com/issues/55004 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2022-03-28 13:38:15 +02:00
Aashish Sharma	64b0e5ce8a	mgr/dashboard: fix transition-through-oci image workaround in grafana build Fixes: https://tracker.ceph.com/issues/54311 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2022-03-23 13:59:28 +05:30
Aashish Sharma	c306778889	mgr/dashboard/monitoring: update grafana version Fixes: https://tracker.ceph.com/issues/54311 Signed-off-by: Aashish Sharma <aasharma@redhat.com>	2022-03-21 17:40:03 +05:30
Rishabh Dave	a6f5efb620	monitoring: mention PyYAML only once in requirements Following error occurs while running "sudo install-deps.sh" - ERROR: Double requirement given: PyYAML==6.0 (from -r requirements-lint.txt (line 5)) (already in pyyaml (from -r requirements-alerts.txt (line 1)), name='PyYAML') PyYAML is mentioned twice as a requirement. It is mentioned once in both the following files - monitoring/ceph-mixin/requirements-lint.txt monitoring/ceph-mixin/requirements-alerts.txt These requirements were added in commits `44d3e4c264` and `4750ac0d77`. Fixes: https://tracker.ceph.com/issues/54185 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2022-02-08 11:19:15 +05:30
Nizamudeen A	27592b7561	cephadm: change shared_folder directory for prometheus and grafana After https://github.com/ceph/ceph/pull/44059 the monitoring/prometheus and monitoring/grafana/dashboards directories are changed to monitoring/ceph-mixins. That broke the shared_folders in the cephadm bootstrap script. Changed all the instances of monitoring/prometheus and monitoring/grafana/dashboards to monitoring/ceph-mixins Also, renaming all the instances of prometheus_alerts.yaml to prometheus_alerts.yml. Fixes: https://tracker.ceph.com/issues/54176 Signed-off-by: Nizamudeen A <nia@redhat.com>	2022-02-07 16:34:37 +05:30
Ernesto Puerta	6a4b1e148d	Merge pull request #44796 from pcuzner/remove-old-mib monitoring: remove old MIB Reviewed-by: Alfonso Martínez <almartin@redhat.com> Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com> Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>	2022-02-04 17:42:08 +01:00
Arthur Outhenin-Chalandre	8ff1e6b399	monitoring: build jsonnet/jb only for testing Build jsonnet and jb in the test so that we can build ceph without internet access and still be able to run the tests needed for monitoring using jsonnet tools. Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-02-03 13:08:37 +01:00
Arthur Outhenin-Chalandre	ecaf9070ae	spec: debian: monitoring: build jsonnet from source to use 0.18.0 As this new version is recently released it's still not in every distro we use. We now build jsonnet from source so that we can use this new version of jsonnet. This commit could be reverted later on when the new version would be available everywhere. Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-02-03 13:08:36 +01:00
Arthur Outhenin-Chalandre	98236e3a1d	mgr/dashboard: monitoring: refactor into ceph-mixin A mixin is a way to bundle dashboards, prometheus rules and alerts into a jsonnet package. Shifting to a mixin will allow easier integration with monitoring automation that some users may use. This commit moves `/monitoring/grafana/dashboards` and `/monitoring/prometheus` to `/monitoring/ceph-mixin`. Prometheus alerts were also converted to Jsonnet in an automated way (from yaml to json to jsonnet). This commit minimises any change made to the generated files and should change neither the dashboards nor the Prometheus alerts. In the future some configuration will also be added to jsonnet to add more functionality to the dashboards or alerts (e.g. multi cluster). Fixes: https://tracker.ceph.com/issues/53374 Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>	2022-02-03 13:08:20 +01:00
Ernesto Puerta	c47ace9215	Merge pull request #43707 from BenoitKnecht/ceph-mgr-service-id mgr: Fix ceph_daemon label in ceph_rgw_* metrics Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>	2022-02-02 18:39:57 +01:00
Paul Cuzner	cbeab5c566	monitoring: remove old MIB The MIB file that matches the OID definitions in the alerts is CEPH-MIB.txt. The old MIB, from the original SuSE snmp gateway work, therefore needs to be removed to avoid confusion. Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2022-01-27 11:24:34 +13:00
Pere Diaz Bou	57c26311de	monitoring/grafana: replace filestore osd count Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>	2022-01-18 14:14:41 +01:00
Pere Diaz Bou	a3cf5c5e9f	monitoring/grafana: use Path class instead of split Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>	2022-01-18 13:24:12 +01:00
Pere Diaz Bou	1e4d85d04f	monitoring/grafana: remove explicit str casting Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>	2022-01-18 13:24:12 +01:00
Pere Diaz Bou	2b4f3561d2	monitoring/grafana: add generated json files Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>	2022-01-18 13:24:12 +01:00
Pere Diaz Bou	b381a83e9b	monitoring/grafana: ValueError instead of RuntimeError Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>	2022-01-18 13:24:12 +01:00
Pere Diaz Bou	4c302234ff	monitoring/grafana: Replace missing legendFormat warning with error Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>	2022-01-18 13:24:10 +01:00
Patrick Seidensal	7d7488018e	monitoring: Add unit tests for OSD panels in ceph-cluster dashboard Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2022-01-13 13:27:55 +01:00
Patrick Seidensal	4a6b2c1dfb	monitoring: fix display ceph_osd_in in Grafana panel Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2022-01-13 13:27:55 +01:00
Patrick Seidensal	18d3a71618	mgr/prometheus: Fix regression with OSD/host details/overview dashboards Fix issues with PromQL expressions and vector matching with the `ceph_disk_occupation` metric. As it turns out, `ceph_disk_occupation` cannot simply be used as expected, as there seem to be some edge cases for users that have several OSDs on a single disk. This leads to issues which cannot be approached by PromQL alone (many-to-many PromQL errors). The data we have expected is simply different in some rare cases. I have not found a sole PromQL solution to this issue. What we basically need is the following. 1. Match on labels `host` and `instance` to get one or more OSD names from a metadata metric (`ceph_disk_occupation`) to let a user know about which OSDs belong to which disk. 2. Match on labels `ceph_daemon` of the `ceph_disk_occupation` metric, in which case the value of `ceph_daemon` must not refer to more than a single OSD. The exact opposite to requirement 1. As both operations are currently performed on a single metric, and there is no way to satisfy both requirements on a single metric, the intention of this commit is to extend the metric by providing a similar metric that satisfies one of the requirements. This enables the queries to differentiate between a vector matching operation to show a string to the user (where `ceph_daemon` could possibly be `osd.1` or `osd.1+osd.2`) and to match a vector by having a single `ceph_daemon` in the condition for the matching. Although the `ceph_daemon` label is used on a variety of daemons, only OSDs seem to be affected by this issue (only if more than one OSD is run on a single disk). This means that only the `ceph_disk_occupation` metadata metric seems to need to be extended and provided as two metrics. `ceph_disk_occupation` is supposed to be used for matching the `ceph_daemon` label value: `foo * on(ceph_daemon) group_left ceph_disk_occupation`. `ceph_disk_occupation_human` is supposed to be used for anything where the resulting data is displayed to be consumed by humans (graphs, alert messages, etc.): `foo * on(device,instance) group_left(ceph_daemon) ceph_disk_occupation_human`. Fixes: https://tracker.ceph.com/issues/52974 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>	2022-01-13 13:27:55 +01:00
Benoît Knecht	2daaa052ea	monitoring/grafana: Add tests for radosgw panels Some of the expressions modified in c40290390d7 were not covered by any tests, especially those in the `radosgw-detail.json` dashboard. This commit fills in those gaps. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2022-01-11 13:17:48 +01:00
Benoît Knecht	adc36dea7f	monitoring/grafana: Update radosgw dashboards With the `ceph_daemon` label now replaced by `instance_id` on all `ceph_rgw_` metrics, we need to update Grafana dashboards to get that label back from `ceph_rgw_metadata` using this type of construct: ``` ceph_rgw_req * on (instance_id) group_left(ceph_daemon) ceph_rgw_metadata ``` Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2022-01-11 13:17:20 +01:00
Ernesto Puerta	978d5829f2	Merge pull request #44294 from rhcs-dashboard/feature-bluestore-onode mgr/dashboard: monitoring:Implement BlueStore onode hit/miss counters into the dashboard Reviewed-by: Aashish Sharma <aasharma@redhat.com> Reviewed-by: Alfonso Martínez <almartin@redhat.com> Reviewed-by: Avan Thakkar <athakkar@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com> Reviewed-by: Laura Flores <lflores@redhat.com> Reviewed-by: neha-ojha <NOT@FOUND> Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>	2022-01-11 11:24:21 +01:00


191 Commits