
24 Commits

Author SHA1 Message Date
Aashish Sharma
2573426f54 mgr/dashboard: upgrade from old 'graph' type panels to the new
'timeseries' panel

The graph panel type is deprecated and disappears after Grafana v9.1 (the current version is 10.0), so that no more old-type panels are created. Existing graph panels should be migrated to the timeseries panel type to avoid potential problems with future Grafana versions.

Fixes: https://tracker.ceph.com/issues/61720

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2023-12-22 11:19:40 +05:30
Nizamudeen A
a42e286fc0
Merge pull request #54355 from nobuto-m/info-rbd-stats-pools
mgr/dashboard: info on why RBD graphs are empty

Reviewed-by: Ankush Behl <cloudbehl@gmail.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
2023-11-30 13:38:54 +05:30
Aashish Sharma
39fea8f71c
Merge pull request #51340 from Javlopez/feature/12087-upgrade-and-generate-grafana-dashboards
monitoring: add new dashboards

Fixes: https://tracker.ceph.com/issues/63592

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
2023-11-20 11:33:07 +05:30
Aashish Sharma
70d8c5b565
Merge pull request #53650 from rhcs-dashboard/fix-62969-main
mgr/dashboard: Show the OSDs Out and Down panels as red whenever an OSD is in Out or Down state in Ceph Cluster grafana dashboard

Reviewed-by: Nizamudeen A <nia@redhat.com>
2023-11-17 11:24:45 +05:30
Nobuto Murata
9c026fa18c mgr/dashboard: info on why RBD graphs are empty
Those RBD IO statistics graphs are empty out of the box, and this is on
purpose. Instead of giving the impression that those graphs are broken,
point users to the documentation explaining the optional steps to enable
those statistics.
https://docs.ceph.com/en/latest/mgr/prometheus/#rbd-io-statistics

Signed-off-by: Nobuto Murata <nobuto.murata@canonical.com>
2023-11-06 15:50:50 +09:00
Javier
f0e8565b49 monitoring: update libsonnet files to generate ceph-cluster.json
Add the ceph-cluster.libsonnet file to generate ceph-cluster.json

Fixes: https://tracker.ceph.com/issues/61443
Signed-off-by: Javier <sjavierlopez@gmail.com>
2023-10-20 18:07:33 -06:00
Aashish Sharma
a29e6a8673 mgr/dashboard: Show the OSDs Out and Down panels as red whenever an OSD is in Out or Down state in Ceph Cluster grafana dashboard
Fixes: https://tracker.ceph.com/issues/62969

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2023-10-11 11:46:03 +05:30
Aashish Sharma
6f3f58cb8e mgr/dashboard: Consider null values as zero in grafana panels
After upgrading from RHCS4 to RHCS5, some of the Grafana charts broke.
This is because RHCS5 does not generate a metric if its value is zero;
as a result, the null value from that metric breaks the Grafana charts
or graphs. This PR fixes that issue by treating null values as zero.

Fixes: https://tracker.ceph.com/issues/63088

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2023-10-04 12:31:42 +05:30
Paul Reece
6ff02381a3 monitoring: grafana mons out of quorum should be count - sum
not count / sum

For example, with 3 mons total, all in quorum, the original query
computes 3 / 3 = 1, showing 1 mon out of quorum; the corrected query
computes 3 - 3 = 0 (likely a typo fix)

Fixes: https://tracker.ceph.com/issues/61923

Signed-off-By: Paul Reece <paulreece42@gmail.com>

fixing case sensitive
Signed-off-by: Paul Reece <paulreece42@gmail.com>
2023-08-07 16:16:18 +00:00
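The arithmetic in the commit above can be checked with a minimal Python sketch. It assumes each monitor reports a 0/1 quorum flag (1 = in quorum); the function names are hypothetical and only mirror the `count` and `sum` aggregations named in the commit message, not the actual dashboard query.

```python
def mons_out_of_quorum(quorum_flags):
    """Corrected form: count - sum.

    quorum_flags holds one 0/1 value per monitor (assumed: 1 = in quorum).
    """
    total = len(quorum_flags)      # corresponds to count(...)
    in_quorum = sum(quorum_flags)  # corresponds to sum(...)
    return total - in_quorum


def mons_out_of_quorum_original(quorum_flags):
    """Original form: count / sum, which yields 1 when every mon is in quorum."""
    return len(quorum_flags) / sum(quorum_flags)


all_in_quorum = [1, 1, 1]
one_missing = [1, 1, 0]

print(mons_out_of_quorum(all_in_quorum))           # 0
print(mons_out_of_quorum_original(all_in_quorum))  # 1.0
print(mons_out_of_quorum(one_missing))             # 1
```

With three healthy monitors, the division reports one monitor out of quorum while the subtraction reports none, which is the discrepancy the commit describes.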
Aashish Sharma
3063c8a4fb
Merge pull request #48783 from rhcs-dashboard/fix-grafana-stat-panel
mgr/dashboard: Replace vonage-status-panel with native grafana stat panel


Reviewed-by: Nizamudeen A <nia@redhat.com>
2023-02-08 18:34:30 +05:30
Pere Diaz Bou
8e07fbd2ea
Merge pull request #48843 from rhcs-dashboard/expose_slow_ops
mgr/prometheus: expose daemon health metrics

Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
2022-12-20 12:25:32 +01:00
Pere Diaz Bou
5a2b7c25b6 mgr/prometheus: expose daemon health metrics
Until now, daemon health metrics were stored without being used. One of
the most helpful metrics there is SLOW_OPS for OSDs and MONs, which this
commit tries to expose. This brings fine-grained metrics for finding
troublesome OSDs, instead of a lone health check for slow ops across the
whole cluster.

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
2022-12-20 09:44:49 +01:00
Aashish Sharma
3e08b81b40 mgr/dashboard: Replace vonage-status-panel with native grafana stat panel
Fixes: https://tracker.ceph.com/issues/58295
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2022-12-16 10:51:47 +05:30
Tatjana Dehler
08352b6540
ceph-mixin: fix ceph_hosts variable
Only use `instance` to query for hostnames in single-cluster mode.
Consider the cluster matcher only in multi-cluster mode. In this case
the query will look like:
`"label_values({cluster=~\"$cluster\"}, instance)"`.

Fixes: https://tracker.ceph.com/issues/57987
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
2022-11-11 16:35:05 +01:00
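The behaviour described in the commit above can be sketched in Python as a small query builder. The function name and the `multi_cluster` flag are hypothetical, and the single-cluster form `label_values(instance)` is an assumption based on the commit message; the real change lives in the mixin's jsonnet sources.

```python
def ceph_hosts_query(multi_cluster: bool) -> str:
    """Build the Grafana variable query for the ceph_hosts template.

    Hypothetical helper: adds the cluster matcher only in multi-cluster mode.
    """
    if multi_cluster:
        # Form quoted in the commit message.
        return 'label_values({cluster=~"$cluster"}, instance)'
    # Assumed single-cluster form: query on `instance` alone.
    return "label_values(instance)"


print(ceph_hosts_query(multi_cluster=True))
print(ceph_hosts_query(multi_cluster=False))
```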
Tatjana Dehler
15fa97d49d
monitoring/ceph-mixin: add RGW host to label info
Add the missing information about the RGW instance to the labels of the
"Average GET/PUT Latencies" panel on the "RGW Overview" dashboard.

Fixes: https://tracker.ceph.com/issues/57166
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
2022-09-06 16:19:19 +02:00
Tatjana Dehler
8faaca2082
monitoring/ceph-mixin: OSD overview typo fix
Correct a misplaced bracket in ceph-dashboard -> OSD Overview ->
OSD Objectstore Types that resulted in a parser error.

Fixes: https://tracker.ceph.com/issues/56948
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
2022-07-28 15:15:32 +02:00
Arthur Outhenin-Chalandre
37add644d1
ceph-mixin: remove timepicker override in every dashboard
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-05-24 11:54:26 +02:00
Arthur Outhenin-Chalandre
5db37300fd
ceph-mixin: rationalize local helper functions to utils
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-05-24 11:50:49 +02:00
Arthur Outhenin-Chalandre
0b7cc6bc99
ceph-mixin: fix typos
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-05-18 10:02:54 +02:00
Arthur Outhenin-Chalandre
3b6356c872
ceph-mixin: don't add cluster matcher if showcluster is disabled
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-05-17 09:41:21 +02:00
Arthur Outhenin-Chalandre
faeea8d165
ceph-mixin: fix linting issue and add cluster template support
Fix most of the issues reported by dashboards-linter:
- Add matcher/template for job (and also cluster)
- use $__rate_interval everywhere

This also changes all the irate functions to rate, as most uses of irate
were not actually correct. When using irate on a graph, for instance, you
can easily miss some of the metric values, because irate only takes the
last two values and the query step can be quite large if you want a graph
covering a few hours, a day, or more.

Fixes: https://tracker.ceph.com/issues/55003
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>

ceph-mixin: add config with matchers and tags

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-05-16 15:26:53 +02:00
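The irate-to-rate reasoning in the commit above can be illustrated with a simplified Python sketch. It is not Prometheus's implementation: real `rate` also extrapolates to the window boundaries and handles counter resets. The helper names and sample data are hypothetical.

```python
def simple_rate(samples):
    """Average per-second increase over the whole window (first to last sample)."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    return (v_last - v_prev) / (t_last - t_prev)


# (timestamp in seconds, counter value): a burst early in the window, then idle.
window = [(0, 0), (15, 300), (30, 600), (45, 600), (60, 600)]

print(simple_rate(window))   # 10.0
print(simple_irate(window))  # 0.0
```

With a large query step, each plotted point covers a window like this one, so the two-sample calculation reports zero and hides the burst that the window-wide average still shows.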
Arthur Outhenin-Chalandre
1452311a9b
ceph-mixin: rewrite promql queries to multiline
Fixes: https://tracker.ceph.com/issues/55005
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-04-27 17:55:52 +02:00
Aashish Sharma
9719cc795e mgr/dashboard: Pool overall performance shows multiple entries of same pool in pool overview
This PR intends to fix this issue

Fixes: https://tracker.ceph.com/issues/54513
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2022-03-28 18:25:25 +05:30
Arthur Outhenin-Chalandre
98236e3a1d
mgr/dashboard: monitoring: refactor into ceph-mixin
Mixin is a way to bundle dashboards, Prometheus rules and alerts into
a jsonnet package. Shifting to mixin will allow easier integration with
the monitoring automation that some users may use.

This commit moves `/monitoring/grafana/dashboards` and
`/monitoring/prometheus` to `/monitoring/ceph-mixin`. Prometheus alerts
were also converted to Jsonnet in an automated way (from yaml to json
to jsonnet). This commit minimises any change made to the generated files
and should change neither the dashboards nor the Prometheus alerts.

In the future some configuration will also be added to jsonnet to add
more functionality to the dashboards or alerts (e.g. multi-cluster).

Fixes: https://tracker.ceph.com/issues/53374
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-02-03 13:08:20 +01:00