33 Commits

Author SHA1 Message Date
Piyush Agarwal 10d4f309f3 mgr/dashboard: Add 'Browse Dashboards' button in multi-cluster and ceph-cluster Grafana dashboards
Fixes: https://tracker.ceph.com/issues/68316

Signed-off-by: piyushagarwal1411 <piyushagarwal14.pa@gmail.com>
Signed-off-by: Piyush Agarwal <piyushagarwal14.pa@gmail.com>
2024-10-16 17:45:09 +05:30
Aashish Sharma b5536d8b8d mgr/dashboard: Add Performance Details grafana charts for individual clusters in Manage-clusters page
Fixes: https://tracker.ceph.com/issues/67192

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2024-08-22 14:15:08 +05:30
Aashish Sharma 2e54c9a01e mgr/dashboard: Add a new chart for replication delta per shard in rgw sync overview grafana dashboard
Fixes: https://tracker.ceph.com/issues/66994

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2024-07-17 15:49:15 +05:30
Aashish Sharma 1622ad8f76 mgr/dashboard: fix cluster filter typo in multi-cluster-overview
grafana dashboard

Fixes: https://tracker.ceph.com/issues/65760

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2024-05-02 17:12:26 +05:30
Christian Rohmann 090b8e17f1 Cleanup of variables, queries and tests to enable showMultiCluster=True
Rendering the dashboards with showMultiCluster=True allows for
them to work with multiple clusters storing their metrics in a single
Prometheus instance. This works via the cluster label and that functionality
already existed. This just fixes some inconsistencies in applying the label
filters.

Additionally this contains updates to the tests to have them succeed
with both configurations and avoid the introduction of regressions with
regard to multiCluster in the future.

There also are some consistency cleanups here and there:
 * `datasource` was not used consistently
 * `cluster` label_values are determined from `ceph_health_status`
 * `job` template and filters on this label were removed to align multi cluster
    support solely via the `cluster` label
 * `ceph_hosts` filter now uses label_values from any ceph_metadata metric
    to not show all instance values, but only those of hosts with some Ceph
    component / daemon.
 *  Enable showMultiCluster=True since `cluster` label is now always present,
    via https://github.com/ceph/ceph/pull/54964

Improves: https://tracker.ceph.com/issues/64321
Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
2024-04-22 08:29:37 +02:00
Aashish Sharma c2f4aa7887 mgr/dashboard: replace deprecated table panel in grafana with a newer
table panel

Fixes: https://tracker.ceph.com/issues/65174

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2024-04-02 12:22:53 +05:30
Nizamudeen A 278daa4ba2
Merge pull request #55574 from ceph/feature-multi-cluster-management-monitoring
mgr/dashboard: introduce multi cluster management and monitoring in ceph dashboard 

Reviewed-by: Nizamudeen A <nia@redhat.com>
2024-03-06 10:26:29 +05:30
Nizamudeen A b8811c844f mgr/dashboard: introduce multi-cluster overview page
https://tracker.ceph.com/issues/64530
Signed-off-by: Nizamudeen A <nia@redhat.com>
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2024-03-05 19:05:37 +05:30
Aashish Sharma 6e5efb626f mgr/dashboard: replace piechart plugin charts with native pie chart
panel

Fixes: https://tracker.ceph.com/issues/64579

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2024-02-27 14:20:31 +05:30
Aashish Sharma 495f669faf mgr/dashboard: Add a manage clusters page to the multi-cluster nav to
list/connect/disconnect/edit clusters in multi-cluster setup

Fixes: https://tracker.ceph.com/issues/64530
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2024-02-22 10:42:01 +05:30
Aashish Sharma a85baa89da
Merge pull request #55314 from cloudbehl/rgw-dashboard-json
mgr/dashboard: Fixing RGW graph panels


Reviewed-by: Aashish Sharma <aasharma@redhat.com>
2024-02-13 12:00:33 +05:30
Aashish Sharma a572a0c167 mgr/dashboard: Add RGW per user/bucket panels in grafana
Fixes: https://tracker.ceph.com/issues/64359

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2024-02-09 21:14:27 +05:30
cloudbehl 191fda84b3 mgr/dashboard: Fixing RGW graph panels
- Fixing grafana panels for rgw dashboards
- Fixing RGW overview dashboard queries

fixes https://tracker.ceph.com/issues/64177

Signed-off-by: cloudbehl <cloudbehl@gmail.com>
2024-01-25 18:41:03 +05:30
Aashish Sharma 2573426f54 mgr/dashboard: upgrade from old 'graph' type panels to the new
'timeseries' panel

The graph panel type is deprecated, and disappears after Grafana v9.1 (current version is 10.0) to prevent more old type panels being created. These should be migrated to the timeseries panel type, to avoid potential problems with future Grafana versions.

Fixes: https://tracker.ceph.com/issues/61720

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2023-12-22 11:19:40 +05:30
Nizamudeen A a42e286fc0
Merge pull request #54355 from nobuto-m/info-rbd-stats-pools
mgr/dashboard: info on why RBD graphs are empty

Reviewed-by: Ankush Behl <cloudbehl@gmail.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
2023-11-30 13:38:54 +05:30
Aashish Sharma 39fea8f71c
Merge pull request #51340 from Javlopez/feature/12087-upgrade-and-generate-grafana-dashboards
monitoring: add new dashboards

Fixes: https://tracker.ceph.com/issues/63592

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
2023-11-20 11:33:07 +05:30
Nobuto Murata 9c026fa18c mgr/dashboard: info on why RBD graphs are empty
Those RBD IO statistics graphs are empty out of the box, and that is on
purpose. Instead of giving the impression that those graphs are broken,
point users to the documentation explaining the optional steps to enable
those statistics.
https://docs.ceph.com/en/latest/mgr/prometheus/#rbd-io-statistics

Signed-off-by: Nobuto Murata <nobuto.murata@canonical.com>
2023-11-06 15:50:50 +09:00
Javier f0e8565b49 monitoring: update libsonnet files for generate ceph-cluster.json
add ceph-cluster.libsonnet file to generate ceph-cluster.json

Fixes: https://tracker.ceph.com/issues/61443
Signed-off-by: Javier <sjavierlopez@gmail.com>
2023-10-20 18:07:33 -06:00
Aashish Sharma 6f3f58cb8e mgr/dashboard: Consider null values as zero in grafana panels
After upgrading from RHCS4 to RHCS5, some of the grafana charts broke.
This is because RHCS5 does not generate a metric when its value is
zero; as a result, the null value from that metric breaks the grafana
charts or graphs. This PR fixes that issue.

Fixes: https://tracker.ceph.com/issues/63088

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2023-10-04 12:31:42 +05:30
Pere Diaz Bou 8e07fbd2ea
Merge pull request #48843 from rhcs-dashboard/expose_slow_ops
mgr/prometheus: expose daemon health metrics

Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
2022-12-20 12:25:32 +01:00
Pere Diaz Bou 5a2b7c25b6 mgr/prometheus: expose daemon health metrics
Until now daemon health metrics were stored without being used. One of
the most helpful metrics there is SLOW_OPS with respect to OSDs and MONs
which this commit tries to expose to bring fine grained metrics to find
troublesome OSDs instead of having a lone healthcheck of slow ops in the
whole cluster.

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
2022-12-20 09:44:49 +01:00
Tatjana Dehler 08352b6540
ceph-mixin: fix ceph_hosts variable
Only use `instance` to query for hostnames in single-cluster mode.
Consider the cluster matcher only in multi-cluster mode. In this case
the query will look like:
`"label_values({cluster=~\"$cluster\"}, instance)"`.

Fixes: https://tracker.ceph.com/issues/57987
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
2022-11-11 16:35:05 +01:00
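
The `ceph_hosts` change described in the entry above can be pictured with a small sketch. The function name and the `show_multi_cluster` flag are hypothetical, and the single-cluster form `label_values(instance)` is an assumption; only the multi-cluster query string is taken from the commit message.

```python
def ceph_hosts_query(show_multi_cluster: bool) -> str:
    """Return a Grafana label_values() expression for a ceph_hosts variable.

    Hypothetical helper: the real mixin is written in Jsonnet, not Python.
    """
    if show_multi_cluster:
        # Multi-cluster mode: restrict instances to the selected cluster(s).
        # This string is the one quoted in the commit message.
        return 'label_values({cluster=~"$cluster"}, instance)'
    # Single-cluster mode (assumed form): no cluster matcher at all.
    return "label_values(instance)"


single = ceph_hosts_query(False)
multi = ceph_hosts_query(True)
print(single)
print(multi)
```

Keeping the matcher out of the single-cluster query means the variable still resolves when metrics carry no `cluster` label.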
Tatjana Dehler 15fa97d49d
monitoring/ceph-mixin: add RGW host to label info
Add the missing information about the RGW instance to the labels of the
"Average GET/PUT Latencies" panel on the "RGW Overview" dashboard.

Fixes: https://tracker.ceph.com/issues/57166
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
2022-09-06 16:19:19 +02:00
Tatjana Dehler 8faaca2082
monitoring/ceph-mixin: OSD overview typo fix
Correct a wrongly set bracket on ceph-dashboard -> OSD Overview ->
OSD Objectstore Types resulting in a parser error.

Fixes: https://tracker.ceph.com/issues/56948
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
2022-07-28 15:15:32 +02:00
Arthur Outhenin-Chalandre 37add644d1
ceph-mixin: remove timepicker override in every dashboards
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-05-24 11:54:26 +02:00
Arthur Outhenin-Chalandre 5db37300fd
ceph-mixin: rationalize local helper functions to utils
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-05-24 11:50:49 +02:00
Arthur Outhenin-Chalandre 0b7cc6bc99
ceph-mixin: fix typos
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-05-18 10:02:54 +02:00
Arthur Outhenin-Chalandre 3b6356c872
ceph-mixin: don't add cluster matcher if showcluster is disabled
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-05-17 09:41:21 +02:00
Arthur Outhenin-Chalandre fd4f484d22
ceph-mixin: refactor the structure of _config and utils
Before this refactor we couldn't override the config externally. Now the
_config is correctly propagated and not only taken from the
config.libsonnet file.

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-05-16 15:26:56 +02:00
Arthur Outhenin-Chalandre faeea8d165
ceph-mixin: fix linting issue and add cluster template support
Fix most of the issues reported by dashboards-linter:
- Add matcher/template for job (and also cluster)
- use $__rate_interval everywhere

Also, this changes all the irate functions to rate, as most of the irate
calls were not actually used correctly. When using irate on a graph, for
instance, you can easily miss some of the metric values, as irate only
takes the last two values and the query step can be quite large if you
want a graph for a few hours, a day, or more.

Fixes: https://tracker.ceph.com/issues/55003
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>

ceph-mixin: add config with matchers and tags

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-05-16 15:26:53 +02:00
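
The irate-versus-rate point in the entry above can be illustrated with a simplified model. This is a sketch, not Prometheus itself: it ignores counter resets and extrapolation, and the function names and sample data are made up for the illustration.

```python
def simple_rate(samples):
    """Per-second increase between the first and last sample in the window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter scraped every 15 s as (timestamp, value) pairs:
# a burst of activity early in the window, then an idle tail.
window = [(0, 0), (15, 900), (30, 1800), (45, 1800), (60, 1800)]

rate_value = simple_rate(window)    # averages over the whole window
irate_value = simple_irate(window)  # sees only the idle tail
print(rate_value, irate_value)
```

With a large query step, each plotted point evaluates one such window, so the irate-style calculation can skip the burst entirely while the rate-style one still reflects it.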
Arthur Outhenin-Chalandre 1452311a9b
ceph-mixin: rewrite promql queries to multiline
Fixes: https://tracker.ceph.com/issues/55005
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-04-27 17:55:52 +02:00
Aashish Sharma 9719cc795e mgr/dashboard: Pool overall performance shows multiple entries of same pool in pool overview
This PR intends to fix this issue

Fixes: https://tracker.ceph.com/issues/54513
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2022-03-28 18:25:25 +05:30
Arthur Outhenin-Chalandre 98236e3a1d
mgr/dashboard: monitoring: refactor into ceph-mixin
Mixin is a way to bundle dashboards, prometheus rules and alerts into a
jsonnet package. Shifting to mixin will allow easier integration with
monitoring automation that some users may use.

This commit moves `/monitoring/grafana/dashboards` and
`/monitoring/prometheus` to `/monitoring/ceph-mixin`. Prometheus alerts
were also converted to Jsonnet in an automated way (from yaml to json
to jsonnet). This commit minimises any change made to the generated files
and should change neither the dashboards nor the Prometheus alerts.

In the future some configuration will also be added to jsonnet to add
more functionalities to the dashboards or alerts (i.e.: multi cluster).

Fixes: https://tracker.ceph.com/issues/53374
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-02-03 13:08:20 +01:00