ceph/monitoring/ceph-mixin
Arthur Outhenin-Chalandre faeea8d165
ceph-mixin: fix linting issue and add cluster template support
Fix most of the issues reported by dashboards-linter:
- Add matcher/template for job (and also cluster)
- use $__rate_interval everywhere

This change also converts all the irate functions to rate, as most uses
of irate were not actually correct. When using irate on a graph, for
instance, you can easily miss some of the metric values, because irate
only takes the last two samples and the query step can be quite large
if you want a graph covering a few hours, a day or more.

Fixes: https://tracker.ceph.com/issues/55003
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>

ceph-mixin: add config with matchers and tags

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
2022-05-16 15:26:53 +02:00
dashboards ceph-mixin: fix linting issue and add cluster template support 2022-05-16 15:26:53 +02:00
dashboards_out ceph-mixin: fix linting issue and add cluster template support 2022-05-16 15:26:53 +02:00
tests_alerts Merge pull request #45254 from travisn/prometheus-rules-typos 2022-04-04 13:46:00 +02:00
tests_dashboards
.gitignore
.pylintrc
alerts.libsonnet cephadm: change shared_folder directory for prometheus and grafana 2022-02-07 16:34:37 +05:30
CMakeLists.txt monitoring: build jsonnet/jb only for testing 2022-02-03 13:08:37 +01:00
config.libsonnet ceph-mixin: fix linting issue and add cluster template support 2022-05-16 15:26:53 +02:00
dashboards.jsonnet
jsonnet-build.sh monitoring: build jsonnet/jb only for testing 2022-02-03 13:08:37 +01:00
jsonnet-bundler-build.sh monitoring: build jsonnet/jb only for testing 2022-02-03 13:08:37 +01:00
jsonnetfile.json
jsonnetfile.lock.json
lint-jsonnet.sh
Makefile
mixin.libsonnet
prometheus_alerts.yml Merge pull request #45254 from travisn/prometheus-rules-typos 2022-04-04 13:46:00 +02:00
README.md cephadm: change shared_folder directory for prometheus and grafana 2022-02-07 16:34:37 +05:30
requirements-alerts.txt monitoring: mention PyYAML only once in requirements 2022-02-08 11:19:15 +05:30
requirements-grafonnet.txt
requirements-lint.txt
test-jsonnet.sh
tox.ini cephadm: change shared_folder directory for prometheus and grafana 2022-02-07 16:34:37 +05:30

Prometheus Monitoring Mixin for Ceph

A set of Grafana dashboards and Prometheus alerts for Ceph.

All the Grafana dashboards are already generated in the dashboards_out directory, and the alerts are in the prometheus_alerts.yml file.

You can use the Grafana dashboards and alerts with Jsonnet like any other Prometheus mixin. You can find more resources about mixins in general on monitoring.mixins.dev.
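
As a rough illustration of what consuming the mixin from Jsonnet could look like, here is a minimal sketch. The file name ceph.jsonnet and the dashboardTags override are hypothetical. The grafanaDashboards and prometheusAlerts field names come from the general monitoring-mixins convention and are assumed, not confirmed, to be what mixin.libsonnet exposes; check config.libsonnet and mixin.libsonnet before relying on any of them.

```jsonnet
// ceph.jsonnet: hypothetical consumer file, not part of this repository.
// Assumes this directory and its vendor directory (populated by `jb install`)
// are both on the import path, for example:
//   jsonnet -J path/to/ceph-mixin -J path/to/ceph-mixin/vendor ceph.jsonnet
local cephMixin = (import 'mixin.libsonnet') + {
  // Assumed override: verify the field name against config.libsonnet.
  _config+:: {
    dashboardTags: ['ceph-mixin', 'production'],
  },
};

{
  // Field names follow the monitoring-mixins convention; this sketch
  // assumes ceph-mixin follows that convention as well.
  dashboards: cephMixin.grafanaDashboards,
  alerts: cephMixin.prometheusAlerts,
}
```

If you only need the defaults, the pre-generated files in dashboards_out and prometheus_alerts.yml can be used directly, without evaluating any Jsonnet.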

Grafana dashboards for Ceph

In dashboards_out you can find a collection of Grafana dashboards for Ceph Monitoring.

These dashboards are based on metrics that Prometheus collects by scraping the prometheus mgr plugin and the node_exporter.

Requirements

Prometheus alerts

In prometheus_alerts.yml you'll find a set of Prometheus alert rules that should provide a decent set of default alerts for a Ceph cluster. Put this file wherever your Prometheus configuration expects rule files (that is, wherever the rule_files configuration stanza points).

SNMP

Ceph provides a MIB (CEPH-PROMETHEUS-ALERT-MIB.txt) to support sending Prometheus alerts through to an SNMP management platform. The translation from Prometheus alert to SNMP trap requires the Prometheus alert to contain an OID that maps to a definition within the MIB. When making changes to the Prometheus alert rules file, developers should include any necessary changes to the MIB.

Building from Jsonnet

  • Install jsonnet (at least v0.18.0)
    • Install the jsonnet package on most distributions, or golang-github-google-jsonnet on Fedora
  • Install jsonnet-bundler

To rebuild all the generated files, you can run tox -egrafonnet-fix.

The Jsonnet code in this directory depends on some third-party Jsonnet libraries. To update those libraries, run jb update and then regenerate the generated files with tox -egrafonnet-fix.