.. _mgr-cephadm-monitoring:

Monitoring Services
===================

Ceph Dashboard uses `Prometheus <https://prometheus.io/>`_, `Grafana
<https://grafana.com/>`_, and related tools to store and visualize detailed
metrics on cluster utilization and performance. Ceph users have three options:

#. Have cephadm deploy and configure these services. This is the default
   when bootstrapping a new cluster unless the ``--skip-monitoring-stack``
   option is used.
#. Deploy and configure these services manually. This is recommended for users
   with existing Prometheus services in their environment (and in cases where
   Ceph is running in Kubernetes with Rook).
#. Skip the monitoring stack completely. Some Ceph Dashboard graphs will
   not be available.

The monitoring stack consists of `Prometheus <https://prometheus.io/>`_,
Prometheus exporters (:ref:`mgr-prometheus`, `Node exporter
<https://prometheus.io/docs/guides/node-exporter/>`_), `Prometheus Alert
Manager <https://prometheus.io/docs/alerting/alertmanager/>`_, and `Grafana
<https://grafana.com/>`_.

.. note::

  Prometheus' security model presumes that untrusted users have access to the
  Prometheus HTTP endpoint and logs. Untrusted users have access to all the
  (meta)data Prometheus collects that is contained in the database, plus a
  variety of operational and debugging information.

  However, Prometheus' HTTP API is limited to read-only operations.
  Configurations can *not* be changed using the API and secrets are not
  exposed. Moreover, Prometheus has some built-in measures to mitigate the
  impact of denial of service attacks.

  Please see `Prometheus' Security model
  <https://prometheus.io/docs/operating/security/>`_ for more detailed
  information.

Deploying monitoring with cephadm
---------------------------------

By default, bootstrap will deploy a basic monitoring stack. If you
did not do this (by passing ``--skip-monitoring-stack``), or if you
converted an existing cluster to cephadm management, you can set up
monitoring by following the steps below.

#. Enable the prometheus module in the ceph-mgr daemon. This exposes the
   internal Ceph metrics so that Prometheus can scrape them.

   .. code-block:: bash

     ceph mgr module enable prometheus

#. Deploy a node-exporter service on every node of the cluster. The
   node-exporter provides host-level metrics like CPU and memory utilization.

   .. code-block:: bash

     ceph orch apply node-exporter '*'

#. Deploy alertmanager.

   .. code-block:: bash

     ceph orch apply alertmanager 1

#. Deploy prometheus. A single prometheus instance is sufficient, but
   for HA you may want to deploy two.

   .. code-block:: bash

     ceph orch apply prometheus 1    # or 2

#. Deploy grafana.

   .. code-block:: bash

     ceph orch apply grafana 1

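The deployment steps above can also be expressed declaratively. The following
is a minimal sketch, assuming the service specification format accepted by
``ceph orch apply -i``. The file name ``monitoring.yaml`` is hypothetical, and
the placements simply mirror the commands above. Enabling the prometheus
manager module remains a separate step.

.. code-block:: yaml

  # monitoring.yaml (hypothetical file name)
  # Apply with: ceph orch apply -i monitoring.yaml
  service_type: node-exporter
  placement:
    host_pattern: '*'   # every host, as in the node-exporter step
  ---
  service_type: alertmanager
  placement:
    count: 1
  ---
  service_type: prometheus
  placement:
    count: 1            # or 2 for HA
  ---
  service_type: grafana
  placement:
    count: 1
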
Cephadm takes care of the configuration of Prometheus, Grafana, and Alertmanager
automatically.

However, there is one exception to this rule. In some setups, the Dashboard
user's browser might not be able to access the Grafana URL configured in Ceph
Dashboard. One such scenario is when the cluster and the accessing user are each
in a different DNS zone.

For this case, there is an extra configuration option for Ceph Dashboard, which
can be used to configure the URL for accessing Grafana by the user's browser.
This value will never be altered by cephadm. To set this configuration option,
issue the following command::

  $ ceph dashboard set-grafana-frontend-api-url <grafana-server-api>

It may take a minute or two for services to be deployed. Once
completed, you should see something like this from ``ceph orch ls``:

.. code-block:: console

  $ ceph orch ls
  NAME           RUNNING  REFRESHED  IMAGE NAME                                      IMAGE ID      SPEC
  alertmanager       1/1  6s ago     docker.io/prom/alertmanager:latest              0881eb8f169f  present
  crash              2/2  6s ago     docker.io/ceph/daemon-base:latest-master-devel  mix           present
  grafana            1/1  0s ago     docker.io/pcuzner/ceph-grafana-el8:latest       f77afcf0bcf6  absent
  node-exporter      2/2  6s ago     docker.io/prom/node-exporter:latest             e5a616e4b9cf  present
  prometheus         1/1  6s ago     docker.io/prom/prometheus:latest                e935122ab143  present

Configuring SSL/TLS for Grafana
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``cephadm`` deploys Grafana using the certificate defined in the ceph
key/value store. If no certificate is specified, ``cephadm`` generates a
self-signed certificate during the deployment of the Grafana service.

A custom certificate can be configured using the following commands:

.. prompt:: bash #

  ceph config-key set mgr/cephadm/grafana_key -i $PWD/key.pem
  ceph config-key set mgr/cephadm/grafana_crt -i $PWD/certificate.pem

If you have already deployed Grafana, run ``reconfig`` on the service to
update its configuration:

.. prompt:: bash #

  ceph orch reconfig grafana

The ``reconfig`` command also sets the proper URL for Ceph Dashboard.

Using custom images
~~~~~~~~~~~~~~~~~~~

It is possible to install or upgrade monitoring components based on other
images. To do so, the name of the image to be used needs to be stored in the
configuration first. The following configuration options are available:

- ``container_image_prometheus``
- ``container_image_grafana``
- ``container_image_alertmanager``
- ``container_image_node_exporter``

Custom images can be set with the ``ceph config`` command:

.. code-block:: bash

  ceph config set mgr mgr/cephadm/<option_name> <value>

For example:

.. code-block:: bash

  ceph config set mgr mgr/cephadm/container_image_prometheus prom/prometheus:v1.4.1

.. note::

  Setting a custom image overrides the default value (but does not
  overwrite it). The default value changes when updates become available.
  A component for which you have set a custom image is therefore no longer
  updated automatically. You will need to manually update the
  configuration (image name and tag) to be able to install updates.

  If you decide to use the default images instead, you can reset the
  custom image you have set before. After that, the default value will be
  used again. Use ``ceph config rm`` to reset the configuration option:

  .. code-block:: bash

    ceph config rm mgr mgr/cephadm/<option_name>

  For example:

  .. code-block:: bash

    ceph config rm mgr mgr/cephadm/container_image_prometheus

Using custom configuration files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By overriding cephadm templates, it is possible to completely customize the
configuration files for monitoring services.

Internally, cephadm already uses `Jinja2
<https://jinja.palletsprojects.com/en/2.11.x/>`_ templates to generate the
configuration files for all monitoring components. To customize the
configuration of Prometheus, Grafana, or the Alertmanager, you can store
a Jinja2 template for each service that will be used for configuration
generation instead. This template is evaluated every time a service of that
kind is deployed or reconfigured. That way, the custom configuration is
preserved and automatically applied on future deployments of these services.

.. note::

  The configuration of the custom template is also preserved when the default
  configuration of cephadm changes. If the updated configuration is to be used,
  the custom template needs to be migrated *manually*.

Option names
""""""""""""

The following templates for files that will be generated by cephadm can be
overridden. These are the names to be used when storing with ``ceph config-key
set``:

- ``services/alertmanager/alertmanager.yml``
- ``services/grafana/ceph-dashboard.yml``
- ``services/grafana/grafana.ini``
- ``services/prometheus/prometheus.yml``

You can look up the file templates that are currently used by cephadm in
``src/pybind/mgr/cephadm/templates``:

- ``services/alertmanager/alertmanager.yml.j2``
- ``services/grafana/ceph-dashboard.yml.j2``
- ``services/grafana/grafana.ini.j2``
- ``services/prometheus/prometheus.yml.j2``

Usage
"""""

The following command applies a single-line value:

.. code-block:: bash

  ceph config-key set mgr/cephadm/<option_name> <value>

To set the contents of a file as the template, use the ``-i`` argument:

.. code-block:: bash

  ceph config-key set mgr/cephadm/<option_name> -i $PWD/<filename>

.. note::

  When using files as input to ``config-key``, an absolute path to the file
  must be used.

Then the configuration file for the service needs to be recreated.
This is done using ``reconfig``. For more details, see the following example.

Example
"""""""

.. code-block:: bash

  # set the contents of ./prometheus.yml.j2 as template
  ceph config-key set mgr/cephadm/services/prometheus/prometheus.yml \
    -i $PWD/prometheus.yml.j2

  # reconfig the prometheus service
  ceph orch reconfig prometheus

Disabling monitoring
--------------------

If you have deployed monitoring and would like to remove it, you can do
so with:

.. code-block:: bash

  ceph orch rm grafana
  ceph orch rm prometheus --force   # this will delete metrics data collected so far
  ceph orch rm node-exporter
  ceph orch rm alertmanager
  ceph mgr module disable prometheus

Deploying monitoring manually
-----------------------------

If you have an existing Prometheus monitoring infrastructure, or would like
to manage it yourself, you need to configure it to integrate with your Ceph
cluster.

* Enable the prometheus module in the ceph-mgr daemon:

  .. code-block:: bash

    ceph mgr module enable prometheus

  By default, ceph-mgr presents Prometheus metrics on port 9283 on each host
  running a ceph-mgr daemon. Configure Prometheus to scrape these.

* To enable the dashboard's Prometheus-based alerting, see :ref:`dashboard-alerting`.

* To enable dashboard integration with Grafana, see :ref:`dashboard-grafana`.

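As a sketch of the scrape configuration mentioned in the first item above, the
following fragment for an externally managed ``prometheus.yml`` assumes two
ceph-mgr hosts. The host names and the job name are hypothetical placeholders;
only port 9283 comes from the default described above.

.. code-block:: yaml

  scrape_configs:
    - job_name: 'ceph'            # hypothetical job name
      honor_labels: true          # keep the labels exported by ceph-mgr
      static_configs:
        - targets:
            # list every host that may run a ceph-mgr daemon
            - 'ceph-mgr-host-1.example.com:9283'
            - 'ceph-mgr-host-2.example.com:9283'
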
Enabling RBD-Image monitoring
-----------------------------

For performance reasons, monitoring of RBD images is disabled by default. For
more information, see :ref:`prometheus-rbd-io-statistics`. While it is
disabled, the overview and details dashboards stay empty in Grafana and the
metrics are not visible in Prometheus.