diff --git a/.github/labeler.yml b/.github/labeler.yml
index 9bb1d618f96..17818b131c9 100644
--- a/.github/labeler.yml
+++ b/.github/labeler.yml
@@ -255,3 +255,14 @@ nfs:
   - doc/cephadm/nfs.rst
   - doc/radosgw/nfs.rst
   - doc/dev/vstart-ganesha.rst
+
+monitoring:
+  - doc/cephadm/monitoring.rst
+  - src/pybind/mgr/cephadm/services/monitoring.py
+  - src/pybind/mgr/cephadm/templates/services/alertmanager/**
+  - src/pybind/mgr/cephadm/templates/services/grafana/**
+  - src/pybind/mgr/cephadm/templates/services/prometheus/**
+  - src/pybind/mgr/dashboard/ci/check_grafana_dashboards.py
+  - src/pybind/mgr/prometheus/**
+  - monitoring/**
+
diff --git a/doc/cephadm/monitoring.rst b/doc/cephadm/monitoring.rst
index 38f4b4bb45a..d1b84f9bfae 100644
--- a/doc/cephadm/monitoring.rst
+++ b/doc/cephadm/monitoring.rst
@@ -56,87 +56,32 @@ steps below:

    .. prompt:: bash #

-      ceph orch apply node-exporter '*'
+      ceph orch apply node-exporter

 #. Deploy alertmanager:

    .. prompt:: bash #

-      ceph orch apply alertmanager 1
+      ceph orch apply alertmanager

 #. Deploy Prometheus. A single Prometheus instance is sufficient, but
    for high availablility (HA) you might want to deploy two:

    .. prompt:: bash #

-      ceph orch apply prometheus 1
+      ceph orch apply prometheus

    or

    .. prompt:: bash #

-      ceph orch apply prometheus 2
+      ceph orch apply prometheus --placement 'count:2'

 #. Deploy grafana:

    .. prompt:: bash #

-      ceph orch apply grafana 1
-
-Manually setting the Grafana URL
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Cephadm automatically configures Prometheus, Grafana, and Alertmanager in
-all cases except one.
-
-In a some setups, the Dashboard user's browser might not be able to access the
-Grafana URL that is configured in Ceph Dashboard. This can happen when the
-cluster and the accessing user are in different DNS zones.
-
-If this is the case, you can use a configuration option for Ceph Dashboard
-to set the URL that the user's browser will use to access Grafana. This
-value will never be altered by cephadm. To set this configuration option,
-issue the following command:
-
-   .. prompt:: bash $
-
-      ceph dashboard set-grafana-frontend-api-url <grafana-server-api>
-
-It might take a minute or two for services to be deployed. After the
-services have been deployed, you should see something like this when you issue the command ``ceph orch ls``:
-
-.. code-block:: console
-
-    $ ceph orch ls
-    NAME           RUNNING  REFRESHED  IMAGE NAME                                      IMAGE ID      SPEC
-    alertmanager       1/1  6s ago     docker.io/prom/alertmanager:latest              0881eb8f169f  present
-    crash              2/2  6s ago     docker.io/ceph/daemon-base:latest-master-devel  mix           present
-    grafana            1/1  0s ago     docker.io/pcuzner/ceph-grafana-el8:latest       f77afcf0bcf6  absent
-    node-exporter      2/2  6s ago     docker.io/prom/node-exporter:latest             e5a616e4b9cf  present
-    prometheus         1/1  6s ago     docker.io/prom/prometheus:latest                e935122ab143  present
-
-Configuring SSL/TLS for Grafana
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-``cephadm`` deploys Grafana using the certificate defined in the ceph
-key/value store. If no certificate is specified, ``cephadm`` generates a
-self-signed certificate during the deployment of the Grafana service.
-
-A custom certificate can be configured using the following commands:
-
-.. prompt:: bash #
-
-  ceph config-key set mgr/cephadm/grafana_key -i $PWD/key.pem
-  ceph config-key set mgr/cephadm/grafana_crt -i $PWD/certificate.pem
-
-If you have already deployed Grafana, run ``reconfig`` on the service to
-update its configuration:
-
-.. prompt:: bash #
-
-  ceph orch reconfig grafana
-
-The ``reconfig`` command also sets the proper URL for Ceph Dashboard.
+      ceph orch apply grafana

 Networks and Ports
 ~~~~~~~~~~~~~~~~~~
@@ -291,22 +236,8 @@ Example
   # reconfig the prometheus service
   ceph orch reconfig prometheus

-Disabling monitoring
---------------------
-
-To disable monitoring and remove the software that supports it, run the following commands:
-
-.. code-block:: console
-
-  $ ceph orch rm grafana
-  $ ceph orch rm prometheus --force   # this will delete metrics data collected so far
-  $ ceph orch rm node-exporter
-  $ ceph orch rm alertmanager
-  $ ceph mgr module disable prometheus
-
-
-Deploying monitoring manually
------------------------------
+Deploying monitoring without cephadm
+------------------------------------

 If you have an existing prometheus monitoring infrastructure, or would like
 to manage it yourself, you need to configure it to integrate with your Ceph
@@ -325,9 +256,112 @@ cluster.

 * To enable dashboard integration with Grafana, see :ref:`dashboard-grafana`.

-Enabling RBD-Image monitoring
----------------------------------
+Disabling monitoring
+--------------------
+
+To disable monitoring and remove the software that supports it, run the following commands:
+
+.. code-block:: console
+
+  $ ceph orch rm grafana
+  $ ceph orch rm prometheus --force   # this will delete metrics data collected so far
+  $ ceph orch rm node-exporter
+  $ ceph orch rm alertmanager
+  $ ceph mgr module disable prometheus
+
+Setting up RBD-Image monitoring
+-------------------------------

 Due to performance reasons, monitoring of RBD images is disabled by default. For more information please see
 :ref:`prometheus-rbd-io-statistics`. If disabled, the overview and details dashboards will stay empty in Grafana
 and the metrics will not be visible in Prometheus.
+
+Setting up Grafana
+------------------
+
+Manually setting the Grafana URL
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Cephadm automatically configures Prometheus, Grafana, and Alertmanager in
+all cases except one.
+
+In some setups, the Dashboard user's browser might not be able to access the
+Grafana URL that is configured in Ceph Dashboard. This can happen when the
+cluster and the accessing user are in different DNS zones.
+
+If this is the case, you can use a configuration option for Ceph Dashboard
+to set the URL that the user's browser will use to access Grafana. This
+value will never be altered by cephadm. To set this configuration option,
+issue the following command:
+
+   .. prompt:: bash $
+
+      ceph dashboard set-grafana-frontend-api-url <grafana-server-api>
+
+It might take a minute or two for services to be deployed. After the
+services have been deployed, you should see something like this when you issue the command ``ceph orch ls``:
+
+.. code-block:: console
+
+    $ ceph orch ls
+    NAME           RUNNING  REFRESHED  IMAGE NAME                                      IMAGE ID      SPEC
+    alertmanager       1/1  6s ago     docker.io/prom/alertmanager:latest              0881eb8f169f  present
+    crash              2/2  6s ago     docker.io/ceph/daemon-base:latest-master-devel  mix           present
+    grafana            1/1  0s ago     docker.io/pcuzner/ceph-grafana-el8:latest       f77afcf0bcf6  absent
+    node-exporter      2/2  6s ago     docker.io/prom/node-exporter:latest             e5a616e4b9cf  present
+    prometheus         1/1  6s ago     docker.io/prom/prometheus:latest                e935122ab143  present
+
+Configuring SSL/TLS for Grafana
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``cephadm`` deploys Grafana using the certificate defined in the ceph
+key/value store. If no certificate is specified, ``cephadm`` generates a
+self-signed certificate during the deployment of the Grafana service.
+
+A custom certificate can be configured using the following commands:
+
+.. prompt:: bash #
+
+  ceph config-key set mgr/cephadm/grafana_key -i $PWD/key.pem
+  ceph config-key set mgr/cephadm/grafana_crt -i $PWD/certificate.pem
+
+If you have already deployed Grafana, run ``reconfig`` on the service to
+update its configuration:
+
+.. prompt:: bash #
+
+  ceph orch reconfig grafana
+
+The ``reconfig`` command also sets the proper URL for Ceph Dashboard.
+
+Setting up Alertmanager
+-----------------------
+
+Adding Alertmanager webhooks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To add new webhooks to the Alertmanager configuration, list the additional
+webhook URLs in the service specification:
+
+.. code-block:: yaml
+
+    service_type: alertmanager
+    spec:
+      user_data:
+        default_webhook_urls:
+        - "https://foo"
+        - "https://bar"
+
+Here, ``default_webhook_urls`` is a list of additional URLs that are
+added to the default receivers' ``<webhook_configs>`` configuration.
+
+Run ``reconfig`` on the service to update its configuration:
+
+.. prompt:: bash #
+
+  ceph orch reconfig alertmanager
+
+Further Reading
+---------------
+
+* :ref:`mgr-prometheus`
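When there are many webhook URLs, the Alertmanager spec from the "Adding Alertmanager webhooks" hunk can be generated instead of written by hand. The code below is a minimal sketch that assumes the spec layout shown in that hunk. `render_alertmanager_spec` is a hypothetical helper and is not part of cephadm. The HTTP(S) scheme check is an extra safeguard added by this sketch and is not a documented cephadm requirement.

```python
import json


def render_alertmanager_spec(webhook_urls):
    """Render a cephadm Alertmanager service spec as YAML text.

    Hypothetical helper: it mirrors the spec layout from the
    "Adding Alertmanager webhooks" section and is not part of cephadm.
    """
    if not webhook_urls:
        raise ValueError("at least one webhook URL is required")
    lines = [
        "service_type: alertmanager",
        "spec:",
        "  user_data:",
        "    default_webhook_urls:",
    ]
    for url in webhook_urls:
        # Assumption of this sketch: only plain HTTP(S) endpoints are accepted.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an HTTP(S) URL: {url!r}")
        # json.dumps returns an escaped, double-quoted string, which is
        # also a valid YAML double-quoted scalar.
        lines.append(f"    - {json.dumps(url)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the spec to stdout so that it can be redirected into a file.
    print(render_alertmanager_spec(["https://foo", "https://bar"]), end="")
```

For example, you could save the sketch under a file name of your choice and redirect its output to `alertmanager.yaml`. You would then apply that file with `ceph orch apply -i alertmanager.yaml` and run `ceph orch reconfig alertmanager`, as the documentation change describes. The generator emits YAML text by hand, with no third-party YAML library, so it needs only the standard library.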