Devicehealth module
===================

The *devicehealth* module manages the physical devices that back Ceph
daemons (e.g., OSDs).  It scrapes device health metrics (e.g., SMART)
and responds to them by migrating data away from devices that are
expected to fail.

Enabling
--------

The *devicehealth* module is enabled with::

  ceph mgr module enable devicehealth

(It is enabled by default.)
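
One way to confirm the module's state is to list the manager modules
and look for ``devicehealth`` among the enabled ones.  This is a sketch
that assumes the ``ceph mgr module ls`` command is available in the
release in use::

  ceph mgr module ls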

Scraping
--------

Health metrics can be scraped from all devices with::

  ceph device scrape-health-metrics

A single device can be scraped with::

  ceph device scrape-health-metrics <device-id>

The devices that back a single daemon can be scraped with::

  ceph device scrape-daemon-health-metrics <who>
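
As an illustration, one possible sequence is to list the known devices,
scrape one of them, and then scrape everything behind one OSD.  The
device ID and daemon name below are hypothetical placeholders.  The
sketch also assumes that ``ceph device ls`` is available to look up
real device IDs and that ``<who>`` accepts a daemon name such as
``osd.0``::

  # list known devices and the daemons that use them
  ceph device ls

  # scrape one device (hypothetical device ID)
  ceph device scrape-health-metrics VENDOR_MODEL_SERIAL

  # scrape all devices backing one daemon (hypothetical daemon name)
  ceph device scrape-daemon-health-metrics osd.0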


Health monitoring
-----------------

By default, the devicehealth module wakes up periodically and checks
the health of all devices in the system.  It raises a health alert
for any device that is expected to fail soon.  Monitoring can be
disabled by turning off the ``mgr/devicehealth/enable_monitoring``
option.
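
For example, assuming a release in which manager module options are
set through the centralized ``ceph config set`` interface (an
assumption; the exact mechanism can differ between releases),
monitoring could be inspected and toggled roughly like this::

  # show current health alerts, including any device warnings
  ceph health detail

  # turn periodic device health monitoring off, then back on
  ceph config set mgr mgr/devicehealth/enable_monitoring false
  ceph config set mgr mgr/devicehealth/enable_monitoring true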

The ``mgr/devicehealth/warn_threshold`` option controls how soon an
expected device failure must be before the module raises a health
warning.

If the ``mgr/devicehealth/self_heal`` option is enabled (it is by
default), the module automatically migrates data away from devices
that are expected to fail soon by marking the OSDs that use them
"out".

The ``mgr/devicehealth/mark_out_threshold`` option controls how soon
an expected device failure must be before the module automatically
marks an OSD "out".
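
A minimal sketch of tuning these options follows.  It assumes that
module options are set with ``ceph config set`` and that both
thresholds are expressed in seconds; both points are assumptions and
should be checked against the option descriptions in the release in
use.  The values are purely illustrative, not recommended defaults::

  # warn when a failure is expected within 8 weeks (4838400 seconds)
  ceph config set mgr mgr/devicehealth/warn_threshold 4838400

  # mark OSDs "out" when a failure is expected within 2 weeks (1209600 seconds)
  ceph config set mgr mgr/devicehealth/mark_out_threshold 1209600

  # keep the health warnings but disable automatic data migration
  ceph config set mgr mgr/devicehealth/self_heal false

Keeping the warning threshold larger than the mark-out threshold means
that a health warning appears before any data is migrated.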

.. doc/mgr/devicehealth: document devicehealth module.
   Signed-off-by: Sage Weil <sage@redhat.com>, 2018-07-31 14:38:39 +00:00