ipmi_exporter/README.md

227 lines
7.7 KiB
Markdown
Raw Normal View History

2017-07-25 14:17:57 +00:00
Prometheus IPMI Exporter
========================
This is an IPMI over LAN exporter for [Prometheus](https://prometheus.io).
An instance running on one host can be used to monitor a large number of IPMI
interfaces by passing the `target` parameter to a scrape. It uses tools from
the [FreeIPMI](https://www.thomas-krenn.com/en/wiki/FreeIPMI_ipmimonitoring)
suite for the actual IPMI communication.
## Installation
You need a Go development environment. Then, run the following to get the
source code and build and install the binary:
go get github.com/soundcloud/ipmi_exporter
## Running
A minimal invocation looks like this:
./ipmi_exporter
Supported parameters include:
- `web.listen-address`: the address/port to listen on (default: `":9290"`)
- `config.file`: path to the configuration file (default: `ipmi.yml`)
- `path`: path to the FreeIPMI executables (default: rely on `$PATH`)
Make sure you have at least the following tools from the
[FreeIPMI](https://www.thomas-krenn.com/en/wiki/FreeIPMI_ipmimonitoring) suite
installed:
- `ipmimonitoring`
- `ipmi-dcmi`
- `bmc-info`
## Configuration
The general configuration pattern is similar to that of the [blackbox
exporter](https://github.com/prometheus/blackbox_exporter), i.e. Prometheus
scrapes a small number (possibly one) of IPMI exporters with a `target` URL
parameter to tell the exporter which IPMI device it should use to retrieve the
IPMI metrics. We have taken this approach as IPMI devices often provide useful
information even while the supervised host is turned off. If you are running
the exporter on a separate host anyway, it makes more sense to have only a few
of them, each probing many (possibly thousands of) IPMI devices, rather than
one exporter per IPMI device.
### IPMI exporter
The exporter requires a configuration file called `ipmi.yml` (can be
overridden, see above). It must contain user names and passwords for IPMI
access to all targets. It supports a “default” target, which is used as
fallback if the target is not explicitly listed in the file.
The configuration file also supports a blacklist of sensors, useful in case of
OEM-specific sensors that FreeIPMI cannot deal with properly or otherwise
misbehaving sensors.
See the included `ipmi.yml` file for an example.
### Prometheus
To add your IPMI targets to Prometheus, you can use any of the supported
service discovery mechanism of your choice. The following example uses the
file-based SD and should be easy to adjust to other scenarios.
Create a YAML file that contains a list of targets, e.g.:
```
---
- targets:
- 10.1.2.23
- 10.1.2.24
- 10.1.2.25
- 10.1.2.26
- 10.1.2.27
- 10.1.2.28
- 10.1.2.29
- 10.1.2.30
labels:
job: ipmi_exporter
```
This file needs to be stored on the Prometheus server host. Assuming that this
file is called `/srv/ipmi_exporter/targets.yml`, and the IPMI exporter is
running on a host that has the DNS name `ipmi-exporter.internal.example.com`,
add the following to your Prometheus config:
```
- job_name: ipmi
scrape_interval: 1m
scrape_timeout: 30s
metrics_path: /ipmi
scheme: http
file_sd_configs:
- files:
- /srv/ipmi_exporter/targets.yml
refresh_interval: 5m
relabel_configs:
- source_labels: [__address__]
separator: ;
regex: (.*)(:80)?
target_label: __param_target
replacement: ${1}
action: replace
- source_labels: [__param_target]
separator: ;
regex: (.*)
target_label: instance
replacement: ${1}
action: replace
- separator: ;
regex: .*
target_label: __address__
replacement: ipmi-exporter.internal.example.com:9198
action: replace
```
For more information, e.g. how to use mechanisms other than a file to discover
the list of hosts to scrape, please refer to the [Prometheus
documentation](https://prometheus.io/docs).
## Exported data
### Scrape meta data
There are two metrics providing data about the scrape itself:
- `ipmi_up` is `1` if all data could successfully be retrieved from the remote
host, `0` otherwise
- `ipmi_scrape_duration_seconds` is the amount of time it took to retrieve the
data
### BMC info
For some basic information, there is a constant metric `ipmi_bmc_info` with
value `1` and labels providing the firmware revision and manufacturer as
returned from the BMC. Example:
ipmi_bmc_info{firmware_revision="2.52",manufacturer_id="Dell Inc. (674)"} 1
### Power consumption
The metric `ipmi_dcmi_power_consumption_current_watts` can be used to monitor
the live power consumption of the machine in Watts. If in doubt, this metric
should be used over any of the sensor data (see below), even if their name
might suggest that they measure the same thing. This metric has no labels.
### Sensors
IPMI sensors in general have one or two distinct pieces of information that are
of interest: a value and/or a state. The exporter always exports both, even if
the value is NaN or the state non-sensical. This is so one can still always
find the metrics to avoid ending up in a situation where one is looking for
e.g. the value of a sensor that is in a critical state, but can't find it and
assume this to be a problem.
The state of a sensor can be one of _nominal_, _warning_, _critical_, or _N/A_,
reflected by the metric values `0`, `1`, `2`, and `NaN` respectively. Think of
this as a kind of severity.
For sensors with known semantics (i.e. units), corresponding specific metrics
are exported. For everything else, generic metrics are exported.
#### Temperature sensors
Temperature sensors measure a temperature in degrees Celsius and their state
usually reflects the temperature going above the vendor-recommended value. For
each temperature sensor, two metrics are exported (state and value), using the
sensor ID and the sensor name as labels. Example:
ipmi_temperature_celsius{id="18",name="Inlet Temp"} 24
ipmi_temperature_state{id="18",name="Inlet Temp"} 0
#### Fan speed sensors
Fan speed sensors measure fan speed in rotations per minute (RPM) and their
state usually reflects the speed being to low, indicating the fan might be
broken. For each fan speed sensor, two metrics are exported (state and value),
using the sensor ID and the sensor name as labels. Example:
ipmi_fan_speed_rpm{id="12",name="Fan1A"} 4560
ipmi_fan_speed_state{id="12",name="Fan1A"} 0
#### Voltage sensors
Voltage sensors measure a voltage in Volts. For each voltage sensor, two
metrics are exported (state and value), using the sensor ID and the sensor name
as labels. Example:
ipmi_voltage_state{id="2416",name="12V"} 0
ipmi_voltage_volts{id="2416",name="12V"} 12
#### Current sensors
Current sensors measure a current in Amperes. For each current sensor, two
metrics are exported (state and value), using the sensor ID and the sensor name
as labels. Example:
ipmi_current_state{id="83",name="Current 1"} 0
ipmi_current_amperes{id="83",name="Current 1"} 0
#### Power sensors
Power sensors measure power in Watts. For each power sensor, two metrics are
exported (state and value), using the sensor ID and the sensor name as labels.
Example:
ipmi_power_state{id="90",name="Pwr Consumption"} 0
ipmi_power_watts{id="90",name="Pwr Consumption"} 70
Note that based on our observations, this may or may not be a reading
reflecting the actual live power consumption. We recommend using the more
explicit [power consumption metrics](#power_consumption) for this.
#### Generic sensors
For all sensors that can not be classified, two generic metrics are exported,
the state and the value. However, to provide a little more context, the sensor
type is added as label (in addition to name and ID). Example:
ipmi_sensor_state{id="139",name="Power Cable",type="Cable/Interconnect"} 0
ipmi_sensor_value{id="139",name="Power Cable",type="Cable/Interconnect"} NaN