2018-05-24 16:28:06 +02:00


Prometheus IPMI Exporter

This is an IPMI over LAN exporter for Prometheus.

An instance running on one host can be used to monitor a large number of IPMI interfaces by passing the target parameter to a scrape. It uses tools from the FreeIPMI suite for the actual IPMI communication.

Installation

You need a Go development environment. Then run the following to fetch the source code, build it, and install the binary:

go get github.com/soundcloud/ipmi_exporter

Running

A minimal invocation looks like this:

./ipmi_exporter

Supported parameters include:

  • web.listen-address: the address/port to listen on (default: ":9290")
  • config.file: path to the configuration file (default: ipmi.yml)
  • path: path to the FreeIPMI executables (default: rely on $PATH)

Make sure you have at least the following tools from the FreeIPMI suite installed:

  • ipmimonitoring
  • ipmi-dcmi
  • bmc-info
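A quick way to confirm that these tools are reachable is sketched below in Go. It only searches $PATH, so if you set the path parameter you would need to check that directory instead. The missingTools helper is hypothetical and not part of the exporter.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingTools returns the names from tools that cannot be found in $PATH.
func missingTools(tools []string) []string {
	var missing []string
	for _, t := range tools {
		if _, err := exec.LookPath(t); err != nil {
			missing = append(missing, t)
		}
	}
	return missing
}

func main() {
	required := []string{"ipmimonitoring", "ipmi-dcmi", "bmc-info"}
	if m := missingTools(required); len(m) > 0 {
		fmt.Println("missing FreeIPMI tools:", m)
		return
	}
	fmt.Println("all required FreeIPMI tools found")
}
```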

Configuration

The general configuration pattern is similar to that of the blackbox exporter, i.e. Prometheus scrapes a small number (possibly one) of IPMI exporters with a target URL parameter to tell the exporter which IPMI device it should use to retrieve the IPMI metrics. We have taken this approach as IPMI devices often provide useful information even while the supervised host is turned off. If you are running the exporter on a separate host anyway, it makes more sense to have only a few of them, each probing many (possibly thousands of) IPMI devices, rather than one exporter per IPMI device.

IPMI exporter

The exporter requires a configuration file called ipmi.yml (the name can be overridden, see above). It must contain usernames and passwords for IPMI access to all targets. It supports a “default” target, which is used as a fallback if a target is not explicitly listed in the file.

The configuration file also supports a blacklist of sensors. This is useful for OEM-specific sensors that FreeIPMI cannot handle properly, or for otherwise misbehaving sensors.

See the included ipmi.yml file for an example.
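The lookup rules described above can be sketched as follows. The Credentials and Config types are hypothetical in-memory stand-ins for ipmi.yml (check the shipped ipmi.yml for the real keys), and keying the blacklist by sensor ID is an assumption made for this sketch.

```go
package main

import "fmt"

// Credentials holds IPMI login data for one target (hypothetical type).
type Credentials struct {
	User     string
	Password string
}

// Config is a hypothetical in-memory form of ipmi.yml.
type Config struct {
	Credentials      map[string]Credentials // keyed by target; "default" is the fallback
	ExcludeSensorIDs []int64                // blacklisted sensor IDs (assumed to be IDs)
}

// CredentialsFor returns the credentials listed for target, falling back to
// the "default" entry when the target is not listed explicitly. The boolean
// is false only if neither entry exists.
func (c Config) CredentialsFor(target string) (Credentials, bool) {
	if creds, ok := c.Credentials[target]; ok {
		return creds, true
	}
	creds, ok := c.Credentials["default"]
	return creds, ok
}

// Excluded reports whether a sensor ID is on the blacklist.
func (c Config) Excluded(id int64) bool {
	for _, x := range c.ExcludeSensorIDs {
		if x == id {
			return true
		}
	}
	return false
}

func main() {
	cfg := Config{
		Credentials: map[string]Credentials{
			"default":   {User: "foo", Password: "bar"},
			"10.1.2.23": {User: "admin", Password: "secret"},
		},
		ExcludeSensorIDs: []int64{2, 29},
	}
	c, _ := cfg.CredentialsFor("10.1.2.24") // not listed, so "default" applies
	fmt.Println(c.User, cfg.Excluded(29))    // prints: foo true
}
```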

Prometheus

To add your IPMI targets to Prometheus, you can use any supported service discovery mechanism. The following example uses file-based SD and should be easy to adapt to other scenarios.

Create a YAML file that contains a list of targets, e.g.:

---
- targets:
  - 10.1.2.23
  - 10.1.2.24
  - 10.1.2.25
  - 10.1.2.26
  - 10.1.2.27
  - 10.1.2.28
  - 10.1.2.29
  - 10.1.2.30
  labels:
    job: ipmi_exporter

This file needs to be stored on the Prometheus server host. Assuming that this file is called /srv/ipmi_exporter/targets.yml, and the IPMI exporter is running on a host that has the DNS name ipmi-exporter.internal.example.com, add the following to the scrape_configs section of your Prometheus config:

- job_name: ipmi
  scrape_interval: 1m
  scrape_timeout: 30s
  metrics_path: /ipmi
  scheme: http
  file_sd_configs:
  - files:
    - /srv/ipmi_exporter/targets.yml
    refresh_interval: 5m
  relabel_configs:
  - source_labels: [__address__]
    separator: ;
    regex: (.*?)(:80)?
    target_label: __param_target
    replacement: ${1}
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: ${1}
    action: replace
  - separator: ;
    regex: .*
    target_label: __address__
    replacement: ipmi-exporter.internal.example.com:9290
    action: replace
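With these relabeling rules, Prometheus sends each scrape to the exporter rather than to the IPMI device: __address__ is replaced by the exporter's address, __param_target becomes the target query parameter, and the instance label keeps the device address so series stay distinguishable. The following Go sketch reconstructs the resulting URL; scrapeURL is a hypothetical helper, and it assumes the exporter listens on its default port 9290.

```go
package main

import (
	"fmt"
	"net/url"
)

// scrapeURL builds the URL Prometheus requests for one discovered target
// after relabeling: the host is the exporter, the device goes in "target".
func scrapeURL(exporterAddr, target string) string {
	u := url.URL{
		Scheme:   "http",  // scheme: http
		Host:     exporterAddr,
		Path:     "/ipmi", // metrics_path: /ipmi
		RawQuery: url.Values{"target": {target}}.Encode(),
	}
	return u.String()
}

func main() {
	fmt.Println(scrapeURL("ipmi-exporter.internal.example.com:9290", "10.1.2.23"))
	// prints: http://ipmi-exporter.internal.example.com:9290/ipmi?target=10.1.2.23
}
```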

For more information, e.g. how to use mechanisms other than a file to discover the list of hosts to scrape, please refer to the Prometheus documentation.

Exported data

Scrape metadata

There are two metrics providing data about the scrape itself:

  • ipmi_up is 1 if all data could successfully be retrieved from the remote host, 0 otherwise
  • ipmi_scrape_duration_seconds is the amount of time it took to retrieve the data

BMC info

For basic information about the BMC, there is a constant metric ipmi_bmc_info with value 1 and labels providing the firmware revision and manufacturer as reported by the BMC. Example:

ipmi_bmc_info{firmware_revision="2.52",manufacturer_id="Dell Inc. (674)"} 1
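Info-style metrics like this carry their data in labels. The following Go sketch renders such a line by hand in the Prometheus text exposition format, including the label-value escaping that format requires (backslash, double quote, line feed). The infoLine helper is hypothetical; an exporter would normally leave this to a client library.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// escapeLabelValue escapes a label value for the text exposition format.
func escapeLabelValue(v string) string {
	return strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`).Replace(v)
}

// infoLine renders a constant info metric: the value is always 1 and the
// data lives in the labels, which are sorted by name for stable output.
func infoLine(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, len(keys))
	for i, k := range keys {
		pairs[i] = fmt.Sprintf(`%s="%s"`, k, escapeLabelValue(labels[k]))
	}
	return fmt.Sprintf("%s{%s} 1", name, strings.Join(pairs, ","))
}

func main() {
	fmt.Println(infoLine("ipmi_bmc_info", map[string]string{
		"firmware_revision": "2.52",
		"manufacturer_id":   "Dell Inc. (674)",
	}))
	// prints: ipmi_bmc_info{firmware_revision="2.52",manufacturer_id="Dell Inc. (674)"} 1
}
```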

Power consumption

The metric ipmi_dcmi_power_consumption_current_watts can be used to monitor the live power consumption of the machine in Watts. If in doubt, use this metric rather than any of the sensor data (see below), even if a sensor's name suggests that it measures the same thing. This metric has no labels.

Sensors

IPMI sensors generally carry one or two pieces of information of interest: a value and/or a state. The exporter always exports both, even if the value is NaN or the state is nonsensical. This way the metrics can always be found, which avoids a situation where one looks for, e.g., the value of a sensor that is in a critical state, cannot find it, and assumes this to be a problem.

The state of a sensor can be one of nominal, warning, critical, or N/A, reflected by the metric values 0, 1, 2, and NaN respectively. Think of this as a kind of severity.

For sensors with known semantics (i.e. units), corresponding specific metrics are exported. For everything else, generic metrics are exported.
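The dispatch from unit to metric family can be sketched as follows, using the metric names documented in the sections below. The unit strings are an assumption about ipmimonitoring's output, and metricForUnit is hypothetical.

```go
package main

import "fmt"

// metricForUnit picks the value and state metric names for a sensor based
// on its unit. Unrecognised units fall back to the generic metrics, which
// additionally carry a "type" label.
func metricForUnit(unit string) (value, state string) {
	switch unit {
	case "C":
		return "ipmi_temperature_celsius", "ipmi_temperature_state"
	case "RPM":
		return "ipmi_fan_speed_rpm", "ipmi_fan_speed_state"
	case "V":
		return "ipmi_voltage_volts", "ipmi_voltage_state"
	case "A":
		return "ipmi_current_amperes", "ipmi_current_state"
	case "W":
		return "ipmi_power_watts", "ipmi_power_state"
	default:
		return "ipmi_sensor_value", "ipmi_sensor_state"
	}
}

func main() {
	for _, u := range []string{"C", "RPM", "N/A"} {
		v, s := metricForUnit(u)
		fmt.Println(u, v, s)
	}
}
```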

Temperature sensors

Temperature sensors measure a temperature in degrees Celsius and their state usually reflects the temperature going above the vendor-recommended value. For each temperature sensor, two metrics are exported (state and value), using the sensor ID and the sensor name as labels. Example:

ipmi_temperature_celsius{id="18",name="Inlet Temp"} 24
ipmi_temperature_state{id="18",name="Inlet Temp"} 0

Fan speed sensors

Fan speed sensors measure fan speed in rotations per minute (RPM) and their state usually reflects the speed being too low, indicating the fan might be broken. For each fan speed sensor, two metrics are exported (state and value), using the sensor ID and the sensor name as labels. Example:

ipmi_fan_speed_rpm{id="12",name="Fan1A"} 4560
ipmi_fan_speed_state{id="12",name="Fan1A"} 0

Voltage sensors

Voltage sensors measure a voltage in Volts. For each voltage sensor, two metrics are exported (state and value), using the sensor ID and the sensor name as labels. Example:

ipmi_voltage_state{id="2416",name="12V"} 0
ipmi_voltage_volts{id="2416",name="12V"} 12

Current sensors

Current sensors measure a current in Amperes. For each current sensor, two metrics are exported (state and value), using the sensor ID and the sensor name as labels. Example:

ipmi_current_state{id="83",name="Current 1"} 0
ipmi_current_amperes{id="83",name="Current 1"} 0

Power sensors

Power sensors measure power in Watts. For each power sensor, two metrics are exported (state and value), using the sensor ID and the sensor name as labels. Example:

ipmi_power_state{id="90",name="Pwr Consumption"} 0
ipmi_power_watts{id="90",name="Pwr Consumption"} 70

Note that, based on our observations, this may or may not be a reading of the actual live power consumption. We recommend using the more explicit power consumption metric, ipmi_dcmi_power_consumption_current_watts, for this.

Generic sensors

For all sensors that cannot be classified, two generic metrics are exported: the state and the value. To provide a little more context, the sensor type is added as a label (in addition to the name and ID). Example:

ipmi_sensor_state{id="139",name="Power Cable",type="Cable/Interconnect"} 0
ipmi_sensor_value{id="139",name="Power Cable",type="Cable/Interconnect"} NaN