# Alertmanager
Prometheus Alertmanager - **still experimental!**

Alertmanager receives alerts generated by Prometheus and takes care of the
following aspects:

* manual silencing of specific alerts
* inhibiting alerts based on alert dependencies
* aggregating alerts by labelset
* handling notification repeats
* sending alert notifications via external services (currently email,
  [PagerDuty](http://www.pagerduty.com/),
  [HipChat](http://www.hipchat.com/), or
  [Pushover](https://www.pushover.net/))

See [config/fixtures/sample.conf.input](config/fixtures/sample.conf.input) for
an example config. The full configuration schema, including documentation for
all available options, is in [config/config.proto](config/config.proto).
Alertmanager automatically reloads the configuration when it changes, so
configuration updates do not require a restart.
## Building and running

    make
    ./alertmanager -logtostderr -config.file=/path/to/alertmanager.conf
## Configuring Prometheus to send alerts
To make Prometheus send alerts to your Alertmanager, set the `alertmanager.url`
command-line flag on the server:

    ./prometheus -alertmanager.url=http://<alertmanager-host>:<port> <...other flags...>

Prometheus only pushes firing alerts to Alertmanager, and Alertmanager expects
Prometheus to push each firing alert again at regular intervals. Alerts that
are not refreshed within `-alerts.min-refresh-period` (5 minutes by default)
are expired.

As a result, Alertmanager only shows alerts that are currently firing and are
still being pushed to it.
## Running tests

[![Build Status](https://travis-ci.org/prometheus/alertmanager.svg)](https://travis-ci.org/prometheus/alertmanager)

    make test
## Caveats and roadmap
Alertmanager is still in an experimental state. Known caveats that we plan to
address in the future:

* Alertmanager runs as a single instance and does not provide high
  availability yet. We plan to cluster multiple replicated Alertmanager
  instances to ensure reliability.
* Relatedly, silence information is currently persisted only in a local file
  and is lost if you lose the machine your Alertmanager is running on.
* Alert aggregation needs to become more flexible. Currently, alerts are
  aggregated based on their full labelsets. In the future, we want to allow
  grouping alerts based on a subset thereof (for example, grouping all alerts
  with one alert name and from the same job together).
* For alert dependencies, we want to support time delays: if alert A inhibits
  alert B due to a dependency and B begins firing before A, wait for a
  configurable amount of time for A to start firing as well before sending
  notifications for B. This is not yet supported.
* Alertmanager has not been tested or optimized for high alert loads yet.
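To make the aggregation caveat more concrete, the following Go sketch contrasts grouping by full labelset with grouping by a subset of labels. It is only an illustration of the idea: the `labelSet` type, the `groupKey` and `groupBy` helpers, and the sample label values are hypothetical and are not taken from the Alertmanager code base.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelSet is a hypothetical stand-in for an alert's labels.
type labelSet map[string]string

// groupKey builds a deterministic key from the given label names only.
// Passing nil for names uses every label, i.e. the full labelset.
// A label that is missing from ls contributes an empty value.
func groupKey(ls labelSet, names []string) string {
	if names == nil {
		for n := range ls {
			names = append(names, n)
		}
	}
	// Sort a copy so the key does not depend on map or argument order.
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)

	parts := make([]string, 0, len(sorted))
	for _, n := range sorted {
		parts = append(parts, fmt.Sprintf("%s=%q", n, ls[n]))
	}
	return strings.Join(parts, ",")
}

// groupBy buckets alerts by the key derived from the given label names.
func groupBy(alerts []labelSet, names []string) map[string][]labelSet {
	groups := make(map[string][]labelSet)
	for _, a := range alerts {
		k := groupKey(a, names)
		groups[k] = append(groups[k], a)
	}
	return groups
}

func main() {
	alerts := []labelSet{
		{"alertname": "InstanceDown", "job": "api", "instance": "a:80"},
		{"alertname": "InstanceDown", "job": "api", "instance": "b:80"},
		{"alertname": "InstanceDown", "job": "db", "instance": "c:80"},
	}

	// Full labelsets: every alert differs by instance, so three groups.
	fmt.Println(len(groupBy(alerts, nil))) // prints 3

	// Subset (alert name and job): the two "api" alerts share a group.
	fmt.Println(len(groupBy(alerts, []string{"alertname", "job"}))) // prints 2
}
```

Grouping on the subset collapses the two `api` alerts into one group, which is the kind of behavior the roadmap item describes.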