Extend README by config example
parent
3d216d887b
commit
fc2245bb9f
140
README.md
@ -1,15 +1,17 @@
# Alertmanager

This is the development version of the Alertmanager. It is a rewrite and
is incompatible with the present version 0.0.4. The only backport was the API endpoint used by Prometheus to push new alerts.

## Installation

The current version has to be run from the repository folder, as UI assets and notification templates are not yet statically compiled into the binary.

You can either `go get` it:

```
$ GO15VENDOREXPERIMENT=1 go get github.com/prometheus/alertmanager
$ cd $GOPATH/src/github.com/prometheus/alertmanager
$ alertmanager -config.file=<your_file>
```

@ -28,12 +30,16 @@ $ ./alertmanager -config.file=<your_file>

This version was written from scratch. The core features this enables are more advanced alert routing configurations and grouping/batching of alerts. Thus, squashing expression results through aggregation in alerting rules is no longer required to avoid noisiness.

The concepts of alert routing were outlined in [this document](https://docs.google.com/document/d/1-4jefGkFo71jlaLo4lHz40ZBoCv9ycBBBbjzbXifGyY/edit?usp=sharing).

This version implements full persistence of alerts, silences, and notification state. On restart it picks up right where it left off.

### Known issues

This development version still has an extensive list of pending improvements and changes. This is an incomplete list of things that are still missing or need to be improved.

These will be addressed based on priority and demand. Feel free to ping fabxc about it.

* On deleting silences it may take up to one `group_wait` cycle for a notification of a previously silenced alert to be sent.
* Limiting inhibition rules to routing subtrees to avoid accidental interference.
* Show silencing inhibition of alerts in the UI

@ -43,6 +49,136 @@ This development version still has an extensive list of improvements and changes
* Definition of a minimum data set provided to notification templates
* Best practices around notification templating
* Various common command line flags like `path-prefix`
* Compiling templates and UI assets into the binary
* Allow constraining displayed alerts in UI

## Example

This is an example configuration that should cover most relevant aspects of the new YAML configuration format. The authoritative source for now is the [code](https://github.com/prometheus/alertmanager/tree/dev/config).

```
global:
  # The smarthost and SMTP sender used for mail notifications.
  smarthost: 'localhost:25'
  smtp_sender: 'alertmanager@example.org'

# The directory from which notification templates are read.
templates:
- 'template/*.tmpl'

# The root route on which each incoming alert enters.
route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname', 'cluster']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This ensures that multiple alerts for the same group that start
  # firing shortly after one another are batched together in the first
  # notification.
  group_wait: 30s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5m

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend it.
  repeat_interval: 3h

  # If 'continue' is false, the first sub-route that matches this alert will
  # terminate the search and the alert will be inserted at that routing node.
  # If true, the alert is inserted into sibling nodes as well if there is a
  # match.
  # This allows first-match semantics (=false) in smaller scopes (e.g. team-level),
  # while avoiding accidental shadowing (=true) of alerts at larger scopes (e.g. company-level).
  continue: true

  # All the above attributes are inherited by all child routes and can
  # be overwritten on each.

  # The child route trees.
  routes:
  # This route performs a regular expression match on alert labels to
  # catch alerts that are related to a list of services.
  - match_re:
      service: ^(foo1|foo2|baz)$
    send_to: team-X-mails

    # The service has a sub-route for critical alerts. Any alerts
    # that do not match, i.e. severity != critical, fall back to the
    # parent node and are sent to 'team-X-mails'.
    routes:
    - match:
        severity: critical
      send_to: team-X-pager

  - match:
      service: files
    send_to: team-Y-mails

    routes:
    - match:
        severity: critical
      send_to: team-Y-pager

  # This route handles all alerts coming from a database service. If there's
  # no team to handle it, it defaults to the DB team.
  - match:
      service: database

    send_to: team-DB-pager
    # Also group alerts by affected database.
    group_by: [alertname, cluster, database]
    continue: false

    routes:
    - match:
        owner: team-X
      send_to: team-X-pager

    - match:
        owner: team-Y
      send_to: team-Y-pager


# Inhibition rules allow muting a set of alerts given that another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  # Apply inhibition if the alertname is the same.
  equal: ['alertname']


notification_configs:
- name: 'team-X-mails'
  email_configs:
  - email: 'team-X+alerts@example.org'

- name: 'team-X-pager'
  email_configs:
  - email: 'team-X+alerts-critical@example.org'
  pagerduty_configs:
  - service_key: <team-X-key>

- name: 'team-Y-mails'
  email_configs:
  - email: 'team-Y+alerts@example.org'

- name: 'team-Y-pager'
  pagerduty_configs:
  - service_key: <team-Y-key>

- name: 'team-DB-pager'
  pagerduty_configs:
  - service_key: <team-DB-key>
```
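
The `continue` and fall-back behaviour described in the comments of the example can be sketched in Go as follows. This is only an illustration of the semantics as the comments describe them, not the Alertmanager implementation: the `Route` type, `Receivers`, and `exampleTree` are hypothetical names, only equality matchers are modelled (`match_re` is left out), and inheritance of attributes is not implemented, so `Continue` is set explicitly on every node.

```go
package main

import "fmt"

// Route is a simplified, hypothetical routing node (not the Alertmanager type).
type Route struct {
	Match    map[string]string // labels that must all be present with these values
	SendTo   string            // notification config to deliver to
	Continue bool              // keep trying sibling routes after a match
	Routes   []*Route          // child routes
}

// matches reports whether the alert carries every label the route requires.
func (r *Route) matches(labels map[string]string) bool {
	for k, v := range r.Match {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// Receivers returns the send_to targets an alert ends up at. Children are
// tried in order; a matching child with Continue == false ends the search.
// If no child matches, the alert falls back to the node itself.
func (r *Route) Receivers(labels map[string]string) []string {
	if !r.matches(labels) {
		return nil
	}
	var out []string
	for _, child := range r.Routes {
		got := child.Receivers(labels)
		out = append(out, got...)
		if len(got) > 0 && !child.Continue {
			break
		}
	}
	if len(out) == 0 && r.SendTo != "" {
		out = []string{r.SendTo}
	}
	return out
}

// exampleTree mirrors the 'files' and 'database' routes of the example config.
func exampleTree() *Route {
	return &Route{
		Continue: true,
		Routes: []*Route{
			{
				Match:    map[string]string{"service": "files"},
				SendTo:   "team-Y-mails",
				Continue: true,
				Routes: []*Route{
					{Match: map[string]string{"severity": "critical"}, SendTo: "team-Y-pager", Continue: true},
				},
			},
			{
				Match:    map[string]string{"service": "database"},
				SendTo:   "team-DB-pager",
				Continue: false,
				Routes: []*Route{
					{Match: map[string]string{"owner": "team-X"}, SendTo: "team-X-pager"},
					{Match: map[string]string{"owner": "team-Y"}, SendTo: "team-Y-pager"},
				},
			},
		},
	}
}

func main() {
	root := exampleTree()
	// Critical 'files' alert: handled by the critical sub-route.
	fmt.Println(root.Receivers(map[string]string{"service": "files", "severity": "critical"}))
	// Non-critical 'files' alert: falls back to the parent node.
	fmt.Println(root.Receivers(map[string]string{"service": "files", "severity": "warning"}))
	// Database alert without an owner: defaults to the DB team.
	fmt.Println(root.Receivers(map[string]string{"service": "database"}))
}
```

Under these assumptions, the three calls print `[team-Y-pager]`, `[team-Y-mails]`, and `[team-DB-pager]`.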
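
The inhibition rule in the example (mute warning-level alerts while the same alert is critical) can be illustrated in the same way. This is a minimal sketch of the rule as its comments describe it, assuming plain equality matching. `Alert`, `InhibitRule`, and `Inhibits` are hypothetical names and not part of the Alertmanager code base.

```go
package main

import "fmt"

// Alert is a hypothetical alert represented only by its label set.
type Alert map[string]string

// InhibitRule mirrors the three keys of an 'inhibit_rules' entry.
type InhibitRule struct {
	SourceMatch map[string]string // labels the muting (source) alert must carry
	TargetMatch map[string]string // labels the muted (target) alert must carry
	Equal       []string          // label names whose values must agree
}

// hasLabels reports whether the alert carries all wanted label values.
func hasLabels(a Alert, want map[string]string) bool {
	for k, v := range want {
		if a[k] != v {
			return false
		}
	}
	return true
}

// Inhibits reports whether a firing source alert mutes the target alert.
func (r InhibitRule) Inhibits(source, target Alert) bool {
	if !hasLabels(source, r.SourceMatch) || !hasLabels(target, r.TargetMatch) {
		return false
	}
	for _, name := range r.Equal {
		if source[name] != target[name] {
			return false
		}
	}
	return true
}

func main() {
	rule := InhibitRule{
		SourceMatch: map[string]string{"severity": "critical"},
		TargetMatch: map[string]string{"severity": "warning"},
		Equal:       []string{"alertname"},
	}
	critical := Alert{"alertname": "LatencyHigh", "severity": "critical"}
	warning := Alert{"alertname": "LatencyHigh", "severity": "warning"}
	other := Alert{"alertname": "DiskFull", "severity": "warning"}

	fmt.Println(rule.Inhibits(critical, warning)) // same alertname: muted
	fmt.Println(rule.Inhibits(critical, other))   // different alertname: not muted
}
```

With these inputs the program prints `true` and then `false`.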