diff --git a/README.md b/README.md
index bc4ac193..3e5f042a 100644
--- a/README.md
+++ b/README.md
@@ -1,15 +1,17 @@
 # Alertmanager

 This is the development version of the Alertmanager. It is a rewrite and
-is only compatible to the present version 0.0.4 in terms of the API endpoint
-used by Prometheus to push new alerts.
+is incompatible with the present version 0.0.4. The only backport was the API endpoint used by Prometheus to push new alerts.

 ## Installation

+The current version has to be run from the repository folder as UI assets and notification templates are not yet statically compiled into the binary.
+
 You can either `go get` it:

 ```
 $ GO15VENDOREXPERIMENT=1 go get github.com/prometheus/alertmanager
+# cd $GOPATH/src/github.com/prometheus/alertmanager
 $ alertmanager -config.file=
 ```

@@ -28,12 +30,16 @@ $ ./alertmanager -config.file=

 This version was written from scratch. Core features enabled by this is are more advanced alert routing configurations and grouping/batching of alerts. Thus, squashing expression results through aggregation in alerting rules is no longer required to avoid noisyness.

+The concepts of alert routing were outlined in [this document](https://docs.google.com/document/d/1-4jefGkFo71jlaLo4lHz40ZBoCv9ycBBBbjzbXifGyY/edit?usp=sharing).
+
 The version implements full persistence of alerts, silences, and notification state. On restart it picks up right where it left off.

 ### Known issues

 This development version still has an extensive list of improvements and changes. This is an incomplete list of things that are still missing or need to be improved.

+This will happen based on priority and demand. Feel free to ping fabxc about it.
+
 * On deleting silences it may take up to one `group_wait` cycle for a notification of a previously silenced alert to be sent.
 * Limiting inhibition rules to routing subtrees to avoid accidental interference.
 * Show silencing inhibition of alerts in the UI
@@ -43,6 +49,136 @@ This development version still has an extensive list of improvements and changes
 * Definition of a minimum data set provided to notification templates
 * Best practices around notification templating
 * Various common command line flags like `path-prefix`
+* Compiling templates and UI assets into the binary
+* Allow constraining displayed alerts in UI
+
+## Example
+
+This is an example configuration that should cover most relevant aspects of the new YAML configuration format. The authoritative source for now is the [code](https://github.com/prometheus/alertmanager/tree/dev/config).
+
+```
+global:
+  # The smarthost and SMTP sender used for mail notifications.
+  smarthost: 'localhost:25'
+  smtp_sender: 'alertmanager@example.org'
+
+# The directory from which notification templates are read.
+templates:
+- 'template/*.tmpl'
+
+# The root route on which each incoming alert enters.
+route:
+  # The labels by which incoming alerts are grouped together. For example,
+  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
+  # be batched into a single group.
+  group_by: ['alertname', 'cluster']
+
+  # When a new group of alerts is created by an incoming alert, wait at
+  # least 'group_wait' to send the initial notification.
+  # This ensures that multiple alerts for the same group that start
+  # firing shortly after one another are batched together in the first
+  # notification.
+  group_wait: 30s
+
+  # When the first notification was sent, wait 'group_interval' to send a batch
+  # of new alerts that started firing for that group.
+  group_interval: 5m
+
+  # If an alert has successfully been sent, wait 'repeat_interval' to
+  # resend it.
+  repeat_interval: 3h
+
+  # If 'continue' is false, the first sub-route that matches this alert will
+  # terminate the search and the alert will be inserted at that routing node.
+  # If true, the alert is inserted into sibling nodes as well if there is a
+  # match.
+  # This allows first-match semantics (=false) in smaller scopes (e.g. team-level),
+  # while avoiding accidental shadowing (=true) of alerts at larger scopes (e.g. company-level).
+  continue: true
+
+  # All the above attributes are inherited by all child routes and can be
+  # overwritten on each.
+
+  # The child route trees.
+  routes:
+  # This route performs a regular expression match on alert labels to
+  # catch alerts that are related to a list of services.
+  - match_re:
+      service: ^(foo1|foo2|baz)$
+    send_to: team-X-mails
+
+    # The service has a sub-route for critical alerts. Any alerts
+    # that do not match, i.e. severity != critical, fall back to the
+    # parent node and are sent to 'team-X-mails'.
+    routes:
+    - match:
+        severity: critical
+      send_to: team-X-pager
+
+  - match:
+      service: files
+    send_to: team-Y-mails
+
+    routes:
+    - match:
+        severity: critical
+      send_to: team-Y-pager
+
+  # This route handles all alerts coming from a database service. If there's
+  # no team to handle it, it defaults to the DB team.
+  - match:
+      service: database
+
+    send_to: team-DB-pager
+    # Also group alerts by affected database.
+    group_by: [alertname, cluster, database]
+    continue: false
+
+    routes:
+    - match:
+        owner: team-X
+      send_to: team-X-pager
+
+    - match:
+        owner: team-Y
+      send_to: team-Y-pager
+
+
+# Inhibition rules allow muting a set of alerts given that another alert is
+# firing.
+# We use this to mute any warning-level notifications if the same alert is
+# already critical.
+inhibit_rules:
+- source_match:
+    severity: 'critical'
+  target_match:
+    severity: 'warning'
+  # Apply inhibition if the alertname is the same.
+  equal: ['alertname']
+notification_configs:
+- name: 'team-X-mails'
+  email_configs:
+  - email: 'team-X+alerts@example.org'
+
+- name: 'team-X-pager'
+  email_configs:
+  - email: 'team-X+alerts-critical@example.org'
+  pagerduty_configs:
+  - service_key:
+
+- name: 'team-Y-mails'
+  email_configs:
+  - email: 'team-Y+alerts@example.org'
+
+- name: 'team-Y-pager'
+  pagerduty_configs:
+  - service_key:
+
+- name: 'team-DB-pager'
+  pagerduty_configs:
+  - service_key:
+```
+
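
The routing comments in the example configuration describe a tree walk: an alert descends into matching child routes, falls back to the parent's `send_to` when no child matches, and `continue` decides whether later siblings are still tried. The following is a minimal Python sketch of one reading of those comments, not the Alertmanager implementation. The names `Route`, `resolve`, `cont`, and `root` are hypothetical, and two points are assumptions: that `continue` on a node controls whether its later siblings are checked once it has matched, and that an unset `continue` at the root behaves like `false`.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Route:
    """One routing node; None means 'inherit the value from the parent'."""
    send_to: Optional[str] = None
    match: Dict[str, str] = field(default_factory=dict)
    match_re: Dict[str, str] = field(default_factory=dict)
    cont: Optional[bool] = None  # stands in for 'continue', a Python keyword
    routes: List["Route"] = field(default_factory=list)

    def matches(self, labels: Dict[str, str]) -> bool:
        """True if every equality and regex matcher accepts the labels."""
        for name, want in self.match.items():
            if labels.get(name) != want:
                return False
        for name, pattern in self.match_re.items():
            if re.search(pattern, labels.get(name, "")) is None:
                return False
        return True


def resolve(route: Route, labels: Dict[str, str],
            send_to: Optional[str] = None, cont: bool = False) -> List[str]:
    """Return the receivers for an alert that has already matched `route`."""
    # Inherit attributes from the parent unless this node overrides them.
    send_to = route.send_to if route.send_to is not None else send_to
    cont = route.cont if route.cont is not None else cont

    receivers: List[str] = []
    for child in route.routes:
        if not child.matches(labels):
            continue
        receivers.extend(resolve(child, labels, send_to, cont))
        child_cont = child.cont if child.cont is not None else cont
        if not child_cont:
            break  # assumed first-match semantics: skip remaining siblings
    if receivers:
        return receivers
    # No child matched: fall back to this node's own receiver, if any.
    return [send_to] if send_to is not None else []


# A trimmed-down version of the route tree from the example configuration.
root = Route(
    cont=True,
    routes=[
        Route(
            match_re={"service": r"^(foo1|foo2|baz)$"},
            send_to="team-X-mails",
            routes=[Route(match={"severity": "critical"}, send_to="team-X-pager")],
        ),
        Route(
            match={"service": "database"},
            send_to="team-DB-pager",
            cont=False,
            routes=[
                Route(match={"owner": "team-X"}, send_to="team-X-pager"),
                Route(match={"owner": "team-Y"}, send_to="team-Y-pager"),
            ],
        ),
    ],
)

if __name__ == "__main__":
    print(resolve(root, {"service": "foo1", "severity": "critical"}))
```

Under these assumptions, a database alert without a known `owner` label ends up at the DB team's receiver, because none of the sub-routes match and the walk falls back to the parent node.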