diff --git a/docs/alertmanager.md b/docs/alertmanager.md new file mode 100644 index 00000000..8e6eabab --- /dev/null +++ b/docs/alertmanager.md @@ -0,0 +1,76 @@ +--- +title: Alertmanager +sort_rank: 2 +nav_icon: sliders +--- + +# Alertmanager + +The [Alertmanager](https://github.com/prometheus/alertmanager) handles alerts +sent by client applications such as the Prometheus server. +It takes care of deduplicating, grouping, and routing +them to the correct receiver integration such as email, PagerDuty, or OpsGenie. +It also takes care of silencing and inhibition of alerts. + +The following describes the core concepts the Alertmanager implements. Consult +the [configuration documentation](configuration.md) to learn how to use them +in more detail. + +## Grouping + +Grouping categorizes alerts of similar nature into a single notification. This +is especially useful during larger outages when many systems fail at once and +hundreds to thousands of alerts may be firing simultaneously. + +**Example:** Dozens or hundreds of instances of a service are running in your +cluster when a network partition occurs. Half of your service instances +can no longer reach the database. +Alerting rules in Prometheus were configured to send an alert for each service +instance if it cannot communicate with the database. As a result hundreds of +alerts are sent to Alertmanager. + +As a user, one only wants to get a single page while still being able to see +exactly which service instances were affected. Thus one can configure +Alertmanager to group alerts by their cluster and alertname so it sends a +single compact notification. + +Grouping of alerts, timing for the grouped notifications, and the receivers +of those notifications are configured by a routing tree in the configuration +file. + +## Inhibition + +Inhibition is a concept of suppressing notifications for certain alerts if +certain other alerts are already firing. 
+ +**Example:** An alert is firing that informs that an entire cluster is not +reachable. Alertmanager can be configured to mute all other alerts concerning +this cluster if that particular alert is firing. +This prevents notifications for hundreds or thousands of firing alerts that +are unrelated to the actual issue. + +Inhibitions are configured through the Alertmanager's configuration file. + +## Silences + +Silences are a straightforward way to simply mute alerts for a given time. +A silence is configured based on matchers, just like the routing tree. Incoming +alerts are checked whether they match all the equality or regular expression +matchers of an active silence. +If they do, no notifications will be sent out for that alert. + +Silences are configured in the web interface of the Alertmanager. + + +## Client behavior + +The Alertmanager has [special requirements](clients.md) for behavior of its +client. Those are only relevant for advanced use cases where Prometheus +is not used to send alerts. + +## High Availability + +Alertmanager supports configuration to create a cluster for high availability. +This can be configured using the [--cluster-*](https://github.com/prometheus/alertmanager#high-availability) flags. + +It's important not to load balance traffic between Prometheus and its Alertmanagers, but instead, point Prometheus to a list of all Alertmanagers. diff --git a/docs/clients.md b/docs/clients.md new file mode 100644 index 00000000..383658b0 --- /dev/null +++ b/docs/clients.md @@ -0,0 +1,52 @@ +--- +title: Clients +sort_rank: 6 +nav_icon: sliders +--- + +# Sending alerts + +__**Disclaimer**: Prometheus automatically takes care of sending alerts +generated by its configured [alerting +rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/). 
It is highly +recommended to configure alerting rules in Prometheus based on time series data +rather than implementing a direct client.__ + +The Alertmanager has two APIs, v1 and v2, both listening for alerts. The schema +for v1 is described in the code snippet below. The schema for v2 is specified as +an OpenAPI specification that can be found in the [Alertmanager +repository](https://github.com/prometheus/alertmanager/blob/master/api/v2/openapi.yaml). +Clients are expected to continuously re-send alerts as long as they are still +active (usually on the order of 30 seconds to 3 minutes). Clients can push a +list of alerts to Alertmanager via a POST request. + +The labels of each alert are used to identify identical instances of an alert +and to perform deduplication. The annotations are always set to those received +most recently and do not identify an alert. + +Both the `startsAt` and `endsAt` timestamps are optional. If `startsAt` is omitted, +the current time is assigned by the Alertmanager. `endsAt` is only set if the +end time of an alert is known. Otherwise it will be set to a configurable +timeout period after the time the alert was last received. + +The `generatorURL` field is a unique back-link which identifies the causing +entity of this alert in the client. + +```json +[ + { + "labels": { + "alertname": "", + "": "", + ... + }, + "annotations": { + "": "", + }, + "startsAt": "", + "endsAt": "", + "generatorURL": "" + }, + ... +] +``` diff --git a/docs/configuration.md b/docs/configuration.md new file mode 100644 index 00000000..e9991b43 --- /dev/null +++ b/docs/configuration.md @@ -0,0 +1,716 @@ +--- +title: Configuration +sort_rank: 3 +nav_icon: sliders +--- + +# Configuration + +[Alertmanager](https://github.com/prometheus/alertmanager) is configured via +command-line flags and a configuration file. 
+While the command-line flags configure immutable system parameters, the +configuration file defines inhibition rules, notification routing and +notification receivers. + +The [visual editor](/webtools/alerting/routing-tree-editor) can assist in +building routing trees. + +To view all available command-line flags, run `alertmanager -h`. + +Alertmanager can reload its configuration at runtime. If the new configuration +is not well-formed, the changes will not be applied and an error is logged. +A configuration reload is triggered by sending a `SIGHUP` to the process or +sending an HTTP POST request to the `/-/reload` endpoint. + +## Configuration file + +To specify which configuration file to load, use the `--config.file` flag. + +```bash +./alertmanager --config.file=alertmanager.yml +``` + +The file is written in the [YAML format](http://en.wikipedia.org/wiki/YAML), +defined by the schema described below. +Brackets indicate that a parameter is optional. For non-list parameters the +value is set to the specified default. + +Generic placeholders are defined as follows: + +* ``: a duration matching the regular expression `[0-9]+(ms|[smhdwy])` +* ``: a string matching the regular expression `[a-zA-Z_][a-zA-Z0-9_]*` +* ``: a string of unicode characters +* ``: a valid path in the current working directory +* ``: a boolean that can take the values `true` or `false` +* ``: a regular string +* ``: a regular string that is a secret, such as a password +* ``: a string which is template-expanded before usage +* ``: a string which is template-expanded before usage that is a secret +* ``: an integer value + +The other placeholders are specified separately. + +A provided [valid example file](https://github.com/prometheus/alertmanager/blob/master/doc/examples/simple.yml) +shows usage in context. + +The global configuration specifies parameters that are valid in all other +configuration contexts. They also serve as defaults for other configuration +sections. 
+ +```yaml +global: + # The default SMTP From header field. + [ smtp_from: ] + # The default SMTP smarthost used for sending emails, including port number. + # Port number usually is 25, or 587 for SMTP over TLS (sometimes referred to as STARTTLS). + # Example: smtp.example.org:587 + [ smtp_smarthost: ] + # The default hostname to identify to the SMTP server. + [ smtp_hello: | default = "localhost" ] + # SMTP Auth using CRAM-MD5, LOGIN and PLAIN. If empty, Alertmanager doesn't authenticate to the SMTP server. + [ smtp_auth_username: ] + # SMTP Auth using LOGIN and PLAIN. + [ smtp_auth_password: ] + # SMTP Auth using PLAIN. + [ smtp_auth_identity: ] + # SMTP Auth using CRAM-MD5. + [ smtp_auth_secret: ] + # The default SMTP TLS requirement. + # Note that Go does not support unencrypted connections to remote SMTP endpoints. + [ smtp_require_tls: | default = true ] + + # The API URL to use for Slack notifications. + [ slack_api_url: ] + [ victorops_api_key: ] + [ victorops_api_url: | default = "https://alert.victorops.com/integrations/generic/20131114/alert/" ] + [ pagerduty_url: | default = "https://events.pagerduty.com/v2/enqueue" ] + [ opsgenie_api_key: ] + [ opsgenie_api_url: | default = "https://api.opsgenie.com/" ] + [ wechat_api_url: | default = "https://qyapi.weixin.qq.com/cgi-bin/" ] + [ wechat_api_secret: ] + [ wechat_api_corp_id: ] + + # The default HTTP client configuration + [ http_config: ] + + # ResolveTimeout is the default value used by alertmanager if the alert does + # not include EndsAt, after this time passes it can declare the alert as resolved if it has not been updated. + # This has no impact on alerts from Prometheus, as they always include EndsAt. + [ resolve_timeout: | default = 5m ] + +# Files from which custom notification template definitions are read. +# The last component may use a wildcard matcher, e.g. 'templates/*.tmpl'. +templates: + [ - ... ] + +# The root node of the routing tree. +route: + +# A list of notification receivers. 
+receivers: + - ... + +# A list of inhibition rules. +inhibit_rules: + [ - ... ] +``` + +## `` + +A route block defines a node in a routing tree and its children. Its optional +configuration parameters are inherited from its parent node if not set. + +Every alert enters the routing tree at the configured top-level route, which +must match all alerts (i.e. not have any configured matchers). +It then traverses the child nodes. If `continue` is set to false, it stops +after the first matching child. If `continue` is true on a matching node, the +alert will continue matching against subsequent siblings. +If an alert does not match any children of a node (no matching child nodes, or +none exist), the alert is handled based on the configuration parameters of the +current node. + +```yaml +[ receiver: ] +# The labels by which incoming alerts are grouped together. For example, +# multiple alerts coming in for cluster=A and alertname=LatencyHigh would +# be batched into a single group. +# +# To aggregate by all possible labels use the special value '...' as the sole label name, for example: +# group_by: ['...'] +# This effectively disables aggregation entirely, passing through all +# alerts as-is. This is unlikely to be what you want, unless you have +# a very low alert volume or your upstream notification system performs +# its own grouping. +[ group_by: '[' , ... ']' ] + +# Whether an alert should continue matching subsequent sibling nodes. +[ continue: | default = false ] + +# A set of equality matchers an alert has to fulfill to match the node. +match: + [ : , ... ] + +# A set of regex-matchers an alert has to fulfill to match the node. +match_re: + [ : , ... ] + +# How long to initially wait to send a notification for a group +# of alerts. Allows to wait for an inhibiting alert to arrive or collect +# more initial alerts for the same group. (Usually ~0s to few minutes.) 
+[ group_wait: | default = 30s ] + +# How long to wait before sending a notification about new alerts that +# are added to a group of alerts for which an initial notification has +# already been sent. (Usually ~5m or more.) +[ group_interval: | default = 5m ] + +# How long to wait before sending a notification again if it has already +# been sent successfully for an alert. (Usually ~3h or more). +[ repeat_interval: | default = 4h ] + +# Zero or more child routes. +routes: + [ - ... ] +``` + +### Example + +```yaml +# The root route with all parameters, which are inherited by the child +# routes if they are not overwritten. +route: + receiver: 'default-receiver' + group_wait: 30s + group_interval: 5m + repeat_interval: 4h + group_by: [cluster, alertname] + # All alerts that do not match the following child routes + # will remain at the root node and be dispatched to 'default-receiver'. + routes: + # All alerts with service=mysql or service=cassandra + # are dispatched to the database pager. + - receiver: 'database-pager' + group_wait: 10s + match_re: + service: mysql|cassandra + # All alerts with the team=frontend label match this sub-route. + # They are grouped by product and environment rather than cluster + # and alertname. + - receiver: 'frontend-pager' + group_by: [product, environment] + match: + team: frontend +``` + +## `` + +An inhibition rule mutes an alert (target) matching a set of matchers +when an alert (source) exists that matches another set of matchers. +Both target and source alerts must have the same label values +for the label names in the `equal` list. + +Semantically, a missing label and a label with an empty value are the same +thing. Therefore, if all the label names listed in `equal` are missing from +both the source and target alerts, the inhibition rule will apply. 
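+
+As an illustration of the `equal` semantics described above, the following is a
+minimal sketch of an inhibition rule. The label names and values used here
+(`severity`, `critical`, `warning`, `cluster`) are assumptions chosen for this
+example; Alertmanager does not require them.
+
+```yaml
+inhibit_rules:
+  # Mute warning-level alerts (target) while a critical alert (source)
+  # with the same alertname and cluster is firing.
+  - source_match:
+      severity: 'critical'
+    target_match:
+      severity: 'warning'
+    # Source and target must carry equal values for these labels. A label
+    # missing from both alerts counts as equal, since a missing label is
+    # treated like an empty value.
+    equal: ['alertname', 'cluster']
+```
+
+Because the source and target matchers in this sketch require different
+`severity` values, no alert can match both sides of the rule.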
+ +To prevent an alert from inhibiting itself, an alert that matches _both_ the +target and the source side of a rule cannot be inhibited by alerts for which +the same is true (including itself). However, we recommend to choose target and +source matchers in a way that alerts never match both sides. It is much easier +to reason about and does not trigger this special case. + +```yaml +# Matchers that have to be fulfilled in the alerts to be muted. +target_match: + [ : , ... ] +target_match_re: + [ : , ... ] + +# Matchers for which one or more alerts have to exist for the +# inhibition to take effect. +source_match: + [ : , ... ] +source_match_re: + [ : , ... ] + +# Labels that must have an equal value in the source and target +# alert for the inhibition to take effect. +[ equal: '[' , ... ']' ] + +``` + +## `` + +A `http_config` allows configuring the HTTP client that the receiver uses to +communicate with HTTP-based API services. + +```yaml +# Note that `basic_auth`, `bearer_token` and `bearer_token_file` options are +# mutually exclusive. + +# Sets the `Authorization` header with the configured username and password. +# password and password_file are mutually exclusive. +basic_auth: + [ username: ] + [ password: ] + [ password_file: ] + +# Sets the `Authorization` header with the configured bearer token. +[ bearer_token: ] + +# Sets the `Authorization` header with the bearer token read from the configured file. +[ bearer_token_file: ] + +# Configures the TLS settings. +tls_config: + [ ] + +# Optional proxy URL. +[ proxy_url: ] +``` + +## `` + +A `tls_config` allows configuring TLS connections. + +```yaml +# CA certificate to validate the server certificate with. +[ ca_file: ] + +# Certificate and key files for client cert authentication to the server. +[ cert_file: ] +[ key_file: ] + +# ServerName extension to indicate the name of the server. +# http://tools.ietf.org/html/rfc4366#section-3.1 +[ server_name: ] + +# Disable validation of the server certificate. 
+[ insecure_skip_verify: | default = false] +``` + +## `` + +Receiver is a named configuration of one or more notification integrations. + +__We're not actively adding new receivers, we recommend implementing custom notification integrations via the [webhook](#webhook_config) receiver.__ + +```yaml +# The unique name of the receiver. +name: + +# Configurations for several notification integrations. +email_configs: + [ - , ... ] +pagerduty_configs: + [ - , ... ] +pushover_configs: + [ - , ... ] +slack_configs: + [ - , ... ] +opsgenie_configs: + [ - , ... ] +webhook_configs: + [ - , ... ] +victorops_configs: + [ - , ... ] +wechat_configs: + [ - , ... ] +``` + +## `` + +```yaml +# Whether or not to notify about resolved alerts. +[ send_resolved: | default = false ] + +# The email address to send notifications to. +to: + +# The sender address. +[ from: | default = global.smtp_from ] + +# The SMTP host through which emails are sent. +[ smarthost: | default = global.smtp_smarthost ] + +# The hostname to identify to the SMTP server. +[ hello: | default = global.smtp_hello ] + +# SMTP authentication information. +[ auth_username: | default = global.smtp_auth_username ] +[ auth_password: | default = global.smtp_auth_password ] +[ auth_secret: | default = global.smtp_auth_secret ] +[ auth_identity: | default = global.smtp_auth_identity ] + +# The SMTP TLS requirement. +# Note that Go does not support unencrypted connections to remote SMTP endpoints. +[ require_tls: | default = global.smtp_require_tls ] + +# TLS configuration. +tls_config: + [ ] + +# The HTML body of the email notification. +[ html: | default = '{{ template "email.default.html" . }}' ] +# The text body of the email notification. +[ text: ] + +# Further headers email header key/value pairs. Overrides any headers +# previously set by the notification implementation. +[ headers: { : , ... 
} ] +``` + +## `` + +PagerDuty notifications are sent via the [PagerDuty API](https://developer.pagerduty.com/documentation/integration/events). +PagerDuty provides [documentation](https://www.pagerduty.com/docs/guides/prometheus-integration-guide/) on how to integrate. There are important differences with Alertmanager's v0.11 and greater support of PagerDuty's Events API v2. + +```yaml +# Whether or not to notify about resolved alerts. +[ send_resolved: | default = true ] + +# The following two options are mutually exclusive. +# The PagerDuty integration key (when using PagerDuty integration type `Events API v2`). +routing_key: +# The PagerDuty integration key (when using PagerDuty integration type `Prometheus`). +service_key: + +# The URL to send API requests to +[ url: | default = global.pagerduty_url ] + +# The client identification of the Alertmanager. +[ client: | default = '{{ template "pagerduty.default.client" . }}' ] +# A backlink to the sender of the notification. +[ client_url: | default = '{{ template "pagerduty.default.clientURL" . }}' ] + +# A description of the incident. +[ description: | default = '{{ template "pagerduty.default.description" .}}' ] + +# Severity of the incident. +[ severity: | default = 'error' ] + +# A set of arbitrary key/value pairs that provide further detail +# about the incident. +[ details: { : , ... } | default = { + firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}' + resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}' + num_firing: '{{ .Alerts.Firing | len }}' + num_resolved: '{{ .Alerts.Resolved | len }}' +} ] + +# Images to attach to the incident. +images: + [ ... ] + +# Links to attach to the incident. +links: + [ ... ] + +# The HTTP client's configuration. 
+[ http_config: | default = global.http_config ] +``` + +### `` + +The fields are documented in the [PagerDuty API documentation](https://v2.developer.pagerduty.com/v2/docs/send-an-event-events-api-v2#section-the-images-property). + +```yaml +href: +source: +alt: +``` + +### `` + +The fields are documented in the [PagerDuty API documentation](https://v2.developer.pagerduty.com/v2/docs/send-an-event-events-api-v2#section-the-links-property). + +```yaml +href: +text: +``` + +## `` + +Pushover notifications are sent via the [Pushover API](https://pushover.net/api). + +```yaml +# Whether or not to notify about resolved alerts. +[ send_resolved: | default = true ] + +# The recipient user’s user key. +user_key: + +# Your registered application’s API token, see https://pushover.net/apps +token: + +# Notification title. +[ title: | default = '{{ template "pushover.default.title" . }}' ] + +# Notification message. +[ message: | default = '{{ template "pushover.default.message" . }}' ] + +# A supplementary URL shown alongside the message. +[ url: | default = '{{ template "pushover.default.url" . }}' ] + +# Priority, see https://pushover.net/api#priority +[ priority: | default = '{{ if eq .Status "firing" }}2{{ else }}0{{ end }}' ] + +# How often the Pushover servers will send the same notification to the user. +# Must be at least 30 seconds. +[ retry: | default = 1m ] + +# How long your notification will continue to be retried for, unless the user +# acknowledges the notification. +[ expire: | default = 1h ] + +# The HTTP client's configuration. +[ http_config: | default = global.http_config ] +``` + +## `` + +Slack notifications are sent via [Slack +webhooks](https://api.slack.com/incoming-webhooks). The notification contains +an [attachment](https://api.slack.com/docs/message-attachments). + +```yaml +# Whether or not to notify about resolved alerts. +[ send_resolved: | default = false ] + +# The Slack webhook URL. 
+[ api_url: | default = global.slack_api_url ] + +# The channel or user to send notifications to. +channel: + +# API request data as defined by the Slack webhook API. +[ icon_emoji: ] +[ icon_url: ] +[ link_names: | default = false ] +[ username: | default = '{{ template "slack.default.username" . }}' ] +# The following parameters define the attachment. +actions: + [ ... ] +[ callback_id: | default = '{{ template "slack.default.callbackid" . }}' ] +[ color: | default = '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}' ] +[ fallback: | default = '{{ template "slack.default.fallback" . }}' ] +fields: + [ ... ] +[ footer: | default = '{{ template "slack.default.footer" . }}' ] +[ mrkdwn_in: '[' , ... ']' | default = ["fallback", "pretext", "text"] ] +[ pretext: | default = '{{ template "slack.default.pretext" . }}' ] +[ short_fields: | default = false ] +[ text: | default = '{{ template "slack.default.text" . }}' ] +[ title: | default = '{{ template "slack.default.title" . }}' ] +[ title_link: | default = '{{ template "slack.default.titlelink" . }}' ] +[ image_url: ] +[ thumb_url: ] + +# The HTTP client's configuration. +[ http_config: | default = global.http_config ] +``` + +### `` + +The fields are documented in the Slack API documentation for [message attachments](https://api.slack.com/docs/message-attachments#action_fields) and [interactive messages](https://api.slack.com/docs/interactive-message-field-guide#action_fields). + +```yaml +text: +type: +# Either url or name and value are mandatory. +[ url: ] +[ name: ] +[ value: ] + +[ confirm: ] +[ style: | default = '' ] +``` + +#### `` + +The fields are documented in the [Slack API documentation](https://api.slack.com/docs/interactive-message-field-guide#confirmation_fields). 
+ +```yaml +text: +[ dismiss_text: | default = '' ] +[ ok_text: | default = '' ] +[ title: | default = '' ] +``` + +### `` + +The fields are documented in the [Slack API documentation](https://api.slack.com/docs/message-attachments#fields). + +```yaml +title: +value: +[ short: | default = slack_config.short_fields ] +``` + +## `` + +OpsGenie notifications are sent via the [OpsGenie API](https://docs.opsgenie.com/docs/alert-api). + +```yaml +# Whether or not to notify about resolved alerts. +[ send_resolved: | default = true ] + +# The API key to use when talking to the OpsGenie API. +[ api_key: | default = global.opsgenie_api_key ] + +# The host to send OpsGenie API requests to. +[ api_url: | default = global.opsgenie_api_url ] + +# Alert text limited to 130 characters. +[ message: ] + +# A description of the incident. +[ description: | default = '{{ template "opsgenie.default.description" . }}' ] + +# A backlink to the sender of the notification. +[ source: | default = '{{ template "opsgenie.default.source" . }}' ] + +# A set of arbitrary key/value pairs that provide further detail +# about the incident. +[ details: { : , ... } ] + +# List of responders responsible for notifications. +responders: + [ - ... ] + +# Comma separated list of tags attached to the notifications. +[ tags: ] + +# Additional alert note. +[ note: ] + +# Priority level of alert. Possible values are P1, P2, P3, P4, and P5. +[ priority: ] + +# The HTTP client's configuration. +[ http_config: | default = global.http_config ] +``` + +### `` + +```yaml +# Exactly one of these fields should be defined. +[ id: ] +[ name: ] +[ username: ] + +# "team", "user", "escalation" or "schedule". +type: +``` + +## `` + +VictorOps notifications are sent out via the [VictorOps API](https://help.victorops.com/knowledge-base/victorops-restendpoint-integration/). + +```yaml +# Whether or not to notify about resolved alerts. +[ send_resolved: | default = true ] + +# The API key to use when talking to the VictorOps API. 
+[ api_key: | default = global.victorops_api_key ] + +# The VictorOps API URL. +[ api_url: | default = global.victorops_api_url ] + +# A key used to map the alert to a team. +routing_key: + +# Describes the behavior of the alert (CRITICAL, WARNING, INFO). +[ message_type: | default = 'CRITICAL' ] + +# Contains summary of the alerted problem. +[ entity_display_name: | default = '{{ template "victorops.default.entity_display_name" . }}' ] + +# Contains long explanation of the alerted problem. +[ state_message: | default = '{{ template "victorops.default.state_message" . }}' ] + +# The monitoring tool the state message is from. +[ monitoring_tool: | default = '{{ template "victorops.default.monitoring_tool" . }}' ] + +# The HTTP client's configuration. +[ http_config: | default = global.http_config ] +``` + +## `` + +The webhook receiver allows configuring a generic receiver. + +```yaml +# Whether or not to notify about resolved alerts. +[ send_resolved: | default = true ] + +# The endpoint to send HTTP POST requests to. +url: + +# The HTTP client's configuration. +[ http_config: | default = global.http_config ] + +# The maximum number of alerts to include in a single webhook message. Alerts +# above this threshold are truncated. When leaving this at its default value of +# 0, all alerts are included. +[ max_alerts: | default = 0 ] +``` + +The Alertmanager +will send HTTP POST requests in the following JSON format to the configured +endpoint: + +``` +{ + "version": "4", + "groupKey": , // key identifying the group of alerts (e.g. to deduplicate) + "truncatedAlerts": , // how many alerts have been truncated due to "max_alerts" + "status": "", + "receiver": , + "groupLabels": , + "commonLabels": , + "commonAnnotations": , + "externalURL": , // backlink to the Alertmanager. + "alerts": [ + { + "status": "", + "labels": , + "annotations": , + "startsAt": "", + "endsAt": "", + "generatorURL": // identifies the entity that caused the alert + }, + ... 
+ ] +} +``` + +There is a list of +[integrations](https://prometheus.io/docs/operating/integrations/#alertmanager-webhook-receiver) with +this feature. + +## `` + +WeChat notifications are sent via the [WeChat +API](http://admin.wechat.com/wiki/index.php?title=Customer_Service_Messages). + +```yaml +# Whether or not to notify about resolved alerts. +[ send_resolved: | default = false ] + +# The API key to use when talking to the WeChat API. +[ api_secret: | default = global.wechat_api_secret ] + +# The WeChat API URL. +[ api_url: | default = global.wechat_api_url ] + +# The corp id for authentication. +[ corp_id: | default = global.wechat_api_corp_id ] + +# API request data as defined by the WeChat API. +[ message: | default = '{{ template "wechat.default.message" . }}' ] +[ agent_id: | default = '{{ template "wechat.default.agent_id" . }}' ] +[ to_user: | default = '{{ template "wechat.default.to_user" . }}' ] +[ to_party: | default = '{{ template "wechat.default.to_party" . }}' ] +[ to_tag: | default = '{{ template "wechat.default.to_tag" . }}' ] +``` diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 00000000..a14ae2ee --- /dev/null +++ b/docs/index.md @@ -0,0 +1,5 @@ +--- +title: Alerting +sort_rank: 7 +nav_icon: bell-o +--- diff --git a/docs/management_api.md b/docs/management_api.md new file mode 100644 index 00000000..910c1d84 --- /dev/null +++ b/docs/management_api.md @@ -0,0 +1,37 @@ +--- +title: Management API +sort_rank: 9 +--- + +# Management API + +Alertmanager provides a set of management APIs to ease automation and integrations. + + +### Health check + +``` +GET /-/healthy +``` + +This endpoint always returns 200 and should be used to check Alertmanager health. + + +### Readiness check + +``` +GET /-/ready +``` + +This endpoint returns 200 when Alertmanager is ready to serve traffic (i.e. respond to queries). + + +### Reload + +``` +POST /-/reload +``` + +This endpoint triggers a reload of the Alertmanager configuration file. 
+ +An alternative way to trigger a configuration reload is by sending a `SIGHUP` to the Alertmanager process. diff --git a/docs/notification_examples.md b/docs/notification_examples.md new file mode 100644 index 00000000..7c3ab05a --- /dev/null +++ b/docs/notification_examples.md @@ -0,0 +1,103 @@ +--- +title: Notification template examples +sort_rank: 8 +--- +# Notification Template Examples + +The following are examples of alerts and corresponding Alertmanager configuration file setups (alertmanager.yml). +Each uses the [Go templating](http://golang.org/pkg/text/template/) system. + +## Customizing Slack notifications + +In this example we've customized our Slack notification to send a URL to our organization's wiki on how to deal with the particular alert that's been sent. + +``` +global: + slack_api_url: '' + +route: + receiver: 'slack-notifications' + group_by: [alertname, datacenter, app] + +receivers: +- name: 'slack-notifications' + slack_configs: + - channel: '#alerts' + text: 'https://internal.myorg.net/wiki/alerts/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}' +``` + +## Accessing annotations in CommonAnnotations + +In this example we again customize the text sent to our Slack receiver, accessing the `summary` and `description` stored in the `CommonAnnotations` of the data sent by the Alertmanager. + +Alert + +``` +groups: +- name: Instances + rules: + - alert: InstanceDown + expr: up == 0 + for: 5m + labels: + severity: page + # Prometheus templates apply here in the annotation and label fields of the alert. + annotations: + description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.' + summary: 'Instance {{ $labels.instance }} down' +``` + +Receiver + +``` +- name: 'team-x' + slack_configs: + - channel: '#alerts' + # Alertmanager templates apply here. 
+ text: " \nsummary: {{ .CommonAnnotations.summary }}\ndescription: {{ .CommonAnnotations.description }}" +``` + +## Ranging over all received Alerts + +Finally, assuming the same alert as the previous example, we customize our receiver to range over all of the alerts received from the Alertmanager, printing their respective annotation summaries and descriptions on new lines. + +Receiver + +``` +- name: 'default-receiver' + slack_configs: + - channel: '#alerts' + title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}" + text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}" +``` + +## Defining reusable templates + +Going back to our first example, we can also provide a file containing named templates which are then loaded by Alertmanager in order to avoid complex templates that span many lines. +Create a file under `/etc/alertmanager/templates/myorg.tmpl` and create a template in it named "slack.myorg.text": + +``` +{{ define "slack.myorg.text" }}https://internal.myorg.net/wiki/alerts/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}{{ end }} +``` + +The configuration now loads the template with the given name for the "text" field and we provide a path to our custom template file: + +``` +global: + slack_api_url: '' + +route: + receiver: 'slack-notifications' + group_by: [alertname, datacenter, app] + +receivers: +- name: 'slack-notifications' + slack_configs: + - channel: '#alerts' + text: '{{ template "slack.myorg.text" . }}' + +templates: +- '/etc/alertmanager/templates/myorg.tmpl' +``` + +This example is explained in further detail in this [blogpost](https://prometheus.io/blog/2016/03/03/custom-alertmanager-templates/). 
\ No newline at end of file diff --git a/docs/notifications.md b/docs/notifications.md new file mode 100644 index 00000000..29462338 --- /dev/null +++ b/docs/notifications.md @@ -0,0 +1,94 @@ +--- +title: Notification template reference +sort_rank: 7 +--- +# Notification Template Reference + +Prometheus creates and sends alerts to the Alertmanager which then sends notifications out to different receivers based on their labels. +A receiver can be one of many integrations including: Slack, PagerDuty, email, or a custom integration via the generic webhook interface. + +The notifications sent to receivers are constructed via templates. The Alertmanager comes with default templates but they can also be customized. +To avoid confusion, note that Alertmanager templates differ from [templating in Prometheus](https://prometheus.io/docs/visualization/template_reference/). Prometheus templating, however, also covers the templating in alert rule labels/annotations. + + +The Alertmanager's notification templates are based on the [Go templating](http://golang.org/pkg/text/template) system. +Note that some fields are evaluated as text and others as HTML, which affects escaping. + +# Data Structures + +## Data + +`Data` is the structure passed to notification templates and webhook pushes. + +| Name | Type | Notes | +| ------------- | ------------- | -------- | +| Receiver | string | The name of the receiver that the notification will be sent to (slack, email, etc.). | +| Status | string | Defined as firing if at least one alert is firing, otherwise resolved. | +| Alerts | [Alert](#alert) | List of all alert objects in this group ([see below](#alert)). | +| GroupLabels | [KV](#kv) | The labels these alerts were grouped by. | +| CommonLabels | [KV](#kv) | The labels common to all of the alerts. | +| CommonAnnotations | [KV](#kv) | The annotations common to all of the alerts. Used for longer additional strings of information about the alert.
| +| ExternalURL | string | Backlink to the Alertmanager that sent the notification. | + +The `Alerts` type exposes functions for filtering alerts: + - `Alerts.Firing` returns a list of currently firing alert objects in this group + - `Alerts.Resolved` returns a list of resolved alert objects in this group + +## Alert + +`Alert` holds one alert for notification templates. + +| Name | Type | Notes | +| ------------- | ------------- | -------- | +| Status | string | Whether the alert is resolved or currently firing. | +| Labels | [KV](#kv) | A set of labels to be attached to the alert. | +| Annotations | [KV](#kv) | A set of annotations for the alert. | +| StartsAt | time.Time | The time the alert started firing. If omitted, the current time is assigned by the Alertmanager. | +| EndsAt | time.Time | Only set if the end time of an alert is known. Otherwise set to a configurable timeout period after the time the last alert was received. | +| GeneratorURL | string | A backlink which identifies the causing entity of this alert. | + +## KV + +`KV` is a set of key/value string pairs used to represent labels and annotations. + +``` +type KV map[string]string +``` + +Annotation example containing two annotations: + +``` +{ + summary: "alert summary", + description: "alert description", +} +``` + +In addition to direct access of data (labels and annotations) stored as KV, there are also methods for sorting, removing, and viewing the LabelSets: + +### KV methods +| Name | Arguments | Returns | Notes | +| ------------- | ------------- | -------- | -------- | +| SortedPairs | - | Pairs (list of key/value string pairs) | Returns a sorted list of key/value pairs. | +| Remove | []string | KV | Returns a copy of the key/value map without the given keys. | +| Names | - | []string | Returns a list of the label names in the LabelSet. | +| Values | - | []string | Returns a list of the values in the LabelSet.
| + +# Functions + +Note the [default +functions](http://golang.org/pkg/text/template/#hdr-Functions) also provided by Go +templating. + +## Strings + +| Name | Arguments | Returns | +| ------------- | ------------- | -------- | +| title | string | [strings.Title](http://golang.org/pkg/strings/#Title), capitalises first character of each word. | +| toUpper | string | [strings.ToUpper](http://golang.org/pkg/strings/#ToUpper), converts all characters to upper case. | +| toLower | string | [strings.ToLower](http://golang.org/pkg/strings/#ToLower), converts all characters to lower case. | +| match | pattern, string | [Regexp.MatchString](https://golang.org/pkg/regexp/#MatchString). Match a string using Regexp. | +| reReplaceAll | pattern, replacement, text | [Regexp.ReplaceAllString](http://golang.org/pkg/regexp/#Regexp.ReplaceAllString). Regexp substitution, unanchored. | +| join | sep string, s []string | [strings.Join](http://golang.org/pkg/strings/#Join), concatenates the elements of s to create a single string. The separator string sep is placed between elements in the resulting string. (note: argument order inverted for easier pipelining in templates.) | +| safeHtml | text string | [html/template.HTML](https://golang.org/pkg/html/template/#HTML), Marks string as HTML not requiring auto-escaping. | +| stringSlice | ...string | Returns the passed strings as a slice of strings. | diff --git a/docs/overview.md b/docs/overview.md new file mode 100644 index 00000000..86f12289 --- /dev/null +++ b/docs/overview.md @@ -0,0 +1,18 @@ +--- +title: Alerting overview +sort_rank: 1 +nav_icon: sliders +--- + +# Alerting Overview + +Alerting with Prometheus is separated into two parts. Alerting rules in +Prometheus servers send alerts to an Alertmanager.
The [Alertmanager](alertmanager.md) +then manages those alerts, including silencing, inhibition, aggregation and +sending out notifications via methods such as email, on-call notification systems, and chat platforms. + +The main steps to set up alerting and notifications are: + +* Set up and [configure](configuration.md) the Alertmanager +* [Configure Prometheus](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#alertmanager_config) to talk to the Alertmanager +* Create [alerting rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) in Prometheus
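
The last two steps can be sketched as a fragment of the Prometheus configuration file. This is a minimal illustration, not a complete setup: the host names and the rule file name are placeholders, and port 9093 is assumed because it is the Alertmanager's default listen port.

```yaml
# prometheus.yml (fragment); host names and file name below are hypothetical.
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # List every Alertmanager instance rather than a load balancer address.
      - 'alertmanager-1.example.org:9093'
      - 'alertmanager-2.example.org:9093'

rule_files:
  # File containing the alerting rules Prometheus should evaluate.
  - 'alert.rules.yml'
```

Listing each instance directly follows the high availability guidance in the Alertmanager documentation, which advises against load balancing traffic between Prometheus and its Alertmanagers.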