fpetkovski b408b522bc Improve the AlertmanagerMembersInconsistent alert
The expression alertmanager_cluster_members{job="alertmanager"}[5m]) is assumed to return
one series for each alertmanager instance in the cluster. When running inside Kubernetes,
alertmanager pods can get evicted and rescheduled. This can change the instance label and
produce a new series for that alertmanager instance.

When the same pod is evicted several times in a row, there is a short interval in which
Prometheus returns values from both the new series and the old series.
As a result, counting the number of series for the alertmanager_cluster_members metric
will overestimate the number of instances in the given cluster.

This commit increases the for clause of the AlertmanagerMembersInconsistent alert to 15m
to reduce the probability of false positives.

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
2021-06-22 08:21:02 +02:00
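The effect described in this commit message can be reproduced with a small, hypothetical Python simulation. The simulation makes four assumptions:

- It models a three-member cluster scraped every 30 seconds.
- It treats the alert condition as "more series in the 5m window than the cluster size each instance reports". This is a simplification, not the mixin's actual PromQL.
- It models the for clause as "condition true at every evaluation for at least the hold duration".
- The instance labels and timestamps are made up.

```python
"""Hypothetical simulation of the AlertmanagerMembersInconsistent false positive.

Instance labels, timestamps, the 30s scrape/evaluation interval and the
simplified condition (series count > reported cluster size) are assumptions
for illustration, not the mixin's actual alert expression.
"""

LOOKBACK = 5 * 60  # seconds; mirrors the [5m] range selector


def scrapes(instance, first, last, interval=30):
    """Samples (instance, timestamp) for one series scraped every `interval` s."""
    return [(instance, ts) for ts in range(first, last + 1, interval)]


def active_series(samples, now, lookback=LOOKBACK):
    """Instance labels with at least one sample in the window (now - lookback, now]."""
    return {inst for inst, ts in samples if now - lookback < ts <= now}


def would_fire(samples, members, for_seconds, start=0, end=2000, step=30):
    """Evaluate the simplified condition every `step` seconds and report whether
    it stays true for at least `for_seconds`, like a rule's for clause."""
    pending_since = None
    for now in range(start, end + 1, step):
        if len(active_series(samples, now)) > members:
            if pending_since is None:
                pending_since = now  # alert becomes pending
            if now - pending_since >= for_seconds:
                return True  # pending long enough: the alert fires
        else:
            pending_since = None  # condition cleared: pending state resets
    return False


# Three-member cluster; pod am-0 is evicted twice and gets a new instance label
# each time, while am-1 and am-2 keep running.
SAMPLES = (
    scrapes("10.0.0.2", 0, 2000)      # am-1
    + scrapes("10.0.0.3", 0, 2000)    # am-2
    + scrapes("10.0.0.1", 0, 600)     # am-0 before the first eviction
    + scrapes("10.0.0.4", 630, 810)   # am-0 after the first reschedule
    + scrapes("10.0.0.5", 840, 2000)  # am-0 after the second reschedule
)

if __name__ == "__main__":
    print(len(active_series(SAMPLES, 700)))                     # 4 series, 3 members
    print(would_fire(SAMPLES, members=3, for_seconds=5 * 60))   # True
    print(would_fire(SAMPLES, members=3, for_seconds=15 * 60))  # False
```

With these made-up timestamps, the series count exceeds the cluster size from t=630 s to t=1080 s, which is 450 s in total. A 5m hold duration is therefore exceeded and the alert fires, while a 15m hold duration is not reached. Longer chains of evictions would stretch the overlap further.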
dashboards [mixins] Alertmanager Overview dashboard (#2540) 2021-06-07 19:54:22 +02:00
.gitignore [mixins] Alertmanager Overview dashboard (#2540) 2021-06-07 19:54:22 +02:00
alerts.jsonnet Beginnings of an Alertmanager mixin. (#1629) 2020-12-03 15:57:42 +01:00
alerts.libsonnet Improve the AlertmanagerMembersInconsistent alert 2021-06-22 08:21:02 +02:00
config.libsonnet [mixins] Alertmanager Overview dashboard (#2540) 2021-06-07 19:54:22 +02:00
dashboards.jsonnet [mixins] Alertmanager Overview dashboard (#2540) 2021-06-07 19:54:22 +02:00
dashboards.libsonnet [mixins] Alertmanager Overview dashboard (#2540) 2021-06-07 19:54:22 +02:00
jsonnetfile.json [mixins] Alertmanager Overview dashboard (#2540) 2021-06-07 19:54:22 +02:00
jsonnetfile.lock.json [mixins] Alertmanager Overview dashboard (#2540) 2021-06-07 19:54:22 +02:00
Makefile [mixins] Alertmanager Overview dashboard (#2540) 2021-06-07 19:54:22 +02:00
mixin.libsonnet [mixins] Alertmanager Overview dashboard (#2540) 2021-06-07 19:54:22 +02:00
README.md Beginnings of an Alertmanager mixin. (#1629) 2020-12-03 15:57:42 +01:00

Alertmanager Mixin

The Alertmanager Mixin is a set of configurable, reusable, and extensible alerts and dashboards for Alertmanager.

The alerts are designed to monitor a cluster of Alertmanager instances. For them to work as expected, the Prometheus server that evaluates the alerts must scrape all Alertmanager instances of the cluster, even if those instances are distributed over different locations. All Alertmanager instances in the same Alertmanager cluster must share the same job label. Conversely, when monitoring multiple Alertmanager clusters, instances from different clusters must have different job labels.
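Why the job label must identify exactly one cluster can be illustrated with a simplified Python model. For each job label, the model counts the scraped instances and flags any instance that reports fewer cluster members than that count.

This is a hypothetical sketch of the idea behind a membership-consistency check, not the mixin's actual alert expression. The job names, instance names and the cluster_members field are illustrative assumptions.

```python
"""Simplified model of why the job label must identify exactly one Alertmanager
cluster. Job names, instance names and the comparison are illustrative
assumptions, not the mixin's real alert expression."""

from collections import defaultdict


def cluster(job, instances, reported_members):
    """Scraped targets for one cluster; each reports its own view of the size."""
    return [
        {"job": job, "instance": inst, "cluster_members": reported_members}
        for inst in instances
    ]


def inconsistent_members(targets):
    """Return sorted (job, instance) pairs whose reported cluster size is smaller
    than the number of scraped instances sharing the same job label."""
    by_job = defaultdict(list)
    for target in targets:
        by_job[target["job"]].append(target)
    flagged = []
    for job, members in by_job.items():
        expected = len(members)  # every instance with this job counts as a member
        for target in members:
            if target["cluster_members"] < expected:
                flagged.append((job, target["instance"]))
    return sorted(flagged)


# Two healthy 3-member clusters, each with its own job label: nothing flagged.
separate = cluster("alertmanager-a", ["a-0", "a-1", "a-2"], 3) + cluster(
    "alertmanager-b", ["b-0", "b-1", "b-2"], 3
)

# The same two clusters scraped under one shared job label: the model now
# sees 6 "members", so all six healthy instances look inconsistent.
shared = cluster("alertmanager", ["a-0", "a-1", "a-2"], 3) + cluster(
    "alertmanager", ["b-0", "b-1", "b-2"], 3
)

if __name__ == "__main__":
    print(inconsistent_members(separate))     # []
    print(len(inconsistent_members(shared)))  # 6
```

In this model, a shared job label turns two healthy clusters into six false positives. Separate job labels leave only genuine mismatches, such as an instance that has lost contact with its peers.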

The most basic use of the Alertmanager Mixin is to generate a YAML file containing its alerts. To do so, you need to have jsonnetfmt and mixtool installed. If you have a working Go development environment (Go 1.16 or later), the easiest way to install them is:

$ go install github.com/monitoring-mixins/mixtool/cmd/mixtool@latest
$ go install github.com/google/go-jsonnet/cmd/jsonnetfmt@latest

Edit config.libsonnet to match your environment, then build alertmanager_alerts.yaml, which contains the alerts, by running:

$ make build

For instructions on more advanced uses of mixins, see https://github.com/monitoring-mixins/docs.