Add version tracking of silences states. Adding a silence to the state
increments the version. If the version hasn't changed since the last
time an alert was checked for being silenced, we only have to verify
that the relevant silences are still active rather than checking the
alert against all silences.
Signed-off-by: beorn7 <beorn@soundcloud.com>
This clarifies a bunch of things I have run into during code reading
in preparation for some performance improvements around muting.
It also moves doc comments from places where they don't show up in
godoc to visible places.
It also fixes golint warnings.
Signed-off-by: beorn7 <beorn@soundcloud.com>
Instead of registering marker metrics inside of
cmd/alertmanager/main.go, register them in types/types.go, encapsulating
marker specific logic in its module, not in main.go. In addition it
paves the path for removing the usage of the global metric registry in
the future, by taking a local metric registerer.
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
Alert merging assumed that EndsAt would always be empty for firing
alerts. This is no longer true starting with Prometheus v2.4.0: EndsAt
is set to a multiple of the evaluation interval or resend interval
(whichever is the largest). This change updates the merging logic to
support both cases.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Sort dispatched alerts by job+instance in the correct order (#1178)
Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>
* dispatch: add unit test for alerts sorting
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
This adds metrics that look like this:
```
alertmanager_alerts{state="active"} 6
alertmanager_alerts{state="suppressed"} 0
alertmanager_silences{state="active"} 1
alertmanager_silences{state="expired"} 1
alertmanager_silences{state="pending"} 0
```
This can be used to monitor alertmanager's usage and validate that
alertmanagers in a mesh have a similar number of silences and alerts.
This commit removes the dependency on model.Silence for the internal
Silence type, uses UUIDs instead of uint64s and clarifies invariants
around timestamp handling.
The created_at timestamp is removed for the time being.
No longer update components based on a new configuration. Generally,
destroying and recreating has no performance impact and is less
error-prone.
This also removes the Reloadable interface and simplifies the entire
startup contraption.