Add version tracking of the silences state. Adding a silence to the state
increments the version. If the version hasn't changed since the last
time an alert was checked for being silenced, we only have to verify
that the relevant silences are still active rather than checking the
alert against all silences.
Signed-off-by: beorn7 <beorn@soundcloud.com>
This encapsulates the logic of querying and marking silenced
alerts. It removes the code duplication flagged earlier.
I removed the error returned by the setAlertStatus function as we were
only logging it, and that's already done anyway when the error is
received from the `silence.Query` call (now in the `Mutes` method).
Signed-off-by: beorn7 <beorn@soundcloud.com>
Instead of registering marker metrics in cmd/alertmanager/main.go,
register them in types/types.go. This keeps marker-specific logic in its
own package rather than in main.go. In addition, it paves the way for
removing the use of the global metric registry in the future, since the
registration takes a local metric registerer.
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
* notify: notify resolved alerts properly
PR #1205, while fixing an existing issue, introduced another bug when
the send_resolved flag of the integration is set to true.
With send_resolved set to false, the semantics remain the same:
Alertmanager generates a notification when new firing alerts are added
to the alert group. The notification only carries firing alerts.
With send_resolved set to true, Alertmanager generates a notification
when new firing or resolved alerts are added to the alert group. The
notification carries both the firing and the resolved alerts.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
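The two send_resolved cases can be sketched as a single decision function. This is a minimal sketch, not the actual dedup stage: the `Alert` type, integer fingerprints, and the `notification` helper are hypothetical.

```go
package main

import "fmt"

// Alert is a hypothetical, minimal alert: a fingerprint and a resolved flag.
type Alert struct {
	Fingerprint uint64
	Resolved    bool
}

// subset reports whether every fingerprint in a is also in b.
func subset(a, b map[uint64]bool) bool {
	for fp := range a {
		if !b[fp] {
			return false
		}
	}
	return true
}

// notification returns the alerts to send and whether a notification is
// needed, given the fingerprints already notified as firing and resolved.
func notification(alerts []Alert, notifiedFiring, notifiedResolved map[uint64]bool, sendResolved bool) ([]Alert, bool) {
	firing, resolved := map[uint64]bool{}, map[uint64]bool{}
	var payload []Alert
	for _, a := range alerts {
		if a.Resolved {
			resolved[a.Fingerprint] = true
			if sendResolved {
				payload = append(payload, a) // resolved alerts only go out with send_resolved
			}
			continue
		}
		firing[a.Fingerprint] = true
		payload = append(payload, a)
	}
	// New firing alerts always trigger a notification; new resolved alerts
	// only do so when send_resolved is true.
	need := !subset(firing, notifiedFiring) || (sendResolved && !subset(resolved, notifiedResolved))
	return payload, need
}

func main() {
	alerts := []Alert{{Fingerprint: 1}, {Fingerprint: 2, Resolved: true}}
	notifiedFiring := map[uint64]bool{1: true, 2: true}

	_, need := notification(alerts, notifiedFiring, map[uint64]bool{}, false)
	fmt.Println(need) // false: no new firing alerts
	payload, need := notification(alerts, notifiedFiring, map[uint64]bool{}, true)
	fmt.Println(len(payload), need) // 2 true: alert 2 is newly resolved
}
```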
* Fix comments
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
After the initial notification has been sent, Alertmanager shouldn't notify the
receiver again when no new alerts have been added to the group during
group_interval.
This change also modifies the acceptance test framework to assert that no
notification has been received in a given interval.
Turn the GroupKey into a string that is composed of the matchers of the
path in the routing tree and the grouping labels.
Only hash it at the very end to ensure we don't exceed size limits of
integration APIs.
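A minimal sketch of building such a key and hashing it only at the end. The key format below (route matchers joined by `/`, then the sorted grouping labels) and the use of SHA-256 are assumptions for illustration, as are the `groupKey` and `hashKey` helper names.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// groupKey joins the matchers of every route on the path from the root
// with the sorted grouping labels. The exact format is an assumption.
func groupKey(routeMatchers []string, groupLabels map[string]string) string {
	names := make([]string, 0, len(groupLabels))
	for name := range groupLabels {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random; sort for a stable key
	pairs := make([]string, len(names))
	for i, name := range names {
		pairs[i] = fmt.Sprintf("%s=%q", name, groupLabels[name])
	}
	return strings.Join(routeMatchers, "/") + ":{" + strings.Join(pairs, ",") + "}"
}

// hashKey is applied only at the very end, where an integration API
// limits the length of the key it accepts.
func hashKey(key string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(key)))
}

func main() {
	key := groupKey([]string{"{}", `{team="frontend"}`}, map[string]string{"alertname": "HighLatency"})
	fmt.Println(key)               // {}/{team="frontend"}:{alertname="HighLatency"}
	fmt.Println(len(hashKey(key))) // 64
}
```

Keeping the readable string internally and hashing only at the integration boundary means logs and debugging still show which route and labels a group belongs to.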
Building a hash over an entire set of alerts causes problems, because
the hash changes on any change to the set, whereas we only want to send
notifications if an alert or its state has changed. Therefore, this
commit introduces a list of alerts that are active and a list of alerts
that are resolved. If the currently active alerts of a group are a
subset of the ones that have been notified about before, then they are
deduplicated. Resolved notifications work the same way, with a separate
list of resolved notifications that have already been sent.
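The difference between the two approaches shows up as soon as one alert of a notified group resolves. A minimal sketch, assuming hypothetical integer fingerprints and the illustrative helpers `setHash` and `subset`; the same subset check would apply to the separate list of resolved alerts.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// setHash is the old approach: one hash over the whole set of alerts.
func setHash(fps []uint64) [32]byte {
	sorted := append([]uint64(nil), fps...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	var b strings.Builder
	for _, fp := range sorted {
		fmt.Fprintf(&b, "%d,", fp)
	}
	return sha256.Sum256([]byte(b.String()))
}

// subset is the new approach: the active alerts are deduplicated if all
// of them have been notified about before.
func subset(active []uint64, notified map[uint64]bool) bool {
	for _, fp := range active {
		if !notified[fp] {
			return false
		}
	}
	return true
}

func main() {
	notified := []uint64{1, 2} // alerts 1 and 2 were notified as active
	active := []uint64{1}      // alert 2 has resolved since then
	notifiedSet := map[uint64]bool{1: true, 2: true}

	fmt.Println(setHash(active) != setHash(notified)) // true: the hash changed, so it would notify again
	fmt.Println(subset(active, notifiedSet))          // true: the subset check deduplicates
}
```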
Resolved alerts, even when filtered, have to end up in the
SetNotifiesStage, otherwise when an alert fires again it is ambiguous
whether it was resolved in between or not.
fixes #523
This commit replaces the previous NotifyInfo provider with the new
nflog package. This requires adjustments to the behavior of the
deduplication stage.
The nflog stores notification digests per receiver per alert aggregation
group rather than one entry per alert per receiver. This drastically
reduces the number of entries and removes interference
across aggregation groups.
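Keying the log by group and receiver can be sketched with a plain map. This is a minimal in-memory sketch, not the nflog API: `Receiver` (with fields modeled on the <name, integration, index> triple described below), `Entry`, `Log`, and `NewLog` are hypothetical.

```go
package main

import "fmt"

// Receiver identifies one integration of a receiver (hypothetical fields).
type Receiver struct {
	GroupName   string
	Integration string
	Idx         uint32
}

// Entry is a hypothetical digest of what was last sent for one group.
type Entry struct {
	FiringAlerts   []uint64
	ResolvedAlerts []uint64
}

// key combines the aggregation group and the receiver, so groups never
// overwrite each other's entries.
type key struct {
	GroupKey string
	Receiver Receiver
}

// Log keeps one entry per (group, receiver) pair, not per (alert, receiver).
type Log struct{ entries map[key]Entry }

func NewLog() *Log { return &Log{entries: map[key]Entry{}} }

func (l *Log) Log(r Receiver, groupKey string, e Entry) {
	l.entries[key{groupKey, r}] = e
}

func (l *Log) Query(r Receiver, groupKey string) (Entry, bool) {
	e, ok := l.entries[key{groupKey, r}]
	return e, ok
}

func main() {
	l := NewLog()
	r := Receiver{GroupName: "team-x", Integration: "email", Idx: 0}
	var firing []uint64
	for fp := uint64(1); fp <= 100; fp++ {
		firing = append(firing, fp)
	}
	l.Log(r, `{}:{alertname="HighLoad"}`, Entry{FiringAlerts: firing})
	l.Log(r, `{}:{alertname="DiskFull"}`, Entry{FiringAlerts: []uint64{7}})
	fmt.Println(len(l.entries)) // 2: one entry per group, not 101 per-alert entries
	e, _ := l.Query(r, `{}:{alertname="HighLoad"}`)
	fmt.Println(len(e.FiringAlerts)) // 100
}
```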
This commit directly adds the nflogpb.Receiver object to stage
objects at stage creation time. Hence, we no longer rely on a value from
within the context.
This string value is initially used to store a receiver name. It is
later overloaded with a unique string identifier of <name, integration,
index>.
This renaming is in preparation to separate the two and use the Receiver
object of the nflogpb package.
This commit removes the dependency on model.Silence for the internal
Silence type, uses UUIDs instead of uint64s and clarifies invariants
around timestamp handling.
The created_at timestamp is removed for the time being.
This commit changes the notification grouping behavior
to simply send all alerts of a group as soon as a single
one of them needs updating.
This fixes a critical bug which caused erroneous resolved
notifications to be sent.
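A minimal sketch of the group-level decision, assuming a hypothetical `Alert` type whose `Changed` field stands in for whatever per-alert check decides that an update is due (that check is not specified here).

```go
package main

import "fmt"

// Alert is a hypothetical alert; Changed stands in for the per-alert
// check that decides a notification is due.
type Alert struct {
	Name    string
	Changed bool
}

// alertsToSend returns the whole group as soon as a single alert needs
// updating, and nothing otherwise.
func alertsToSend(group []Alert) []Alert {
	for _, a := range group {
		if a.Changed {
			return group
		}
	}
	return nil
}

func main() {
	group := []Alert{{Name: "a"}, {Name: "b", Changed: true}, {Name: "c"}}
	fmt.Println(len(alertsToSend(group)))     // 3: one change sends the whole group
	fmt.Println(len(alertsToSend(group[:1]))) // 0: nothing changed
}
```

Sending the complete group means the receiver always sees a consistent snapshot, instead of a partial list from which it might wrongly infer that missing alerts were resolved.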
Notification configs may have multiple notification destinations.
This commit changes the pipeline so that each one has its own
retry and deduplication logic.
A retrying notifier is added to the end of the pipeline. It retries
sending out the final notifications until the context times out, using
exponential backoff.
Context values are set and retrieved via helper methods in the
notifier package. The keys used are unexported. Thus, we ensure that
external code cannot overwrite the values and that their type is always
correct.