Turn the GroupKey into a string that is composed of the matchers if the
path in the routing tree and the grouping labels.
Only hash it at the very end to ensure we don't exceed size limits of
integration APIs.
Building a hash over an entire set of alerts causes problems, because
the hash differs, on any change, whereas we only want to send
notifications if the alert and it's state have changed. Therefore this
introduces a list of alerts that are active and a list of alerts that
are resolved. If the currently active alerts of a group are a subset of
the ones that have been notified about before then they are
deduplicated. The resolved notifications work the same way, with a
separate list of resolved notifications that have already been sent.
Resolved alerts, even when filtered, have to end up in the
SetNotifiesStage, otherwise when an alert fires again it is ambiguous
whether it was resolved in between or not.
fixes#523
The retry flag allows an integration to specify whether a retry can
potentially be solved or if the error is likely not going to recover.
For example invalid authentication is likely a wrong configuration and
therefore a retry would not make sense, while a server error is likely
a temporary problem and can potentially be solved on the next retry.
This commit replaces the previous NotifyInfo provider with the new
nflog package. It needs adjustments in the behavior of the deduping
stage.
The nflog stores notification digests per receiver per alert aggregation
group rather than one entry for alert per receiver. This drastically
reduces the number of entries and removes interference
across aggregation groups.
This commit directly adds the nflogpb.Receiver object to stage
objects at stage creation time. Hence, we no longer rely on a value from
within the context.
This string value is initially used to store a receiver name. It is
later overloaded with a unique string identifier of <name, integration,
index>.
This renaming is in preparation to separate the two and use the Receiver
object of the nflogpb package.
This commit implements a wait period before actually dispatching
notifications. The backoff linearly depends on the UID order of
participating peers.
This gives the gossip state time to catch up and avoids duplicate
notifications while ensuring that every peer notifies eventually.
This commit removes the dependency on model.Silence for the internal
Silence type, uses UUIDs instead of uint64s and clarifies invariants
around timestamp handling.
The created_at timestamp is removed for the time being.
Add default VictorOpsAPIURL
Add VictorOps default config
Add VictorOpsConfig struct in notifiers
Add new template tags for victorops
Add notifications logic for victorops
Compiled template tags with make assets
Remove common labels from entity_id template
Set messageType default value to CRITICAL
Recovery messageType is not configurable anymore. Firing state only allows specific keys
Make assets
Using log.Debugf
EntityID should not be configureable
Remove entity_id from template
Use GroupKey(ctx) as entity_id
Improve debug logging
Fix type of entity_id
The pushover notification of resolved alerts can end up in an empty
message (only a newline). I've fixed the check for an empty message with
trimming the whitespace. But I also thought that adding the resolved
alerts to the message would be helpful.
OpsGenie returns HTTP 400 to alert close requests if the alert has
already been closed. There is no need to try again if this happens.
When an error is returned from a notifiation service, an error is
logged, and the logged error is more meaningful if it includes a hint
about which notification service that caused a problem.
Defer the resp.body.Close() call in the OpsGenie Notify()
implementation.