Resolved alerts, even when filtered, have to end up in the
SetNotifiesStage, otherwise when an alert fires again it is ambiguous
whether it was resolved in between or not.
fixes#523
The retry flag allows an integration to specify whether a retry can
potentially be solved or if the error is likely not going to recover.
For example invalid authentication is likely a wrong configuration and
therefore a retry would not make sense, while a server error is likely
a temporary problem and can potentially be solved on the next retry.
This commit replaces the previous NotifyInfo provider with the new
nflog package. It needs adjustments in the behavior of the deduping
stage.
The nflog stores notification digests per receiver per alert aggregation
group rather than one entry for alert per receiver. This drastically
reduces the number of entries and removes interference
across aggregation groups.
This commit directly adds the nflogpb.Receiver object to stage
objects at stage creation time. Hence, we no longer rely on a value from
within the context.
This string value is initially used to store a receiver name. It is
later overloaded with a unique string identifier of <name, integration,
index>.
This renaming is in preparation to separate the two and use the Receiver
object of the nflogpb package.
This commit implements a wait period before actually dispatching
notifications. The backoff linearly depends on the UID order of
participating peers.
This gives the gossip state time to catch up and avoids duplicate
notifications while ensuring that every peer notifies eventually.
This commit changes the notification grouping behavior
to simply send all alerts of a group as soon as a single
one of them needs updating.
This fixes a critical bug which caused erroneous resolved
notifications to be sent.
Notifcation configs may have multiple notification destinations.
This commit changes the pipeline so that each one has its own
retry and deduplication logic.
Retrying notifier is added to the end of the pipeline where it retries
sending out the final notifications until the context times out.
Exponential backoff is used.
Setting and getting of context values are done via helper
methods in the notifier package. The used keys are unexported. Thus,
we ensure that external code cannot overwrite the values and the type
is always correct.