* notify: notify resolved alerts properly
The PR #1205 while fixing an existing issue introduced another bug when
the send_resolved flag of the integration is set to true.
With send_resolved set to false, the semantics remain the same:
AlertManager generates a notification when new firing alerts are added
to the alert group. The notification only carries firing alerts.
With send_resolved set to true, AlertManager generates a notification
when new firing or resolved alerts are added to the alert group. The
notification carries both the firing and resolved notifications.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Fix comments
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
When the aggregation group receives an alert that is past the initial
group_wait value, it should reset its timer only if the timer has ever
expired. Otherwise it means that the flush is already in-progress.
After the initial notification has been sent, AlertManager shouldn't notify the
receiver again when no new alerts have been added to the group during
group_interval.
This change also modifies the acceptance test framework to assert that no
notification has been received in a given interval.
This change decreases the repeat_interval parameter from 5s to 4.9s to
make sure that the alerts are effectively sent after 5 seconds.
The workflow is:
- The dispatcher flushes the alerts at t0, sends the notification and
marks the notification log at t0+epsilon.
- The dispatcher flushes the alerts at t1, t2, t3 and t4 and doesn't
send the notifications as expected.
- At t5, the dispatcher flushes the alerts because current_time - (t0+epsilon)
is less then repeat_interval.
If repeat_interval is exactly 5s, there is a little chance that it is
greater than current_time - (t0+epsilon).
Building a hash over an entire set of alerts causes problems, because
the hash differs, on any change, whereas we only want to send
notifications if the alert and it's state have changed. Therefore this
introduces a list of alerts that are active and a list of alerts that
are resolved. If the currently active alerts of a group are a subset of
the ones that have been notified about before then they are
deduplicated. The resolved notifications work the same way, with a
separate list of resolved notifications that have already been sent.
Resolved alerts, even when filtered, have to end up in the
SetNotifiesStage, otherwise when an alert fires again it is ambiguous
whether it was resolved in between or not.
fixes#523
This commit replaces the previous NotifyInfo provider with the new
nflog package. It needs adjustments in the behavior of the deduping
stage.
The nflog stores notification digests per receiver per alert aggregation
group rather than one entry for alert per receiver. This drastically
reduces the number of entries and removes interference
across aggregation groups.
Initial testing has shown BoltDB in plain usage to be a bottleneck
at a few thousand alerts or more (especially JSON decoding.
This commit actually makes them purely memory as a temporary solution.
This commit changes the notification grouping behavior
to simply send all alerts of a group as soon as a single
one of them needs updating.
This fixes a critical bug which caused erroneous resolved
notifications to be sent.