Building a hash over an entire set of alerts causes problems, because
the hash differs, on any change, whereas we only want to send
notifications if the alert and it's state have changed. Therefore this
introduces a list of alerts that are active and a list of alerts that
are resolved. If the currently active alerts of a group are a subset of
the ones that have been notified about before then they are
deduplicated. The resolved notifications work the same way, with a
separate list of resolved notifications that have already been sent.
* Find MAC address if mesh.hardware-addr not given
Defaulting to the machine's MAC address fails
sometimes fails and causes a panic. Allow the user
to specify custom address to skip this so they can
run AlertManager.
* -mesh.hardware-address -> -mesh.peer-id
* Fix command-line invocation
* Wait for test server to be ready before running tests
This fixes problems when running the acceptance tests in slow or CPU-starved
machines, as mentioned in #472.
Resolved alerts, even when filtered, have to end up in the
SetNotifiesStage, otherwise when an alert fires again it is ambiguous
whether it was resolved in between or not.
fixes#523
This commit replaces the previous NotifyInfo provider with the new
nflog package. It needs adjustments in the behavior of the deduping
stage.
The nflog stores notification digests per receiver per alert aggregation
group rather than one entry for alert per receiver. This drastically
reduces the number of entries and removes interference
across aggregation groups.
This commit removes the dependency on model.Silence for the internal
Silence type, uses UUIDs instead of uint64s and clarifies invariants
around timestamp handling.
The created_at timestamp is removed for the time being.
Initial testing has shown BoltDB in plain usage to be a bottleneck
at a few thousand alerts or more (especially JSON decoding.
This commit actually makes them purely memory as a temporary solution.
Previously, the tests would listen on all available interfaces.
Instead, have the tests use localhost only; using all available
interfaces is unnecessary.
On Mac OS X with the builtin firewall enabled, it triggers annoying
prompts to allow the tests to listen on all interfaces.
This commit changes the notification grouping behavior
to simply send all alerts of a group as soon as a single
one of them needs updating.
This fixes a critical bug which caused erroneous resolved
notifications to be sent.
- Cut back to bare minimum to make the rest simpler
- Consistency in config naming
- Have one data strucutre that's the same for all templates
- Pass in common labels to templates
- Support templates almost everywhere
- Support multiple SMTP recipients
- Support non-ASCII SMTP headers
- Handle colour logic via templates
- Make $subjects have consistent output, go maps aren't sorted.
- Make tests pass when v6 is disabled