* Mark muted groups
This commit updates TimeMuteStage and TimeActiveStage to mark groups
as muted when its alerts are muted by an active or mute time interval,
and remove any existing markers when outside all active and mute
time intervals.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Remove unlock to defer
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Fix race condition in dispatch.go
This commit fixes a race condition in dispatch.go that would cause
a firing alert to be deleted from the aggregation group when instead
it should have been flushed.
The root cause is a race condition that can occur when dispatch.go
deletes resolved alerts from the aggregation group following a
successful notification. If a firing alert with the same
fingerprint is added back to the aggregation group at the same time
then the firing alert can be deleted.
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Add GroupMarker interface
This commit adds a new GroupMarker interface that marks the status
of groups. For example, whether an alert is muted because or one
or more active or mute time intervals.
It renames the existing Marker interface to AlertMarker to avoid
confusion.
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* add active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix unittests for active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update dispatch/route.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* split the stage for active and mute intervals
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Adds doc for a helper function
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix code after commit suggestions
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Making mute_time_interval and time_intervals can coexist in the config
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* docs: configuration's doc has been updated about time intervals
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update config/config.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* updates configuration readme to improve active time description
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* merge deprecated mute_time_intervals and time_intervals
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update cmd/alertmanager/main.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update cmd/alertmanager/main.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fmt main.go
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix lint error
Signed-off-by: clyang82 <chuyang@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Document that matchers are ANDed together
Signed-off-by: Mac Chaffee <me@macchaffee.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Remove extra parentheticals
Signed-off-by: Mac Chaffee <me@macchaffee.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* config: root route should have empty matchers
Unmarshal should validate that the root route does
not contain any matchers. Prior to this change,
only the deprecated match structures were checked.
Signed-off-by: Philip Gough <philip.p.gough@gmail.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* chore: Let git ignore temporary files for ui/app
Signed-off-by: nekketsuuu <nekketsuuu@users.noreply.github.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* adding max_alerts parameter to slack webhook config
correcting the logic to trucate fields instead of dropping alerts in the slack integration
Signed-off-by: Prashant Balachandran <pnair@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* *: bump to Go 1.17 (#2792)
* *: bump to Go 1.17
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* *: fix yamllint errors
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Automate CSS-inlining for default HTML email template (#2798)
* Automate CSS-inlining for default HTML email template
The original HTML email template was added in `template/email.html`.
It looks like the CSS was manually inlined. Most likely using the
premailer.dialect.ca web form, which is mentioned in the README for
the Mailgun transactional-email-templates project. The resulting HTML
with inlined CSS was then copied into `template/default.tmpl`. This
has resulted in `email.html` and `default.tmpl` diverging at times.
This commit adds build automation to inline the CSS automatically
using [juice][1]. The Go template containing the resulting HTML has
been moved into its own file to avoid the script that performs the CSS
inlining having to parse the `default.tmpl` file to insert it there.
Fixes#1939.
[1]: https://www.npmjs.com/package/juice
Signed-off-by: Brad Ison <bison@xvdf.io>
* Update asset/assets_vfsdata.go
Signed-off-by: Brad Ison <bison@xvdf.io>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* go.{mod,sum}: update Go dependencies
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* amtool to support http_config to access alertmanager (#2764)
* Support http_config for amtool
Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: clyang82 <chuyang@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* notify/sns: detect FIFO topic based on the rendered value
Since the TopicARN field is a template string, it's safer to check for
the ".fifo" suffix in the rendered string.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* config: delegate Sigv4 validation to the inner type
This change also adds unit tests for SNS configuration.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix unittests
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix comment about active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix another comment about active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Fix typo in documentation
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Co-authored-by: clyang82 <chuyang@redhat.com>
Co-authored-by: Mac Chaffee <me@macchaffee.com>
Co-authored-by: Philip Gough <philip.p.gough@gmail.com>
Co-authored-by: nekketsuuu <nekketsuuu@users.noreply.github.com>
Co-authored-by: Prashant Balachandran <pnair@redhat.com>
Co-authored-by: Simon Pasquier <pasquier.simon@gmail.com>
Co-authored-by: Brad Ison <brad.ison@redhat.com>
Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
Limits are not used in standalone alertmanager.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
* fix dispatcher race condition
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
* add test to check for race condition in dispatcher
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
* return when dispatcher Stop has nil receiver
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
* remove unneeded chec
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
The aggregation group is already responsible for removing the resolved
alerts. Running the garbage collection in parallel introduces a race and
eventually resolved notifications may be dropped.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
If the original EndsAt is left in place, then as time moves forwards
past the EndsAt then firing alerts will be rendered and treated as
resolved alerts which can cause confusion and races. This is most
likely to happen on retries for a notification.
Mitigate race and fix data races in TestAggrGroup.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
To aggregate by all possible labels use '...' as the sole label name.
This effectively disables aggregation entirely, passing through all
alerts as-is. This is unlikely to be what you want, unless you have
a very low alert volume or your upstream notification system performs
its own grouping. Example: group_by: [...]
Signed-off-by: Kyryl Sablin <kyryl.sablin@schibsted.com>
Move the code for storing and GC'ing alerts from being re-implemented in
several packages to existing in its own package
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
* fix concurrent read and wirte group
Signed-off-by: denghuan <denghuan@actionsky.com>
* make lock more elegant
Signed-off-by: denghuan <denghuan@actionsky.com>
* Sort dispatched alerts by job+instance in the correct order (#1178)
Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>
* dispatch: add unit test for alerts sorting
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
When the aggregation group receives an alert that is past the initial
group_wait value, it should reset its timer only if the timer has ever
expired. Otherwise it means that the flush is already in-progress.
This change decreases the repeat_interval parameter from 5s to 4.9s to
make sure that the alerts are effectively sent after 5 seconds.
The workflow is:
- The dispatcher flushes the alerts at t0, sends the notification and
marks the notification log at t0+epsilon.
- The dispatcher flushes the alerts at t1, t2, t3 and t4 and doesn't
send the notifications as expected.
- At t5, the dispatcher flushes the alerts because current_time - (t0+epsilon)
is less then repeat_interval.
If repeat_interval is exactly 5s, there is a little chance that it is
greater than current_time - (t0+epsilon).
* Expose alert fingerprint in the API
Alert fingerprint is already provided as the value of status.inhibitedBy[] attribute that inhibited alerts have, but there's no way to get back to the alert that's inhibiting it as the fingerprint is not exposed.
* Expose alert fingerprint as ID in the list endpoint
* Rename ID to Fingerprint
* Use Fingerprint().String() in the API
AlertStatus doesn't have json tag with the field name, so it's serialized into 'Status', and it's the only uppercase field in the alert object. Tag it with 'status' name for consistency
Turn the GroupKey into a string that is composed of the matchers if the
path in the routing tree and the grouping labels.
Only hash it at the very end to ensure we don't exceed size limits of
integration APIs.
* Vendor dependencies.
This updates several old dependencies, removes
some that are no longer needed, and adds
`pkg/labels` from prometheus `dev-2.0` branch.
* Add metrics selector parsing code
This is a temporary simplified re-implementation
of promQL's metric selector parsing.
* Add alerts filtering
Filter alerts through `?filter=` query string.
* Add silences filtering
Filter silences through `?filter=` query string.
* Move `parse` to `pkg/parse`
This string value is initially used to store a receiver name. It is
later overloaded with a unique string identifier of <name, integration,
index>.
This renaming is in preparation to separate the two and use the Receiver
object of the nflogpb package.