Commit Graph

52 Commits

Author SHA1 Message Date
George Robinson c4a763c401
#3513: Mark muted alerts (#3793)
* Mark muted groups

This commit updates TimeMuteStage and TimeActiveStage to mark groups
as muted when their alerts are muted by an active or mute time interval,
and to remove any existing markers when outside all active and mute
time intervals.

Signed-off-by: George Robinson <george.robinson@grafana.com>

* Remove unlock to defer

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-05-13 11:16:26 +01:00
George Robinson ca4c90eb4e
Fix race condition in dispatch.go (#3826)
* Fix race condition in dispatch.go

This commit fixes a race condition in dispatch.go that would cause
a firing alert to be deleted from the aggregation group when instead
it should have been flushed.

The root cause is a race condition that can occur when dispatch.go
deletes resolved alerts from the aggregation group following a
successful notification. If a firing alert with the same
fingerprint is added back to the aggregation group at the same time
then the firing alert can be deleted.

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-05-07 10:34:03 +01:00
George Robinson d31a249ffc
#3513: Add GroupMarker interface (#3792)
* Add GroupMarker interface

This commit adds a new GroupMarker interface that marks the status
of groups. For example, whether an alert is muted because of one
or more active or mute time intervals.

It renames the existing Marker interface to AlertMarker to avoid
confusion.

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-04-30 15:26:04 +01:00
Matthieu MOREL b81bad8711 use Go standard errors
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-08 16:44:13 +01:00
Julius Volz 684484ef49
Remove unused Marker from Dispatcher struct (#2898)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2022-05-02 16:28:29 +02:00
Sinuhe Tellez Rivera d155153305
Adds: Active time interval (#2779)
* add active time interval

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fix active time interval

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fix unittests for active time interval

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update notify/notify.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update dispatch/route.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* split the stage for active and mute intervals

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update notify/notify.go

Adds doc for a helper function

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update notify/notify.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update notify/notify.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update notify/notify.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fix code after commit suggestions

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Allow mute_time_interval and time_intervals to coexist in the config

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* docs: configuration's doc has been updated about time intervals

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update config/config.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* updates configuration readme to improve active time description

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* merge deprecated mute_time_intervals and time_intervals

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update cmd/alertmanager/main.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update cmd/alertmanager/main.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fmt main.go

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fix lint error

Signed-off-by: clyang82 <chuyang@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Document that matchers are ANDed together

Signed-off-by: Mac Chaffee <me@macchaffee.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Remove extra parentheticals

Signed-off-by: Mac Chaffee <me@macchaffee.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* config: root route should have empty matchers

Unmarshal should validate that the root route does
not contain any matchers. Prior to this change,
only the deprecated match structures were checked.

Signed-off-by: Philip Gough <philip.p.gough@gmail.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* chore: Let git ignore temporary files for ui/app

Signed-off-by: nekketsuuu <nekketsuuu@users.noreply.github.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* adding max_alerts parameter to slack webhook config

correcting the logic to truncate fields instead of dropping alerts in the slack integration

Signed-off-by: Prashant Balachandran <pnair@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* *: bump to Go 1.17 (#2792)

* *: bump to Go 1.17

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* *: fix yamllint errors

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Automate CSS-inlining for default HTML email template (#2798)

* Automate CSS-inlining for default HTML email template

The original HTML email template was added in `template/email.html`.
It looks like the CSS was manually inlined.  Most likely using the
premailer.dialect.ca web form, which is mentioned in the README for
the Mailgun transactional-email-templates project.  The resulting HTML
with inlined CSS was then copied into `template/default.tmpl`.  This
has resulted in `email.html` and `default.tmpl` diverging at times.

This commit adds build automation to inline the CSS automatically
using [juice][1].  The Go template containing the resulting HTML has
been moved into its own file to avoid the script that performs the CSS
inlining having to parse the `default.tmpl` file to insert it there.

Fixes #1939.

[1]: https://www.npmjs.com/package/juice

Signed-off-by: Brad Ison <bison@xvdf.io>

* Update asset/assets_vfsdata.go

Signed-off-by: Brad Ison <bison@xvdf.io>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* go.{mod,sum}: update Go dependencies

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* amtool to support http_config to access alertmanager (#2764)

* Support http_config for amtool

Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: clyang82 <chuyang@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* notify/sns: detect FIFO topic based on the rendered value

Since the TopicARN field is a template string, it's safer to check for
the ".fifo" suffix in the rendered string.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* config: delegate Sigv4 validation to the inner type

This change also adds unit tests for SNS configuration.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fix unittests

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fix comment about active time interval

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fix another comment about active time interval

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Fix typo in documentation

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Co-authored-by: clyang82 <chuyang@redhat.com>
Co-authored-by: Mac Chaffee <me@macchaffee.com>
Co-authored-by: Philip Gough <philip.p.gough@gmail.com>
Co-authored-by: nekketsuuu <nekketsuuu@users.noreply.github.com>
Co-authored-by: Prashant Balachandran <pnair@redhat.com>
Co-authored-by: Simon Pasquier <pasquier.simon@gmail.com>
Co-authored-by: Brad Ison <brad.ison@redhat.com>
Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
2022-03-04 15:24:29 +01:00
Julien Pivotto b2a4cacb95 Update go dependencies & switch to go-kit/log
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-08-02 12:43:23 +02:00
Peter Štibraný d5ed7bfb15 Only register limit metrics when they are used.
Limits are not used in standalone alertmanager.

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2021-06-02 12:00:31 +02:00
Peter Štibraný 390474ffbe Added group limit to dispatcher.
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2021-06-02 12:00:31 +02:00
Marco Pracucci 72ef6e04e1
Fix race condition causing 1st alert to not be immediately delivered when group_wait is 0s
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-05-11 15:15:53 +02:00
Arve Knudsen 87b1cc6637 Unlock at specific points instead of deferring
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2021-04-27 10:44:18 +02:00
Arve Knudsen bd543f1345 Dispatch: Make sure mutex gets unlocked on call to Stop
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2021-04-27 09:25:16 +02:00
Ben Ridley 5d4231b001 Use consistent naming for mute time intervals
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:02 +11:00
ben d1f5e07909 Add mute time stage and pipeline
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:01 +11:00
Jacob Lisi 0c0c6bdb01
Fix race condition in dispatcher (#2208)
* fix dispatcher race condition

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* add test to check for race condition in dispatcher

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* return when dispatcher Stop has nil receiver

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* remove unneeded check

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
2020-03-19 15:32:37 +01:00
Marco Pracucci 1f77f320a7
Fixed dispatcher metrics registration
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-03-06 15:09:30 +01:00
Simon Pasquier 4f45457b9c
dispatch: add metrics (#2113)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-11-26 09:04:56 +01:00
Simon Pasquier 4535311c34 dispatch: don't garbage-collect alerts from store
The aggregation group is already responsible for removing the resolved
alerts. Running the garbage collection in parallel introduces a race and
eventually resolved notifications may be dropped.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-09-18 11:42:14 +02:00
Simon Pasquier ab537b5b2f
dispatch: fix missing receivers in Groups() (#1964)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-24 17:12:37 +02:00
bigMacro 5ff6cffa08 fix memory visibility error (#1936)
Signed-off-by: denghuan <denghuan@actionsky.com>
2019-06-25 10:11:45 +02:00
Simon Pasquier c78b449f4a provider/mem: fix dropped alerts
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-19 15:35:21 +02:00
stuart nelson 2fa210d0e3 add groups endpoint to v2 api
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2019-04-17 11:32:21 +02:00
Simon Pasquier a5e26cc721 *: log at debug level when context is canceled
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-03 16:41:03 +02:00
JoeWrightss b926c6935e Fix some typos in comment (#1750)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-08 14:57:08 +01:00
Brian Brazil 7078333202 Make a copy of firing alerts with EndsAt=0 when flushing. (#1686)
If the original EndsAt is left in place, then as time moves forward
past EndsAt, firing alerts will be rendered and treated as
resolved alerts, which can cause confusion and races. This is most
likely to happen on retries for a notification.

Mitigate race and fix data races in TestAggrGroup.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-04 16:52:20 +01:00
kirillsablin 32bb289906 dispatch: Add group_by_all support (#1588)
To aggregate by all possible labels use '...' as the sole label name. 
This effectively disables aggregation entirely, passing through all 
alerts as-is. This is unlikely to be what you want, unless you have 
a very low alert volume or your upstream notification system performs 
its own grouping. Example: group_by: [...]

Signed-off-by: Kyryl Sablin <kyryl.sablin@schibsted.com>
2018-11-29 12:31:14 +01:00
Simon Pasquier 306fd73e32 *: remove use of golang.org/x/net/context
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-11-09 10:00:23 +01:00
stuart nelson e883ccb9de
pull out shared code for storing alerts (#1507)
Move the code for storing and GC'ing alerts, previously re-implemented in
several packages, into its own package

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-09-03 14:52:53 +02:00
Simon Pasquier 899226f3ac *: remove v1/alerts/groups API endpoint (#1525)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-08-23 16:03:49 +02:00
bigMacro f3bc41d256 fix concurrent read and write group error (#1447)
* fix concurrent read and write group

Signed-off-by: denghuan <denghuan@actionsky.com>

* make lock more elegant

Signed-off-by: denghuan <denghuan@actionsky.com>
2018-07-10 17:13:41 +02:00
Simon Pasquier 6a7c912559 Sort alerts in correct order (#1349)
* Sort dispatched alerts by job+instance in the correct order (#1178)

Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>

* dispatch: add unit test for alerts sorting

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-06-14 15:54:33 +02:00
Simon Pasquier 0ebaeccd4b *: add missing license headers
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-14 17:37:13 +02:00
Simon Pasquier 4cba49155d dispatch: don't reset timer if flush is in-progress (#1301)
When the aggregation group receives an alert that is past the initial
group_wait value, it should reset its timer only if the timer has ever
expired. Otherwise it means that the flush is already in-progress.
2018-03-29 12:22:49 +02:00
Ted Zlatanov 099b6a1d43 Sort dispatched alerts by job+instance then rest by default (#1178) (#1234) 2018-03-22 20:06:37 +01:00
pasquier-s 9b10acae68 Don't notify resolved alerts if none were firing (#1198)
* Don't notify resolved alerts if none were firing

* Fix comments
2018-01-18 11:12:17 +01:00
pasquier-s 907ac510f8 Fix flaky TestBatching acceptance test (#1193)
This change decreases the repeat_interval parameter from 5s to 4.9s to
make sure that the alerts are effectively sent after 5 seconds.

The workflow is:
- The dispatcher flushes the alerts at t0, sends the notification and
marks the notification log at t0+epsilon.
- The dispatcher flushes the alerts at t1, t2, t3 and t4 and doesn't
send the notifications as expected.
- At t5, the dispatcher flushes the alerts and sends the notification because
repeat_interval is less than current_time - (t0+epsilon).

If repeat_interval is exactly 5s, there is a small chance that it is
greater than current_time - (t0+epsilon).
2018-01-11 22:45:59 +01:00
Julius Volz b145c51b99 Clarify variable names in Dispatcher.processAlert()
A single entry in aggrGroups is just a single group, not plural.
2017-11-01 15:06:23 +01:00
Julius Volz 947970af44 Convert Alertmanager to use non-global go-kit loggers
Fixes https://github.com/prometheus/alertmanager/issues/1040
2017-10-22 00:20:40 -07:00
Łukasz Mierzwa 8e61ebf6c3 Expose alert fingerprint in the API (#786)
* Expose alert fingerprint in the API

Alert fingerprint is already provided as the value of status.inhibitedBy[] attribute that inhibited alerts have, but there's no way to get back to the alert that's inhibiting it as the fingerprint is not exposed.

* Expose alert fingerprint as ID in the list endpoint

* Rename ID to Fingerprint

* Use Fingerprint().String() in the API
2017-08-18 19:30:18 +02:00
stuart nelson a7009a9db7 Stn/add receiver support (#872)
Add ability to filter alerts by receiver in UI. This adds changes both in the Elm UI, as well as the Go backend.
2017-06-26 18:20:26 +02:00
Fabian Reinartz 8170206070 Fix alert status handling in UI 2017-05-08 12:56:03 +02:00
Łukasz Mierzwa 8bc5855c87 Serialize AlertStatus as 'status'
AlertStatus doesn't have a json tag with the field name, so it's serialized as 'Status', which makes it the only uppercase field in the alert object. Tag it with the 'status' name for consistency
2017-04-28 11:34:01 -07:00
stuart nelson 6a909abf17 Add processing status field to alert 2017-04-27 14:18:52 +02:00
Fabian Reinartz 3269bc39e1 *: switch group key to matcher serialization
Turn the GroupKey into a string that is composed of the matchers of the
path in the routing tree and the grouping labels.
Only hash it at the very end to ensure we don't exceed size limits of
integration APIs.
2017-04-21 12:06:23 +02:00
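
The group key scheme described above (a readable string built from the route path and grouping labels, hashed only at the very end) might look roughly like this. The function names, the `/` and `:` separators, and the choice of SHA-256 are assumptions made for illustration, not details confirmed by the commit message.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// groupKey builds a readable key from the serialized matchers of each route
// on the path through the routing tree, followed by the grouping labels.
func groupKey(routeMatchers []string, groupLabels map[string]string) string {
	names := make([]string, 0, len(groupLabels))
	for n := range groupLabels {
		names = append(names, n)
	}
	sort.Strings(names) // stable order, so equal label sets give equal keys

	pairs := make([]string, 0, len(names))
	for _, n := range names {
		pairs = append(pairs, fmt.Sprintf("%s=%q", n, groupLabels[n]))
	}
	return strings.Join(routeMatchers, "/") + ":{" + strings.Join(pairs, ", ") + "}"
}

// hashKey shortens the key to a fixed length right before it is handed to an
// integration API that limits identifier sizes.
func hashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return fmt.Sprintf("%x", sum)
}

func main() {
	key := groupKey(
		[]string{"{}", `{team="db"}`},
		map[string]string{"cluster": "eu", "alertname": "HighLatency"},
	)
	fmt.Println(key)               // prints {}/{team="db"}:{alertname="HighLatency", cluster="eu"}
	fmt.Println(len(hashKey(key))) // prints 64
}
```

Keeping the unhashed string around until the last step leaves the key readable in logs while still bounding its length where it matters.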
stuart nelson 1e34f29532 Filter alerts (#633)
* Vendor dependencies.

This updates several old dependencies, removes
some that are no longer needed, and adds
`pkg/labels` from prometheus `dev-2.0` branch.

* Add metrics selector parsing code

This is a temporary simplified re-implementation
of promQL's metric selector parsing.

* Add alerts filtering

Filter alerts through `?filter=` query string.

* Add silences filtering

Filter silences through `?filter=` query string.

* Move `parse` to `pkg/parse`
2017-03-16 11:16:10 +01:00
Felix 9fcdb47faa add GroupKey() to aggrGroup, add it to alerts/groups and use the function for context initialization 2016-12-07 11:05:11 +08:00
Fabian Reinartz e9fbe62e0f *: consider mesh wait in notification timeouts
This adds the peer wait duration to the standard timeout to avoid
terminating a notification prematurely while being in failover
wait status.
2016-09-05 13:21:28 +02:00
Fabian Reinartz a4e8703567 *: integrate new silence package 2016-08-30 12:15:23 +02:00
Fabian Reinartz d2a556b269 notify: include context in Stage interface
This adds context.Context to the return arguments of a Stage.
This is necessary to propagate modified contexts.
2016-08-18 11:42:37 +02:00
Fabian Reinartz 998a9ce38e notify: rename Receiver to ReceiverName
This string value is initially used to store a receiver name. It is
later overloaded with a unique string identifier of <name, integration,
index>.
This renaming is in preparation to separate the two and use the Receiver
object of the nflogpb package.
2016-08-16 16:33:17 +02:00