Commit Graph

74 Commits

Author SHA1 Message Date
gotjosh 805e505288
Alert metric reports different results to what the user sees via API (#2943)
* Alert metric reports different results to what the user sees via API

Fixes #1439 and #2619.

The previous metric is not _technically_ reporting incorrect results as the alerts _are_ still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurpose the current metric to match what the user sees in the UI.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2022-06-16 12:16:06 +02:00
Julius Volz 710588f10f
Remove unneeded nil check before ranging over slice (#2900)
Ranging over a nil slice is just a noop as well.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2022-05-02 16:29:04 +02:00
Julius Volz 684484ef49
Remove unused Marker from Dispatcher struct (#2898)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2022-05-02 16:28:29 +02:00
Matthias Loibl a6d10bd5bc
Update golangci-lint and fix complaints (#2853)
* Copy latest golangci-lint files from Prometheus

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Use grafana/regexp over stdlib regexp

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Fix typos in comments

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Fix goimports complains in import sorting

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* gofumpt all Go files

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Update naming to comply with revive linter

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* config: Fix error messages to be lower case

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* test/cli: Fix error messages to be lower case

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* .golangci.yaml: Remove obsolete space

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* config: Fix expected victorOps error

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Use stdlib regexp

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Clean up Go modules

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2022-03-25 17:59:51 +01:00
Sinuhe Tellez Rivera d155153305
Adds: Active time interval (#2779)
* add active time interval

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fix active time interval

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fix unittests for active time interval

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update notify/notify.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update dispatch/route.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* split the stage for active and mute intervals

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update notify/notify.go

Adds doc for a helper function

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update notify/notify.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update notify/notify.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update notify/notify.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fix code after commit suggestions

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Making mute_time_interval and time_intervals can coexist in the config

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* docs: configuration's doc has been updated about time intervals

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update config/config.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* updates configuration readme to improve active time description

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* merge deprecated mute_time_intervals and time_intervals

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update cmd/alertmanager/main.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update cmd/alertmanager/main.go

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fmt main.go

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fix lint error

Signed-off-by: clyang82 <chuyang@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Document that matchers are ANDed together

Signed-off-by: Mac Chaffee <me@macchaffee.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Remove extra parentheticals

Signed-off-by: Mac Chaffee <me@macchaffee.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* config: root route should have empty matchers

Unmarshal should validate that the root route does
not contain any matchers. Prior to this change,
only the deprecated match structures were checked.

Signed-off-by: Philip Gough <philip.p.gough@gmail.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* chore: Let git ignore temporary files for ui/app

Signed-off-by: nekketsuuu <nekketsuuu@users.noreply.github.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* adding max_alerts parameter to slack webhook config

correcting the logic to trucate fields instead of dropping alerts in the slack integration

Signed-off-by: Prashant Balachandran <pnair@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* *: bump to Go 1.17 (#2792)

* *: bump to Go 1.17

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* *: fix yamllint errors

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Automate CSS-inlining for default HTML email template (#2798)

* Automate CSS-inlining for default HTML email template

The original HTML email template was added in `template/email.html`.
It looks like the CSS was manually inlined.  Most likely using the
premailer.dialect.ca web form, which is mentioned in the README for
the Mailgun transactional-email-templates project.  The resulting HTML
with inlined CSS was then copied into `template/default.tmpl`.  This
has resulted in `email.html` and `default.tmpl` diverging at times.

This commit adds build automation to inline the CSS automatically
using [juice][1].  The Go template containing the resulting HTML has
been moved into its own file to avoid the script that performs the CSS
inlining having to parse the `default.tmpl` file to insert it there.

Fixes #1939.

[1]: https://www.npmjs.com/package/juice

Signed-off-by: Brad Ison <bison@xvdf.io>

* Update asset/assets_vfsdata.go

Signed-off-by: Brad Ison <bison@xvdf.io>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* go.{mod,sum}: update Go dependencies

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* amtool to support http_config to access alertmanager (#2764)

* Support http_config for amtool

Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: clyang82 <chuyang@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* notify/sns: detect FIFO topic based on the rendered value

Since the TopicARN field is a template string, it's safer to check for
the ".fifo" suffix in the rendered string.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* config: delegate Sigv4 validation to the inner type

This change also adds unit tests for SNS configuration.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fix unittests

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fix comment about active time interval

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* fix another comment about active time interval

Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Fix typo in documentation

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

* Update docs/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Co-authored-by: clyang82 <chuyang@redhat.com>
Co-authored-by: Mac Chaffee <me@macchaffee.com>
Co-authored-by: Philip Gough <philip.p.gough@gmail.com>
Co-authored-by: nekketsuuu <nekketsuuu@users.noreply.github.com>
Co-authored-by: Prashant Balachandran <pnair@redhat.com>
Co-authored-by: Simon Pasquier <pasquier.simon@gmail.com>
Co-authored-by: Brad Ison <brad.ison@redhat.com>
Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
2022-03-04 15:24:29 +01:00
Julien Pivotto b2a4cacb95 Update go dependencies & switch to go-kit/log
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-08-02 12:43:23 +02:00
Peter Štibraný 0bb65d1e4b Reduce number of dispatched alerts to avoid hitting the limit on number of alive goroutines.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
2021-06-02 15:28:00 +02:00
Peter Štibraný b3ea60e9bb Fix compilation errors after rebase on master.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
2021-06-02 15:14:55 +02:00
Peter Štibraný 358645cfe2 Extract TestGroupsWithLimits, and remove limit test from TestGroups.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
2021-06-02 12:00:31 +02:00
Peter Štibraný 0f86edcf5c Extract TestGroupsWithLimits, and remove limit test from TestGroups.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
2021-06-02 12:00:31 +02:00
Peter Štibraný d5ed7bfb15 Only register limit metrics when they are used.
Limits are not used in standalone alertmanager.

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2021-06-02 12:00:31 +02:00
Peter Štibraný 390474ffbe Added group limit to dispatcher.
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2021-06-02 12:00:31 +02:00
Peter Štibraný cc0b08fd7c Added possibility to pass callback to *mem.NewAlerts, useful for implementing limits on alerts.
Update provider/mem/mem.go

Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2021-05-31 09:56:57 +02:00
Marco Pracucci f84af78693
Lowered number of alert groups
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-05-11 16:15:46 +02:00
Marco Pracucci 1ad22c808f
Added unit test
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-05-11 15:48:02 +02:00
Marco Pracucci 72ef6e04e1
Fix race condition causing 1st alert to not be immediately delivered when group_wait is 0s
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-05-11 15:15:53 +02:00
Arve Knudsen 87b1cc6637 Unlock at specific points instead of deferring
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2021-04-27 10:44:18 +02:00
Arve Knudsen bd543f1345 Dispatch: Make sure mutex gets unlocked on call to Stop
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2021-04-27 09:25:16 +02:00
Ben Ridley 5983d2078d Fix formatting
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:02 +11:00
Ben Ridley 5d4231b001 Use consistent naming for mute time intervals
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:02 +11:00
ben d1f5e07909 Add mute time stage and pipeline
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:01 +11:00
ben cbfbf07188 Allow routes to reference time intervals
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:00 +11:00
Atibhi Agrawal 6b36afbbec
Add negative matchers for routing. (#2434)
Add negative route matchers using label.Matcher

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

Signed-off-by: beorn7 <beorn@grafana.com>

Co-authored-by: Björn Rabenstein <beorn@grafana.com>
2021-01-15 21:11:39 +01:00
Jacob Lisi 0c0c6bdb01
Fix race condition in dispatcher (#2208)
* fix dispatcher race condition

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* add test to check for race condition in dispatcher

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* return when dispatcher Stop has nil receiver

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* remove unneeded chec

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
2020-03-19 15:32:37 +01:00
Marco Pracucci 1f77f320a7
Fixed dispatcher metrics registration
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-03-06 15:09:30 +01:00
Sho Okada 04ca507125 Inherit their parent route's grouping when "group_by: [...]" (#2154)
Signed-off-by: Sho Okada <shokada3@gmail.com>
2020-01-10 14:20:03 +01:00
johncming 134c3c0ed9 move walkRoute to dispatch package. (#2136)
Signed-off-by: johncming <johncming@yahoo.com>
2019-12-20 15:27:58 +01:00
Simon Pasquier b49ebfc683
Merge release 0.20 (#2140)
* Revert "slack: retry 429 errors (#2112)" (#2128)

This reverts commit 26cc96a787.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Revert "config: remove support for JSON marshaling (#2086)" (#2133)

This reverts commit 918f08b66a.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* config: fix JSON unmarshaling for HostPort (#2134)

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Cut 0.20.0 (#2137)

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-12-12 16:35:19 +01:00
Simon Pasquier 4f45457b9c
dispatch: add metrics (#2113)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-11-26 09:04:56 +01:00
Simon Pasquier 918f08b66a
config: remove support for JSON marshaling (#2086)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-10-29 10:45:42 +01:00
johncming bad2e792ca dispatch: route group labels should contain group common label. (#2055)
Signed-off-by: johncming <johncming@yahoo.com>
2019-10-02 14:54:34 +02:00
Simon Pasquier 4535311c34 dispatch: don't garbage-collect alerts from store
The aggregation group is already responsible for removing the resolved
alerts. Running the garbage collection in parallel introduces a race and
eventually resolved notifications may be dropped.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-09-18 11:42:14 +02:00
Simon Pasquier ab537b5b2f
dispatch: fix missing receivers in Groups() (#1964)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-24 17:12:37 +02:00
Simon Pasquier 612222b693 dispatch: use strings.Builder instead of []byte
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-15 15:27:37 +02:00
bigMacro 5ff6cffa08 fix memory visibility error (#1936)
Signed-off-by: denghuan <denghuan@actionsky.com>
2019-06-25 10:11:45 +02:00
Simon Pasquier 2ccb4707f1 dispatch: fix flaky test
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-23 11:16:48 +02:00
Simon Pasquier c78b449f4a provider/mem: fix dropped alerts
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-19 15:35:21 +02:00
stuart nelson 2fa210d0e3 add groups endpoint to v2 api
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2019-04-17 11:32:21 +02:00
Simon Pasquier a5e26cc721 *: log at debug level when context is canceled
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-03 16:41:03 +02:00
JoeWrightss b926c6935e Fix some typos in comment (#1750)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-08 14:57:08 +01:00
Brian Brazil 7078333202 Make a copy of firing alerts with EndsAt=0 when flushing. (#1686)
If the original EndsAt is left in place, then as time moves forwards
past the EndsAt then firing alerts will be rendered and treated as
resolved alerts which can cause confusion and races. This is most
likely to happen on retries for a notification.

Mitigate race and fix data races in TestAggrGroup.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-04 16:52:20 +01:00
kirillsablin 32bb289906 dispatch: Add group_by_all support (#1588)
To aggregate by all possible labels use '...' as the sole label name. 
This effectively disables aggregation entirely, passing through all 
alerts as-is. This is unlikely to be what you want, unless you have 
a very low alert volume or your upstream notification system performs 
its own grouping. Example: group_by: [...]

Signed-off-by: Kyryl Sablin <kyryl.sablin@schibsted.com>
2018-11-29 12:31:14 +01:00
Simon Pasquier 306fd73e32 *: remove use of golang.org/x/net/context
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-11-09 10:00:23 +01:00
stuart nelson e883ccb9de
pull out shared code for storing alerts (#1507)
Move the code for storing and GC'ing alerts from being re-implemented in
several packages to existing in its own package

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-09-03 14:52:53 +02:00
Simon Pasquier 899226f3ac *: remove v1/alerts/groups API endpoint (#1525)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-08-23 16:03:49 +02:00
bigMacro f3bc41d256 fix concurrent read and wirte group error (#1447)
* fix concurrent read and wirte group

Signed-off-by: denghuan <denghuan@actionsky.com>

* make lock more elegant

Signed-off-by: denghuan <denghuan@actionsky.com>
2018-07-10 17:13:41 +02:00
Simon Pasquier 6a7c912559 Sort alerts in correct order (#1349)
* Sort dispatched alerts by job+instance in the correct order (#1178)

Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>

* dispatch: add unit test for alerts sorting

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-06-14 15:54:33 +02:00
Simon Pasquier 0ebaeccd4b *: add missing license headers
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-14 17:37:13 +02:00
Manos Fokas 300a87e85b Removed file changes to resolve conflict. (#1318)
Signed-off-by: manosf <manosf@protonmail.com>
2018-04-17 16:22:46 +02:00
Simon Pasquier 4cba49155d dispatch: don't reset timer if flush is in-progress (#1301)
When the aggregation group receives an alert that is past the initial
group_wait value, it should reset its timer only if the timer has ever
expired. Otherwise it means that the flush is already in-progress.
2018-03-29 12:22:49 +02:00