69 Commits

Author SHA1 Message Date
Julien Pivotto b2a4cacb95 Update go dependencies & switch to go-kit/log
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-08-02 12:43:23 +02:00
Peter Štibraný 0bb65d1e4b Reduce number of dispatched alerts to avoid hitting the limit on number of alive goroutines.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
2021-06-02 15:28:00 +02:00
Peter Štibraný b3ea60e9bb Fix compilation errors after rebase on master.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
2021-06-02 15:14:55 +02:00
Peter Štibraný 358645cfe2 Extract TestGroupsWithLimits, and remove limit test from TestGroups.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
2021-06-02 12:00:31 +02:00
Peter Štibraný 0f86edcf5c Extract TestGroupsWithLimits, and remove limit test from TestGroups.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
2021-06-02 12:00:31 +02:00
Peter Štibraný d5ed7bfb15 Only register limit metrics when they are used.
Limits are not used in standalone alertmanager.

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2021-06-02 12:00:31 +02:00
Peter Štibraný 390474ffbe Added group limit to dispatcher.
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2021-06-02 12:00:31 +02:00
Peter Štibraný cc0b08fd7c Added possibility to pass callback to *mem.NewAlerts, useful for implementing limits on alerts.
Update provider/mem/mem.go

Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2021-05-31 09:56:57 +02:00
Marco Pracucci f84af78693
Lowered number of alert groups
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-05-11 16:15:46 +02:00
Marco Pracucci 1ad22c808f
Added unit test
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-05-11 15:48:02 +02:00
Marco Pracucci 72ef6e04e1
Fix race condition causing 1st alert to not be immediately delivered when group_wait is 0s
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-05-11 15:15:53 +02:00
Arve Knudsen 87b1cc6637 Unlock at specific points instead of deferring
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2021-04-27 10:44:18 +02:00
Arve Knudsen bd543f1345 Dispatch: Make sure mutex gets unlocked on call to Stop
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2021-04-27 09:25:16 +02:00
Ben Ridley 5983d2078d Fix formatting
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:02 +11:00
Ben Ridley 5d4231b001 Use consistent naming for mute time intervals
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:02 +11:00
ben d1f5e07909 Add mute time stage and pipeline
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:01 +11:00
ben cbfbf07188 Allow routes to reference time intervals
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:00 +11:00
Atibhi Agrawal 6b36afbbec
Add negative matchers for routing. (#2434)
Add negative route matchers using label.Matcher

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

Signed-off-by: beorn7 <beorn@grafana.com>

Co-authored-by: Björn Rabenstein <beorn@grafana.com>
2021-01-15 21:11:39 +01:00
Jacob Lisi 0c0c6bdb01
Fix race condition in dispatcher (#2208)
* fix dispatcher race condition

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* add test to check for race condition in dispatcher

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* return when dispatcher Stop has nil receiver

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* remove unneeded check

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
2020-03-19 15:32:37 +01:00
Marco Pracucci 1f77f320a7
Fixed dispatcher metrics registration
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-03-06 15:09:30 +01:00
Sho Okada 04ca507125 Inherit their parent route's grouping when "group_by: [...]" (#2154)
Signed-off-by: Sho Okada <shokada3@gmail.com>
2020-01-10 14:20:03 +01:00
johncming 134c3c0ed9 move walkRoute to dispatch package. (#2136)
Signed-off-by: johncming <johncming@yahoo.com>
2019-12-20 15:27:58 +01:00
Simon Pasquier b49ebfc683
Merge release 0.20 (#2140)
* Revert "slack: retry 429 errors (#2112)" (#2128)

This reverts commit 26cc96a787.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Revert "config: remove support for JSON marshaling (#2086)" (#2133)

This reverts commit 918f08b66a.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* config: fix JSON unmarshaling for HostPort (#2134)

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Cut 0.20.0 (#2137)

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-12-12 16:35:19 +01:00
Simon Pasquier 4f45457b9c
dispatch: add metrics (#2113)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-11-26 09:04:56 +01:00
Simon Pasquier 918f08b66a
config: remove support for JSON marshaling (#2086)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-10-29 10:45:42 +01:00
johncming bad2e792ca dispatch: route group labels should contain group common label. (#2055)
Signed-off-by: johncming <johncming@yahoo.com>
2019-10-02 14:54:34 +02:00
Simon Pasquier 4535311c34 dispatch: don't garbage-collect alerts from store
The aggregation group is already responsible for removing the resolved
alerts. Running the garbage collection in parallel introduces a race and
eventually resolved notifications may be dropped.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-09-18 11:42:14 +02:00
Simon Pasquier ab537b5b2f
dispatch: fix missing receivers in Groups() (#1964)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-24 17:12:37 +02:00
Simon Pasquier 612222b693 dispatch: use strings.Builder instead of []byte
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-15 15:27:37 +02:00
bigMacro 5ff6cffa08 fix memory visibility error (#1936)
Signed-off-by: denghuan <denghuan@actionsky.com>
2019-06-25 10:11:45 +02:00
Simon Pasquier 2ccb4707f1 dispatch: fix flaky test
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-23 11:16:48 +02:00
Simon Pasquier c78b449f4a provider/mem: fix dropped alerts
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-19 15:35:21 +02:00
stuart nelson 2fa210d0e3 add groups endpoint to v2 api
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2019-04-17 11:32:21 +02:00
Simon Pasquier a5e26cc721 *: log at debug level when context is canceled
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-03 16:41:03 +02:00
JoeWrightss b926c6935e Fix some typos in comment (#1750)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-08 14:57:08 +01:00
Brian Brazil 7078333202 Make a copy of firing alerts with EndsAt=0 when flushing. (#1686)
If the original EndsAt is left in place, then as time moves forwards
past the EndsAt then firing alerts will be rendered and treated as
resolved alerts which can cause confusion and races. This is most
likely to happen on retries for a notification.

Mitigate race and fix data races in TestAggrGroup.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-04 16:52:20 +01:00
kirillsablin 32bb289906 dispatch: Add group_by_all support (#1588)
To aggregate by all possible labels use '...' as the sole label name. 
This effectively disables aggregation entirely, passing through all 
alerts as-is. This is unlikely to be what you want, unless you have 
a very low alert volume or your upstream notification system performs 
its own grouping. Example: group_by: [...]

Signed-off-by: Kyryl Sablin <kyryl.sablin@schibsted.com>
2018-11-29 12:31:14 +01:00
Simon Pasquier 306fd73e32 *: remove use of golang.org/x/net/context
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-11-09 10:00:23 +01:00
stuart nelson e883ccb9de
pull out shared code for storing alerts (#1507)
Move the code for storing and GC'ing alerts from being re-implemented in
several packages to existing in its own package

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-09-03 14:52:53 +02:00
Simon Pasquier 899226f3ac *: remove v1/alerts/groups API endpoint (#1525)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-08-23 16:03:49 +02:00
bigMacro f3bc41d256 fix concurrent read and write group error (#1447)
* fix concurrent read and write group

Signed-off-by: denghuan <denghuan@actionsky.com>

* make lock more elegant

Signed-off-by: denghuan <denghuan@actionsky.com>
2018-07-10 17:13:41 +02:00
Simon Pasquier 6a7c912559 Sort alerts in correct order (#1349)
* Sort dispatched alerts by job+instance in the correct order (#1178)

Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>

* dispatch: add unit test for alerts sorting

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-06-14 15:54:33 +02:00
Simon Pasquier 0ebaeccd4b *: add missing license headers
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-14 17:37:13 +02:00
Manos Fokas 300a87e85b Removed file changes to resolve conflict. (#1318)
Signed-off-by: manosf <manosf@protonmail.com>
2018-04-17 16:22:46 +02:00
Simon Pasquier 4cba49155d dispatch: don't reset timer if flush is in-progress (#1301)
When the aggregation group receives an alert that is past the initial
group_wait value, it should reset its timer only if the timer has ever
expired. Otherwise it means that the flush is already in-progress.
2018-03-29 12:22:49 +02:00
Ted Zlatanov 099b6a1d43 Sort dispatched alerts by job+instance then rest by default (#1178) (#1234)
2018-03-22 20:06:37 +01:00
Brian Brazil aa950668bf The default group_by is meant to be no labels. (#1287)
This is what the intended default is, and what
the documentation says.
2018-03-16 18:39:23 +01:00
pasquier-s c39a913f8a test: enable race detection (#1262)
This change enables race detection when running the tests. It also fixes
a couple of existing race conditions.
2018-02-27 18:18:53 +01:00
Brian Brazil 5cb71e1def Fix spelling and comment style. (#1257)
2018-02-27 10:07:33 +01:00
pasquier-s 9b10acae68 Don't notify resolved alerts if none were firing (#1198)
* Don't notify resolved alerts if none were firing

* Fix comments
2018-01-18 11:12:17 +01:00