58 Commits

Author SHA1 Message Date
Arve Knudsen 87b1cc6637 Unlock at specific points instead of deferring
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2021-04-27 10:44:18 +02:00
Arve Knudsen bd543f1345 Dispatch: Make sure mutex gets unlocked on call to Stop
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2021-04-27 09:25:16 +02:00
Ben Ridley 5983d2078d Fix formatting
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:02 +11:00
Ben Ridley 5d4231b001 Use consistent naming for mute time intervals
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:02 +11:00
ben d1f5e07909 Add mute time stage and pipeline
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:01 +11:00
ben cbfbf07188 Allow routes to reference time intervals
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:00 +11:00
Atibhi Agrawal 6b36afbbec
Add negative matchers for routing. (#2434)
Add negative route matchers using label.Matcher

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

Signed-off-by: beorn7 <beorn@grafana.com>

Co-authored-by: Björn Rabenstein <beorn@grafana.com>
2021-01-15 21:11:39 +01:00
Jacob Lisi 0c0c6bdb01
Fix race condition in dispatcher (#2208)
* fix dispatcher race condition

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* add test to check for race condition in dispatcher

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* return when dispatcher Stop has nil receiver

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* remove unneeded check

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
2020-03-19 15:32:37 +01:00
Marco Pracucci 1f77f320a7
Fixed dispatcher metrics registration
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-03-06 15:09:30 +01:00
Sho Okada 04ca507125 Inherit their parent route's grouping when "group_by: [...]" (#2154)
Signed-off-by: Sho Okada <shokada3@gmail.com>
2020-01-10 14:20:03 +01:00
johncming 134c3c0ed9 move walkRoute to dispatch package. (#2136)
Signed-off-by: johncming <johncming@yahoo.com>
2019-12-20 15:27:58 +01:00
Simon Pasquier b49ebfc683
Merge release 0.20 (#2140)
* Revert "slack: retry 429 errors (#2112)" (#2128)

This reverts commit 26cc96a787.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Revert "config: remove support for JSON marshaling (#2086)" (#2133)

This reverts commit 918f08b66a.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* config: fix JSON unmarshaling for HostPort (#2134)

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Cut 0.20.0 (#2137)

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-12-12 16:35:19 +01:00
Simon Pasquier 4f45457b9c
dispatch: add metrics (#2113)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-11-26 09:04:56 +01:00
Simon Pasquier 918f08b66a
config: remove support for JSON marshaling (#2086)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-10-29 10:45:42 +01:00
johncming bad2e792ca dispatch: route group labels should contain group common label. (#2055)
Signed-off-by: johncming <johncming@yahoo.com>
2019-10-02 14:54:34 +02:00
Simon Pasquier 4535311c34 dispatch: don't garbage-collect alerts from store
The aggregation group is already responsible for removing the resolved
alerts. Running the garbage collection in parallel introduces a race and
eventually resolved notifications may be dropped.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-09-18 11:42:14 +02:00
Simon Pasquier ab537b5b2f
dispatch: fix missing receivers in Groups() (#1964)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-24 17:12:37 +02:00
Simon Pasquier 612222b693 dispatch: use strings.Builder instead of []byte
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-15 15:27:37 +02:00
bigMacro 5ff6cffa08 fix memory visibility error (#1936)
Signed-off-by: denghuan <denghuan@actionsky.com>
2019-06-25 10:11:45 +02:00
Simon Pasquier 2ccb4707f1 dispatch: fix flaky test
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-23 11:16:48 +02:00
Simon Pasquier c78b449f4a provider/mem: fix dropped alerts
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-19 15:35:21 +02:00
stuart nelson 2fa210d0e3 add groups endpoint to v2 api
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2019-04-17 11:32:21 +02:00
Simon Pasquier a5e26cc721 *: log at debug level when context is canceled
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-03 16:41:03 +02:00
JoeWrightss b926c6935e Fix some typos in comment (#1750)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-08 14:57:08 +01:00
Brian Brazil 7078333202 Make a copy of firing alerts with EndsAt=0 when flushing. (#1686)
If the original EndsAt is left in place, then as time moves forwards
past the EndsAt then firing alerts will be rendered and treated as
resolved alerts which can cause confusion and races. This is most
likely to happen on retries for a notification.

Mitigate race and fix data races in TestAggrGroup.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-04 16:52:20 +01:00
kirillsablin 32bb289906 dispatch: Add group_by_all support (#1588)
To aggregate by all possible labels use '...' as the sole label name. 
This effectively disables aggregation entirely, passing through all 
alerts as-is. This is unlikely to be what you want, unless you have 
a very low alert volume or your upstream notification system performs 
its own grouping. Example: group_by: [...]

Signed-off-by: Kyryl Sablin <kyryl.sablin@schibsted.com>
2018-11-29 12:31:14 +01:00
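A minimal sketch of the effect described in the commit above, assuming a simplified model in which a group is identified by the subset of an alert's labels named in `group_by`. The `LabelSet` type and the `groupLabels` and `key` helpers are hypothetical and are not the dispatcher's actual code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// LabelSet is a hypothetical stand-in for an alert's labels.
type LabelSet map[string]string

// groupLabels returns the labels that identify an alert's aggregation group.
// With groupByAll set (the "group_by: [...]" case), every label is kept.
func groupLabels(alert LabelSet, groupBy []string, groupByAll bool) LabelSet {
	out := LabelSet{}
	if groupByAll {
		for name, value := range alert {
			out[name] = value
		}
		return out
	}
	for _, name := range groupBy {
		if value, ok := alert[name]; ok {
			out[name] = value
		}
	}
	return out
}

// key serializes a label set deterministically so two sets can be compared.
func key(ls LabelSet) string {
	names := make([]string, 0, len(ls))
	for name := range ls {
		names = append(names, name)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, name := range names {
		parts = append(parts, name+"="+ls[name])
	}
	return strings.Join(parts, ",")
}

func main() {
	a := LabelSet{"alertname": "HighLatency", "instance": "a"}
	b := LabelSet{"alertname": "HighLatency", "instance": "b"}

	// Grouping by alertname puts both alerts in the same group.
	byName := []string{"alertname"}
	fmt.Println(key(groupLabels(a, byName, false)) == key(groupLabels(b, byName, false))) // true

	// Grouping by all labels gives each distinct alert its own group.
	fmt.Println(key(groupLabels(a, nil, true)) == key(groupLabels(b, nil, true))) // false
}
```

Because any two alerts that differ in a single label land in different groups, each alert is passed through on its own, which is why the commit message warns against this setting for anything but low alert volumes.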
Simon Pasquier 306fd73e32 *: remove use of golang.org/x/net/context
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-11-09 10:00:23 +01:00
stuart nelson e883ccb9de
pull out shared code for storing alerts (#1507)
Move the code for storing and GC'ing alerts from being re-implemented in
several packages to existing in its own package

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-09-03 14:52:53 +02:00
Simon Pasquier 899226f3ac *: remove v1/alerts/groups API endpoint (#1525)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-08-23 16:03:49 +02:00
bigMacro f3bc41d256 fix concurrent read and write group error (#1447)
* fix concurrent read and write group

Signed-off-by: denghuan <denghuan@actionsky.com>

* make lock more elegant

Signed-off-by: denghuan <denghuan@actionsky.com>
2018-07-10 17:13:41 +02:00
Simon Pasquier 6a7c912559 Sort alerts in correct order (#1349)
* Sort dispatched alerts by job+instance in the correct order (#1178)

Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>

* dispatch: add unit test for alerts sorting

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-06-14 15:54:33 +02:00
Simon Pasquier 0ebaeccd4b *: add missing license headers
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-14 17:37:13 +02:00
Manos Fokas 300a87e85b Removed file changes to resolve conflict. (#1318)
Signed-off-by: manosf <manosf@protonmail.com>
2018-04-17 16:22:46 +02:00
Simon Pasquier 4cba49155d dispatch: don't reset timer if flush is in-progress (#1301)
When the aggregation group receives an alert that is past the initial
group_wait value, it should reset its timer only if the timer has never
expired. Otherwise it means that the flush is already in-progress.
2018-03-29 12:22:49 +02:00
Ted Zlatanov 099b6a1d43 Sort dispatched alerts by job+instance then rest by default (#1178) (#1234) 2018-03-22 20:06:37 +01:00
Brian Brazil aa950668bf The default group_by is meant to be no labels. (#1287)
This is what the intended default is, and what
the documentation says.
2018-03-16 18:39:23 +01:00
pasquier-s c39a913f8a test: enable race detection (#1262)
This change enables race detection when running the tests. It also fixes
a couple of existing race conditions.
2018-02-27 18:18:53 +01:00
Brian Brazil 5cb71e1def Fix spelling and comment style. (#1257) 2018-02-27 10:07:33 +01:00
pasquier-s 9b10acae68 Don't notify resolved alerts if none were firing (#1198)
* Don't notify resolved alerts if none were firing

* Fix comments
2018-01-18 11:12:17 +01:00
pasquier-s 907ac510f8 Fix flaky TestBatching acceptance test (#1193)
This change decreases the repeat_interval parameter from 5s to 4.9s to
make sure that the alerts are effectively sent after 5 seconds.

The workflow is:
- The dispatcher flushes the alerts at t0, sends the notification and
marks the notification log at t0+epsilon.
- The dispatcher flushes the alerts at t1, t2, t3 and t4 and doesn't
send the notifications as expected.
- At t5, the dispatcher flushes the alerts because current_time - (t0+epsilon)
is less than repeat_interval.

If repeat_interval is exactly 5s, there is a little chance that it is
greater than current_time - (t0+epsilon).
2018-01-11 22:45:59 +01:00
Julius Volz b145c51b99 Clarify variable names in Dispatcher.processAlert()
A single entry in aggrGroups is just a single group, not plural.
2017-11-01 15:06:23 +01:00
Julius Volz 947970af44 Convert Alertmanager to use non-global go-kit loggers
Fixes https://github.com/prometheus/alertmanager/issues/1040
2017-10-22 00:20:40 -07:00
Frederic Branczyk 5328885fe9 dispatch: fix race condition in dispatch test (#1025) 2017-10-04 18:01:23 +02:00
Łukasz Mierzwa 8e61ebf6c3 Expose alert fingerprint in the API (#786)
* Expose alert fingerprint in the API

Alert fingerprint is already provided as the value of status.inhibitedBy[] attribute that inhibited alerts have, but there's no way to get back to the alert that's inhibiting it as the fingerprint is not exposed.

* Expose alert fingerprint as ID in the list endpoint

* Rename ID to Fingerprint

* Use Fingerprint().String() in the API
2017-08-18 19:30:18 +02:00
stuart nelson a7009a9db7 Stn/add receiver support (#872)
Add ability to filter alerts by receiver in UI. This adds changes both in the Elm UI, as well as the Go backend.
2017-06-26 18:20:26 +02:00
Corentin Chary 9b2afbf18b Make sure Matchers are always ordered
This fixes https://github.com/prometheus/alertmanager/issues/881
Also add some unit tests
2017-06-23 15:30:34 +02:00
Fabian Reinartz 8170206070 Fix alert status handling in UI 2017-05-08 12:56:03 +02:00
Łukasz Mierzwa 8bc5855c87 Serialize AlertStatus as 'status'
AlertStatus doesn't have json tag with the field name, so it's serialized into 'Status', and it's the only uppercase field in the alert object. Tag it with 'status' name for consistency
2017-04-28 11:34:01 -07:00
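The behaviour described in the commit above follows from how Go's `encoding/json` names fields: an exported field without a `json` tag is emitted under its Go name. A small sketch, using hypothetical struct types rather than Alertmanager's own:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// untaggedAlert mimics an alert object whose Status field has no json tag.
type untaggedAlert struct {
	Fingerprint string `json:"fingerprint"`
	Status      string
}

// taggedAlert adds the tag so the field is serialized in lowercase.
type taggedAlert struct {
	Fingerprint string `json:"fingerprint"`
	Status      string `json:"status"`
}

// mustJSON marshals v and panics on error; adequate for a short example.
func mustJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(untaggedAlert{Fingerprint: "abc123", Status: "active"}))
	// {"fingerprint":"abc123","Status":"active"}
	fmt.Println(mustJSON(taggedAlert{Fingerprint: "abc123", Status: "active"}))
	// {"fingerprint":"abc123","status":"active"}
}
```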
stuart nelson 6a909abf17 Add processing status field to alert 2017-04-27 14:18:52 +02:00
Fabian Reinartz 3269bc39e1 *: switch group key to matcher serialization
Turn the GroupKey into a string that is composed of the matchers of the
path in the routing tree and the grouping labels.
Only hash it at the very end to ensure we don't exceed size limits of
integration APIs.
2017-04-21 12:06:23 +02:00
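The idea in the commit above, building a readable key from the route path and grouping labels and hashing it only at the boundary, can be sketched as below. The key layout, the `groupKey` and `hashKey` helper names, and the choice of SHA-256 are all assumptions made for illustration; the commit message does not specify them.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// groupKey joins the serialized matchers along the routing path with the
// grouping labels, in sorted order, into one readable string.
func groupKey(routeMatchers []string, groupLabels map[string]string) string {
	names := make([]string, 0, len(groupLabels))
	for name := range groupLabels {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort for stability
	pairs := make([]string, 0, len(names))
	for _, name := range names {
		pairs = append(pairs, fmt.Sprintf("%s=%q", name, groupLabels[name]))
	}
	return strings.Join(routeMatchers, "/") + ":{" + strings.Join(pairs, ",") + "}"
}

// hashKey shortens a key to a fixed-length hex string, for use where an
// integration limits the size of the value it accepts.
func hashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

func main() {
	key := groupKey(
		[]string{`{}`, `{team="db"}`},
		map[string]string{"cluster": "eu", "alertname": "HighLatency"},
	)
	fmt.Println(key)               // {}/{team="db"}:{alertname="HighLatency",cluster="eu"}
	fmt.Println(len(hashKey(key))) // 64
}
```

Keeping the unhashed string around as long as possible means logs and internal state stay readable, while the hashed form has a bounded length regardless of how deep the route is or how many labels are grouped on.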