Commit Graph

43 Commits

Author SHA1 Message Date
Simon Pasquier
4535311c34 dispatch: don't garbage-collect alerts from store
The aggregation group is already responsible for removing the resolved
alerts. Running the garbage collection in parallel introduces a race and
eventually resolved notifications may be dropped.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-09-18 11:42:14 +02:00
Simon Pasquier
ab537b5b2f
dispatch: fix missing receivers in Groups() (#1964)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-24 17:12:37 +02:00
Simon Pasquier
612222b693 dispatch: use strings.Builder instead of []byte
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-15 15:27:37 +02:00
bigMacro
5ff6cffa08 fix memory visibility error (#1936)
Signed-off-by: denghuan <denghuan@actionsky.com>
2019-06-25 10:11:45 +02:00
Simon Pasquier
2ccb4707f1 dispatch: fix flaky test
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-23 11:16:48 +02:00
Simon Pasquier
c78b449f4a provider/mem: fix dropped alerts
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-19 15:35:21 +02:00
stuart nelson
2fa210d0e3 add groups endpoint to v2 api
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2019-04-17 11:32:21 +02:00
Simon Pasquier
a5e26cc721 *: log at debug level when context is canceled
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-03 16:41:03 +02:00
JoeWrightss
b926c6935e Fix some typos in comment (#1750)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-08 14:57:08 +01:00
Brian Brazil
7078333202 Make a copy of firing alerts with EndsAt=0 when flushing. (#1686)
If the original EndsAt is left in place, then as time moves forwards
past the EndsAt then firing alerts will be rendered and treated as
resolved alerts which can cause confusion and races. This is most
likely to happen on retries for a notification.

Mitigate race and fix data races in TestAggrGroup.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-04 16:52:20 +01:00
kirillsablin
32bb289906 dispatch: Add group_by_all support (#1588)
To aggregate by all possible labels use '...' as the sole label name. 
This effectively disables aggregation entirely, passing through all 
alerts as-is. This is unlikely to be what you want, unless you have 
a very low alert volume or your upstream notification system performs 
its own grouping. Example: group_by: [...]

Signed-off-by: Kyryl Sablin <kyryl.sablin@schibsted.com>
2018-11-29 12:31:14 +01:00
Simon Pasquier
306fd73e32 *: remove use of golang.org/x/net/context
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-11-09 10:00:23 +01:00
stuart nelson
e883ccb9de
pull out shared code for storing alerts (#1507)
Move the code for storing and GC'ing alerts from being re-implemented in
several packages to existing in its own package

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-09-03 14:52:53 +02:00
Simon Pasquier
899226f3ac *: remove v1/alerts/groups API endpoint (#1525)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-08-23 16:03:49 +02:00
bigMacro
f3bc41d256 fix concurrent read and wirte group error (#1447)
* fix concurrent read and wirte group

Signed-off-by: denghuan <denghuan@actionsky.com>

* make lock more elegant

Signed-off-by: denghuan <denghuan@actionsky.com>
2018-07-10 17:13:41 +02:00
Simon Pasquier
6a7c912559 Sort alerts in correct order (#1349)
* Sort dispatched alerts by job+instance in the correct order (#1178)

Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>

* dispatch: add unit test for alerts sorting

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-06-14 15:54:33 +02:00
Simon Pasquier
0ebaeccd4b *: add missing license headers
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-14 17:37:13 +02:00
Manos Fokas
300a87e85b Removed file changes to resolve conflict. (#1318)
Signed-off-by: manosf <manosf@protonmail.com>
2018-04-17 16:22:46 +02:00
Simon Pasquier
4cba49155d dispatch: don't reset timer if flush is in-progress (#1301)
When the aggregation group receives an alert that is past the initial
group_wait value, it should reset its timer only if the timer has ever
expired. Otherwise it means that the flush is already in-progress.
2018-03-29 12:22:49 +02:00
Ted Zlatanov
099b6a1d43 Sort dispatched alerts by job+instance then rest by default (#1178) (#1234) 2018-03-22 20:06:37 +01:00
Brian Brazil
aa950668bf The default group_by is meant to be no labels. (#1287)
This is what the intended default is, and what
the documentation says.
2018-03-16 18:39:23 +01:00
pasquier-s
c39a913f8a test: enable race detection (#1262)
This change enables race detection when running the tests. It also fixes
a couple of existing race conditions.
2018-02-27 18:18:53 +01:00
Brian Brazil
5cb71e1def Fix spelling and comment style. (#1257) 2018-02-27 10:07:33 +01:00
pasquier-s
9b10acae68 Don't notify resolved alerts if none were firing (#1198)
* Don't notify resolved alerts if none were firing

* Fix comments
2018-01-18 11:12:17 +01:00
pasquier-s
907ac510f8 Fix flaky TestBatching acceptance test (#1193)
This change decreases the repeat_interval parameter from 5s to 4.9s to
make sure that the alerts are effectively sent after 5 seconds.

The workflow is:
- The dispatcher flushes the alerts at t0, sends the notification and
marks the notification log at t0+epsilon.
- The dispatcher flushes the alerts at t1, t2, t3 and t4 and doesn't
send the notifications as expected.
- At t5, the dispatcher flushes the alerts because current_time - (t0+epsilon)
is less then repeat_interval.

If repeat_interval is exactly 5s, there is a little chance that it is
greater than current_time - (t0+epsilon).
2018-01-11 22:45:59 +01:00
Julius Volz
b145c51b99 Clarify variable names in Dispatcher.processAlert()
A single entry in aggrGroups is just a single group, not plural.
2017-11-01 15:06:23 +01:00
Julius Volz
947970af44 Convert Alertmanager to use non-global go-kit loggers
Fixes https://github.com/prometheus/alertmanager/issues/1040
2017-10-22 00:20:40 -07:00
Frederic Branczyk
5328885fe9 dispatch: fix race condition in dispatch test (#1025) 2017-10-04 18:01:23 +02:00
Łukasz Mierzwa
8e61ebf6c3 Expose alert fingerprint in the API (#786)
* Expose alert fingerprint in the API

Alert fingerprint is already provided as the value of status.inhibitedBy[] attribute that inhibited alerts have, but there's no way to get back to the alert that's inhibiting it as the fingerprint is not exposed.

* Expose alert fingerprint as ID in the list endpoint

* Rename ID to Fingerprint

* Use Fingerprint().String() in the API
2017-08-18 19:30:18 +02:00
stuart nelson
a7009a9db7 Stn/add receiver support (#872)
Add ability to filter alerts by receiver in UI. This adds changes both in the Elm UI, as well as the Go backend.
2017-06-26 18:20:26 +02:00
Corentin Chary
9b2afbf18b Make sure Matchers are always ordered
This fixes https://github.com/prometheus/alertmanager/issues/881
Also add some unit tests
2017-06-23 15:30:34 +02:00
Fabian Reinartz
8170206070 Fix alert status handling in UI 2017-05-08 12:56:03 +02:00
Łukasz Mierzwa
8bc5855c87 Serialize AlertStatus as 'status'
AlertStatus doesn't have json tag with the field name, so it's serialized into 'Status', and it's the only uppercase field in the alert object. Tag it with 'status' name for consistency
2017-04-28 11:34:01 -07:00
stuart nelson
6a909abf17 Add processing status field to alert 2017-04-27 14:18:52 +02:00
Fabian Reinartz
3269bc39e1 *: switch group key to matcher serialization
Turn the GroupKey into a string that is composed of the matchers if the
path in the routing tree and the grouping labels.
Only hash it at the very end to ensure we don't exceed size limits of
integration APIs.
2017-04-21 12:06:23 +02:00
stuart nelson
1e34f29532 Filter alerts (#633)
* Vendor dependencies.

This updates several old dependencies, removes
some that are no longer needed, and adds
`pkg/labels` from prometheus `dev-2.0` branch.

* Add metrics selector parsing code

This is a temporary simplified re-implementation
of promQL's metric selector parsing.

* Add alerts filtering

Filter alerts through `?filter=` query string.

* Add silences filtering

Filter silences through `?filter=` query string.

* Move `parse` to `pkg/parse`
2017-03-16 11:16:10 +01:00
Felix
9fcdb47faa add GroupKey() to aggrGroup, add it to alerts/groups and use the function for context initialization 2016-12-07 11:05:11 +08:00
Fabian Reinartz
e9fbe62e0f *: consider mesh wait in notification timeouts
This adds the peer wait duration to the standard timeout to avoid
terminating a notification prematurely while being in failover
wait status.
2016-09-05 13:21:28 +02:00
Fabian Reinartz
a4e8703567 *: integrate new silence package 2016-08-30 12:15:23 +02:00
Fabian Reinartz
d2a556b269 notify: include context in Stage interface
This adds context.Context to the return arguments of a Stage.
This is necessary to propagate modified contexts.
2016-08-18 11:42:37 +02:00
Fabian Reinartz
998a9ce38e notify: rename Receiver to ReceiverName
This string value is initially used to store a receiver name. It is
later overloaded with a unique string identifier of <name, integration,
index>.
This renaming is in preparation to separate the two and use the Receiver
object of the nflogpb package.
2016-08-16 16:33:17 +02:00
Frederic Branczyk
7bc851e894 rework building of stage pipelines 2016-08-16 10:56:46 +02:00
Fabian Reinartz
3931d4e64b *: restructure package tree
This commit packages up individual modules and removes the top-level
main package.
2016-08-09 14:24:52 +02:00