285 Commits

Author SHA1 Message Date
Marco Pracucci
04d683e880
Upgrade prometheus/common
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-04-21 12:01:34 +02:00
Marco Pracucci
37f4742922
Add HTTP client options to receiver integrations
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-04-21 11:56:20 +02:00
Julien Duchesne
59c7fd5053 Add support to set the Slack URL in a file
- Added support for the file in both the global and the lower level
- Tried to follow configuration patterns I saw in prometheus
- The slack file is read on every request as mentioned in the prometheus issue to enable seamless switches

https://github.com/prometheus/alertmanager/issues/2498
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
2021-04-01 21:59:49 -04:00
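The "read on every request" behaviour described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name `resolveSlackURL`, its parameters, and the rule that an inline URL takes precedence over the file are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveSlackURL is a hypothetical helper. It returns the inline URL when one
// is configured; otherwise it reads the URL from urlFile on every call, so a
// change to the file is picked up by the next notification without a reload.
func resolveSlackURL(inlineURL, urlFile string) (string, error) {
	if inlineURL != "" {
		return inlineURL, nil
	}
	if urlFile == "" {
		return "", fmt.Errorf("neither a URL nor a URL file is configured")
	}
	content, err := os.ReadFile(urlFile)
	if err != nil {
		return "", fmt.Errorf("reading Slack URL file: %w", err)
	}
	// Trim the trailing newline that editors and secret mounts usually add.
	return strings.TrimSpace(string(content)), nil
}

func main() {
	f, err := os.CreateTemp("", "slack-url-*")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.Remove(f.Name())
	f.Close()

	// Rewrite the file between "requests" to simulate rotating the webhook URL.
	for _, u := range []string{
		"https://hooks.example.invalid/first",
		"https://hooks.example.invalid/second",
	} {
		if err := os.WriteFile(f.Name(), []byte(u+"\n"), 0o600); err != nil {
			fmt.Println("error:", err)
			return
		}
		got, err := resolveSlackURL("", f.Name())
		fmt.Println(got, err)
	}
}
```

Reading the file per request costs one small file read per notification, which is the trade-off the commit message accepts in exchange for seamless switches.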
Ganesh Vernekar
10757eb5fb
Export newMetrics function and metrics struct (#2523)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-03-24 12:37:58 +05:30
Ben Kochie
53535551f5
Fix up golangci-lint errors.
Signed-off-by: Ben Kochie <superq@gmail.com>
2021-03-16 10:43:45 +01:00
Steve Simpson
1711e72d1b Clustering: Change WaitReady to accept a Context.
WaitReady is a blocking call and so should accept a Context in order to
be responsive to cancellation of the notification pipeline for any reason.

Signed-off-by: Steve Simpson <steve.simpson@grafana.com>
2021-03-10 09:18:39 +01:00
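A context-aware blocking wait of the kind this commit describes might look like the sketch below. The `peer` type, its `readyc` channel, and the `waitFor` helper are hypothetical stand-ins for illustration; only the idea that `WaitReady` takes a `context.Context` comes from the commit message.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// peer is a hypothetical stand-in for a cluster member that becomes ready
// at some later point.
type peer struct {
	readyc chan struct{}
}

func newPeer() *peer { return &peer{readyc: make(chan struct{})} }

// markReady signals readiness to every current and future waiter.
func (p *peer) markReady() { close(p.readyc) }

// WaitReady blocks until the peer is ready or the context is cancelled,
// whichever happens first.
func (p *peer) WaitReady(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-p.readyc:
		return nil
	}
}

// waitFor is a small helper for the example: it waits with a timeout.
func waitFor(p *peer, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return p.WaitReady(ctx)
}

func main() {
	p := newPeer()
	fmt.Println(waitFor(p, 10*time.Millisecond)) // not ready yet: the context expires
	p.markReady()
	fmt.Println(waitFor(p, 10*time.Millisecond)) // ready: returns immediately
}
```

With this shape, a caller that cancels the notification pipeline unblocks every pending wait instead of leaving goroutines stuck until the cluster settles.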
Goutham Veeramachaneni
7866b9bb09
Merge pull request #2487 from gotjosh/alertmanager-clustering-interfaces
Clustering: Interface for Peers in other packages
2021-03-03 16:44:52 +01:00
Ben Ridley
df54b4bacf Improve documentation wording and formatting in response to maintainer feedback
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:02 +11:00
Ben Ridley
5d4231b001 Use consistent naming for mute time intervals
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:02 +11:00
Ben Ridley
fa2fab64de Simplify logging on time mute
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:02 +11:00
Ben Ridley
fb60329aad Add fullstops to comments
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:02 +11:00
Ben Ridley
70b138a17a Apply formatting suggestions from code review
Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:02 +11:00
Ben Ridley
93e0117b46 Change logging to debug when notifications aren't sent due to route mute
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:01 +11:00
Ben Ridley
a3cb125e5c Move timeinterval library into locally maintained package
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:01 +11:00
Ben Ridley
f53e7a984c Add tests for TimeMuteStage
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:01 +11:00
ben
d1f5e07909 Add mute time stage and pipeline
Signed-off-by: Ben Ridley <benridley29@gmail.com>
2021-03-01 08:30:01 +11:00
gotjosh
9a2ae39430
Clustering: Interface for Peers in other packages
A Peer as defined by the `cluster` package represents the node in the
cluster. It is used in other packages to know the status of all of the
members or how long should we wait to know if a notification has already fired.

In Cortex, we'd like to implement a slightly different way of
clustering (using gRPC for communication and a
hash ring for node discovery).

This is a small change to support that by changing the consumer of other
packages to an interface.

Silences and Notification channels don't need an interface as they take
a `func([]byte) error` as a parameter.

Signed-off-by: gotjosh <josue@grafana.com>
2021-02-19 19:07:41 +00:00
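The idea of having consumers depend on an interface rather than on the concrete `cluster` peer can be sketched as follows. The interface shape, the `Position` method, and the `notifyDelay` function are assumptions chosen for the illustration; the commit message does not list the actual method set.

```go
package main

import (
	"fmt"
	"time"
)

// Peer is a hypothetical, minimal view of a cluster member as seen by
// consuming packages. Any clustering backend (gossip, a gRPC hash ring, ...)
// can satisfy it.
type Peer interface {
	// Position is the index of this node in the ordered list of members.
	Position() int
	// Status reports a human-readable readiness state.
	Status() string
}

// staticPeer is a trivial implementation used for the example and for tests.
type staticPeer struct{ position int }

func (p staticPeer) Position() int  { return p.position }
func (p staticPeer) Status() string { return "ready" }

// notifyDelay returns how long a node waits before sending, so that nodes
// earlier in the ordering get the chance to notify first.
func notifyDelay(p Peer, peerTimeout time.Duration) time.Duration {
	return time.Duration(p.Position()) * peerTimeout
}

func main() {
	for i := 0; i < 3; i++ {
		p := staticPeer{position: i}
		fmt.Println(p.Status(), notifyDelay(p, 15*time.Second))
	}
}
```

Components that only need to broadcast state, as the commit notes for silences and notification channels, can keep taking a plain `func([]byte) error` and need no interface at all.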
Jack Baldry
bf94d58d56
fix(notify/victorops): Catch routing_key templating errors (#2467)
* test(notify/victorops): Add test for templating errors

Signed-off-by: Jack Baldry <jack.baldry@grafana.com>

* fix(notify/victorops): Catch routing_key templating errors

Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
2021-01-29 14:40:33 +01:00
Max Neverov
c39b787800
Add metrics for notification requests (#2361) (#2383)
Signed-off-by: Max Neverov <neverov.max@gmail.com>
2020-11-06 15:24:18 +01:00
Benoît Knecht
59a96579cc
notify/pagerduty: Filter out empty images and links (#2379)
PagerDuty Event API v2 [1] requires images to have an `src` property, and links
to have an `href` property.

This commit filters out images and links that don't satisfy those conditions,
to avoid getting an HTTP 400 error in response.

This also adds flexibility when using templates to configure images and links,
as it's now possible to omit images or links by letting the template return an
empty string for the `src` or `href` property, respectively.

[1]: https://developer.pagerduty.com/docs/events-api-v2/trigger-events/#context-properties

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-09-25 17:31:22 +02:00
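The filtering described in this commit can be illustrated with a short sketch. The `image` and `link` types and the function names below are hypothetical; only the rule (drop images without `src` and links without `href`) comes from the commit message.

```go
package main

import "fmt"

// image and link are hypothetical, simplified payload types.
type image struct {
	Src  string
	Alt  string
	Href string
}

type link struct {
	Href string
	Text string
}

// filterImages keeps only the images that have a non-empty Src.
func filterImages(in []image) []image {
	out := make([]image, 0, len(in))
	for _, img := range in {
		if img.Src == "" {
			continue // a template rendered an empty src: omit the image
		}
		out = append(out, img)
	}
	return out
}

// filterLinks keeps only the links that have a non-empty Href.
func filterLinks(in []link) []link {
	out := make([]link, 0, len(in))
	for _, l := range in {
		if l.Href == "" {
			continue // a template rendered an empty href: omit the link
		}
		out = append(out, l)
	}
	return out
}

func main() {
	images := []image{{Src: "https://example.invalid/graph.png"}, {Alt: "no source"}}
	links := []link{{Text: "no target"}, {Href: "https://example.invalid/runbook"}}
	fmt.Println(len(filterImages(images)), len(filterLinks(links)))
}
```

Because empty entries are dropped rather than rejected, a template can conditionally emit an empty string to suppress an image or link for a given alert.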
Julien Pivotto
470634d49f
Update common (#2353)
- Disable HTTP2: https://github.com/prometheus/common/pull/249
- Composite duration: https://github.com/prometheus/common/pull/246

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-08-25 15:48:59 +02:00
Jason Cooper
277c9ed462
notify: add markdown support for wechat (#2309)
* notify: add markdown support for wechat

Signed-off-by: Jason Cooper <master@deamwork.com>

* docs: update WeChat receiver configuration document

Signed-off-by: Jason Cooper <master@deamwork.com>

* fix: check WeChat msgType, apply default if not present

Signed-off-by: Jason Cooper <master@deamwork.com>

* chore: remove unnecessary comment

Signed-off-by: Jason Cooper <master@deamwork.com>

* fix: simplify msgType process

Signed-off-by: Jason Cooper <master@deamwork.com>

* docs: wechat configs document update

Signed-off-by: Jason Cooper <master@deamwork.com>

* fix: apply error message suggestions

Signed-off-by: Jason Cooper <master@deamwork.com>

* test: add test for regex

Signed-off-by: Jason Cooper <master@deamwork.com>

* fix: wechat message safe param

Signed-off-by: Jason Cooper <master@deamwork.com>
2020-07-06 15:56:42 +02:00
Simon Pasquier
a3d98c476a Merge remote-tracking branch 'origin/release-0.21' into merge-release-0.21
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-06-17 16:46:08 +02:00
Simon Pasquier
56f09a62b2
notify: always retry with a back-off (#2290)
By default the library implementing the back-off timer stops the timer
after 15 minutes. Since the code never checked the value returned by the
ticker, notification retries were executed without delay after the 15
minutes had elapsed (e.g. for `group_interval` greater than 15m).

This change ensures that the back-off timer never expires.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-06-16 09:50:35 +02:00
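The failure mode and the fix can be illustrated with a self-contained sketch. It does not use the actual back-off library; the `backoff` type, its default values, and the simulated elapsed time are assumptions made so the example is deterministic. The point is only that a zero `maxElapsed` means "never stop", while a non-zero limit eventually yields a stop signal that callers must check.

```go
package main

import (
	"fmt"
	"time"
)

// stop is the sentinel returned when the back-off has given up.
const stop time.Duration = -1

const fifteenMinutes = 15 * time.Minute

// backoff is a hypothetical exponential back-off. Elapsed time is simulated
// by summing the delays handed out, so the example runs instantly.
type backoff struct {
	current    time.Duration
	max        time.Duration
	multiplier float64
	maxElapsed time.Duration // 0 means the back-off never expires
	elapsed    time.Duration
}

func newBackoff(maxElapsed time.Duration) *backoff {
	return &backoff{
		current:    500 * time.Millisecond,
		max:        time.Minute,
		multiplier: 1.5,
		maxElapsed: maxElapsed,
	}
}

// next returns the delay before the next retry, or stop once the elapsed
// budget is exhausted.
func (b *backoff) next() time.Duration {
	if b.maxElapsed != 0 && b.elapsed > b.maxElapsed {
		return stop
	}
	d := b.current
	b.elapsed += d
	b.current = time.Duration(float64(b.current) * b.multiplier)
	if b.current > b.max {
		b.current = b.max
	}
	return d
}

// retriesUntilStop counts how many delays are produced before stop is
// returned, giving up after limit attempts.
func retriesUntilStop(maxElapsed time.Duration, limit int) int {
	b := newBackoff(maxElapsed)
	for i := 0; i < limit; i++ {
		if b.next() == stop {
			return i
		}
	}
	return limit
}

func main() {
	fmt.Println("with a 15m budget:", retriesUntilStop(fifteenMinutes, 1000))
	fmt.Println("with no budget:   ", retriesUntilStop(0, 1000))
}
```

A caller that ignores the `stop` value and treats it as a delay would retry immediately in a tight loop, which matches the symptom described in the commit message.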
Julien Pivotto
1cba0c7a37
Remove HipChat (#2281)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-06-11 15:51:10 +02:00
Bartlomiej Plotka
6d77929c30
Merge pull request #2276 from ricoberger/pass-labels-to-opsgenie-details
Propagate labels to Opsgenie details
2020-06-09 14:39:03 +01:00
ricoberger
4b59db0adc Always pass all labels to Opsgenie
Signed-off-by: ricoberger <mail@ricoberger.de>
2020-06-09 13:51:46 +02:00
ricoberger
3cff6cb5b5 Add tests for Opsgenie details
Signed-off-by: ricoberger <mail@ricoberger.de>
2020-06-09 09:00:52 +02:00
ricoberger
9a87f5c113 Populate details from common labels and details
Signed-off-by: ricoberger <mail@ricoberger.de>
2020-06-09 07:40:04 +02:00
ricoberger
8248c50365 Provide option to use common labels for OpsGenie details
Signed-off-by: ricoberger <mail@ricoberger.de>
2020-06-05 08:07:58 +02:00
ricoberger
dcccf542f1 Adjust Opsgenie config for labels propagation
Signed-off-by: ricoberger <mail@ricoberger.de>
2020-06-04 15:47:18 +02:00
Simon Pasquier
f7c595c168
notify: improve logs on notification errors (#2273)
* notify: improve logs on notification errors

Alertmanager can experience occasional failures when sending
notifications to an external service. If the operation succeeds after
some retry, the 'alertmanager_notifications_failed_total' metric
increases but nothing is logged (unless running with log.level=debug).
Hence an operator might receive an alert about notification failures but
wouldn't know which integration was failing.

With this change, notification failures are logged at the warning level.
To avoid log flooding, similar failures on retries aren't logged.
Additional information on the failing integration has also been added.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Log notify success at info level if it's a retry

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-06-04 10:38:48 +02:00
Julius Volz
70b5e00ffc
Allow limiting maximum number of alerts in webhook (#2274)
* Allow limiting maximum number of alerts in webhook

The webhook notifier is the only notifier that does not allow templating
on the Alertmanager side. Users who encounter occasional alert storms
(10ks of alerts going off at once for the same group) have reported
webhook receiver systems not being able to cope with the load caused by
the resulting large webhook notifier messages (the alerting rules also
contained large annotations that can't be stripped away due to lack of
templating). Reducing group size also wasn't an option, but this change
proposes to allow truncating the list of alerts sent in the webhook body
to a provided maximum length. This assumes that e.g. if a group receives
20k alerts, you really are fine only receiving 10k because you wouldn't
be able to check them all anyway.

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Change max_alerts to uint32

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Add truncatedAlerts field to webhook message

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Fix JSON struct tag

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2020-06-04 10:07:33 +02:00
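The truncation described above can be sketched in a few lines. The function name, the use of plain strings for alerts, and the convention that a limit of zero means "no limit" are assumptions for the illustration; the `uint32` limit and the separate count of truncated alerts follow the commit message.

```go
package main

import "fmt"

// truncateAlerts returns at most maxAlerts alerts together with the number of
// alerts that were cut off. A maxAlerts of 0 is treated as "no limit"
// (an assumption made for this sketch).
func truncateAlerts(maxAlerts uint32, alerts []string) ([]string, uint32) {
	if maxAlerts != 0 && uint64(len(alerts)) > uint64(maxAlerts) {
		return alerts[:maxAlerts], uint32(len(alerts)) - maxAlerts
	}
	return alerts, 0
}

func main() {
	alerts := []string{"a", "b", "c", "d", "e"}

	kept, truncated := truncateAlerts(2, alerts)
	fmt.Println(kept, truncated)

	kept, truncated = truncateAlerts(0, alerts)
	fmt.Println(kept, truncated)
}
```

Reporting the truncated count alongside the shortened list lets the receiving system know that the group was larger than what it was sent.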
ricoberger
117c8ba8f1 Propagate labels to Opsgenie details
Signed-off-by: ricoberger <mail@ricoberger.de>
2020-06-04 09:30:02 +02:00
Julien Pivotto
013177e2d0
Update dependencies (#2257)
Update membership

Update common (support HTTP/2 client)

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-05-18 15:00:36 +02:00
shamilpd
d8ad30179a
Enforce 512KB event size limit for Pagerduty events (#2225)
* Enforce 512kb event size limit for Pagerduty

Signed-off-by: Shamil Ishraq <shamil@pagerduty.com>

* Add size limit to error message

Signed-off-by: Shamil Ishraq <shamil@pagerduty.com>

* Replace MaxEventSize setting with a const.

Signed-off-by: Shamil Ishraq <shamil@pagerduty.com>

* Change to package variable

Signed-off-by: Shamil Ishraq <shamil@pagerduty.com>

* Removed recursion in encodeMessage()

Signed-off-by: Shamil Ishraq <shamil@pagerduty.com>

* Unexport maxEventSize

Signed-off-by: Shamil Ishraq <shamil@pagerduty.com>
2020-05-15 15:15:18 +02:00
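One way to enforce an event size limit without recursion is to encode once, and if the result is too large, replace the bulky part and encode a second time. The sketch below assumes a simplified `event` type, a limit of 512000 bytes, and a replacement message of its own; none of these details are taken from the commit itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxEventSize is an unexported package variable so tests can lower it.
// The exact value is an assumption for this sketch.
var maxEventSize = 512000

// event is a hypothetical, simplified event payload.
type event struct {
	Summary string            `json:"summary"`
	Details map[string]string `json:"details"`
}

// encodeEvent marshals the event. If the result exceeds maxEventSize, the
// details are replaced with a short notice and the event is encoded once
// more. It returns an error if the event is still too large.
func encodeEvent(e event) ([]byte, error) {
	buf, err := json.Marshal(e)
	if err != nil {
		return nil, err
	}
	if len(buf) <= maxEventSize {
		return buf, nil
	}

	e.Details = map[string]string{
		"error": fmt.Sprintf("details dropped: event exceeded %d bytes", maxEventSize),
	}
	buf, err = json.Marshal(e)
	if err != nil {
		return nil, err
	}
	if len(buf) > maxEventSize {
		return nil, fmt.Errorf("event still exceeds %d bytes after dropping details", maxEventSize)
	}
	return buf, nil
}

func main() {
	out, err := encodeEvent(event{
		Summary: "disk full",
		Details: map[string]string{"instance": "db-1"},
	})
	fmt.Println(string(out), err)
}
```

Encoding at most twice keeps the control flow flat and guarantees termination, which is the property the "Removed recursion" step in the commit list appears to be after.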
Simon Pasquier
44af3201fe
notify: add retry field to debug log (#2188)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-03-13 15:39:16 +01:00
melchiormoulin
e37f769035
Add slack channel when logging error. (#2177)
Signed-off-by: Melchior MOULIN <m.moulin@criteo.com>
2020-02-05 09:17:15 +01:00
Simon Pasquier
b49ebfc683
Merge release 0.20 (#2140)
* Revert "slack: retry 429 errors (#2112)" (#2128)

This reverts commit 26cc96a787.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Revert "config: remove support for JSON marshaling (#2086)" (#2133)

This reverts commit 918f08b66a.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* config: fix JSON unmarshaling for HostPort (#2134)

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Cut 0.20.0 (#2137)

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-12-12 16:35:19 +01:00
Julien Pivotto
26cc96a787 slack: retry 429 errors (#2112)
Fix #2111

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2019-11-21 14:14:10 +01:00
johncming
d965ac6393 notify: optimize length check. (#2106)
Signed-off-by: johncming <johncming@yahoo.com>
2019-11-19 09:00:06 +01:00
Simon Pasquier
71b3b3d7a4
notify/pagerduty: check that PagerDuty keys aren't empty (#2085)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-10-29 10:46:40 +01:00
n33pm
a75cd02786 Add email notify Message-Id Header (#2057)
* add email message-id

Signed-off-by: PM <wugyresearcher@gmail.com>

* check if message-id already exists

Signed-off-by: PM <wugyresearcher@gmail.com>

* simplify mail message-id procedure

Signed-off-by: PM <wugyresearcher@gmail.com>

* Add unit test

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-10-23 15:49:30 +02:00
johncming
7d21f5a5a9 notify/wechat: adjust result check sequence. (#2044)
Signed-off-by: johncming <johncming@yahoo.com>
2019-09-23 09:31:57 +02:00
Simon Pasquier
5fe5ea77a3
*: check Smarthost validity at config loading (#1957)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-28 15:04:40 +02:00
Simon Pasquier
9f7f4ead46
notify: don't use the global metrics registry (#1977)
* notify: don't use the global metrics registry

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address Max's comment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-26 16:37:13 +02:00
Simon Pasquier
94d875f122
Bump prometheus/client_golang to v1.1.0 (#1989)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-08 14:36:10 +02:00
Simon Pasquier
655947d7e0
notify: refactor code to retry requests (#1974)
* notify: refactor code to retry requests

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* s/Process/Check/

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-02 16:17:40 +02:00
Asher Foa
f45f870d2c Add the ability to configure slack markdown field (#1967)
* slack markdown field config

Signed-off-by: Asher Foa <asher@asherfoa.com>

* Add Test

Signed-off-by: Asher Foa <asher@asherfoa.com>

* remove empty lines

Signed-off-by: Asher Foa <asher@asherfoa.com>

* add empty line

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-31 12:04:59 +02:00
Simon Pasquier
bdd91d2639
notify/opsgenie: log error from OpsGenie API (#1965)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-23 09:49:15 +02:00