268 Commits

Author SHA1 Message Date
Jack Baldry
bf94d58d56
fix(notify/victorops): Catch routing_key templating errors (#2467)
* test(notify/victorops): Add test for templating errors

Signed-off-by: Jack Baldry <jack.baldry@grafana.com>

* fix(notify/victorops): Catch routing_key templating errors

Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
2021-01-29 14:40:33 +01:00
Max Neverov
c39b787800
Add metrics for notification requests (#2361) (#2383)
Signed-off-by: Max Neverov <neverov.max@gmail.com>
2020-11-06 15:24:18 +01:00
Benoît Knecht
59a96579cc
notify/pagerduty: Filter out empty images and links (#2379)
PagerDuty Event API v2 [1] requires images to have an `src` property, and links
to have an `href` property.

This commit filters out images and links that don't satisfy those conditions,
to avoid getting an HTTP 400 error in response.

This also adds flexibility when using templates to configure images and links,
as it's now possible to omit images or links by letting the template return an
empty string for the `src` or `href` property, respectively.

[1]: https://developer.pagerduty.com/docs/events-api-v2/trigger-events/#context-properties

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2020-09-25 17:31:22 +02:00
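The filtering described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the notifier's actual code: the `image` and `link` types and the `filterImages` and `filterLinks` helpers are hypothetical names chosen for the example. It only shows the idea that entries with an empty `src` or `href` are dropped before the payload is built.

```go
package main

// image and link are hypothetical stand-ins for the PagerDuty payload
// types; the field names are assumptions made for this sketch.
type image struct {
	Src  string
	Alt  string
	Href string
}

type link struct {
	Href string
	Text string
}

// filterImages keeps only the images whose Src is non-empty, since the
// commit message says the API requires a src property on every image.
func filterImages(in []image) []image {
	out := make([]image, 0, len(in))
	for _, img := range in {
		if img.Src == "" {
			continue
		}
		out = append(out, img)
	}
	return out
}

// filterLinks keeps only the links whose Href is non-empty.
func filterLinks(in []link) []link {
	out := make([]link, 0, len(in))
	for _, l := range in {
		if l.Href == "" {
			continue
		}
		out = append(out, l)
	}
	return out
}
```

With this shape, a template that renders an empty string for `src` or `href` simply causes that entry to be omitted.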
Julien Pivotto
470634d49f
Update common (#2353)
- Disable HTTP2: https://github.com/prometheus/common/pull/249
- Composite duration: https://github.com/prometheus/common/pull/246

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-08-25 15:48:59 +02:00
Jason Cooper
277c9ed462
notify: add markdown support for wechat (#2309)
* notify: add markdown support for wechat

Signed-off-by: Jason Cooper <master@deamwork.com>

* docs: update WeChat receiver configuration document

Signed-off-by: Jason Cooper <master@deamwork.com>

* fix: check WeChat msgType, apply default if not present

Signed-off-by: Jason Cooper <master@deamwork.com>

* chore: remove unnecessary comment

Signed-off-by: Jason Cooper <master@deamwork.com>

* fix: simplify msgType process

Signed-off-by: Jason Cooper <master@deamwork.com>

* docs: wechat configs document update

Signed-off-by: Jason Cooper <master@deamwork.com>

* fix: apply error message suggestions

Signed-off-by: Jason Cooper <master@deamwork.com>

* test: add test for regex

Signed-off-by: Jason Cooper <master@deamwork.com>

* fix: wechat message safe param

Signed-off-by: Jason Cooper <master@deamwork.com>
2020-07-06 15:56:42 +02:00
Simon Pasquier
a3d98c476a Merge remote-tracking branch 'origin/release-0.21' into merge-release-0.21
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-06-17 16:46:08 +02:00
Simon Pasquier
56f09a62b2
notify: always retry with a back-off (#2290)
By default the library implementing the back-off timer stops the timer
after 15 minutes. Since the code never checked the value returned by the
ticker, notification retries were executed without delay after the 15
minutes had elapsed (e.g. for `group_interval` greater than 15m).

This change ensures that the back-off timer never expires.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-06-16 09:50:35 +02:00
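To illustrate the failure mode described in the commit above, here is a small standard-library sketch of a back-off with an optional expiry. It is not the library Alertmanager uses; the `backOff` type, its fields, and the `stop` sentinel are hypothetical, and elapsed time is simulated by summing the returned delays so that the behavior is deterministic. The assumption is that a limit of zero means the back-off never expires.

```go
package main

import "time"

// stop is a hypothetical sentinel meaning "the back-off has expired".
// A caller that ignores it and uses the value as a delay would retry
// immediately, because a negative duration does not wait.
const stop time.Duration = -1

// backOff doubles the delay on every call, up to max. If maxElapsed is
// greater than zero, next returns stop once the accumulated delay has
// reached it. A maxElapsed of zero means "never expire" (assumption).
type backOff struct {
	initial    time.Duration
	max        time.Duration
	maxElapsed time.Duration
	current    time.Duration
	elapsed    time.Duration // simulated: sum of the delays handed out
}

// next returns the delay to wait before the next retry, or stop.
func (b *backOff) next() time.Duration {
	if b.maxElapsed > 0 && b.elapsed >= b.maxElapsed {
		return stop
	}
	if b.current == 0 {
		b.current = b.initial
	} else {
		b.current *= 2
		if b.current > b.max {
			b.current = b.max
		}
	}
	b.elapsed += b.current
	return b.current
}
```

A retry loop built on this should either check for `stop` before waiting or construct the back-off with `maxElapsed` set to zero, which is the approach the commit message describes.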
Julien Pivotto
1cba0c7a37
Remove HipChat (#2281)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-06-11 15:51:10 +02:00
Bartlomiej Plotka
6d77929c30
Merge pull request #2276 from ricoberger/pass-labels-to-opsgenie-details
Propagate labels to Opsgenie details
2020-06-09 14:39:03 +01:00
ricoberger
4b59db0adc Always pass all labels to Opsgenie
Signed-off-by: ricoberger <mail@ricoberger.de>
2020-06-09 13:51:46 +02:00
ricoberger
3cff6cb5b5 Add tests for Opsgenie details
Signed-off-by: ricoberger <mail@ricoberger.de>
2020-06-09 09:00:52 +02:00
ricoberger
9a87f5c113 Populate details from common labels and details
Signed-off-by: ricoberger <mail@ricoberger.de>
2020-06-09 07:40:04 +02:00
ricoberger
8248c50365 Provide option to use common labels for OpsGenie details
Signed-off-by: ricoberger <mail@ricoberger.de>
2020-06-05 08:07:58 +02:00
ricoberger
dcccf542f1 Adjust Opsgenie config for labels propagation
Signed-off-by: ricoberger <mail@ricoberger.de>
2020-06-04 15:47:18 +02:00
Simon Pasquier
f7c595c168
notify: improve logs on notification errors (#2273)
* notify: improve logs on notification errors

Alertmanager can experience occasional failures when sending
notifications to an external service. If the operation succeeds after
some retry, the 'alertmanager_notifications_failed_total' metric
increases but nothing is logged (unless running with log.level=debug).
Hence an operator might receive an alert about notification failures but
wouldn't know which integration was failing.

With this change, notification failures are logged at the warning level.
To avoid log flooding, similar failures on retries aren't logged.
Additional information on the failing integration has also been added.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Log notify success at info level if it's a retry

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-06-04 10:38:48 +02:00
Julius Volz
70b5e00ffc
Allow limiting maximum number of alerts in webhook (#2274)
* Allow limiting maximum number of alerts in webhook

The webhook notifier is the only notifier that does not allow templating
on the Alertmanager side. Users who encounter occasional alert storms
(10ks of alerts going off at once for the same group) have reported
webhook receiver systems not being able to cope with the load caused by
the resulting large webhook notifier messages (the alerting rules also
contained large annotations that can't be stripped away due to lack of
templating). Reducing group size also wasn't an option, but this change
proposes to allow truncating the list of alerts sent in the webhook body
to a provided maximum length. This assumes that e.g. if a group receives
20k alerts, you really are fine only receiving 10k because you wouldn't
be able to check them all anyway.

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Change max_alerts to uint32

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Add truncatedAlerts field to webhook message

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Fix JSON struct tag

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2020-06-04 10:07:33 +02:00
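The truncation described in the commit above could look roughly like the sketch below. It is an illustration under stated assumptions, not the webhook notifier's code: alerts are modeled as plain strings, the `truncateAlerts` name and signature are hypothetical, and a limit of zero is assumed to mean "no limit". The second return value corresponds to the idea behind the `truncatedAlerts` field mentioned in the commit.

```go
package main

// truncateAlerts returns at most maxAlerts alerts together with the
// number of alerts that were dropped. A maxAlerts of zero is treated
// as "no limit" (an assumption made for this sketch).
func truncateAlerts(maxAlerts uint32, alerts []string) ([]string, uint64) {
	if maxAlerts != 0 && uint64(len(alerts)) > uint64(maxAlerts) {
		dropped := uint64(len(alerts)) - uint64(maxAlerts)
		return alerts[:maxAlerts], dropped
	}
	return alerts, 0
}
```

Reporting the dropped count alongside the shortened list lets a receiver tell a complete group apart from a truncated one.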
ricoberger
117c8ba8f1 Propagate labels to Opsgenie details
Signed-off-by: ricoberger <mail@ricoberger.de>
2020-06-04 09:30:02 +02:00
Julien Pivotto
013177e2d0
Update dependencies (#2257)
Update membership

Update common (support HTTP/2 client)

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-05-18 15:00:36 +02:00
shamilpd
d8ad30179a
Enforce 512KB event size limit for PagerDuty events (#2225)
* Enforce 512KB event size limit for PagerDuty

Signed-off-by: Shamil Ishraq <shamil@pagerduty.com>

* Add size limit to error message

Signed-off-by: Shamil Ishraq <shamil@pagerduty.com>

* Replace MaxEventSize setting with a const.

Signed-off-by: Shamil Ishraq <shamil@pagerduty.com>

* Change to package variable

Signed-off-by: Shamil Ishraq <shamil@pagerduty.com>

* Removed recursion in encodeMessage()

Signed-off-by: Shamil Ishraq <shamil@pagerduty.com>

* Unexport maxEventSize

Signed-off-by: Shamil Ishraq <shamil@pagerduty.com>
2020-05-15 15:15:18 +02:00
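A size check of the kind this commit describes might be sketched as follows. The names `maxEventSize` and `encodeMessage` appear in the commit message, but everything else here is an assumption: the `event` type, the `encodeEvent` helper, the exact byte value of the limit, and the choice to return an error when the limit is exceeded. How the real notifier reacts to an oversized event is not stated in the log.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// maxEventSize is the limit on the encoded event, in bytes. The commit
// only says "512KB"; the exact value used here is an assumption.
const maxEventSize = 512000

// event is a hypothetical, heavily simplified PagerDuty event.
type event struct {
	Summary string            `json:"summary"`
	Details map[string]string `json:"details,omitempty"`
}

// encodeEvent JSON-encodes the event and rejects it if the encoded
// form exceeds maxEventSize. The limit is included in the error text.
func encodeEvent(e event) (*bytes.Buffer, error) {
	var buf bytes.Buffer
	if err := json.NewEncoder(&buf).Encode(e); err != nil {
		return nil, fmt.Errorf("encode event: %w", err)
	}
	if buf.Len() > maxEventSize {
		return nil, fmt.Errorf("event of %d bytes exceeds the limit of %d bytes", buf.Len(), maxEventSize)
	}
	return &buf, nil
}
```

Checking the encoded length rather than the size of individual fields matters because JSON escaping can make the payload larger than the raw strings.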
Simon Pasquier
44af3201fe
notify: add retry field to debug log (#2188)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-03-13 15:39:16 +01:00
melchiormoulin
e37f769035
Add slack channel when logging error. (#2177)
Signed-off-by: Melchior MOULIN <m.moulin@criteo.com>
2020-02-05 09:17:15 +01:00
Simon Pasquier
b49ebfc683
Merge release 0.20 (#2140)
* Revert "slack: retry 429 errors (#2112)" (#2128)

This reverts commit 26cc96a787.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Revert "config: remove support for JSON marshaling (#2086)" (#2133)

This reverts commit 918f08b66a.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* config: fix JSON unmarshaling for HostPort (#2134)

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Cut 0.20.0 (#2137)

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-12-12 16:35:19 +01:00
Julien Pivotto
26cc96a787 slack: retry 429 errors (#2112)
Fix #2111

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2019-11-21 14:14:10 +01:00
johncming
d965ac6393 notify: optimize length check. (#2106)
Signed-off-by: johncming <johncming@yahoo.com>
2019-11-19 09:00:06 +01:00
Simon Pasquier
71b3b3d7a4
notify/pagerduty: check that PagerDuty keys aren't empty (#2085)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-10-29 10:46:40 +01:00
n33pm
a75cd02786 Add email notify Message-Id Header (#2057)
* add email message-id

Signed-off-by: PM <wugyresearcher@gmail.com>

* check if message-id already exists

Signed-off-by: PM <wugyresearcher@gmail.com>

* simplify mail message-id procedure

Signed-off-by: PM <wugyresearcher@gmail.com>

* Add unit test

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-10-23 15:49:30 +02:00
johncming
7d21f5a5a9 notify/wechat: adjust result check sequence. (#2044)
Signed-off-by: johncming <johncming@yahoo.com>
2019-09-23 09:31:57 +02:00
Simon Pasquier
5fe5ea77a3
*: check Smarthost validity at config loading (#1957)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-28 15:04:40 +02:00
Simon Pasquier
9f7f4ead46
notify: don't use the global metrics registry (#1977)
* notify: don't use the global metrics registry

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address Max's comment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-26 16:37:13 +02:00
Simon Pasquier
94d875f122
Bump prometheus/client_golang to v1.1.0 (#1989)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-08 14:36:10 +02:00
Simon Pasquier
655947d7e0
notify: refactor code to retry requests (#1974)
* notify: refactor code to retry requests

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* s/Process/Check/

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-02 16:17:40 +02:00
Asher Foa
f45f870d2c Add the ability to configure slack markdown field (#1967)
* slack markdown field config

Signed-off-by: Asher Foa <asher@asherfoa.com>

* Add Test

Signed-off-by: Asher Foa <asher@asherfoa.com>

* remove empty lines

Signed-off-by: Asher Foa <asher@asherfoa.com>

* add empty line

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-31 12:04:59 +02:00
Simon Pasquier
bdd91d2639
notify/opsgenie: log error from OpsGenie API (#1965)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-23 09:49:15 +02:00
Simon Pasquier
9b0ecaa0fe
notify/email: wrap all errors for easier debugging (#1953)
* notify/email: wrap all errors for easier debugging

In addition, this commit passes the current context to the TCP dialer
and it doesn't log any QUIT errors if the email delivery wasn't
successful.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Fix typo

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-10 11:24:51 +02:00
Simon Pasquier
02c9bb05bf
notify/pagerduty: fix images (#1931)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-06-24 12:19:06 +02:00
Simon Pasquier
0c3120efac *: split notify package
Instead of keeping all notifiers in the notify package, it splits them
into individual sub-packages. This improves readability and
maintainability of the code.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-06-18 15:36:19 +02:00
Simon Pasquier
f1664ac870
notify: truncate description for PagerDuty v1 (#1922)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-06-14 14:41:36 +02:00
Diogo Nicoleti
7ab700a6c2
Merge branch 'master' into slack
2019-06-07 15:12:55 -03:00
Simon Pasquier
2abd78cbb7
*: use persistent HTTP clients (#1904)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-06-07 10:37:49 +02:00
Diogo Nicoleti
d8e79386cb
Refactoring to simplify slack retry function
Signed-off-by: Diogo Nicoleti <diogo.nicoleti@gmail.com>
2019-06-05 17:24:39 -03:00
Diogo Nicoleti
70f95cfa51
Use %q instead of %s
Signed-off-by: Diogo Nicoleti <diogo.nicoleti@gmail.com>
2019-06-04 18:16:01 -03:00
Diogo Nicoleti
35eb066e54
Move slack error handling to a new function
Signed-off-by: Diogo Nicoleti <diogo.nicoleti@gmail.com>
2019-05-30 15:49:14 -03:00
Diogo Nicoleti
fa805a2f15
Fix lint
Signed-off-by: Diogo Nicoleti <diogo.nicoleti@gmail.com>
2019-05-30 15:00:40 -03:00
Diogo Nicoleti
9ca88e3ebf
Improve slack error handling
Signed-off-by: Diogo Nicoleti <diogo.nicoleti@gmail.com>
2019-05-30 14:46:52 -03:00
Diogo Nicoleti
920179e5a9
fix text
Signed-off-by: Diogo Nicoleti <diogo.nicoleti@gmail.com>
2019-05-30 14:16:13 -03:00
Diogo Nicoleti
c2ff8bd285
fix typo
Signed-off-by: Diogo Nicoleti <diogo.nicoleti@gmail.com>
2019-05-30 14:16:13 -03:00
Diogo Nicoleti
77d073167d
Add Slack error message to the log
Signed-off-by: Diogo Nicoleti <diogo.nicoleti@gmail.com>
2019-05-30 14:10:54 -03:00
Bartek Płotka
9ddc5f1348 opsgenie: Moved from deprecated, non documented teams to responders field. (#1863)
Teams config option will fail unmarshalling as it is deprecated.

Fixes https://github.com/prometheus/alertmanager/issues/1818

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2019-05-13 14:51:26 +02:00
Simon Pasquier
f32ad1dd8b *: enable default linters (#1861)
* *: enable default linters

* Remove direct usage of errcheck

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-29 10:54:40 +02:00
Simon Pasquier
1c0b8e4139 notify: redact more secret data from logs (#1825)
Follow-up of #1822

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-04 18:27:13 +02:00