Commit Graph

175 Commits

Author SHA1 Message Date
Simon Pasquier
b7d891cf39 notify: notify resolved alerts properly (#1408)
* notify: notify resolved alerts properly

The PR #1205 while fixing an existing issue introduced another bug when
the send_resolved flag of the integration is set to true.

With send_resolved set to false, the semantics remain the same:
AlertManager generates a notification when new firing alerts are added
to the alert group. The notification only carries firing alerts.

With send_resolved set to true, AlertManager generates a notification
when new firing or resolved alerts are added to the alert group. The
notification carries both the firing and resolved notifications.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Fix comments

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-06-08 11:37:38 +02:00
Simon Pasquier
0ebaeccd4b *: add missing license headers
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-14 17:37:13 +02:00
Alex Lardschneider
1f9a7b6182 [Request] Add Slack actions to notifications (#1355)
* Added slack actions to notifications

Signed-off-by: Alex Lardschneider <alex.lardschneider@gmail.com>
2018-05-14 17:26:11 +02:00
RogerYuQian
8a0faa9946 fix wechat issue (#1353) (#1356) 2018-05-03 09:32:09 +02:00
Simon Pasquier
b3cc6229a2 notify: remove wechat unit test (#1350)
The unit test was making a request to the public Wechat endpoint which
caused flaky results.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-04-30 20:01:38 +02:00
Trevor Wood
cecfe5b2f5 Validate Slack field config and only allow the necessary input (#1334)
Signed-off-by: Trevor Wood <Trevor.G.Wood@gmail.com>
2018-04-25 18:58:11 +02:00
stuart nelson
bc263d3e61
Improve notification instrumentation (#1335)
* Improve notification instrumentation

- Add notificationLatencySeconds histogram to
debug duplicate messages. This can help rule out
if duplicate messages are being caused by
excessive latency when sending a notification.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-04-23 14:23:01 +02:00
Manos Fokas
300a87e85b Removed file changes to resolve conflict. (#1318)
Signed-off-by: manosf <manosf@protonmail.com>
2018-04-17 16:22:46 +02:00
pasquier-s
7b80919b36 Remove unused code (#1272) 2018-03-03 11:07:47 +01:00
Corentin Chary
dd75201f1c Add /-/ready based on mesh status (#1209)
* Wait for the gossip to settle before sending notifications

See #1209 for details.

As an heuristic for mesh readyness, try to see if
the mesh looks stable (the number of peers isn't changing too much).
This implementation always mark the altermanager as ready after a maximum of 60s.

This adds one new flags to control this behavior:
```
      --cluster.settle-timeout=60s  mesh settling timeout. Do not wait more than this duration on startup.
```

It also adds `/-/ready` which always return 200 (in order to make it clear
that we are ready as soon as we can receive requests).

The mesh status is exposed in `/api/v1/status` and visible on `/#/status`.

* cluster: fix typos and base interval on gossipInterval
2018-03-02 15:45:21 +01:00
pasquier-s
e8a92f65ef Run staticcheck as part of the build process (#1264)
This change also fixes potential issues highlighted by running
staticcheck.
2018-02-28 17:42:32 +01:00
Simon Pasquier
955c92f1b6 Configure http client for Wechat 2018-02-13 14:52:53 +01:00
Frederic Branczyk
d678022fea *: configure http client from config 2018-02-13 14:30:59 +01:00
Stuart Nelson
a552afd998 Merge branch 'master' into memberlist 2018-02-13 10:47:17 +01:00
songjiayang
d07a072b08 Fix WeChat issue (#1229)
* fix wechat issue

* wechat issue code review
2018-02-11 20:09:47 +01:00
Mike Bryant
6615ed15d2 Add templating to PD-CEF fields; Add missing field (#1231)
* Allow templating of Component and Group in PagerDuty v2

Related to #1211

* Add missing PD-CEF field Component
2018-02-09 10:50:18 +01:00
Fabian Reinartz
fd49dbb477 *: move to memberlist for clustering 2018-02-08 12:18:44 +01:00
Corentin Chary
a43a513b77 Fix OpsGenie notifier and add unit tests (#1224)
See #1223, looks like OpsGenie now sometimes returns a 422 when you
don't specify a team. This change cleans up the JSON output and
add a few unit tests.
2018-02-06 13:45:59 +01:00
Carlos Alexandro Becker
23f31d7d5a improved error when victorops fails (#1207)
* improved error when victorops fails

* moved to debug

* allocate mem only once

* joining strings

* logging receiver name

* passing only group name
2018-01-29 16:00:04 +01:00
Daniel Bonatto
94bef6419f Fixes prometheus/alertmanager#1211 (#1214)
Add template to severity field for PagerDuty API v2.
2018-01-27 11:22:41 +01:00
pasquier-s
62b957cc14 Notify only when new firing alerts are added (#1205)
After the initial notification has been sent, AlertManager shouldn't notify the
receiver again when no new alerts have been added to the group during
group_interval.

This change also modifies the acceptance test framework to assert that no
notification has been received in a given interval.
2018-01-23 16:52:03 +01:00
pasquier-s
9b10acae68 Don't notify resolved alerts if none were firing (#1198)
* Don't notify resolved alerts if none were firing

* Fix comments
2018-01-18 11:12:17 +01:00
stuart nelson
3aa7f03b10
Template secret keys for pagerduty notifier (#1168) (#1182)
The tmpl() call was removed when migrating to
support pd v2 events api.
2018-01-08 13:41:10 +01:00
Evan Baker
6a3dfaff45 Add Slack additional "fields" to notifications (#1135)
* impl slack fields

* wrap title and value in tmplText
2017-12-15 12:18:05 +01:00
Jose Donizetti
d75ff37a38 Refactor inhibit stage (#1105)
* Refactor BuildPipeline to receive a muter

* Remove marker not used by InhibitStage
2017-12-14 16:22:31 +01:00
stuart nelson
7736ea0f61
Add footer field for slack messages (#1141) 2017-12-12 22:50:41 +01:00
berlinsaint
6bab629590 Add notify support for Chinese User wechat (#1059)
* WECHAT support by ybyang2/berlinsaint

* correct the whitespace

* add some TestFile and modify some naming errors by ybyang2/berlinsaint

* modify wechat retry test expect

* template error

* add newline
Signed-off-by: yb_home <berlinsaint@126.com>

* fmt some pr code

* use the @stuartnelson3 the test-ci-wechat bingdata.go

* notify go add wechat
2017-12-09 16:20:22 +01:00
Demitri Morgan
b62cddd807 Removed incorrect end-of-support warning (#1131)
The PagerDuty Events API (v1), used by integrations with monitoring tools, will continue to be supported. There are currently no plans to deprecate, end support for or sunset it.

The end-of-support notice cited in the log message removed applies only to the *REST API* version 1, which PagerDuty will no longer support as of February 2018, but which Prometheus does not use.
2017-12-09 15:17:17 +01:00
stuart nelson
3464ab4fa2
Template Source in PagerDuty alert payload (#1117)
* Template Source field in pagerduty payload

As a sane default we link to alertmanager, but
leave templating available to the user if
something suits their system better.
2017-11-22 14:51:44 -05:00
Tom Fawcett
8bfbb0a1aa Fix OpsGenie Tags field (#1108) 2017-11-15 17:20:02 -05:00
Tom Fawcett
291fc5722b Fix OpsGenie Teams field (#1101)
* Fix OpsGenie Teams field

* Remove use of opsGenieTeam struct
2017-11-15 16:00:50 -05:00
Tom Fawcett
fd0ace8d88 Support OpsGenie Priority field (#1094) 2017-11-12 12:01:19 -05:00
Jose Donizetti
7fbe63b94f Add tests to retries (#1084)
* Add tests to retry logic

* Extract retry logic from OpsGenie,VictorOps,PushOver
2017-11-11 15:45:11 +01:00
Jose Donizetti
76c15a0ef5 Fix config name inconsistency (#1087)
* Rename global config hipchat_url to hipchat_api_url

* Rename opsgenie config 'host' to 'url'
2017-11-11 15:01:21 +01:00
Jose Donizetti
74808e40f3 Refactor silence constants (#1076)
* Refactor remove dups silence state constants

* Refactor to use const instead of string
2017-11-07 11:36:30 +01:00
stuart nelson
0cbc8dff18 Rearrange some notification code 2017-11-07 11:32:37 +01:00
Steven Carvellas
5db8055bab Add support for PagerDuty API v2 (#1054) 2017-11-07 11:07:27 +01:00
Adam Smith
4ee40f97e6 Allow template in victorops routing_key field (#1083) 2017-11-07 10:38:46 +01:00
Julius Volz
b0aab04906 Fix notifications for flapping alerts (#1071)
Fixes https://github.com/prometheus/alertmanager/issues/1063
2017-11-02 11:12:12 +01:00
Julius Volz
9b72c10134 Minor code cleanups 2017-11-01 23:08:34 +01:00
Jose Donizetti
f8dc12c317 Remove not used code (#1069) 2017-11-01 16:40:46 +01:00
Tom-Fawcett
2d1b84ff5e Upgrade OpsGenie notifier to v2 API. (#1061) 2017-10-29 16:19:17 +01:00
Jose Donizetti
1c7bf17eec Add logging to statemessage truncation (#1056) 2017-10-26 11:09:54 +02:00
Jose Donizetti
877d2780a0 Add limit to OpsGenie message (#1045) 2017-10-26 11:09:23 +02:00
Jose Donizetti
0a16b09fc1 Fix pushover limits (title, message, url) (#1055) 2017-10-25 14:44:35 +02:00
Julius Volz
947970af44 Convert Alertmanager to use non-global go-kit loggers
Fixes https://github.com/prometheus/alertmanager/issues/1040
2017-10-22 00:20:40 -07:00
Carlos Alexandro Becker
4a8e710691 Allow template in victorops message_type field (#1038) 2017-10-11 15:42:10 +02:00
Frederic Branczyk
ff9e5270c7 Merge pull request #1026 from brancz/marker-race
Remove .WasInhibited and .WasSilenced fields of Alert type
2017-10-10 16:49:55 +02:00
Frederic Branczyk
0ef6695055
*: Remove .WasInhibited and .WasSilenced fields of Alert type 2017-10-10 15:50:15 +02:00
Conor Broderick
10b9d34f80 Initialise notifications_total and notifications_failed_total (#1011) 2017-10-07 11:57:53 +02:00