1501 Commits

Author SHA1 Message Date
stuart nelson df839a60c1 release 0.14 2018-02-12 12:32:15 +01:00
songjiayang d07a072b08 Fix WeChat issue (#1229)
* fix wechat issue

* wechat issue code review
2018-02-11 20:09:47 +01:00
pasquier-s 76ee5388e7 Forbid 0 value for group_interval and repeat_interval (#1230)
Setting one of these parameters to a zero value doesn't make sense
semantically and can cause high CPU usage.
2018-02-09 10:53:46 +01:00
Mike Bryant 6615ed15d2 Add templating to PD-CEF fields; Add missing field (#1231)
* Allow templating of Component and Group in PagerDuty v2

Related to #1211

* Add missing PD-CEF field Component
2018-02-09 10:50:18 +01:00
Andrey Kuzmin 5101d65938 Fix the slowness of the Silence UI (#1235)
* Cache tabs and fix slow css

* update bindata
2018-02-09 10:42:44 +01:00
Frederic Branczyk 168cb217c6
Merge pull request #1233 from Conorbro/resolved-alert-counter-fix
Fixes AM wrongly counting alerts with EndTimes in the future as resolved
2018-02-08 10:54:13 +01:00
conorbroderick e8832619e0 Fixes AM wrongly counting alerts with EndTimes in the future as resolved 2018-02-07 15:52:26 +00:00
Corentin Chary a43a513b77 Fix OpsGenie notifier and add unit tests (#1224)
See #1223. It looks like OpsGenie now sometimes returns a 422 when you
don't specify a team. This change cleans up the JSON output and
adds a few unit tests.
2018-02-06 13:45:59 +01:00
pasquier-s 17bd637c97 Add mesh metrics (#1225)
* Add mesh metrics

This change adds 2 new metrics for the mesh:

* alertmanager_peer_connection, state of the connection between the
  Alertmanager instance and a peer.
* alertmanager_peer_terminations_total, total number of terminated
  connections.

It also moves the gathering of the alertmanager_peer_position metric
outside of the meshWait() function so that the metric is computed
accurately even when no alerting group fires.

* Remove 'nick' label from alertmanager_peer_connection metric
2018-02-06 12:13:52 +01:00
Carlos Alexandro Becker c5ea346d06 allow global opsgenie api key (#1208)
* allow global opsgenie api key

* added missing files

* removed test
2018-01-29 16:05:17 +01:00
Carlos Alexandro Becker 23f31d7d5a improved error when victorops fails (#1207)
* improved error when victorops fails

* moved to debug

* allocate mem only once

* joining strings

* logging receiver name

* passing only group name
2018-01-29 16:00:04 +01:00
Tom Paine 081fc7d982 Update simple.yml (#1216)
match spacing on other receiver groups
2018-01-29 15:58:44 +01:00
Daniel Bonatto 94bef6419f Fixes prometheus/alertmanager#1211 (#1214)
Add template to severity field for PagerDuty API v2.
2018-01-27 11:22:41 +01:00
pasquier-s 62b957cc14 Notify only when new firing alerts are added (#1205)
After the initial notification has been sent, AlertManager shouldn't notify the
receiver again when no new alerts have been added to the group during
group_interval.

This change also modifies the acceptance test framework to assert that no
notification has been received in a given interval.
2018-01-23 16:52:03 +01:00
Stuart Nelson b45c11b561 Fix tests 2018-01-21 15:38:19 +01:00
Jose Donizetti fc9306cd7e Add expired silence validation (#1096)
* Add expired silence validation

* Add silence end time in the past validation
2018-01-21 15:29:51 +01:00
Jose Donizetti 2fe013bcaa Add tests to memory provider (#1104) 2018-01-21 15:27:21 +01:00
pasquier-s 63598904dc Fix pending connections never going to established (#1204) 2018-01-21 15:09:50 +01:00
pasquier-s 9b10acae68 Don't notify resolved alerts if none were firing (#1198)
* Don't notify resolved alerts if none were firing

* Fix comments
2018-01-18 11:12:17 +01:00
benbradley 0db01af11e amtool silence update support dwy suffixes to expire flag (#1197) 2018-01-15 19:45:46 +01:00
Stuart Nelson d20282e1e3 Correct CHANGELOG.md 2018-01-12 14:24:40 +01:00
stuart nelson fb713f6d82
v0.13.0 (#1194) 2018-01-12 11:29:15 +01:00
Stuart Nelson 7d36d79aba Update silence query long help 2018-01-12 10:44:38 +01:00
Thomás S. Bregolin cdb44955cf Make --expired list only expired silences (#1176) (#1190)
This means there's no longer a way to list both active and expired
silences at the same time. This is the desired behaviour according to
consensus at https://github.com/prometheus/alertmanager/pull/1175
2018-01-12 10:35:06 +01:00
pasquier-s 907ac510f8 Fix flaky TestBatching acceptance test (#1193)
This change decreases the repeat_interval parameter from 5s to 4.9s to
make sure that the alerts are effectively sent after 5 seconds.

The workflow is:
- The dispatcher flushes the alerts at t0, sends the notification and
marks the notification log at t0+epsilon.
- The dispatcher flushes the alerts at t1, t2, t3 and t4 and doesn't
send the notifications as expected.
- At t5, the dispatcher flushes the alerts and sends the notification
because current_time - (t0+epsilon) is greater than repeat_interval.

If repeat_interval is exactly 5s, there is a small chance that it is
greater than current_time - (t0+epsilon), in which case no notification
is sent at t5.
2018-01-11 22:45:59 +01:00
Colin Douch 17846f2e33 Fix updating silence comments (#1189)
Possibly another regression introduced by #976. We use the wrong
variable to update comments in the `amtool silence update` command
which causes us to fail silently. This fixes that.
2018-01-10 17:05:03 +01:00
pasquier-s a7d4e4ea7c Log snapshot sizes on maintenance (#1155)
* Log snapshot sizes on maintenance

* Add metrics for snapshot sizes

This change adds 2 new gauges for tracking the last snapshots' sizes:

  - alertmanager_nflog_snapshot_size_bytes
  - alertmanager_silences_snapshot_size_bytes
2018-01-10 14:53:57 +01:00
stuart nelson 7b787dab05
Re-introduce prometheus durations in amtool silence creation (#1185)
* Fixes #1183

* Update expires comment

The default time is already output thanks to
kingpin.
2018-01-09 10:47:41 +01:00
stuart nelson 3aa7f03b10
Template secret keys for pagerduty notifier (#1168) (#1182)
The tmpl() call was removed when migrating to
support pd v2 events api.
2018-01-08 13:41:10 +01:00
stuart nelson 3c61fe3fef
Return reload status from http endpoint (#1152) (#1180)
* Return reload status from http endpoint (#1152)

* Use same reload messaging as prometheus
2018-01-08 11:51:05 +01:00
Frederic Branczyk 0b5af7510b
Merge pull request #1159 from simonpasquier/add-healthy-probes
Add /-/healthy endpoint
2018-01-08 11:25:16 +01:00
Calle Pettersson b7da058efb Switch cmd/alertmanager to kingpin (#974) 2018-01-06 11:22:26 +01:00
Conor Broderick a1153e83ff
Merge pull request #1167 from prometheus/fix-error-message
Fix error message
2018-01-03 11:10:39 +00:00
Christian Hoffmann 0e63715b23 UI: Fix JavaScript error in MSIE due to endswith() usage (#1172)
* index: avoid endswith() for MSIE compatibility

MSIE does not support endswith() [1]. substr() can
be used to work around this limitation.

[1] https://docs.microsoft.com/en-us/scripting/javascript/reference/endswith-method-string-javascript

* index: clean up comment

* ui: update bindata
2018-01-02 14:25:54 +01:00
Andrey Kuzmin b8d20dffca Update bindata.go 2018-01-02 12:46:24 +01:00
Andrey Kuzmin 1ccc7b1133 Don't output malformed error body 2018-01-02 12:45:36 +01:00
Andrey Kuzmin 6f8ccb031c
Fix expire buttons on the silences page (#1171)
* Only show confirmation for the specific silence

* Update bindata.go
2018-01-02 12:25:34 +01:00
Fabian Reinartz 92c04096a8
Merge pull request #1154 from dvrkps/patch-1
travis: update go version
2017-12-27 19:05:12 +01:00
pasquier-s 364979bbf8 Display connections in the Status page (#1164)
This change shows the status of the local connections in the web UI. It
can be used to troubleshoot mesh issues.
2017-12-22 11:39:27 +01:00
Calle Pettersson 608848390f Switch amtool to kingpin (#976)
* Switch cmd/amtool to kingpin

* Touch-ups

* Implement long help

* Add missing short-form of --output

* Fix backwards compatibility for config file options

* Fix vendoring

* Review fixes

* Fix flag word order
2017-12-22 11:17:13 +01:00
anthraxn8b 2a0989094b Added 2nd email address to "to" field (#1163)
Did this to give an example with multiple email addresses in the "to" field.
2017-12-22 00:14:23 +01:00
Fabian Reinartz 1fdfe9f807
Merge pull request #1162 from prometheus/fabxc-patch-2
Fix wrong lock
2017-12-21 17:12:47 +01:00
Fabian Reinartz 405dbb8d9c
Fix wrong lock 2017-12-21 16:55:55 +01:00
Frederic Branczyk db8386fd68
Merge pull request #1158 from prometheus/stn/api-update-locks
Lock around variables used in Update()
2017-12-21 13:06:23 +01:00
Simon Pasquier e8661f5768 Add /-/healthy endpoint 2017-12-21 12:29:38 +01:00
stuart nelson 1abe4c9a56 Lock around variables used in Update()
Found two places where struct members being
updated in api.Update() were being accessed
elsewhere without locks.
2017-12-21 12:08:39 +01:00
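
The kind of fix described in 1abe4c9a56 can be sketched as follows. The `API` type, its `config` field, and the `Config` accessor are hypothetical stand-ins, not the actual Alertmanager code; the point is only that every reader of a field written by `Update()` takes the same mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// API is a hypothetical, simplified stand-in for a type whose fields are
// replaced on reload while HTTP handlers read them concurrently.
type API struct {
	mtx    sync.RWMutex
	config string
}

// Update replaces the config under the write lock.
func (a *API) Update(cfg string) {
	a.mtx.Lock()
	defer a.mtx.Unlock()
	a.config = cfg
}

// Config reads the config under the read lock, so it never races with Update.
func (a *API) Config() string {
	a.mtx.RLock()
	defer a.mtx.RUnlock()
	return a.config
}

func main() {
	api := &API{}
	var wg sync.WaitGroup

	// Concurrent readers, as request handlers would be.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = api.Config()
		}()
	}

	api.Update("route: default")
	wg.Wait()
	fmt.Println(api.Config()) // route: default
}
```

Running a sketch like this with `go run -race` is one way to confirm that no unsynchronized access remains.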
Davor Kapsa eb2ea25ccd
travis: update go version
1.x matches the latest stable Go version (1.9.2 today)
2017-12-20 15:46:00 +01:00
Frederic Branczyk 8b8642935a
Merge pull request #1151 from prometheus/stn/configurable-alert-gc
Make alertGC interval configurable
2017-12-19 20:30:05 +01:00
stuart nelson 69b97058f6 Fix tests 2017-12-19 15:43:23 +01:00
stuart nelson 481eab7b83 Make alertGC interval configurable 2017-12-19 15:36:38 +01:00