Commit Graph

1512 Commits

Author SHA1 Message Date
pasquier-s
382a0d8089 api: support zero StartsAt for alerts (#1238)
When the API receives alerts where StartsAt is zero, it updates the
value to EndsAt (if not zero itself) or "now". This ensures that the
alert validation will not fail since StartsAt has to be less than or
equal to EndsAt.
2018-02-13 16:26:34 +01:00
Frederic Branczyk
32f90a02ca
Merge pull request #1239 from simonpasquier/remove-dead-code
cmd: remove unused code
2018-02-13 16:21:44 +01:00
Simon Pasquier
9d16fe8266 cmd: remove unused code 2018-02-13 14:20:54 +01:00
Frederic Branczyk
8eb8c1baa0
Merge pull request #1232 from prometheus/memberlist
*: move to memberlist for clustering
2018-02-13 11:26:51 +01:00
Stuart Nelson
a552afd998 Merge branch 'master' into memberlist 2018-02-13 10:47:17 +01:00
stuart nelson
30af4d051b
release 0.14 (#1237) 2018-02-13 09:13:44 +01:00
Fabian Reinartz
e6df2d8751 Adapt cluster listen address flag in tests 2018-02-12 11:31:55 +01:00
Stuart Nelson
46c6b3f2f1 Update frontend 2018-02-12 11:13:27 +01:00
Fabian Reinartz
6cfbe6e8b4 update cluster listen address flag 2018-02-12 10:22:49 +01:00
songjiayang
d07a072b08 Fix WeChat issue (#1229)
* fix wechat issue

* wechat issue code review
2018-02-11 20:09:47 +01:00
Fabian Reinartz
3f2e00fbea cluster/api: improve metrics and cluster status 2018-02-09 11:16:00 +01:00
Fabian Reinartz
247bfff606 cluster: remove MergeSingle 2018-02-09 11:06:51 +01:00
pasquier-s
76ee5388e7 Forbid 0 value for group_interval and repeat_interval (#1230)
Setting one of these parameters to a zero value doesn't make sense
semantically and can cause high CPU usage.
2018-02-09 10:53:46 +01:00
Mike Bryant
6615ed15d2 Add templating to PD-CEF fields; Add missing field (#1231)
* Allow templating of Component and Group in PagerDuty v2

Related to #1211

* Add missing PD-CEF field Component
2018-02-09 10:50:18 +01:00
Andrey Kuzmin
5101d65938 Fix the slowness of the Silence UI (#1235)
* Cache tabs and fix slow css

* update bindata
2018-02-09 10:42:44 +01:00
Fabian Reinartz
fd49dbb477 *: move to memberlist for clustering 2018-02-08 12:18:44 +01:00
Frederic Branczyk
168cb217c6
Merge pull request #1233 from Conorbro/resolved-alert-counter-fix
Fixes AM wrongly counting alerts with EndTimes in the future as resolved
2018-02-08 10:54:13 +01:00
conorbroderick
e8832619e0 Fixes AM wrongly counting alerts with EndTimes in the future as resolved 2018-02-07 15:52:26 +00:00
Corentin Chary
a43a513b77 Fix OpsGenie notifier and add unit tests (#1224)
See #1223, looks like OpsGenie now sometimes returns a 422 when you
don't specify a team. This change cleans up the JSON output and
adds a few unit tests.
2018-02-06 13:45:59 +01:00
pasquier-s
17bd637c97 Add mesh metrics (#1225)
* Add mesh metrics

This change adds 2 new metrics for the mesh:

* alertmanager_peer_connection, state of the connection between the
  Alertmanager instance and a peer.
* alertmanager_peer_terminations_total, total number of terminated
  connections.

It also moves the gathering of the alertmanager_peer_position metric
outside of the meshWait() function so that the metric is computed
accurately even when no alerting group fires.

* Remove 'nick' label from alertmanager_peer_connection metric
2018-02-06 12:13:52 +01:00
Carlos Alexandro Becker
c5ea346d06 allow global opsgenie api key (#1208)
* allow global opsgenie api key

* added missing files

* removed test
2018-01-29 16:05:17 +01:00
Carlos Alexandro Becker
23f31d7d5a improved error when victorops fails (#1207)
* improved error when victorops fails

* moved to debug

* allocate mem only once

* joining strings

* logging receiver name

* passing only group name
2018-01-29 16:00:04 +01:00
Tom Paine
081fc7d982 Update simple.yml (#1216)
match spacing on other receiver groups
2018-01-29 15:58:44 +01:00
Daniel Bonatto
94bef6419f Fixes prometheus/alertmanager#1211 (#1214)
Add template to severity field for PagerDuty API v2.
2018-01-27 11:22:41 +01:00
pasquier-s
62b957cc14 Notify only when new firing alerts are added (#1205)
After the initial notification has been sent, AlertManager shouldn't notify the
receiver again when no new alerts have been added to the group during
group_interval.

This change also modifies the acceptance test framework to assert that no
notification has been received in a given interval.
2018-01-23 16:52:03 +01:00
Stuart Nelson
b45c11b561 Fix tests 2018-01-21 15:38:19 +01:00
Jose Donizetti
fc9306cd7e Add expired silence validation (#1096)
* Add expired silence validation

* Add silence end time in the past validation
2018-01-21 15:29:51 +01:00
Jose Donizetti
2fe013bcaa Add tests to memory provider (#1104) 2018-01-21 15:27:21 +01:00
pasquier-s
63598904dc Fix pending connections never going to established (#1204) 2018-01-21 15:09:50 +01:00
pasquier-s
9b10acae68 Don't notify resolved alerts if none were firing (#1198)
* Don't notify resolved alerts if none were firing

* Fix comments
2018-01-18 11:12:17 +01:00
benbradley
0db01af11e amtool silence update support dwy suffixes to expire flag (#1197) 2018-01-15 19:45:46 +01:00
Stuart Nelson
d20282e1e3 Correct CHANGELOG.md 2018-01-12 14:24:40 +01:00
stuart nelson
fb713f6d82
v0.13.0 (#1194) 2018-01-12 11:29:15 +01:00
Stuart Nelson
7d36d79aba Update silence query long help 2018-01-12 10:44:38 +01:00
Thomás S. Bregolin
cdb44955cf Make --expired list only expired silences (#1176) (#1190)
This means there's no longer a way to list both active and expired
silences at the same time. This is the desired behaviour according to
consensus at https://github.com/prometheus/alertmanager/pull/1175
2018-01-12 10:35:06 +01:00
pasquier-s
907ac510f8 Fix flaky TestBatching acceptance test (#1193)
This change decreases the repeat_interval parameter from 5s to 4.9s to
make sure that the alerts are effectively sent after 5 seconds.

The workflow is:
- The dispatcher flushes the alerts at t0, sends the notification and
marks the notification log at t0+epsilon.
- The dispatcher flushes the alerts at t1, t2, t3 and t4 and doesn't
send the notifications as expected.
- At t5, the dispatcher flushes the alerts and sends the notification
because current_time - (t0+epsilon) is greater than repeat_interval.

If repeat_interval is exactly 5s, there is a little chance that it is
greater than current_time - (t0+epsilon).
2018-01-11 22:45:59 +01:00
Colin Douch
17846f2e33 Fix updating silence comments (#1189)
Possibly another regression introduced by #976 . We use the wrong
variable to update comments in the `amtool silence update` command
which causes us to fail silently. This fixes that.
2018-01-10 17:05:03 +01:00
pasquier-s
a7d4e4ea7c Log snapshot sizes on maintenance (#1155)
* Log snapshot sizes on maintenance

* Add metrics for snapshot sizes

This change adds 2 new gauges for tracking the last snapshots' sizes:

  - alertmanager_nflog_snapshot_size_bytes
  - alertmanager_silences_snapshot_size_bytes
2018-01-10 14:53:57 +01:00
stuart nelson
7b787dab05
Re-introduce prometheus durations in amtool silence creation (#1185)
* Fixes #1183

* Update expires comment

The default time is already output thanks to
kingpin.
2018-01-09 10:47:41 +01:00
stuart nelson
3aa7f03b10
Template secret keys for pagerduty notifier (#1168) (#1182)
The tmpl() call was removed when migrating to
support pd v2 events api.
2018-01-08 13:41:10 +01:00
stuart nelson
3c61fe3fef
Return reload status from http endpoint (#1152) (#1180)
* Return reload status from http endpoint (#1152)

* Use same reload messaging as prometheus
2018-01-08 11:51:05 +01:00
Frederic Branczyk
0b5af7510b
Merge pull request #1159 from simonpasquier/add-healthy-probes
Add /-/healthy endpoint
2018-01-08 11:25:16 +01:00
Calle Pettersson
b7da058efb Switch cmd/alertmanager to kingpin (#974) 2018-01-06 11:22:26 +01:00
Conor Broderick
a1153e83ff
Merge pull request #1167 from prometheus/fix-error-message
Fix error message
2018-01-03 11:10:39 +00:00
Christian Hoffmann
0e63715b23 UI: Fix JavaScript error in MSIE due to endswith() usage (#1172)
* index: avoid endswith() for MSIE compatibility

MSIE does not support endswith() [1]. substr() can
be used to work around this limitation.

[1] https://docs.microsoft.com/en-us/scripting/javascript/reference/endswith-method-string-javascript

* index: clean up comment

* ui: update bindata
2018-01-02 14:25:54 +01:00
Andrey Kuzmin
b8d20dffca Update bindata.go 2018-01-02 12:46:24 +01:00
Andrey Kuzmin
1ccc7b1133 Dont output malformed error body 2018-01-02 12:45:36 +01:00
Andrey Kuzmin
6f8ccb031c
Fix expire buttons on the silences page (#1171)
* Only show confirmation for the specific silence

* Update bindata.go
2018-01-02 12:25:34 +01:00
Fabian Reinartz
92c04096a8
Merge pull request #1154 from dvrkps/patch-1
travis: update go version
2017-12-27 19:05:12 +01:00
pasquier-s
364979bbf8 Display connections in the Status page (#1164)
This change shows the status of the local connections in the web UI. It
can be used to troubleshoot mesh issues.
2017-12-22 11:39:27 +01:00