Commit Graph

1510 Commits

Author SHA1 Message Date
Simon Pasquier
f4c81c43e9 cluster: pass resolved peers to Join() 2018-02-13 16:53:09 +01:00
Frederic Branczyk
8eb8c1baa0 Merge pull request #1232 from prometheus/memberlist
*: move to memberlist for clustering
2018-02-13 11:26:51 +01:00
Stuart Nelson
a552afd998 Merge branch 'master' into memberlist 2018-02-13 10:47:17 +01:00
stuart nelson
30af4d051b release 0.14 (#1237) 2018-02-13 09:13:44 +01:00
Fabian Reinartz
e6df2d8751 Adapt cluster listen address flag in tests 2018-02-12 11:31:55 +01:00
Stuart Nelson
46c6b3f2f1 Update frontend 2018-02-12 11:13:27 +01:00
Fabian Reinartz
6cfbe6e8b4 update cluster listen address flag 2018-02-12 10:22:49 +01:00
songjiayang
d07a072b08 Fix WeChat issue (#1229)
* fix wechat issue

* wechat issue code review
2018-02-11 20:09:47 +01:00
Fabian Reinartz
3f2e00fbea cluster/api: improve metrics and cluster status 2018-02-09 11:16:00 +01:00
Fabian Reinartz
247bfff606 cluster: remove MergeSingle 2018-02-09 11:06:51 +01:00
pasquier-s
76ee5388e7 Forbid 0 value for group_interval and repeat_interval (#1230)
Setting one of these parameters to a zero value doesn't make sense
semantically and can cause high CPU usage.
2018-02-09 10:53:46 +01:00
Mike Bryant
6615ed15d2 Add templating to PD-CEF fields; Add missing field (#1231)
* Allow templating of Component and Group in PagerDuty v2

Related to #1211

* Add missing PD-CEF field Component
2018-02-09 10:50:18 +01:00
Andrey Kuzmin
5101d65938 Fix the slowness of the Silence UI (#1235)
* Cache tabs and fix slow css

* update bindata
2018-02-09 10:42:44 +01:00
Fabian Reinartz
fd49dbb477 *: move to memberlist for clustering 2018-02-08 12:18:44 +01:00
Frederic Branczyk
168cb217c6 Merge pull request #1233 from Conorbro/resolved-alert-counter-fix
Fixes AM wrongly counting alerts with EndTimes in the future as resolved
2018-02-08 10:54:13 +01:00
conorbroderick
e8832619e0 Fixes AM wrongly counting alerts with EndTimes in the future as resolved 2018-02-07 15:52:26 +00:00
Corentin Chary
a43a513b77 Fix OpsGenie notifier and add unit tests (#1224)
See #1223, looks like OpsGenie now sometimes returns a 422 when you
don't specify a team. This change cleans up the JSON output and
adds a few unit tests.
2018-02-06 13:45:59 +01:00
pasquier-s
17bd637c97 Add mesh metrics (#1225)
* Add mesh metrics

This change adds 2 new metrics for the mesh:

* alertmanager_peer_connection, state of the connection between the
  Alertmanager instance and a peer.
* alertmanager_peer_terminations_total, total number of terminated
  connections.

It also moves the gathering of the alertmanager_peer_position metric
outside of the meshWait() function so that the metric is computed
accurately even when no alerting group fires.

* Remove 'nick' label from alertmanager_peer_connection metric
2018-02-06 12:13:52 +01:00
Carlos Alexandro Becker
c5ea346d06 allow global opsgenie api key (#1208)
* allow global opsgenie api key

* added missing files

* removed test
2018-01-29 16:05:17 +01:00
Carlos Alexandro Becker
23f31d7d5a improved error when victorops fails (#1207)
* improved error when victorops fails

* moved to debug

* allocate mem only once

* joining strings

* logging receiver name

* passing only group name
2018-01-29 16:00:04 +01:00
Tom Paine
081fc7d982 Update simple.yml (#1216)
match spacing on other receiver groups
2018-01-29 15:58:44 +01:00
Daniel Bonatto
94bef6419f Fixes prometheus/alertmanager#1211 (#1214)
Add template to severity field for PagerDuty API v2.
2018-01-27 11:22:41 +01:00
pasquier-s
62b957cc14 Notify only when new firing alerts are added (#1205)
After the initial notification has been sent, AlertManager shouldn't notify the
receiver again when no new alerts have been added to the group during
group_interval.

This change also modifies the acceptance test framework to assert that no
notification has been received in a given interval.
2018-01-23 16:52:03 +01:00
Stuart Nelson
b45c11b561 Fix tests 2018-01-21 15:38:19 +01:00
Jose Donizetti
fc9306cd7e Add expired silence validation (#1096)
* Add expired silence validation

* Add silence end time in the past validation
2018-01-21 15:29:51 +01:00
Jose Donizetti
2fe013bcaa Add tests to memory provider (#1104) 2018-01-21 15:27:21 +01:00
pasquier-s
63598904dc Fix pending connections never going to established (#1204) 2018-01-21 15:09:50 +01:00
pasquier-s
9b10acae68 Don't notify resolved alerts if none were firing (#1198)
* Don't notify resolved alerts if none were firing

* Fix comments
2018-01-18 11:12:17 +01:00
benbradley
0db01af11e amtool silence update support dwy suffixes to expire flag (#1197) 2018-01-15 19:45:46 +01:00
Stuart Nelson
d20282e1e3 Correct CHANGELOG.md 2018-01-12 14:24:40 +01:00
stuart nelson
fb713f6d82 v0.13.0 (#1194) 2018-01-12 11:29:15 +01:00
Stuart Nelson
7d36d79aba Update silence query long help 2018-01-12 10:44:38 +01:00
Thomás S. Bregolin
cdb44955cf Make --expired list only expired silences (#1176) (#1190)
This means there's no longer a way to list both active and expired
silences at the same time. This is the desired behaviour according to
consensus at https://github.com/prometheus/alertmanager/pull/1175
2018-01-12 10:35:06 +01:00
pasquier-s
907ac510f8 Fix flaky TestBatching acceptance test (#1193)
This change decreases the repeat_interval parameter from 5s to 4.9s to
make sure that the alerts are effectively sent after 5 seconds.

The workflow is:
- The dispatcher flushes the alerts at t0, sends the notification and
marks the notification log at t0+epsilon.
- The dispatcher flushes the alerts at t1, t2, t3 and t4 and doesn't
send the notifications as expected.
- At t5, the dispatcher flushes the alerts because current_time - (t0+epsilon)
is less than repeat_interval.

If repeat_interval is exactly 5s, there is a little chance that it is
greater than current_time - (t0+epsilon).
2018-01-11 22:45:59 +01:00
Colin Douch
17846f2e33 Fix updating silence comments (#1189)
Possibly another regression introduced by #976. We use the wrong
variable to update comments in the `amtool silence update` command
which causes us to fail silently. This fixes that.
2018-01-10 17:05:03 +01:00
pasquier-s
a7d4e4ea7c Log snapshot sizes on maintenance (#1155)
* Log snapshot sizes on maintenance

* Add metrics for snapshot sizes

This change adds 2 new gauges for tracking the last snapshots' sizes:

  - alertmanager_nflog_snapshot_size_bytes
  - alertmanager_silences_snapshot_size_bytes
2018-01-10 14:53:57 +01:00
stuart nelson
7b787dab05 Re-introduce prometheus durations in amtool silence creation (#1185)
* Fixes #1183

* Update expires comment

The default time is already output thanks to
kingpin.
2018-01-09 10:47:41 +01:00
stuart nelson
3aa7f03b10 Template secret keys for pagerduty notifier (#1168) (#1182)
The tmpl() call was removed when migrating to
support pd v2 events api.
2018-01-08 13:41:10 +01:00
stuart nelson
3c61fe3fef Return reload status from http endpoint (#1152) (#1180)
* Return reload status from http endpoint (#1152)

* Use same reload messaging as prometheus
2018-01-08 11:51:05 +01:00
Frederic Branczyk
0b5af7510b Merge pull request #1159 from simonpasquier/add-healthy-probes
Add /-/healthy endpoint
2018-01-08 11:25:16 +01:00
Calle Pettersson
b7da058efb Switch cmd/alertmanager to kingpin (#974) 2018-01-06 11:22:26 +01:00
Conor Broderick
a1153e83ff Merge pull request #1167 from prometheus/fix-error-message
Fix error message
2018-01-03 11:10:39 +00:00
Christian Hoffmann
0e63715b23 UI: Fix JavaScript error in MSIE due to endswith() usage (#1172)
* index: avoid endswith() for MSIE compatibility

MSIE does not support endswith() [1]. substr() can
be used to work around this limitation.

[1] https://docs.microsoft.com/en-us/scripting/javascript/reference/endswith-method-string-javascript

* index: clean up comment

* ui: update bindata
2018-01-02 14:25:54 +01:00
Andrey Kuzmin
b8d20dffca Update bindata.go 2018-01-02 12:46:24 +01:00
Andrey Kuzmin
1ccc7b1133 Don't output malformed error body 2018-01-02 12:45:36 +01:00
Andrey Kuzmin
6f8ccb031c Fix expire buttons on the silences page (#1171)
* Only show confirmation for the specific silence

* Update bindata.go
2018-01-02 12:25:34 +01:00
Fabian Reinartz
92c04096a8 Merge pull request #1154 from dvrkps/patch-1
travis: update go version
2017-12-27 19:05:12 +01:00
pasquier-s
364979bbf8 Display connections in the Status page (#1164)
This change shows the status of the local connections in the web UI. It
can be used to troubleshoot mesh issues.
2017-12-22 11:39:27 +01:00
Calle Pettersson
608848390f Switch amtool to kingpin (#976)
* Switch cmd/amtool to kingpin

* Touch-ups

* Implement long help

* Add missing short-form of --output

* Fix backwards compatibility for config file options

* Fix vendoring

* Review fixes

* Fix flag word order
2017-12-22 11:17:13 +01:00
anthraxn8b
2a0989094b Added 2nd email address to “to” field (#1163)
Did this to give an example with multiple email addresses in the “to” field.
2017-12-22 00:14:23 +01:00