Commit Graph

1657 Commits

Author SHA1 Message Date
Max Leonard Inden
7a6cf68775
*: Cut v0.15.2
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-08-14 13:19:31 +02:00
Max Inden
2b4598c6d1
Merge pull request #1514 from s-urbaniak/concurrency
provider/mem: cleanup closed listener in GC
2018-08-13 23:17:09 +02:00
Simon Pasquier
0f24c85d06 Sync Makefile.common (#1518)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-08-13 18:01:55 +02:00
Johannes 'fish' Ziemke
f443038149 Slack: Support image/thumb url in attachment (#1506)
This closes #1491

Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2018-08-13 15:14:45 +02:00
stuart nelson
f8b95a2e95
Correctly encode query strings in notifiers (#1516)
When moving from a plain string to url.URL, we
were incorrectly setting the query string via the
path. The `?` marking the start of the query
string was then escaped when the URL was
converted to a string.

Signed-off-by: Stuart Nelson <stuartnelson3@gmail.com>
2018-08-13 13:33:51 +02:00
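For illustration, the bug can be reproduced with Go's standard `net/url` package. This is a minimal sketch, not the notifier code from #1516; the host, path and `token` parameter are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// brokenURL mimics the bug: the query is smuggled into Path, so String()
// escapes the "?" as %3F and the server never sees a query string.
func brokenURL() string {
	u := &url.URL{Scheme: "https", Host: "example.com", Path: "/v1/send?token=abc"}
	return u.String()
}

// fixedURL sets the query via RawQuery, built with url.Values.Encode().
func fixedURL() string {
	q := url.Values{}
	q.Set("token", "abc")
	u := &url.URL{Scheme: "https", Host: "example.com", Path: "/v1/send", RawQuery: q.Encode()}
	return u.String()
}

func main() {
	fmt.Println(brokenURL()) // https://example.com/v1/send%3Ftoken=abc
	fmt.Println(fixedURL())  // https://example.com/v1/send?token=abc
}
```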
Max Inden
2c00f06575 MAINTAINERS.md: Add Max Inden (#1509)
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-08-13 12:14:53 +02:00
Sergiusz Urbaniak
f9896e0162
provider/mem: cleanup closed listener in GC
... rather than in the Subscribe method. Currently the cleanup for a
given Alert subscription is done in a blocking goroutine, started in
the Subscribe method.

This simplifies it by moving the cleanup to the GC.

Additionally, it simplifies the Subscribe method by making the buffered
channel large enough to hold all pending alerts, which removes the need
to start a goroutine in Subscribe at all.

Signed-off-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com>
2018-08-13 09:35:11 +02:00
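A minimal sketch of the Subscribe/GC split described above. The `Alerts`, `listener` and `NewAlerts` names are simplified, hypothetical stand-ins for the provider/mem types, and `defaultBuffer` is an assumed amount of extra headroom for alerts arriving after the subscription.

```go
package main

import (
	"fmt"
	"sync"
)

// listener is a hypothetical subscription: a buffered alert channel plus a
// done channel closed by the subscriber when it stops listening.
type listener struct {
	alerts chan string
	done   chan struct{}
}

// Alerts is a simplified in-memory alert store, not Alertmanager's real type.
type Alerts struct {
	mtx       sync.Mutex
	pending   []string
	listeners map[int]listener
	nextID    int
}

const defaultBuffer = 16 // assumed headroom for alerts put after subscribing

func NewAlerts(pending []string) *Alerts {
	return &Alerts{pending: pending, listeners: map[int]listener{}}
}

// Subscribe sizes the buffer to hold every pending alert, so the initial
// replay completes synchronously and no goroutine is started here.
func (a *Alerts) Subscribe() (<-chan string, func()) {
	a.mtx.Lock()
	defer a.mtx.Unlock()

	ch := make(chan string, len(a.pending)+defaultBuffer)
	for _, al := range a.pending {
		ch <- al
	}
	done := make(chan struct{})
	var once sync.Once
	a.listeners[a.nextID] = listener{alerts: ch, done: done}
	a.nextID++
	// Unsubscribing only closes done; the map entry is removed later by GC.
	return ch, func() { once.Do(func() { close(done) }) }
}

// GC removes listeners whose subscriber has closed its done channel.
func (a *Alerts) GC() {
	a.mtx.Lock()
	defer a.mtx.Unlock()
	for id, l := range a.listeners {
		select {
		case <-l.done:
			delete(a.listeners, id)
			close(l.alerts)
		default:
		}
	}
}

func (a *Alerts) NumListeners() int {
	a.mtx.Lock()
	defer a.mtx.Unlock()
	return len(a.listeners)
}

func main() {
	a := NewAlerts([]string{"HighLatency", "DiskFull"})
	ch, unsubscribe := a.Subscribe()
	fmt.Println(len(ch)) // 2
	unsubscribe()
	a.GC()
	fmt.Println(a.NumListeners()) // 0
}
```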
comicmuse
ec263489e9 Add cache control headers to the API responses to avoid IE caching th… (#1500)
Add cache control headers to the API responses to avoid IE caching the response.
2018-08-06 18:51:54 +02:00
Max Inden
d4788ed195 provider/mem: Add Put Subscribe starvation test (#1503)
TestAlertsSubscribePutStarvation tests starvation of `iterator.Close` and
`alerts.Put`. Both `Subscribe` and `Put` use the `Alerts.mtx` lock. `Subscribe`
needs it to add and, more importantly, remove entries in `Alerts.listeners`.
`Put` uses the lock to add alerts and to iterate over the `Alerts.listeners`
map. If a listener's channel is full, `Put` blocks while holding the lock,
so the listener cannot unsubscribe.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-08-06 16:00:17 +02:00
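One way to avoid the starvation, sketched with a hypothetical `broadcaster` type rather than the real `Alerts`: the sender also selects on the listener's `done` channel, so a departed listener does not need the lock to unblock `put`. Whether #1482 or #1503 use exactly this pattern is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// sub is a hypothetical listener: a buffered channel and a done signal.
type sub struct {
	ch   chan string
	done chan struct{}
}

type broadcaster struct {
	mtx  sync.Mutex
	subs []sub
}

func (b *broadcaster) subscribe(buf int) (<-chan string, chan struct{}) {
	b.mtx.Lock()
	defer b.mtx.Unlock()
	s := sub{ch: make(chan string, buf), done: make(chan struct{})}
	b.subs = append(b.subs, s)
	return s.ch, s.done
}

// put delivers an alert to every subscriber while holding the lock. A plain
// `s.ch <- alert` would block forever on a full channel whose reader has gone
// away; selecting on done as well lets put move on once that listener quits.
func (b *broadcaster) put(alert string) {
	b.mtx.Lock()
	defer b.mtx.Unlock()
	for _, s := range b.subs {
		select {
		case s.ch <- alert:
		case <-s.done:
		}
	}
}

func main() {
	b := &broadcaster{}
	ch, done := b.subscribe(1)
	b.put("first")    // fills the one-slot buffer
	close(done)       // the subscriber goes away without draining
	b.put("second")   // returns instead of blocking forever
	fmt.Println(<-ch) // first
}
```

A live but slow listener with a full buffer still blocks `put` in this sketch; it only covers listeners that have gone away.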
wangYue
0fc0ff8e71 Avoid listener blocking (#1482)
Signed-off-by: wangyue <wangyue@actiontech.com>
2018-08-06 13:24:21 +02:00
Julius Volz
6d0edbe630 Fix a bunch of unhandled errors (#1501)
...as discovered by "gosec" (many others were reported, but not all of
them are worth fixing).

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-05 15:38:25 +02:00
Mike Bryant
216aa785b0 fix: Update PagerDuty API V2 to send full details on resolve (#1483)
This adds compatibility with PagerDuty's Event rules feature, allowing resolve events to be routed based on attributes.

Fixes #1440

Signed-off-by: Mike Bryant <m@ocado.com>
2018-08-01 15:57:15 +02:00
Adam Shannon
77452894b8 notify: log PagerDuty v1 response on BadRequest (#1481)
Signed-off-by: Adam Shannon <adamkshannon@gmail.com>
2018-07-30 12:25:52 +02:00
Max Inden
4fff29c683
Merge pull request #1486 from mxinden/staticcheck
config/test: Count `<secret>` occurrences via golang strings
2018-07-30 09:41:51 +02:00
Max Leonard Inden
0e50299679
config/test: Count <secret> occurrences via golang strings
`honnef.co/go/tools/cmd/staticcheck` complains with
`config/config_test.go:260:32: regular expression does not contain any
meta characters (SA6004)`. Instead of using a regular expression, this
patch simply switches to Go's `strings.Count` function.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-07-30 08:08:44 +02:00
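The replacement boils down to a literal substring count. A minimal sketch; `countSecrets` is a hypothetical helper, not the function in config_test.go.

```go
package main

import (
	"fmt"
	"strings"
)

// countSecrets reports how many times the literal "<secret>" placeholder
// occurs; no regular expression is needed for a fixed string.
func countSecrets(s string) int {
	return strings.Count(s, "<secret>")
}

func main() {
	out := "api_key: <secret>\nservice_key: <secret>\n"
	fmt.Println(countSecrets(out)) // 2
}
```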
Simon Pasquier
0ccc7c9f74 config: validate URLs at config load time (#1468)
* config: validate URLs at config load time

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address Brian's and Lucas's comments

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Shallow copy of URL instead of reparsing it

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Unshadow net/url package

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Make a deep-copy of URL struct

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-26 12:39:33 +02:00
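A sketch of load-time validation plus the copy semantics mentioned in the bullets. The `URL` wrapper, `parseURL`, `Copy` and the accepted schemes are assumptions for illustration, not the actual config package code.

```go
package main

import (
	"fmt"
	"net/url"
)

// URL wraps *url.URL so it can be validated once, at config load time.
type URL struct {
	*url.URL
}

// parseURL rejects values that would otherwise only fail later, when a
// notification is sent.
func parseURL(s string) (*URL, error) {
	u, err := url.Parse(s)
	if err != nil {
		return nil, err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("unsupported scheme %q for URL", u.Scheme)
	}
	if u.Host == "" {
		return nil, fmt.Errorf("missing host in URL %q", s)
	}
	return &URL{u}, nil
}

// Copy returns an independent copy. All url.URL fields are values except
// User, and *url.Userinfo is documented as immutable, so copying the struct
// is enough.
func (u *URL) Copy() *URL {
	v := *u.URL
	return &URL{&v}
}

func main() {
	u, err := parseURL("https://hooks.example.com/services/T000")
	fmt.Println(u, err)
	_, err = parseURL("hooks.example.com/services/T000") // no scheme
	fmt.Println(err)
}
```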
Simon Pasquier
37884c8460 alertmanager: fix Settle() interval (#1478)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-24 22:42:09 +02:00
Ben Chess
235944cc5f Email is green if only none firing (#1475)
Signed-off-by: Benjamin Chess <bchess@gmail.com>
2018-07-23 14:06:46 +02:00
Max Inden
81b9a83f06 notify: Improve error handling (#1474)
- `tmplText` and `tmplHTML` use monad-style error handling [1].
This reduces the verbosity of the error logic, but introduces the risk
of forgetting the final error check. This patch does not remove this
coding style, but ensures proper error checking in the Email and
PagerDuty notifiers.

- Handle errors returned by `multipartWriter.Close()` and
`wc.Write(buffer.Bytes())` in `Email.Notify()`.

[1] https://www.innoq.com/en/blog/golang-errors-monads/

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-07-23 14:04:40 +02:00
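The monad-style pattern referenced in [1] can be sketched as a closure that turns into a no-op after the first error. This version uses the standard `text/template` package and a hypothetical signature; Alertmanager's own `tmplText` works on its template package instead.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// tmplText returns a closure that renders template strings but becomes a
// no-op after the first error, which it stores in *err.
func tmplText(data interface{}, err *error) func(string) string {
	return func(text string) string {
		if *err != nil {
			return ""
		}
		t, e := template.New("").Parse(text)
		if e != nil {
			*err = e
			return ""
		}
		var buf bytes.Buffer
		if e := t.Execute(&buf, data); e != nil {
			*err = e
			return ""
		}
		return buf.String()
	}
}

func main() {
	var err error
	tmpl := tmplText(map[string]string{"Receiver": "team-db"}, &err)
	subject := tmpl("[FIRING] {{ .Receiver }}")
	body := tmpl("{{ .Receiver") // malformed: unclosed action
	// This final check is the step that is easy to forget.
	if err != nil {
		fmt.Println("template error:", err)
		return
	}
	fmt.Println(subject, body)
}
```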
Mark Van De Weert
7f86d613b6 enable templating of hipchat room_id (#1463)
Signed-off-by: Mark Van De Weert <mark.vandeweert@wpengine.com>
2018-07-19 18:35:53 +02:00
stuart nelson
bd6100793f
Add timeout support to amtool commands (#1471)
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-07-17 09:50:48 +02:00
Bob Shannon
50e271678d Add support for adding alerts using amtool (#1461)
* Add support for adding alerts using amtool

Signed-off-by: Bob Shannon <bshannon@palantir.com>

* comment: Simplify return in addAlert

Signed-off-by: Bob Shannon <bshannon@palantir.com>
2018-07-16 16:29:04 +02:00
Max Inden
81cc0ffa12
*: Cut 0.15.1 (#1467)
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-07-14 14:23:31 +02:00
Max Inden
3735df3ac7
cluster: Do not exit when failing to join cluster (#1465)
Alertmanager is exiting with a non-zero exit code if the initial cluster
join fails. This behavior may be undesirable because:

- As Alertmanager is a critical component with an at-least-once
guarantee, exiting because the cluster join failed is unnecessary, as
Alertmanager still functions on its own.

- In an environment like Kubernetes, where peers are discovered via DNS,
peers might roll out one by one, leaving the DNS entries unpopulated for
the first peer of a set. Failing on the initial join prevents a rollout.

Instead of failing on the initial join, this patch only logs the failure.
The cluster can be joined later via `handleReconnect`.

This is a regression introduced in PR #1456 [1].

[1] https://github.com/prometheus/alertmanager/pull/1456

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-07-11 17:19:33 +02:00
bigMacro
f3bc41d256 fix concurrent read and write group error (#1447)
* fix concurrent read and write group

Signed-off-by: denghuan <denghuan@actionsky.com>

* make lock more elegant

Signed-off-by: denghuan <denghuan@actionsky.com>
2018-07-10 17:13:41 +02:00
Simon Pasquier
5aac7c840b amtool: add support for stdin to check-config (#1431)
* amtool: add support for stdin to check-config

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address Stuart's comment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-09 19:27:04 +02:00
Corentin Chary
42ea9a565b cluster: make sure we don't miss the first pushPull (#1456)
* cluster: make sure we don't miss the first pushPull

During the join, memberlist initiates a pushPull to get initial data.
Unfortunately, at this point the nflog and silence listeners have not
been registered yet, so the first data arrives only after one pushPull
cycle (1 minute by default!).

Signed-off-by: Corentin Chary <c.chary@criteo.com>
2018-07-09 11:16:04 +02:00
Simon Pasquier
f5a258dd1d cluster: fail when no private address can be found (#1437)
The memberlist library fails when it can't find a private address and no
advertise address is given. To return a helpful message to the user,
Alertmanager mimics the logic from memberlist. However, the code had a
bug that swallowed the error message and made it difficult for the user
to understand how to fix the problem.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-05 22:59:56 +02:00
Max Inden
67d3d9e85a
Merge pull request #1458 from mxinden/add-next-release-changelog
CHANGELOG.md: Add 'Next release' section with docker working dir change
2018-07-05 17:07:19 +02:00
Max Inden
a736a90dd0
Merge pull request #1436 from simonpasquier/fix-wechat-templ
notify: catch templating errors for Wechat
2018-07-05 14:52:16 +02:00
Max Leonard Inden
4a6496c964
CHANGELOG.md: Add 'Next release' section with docker working dir change
To ensure we include the breaking change notice in the next release
notes, this patch adds a 'Next release' section mentioning the breaking
change of the working directory of the Alertmanager Dockerfile.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-07-05 13:47:40 +02:00
Simon Pasquier
2d3c4065e8 config: fix regression with Pager Duty (#1455)
The YAML strict mode doesn't allow mapping keys that are duplicates. If
someone wants to override one of the default keys in the Details hash,
the unmarshal function returns an error because the key is already
defined by DefaultPagerdutyConfig.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-05 09:54:28 +02:00
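One possible approach, sketched under the assumption that the user's `details` map is unmarshalled into an empty map first and the defaults are merged in afterwards, so strict mode never sees a duplicate key. The keys and template strings in `defaultDetails` are illustrative, and this is not necessarily how #1455 fixes it.

```go
package main

import "fmt"

// defaultDetails stands in for the Details defaults; the keys and template
// strings here are illustrative, not copied from DefaultPagerdutyConfig.
var defaultDetails = map[string]string{
	"firing":   `{{ template "pagerduty.default.instances" .Alerts.Firing }}`,
	"resolved": `{{ template "pagerduty.default.instances" .Alerts.Resolved }}`,
}

// applyDefaultDetails starts from the defaults and lets every user-provided
// key override them, so a default only survives where the user set nothing.
func applyDefaultDetails(user map[string]string) map[string]string {
	merged := make(map[string]string, len(defaultDetails)+len(user))
	for k, v := range defaultDetails {
		merged[k] = v
	}
	for k, v := range user {
		merged[k] = v
	}
	return merged
}

func main() {
	d := applyDefaultDetails(map[string]string{"firing": "custom text", "team": "db"})
	fmt.Println(d["firing"], d["team"], len(d)) // custom text db 3
}
```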
Martin Chodur
2cd2bd3644 fix: reverted change of dockerfile entrypoint (#1435)
2018-07-04 10:53:51 +02:00
Max Inden
7d70fd9031
Merge pull request #1421 from palmerabollo/patch-1
fix: email template typo in alert-warning style
2018-07-04 10:00:39 +02:00
Waldemar Biller
4e8a910b9d Lookup parts in strings using regexp.MatchString (#1452)
Signed-off-by: Waldemar Biller <wbiller@gmail.com>
2018-07-03 10:55:47 +01:00
Max Inden
98105b8360
Merge pull request #1438 from mxinden/master
CHANGELOG.md: Improve [CHANGE] section of v0.15.0 release
2018-06-30 02:55:14 +08:00
Max Leonard Inden
b8333e11fa
CHANGELOG.md: Improve [CHANGE] section of v0.15.0 release
- Add entry for working dir change in Alertmanager Docker image
- Indicate cluster flag changes

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-06-26 22:23:10 +08:00
Simon Pasquier
d188c21fb0 notify: catch templating errors for Wechat
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-06-24 14:02:21 +02:00
Max Inden
dac673a6aa
Merge pull request #1430 from mxinden/release-0.15
*: Cut 0.15.0 merge to master
2018-06-23 00:56:13 +08:00
Max Leonard Inden
462c969d85
*: Cut 0.15.0
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-06-22 19:01:25 +08:00
Guido García
fd97b969c8 fix: email template typo in alert-warning style
Signed-off-by: Guido García <guido.garciabernardo@telefonica.com>
2018-06-18 17:39:26 +02:00
Max Inden
5e86f61bd7
Merge pull request #1419 from mxinden/cut-0.15.0-rc.3
*: Cut 0.15.0-rc.3
2018-06-18 12:17:42 +02:00
Max Leonard Inden
17e2fc7c2b
*: Cut 0.15.0-rc.3
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-06-16 10:09:30 +02:00
Simon Pasquier
7a272416de cluster: prune the queue if it contains too many items (#1418)
* cluster: prune the queue if too large

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address review comments

Also increases the pruning interval to 15 minutes and the max queue size
to 4096 items (same value as used by Serf).

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-06-15 18:08:12 +02:00
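A sketch of the pruning loop using the two limits from the message. The interface mirrors `NumQueued` and `Prune` on memberlist's `TransmitLimitedQueue`; `pruneOnce`, `runPruner` and `fakeQueue` are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// broadcastQueue lists the two methods used below; memberlist's
// TransmitLimitedQueue provides NumQueued() and Prune(maxRetain int).
type broadcastQueue interface {
	NumQueued() int
	Prune(maxRetain int)
}

const (
	maxQueueSize  = 4096             // max queue size from the commit message
	pruneInterval = 15 * time.Minute // pruning interval from the commit message
)

// pruneOnce trims the queue to maxQueueSize if it has grown beyond it and
// reports whether anything was pruned.
func pruneOnce(q broadcastQueue) bool {
	if q.NumQueued() <= maxQueueSize {
		return false
	}
	q.Prune(maxQueueSize)
	return true
}

// runPruner checks the queue on every tick until stop is closed.
func runPruner(q broadcastQueue, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			pruneOnce(q)
		}
	}
}

// fakeQueue is an in-memory stand-in used only for demonstration.
type fakeQueue struct{ n int }

func (f *fakeQueue) NumQueued() int { return f.n }

func (f *fakeQueue) Prune(maxRetain int) {
	if f.n > maxRetain {
		f.n = maxRetain
	}
}

func main() {
	q := &fakeQueue{n: 10000}
	fmt.Println(pruneOnce(q), q.NumQueued()) // true 4096

	stop := make(chan struct{})
	close(stop) // stop immediately; a real pruner runs for the process lifetime
	runPruner(q, pruneInterval, stop)
}
```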
stuart nelson
445fbdf1a8
gossip large messages via SendReliable (#1415)
* Gossip large messages via SendReliable

For messages beyond half of the maximum gossip
packet size, send the message to all peer nodes
via TCP.

The choice of "larger than half the max gossip
size" is relatively arbitrary. From brief testing,
the overhead from memberlist on a packet seemed to
use only ~3 of the available 1400 bytes, and most
gossip messages seem to be well under 500 bytes.

* Add tests for oversized/normal message gossiping

* Make oversize metric names consistent

* Remove errant printf in test

* Correctly increment WaitGroup

* Add comment for OversizedMessage func

* Add metric for oversized messages dropped

Code was added to drop oversized messages if the
buffered channel they are sent on is full. This
is a good thing to surface as a metric.

* Add counter for total oversized messages sent

* Change full queue log level to debug

Was previously a warning, which isn't necessary
now that there is a metric tracking it.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-06-15 13:40:21 +02:00
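A single-goroutine sketch of the routing rule and the drop counter. `router` is hypothetical; the plain integer counters stand in for the Prometheus metrics, and the goroutine that drains `oversized` and calls memberlist's `SendReliable` for each peer is omitted.

```go
package main

import "fmt"

const maxGossipPacketSize = 1400 // packet size quoted in the commit message

// isOversized applies the "larger than half the max gossip size" rule.
func isOversized(msg []byte) bool {
	return len(msg) > maxGossipPacketSize/2
}

// router is a hypothetical sender: small messages go to the gossip queue,
// large ones to a buffered channel meant to be drained by a reliable sender.
type router struct {
	gossip    [][]byte
	oversized chan []byte
	sent      int // oversized messages accepted for reliable sending
	dropped   int // oversized messages dropped because the channel was full
}

func (r *router) send(msg []byte) {
	if !isOversized(msg) {
		r.gossip = append(r.gossip, msg)
		return
	}
	select {
	case r.oversized <- msg:
		r.sent++
	default:
		// Channel full: drop and count it instead of blocking the caller.
		r.dropped++
	}
}

func main() {
	r := &router{oversized: make(chan []byte, 1)}
	r.send(make([]byte, 100))  // gossiped
	r.send(make([]byte, 1000)) // queued for reliable sending
	r.send(make([]byte, 1000)) // channel full: dropped
	fmt.Println(len(r.gossip), r.sent, r.dropped) // 1 1 1
}
```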
Simon Pasquier
8034f137e1 cluster: don't track FQDN addresses as initial peers (#1416)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-06-15 12:34:50 +02:00
Simon Pasquier
6a7c912559 Sort alerts in correct order (#1349)
* Sort dispatched alerts by job+instance in the correct order (#1178)

Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>

* dispatch: add unit test for alerts sorting

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-06-14 15:54:33 +02:00
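A minimal sketch of ordering alerts by `job`, then `instance`. The `alert` type is a simplified stand-in; the real comparison may also handle missing labels and break ties on the full label set.

```go
package main

import (
	"fmt"
	"sort"
)

// alert is a simplified stand-in holding only a label set.
type alert struct {
	Labels map[string]string
}

// sortByJobInstance orders alerts by their "job" label, then by "instance".
func sortByJobInstance(alerts []alert) {
	sort.SliceStable(alerts, func(i, j int) bool {
		ji, jj := alerts[i].Labels["job"], alerts[j].Labels["job"]
		if ji != jj {
			return ji < jj
		}
		return alerts[i].Labels["instance"] < alerts[j].Labels["instance"]
	})
}

func main() {
	as := []alert{
		{Labels: map[string]string{"job": "node", "instance": "host2"}},
		{Labels: map[string]string{"job": "api", "instance": "host1"}},
		{Labels: map[string]string{"job": "node", "instance": "host1"}},
	}
	sortByJobInstance(as)
	for _, a := range as {
		fmt.Println(a.Labels["job"], a.Labels["instance"])
	}
}
```

Plain string comparison puts `host10` before `host2`; natural ordering would need extra parsing.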
Simon Pasquier
387e684faa vendor: update prometheus/common packages (#1414)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-06-13 16:11:22 +02:00
Simon Pasquier
0c512998ee Use Makefile.common from Prometheus (#1396)
* Include Makefile.common
* Fix the bindata.go files to make the style target happy
* Inline `.PHONY` statements

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-06-13 14:41:52 +02:00
stuart nelson
77cc718a81 [nflog] register snapshotSize
This metric was never registered.
2018-06-12 13:59:48 +02:00