1829 Commits

Author SHA1 Message Date
Paul Gier
538305bec9 update Makefile.common and license headers
Sync Makefile.common to latest which updates promu version
and adds license check to default target.
Add missing license headers.

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-03-11 10:39:31 -05:00
Paul Gier
3ffe6cfdc8 api/v2: sort silences similarly to v1 api (#1786)
* api/v2: sort silences similarly to v1 api

Sort the queried silences to match behaviour in the v1 api.

Sort silences in-place instead of creating multiple slices.
Use separate function for sorting silences for easier testing.
Add unit test for sort order.

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-03-11 14:19:52 +01:00
stuart nelson
c5e7dca3dc
Merge pull request #1788 from simonpasquier/fix-multi-string-query-parameters
*: fix filter parameters with comma
2019-03-09 11:03:26 +01:00
Simon Pasquier
bc373f562f *: fix filter parameters with comma
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-03-08 09:56:05 +01:00
Max Inden
0a2d108bb9
Merge pull request #1763 from mxinden/design-secure-memberlist
doc: Add 'Secure Alertmanager cluster traffic' design document
2019-03-05 14:44:59 +01:00
stuart nelson
baa8dd7ef8
Merge pull request #1782 from simonpasquier/go-1.12
Switch to Go 1.12
2019-03-04 17:19:43 +01:00
Simon Pasquier
a4412270ef Switch to Go 1.12
It also pins errcheck to the latest stable release and simplifies how it
gets installed.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-03-04 16:33:32 +01:00
stuart nelson
0788670f81
Merge pull request #1780 from prometheus/fix-receiver-check
Fix receiver name checking in deep sub-routes
2019-03-04 11:11:58 +01:00
Julius Volz
3a73ca5b65 Fix receiver name checking in deep sub-routes
Fixes https://github.com/prometheus/alertmanager/issues/1759

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-03-02 12:35:09 +01:00
Max Leonard Inden
d81b9a5435
doc: Add 'Secure Alertmanager cluster traffic' design document
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-03-01 16:12:25 +01:00
Jo Walsh
8642c0b46e Allow sending of unauthenticated SMTP requests when smtp_auth_username is not supplied (#1739)
* try a more complicated but clearer approach: explicitly return a no-auth smtp.Auth when no username is supplied in config

Signed-off-by: Jo Walsh <jowalsh@bgs.ac.uk>

* fix test to expect no error from auth if username is not supplied
Signed-off-by: Jo Walsh <jowalsh@bgs.ac.uk>

* clean up some formatting errors in surplus comments

Signed-off-by: Jo Walsh <jowalsh@bgs.ac.uk>

* keep noAuth / loginAuth functions all together

Signed-off-by: Jo Walsh <jowalsh@bgs.ac.uk>

* Address latest comments

Co-Authored-By: Jo Walsh <jowalsh@bgs.ac.uk>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-03-01 15:53:18 +01:00
Björn Rabenstein
a732f6dbe4
Merge pull request #1776 from prometheus/beorn7/ui
Make the silence preview show also muted alerts
2019-03-01 14:47:08 +01:00
beorn7
f2345d7c8e Make the silence preview show also muted alerts
Even if an alert is already silenced and/or muted, I still want to
know that my newly created silence will affect it.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-03-01 14:15:53 +01:00
Max Inden
c11117fe9f
Merge pull request #1774 from prometheus/beorn7/muting
Performance improvement checking the silenced state
2019-03-01 14:07:56 +01:00
beorn7
15ed7be846 Remove -u from go get for errcheck
Without the `-u`, it will load what's required in the `go.mod`
file. But with the `-u`, it will load new stuff once it's available,
which makes the build non-reproducible. (Without any change in the
`errcheck` repo, other things will happen.)

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-28 14:27:21 +01:00
beorn7
46b61a38cd Remove a confusing closure
Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-28 13:04:05 +01:00
beorn7
0ab3b724cc Fix bug with zero retention time
Essentially, the Silences.Expire() will in that case have no effect
because the affected silence is immediately seen as expired from the
storage and thus not updated. The silence will stay around in its old
state.

This fix makes sure to use the same “now” throughout the expiration
process.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-28 12:51:40 +01:00
beorn7
82b634916e Improve testing, expose a bug with zero retention time
Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-28 12:34:54 +01:00
beorn7
3c981a92f7 Improve Mutes performance for silences
Add version tracking of silences states. Adding a silence to the state
increments the version. If the version hasn't changed since the last
time an alert was checked for being silenced, we only have to verify
that the relevant silences are still active rather than checking the
alert against all silences.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-28 12:34:41 +01:00
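The version-tracking idea from 3c981a92f7 could look roughly like the sketch below. All names (`silencer`, `cacheEntry`, `mutes`, the `fullScans` counter) are invented for illustration, matching is reduced to a single label pair, and the real implementation may differ in how it detects state changes.

```go
package main

import "fmt"

// silence is a hypothetical silence matching a single label pair.
type silence struct {
	id          string
	name, value string
	active      bool
}

// cacheEntry remembers which silences matched an alert at a given version.
type cacheEntry struct {
	version int
	ids     []string
}

type silencer struct {
	version   int // incremented whenever a silence is added
	silences  map[string]*silence
	cache     map[string]cacheEntry // keyed by alert fingerprint
	fullScans int                   // counts slow-path lookups, for illustration only
}

func newSilencer() *silencer {
	return &silencer{silences: map[string]*silence{}, cache: map[string]cacheEntry{}}
}

func (s *silencer) add(sil *silence) {
	s.silences[sil.id] = sil
	s.version++
}

// mutes reports whether the alert with the given fingerprint is silenced.
func (s *silencer) mutes(fp string, labels map[string]string) bool {
	var ids []string
	if e, ok := s.cache[fp]; ok && e.version == s.version {
		// Fast path: no silence was added, so only re-verify the cached ones.
		for _, id := range e.ids {
			if s.silences[id].active {
				ids = append(ids, id)
			}
		}
	} else {
		// Slow path: check the alert against every silence.
		s.fullScans++
		for id, sil := range s.silences {
			if sil.active && labels[sil.name] == sil.value {
				ids = append(ids, id)
			}
		}
	}
	s.cache[fp] = cacheEntry{version: s.version, ids: ids}
	return len(ids) > 0
}

func main() {
	s := newSilencer()
	sil := &silence{id: "s1", name: "job", value: "api", active: true}
	s.add(sil)
	alert := map[string]string{"job": "api"}

	fmt.Println(s.mutes("fp1", alert), s.fullScans) // first lookup scans everything
	fmt.Println(s.mutes("fp1", alert), s.fullScans) // second lookup reuses the cache
	sil.active = false
	fmt.Println(s.mutes("fp1", alert), s.fullScans) // expiry is caught without a scan
}
```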
beorn7
49ff877079 Add benchmark for querying silences
Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-27 12:36:49 +01:00
stuart nelson
6dc4d1f825
Merge pull request #1771 from prometheus/beorn7/muting3
Create a `Muter` implementation for silences
2019-02-27 12:29:24 +01:00
stuart nelson
339e861450
Merge pull request #1770 from simonpasquier/update-coordinator-logs
config: update coordinator's logs
2019-02-26 18:16:11 +01:00
beorn7
f3d9c89bbc Create a Muter implementation for silences
This encapsulates the logic of querying and marking silenced
alerts. It removes the code duplication flagged earlier.

I removed the error returned by the setAlertStatus function as we were
only logging it, and that's already done anyway when the error is
received from the `silence.Query` call (now in the `Mutes` method).

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-26 16:42:59 +01:00
Simon Pasquier
873f1fb87c config: update coordinator's logs
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-26 16:41:50 +01:00
Björn Rabenstein
891f368c51
Merge pull request #1764 from prometheus/beorn7/muting
Modify the self-inhibition prevention semantics
2019-02-26 14:01:22 +01:00
stuart nelson
0f634debfd
Merge pull request #1767 from prometheus/beorn7/muting2
Improve doc comments for Marker and friends
2019-02-26 13:00:04 +01:00
beorn7
f7df3743da Another doc comment fix for inhibition tests
Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-26 12:40:53 +01:00
Simon Pasquier
c7de536129
*: use stdlib context (#1768)
This change removes all usage of golang.org/x/net/context in the code
base. It also bumps a few dependencies for the same reason:
- github.com/gogo/protobuf
- go-openapi/*

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-26 12:18:57 +01:00
beorn7
12671bd261 Improve doc comments for Marker and friends
This clarifies a bunch of things I have run into during code reading
in preparation for some performance improvements around muting.

It also moves doc comments from places where they don't show up in
godoc to visible places.

It also fixes golint warnings.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-25 17:48:15 +01:00
beorn7
33638b1412 Fix doc comment
Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-25 17:18:35 +01:00
beorn7
22db73fbf7 Modify the self-inhibition prevention semantics
This has been discussed in #666 (issue of hell...).

As concluded there, the cleanest semantics is most likely the
following: "An alert that matches both target and source side cannot
inhibit alerts for which the same is true." The two open questions
were:
1. How difficult is the implementation?
2. Is it needed?

This relatively simple commit proves that the answer to (1) is: Not
very difficult. (This also includes a performance-improving
simplification, which would have been possible without a change of
semantics.)

The answer to (2) is twofold:

For one, the original use case in #666 wasn't solved by our interim
solution. What we solved is the case where the self-inhibition is
triggered by a wide target match, i.e. I have a specific alert that
should inhibit a whole group of target alerts without inhibiting
itself. What we did _not_ solve is the inverted case: Self-inhibition
by a wide source match, i.e. an alert that should only fire if none of
a whole group of source alerts fires. I mean, we "fixed" it as in, the
target alert will never be inhibited, but @lmb in #666 wanted the
alert to be inhibited _sometimes_ (just not _always_).

The other part is that I think that the asymmetry in our interim
solution will at some point haunt us. Thus, I really would like to get
this change in before we do a 1.0 release.

In practice, I expect this to be only relevant in very rare cases. But
those cases will be most difficult to reason about, and I claim that
the solution in this commit is matching what humans intuitively
expect.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-25 16:10:08 +01:00
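The semantics quoted in 22db73fbf7 ("An alert that matches both target and source side cannot inhibit alerts for which the same is true") can be written down as a small predicate. This is a sketch under simplifying assumptions: the `rule` type and `inhibits` function are hypothetical, matchers are equality-only, and the `equal` label list of real inhibit rules is left out.

```go
package main

import "fmt"

type labels map[string]string

// rule is a hypothetical inhibit rule with equality-only matchers.
type rule struct {
	source, target labels
}

// matches reports whether every matcher pair is present in l.
func matches(matchers, l labels) bool {
	for k, v := range matchers {
		if l[k] != v {
			return false
		}
	}
	return true
}

// inhibits reports whether src inhibits tgt under r.
func inhibits(r rule, src, tgt labels) bool {
	if !matches(r.source, src) || !matches(r.target, tgt) {
		return false
	}
	// An alert matching both sides cannot inhibit an alert that also matches both sides.
	srcBoth := matches(r.target, src)
	tgtBoth := matches(r.source, tgt)
	return !(srcBoth && tgtBoth)
}

func main() {
	r := rule{
		source: labels{"alertname": "DatacenterDown"},
		target: labels{"dc": "eu"},
	}
	down := labels{"alertname": "DatacenterDown", "dc": "eu"} // matches both sides
	disk := labels{"alertname": "DiskFull", "dc": "eu"}       // matches target only
	other := labels{"alertname": "DatacenterDown", "dc": "us"} // matches source only

	fmt.Println(inhibits(r, down, disk))  // wide target match still works
	fmt.Println(inhibits(r, down, down))  // no self-inhibition
	fmt.Println(inhibits(r, other, down)) // a source-only alert can still inhibit it
}
```

The last case is the "inhibited sometimes, just not always" behaviour the commit message attributes to the original request in #666.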
Max Inden
6d555302fc
Merge pull request #1744 from mxinden/introduce-config-coordinator
*: Introduce config coordinator bundling config specific logic
2019-02-25 12:01:46 +01:00
Max Leonard Inden
d0cd5a0f08
*: Introduce config coordinator bundling config specific logic
Instead of handling all config specific logic inside
Alertmanager.main(), this patch introduces the config coordinator
component.

Tasks of the config coordinator:
- Load and parse configuration
- Notify subscribers on configuration changes
- Register and manage configuration specific metrics

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-02-25 11:26:30 +01:00
Nguyen Quang Huy
edf170d03b Add signed off to commit (#1766)
- Fix capitalized error strings, following the [Golang coding convention](https://github.com/golang/go/wiki/CodeReviewComments#error-strings)
- Fix some typos

Signed-off-by: Nguyen Quang Huy <huynq0911@gmail.com>
2019-02-25 11:04:42 +01:00
Simon Pasquier
57c4ff10ab api/v2: serve OpenAPI specification (#1751)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-20 15:36:19 +01:00
Simon Pasquier
f809c45f4e Link to the Prometheus's contributing guide (#1745)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-19 11:09:35 +01:00
Miek Gieben
41c5b5cefa Add original field to Regexp (#1757)
In similar vein to prometheus/prometheus/pkg/relabel/relabel.go, extend
Regexp to include the original regular expression string to faithfully
output what was read.

Update TestEmptyFieldsAndRegex.

Fixes: #1753

Signed-off-by: Miek Gieben <miek@miek.nl>
2019-02-19 11:08:35 +01:00
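The idea in 41c5b5cefa, keeping the string that was read next to the compiled expression, can be sketched with the standard `encoding.TextMarshaler` pair. This is an illustration only: the actual change concerns the YAML/JSON handling of the config type, and the `^(?:...)$` anchoring used here is an assumption about how the expression is compiled.

```go
package main

import (
	"fmt"
	"regexp"
)

// Regexp wraps a compiled expression and remembers the string it was read from.
type Regexp struct {
	*regexp.Regexp
	original string
}

// UnmarshalText compiles an anchored expression but keeps the raw input.
func (re *Regexp) UnmarshalText(text []byte) error {
	s := string(text)
	compiled, err := regexp.Compile("^(?:" + s + ")$")
	if err != nil {
		return err
	}
	re.Regexp = compiled
	re.original = s
	return nil
}

// MarshalText emits the original string rather than the anchored form.
func (re Regexp) MarshalText() ([]byte, error) {
	return []byte(re.original), nil
}

func main() {
	var re Regexp
	if err := re.UnmarshalText([]byte("foo.*")); err != nil {
		panic(err)
	}
	out, _ := re.MarshalText()
	fmt.Println("compiled:", re.String()) // the anchored expression used for matching
	fmt.Println("written: ", string(out)) // what was originally read
}
```

Without the `original` field, writing the value back out would expose the anchored form instead of what the user typed.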
stuart nelson
51eebbef85
Stn/correctly mark api silences (#1733)
* Update alert status on every GET to alerts

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2019-02-18 17:06:51 +01:00
Björn Rabenstein
da6e2a88dd
Merge pull request #1743 from prometheus/beorn7/api
Introduce concurrency limit for GET requests and a general timeout for HTTP
2019-02-15 16:17:15 +01:00
beorn7
21de9ff88c Various improvements after code review
Most importantly, `api.New` now takes an `Options` struct as an
argument, which allows some other things done here as well:

- Timeout and concurrency limit are now in the options, streamlining
  the registration and the implementation of the limiting middleware.

- A local registry is used for metrics, and the metrics used so far
  inside any of the api packages are using it now.

The 'in flight' metric now contains the 'get' as a method label. I
have also added a TODO to instrument other methods in the same way
(otherwise, the label doesn't really make sense, semantically). I have
also added an explicit error counter for requests rejected because of
the concurrency limit. (They also show up as 503s in the generic HTTP
instrumentation (or they would, if v2 were instrumented, too), but
those 503s might have a number of reasons, while users might want to
alert on concurrency limit problems explicitly).

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-12 18:42:08 +01:00
beorn7
3382a0e949 Add HTTP instrumentation for GET requests in flight
While the newly added in-flight instrumentation works for all GET
requests, the existing HTTP instrumentation omits api/v2 calls. This
commit adds a TODO note about that.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-11 19:34:06 +01:00
beorn7
4747fd9b2f Propagate timeout to alert listing via context
The context is created by the http.TimeoutHandler we use to set the
timeout.

I believe this is the only endpoint where propagating the timeout is
feasible and needed.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-11 19:34:06 +01:00
beorn7
fc4b67ce80 Introduce a timeout and concurrency limit for HTTP requests
The default concurrency limit is max(GOMAXPROCS, 8). That should not
imply that each GET request eats a whole CPU. It's more to get some
reasonable heuristics for the processing power of the hosting machine
(while allowing at least 8 concurrent requests even on the smallest
machines). As GET requests can easily overload the Alertmanager,
rendering it incapable of doing its main task, namely sending alert
notifications, we need to limit GET requests by default.

In contrast, no timeout is set by default. The http.TimeoutHandler
invokes quite a bit of machinery behind the scenes, in particular an
additional layer of buffering. Thus, we should first get a bit of
experience with it before we consider enforcing a timeout by default,
even if setting a timeout is in general the safer setting for
resiliency.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-11 19:34:06 +01:00
Simon Pasquier
b10646f9ac notify: factorize code truncating strings (#1752)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-11 18:07:14 +01:00
Max Inden
17e8cc04b8
Merge pull request #1738 from mxinden/update-memberlist
vendor: Update to hashicorp/memberlist v1.0.3
2019-02-11 10:42:17 +01:00
JoeWrightss
b926c6935e Fix some typos in comment (#1750)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-08 14:57:08 +01:00
Max Inden
46ff99aba7
Merge pull request #1741 from mxinden/register-marker-metrics
main.go: Move marker metric registering into types/types.go
2019-02-08 11:17:05 +01:00
Björn Rabenstein
59fd5df91b
Merge pull request #1747 from prometheus/beorn7/build
Fix build problems
2019-02-07 18:42:03 +01:00
Björn Rabenstein
9e8437b54a
Merge pull request #1749 from simonpasquier/master
Update Makefile.common
2019-02-07 17:57:34 +01:00
Simon Pasquier
f5797a0e79 Update Makefile.common
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-07 17:34:58 +01:00