Commit Graph

1805 Commits

Author SHA1 Message Date
Björn Rabenstein
891f368c51
Merge pull request #1764 from prometheus/beorn7/muting
Modify the self-inhibition prevention semantics
2019-02-26 14:01:22 +01:00
stuart nelson
0f634debfd
Merge pull request #1767 from prometheus/beorn7/muting2
Improve doc comments for Marker and friends
2019-02-26 13:00:04 +01:00
beorn7
f7df3743da Another doc comment fix for inhibition tests
Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-26 12:40:53 +01:00
Simon Pasquier
c7de536129
*: use stdlib context (#1768)
This change removes all usage of golang.org/x/net/context in the code
base. It also bumps a few dependencies for the same reason:
- github.com/gogo/protobuf
- go-openapi/*

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-26 12:18:57 +01:00
beorn7
12671bd261 Improve doc comments for Marker and friends
This clarifies a bunch of things I have run into during code reading
in preparation for some performance improvements around muting.

It also moves doc comments from places where they don't show up in
godoc to visible places.

It also fixes golint warnings.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-25 17:48:15 +01:00
beorn7
33638b1412 Fix doc comment
Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-25 17:18:35 +01:00
beorn7
22db73fbf7 Modify the self-inhibition prevention semantics
This has been discussed in #666 (issue of hell...).

As concluded there, the cleanest semantics is most likely the
following: "An alert that matches both target and source side cannot
inhibit alerts for which the same is true." The two open questions
were:
1. How difficult is the implementation?
2. Is it needed?

This relatively simple commit proves that the answer to (1) is: Not
very difficult. (This also includes a performance-improving
simplification, which would have been possible without a change of
semantics.)

The answer to (2) is twofold:

For one, the original use case in #666 wasn't solved by our interim
solution. What we solved is the case where the self-inhibition is
triggered by a wide target match, i.e. I have a specific alert that
should inhibit a whole group of target alerts without inhibiting
itself. What we did _not_ solve is the inverted case: Self-inhibition
by a wide source match, i.e. an alert that should only fire if none of
a whole group of source alerts fires. I mean, we "fixed" it as in, the
target alert will never be inhibited, but @lmb in #666 wanted the
alert to be inhibited _sometimes_ (just not _always_).

The other part is that I think that the asymmetry in our interim
solution will at some point haunt us. Thus, I really would like to get
this change in before we do a 1.0 release.

In practice, I expect this to be only relevant in very rare cases. But
those cases will be most difficult to reason about, and I claim that
the solution in this commit is matching what humans intuitively
expect.
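
The quoted rule can be sketched in Go as follows. This is a minimal illustration, not the code from this commit: the `Labels` and `Rule` types and the equality-only matchers are hypothetical simplifications (no regex matchers, no `equal` labels).

```go
package main

import "fmt"

// Labels is a hypothetical stand-in for an alert's label set.
type Labels map[string]string

// Rule is a simplified, hypothetical inhibition rule with plain
// equality matchers for the source and the target side.
type Rule struct {
	Source Labels
	Target Labels
}

// matches reports whether every matcher is satisfied by lbls.
func matches(matchers, lbls Labels) bool {
	for k, v := range matchers {
		if lbls[k] != v {
			return false
		}
	}
	return true
}

// Inhibits reports whether the firing alert src mutes tgt under rule r.
// An alert that matches both sides cannot inhibit another alert that
// also matches both sides.
func (r Rule) Inhibits(src, tgt Labels) bool {
	if !matches(r.Source, src) || !matches(r.Target, tgt) {
		return false
	}
	srcMatchesBoth := matches(r.Target, src)
	tgtMatchesBoth := matches(r.Source, tgt)
	return !(srcMatchesBoth && tgtMatchesBoth)
}

func main() {
	r := Rule{
		Source: Labels{"alertname": "DatacenterDown"},
		Target: Labels{"dc": "a"},
	}
	dcDown := Labels{"alertname": "DatacenterDown", "dc": "a"} // matches both sides
	diskFull := Labels{"alertname": "DiskFull", "dc": "a"}     // matches the target side only

	fmt.Println(r.Inhibits(dcDown, diskFull)) // true
	fmt.Println(r.Inhibits(dcDown, dcDown))   // false: no self-inhibition
}
```

The check is symmetric in the sense described above: it does not matter whether the overlap comes from a wide source match or a wide target match.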

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-25 16:10:08 +01:00
Max Inden
6d555302fc
Merge pull request #1744 from mxinden/introduce-config-coordinator
*: Introduce config coordinator bundling config specific logic
2019-02-25 12:01:46 +01:00
Max Leonard Inden
d0cd5a0f08
*: Introduce config coordinator bundling config specific logic
Instead of handling all config specific logic inside
Alertmanager.main(), this patch introduces the config coordinator
component.

Tasks of the config coordinator:
- Load and parse configuration
- Notify subscribers on configuration changes
- Register and manage configuration specific metrics
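
The three tasks listed above could be sketched roughly like this. All names here (`Coordinator`, `Subscribe`, `Reload`, the `Config` fields) are hypothetical and chosen for illustration; plain counters stand in for the configuration specific metrics.

```go
package main

import (
	"fmt"
	"sync"
)

// Config is a hypothetical, already-parsed configuration.
type Config struct {
	Receiver string
}

// Coordinator is an illustrative sketch, not the type from this patch.
type Coordinator struct {
	mu          sync.Mutex
	load        func() (*Config, error) // loads and parses the configuration
	subscribers []func(*Config) error
	reloads     int // stand-in for a "successful reloads" metric
	failures    int // stand-in for a "failed reloads" metric
}

// NewCoordinator returns a coordinator that uses load to obtain configs.
func NewCoordinator(load func() (*Config, error)) *Coordinator {
	return &Coordinator{load: load}
}

// Subscribe registers fn to be called with every newly loaded config.
func (c *Coordinator) Subscribe(fn func(*Config) error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.subscribers = append(c.subscribers, fn)
}

// Reload loads the configuration and notifies all subscribers in order.
func (c *Coordinator) Reload() error {
	c.mu.Lock()
	defer c.mu.Unlock()

	cfg, err := c.load()
	if err != nil {
		c.failures++
		return fmt.Errorf("loading configuration: %w", err)
	}
	for _, fn := range c.subscribers {
		if err := fn(cfg); err != nil {
			c.failures++
			return fmt.Errorf("notifying subscriber: %w", err)
		}
	}
	c.reloads++
	return nil
}

func main() {
	c := NewCoordinator(func() (*Config, error) {
		return &Config{Receiver: "team-a"}, nil
	})
	c.Subscribe(func(cfg *Config) error {
		fmt.Println("new receiver:", cfg.Receiver)
		return nil
	})
	if err := c.Reload(); err != nil {
		fmt.Println("reload failed:", err)
	}
}
```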

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-02-25 11:26:30 +01:00
Nguyen Quang Huy
edf170d03b Add signed off to commit (#1766)
- Fix capitalized error strings, following the [Golang coding convention](https://github.com/golang/go/wiki/CodeReviewComments#error-strings)
- Fix some typos

Signed-off-by: Nguyen Quang Huy <huynq0911@gmail.com>
2019-02-25 11:04:42 +01:00
Simon Pasquier
57c4ff10ab api/v2: serve OpenAPI specification (#1751)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-20 15:36:19 +01:00
Simon Pasquier
f809c45f4e Link to the Prometheus's contributing guide (#1745)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-19 11:09:35 +01:00
Miek Gieben
41c5b5cefa Add original field to Regexp (#1757)
In similar vein to prometheus/prometheus/pkg/relabel/relabel.go, extend
Regexp to include the original regular expression string to faithfully
output what was read.
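
A minimal sketch of the idea, assuming a wrapper type that keeps the string it was read from next to the compiled pattern. To stay within the standard library, it uses JSON instead of the YAML handling the real config package relies on. The field names and the `^(?:...)$` anchoring are assumptions modeled on the relabel approach mentioned above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Regexp holds a compiled, fully anchored pattern together with the
// original string, so that marshaling reproduces the input exactly.
type Regexp struct {
	compiled *regexp.Regexp
	original string
}

// UnmarshalJSON compiles the anchored pattern and remembers the input.
func (re *Regexp) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	compiled, err := regexp.Compile("^(?:" + s + ")$")
	if err != nil {
		return err
	}
	re.compiled = compiled
	re.original = s
	return nil
}

// MarshalJSON emits the original string, not the anchored form.
func (re Regexp) MarshalJSON() ([]byte, error) {
	return json.Marshal(re.original)
}

// MatchString reports whether s matches the anchored pattern.
func (re Regexp) MatchString(s string) bool {
	return re.compiled.MatchString(s)
}

func main() {
	var re Regexp
	if err := json.Unmarshal([]byte(`"foo|bar"`), &re); err != nil {
		panic(err)
	}
	out, err := json.Marshal(re)
	if err != nil {
		panic(err)
	}
	// Prints the pattern as read, then the match results for "bar" and "barn".
	fmt.Println(string(out), re.MatchString("bar"), re.MatchString("barn"))
}
```

Without the stored original, marshaling would have to print the anchored form, which differs from what the user wrote.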

Update TestEmptyFieldsAndRegex.

Fixes: #1753

Signed-off-by: Miek Gieben <miek@miek.nl>
2019-02-19 11:08:35 +01:00
stuart nelson
51eebbef85
Stn/correctly mark api silences (#1733)
* Update alert status on every GET to alerts

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2019-02-18 17:06:51 +01:00
Björn Rabenstein
da6e2a88dd
Merge pull request #1743 from prometheus/beorn7/api
Introduce concurrency limit for GET requests and a general timeout for HTTP
2019-02-15 16:17:15 +01:00
beorn7
21de9ff88c Various improvements after code review
Most importantly, `api.New` now takes an `Options` struct as an
argument, which allows some other things done here as well:

- Timeout and concurrency limit are now in the options, streamlining
  the registration and the implementation of the limiting middleware.

- A local registry is used for metrics, and the metrics used so far
  inside any of the api packages are using it now.

The 'in flight' metric now contains the 'get' as a method label. I
have also added a TODO to instrument other methods in the same way
(otherwise, the label doesn't really make sense, semantically). I have
also added an explicit error counter for requests rejected because of
the concurrency limit. (They also show up as 503s in the generic HTTP
instrumentation (or they would, if v2 were instrumented, too), but
those 503s might have a number of reasons, while users might want to
alert on concurrency limit problems explicitly).

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-12 18:42:08 +01:00
beorn7
3382a0e949 Add HTTP instrumentation for GET requests in flight
While the newly added in-flight instrumentation works for all GET
requests, the existing HTTP instrumentation omits api/v2 calls. This
commit adds a TODO note about that.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-11 19:34:06 +01:00
beorn7
4747fd9b2f Propagate timeout to alert listing via context
The context is created by the http.TimeoutHandler we use to set the
timeout.

I believe this is the only endpoint where propagating the timeout is
feasible and needed.
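
As a rough sketch of what such propagation can look like: http.TimeoutHandler runs the wrapped handler with a request whose context carries the deadline, so a listing loop that receives `r.Context()` can stop early. The `listAlerts` function and its per-alert delay are hypothetical stand-ins, and a plain `context.WithTimeout` replaces the HTTP layer to keep the example small.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// listAlerts is a hypothetical listing loop that gives up as soon as
// ctx is done instead of finishing work nobody is waiting for.
func listAlerts(ctx context.Context, alerts []string, perAlert time.Duration) ([]string, error) {
	out := make([]string, 0, len(alerts))
	for _, a := range alerts {
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(perAlert): // stand-in for per-alert work
			out = append(out, a)
		}
	}
	return out, nil
}

// listWithTimeout plays the role of the timeout handler: it derives a
// context with a deadline and passes it down to the listing.
func listWithTimeout(timeout, perAlert time.Duration, alerts []string) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return listAlerts(ctx, alerts, perAlert)
}

func main() {
	alerts := []string{"a", "b", "c"}

	// The deadline expires before the first alert is processed.
	_, err := listWithTimeout(20*time.Millisecond, 50*time.Millisecond, alerts)
	fmt.Println(err) // context deadline exceeded

	// A generous deadline lets the listing finish.
	got, err := listWithTimeout(time.Second, time.Millisecond, alerts)
	fmt.Println(got, err)
}
```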

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-11 19:34:06 +01:00
beorn7
fc4b67ce80 Introduce a timeout and concurrency limit for HTTP requests
The default concurrency limit is max(GOMAXPROCS, 8). That should not
imply that each GET request eats a whole CPU. It's more to get some
reasonable heuristics for the processing power of the hosting machine
(while allowing at least 8 concurrent requests even on the smallest
machines). As GET requests can easily overload the Alertmanager,
rendering it incapable of doing its main task, namely sending alert
notifications, we need to limit GET requests by default.

In contrast, no timeout is set by default. The http.TimeoutHandler
invokes quite a bit of machinery behind the scenes, in particular an
additional layer of buffering. Thus, we should first get a bit of
experience with it before we consider enforcing a timeout by default,
even if setting a timeout is in general the safer setting for
resiliency.
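
One possible shape of such a limit, as a sketch only: a buffered channel serves as a semaphore, GET requests that find no free slot are rejected with a 503, and other methods pass through. The names `limiter`, `limitGET` and `defaultConcurrency` are hypothetical and not taken from this commit.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"runtime"
)

// defaultConcurrency mirrors the default described above: max(GOMAXPROCS, 8).
func defaultConcurrency() int {
	if n := runtime.GOMAXPROCS(0); n > 8 {
		return n
	}
	return 8
}

// limiter is a counting semaphore backed by a buffered channel.
type limiter chan struct{}

func newLimiter(n int) limiter { return make(limiter, n) }

// tryAcquire takes a slot if one is free and never blocks.
func (l limiter) tryAcquire() bool {
	select {
	case l <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot taken by a successful tryAcquire.
func (l limiter) release() { <-l }

// limitGET rejects GET requests beyond the limit with a 503 and lets
// all other methods through unchanged.
func limitGET(next http.Handler, l limiter) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			next.ServeHTTP(w, r)
			return
		}
		if !l.tryAcquire() {
			http.Error(w, "too many concurrent GET requests", http.StatusServiceUnavailable)
			return
		}
		defer l.release()
		next.ServeHTTP(w, r)
	})
}

func main() {
	l := newLimiter(1)
	h := limitGET(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}), l)

	l.tryAcquire() // simulate one GET that is already in flight
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/alerts", nil))
	fmt.Println(rec.Code) // 503

	l.release() // the in-flight request finished
	rec = httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/alerts", nil))
	fmt.Println(rec.Code, defaultConcurrency())
}
```

Rejecting instead of queueing keeps excess GET load from piling up behind the limit.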

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-11 19:34:06 +01:00
Simon Pasquier
b10646f9ac notify: factorize code truncating strings (#1752)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-11 18:07:14 +01:00
Max Inden
17e8cc04b8
Merge pull request #1738 from mxinden/update-memberlist
vendor: Update to hashicorp/memberlist v0.1.3
2019-02-11 10:42:17 +01:00
JoeWrightss
b926c6935e Fix some typos in comment (#1750)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-08 14:57:08 +01:00
Max Inden
46ff99aba7
Merge pull request #1741 from mxinden/register-marker-metrics
main.go: Move marker metric registering into types/types.go
2019-02-08 11:17:05 +01:00
Björn Rabenstein
59fd5df91b
Merge pull request #1747 from prometheus/beorn7/build
Fix build problems
2019-02-07 18:42:03 +01:00
Björn Rabenstein
9e8437b54a
Merge pull request #1749 from simonpasquier/master
Update Makefile.common
2019-02-07 17:57:34 +01:00
Simon Pasquier
f5797a0e79 Update Makefile.common
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-07 17:34:58 +01:00
beorn7
f7e4d7b375 Update scripts/errcheck_excludes.txt
With Go modules, the path appears un-vendored.

Plus, we are not calling AllowedLevel.Set anywhere anymore.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-07 16:29:12 +01:00
Nayana Thorat
5d1bd285ca promu.yml: Add support for s390x (#1742)
Signed-off-by: Nayana <nthorat@us.ibm.com>
2019-02-07 15:52:10 +01:00
beorn7
6f0e911dd1 Fix the assets make target
Presumably, it broke with the introduction of Go modules.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-07 15:11:13 +01:00
Max Leonard Inden
3a38db8faa
vendor: Update to hashicorp/memberlist v0.1.3
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-02-05 16:52:53 +01:00
Max Leonard Inden
09a7370572
main.go: Move marker metric registering into types/types.go
Instead of registering marker metrics inside of
cmd/alertmanager/main.go, register them in types/types.go, encapsulating
marker specific logic in its module, not in main.go. In addition it
paves the path for removing the usage of the global metric registry in
the future, by taking a local metric registerer.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-02-05 14:59:22 +01:00
Max Inden
083639e31e
Merge pull request #1736 from mxinden/combine-apis
api: Combine v1 and v2 into generic api
2019-02-05 09:17:18 +01:00
Max Leonard Inden
c57542127d
api: Combine v1 and v2 into generic api
Instead of cmd/alertmanager/main.go instantiating and starting both api
v1 and v2, delegate that work to a generic api combining the two.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-02-04 14:31:33 +01:00
Max Inden
5c5ff9e769
Merge pull request #1728 from mxinden/nil-peer-master
api/v2: Make cluster status peers and name optional
2019-02-04 12:12:14 +01:00
Max Leonard Inden
8e157b3af5
api/v2: Make cluster status peers and name optional
If a user chooses to disable the Alertmanager cluster feature, there is
no cluster name nor cluster peers. Hence these should be optional. Only
cluster status is set to "disabled".

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-02-04 11:40:30 +01:00
JoeWrightss
5c61f4dbc8 Fix typo in README.md (#1687)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-01 16:49:57 +01:00
Max Inden
f1bf34b234
Merge pull request #1732 from mxinden/back-to-master
*: Cut v0.16.1 - back to master
2019-01-31 17:33:24 +01:00
Max Leonard Inden
d90c52d6a1
*: Cut v0.16.1
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-01-31 16:57:20 +01:00
Max Leonard Inden
2f055d9966
api/v2: Do not populate cluster info if clustering is disabled
When users start Alertmanager with `--cluster.listen-address=`, the
cluster will not be initialized, hence api.peer will be `nil`. So far
this would result in a nil pointer dereference by the API v2 accessing
the api.peer field.

With this patch, api v2 skips populating the peers array, sets the name
to an empty string and the status to "disabled" in case `api.peer` is
nil.
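
The guard described here could look roughly like the following. `Peer`, `ClusterStatus` and the "ready" status for the enabled case are hypothetical placeholders; only the behavior for a nil peer (empty peers, empty name, status "disabled") is taken from the description above.

```go
package main

import "fmt"

// Peer is a hypothetical stand-in for the cluster peer held by the API.
type Peer struct {
	name    string
	members []string
}

// ClusterStatus is a hypothetical, simplified response model.
type ClusterStatus struct {
	Name   string
	Status string
	Peers  []string
}

// clusterStatus never dereferences a nil peer: with clustering
// disabled it returns an empty name, no peers and status "disabled".
func clusterStatus(p *Peer) ClusterStatus {
	if p == nil {
		return ClusterStatus{Status: "disabled", Peers: []string{}}
	}
	// "ready" is an assumed placeholder for the enabled case.
	return ClusterStatus{Name: p.name, Status: "ready", Peers: p.members}
}

func main() {
	fmt.Printf("%+v\n", clusterStatus(nil))
	fmt.Printf("%+v\n", clusterStatus(&Peer{name: "am-0", members: []string{"am-1"}}))
}
```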

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-01-31 16:56:59 +01:00
Ye Ben
5f8eaf9560 cluster/delegate: Replace labels with constants to reduce hardcoding (#1724)
Signed-off-by: yeya24 <ben.ye@daocloud.io>
2019-01-28 10:17:55 +01:00
Stefan Büringer
fc1153560d trim PagerDuty message summary to 1024 chars, add PagerDuty debug log (#1701)
PagerDuty Alerts are rejected (a 400 BadRequest is sent back from PagerDuty)
when the summary field is longer than 1024 characters
(https://v2.developer.pagerduty.com/docs/send-an-event-events-api-v2).
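
A sketch of such a truncation, assuming the limit counts characters (runes) and that a trailing "..." marks the cut. The helper name `truncate` and the marker are illustrative choices, not necessarily what this change does.

```go
package main

import "fmt"

// maxSummaryLen is the PagerDuty summary limit mentioned above.
const maxSummaryLen = 1024

// truncate shortens s to at most n runes and reports whether it cut
// anything. When there is room, the cut is marked with "...".
func truncate(s string, n int) (string, bool) {
	if n < 0 {
		n = 0
	}
	r := []rune(s)
	if len(r) <= n {
		return s, false
	}
	if n <= 3 {
		return string(r[:n]), true
	}
	return string(r[:n-3]) + "...", true
}

func main() {
	summary, cut := truncate("disk almost full on db-1", 10)
	fmt.Println(summary, cut) // disk al... true

	summary, cut = truncate("short summary", maxSummaryLen)
	fmt.Println(summary, cut) // short summary false
}
```

Working on runes instead of bytes avoids cutting a multi-byte character in half.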

Signed-off-by: Stefan Bueringer <sbueringer@gmail.com>
2019-01-24 14:38:35 +01:00
Hrishikesh Barman
23e7fec030 scripts/genproto.sh: Add version locking for protobuf extensions (#1707)
Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
2019-01-22 11:48:37 +01:00
Max Inden
3f5b6924e8
Merge pull request #1716 from mxinden/back-to-master
*: Cut v0.16.0 back to master
2019-01-22 10:26:13 +01:00
Max Leonard Inden
a0b7f6eea3
*: Cut v0.16.0
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-01-21 14:00:55 +01:00
Max Inden
705abf31c2
Merge pull request #1711 from mxinden/disable-redoc
api/v2: Disable serving swagger spec and redoc UI
2019-01-17 17:12:13 +01:00
Max Leonard Inden
7aa8ea9d9d
api/v2: Disable serving swagger spec and redoc UI
By default go-swagger serves the swagger spec and the redoc UI. This
patch disables both.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-01-17 16:19:29 +01:00
Max Inden
96a0f06d3d
Merge pull request #1710 from mxinden/back-to-master
*: Cut v0.16.0-beta.0 back to master
2019-01-16 14:51:30 +01:00
Max Leonard Inden
71997ffc49
*: Cut v0.16.0-beta.0
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-01-15 21:51:12 +01:00
Hrishikesh Barman
4e424e3cd6 config: Change DefaultGlobalConfig to a function (#1656)
The variable DefaultGlobalConfig was being used to initialize values, but it stored previous information due to which some things were persisting in the newer initialization.

In this PR, DefaultGlobalConfig is changed to a function so that it returns a fresh GlobalConfig for initialization.
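
One way such a leak can arise, shown as a sketch: copying a package-level struct still shares any maps or pointers inside it, so values written during one load show up in the next. The `GlobalConfig` fields and both loader functions here are hypothetical and only illustrate the variable versus function difference.

```go
package main

import "fmt"

// GlobalConfig is a hypothetical, heavily reduced config type.
type GlobalConfig struct {
	Headers map[string]string
}

// sharedDefault is the problematic pattern: one package-level value
// reused to seed every load.
var sharedDefault = GlobalConfig{Headers: map[string]string{}}

// DefaultGlobalConfig returns a fresh value on every call.
func DefaultGlobalConfig() GlobalConfig {
	return GlobalConfig{Headers: map[string]string{}}
}

// loadFromVar seeds the config from the shared variable. The struct is
// copied, but the map inside it is still the same map.
func loadFromVar(overrides map[string]string) GlobalConfig {
	cfg := sharedDefault
	for k, v := range overrides {
		cfg.Headers[k] = v
	}
	return cfg
}

// loadFromFunc seeds the config from a freshly built default.
func loadFromFunc(overrides map[string]string) GlobalConfig {
	cfg := DefaultGlobalConfig()
	for k, v := range overrides {
		cfg.Headers[k] = v
	}
	return cfg
}

func main() {
	loadFromVar(map[string]string{"X-Test": "1"})
	fmt.Println(len(loadFromVar(nil).Headers)) // 1: state from the first load persists

	loadFromFunc(map[string]string{"X-Test": "1"})
	fmt.Println(len(loadFromFunc(nil).Headers)) // 0: every load starts clean
}
```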

Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
2019-01-15 18:03:45 +01:00
Jason Roberts
b02afcad63 Support adding custom fields to VictorOps notifications (#1420)
* Support adding custom fields to VictorOps notifications

* Response to feedback

* Added logic to validate victorops custom fields at config load time

* Cleanup victorops notifier of logic duplicated in config check

* rebase and further cleanup from feedback

* another grammar fix

Signed-off-by: Jason Roberts <jroberts@drud.com>
2019-01-15 11:59:05 +01:00