Commit Graph

1790 Commits

Author SHA1 Message Date
beorn7
21de9ff88c Various improvements after code review
Most importantly, `api.New` now takes an `Options` struct as an
argument, which allows some other things done here as well:

- Timeout and concurrency limit are now in the options, streamlining
  the registration and the implementation of the limiting middleware.

- A local registry is used for metrics, and all metrics used so far
  inside the api packages now use it.

The 'in flight' metric now carries 'get' as the value of a method label.
I have also added a TODO to instrument other methods in the same way
(otherwise, the label doesn't really make sense semantically). I have
also added an explicit error counter for requests rejected because of
the concurrency limit. Such requests also show up as 503s in the generic
HTTP instrumentation (or would, if v2 were instrumented, too), but a 503
can have a number of causes, while users might want to alert on
concurrency limit problems explicitly.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-12 18:42:08 +01:00
beorn7
3382a0e949 Add HTTP instrumentation for GET requests in flight
While the newly added in-flight instrumentation works for all GET
requests, the existing HTTP instrumentation omits api/v2 calls. This
commit adds a TODO note about that.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-11 19:34:06 +01:00
beorn7
4747fd9b2f Propagate timeout to alert listing via context
The context is created by the http.TimeoutHandler we use to set the
timeout.

I believe this is the only endpoint where propagating the timeout is
feasible and needed.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-11 19:34:06 +01:00
beorn7
fc4b67ce80 Introduce a timeout and concurrency limit for HTTP requests
The default concurrency limit is max(GOMAXPROCS, 8). That does not
imply that each GET request eats a whole CPU. It is rather a reasonable
heuristic for the processing power of the hosting machine (while
allowing at least 8 concurrent requests even on the smallest machines).
As GET requests can easily overload the Alertmanager, rendering it
incapable of doing its main task, namely sending alert notifications,
we need to limit GET requests by default.

In contrast, no timeout is set by default. The http.TimeoutHandler
invokes quite a bit of machinery behind the scenes, in particular an
additional layer of buffering. Thus, we should first get a bit of
experience with it before we consider enforcing a timeout by default,
even if setting a timeout is in general the safer setting for
resiliency.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-11 19:34:06 +01:00
Simon Pasquier
b10646f9ac notify: factorize code truncating strings (#1752)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-11 18:07:14 +01:00
Max Inden
17e8cc04b8
Merge pull request #1738 from mxinden/update-memberlist
vendor: Update to hashicorp/memberlist v0.1.3
2019-02-11 10:42:17 +01:00
JoeWrightss
b926c6935e Fix some typos in comment (#1750)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-08 14:57:08 +01:00
Max Inden
46ff99aba7
Merge pull request #1741 from mxinden/register-marker-metrics
main.go: Move marker metric registering into types/types.go
2019-02-08 11:17:05 +01:00
Björn Rabenstein
59fd5df91b
Merge pull request #1747 from prometheus/beorn7/build
Fix build problems
2019-02-07 18:42:03 +01:00
Björn Rabenstein
9e8437b54a
Merge pull request #1749 from simonpasquier/master
Update Makefile.common
2019-02-07 17:57:34 +01:00
Simon Pasquier
f5797a0e79 Update Makefile.common
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-07 17:34:58 +01:00
beorn7
f7e4d7b375 Update scripts/errcheck_excludes.txt
With Go modules, the path appears un-vendored.

Plus, we are not calling AllowedLevel.Set anywhere anymore.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-07 16:29:12 +01:00
Nayana Thorat
5d1bd285ca promu.yml: Add support for s390x (#1742)
Signed-off-by: Nayana <nthorat@us.ibm.com>
2019-02-07 15:52:10 +01:00
beorn7
6f0e911dd1 Fix the assets make target
Presumably, it broke with the introduction of Go modules.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-07 15:11:13 +01:00
Max Leonard Inden
3a38db8faa
vendor: Update to hashicorp/memberlist v0.1.3
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-02-05 16:52:53 +01:00
Max Leonard Inden
09a7370572
main.go: Move marker metric registering into types/types.go
Instead of registering marker metrics inside of
cmd/alertmanager/main.go, register them in types/types.go, encapsulating
marker-specific logic in its own package rather than in main.go. In
addition, it paves the way for removing the usage of the global metric
registry in the future, as the registration now takes a local metric
registerer.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-02-05 14:59:22 +01:00
Max Inden
083639e31e
Merge pull request #1736 from mxinden/combine-apis
api: Combine v1 and v2 into generic api
2019-02-05 09:17:18 +01:00
Max Leonard Inden
c57542127d
api: Combine v1 and v2 into generic api
Instead of cmd/alertmanager/main.go instantiating and starting both api
v1 and v2, delegate that work to a generic api combining the two.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-02-04 14:31:33 +01:00
Max Inden
5c5ff9e769
Merge pull request #1728 from mxinden/nil-peer-master
api/v2: Make cluster status peers and name optional
2019-02-04 12:12:14 +01:00
Max Leonard Inden
8e157b3af5
api/v2: Make cluster status peers and name optional
If a user chooses to disable the Alertmanager cluster feature, there is
neither a cluster name nor any cluster peers. Hence these should be
optional. Only the cluster status is set to "disabled".

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-02-04 11:40:30 +01:00
JoeWrightss
5c61f4dbc8 Fix typo in README.md (#1687)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-01 16:49:57 +01:00
Max Inden
f1bf34b234
Merge pull request #1732 from mxinden/back-to-master
*: Cut v0.16.1 - back to master
2019-01-31 17:33:24 +01:00
Max Leonard Inden
d90c52d6a1
*: Cut v0.16.1
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-01-31 16:57:20 +01:00
Max Leonard Inden
2f055d9966
api/v2: Do not populate cluster info if clustering is disabled
When users start Alertmanager with `--cluster.listen-address=`, the
cluster will not be initialized, hence `api.peer` will be `nil`. So far,
this resulted in a nil pointer dereference when API v2 accessed the
`api.peer` field.

With this patch, API v2 skips populating the peers array, sets the name
to an empty string and sets the status to "disabled" in case `api.peer`
is nil.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-01-31 16:56:59 +01:00
Ye Ben
5f8eaf9560 cluster/delegate: Replace hardcoded labels with constants (#1724)
Signed-off-by: yeya24 <ben.ye@daocloud.io>
2019-01-28 10:17:55 +01:00
Stefan Büringer
fc1153560d trim PagerDuty message summary to 1024 chars, add PagerDuty debug log (#1701)
PagerDuty alerts are rejected (PagerDuty responds with 400 Bad Request)
when the summary field is longer than 1024 characters
(https://v2.developer.pagerduty.com/docs/send-an-event-events-api-v2).

Signed-off-by: Stefan Bueringer <sbueringer@gmail.com>
2019-01-24 14:38:35 +01:00
Hrishikesh Barman
23e7fec030 scripts/genproto.sh: Add version locking for protobuf extensions (#1707)
Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
2019-01-22 11:48:37 +01:00
Max Inden
3f5b6924e8
Merge pull request #1716 from mxinden/back-to-master
*: Cut v0.16.0 back to master
2019-01-22 10:26:13 +01:00
Max Leonard Inden
a0b7f6eea3
*: Cut v0.16.0
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-01-21 14:00:55 +01:00
Max Inden
705abf31c2
Merge pull request #1711 from mxinden/disable-redoc
api/v2: Disable serving swagger spec and redoc UI
2019-01-17 17:12:13 +01:00
Max Leonard Inden
7aa8ea9d9d
api/v2: Disable serving swagger spec and redoc UI
By default go-swagger serves the swagger spec and the redoc UI. This
patch disables both.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-01-17 16:19:29 +01:00
Max Inden
96a0f06d3d
Merge pull request #1710 from mxinden/back-to-master
*: Cut v0.16.0-beta.0 back to master
2019-01-16 14:51:30 +01:00
Max Leonard Inden
71997ffc49
*: Cut v0.16.0-beta.0
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-01-15 21:51:12 +01:00
Hrishikesh Barman
4e424e3cd6 config: Change DefaultGlobalConfig to a function (#1656)
The variable DefaultGlobalConfig was used to initialize values, but because it was shared, state from a previous initialization persisted into later ones.

In this PR, DefaultGlobalConfig is changed to a function so that it returns a fresh GlobalConfig for each initialization.
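Why a package-level default leaks state, and why a function fixes it, can be shown with a cut-down example. `GlobalConfig` here is a hypothetical struct with a map field, and `defaultGlobalConfigVar` is an illustrative name for the old variable; any reference-typed field (map, slice, pointer) behaves the same way, since copying the struct copies only the reference.

```go
package main

import "fmt"

// GlobalConfig is a cut-down hypothetical config with a reference-typed field.
type GlobalConfig struct {
	SMTPFrom string
	Headers  map[string]string
}

// Before: a package-level default. Copying the struct only copies the map
// reference, so every copy shares (and mutates) the same underlying map.
var defaultGlobalConfigVar = GlobalConfig{Headers: map[string]string{}}

// After: a function returns a fresh value, with a fresh map, on every call.
func DefaultGlobalConfig() GlobalConfig {
	return GlobalConfig{Headers: map[string]string{}}
}

func main() {
	first := defaultGlobalConfigVar
	first.Headers["X-From-First-Load"] = "1"
	second := defaultGlobalConfigVar
	fmt.Println(len(second.Headers)) // 1: the first load leaked into the second

	a := DefaultGlobalConfig()
	a.Headers["X-From-First-Load"] = "1"
	b := DefaultGlobalConfig()
	fmt.Println(len(b.Headers)) // 0
}
```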

Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
2019-01-15 18:03:45 +01:00
Jason Roberts
b02afcad63 Support adding custom fields to VictorOps notifications (#1420)
* Support adding custom fields to VictorOps notifications

* Response to feedback

* Added logic to validate victorops custom fields to config load time

* Cleanup victorops notifier of logic duplicated in config check

* rebase and further cleanup from feedback

* another grammar fix
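Validating custom fields at config load time, as in the VictorOps commit above, could be sketched as follows. The reserved key set is an illustrative assumption rather than the authoritative list, and `validateCustomFields` is a hypothetical name; the idea is that a custom field colliding with a key the notifier sets itself fails at startup instead of silently overwriting data at notification time.

```go
package main

import (
	"fmt"
	"sort"
)

// reservedFields is an illustrative set of keys the notifier fills in
// itself; treat it as an assumption, not the authoritative list.
var reservedFields = map[string]bool{
	"routing_key":         true,
	"message_type":        true,
	"state_message":       true,
	"entity_display_name": true,
	"monitoring_tool":     true,
	"entity_id":           true,
	"entity_state":        true,
}

// validateCustomFields rejects custom fields that would overwrite reserved
// ones. Calling it while loading the config turns a bad field into a
// startup error.
func validateCustomFields(fields map[string]string) error {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic: report the same key on every run
	for _, k := range keys {
		if reservedFields[k] {
			return fmt.Errorf("victorops_config: custom field %q is reserved", k)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateCustomFields(map[string]string{"team": "sre"}))
	fmt.Println(validateCustomFields(map[string]string{"entity_id": "x"}))
}
```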

Signed-off-by: Jason Roberts <jroberts@drud.com>
2019-01-15 11:59:05 +01:00
stuart nelson
dba283edd0
respect regex matchers when recreating silences (#1697)
* Respect regexes when recreating silences
* Generate assets

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2019-01-09 10:33:43 +01:00
stuart nelson
b437240bd9
Stn/update alert compact view (#1698)
* Remove inhibited/silenced text

In the alert list, this is already conveyed by the
icons. In the silence preview, the alerts shown are
by definition affected by the silence.

* Generate assets

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2019-01-08 16:26:12 +01:00
Hrishikesh Barman
dc74b6a15b support for assuming first label is alertname in silence add and query (#1693)
* simplified treating the first label as the alertname in cli/silence_query.go
* added the same first-label-is-alertname assumption when adding silences
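The "first label is the alertname" convention for `silence add` and `silence query` can be sketched as a small argument parser. This is a simplified, hypothetical `parseMatchers` that only handles equality matchers (no `=~`, `!=` or quoting); a first argument without `=` becomes `alertname=<arg>`, while a bare value in any later position is rejected.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMatchers turns CLI arguments into equality matchers. If the first
// argument has no "=", it is taken to be the alert name, so
// `HighLatency job=api` means alertname=HighLatency,job=api.
func parseMatchers(args []string) (map[string]string, error) {
	matchers := make(map[string]string, len(args))
	for i, arg := range args {
		name, value, ok := strings.Cut(arg, "=")
		if !ok {
			if i != 0 {
				return nil, fmt.Errorf("invalid matcher %q: only the first argument may omit the label name", arg)
			}
			name, value = "alertname", arg
		}
		matchers[name] = value
	}
	return matchers, nil
}

func main() {
	m, err := parseMatchers([]string{"HighLatency", "job=api"})
	fmt.Println(m, err) // map[alertname:HighLatency job:api] <nil>
}
```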

Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
2019-01-07 13:49:41 +01:00
Brian Brazil
7078333202 Make a copy of firing alerts with EndsAt=0 when flushing. (#1686)
If the original EndsAt is left in place, then once time moves past
the EndsAt, firing alerts will be rendered and treated as resolved
alerts, which can cause confusion and races. This is most likely to
happen on retries for a notification.

Mitigate race and fix data races in TestAggrGroup.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-04 16:52:20 +01:00
Simon Pasquier
b676fa79c0 *: update Makefile.common with new staticcheck (#1692)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-04 15:37:33 +01:00
Tomas Dabasinskas
cfc0d9c558 Pushover: support HTML, URL title and custom sounds (#1634)
* Support HTML inside Pushover message
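The three Pushover options named in this commit map to form fields of the Pushover messages API (`html`, `url_title`, `sound`). The sketch below assumes those field names; the `Message` type and `Form` method are hypothetical, and optional keys are only sent when set, so existing configurations behave as before.

```go
package main

import (
	"fmt"
	"net/url"
)

// Message holds the fields for one Pushover notification (hypothetical type).
type Message struct {
	Token, User, Text string
	URL, URLTitle     string // optional supplementary link and its label
	Sound             string // optional; empty keeps the user's default sound
	HTML              bool   // ask Pushover to render basic HTML tags
}

// Form encodes the message, adding optional keys only when they are set.
func (m Message) Form() url.Values {
	v := url.Values{}
	v.Set("token", m.Token)
	v.Set("user", m.User)
	v.Set("message", m.Text)
	if m.HTML {
		v.Set("html", "1")
	}
	if m.URL != "" {
		v.Set("url", m.URL)
		if m.URLTitle != "" {
			v.Set("url_title", m.URLTitle)
		}
	}
	if m.Sound != "" {
		v.Set("sound", m.Sound)
	}
	return v
}

func main() {
	m := Message{Token: "app-token", User: "user-key", Text: "<b>disk full</b>",
		URL: "https://example.com/runbook", URLTitle: "Runbook", Sound: "siren", HTML: true}
	fmt.Println(m.Form().Encode())
}
```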

Signed-off-by: Tomas Dabasinskas <tomas@dabasinskas.net>
2018-12-18 15:15:30 +01:00
Simon Pasquier
16be34fed8 Bump prometheus/client_golang to v0.9.2 (#1670)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-12-17 11:05:40 +01:00
Simon Pasquier
9a116736ef api/v2: Add CORS support (#1667)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-12-16 14:05:34 +01:00
JoeWrightss
9ccbeb585b cluster: Fix typo in comment (#1668)
Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>
2018-12-16 14:03:55 +01:00
Paul Traylor
cd4a524848 Update prometheus/common and add support for --log.format (#1658)
Signed-off-by: Paul Traylor <paul.traylor@linecorp.com>
2018-12-13 12:58:43 +01:00
stuart nelson
082b1efed0
Fix #1662 (#1665)
GroupByAll and a duplicate GroupBy were showing up
in the marshaled config, which we don't want.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-12-13 11:51:49 +01:00
Simon Pasquier
34f78c9146 config: fix unmarshalling of secret URLs (#1663)
* config: fix unmarshalling of secret URLs

* Add comment describing why we need the special case

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-12-13 11:09:02 +01:00
Sylvain Rabot
f1d33fbcc2 Travis: specify go_import_path (#1653)
Signed-off-by: Sylvain Rabot <s.rabot@lectra.com>
2018-12-10 15:37:13 +01:00
Max Inden
c850bdd334
Merge pull request #1652 from prometheus/release-0.16
CHANGELOG.md: Fix date typo, back to master
2018-12-09 12:33:43 +01:00
Hrishikesh Barman
78914f868d added documentation (#1654)
* ref #1610 : added documentation

Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
2018-12-08 12:05:47 +01:00