Most importantly, `api.New` now takes an `Options` struct as an
argument, which enables the other changes made here as well:
- Timeout and concurrency limit are now in the options, streamlining
the registration and the implementation of the limiting middleware.
- A local registry is used for metrics, and the metrics used so far
inside any of the api packages are using it now.
The 'in flight' metric now carries 'get' as a method label. I
have also added a TODO to instrument other methods in the same way
(otherwise, the label doesn't really make sense semantically). I have
also added an explicit error counter for requests rejected because of
the concurrency limit. Those requests also show up as 503s in the
generic HTTP instrumentation (or they would, if v2 were instrumented,
too), but 503s can have a number of causes, while users might want to
alert on concurrency limit problems explicitly.
Signed-off-by: beorn7 <beorn@soundcloud.com>
While the newly added in-flight instrumentation works for all GET
requests, the existing HTTP instrumentation omits api/v2 calls. This
commit adds a TODO note about that.
Signed-off-by: beorn7 <beorn@soundcloud.com>
The context is created by the http.TimeoutHandler we use to set the
timeout.
I believe this is the only endpoint where propagating the timeout is
feasible and needed.
Signed-off-by: beorn7 <beorn@soundcloud.com>
The default concurrency limit is max(GOMAXPROCS, 8). That should not
imply that each GET request eats a whole CPU. It's more to get some
reasonable heuristics for the processing power of the hosting machine
(while allowing at least 8 concurrent requests even on the smallest
machines). As GET requests can easily overload the Alertmanager,
rendering it incapable of doing its main task, namely sending alert
notifications, we need to limit GET requests by default.
In contrast, no timeout is set by default. The http.TimeoutHandler
invokes quite a bit of machinery behind the scenes, in particular an
additional layer of buffering. Thus, we should first get a bit of
experience with it before we consider enforcing a timeout by default,
even if setting a timeout is in general the safer setting for
resiliency.
Signed-off-by: beorn7 <beorn@soundcloud.com>
With Go modules, the path appears un-vendored.
Plus, we are not calling AllowedLevel.Set anywhere anymore.
Signed-off-by: beorn7 <beorn@soundcloud.com>
Instead of registering marker metrics inside of
cmd/alertmanager/main.go, register them in types/types.go, encapsulating
marker-specific logic in its own module, not in main.go. In addition, it
paves the way for removing the usage of the global metric registry in
the future, by taking a local metric registerer.
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
Instead of cmd/alertmanager/main.go instantiating and starting both api
v1 and v2, delegate that work to a generic api combining the two.
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
If a user chooses to disable the Alertmanager cluster feature, there is
neither a cluster name nor cluster peers. Hence these should be optional. Only
cluster status is set to "disabled".
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
When users start Alertmanager with `--cluster.listen-address=`, the
cluster will not be initialized, hence api.peer will be `nil`. So far
this would result in a nil pointer dereference by the API v2 accessing
the api.peer field.
With this patch, api v2 skips populating the peers array, sets the name
to an empty string and the status to "disabled" in case `api.peer` is
nil.
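A minimal sketch of the nil check, with simplified stand-in types: `Peer`, `ClusterStatus`, `clusterStatus`, and the "ready" status for the enabled case are assumptions for this example, not the real API v2 models.

```go
package main

// Peer is a hypothetical stand-in for the cluster peer held in api.peer.
type Peer struct {
	name    string
	members []string
}

// ClusterStatus is a simplified stand-in for the API v2 response model.
type ClusterStatus struct {
	Name   string
	Status string
	Peers  []string
}

// clusterStatus reports "disabled" with an empty name and no peers when
// clustering is off (p == nil) instead of dereferencing a nil pointer.
func clusterStatus(p *Peer) ClusterStatus {
	if p == nil {
		return ClusterStatus{Status: "disabled", Peers: []string{}}
	}
	// "ready" is an assumed status value for the enabled case.
	return ClusterStatus{Name: p.name, Status: "ready", Peers: p.members}
}
```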
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
The variable DefaultGlobalConfig was used to initialize values, but it retained state from earlier use, so some values persisted into later initializations.
This PR changes DefaultGlobalConfig into a function so that it returns a fresh GlobalConfig for each initialization.
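The difference can be sketched with a heavily simplified `GlobalConfig`. The two fields and their default values are assumptions chosen for this example; the point is only that a function builds a new value on every call, so mutations made after one call cannot leak into the next.

```go
package main

// GlobalConfig is a simplified stand-in; the fields are hypothetical.
type GlobalConfig struct {
	SMTPHello string
	Headers   map[string]string
}

// DefaultGlobalConfig returns a fresh GlobalConfig on every call.
// A package-level variable would instead share one map between all users.
func DefaultGlobalConfig() GlobalConfig {
	return GlobalConfig{
		SMTPHello: "localhost",
		Headers:   map[string]string{},
	}
}
```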
Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
* Support adding custom fields to VictorOps notifications
* Respond to feedback
* Added logic to validate VictorOps custom fields at config load time
* Cleanup victorops notifier of logic duplicated in config check
* rebase and further cleanup from feedback
* another grammar fix
Signed-off-by: Jason Roberts <jroberts@drud.com>
* Remove inhibited/silenced text
In the alert list, this is already shown by the
icons. In the silence preview, an alert is clearly
affected by the silence simply by being listed
there.
* Generate assets
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
* simplified setting first assumed alertname in cli/silence_query.go
* added assumed first label to alertname when adding silences
Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
If the original EndsAt is left in place, then once time moves past
that EndsAt, firing alerts will be rendered and treated as resolved
alerts, which can cause confusion and races. This is most likely to
happen on retries for a notification.
Mitigate race and fix data races in TestAggrGroup.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
GroupByAll and a duplicate GroupBy were showing up
in the marshaled config, which we don't want.
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>