A Peer as defined by the `cluster` package represents the node in the
cluster. It is used in other packages to know the status of all of the
members or how long should we wait to know if a notification has already fired.
In Cortex, we'd like to implement a slightly different way of
clustering (using gRPC for communication and a
hash ring for node discovery).
This is a small change to support that by changing the consumer of other
packages to an interface.
Silences and Notification channels don't need an interface as they take
a `func([]byte) error` as a parameter.
Signed-off-by: gotjosh <josue@grafana.com>
* Make filter labels consistent with Prometheus
Filtering the alert out when the label is missing precludes a
possible match for an empty value. This change allows the
match to be evaluated.
Closes#2342
Signed-off-by: Victor Araujo <vear91@gmail.com>
* Add tests for matchFilterLabels in v2 api
Signed-off-by: Victor Araujo <vear91@gmail.com>
* api/v2: add path and method to API v2 logs
When an API v2 handler logged a message, the log wouldn't include the
path and method. Since different handlers perform the same validations
(e.g. matchers for alerts and silences), it isn't easy to know which
handler was invoked (though the logged filename
+ line number provides a hint).
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Capitalize messages + improve logs
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Fix an error message about start and end time validation
Signed-off-by: Célian Garcia <celian.garcia@amadeus.com>
* Modified start and end time validation message to be affirmative
Signed-off-by: Célian Garcia <celian.garcia@amadeus.com>
When the client attempts to update a silence with a non-existent
ID, respond with a 404 (Not Found) instead of a 400 (Bad Request).
Signed-off-by: Paul Gier <pgier@redhat.com>
- Move the generated api/v2 client code out of the test directory
and into the api/v2 directory with models and restapi.
- Remove duplicate models directory
- Update tests to use api/v2 package for models and client
Signed-off-by: Paul Gier <pgier@redhat.com>
- make clean shouldn't print errors when files/directories have already
been removed
- add copyright header to generated api files to pass license check
Signed-off-by: Paul Gier <pgier@redhat.com>
Sync Makefile.common to latest which updates promu version
and adds license check to default target.
Add missing license headers.
Signed-off-by: Paul Gier <pgier@redhat.com>
* api/v2: sort silences similarly to v1 api
Sort the queried silences to match behaviour in the v1 api.
Sort silences in-place instead of creating multiple slices.
Use separate function for sorting silences for easier testing.
Add unit test for sort order.
Signed-off-by: Paul Gier <pgier@redhat.com>
Add version tracking of silences states. Adding a silence to the state
increments the version. If the version hasn't changed since the last
time an alert was checked for being silenced, we only have to verify
that the relevant silences are still active rather than checking the
alert against all silences.
Signed-off-by: beorn7 <beorn@soundcloud.com>
This encapsulates the logic of querying and marking silenced
alerts. It removes the code duplication flagged earlier.
I removed the error returned by the setAlertStatus function as we were
only logging it, and that's already done anyway when the error is
received from the `silence.Query` call (now in the `Mutes` method).
Signed-off-by: beorn7 <beorn@soundcloud.com>
This changes removes all usage of golang.org/x/net/context in the code
base. It also bumps a few dependencies for the same reason:
- github.com/gogo/protobuf
- go-openapi/*
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Instead of handling all config specific logic inside
Alertmangaer.main(), this patch introduces the config coordinator
component.
Tasks of the config coordinator:
- Load and parse configuration
- Notify subscribers on configuration changes
- Register and manage configuration specific metrics
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
Most importantly, `api.New` now takes an `Options` struct as an
argument, which allows some other things done here as well:
- Timout and concurrency limit are now in the options, streamlining
the registration and the implementation of the limiting middleware.
- A local registry is used for metrics, and the metrics used so far
inside any of the api packages are using it now.
The 'in flight' metric now contains the 'get' as a method label. I
have also added a TODO to instrument other methods in the same way
(otherwise, the label doesn't reall make sense, semantically). I have
also added an explicit error counter for requests rejected because of
the concurrency limit. (They also show up as 503s in the generic HTTP
instrumentation (or they would, if v2 were instrumented, too), but
those 503s might have a number of reasons, while users might want to
alert on concurrency limit problems explicitly).
Signed-off-by: beorn7 <beorn@soundcloud.com>
While the newly added in-flight instrumentation works for all GET
requests, the existing HTTP instrumentation omits api/v2 calls. This
commit adds a TODO note about that.
Signed-off-by: beorn7 <beorn@soundcloud.com>
The context is created by the http.TimeoutHandler we use to set the
timeout.
I believe this is the only endpoint where propagating the timeout is
feasible and needed.
Signed-off-by: beorn7 <beorn@soundcloud.com>
The default concurrency limit is max(GOMAXPROCS, 8). That should not
imply that each GET requests eats a whole CPU. It's more to get some
reasonable heuristics for the processing power of the hosting machine
(while allowing at least 8 concurrent requests even on the smallest
machines). As GET requests can easily overload the Alertmanager,
rendering it incapable of doing its main task, namely sending alert
notifications, we need to limit GET requests by default.
In contrast, no timeout is set by default. The http.TimeoutHandler
inovkes quite a bit of machinery behind the scenes, in particular an
additional layer of buffering. Thus, we should first get a bit of
experience with it before we consider enforcing a timeout by default,
even if setting a timeout is in general the safer setting for
resiliency.
Signed-off-by: beorn7 <beorn@soundcloud.com>
Instead of cmd/alertmanager/main.go instantiating and starting both api
v1 and v2, delegate that work to a generic api combining the two.
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>