* chore!: adopt log/slog, drop go-kit/log
The bulk of this change set was automated by the following script which
is being used to aid in converting the various exporters/projects to use
slog:
https://gist.github.com/tjhop/49f96fb7ebbe55b12deee0b0312d8434
This commit includes several changes:
- bump exporter-toolkit to v0.13.1 for log/slog support
- updates golangci-lint deprecated configs
- enables sloglint linter
- removes old go-kit/log linter configs
- introduce some `if logger == nil { $newLogger }` additions to prevent
nil references
- converts cluster membership config to use a stdlib compatible slog
adapter, rather than creating a custom io.Writer for use as the
membership `logOutput` config
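
As an illustration of the nil guard mentioned above, here is a minimal sketch. The `Component` type and `NewComponent` constructor are hypothetical names, and the choice of a discarding text handler as the fallback is an assumption; the commit only says that some new logger is substituted.

```go
package main

import (
	"io"
	"log/slog"
)

// Component is a hypothetical type that accepts an optional logger.
type Component struct {
	logger *slog.Logger
}

// NewComponent falls back to a discarding logger when none is supplied,
// so later calls such as c.logger.Info never dereference a nil pointer.
func NewComponent(logger *slog.Logger) *Component {
	if logger == nil {
		// Assumption: a no-op logger is an acceptable default here.
		logger = slog.New(slog.NewTextHandler(io.Discard, nil))
	}
	return &Component{logger: logger}
}

func main() {
	c := NewComponent(nil)
	c.logger.Info("safe to log", "component", "example")
}
```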
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* chore: address PR feedback
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
---------
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
This commit replaces the archived and no longer maintained
benbjohnson/clock package with coder/quartz.
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit fixes a bug where the MaxSilenceSizeBytes limit can
cause an incomplete update of existing silences, where the old
silence can be expired but the new silence cannot be created
because it would exceed the maximum size limit.
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit fixes a bug where an invalid silence causes incomplete
updates of existing silences. This is fixed by moving validation
out of the setSilence method and putting it at the start of the
Set method instead.
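
A rough sketch of the idea, validating before any state is touched. The `Silence` fields, the `validate` rule, and the map-backed store are simplified stand-ins, not the actual types in the silence package.

```go
package main

import (
	"errors"
	"fmt"
)

// Silence is a hypothetical, trimmed-down silence used only for this sketch.
type Silence struct {
	ID      string
	Comment string
}

// Silences is a hypothetical in-memory store keyed by silence ID.
type Silences struct {
	st map[string]Silence
}

// validate rejects silences without a comment (a stand-in for real validation).
func validate(sil Silence) error {
	if sil.Comment == "" {
		return errors.New("comment missing")
	}
	return nil
}

// Set validates before any state is touched, so an invalid silence
// cannot leave a half-applied update behind.
func (s *Silences) Set(sil Silence) error {
	if err := validate(sil); err != nil {
		return fmt.Errorf("invalid silence: %w", err)
	}
	s.setSilence(sil)
	return nil
}

// setSilence only stores; it assumes sil was already validated by Set.
func (s *Silences) setSilence(sil Silence) {
	s.st[sil.ID] = sil
}

func main() {
	s := &Silences{st: map[string]Silence{"a": {ID: "a", Comment: "original"}}}
	err := s.Set(Silence{ID: "a"}) // invalid: no comment
	fmt.Println(err, s.st["a"].Comment)
}
```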
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Fix MaxSilences limit causes incomplete updates of existing silences
This commit fixes a bug where the MaxSilences limit can cause an
incomplete update of existing silences, where the old silence can
be expired but the new silence cannot be created because it would
exceed the maximum number of silences.
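
One way to picture the fix is to check the limit before mutating anything, so that a rejected update leaves the old silence as it was. The sketch below assumes a hypothetical `store` type with a `replace` method; it also assumes the count includes expired silences and that 0 means no limit.

```go
package main

import (
	"errors"
	"fmt"
)

var errTooMany = errors.New("exceeded maximum number of silences")

// store is a hypothetical, simplified silence store keyed by ID.
type store struct {
	max      int             // 0 means no limit (assumption)
	silences map[string]bool // true while the silence is not expired
	nextID   int
}

// replace swaps the silence oldID for a new one. The limit is checked
// before anything is mutated, so a rejected update leaves oldID untouched.
func (s *store) replace(oldID string) (string, error) {
	if s.max > 0 && len(s.silences)+1 > s.max {
		return "", errTooMany
	}
	s.silences[oldID] = false // expire the old silence
	s.nextID++
	newID := fmt.Sprintf("s%d", s.nextID)
	s.silences[newID] = true
	return newID, nil
}

func main() {
	s := &store{max: 1, silences: map[string]bool{"s0": true}}
	_, err := s.replace("s0")
	// The update is rejected and s0 is still active.
	fmt.Println(err, s.silences["s0"])
}
```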
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Rename matchers package to matcher singular
I realized that we had named the package plural "matchers" when
it's idiomatic in Go to use singular package names.
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit removes the Id from the method silences.Set(*pb.Silence)
as it is redundant. The Id is still set even when creating a silence
fails. This will be fixed in a later change.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Rename silence limit to max-silence-size-bytes
This commit renames an existing (unreleased) limit from
max-per-silence-bytes to max-silence-size-bytes.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Update help
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Silence limits as functions
This commit changes silence limits from a struct of ints to a struct
of functions that return individual limits. This allows limits
to be lazy-loaded and updated without having to call silences.New().
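
A minimal sketch of limits expressed as functions. The field names mirror the limits named in these commits, but the `Silences` type, the `canAdd` helper, and the "nil or non-positive means no limit" convention are assumptions made for illustration.

```go
package main

import "fmt"

// Limits is a struct of functions; each is consulted at check time, so a
// changed limit takes effect without rebuilding the store.
type Limits struct {
	MaxSilences         func() int
	MaxSilenceSizeBytes func() int
}

// Silences is a hypothetical store that only tracks a count.
type Silences struct {
	limits Limits
	count  int
}

// canAdd reports whether one more silence fits. A nil function or a
// non-positive value is treated as "no limit" in this sketch.
func (s *Silences) canAdd() bool {
	if s.limits.MaxSilences == nil {
		return true
	}
	m := s.limits.MaxSilences()
	return m <= 0 || s.count+1 <= m
}

func main() {
	limit := 1
	s := &Silences{
		limits: Limits{MaxSilences: func() int { return limit }},
		count:  1,
	}
	fmt.Println(s.canAdd())
	limit = 5 // picked up on the next check, no new Silences needed
	fmt.Println(s.canAdd())
}
```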
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Add explicit test for no limits
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Fix run()
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Limits should include expired silences
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Fix docs
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Add limits for silences
This commit adds limits for silences including the maximum number
of active and pending silences, and the maximum size per silence
(in bytes).
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Remove default limits
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Allow expiration of silences that exceed max size
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Add GroupMarker interface
This commit adds a new GroupMarker interface that marks the status
of groups. For example, whether an alert is muted because of one
or more active or mute time intervals.
It renames the existing Marker interface to AlertMarker to avoid
confusion.
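
To give a feel for what such an interface could look like, here is a simplified sketch. The method set and the in-memory implementation are hypothetical and may differ from the interface actually added in this commit.

```go
package main

import (
	"fmt"
	"sync"
)

// GroupMarker here is a hypothetical, simplified shape: it records which
// time intervals currently mute a group.
type GroupMarker interface {
	SetMuted(groupKey string, timeIntervalNames []string)
	Muted(groupKey string) ([]string, bool)
}

// memMarker is an in-memory implementation guarded by a mutex.
type memMarker struct {
	mtx   sync.RWMutex
	muted map[string][]string
}

func newMemMarker() *memMarker {
	return &memMarker{muted: map[string][]string{}}
}

// SetMuted stores the muting intervals; an empty list clears the mark.
func (m *memMarker) SetMuted(groupKey string, names []string) {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	if len(names) == 0 {
		delete(m.muted, groupKey)
		return
	}
	m.muted[groupKey] = names
}

// Muted returns the muting intervals and whether the group is muted.
func (m *memMarker) Muted(groupKey string) ([]string, bool) {
	m.mtx.RLock()
	defer m.mtx.RUnlock()
	names, ok := m.muted[groupKey]
	return names, ok
}

func main() {
	var gm GroupMarker = newMemMarker()
	gm.SetMuted("group-a", []string{"weekends"})
	fmt.Println(gm.Muted("group-a"))
}
```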
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
Note that this does not stop showing classic metrics. For now,
it is up to the scrape config to decide whether to keep those instead or
both.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
This commit removes some code that should have been removed in #3668.
The FeatureFlags in silence.Options are no longer used but were
still initialized. These had a no-op effect.
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit fixes inconsistent UTF-8 behavior if the compat package is
not initialized and feature flags are not passed to the API. This can
happen when Alertmanager is used as a package in software such
as Cortex or Mimir.
The inconsistent behavior is that Alertmanager will accept UTF-8 alerts
but reject UTF-8 configurations.
Since feature flags are optional via api.Options, we cannot force them
to be passed to api.New at compile time. Instead, it's better to defer
back to the compat package which is consistent even when not initialized.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Support UTF-8 label matchers: Use compat package in Alertmanager server
This pull request adds use of the compat package in Alertmanager server that will allow users to switch between the new matchers/parse parser and the old pkg/labels parser. The new matchers/parse parser uses a fallback mechanism where if the input cannot be parsed in the new parser it then attempts to use the old parser. If an input is parsed in the old parser but not the new parser then a warning log is emitted.
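
The fallback mechanism described above can be sketched roughly as follows. The `parseFn` signature, the `strict` and `lenient` stand-in parsers, and the choice to return the primary parser's error when both fail are all assumptions for illustration, not the compat package's API.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseFn is a hypothetical parser signature.
type parseFn func(input string) (string, error)

// withFallback tries primary first, then secondary. warn is called when
// only the secondary (old) parser accepts the input.
func withFallback(primary, secondary parseFn, warn func(msg string)) parseFn {
	return func(input string) (string, error) {
		out, err := primary(input)
		if err == nil {
			return out, nil
		}
		out, fallbackErr := secondary(input)
		if fallbackErr != nil {
			return "", err // assumption: report the primary parser's error
		}
		warn(fmt.Sprintf("input %q only parsed with the fallback parser", input))
		return out, nil
	}
}

// strict stands in for the new parser: it rejects spaces.
func strict(in string) (string, error) {
	if strings.Contains(in, " ") {
		return "", errors.New("unexpected space")
	}
	return in, nil
}

// lenient stands in for the old parser: it strips spaces.
func lenient(in string) (string, error) {
	if in == "" {
		return "", errors.New("empty input")
	}
	return strings.ReplaceAll(in, " ", ""), nil
}

func main() {
	parse := withFallback(strict, lenient, func(msg string) { fmt.Println("warn:", msg) })
	fmt.Println(parse("a=b"))
	fmt.Println(parse("a = b"))
}
```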
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Metrics: Silence maintenance success and failure
For various reasons, we've observed different kinds of errors in this area, from read-only disks to silly code bugs.
Errors during maintenance are effectively a _data loss_ and therefore we should encourage proper monitoring of this area.
This PR introduces a total and failure metric for silence maintenance. If agreed, I'll do the same for the nflog and fix the flaky test like I did for silences while I'm there.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Refactor nflog configuration options to make it similar to Silences.
The Notification Log is a similar component to Silences. They're the only two things that are shared between nodes when running in HA and they both hold some sort of internal state that needs to be cleaned up on an interval.
To simplify the code and make it a bit more understandable (among other benefits such as improved testability), I've refactored the notification log configuration and `run` to be similar to the silences.
github.com/benbjohnson/clock provides a time interface to programs
rather than using the stdlib time package. This allows mocking time in
programs and tests. In this commit, the clock is used to speed up and
simplify testing of the silences package.
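
The general technique looks roughly like the sketch below. It uses a hand-rolled `Clock` interface and `fakeClock` rather than the library's own types, so every name here is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Clock is a hypothetical minimal interface; real libraries expose more.
type Clock interface {
	Now() time.Time
}

// realClock delegates to the stdlib and is what production code would use.
type realClock struct{}

func (realClock) Now() time.Time { return time.Now() }

// fakeClock only moves when the test tells it to.
type fakeClock struct{ now time.Time }

func (f *fakeClock) Now() time.Time          { return f.now }
func (f *fakeClock) Advance(d time.Duration) { f.now = f.now.Add(d) }

// expired reports whether endsAt has been reached according to the clock.
func expired(c Clock, endsAt time.Time) bool {
	return !c.Now().Before(endsAt)
}

// demo checks expiry before and after advancing the fake clock by an hour,
// without any real waiting.
func demo() (before, after bool) {
	clk := &fakeClock{now: time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)}
	endsAt := clk.Now().Add(time.Hour)
	before = expired(clk, endsAt)
	clk.Advance(time.Hour)
	after = expired(clk, endsAt)
	return before, after
}

func main() {
	fmt.Println(demo())
}
```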
Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
This accurately reflects what the function _actually_ does. If no active silence IDs are provided and the list of inhibitions we have is already empty, the alert is actually set to Active. It took me a while to realise this while I was working out how we populate the alert list.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
so third parties, Grafana in particular, can override the validation.
Grafana wants to do this because other data sources will have label keys with things like spaces, periods, or other characters, and it is looking for a better integration with Alertmanager.
goes with grafana/grafana#38629
replaces https://github.com/prometheus/alertmanager/pull/2694
Signed-off-by: Kyle Brandt <kyle@grafana.com>
https://github.com/prometheus/alertmanager/pull/2689 introduced a
regression where the default maintenance function would no longer be
called even if no override was specified. The Alertmanager now crashes
on any silence maintenance run without this fix.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Enable support for custom callbacks as part of maintenance
This enables support for custom Maintenance callbacks as part of the periodic maintenance of silences and notification logs.
Effectively a no-op for the Alertmanager but allows downstream implementation to inject custom logic as part of it.
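
A minimal sketch of an optional maintenance callback with a default. The `MaintenanceFunc` type, `runOnce`, and `defaultMaintenance` are illustrative names; the key point is that a nil override falls back to the built-in behaviour instead of being called.

```go
package main

import "fmt"

// MaintenanceFunc is a hypothetical callback type returning the number of
// bytes written and an error.
type MaintenanceFunc func() (int64, error)

// Silences is a hypothetical store that counts default maintenance runs.
type Silences struct {
	snapshots int
}

// defaultMaintenance stands in for the built-in cleanup and snapshot step.
func (s *Silences) defaultMaintenance() (int64, error) {
	s.snapshots++
	return 0, nil
}

// runOnce performs one maintenance cycle. When no override is supplied,
// the default function is used, so a nil callback is never invoked.
func (s *Silences) runOnce(override MaintenanceFunc) error {
	do := override
	if do == nil {
		do = s.defaultMaintenance
	}
	_, err := do()
	return err
}

func main() {
	s := &Silences{}
	_ = s.runOnce(nil) // default path
	_ = s.runOnce(func() (int64, error) { return 42, nil })
	fmt.Println(s.snapshots)
}
```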
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Add tests
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Fix tests and remove whitespace
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Address review comments
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* run go fmt
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Fix import ordering
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Previously, if a pending silence existed for an alert, and it later
became active without any silences getting added in the meantime, we
would miss the existence of that newly active silence.
Signed-off-by: beorn7 <beorn@grafana.com>
This has been discussed in #2479. Even if the conclusion there was
that we don't need this in a bugfix release, it's still better to have
this kind of robustness. So this introduces the same check into the
main branch.
Signed-off-by: beorn7 <beorn@grafana.com>
* check if at least one silence matcher doesn't match empty strings
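
A sketch of such a check, assuming a hypothetical `Matcher` type with equality and regex matching only, and assuming regex values are anchored:

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Matcher is a hypothetical, simplified label matcher.
type Matcher struct {
	Name    string
	Value   string
	IsRegex bool
}

// matchesEmpty reports whether the matcher would match an empty value.
func (m Matcher) matchesEmpty() (bool, error) {
	if !m.IsRegex {
		return m.Value == "", nil
	}
	// Assumption: regex matchers are fully anchored.
	re, err := regexp.Compile("^(?:" + m.Value + ")$")
	if err != nil {
		return false, err
	}
	return re.MatchString(""), nil
}

// validateMatchers requires at least one matcher that does not match the
// empty string; otherwise the silence would apply to every alert.
func validateMatchers(ms []Matcher) error {
	for _, m := range ms {
		empty, err := m.matchesEmpty()
		if err != nil {
			return fmt.Errorf("invalid matcher %q: %w", m.Name, err)
		}
		if !empty {
			return nil
		}
	}
	return errors.New("at least one matcher must not match the empty string")
}

func main() {
	fmt.Println(validateMatchers([]Matcher{{Name: "job", Value: ".*", IsRegex: true}}))
	fmt.Println(validateMatchers([]Matcher{{Name: "job", Value: "api"}}))
}
```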
Signed-off-by: qoops <ilya.v.gladyshev@gmail.com>
* fixed grammar
Signed-off-by: qoops <ilya.v.gladyshev@gmail.com>
With the next release of client_golang, Summaries will not have
objectives by default. Interestingly, this will do the right thing for
the Summaries affected by this commit. However, right now those
summaries do get the old default objectives. They don't really make
sense because the affected Summaries receive Observations quite
infrequently (far less than once in the 10m max age currently
used). To not get surprising changes when moving on to client_golang
v1, let's explicitly set the Summaries as objective-less now.
Signed-off-by: beorn7 <beorn@grafana.com>
Essentially, the Silences.Expire() will in that case have no effect
because the affected silence is immediately seen as expired from the
storage and thus not updated. The silence will stay around in its old
state.
This fix makes sure to use the same “now” throughout the expiration
process.
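
The effect can be reproduced with a small sketch. The `store` type, its `save` check, and the state rules below are assumptions that approximate the behaviour described above; `expire` takes two timestamps only so both the fixed and the buggy case can be shown.

```go
package main

import (
	"fmt"
	"time"
)

// Silence is a hypothetical silence with only its time bounds.
type Silence struct {
	StartsAt, EndsAt time.Time
}

// stateAt classifies a silence relative to now.
func stateAt(s Silence, now time.Time) string {
	if now.Before(s.StartsAt) {
		return "pending"
	}
	if now.After(s.EndsAt) {
		return "expired"
	}
	return "active"
}

type store struct{ sil Silence }

// save skips silences already expired at now (a stand-in for the storage
// check) and reports whether the update was applied.
func (st *store) save(s Silence, now time.Time) bool {
	if stateAt(s, now) == "expired" {
		return false
	}
	st.sil = s
	return true
}

// expire ends the silence at expireAt and saves it using saveAt.
func (st *store) expire(expireAt, saveAt time.Time) bool {
	s := st.sil
	s.EndsAt = expireAt
	return st.save(s, saveAt)
}

// demo compares using one shared "now" with taking a second, later "now".
func demo() (sameNow, laterNow bool) {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	sil := Silence{StartsAt: start, EndsAt: start.Add(time.Hour)}
	now := start.Add(time.Minute)

	sameNow = (&store{sil: sil}).expire(now, now)
	laterNow = (&store{sil: sil}).expire(now, now.Add(time.Millisecond))
	return sameNow, laterNow
}

func main() {
	fmt.Println(demo())
}
```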
Signed-off-by: beorn7 <beorn@soundcloud.com>
Add version tracking of the silences state. Adding a silence to the state
increments the version. If the version hasn't changed since the last
time an alert was checked for being silenced, we only have to verify
that the relevant silences are still active rather than checking the
alert against all silences.
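
A simplified sketch of the version check follows. The `Silencer` type, its cache layout, and matching on a single label string are all hypothetical; the `active` flag stands in for time-based state.

```go
package main

import "fmt"

// silence is a hypothetical silence matching alerts by one label string.
type silence struct {
	label  string
	active bool
}

type cacheEntry struct {
	version int
	ids     []string // matching silences, whether active or not
}

// Silencer caches, per alert, which silences matched at which version.
type Silencer struct {
	silences  map[string]*silence
	version   int
	cache     map[string]cacheEntry
	fullScans int // counts slow-path runs, for illustration only
}

func newSilencer() *Silencer {
	return &Silencer{silences: map[string]*silence{}, cache: map[string]cacheEntry{}}
}

// Add stores a silence and increments the version.
func (s *Silencer) Add(id string, sil *silence) {
	s.silences[id] = sil
	s.version++
}

// Mutes reports whether any active silence matches the alert.
func (s *Silencer) Mutes(alertLabel string) bool {
	if e, ok := s.cache[alertLabel]; ok && e.version == s.version {
		// Fast path: only re-verify the silences found last time.
		for _, id := range e.ids {
			if s.silences[id].active {
				return true
			}
		}
		return false
	}
	// Slow path: check the alert against all silences.
	s.fullScans++
	var ids []string
	muted := false
	for id, sil := range s.silences {
		if sil.label == alertLabel {
			ids = append(ids, id)
			muted = muted || sil.active
		}
	}
	s.cache[alertLabel] = cacheEntry{version: s.version, ids: ids}
	return muted
}

func main() {
	s := newSilencer()
	s.Add("a", &silence{label: "job=api", active: true})
	fmt.Println(s.Mutes("job=api"), s.Mutes("job=api"), s.fullScans)
}
```

Caching every matching silence, not just the active ones, lets the fast path notice a silence that becomes active later without a version change.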
Signed-off-by: beorn7 <beorn@soundcloud.com>
This encapsulates the logic of querying and marking silenced
alerts. It removes the code duplication flagged earlier.
I removed the error returned by the setAlertStatus function as we were
only logging it, and that's already done anyway when the error is
received from the `silence.Query` call (now in the `Mutes` method).
Signed-off-by: beorn7 <beorn@soundcloud.com>