98 Commits

Author SHA1 Message Date
TJ Hoplock f6b942cf9b
chore!: adopt log/slog, drop go-kit/log (#4089)
* chore!: adopt log/slog, drop go-kit/log

The bulk of this change set was automated by the following script which
is being used to aid in converting the various exporters/projects to use
slog:

https://gist.github.com/tjhop/49f96fb7ebbe55b12deee0b0312d8434

This commit includes several changes:
- bump exporter-toolkit to v0.13.1 for log/slog support
- updates golangci-lint deprecated configs
- enables sloglint linter
- removes old go-kit/log linter configs
- introduce some `if logger == nil { $newLogger }` additions to prevent
  nil references
- converts cluster membership config to use a stdlib compatible slog
  adapter, rather than creating a custom io.Writer for use as the
  membership `logOutput` config
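
The nil-logger guard mentioned in the list above can be sketched as follows. This is a minimal illustration and not code from the change set: `Component` and `NewComponent` are hypothetical names, and falling back to a text handler that discards its output is only an assumption about what `$newLogger` might expand to.

```go
package main

import (
	"io"
	"log/slog"
)

// Component is a hypothetical type that accepts an optional logger.
type Component struct {
	logger *slog.Logger
}

// NewComponent substitutes a discarding logger when the caller passes nil,
// so later calls on c.logger never dereference a nil pointer.
func NewComponent(logger *slog.Logger) *Component {
	if logger == nil {
		// Assumption: a no-op logger is an acceptable default.
		logger = slog.New(slog.NewTextHandler(io.Discard, nil))
	}
	return &Component{logger: logger}
}

func main() {
	c := NewComponent(nil)
	// Safe even though no logger was supplied; the record is discarded.
	c.logger.Info("maintenance done", "duration_ms", 12)
}
```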

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

* chore: address PR feedback

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>

---------

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-11-06 09:09:57 +00:00
Ethan Hunter 69fe3f81fa
only increment silences version if a silence is added (#3961) 2024-10-23 17:27:35 +01:00
George Robinson d9c82e7613 Replace benbjohnson/clock with coder/quartz
This commit replaces the archived and no longer maintained
benbjohnson/clock package with coder/quartz.

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-08-27 12:12:23 +01:00
Ethan Hunter 3e6356b4c9
bugfix: fix leaking of Silences matcherCache entries (#3930)
* fix leaking of matcher cache entries

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>

* improve clock logic in TestSilenceGCOverTime

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>

* remove TestSilencesGc

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>

* make table driven test more idiomatic

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>

* replace test with one suggested by grobinson-grafana

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>

* replace require.Len with require.Empty where needed

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>

---------

Signed-off-by: Ethan Hunter <ehunter@hudson-trading.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
2024-08-21 17:10:15 +01:00
George Robinson 94ac36b3e0
Fix MaxSilenceSizeBytes limit causes incomplete updates of existing silences (#3897)
This commit fixes a bug where the MaxSilenceSizeBytes limit can
cause an incomplete update of existing silences, where the old
silence can be expired but the new silence cannot be created
because it would exceed the maximum size limit.

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-06-25 13:23:09 +01:00
George Robinson 58dc6f8d33
Fix invalid silence causes incomplete updates (#3898)
This commit fixes a bug where an invalid silence causes incomplete
updates of existing silences. This is fixed by moving validation
out of the setSilence method and putting it at the start of the
Set method instead.
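
A rough sketch of why validating first avoids the incomplete update. The `store`, `silence`, and `validate` names below are hypothetical simplifications and do not reproduce the real silences package; the only point illustrated is that validation runs before any existing state is modified.

```go
package main

import (
	"errors"
	"fmt"
)

// silence is a hypothetical, simplified stand-in for the real silence type.
type silence struct {
	ID      string
	Comment string
	Expired bool
}

// store is a hypothetical in-memory silence store.
type store struct {
	byID map[string]*silence
	next int
}

func newStore() *store { return &store{byID: map[string]*silence{}} }

// validate rejects silences that must never be stored.
func validate(s silence) error {
	if s.Comment == "" {
		return errors.New("comment missing")
	}
	return nil
}

// Set replaces the silence oldID (if it exists) with s. Validation runs
// before any state is touched, so a rejected update leaves the old
// silence active instead of expiring it and then failing.
func (st *store) Set(oldID string, s silence) (string, error) {
	if err := validate(s); err != nil {
		return "", fmt.Errorf("invalid silence: %w", err)
	}
	if old, ok := st.byID[oldID]; ok {
		old.Expired = true
	}
	st.next++
	s.ID = fmt.Sprintf("s%d", st.next)
	st.byID[s.ID] = &s
	return s.ID, nil
}

func main() {
	st := newStore()
	id, _ := st.Set("", silence{Comment: "maintenance window"})
	_, err := st.Set(id, silence{}) // invalid: no comment
	fmt.Println(err, st.byID[id].Expired)
}
```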

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-06-25 12:38:33 +01:00
George Robinson ffd7681fb4
Improve test coverage for silences (#3896)
This commit improves the existing test coverage for silences to
cover a number of additional cases, and also improves the comments
of existing cases.

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-06-24 15:32:16 +01:00
George Robinson b4443817e8
Fix MaxSilences limit causes incomplete updates of existing silences (#3877)
* Fix MaxSilences limit causes incomplete updates of existing silences

This commit fixes a bug where the MaxSilences limit can cause an
incomplete update of existing silences, where the old silence can
be expired but the new silence cannot be created because it would
exceed the maximum number of silences.

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-06-24 15:11:10 +01:00
George Robinson 52eb1fc4aa
Rename matchers package to matcher singular (#3777)
* Rename matchers package to matcher singular

I realized that we had named the package plural "matchers" when
it's idiomatic in Go to use singular package names.

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-06-21 16:17:27 +02:00
George Robinson cc6de9c666 Remove Id return from silences.Set(*pb.Silence)
This commit removes the Id from the method silences.Set(*pb.Silence)
as it is redundant. The Id is still set even when creating a silence
fails. This will be fixed in a later change.

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-06-20 15:47:49 +01:00
George Robinson e690fbe250
Rename silence limit to max-silence-size-bytes (#3886)
* Rename silence limit to max-silence-size-bytes

This commit renames an existing (unreleased) limit from
max-per-silence-bytes to max-silence-size-bytes.

Signed-off-by: George Robinson <george.robinson@grafana.com>

* Update help

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-06-20 15:20:52 +01:00
George Robinson 124da3462d
Silence limits as functions (#3885)
* Silence limits as functions

This commit changes silence limits from a struct of ints to a struct
of functions that return individual limits. This allows limits
to be lazy-loaded and updated without having to call silences.New().
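
A minimal sketch of the idea, under assumptions: the field names and the `checkCount` helper are illustrative and may not match the real package. Because each limit is a function, the value is read at check time and can change without rebuilding the struct.

```go
package main

import "fmt"

// Limits holds functions rather than ints so that each limit is read
// when it is checked. Field names here are assumptions.
type Limits struct {
	MaxSilences         func() int
	MaxSilenceSizeBytes func() int
}

// checkCount returns an error if adding one more silence would exceed the
// limit. A nil function or a non-positive value means "no limit".
func checkCount(l Limits, current int) error {
	if l.MaxSilences == nil {
		return nil
	}
	if m := l.MaxSilences(); m > 0 && current+1 > m {
		return fmt.Errorf("exceeded maximum number of silences: %d (limit: %d)", current+1, m)
	}
	return nil
}

func main() {
	limit := 1
	l := Limits{MaxSilences: func() int { return limit }}
	fmt.Println(checkCount(l, 1)) // over the limit
	limit = 10                    // e.g. after a configuration reload
	fmt.Println(checkCount(l, 1)) // now allowed, same Limits value
}
```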

Signed-off-by: George Robinson <george.robinson@grafana.com>

* Add explicit test for no limits

Signed-off-by: George Robinson <george.robinson@grafana.com>

* Fix run()

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-06-20 14:50:53 +01:00
George Robinson db32fab612
Replace incorrect use of fmt.Errorf (#3883) 2024-06-20 12:02:05 +01:00
George Robinson f9d5a08759
Fix TestSilenceLimits tests (#3866)
This commit fixes silence tests that relied on the maintenance
function running at a fixed 100ms interval. If the goroutine
that runs the maintenance is not scheduled within 150ms then the
test will fail.

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-06-05 15:03:00 +01:00
George Robinson dbe6312f09
Limits should include expired silences (#3862)
* Limits should include expired silences

Signed-off-by: George Robinson <george.robinson@grafana.com>

* Fix docs

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-06-03 09:12:19 +01:00
George Robinson b67bde8cf9
Add limits for silences (#3852)
* Add limits for silences

This commit adds limits for silences including the maximum number
of active and pending silences, and the maximum size per silence
(in bytes).

Signed-off-by: George Robinson <george.robinson@grafana.com>

* Remove default limits

Signed-off-by: George Robinson <george.robinson@grafana.com>

* Allow expiration of silences that exceed max size

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-05-31 17:52:44 +01:00
George Robinson d31a249ffc
#3513: Add GroupMarker interface (#3792)
* Add GroupMarker interface

This commit adds a new GroupMarker interface that marks the status
of groups. For example, whether an alert is muted because of one
or more active or mute time intervals.

It renames the existing Marker interface to AlertMarker to avoid
confusion.

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-04-30 15:26:04 +01:00
George Robinson 6c70b5c014
Silences: Add benchmarks for Mutes (#3771)
* Add benchmarks for Mutes

This commit updates the existing benchmarks for silences to also
benchmark Mutes. This complements the existing Query benchmarks
by also measuring the time taken to mark silenced alerts.

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-03-21 20:54:56 +00:00
George Krajcsovits d85bef20d9
feature: add native histogram support to latency metrics (#3737)
Note that this does not stop exposing the classic histograms. For now,
it is up to the scrape config to decide whether to keep those instead of,
or in addition to, the native histograms.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-02-29 14:53:47 +00:00
George Robinson f69a508665
Remove metrics from compat package (#3714)
This commit removes the metrics from the compat package
in favour of the existing logging and the additional tools
at hand, such as amtool, to validate Alertmanager configurations.

Due to the global nature of the compat package, a consequence
of config.Load, these metrics have proven to be less useful
in practice than expected, both in Alertmanager and other projects
such as Mimir.

There are a number of reasons for this:

1. Because the compat package is global, these metrics cannot be
   reset each time config.Load is called, as in multi-tenant
   projects like Mimir loading a config for one tenant would reset
   the metrics for all tenants. This is also the reason the metrics
   are counters and not gauges.

2. Since the metrics are counters, it is difficult to create
   meaningful dashboards for Alertmanager as, unlike in Mimir,
   configurations are not reloaded at fixed intervals, and as such,
   operators cannot use rate to track configuration changes
   over time.

In Alertmanager, there are much better tools available to validate
that an Alertmanager configuration is compatible with the UTF-8
parser, including both the existing logging from Alertmanager
server and amtool check-config.

In other projects like Mimir, we can track configurations for
individual tenants using log aggregation and storage systems
such as Loki. This gives operators far more information than
what is possible with the metrics, including the timestamp,
input and ID of tenant configurations that are incompatible
or have disagreement.

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-02-08 09:59:03 +00:00
George Robinson f92a08d073
Remove unused feature flags (#3676)
This commit removes some code that should have been removed in #3668.
The FeatureFlags in silence.Options are no longer used but were
still initialized. These had a no-op effect.

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-01-19 10:43:50 +00:00
George Robinson fa6a7e6dd6
Fix inconsistent defaults in UTF-8 behavior (#3668)
This commit fixes inconsistent UTF-8 behavior if the compat package is
not initialized and feature flags are not passed to the API. This can
happen when Alertmanager is used as a package in software such
as Cortex or Mimir.

The inconsistent behavior is that Alertmanager will accept UTF-8 alerts
but reject UTF-8 configurations.

Since feature flags are optional via api.Options, we cannot force them
to be passed to api.New at compile time. Instead, it's better to defer
back to the compat package which is consistent even when not initialized.

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-01-15 10:03:51 +00:00
Matthieu MOREL b9e347b9d1 golangci-lint: enable testifylint linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-10 08:50:03 +00:00
Matthieu MOREL b81bad8711 use Go standard errors
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-08 16:44:13 +01:00
George Robinson 70bd5dad98
Support UTF-8 label matchers: Use compat package in Alertmanager server (#3567)
* Support UTF-8 label matchers: Use compat package in Alertmanager server

This pull request adds use of the compat package in Alertmanager server that will allow users to switch between the new matchers/parse parser and the old pkg/labels parser. The new matchers/parse parser uses a fallback mechanism where if the input cannot be parsed in the new parser it then attempts to use the old parser. If an input is parsed in the old parser but not the new parser then a warning log is emitted.

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2023-11-24 10:01:40 +00:00
gotjosh 3ee2cd0f12
Metrics: Silence maintenance success and failure (#3285)
* Metrics: Silence maintenance success and failure

For various reasons, we've observed different kinds of errors in this area, from read-only disks to silly code bugs.
Errors during maintenance are effectively a _data loss_ and therefore we should encourage proper monitoring of this area.

This PR introduces a total and failure metric for silence maintenance. If agreed, I'll do the same for the nflog and fix the flaky test like I did for silences while I'm there.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2023-03-08 12:32:59 +00:00
gotjosh 5318bc3ccb
replace atomic for uber fix atomic
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2023-02-24 12:11:50 +00:00
gotjosh c61ca09246
Fix silences flaky test
Today I learned that `runtime.Gosched()` doesn't do what I thought it would.
While it allows other goroutines to run it doesn't guarantee that the main goroutine will be blocked until others are run.

Sadly, I had to fall back to the sleep approach.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2023-02-24 12:09:47 +00:00
gotjosh f59460bfd4
Refactor nflog configuration options to make it similar to Silences. (#3220)
* Refactor nflog configuration options to make it similar to Silences.

The Notification Log is a similar component to Silences. They're the only two things that are shared between nodes when running in HA and they both hold some sort of internal state that needs to be cleaned up on an interval.

To simplify the code and make it a bit more understandable (among other benefits such as improved testability), I've refactored the notification log configuration and `run` to be similar to the silences.
2023-01-19 16:39:03 +00:00
inosato 791e542100 Remove ioutil
Signed-off-by: inosato <si17_21@yahoo.co.jp>
2022-07-18 22:01:02 +09:00
Joe Blubaugh 01d1e49c54 Simplify Silence test to remove unnecessary wait.
As noted in #2867, there is an unnecessary require.Eventually in a
silence test. This PR addresses that by using a channel to signal that
the maintenance loop has completed.

Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-07-06 09:47:52 +08:00
Joe Blubaugh 505f944c6a Apply suggestions from code review.
Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-07-05 11:22:46 +08:00
Joe Blubaugh 0c3bf4b6ce Loosen up the timing on an Eventually to avoid CI timeout
Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-07-05 11:22:46 +08:00
Joe Blubaugh c9249a02bc Remove a stray line that was breaking the linter.
Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-07-05 11:22:46 +08:00
Joe Blubaugh bedd3c4175 Clean up linter warnings about unused code and atomic package
Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-07-05 11:22:46 +08:00
Joe Blubaugh cb00d9259b Issue #2850: Add benbjohnson/clock to the silences package.
github.com/benbjohnson/clock provides a time interface to programs
rather than using the stdlib time package. This allows mocking time in
programs and tests. In this commit, the clock is used to speed up and
simplify testing of the silences package.
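
The pattern can be sketched with the standard library alone. The `Clock` interface and `fakeClock` below are hand-rolled, hypothetical stand-ins and not the benbjohnson/clock API; they only illustrate how injecting a time source lets a test move time forward instantly.

```go
package main

import (
	"fmt"
	"time"
)

// Clock is a hypothetical minimal time source that code depends on
// instead of calling time.Now directly.
type Clock interface {
	Now() time.Time
}

// realClock is what production code would inject.
type realClock struct{}

func (realClock) Now() time.Time { return time.Now() }

var _ Clock = realClock{} // compile-time interface check

// fakeClock only moves when a test advances it.
type fakeClock struct{ now time.Time }

func newFakeClock() *fakeClock { return &fakeClock{now: time.Unix(0, 0)} }

func (f *fakeClock) Now() time.Time      { return f.now }
func (f *fakeClock) Add(d time.Duration) { f.now = f.now.Add(d) }

// expired reports whether a silence ending at endsAt is over according to c.
func expired(c Clock, endsAt time.Time) bool {
	return !c.Now().Before(endsAt)
}

func main() {
	c := newFakeClock()
	endsAt := c.Now().Add(time.Hour)
	fmt.Println(expired(c, endsAt)) // still active
	c.Add(time.Hour)                // no sleeping required
	fmt.Println(expired(c, endsAt)) // now expired
}
```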

Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-07-05 11:22:46 +08:00
gotjosh cfb909f419
Marker: Rename `SetSilenced` to `SetActiveOrSilenced`
This accurately reflects what the function _actually_ does. If no active silence IDs are provided and the list of inhibitions we have is already empty, the alert is actually set to Active. It took me a while to realise this while I was working out how we populate the alert list.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2022-06-17 12:51:23 +01:00
Matthias Loibl a6d10bd5bc
Update golangci-lint and fix complaints (#2853)
* Copy latest golangci-lint files from Prometheus

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Use grafana/regexp over stdlib regexp

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Fix typos in comments

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Fix goimports complaints in import sorting

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* gofumpt all Go files

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Update naming to comply with revive linter

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* config: Fix error messages to be lower case

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* test/cli: Fix error messages to be lower case

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* .golangci.yaml: Remove obsolete space

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* config: Fix expected victorOps error

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Use stdlib regexp

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Clean up Go modules

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2022-03-25 17:59:51 +01:00
Simon Pasquier 3f42c5e813
Merge pull request #2816 from prashbnair/update_check
Correcting the condition for updating a silence. Earlier was checking…
2022-03-04 15:17:12 +01:00
Soon-Ping a2d18c93de
Return no error when deleting expired silence (#2817)
* Changed Silences.expire(id) to not return error for already expired silence

Signed-off-by: Soon-Ping Phang <soonping@amazon.com>

* Added comment explaining idempotency change for Silences.expire()

Signed-off-by: Soon-Ping Phang <soonping@amazon.com>

* Trigger build

Signed-off-by: Soon-Ping Phang <soonping@amazon.com>

* Trigger build

Signed-off-by: Soon-Ping Phang <soonping@amazon.com>

* Fixed typo in comment

Signed-off-by: Soon-Ping Phang <soonping@amazon.com>

* Trigger build

Signed-off-by: Soon-Ping Phang <soonping@amazon.com>

* Trigger build

Signed-off-by: Soon-Ping Phang <soonping@amazon.com>

* Fixed another typo in comment

Signed-off-by: Soon-Ping Phang <soonping@amazon.com>

* Promoted comment to function-level

Signed-off-by: Soon-Ping Phang <soonping@amazon.com>

* Added API v2 test for DeleteSilence, PostSilence

Signed-off-by: Soon-Ping Phang <soonping@amazon.com>

* Fixed lint errors

Signed-off-by: Soon-Ping Phang <soonping@amazon.com>

* Trigger build

Signed-off-by: Soon-Ping Phang <soonping@amazon.com>

* Trigger build

Signed-off-by: Soon-Ping Phang <soonping@amazon.com>

* Trigger build

Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
2022-02-22 13:34:21 +01:00
Prashant Balachandran 66182178d0 Correcting the condition for updating a silence. Earlier was checking up to
nanosecond precision but reduced to second as the UI only sends up to millisecond
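
To illustrate the precision mismatch, here is a small sketch. The `sameSecond` helper is hypothetical, and truncating both timestamps is only one way to compare at second precision; it is not necessarily how the commit implements the check.

```go
package main

import (
	"fmt"
	"time"
)

// sameSecond reports whether a and b fall within the same whole second,
// ignoring any sub-second difference.
func sameSecond(a, b time.Time) bool {
	return a.Truncate(time.Second).Equal(b.Truncate(time.Second))
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339Nano, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	stored := mustParse("2022-01-31T06:02:48.123456789Z") // nanosecond precision
	fromUI := mustParse("2022-01-31T06:02:48.123Z")       // millisecond precision
	// An exact comparison sees two different instants; the truncated one does not.
	fmt.Println(stored.Equal(fromUI), sameSecond(stored, fromUI))
}
```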

Signed-off-by: Prashant Balachandran <pnair@redhat.com>
2022-01-31 11:32:48 +05:30
Kyle Brandt 1b8afe7cb5
export ValidateMatcher for DI (#2) (#2716)
so third parties, Grafana in particular, can override the validation.

Grafana wants to do this because other data sources will have label keys with things like spaces, periods, or other characters, and it is looking for a better integration with Alertmanager.

goes with grafana/grafana#38629
replaces https://github.com/prometheus/alertmanager/pull/2694

Signed-off-by: Kyle Brandt <kyle@grafana.com>
2021-10-21 09:29:55 +02:00
Yuriy Tseretyan 15f44f4a61
Close file descriptor after snapshot file was read (#2710)
* close file if it is opened

Signed-off-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
2021-10-19 01:12:02 +02:00
Julius Volz 5195460c95
Correctly call default silence maintenance function (#2701)
https://github.com/prometheus/alertmanager/pull/2689 introduced a
regression where the default maintenance function would no longer be
called even if no override was specified. The Alertmanager now crashes
on any silence maintenance run without this fix.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2021-09-13 19:42:48 +05:30
gotjosh 8da517524a
Enable support for custom callbacks as part of maintenance (#2689)
* Enable support for custom callbacks as part of maintenance

This enables support for custom Maintenance callbacks as part of the periodic maintenance of silences and notification logs.
Effectively a no-op for the Alertmanager but allows downstream implementations to inject custom logic as part of it.
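
A minimal sketch of an overridable maintenance step. The `MaintenanceFunc` signature and the `runOnce` helper are assumptions made for illustration; the point shown is that the default is used whenever no override is supplied.

```go
package main

import "fmt"

// MaintenanceFunc is a hypothetical callback type: it performs one
// maintenance pass and reports how many bytes it wrote.
type MaintenanceFunc func() (int64, error)

// runOnce runs the override if one was supplied and otherwise falls back
// to the default, so maintenance always has something to call.
func runOnce(override, def MaintenanceFunc) (int64, error) {
	fn := override
	if fn == nil {
		fn = def
	}
	return fn()
}

func main() {
	def := func() (int64, error) { return 128, nil }
	custom := func() (int64, error) { return 0, fmt.Errorf("custom snapshot failed") }

	fmt.Println(runOnce(nil, def))    // default path
	fmt.Println(runOnce(custom, def)) // downstream override
}
```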

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add tests

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Fix tests and remove whitespace

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Address review comments

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* run go fmt

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Fix import ordering

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2021-09-06 16:19:39 +05:30
Julien Pivotto b2a4cacb95 Update go dependencies & switch to go-kit/log
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-08-02 12:43:23 +02:00
beorn7 e84c265196 Include pending silences for future muting decisions
Previously, if a pending silence existed for an alert, and it later
became active without any silences getting added in the meantime, we
would miss the existence of that newly active silence.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-05-27 22:15:57 +02:00
beorn7 f7c8a4b28a Add test to expose issue #2426
Signed-off-by: beorn7 <beorn@grafana.com>
2021-05-26 19:39:25 +02:00
Ganesh Vernekar 1f946f8a7d
Replace satori/go.uuid with gofrs/uuid
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2021-03-15 19:39:15 +05:30
Ganesh Vernekar 406ddd200a
Upgrade github.com/satori/go.uuid
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2021-03-10 14:49:07 +05:30