* chore!: adopt log/slog, drop go-kit/log
The bulk of this change set was automated by the following script which
is being used to aid in converting the various exporters/projects to use
slog:
https://gist.github.com/tjhop/49f96fb7ebbe55b12deee0b0312d8434
This commit includes several changes:
- bump exporter-toolkit to v0.13.1 for log/slog support
- updates golangci-lint deprecated configs
- enables sloglint linter
- removes old go-kit/log linter configs
- introduces some `if logger == nil { $newLogger }` checks to prevent
nil dereferences
- converts cluster membership config to use a stdlib compatible slog
adapter, rather than creating a custom io.Writer for use as the
membership `logOutput` config
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* chore: address PR feedback
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
---------
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
This commit updates /api/v2/alerts/groups to show whether an alert is
suppressed by one or more active or mute time intervals. While
the muted by field can be found in /api/v2/alerts, it is not
used here because /api/v2/alerts does not take aggregation
or routing into consideration.
It also updates the UI to support filtering muted alerts via the
Muted checkbox.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Rename matchers package to matcher singular
I realized that we had named the package plural "matchers" when
it's idiomatic in Go to use singular package names.
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Rename silence limit to max-silence-size-bytes
This commit renames an existing (unreleased) limit from
max-per-silence-bytes to max-silence-size-bytes.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Update help
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Silence limits as functions
This commit changes silence limits from a struct of ints to a struct
of functions that return individual limits. This allows limits
to be lazy-loaded and updated without having to call silences.New().
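A rough sketch of the idea, assuming a hypothetical `Limits` struct and `Store` type (these are not the actual types in the silences package). Because each limit is read through a function at the moment it is checked, the value can change without rebuilding the consumer.

```go
package main

import "fmt"

// Limits is a hypothetical struct of functions; each call returns the
// current value of a limit.
type Limits struct {
	MaxSilences         func() int
	MaxSilenceSizeBytes func() int
}

// Store is a stand-in for a component that enforces the limits.
type Store struct {
	limits Limits
	count  int
}

// Add rejects a new entry once the current limit is reached.
// In this sketch a nil function or a non-positive value means "no limit".
func (s *Store) Add() error {
	if s.limits.MaxSilences != nil {
		if m := s.limits.MaxSilences(); m > 0 && s.count >= m {
			return fmt.Errorf("exceeded maximum number of silences: %d", m)
		}
	}
	s.count++
	return nil
}

func main() {
	current := 1
	s := &Store{limits: Limits{MaxSilences: func() int { return current }}}
	fmt.Println(s.Add()) // first entry fits
	fmt.Println(s.Add()) // rejected while the limit is 1
	current = 2          // limit raised without constructing a new Store
	fmt.Println(s.Add()) // accepted again
}
```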
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Add explicit test for no limits
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Fix run()
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Limits should include expired silences
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Fix docs
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Add limits for silences
This commit adds limits for silences including the maximum number
of active and pending silences, and the maximum size per silence
(in bytes).
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Remove default limits
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Allow expiration of silences that exceed max size
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Mark muted groups
This commit updates TimeMuteStage and TimeActiveStage to mark groups
as muted when their alerts are muted by an active or mute time interval,
and to remove any existing markers when outside all active and mute
time intervals.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Remove unlock to defer
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Bump prometheus/common to v0.52.3
This commit bumps prometheus/common to v0.52.3. It has a breaking
change where the metric alertmanager_build_info has been renamed
to go_build_info as the metric has been moved from prometheus/common
to prometheus/client_golang and the namespace argument has been
removed.
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
Note that this does not stop showing classic metrics, for now
it is up to the scrape config to decide whether to keep those instead or
both.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
This commit removes the metrics from the compat package
in favour of the existing logging and the additional tools
at hand, such as amtool, to validate Alertmanager configurations.
Due to the global nature of the compat package, a consequence
of config.Load, these metrics have proven to be less useful
in practice than expected, both in Alertmanager and other projects
such as Mimir.
There are a number of reasons for this:
1. Because the compat package is global, these metrics cannot be
reset each time config.Load is called, as in multi-tenant
projects like Mimir loading a config for one tenant would reset
the metrics for all tenants. This is also the reason the metrics
are counters and not gauges.
2. Since the metrics are counters, it is difficult to create
meaningful dashboards for Alertmanager as, unlike in Mimir,
configurations are not reloaded at fixed intervals, and as such,
operators cannot use rate to track configuration changes
over time.
In Alertmanager, there are much better tools available to validate
that an Alertmanager configuration is compatible with the UTF-8
parser, including both the existing logging from Alertmanager
server and amtool check-config.
In other projects like Mimir, we can track configurations for
individual tenants using log aggregation and storage systems
such as Loki. This gives operators far more information than
what is possible with the metrics, including the timestamp,
input and ID of tenant configurations that are incompatible
or have disagreement.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Add metric for inhibit rules
This commit adds a new metric called alertmanager_inhibit_rules.
It is identical to the alertmanager_integrations and
alertmanager_receivers metrics that are present in the current
and previous versions.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Rename metric and variable
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit removes some code that should have been removed in #3668.
The FeatureFlags in silence.Options are no longer used but were
still initialized. These had a no-op effect.
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit fixes inconsistent UTF-8 behavior if the compat package is
not initialized and feature flags are not passed to the API. This can
happen when Alertmanager is used as a package in software such
as Cortex or Mimir.
The inconsistent behavior is that Alertmanager will accept UTF-8 alerts
but reject UTF-8 configurations.
Since feature flags are optional via api.Options, we cannot force them
to be passed to api.New at compile time. Instead, it's better to defer
back to the compat package which is consistent even when not initialized.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Add metrics to matchers compat package
This commit adds the following metrics to the compat package:
alertmanager_matchers_parse
alertmanager_matchers_disagree
alertmanager_matchers_incompatible
alertmanager_matchers_invalid
With a label called origin to differentiate the different sources
of inputs: the configuration file, the API, and amtool.
The disagree_total metric is incremented when an input is valid
in both parsers but results in different parsed representations,
in which case there is disagreement. This should not happen, and suggests
there is either a bug in one of the parsers or a mistake in the
backwards-compatibility guarantees of the matchers/parse parser.
The incompatible_total metric is incremented when an input is valid
in pkg/labels, but not in the UTF-8 parser in matchers/parse. In such
cases, the matcher should be updated to be compatible. This often
means adding double quotes around the right-hand side of the matcher.
For example, foo="bar".
The invalid_total metric is incremented when an input is invalid
in both parsers. This was never a valid input.
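One way to read the three counters described above, sketched as a small decision function. The function name, its boolean inputs and the returned strings are illustrative only, and the sketch assumes that disagreement is only meaningful when both parsers accept the input.

```go
package main

import "fmt"

// classify maps the outcome of running one input through both parsers to
// the counter that would be incremented. All names here are hypothetical.
func classify(oldOK, newOK, sameResult bool) string {
	switch {
	case !oldOK && !newOK:
		return "invalid" // rejected by both parsers
	case oldOK && !newOK:
		return "incompatible" // accepted by pkg/labels only
	case oldOK && newOK && !sameResult:
		return "disagree" // both accept, but the results differ
	default:
		return "none" // nothing to report
	}
}

func main() {
	fmt.Println(classify(false, false, false))
	fmt.Println(classify(true, false, false))
	fmt.Println(classify(true, true, false))
	fmt.Println(classify(true, true, true))
}
```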
The tests have been updated to check the metrics are incremented
as expected.
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Support UTF-8 label matchers: Use compat package in Alertmanager server
This pull request adds use of the compat package in Alertmanager server that will allow users to switch between the new matchers/parse parser and the old pkg/labels parser. The new matchers/parse parser uses a fallback mechanism where if the input cannot be parsed in the new parser it then attempts to use the old parser. If an input is parsed in the old parser but not the new parser then a warning log is emitted.
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Move and export BuildReceiverIntegrations
Signed-off-by: Alex Weaver <weaver.alex.d@gmail.com>
---------
Signed-off-by: Alex Weaver <weaver.alex.d@gmail.com>
* Refactor: Move `inTimeIntervals` from `notify` to `timeinterval`
There's absolutely no change of functionality here and I've expanded coverage for similar logic in both places.
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Add receiver name as a label to notify metrics
This commit adds a second label to the notify family of metrics
(e.g. numTotalFailedNotifications): the receiver name. This allows
disambiguating which receiver is failing when one has many receivers
with the same integration type.
Signed-off-by: sinkingpoint <colin@quirl.co.nz>
* Gate receiver names behind a feature flag
Signed-off-by: sinkingpoint <colin@quirl.co.nz>
---------
Signed-off-by: sinkingpoint <colin@quirl.co.nz>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Log a warning when repeat_interval is less than group_interval
This commit updates Alertmanager to log a warning when
repeat_interval is less than group_interval for an individual route.
When repeat_interval is less than group_interval, the earliest
a notification can be sent again is the next time the aggregation
group is flushed, and this happens at each group_interval.
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* Add msteams
Signed-off-by: Jack Zhang <jack4zhang@gmail.com>
---------
Signed-off-by: Jack Zhang <jack4zhang@gmail.com>
Signed-off-by: Jack <jack4zhang@gmail.com>
* Refactor nflog configuration options to make it similar to Silences.
The Notification Log is a similar component to Silences. They're the only two things that are shared between nodes when running in HA and they both hold some sort of internal state that needs to be cleaned up on an interval.
To simplify the code and make it a bit more understandable (among other benefits such as improved testability), I've refactored the notification log configuration and `run` to be similar to the silences.
Cisco's Webex has been one of the most requested notifiers on Grafana for a while now, please see: https://github.com/grafana/grafana/issues/11750#issue-318358659
Given its straightforward implementation, low maintenance overhead and request demand, I think it's worth including this directly in the Alertmanager.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
No need to spawn a goroutine, nor wait for a channel.
Let's just put everything in a single select call.
Signed-off-by: Xavier Nicollet <xnicollet@gmail.com>
* Alert metric reports different results to what the user sees via API
Fixes #1439 and #2619.
The previous metric is not _technically_ reporting incorrect results as the alerts _are_ still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurposed the current metric to match what the user sees in the UI.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
The function value and parameters of a defer statement are evaluated
immediately, so this "disp" value is always nil. Calling Stop() on a nil
dispatcher is a no-op, so the deferred call does nothing. Wrapping it in
a closure that refers to "disp" fixes it.
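The difference can be reproduced with a small standalone program. The `dispatcher` type below is a hypothetical stand-in whose `Stop` tolerates a nil receiver, as the message above describes.

```go
package main

import "fmt"

// dispatcher is a hypothetical stand-in; Stop on a nil receiver is a no-op.
type dispatcher struct{ stopped *bool }

func (d *dispatcher) Stop() {
	if d == nil {
		return
	}
	*d.stopped = true
}

// direct defers disp.Stop() while disp is still nil. The receiver is
// evaluated at the defer statement, so the later assignment is not seen.
func direct(stopped *bool) {
	var disp *dispatcher
	defer disp.Stop()
	disp = &dispatcher{stopped: stopped}
}

// wrapped defers a closure, so disp is read only when the closure runs.
func wrapped(stopped *bool) {
	var disp *dispatcher
	defer func() { disp.Stop() }()
	disp = &dispatcher{stopped: stopped}
}

func main() {
	var a, b bool
	direct(&a)
	wrapped(&b)
	fmt.Println(a, b) // prints "false true"
}
```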
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Add CLI args for snapshot intervals
Signed-off-by: sed-i <82407168+sed-i@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: sed-i <82407168+sed-i@users.noreply.github.com>
* use same flag for silences and nflogs intervals
Signed-off-by: sed-i <82407168+sed-i@users.noreply.github.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
* add active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix unittests for active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update dispatch/route.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* split the stage for active and mute intervals
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Adds doc for a helper function
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix code after commit suggestions
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Allow mute_time_interval and time_intervals to coexist in the config
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* docs: configuration's doc has been updated about time intervals
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update config/config.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* updates configuration readme to improve active time description
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* merge deprecated mute_time_intervals and time_intervals
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update cmd/alertmanager/main.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update cmd/alertmanager/main.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fmt main.go
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix lint error
Signed-off-by: clyang82 <chuyang@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Document that matchers are ANDed together
Signed-off-by: Mac Chaffee <me@macchaffee.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Remove extra parentheticals
Signed-off-by: Mac Chaffee <me@macchaffee.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* config: root route should have empty matchers
Unmarshal should validate that the root route does
not contain any matchers. Prior to this change,
only the deprecated match structures were checked.
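A simplified sketch of this kind of validation, using a hypothetical `Route` struct with plain string fields rather than the real config types.

```go
package main

import (
	"errors"
	"fmt"
)

// Route is a hypothetical, simplified route. Match and MatchRE stand in for
// the deprecated structures, Matchers for the newer list form.
type Route struct {
	Match    map[string]string
	MatchRE  map[string]string
	Matchers []string
}

// validateRoot rejects a root route that carries matchers of any kind.
func validateRoot(r *Route) error {
	if len(r.Match) > 0 || len(r.MatchRE) > 0 || len(r.Matchers) > 0 {
		return errors.New("root route must not have any matchers")
	}
	return nil
}

func main() {
	fmt.Println(validateRoot(&Route{}))
	fmt.Println(validateRoot(&Route{Matchers: []string{`severity="critical"`}}))
}
```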
Signed-off-by: Philip Gough <philip.p.gough@gmail.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* chore: Let git ignore temporary files for ui/app
Signed-off-by: nekketsuuu <nekketsuuu@users.noreply.github.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* adding max_alerts parameter to slack webhook config
correcting the logic to truncate fields instead of dropping alerts in the slack integration
Signed-off-by: Prashant Balachandran <pnair@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* *: bump to Go 1.17 (#2792)
* *: bump to Go 1.17
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* *: fix yamllint errors
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Automate CSS-inlining for default HTML email template (#2798)
* Automate CSS-inlining for default HTML email template
The original HTML email template was added in `template/email.html`.
It looks like the CSS was manually inlined. Most likely using the
premailer.dialect.ca web form, which is mentioned in the README for
the Mailgun transactional-email-templates project. The resulting HTML
with inlined CSS was then copied into `template/default.tmpl`. This
has resulted in `email.html` and `default.tmpl` diverging at times.
This commit adds build automation to inline the CSS automatically
using [juice][1]. The Go template containing the resulting HTML has
been moved into its own file to avoid the script that performs the CSS
inlining having to parse the `default.tmpl` file to insert it there.
Fixes #1939.
[1]: https://www.npmjs.com/package/juice
Signed-off-by: Brad Ison <bison@xvdf.io>
* Update asset/assets_vfsdata.go
Signed-off-by: Brad Ison <bison@xvdf.io>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* go.{mod,sum}: update Go dependencies
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* amtool to support http_config to access alertmanager (#2764)
* Support http_config for amtool
Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: clyang82 <chuyang@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* notify/sns: detect FIFO topic based on the rendered value
Since the TopicARN field is a template string, it's safer to check for
the ".fifo" suffix in the rendered string.
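To illustrate why the rendered value matters, here is a sketch that uses the standard library's `text/template` as a stand-in for Alertmanager's own templating. The `isFIFO` helper, the `.Topic` field and the example ARN are assumptions for this example only.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// isFIFO renders the topic ARN template first and only then checks for the
// ".fifo" suffix, so a templated topic name is handled correctly.
func isFIFO(topicARNTmpl string, data any) (bool, error) {
	t, err := template.New("arn").Parse(topicARNTmpl)
	if err != nil {
		return false, err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return false, err
	}
	return strings.HasSuffix(buf.String(), ".fifo"), nil
}

func main() {
	tmpl := "arn:aws:sns:us-east-1:123456789012:{{ .Topic }}"
	// The raw template ends in "}}", so checking it directly would miss this.
	fmt.Println(strings.HasSuffix(tmpl, ".fifo"))
	fmt.Println(isFIFO(tmpl, map[string]string{"Topic": "alerts.fifo"}))
}
```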
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* config: delegate Sigv4 validation to the inner type
This change also adds unit tests for SNS configuration.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix unittests
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix comment about active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix another comment about active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Fix typo in documentation
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Co-authored-by: clyang82 <chuyang@redhat.com>
Co-authored-by: Mac Chaffee <me@macchaffee.com>
Co-authored-by: Philip Gough <philip.p.gough@gmail.com>
Co-authored-by: nekketsuuu <nekketsuuu@users.noreply.github.com>
Co-authored-by: Prashant Balachandran <pnair@redhat.com>
Co-authored-by: Simon Pasquier <pasquier.simon@gmail.com>
Co-authored-by: Brad Ison <brad.ison@redhat.com>
Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
* Add feature flag to enable discovery and use of public IPaddr for clustering.
Before this change, Alertmanager would refuse to start up if using an
advertise address binding to any address (0.0.0.0), and the host only
had an interface with a public IP address. After this change, a feature
flag permits the use of a discovered public address for cluster
gossiping.
Signed-off-by: Devin Trejo <dtrejo@palantir.com>
* Enable support for custom callbacks as part of maintenance
This enables support for custom Maintenance callbacks as part of the periodic maintenance of silences and notification logs.
Effectively a no-op for the Alertmanager, but it allows downstream implementations to inject custom logic as part of it.
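A rough sketch of how an optional callback can replace the default maintenance step. The `MaintenanceFunc` signature and the `maintain` helper here are assumptions for illustration and may not match the actual implementation.

```go
package main

import "fmt"

// MaintenanceFunc is a hypothetical callback that performs one maintenance
// pass and returns the number of bytes written.
type MaintenanceFunc func() (int64, error)

// maintain runs the override when one is supplied and the default
// otherwise, so passing nil keeps the existing behaviour.
func maintain(defaultFn, override MaintenanceFunc) (int64, error) {
	fn := defaultFn
	if override != nil {
		fn = override
	}
	return fn()
}

func main() {
	defaultFn := func() (int64, error) { return 128, nil }
	custom := func() (int64, error) { return 0, nil }

	fmt.Println(maintain(defaultFn, nil))    // default path
	fmt.Println(maintain(defaultFn, custom)) // downstream override
}
```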
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Add tests
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Fix tests and remove whitespace
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Address review comments
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* run go fmt
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Fix import ordering
Signed-off-by: gotjosh <josue.abreu@gmail.com>