* chore!: adopt log/slog, drop go-kit/log
The bulk of this change set was automated by the following script which
is being used to aid in converting the various exporters/projects to use
slog:
https://gist.github.com/tjhop/49f96fb7ebbe55b12deee0b0312d8434
This commit includes several changes:
- bump exporter-tookit to v0.13.1 for log/slog support
- updates golangci-lint deprecated configs
- enables sloglint linter
- removes old go-kit/log linter configs
- introduce some `if logger == nil { $newLogger }` additions to prevent
nil references
- converts cluster membership config to use a stdlib compatible slog
adapter, rather than creating a custom io.Writer for use as the
membership `logOutput` config
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* chore: address PR feedback
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
---------
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
This commit removes the Id from the method silences.Set(*pb.Silence)
as it is redundant. The Id is still set even when creating a silence
fails. This will be fixed in a later change.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Mark muted groups
This commit updates TimeMuteStage and TimeActiveStage to mark groups
as muted when its alerts are muted by an active or mute time interval,
and remove any existing markers when outside all active and mute
time intervals.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Remove unlock to defer
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit rewrites the existing TestTimeActiveStage unit tests
to have complete isolation between test cases. Before this change,
each test case affected the state of its subsequent tests.
The motivation behind this change is to make it easier to assert
that alerts have been marked as muted.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* feat: add counter to track alerts dropped outside of time_intervals
Addresses: #3512
This adds a new counter metric `alertmanager_alerts_supressed_total`
that is incremented by `len(alerts)` when an alert is suppressed for
being outside of a time_interval, ie inside of a mute_time_intervals or
outside of an active_time_intervals.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* test: add time interval suppression metric checks for notify
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* test: fix failure message log values in notifier
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* ref: address PR feedback for #3565
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* fix: track suppressed notifications metric for inhibit/silence
Based on PR feedback:
https://github.com/prometheus/alertmanager/pull/3565/files#r1393068026
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* fix: broken notifier tests
- fixed metric count check to properly check the diff between
input/output notifications from the suppression to compare to suppression
metric, was previously inverted to compare to how many notifications it
suppressed.
- stopped using `Reset()` to compare collection counts between the
multiple stages that are executed in `TestMuteStageWithSilences()`.
the intent was to compare a clean metric collection after each stage
execution, but the final stage where all silences are lifted results in
no metric being created in the test, causing `prom_testutil.ToFloat64()`
to panic. changed to separate vars to check counts between each stage,
with care to consider prior counts.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* rename metric and add constants
Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
---------
Signed-off-by: Walther Lee <walther.lee@reddit.com>
Co-authored-by: Walther Lee <walther.lee@reddit.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
* Refactor: Move `inTimeIntervals` from `notify` to `timeinterval`
There's absolutely no change of functionality here and I've expanded coverage for similar logic in both places.
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Add receiver name as a label to notify metrics
This commit adds in a second label to the notify family of metrics
(e.g. numTotalFailedNotifications) - the receiver name. This allows
disambiguating which receiver is failing when one has many receivers
with the same integration type
Signed-off-by: sinkingpoint <colin@quirl.co.nz>
* Gate receiver names behind a feature flag
Signed-off-by: sinkingpoint <colin@quirl.co.nz>
---------
Signed-off-by: sinkingpoint <colin@quirl.co.nz>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
It seems useless to keep the notifications in the nflog for longer than
twice the repeat interval. This should help reduce memory usage of
clustered alertmanagers.
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
* Add explicit UTC to time interval tests
Signed-off-by: Ben Ridley <benridley29@gmail.com>
* Add timezone support to time intervals
Signed-off-by: Ben Ridley <benridley29@gmail.com>
* Update time interval documentation with time zone info
Signed-off-by: Ben Ridley <benridley29@gmail.com>
* Refactor notification tests to test timezone support
Signed-off-by: Ben Ridley <benridley29@gmail.com>
* Make use of Local more clear
Signed-off-by: Ben Ridley <benridley29@gmail.com>
* Fix documentation about timezone support.
Makes it clear that the default is UTC, but others are supported.
Signed-off-by: Ben Ridley <benridley29@gmail.com>
* Remove commented/unused function
Signed-off-by: Ben Ridley <benridley29@gmail.com>
* Fix tests using incorrect timezones
Previously tests were using time zone names that were unsupported by the
RFC822 parser. This switches the tests to use RFC822Z and specifies the
zones by number.
Signed-off-by: Ben Ridley <benridley29@gmail.com>
* Add a few more timezone test cases
Signed-off-by: Ben Ridley <benridley29@gmail.com>
* Remove unnecessary if/else branch
Co-authored-by: Sylvain Rabot <sylvain@abstraction.fr>
Signed-off-by: Ben Ridley <benridley29@gmail.com>
* Rename timezone to location for consistency with Go stdlib
Signed-off-by: Ben Ridley <benridley29@gmail.com>
* Make Windows timezone error more specific
Signed-off-by: Ben Ridley <benridley29@gmail.com>
* Update docs to use 'location'
Signed-off-by: Ben Ridley <benridley29@gmail.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Rabot <sylvain@abstraction.fr>
Signed-off-by: Ben Ridley <benridley29@gmail.com>
Signed-off-by: Ben Ridley <benridley29@gmail.com>
Co-authored-by: Sylvain Rabot <sylvain@abstraction.fr>
This accurately reflects what the function _actually_ does. If no active silences IDs are provided and the list of inhibitions we have is already empty the alert is actually set to Active. Took me a while to realise this as I was understanding how do we populate the alert list.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* add active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix unittests for active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update dispatch/route.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* split the stage for active and mute intervals
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Adds doc for a helper function
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix code after commit suggestions
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Making mute_time_interval and time_intervals can coexist in the config
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* docs: configuration's doc has been updated about time intervals
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update config/config.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* updates configuration readme to improve active time description
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* merge deprecated mute_time_intervals and time_intervals
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update cmd/alertmanager/main.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update cmd/alertmanager/main.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fmt main.go
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix lint error
Signed-off-by: clyang82 <chuyang@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Document that matchers are ANDed together
Signed-off-by: Mac Chaffee <me@macchaffee.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Remove extra parentheticals
Signed-off-by: Mac Chaffee <me@macchaffee.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* config: root route should have empty matchers
Unmarshal should validate that the root route does
not contain any matchers. Prior to this change,
only the deprecated match structures were checked.
Signed-off-by: Philip Gough <philip.p.gough@gmail.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* chore: Let git ignore temporary files for ui/app
Signed-off-by: nekketsuuu <nekketsuuu@users.noreply.github.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* adding max_alerts parameter to slack webhook config
correcting the logic to trucate fields instead of dropping alerts in the slack integration
Signed-off-by: Prashant Balachandran <pnair@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* *: bump to Go 1.17 (#2792)
* *: bump to Go 1.17
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* *: fix yamllint errors
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Automate CSS-inlining for default HTML email template (#2798)
* Automate CSS-inlining for default HTML email template
The original HTML email template was added in `template/email.html`.
It looks like the CSS was manually inlined. Most likely using the
premailer.dialect.ca web form, which is mentioned in the README for
the Mailgun transactional-email-templates project. The resulting HTML
with inlined CSS was then copied into `template/default.tmpl`. This
has resulted in `email.html` and `default.tmpl` diverging at times.
This commit adds build automation to inline the CSS automatically
using [juice][1]. The Go template containing the resulting HTML has
been moved into its own file to avoid the script that performs the CSS
inlining having to parse the `default.tmpl` file to insert it there.
Fixes#1939.
[1]: https://www.npmjs.com/package/juice
Signed-off-by: Brad Ison <bison@xvdf.io>
* Update asset/assets_vfsdata.go
Signed-off-by: Brad Ison <bison@xvdf.io>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* go.{mod,sum}: update Go dependencies
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* amtool to support http_config to access alertmanager (#2764)
* Support http_config for amtool
Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: clyang82 <chuyang@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* notify/sns: detect FIFO topic based on the rendered value
Since the TopicARN field is a template string, it's safer to check for
the ".fifo" suffix in the rendered string.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* config: delegate Sigv4 validation to the inner type
This change also adds unit tests for SNS configuration.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix unittests
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix comment about active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix another comment about active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Fix typo in documentation
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Co-authored-by: clyang82 <chuyang@redhat.com>
Co-authored-by: Mac Chaffee <me@macchaffee.com>
Co-authored-by: Philip Gough <philip.p.gough@gmail.com>
Co-authored-by: nekketsuuu <nekketsuuu@users.noreply.github.com>
Co-authored-by: Prashant Balachandran <pnair@redhat.com>
Co-authored-by: Simon Pasquier <pasquier.simon@gmail.com>
Co-authored-by: Brad Ison <brad.ison@redhat.com>
Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
Previously, if a pending silence existed for an alert, and it later
became active without any silences getting added in the meantime, we
would miss the existence of that newly active silence.
Signed-off-by: beorn7 <beorn@grafana.com>
* notify: don't use the global metrics registry
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Address Max's comment
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Instead of keeping all notifiers in the notify package, it splits them
into individual sub-packages. This improves readability and
maintainability of the code.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Add version tracking of silences states. Adding a silence to the state
increments the version. If the version hasn't changed since the last
time an alert was checked for being silenced, we only have to verify
that the relevant silences are still active rather than checking the
alert against all silences.
Signed-off-by: beorn7 <beorn@soundcloud.com>
This encapsulates the logic of querying and marking silenced
alerts. It removes the code duplication flagged earlier.
I removed the error returned by the setAlertStatus function as we were
only logging it, and that's already done anyway when the error is
received from the `silence.Query` call (now in the `Mutes` method).
Signed-off-by: beorn7 <beorn@soundcloud.com>
Instead of registering marker metrics inside of
cmd/alertmanager/main.go, register them in types/types.go, encapsulating
marker specific logic in its module, not in main.go. In addition it
paves the path for removing the usage of the global metric registry in
the future, by taking a local metric registerer.
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
* notify: notify resolved alerts properly
The PR #1205 while fixing an existing issue introduced another bug when
the send_resolved flag of the integration is set to true.
With send_resolved set to false, the semantics remain the same:
AlertManager generates a notification when new firing alerts are added
to the alert group. The notification only carries firing alerts.
With send_resolved set to true, AlertManager generates a notification
when new firing or resolved alerts are added to the alert group. The
notification carries both the firing and resolved notifications.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Fix comments
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
After the initial notification has been sent, AlertManager shouldn't notify the
receiver again when no new alerts have been added to the group during
group_interval.
This change also modifies the acceptance test framework to assert that no
notification has been received in a given interval.
Turn the GroupKey into a string that is composed of the matchers if the
path in the routing tree and the grouping labels.
Only hash it at the very end to ensure we don't exceed size limits of
integration APIs.
Building a hash over an entire set of alerts causes problems, because
the hash differs, on any change, whereas we only want to send
notifications if the alert and it's state have changed. Therefore this
introduces a list of alerts that are active and a list of alerts that
are resolved. If the currently active alerts of a group are a subset of
the ones that have been notified about before then they are
deduplicated. The resolved notifications work the same way, with a
separate list of resolved notifications that have already been sent.