* chore!: adopt log/slog, drop go-kit/log
The bulk of this change set was automated by the following script which
is being used to aid in converting the various exporters/projects to use
slog:
https://gist.github.com/tjhop/49f96fb7ebbe55b12deee0b0312d8434
This commit includes several changes:
- bumps exporter-toolkit to v0.13.1 for log/slog support
- updates golangci-lint deprecated configs
- enables sloglint linter
- removes old go-kit/log linter configs
- introduces some `if logger == nil { $newLogger }` additions to prevent
nil references
- converts cluster membership config to use a stdlib compatible slog
adapter, rather than creating a custom io.Writer for use as the
membership `logOutput` config
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* chore: address PR feedback
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
---------
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* Feat(discord):
Allow for custom username and avatar URLs to be set in discord notifications.
Add `username` and `avatar_url` to discord configuration, default empty string.
Re-implement #3821
Signed-off-by: Jeff Wong <awole20@gmail.com>
* Test the new fields
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* These are not templatable strings
Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------
Signed-off-by: Jeff Wong <awole20@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
Just to ensure this works correctly, as expected. I originally thought there was a bug with the shadowing of the `content` variable, but there isn't. To avoid further confusion, I have followed up on this document left by George: https://github.com/prometheus/alertmanager/pull/3555#discussion_r1398448423
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Before commit [1], the message parse mode was the same as the bot's
parse mode, so we didn't need to pass ParseMode whenever sending a
message. After that commit, if we don't pass SendOpts's ParseMode,
the library uses the default mode, which is the empty string "".
[1] 864bef4e4d
Signed-off-by: Kien Nguyen <kiennt2609@gmail.com>
* feat(3920): add msteamsv2 receiver
Signed-off-by: Simon Schneider <github@simon-schneider.eu>
* Don't use `fmt.Errorf` when there's no formatting required on `config/config.go`
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Don't use `fmt.Errorf` when there's no formatting required on `config/notifiers.go`
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Remove additional documentation steps
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* add more info to the documentation
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Change documentation links to convey the message better
Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------
Signed-off-by: Simon Schneider <github@simon-schneider.eu>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* fix: close SMTP submission correctly to handle errors
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
* lint
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
* comments
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
This commit removes the Id from the method silences.Set(*pb.Silence)
as it is redundant. The Id is still set even when creating a silence
fails. This will be fixed in a later change.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Mark muted groups
This commit updates TimeMuteStage and TimeActiveStage to mark groups
as muted when its alerts are muted by an active or mute time interval,
and remove any existing markers when outside all active and mute
time intervals.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Remove unlock to defer
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* TimeMuter returns the names of time intervals
This commit updates the TimeMuter interface to also return the names
of the time intervals that muted the alerts.
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit rewrites the existing TestTimeActiveStage unit tests
to have complete isolation between test cases. Before this change,
each test case affected the state of its subsequent tests.
The motivation behind this change is to make it easier to assert
that alerts have been marked as muted.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Add godot linter
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Remove extra line from LICENSE
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
Note that this does not stop showing classic metrics. For now, it is
up to the scrape config to decide whether to keep those instead, or
both.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* feat: add counter to track alerts dropped outside of time_intervals
Addresses: #3512
This adds a new counter metric `alertmanager_alerts_supressed_total`
that is incremented by `len(alerts)` when alerts are suppressed for
being outside of a time_interval, i.e. inside one of the
mute_time_intervals or outside of the active_time_intervals.
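
The counting rule described above can be sketched as follows. To keep the
example dependency-free, `counter` is a hypothetical stand-in for a Prometheus
counter, and `Alert` and `filterMuted` are illustrative names, not the actual
stage implementation:

```go
package main

import "fmt"

// counter is a hypothetical stand-in for a Prometheus counter.
type counter struct {
	value float64
}

// Add increases the counter by v.
func (c *counter) Add(v float64) { c.value += v }

// Alert is a simplified, hypothetical alert type.
type Alert struct {
	Name string
}

// filterMuted returns the alerts that should still be notified. When the
// time-interval check says the group is muted, every alert is dropped and
// the counter grows by len(alerts), not by one.
func filterMuted(alerts []Alert, muted bool, suppressed *counter) []Alert {
	if !muted {
		return alerts
	}
	suppressed.Add(float64(len(alerts)))
	return nil
}

func main() {
	suppressed := &counter{}
	alerts := []Alert{{Name: "HighLatency"}, {Name: "DiskFull"}}

	remaining := filterMuted(alerts, true, suppressed)
	fmt.Println(len(remaining), suppressed.value)
}
```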
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* test: add time interval suppression metric checks for notify
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* test: fix failure message log values in notifier
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* ref: address PR feedback for #3565
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* fix: track suppressed notifications metric for inhibit/silence
Based on PR feedback:
https://github.com/prometheus/alertmanager/pull/3565/files#r1393068026
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* fix: broken notifier tests
- fixed metric count check to properly check the diff between
input/output notifications from the suppression to compare to suppression
metric, was previously inverted to compare to how many notifications it
suppressed.
- stopped using `Reset()` to compare collection counts between the
multiple stages that are executed in `TestMuteStageWithSilences()`.
The intent was to compare a clean metric collection after each stage
execution, but the final stage, where all silences are lifted, results in
no metric being created in the test, causing `prom_testutil.ToFloat64()`
to panic. Changed to separate vars to check counts between each stage,
with care to consider prior counts.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* rename metric and add constants
Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
* feat: implement webhook_url_file for discord
implements #3482
Signed-off-by: Philipp Born <git@pborn.eu>
* feat: implement webhook_url_file for msteams
implements #3536
Signed-off-by: Philipp Born <git@pborn.eu>
---------
Signed-off-by: Philipp Born <git@pborn.eu>
---------
Signed-off-by: Walther Lee <walther.lee@reddit.com>
Co-authored-by: Walther Lee <walther.lee@reddit.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
* Reflect Discord's max length message limits
Signed-off-by: Tomas Kozak <kozak@talko.cz>
* Fix log key name
Signed-off-by: Tomas Kozak <kozak@talko.cz>
---------
Signed-off-by: Tomas Kozak <kozak@talko.cz>
This commit adds debug logs to MuteStage that log when an alert
is muted. This can help operators root cause missing notifications
when alerts are silenced by mistake or on purpose but then forgotten
about.
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit updates Alertmanager to add a duration to the notify
success message. It complements the existing histogram to offer
fine-grained information about notification attempts. This can be
useful when debugging duplicate notifications, for example, when
the duration exceeds peer_timeout.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Refactor: Move `inTimeIntervals` from `notify` to `timeinterval`
There's absolutely no change of functionality here and I've expanded coverage for similar logic in both places.
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Add receiver name as a label to notify metrics
This commit adds in a second label to the notify family of metrics
(e.g. numTotalFailedNotifications) - the receiver name. This allows
disambiguating which receiver is failing when one has many receivers
with the same integration type.
Signed-off-by: sinkingpoint <colin@quirl.co.nz>
* Gate receiver names behind a feature flag
Signed-off-by: sinkingpoint <colin@quirl.co.nz>
---------
Signed-off-by: sinkingpoint <colin@quirl.co.nz>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
As described in the "More error types" section below, Slack API can return
errors with a 200 response code:
https://slack.dev/node-slack-sdk/web-api#handle-errors
This change adds parsing of API response to extract error messages.
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
This commit updates notify.go to log the GroupKey and fingerprints
of an alert at the debug level, and just the GroupKey at the
warning level should the notify attempt fail.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Add msteams
Signed-off-by: Jack Zhang <jack4zhang@gmail.com>
---------
Signed-off-by: Jack Zhang <jack4zhang@gmail.com>
Signed-off-by: Jack <jack4zhang@gmail.com>
* add reason code to slack notifier
This uses the new error with reason to determine, based on the status code, what the failure reason is for the Slack integration.
partial #3231
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Add some tests
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Handle the error
Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>