* fix: close SMTP submission correctly to handle errors
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
* lint
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
* comments
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
This commit removes the returned Id from the method
silences.Set(*pb.Silence), as it is redundant. The Id is still set even
when creating a silence fails. This will be fixed in a later change.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Mark muted groups
This commit updates TimeMuteStage and TimeActiveStage to mark groups
as muted when their alerts are muted by an active or mute time interval,
and to remove any existing markers when the alerts are outside all active
and mute time intervals.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Move unlock to defer
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* TimeMuter returns the names of time intervals
This commit updates the TimeMuter interface to also return the names
of the time intervals that muted the alerts.
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit rewrites the existing TestTimeActiveStage unit tests
to have complete isolation between test cases. Before this change,
each test case affected the state of the test cases that followed it.
The motivation behind this change is to make it easier to assert
that alerts have been marked as muted.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Add godot linter
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Remove extra line from LICENSE
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
Note that this does not stop exposing classic metrics. For now, it is
up to the scrape config to decide whether to keep the classic metrics
instead of the new ones, or to keep both.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* feat: add counter to track alerts dropped outside of time_intervals
Addresses: #3512
This adds a new counter metric `alertmanager_alerts_supressed_total`
that is incremented by `len(alerts)` when alerts are suppressed for
being outside of a time interval, i.e. inside a mute_time_intervals or
outside an active_time_intervals.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* test: add time interval suppression metric checks for notify
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* test: fix failure message log values in notifier
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* ref: address PR feedback for #3565
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* fix: track suppressed notifications metric for inhibit/silence
Based on PR feedback:
https://github.com/prometheus/alertmanager/pull/3565/files#r1393068026
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* fix: broken notifier tests
- fixed the metric count check to compare the suppression metric against
the difference between the input and output notifications of the
suppression stage; the comparison was previously inverted.
- stopped using `Reset()` to compare collection counts between the
multiple stages that are executed in `TestMuteStageWithSilences()`.
The intent was to compare a clean metric collection after each stage
execution, but the final stage, where all silences are lifted, results
in no metric being created in the test, causing
`prom_testutil.ToFloat64()` to panic. Changed to separate variables to
check counts between stages, taking prior counts into account.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* rename metric and add constants
Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
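The suppression count described above is the difference between the alerts that enter a stage and those that leave it. The sketch below works under stated assumptions: `Alert`, `muteStage`, and the map-based `suppressedCounter` are hypothetical stand-ins for a Prometheus counter with a reason label, and the reason string is illustrative:

```go
package main

import "fmt"

// Alert is a hypothetical, minimal alert type for this sketch.
type Alert struct{ Name string }

// suppressedCounter stands in for a Prometheus counter vector keyed by a
// "reason" label; a real implementation would use client_golang.
type suppressedCounter map[string]float64

func (c suppressedCounter) add(reason string, n int) { c[reason] += float64(n) }

// muteStage drops alerts for which muted returns true and increments the
// counter by the difference between the input and output alert counts.
func muteStage(alerts []Alert, muted func(Alert) bool, reason string, c suppressedCounter) []Alert {
	kept := make([]Alert, 0, len(alerts))
	for _, a := range alerts {
		if !muted(a) {
			kept = append(kept, a)
		}
	}
	if suppressed := len(alerts) - len(kept); suppressed > 0 {
		c.add(reason, suppressed)
	}
	return kept
}

func main() {
	c := suppressedCounter{}
	in := []Alert{{"a"}, {"b"}, {"c"}}
	silenced := map[string]bool{"b": true}

	out := muteStage(in, func(a Alert) bool { return silenced[a.Name] }, "silence", c)

	// Compare against the previous value instead of resetting the counter.
	before := c["silence"]
	out2 := muteStage(in, func(Alert) bool { return false }, "silence", c)

	fmt.Println(len(out), c["silence"], c["silence"]-before, len(out2))
}
```

Comparing a counter value before and after each stage, instead of resetting it, mirrors the test change above and avoids reading a series that was never created.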
* feat: implement webhook_url_file for discord
implements #3482
Signed-off-by: Philipp Born <git@pborn.eu>
* feat: implement webhook_url_file for msteams
implements #3536
Signed-off-by: Philipp Born <git@pborn.eu>
---------
Signed-off-by: Philipp Born <git@pborn.eu>
---------
Signed-off-by: Walther Lee <walther.lee@reddit.com>
Co-authored-by: Walther Lee <walther.lee@reddit.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
* Reflect Discord's max length message limits
Signed-off-by: Tomas Kozak <kozak@talko.cz>
* Fix log key name
Signed-off-by: Tomas Kozak <kozak@talko.cz>
---------
Signed-off-by: Tomas Kozak <kozak@talko.cz>
This commit adds debug logs to MuteStage that log when an alert
is muted. This can help operators find the root cause of missing
notifications when alerts are silenced by mistake, or on purpose but
then forgotten about.
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit updates Alertmanager to add a duration to the notify
success message. It complements the existing histogram by offering
fine-grained information about individual notification attempts. This
can be useful when debugging duplicate notifications, for example, when
the duration exceeds peer_timeout.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Refactor: Move `inTimeIntervals` from `notify` to `timeinterval`
There is no change in functionality here, and I've expanded test coverage for similar logic in both places.
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Add receiver name as a label to notify metrics
This commit adds a second label, the receiver name, to the notify
family of metrics (e.g. numTotalFailedNotifications). This allows
disambiguating which receiver is failing when there are many receivers
with the same integration type.
Signed-off-by: sinkingpoint <colin@quirl.co.nz>
* Gate receiver names behind a feature flag
Signed-off-by: sinkingpoint <colin@quirl.co.nz>
---------
Signed-off-by: sinkingpoint <colin@quirl.co.nz>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
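To illustrate why the extra label helps, here is a sketch that uses a map in place of a Prometheus counter vector. The `receiver_name` label key, the `failedCounter` type, and its flag field are hypothetical:

```go
package main

import "fmt"

// failedCounter stands in for a Prometheus counter vector; each key is a
// label set rendered as a string. Hypothetical, for illustration only.
type failedCounter struct {
	receiverNameLabel bool // hypothetical feature flag
	counts            map[string]float64
}

func newFailedCounter(receiverNameLabel bool) *failedCounter {
	return &failedCounter{receiverNameLabel: receiverNameLabel, counts: map[string]float64{}}
}

// inc records one failed notification. Without the flag, every receiver
// using the same integration shares a single series.
func (c *failedCounter) inc(integration, receiver string) {
	key := "integration=" + integration
	if c.receiverNameLabel {
		key += ",receiver_name=" + receiver
	}
	c.counts[key]++
}

func main() {
	for _, flag := range []bool{false, true} {
		c := newFailedCounter(flag)
		c.inc("slack", "team-a")
		c.inc("slack", "team-b")
		c.inc("slack", "team-b")
		fmt.Println(flag, c.counts)
	}
}
```

With the flag off, failures from `team-a` and `team-b` add to the same series. With it on, each receiver gets its own series.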
As described in the "More error types" section of the page below, the
Slack API can return errors with a 200 response code:
https://slack.dev/node-slack-sdk/web-api#handle-errors
This change adds parsing of the API response to extract error messages.
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
This commit updates notify.go to log the GroupKey and fingerprints
of an alert at the debug level, and just the GroupKey at the
warning level should the notify attempt fail.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Add msteams
Signed-off-by: Jack Zhang <jack4zhang@gmail.com>
---------
Signed-off-by: Jack Zhang <jack4zhang@gmail.com>
Signed-off-by: Jack <jack4zhang@gmail.com>
* add reason code to slack notifier
This uses the new error with reason to determine, based on the HTTP status code, the failure reason for the Slack integration.
partial #3231
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Add some tests
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Handle the error
Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* support loading webhook URL from a file
/cc #2498
Signed-off-by: Simon Rozet <me@simonrozet.com>
* notify/webhook: add test for reading url from file
Signed-off-by: Simon Rozet <me@simonrozet.com>
* notify/pushover: add tests for reading secrets from files
Signed-off-by: Simon Rozet <me@simonrozet.com>
---------
Signed-off-by: Simon Rozet <me@simonrozet.com>
* support loading pushover secrets from files
Add the user_key_file and token_file keys to the pushover config.
/cc https://github.com/prometheus/alertmanager/issues/2498
Signed-off-by: Simon Rozet <me@simonrozet.com>