* Add msteams
Signed-off-by: Jack Zhang <jack4zhang@gmail.com>
---------
Signed-off-by: Jack Zhang <jack4zhang@gmail.com>
Signed-off-by: Jack <jack4zhang@gmail.com>
It seems useless to keep the notifications in the nflog for longer than
twice the repeat interval. This should help reduce memory usage of
clustered alertmanagers.
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
This is less confusing to read if the receiver stages are constructed in the
same order as they are used in the pipeline below.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* add active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix unittests for active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update dispatch/route.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* split the stage for active and mute intervals
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Adds doc for a helper function
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update notify/notify.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix code after commit suggestions
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Making mute_time_interval and time_intervals can coexist in the config
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* docs: configuration's doc has been updated about time intervals
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update config/config.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* updates configuration readme to improve active time description
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* merge deprecated mute_time_intervals and time_intervals
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update cmd/alertmanager/main.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update cmd/alertmanager/main.go
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fmt main.go
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix lint error
Signed-off-by: clyang82 <chuyang@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Document that matchers are ANDed together
Signed-off-by: Mac Chaffee <me@macchaffee.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Remove extra parentheticals
Signed-off-by: Mac Chaffee <me@macchaffee.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* config: root route should have empty matchers
Unmarshal should validate that the root route does
not contain any matchers. Prior to this change,
only the deprecated match structures were checked.
Signed-off-by: Philip Gough <philip.p.gough@gmail.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* chore: Let git ignore temporary files for ui/app
Signed-off-by: nekketsuuu <nekketsuuu@users.noreply.github.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* adding max_alerts parameter to slack webhook config
correcting the logic to trucate fields instead of dropping alerts in the slack integration
Signed-off-by: Prashant Balachandran <pnair@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* *: bump to Go 1.17 (#2792)
* *: bump to Go 1.17
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* *: fix yamllint errors
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Automate CSS-inlining for default HTML email template (#2798)
* Automate CSS-inlining for default HTML email template
The original HTML email template was added in `template/email.html`.
It looks like the CSS was manually inlined. Most likely using the
premailer.dialect.ca web form, which is mentioned in the README for
the Mailgun transactional-email-templates project. The resulting HTML
with inlined CSS was then copied into `template/default.tmpl`. This
has resulted in `email.html` and `default.tmpl` diverging at times.
This commit adds build automation to inline the CSS automatically
using [juice][1]. The Go template containing the resulting HTML has
been moved into its own file to avoid the script that performs the CSS
inlining having to parse the `default.tmpl` file to insert it there.
Fixes#1939.
[1]: https://www.npmjs.com/package/juice
Signed-off-by: Brad Ison <bison@xvdf.io>
* Update asset/assets_vfsdata.go
Signed-off-by: Brad Ison <bison@xvdf.io>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* go.{mod,sum}: update Go dependencies
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* amtool to support http_config to access alertmanager (#2764)
* Support http_config for amtool
Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: clyang82 <chuyang@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* notify/sns: detect FIFO topic based on the rendered value
Since the TopicARN field is a template string, it's safer to check for
the ".fifo" suffix in the rendered string.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* config: delegate Sigv4 validation to the inner type
This change also adds unit tests for SNS configuration.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix unittests
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix comment about active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* fix another comment about active time interval
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Fix typo in documentation
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
* Update docs/configuration.md
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Sinuhe Tellez <dubyte@gmail.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Co-authored-by: clyang82 <chuyang@redhat.com>
Co-authored-by: Mac Chaffee <me@macchaffee.com>
Co-authored-by: Philip Gough <philip.p.gough@gmail.com>
Co-authored-by: nekketsuuu <nekketsuuu@users.noreply.github.com>
Co-authored-by: Prashant Balachandran <pnair@redhat.com>
Co-authored-by: Simon Pasquier <pasquier.simon@gmail.com>
Co-authored-by: Brad Ison <brad.ison@redhat.com>
Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
WaitReady is a blocking call and so should accept a Context in order to
be responsive to cancellation of the notification pipeline for any reason.
Signed-off-by: Steve Simpson <steve.simpson@grafana.com>
A Peer as defined by the `cluster` package represents the node in the
cluster. It is used in other packages to know the status of all of the
members or how long should we wait to know if a notification has already fired.
In Cortex, we'd like to implement a slightly different way of
clustering (using gRPC for communication and a
hash ring for node discovery).
This is a small change to support that by changing the consumer of other
packages to an interface.
Silences and Notification channels don't need an interface as they take
a `func([]byte) error` as a parameter.
Signed-off-by: gotjosh <josue@grafana.com>
By default the library implementing the back-off timer stops the timer
after 15 minutes. Since the code never checked the value returned by the
ticker, notification retries were executed without delay after the 15
minutes had elapsed (e.g. for `group_interval` greater than 15m).
This change ensures that the back-off timer never expires.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* notify: improve logs on notification errors
Alertmanager can experience occasional failures when sending
notifications to an external service. If the operation succeeds after
some retry, the 'alertmanager_notifications_failed_total' metric
increases but nothing is logged (unless running with log.level=debug).
Hence an operator might receive an alert about notification failures but
wouldn't know which integration was failing.
With this change, notification failures are logged at the warning level.
To avoid log flooding, similar failures on retries aren't logged.
Additional information on the failing integration has also been added.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Log notify success at info level if it's a retry
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* notify: don't use the global metrics registry
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Address Max's comment
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* notify/email: wrap all errors for easier debugging
In addition, this commit passes the current context to the TCP dialer
and it doesn't log any QUIT errors if the email delivery wasn't
successful.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Fix typo
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Instead of keeping all notifiers in the notify package, it splits them
into individual sub-packages. This improves readability and
maintainability of the code.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
This encapsulates the logic of querying and marking silenced
alerts. It removes the code duplication flagged earlier.
I removed the error returned by the setAlertStatus function as we were
only logging it, and that's already done anyway when the error is
received from the `silence.Query` call (now in the `Mutes` method).
Signed-off-by: beorn7 <beorn@soundcloud.com>
alertmanager_notifications_total is increased only on
successful notifications, and not on any attempted
notification as the current description points
Signed-off-by: Chris Mark <chrismarkou92@gmail.com>
* notify: notify resolved alerts properly
The PR #1205 while fixing an existing issue introduced another bug when
the send_resolved flag of the integration is set to true.
With send_resolved set to false, the semantics remain the same:
AlertManager generates a notification when new firing alerts are added
to the alert group. The notification only carries firing alerts.
With send_resolved set to true, AlertManager generates a notification
when new firing or resolved alerts are added to the alert group. The
notification carries both the firing and resolved notifications.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Fix comments
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Improve notification instrumentation
- Add notificationLatencySeconds histogram to
debug duplicate messages. This can help rule out
if duplicate messages are being caused by
excessive latency when sending a notification.
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
* Wait for the gossip to settle before sending notifications
See #1209 for details.
As an heuristic for mesh readyness, try to see if
the mesh looks stable (the number of peers isn't changing too much).
This implementation always mark the altermanager as ready after a maximum of 60s.
This adds one new flags to control this behavior:
```
--cluster.settle-timeout=60s mesh settling timeout. Do not wait more than this duration on startup.
```
It also adds `/-/ready` which always return 200 (in order to make it clear
that we are ready as soon as we can receive requests).
The mesh status is exposed in `/api/v1/status` and visible on `/#/status`.
* cluster: fix typos and base interval on gossipInterval
After the initial notification has been sent, AlertManager shouldn't notify the
receiver again when no new alerts have been added to the group during
group_interval.
This change also modifies the acceptance test framework to assert that no
notification has been received in a given interval.