- Added support for reading the Slack API URL from a file, both in the global configuration and at the receiver level
- Tried to follow the configuration patterns used in Prometheus
- The Slack API URL file is read on every request, as discussed in the linked issue, so that the URL can be switched seamlessly
https://github.com/prometheus/alertmanager/issues/2498
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
`WaitReady` is a blocking call, so it should accept a `Context` in order to
respond to cancellation of the notification pipeline, whatever the reason.
Signed-off-by: Steve Simpson <steve.simpson@grafana.com>
A `Peer`, as defined by the `cluster` package, represents the local node in
the cluster. Other packages use it to know the status of all members, or how
long to wait to find out whether a notification has already been sent.
In Cortex, we'd like to implement a slightly different way of clustering
(using gRPC for communication and a hash ring for node discovery).
This small change supports that by making the consuming packages depend on
an interface rather than on the concrete peer type.
Silences and notification channels don't need an interface, as they take
a `func([]byte) error` as a parameter.
Signed-off-by: gotjosh <josue@grafana.com>
PagerDuty Event API v2 [1] requires images to have an `src` property, and links
to have an `href` property.
This commit filters out images and links that don't satisfy those conditions,
to avoid getting an HTTP 400 error in response.
This also adds flexibility when using templates to configure images and links,
as it's now possible to omit images or links by letting the template return an
empty string for the `src` or `href` property, respectively.
[1]: https://developer.pagerduty.com/docs/events-api-v2/trigger-events/#context-properties
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
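The filtering step can be sketched as follows. It assumes the templates have already been rendered into the structs' fields. The struct and function names are illustrative, while the `src`, `alt`, `href` and `text` keys follow the Events API v2 `images` and `links` context properties.

```go
package main

// pagerDutyImage and pagerDutyLink mirror the shape of the Events API v2
// "images" and "links" entries; the Go names are illustrative.
type pagerDutyImage struct {
	Src  string `json:"src"`
	Alt  string `json:"alt,omitempty"`
	Href string `json:"href,omitempty"`
}

type pagerDutyLink struct {
	Href string `json:"href"`
	Text string `json:"text,omitempty"`
}

// filterImages drops images whose rendered src is empty, since PagerDuty
// rejects them with HTTP 400.
func filterImages(in []pagerDutyImage) []pagerDutyImage {
	out := make([]pagerDutyImage, 0, len(in))
	for _, img := range in {
		if img.Src != "" {
			out = append(out, img)
		}
	}
	return out
}

// filterLinks drops links whose rendered href is empty.
func filterLinks(in []pagerDutyLink) []pagerDutyLink {
	out := make([]pagerDutyLink, 0, len(in))
	for _, l := range in {
		if l.Href != "" {
			out = append(out, l)
		}
	}
	return out
}
```

With this approach, a template that evaluates to an empty `src` or `href` simply removes that entry from the payload.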
* notify: add Markdown support for WeChat
Signed-off-by: Jason Cooper <master@deamwork.com>
* docs: update WeChat receiver configuration document
Signed-off-by: Jason Cooper <master@deamwork.com>
* fix: check WeChat msgType, apply default if not present
Signed-off-by: Jason Cooper <master@deamwork.com>
* chore: remove unnecessary comment
Signed-off-by: Jason Cooper <master@deamwork.com>
* fix: simplify msgType process
Signed-off-by: Jason Cooper <master@deamwork.com>
* docs: update WeChat configuration documentation
Signed-off-by: Jason Cooper <master@deamwork.com>
* fix: apply error message suggestions
Signed-off-by: Jason Cooper <master@deamwork.com>
* test: add test for regex
Signed-off-by: Jason Cooper <master@deamwork.com>
* fix: WeChat message safe param
Signed-off-by: Jason Cooper <master@deamwork.com>
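The message-type check and default can be sketched as follows. This assumes the accepted types are `text` and `markdown` and that validation uses a regular expression, as the "add test for regex" commit suggests. The function and variable names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// validWeChatTypes lists the message types assumed to be accepted.
var validWeChatTypes = regexp.MustCompile(`^(text|markdown)$`)

// normalizeMessageType applies the default "text" when no type is set and
// rejects anything other than "text" or "markdown".
func normalizeMessageType(t string) (string, error) {
	if t == "" {
		return "text", nil
	}
	if !validWeChatTypes.MatchString(t) {
		return "", fmt.Errorf("WeChat message type %q does not match valid options %s", t, validWeChatTypes.String())
	}
	return t, nil
}
```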
By default the library implementing the back-off timer stops the timer
after 15 minutes. Since the code never checked the value returned by the
ticker, notification retries were executed without delay after the 15
minutes had elapsed (e.g. for `group_interval` greater than 15m).
This change ensures that the back-off timer never expires.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* notify: improve logs on notification errors
Alertmanager can experience occasional failures when sending
notifications to an external service. If the operation succeeds after
some retries, the `alertmanager_notifications_failed_total` metric
increases but nothing is logged (unless running with `log.level=debug`).
Hence an operator might receive an alert about notification failures but
wouldn't know which integration was failing.
With this change, notification failures are logged at the warning level.
To avoid log flooding, similar failures on retries aren't logged.
The log messages also include more information about the failing integration.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Log notify success at info level if it's a retry
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Allow limiting the maximum number of alerts in webhook
The webhook notifier is the only notifier that does not allow templating
on the Alertmanager side. Users who encounter occasional alert storms
(tens of thousands of alerts firing at once for the same group) have
reported that their webhook receivers cannot cope with the load caused by
the resulting large webhook messages (the alerting rules also contained
large annotations that can't be stripped away due to the lack of
templating). Reducing the group size wasn't an option either, so this
change proposes allowing the list of alerts sent in the webhook body to be
truncated to a configured maximum length. This assumes that if a group
receives, say, 20k alerts, you are fine receiving only 10k, because you
wouldn't be able to check them all anyway.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Change max_alerts to uint32
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Add truncatedAlerts field to webhook message
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Fix JSON struct tag
Signed-off-by: Julius Volz <julius.volz@gmail.com>
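The truncation and the `truncatedAlerts` field can be sketched as follows. It uses a hypothetical stand-in `alert` type and assumes that a limit of 0 means "no limit". The `uint32` parameter follows the "Change max_alerts to uint32" commit, while the function and type names are illustrative.

```go
package main

import "encoding/json"

// alert is a stand-in for the real alert type; only a name is kept here.
type alert struct {
	Name string `json:"name"`
}

// message is a hypothetical webhook payload that reports how many alerts
// were dropped under the "truncatedAlerts" key.
type message struct {
	Alerts          []alert `json:"alerts"`
	TruncatedAlerts uint64  `json:"truncatedAlerts"`
}

// truncateAlerts keeps at most maxAlerts alerts and reports how many were
// dropped. maxAlerts == 0 is treated as "no limit" (an assumption).
func truncateAlerts(maxAlerts uint32, alerts []alert) ([]alert, uint64) {
	if maxAlerts != 0 && uint64(len(alerts)) > uint64(maxAlerts) {
		return alerts[:maxAlerts], uint64(len(alerts)) - uint64(maxAlerts)
	}
	return alerts, 0
}

// buildMessage applies the limit and serializes the webhook body.
func buildMessage(maxAlerts uint32, alerts []alert) ([]byte, error) {
	kept, truncated := truncateAlerts(maxAlerts, alerts)
	return json.Marshal(message{Alerts: kept, TruncatedAlerts: truncated})
}
```

Reporting the dropped count lets the receiver tell a truncated group apart from a small one.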
* notify: don't use the global metrics registry
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Address Max's comment
Signed-off-by: Simon Pasquier <spasquie@redhat.com>