* try a more complicated but clearer approach explicitly returning a no-auth stmp.Auth when no username is supplied in config
Signed-off-by: Jo Walsh <jowalsh@bgs.ac.uk>
* fix test to expect no error from auth if username is not supplied
Signed-off-by: Jo Walsh <jowalsh@bgs.ac.uk>
* clean up some formatting errors in surplus comments
Signed-off-by: Jo Walsh <jowalsh@bgs.ac.uk>
* keep noAuth / loginAuth functions all together
Signed-off-by: Jo Walsh <jowalsh@bgs.ac.uk>
* Address latest comments
Co-Authored-By: Jo Walsh <jowalsh@bgs.ac.uk>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Add version tracking of silences states. Adding a silence to the state
increments the version. If the version hasn't changed since the last
time an alert was checked for being silenced, we only have to verify
that the relevant silences are still active rather than checking the
alert against all silences.
Signed-off-by: beorn7 <beorn@soundcloud.com>
This encapsulates the logic of querying and marking silenced
alerts. It removes the code duplication flagged earlier.
I removed the error returned by the setAlertStatus function as we were
only logging it, and that's already done anyway when the error is
received from the `silence.Query` call (now in the `Mutes` method).
Signed-off-by: beorn7 <beorn@soundcloud.com>
Instead of registering marker metrics inside of
cmd/alertmanager/main.go, register them in types/types.go, encapsulating
marker specific logic in its module, not in main.go. In addition it
paves the path for removing the usage of the global metric registry in
the future, by taking a local metric registerer.
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
* Support adding custom fields to VictorOps notifications
* Response to feedback
* Added logic to validate victorops custom fields to config load time
* Cleanup victorops notifier of logic duplicated in config check
* rebase and further cleanup from feedback
* another grammer fix
Signed-off-by: Jason Roberts <jroberts@drud.com>
When creating a `NewClient`, pass only the hostname
as value for the `host` parameter, instead of `n.conf.Smarthost`,
which is hostname port.
This is the same as `smtp.Dial` does, which is called in
the else-branch.
Signed-off-by: Claudio Kressibucher <ckressibucher@graphicworks.ch>
alertmanager_notifications_total is increased only on
successful notifications, and not on any attempted
notification as the current description points
Signed-off-by: Chris Mark <chrismarkou92@gmail.com>
* Add support for images and links in the PagerDuty notification config
This only works when using the v2 API. It supports template values for all fields.
Signed-off-by: Sean Houghton <sean.houghton@activision.com>
Errcheck [1] enforces error handling accross all go files. Functions can
be excluded via `scripts/errcheck_excludes.txt`.
This patch adds errcheck to the `test` Make target.
[1] https://github.com/kisielk/errcheck
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
In moving from a plain string to url.URL, we
incorrectly were setting the query string via the
path. The `?` signaling the start of the query
string would then be escaped when the URL was
turned into a string.
Signed-off-by: Stuart Nelson <stuartnelson3@gmail.com>
This adds compatiblity with PagerDuty's Event rules feature, allowing resolve events to be routed based on attributes
Fixes#1440
Signed-off-by: Mike Bryant <m@ocado.com>
* config: validate URLs at config load time
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Address Brian and Lucas comments
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Shallow copy of URL instead of reparsing it
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Unshadow net/url package
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Make a deep-copy of URL struct
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
- `tmplText` and `tmplHTML` are using a monad-style error handling [1].
This reduces the verbosity of the error logic, but introduces the risk
of forgetting the final error check. This patch does not remove this
coding-style, but ensures proper error checking in the Email and
PagerDuty notifier.
- Ensure to handle errors returned by `multipartWriter.Close()` and
`wc.Write(buffer.Bytes())` in `Email.Notify()`.
[1] https://www.innoq.com/en/blog/golang-errors-monads/
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
* notify: notify resolved alerts properly
The PR #1205 while fixing an existing issue introduced another bug when
the send_resolved flag of the integration is set to true.
With send_resolved set to false, the semantics remain the same:
AlertManager generates a notification when new firing alerts are added
to the alert group. The notification only carries firing alerts.
With send_resolved set to true, AlertManager generates a notification
when new firing or resolved alerts are added to the alert group. The
notification carries both the firing and resolved notifications.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Fix comments
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Improve notification instrumentation
- Add notificationLatencySeconds histogram to
debug duplicate messages. This can help rule out
if duplicate messages are being caused by
excessive latency when sending a notification.
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
* Wait for the gossip to settle before sending notifications
See #1209 for details.
As an heuristic for mesh readyness, try to see if
the mesh looks stable (the number of peers isn't changing too much).
This implementation always mark the altermanager as ready after a maximum of 60s.
This adds one new flags to control this behavior:
```
--cluster.settle-timeout=60s mesh settling timeout. Do not wait more than this duration on startup.
```
It also adds `/-/ready` which always return 200 (in order to make it clear
that we are ready as soon as we can receive requests).
The mesh status is exposed in `/api/v1/status` and visible on `/#/status`.
* cluster: fix typos and base interval on gossipInterval
See #1223, looks like OpsGenie now sometimes returns a 422 when you
don't specify a team. This change cleans up the JSON output and
add a few unit tests.
After the initial notification has been sent, AlertManager shouldn't notify the
receiver again when no new alerts have been added to the group during
group_interval.
This change also modifies the acceptance test framework to assert that no
notification has been received in a given interval.
* WECHAT support by ybyang2/berlinsaint
* correct the whitespace
* add some TestFile and modify some naming errors by ybyang2/berlinsaint
* modify wechat retry test expect
* template error
* add newline
Signed-off-by: yb_home <berlinsaint@126.com>
* fmt some pr code
* use the @stuartnelson3 the test-ci-wechat bingdata.go
* notify go add wechat
The PagerDuty Events API (v1), used by integrations with monitoring tools, will continue to be supported. There are currently no plans to deprecate, end support for or sunset it.
The end-of-support notice cited in the log message removed applies only to the *REST API* version 1, which PagerDuty will no longer support as of February 2018, but which Prometheus does not use.
* Template Source field in pagerduty payload
As a sane default we link to alertmanager, but
leave templating available to the user if
something suits their system better.
* Don't send parts with empty templates.
* Add a MIME-Version: 1.0 header.
* Place text/html part last, as parts are supposed to be in increasing preference order.
* added email notification text content support (configuration text)
added default email notification text content default value empty
converted email notification from html to multipart/alternatives with support to both html (first) and text (last)
ignore intellij IDE .idea working folder
* removed specific editor .gitignore entries
* renamed TEXT to Text as it's not an acronym
added TODO note to refactor multipart code to use standard go library
* refactored to use standard go mime/multipart library for text and html parts and bounderies
* use multipart createPart returned writer
added error handling while creating parts
removed unnecessary quotes from boundry string
* removed unnecessary comments
* Support for custom SMTP hello string
Some MTAs insist that they be greeted with a fully qualified domain
name. The default provided by the net/smtp library, "HELLO localhost",
is not sufficient and will result in rejected messages.
This changeset adds a new configuration option that allows the
alertmanager to do its job in such an environment.
* Test SMTPHello parsing
Turn the GroupKey into a string that is composed of the matchers if the
path in the routing tree and the grouping labels.
Only hash it at the very end to ensure we don't exceed size limits of
integration APIs.
Building a hash over an entire set of alerts causes problems, because
the hash differs, on any change, whereas we only want to send
notifications if the alert and it's state have changed. Therefore this
introduces a list of alerts that are active and a list of alerts that
are resolved. If the currently active alerts of a group are a subset of
the ones that have been notified about before then they are
deduplicated. The resolved notifications work the same way, with a
separate list of resolved notifications that have already been sent.
Resolved alerts, even when filtered, have to end up in the
SetNotifiesStage, otherwise when an alert fires again it is ambiguous
whether it was resolved in between or not.
fixes#523
The retry flag allows an integration to specify whether a retry can
potentially be solved or if the error is likely not going to recover.
For example invalid authentication is likely a wrong configuration and
therefore a retry would not make sense, while a server error is likely
a temporary problem and can potentially be solved on the next retry.
This commit replaces the previous NotifyInfo provider with the new
nflog package. It needs adjustments in the behavior of the deduping
stage.
The nflog stores notification digests per receiver per alert aggregation
group rather than one entry for alert per receiver. This drastically
reduces the number of entries and removes interference
across aggregation groups.
This commit directly adds the nflogpb.Receiver object to stage
objects at stage creation time. Hence, we no longer rely on a value from
within the context.
This string value is initially used to store a receiver name. It is
later overloaded with a unique string identifier of <name, integration,
index>.
This renaming is in preparation to separate the two and use the Receiver
object of the nflogpb package.
This commit implements a wait period before actually dispatching
notifications. The backoff linearly depends on the UID order of
participating peers.
This gives the gossip state time to catch up and avoids duplicate
notifications while ensuring that every peer notifies eventually.
This commit removes the dependency on model.Silence for the internal
Silence type, uses UUIDs instead of uint64s and clarifies invariants
around timestamp handling.
The created_at timestamp is removed for the time being.
Add default VictorOpsAPIURL
Add VictorOps default config
Add VictorOpsConfig struct in notifiers
Add new template tags for victorops
Add notifications logic for victorops
Compiled template tags with make assets
Remove common labels from entity_id template
Set messageType default value to CRITICAL
Recovery messageType is not configurable anymore. Firing state only allows specific keys
Make assets
Using log.Debugf
EntityID should not be configureable
Remove entity_id from template
Use GroupKey(ctx) as entity_id
Improve debug logging
Fix type of entity_id