In moving from a plain string to url.URL, we
were incorrectly setting the query string via the
path. The `?` signaling the start of the query
string was then escaped when the URL was
turned into a string.
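
A minimal sketch of the bug using net/url. The host, path and query
below are made up for illustration and are not taken from the patch.

```go
package main

import (
	"fmt"
	"net/url"
)

// viaPath puts the query string into Path, which is the mistake described above.
func viaPath() string {
	u := url.URL{Scheme: "http", Host: "localhost:9093", Path: "/api/v1/alerts?silenced=false"}
	return u.String()
}

// viaRawQuery keeps the path and the query in their own fields.
func viaRawQuery() string {
	u := url.URL{Scheme: "http", Host: "localhost:9093", Path: "/api/v1/alerts", RawQuery: "silenced=false"}
	return u.String()
}

func main() {
	fmt.Println(viaPath())     // the `?` is escaped as %3F
	fmt.Println(viaRawQuery()) // the `?` is emitted as the query separator
}
```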
Signed-off-by: Stuart Nelson <stuartnelson3@gmail.com>
... rather than in the Subscribe method. Currently the cleanup for a
given Alert subscription is done in a blocking goroutine, started in
the Subscribe method.
This simplifies it by moving the cleanup to the GC.
Additionally, it simplifies the Subscribe method by making the
buffered channel big enough to hold all pending alerts,
removing the need to start a goroutine in Subscribe at all.
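
A rough sketch of the idea, not the actual patch. The types `Alert`,
`Alerts` and `listener`, the `gc` method and the cancel function are
simplified stand-ins assumed for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Alert, listener and Alerts are simplified stand-ins for the real types.
type Alert struct{ Name string }

type listener struct {
	alerts chan *Alert
	done   chan struct{}
}

type Alerts struct {
	mtx       sync.Mutex
	pending   []*Alert
	listeners map[int]listener
	next      int
}

// Subscribe sizes the channel to hold every pending alert, so filling it
// never blocks and no goroutine is needed. The returned function marks the
// subscription as finished; it does not touch the listeners map itself.
func (a *Alerts) Subscribe() (<-chan *Alert, func()) {
	a.mtx.Lock()
	defer a.mtx.Unlock()

	ch := make(chan *Alert, len(a.pending))
	for _, al := range a.pending {
		ch <- al
	}
	done := make(chan struct{})
	if a.listeners == nil {
		a.listeners = map[int]listener{}
	}
	a.listeners[a.next] = listener{alerts: ch, done: done}
	a.next++

	var once sync.Once
	return ch, func() { once.Do(func() { close(done) }) }
}

// gc removes listeners whose done channel has been closed. In this sketch
// it would be called from the periodic garbage collection run.
func (a *Alerts) gc() {
	a.mtx.Lock()
	defer a.mtx.Unlock()
	for id, l := range a.listeners {
		select {
		case <-l.done:
			delete(a.listeners, id)
			close(l.alerts)
		default:
		}
	}
}

func main() {
	a := &Alerts{pending: []*Alert{{Name: "a"}, {Name: "b"}}}
	ch, cancel := a.Subscribe()
	fmt.Println("buffered alerts:", len(ch))
	cancel()
	a.gc()
	fmt.Println("listeners after gc:", len(a.listeners))
}
```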
Signed-off-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com>
TestAlertsSubscribePutStarvation tests starvation of `iterator.Close` and
`alerts.Put`. Both `Subscribe` and `Put` use the Alerts.mtx lock. `Subscribe`
needs it to subscribe and more importantly unsubscribe `Alerts.listeners`.
`Put` uses the lock to add additional alerts and iterate the `Alerts.listeners`
map. If the channel of a listener is at its limit, `alerts.Put` blocks while
holding the lock, so a listener cannot unsubscribe until `alerts.Put` returns.
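
The hazard can be reproduced in isolation. The sketch below uses a
hypothetical `alerts` type, not the real one, and relies on
`sync.Mutex.TryLock` (Go 1.18+) so that the demonstration reports the
contention instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// alerts is a hypothetical, stripped-down stand-in for the real store.
type alerts struct {
	mtx       sync.Mutex
	listeners map[int]chan string
}

// put sends to every listener while holding the lock. entered is closed
// once the lock has been acquired so that the demo is deterministic.
func (a *alerts) put(alert string, entered chan<- struct{}) {
	a.mtx.Lock()
	defer a.mtx.Unlock()
	close(entered)
	for _, ch := range a.listeners {
		ch <- alert // blocks while the listener's buffer is full
	}
}

// tryUnsubscribe removes a listener if the lock is free and reports
// whether it succeeded.
func (a *alerts) tryUnsubscribe(id int) bool {
	if !a.mtx.TryLock() {
		return false
	}
	defer a.mtx.Unlock()
	delete(a.listeners, id)
	return true
}

func demo() (blockedDuringPut, okAfterDrain bool) {
	ch := make(chan string, 1)
	ch <- "old" // the listener's buffer is now at its limit
	a := &alerts{listeners: map[int]chan string{1: ch}}

	entered := make(chan struct{})
	done := make(chan struct{})
	go func() {
		a.put("new", entered)
		close(done)
	}()

	<-entered // put holds the lock and cannot finish its send
	blockedDuringPut = !a.tryUnsubscribe(1)

	<-ch   // draining the buffer lets put finish
	<-done // put has released the lock
	okAfterDrain = a.tryUnsubscribe(1)
	return blockedDuringPut, okAfterDrain
}

func main() {
	blocked, ok := demo()
	fmt.Println("unsubscribe blocked during put:", blocked)
	fmt.Println("unsubscribe succeeded after drain:", ok)
}
```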
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
This adds compatibility with PagerDuty's Event rules feature, allowing resolve events to be routed based on attributes.
Fixes #1440
Signed-off-by: Mike Bryant <m@ocado.com>
`honnef.co/go/tools/cmd/staticcheck` complains with
`config/config_test.go:260:32: regular expression does not contain any
meta characters (SA6004)`. Instead of using a regular expression, this patch simply
switches to Go's `strings.Count` function.
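
For illustration, the two approaches side by side. The literal
`<secret>` and the input string are assumptions for this sketch and
are not copied from the test.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// countWithRegexp is the pattern staticcheck flags: the expression
// contains no meta characters, so a regexp is unnecessary.
func countWithRegexp(s string) int {
	return len(regexp.MustCompile("<secret>").FindAllStringIndex(s, -1))
}

// countWithStrings counts non-overlapping occurrences of the literal.
func countWithStrings(s string) int {
	return strings.Count(s, "<secret>")
}

func main() {
	cfg := "password: <secret>\ntoken: <secret>\n"
	fmt.Println(countWithRegexp(cfg), countWithStrings(cfg))
}
```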
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
* config: validate URLs at config load time
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Address Brian's and Lucas's comments
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Shallow copy of URL instead of reparsing it
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Unshadow net/url package
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Make a deep-copy of URL struct
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
- `tmplText` and `tmplHTML` use monad-style error handling [1].
This reduces the verbosity of the error logic, but introduces the risk
of forgetting the final error check. This patch does not remove this
coding style, but ensures proper error checking in the Email and
PagerDuty notifiers.
- Handle errors returned by `multipartWriter.Close()` and
`wc.Write(buffer.Bytes())` in `Email.Notify()`.
[1] https://www.innoq.com/en/blog/golang-errors-monads/
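
A simplified sketch of the pattern and of the final check that is easy
to forget. The `tmplText` below is a stand-in with an assumed
signature, and `render` is a hypothetical caller; neither is the code
from the notifiers.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// tmplText returns a render function that records the first error in
// *err and turns every later call into a no-op.
func tmplText(data interface{}, err *error) func(string) string {
	return func(text string) string {
		if *err != nil {
			return ""
		}
		t, perr := template.New("").Parse(text)
		if perr != nil {
			*err = perr
			return ""
		}
		var buf bytes.Buffer
		if xerr := t.Execute(&buf, data); xerr != nil {
			*err = xerr
			return ""
		}
		return buf.String()
	}
}

// render expands two templates without checking after each call, then
// performs the single final check the pattern depends on.
func render(subjectTmpl, bodyTmpl string, data interface{}) (subject, body string, err error) {
	tmpl := tmplText(data, &err)
	subject = tmpl(subjectTmpl)
	body = tmpl(bodyTmpl)
	if err != nil { // omitting this check silently yields empty strings
		return "", "", err
	}
	return subject, body, nil
}

func main() {
	data := map[string]string{"status": "firing", "name": "HighLatency"}
	fmt.Println(render("[{{ .status }}] {{ .name }}", "{{ .name }} is {{ .status }}", data))
	_, _, err := render("{{ .name", "never rendered", data)
	fmt.Println(err != nil)
}
```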
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
* Add support for adding alerts using amtool
Signed-off-by: Bob Shannon <bshannon@palantir.com>
* comment: Simplify return in addAlert
Signed-off-by: Bob Shannon <bshannon@palantir.com>
Alertmanager exits with a non-zero exit code if the initial cluster
join fails. This behavior may be undesirable because:
- As Alertmanager is a critical component with an at-least-once
guarantee, failing on joining the cluster is unnecessary as
Alertmanager still functions by itself.
- In an environment like Kubernetes discovering peers via DNS, peers
might roll out one-by-one, leaving the DNS entries unpopulated for the
first peer of a set. Failing on initial join prevents a roll-out.
Instead of failing on the initial join, this patch only logs the failure.
The cluster can be joined later via `handleReconnect`.
This is a regression introduced in PR #1456 [1].
[1] https://github.com/prometheus/alertmanager/pull/1456
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
* fix concurrent read and write group
Signed-off-by: denghuan <denghuan@actionsky.com>
* make lock more elegant
Signed-off-by: denghuan <denghuan@actionsky.com>
* amtool: add support for stdin to check-config
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Address Stuart's comment
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* cluster: make sure we don't miss the first pushPull
During the join, memberlist initiates a pushPull to get initial data.
Unfortunately, at this point the nflog and silence listener have not
been registered yet, so the first data arrives only after one pushPull
cycle (1 min by default!).
Signed-off-by: Corentin Chary <c.chary@criteo.com>
The memberlist library fails when it can't find a private address and no
advertise address is given. To return a helpful message to the user,
Alertmanager mimics the logic from memberlist. However, the code had a
bug that swallowed the error message and made it difficult for the user
to understand how to fix the problem.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
To ensure we include the breaking change notice in the next release
notes, this patch adds a 'Next release' section mentioning the breaking
change of the working directory of the Alertmanager Dockerfile.
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
The YAML strict mode doesn't allow mapping keys that are duplicates. If
someone wants to override one of the default keys in the Details hash,
the unmarshal function returns an error because the key is already
defined by DefaultPagerdutyConfig.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* cluster: prune the queue if too large
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Address review comments
Also increases the pruning interval to 15 minutes and the max queue size
to 4096 items (same value as used by Serf).
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Gossip large messages via SendReliable
For messages beyond half of the maximum gossip
packet size, send the message to all peer nodes
via TCP.
The choice of "larger than half the max gossip
size" is relatively arbitrary. From brief testing,
the overhead from memberlist on a packet seemed to
only use ~3 of the available 1400 bytes, and most
gossip messages seem to be <<500 bytes.
* Add tests for oversized/normal message gossiping
* Make oversize metric names consistent
* Remove errant printf in test
* Correctly increment WaitGroup
* Add comment for OversizedMessage func
* Add metric for oversized messages dropped
Code was added to drop oversized messages if the
buffered channel they are sent on is full. This
is a good thing to surface as a metric.
* Add counter for total oversized messages sent
* Change full queue log level to debug
Was previously a warning, which isn't necessary
now that there is a metric tracking it.
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>