Commit Graph

146 Commits

Author SHA1 Message Date
Max Inden
573389a9bb
Merge pull request #1623 from simonpasquier/add-test-apiv2
test: add acceptance test for firing alerts with EndsAt
2018-11-18 16:32:59 +01:00
Simon Pasquier
2ea37af92c test: add acceptance test for firing alerts with EndsAt
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-11-15 16:37:41 +01:00
Max Leonard Inden
b4b8b750df
api/v2/openapi.yaml: Differentiate between post and get silence
Instead of having one general silence, differentiate between postable
and gettable silence, hence making more fields required.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-11-15 16:21:07 +01:00
Max Leonard Inden
e4e053b18e
ui: Move /status & /silences to API v2
This patch makes the Alertmanager UI (/status & /silences) use the
api/v2 endpoint. In addition it adds logic to generate the elm side data
model based on the OpenAPI specification.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-11-15 13:24:26 +01:00
Simon Pasquier
306fd73e32 *: remove use of golang.org/x/net/context
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-11-09 10:00:23 +01:00
Max Leonard Inden
d123cbe696
test: Enable testing against cluster of Alertmanagers
Instead of only testing single instance Alertmanagers, this patch
enables individual tests to spin up Alertmanager clusters.

In addition it adds two tests:

1. A test firing alerts against a cluster, expecting to only receive a
notification by one of the Alertmanager instances in the cluster.

2. A test firing alerts both against a single instance as well as a
cluster, making sure the outputs are equal.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-10-24 15:59:36 +02:00
Simon Pasquier
460b7a72fc test: Don't run TestResolved() in parallel and reduce to 2 runs (#1544)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-09-11 12:55:48 +02:00
Max Leonard Inden
f1b920bcc9
api: Implement OpenAPI generated Alertmanager API V2
The current Alertmanager API v1 is undocumented and written by hand.
This patch introduces a new Alertmanager API - v2. The API is fully
generated via an OpenAPI 2.0 [1] specification (see
`api/v2/openapi.yaml`) with the exception of the HTTP handlers themselves.

Pros:
- Generated server code
- Ability to generate clients in all major languages
  (Go, Java, JS, Python, Ruby, Haskell, *elm* [3] ...)
    - Strict contract (OpenAPI spec) between server and clients.
    - Instant feedback on frontend-breaking changes, due to strictly
      typed frontend language elm.
- Generated documentation (See Alertmanager online Swagger UI [4])

Cons:
- Dependency on the OpenAPI ecosystem including go-swagger [2]

In addition this patch includes the following changes.

- README.md: Add API section

- test: Duplicate acceptance test to API v1 & API v2 version

  The Alertmanager acceptance test framework has decent test coverage
  of the Alertmanager API. Introducing the Alertmanager API v2 does not go
  hand in hand with deprecating API v1. They should live alongside each
  other for a couple of minor Alertmanager versions.

  Instead of porting the acceptance test framework to use the new API v2,
  this patch duplicates the acceptance tests, one using the API v1, the
  other API v2.

  Once API v1 is removed we can simply remove `test/with_api_v1` and bring
  `test/with_api_v2` to `test/`.

[1]
https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md

[2] https://github.com/go-swagger/go-swagger/

[3] https://github.com/ahultgren/swagger-elm

[4]
http://petstore.swagger.io/?url=https://raw.githubusercontent.com/mxinden/alertmanager/apiv2/api/v2/openapi.yaml

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-09-04 13:38:34 +02:00
Max Leonard Inden
1219541184
*.go: Introduce errcheck enforcing error handling
Errcheck [1] enforces error handling across all Go files. Functions can
be excluded via `scripts/errcheck_excludes.txt`.

This patch adds errcheck to the `test` Make target.

[1] https://github.com/kisielk/errcheck

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-08-30 15:47:13 +02:00
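A minimal sketch of the kind of call errcheck is meant to catch. The helper `removeIfExists` and the file name are hypothetical and not taken from the Alertmanager code base; the point is only that the error returned by `os.Remove` is inspected instead of being dropped.

```go
package main

import (
	"fmt"
	"os"
)

// removeIfExists is a hypothetical helper. Writing a bare `os.Remove(path)`
// here would drop the returned error, which is what errcheck reports.
func removeIfExists(path string) error {
	if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
		return err
	}
	return nil
}

func main() {
	// A missing file is treated as success; other errors are returned.
	if err := removeIfExists("errcheck-sketch-missing-file"); err != nil {
		fmt.Println("remove failed:", err)
		return
	}
	fmt.Println("ok")
}
```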
Julius Volz
6d0edbe630 Fix a bunch of unhandled errors (#1501)
...as discovered by "gosec" (many other ones reported, but not all make
a lot of sense to fix).

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-05 15:38:25 +02:00
Simon Pasquier
b7d891cf39 notify: notify resolved alerts properly (#1408)
* notify: notify resolved alerts properly

PR #1205, while fixing an existing issue, introduced another bug when
the send_resolved flag of the integration is set to true.

With send_resolved set to false, the semantics remain the same:
AlertManager generates a notification when new firing alerts are added
to the alert group. The notification only carries firing alerts.

With send_resolved set to true, AlertManager generates a notification
when new firing or resolved alerts are added to the alert group. The
notification carries both the firing and resolved alerts.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Fix comments

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-06-08 11:37:38 +02:00
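The send_resolved semantics described in the commit above can be sketched roughly as follows. The `Alert` type and the `alertsToNotify` function are hypothetical simplifications for illustration, not the actual notify pipeline.

```go
package main

import "fmt"

// Alert is a hypothetical, simplified alert; the real Alertmanager types differ.
type Alert struct {
	Name     string
	Resolved bool
}

// alertsToNotify returns the alerts a notification would carry.
// With sendResolved false only firing alerts are kept; with
// sendResolved true both firing and resolved alerts are kept.
func alertsToNotify(alerts []Alert, sendResolved bool) []Alert {
	var out []Alert
	for _, a := range alerts {
		if a.Resolved && !sendResolved {
			continue
		}
		out = append(out, a)
	}
	return out
}

func main() {
	group := []Alert{
		{Name: "HighLatency", Resolved: false},
		{Name: "DiskFull", Resolved: true},
	}
	fmt.Println(len(alertsToNotify(group, false))) // 1
	fmt.Println(len(alertsToNotify(group, true)))  // 2
}
```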
stuart nelson
80f2eeb2ca
Fix resolved alerts still inhibiting (#1331)
* inhibit: update inhibition cache when alerts resolve

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* inhibit: remove unnecessary fmt.Sprintf

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* inhibit: add unit tests

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* inhibit: use NopLogger in tests

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Update old alert with result of merge with new

On ingest, alerts with matching fingerprints are
merged if the new alert's start and end times
overlap with the old alert's.

The merge creates a new alert, which is then
updated in the internal alert store.

The original alert is not updated (because merge
creates a copy), so it is never marked as resolved
in the inhibitor's reference to it.

The code within the inhibitor relies on skipping
over resolved alerts, but because the old alert is
never updated it is never marked as resolved. Thus
it continues to inhibit other alerts until it is
cleaned up by the internal GC.

This commit updates the struct of the old alert
with the result of the merge with the new alert.

An alternative would be to always update the
inhibitor's internal cache of alerts regardless of
an alert's resolve status.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Update inhibitor cache even if alert is resolved

This seems like a better choice than the previous
commit. I think it is more sane to have the
inhibitor update its own cache, rather than having
one of its pointers updated externally.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-04-18 16:26:04 +02:00
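The stale-pointer problem described in the commit above, and the fix it settles on, can be sketched as follows. All types and names here (`Alert`, `merge`, `Inhibitor`) are hypothetical stand-ins for illustration, not the real inhibit package.

```go
package main

import "fmt"

// Alert is a hypothetical, simplified alert keyed by fingerprint.
type Alert struct {
	Fingerprint uint64
	Resolved    bool
}

// merge returns a new value rather than mutating old, mirroring the
// copy-on-merge behaviour the commit message describes.
func merge(old, updated Alert) *Alert {
	m := old
	m.Resolved = updated.Resolved
	return &m
}

// Inhibitor keeps its own cache of source alerts (hypothetical shape).
type Inhibitor struct {
	cache map[uint64]*Alert
}

// Update stores the alert unconditionally, resolved or not.
func (ih *Inhibitor) Update(a *Alert) { ih.cache[a.Fingerprint] = a }

// Inhibits reports whether a cached, still-firing alert exists.
func (ih *Inhibitor) Inhibits(fp uint64) bool {
	a, ok := ih.cache[fp]
	return ok && !a.Resolved
}

func main() {
	firing := &Alert{Fingerprint: 1}
	ih := &Inhibitor{cache: map[uint64]*Alert{}}
	ih.Update(firing)

	merged := merge(*firing, Alert{Fingerprint: 1, Resolved: true})
	fmt.Println(ih.Inhibits(1)) // true: the cache still points at the stale copy

	ih.Update(merged)
	fmt.Println(ih.Inhibits(1)) // false: the cache now sees the resolved alert
}
```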
Simon Pasquier
c92ed69ce8 Split cli package (#1314)
* cli: move commands to cli/cmd

* cli: use StatusAPI interface for config command

* cli: use SilenceAPI interface for silence commands

* cli: use AlertAPI for alert command

* cli: move back commands to cli package

And move API client code to its own package.

* cli: remove unused structs
2018-04-11 11:17:41 +02:00
Simon Pasquier
4cba49155d dispatch: don't reset timer if flush is in-progress (#1301)
When the aggregation group receives an alert that is past the initial
group_wait value, it should reset its timer only if the timer has ever
expired. Otherwise it means that the flush is already in-progress.
2018-03-29 12:22:49 +02:00
Simon Pasquier
0c086e3b12 cli: extract client bindings of the v1 API (#1278)
* cli: extract client bindings of the v1 API from amtool

This is a continuation of [1] but the code is kept in the alertmanager
repository rather than having it in client_golang.

[1] https://github.com/prometheus/client_golang/pull/333

Co-Authored-By: Fabian Reinartz <fab.reinartz@gmail.com>
Co-Authored-By: Tristan Colgate <tcolgate@gmail.com>
Co-Authored-By: Corin Lawson <corin@responsight.com>
Co-Authored-By: stuart nelson <stuartnelson3@gmail.com>

* cli: fix httpSilenceAPI.Set() method

* vendor: remove github.com/prometheus/client_golang/api/alertmanager

* cli: don't use the model.Alert type
2018-03-28 19:19:04 +02:00
Brian Brazil
aa950668bf The default group_by is meant to be no labels. (#1287)
This is what the intended default is, and what
the documentation says.
2018-03-16 18:39:23 +01:00
Corentin Chary
dd75201f1c Add /-/ready based on mesh status (#1209)
* Wait for the gossip to settle before sending notifications

See #1209 for details.

As a heuristic for mesh readiness, try to see if
the mesh looks stable (the number of peers isn't changing too much).
This implementation always marks the Alertmanager as ready after a maximum of 60s.

This adds one new flag to control this behavior:
```
      --cluster.settle-timeout=60s  mesh settling timeout. Do not wait more than this duration on startup.
```

It also adds `/-/ready` which always returns 200 (in order to make it clear
that we are ready as soon as we can receive requests).

The mesh status is exposed in `/api/v1/status` and visible on `/#/status`.

* cluster: fix typos and base interval on gossipInterval
2018-03-02 15:45:21 +01:00
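One way to picture the settling heuristic from the commit above is as a check over successive peer-count samples. This is only a sketch under assumptions: the function `settledAfter`, the strict "unchanged" comparison and the poll-based cap are hypothetical, whereas the commit itself only says the peer count should not change "too much" and that waiting is capped by `--cluster.settle-timeout`.

```go
package main

import "fmt"

// settledAfter returns the index of the poll at which the peer count has been
// unchanged for stablePolls consecutive polls. If that never happens within
// maxPolls polls, it returns maxPolls, modelling the settle timeout.
func settledAfter(peerCounts []int, stablePolls, maxPolls int) int {
	unchanged := 0
	for i := 1; i < len(peerCounts) && i < maxPolls; i++ {
		if peerCounts[i] == peerCounts[i-1] {
			unchanged++
		} else {
			unchanged = 0
		}
		if unchanged >= stablePolls {
			return i
		}
	}
	return maxPolls
}

func main() {
	counts := []int{1, 2, 3, 3, 3, 3}
	fmt.Println(settledAfter(counts, 3, 10)) // 5: stable for three polls in a row
	fmt.Println(settledAfter(counts, 3, 4))  // 4: gave up at the poll cap
}
```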
pasquier-s
c39a913f8a test: enable race detection (#1262)
This change enables race detection when running the tests. It also fixes
a couple of existing race conditions.
2018-02-27 18:18:53 +01:00
Stuart Nelson
a552afd998 Merge branch 'master' into memberlist 2018-02-13 10:47:17 +01:00
Fabian Reinartz
e6df2d8751 Adapt cluster listen address flag in tests 2018-02-12 11:31:55 +01:00
pasquier-s
76ee5388e7 Forbid 0 value for group_interval and repeat_interval (#1230)
Setting one of these parameters to a zero value doesn't make sense
semantically and can cause high CPU usage.
2018-02-09 10:53:46 +01:00
Fabian Reinartz
fd49dbb477 *: move to memberlist for clustering 2018-02-08 12:18:44 +01:00
pasquier-s
62b957cc14 Notify only when new firing alerts are added (#1205)
After the initial notification has been sent, AlertManager shouldn't notify the
receiver again when no new alerts have been added to the group during
group_interval.

This change also modifies the acceptance test framework to assert that no
notification has been received in a given interval.
2018-01-23 16:52:03 +01:00
pasquier-s
907ac510f8 Fix flaky TestBatching acceptance test (#1193)
This change decreases the repeat_interval parameter from 5s to 4.9s to
make sure that the alerts are effectively sent after 5 seconds.

The workflow is:
- The dispatcher flushes the alerts at t0, sends the notification and
marks the notification log at t0+epsilon.
- The dispatcher flushes the alerts at t1, t2, t3 and t4 and doesn't
send the notifications as expected.
- At t5, the dispatcher flushes the alerts because current_time - (t0+epsilon)
is less than repeat_interval.

If repeat_interval is exactly 5s, there is a small chance that it is
greater than current_time - (t0+epsilon).
2018-01-11 22:45:59 +01:00
Calle Pettersson
b7da058efb Switch cmd/alertmanager to kingpin (#974) 2018-01-06 11:22:26 +01:00
Julius Volz
b0aab04906 Fix notifications for flapping alerts (#1071)
Fixes https://github.com/prometheus/alertmanager/issues/1063
2017-11-02 11:12:12 +01:00
Julius Volz
9b72c10134 Minor code cleanups 2017-11-01 23:08:34 +01:00
Fabian Reinartz
ff5ecfff51 test: add reload test
This test reloads the Alertmanager to verify that it properly keeps
state and sends notifications correctly across reloads.
2017-04-18 12:44:38 +02:00
Fabian Reinartz
309c6af4b2
nflog: use alert set instead of hash for deduplication
Building a hash over an entire set of alerts causes problems, because
the hash differs on any change, whereas we only want to send
notifications if the alert and its state have changed. Therefore this
introduces a list of alerts that are active and a list of alerts that
are resolved. If the currently active alerts of a group are a subset of
the ones that have been notified about before then they are
deduplicated. The resolved notifications work the same way, with a
separate list of resolved notifications that have already been sent.
2017-04-13 15:13:47 +02:00
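The subset rule described in the commit above can be sketched as follows. Representing alerts by `uint64` fingerprints and the function name `isSubset` are assumptions for illustration; the real nflog entry format is not shown here.

```go
package main

import "fmt"

// isSubset reports whether every fingerprint in current was already present
// in notified. When it returns true, the group would be deduplicated.
func isSubset(current, notified []uint64) bool {
	seen := make(map[uint64]struct{}, len(notified))
	for _, fp := range notified {
		seen[fp] = struct{}{}
	}
	for _, fp := range current {
		if _, ok := seen[fp]; !ok {
			return false
		}
	}
	return true
}

func main() {
	notifiedFiring := []uint64{1, 2, 3}

	// Alerts 1 and 3 were already notified about: deduplicate.
	fmt.Println(isSubset([]uint64{1, 3}, notifiedFiring)) // true
	// Alert 4 is new: a notification is needed.
	fmt.Println(isSubset([]uint64{1, 4}, notifiedFiring)) // false
}
```

The same check would apply to the separate list of resolved alerts.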
stuart nelson
24a9a64bdf Only find MAC address if no command-line flag value given (#638)
* Find MAC address if mesh.hardware-addr not given

Defaulting to the machine's MAC address
sometimes fails and causes a panic. Allow the user
to specify a custom address to skip this so they can
run AlertManager.

* -mesh.hardware-address -> -mesh.peer-id

* Fix command-line invocation
2017-02-28 14:57:45 +01:00
Martín Ferrari
5489644cbe Wait for test server to be ready before running tests (#605)
* Wait for test server to be ready before running tests

This fixes problems when running the acceptance tests in slow or CPU-starved
machines, as mentioned in #472.
2017-01-16 12:32:27 +00:00
Frederic Branczyk
c392ace697
notify: replace unfiltered with filtered alerts 2017-01-04 13:50:40 +01:00
Frederic Branczyk
dcf2b3afcb
notify: move resolved alert filtering to integration
Resolved alerts, even when filtered, have to end up in the
SetNotifiesStage, otherwise when an alert fires again it is ambiguous
whether it was resolved in between or not.

fixes #523
2016-10-05 17:45:35 +02:00
Fabian Reinartz
a4e8703567 *: integrate new silence package 2016-08-30 12:15:23 +02:00
Fabian Reinartz
72fdf3d3ab *: integrate nflog
This commit replaces the previous NotifyInfo provider with the new
nflog package. It needs adjustments in the behavior of the deduping
stage.
The nflog stores notification digests per receiver per alert aggregation
group rather than one entry for alert per receiver. This drastically
reduces the number of entries and removes interference
across aggregation groups.
2016-08-18 15:52:28 +02:00
Fabian Reinartz
e51770ce21 main: use mesh providers 2016-08-09 12:00:28 +02:00
Fabian Reinartz
81cbf3cda7 *: refactor Silence type, use UUID
This commit removes the dependency on model.Silence for the internal
Silence type, uses UUIDs instead of uint64s and clarifies invariants
around timestamp handling.

The created_at timestamp is removed for the time being.
2016-08-09 11:59:35 +02:00
Fabian Reinartz
d6e64dccc5 provider/boltmem: make alerts purely in-memory.
Initial testing has shown BoltDB in plain usage to be a bottleneck
at a few thousand alerts or more (especially JSON decoding).
This commit makes them purely in-memory as a temporary solution.
2016-07-07 09:45:12 +02:00
Matt Bostock
68a1e51ffb Use localhost for tests
Previously, the tests would listen on all available interfaces.

Instead, have the tests use localhost only; using all available
interfaces is unnecessary.

On Mac OS X with the builtin firewall enabled, it triggers annoying
prompts to allow the tests to listen on all interfaces.
2016-06-04 08:19:23 +01:00
Fabian Reinartz
04f60c5a50 Deal with changed webhook format in tests 2016-02-12 11:00:51 +01:00
Fabian Reinartz
11fae2a719 Simplify and fix notification grouping.
This commit changes the notification grouping behavior
to simply send all alerts of a group as soon as a single
one of them needs updating.

This fixes a critical bug which caused erroneous resolved
notifications to be sent.
2016-01-08 15:17:54 +01:00
Fabian Reinartz
d21d29ee58 Correctly parse send_resolved config field
Fixes #198
2015-12-23 08:31:50 +01:00
Fabian Reinartz
a2b8d35733 Validate API input 2015-12-09 18:21:06 +01:00
Fabian Reinartz
2e5b9e5194 Improve acceptance test logging 2015-12-08 11:55:29 +01:00
Fabian Reinartz
cec04341f7 Add resolved test 2015-11-30 11:20:28 +01:00
beorn7
93ffa534a5 PR with changes after code review
Now to be reverse-reviewed.
2015-11-23 18:24:57 +01:00
Brian Brazil
faa88831f4 First-pass at improving template system.
- Cut back to bare minimum to make the rest simpler
- Consistency in config naming
- Have one data structure that's the same for all templates
- Pass in common labels to templates
- Support templates almost everywhere
- Support multiple SMTP recipients
- Support non-ASCII SMTP headers
- Handle colour logic via templates
- Make $subjects have consistent output, go maps aren't sorted.
- Make tests pass when v6 is disabled
2015-11-18 14:59:05 +00:00
Fabian Reinartz
d80fd26902 Add Dockerfile and target, change flag 2015-11-12 15:03:09 +01:00
Fabian Reinartz
dc656a44ea Adjust config fields to 'receiver' 2015-11-10 14:08:20 +01:00
Fabian Reinartz
e4e594d826 Unify receiver naming 2015-11-10 13:47:04 +01:00
Fabian Reinartz
ba4c3d31b5 Extend merging test to cover more scenarios 2015-10-20 11:59:40 +02:00
Fabian Reinartz
6cbd7f5511 Inherit grouping labels, default grouping labels 2015-10-19 17:35:59 +02:00
Fabian Reinartz
cb0ecd9416 Alter config to have a root route 2015-10-19 16:52:54 +02:00
Fabian Reinartz
fa7955c9bc Show logs of crashed testing instances 2015-10-15 16:17:04 +02:00
Fabian Reinartz
8148e82358 Terminate tests on Alertmanager crash 2015-10-15 16:15:37 +02:00
Fabian Reinartz
ff7eddc453 Add acceptance test for alert merging 2015-10-15 12:46:51 +02:00
Fabian Reinartz
2d3f0ecd84 Add test for silence deletion 2015-10-12 07:40:55 +02:00
Fabian Reinartz
0073647981 Restructure acceptance test files 2015-10-12 07:35:22 +02:00
Fabian Reinartz
16e693dd4f Add simple test for retry logic 2015-10-12 07:28:43 +02:00
Fabian Reinartz
aca2089216 Add injection function to webhook 2015-10-12 07:28:25 +02:00
Fabian Reinartz
5dc2f6e9b1 Add license headers 2015-10-11 17:24:49 +02:00
Fabian Reinartz
d1379a3f71 Move repeat_interval and send_resolved to route configuration 2015-10-08 10:50:37 +02:00
Fabian Reinartz
f48c95eb19 Test restartability with persistence 2015-10-07 16:19:37 +02:00
Fabian Reinartz
a653f90516 Setup persistence dir for acceptance tests 2015-10-06 12:41:20 +02:00
Fabian Reinartz
f14ada020a Fix alert batch comparison, improve debug output 2015-10-05 16:51:34 +02:00
Fabian Reinartz
5222e340b8 Move inhibition test into own file 2015-10-05 16:08:00 +02:00
Julius Volz
05f9972bb5 Remove some debug statements. 2015-10-05 16:03:57 +02:00
Julius Volz
247cf8e3cd Add first inhibition acceptance test. 2015-10-05 16:03:57 +02:00
Julius Volz
0c96c80cd6 Fix spelling of exepected->expected. 2015-10-05 16:03:56 +02:00
Fabian Reinartz
aab576c7c0 Add method to update Alertmanager configuration file 2015-10-02 14:14:30 +02:00
Fabian Reinartz
5ba2d4abc1 Assign webhook addresses automatically, per AM configs 2015-10-02 14:10:04 +02:00
Fabian Reinartz
1ff41b3864 Use alertmanager client package for alert pushing 2015-10-02 12:45:52 +02:00
Fabian Reinartz
83a0d68fc8 Add acceptance test documentation 2015-10-02 12:32:19 +02:00
Fabian Reinartz
f29a380ec4 Export Alertmanager process control functionality 2015-10-02 12:28:27 +02:00
Fabian Reinartz
5c4ec44962 Centralize test actions in AcceptanceTest 2015-10-02 12:18:02 +02:00
Fabian Reinartz
b0989ca9f3 Let test silence timeout by itself 2015-10-01 22:22:51 +02:00
Fabian Reinartz
152f811227 Fix acceptance test 2015-10-01 22:15:27 +02:00
Fabian Reinartz
7b0820a205 Add silencing test 2015-10-01 21:28:18 +02:00
Fabian Reinartz
38b324eab2 Implement silencing for acceptance tests, use api package 2015-10-01 20:58:46 +02:00
Fabian Reinartz
ba4f5684bb Remove all internal types from acceptance test framework 2015-10-01 15:46:39 +02:00
Fabian Reinartz
0b4d58fbdb Switch to model.Alert 2015-10-01 14:53:49 +02:00
Fabian Reinartz
ad1408e8b9 Fix test documentation 2015-10-01 09:43:51 +02:00
Fabian Reinartz
6174b41b0c Adjust test for new batching behavior 2015-09-30 19:03:19 +02:00
Fabian Reinartz
1fcc5d6717 Add proper checks whether full batches were received 2015-09-30 18:45:49 +02:00
Fabian Reinartz
6a0b4cc8b2 Add acceptance test example for batching 2015-09-30 18:02:47 +02:00
Fabian Reinartz
297bbf5a0a Generalize AM actions, batch pushed alerts as defined 2015-09-30 17:45:37 +02:00
Fabian Reinartz
25ee299cb9 Make example test more constrained 2015-09-30 17:35:33 +02:00
Fabian Reinartz
0539783eea Start alertmanager with a random free address in tests 2015-09-30 17:33:49 +02:00
Fabian Reinartz
f20f5dce42 Document example acceptance test scenario 2015-09-30 16:18:44 +02:00
Fabian Reinartz
4401bb1b82 Split acceptance testing into multiple files 2015-09-30 16:13:00 +02:00
Fabian Reinartz
b45dd027bc Remove time factor from e2e test options 2015-09-30 15:35:52 +02:00
Fabian Reinartz
0d39c0e1af Improve e2e testing output 2015-09-30 15:02:07 +02:00
Fabian Reinartz
ab2a3b1c6a More elaborate example e2e test case 2015-09-30 14:55:19 +02:00
Fabian Reinartz
251f3ec57c Improve checks in e2e tests, various fixes 2015-09-30 14:54:54 +02:00
Fabian Reinartz
d4ec632d06 Improve e2e test framework 2015-09-29 22:40:44 +02:00
Fabian Reinartz
329ec605cf Initial e2e test setup 2015-09-29 20:45:38 +02:00