The CI keeps reporting flakes for our acceptance test around the starting and stopping of the Alertmanagers. While I have an idea of where these failures are coming from, it would be nice to get a confirmation by structuring our error messages a bit better.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
As part of #2971, I'm about to extend the tests for silences. Extract the functions into helpers in a separate file and add names to the expectations so that we can easily identify them.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
As noted in #2867, there is an unnecessary require.Eventually in a
silence test. This PR addresses that by using a channel to signal that
the maintenance loop has completed.
Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
github.com/benbjohnson/clock provides a time interface to programs
rather than using the stdlib time package. This allows mocking time in
programs and tests. In this commit, the clock is used to speed up and
simplify testing of the silences package.
Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
Add a Dependabot dependency check to keep dependencies up to date and to receive security updates on time.
Signed-off-by: David Ureba <david.ureba@aiven.io>
In accordance with a new rule introduced as part of https://github.com/grafana/dashboard-linter/pull/79 this is now required. However, for the new `panel-unit-rule` we don't reap any benefits from specifying a particular unit for our panels; the defaults work perfectly fine, so they're ignored.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
This accurately reflects what the function _actually_ does. If no active silence IDs are provided and the list of inhibitions we have is already empty, the alert is actually set to Active. It took me a while to realise this while I was working out how we populate the alert list.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
While merging #2944, I noticed the CI failed: https://app.circleci.com/pipelines/github/prometheus/alertmanager/2686/workflows/b6f87b0a-20c3-455b-b706-432c38a77511/jobs/12028.
It seemed like a deadlock between uncoordinated routines, but I couldn't pinpoint (or reproduce, I tried with -race and -count) the exact problem. However, from the logs I could point out where the problem originated, and I have a hunch it had to do with the way net listeners are handled by the TODO removed.
The more worrying bit of the CI failure is that it took 10m to time out. With this change we'll force close the connection with a 5s deadline, so at the very least we'll get the feedback faster.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Alert metric reports different results to what the user sees via API
Fixes #1439 and #2619.
The previous metric is not _technically_ reporting incorrect results, as the alerts _are_ still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurposed the current metric to match what the user sees in the UI.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
With https://github.com/grafana/dashboard-linter/pull/49 `template-job-rule` no longer validates both `instance` and `job` labels. Add the new rule of `template-instance-rule` to the exclusions to preserve the previous behaviour.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Within 9a32e58ed0, the rules have been split into two different rules:
`target-job-rule`
`target-instance-rule`
All of our queries do contain the `job` label but as per the reason, we don't need both in this particular case.
Fixes #2899
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
The validation should fail if both `api_key` and `api_key_file` are
defined. I think there was a typo in the original PR (#2728) that
enforced `api_url` and `api_key_file` not being defined at the same
time.
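A rough sketch of the intended check, assuming a simplified config type: `notifierConfig`, its field names and the error text are hypothetical and do not mirror the real receiver config.

```go
package main

import (
	"errors"
	"fmt"
)

// notifierConfig is a hypothetical receiver config. Only the two mutually
// exclusive settings discussed above are modelled.
type notifierConfig struct {
	APIKey     string // corresponds to api_key
	APIKeyFile string // corresponds to api_key_file
}

// validate rejects a config that sets both the inline key and the key file.
func (c notifierConfig) validate() error {
	if c.APIKey != "" && c.APIKeyFile != "" {
		return errors.New("at most one of api_key & api_key_file must be configured")
	}
	return nil
}

func main() {
	both := notifierConfig{APIKey: "secret", APIKeyFile: "/etc/key"}
	fmt.Println(both.validate() != nil)

	onlyFile := notifierConfig{APIKeyFile: "/etc/key"}
	fmt.Println(onlyFile.validate() == nil)
}
```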
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
The function value and parameters of a defer statement are evaluated
immediately, so this "disp" value is always nil. Calling Stop() on a nil
dispatcher is a no-op, so the deferred call does nothing. Wrapping it in a
closure that refers to "disp" fixes it.
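The difference can be reproduced in isolation. The following is a self-contained sketch: the `dispatcher` type and both helper functions are hypothetical stand-ins, not the real code.

```go
package main

import "fmt"

// dispatcher is a hypothetical stand-in for the real type.
type dispatcher struct{ stopped bool }

// Stop is safe to call on a nil receiver, in which case it does nothing.
func (d *dispatcher) Stop() {
	if d == nil {
		return
	}
	d.stopped = true
}

// stopDirect defers disp.Stop() while disp is still nil. The receiver is
// evaluated at the defer statement, so the later assignment has no effect.
func stopDirect() *dispatcher {
	var disp *dispatcher
	created := &dispatcher{}
	defer disp.Stop()
	disp = created
	return created
}

// stopInClosure defers a closure instead. The closure reads disp only when it
// runs, so it sees the dispatcher assigned after the defer statement.
func stopInClosure() *dispatcher {
	var disp *dispatcher
	created := &dispatcher{}
	defer func() { disp.Stop() }()
	disp = created
	return created
}

func main() {
	fmt.Println(stopDirect().stopped)    // false: Stop ran on a nil receiver
	fmt.Println(stopInClosure().stopped) // true: Stop ran on the real dispatcher
}
```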
Signed-off-by: Julius Volz <julius.volz@gmail.com>