The CI environment isn't as performant as local machines: the time
needed to fully initialize the test environment can be significant and
skew the verification. Rather than setting the "virtual" clock used to
measure alert timings at the beginning of the acceptance test, it is
better to wait for the test bed to be ready.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
While merging #2944, I noticed the CI failed: https://app.circleci.com/pipelines/github/prometheus/alertmanager/2686/workflows/b6f87b0a-20c3-455b-b706-432c38a77511/jobs/12028.
It looked like a deadlock between uncoordinated goroutines, but I couldn't pinpoint or reproduce the exact problem (I tried with -race and -count). From the logs, however, I could see where the problem originated, and my hunch is that it has to do with the way net listeners were handled by the code under the TODO that this change removes.
The more worrying part of the CI failure is that it took 10m to time out. With this change, we force-close the connection after a 5s deadline, so at the very least we'll get feedback faster.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Instead of only testing single-instance Alertmanagers, this patch
enables individual tests to spin up Alertmanager clusters.
In addition, it adds two tests:
1. A test firing alerts against a cluster, expecting to receive a
notification from only one of the Alertmanager instances in the cluster.
2. A test firing alerts against both a single instance and a
cluster, making sure the outputs are equal.
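
The two checks above could be sketched roughly as follows. This is an illustration, not the test framework's API: `deliveredOnce`, `sameNotifications`, and the instance and alert names are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// deliveredOnce reports whether the cluster as a whole delivered exactly one
// notification, given the count sent by each instance. (Hypothetical helper.)
func deliveredOnce(perInstance map[string]int) bool {
	total := 0
	for _, n := range perInstance {
		total += n
	}
	return total == 1
}

// sameNotifications reports whether two runs delivered the same alert names,
// ignoring order. (Hypothetical helper.)
func sameNotifications(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	// Sort copies so the callers' slices are left untouched.
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

func main() {
	// Test 1: only one instance in the cluster should notify.
	cluster := map[string]int{"am-0": 0, "am-1": 1, "am-2": 0}
	fmt.Println("exactly one notification:", deliveredOnce(cluster))

	// Test 2: single instance and cluster should produce the same output.
	single := []string{"HighLatency", "DiskFull"}
	clustered := []string{"DiskFull", "HighLatency"}
	fmt.Println("outputs equal:", sameNotifications(single, clustered))
}
```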
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>