When users start Alertmanager with `--cluster.listen-address=`, the
cluster is not initialized, hence `api.peer` is `nil`. Until now, this
resulted in a nil pointer dereference when API v2 accessed the
`api.peer` field.
With this patch, API v2 skips populating the peers array and sets the
name to an empty string and the status to "disabled" when `api.peer` is
nil.
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
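The nil check described above can be sketched roughly as follows. This is a minimal illustration, not the actual handler: the `Peer` and `ClusterStatus` types and the `clusterStatus` function are hypothetical stand-ins, and only the "disabled" status, empty name and empty peers list come from the commit message.

```go
package main

import "fmt"

// Peer is a hypothetical stand-in for the cluster peer held by the API.
type Peer struct {
	Name   string
	Status string
	Peers  []string
}

// ClusterStatus is a hypothetical response shape for the status endpoint.
type ClusterStatus struct {
	Name   string
	Status string
	Peers  []string
}

// clusterStatus builds the response without dereferencing a nil peer.
func clusterStatus(p *Peer) ClusterStatus {
	if p == nil {
		// Clustering was never initialized: empty name, no peers, "disabled".
		return ClusterStatus{Name: "", Status: "disabled", Peers: []string{}}
	}
	return ClusterStatus{Name: p.Name, Status: p.Status, Peers: p.Peers}
}

func main() {
	fmt.Printf("%+v\n", clusterStatus(nil))
	fmt.Printf("%+v\n", clusterStatus(&Peer{Name: "am-0", Status: "ready", Peers: []string{"am-1"}}))
}
```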
With issue 1465 on openapi-generator [1] being fixed, we can now extract
shared properties of the gettable and postable alert definition into a
shared object (`alert`) like we do for silence, gettable silence and
postable silence.
In addition, this patch makes the following changes to the UI:
- Use `List GettableAlert` instead of plural type definitions like
`GettableAlerts` because the plural definitions are not generated.
- Pin the openapi-generator-cli Docker image to a specific hash.
[1] https://github.com/OpenAPITools/openapi-generator/issues/1465
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
Instead of having one general silence, differentiate between postable
and gettable silence, hence making more fields required.
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
This patch makes the Alertmanager UI (/status & /silences) use the
api/v2 endpoint. In addition, it adds logic to generate the Elm-side
data model based on the OpenAPI specification.
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
The current Alertmanager API v1 is undocumented and written by hand.
This patch introduces a new Alertmanager API - v2. The API is fully
generated via an OpenAPI 2.0 [1] specification (see
`api/v2/openapi.yaml`) with the exception of the HTTP handlers themselves.
Pros:
- Generated server code
- Ability to generate clients in all major languages
(Go, Java, JS, Python, Ruby, Haskell, *elm* [3] ...)
- Strict contract (OpenAPI spec) between server and clients.
- Instant feedback on frontend-breaking changes, due to the strictly
typed frontend language Elm.
- Generated documentation (See Alertmanager online Swagger UI [4])
Cons:
- Dependency on the OpenAPI ecosystem, including go-swagger [2]
In addition, this patch includes the following changes:
- README.md: Add API section
- test: Duplicate acceptance test to API v1 & API v2 version
The Alertmanager acceptance test framework has decent test coverage
on the Alertmanager API. Introducing the Alertmanager API v2 does not go
hand in hand with deprecating API v1. They should live alongside each
other for a couple of minor Alertmanager versions.
Instead of porting the acceptance test framework to use the new API v2,
this patch duplicates the acceptance tests, one using the API v1, the
other API v2.
Once API v1 is removed we can simply remove `test/with_api_v1` and bring
`test/with_api_v2` to `test/`.
[1]
https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md
[2] https://github.com/go-swagger/go-swagger/
[3] https://github.com/ahultgren/swagger-elm
[4]
http://petstore.swagger.io/?url=https://raw.githubusercontent.com/mxinden/alertmanager/apiv2/api/v2/openapi.yaml
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
Errcheck [1] enforces error handling across all Go files. Functions can
be excluded via `scripts/errcheck_excludes.txt`.
This patch adds errcheck to the `test` Make target.
[1] https://github.com/kisielk/errcheck
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
* api: support more query filters
This change adds 2 new query filters to the /api/v1/alerts endpoint.
- active, filter out active alerts when set to 'false' (default: 'true').
- unprocessed, filter out unprocessed alerts when set to 'false'
(default: 'true').
The default values ensure that the API behavior remains the same as
before when the query filters aren't provided.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* api: address comments
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
With prometheus/prometheus commit
e114ce0ff7a1ae06b24fdc479ffc7422074c1ebe [1] Prometheus switched from
using `api/alerts` to `api/v1/alerts`. This commit is included starting
from Prometheus v0.17.0. As discussed on the prometheus-developers
mailing list [2] the deprecation period is long over.
[1] github.com/prometheus/prometheus/commit/e114ce0ff7a1ae06b24fdc479ffc7422074c1ebe
[2]
https://groups.google.com/d/msg/prometheus-developers/2CCuFTMbmAg/Qg58rvyzAQAJ
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
This change replaces the deprecated InstrumentHandler function with
the equivalent functions from the promhttp package.
The following metrics are removed:
* http_request_duration_microseconds (Summary).
* http_request_size_bytes (Summary).
* http_requests_total (Counter).
And the following metrics are added instead:
* alertmanager_http_request_duration_seconds (Histogram).
* alertmanager_http_response_size_bytes (Histogram).
* promhttp_metric_handler_requests_in_flight (Gauge).
* promhttp_metric_handler_requests_total (Counter).
The same behavior exists in Prometheus. This is a
bit superfluous, but in the event people are using
old versions of Prometheus or a different metric
gathering system, it's still valid to check.
* Wait for the gossip to settle before sending notifications
See #1209 for details.
As a heuristic for mesh readiness, try to see if
the mesh looks stable (the number of peers isn't changing too much).
This implementation always marks the Alertmanager as ready after a maximum of 60s.
This adds one new flag to control this behavior:
```
--cluster.settle-timeout=60s mesh settling timeout. Do not wait more than this duration on startup.
```
It also adds `/-/ready`, which always returns 200 (in order to make it clear
that we are ready as soon as we can receive requests).
The mesh status is exposed in `/api/v1/status` and visible on `/#/status`.
* cluster: fix typos and base interval on gossipInterval
When the API receives alerts where StartsAt is zero, it updates the
value to EndsAt (if not zero itself) or "now". This ensures that the
alert validation will not fail since StartsAt has to be less than or
equal to EndsAt.
* Expose alert fingerprint in the API
The alert fingerprint is already provided as the value of the status.inhibitedBy[] attribute that inhibited alerts have, but there's no way to get back to the alert that's inhibiting it, as the fingerprint is not exposed.
* Expose alert fingerprint as ID in the list endpoint
* Rename ID to Fingerprint
* Use Fingerprint().String() in the API
* Render status page without mesh connection (#918)
A mesh connection was assumed, even though the
value that was being passed into the helper
function was a possibly-nil pointer. Add a check
for this, and return a nil value in that case. The
frontend finds this when decoding the JSON
payload, and displays the "not configured"
message.
* Update bindata