76 Commits

Author SHA1 Message Date
Simon Pasquier
57c4ff10ab api/v2: serve OpenAPI specification (#1751)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-20 15:36:19 +01:00
stuart nelson
51eebbef85
Stn/correctly mark api silences (#1733)
* Update alert status on every GET to alerts

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2019-02-18 17:06:51 +01:00
beorn7
21de9ff88c Various improvements after code review
Most importantly, `api.New` now takes an `Options` struct as an
argument, which enables some other changes made here as well:

- Timeout and concurrency limit are now in the options, streamlining
  the registration and the implementation of the limiting middleware.

- A local registry is used for metrics, and the metrics used so far
  inside any of the api packages are using it now.

The 'in flight' metric now contains the 'get' as a method label. I
have also added a TODO to instrument other methods in the same way
(otherwise, the label doesn't really make sense, semantically). I have
also added an explicit error counter for requests rejected because of
the concurrency limit. (They also show up as 503s in the generic HTTP
instrumentation (or they would, if v2 were instrumented, too), but
those 503s might have a number of reasons, while users might want to
alert on concurrency limit problems explicitly).

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-12 18:42:08 +01:00
beorn7
3382a0e949 Add HTTP instrumentation for GET requests in flight
While the newly added in-flight instrumentation works for all GET
requests, the existing HTTP instrumentation omits api/v2 calls. This
commit adds a TODO note about that.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-11 19:34:06 +01:00
beorn7
4747fd9b2f Propagate timeout to alert listing via context
The context is created by the http.TimeoutHandler we use to set the
timeout.

I believe this is the only endpoint where propagating the timeout is
feasible and needed.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-11 19:34:06 +01:00
beorn7
fc4b67ce80 Introduce a timeout and concurrency limit for HTTP requests
The default concurrency limit is max(GOMAXPROCS, 8). That should not
imply that each GET request eats a whole CPU. It's more to get some
reasonable heuristics for the processing power of the hosting machine
(while allowing at least 8 concurrent requests even on the smallest
machines). As GET requests can easily overload the Alertmanager,
rendering it incapable of doing its main task, namely sending alert
notifications, we need to limit GET requests by default.

In contrast, no timeout is set by default. The http.TimeoutHandler
invokes quite a bit of machinery behind the scenes, in particular an
additional layer of buffering. Thus, we should first get a bit of
experience with it before we consider enforcing a timeout by default,
even if setting a timeout is in general the safer setting for
resiliency.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-11 19:34:06 +01:00
Max Leonard Inden
c57542127d
api: Combine v1 and v2 into generic api
Instead of cmd/alertmanager/main.go instantiating and starting both api
v1 and v2, delegate that work to a generic api combining the two.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-02-04 14:31:33 +01:00
Max Leonard Inden
8e157b3af5
api/v2: Make cluster status peers and name optional
If a user chooses to disable the Alertmanager cluster feature, there is
no cluster name nor cluster peers. Hence these should be optional. Only
cluster status is set to "disabled".

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-02-04 11:40:30 +01:00
Max Leonard Inden
2f055d9966
api/v2: Do not populate cluster info if clustering is disabled
When users start Alertmanager with `--cluster.listen-address=`, the
cluster will not be initialized, hence api.peer will be `nil`. So far
this would result in a nil pointer dereference by the API v2 accessing
the api.peer field.

With this patch, api v2 skips populating the peers array, sets the name
to an empty string and the status to "disabled" in case `api.peer` is
nil.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-01-31 16:56:59 +01:00
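The nil check described in 2f055d9966 can be illustrated with the sketch below. The `peer` and `clusterStatus` types and the "ready" status string are hypothetical placeholders chosen for the example; only the "disabled" status, the empty name and the unpopulated peers list come from the commit message.

```go
package main

import "fmt"

// peer is a hypothetical stand-in for the cluster peer held by the API.
type peer struct {
	name  string
	peers []string
}

// clusterStatus is a hypothetical response model.
type clusterStatus struct {
	Status string
	Name   string
	Peers  []string
}

// statusFor avoids dereferencing a nil peer: with clustering disabled it
// returns status "disabled", an empty name and no peers.
func statusFor(p *peer) clusterStatus {
	if p == nil {
		return clusterStatus{Status: "disabled"}
	}
	// "ready" is an assumed value for the example only.
	return clusterStatus{Status: "ready", Name: p.name, Peers: p.peers}
}

func main() {
	fmt.Printf("%+v\n", statusFor(nil))
	fmt.Printf("%+v\n", statusFor(&peer{name: "am-0", peers: []string{"am-1"}}))
}
```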
Max Leonard Inden
7aa8ea9d9d
api/v2: Disable serving swagger spec and redoc UI
By default go-swagger serves the swagger spec and the redoc UI. This
patch disables both.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-01-17 16:19:29 +01:00
Simon Pasquier
b676fa79c0 *: update Makefile.common with new staticcheck (#1692)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-04 15:37:33 +01:00
Simon Pasquier
9a116736ef api/v2: Add CORS support (#1667)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-12-16 14:05:34 +01:00
Max Leonard Inden
2b697aaa6b
api/v2: Extract shared properties of gettable and postable alert
With issue 1465 on openapi-generator [1] being fixed, we can now extract
shared properties of the gettable and postable alert definition into a
shared object (`alert`) like we do for silence, gettable silence and
postable silence.

In addition this patch does the following changes to the UI:

- Use `List GettableAlert` instead of plural type definition like
`GettableAlerts` because the plural definitions are not generated.

- Fix openapi-generator-cli docker image to specific hash.

[1] https://github.com/OpenAPITools/openapi-generator/issues/1465

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-11-28 14:35:39 +01:00
Max Leonard Inden
f504f953c1
ui: Move /alerts to API v2
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-11-23 12:53:48 +01:00
Max Leonard Inden
b4b8b750df
api/v2/openapi.yaml: Differentiate between post and get silence
Instead of having one general silence, differentiate between postable
and gettable silence, hence making more fields required.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-11-15 16:21:07 +01:00
Max Leonard Inden
e4e053b18e
ui: Move /status & /silences to API v2
This patch makes the Alertmanager UI (/status & /silences) use the
api/v2 endpoint. In addition it adds logic to generate the elm side data
model based on the OpenAPI specification.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-11-15 13:24:26 +01:00
Max Leonard Inden
f1b920bcc9
api: Implement OpenAPI generated Alertmanager API V2
The current Alertmanager API v1 is undocumented and written by hand.
This patch introduces a new Alertmanager API - v2. The API is fully
generated via an OpenAPI 2.0 [1] specification (see
`api/v2/openapi.yaml`) with the exception of the HTTP handlers themselves.

Pros:
- Generated server code
- Ability to generate clients in all major languages
  (Go, Java, JS, Python, Ruby, Haskell, *elm* [3] ...)
    - Strict contract (OpenAPI spec) between server and clients.
    - Instant feedback on frontend-breaking changes, due to strictly
      typed frontend language elm.
- Generated documentation (See Alertmanager online Swagger UI [4])

Cons:
- Dependency on open api ecosystem including go-swagger [2]

In addition this patch includes the following changes.

- README.md: Add API section

- test: Duplicate acceptance test to API v1 & API v2 version

  The Alertmanager acceptance test framework has a decent test coverage
  on the Alertmanager API. Introducing the Alertmanager API v2 does not go
  hand in hand with deprecating API v1. They should live alongside each
  other for a couple of minor Alertmanager versions.

  Instead of porting the acceptance test framework to use the new API v2,
  this patch duplicates the acceptance tests, one using the API v1, the
  other API v2.

  Once API v1 is removed we can simply remove `test/with_api_v1` and bring
  `test/with_api_v2` to `test/`.

[1]
https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md

[2] https://github.com/go-swagger/go-swagger/

[3] https://github.com/ahultgren/swagger-elm

[4]
http://petstore.swagger.io/?url=https://raw.githubusercontent.com/mxinden/alertmanager/apiv2/api/v2/openapi.yaml

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-09-04 13:38:34 +02:00
Max Inden
b1a8fdd169
Merge pull request #1521 from mxinden/errcheck
*.go: Introduce errcheck enforcing error handling
2018-08-30 17:53:49 +02:00
Max Leonard Inden
1219541184
*.go: Introduce errcheck enforcing error handling
Errcheck [1] enforces error handling across all go files. Functions can
be excluded via `scripts/errcheck_excludes.txt`.

This patch adds errcheck to the `test` Make target.

[1] https://github.com/kisielk/errcheck

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-08-30 15:47:13 +02:00
Simon Pasquier
899226f3ac *: remove v1/alerts/groups API endpoint (#1525)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-08-23 16:03:49 +02:00
comicmuse
ec263489e9 Add cache control headers to the API responses to avoid IE caching th… (#1500)
Add cache control headers to the API responses to avoid IE caching the response.
2018-08-06 18:51:54 +02:00
Julius Volz
6d0edbe630 Fix a bunch of unhandled errors (#1501)
...as discovered by "gosec" (many other ones reported, but not all make
a lot of sense to fix).

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-05 15:38:25 +02:00
Simon Pasquier
0ebaeccd4b *: add missing license headers
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-14 17:37:13 +02:00
Simon Pasquier
75900ea62a api: remove dead code (#1367)
This is a follow-up of f825d97de4.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-07 18:11:36 +02:00
Simon Pasquier
383024e63d api: support more query filters (#1366)
* api: support more query filters

This change adds 2 new query filters to the /api/v1/alerts endpoint.

- active, filter out active alerts when set to 'false' (default: 'true').
- unprocessed, filter out unprocessed alerts when set to 'false'
 (default: 'true').

The default values ensure that the API behavior remains the same as
before when the query filters aren't provided.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* api: address comments

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-07 18:07:19 +02:00
Max Leonard Inden
f825d97de4
api: Deprecate api/alerts endpoint
With prometheus/prometheus commit
e114ce0ff7a1ae06b24fdc479ffc7422074c1ebe [1] Prometheus switches from
using `api/alerts` to `api/v1/alerts`. This commit is included starting
from Prometheus v0.17.0. As discussed on the prometheus-developers
mailing list [2] the deprecation period is long over.

[1] github.com/prometheus/prometheus/commit/e114ce0ff7a1ae06b24fdc479ffc7422074c1ebe
[2]
https://groups.google.com/d/msg/prometheus-developers/2CCuFTMbmAg/Qg58rvyzAQAJ

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-05-04 09:59:14 +02:00
Simon Pasquier
f53b24765d api: initialize alerts_received_total labels (#1310)
2018-04-04 10:38:17 +02:00
Simon Pasquier
b95b32821f ui: replace deprecated InstrumentHandler() (#1302)
This change replaces the deprecated InstrumentHandler function by
the equivalent functions from the promhttp package.

The following metrics are removed:

* http_request_duration_microseconds (Summary).
* http_request_size_bytes (Summary).
* http_requests_total (Counter).

And the following metrics are added instead:

* alertmanager_http_request_duration_seconds (Histogram).
* alertmanager_http_response_size_bytes (Histogram).
* promhttp_metric_handler_requests_in_flight (Gauge).
* promhttp_metric_handler_requests_total (Counter).
2018-03-28 15:28:38 +02:00
Stuart Nelson
319687ab3c Re-simplify match filters fn
2018-03-20 16:11:01 +01:00
Stuart Nelson
0c026b4387 Remove empty alert labels on ingest
The same behavior exists in prometheus. This is a
bit superfluous, but in the event people are using
old versions of prometheus or a different metric
gathering system, it's still valid to check.
2018-03-20 12:06:34 +01:00
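Dropping empty labels on ingest, as described in 0c026b4387, amounts to something like the sketch below. It uses a plain `map[string]string` and a hypothetical `removeEmptyLabels` helper rather than Alertmanager's own label types.

```go
package main

import "fmt"

// removeEmptyLabels deletes every label whose value is the empty string.
// Deleting from a map while ranging over it is safe in Go.
func removeEmptyLabels(labels map[string]string) {
	for name, value := range labels {
		if value == "" {
			delete(labels, name)
		}
	}
}

func main() {
	labels := map[string]string{"alertname": "HighLoad", "instance": ""}
	removeEmptyLabels(labels)
	fmt.Println(labels) // map[alertname:HighLoad]
}
```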
Stuart Nelson
4c98f4b4a9 Fix matchLabels logic
2018-03-20 11:47:53 +01:00
Stuart Nelson
f5df55666b Filter empty matchers correctly
2018-03-20 10:08:58 +01:00
Corentin Chary
dd75201f1c Add /-/ready based on mesh status (#1209)
* Wait for the gossip to settle before sending notifications

See #1209 for details.

As a heuristic for mesh readiness, try to see if
the mesh looks stable (the number of peers isn't changing too much).
This implementation always marks the Alertmanager as ready after a maximum of 60s.

This adds one new flag to control this behavior:
```
      --cluster.settle-timeout=60s  mesh settling timeout. Do not wait more than this duration on startup.
```

It also adds `/-/ready` which always returns 200 (in order to make it clear
that we are ready as soon as we can receive requests).

The mesh status is exposed in `/api/v1/status` and visible on `/#/status`.

* cluster: fix typos and base interval on gossipInterval
2018-03-02 15:45:21 +01:00
pasquier-s
e8a92f65ef Run staticcheck as part of the build process (#1264)
This change also fixes potential issues highlighted by running
staticcheck.
2018-02-28 17:42:32 +01:00
pasquier-s
29e441f88f Fix miscellaneous issues revealed by Go 1.10 (#1256)
* provider/mem: fix format verbs in tests

* api: fix format verb
2018-02-22 14:57:45 +00:00
pasquier-s
382a0d8089 api: support zero StartsAt for alerts (#1238)
When the API receives alerts where StartsAt is zero, it updates the
value to EndsAt (if not zero itself) or "now". This ensures that the
alert validation will not fail since StartsAt has to be less than or
equal to EndsAt.
2018-02-13 16:26:34 +01:00
Stuart Nelson
a552afd998 Merge branch 'master' into memberlist
2018-02-13 10:47:17 +01:00
Fabian Reinartz
3f2e00fbea cluster/api: improve metrics and cluster status
2018-02-09 11:16:00 +01:00
Fabian Reinartz
fd49dbb477 *: move to memberlist for clustering
2018-02-08 12:18:44 +01:00
conorbroderick
e8832619e0 Fixes AM wrongly counting alerts with EndTimes in the future as resolved
2018-02-07 15:52:26 +00:00
Jose Donizetti
fc9306cd7e Add expired silence validation (#1096)
* Add expired silence validation

* Add silence end time in the past validation
2018-01-21 15:29:51 +01:00
pasquier-s
364979bbf8 Display connections in the Status page (#1164)
This change shows the status of the local connections in the web UI. It
can be used to troubleshoot mesh issues.
2017-12-22 11:39:27 +01:00
Fabian Reinartz
405dbb8d9c
Fix wrong lock
2017-12-21 16:55:55 +01:00
stuart nelson
1abe4c9a56 Lock around variables used in Update()
Found two places where struct members being
updated in api.Update() were being accessed
elsewhere without locks.
2017-12-21 12:08:39 +01:00
Jose Donizetti
10ed60361d Fix silences negative filtering (#1095)
* Fix silence negative filtering

* Refactor extract filtering labels func
2017-11-15 14:29:06 -05:00
Jose Donizetti
e303646b80 Fix typo (#1103)
2017-11-15 14:25:46 -05:00
Jose Donizetti
95e80d1aa8 Add tests to receiver filtering (#1098)
2017-11-12 11:35:49 -05:00
Julius Volz
fdee5fcbfc Fix UI when no silences are present (#1090)
* Explicitly initialize silences list to avoid "null" JSON

* Wrap "No silences found" message in error box

* bindata fixup
2017-11-11 14:48:48 +01:00
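The "null" JSON problem fixed in fdee5fcbfc comes from how Go's `encoding/json` treats nil slices. The sketch below shows the difference; the `silence` type and `encode` helper are hypothetical and much smaller than the real silence model.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// silence is a hypothetical, minimal stand-in for the API's silence type.
type silence struct {
	ID string `json:"id"`
}

// encode marshals a list of silences to a JSON string.
func encode(s []silence) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	var unset []silence  // nil slice, never initialized
	empty := []silence{} // explicitly initialized, zero elements

	fmt.Println(encode(unset)) // null
	fmt.Println(encode(empty)) // []
}
```

A frontend that expects a JSON array can iterate over `[]` but may fail on `null`, which is why initializing the list explicitly helps.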
Jose Donizetti
74808e40f3 Refactor silence constants (#1076)
* Refactor remove dups silence state constants

* Refactor to use const instead of string
2017-11-07 11:36:30 +01:00
Jose Donizetti
b9597f5c7b Fix negative matchers filtering (#1077)
2017-11-04 14:38:16 +01:00