Commit Graph

1559 Commits

Author SHA1 Message Date
Simon Pasquier
4cba49155d dispatch: don't reset timer if flush is in-progress (#1301)
When the aggregation group receives an alert that is past the initial
group_wait value, it should reset its timer only if the timer has never
expired. Otherwise, the flush is already in progress.
2018-03-29 12:22:49 +02:00
stuart nelson
19715022a4 [amtool] update silence add and update flags (#1298)
* Update silence add/update flags

- Change --expires/-e to --duration/-d
- Change --expires-on to --end
- Add --start

* update subcommand returns ID of new silence

The silences printed before were accurate, except
they had the old ID. Now the new ID is returned.

* Duration is added to silence.StartsAt

When a user supplies a duration to update a
silence, it is applied to silence.StartsAt after
any potential changes to the silence's start time.
2018-03-29 12:11:31 +02:00
Simon Pasquier
0c086e3b12 cli: extract client bindings of the v1 API (#1278)
* cli: extract client bindings of the v1 API from amtool

This is a continuation of [1], but the code is kept in the alertmanager
repository rather than in client_golang.

[1] https://github.com/prometheus/client_golang/pull/333

Co-Authored-By: Fabian Reinartz <fab.reinartz@gmail.com>
Co-Authored-By: Tristan Colgate <tcolgate@gmail.com>
Co-Authored-By: Corin Lawson <corin@responsight.com>
Co-Authored-By: stuart nelson <stuartnelson3@gmail.com>

* cli: fix httpSilenceAPI.Set() method

* vendor: remove github.com/prometheus/client_golang/api/alertmanager

* cli: don't use the model.Alert type
2018-03-28 19:19:04 +02:00
Simon Pasquier
b95b32821f ui: replace deprecated InstrumentHandler() (#1302)
This change replaces the deprecated InstrumentHandler function with
the equivalent functions from the promhttp package.

The following metrics are removed:

* http_request_duration_microseconds (Summary)
* http_request_size_bytes (Summary)
* http_requests_total (Counter)

And the following metrics are added instead:

* alertmanager_http_request_duration_seconds (Histogram).
* alertmanager_http_response_size_bytes (Histogram).
* promhttp_metric_handler_requests_in_flight (Gauge).
* promhttp_metric_handler_requests_total (Counter).
2018-03-28 15:28:38 +02:00
stuart nelson
acb111e812 0.15.0-rc.1 (#1296) 2018-03-23 13:59:49 +01:00
Ted Zlatanov
099b6a1d43 Sort dispatched alerts by job+instance then rest by default (#1178) (#1234) 2018-03-22 20:06:37 +01:00
Simon Pasquier
1531aa66f3 Fix for #1282 (#1286)
* cluster: add alertmanager_cluster_messages_queued metric

* cluster: add metrics for sent messages

This change adds 2 new metrics:

- alertmanager_cluster_messages_sent_total
- alertmanager_cluster_messages_sent_size_total

* Fix marshaling for entries being broadcast

Individual notification logs and silences being broadcast to the other
peers need to be encoded using the same length-delimited format as when
doing full-state synchronization.

* main: fix argument order for cluster.Join()

cluster.Join() was called with the push/pull and gossip interval
parameters swapped.
2018-03-22 13:53:00 +01:00
stuart nelson
a578319008 Merge pull request #1289 from prometheus/allow-empty-matchers
Allow empty matchers
2018-03-21 14:12:16 +01:00
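Allowing empty matchers is useful because, under Prometheus label-matching semantics, a matcher such as `team=""` matches alerts that do not carry the `team` label at all. The sketch below assumes that semantics for simple equality matchers; the `matcher` type and `matches` function are hypothetical, not the prometheus/pkg/labels API.

```go
package main

import "fmt"

// matcher is a hypothetical equality matcher (name="value").
type matcher struct{ Name, Value string }

// matches reports whether every matcher agrees with labels. Looking up an
// absent label in a Go map yields "", so name="" matches alerts that do not
// have the label.
func matches(ms []matcher, labels map[string]string) bool {
	for _, m := range ms {
		if labels[m.Name] != m.Value {
			return false
		}
	}
	return true
}

func main() {
	ms := []matcher{{"alertname", "Down"}, {"team", ""}}
	fmt.Println(matches(ms, map[string]string{"alertname": "Down"}))
	fmt.Println(matches(ms, map[string]string{"alertname": "Down", "team": "ops"}))
}
```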
Stuart Nelson
319687ab3c Re-simplify match filters fn 2018-03-20 16:11:01 +01:00
Stuart Nelson
479e5c52ac Update package prometheus/pkg/labels 2018-03-20 16:10:16 +01:00
ranbochen
b4048f46bc fix wechat issue (#1293) 2018-03-20 12:21:19 +01:00
Stuart Nelson
0c026b4387 Remove empty alert labels on ingest
The same behavior exists in Prometheus. This is a
bit superfluous, but in case people are using
old versions of Prometheus or a different metrics-gathering
system, the check is still worthwhile.
2018-03-20 12:06:34 +01:00
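Dropping empty labels on ingest amounts to deleting every label whose value is the empty string before the alert is stored. A minimal sketch over a plain map follows; `dropEmptyLabels` is a hypothetical helper, not the function used in Alertmanager's API.

```go
package main

import "fmt"

// dropEmptyLabels removes labels whose value is the empty string, in place.
// Deleting map entries while ranging over the same map is safe in Go.
func dropEmptyLabels(labels map[string]string) {
	for name, value := range labels {
		if value == "" {
			delete(labels, name)
		}
	}
}

func main() {
	ls := map[string]string{"alertname": "Down", "instance": "", "job": "node"}
	dropEmptyLabels(ls)
	fmt.Println(len(ls))
}
```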
Stuart Nelson
85caf29316 Cleanup frontend makefile 2018-03-20 11:54:44 +01:00
Stuart Nelson
4c98f4b4a9 Fix matchLabels logic 2018-03-20 11:47:53 +01:00
Stuart Nelson
5ddf0444c4 Update bindata 2018-03-20 10:48:12 +01:00
Stuart Nelson
c300cd9f8d Remove non-empty string validation in frontend 2018-03-20 10:46:03 +01:00
Stuart Nelson
f5df55666b Filter empty matchers correctly 2018-03-20 10:08:58 +01:00
Brian Brazil
bd04da5480 Remove debugging code (#1291) 2018-03-18 12:42:24 +01:00
Brian Brazil
aa950668bf The default group_by is meant to be no labels. (#1287)
This is what the intended default is, and what
the documentation says.
2018-03-16 18:39:23 +01:00
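With no group_by labels, every alert routed to a node produces the same grouping key, so all of them end up in a single aggregation group. The sketch below illustrates this with a hypothetical `groupKey` function that concatenates the group_by label values; it is not Alertmanager's actual key format.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupKey (hypothetical) builds a grouping key from the labels named in
// groupBy. An empty groupBy yields "{}" for every alert, i.e. one group.
func groupKey(groupBy []string, labels map[string]string) string {
	names := append([]string(nil), groupBy...)
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, n := range names {
		parts = append(parts, n+"="+labels[n])
	}
	return "{" + strings.Join(parts, ",") + "}"
}

func main() {
	a := map[string]string{"alertname": "Down", "job": "node"}
	b := map[string]string{"alertname": "Latency", "job": "api"}
	fmt.Println(groupKey(nil, a) == groupKey(nil, b))
	fmt.Println(groupKey([]string{"job"}, a) == groupKey([]string{"job"}, b))
}
```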
Simon Pasquier
8c9e0cf50c config: set global SMTP hello to "localhost" by default (#1290) 2018-03-16 14:45:17 +00:00
Andrey Kuzmin
1413927c3f Update bindata.go 2018-03-16 11:27:08 +01:00
Andrey Kuzmin
1c9034282c Allow empty matchers 2018-03-16 11:19:57 +01:00
Dave Thompson
d63c25a855 Set User object in alertmanager url. (#1279) 2018-03-13 14:56:20 +01:00
James Turnbull
6ebfb88cbc Update for new HA docs (#1277)
* Update for new HA

* Fixed default

* Added cluster settle reference
2018-03-07 14:08:35 +01:00
Stuart Nelson
8f1c16eaa9 Update flag help text
Start all help texts with a capital letter, end
with a period.

Some additional issues were caught
by gofmt/goimports.
2018-03-07 10:04:30 +01:00
pasquier-s
7b80919b36 Remove unused code (#1272) 2018-03-03 11:07:47 +01:00
Corentin Chary
dd75201f1c Add /-/ready based on mesh status (#1209)
* Wait for the gossip to settle before sending notifications

See #1209 for details.

As a heuristic for mesh readiness, check whether
the mesh looks stable (the number of peers isn't changing too much).
This implementation always marks the Alertmanager as ready after a maximum of 60s.

This adds one new flag to control this behavior:
```
      --cluster.settle-timeout=60s  mesh settling timeout. Do not wait more than this duration on startup.
```

It also adds `/-/ready`, which always returns 200 (in order to make it clear
that we are ready as soon as we can receive requests).

The mesh status is exposed in `/api/v1/status` and visible on `/#/status`.

* cluster: fix typos and base interval on gossipInterval
2018-03-02 15:45:21 +01:00
stuart nelson
2ecd4d6c3c Prepopulate matchers when recreating a silence (#1270)
* Fix #1268

Pre-populate the matchers list when recreating expired
silences.

* Update bindata
2018-03-02 10:59:50 +01:00
pasquier-s
e67aa8edae Hide sensitive Wechat configuration + remove default fields (#1253)
* Hide sensitive Wechat configuration

* Don't send resolved alerts for Wechat by default
2018-03-02 09:49:41 +01:00
pasquier-s
e8a92f65ef Run staticcheck as part of the build process (#1264)
This change also fixes potential issues highlighted by running
staticcheck.
2018-02-28 17:42:32 +01:00
Fabian Reinartz
68122e7005 Merge pull request #1263 from prometheus/015rc0
Cut 0.15.0-rc.0
2018-02-28 12:54:34 +01:00
Fabian Reinartz
56ff38288e *: cut 0.15.0-rc.0 2018-02-28 12:43:43 +01:00
Fabian Reinartz
187b116bba Build with Go 1.10 2018-02-28 11:37:17 +01:00
pasquier-s
c39a913f8a test: enable race detection (#1262)
This change enables race detection when running the tests. It also fixes
a couple of existing race conditions.
2018-02-27 18:18:53 +01:00
pasquier-s
3df093968c cluster: gather alertmanager_peer_position all the time (#1247)
* cluster: gather alertmanager_peer_position all the time

This change moves the gathering of the alertmanager_peer_position metric
outside of the clusterWait() function so that the metric is computed
accurately even when no alerting group fires.

* cluster: add alertmanager_cluster_health_score metric

This metric is retrieved from the memberlist library.
2018-02-27 10:37:56 +01:00
pasquier-s
c2dac90434 silence: fix skipped test (#1258)
TestStateMerge() was skipped because of a typo. Fixing the name revealed
that the test itself needed to be updated following the switch to the
memberlist library.
2018-02-27 10:17:48 +01:00
Brian Brazil
5cb71e1def Fix spelling and comment style. (#1257) 2018-02-27 10:07:33 +01:00
pasquier-s
29e441f88f Fix miscellaneous issues revealed by Go 1.10 (#1256)
* provider/mem: fix format verbs in tests

* api: fix format verb
2018-02-22 14:57:45 +00:00
stuart nelson
0f9c9a0bb0 Remove unused functions for mesh (#1251)
These functions were used with weaveworks/mesh,
but are no longer needed with memberlist.
2018-02-16 18:16:06 +01:00
Frederic Branczyk
28db2409fd Merge pull request #1222 from simonpasquier/httpcfg
*: configure http client from config
2018-02-16 14:44:37 +01:00
Fabian Reinartz
dd675e0c89 Merge pull request #1242 from roidelapluie/ptc
cluster: Make peer timeout configurable
2018-02-14 11:20:53 +01:00
Fabian Reinartz
4e434573c7 Merge pull request #1245 from simonpasquier/fix-join
cluster: pass resolved peers to Join()
2018-02-14 11:20:34 +01:00
Simon Pasquier
f4c81c43e9 cluster: pass resolved peers to Join() 2018-02-13 16:53:09 +01:00
Julien Pivotto
dc293439ca cluster: Make peer timeout configurable
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2018-02-13 16:31:33 +01:00
pasquier-s
382a0d8089 api: support zero StartsAt for alerts (#1238)
When the API receives alerts where StartsAt is zero, it updates the
value to EndsAt (if not zero itself) or "now". This ensures that the
alert validation will not fail since StartsAt has to be less than or
equal to EndsAt.
2018-02-13 16:26:34 +01:00
Frederic Branczyk
32f90a02ca Merge pull request #1239 from simonpasquier/remove-dead-code
cmd: remove unused code
2018-02-13 16:21:44 +01:00
Simon Pasquier
955c92f1b6 Configure http client for Wechat 2018-02-13 14:52:53 +01:00
Simon Pasquier
8b93f1085d Add tests for HTTP client configuration 2018-02-13 14:30:59 +01:00
Frederic Branczyk
d678022fea *: configure http client from config 2018-02-13 14:30:59 +01:00
Simon Pasquier
9d16fe8266 cmd: remove unused code 2018-02-13 14:20:54 +01:00