1593 Commits

Author SHA1 Message Date
Adam Shannon
bf0db5b989 cli: print more config details (#1376)
Example output:

$ amtool check-config alertmanager.yaml
Checking 'alertmanager.yaml'  SUCCESS
Found:
 - global config
 - route
 - 0 inhibit rules
 - 13 receivers
 - 0 templates

Signed-off-by: Adam Shannon <adamkshannon@gmail.com>
2018-05-15 09:17:51 +02:00
Alex Lardschneider
1f9a7b6182 [Request] Add Slack actions to notifications (#1355)
* Added slack actions to notifications

Signed-off-by: Alex Lardschneider <alex.lardschneider@gmail.com>
2018-05-14 17:26:11 +02:00
Simon Pasquier
292256ca7f vendor: remove unused packages (#1380)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-14 16:23:48 +02:00
rhysm
e4416bd612 Add additional cluster configuration flags (#1379)
The cluster configuration uses DefaultLANConfig which seems
to be quite sensitive to WAN conditions. Allowing the tuning of these 3
parameters (TCP Timeout, Probe Interval and Probe Timeout) makes
clustering more robust across WAN connections.

Signed-off-by: Rhys Meaclem <rhysmeaclem@gmail.com>
2018-05-14 09:22:04 +02:00
stuart nelson
942be9d993
cli alert query: Expose --active and --unprocessed (#1370)
* cli alert query: Expose --active and --unprocessed

Support the new filter options in the alerts api
endpoint introduced by https://github.com/prometheus/alertmanager/pull/1366

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Update comment and client_test

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-05-09 10:57:01 +02:00
Simon Pasquier
02f10f204f circleci: fix docker push command (#1371)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-08 11:41:25 +02:00
Simon Pasquier
28967e394e config: fix Go formatting (#1368)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-07 18:12:14 +02:00
Simon Pasquier
75900ea62a api: remove dead code (#1367)
This is a follow-up of f825d97de4.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-07 18:11:36 +02:00
Simon Pasquier
383024e63d api: support more query filters (#1366)
* api: support more query filters

This change adds 2 new query filters to the /api/v1/alerts endpoint.

- active, filter out active alerts when set to 'false' (default: 'true').
- unprocessed, filter out unprocessed alerts when set to 'false'
 (default: 'true').

The default values ensure that the API behavior remains the same as
before when the query filters aren't provided.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* api: address comments

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-07 18:07:19 +02:00
Max Inden
05fb09aebd
Merge pull request #1362 from mxinden/deprecate-v0-alerts
api: Deprecate `api/alerts` endpoint
2018-05-05 13:43:54 +02:00
stuart nelson
1c0c24b300
Update alerts argument order, rename expired to inhibited (#1360)
Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-05-04 10:43:38 +02:00
Max Leonard Inden
f825d97de4
api: Deprecate api/alerts endpoint
With prometheus/prometheus commit
e114ce0ff7a1ae06b24fdc479ffc7422074c1ebe [1] Prometheus switches from
using `api/alerts` to `api/v1/alerts`. This commit is included starting
from Prometheus v0.17.0. As discussed on the prometheus-developers
mailing list [2] the deprecation period is long over.

[1] github.com/prometheus/prometheus/commit/e114ce0ff7a1ae06b24fdc479ffc7422074c1ebe
[2]
https://groups.google.com/d/msg/prometheus-developers/2CCuFTMbmAg/Qg58rvyzAQAJ

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-05-04 09:59:14 +02:00
Simon Pasquier
998984d8d6 Update CircleCI build (#1354)
This change upgrades the build configuration to CircleCI 2.0.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-05-03 09:36:23 +02:00
RogerYuQian
8a0faa9946 fix wechat issue (#1353) (#1356) 2018-05-03 09:32:09 +02:00
Simon Pasquier
b3cc6229a2 notify: remove wechat unit test (#1350)
The unit test was making a request to the public Wechat endpoint which
caused flaky results.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-04-30 20:01:38 +02:00
Ted Zlatanov
b04e9ad19b #1346: move maintenance messages to DEBUG log level (#1347)
Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>
2018-04-30 11:56:17 +02:00
Trevor Wood
cecfe5b2f5 Validate Slack field config and only allow the necessary input (#1334)
Signed-off-by: Trevor Wood <Trevor.G.Wood@gmail.com>
2018-04-25 18:58:11 +02:00
stuart nelson
cfde256913 [amtool] fix silence import --help format 2018-04-24 11:46:24 +02:00
Simon Pasquier
dc5fc02d22 [amtool] use kingpin.v2 (#1330)
* Use default values to store values from config
* fix typo and reserved keyword
* move to long help texts
* add one more unit test for resolver
* update comments

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-04-24 09:35:15 +02:00
Ted Zlatanov
af5dd74264 tell users opening issues to use alertmanager --version (#1327)
`-version` doesn't work as of 0.15-rc.1 so users should run `alertmanager --version`

Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>
2018-04-23 18:02:06 +02:00
stuart nelson
bc263d3e61
Improve notification instrumentation (#1335)
* Improve notification instrumentation

- Add notificationLatencySeconds histogram to
debug duplicate messages. This can help rule out
if duplicate messages are being caused by
excessive latency when sending a notification.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-04-23 14:23:01 +02:00
stuart nelson
80f2eeb2ca
Fix resolved alerts still inhibiting (#1331)
* inhibit: update inhibition cache when alerts resolve

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* inhibit: remove unnecessary fmt.Sprintf

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* inhibit: add unit tests

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* inhibit: use NopLogger in tests

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Update old alert with result of merge with new

On ingest, alerts with matching fingerprints are
merged if the new alert's start and end times
overlap with the old alert's.

The merge creates a new alert, which is then
updated in the internal alert store.

The original alert is not updated (because merge
creates a copy), so it is never marked as resolved
in the inhibitor's reference to it.

The code within the inhibitor relies on skipping
over resolved alerts, but because the old alert is
never updated it is never marked as resolved. Thus
it continues to inhibit other alerts until it is
cleaned up by the internal GC.

This commit updates the struct of the old alert
with the result of the merge with the new alert.

An alternative would be to always update the
inhibitor's internal cache of alerts regardless of
an alert's resolve status.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Update inhibitor cache even if alert is resolved

This seems like a better choice than the previous
commit. I think it is more sane to have the
inhibitor update its own cache, rather than having
one of its pointers updated externally.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-04-18 16:26:04 +02:00
Manos Fokas
300a87e85b Removed file changes to resolve conflict. (#1318)
Signed-off-by: manosf <manosf@protonmail.com>
2018-04-17 16:22:46 +02:00
stuart nelson
e7bc6e2935
Move amtool to modular structure (#1321)
* Move amtool to modular structure

Signed-off-by: Stuart Nelson <stuartnelson3@gmail.com>

* Move toplevel setup back into root.go

Signed-off-by: Stuart Nelson <stuartnelson3@gmail.com>

* Remove confusing alert struct name overwriting

A local variable within the alert subcommand was
using the name of the struct within that file.

Signed-off-by: Stuart Nelson <stuartnelson3@gmail.com>

* change local var name shadowing struct name

Signed-off-by: Stuart Nelson <stuartnelson3@gmail.com>
2018-04-13 13:34:16 +02:00
stuart nelson
360dba6d9a
Rename silence API Delete() -> Expire() (#1319)
Within alertmanager, expire is the term used,
since silences still "exist" but aren't in effect.

Signed-off-by: Stuart Nelson <stuartnelson3@gmail.com>
2018-04-11 12:30:18 +02:00
Simon Pasquier
c92ed69ce8 Split cli package (#1314)
* cli: move commands to cli/cmd

* cli: use StatusAPI interface for config command

* cli: use SilenceAPI interface for silence commands

* cli: use AlertAPI for alert command

* cli: move back commands to cli package

And move API client code to its own package.

* cli: remove unused structs
2018-04-11 11:17:41 +02:00
Max Inden
510e67ef18
Merge pull request #1316 from simonpasquier/fix-decode-state
Fix potential panic in decodeState()
2018-04-10 18:51:57 +02:00
Max Inden
a9b7026bc2
Merge pull request #1317 from simonpasquier/go-fmt
gofmt code
2018-04-10 13:04:17 +02:00
Simon Pasquier
d0b664b618 cluster: gofmt code
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-04-10 12:06:23 +02:00
Simon Pasquier
2d68b4d318 silence: fix potential panic in decodeState()
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-04-10 10:12:05 +02:00
Simon Pasquier
a8c995f77c nflog: fix potential panic in decodeState()
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-04-10 10:11:40 +02:00
stuart nelson
b1625a08a0
Provide default working config for artifacts (#1313) 2018-04-05 16:25:45 +02:00
Simon Pasquier
f53b24765d api: initialize alerts_received_total labels (#1310) 2018-04-04 10:38:17 +02:00
Simon Pasquier
cb169a5ec6 parse: fix missing argument to fmt.Errorf (#1311) 2018-04-04 10:37:35 +02:00
Simon Pasquier
4cba49155d dispatch: don't reset timer if flush is in-progress (#1301)
When the aggregation group receives an alert that is past the initial
group_wait value, it should reset its timer only if the timer has never
expired. Otherwise it means that the flush is already in-progress.
2018-03-29 12:22:49 +02:00
stuart nelson
19715022a4
[amtool] update silence add and update flags (#1298)
* Update silence add/update flags

- Change --expires/-e to --duration/-d
- Change --expires-on to --end
- Add --start

* update subcommand returns ID of new silence

The silences printed before were accurate, except
they had the old ID. Now the new ID is returned.

* Duration is added to silence.StartsAt

When a user supplies a duration to update a
silence, it is applied to silence.StartsAt after
any potential changes to the silence's start time.
2018-03-29 12:11:31 +02:00
Simon Pasquier
0c086e3b12 cli: extract client bindings of the v1 API (#1278)
* cli: extract client bindings of the v1 API from amtool

This is a continuation of [1] but the code is kept in the alertmanager
repository rather than having it in client_golang.

[1] https://github.com/prometheus/client_golang/pull/333

Co-Authored-By: Fabian Reinartz <fab.reinartz@gmail.com>
Co-Authored-By: Tristan Colgate <tcolgate@gmail.com>
Co-Authored-By: Corin Lawson <corin@responsight.com>
Co-Authored-By: stuart nelson <stuartnelson3@gmail.com>

* cli: fix httpSilenceAPI.Set() method

* vendor: remove github.com/prometheus/client_golang/api/alertmanager

* cli: don't use the model.Alert type
2018-03-28 19:19:04 +02:00
Simon Pasquier
b95b32821f ui: replace deprecated InstrumentHandler() (#1302)
This change replaces the deprecated InstrumentHandler function by
the equivalent functions from the promhttp package.

The following metrics are removed:

* http_request_duration_microseconds (Summary).
* http_request_size_bytes (Summary).
* http_requests_total (Counter).

And the following metrics are added instead:

* alertmanager_http_request_duration_seconds (Histogram).
* alertmanager_http_response_size_bytes (Histogram).
* promhttp_metric_handler_requests_in_flight (Gauge).
* promhttp_metric_handler_requests_total (Counter).
2018-03-28 15:28:38 +02:00
stuart nelson
acb111e812
0.15.0-rc.1 (#1296) 2018-03-23 13:59:49 +01:00
Ted Zlatanov
099b6a1d43 Sort dispatched alerts by job+instance then rest by default (#1178) (#1234) 2018-03-22 20:06:37 +01:00
Simon Pasquier
1531aa66f3 Fix for #1282 (#1286)
* cluster: add alertmanager_cluster_messages_queued metric

* cluster: add metrics for sent messages

This change adds 2 new metrics:

- alertmanager_cluster_messages_sent_total
- alertmanager_cluster_messages_sent_size_total

* Fix marshaling for entries being broadcast

Individual notification logs and silences being broadcast to the other
peers need to be encoded using the same length-delimited format as when
doing full-state synchronization.

* main: fix argument order for cluster.Join()

cluster.Join() was called with the push/pull and gossip interval
parameters swapped.
2018-03-22 13:53:00 +01:00
stuart nelson
a578319008
Merge pull request #1289 from prometheus/allow-empty-matchers
Allow empty matchers
2018-03-21 14:12:16 +01:00
Stuart Nelson
319687ab3c Re-simplify match filters fn 2018-03-20 16:11:01 +01:00
Stuart Nelson
479e5c52ac Update package prometheus/pkg/labels 2018-03-20 16:10:16 +01:00
ranbochen
b4048f46bc fix wechat issue (#1293) 2018-03-20 12:21:19 +01:00
Stuart Nelson
0c026b4387 Remove empty alert labels on ingest
The same behavior exists in prometheus. This is a
bit superfluous, but in the event people are using
old versions of prometheus or a different metric
gathering system, it's still valid to check.
2018-03-20 12:06:34 +01:00
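
Dropping empty labels on ingest, as described in the commit above, amounts to a small filter over the label set. The sketch below uses a plain `map[string]string` and a hypothetical `removeEmptyLabels` helper rather than the project's own label types.

```go
package main

import "fmt"

// removeEmptyLabels deletes every label whose value is the empty string.
// Deleting entries while ranging over a map is safe in Go.
func removeEmptyLabels(labels map[string]string) {
	for name, value := range labels {
		if value == "" {
			delete(labels, name)
		}
	}
}

func main() {
	labels := map[string]string{
		"alertname": "HighLatency",
		"instance":  "",
		"job":       "api",
	}
	removeEmptyLabels(labels)
	fmt.Println(len(labels), labels["alertname"], labels["job"])
}
```
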
Stuart Nelson
85caf29316 Cleanup frontend makefile 2018-03-20 11:54:44 +01:00
Stuart Nelson
4c98f4b4a9 Fix matchLabels logic 2018-03-20 11:47:53 +01:00
Stuart Nelson
5ddf0444c4 Update bindata 2018-03-20 10:48:12 +01:00
Stuart Nelson
c300cd9f8d Remove non-empty string validation in frontend 2018-03-20 10:46:03 +01:00