Commit Graph

25 Commits

Author SHA1 Message Date
Ted Zlatanov
b04e9ad19b #1346: move maintenance messages to DEBUG log level (#1347)
Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>
2018-04-30 11:56:17 +02:00
Simon Pasquier
2d68b4d318 silence: fix potential panic in decodeState()
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-04-10 10:12:05 +02:00
Simon Pasquier
1531aa66f3 Fix for #1282 (#1286)
* cluster: add alertmanager_cluster_messages_queued metric

* cluster: add metrics for sent messages

This change adds 2 new metrics:

- alertmanager_cluster_messages_sent_total
- alertmanager_cluster_messages_sent_size_total

* Fix marshaling for entries being broadcast

Individual notifications logs and silences being broadcast to the other
peers need to be encoded using the same length-delimited format as when
doing full-state synchronization.

* main: fix argument order for cluster.Join()

cluster.Join() was called with the push/pull and gossip interval
parameters being swapped one for another.
2018-03-22 13:53:00 +01:00
pasquier-s
c2dac90434 silence: fix skipped test (#1258)
TestStateMerge() was skipped because of a typo. Fixing the name revealed
that the test itself needed to be updated following the switch to the
memberlist library.
2018-02-27 10:17:48 +01:00
Fabian Reinartz
fd49dbb477 *: move to memberlist for clustering 2018-02-08 12:18:44 +01:00
pasquier-s
a7d4e4ea7c Log snapshot sizes on maintenance (#1155)
* Log snapshot sizes on maintenance

* Add metrics for snapshot sizes

This change adds 2 new gauges for tracking the last snapshots' sizes:

  - alertmanager_nflog_snapshot_size_bytes
  - alertmanager_silences_snapshot_size_bytes
2018-01-10 14:53:57 +01:00
Jose Donizetti
74808e40f3 Refactor silence constants (#1076)
* Refactor remove dups silence state constants

* Refactor to use const instead of string
2017-11-07 11:36:30 +01:00
Julius Volz
947970af44 Convert Alertmanager to use non-global go-kit loggers
Fixes https://github.com/prometheus/alertmanager/issues/1040
2017-10-22 00:20:40 -07:00
Corentin Chary
bff889b490 silence|alerts: add metrics about current silences and alerts
This adds metrics that look like this:
```
alertmanager_alerts{state="active"} 6
alertmanager_alerts{state="suppressed"} 0
alertmanager_silences{state="active"} 1
alertmanager_silences{state="expired"} 1
alertmanager_silences{state="pending"} 0
```

This can be used to monitor alertmanager's usage and validate that
alertmanagers in a mesh have a similar number of silences and alerts.
2017-10-02 13:33:29 +02:00
Jose Donizetti
9449bd1fa9 Ignore expired silences OnGossip (#999)
This will fix the bug of resync deleted silences
due to the state of other peers.
2017-09-28 10:25:35 +02:00
Corentin Chary
34d9524ab9 silences: avoid deadlock (#995)
* silences: avoid deadlock

Calling gossip.GossipBroadcast() will cause a deadlock if
there is a currently executing OnBroadcast* function.

See #982

* silence_test: better unit test to detect deadlocks
2017-09-27 11:48:28 +02:00
Corentin Chary
869a038a2b Add a mutex to silences.go:gossipData (#984)
This should fix silence/silence.go #982
2017-09-13 11:18:01 +02:00
Max Leonard Inden
08be6a4149
Expire pending silence and move to expired state
Instead of setting endsAt to startsAt we can set both to now. Thereby
the Silence will get the expired state by default.
2017-05-29 18:44:58 +02:00
Fabian Reinartz
f53974d5e6 silence: fix and test expiration behavior 2017-05-22 09:27:57 +02:00
Fabian Reinartz
d73a655bf4 Simplify silence modifications, add update endpoint (#796)
* Simplify silence modifications, add update endpoint

* vendor: add pkg/errors

* ui: Handle upserting of silences

.

* Regenerate bindata
2017-05-16 16:48:25 +02:00
Fabian Reinartz
b1486ca546 silence: move to gogoproto
This generates the protobuf Go code with gogoproto and switches to
standard library time types.
2017-04-18 12:47:42 +02:00
Fabian Reinartz
309c6af4b2
nflog: use alert set instead of hash for deduplication
Building a hash over an entire set of alerts causes problems, because
the hash differs, on any change, whereas we only want to send
notifications if the alert and it's state have changed. Therefore this
introduces a list of alerts that are active and a list of alerts that
are resolved. If the currently active alerts of a group are a subset of
the ones that have been notified about before then they are
deduplicated. The resolved notifications work the same way, with a
separate list of resolved notifications that have already been sent.
2017-04-13 15:13:47 +02:00
Fabian Reinartz
b6851a5421 silences: fix concurrent cache writes (#561)
This fixes #559 by removing concurrent map writes to the matcher cache.
The cache was guarded by the Silence's main lock, which only used a
read-lock on queries.
The cache's get methods lazily loads data into the cache and thus
causing concurrent writes.

We just change the main lock to always write-lock, as we don't expect
high lock contention at this point and would have it in a dedicated
cache lock anyway.
2016-11-21 11:09:49 +01:00
Fabian Reinartz
7517453c68 silence: add metrics 2016-09-29 09:54:34 +02:00
Frederic Branczyk
e72e45c8f1 silence: add cache for silence matchers
compiling regex silence matchers on every query is expensive, therefore
caching them as soon as they are gossiped through the mesh
2016-09-09 11:41:39 +02:00
Fabian Reinartz
b2461bb2d4 *: remove go-kit logging 2016-09-06 11:56:57 +02:00
Fabian Reinartz
8d88d9e05b Merge pull request #481 from prometheus/fabxc-meshsil
*: integrate new silence package
2016-08-30 16:53:34 +02:00
Fabian Reinartz
98101f3868 silence: fix doc strings 2016-08-30 14:19:22 +02:00
Fabian Reinartz
a4e8703567 *: integrate new silence package 2016-08-30 12:15:23 +02:00
Fabian Reinartz
ed3fdc747d silence: add protobuf-based silence package.
This commit adds an implementation of a silence storage that can
share store and modify silences, share state via a mesh network,
write and load snapshots, and be dynamically queried.
All data formats are based on protocol buffers.
2016-08-24 17:48:31 +02:00