30 Commits

Author SHA1 Message Date
johncming
311650658a nflog: use errors.New instead of fmt.Errorf for no custom error msg. (#2045)
Signed-off-by: johncming <johncming@yahoo.com>
2019-09-25 10:31:01 +02:00
beorn7
318e006065 Mark some Summaries explicitly as having no objectives
With the next release of client_golang, Summaries will not have
objectives by default. Interestingly, this will do the right thing for
the Summaries affected by this commit. However, right now those
summaries do get the old default objectives. They don't really make
sense because the affected Summaries receive Observations quite
infrequently (far less than once in the 10m max age currently
used). To not get surprising changes when moving on to client_golang
v1, let's explicitly set the Summaries as objective-less now.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-12 15:47:56 +02:00
stuart nelson
2026e4a01f
[gossip] Don't merge expired gossip messages (#1631)
* [silences] Don't merge expired silences

If they're expired, they should be cleaned up on
the next GC cycle, but merging them in means that
they'll probably be gossip'd continually between
the cluster members.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Add analogous behavior+test for nflog

The code for nflog was also constantly re-adding
nflogs to the internal memory store, the same as
the silence code was.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Add retention to TestQuery

With the default 0 retention, the alerts would not
be merged.

Signed-off-by: Stuart Nelson <stuartnelson3@gmail.com>
2018-11-21 11:40:57 +01:00
stuart nelson
445fbdf1a8
gossip large messages via SendReliable (#1415)
* Gossip large messages via SendReliable

For messages beyond half of the maximum gossip
packet size, send the message to all peer nodes
via TCP.

The choice of "larger than half the max gossip
size" is relatively arbitrary. From brief testing,
the overhead from memberlist on a packet seemed to
only use ~3 of the available 1400 bytes, and most
gossip messages seem to be <<500 bytes.

* Add tests for oversized/normal message gossiping

* Make oversize metric names consistent

* Remove errant printf in test

* Correctly increment WaitGroup

* Add comment for OversizedMessage func

* Add metric for oversized messages dropped

Code was added to drop oversized messages if the
buffered channel they are sent on is full. This
is a good thing to surface as a metric.

* Add counter for total oversized messages sent

* Change full queue log level to debug

Was previously a warning, which isn't necessary
now that there is a metric tracking it.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-06-15 13:40:21 +02:00
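As a rough illustration of the commit above: messages larger than half the maximum gossip packet size go over the reliable (TCP) path, and oversized messages are dropped and counted when their buffered queue is full. The names `isOversized`, `route`, and `sender` are hypothetical, and the 1400-byte limit is taken from the commit message rather than from the real configuration.

```go
package main

import "fmt"

// maxGossipPacketSize is an assumed limit, based on the "1400 bytes"
// mentioned in the commit message.
const maxGossipPacketSize = 1400

// isOversized reports whether msg exceeds half the maximum packet size.
func isOversized(msg []byte) bool {
	return len(msg) > maxGossipPacketSize/2
}

// route names the transport a message would take under this rule.
func route(msg []byte) string {
	if isOversized(msg) {
		return "tcp"
	}
	return "gossip"
}

// sender queues oversized messages on a buffered channel and counts drops.
type sender struct {
	queue   chan []byte
	dropped int // stand-in for the "oversized messages dropped" metric
}

// enqueue tries a non-blocking send; if the queue is full the message is
// dropped and the drop counter is incremented.
func (s *sender) enqueue(msg []byte) bool {
	select {
	case s.queue <- msg:
		return true
	default:
		s.dropped++
		return false
	}
}

func main() {
	fmt.Println(route(make([]byte, 100))) // gossip
	fmt.Println(route(make([]byte, 701))) // tcp

	s := &sender{queue: make(chan []byte, 1)}
	s.enqueue(make([]byte, 800))
	s.enqueue(make([]byte, 800)) // queue is full, so this one is dropped
	fmt.Println(s.dropped)       // 1
}
```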
stuart nelson
77cc718a81 [nflog] register snapshotSize
This metric was never registered.
2018-06-12 13:59:48 +02:00
stuart nelson
36588c3865
memberlist gossip (#1389)
* Peers further propagate newly received nflogs

If a peer receives an nflog that it hasn't seen
before, queue the message and propagate it further
to other peers. This should ensure that all
peers within a cluster receive all gossip
messages.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Set Retransmit value based on number of members

For alertmanagers that are brought up with a list
of peers, set the number of message retransmits to
be half of that number. If there are no peers on
start, or there are few, continue to use the
default value of 3.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* [nflog] Move retransmit calculation

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* [silence] further gossip silence messages

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Set GossipNodes to equal RetransmitMulti

During a gossip, we send messages to at most
GossipNodes nodes. If possible, we want a message
to reach all nodes as soon as possible.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Fix rebase

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-06-08 11:48:42 +02:00
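One possible reading of the retransmit rule in the commit above: use half the number of configured peers, but never go below the default of 3. The function name `retransmit` and the exact rounding are assumptions; the commit message does not spell them out.

```go
package main

import "fmt"

// defaultRetransmit is the default value of 3 named in the commit message.
const defaultRetransmit = 3

// retransmit returns half the number of initial peers, falling back to the
// default when there are no peers or only a few (assumed interpretation).
func retransmit(numPeers int) int {
	if r := numPeers / 2; r > defaultRetransmit {
		return r
	}
	return defaultRetransmit
}

func main() {
	for _, n := range []int{0, 4, 10, 21} {
		fmt.Printf("peers=%d retransmit=%d\n", n, retransmit(n))
	}
}
```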
Ted Zlatanov
b04e9ad19b #1346: move maintenance messages to DEBUG log level (#1347)
Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>
2018-04-30 11:56:17 +02:00
Simon Pasquier
a8c995f77c nflog: fix potential panic in decodeState()
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-04-10 10:11:40 +02:00
Simon Pasquier
1531aa66f3 Fix for #1282 (#1286)
* cluster: add alertmanager_cluster_messages_queued metric

* cluster: add metrics for sent messages

This change adds 2 new metrics:

- alertmanager_cluster_messages_sent_total
- alertmanager_cluster_messages_sent_size_total

* Fix marshaling for entries being broadcast

Individual notifications logs and silences being broadcast to the other
peers need to be encoded using the same length-delimited format as when
doing full-state synchronization.

* main: fix argument order for cluster.Join()

cluster.Join() was called with the push/pull and gossip interval
parameters swapped.
2018-03-22 13:53:00 +01:00
Fabian Reinartz
247bfff606 cluster: remove MergeSingle 2018-02-09 11:06:51 +01:00
Fabian Reinartz
fd49dbb477 *: move to memberlist for clustering 2018-02-08 12:18:44 +01:00
pasquier-s
9b10acae68 Don't notify resolved alerts if none were firing (#1198)
* Don't notify resolved alerts if none were firing

* Fix comments
2018-01-18 11:12:17 +01:00
pasquier-s
a7d4e4ea7c Log snapshot sizes on maintenance (#1155)
* Log snapshot sizes on maintenance

* Add metrics for snapshot sizes

This change adds 2 new gauges for tracking the last snapshots' sizes:

  - alertmanager_nflog_snapshot_size_bytes
  - alertmanager_silences_snapshot_size_bytes
2018-01-10 14:53:57 +01:00
Frederic Branczyk
bfdff67138 nflog: Copy and replace gossipData instead of modifying it in place (#1121) 2017-12-09 15:22:07 +01:00
Julius Volz
fc984941ee nflog: Fix Log() crash when gossip is nil (#1064) 2017-11-01 10:34:40 +01:00
Julius Volz
947970af44 Convert Alertmanager to use non-global go-kit loggers
Fixes https://github.com/prometheus/alertmanager/issues/1040
2017-10-22 00:20:40 -07:00
Fabian Reinartz
3269bc39e1 *: switch group key to matcher serialization
Turn the GroupKey into a string that is composed of the matchers of the
path in the routing tree and the grouping labels.
Only hash it at the very end to ensure we don't exceed size limits of
integration APIs.
2017-04-21 12:06:23 +02:00
Fabian Reinartz
4258b028d6 nflog: switch to gogoproto
This switches the nflog to generate Go code via gogoproto and thereby
use standard library timestamp types.
2017-04-18 10:03:57 +02:00
Fabian Reinartz
309c6af4b2
nflog: use alert set instead of hash for deduplication
Building a hash over an entire set of alerts causes problems, because
the hash differs on any change, whereas we only want to send
notifications if the alert and its state have changed. Therefore this
introduces a list of alerts that are active and a list of alerts that
are resolved. If the currently active alerts of a group are a subset of
the ones that have been notified about before then they are
deduplicated. The resolved notifications work the same way, with a
separate list of resolved notifications that have already been sent.
2017-04-13 15:13:47 +02:00
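The subset rule from the commit above can be sketched as follows. Alerts are represented here by hypothetical `uint64` identifiers, and `isSubset` and `needsNotification` are illustrative names, not the functions used in the nflog package.

```go
package main

import "fmt"

// isSubset reports whether every element of current is also in notified.
func isSubset(current, notified []uint64) bool {
	seen := make(map[uint64]struct{}, len(notified))
	for _, id := range notified {
		seen[id] = struct{}{}
	}
	for _, id := range current {
		if _, ok := seen[id]; !ok {
			return false
		}
	}
	return true
}

// needsNotification is true when at least one currently active alert has
// not been notified about before, i.e. the set is not deduplicated.
func needsNotification(active, notifiedActive []uint64) bool {
	return !isSubset(active, notifiedActive)
}

func main() {
	notified := []uint64{1, 2, 3}
	fmt.Println(needsNotification([]uint64{1, 3}, notified)) // false
	fmt.Println(needsNotification([]uint64{1, 4}, notified)) // true
}
```

Per the commit message, the same check would apply separately to the list of resolved alerts.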
Fabian Reinartz
1e01b2bdba nflog: add metrics (#518) 2016-11-21 15:22:35 +01:00
Fabian Reinartz
b2461bb2d4 *: remove go-kit logging 2016-09-06 11:56:57 +02:00
Fabian Reinartz
d6713c8eeb nflog: enable sharing log via gossip 2016-08-19 12:20:04 +02:00
Fabian Reinartz
5dc8286942 nflog: fix maintenance termination 2016-08-19 12:01:16 +02:00
Fabian Reinartz
72fdf3d3ab *: integrate nflog
This commit replaces the previous NotifyInfo provider with the new
nflog package. It needs adjustments in the behavior of the deduping
stage.
The nflog stores notification digests per receiver per alert aggregation
group rather than one entry per alert per receiver. This drastically
reduces the number of entries and removes interference
across aggregation groups.
2016-08-18 15:52:28 +02:00
Fabian Reinartz
a42a473213 nflog: add doc comments, license headers 2016-08-18 14:08:01 +02:00
Fabian Reinartz
4a5df40539 nflog: add logging 2016-08-16 16:33:17 +02:00
Fabian Reinartz
48b6c8ff70 nflog: add initial tests 2016-08-16 11:11:48 +02:00
Fabian Reinartz
086d581cf8 nflog: add gc/snapshotting maintenance, remove delete
This removes the Delete function from the interface as the log
should be append-only and only be reduced by expired entries.
This also adds an argument to configure a background processing routine,
which periodically garbage collects and snapshots.
2016-08-16 11:11:48 +02:00
Fabian Reinartz
80afd502d5 nflog: add mesh gossip support 2016-08-16 11:11:48 +02:00
Fabian Reinartz
3d8e60ded7 nflog: add notification log package
This adds a new nflog package meant to replace provider.Notifies. It
has a central protobuf type package, which is also meant for use by
other packages and the API.
The generated Go types are also the in-memory representation.
2016-08-16 11:11:48 +02:00