alertmanager

Commit Graph

Author	SHA1	Message	Date
Simon Pasquier	0ebaeccd4b	*: add missing license headers Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-05-14 17:37:13 +02:00
rhysm	e4416bd612	Add additional cluster configuration flags (#1379) The cluster configuration uses DefaultLANConfig which seems to be quite sensitive to WAN conditions. Allowing the tuning of these 3 parameters (TCP Timeout, Probe Interval and Probe Timeout) makes clustering more robust across WAN connections. Signed-off-by: Rhys Meaclem <rhysmeaclem@gmail.com>	2018-05-14 09:22:04 +02:00
Simon Pasquier	d0b664b618	cluster: gofmt code Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-04-10 12:06:23 +02:00
Simon Pasquier	1531aa66f3	Fix for #1282 (#1286) * cluster: add alertmanager_cluster_messages_queued metric * cluster: add metrics for sent messages This change adds 2 new metrics: - alertmanager_cluster_messages_sent_total - alertmanager_cluster_messages_sent_size_total * Fix marshaling for entries being broadcast Individual notifications logs and silences being broadcast to the other peers need to be encoded using the same length-delimited format as when doing full-state synchronization. * main: fix argument order for cluster.Join() cluster.Join() was called with the push/pull and gossip interval parameters being swapped one for another.	2018-03-22 13:53:00 +01:00
Corentin Chary	dd75201f1c	Add /-/ready based on mesh status (#1209) * Wait for the gossip to settle before sending notifications See #1209 for details. As a heuristic for mesh readiness, try to see if the mesh looks stable (the number of peers isn't changing too much). This implementation always marks the Alertmanager as ready after a maximum of 60s. This adds one new flag to control this behavior: `--cluster.settle-timeout=60s` (mesh settling timeout; do not wait more than this duration on startup). It also adds `/-/ready`, which always returns 200 (in order to make it clear that we are ready as soon as we can receive requests). The mesh status is exposed in `/api/v1/status` and visible on `/#/status`. * cluster: fix typos and base interval on gossipInterval	2018-03-02 15:45:21 +01:00
pasquier-s	3df093968c	cluster: gather alertmanager_peer_position all the time (#1247) * cluster: gather alertmanager_peer_position all the time This change moves the gathering of the alertmanager_peer_position metric outside of the clusterWait() function so that the metric is computed accurately even when no alerting group fires. * cluster: add alertmanager_cluster_health_score metric This metric is retrieved from the memberlist library.	2018-02-27 10:37:56 +01:00
Simon Pasquier	f4c81c43e9	cluster: pass resolved peers to Join()	2018-02-13 16:53:09 +01:00
Fabian Reinartz	3f2e00fbea	cluster/api: improve metrics and cluster status	2018-02-09 11:16:00 +01:00
Fabian Reinartz	247bfff606	cluster: remove MergeSingle	2018-02-09 11:06:51 +01:00
Fabian Reinartz	fd49dbb477	*: move to memberlist for clustering	2018-02-08 12:18:44 +01:00

10 Commits