alertmanager/cluster
Max Inden 3735df3ac7
cluster: Do not exit when failing to join cluster (#1465)
Alertmanager exits with a non-zero exit code if the initial cluster
join fails. This behavior may be undesirable because:

- As Alertmanager is a critical component with an at-least-once
guarantee, exiting because the cluster join failed is unnecessary, since
Alertmanager still functions on its own.

- In an environment like Kubernetes, where peers are discovered via DNS,
peers might roll out one by one, leaving the DNS entries unpopulated for
the first peer of a set. Failing on the initial join then blocks the roll-out.

Instead of failing on the initial join, this patch only logs the failure.
The cluster can be joined later via `handleReconnect`.

This is a regression introduced in PR #1456 [1].

[1] https://github.com/prometheus/alertmanager/pull/1456

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-07-11 17:19:33 +02:00
clusterpb *: move to memberlist for clustering 2018-02-08 12:18:44 +01:00
advertise.go cluster: fail when no private address can be found (#1437) 2018-07-05 22:59:56 +02:00
advertise_test.go cluster: fail when no private address can be found (#1437) 2018-07-05 22:59:56 +02:00
channel.go gossip large messages via SendReliable (#1415) 2018-06-15 13:40:21 +02:00
channel_test.go gossip large messages via SendReliable (#1415) 2018-06-15 13:40:21 +02:00
cluster.go cluster: Do not exit when failing to join cluster (#1465) 2018-07-11 17:19:33 +02:00
cluster_test.go cluster: make sure we don't miss the first pushPull (#1456) 2018-07-09 11:16:04 +02:00
delegate.go cluster: make sure we don't miss the first pushPull (#1456) 2018-07-09 11:16:04 +02:00