alertmanager/cluster
Max Inden 2c7c5b6f4e
cluster: Do not exit when failing to join cluster (#1465)
Alertmanager is exiting with a non-zero exit code if the initial cluster
join fails. This behavior may be undesirable because:

- As Alertmanager is a critical component with an at-least-once
guarantee, failing on joining the cluster is unnecessary as
Alertmanager still functions by itself.

- In an environment like Kubernetes discovering peers via DNS, peers
might roll out one-by-one, leaving the DNS entries unpopulated for the
first peer of a set. Failing on initial join prevents a roll-out.

Instead of failing on the initial join this patch only logs the failure.
The cluster can later be joined via `handleReconnect`.
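
The log-instead-of-exit behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in cluster.go: `joinFunc`, `tryJoin`, `flakyJoin`, `errNoPeers`, and the peer addresses are all hypothetical names introduced here, and the second call to `tryJoin` only stands in for whatever a later reconnect does.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errNoPeers is a hypothetical error, simulating DNS entries that are
// not yet populated for the first peer of a set.
var errNoPeers = errors.New("no peers resolved")

// joinFunc is a hypothetical stand-in for the real cluster join call.
// It returns the number of peers successfully contacted.
type joinFunc func(peers []string) (int, error)

// tryJoin attempts to join the cluster. On failure it only logs and
// returns false, leaving the process running so that a later attempt
// can succeed.
func tryJoin(join joinFunc, peers []string, logf func(string, ...interface{})) bool {
	n, err := join(peers)
	if err != nil {
		logf("failed to join cluster: peers=%v err=%v", peers, err)
		return false
	}
	logf("joined cluster: contacted=%d", n)
	return true
}

// flakyJoin returns a joinFunc that fails the first `failures` calls
// and succeeds afterwards (assumption: simulates a staggered roll-out).
func flakyJoin(failures int) joinFunc {
	calls := 0
	return func(peers []string) (int, error) {
		calls++
		if calls <= failures {
			return 0, errNoPeers
		}
		return len(peers), nil
	}
}

func main() {
	peers := []string{"alertmanager-0:9094", "alertmanager-1:9094"} // hypothetical addresses
	join := flakyJoin(1)

	// Initial join fails, but the process keeps running.
	joined := tryJoin(join, peers, log.Printf)
	fmt.Println("joined after initial attempt:", joined)

	// A later attempt (standing in for a reconnect) succeeds.
	if !joined {
		joined = tryJoin(join, peers, log.Printf)
	}
	fmt.Println("joined after retry:", joined)
}
```

The point of returning a boolean rather than calling `os.Exit` is that a single instance keeps serving alerts on its own while its peers come up.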

This is a regression introduced in PR #1456 [1].

[1] https://github.com/prometheus/alertmanager/pull/1456

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-07-11 17:20:58 +02:00
clusterpb *: move to memberlist for clustering 2018-02-08 12:18:44 +01:00
advertise.go cluster: fail when no private address can be found (#1437) 2018-07-10 22:01:40 +02:00
advertise_test.go cluster: fail when no private address can be found (#1437) 2018-07-10 22:01:40 +02:00
channel.go gossip large messages via SendReliable (#1415) 2018-06-15 13:40:21 +02:00
channel_test.go gossip large messages via SendReliable (#1415) 2018-06-15 13:40:21 +02:00
cluster.go cluster: Do not exit when failing to join cluster (#1465) 2018-07-11 17:20:58 +02:00
cluster_test.go cluster: make sure we don't miss the first pushPull (#1456) 2018-07-10 22:02:57 +02:00
delegate.go cluster: make sure we don't miss the first pushPull (#1456) 2018-07-10 22:02:57 +02:00