alertmanager/cmd
stuart nelson db4af95ea0
memberlist reconnect (#1384)
* initial impl

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Add reconnectTimeout

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Fix locking

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Remove unused PeerStatuses

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Add metrics

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Actually use peerJoinCounter

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Cleanup peers map on peer timeout

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Add reconnect test

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* test removing failed peers

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Use peer address as map key

If a peer is restarted, it will rejoin with the
same IP but different ULID. So the node will
rejoin the cluster, but its peers will never
remove it from their internal list of failed nodes
because its ULID has changed.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Add failed peers from creation

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Remove warnIfAlone()

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Update metric names

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>

* Address comments

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-06-05 14:28:49 +02:00
..
alertmanager memberlist reconnect (#1384) 2018-06-05 14:28:49 +02:00
amtool *: add missing license headers 2018-05-14 17:37:13 +02:00