Commit Graph

68 Commits

Author SHA1 Message Date
George Robinson 342f6a599c
Add godot linter (#3613)
* Add godot linter

Signed-off-by: George Robinson <george.robinson@grafana.com>

* Remove extra line from LICENSE

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-03-21 11:26:46 +00:00
George Krajcsovits d85bef20d9
feature: add native histogram support to latency metrics (#3737)
Note that this does not stop showing classic metrics; for now,
it is up to the scrape config to decide whether to keep those instead or
both.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-02-29 14:53:47 +00:00
gotjosh d352d16e27
Fix flaky test TestClusterJoinAndReconnect/TestTLSConnection (#3722)
Wait until `p2.Status()` returns, because it blocks until we're ready. That way, we're guaranteed to know that the cluster size is 2.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-02-14 11:18:28 +00:00
Matthieu MOREL b9e347b9d1 golangci-lint: enable testifylint linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-10 08:50:03 +00:00
Matthieu MOREL b81bad8711 use Go standard errors
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-08 16:44:13 +01:00
Alexander Weaver fdea7e731c
Isolate react-app package (#3589)
* Isolate react-app package

Signed-off-by: Alex Weaver <weaver.alex.d@gmail.com>

---------

Signed-off-by: Alex Weaver <weaver.alex.d@gmail.com>
2023-11-03 14:50:06 +00:00
rongyi b22dc1d5e0
if 9093/9094 port is in use, test case will fail (#3320)
* Update test

Signed-off-by: rongyi <rongyi@onchain.com>

* Change port to uint16

Signed-off-by: rongyi <rongyi@onchain.com>

* Update testcase

Signed-off-by: rongyi <rongyi@onchain.com>

* make testcase pass

Signed-off-by: rongyi <rongyi@onchain.com>

---------

Signed-off-by: rongyi <rongyi@onchain.com>
2023-08-07 12:31:05 +01:00
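A common way to keep tests from failing when a fixed port such as 9093 or 9094 is already taken is to ask the operating system for a free port by listening on port 0. The sketch below illustrates that idea only; `freePort` is a hypothetical helper and is not taken from the commit above, whose exact approach is not shown in this log.

```go
package main

import (
	"fmt"
	"net"
)

// freePort is a hypothetical helper: it asks the OS for an unused TCP port
// on the loopback interface by listening on port 0, then releases it.
// The port is returned as uint16, the natural range for a TCP port.
func freePort() (uint16, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return uint16(l.Addr().(*net.TCPAddr).Port), nil
}

func main() {
	port, err := freePort()
	if err != nil {
		fmt.Println("could not find a free port:", err)
		return
	}
	fmt.Printf("127.0.0.1:%d\n", port)
}
```

Another process can still claim the port between the moment it is released and the moment the test binds it, so this narrows the collision window rather than closing it.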
Simon Pasquier aea6204d58 cluster: fix panic when `tls_client_config` is empty
Closes #3403

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2023-08-04 14:29:06 +02:00
Jean-Philippe Quéméner 9de8ef3675
Cluster: Add memberlist label configuration option (#3354)
* Cluster: Add memberlist label configuration option

Signed-off-by: Jean-Philippe Quémémer <jeanphilippe.quemener@grafana.com>

---------

Signed-off-by: Jean-Philippe Quémémer <jeanphilippe.quemener@grafana.com>
Signed-off-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
2023-05-05 17:26:22 +01:00
PrometheusBot 87ad8437fc
Synchronize common files from prometheus/prometheus (#3191)
* Update common Prometheus files

* cluster: fix formatting

Signed-off-by: prombot <prometheus-team@googlegroups.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
2022-12-23 10:59:32 +01:00
arukiidou c2b2defe48
bump:hashicorp/golang-lru to v2,aws-sdk-go,prometheus/common (#3182)
* bump:hashicorp/golang-lru to v2,aws-sdk-go,prometheus/common

Signed-off-by: junya koyama <arukiidou@yahoo.co.jp>
2022-12-20 17:21:12 +01:00
Simon Pasquier b9e5f08fde Update code to match changes in exporter-toolkit
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2022-11-15 16:15:05 +01:00
inosato 791e542100 Remove ioutil
Signed-off-by: inosato <si17_21@yahoo.co.jp>
2022-07-18 22:01:02 +09:00
Matthias Loibl a6d10bd5bc
Update golangci-lint and fix complaints (#2853)
* Copy latest golangci-lint files from Prometheus

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Use grafana/regexp over stdlib regexp

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Fix typos in comments

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Fix goimports complaints in import sorting

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* gofumpt all Go files

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Update naming to comply with revive linter

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* config: Fix error messages to be lower case

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* test/cli: Fix error messages to be lower case

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* .golangci.yaml: Remove obsolete space

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* config: Fix expected victorOps error

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Use stdlib regexp

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Clean up Go modules

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2022-03-25 17:59:51 +01:00
Devin Trejo fad796931b
Add feature flag to enable discovery and use of public IPaddr for clustering. (#2719)
* Add feature flag to enable discovery and use of public IPaddr for clustering.

Before this change, Alertmanager would refuse to start if the
advertise address was bound to any address (0.0.0.0) and the host only
had an interface with a public IP address. After this change, a feature
flag permits the use of a discovered public address for cluster
gossiping.

Signed-off-by: Devin Trejo <dtrejo@palantir.com>
2021-11-10 17:40:48 +01:00
Dustin Hooten ff85bec45b
Secure cluster traffic via mutual TLS (#2237)
* Add TLS option to gossip cluster

Co-authored-by: Sharad Gaur <sharadgaur@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* generate new certs that expire in 100 years

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* Fix tls_connection attributes

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* Improve error message

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* Fix tls client config docs

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* Add capacity arg to message buffer

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* fix formatting

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* Update version; add version validation

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* use lru cache for connection pool

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* lock reading from the connection

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* when extracting net.Conn from tlsConn, lock and throw away wrapper

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* Add mutex to connection pool to protect cache

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* fix linting

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

Co-authored-by: Sharad Gaur <sharadgaur@gmail.com>
2021-08-09 14:58:06 -06:00
Julien Pivotto 3a9808c3f7
Fix main tests (#2670)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-08-04 16:13:51 +02:00
Julien Pivotto 20a1f8fd3f
Merge pull request #2433 from sylr/fix-test
Fix test not waiting for cluster member to be ready
2021-08-04 13:57:26 +02:00
Julien Pivotto b2a4cacb95 Update go dependencies & switch to go-kit/log
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-08-02 12:43:23 +02:00
gotjosh 473cf4e8f7
Remove whitespace
Signed-off-by: gotjosh <josue@grafana.com>
2021-04-13 14:07:44 +01:00
gotjosh d0406d6295
Clustering: Fix unsynchronised access
Signed-off-by: gotjosh <josue@grafana.com>
2021-04-13 14:03:43 +01:00
Steve Simpson 1711e72d1b Clustering: Change WaitReady to accept a Context.
WaitReady is a blocking call and so should accept a Context in order to
be responsive to cancellation of the notification pipeline for any reason.

Signed-off-by: Steve Simpson <steve.simpson@grafana.com>
2021-03-10 09:18:39 +01:00
gotjosh bbd01285b3
Address review feedback
Signed-off-by: gotjosh <josue@grafana.com>
2021-03-03 15:00:03 +00:00
gotjosh 6341b9fe0d
Use a private attribute for the memberlist.Node
Signed-off-by: gotjosh <josue@grafana.com>
2021-03-02 15:50:53 +00:00
gotjosh f3a4f77021
Use an interface for the Channel too
Signed-off-by: gotjosh <josue@grafana.com>
2021-02-25 16:01:00 +00:00
gotjosh 40279b7fa4
gofmt
Signed-off-by: gotjosh <josue@grafana.com>
2021-02-24 15:38:13 +00:00
gotjosh eb3048f2df
Address review comments
Signed-off-by: gotjosh <josue@grafana.com>
2021-02-24 15:35:24 +00:00
gotjosh 9a2ae39430
Clustering: Interface for Peers in other packages
A Peer as defined by the `cluster` package represents the node in the
cluster. It is used in other packages to know the status of all of the
members or how long we should wait to know if a notification has already fired.

In Cortex, we'd like to implement a slightly different way of
clustering (using gRPC for communication and a
hash ring for node discovery).

This is a small change to support that by changing the consumer of other
packages to an interface.

Silences and Notification channels don't need an interface as they take
a `func([]byte) error` as a parameter.

Signed-off-by: gotjosh <josue@grafana.com>
2021-02-19 19:07:41 +00:00
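The design described in commit 9a2ae39430, where consumers depend on an interface rather than on the concrete cluster peer, can be sketched as follows. All names here (`Peer`, `gossipPeer`, `ringPeer`, `describe`) are hypothetical and chosen for illustration; they are not the identifiers used in Alertmanager or Cortex.

```go
package main

import "fmt"

// Peer is a hypothetical, reduced interface covering only what a
// consumer needs to know about a cluster member.
type Peer interface {
	Name() string
	Status() string
}

// gossipPeer stands in for a gossip-based implementation.
type gossipPeer struct{ name string }

func (p gossipPeer) Name() string   { return p.name }
func (p gossipPeer) Status() string { return "ready" }

// ringPeer stands in for an alternative implementation, for example
// one that discovers nodes through a hash ring.
type ringPeer struct{ name string }

func (p ringPeer) Name() string   { return p.name }
func (p ringPeer) Status() string { return "settling" }

// describe is a consumer that depends only on the interface, so either
// implementation can be passed in.
func describe(p Peer) string {
	return fmt.Sprintf("%s: %s", p.Name(), p.Status())
}

func main() {
	fmt.Println(describe(gossipPeer{name: "am-0"}))
	fmt.Println(describe(ringPeer{name: "am-1"}))
}
```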
Simon Pasquier 23a7f89398
Update github.com/gogo/protobuf to v1.3.2 (#2478)
Fix for CVE-2021-3121

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2021-02-09 16:49:07 +01:00
gotjosh e6a1bede89
Make MaxGossipPacketSize public (#2475)
Downstream implementations might want to configure it.

Signed-off-by: gotjosh <josue@grafana.com>
2021-02-05 18:06:47 +01:00
Sylvain Rabot f4c7eb54aa
Fix test not waiting for cluster member to be ready
Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>
2020-12-10 16:16:54 +01:00
Simon Pasquier de80d907d1
cluster: log error on reconnect failures (#2260)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-05-18 15:00:51 +02:00
Julien Pivotto 013177e2d0
Update dependencies (#2257)
Update membership

Update common (support HTTP/2 client)

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-05-18 15:00:36 +02:00
Sylvain Rabot 21e99dcb63 Fix TestClusterJoinAndReconnect on macos (#2110)
Signed-off-by: Sylvain Rabot <s.rabot@lectra.com>
2019-11-21 14:17:24 +01:00
Pger-Y f76fec1fd9 cluster: change lock from Read lock to Write Lock since function modifies the struct... (#2109)
Signed-off-by: John Smith <megman5576@gmail.com>
2019-11-21 14:15:58 +01:00
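The locking rule behind commit f76fec1fd9 is that a function that modifies shared state must hold the write lock of a `sync.RWMutex`, while functions that only read can share the read lock. The sketch below shows that rule on a hypothetical `registry` type; it is not the struct touched by the commit.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical struct guarded by a sync.RWMutex.
type registry struct {
	mtx   sync.RWMutex
	peers map[string]string
}

// Get only reads the map, so the shared read lock is sufficient.
func (r *registry) Get(name string) (string, bool) {
	r.mtx.RLock()
	defer r.mtx.RUnlock()
	addr, ok := r.peers[name]
	return addr, ok
}

// Set modifies the map, so it must take the exclusive write lock.
// Holding only RLock here would allow concurrent map writes.
func (r *registry) Set(name, addr string) {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	r.peers[name] = addr
}

func main() {
	r := &registry{peers: map[string]string{}}

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.Set(fmt.Sprintf("peer-%d", i), fmt.Sprintf("10.0.0.%d:9094", i))
		}(i)
	}
	wg.Wait()

	addr, ok := r.Get("peer-3")
	fmt.Println(addr, ok)
}
```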
Kien Nguyen-Tuan ca3893058c Consolidate invalid address log (#2063)
The invalid listen address errors should be the
same as the invalid advertise address errors.

Signed-off-by: Kien Nguyen <kiennt2609@gmail.com>
2019-10-11 14:05:06 +02:00
Ganesh Vernekar 3207e8b300
Vendor prometheus 2.12.0 (#2008)
* Vendor prometheus 2.12.0

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Update protos

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-08-22 15:34:38 +05:30
Simon Pasquier 019ace3298
cluster: add more metrics (#1941)
- alertmanager_cluster_alive_messages_total, total number of alive
messages received.
- alertmanager_cluster_peer_info, a constant metric labeled by peer name.
- alertmanager_cluster_pings_seconds, histogram of latencies for ping
messages.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-01 10:24:41 +02:00
Simon Pasquier f32ad1dd8b *: enable default linters (#1861)
* *: enable default linters

* Remove direct usage of errcheck

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-29 10:54:40 +02:00
Simon Pasquier 6592692907 cluster: reduce code duplication
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-17 16:20:41 +02:00
xuyasong 60c9bf49c2 fix golint error
Signed-off-by: xuyasong <1154564309@qq.com>
2019-03-15 18:10:36 +08:00
Karsten Weiss c637ca1a6e Fix typos in comments and metric HELPs (#1790)
No functional change.

Signed-off-by: Karsten Weiss <knweiss@gmail.com>
2019-03-12 10:29:26 +01:00
Simon Pasquier c7de536129
*: use stdlib context (#1768)
This change removes all usage of golang.org/x/net/context in the code
base. It also bumps a few dependencies for the same reason:
- github.com/gogo/protobuf
- go-openapi/*

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-26 12:18:57 +01:00
Ye Ben 5f8eaf9560 cluster/delegate: Replace labels to const to reduce hardcode (#1724)
Signed-off-by: yeya24 <ben.ye@daocloud.io>
2019-01-28 10:17:55 +01:00
JoeWrightss 9ccbeb585b cluster: Fix typo in comment (#1668)
Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>
2018-12-16 14:03:55 +01:00
Povilas Versockas 7f34cb4716 cluster: Add cluster peers DNS refresh job (#1428)
Adds a job which runs periodically and refreshes cluster.peer DNS records.

The problem is that when you restart all of the Alertmanager instances in an environment like Kubernetes, DNS may contain old Alertmanager instance IPs but, on startup (when Join() happens), none of the new instance IPs. Because DNS is not empty at that point, resolvePeers with waitIfEmpty=true returns, and "islands" of single Alertmanager instances form.

Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
2018-11-23 09:47:13 +01:00
Simon Pasquier 13d71e58fa cluster: skip tests when no private ip address exists (#1470)
The memberlist library will fail to setup the cluster when the machine
has no private IP address.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-08-22 17:40:07 +02:00
Max Inden 3735df3ac7
cluster: Do not exit when failing to join cluster (#1465)
Alertmanager exits with a non-zero exit code if the initial cluster
join fails. This behavior may be unwanted because:

- As Alertmanager is a critical component with an at-least-once
guarantee, failing on joining the cluster is unnecessary as
Alertmanager still functions by itself.

- In an environment like Kubernetes discovering peers via DNS, peers
might roll out one-by-one, leaving the DNS entries unpopulated for the
first peer of a set. Failing on initial join prevents a roll-out.

Instead of failing on the initial join this patch only logs the failure.
The cluster can be later joined via the `handleReconnect`.

This is a regression introduced in PR #1456 [1].

[1] https://github.com/prometheus/alertmanager/pull/1456

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-07-11 17:19:33 +02:00
Corentin Chary 42ea9a565b cluster: make sure we don't miss the first pushPull (#1456)
* cluster: make sure we don't miss the first pushPull

During the join, memberlist initiates a pushPull to get initial data.
Unfortunately, at this point the nflog and silence listener have not
been registered yet, so the first data arrives only after one pushPull
cycle (1 min by default!).

Signed-off-by: Corentin Chary <c.chary@criteo.com>
2018-07-09 11:16:04 +02:00
Simon Pasquier f5a258dd1d cluster: fail when no private address can be found (#1437)
The memberlist library fails when it can't find a private address and no
advertise address is given. To return a helpful message to the user,
Alertmanager mimics the logic from memberlist. However the code had a
bug that swallowed the error message and made it difficult for the user
to understand how to fix the problem.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-05 22:59:56 +02:00