alertmanager

Commit Graph

Author	SHA1	Message	Date
George Robinson	342f6a599c	Add godot linter (#3613) * Add godot linter Signed-off-by: George Robinson <george.robinson@grafana.com> * Remove extra line from LICENSE Signed-off-by: George Robinson <george.robinson@grafana.com> --------- Signed-off-by: George Robinson <george.robinson@grafana.com>	2024-03-21 11:26:46 +00:00
George Krajcsovits	d85bef20d9	feature: add native histogram support to latency metrics (#3737) Note that this does not stop showing classic metrics, for now it is up to the scrape config to decide whether to keep those instead or both. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2024-02-29 14:53:47 +00:00
Matthieu MOREL	b9e347b9d1	golangci-lint: enable testifylint linter Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-12-10 08:50:03 +00:00
Matthieu MOREL	b81bad8711	use Go standard errors Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-12-08 16:44:13 +01:00
gotjosh	28925efbd8	Metrics: Notification log maintenance success and failure (#3286) * Metrics: Notification log maintenance success and failure Due to various reasons, we've observed different kind of errors on this area. From read-only disks to silly code bugs. Errors during maintenance are effectively a data loss and therefore, we should encourage proper monitoring of this area. Similar to #3285 --------- Signed-off-by: gotjosh <josue.abreu@gmail.com>	2023-03-08 10:29:05 +00:00
gotjosh	f59460bfd4	Refactor nflog configuration options to make it similar to Silences. (#3220) * Refactor nflog configuration options to make it similar to Silences. The Notification Log is a similar component to Silences. They're the only two things that are shared between nodes when running in HA and they both hold some sort of internal state that needs to be cleaned up on an interval. To simplify the code and make it a bit more understandable (among other benefits such as improved testability) - I've refactor the notification log configuration and `run` to be similar to the silences.	2023-01-19 16:39:03 +00:00
Julien Pivotto	b0443021dc	Expires notify log sooner when possible It seems useless to keep the notifications in the nflog for longer than twice the repeat interval. This should help reduce memory usage of clustered alertmanagers. Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>	2022-10-14 10:03:17 +02:00
inosato	791e542100	Remove ioutil Signed-off-by: inosato <si17_21@yahoo.co.jp>	2022-07-18 22:01:02 +09:00
Matthias Loibl	a6d10bd5bc	Update golangci-lint and fix complaints (#2853) * Copy latest golangci-lint files from Prometheus Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * Use grafana/regexp over stdlib regexp Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * Fix typos in comments Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * Fix goimports complains in import sorting Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * gofumpt all Go files Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * Update naming to comply with revive linter Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * config: Fix error messages to be lower case Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * test/cli: Fix error messages to be lower case Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * .golangci.yaml: Remove obsolete space Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * config: Fix expected victorOps error Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * Use stdlib regexp Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * Clean up Go modules Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>	2022-03-25 17:59:51 +01:00
gotjosh	8da517524a	Enable support for custom callbacks as part of maintenance (#2689) * Enable support for custom callbacks as part of maintenance This enables support for custom Maintenance callbacks as part of the periodic maintenance of silences and notification logs. Effectively a no-op for the Alertmanager but allows downstream implementation to inject custom logic as part of it. Signed-off-by: gotjosh <josue.abreu@gmail.com> * Add tests Signed-off-by: gotjosh <josue.abreu@gmail.com> * Fix tests and remove whitespace Signed-off-by: gotjosh <josue.abreu@gmail.com> * Address review comments Signed-off-by: gotjosh <josue.abreu@gmail.com> * run go fmt Signed-off-by: gotjosh <josue.abreu@gmail.com> * Fix import ordering Signed-off-by: gotjosh <josue.abreu@gmail.com>	2021-09-06 16:19:39 +05:30
Julien Pivotto	b2a4cacb95	Update go dependencies & switch to go-kit/log Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-08-02 12:43:23 +02:00
Simon Pasquier	23a7f89398	Update github.com/gogo/protobuf to v1.3.2 (#2478) Fix for CVE-2021-3121 Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2021-02-09 16:49:07 +01:00
Julien Pivotto	013177e2d0	Update dependencies (#2257) Update membership Update common (support HTTP/2 client) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-05-18 15:00:36 +02:00
Josh Soref	0f2c65d265	Spelling (#2167) * spelling: inhibition Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: matchers Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: notification Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: nonexistent Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: obfuscated Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: occurred Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: relevant Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: unexpected Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: marshaled Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: marshaling Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>	2020-01-23 17:06:16 +01:00
johncming	311650658a	nflog: use errors.New instead of fmt.Errorf for no custom error msg. (#2045) Signed-off-by: johncming <johncming@yahoo.com>	2019-09-25 10:31:01 +02:00
Ganesh Vernekar	3207e8b300	Vendor prometheus 2.12.0 (#2008) * Vendor prometheus 2.12.0 Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Update protos Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2019-08-22 15:34:38 +05:30
beorn7	318e006065	Mark some Summaries explicitly as having no objectives With the next release of client_golang, Summaries will not have objectives by default. Interestingly, this will do the right thing for the Summaries affected by this commit. However, right now those summaries do get the old default objectives. They don't really make sense because the affected Summaries receive Observations quite infrequently (far less than once in the 10m max age currently used). To not get surprising changes when moving on to client_golang v1, let's explicitly set the Summaries as objective-less now. Signed-off-by: beorn7 <beorn@grafana.com>	2019-06-12 15:47:56 +02:00
Simon Pasquier	c7de536129	: use stdlib context (#1768) This changes removes all usage of golang.org/x/net/context in the code base. It also bumps a few dependencies for the same reason: - github.com/gogo/protobuf - go-openapi/ Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-02-26 12:18:57 +01:00
Simon Pasquier	b676fa79c0	*: update Makefile.common with new staticcheck (#1692) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-01-04 15:37:33 +01:00
stuart nelson	2026e4a01f	[gossip] Don't merge expired gossip messages (#1631) * [silences] Don't merge expired silences If they're expired, they should be cleaned up on the next GC cycle, but merging them in means that they'll probably be gossip'd continually between the cluster members. Signed-off-by: stuart nelson <stuartnelson3@gmail.com> * Add analogous behavior+test for nflog The code for nflog was also constantly re-adding nflogs to the internal memory store, the same as the silence code was. Signed-off-by: stuart nelson <stuartnelson3@gmail.com> * Add retention to TestQuery With the default 0 retention, the alerts would not be merged. Signed-off-by: Stuart Nelson <stuartnelson3@gmail.com>	2018-11-21 11:40:57 +01:00
stuart nelson	445fbdf1a8	gossip large messages via SendReliable (#1415) * Gossip large messages via SendReliable For messages beyond half of the maximum gossip packet size, send the message to all peer nodes via TCP. The choice of "larger than half the max gossip size" is relatively arbitrary. From brief testing, the overhead from memberlist on a packet seemed to only use ~3 of the available 1400 bytes, and most gossip messages seem to be <<500 bytes. * Add tests for oversized/normal message gossiping * Make oversize metric names consistent * Remove errant printf in test * Correctly increment WaitGroup * Add comment for OversizedMessage func * Add metric for oversized messages dropped Code was added to drop oversized messages if the buffered channel they are sent on is full. This is a good thing to surface as a metric. * Add counter for total oversized messages sent * Change full queue log level to debug Was previously a warning, which isn't necessary now that there is a metric tracking it. Signed-off-by: stuart nelson <stuartnelson3@gmail.com>	2018-06-15 13:40:21 +02:00
stuart nelson	77cc718a81	[nflog] register snapshotSize This metric was never registered.	2018-06-12 13:59:48 +02:00
stuart nelson	36588c3865	memberlist gossip (#1389) * Peers further propagate newly received nflogs If a peer receives an nflog that it hasn't seen before, queue the message and propagate it further to other peers. This should ensure that all peers within a cluster receive all gossip messages. Signed-off-by: stuart nelson <stuartnelson3@gmail.com> * Set Retransmit value based on number of members For alertmanagers that are brought up with a list of peers, set the number of message retransmits to be half of that number. If there are no peers on start, or there are few, continue to use the default value of 3. Signed-off-by: stuart nelson <stuartnelson3@gmail.com> * [nflog] Move retransmit calculation Signed-off-by: stuart nelson <stuartnelson3@gmail.com> * [silence] further gossip silence messages Signed-off-by: stuart nelson <stuartnelson3@gmail.com> * Set GossipNodes to equal RetransmitMulti During a gossip, we send messages to at most GossipNodes nodes. If possible, we only a message to reach all nodes as soon as possible. Signed-off-by: stuart nelson <stuartnelson3@gmail.com> * Fix rebase Signed-off-by: stuart nelson <stuartnelson3@gmail.com>	2018-06-08 11:48:42 +02:00
Simon Pasquier	b7d891cf39	notify: notify resolved alerts properly (#1408) * notify: notify resolved alerts properly The PR #1205 while fixing an existing issue introduced another bug when the send_resolved flag of the integration is set to true. With send_resolved set to false, the semantics remain the same: AlertManager generates a notification when new firing alerts are added to the alert group. The notification only carries firing alerts. With send_resolved set to true, AlertManager generates a notification when new firing or resolved alerts are added to the alert group. The notification carries both the firing and resolved notifications. Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Fix comments Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-06-08 11:37:38 +02:00
Simon Pasquier	0ebaeccd4b	*: add missing license headers Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-05-14 17:37:13 +02:00
Ted Zlatanov	b04e9ad19b	#1346: move maintenance messages to DEBUG log level (#1347) Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>	2018-04-30 11:56:17 +02:00
Simon Pasquier	a8c995f77c	nflog: fix potential panic in decodeState() Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-04-10 10:11:40 +02:00
Simon Pasquier	1531aa66f3	Fix for #1282 (#1286) * cluster: add alertmanager_cluster_messages_queued metric * cluster: add metrics for sent messages This change adds 2 new metrics: - alertmanager_cluster_messages_sent_total - alertmanager_cluster_messages_sent_size_total * Fix marshaling for entries being broadcast Individual notifications logs and silences being broadcast to the other peers need to be encoded using the same length-delimited format as when doing full-state synchronization. * main: fix argument order for cluster.Join() cluster.Join() was called with the push/pull and gossip interval parameters being swapped one for another.	2018-03-22 13:53:00 +01:00
pasquier-s	e8a92f65ef	Run staticcheck as part of the build process (#1264) This change also fixes potential issues highlighted by running staticcheck.	2018-02-28 17:42:32 +01:00
Fabian Reinartz	247bfff606	cluster: remove MergeSingle	2018-02-09 11:06:51 +01:00
Fabian Reinartz	fd49dbb477	*: move to memberlist for clustering	2018-02-08 12:18:44 +01:00
pasquier-s	62b957cc14	Notify only when new firing alerts are added (#1205) After the initial notification has been sent, AlertManager shouldn't notify the receiver again when no new alerts have been added to the group during group_interval. This change also modifies the acceptance test framework to assert that no notification has been received in a given interval.	2018-01-23 16:52:03 +01:00
pasquier-s	9b10acae68	Don't notify resolved alerts if none were firing (#1198) * Don't notify resolved alerts if none were firing * Fix comments	2018-01-18 11:12:17 +01:00
pasquier-s	a7d4e4ea7c	Log snapshot sizes on maintenance (#1155) * Log snapshot sizes on maintenance * Add metrics for snapshot sizes This change adds 2 new gauges for tracking the last snapshots' sizes: - alertmanager_nflog_snapshot_size_bytes - alertmanager_silences_snapshot_size_bytes	2018-01-10 14:53:57 +01:00
Frederic Branczyk	bfdff67138	nflog: Copy and replace gossipData instead of modifying it in place (#1121)	2017-12-09 15:22:07 +01:00
Frederic Branczyk	53bd897bd0	Merge pull request #1066 from josedonizetti/add_set_test Add tests to nflog set	2017-11-02 11:23:19 +01:00
Julius Volz	9b72c10134	Minor code cleanups	2017-11-01 23:08:34 +01:00
Jose Donizetti	511c6bcb6a	Add nflog TestQuery (#1070)	2017-11-01 20:38:00 +01:00
Julius Volz	fc984941ee	nflog: Fix Log() crash when gossip is nil (#1064)	2017-11-01 10:34:40 +01:00
Jose Donizetti	bf3f6de719	Add tests to nflog set	2017-11-01 06:44:27 -02:00
Jose Donizetti	359b614f5f	Fix documentation (#1065)	2017-11-01 08:41:00 +00:00
Julius Volz	947970af44	Convert Alertmanager to use non-global go-kit loggers Fixes https://github.com/prometheus/alertmanager/issues/1040	2017-10-22 00:20:40 -07:00
Fabian Reinartz	3269bc39e1	*: switch group key to matcher serialization Turn the GroupKey into a string that is composed of the matchers if the path in the routing tree and the grouping labels. Only hash it at the very end to ensure we don't exceed size limits of integration APIs.	2017-04-21 12:06:23 +02:00
Fabian Reinartz	4258b028d6	nflog: switch to gogoproto This switches the nflog to generate Go code via gogoproto and thereby use standard library timestamp types.	2017-04-18 10:03:57 +02:00
Fabian Reinartz	309c6af4b2	nflog: use alert set instead of hash for deduplication Building a hash over an entire set of alerts causes problems, because the hash differs, on any change, whereas we only want to send notifications if the alert and it's state have changed. Therefore this introduces a list of alerts that are active and a list of alerts that are resolved. If the currently active alerts of a group are a subset of the ones that have been notified about before then they are deduplicated. The resolved notifications work the same way, with a separate list of resolved notifications that have already been sent.	2017-04-13 15:13:47 +02:00
Fabian Reinartz	1e01b2bdba	nflog: add metrics (#518)	2016-11-21 15:22:35 +01:00
Fabian Reinartz	b2461bb2d4	*: remove go-kit logging	2016-09-06 11:56:57 +02:00
Fabian Reinartz	d6713c8eeb	nflog: enable sharing log via gossip	2016-08-19 12:20:04 +02:00
Fabian Reinartz	5dc8286942	nflog: fix maintenance termination	2016-08-19 12:01:16 +02:00
Fabian Reinartz	72fdf3d3ab	*: integrate nflog This commit replaces the previous NotifyInfo provider with the new nflog package. It needs adjustments in the behavior of the deduping stage. The nflog stores notification digests per receiver per alert aggregation group rather than one entry for alert per receiver. This drastically reduces the number of entries and removes interference across aggregation groups.	2016-08-18 15:52:28 +02:00


56 Commits