Commit Graph

30 Commits

Author SHA1 Message Date
chengzw aff09c28dc fix label mismatch for alertmanager_notifications_failed_total
Signed-off-by: chengzw <chengzw258@163.com>
2023-11-11 22:26:38 +08:00
gotjosh 07b89eb117
Mixin: Pin the mixtool version in CircleCI
In mixtool, the tip of master broke for our mixin - I have managed to trace it down and opened a PR (see https://github.com/grafana/dashboard-linter/pull/143) but for now, let's pin the version to make sure our CI is not affected.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2023-08-03 16:15:26 +01:00
gotjosh c494009f61
Mixin: Fix mixin linting
In accordance with a new rule introduced as part of https://github.com/grafana/dashboard-linter/pull/79 this is now required. However, for the new rule of `panel-unit-rule` we don't reap any benefits from specifying a particular unit for our panels; the defaults work perfectly fine, so they're ignored.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2022-06-29 16:25:15 +01:00
gotjosh 4d09995c26
Mixin: `template-job-rule` now only validates job and not both instance and job (#2944)
With https://github.com/grafana/dashboard-linter/pull/49 `template-job-rule` no longer validates both `instance` and `job` labels. Add the new rule of `template-instance-rule` to the exclusions to preserve the previous behaviour.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2022-06-15 22:27:11 +02:00
gotjosh 35bf59f182
Mixin: Rename exclusion rule from `panel-job-instance-rule` to `target-instance-rule`
Within 9a32e58ed0, the rules have been split into two different rules:

`target-job-rule`
`target-instance-rule`

All of our queries do contain the `job` label but as per the reason, we don't need both in this particular case.

Fixes #2899

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2022-05-02 13:08:22 +01:00
Simon Pasquier 48a99764a1
*: bump to Go 1.17 (#2792)
* *: bump to Go 1.17

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* *: fix yamllint errors

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2021-12-22 14:03:53 +01:00
clyang82 5414963190 fix lint error
Signed-off-by: clyang82 <chuyang@redhat.com>
2021-12-10 08:20:23 +00:00
fpetkovski b408b522bc Improve the AlertmanagerMembersInconsistent alert
The expression alertmanager_cluster_members{job="alertmanager"}[5m] is assumed to return
one series for each alertmanager instance in the cluster. When running inside Kubernetes,
alertmanager pods can get evicted and rescheduled. This can change the instance label and
produce a new series for that alertmanager instance.

When the same pod gets evicted several times in a row, there will be a short interval in which
Prometheus will return values from both the new series and the old series.
As a result, counting the number of series for the alertmanager_cluster_members metric
will overestimate the number of instances in the given cluster.

This commit modifies the AlertmanagerMembersInconsistent alert to increase the `for` clause to 15m
in order to reduce the probability of a false positive.

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
2021-06-22 08:21:02 +02:00
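
The overcounting described in the AlertmanagerMembersInconsistent commit above can be illustrated with a small simulation. This is only a sketch, not the mixin's actual rule: the `Series` type, the `series_in_window` helper, the instance names, and the timestamps are all hypothetical, and the model assumes each series is sampled densely between its first and last sample.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Series:
    """Hypothetical model of one alertmanager_cluster_members series."""
    instance: str                        # value of the instance label
    first_sample: float                  # time of the first sample (seconds)
    last_sample: Optional[float] = None  # None means it is still scraped


def series_in_window(series, now, window=300.0):
    """Count series that have samples inside (now - window, now]."""
    start = now - window
    count = 0
    for s in series:
        last = now if s.last_sample is None else min(s.last_sample, now)
        if s.first_sample <= now and last > start:
            count += 1
    return count


# A three-member cluster where the third pod was evicted at t=900
# and came back with a new instance label at t=930.
cluster = [
    Series("am-0", 0.0),
    Series("am-1", 0.0),
    Series("am-2-old", 0.0, last_sample=900.0),
    Series("am-2-new", 930.0),
]

count_after_eviction = series_in_window(cluster, now=1000.0)  # old and new overlap
count_once_settled = series_in_window(cluster, now=1300.0)    # old series aged out
```

In this model the count is inflated only while the old series still falls inside the 5m window. A longer `for` duration requires the mismatch to persist, which is why such a transient overcount becomes less likely to fire the alert.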
Arthur Silva Sens 8598683b24
[mixins] Alertmanager Overview dashboard (#2540)
* Adds a Grafana dashboard to the mixin.

The dashboard aims to show an overview of the overall health of Alertmanager.

Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-06-07 19:54:22 +02:00
beorn7 0ed31c3311 Update matcher examples
While the documentation for the matchers themselves was updated, we
missed the examples.

I propose to merge this into the release-0.22 branch so that it gets
included in the ongoing release.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-05-06 19:17:59 +02:00
SuperQ 99f64e944b
Update build
* Drop /vendor.
* Update Go to 1.16.
* Update djfarrelly/maildev to 1.1.0.
* Update protoc to 3.15.8.
* Update mixin test for Go 1.16.
* Bump Go modules.

Signed-off-by: SuperQ <superq@gmail.com>
2021-04-22 13:11:44 +02:00
Bor Grošelj Simić dd6b3afebe Remove trailing whitespace in docs
Signed-off-by: Bor Grošelj Simić <bor.groseljsimic@telemach.net>
2021-01-05 16:27:38 +01:00
Björn Rabenstein ce108378d4
Fix and improve AlertmanagerClusterFailedToSendAlerts (#2437)
The alert was just looking at the minimum across integrations. So a
complete failure of one integration would be masked by another integration
that is still working. With this fix, the `integration` label is retained
(as it was already expected by the `description`), and thus any
failing integration will trigger the alert.

In addition, an `alertmanagerCriticalIntegrationsRegEx` is provided
that allows marking integrations as critical. Integrations that are
not used to deliver critical alerts, or those that are just there for
auditing and logging purposes can now be configured to only trigger a
warning alert if they fail.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-12-23 15:15:38 +01:00
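
The masking problem and the severity split described in the AlertmanagerClusterFailedToSendAlerts commit above can be sketched as follows. This is an illustration only, not the mixin's rule: the failure ratios, the 0.01 threshold, the example regex value, and the `severity` helper are assumptions, and the regex is applied as a full match on the assumption that it is used the way a PromQL regex matcher would be.

```python
import re


def severity(integration, critical_regex):
    """Return 'critical' if the integration name fully matches the regex."""
    return "critical" if re.fullmatch(critical_regex, integration) else "warning"


# Hypothetical failure ratios per integration: one is completely broken.
failure_ratio = {"pagerduty": 1.0, "webhook": 0.0}
threshold = 0.01                            # assumed alert threshold
critical_regex = "pagerduty|opsgenie"       # assumed example value

# Old behaviour: taking the minimum across integrations hides the failure.
old_alert_fires = min(failure_ratio.values()) > threshold

# New behaviour: each integration is evaluated on its own and the
# severity depends on whether it is marked as critical.
new_alerts = {
    name: severity(name, critical_regex)
    for name, ratio in failure_ratio.items()
    if ratio > threshold
}
```

With these numbers the old check stays silent because the healthy integration pulls the minimum down to zero, while the per-integration check reports the failing one.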
Tom Wilkie 6c5dee008f
Beginnings of an Alertmanager mixin. (#1629)
Add an Alertmanager mixin

Signed-off-by: beorn7 <beorn@grafana.com>
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com>
Co-authored-by: beorn7 <beorn@grafana.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
2020-12-03 15:57:42 +01:00
Julien Pivotto 1cba0c7a37
Remove HipChat (#2281)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-06-11 15:51:10 +02:00
Dominik-K f8ffc2a18a
Add warning that inhibition occurs on missing `equal` (#2214)
Signed-off-by: Dominik <dominik-k@mailbox.org>
2020-03-27 16:20:19 +01:00
Max Leonard Inden d81b9a5435
doc: Add 'Secure Alertmanager cluster traffic' design document
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-03-01 16:12:25 +01:00
kirillsablin 32bb289906 dispatch: Add group_by_all support (#1588)
To aggregate by all possible labels use '...' as the sole label name. 
This effectively disables aggregation entirely, passing through all 
alerts as-is. This is unlikely to be what you want, unless you have 
a very low alert volume or your upstream notification system performs 
its own grouping. Example: group_by: [...]

Signed-off-by: Kyryl Sablin <kyryl.sablin@schibsted.com>
2018-11-29 12:31:14 +01:00
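
The effect of using '...' as the sole label name, as described in the group_by_all commit above, can be sketched with a toy grouping function. This is not Alertmanager's dispatcher code: `group_alerts`, the `GROUP_BY_ALL` constant, and the sample alerts are hypothetical and only show why grouping by every label passes each distinct alert through on its own.

```python
from collections import defaultdict

GROUP_BY_ALL = "..."  # the special label name from the commit message


def group_alerts(alerts, group_by):
    """Group alert label sets by the given label names."""
    groups = defaultdict(list)
    for labels in alerts:
        if group_by == [GROUP_BY_ALL]:
            # Every label takes part in the key, so only alerts with
            # identical label sets end up in the same group.
            key = tuple(sorted(labels.items()))
        else:
            key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "HighLatency", "cluster": "eu", "instance": "a"},
    {"alertname": "HighLatency", "cluster": "eu", "instance": "b"},
    {"alertname": "DiskFull", "cluster": "us", "instance": "c"},
]

by_name = group_alerts(alerts, ["alertname"])   # two groups
by_all = group_alerts(alerts, [GROUP_BY_ALL])   # one group per alert
```

Grouping by `alertname` folds the two HighLatency alerts together, whereas the '...' form leaves every alert in its own group.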
Martin Chodur 5d222bce55 feat: added routing tools to amtool (#1511)
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
2018-08-22 16:41:09 +02:00
Silvio Gissi 402564055b Update Architecture diagram (#1394)
* Update Architecture diagram

Update diagram from sketch to vector.
Add draw.io XML source file.
Update README.md to display master doc/arch.jpg

Signed-off-by: Silvio Gissi <silvio@gissilabs.com>

* Updated README.md with relative link to architecture doc.

* Updated Architecture document from JPG to SVG

Signed-off-by: Silvio Gissi <silvio@gissilabs.com>

* Small fix in graph.

* Updated font to align with Prometheus architecture.

Signed-off-by: Silvio Gissi <silvio@gissilabs.com>

* Embedded images at arch.svg

* Removed images from SVG, update source XML
2018-05-31 15:34:52 +02:00
Tom Paine 081fc7d982 Update simple.yml (#1216)
match spacing on other receiver groups
2018-01-29 15:58:44 +01:00
Jose Donizetti 76c15a0ef5 Fix config name inconsistency (#1087)
* Rename global config hipchat_url to hipchat_api_url

* Rename opsgenie config 'host' to 'url'
2017-11-11 15:01:21 +01:00
Max Chadwick 5801581883 Fix a typo in simple.yml
2016-04-20 22:10:28 -04:00
Max Chadwick 4cb3874ab8 Move SMTP auth to the config file
2016-04-16 16:41:55 -04:00
Max Chadwick eaff66916c Clarify SMTP authentication in the docs
2016-04-10 21:25:06 -04:00
Fabian Reinartz 1a68209fd1 Fix example config
2016-02-03 17:20:41 +01:00
louis 5a16e373a7 update simple config example for hipchat integration
2016-01-05 20:52:25 +01:00
beorn7 93ffa534a5 PR with changes after code review
Now to be reverse-reviewed.
2015-11-23 18:24:57 +01:00
Fabian Reinartz 0c27b08a05 Add example config file
2015-11-12 15:15:12 +01:00
Fabian Reinartz 4e6695682a Add architecture sketch
2015-10-26 15:09:48 +01:00