This commit removes the metrics from the compat package
in favour of the existing logging and the additional tools
at hand, such as amtool, to validate Alertmanager configurations.
Due to the global nature of the compat package, a consequence
of config.Load, these metrics have proven to be less useful
in practice than expected, both in Alertmanager and other projects
such as Mimir.
There are a number of reasons for this:
1. Because the compat package is global, these metrics cannot be
reset each time config.Load is called, as in multi-tenant
projects like Mimir loading a config for one tenant would reset
the metrics for all tenants. This is also the reason the metrics
are counters and not gauges.
2. Since the metrics are counters, it is difficult to create
meaningful dashboards for Alertmanager as, unlike in Mimir,
configurations are not reloaded at fixed intervals, and as such,
operators cannot use rate to track configuration changes
over time.
In Alertmanager, there are much better tools available to validate
that an Alertmanager configuration is compatible with the UTF-8
parser, including both the existing logging from Alertmanager
server and amtool check-config.
In other projects like Mimir, we can track configurations for
individual tenants using log aggregation and storage systems
such as Loki. This gives operators far more information than
what is possible with the metrics, including the timestamp,
input and ID of tenant configurations that are incompatible
or have disagreement.
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit fixes inconsistent UTF-8 behavior if the compat package is
not initialized and feature flags are not passed to the API. This can
happen when Alertmanager is used as a package in software such
as Cortex or Mimir.
The inconsistent behavior is that Alertmanager will accept UTF-8 alerts
but reject UTF-8 configurations.
Since feature flags are optional via api.Options, we cannot force them
to be passed to api.New at compile time. Instead, it's better to defer
back to the compat package which is consistent even when not initialized.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Support UTF-8 label matchers: Use compat package in Alertmanager server
This pull request adds use of the compat package in Alertmanager server that will allow users to switch between the new matchers/parse parser and the old pkg/labels parser. The new matchers/parse parser uses a fallback mechanism where if the input cannot be parsed in the new parser it then attempts to use the old parser. If an input is parsed in the old parser but not the new parser then a warning log is emitted.
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Refactor: Move `inTimeIntervals` from `notify` to `timeinterval`
There's absolutely no change of functionality here and I've expanded coverage for similar logic in both places.
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
This accurately reflects what the function _actually_ does. If no active silences IDs are provided and the list of inhibitions we have is already empty the alert is actually set to Active. Took me a while to realise this as I was understanding how do we populate the alert list.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Alert metric reports different results to what the user sees via API
Fixes#1439 and #2619.
The previous metric is not _technically_ reporting incorrect results as the alerts _are_ still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurpose the current metric to match what the user sees in the UI.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Previously, if a pending silence existed for an alert, and it later
became active without any silences getting added in the meantime, we
would miss the existence of that newly active silence.
Signed-off-by: beorn7 <beorn@grafana.com>
Add version tracking of silences states. Adding a silence to the state
increments the version. If the version hasn't changed since the last
time an alert was checked for being silenced, we only have to verify
that the relevant silences are still active rather than checking the
alert against all silences.
Signed-off-by: beorn7 <beorn@soundcloud.com>
This clarifies a bunch of things I have run into during code reading
in preparation for some performance improvements around muting.
It also moves doc comments from places where they don't show up in
godoc to visible places.
It also fixes golint warnings.
Signed-off-by: beorn7 <beorn@soundcloud.com>
Instead of registering marker metrics inside of
cmd/alertmanager/main.go, register them in types/types.go, encapsulating
marker specific logic in its module, not in main.go. In addition it
paves the path for removing the usage of the global metric registry in
the future, by taking a local metric registerer.
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
Alert merging assumed that EndsAt would always be empty for firing
alerts. This is no longer true starting with Prometheus v2.4.0: EndsAt
is set to a multiple of the evaluation interval or resend interval
(whichever is the largest). This change updates the merging logic to
support both cases.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Sort dispatched alerts by job+instance in the correct order (#1178)
Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>
* dispatch: add unit test for alerts sorting
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
This adds metrics that look like this:
```
alertmanager_alerts{state="active"} 6
alertmanager_alerts{state="suppressed"} 0
alertmanager_silences{state="active"} 1
alertmanager_silences{state="expired"} 1
alertmanager_silences{state="pending"} 0
```
This can be used to monitor alertmanager's usage and validate that
alertmanagers in a mesh have a similar number of silences and alerts.
Turn the GroupKey into a string that is composed of the matchers if the
path in the routing tree and the grouping labels.
Only hash it at the very end to ensure we don't exceed size limits of
integration APIs.
This commit removes the dependency on model.Silence for the internal
Silence type, uses UUIDs instead of uint64s and clarifies invariants
around timestamp handling.
The created_at timestamp is removed for the time being.