Wait until `p2.Status()` returns because it blocks until we're ready. That way, we're guaranteed to know that the cluster size is 2.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Update the docs on how to use UTF-8 in label matchers and parse mode feature flags
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Fix panic in acceptance tests
This commit attempts to address a panic that occurs in acceptance
tests if a server in the cluster fails to start.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Remove started and check am.cmd.Process != nil
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit fixes a log line in the featurecontrol package which
should be "UTF-8 strict mode" and not "UTF-8 mode".
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit fixes a small number of inconsistencies in the compat
package logging. It now consistently uses the terms "classic matchers
parser" and "UTF-8 matchers parser" instead of "old matchers parser"
and "new matchers parser".
Signed-off-by: George Robinson <george.robinson@grafana.com>
* feat: add counter to track alerts dropped outside of time_intervals
Addresses: #3512
This adds a new counter metric `alertmanager_alerts_supressed_total`
that is incremented by `len(alerts)` when an alert is suppressed for
being outside of a time_interval, i.e. inside one of its
mute_time_intervals or outside all of its active_time_intervals.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
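The counting rule described above can be sketched as follows. This is a
minimal illustration using only the Go standard library, not the
Alertmanager code: `suppressionCounter` and `muteStage` are hypothetical
names, and the boolean `muted` stands in for the real time interval check.

```go
package main

import "fmt"

// suppressionCounter is a hypothetical stand-in for a Prometheus counter.
// It is not the type used by Alertmanager.
type suppressionCounter struct {
	total int
}

// Add increases the counter by n. Counters only ever go up, so
// non-positive values are ignored.
func (c *suppressionCounter) Add(n int) {
	if n > 0 {
		c.total += n
	}
}

// muteStage drops every alert when muted is true and records how many
// alerts were dropped. Otherwise it passes the alerts through unchanged.
func muteStage(alerts []string, muted bool, c *suppressionCounter) []string {
	if muted {
		c.Add(len(alerts)) // incremented by len(alerts), as in the commit message
		return nil
	}
	return alerts
}

func main() {
	c := &suppressionCounter{}

	out := muteStage([]string{"a", "b", "c"}, true, c)
	fmt.Println(len(out), c.total) // prints "0 3"

	out = muteStage([]string{"d"}, false, c)
	fmt.Println(len(out), c.total) // prints "1 3"
}
```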
* test: add time interval suppression metric checks for notify
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* test: fix failure message log values in notifier
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* ref: address PR feedback for #3565
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* fix: track suppressed notifications metric for inhibit/silence
Based on PR feedback:
https://github.com/prometheus/alertmanager/pull/3565/files#r1393068026
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* fix: broken notifier tests
- Fixed the metric count check to properly compare the diff between
input/output notifications from the suppression with the suppression
metric. The check was previously inverted to compare to how many
notifications it suppressed.
- Stopped using `Reset()` to compare collection counts between the
multiple stages that are executed in `TestMuteStageWithSilences()`.
The intent was to compare a clean metric collection after each stage
execution, but the final stage, where all silences are lifted, results
in no metric being created in the test, causing
`prom_testutil.ToFloat64()` to panic. Changed to separate vars to check
counts between each stage, with care to consider prior counts.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
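The idea of checking counts per stage while accounting for prior counts
can be sketched like this. It is a standard-library illustration only:
`stageResult` and `expectedTotals` are hypothetical helpers, not code from
the notifier tests, and the counter is assumed to be cumulative across
stages because it is never reset.

```go
package main

import "fmt"

// stageResult is a hypothetical record of one stage execution in a test:
// how many notifications went in and how many came out.
type stageResult struct {
	in, out int
}

// expectedTotals returns the value a cumulative suppression counter should
// have after each stage. Each stage adds the diff between its input and
// output notifications on top of all prior counts.
func expectedTotals(stages []stageResult) []int {
	totals := make([]int, 0, len(stages))
	running := 0
	for _, s := range stages {
		running += s.in - s.out
		totals = append(totals, running)
	}
	return totals
}

func main() {
	stages := []stageResult{
		{in: 2, out: 0}, // both notifications silenced
		{in: 2, out: 2}, // silences lifted, nothing suppressed
	}
	fmt.Println(expectedTotals(stages)) // prints "[2 2]"
}
```

Because the final stage suppresses nothing, its expected total equals the
previous one, so the comparison never depends on a freshly created metric.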
* rename metric and add constants
Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
This commit removes the metrics from the compat package
in favour of the existing logging and the additional tools
at hand, such as amtool, to validate Alertmanager configurations.
Due to the global nature of the compat package, a consequence
of config.Load, these metrics have proven to be less useful
in practice than expected, both in Alertmanager and other projects
such as Mimir.
There are a number of reasons for this:
1. Because the compat package is global, these metrics cannot be
reset each time config.Load is called, as in multi-tenant
projects like Mimir loading a config for one tenant would reset
the metrics for all tenants. This is also the reason the metrics
are counters and not gauges.
2. Since the metrics are counters, it is difficult to create
meaningful dashboards for Alertmanager as, unlike in Mimir,
configurations are not reloaded at fixed intervals, and as such,
operators cannot use rate to track configuration changes
over time.
In Alertmanager, there are much better tools available to validate
that an Alertmanager configuration is compatible with the UTF-8
parser, including both the existing logging from Alertmanager
server and amtool check-config.
In other projects like Mimir, we can track configurations for
individual tenants using log aggregation and storage systems
such as Loki. This gives operators far more information than
what is possible with the metrics, including the timestamp,
input and ID of tenant configurations that are incompatible
or have disagreement.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* feat: implement webhook_url_file for discord
implements #3482
Signed-off-by: Philipp Born <git@pborn.eu>
* feat: implement webhook_url_file for msteams
implements #3536
Signed-off-by: Philipp Born <git@pborn.eu>
---------
Signed-off-by: Philipp Born <git@pborn.eu>
There is no need to register these metrics in amtool, so use
compat.NewMetrics(nil) instead of compat.RegisteredMetrics.
Signed-off-by: George Robinson <george.robinson@grafana.com>
This commit changes the metrics in the compat package from gauges
to counters. There are two reasons for this. First, in some cases
the metric should behave like a gauge (i.e. loading configurations)
but in other cases it should behave like a counter (i.e. HTTP
requests).
Second, because the compat package is a global package
(due to how config.Load works), in tenanted systems like Cortex
and Mimir it was non-trivial to reset the gauges per tenant
each time their configuration was reloaded.
Instead, it's easier to check that the rate of increase is 0 than to
check that the gauge is 0 to know if UTF-8 strict mode can be
enabled.
Signed-off-by: George Robinson <george.robinson@grafana.com>
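The "rate of increase is 0" check can be sketched as below. This is an
assumption-laden illustration using only the Go standard library:
`increase` and `canEnableStrictMode` are hypothetical names, and `samples`
stands in for successive readings of one of the compat counters.

```go
package main

import "fmt"

// increase returns how much a counter grew between two samples. A current
// value below the previous one is treated as a counter reset (for example
// a process restart), so the current value is taken as the whole increase.
func increase(prev, curr float64) float64 {
	if curr < prev {
		return curr
	}
	return curr - prev
}

// canEnableStrictMode is a hypothetical check: it reports true only if the
// counter did not increase between any two consecutive samples. With fewer
// than two samples there is not enough data, so it reports false.
func canEnableStrictMode(samples []float64) bool {
	if len(samples) < 2 {
		return false
	}
	for i := 1; i < len(samples); i++ {
		if increase(samples[i-1], samples[i]) != 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(canEnableStrictMode([]float64{4, 4, 4})) // prints "true"
	fmt.Println(canEnableStrictMode([]float64{4, 5, 5})) // prints "false"
}
```

Unlike a gauge, the absolute value (4 in the first example) does not need
to be reset to 0 per tenant; only the change over the window matters.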
This commit fixes a small bug in the warning logs for incompatible
matchers where the error from the UTF-8 parser was logged as nil.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Add metric for inhibit rules
This commit adds a new metric called alertmanager_inhibit_rules.
It is identical to the alertmanager_integrations and
alertmanager_receivers metrics that are present in the current
and previous versions.
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Rename metric and variable
Signed-off-by: George Robinson <george.robinson@grafana.com>
---------
Signed-off-by: George Robinson <george.robinson@grafana.com>