* Metrics: Silence maintenance success and failure
For various reasons, we've observed different kinds of errors in this area, from read-only disks to silly code bugs.
Errors during maintenance are effectively _data loss_, and therefore we should encourage proper monitoring of this area.
This PR introduces total and failure metrics for silence maintenance. If agreed, I'll do the same for the nflog and fix the flaky test like I did for silences while I'm there.
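As a rough illustration of the total/failure pairing described above, here is a minimal sketch that uses only the standard library. The names `maintenanceMetrics`, `runMaintenance` and `errReadOnly` are hypothetical, and the atomic counters stand in for whatever metrics library the real code uses.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// errReadOnly is a hypothetical error used to simulate a failed maintenance run.
var errReadOnly = errors.New("read-only file system")

// maintenanceMetrics is a hypothetical stand-in for the two counters:
// one counts every maintenance run, the other counts only failed runs.
type maintenanceMetrics struct {
	total  atomic.Int64
	failed atomic.Int64
}

// runMaintenance executes one maintenance cycle and records its outcome.
// total is incremented on every run, failed only when do returns an error.
func runMaintenance(m *maintenanceMetrics, do func() error) error {
	m.total.Add(1)
	if err := do(); err != nil {
		m.failed.Add(1)
		return err
	}
	return nil
}

func main() {
	m := &maintenanceMetrics{}
	_ = runMaintenance(m, func() error { return nil })
	_ = runMaintenance(m, func() error { return errReadOnly })
	fmt.Printf("total=%d failed=%d\n", m.total.Load(), m.failed.Load())
}
```

With both counters exported, the failure ratio can be alerted on instead of relying on log lines alone.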
Signed-off-by: gotjosh <josue.abreu@gmail.com>
* Metrics: Notification log maintenance success and failure
For various reasons, we've observed different kinds of errors in this area, from read-only disks to silly code bugs. Errors during maintenance are effectively data loss, and therefore we should encourage proper monitoring of this area.
Similar to #3285
---------
Signed-off-by: gotjosh <josue.abreu@gmail.com>
This commit fixes data corruption in templates that use the title
function. The function used a shared cases.Title caser, but casers
must not be shared between goroutines. When templates that used the
title function were executed at the same time, data corruption would
occur in the text that was being cased. This is fixed by using a
separate caser for each function call.
Add tests to assert DefaultFuncs are thread-safe
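The per-call pattern can be sketched with the standard library alone. The `caser` type below is a hypothetical, simplified stand-in for a stateful caser (it is not the real cases.Title implementation), and `title` is an illustrative name; the point is only that each call builds its own instance, so concurrent callers share no state.

```go
package main

import (
	"fmt"
	"sync"
	"unicode"
)

// caser is a hypothetical stand-in for a stateful caser: it reuses a scratch
// buffer between calls, so a single instance must not be shared by goroutines.
type caser struct {
	buf []rune
}

// String upper-cases the first rune of every whitespace-separated word.
func (c *caser) String(s string) string {
	c.buf = c.buf[:0]
	startOfWord := true
	for _, r := range s {
		if startOfWord {
			r = unicode.ToUpper(r)
		}
		startOfWord = unicode.IsSpace(r)
		c.buf = append(c.buf, r)
	}
	return string(c.buf)
}

// title builds a fresh caser on every call, so callers never share a buffer.
func title(s string) string {
	return (&caser{}).String(s)
}

func main() {
	var wg sync.WaitGroup
	results := make([]string, 100)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = title(fmt.Sprintf("alert number %d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println(results[7])
}
```

Running a check like this under `go test -race` is one way to assert that template functions stay safe for concurrent use.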
Signed-off-by: George Robinson <george.robinson@grafana.com>
* Added note on data retention to documentation for repeat_interval
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
---------
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
* support loading webhook URL from a file
/cc #2498
Signed-off-by: Simon Rozet <me@simonrozet.com>
* notify/webhook: add test for reading url from file
Signed-off-by: Simon Rozet <me@simonrozet.com>
* notify/pushover: add tests for reading secrets from files
Signed-off-by: Simon Rozet <me@simonrozet.com>
---------
Signed-off-by: Simon Rozet <me@simonrozet.com>
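For the webhook URL change above, a minimal sketch of resolving a URL from either an inline value or a file might look like the following. The function name `resolveURL`, its parameters, and the whitespace trimming are assumptions made for illustration, not the actual Alertmanager code.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"os"
	"strings"
)

// resolveURL returns the webhook URL from either an inline value or a file.
// Exactly one of inline and file must be set (hypothetical validation rule).
func resolveURL(inline, file string) (*url.URL, error) {
	if (inline == "") == (file == "") {
		return nil, errors.New("exactly one of the inline URL and the URL file must be set")
	}
	raw := inline
	if file != "" {
		content, err := os.ReadFile(file)
		if err != nil {
			return nil, fmt.Errorf("reading URL file: %w", err)
		}
		// Assumption: surrounding whitespace such as a trailing newline is ignored.
		raw = strings.TrimSpace(string(content))
	}
	return url.Parse(raw)
}

func main() {
	f, err := os.CreateTemp("", "webhook-url")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString("http://example.com/hook\n"); err != nil {
		panic(err)
	}
	if err := f.Close(); err != nil {
		panic(err)
	}

	u, err := resolveURL("", f.Name())
	if err != nil {
		panic(err)
	}
	fmt.Println(u.String())
}
```

Reading the value at notification time rather than at config load would let a rotated secret take effect without a reload; which of the two the real code does is not stated here.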
Today I learned that `runtime.Gosched()` doesn't do what I thought it would.
While it allows other goroutines to run, it doesn't guarantee that the main goroutine will be blocked until the others have run.
Sadly, I had to fall back to the sleep approach.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
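To illustrate the `runtime.Gosched()` point above: yielding does not wait for another goroutine's side effect, so a test has to wait some other way. The sketch below uses a bounded polling loop, which is a different technique from the plain sleep the commit falls back to; `eventually` is a hypothetical helper written for this example.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// eventually polls cond every tick until it returns true or timeout elapses.
// It reports whether cond became true in time.
func eventually(cond func() bool, timeout, tick time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if !time.Now().Before(deadline) {
			return false
		}
		time.Sleep(tick)
	}
}

func main() {
	var runs atomic.Int64
	// Stands in for a background maintenance run started by the code under test.
	go func() { runs.Add(1) }()

	// A runtime.Gosched() call here would only yield the processor; it would
	// not block until the goroutine above has finished.
	ok := eventually(func() bool { return runs.Load() == 1 }, time.Second, 5*time.Millisecond)
	fmt.Println("observed run:", ok)
}
```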
* Separate the template creating functions for as-lib use
Fixes #3217
Signed-off-by: Furkan <furkan.turkal@trendyol.com>
---------
Signed-off-by: Furkan <furkan.turkal@trendyol.com>
The Web UI lets us look up information on a given silence through its
ID. This functionality is missing from amtool, however, so we have to
pull _all_ the silences and filter them with `grep` or similar.
This commit adds an `--id` flag to the `silence query` command, which
allows specifying an ID to match on, matching the functionality of the Web UI.
Signed-off-by: sinkingpoint <colin@quirl.co.nz>
This commit implements the Stringer interface for Pairs and KVs.
It changes how Pairs are printed in templates from
map[name1:value1 name2:value2] to name1=value1, name2=value2. KVs
work similarly, but are first sorted into pairs before being printed.
Signed-off-by: George Robinson <george.robinson@grafana.com>
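The Stringer change above can be sketched as follows. This is a simplified, self-contained approximation: the type names mirror the ones mentioned in the commit, but the field names and the plain alphabetical sort are assumptions, and the real sorting rules may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Pair is a single name/value pair (field names assumed for this sketch).
type Pair struct {
	Name, Value string
}

// Pairs is an ordered list of pairs.
type Pairs []Pair

// String renders the pairs as "name1=value1, name2=value2".
func (ps Pairs) String() string {
	parts := make([]string, 0, len(ps))
	for _, p := range ps {
		parts = append(parts, p.Name+"="+p.Value)
	}
	return strings.Join(parts, ", ")
}

// KV is an unordered set of key/value pairs.
type KV map[string]string

// SortedPairs returns the entries sorted by name (simplified ordering).
func (kv KV) SortedPairs() Pairs {
	ps := make(Pairs, 0, len(kv))
	for k, v := range kv {
		ps = append(ps, Pair{Name: k, Value: v})
	}
	sort.Slice(ps, func(i, j int) bool { return ps[i].Name < ps[j].Name })
	return ps
}

// String sorts the entries into pairs first, then prints them like Pairs.
func (kv KV) String() string {
	return kv.SortedPairs().String()
}

func main() {
	kv := KV{"name2": "value2", "name1": "value1"}
	fmt.Println(kv) // prints "name1=value1, name2=value2"
}
```

Sorting before printing keeps the output deterministic, since Go map iteration order is not.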
* Refactor nflog configuration options to make it similar to Silences.
The Notification Log is a similar component to Silences. They're the only two things that are shared between nodes when running in HA and they both hold some sort of internal state that needs to be cleaned up on an interval.
To simplify the code and make it a bit more understandable (among other benefits such as improved testability), I've refactored the notification log configuration and `run` to be similar to the silences.
* support loading pushover secrets from files
Add the user_key_file and token_file keys to the pushover config.
/cc https://github.com/prometheus/alertmanager/issues/2498
Signed-off-by: Simon Rozet <me@simonrozet.com>
* The current page outline was an unstructured and unsorted mess, so I tried to
organize the different configuration file fields into categories.
* I also sorted receivers alphabetically.
* Corrected the Telegram receiver's "bot_token" to be a "secret", not "string".
* Other minor improvements and wording additions to the sections.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
The CI environment isn't as performant as local machines: the time
needed to fully initialize the test environment can be significant and
skew the verification. Rather than setting the "virtual" clock used to
measure alert timings at the beginning of the acceptance test, it is
better to wait for the test bed to be ready.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>