Add --ignore-unknown-fields that ignores unknown fields in rule group
files. There are lots of tools in the ecosystem that "like" to extend
the rule group file structure but they are currently unreadable by
promtool if there's anything extra. The purpose of this flag is so that
we could use the "vanilla" promtool instead of rolling our own.
Some examples of tools/code:
https://github.com/grafana/mimir/blob/main/pkg/mimirtool/rules/rwrulefmt/rulefmt.go8898eb3cc5/pkg/rules/rules.go (L18-L25)
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
What
Adds support for OTLP delta temporality to the OTLP endpoint.
This is done by calling the deltatocumulative processor from the OpenTelemetry collector during OTLP conversion.
Why
Delta conversion is a naturally stateful process, which requires careful request routing when operated inside a collector.
Prometheus is already stateful and doing the conversion in-server reduces the operational burden on the ingest architecture by only having one stateful component.
How
deltatocumulative is a OTel collector component that works as follows:
* pmetric.Metrics come from a receiver or in this case from the HTTP client
* It operates as an in-place update loop:
* for each sample, if not delta, leave unmodified
* if delta, do:
* state += sample, where state is the in-memory sum of all previous samples
* sample = state, sample value is now cumulative
* this is supported for sums (counters), gauges, histograms (old histograms) and exponential histograms (native histograms)
If a series receives no new samples for 5m, its state is removed from memory
Performance
Delta performance is a stateful operation and the OTel code is not highly optimized yet, e.g. it locks the entire processor for each request. Nonetheless, care has been taken to mitigate those effects:
delta conversion is behind a feature flag. If disabled, no conversion code is ever invoked
if enabled, conversion is not invoked if request not actually contains delta samples. This leads to no measureable performance difference between default-cumulative to convert-cumulative (only cumulative, feature on/off)
Signed-off-by: sh0rez <me@shorez.de>
Fix issues raised by staticcheck
We are not enabling staticcheck explicitly, though, because it has too many false positives.
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Instead of storing discovered labels on every target, recompute them if
required. The `Target` struct now needs to hold some more data required
to recompute them, such as ScrapeConfig.
This moves the load from every Prometheus all of the time, to just when
someone views Service Discovery in the UI.
The way `PopulateLabels` is used changes; you are no longer expected to
call it with a part-populated `labels.Builder`.
The signature of `Target.Labels` changes to take a `labels.Builder`
instead of a `ScratchBuilder`, for consistency with `DiscoveredLabels`.
This will save a lot of work when many targets are filtered out in
relabeling. Combine with `keep_dropped_targets` to avoid ever computing
most labels for dropped targets.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
We should never modify (or even shallow copy) Config after config.Load;
added comments and modified GetScrapeConfigs to do so. For GetScrapeConfigs
the validation (even repeated) was likely doing writes (because global fields was 0). We GetScrapeConfigs concurrently
in tests and ApplyConfig causing test races. In prod there were
races but likelyt only to replace 0 with 0, so not too severe.
I removed validation since I don't see anyone using our config.Config without Load.
I had to refactor one test that was doing it, all others use yaml config.
Fixes#15538
Previous attempt: https://github.com/prometheus/prometheus/pull/15634
Signed-off-by: bwplotka <bwplotka@gmail.com>
Enable the `auto-gomaxprocs` feature flag by default.
* Add command line flag `--no-auto-gomaxprocs` to disable.
Signed-off-by: SuperQ <superq@gmail.com>
Enable the `auto-gomemlimit` feature flag by default.
* Add command line flag `--no-auto-gomemlimit` to disable.
Signed-off-by: SuperQ <superq@gmail.com>
Previously, top-level initialization (including WAL reading) had this at
the top of the stack:
github.com/oklog/run.(*Group).Run.func1:38
I found this confusing, and thought it had something to do with logging.
Introducing another local function should make this clearer.
Also make the error message user-centric not code-centric.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
When we had a syntax error but restored the old file, we did not
re-trigger the config reload, so the config reload metric was showing
that config reload was unsucessful.
I made magic to handle logs in cmd/prometheus.
For now it is a separate file so we can backport this easily.
I will generalize the helper in another PR.
Signed-off-by: Julien <roidelapluie@o11y.eu>
* Add hidden flag for the delayed compaction random time window
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
* Update cmd/prometheus/main.go
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>
* Update cmd/prometheus/main.go
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>
* Update tsdb/db.go
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>
* Fix flag name according to review - add test for delay
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
* Fix afer main rebase
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
* Implement review comments
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
* Update generatedelaytest to try with limit values
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
---------
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Fix some edge cases when OOO is enabled
Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
Signed-off-by: Vanshika <102902652+Vanshikav123@users.noreply.github.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
* promtool: Add debug flag for rule tests
This makes it print out the tsdb state (both input_series and rules that
are run) at the end of a test, making reasoning about tests much easier.
Signed-off-by: David Leadbeater <dgl@dgl.cx>
* Reuse generated test name from junit testing
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: David Leadbeater <dgl@dgl.cx>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: David Leadbeater <dgl@dgl.cx>
turn some loops into subtests to make use of t.Parallel()
requires Go 1.22 to make use of https://go.dev/blog/loopvar-preview
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
For: #14355
This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Now we check that a rule execution has taken place.
This also reduces the time to run the rules tests from 45s to 25s.
Signed-off-by: Julien <roidelapluie@o11y.eu>
The OTLP receiver can now considered stable. We've had it for longer
than a year in main and has received constant improvements.
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
This commit introduces a new `/api/v1/notifications/live` endpoint that
utilizes Server-Sent Events (SSE) to stream notifications to the web UI.
This is used to display alerts such as when a configuration reload
has failed.
I opted for SSE over WebSockets because SSE is simpler to implement and
more robust for our use case. Since we only need one-way communication
from the server to the client, SSE fits perfectly without the overhead
of establishing and maintaining a two-way WebSocket connection.
When the SSE connection fails, we go back to a classic
/api/v1/notifications API endpoint.
This commit also contains the required UI changes for the new Mantine UI.
Signed-off-by: Julien <roidelapluie@o11y.eu>
The pattern of `import _ "net/http/pprof"` adds handlers to the default
http handler, but Prometheus does not use that. There are explicit
handlers in `web/web.go`.
So, we can remove this line with no impact to behaviour.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>