prometheus/promql/promqltest
Jorge Creixell e9e3d64b7c
PromQL engine: Delay deletion of __name__ label to the end of the query evaluation (#14477)
PromQL engine: Delay deletion of __name__ label to the end of the query evaluation

  - This change allows optionally preserving the `__name__` label via the `label_replace` and `label_join` functions, and helps prevent the dreaded "vector cannot contain metrics with the same labelset" error.
  - The implementation extends the `Series` and `Sample` structs with a boolean flag indicating whether the `__name__` label should be deleted at the end of the query evaluation.
  - The `label_replace` and `label_join` functions can still access the value of the `__name__` label, even if it has previously been marked for deletion. If `__name__` is used as the target label, it won't be dropped at the end of the query evaluation.
  - Fixes https://github.com/prometheus/prometheus/issues/11397
  - See https://github.com/jcreixell/prometheus/pull/2 for previous discussion, including the decision to create this PR and benchmark it before considering other alternatives (like refactoring `labels.Labels`).
  - See https://github.com/jcreixell/prometheus/pull/1 for an alternative implementation using a special label instead of boolean flags.
  - Note: a feature flag, `promql-delayed-name-removal`, has been added because this changes the behavior of some "weird" queries (see https://github.com/prometheus/prometheus/issues/11397#issuecomment-1451998792)

Example (this always fails, as `__name__` is being dropped by `count_over_time`):

```
count_over_time({__name__!=""}[1m])

=> Error executing query: vector cannot contain metrics with the same labelset
```

Before:

```
label_replace(count_over_time({__name__!=""}[1m]), "__name__", "count_$1", "__name__", "(.+)")

=> Error executing query: vector cannot contain metrics with the same labelset
```

After:

```
label_replace(count_over_time({__name__!=""}[1m]), "__name__", "count_$1", "__name__", "(.+)")

=>
count_go_gc_cycles_automatic_gc_cycles_total{instance="localhost:9090", job="prometheus"} 1
count_go_gc_cycles_forced_gc_cycles_total{instance="localhost:9090", job="prometheus"} 1
...
```
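Here is a deliberately simplified Go model of delayed `__name__` removal. Everything in it is hypothetical: `sample`, `countOverTime`, `labelReplace`, `finalize` and `key`. It models only label handling, using plain maps instead of the engine's `Series`/`Sample` structs and `labels.Labels`, and it is not the code from this PR. It shows why deferring the removal lets `label_replace` restore a distinct `__name__` before the duplicate-labelset check runs.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// sample is a hypothetical stand-in for the engine's Sample/Series types:
// a label set plus a flag meaning "drop __name__ after evaluation".
type sample struct {
	labels   map[string]string
	dropName bool
}

// clone deep-copies a sample so inputs are never mutated.
func clone(s sample) sample {
	lbls := make(map[string]string, len(s.labels))
	for k, v := range s.labels {
		lbls[k] = v
	}
	return sample{labels: lbls, dropName: s.dropName}
}

// countOverTime models a function that used to strip __name__ right away;
// here it keeps the label and only marks it for deletion (values omitted).
func countOverTime(in []sample) []sample {
	out := make([]sample, len(in))
	for i, s := range in {
		out[i] = clone(s)
		out[i].dropName = true
	}
	return out
}

// labelReplace is a simplified label_replace with an anchored regex. The
// source label is still readable even if it is marked for deletion, and
// writing to __name__ cancels the pending deletion.
func labelReplace(in []sample, dst, repl, src, regex string) ([]sample, error) {
	re, err := regexp.Compile("^(?s:" + regex + ")$")
	if err != nil {
		return nil, err
	}
	out := make([]sample, len(in))
	for i, s := range in {
		out[i] = clone(s)
		val := s.labels[src]
		if m := re.FindStringSubmatchIndex(val); m != nil {
			out[i].labels[dst] = string(re.ExpandString(nil, repl, val, m))
			if dst == "__name__" {
				out[i].dropName = false
			}
		}
	}
	return out, nil
}

// key renders a label set deterministically so duplicates can be detected.
func key(lbls map[string]string) string {
	pairs := make([]string, 0, len(lbls))
	for n, v := range lbls {
		pairs = append(pairs, n+"="+strconv.Quote(v))
	}
	sort.Strings(pairs)
	return "{" + strings.Join(pairs, ", ") + "}"
}

// finalize applies the deferred __name__ removal, then checks for duplicates.
func finalize(in []sample) ([]sample, error) {
	seen := map[string]bool{}
	out := make([]sample, len(in))
	for i, s := range in {
		out[i] = clone(s)
		if s.dropName {
			delete(out[i].labels, "__name__")
		}
		k := key(out[i].labels)
		if seen[k] {
			return nil, fmt.Errorf("vector cannot contain metrics with the same labelset")
		}
		seen[k] = true
	}
	return out, nil
}

func main() {
	in := []sample{
		{labels: map[string]string{"__name__": "a", "job": "x"}},
		{labels: map[string]string{"__name__": "b", "job": "x"}},
	}
	_, err := finalize(countOverTime(in))
	fmt.Println("without label_replace:", err)

	renamed, err := labelReplace(countOverTime(in), "__name__", "count_$1", "__name__", "(.+)")
	if err != nil {
		panic(err)
	}
	out, err := finalize(renamed)
	if err != nil {
		panic(err)
	}
	fmt.Println("with label_replace:", key(out[0].labels), key(out[1].labels))
}
```

In this model, the first `finalize` call reports the duplicate-labelset error because both series collapse to `{job="x"}`. The second call keeps `count_a` and `count_b` as distinct series.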

Signed-off-by: Jorge Creixell <jcreixell@gmail.com>

Signed-off-by: Björn Rabenstein <github@rabenste.in>
2024-08-29 15:50:39 +02:00
testdata PromQL engine: Delay deletion of __name__ label to the end of the query evaluation (#14477) 2024-08-29 15:50:39 +02:00
README.md remove eval_with_nhcb 2024-06-20 22:49:00 +08:00
test_test.go promql: extend test scripting language to support asserting on expected error message (#14038) 2024-06-06 17:56:25 +02:00
test.go PromQL engine: Delay deletion of __name__ label to the end of the query evaluation (#14477) 2024-08-29 15:50:39 +02:00

The PromQL test scripting language

This package contains two things:

  • an implementation of a test scripting language for PromQL engines
  • a predefined set of tests written in that scripting language

The predefined set of tests can be run against any PromQL engine implementation by calling promqltest.RunBuiltinTests(). Any other test script can be run with promqltest.RunTest().

The rest of this document explains the test scripting language.

Each test script is written in plain text.

Comments can be given by prefixing the comment with a #, for example:

# This is a comment.

Each test file contains a series of commands. There are three kinds of commands:

  • load
  • clear
  • eval

Each command is executed in the order given in the file.

load command

load adds some data to the test environment.

The syntax is as follows:

load <interval>
    <series> <points>
    ...
    <series> <points>

  • <interval> is the step between points (e.g. 1m or 30s)
  • <series> is a Prometheus series name in the usual metric{label="value"} syntax
  • <points> is a specification of the points to add for that series, following the same expanding notation as promtool unit tests (see the promtool documentation on unit testing rules)

For example:

load 1m
    my_metric{env="prod"} 5 2+3x2 _ stale {{schema:1 sum:3 count:22 buckets:[5 10 7]}}

...will create a single series with labels my_metric{env="prod"}, with the following points:

  • t=0: value is 5
  • t=1m: value is 2
  • t=2m: value is 5
  • t=3m: value is 8
  • t=4m: no point
  • t=5m: stale marker
  • t=6m: native histogram with schema 1, sum 3, count 22 and bucket counts 5, 10 and 7

Each load command is additive: it does not replace any data loaded by a previous load command. Use clear to remove all loaded data.

Native histograms with custom buckets (NHCB)

When loading a batch of classic histogram float series, you can optionally append the suffix _with_nhcb to the load command (that is, load_with_nhcb <interval>). This also converts the series to native histograms with custom buckets and loads both the original float series and the new histogram series.

clear command

clear removes all data previously loaded with load commands.

eval command

eval runs a query against the test environment and asserts that the result is as expected.

Both instant and range queries are supported.

The syntax is as follows:

# Instant query
eval instant at <time> <query>
    <series> <points>
    ...
    <series> <points>
    
# Range query
eval range from <start> to <end> step <step> <query>
    <series> <points>
    ...
    <series> <points>

  • <time> is the timestamp to evaluate the instant query at (e.g. 1m)
  • <start> and <end> specify the time range of the range query, and use the same syntax as <time>
  • <step> is the step of the range query, and uses the same syntax as <time> (e.g. 30s)
  • <series> and <points> specify the expected values, and follow the same syntax as for load above

For example:

eval instant at 1m sum by (env) (my_metric)
    {env="prod"} 5
    {env="test"} 20
    
eval range from 0 to 3m step 1m sum by (env) (my_metric)
    {env="prod"} 2 5 10 20
    {env="test"} 10 20 30 45

To assert that an instant query returns the series in exactly the order specified, use eval_ordered instant instead of eval instant. Ordered assertions are not supported for range queries.
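The difference between eval and eval_ordered amounts to comparing the returned series as a multiset or as a sequence. The matches function below is a hypothetical illustration of that distinction only. It identifies each series by a label-set string and ignores sample values, unlike a real result comparison.

```go
package main

import (
	"fmt"
	"sort"
)

// matches is a hypothetical sketch: with ordered=false (like eval) only the
// multiset of series must agree; with ordered=true (like eval_ordered) the
// order must agree as well. Series are identified by label-set strings and
// sample values are ignored.
func matches(expected, got []string, ordered bool) bool {
	if len(expected) != len(got) {
		return false
	}
	e := append([]string(nil), expected...)
	g := append([]string(nil), got...)
	if !ordered {
		sort.Strings(e)
		sort.Strings(g)
	}
	for i := range e {
		if e[i] != g[i] {
			return false
		}
	}
	return true
}

func main() {
	expected := []string{`{env="prod"}`, `{env="test"}`}
	got := []string{`{env="test"}`, `{env="prod"}`}
	fmt.Println("eval:", matches(expected, got, false))
	fmt.Println("eval_ordered:", matches(expected, got, true))
}
```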

To test that a query fails, use eval_fail instant or eval_fail range. Optionally, add an indented expected_fail_message line to assert that the error message equals a given string, or an expected_fail_regexp line to assert that it matches a given regular expression.

For example:

# Assert that the query fails for any reason without asserting on the error message.
eval_fail instant at 1m ceil({__name__=~'testmetric1|testmetric2'})

# Assert that the query fails with exactly the provided error message string.
eval_fail instant at 1m ceil({__name__=~'testmetric1|testmetric2'})
    expected_fail_message vector cannot contain metrics with the same labelset

# Assert that the query fails with an error message matching the regexp provided.
eval_fail instant at 1m ceil({__name__=~'testmetric1|testmetric2'})
    expected_fail_regexp (vector cannot contain metrics .*|something else went wrong)