The one central sample ingestion channel has caused a variety of
trouble. This commit removes it. Targets and rule evaluation call an
Append method directly now. To incorporate multiple storage backends
(like OpenTSDB), storage.Tee forks the Append into two different
appenders.
Note that the tsdb queue manager had its own queue anyway. It was a
queue after a queue... Much queue, so overhead...
Targets have their own little buffer (implemented as a channel) to
avoid stalling during an http scrape. But a new scrape will only be
started once the old one is fully ingested.
The contraption of three pipelined ingesters was removed. A Target is
an ingester itself now. Despite more logic in Target, things should be
less confusing now.
Also, remove lint and vet warnings in ast.go.
This commits implements the OR operation between two vectors.
Vector matching using the ON clause is added to limit the set of
labels that define a match between two elements. Group modifiers
(GROUP_LEFT/GROUP_RIGHT) to request many-to-one matching are added.
This allows changing the time offset for individual instant and range
vectors in a query.
For example, this returns the value of `foo` 5 minutes in the past
relative to the current query evaluation time:
foo offset 5m
Note that the `offset` modifier always needs to follow the selector
immediately. I.e. the following would be correct:
sum(foo offset 5m) // GOOD.
While the following would be *incorrect*:
sum(foo) offset 5m // INVALID.
The same works for range vectors. This returns the 5-minutes-rate that
`foo` had a week ago:
rate(foo[5m] offset 1w)
This change touches the following components:
* Lexer/parser: additions to correctly parse the new `offset`/`OFFSET`
keyword.
* AST: vector and matrix nodes now have an additional `offset` field.
This is used during their evaluation to adjust query and result times
appropriately.
* Query analyzer: now works on separate sets of ranges and instants per
offset. Isolating different offsets from each other completely in this
way keeps the preloading code relatively simple.
No storage engine changes were needed by this change.
The rules tests have been changed to not probe the internal
implementation details of the query analyzer anymore (how many instants
and ranges have been preloaded). This would also become too cumbersome
to test with the new model, and measuring the result of the query should
be sufficient.
This fixes https://github.com/prometheus/prometheus/issues/529
This fixed https://github.com/prometheus/promdash/issues/201
This is related to #454. Queries now timeout after a duration set by
the -query.timeout flag. The TotalEvalTimer is now started/stopped
inside any of the ast.Eval* functions.
- Move CONTRIBUTORS.md to the more common AUTHORS.
- Added the required NOTICE file.
- Changed "Prometheus Team" to "The Prometheus Authors".
- Reverted the erroneous changes to the Apache License.
It turned out in the end, that only drop_common_metrics() produced any
erroneous output in the old system. The second expression in the test
("sum(testmetric) keeping_extra") already worked in the old code, but
why not keep it in...
The way to test ranged evaluations is a bit clumsy so far, so I want to
build a nicer test framework in the end, where all the test cases can be
specified as text files which specify desired inputs, outputs, query
step widths, etc.
Change-Id: I821859789e69b8232bededf670a1b76e9e8c8ca4
- Delete unneeded file view_adapter.go.
- Assessed that we still need the fingerprints in nodes
(to create iterators).
- Turned numMemChunkDescs into a metric.
Change-Id: I29be963c795a075ec00c095f76bf26405535609d
After many transformations, it doesn't make sense to keep the metric
names, since the result of the transformation is no longer that metric.
This drops the metric name after such transformations and makes the web
UI deal well with missing metric names.
This depends on the current branch on the following things:
- prometheus/client_golang needs to be at
e237cf15c6
in branch "julius/int-fingerprints" (to be merged with new storage)
- prometheus/promdash needs to be at
dd7691c9c2
Change-Id: Ib3c8cad8d647d9854e8c653c424b8c235ccc231d
This was initially motivated by wanting to distribute the rule checker
tool under `tools/rule_checker`. However, this was not possible without
also distributing the LevelDB dynamic libraries because the tool
transitively depended on Levigo:
rule checker -> query layer -> tiered storage layer -> leveldb
This change separates external storage interfaces from the
implementation (tiered storage, leveldb storage, memory storage) by
putting them into separate packages:
- storage/metric: public, implementation-agnostic interfaces
- storage/metric/tiered: tiered storage implementation, including memory
and LevelDB storage.
I initially also considered splitting up the implementation into
separate packages for tiered storage, memory storage, and LevelDB
storage, but these are currently so intertwined that it would be another
major project in itself.
The query layers and most other parts of Prometheus now have notion of
the storage implementation anymore and just use whatever implementation
they get passed in via interfaces.
The rule_checker is now a static binary :)
Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0
This allows putting a scalar as the first argument of a binary operator
in which the second argument is a vector:
<scalar> <binop> <vector>
For example,
1 / http_requests_total
...will output a vector in which every sample value is 1 divided by the
respective input vector element.
This even works for filter binary operators now:
1 == http_requests_total
Returns a vector with all values set to 1 for every element in
http_requests_total whose initial value was 1.
Note: For filter binary operators, the resulting values are always taken
from the left-hand-side of the operation, no matter whether the scalar
or the vector argument is the left-hand-side. That is,
1 != http_requests_total
...will set all result vector sample values to 1, although these are
exactly the sample elements that were != 1 in the input vector.
If you want to just filter elements without changing their sample
values, you still need to do:
http_requests_total != 1
The new filter form is a bit exotic, and so probably won't be used
often. But it was easier to implement it than disallow it completely or
change its behavior.
Change-Id: Idd083f2bd3a1219ba1560cf4ace42f5b82e797a5
There are four label-matching ops for selecting timeseries now:
- Equal: =
- NotEqual: !=
- RegexMatch: =~
- RegexNoMatch: !~
Instead of looking up labels by a simple clientmodel.LabelSet (basically
an equals op for every key/value pair in the set), timeseries
fingerprint selection is now done via a list of metric.LabelMatchers.
Change-Id: I510a83f761198e80946146770ebb64e4abc3bb96
The initial impetus for this was that it made unmarshalling sample
values much faster.
Other relevant benchmark changes in ns/op:
Benchmark old new speedup
==================================================================
BenchmarkMarshal 179170 127996 1.4x
BenchmarkUnmarshal 404984 132186 3.1x
BenchmarkMemoryGetValueAtTime 57801 50050 1.2x
BenchmarkMemoryGetBoundaryValues 64496 53194 1.2x
BenchmarkMemoryGetRangeValues 66585 54065 1.2x
BenchmarkStreamAdd 45.0 75.3 0.6x
BenchmarkAppendSample1 1157 1587 0.7x
BenchmarkAppendSample10 4090 4284 0.95x
BenchmarkAppendSample100 45660 44066 1.0x
BenchmarkAppendSample1000 579084 582380 1.0x
BenchmarkMemoryAppendRepeatingValues 22796594 22005502 1.0x
Overall, this gives us good speedups in the areas where they matter
most: decoding values from disk and accessing the memory storage (which
is also used for views).
Some of the smaller append examples take minimally longer, but the cost
seems to get amortized over larger appends, so I'm not worried about
these. Also, we're currently not bottlenecked on the write path and have
plenty of other optimizations available in that area if it becomes
necessary.
Memory allocations during appends don't change measurably at all.
Change-Id: I7dc7394edea09506976765551f35b138518db9e8
Mostly, that means adding compliant doc strings to exported items.
Also, remove 'go vet' warnings where possible. (Some are unfortunately
not to avoid, arguably bugs in 'go vet'.)
Change-Id: I2827b6dd317492864c1383c3de1ea9eac5a219bb
MIN/MAX/SUM/AVG/COUNT aggregations will now by default drop all labels that are
not specifically part of a BY-clause, even if a label value is the same within
all timeseries of an aggregation group. The old behavior of keeping extra
labels may still be switched on by adding KEEPING_EXTRA to the end of an
aggregation statement:
sum(http_requests) by (job, method) keeping_extra
I'm open to better syntax/naming suggestions.
Change-Id: I21d3fe7af9e98552ce3dffa3ce7c0a4ba4c0b4a4
So far we've been using Go's native time.Time for anything related to sample
timestamps. Since the range of time.Time is much bigger than what we need, this
has created two problems:
- there could be time.Time values which were out of the range/precision of the
time type that we persist to disk, therefore causing incorrectly ordered keys.
One bug caused by this was:
https://github.com/prometheus/prometheus/issues/367
It would be good to use a timestamp type that's more closely aligned with
what the underlying storage supports.
- sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit
Unix timestamp (possibly even a 32-bit one). Since we store samples in large
numbers, this seriously affects memory usage. Furthermore, copying/working
with the data will be faster if it's smaller.
*MEMORY USAGE RESULTS*
Initial memory usage comparisons for a running Prometheus with 1 timeseries and
100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my
tests, this advantage for some reason decreased a bit the more samples the
timeseries had (to 5-7% for millions of samples). This I can't fully explain,
but perhaps garbage collection issues were involved.
*WHEN TO USE THE NEW TIMESTAMP TYPE*
The new clientmodel.Timestamp type should be used whenever time
calculations are either directly or indirectly related to sample
timestamps.
For example:
- the timestamp of a sample itself
- all kinds of watermarks
- anything that may become or is compared to a sample timestamp (like the timestamp
passed into Target.Scrape()).
When to still use time.Time:
- for measuring durations/times not related to sample timestamps, like duration
telemetry exporting, timers that indicate how frequently to execute some
action, etc.
*NOTE ON OPERATOR OPTIMIZATION TESTS*
We don't use operator optimization code anymore, but it still lives in
the code as dead code. It still has tests, but I couldn't get all of them to
pass with the new timestamp format. I commented out the failing cases for now,
but we should probably remove the dead code soon. I just didn't want to do that
in the same change as this.
Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f
This adds timers around several query-relevant code blocks. For now, the
query timer stats are only logged for queries initiated through the UI.
In other cases (rule evaluations), the stats are simply thrown away.
My hope is that this helps us understand where queries spend time,
especially in cases where they sometimes hang for unusual amounts of
time.
This commit adds telemetry for the Prometheus expression rule
evaluator, which will enable meta-Prometheus monitoring of customers
to ensure that no instance is falling behind in answering routine
queries.
A few other sundry simplifications are introduced, too.
This also removes the now obsolete scalar count() function and corrects the
expressions test naming (broken in
2202cd71c9 (L6R59))
so that the expression tests will actually run.