Commit Graph

7882 Commits

Author SHA1 Message Date
Hrishikesh Barman
581d16d751
Updated prombench workflow to use test-infra cluster (#7214) 2020-05-07 11:17:46 +03:00
Nevill
adeb946e54
Add funcbench workflow (#7199) 2020-05-07 11:08:21 +03:00
Chris Marchbanks
2668fa1ad2
Merge pull request #7188 from csmarchbanks/simplify-queue-metrics
Remove duplicate metrics in QueueManager
2020-05-06 12:29:22 -06:00
Ganesh Vernekar
d4b9fe801f
M-map full chunks of Head from disk (#6679)
When a chunk becomes full while appending to the head, it is flushed to disk and m-mapped (memory mapped) to free up memory.

Prometheus startup now happens in these stages:
- Iterate the m-mapped chunks from disk and keep a map of series reference to its slice of m-mapped chunks.
- Iterate the WAL as usual. Whenever we create a new series, look for its m-mapped chunks in the map created before and add them to that series.

If a head chunk is corrupted, the corrupted chunk and all chunks after it are deleted, and the data after the corruption is recovered from the existing WAL. This means that a corruption in the m-mapped files results in NO data loss.

[M-mapped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md): the main difference is that each m-mapped chunk now also includes the series reference, because there is no index mapping series to chunks.
[The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed through the index, which stores the offsets of the chunks in the chunks file. For example, the chunks of a series ID might have offsets 200, 500, etc. in the chunk files.
For m-mapped chunks, the offsets are stored in memory and read from there. During WAL replay, these offsets are restored by iterating over all m-mapped chunks, as described above, and matching the series ID in each chunk header with the offset of that chunk in its file.
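
The two startup stages can be sketched roughly as below. This is a minimal illustration, not the code from this PR: the types `mmappedChunk`, `chunkHeader` and `memSeries` and the functions `loadMmappedChunks` and `replaySeries` are hypothetical names, and the on-disk iteration is replaced by a plain slice.

```go
package main

import "fmt"

// mmappedChunk is a hypothetical in-memory handle for a chunk that lives on
// disk: it records where the chunk is, not the chunk data itself.
type mmappedChunk struct {
	ref              uint64 // offset-based reference into the chunk file (assumed encoding)
	minTime, maxTime int64
}

// chunkHeader stands in for what iterating the m-mapped files yields per chunk:
// the series reference stored in the chunk header plus the chunk's location.
type chunkHeader struct {
	seriesRef uint64
	chunk     mmappedChunk
}

// memSeries is a simplified head series.
type memSeries struct {
	ref     uint64
	mmapped []mmappedChunk
}

// loadMmappedChunks is stage 1: walk every chunk found on disk and group the
// chunks by the series reference found in their headers.
func loadMmappedChunks(onDisk []chunkHeader) map[uint64][]mmappedChunk {
	bySeries := make(map[uint64][]mmappedChunk)
	for _, h := range onDisk {
		bySeries[h.seriesRef] = append(bySeries[h.seriesRef], h.chunk)
	}
	return bySeries
}

// replaySeries is stage 2: for every series record seen in the WAL, create the
// series once and attach the chunks collected in stage 1 (possibly none).
func replaySeries(walSeriesRefs []uint64, bySeries map[uint64][]mmappedChunk) map[uint64]*memSeries {
	head := make(map[uint64]*memSeries)
	for _, ref := range walSeriesRefs {
		if _, ok := head[ref]; ok {
			continue // series already created by an earlier record
		}
		head[ref] = &memSeries{ref: ref, mmapped: bySeries[ref]}
	}
	return head
}

func main() {
	onDisk := []chunkHeader{
		{seriesRef: 1, chunk: mmappedChunk{ref: 200, minTime: 0, maxTime: 99}},
		{seriesRef: 2, chunk: mmappedChunk{ref: 350, minTime: 0, maxTime: 99}},
		{seriesRef: 1, chunk: mmappedChunk{ref: 500, minTime: 100, maxTime: 199}},
	}
	head := replaySeries([]uint64{1, 2, 3}, loadMmappedChunks(onDisk))
	for _, ref := range []uint64{1, 2, 3} {
		fmt.Printf("series %d: %d m-mapped chunks\n", ref, len(head[ref].mmapped))
	}
}
```

Series 3 appears in the WAL but has no chunk on disk, so it simply starts with an empty slice.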

**Prombench results**

_WAL Replay_

1h WAL replay time
30% less WAL replay time - 4m31 vs 3m36
2h WAL replay time
20% less WAL replay time - 8m16 vs 7m

_Memory During WAL Replay_

High Churn:
10-15% less RAM - 32 GB vs 28 GB
20% less RAM after compaction - 34 GB vs 27 GB
No Churn:
20-30% less RAM - 23 GB vs 18 GB
40% less RAM after compaction - 32.5 GB vs 20 GB

Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932)


Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-05-06 21:00:00 +05:30
Chris Marchbanks
c1f9917e90
Add test for unregistering queue manager metrics
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-05-05 14:14:04 -06:00
Chris Marchbanks
dfad1da296
Remove duplicate metrics in QueueManager
Right now any new metrics added for remote write need to be added to
both the QueueManager struct, and the queueManagerMetrics struct.
Instead, use the queueManagerMetrics struct directly from QueueManager.

The newQueueManagerMetrics constructor will now create the metrics for a
specific queue with name and endpoint pre-populated, and a new copy of
the struct will be created specifically for each queue.

This also fixes a bug where prometheus_remote_storage_sent_bytes_total
is not being unregistered after a queue is changed.
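
A rough sketch of the per-queue metrics idea follows. It deliberately uses a toy `registry` type instead of the real client library, and the label names `queue` and `endpoint`, the second metric name, and the `register`/`unregister` methods are assumptions made for illustration only.

```go
package main

import "fmt"

// registry is a toy stand-in for a metrics registry; it only tracks names.
type registry struct{ registered map[string]bool }

func newRegistry() *registry { return &registry{registered: map[string]bool{}} }

// queueManagerMetrics holds every metric of one queue, so there is a single
// place to add a metric and a single place to unregister all of them.
type queueManagerMetrics struct {
	reg   *registry
	names []string
}

// newQueueManagerMetrics builds a fresh copy of the metrics for one queue,
// with the queue name and endpoint pre-populated as labels.
func newQueueManagerMetrics(reg *registry, queue, endpoint string) *queueManagerMetrics {
	m := &queueManagerMetrics{reg: reg}
	for _, base := range []string{
		"prometheus_remote_storage_sent_bytes_total",
		"prometheus_remote_storage_succeeded_samples_total", // illustrative second metric
	} {
		m.names = append(m.names, fmt.Sprintf("%s{queue=%q,endpoint=%q}", base, queue, endpoint))
	}
	return m
}

func (m *queueManagerMetrics) register() {
	for _, n := range m.names {
		m.reg.registered[n] = true
	}
}

// unregister removes every metric of the queue; nothing can be forgotten
// because the list of names lives in exactly one struct.
func (m *queueManagerMetrics) unregister() {
	for _, n := range m.names {
		delete(m.reg.registered, n)
	}
}

// QueueManager uses the metrics struct directly instead of duplicating fields.
type QueueManager struct{ metrics *queueManagerMetrics }

func main() {
	reg := newRegistry()
	qm := &QueueManager{metrics: newQueueManagerMetrics(reg, "q1", "http://example.invalid/write")}
	qm.metrics.register()
	fmt.Println("registered:", len(reg.registered))
	qm.metrics.unregister()
	fmt.Println("after unregister:", len(reg.registered))
}
```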

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-05-05 14:13:59 -06:00
Bartlomiej Plotka
532f7bbac9
Merge pull request #7204 from prometheus/release-2.18
[Merge Without Squash] Merge release-2.18 back to master.
2020-05-05 18:58:45 +01:00
Bartlomiej Plotka
a12e96299d
Cut 2.18.0 release. (#7201)
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-05-05 15:22:02 +01:00
Harold Dost
0e2004f6fb
Simplify the Getting Started documentation. (#7193)
- Lower the barrier to entry for gathering metrics with Prometheus
  by suggesting pre-built exporters instead of requiring the reader
  to download an entire Go toolchain and check out a project.

Fix #6956

Signed-off-by: Harold Dost <h.dost@criteo.com>
2020-05-04 11:49:45 +01:00
Julien Pivotto
7ecd2d1c24
Jaeger: Create child span for remote read (#7187)
* Jaeger: Create child span for remote read
* Jaeger: use middleware to trace client http request

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-05-02 22:41:55 +02:00
Harold Dost
18d45e564b
Documentation: Update example expressions to follow convention. (#7195)
Based on the conversation in #7193

Signed-off-by: Harold Dost <h.dost@criteo.com>
2020-05-02 12:52:24 +01:00
Guangming Wang
5b4006ac86
cleanup: remove unnecessary nil check before range (#7194)
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2020-05-02 07:25:44 +01:00
qinng
f36ae1c21c
[remote-storage] use warn log level when sending samples to remote storage fails (#7184)
[remote] increase sendbatch error log level

Signed-off-by: guoruyi1 <guoruyi1@xiaomi.com>
Co-authored-by: guoruyi1 <guoruyi1@xiaomi.com>
2020-04-30 17:06:22 -06:00
Hongcai Ren
1c48005911
bump client golang to v1.6.0 (#7191)
* bump github.com/prometheus/client_golang to v1.6.0

Signed-off-by: RainbowMango <renhongcai@huawei.com>
2020-04-30 12:24:47 +01:00
Bartlomiej Plotka
b575f95c8a
Cut 2.18.0-rc.1 (#7186)
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-29 23:03:44 +01:00
Ben Ye
1e4e37144d
Fixed wrongly handled not ready TSDB on web and API. (#7182)
* fix federate endpoint panic

Signed-off-by: yeya24 <yb532204897@gmail.com>

* Fixed all cases of not ready TSDB being wrongly handled.

* Fixed issue for federation.
* Ensured this will never happen again thanks to interfaces
* Fixes same issue for stats.
* Added tests for readiness.
* Fixed bug in stats. It was:
   status.MaxTime = db.Head().MaxTime()
   status.MinTime = db.Head().MaxTime()
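
As a hedged illustration of guarding access behind one interface-like wrapper (all names here, including `readyStorage`, `headTimes` and `errNotReady`, are hypothetical and not taken from this PR), callers cannot reach the head without going through a readiness check, and the min and max times come from their respective accessors:

```go
package main

import (
	"errors"
	"fmt"
)

// errNotReady is returned while the storage has not been opened yet.
var errNotReady = errors.New("TSDB not ready")

// head is a simplified stand-in for the TSDB head.
type head struct{ minT, maxT int64 }

func (h *head) MinTime() int64 { return h.minT }
func (h *head) MaxTime() int64 { return h.maxT }

// readyStorage wraps a head that stays nil until the database has started.
type readyStorage struct{ h *head }

// headTimes is the only way to read the head's time range, so every caller
// gets the readiness check; a nil head yields an error instead of a panic.
func (s *readyStorage) headTimes() (minT, maxT int64, err error) {
	if s.h == nil {
		return 0, 0, errNotReady
	}
	// MinTime comes from MinTime() and MaxTime from MaxTime().
	return s.h.MinTime(), s.h.MaxTime(), nil
}

func main() {
	s := &readyStorage{}
	if _, _, err := s.headTimes(); err != nil {
		fmt.Println("before start:", err)
	}
	s.h = &head{minT: 10, maxT: 20}
	minT, maxT, _ := s.headTimes()
	fmt.Println("after start:", minT, maxT)
}
```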


Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-29 17:16:14 +01:00
ga
05038b48bd
Goroutine: Fix ambiguous variable (#7175)
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
2020-04-28 11:02:26 +01:00
Bartlomiej Plotka
33606d1cf7
Cut release 2.18.0-rc.0 (#7165)
* Cut release 2.18.0-rc.0

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Removed mention about Go update.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Julien's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added Julien's suggestion.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Chris' and Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Bjorn's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-25 10:09:08 +01:00
Bartlomiej Plotka
746820ede8
Merge pull request #7162 from prometheus/partial-dep-update
Updated all deps except k8s.io/client.
2020-04-24 12:14:57 +01:00
Bartlomiej Plotka
dbc9bd7948 Updated mod as suggested by Julien
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-24 08:50:55 +01:00
Bartlomiej Plotka
94baacdd93 Moved down all k8s.io deps to old version.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-23 17:07:29 +01:00
Bartlomiej Plotka
1d13a2cd2f Updated different swagger output.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-23 16:52:14 +01:00
Bartlomiej Plotka
69d60f2411 Don't touch circle.yml it's too scary.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-23 16:22:28 +01:00
Bartlomiej Plotka
ee72599e5d Reverted k8s-client-go
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-23 16:21:42 +01:00
Bartlomiej Plotka
8e247ba0ba Moved back k8s-client.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-23 16:20:28 +01:00
Bartlomiej Plotka
1a8c3f2b7d Updated CircleCI in the hope that Windows will have the new Go.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-23 15:14:08 +01:00
Bartlomiej Plotka
1bd55973c3 Fixed flaky pool test.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-23 14:14:44 +01:00
Bartlomiej Plotka
86ff4a1717 Updated all deps.
Pinned github.com/googleapis/gnostic as they introduced a breaking change.


Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-23 13:43:59 +01:00
Goutham Veeramachaneni
84b4d079c8
Make sure deleted intervals are excluded from Seek (#6980)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2020-04-23 10:00:30 +01:00
Boqin Qin
f3c6d26781
notifier: fix forgotten unlock before return (#7133)
Signed-off-by: BurtonQin <bobbqqin@gmail.com>

Co-authored-by: root <root@neon-cats-4.localdomain>
2020-04-23 09:49:57 +01:00
ZouYu
5c5ac7cc3e
add unit test for pkg/pool/pool.go (#7152)
Signed-off-by: ZouYu <zouy.fnst@cn.fujitsu.com>
2020-04-23 09:49:07 +01:00
Vasily Sliouniaev
0393b188c9
Add Jaeger (#7148)
* Trace remote read

Signed-off-by: vas <vasily.sliouniaev@jet.com>

* Use jaeger

Signed-off-by: vas <vasily.sliouniaev@jet.com>
2020-04-23 02:05:55 +02:00
ZouYu
06493b7034
add unit test TestLabels_String for pkg/labels/labels.go (#7150)
Signed-off-by: ZouYu <zouy.fnst@cn.fujitsu.com>
2020-04-22 12:32:47 +05:30
Marek Slabicki
4b5e7d4984
Adding a shouldReshard function to modularize logic for the QueueManager deciding if it should shard or not (#7143)
Signed-off-by: Marek Slabicki <thaniri@gmail.com>
2020-04-20 16:20:39 -06:00
Julien Pivotto
fc3fb3265a
Merge pull request #7145 from prometheus/release-2.17
Backport release 2.17 into master
2020-04-20 14:08:12 +02:00
Julien Pivotto
18254838fb
Release 2.17.2 (#7139)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-04-20 10:17:21 +02:00
Julien Pivotto
9072cf7203
Merge pull request #7137 from roidelapluie/cherrypicks
Cherry-pick three bugfixes from master to release-2.17
2020-04-18 20:21:26 +02:00
Chris Marchbanks
a7b449320d
Fix updating rule manager never finishing (#7138)
Rather than sending a value to the done channel on a group to indicate
whether or not to add stale markers to a closing rule group use an
explicit boolean. This allows more functions than just run() to read
from the done channel and fixes an issue where Eval() could consume the
channel during an update, causing run() to never return.
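
The pattern described above can be sketched as follows. This is a simplified model under assumed names (`group`, `markStale`, `waitDone`, `wroteStale`), not the rule manager's actual code: the done channel is only ever closed, so any number of readers observe it, and the stale-marker decision travels in a plain boolean set before the close.

```go
package main

import "fmt"

// group is a simplified rule group.
type group struct {
	markStale  bool          // set before done is closed; read only afterwards
	done       chan struct{} // closed to signal shutdown to every reader
	terminated chan struct{} // closed by run() when it has returned
	wroteStale bool          // records whether run() wrote stale markers
}

func newGroup() *group {
	return &group{done: make(chan struct{}), terminated: make(chan struct{})}
}

// run waits for shutdown and then consults the boolean, not a channel value.
func (g *group) run() {
	defer close(g.terminated)
	<-g.done
	if g.markStale {
		g.wroteStale = true // stands in for appending stale markers
	}
}

// waitDone models a second reader of done (such as an evaluation in flight).
// Because done is closed rather than sent to, it cannot steal the signal.
func (g *group) waitDone() { <-g.done }

// stop sets the flag, then closes done; the close orders the write to
// markStale before the read in run().
func (g *group) stop(markStale bool) {
	g.markStale = markStale
	close(g.done)
	<-g.terminated
}

func main() {
	g := newGroup()
	go g.run()
	go g.waitDone()
	g.stop(true)
	fmt.Println("stale markers written:", g.wroteStale) // prints "stale markers written: true"
}
```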

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-04-18 14:32:18 +02:00
Björn Rabenstein
ca23cd064e
Merge pull request #7136 from prometheus/beorn7/api
Ensure queries are closed in API calls
2020-04-18 00:58:11 +02:00
beorn7
69ac27e1b4 Make series method return a finalizer, too
Signed-off-by: beorn7 <beorn@grafana.com>
2020-04-17 22:40:39 +02:00
Julien Pivotto
7eedcc708e promql/parser: Cleanup generatedParserResult across reuse
Reusing the same generatedParserResult causes strange panics; see #7131 and #7127.
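
A generic sketch of the underlying hazard, assuming the parser is recycled through a `sync.Pool`: state left over from a previous parse must be cleared before the object is reused. The `parser` type, its `result` field (standing in for generatedParserResult) and the `parse` function are illustrative only and not the real promql/parser code.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// parser is a toy parser; result stands in for generatedParserResult.
type parser struct {
	input  string
	result []string
}

var parserPool = sync.Pool{New: func() interface{} { return &parser{} }}

// parse borrows a parser from the pool and resets its state before returning
// it, so the next user never sees the previous result.
func parse(input string) []string {
	p := parserPool.Get().(*parser)
	defer func() {
		p.input = ""
		p.result = nil // drop the old result instead of reusing it
		parserPool.Put(p)
	}()

	p.input = input
	for _, tok := range strings.Fields(p.input) {
		p.result = append(p.result, tok)
	}
	return p.result
}

func main() {
	first := parse("a b")
	second := parse("c")
	fmt.Println(first, second) // prints "[a b] [c]"
}
```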

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-04-17 22:06:42 +02:00
Julian Taylor
e2c06a8898 register federation failure metrics (#7081)
Closes gh-7080

Signed-off-by: Julian Taylor <juliantaylor108@gmail.com>
2020-04-17 22:06:16 +02:00
Julien Pivotto
a2fcdeb1ef Defer finalizer (#7129)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-04-17 22:05:38 +02:00
Julien Pivotto
ed1852ab95
TSDB: Isolation: avoid creating appenderId's without appender (#7135)
Prior to this commit we could have situations where we are creating an
appenderId but never creating an appender to go with it, therefore
blocking the low watermark.
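
To see why an orphaned ID blocks the low watermark, consider this minimal model. The `isolation` type and its methods are simplified, hypothetical versions written for this example: the watermark is the smallest ID still open, so an ID that nobody ever closes pins it forever.

```go
package main

import (
	"fmt"
	"sync"
)

// isolation tracks which append IDs are still open.
type isolation struct {
	mu     sync.Mutex
	lastID uint64
	open   map[uint64]struct{}
}

func newIsolation() *isolation { return &isolation{open: map[uint64]struct{}{}} }

// newAppendID hands out the next ID and marks it open.
func (i *isolation) newAppendID() uint64 {
	i.mu.Lock()
	defer i.mu.Unlock()
	i.lastID++
	i.open[i.lastID] = struct{}{}
	return i.lastID
}

// closeAppend marks an ID as finished (committed or rolled back).
func (i *isolation) closeAppend(id uint64) {
	i.mu.Lock()
	defer i.mu.Unlock()
	delete(i.open, id)
}

// lowWatermark returns the smallest open ID, or the last issued ID if none
// are open. Everything below it is safe to clean up.
func (i *isolation) lowWatermark() uint64 {
	i.mu.Lock()
	defer i.mu.Unlock()
	low := i.lastID
	for id := range i.open {
		if id < low {
			low = id
		}
	}
	return low
}

func main() {
	iso := newIsolation()
	leaked := iso.newAppendID() // ID created, but no appender ever closes it
	a := iso.newAppendID()
	iso.closeAppend(a)
	fmt.Println("watermark with leaked ID:", iso.lowWatermark())
	iso.closeAppend(leaked)
	fmt.Println("watermark after closing it:", iso.lowWatermark())
}
```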

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-04-17 20:51:03 +02:00
beorn7
f9f423ec0a Ensure queries are closed in API calls
Signed-off-by: beorn7 <beorn@grafana.com>
2020-04-17 20:32:36 +02:00
Chris Marchbanks
cd12f0873c
Merge pull request #7073 from csmarchbanks/fix-md5-remote-write
Fix remote write not updating when relabel configs or secrets change
2020-04-16 16:36:25 -06:00
Julien Pivotto
209d4bb8a1
Defer finalizer (#7129)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-04-16 20:16:16 +02:00
Frederic Branczyk
1d6532e9e5
Merge pull request #7132 from roidelapluie/clpql
promql/parser: Cleanup generatedParserResult across reuse
2020-04-16 15:03:42 +02:00
gotjosh
24af5049bb
API: Allow TargetRetriever to receive a Context (#7125)
Fixes #7103

Signed-off-by: gotjosh <josue@grafana.com>
2020-04-16 09:30:47 +01:00
Julien Pivotto
1f6f8e60ee promql/parser: Cleanup generatedParserResult across reuse
Reusing the same generatedParserResult causes strange panics; see #7131 and #7127.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-04-16 01:51:08 +02:00