Commit Graph

5519 Commits

Author SHA1 Message Date
Simon Pasquier
f9462d5d44 discovery/consul: pass current context to Consul queries
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-18 14:23:56 +01:00
Simon Pasquier
b41d6d54f2
storage/remote: increase timeouts for Travis CI (#5224)
* storage/remote: adapt tests for Travis CI

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Check filesystems on Travis environment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Run remote/storage tests on CircleCI for troubleshooting

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Try using tmpfs partition

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Revert "Try using tmpfs partition"

This reverts commit 85a30deb72.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Don't store labels in writeToMock

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Fix data race

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Bump retries to 100 meaning that the total timeout is 10s

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* clean up .travis.yml

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* code fixup

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Remove unneeded empty line

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-15 16:47:41 +01:00
Goutham Veeramachaneni
b7594f650f
Merge pull request #5203 from codesome/shepherd
Propose myself (Ganesh, @codesome) as 2.8 release shepherd
2019-02-14 14:21:17 +01:00
Simon Pasquier
12708acd15
scrape: catch errors when creating HTTP clients (#5182)
* scrape: catch errors when creating HTTP clients

This change makes sure that no scrape pool is created with a nil HTTP
client.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address Tariq's comment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address Brian's comment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-13 14:24:22 +01:00
Callum Styan
37e35f9e0c Various improvements to WAL based remote write.
- Use the queue name in WAL watcher logging.
- Don't return from watch if the reader error was EOF.
- Fix sample timestamp check logic regarding what samples we send.
- Refactor so we don't need readToEnd/readSeriesRecords
- Fix wal_watcher tests since readToEnd no longer exists

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-12 11:39:13 +00:00
Tom Wilkie
b93bafeee1 Various fixes to locking & shutdown for WAL-based remote write.
- Remove datarace in the exported highest scrape timestamp.
- Backoff on enqueue should be per-sample - reset the result for each sample.
- Remove diffKeys, unused ctx and cancelfunc in WALWatcher, 'name' from writeTo interface, and pass it to constructor.
- Reorder functions in WALWatcher depth-first according to call graph.
- Fix vendor/modules.txt.
- Split out the various timer periods into consts at the top of the file.
- Move w.currentSegmentMetric.Set close to where we set the currentSegment.
- Combine r.Next() and isClosed(w.quit) into a single loop.
- Unnest some ifs in WALWatcher.watch, propagate errors in decodeRecord, add some new lines to make it easier to read.
- Reorganise checkpoint handling to reduce nesting and make it easier to follow.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-12 11:39:13 +00:00
Callum Styan
6f69e31398 Tail the TSDB WAL for remote_write
This change switches the remote_write API to use the TSDB WAL.  This should reduce memory usage and prevent sample loss when the remote end point is down.

We use the new LiveReader from TSDB to tail WAL segments.  Logic for finding the tracking segment is included in this PR.  The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.

Enqueuing a sample for sending via remote_write can now block, to provide back pressure. Queues are still required to achieve parallelism and batching. We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible. The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.

As part of this change, we attempt to guarantee that samples are not lost; however, this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (e.g. 400s).

This change also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for given series to reduce GC pressure

Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics

Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-12 11:39:13 +00:00
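Note on 6f69e31398: the "only marshal the proto request once, not once per retry" optimisation can be illustrated with a minimal sketch. This is not the code from the commit; `sendWithRetry`, `encode`, `send` and `errTemporary` are hypothetical names chosen for illustration, and a plain byte slice stands in for the marshalled proto request.

```go
package main

import (
	"errors"
	"fmt"
)

// errTemporary stands in for a retryable failure from the remote end.
var errTemporary = errors.New("temporary failure")

// sendWithRetry encodes the batch exactly once and reuses the resulting
// bytes for every attempt. encode and send are hypothetical callbacks.
func sendWithRetry(batch []float64, encode func([]float64) []byte, send func([]byte) error, maxRetries int) error {
	payload := encode(batch) // hoisted out of the retry loop
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = send(payload); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	encodes, sends := 0, 0
	encode := func(b []float64) []byte {
		encodes++
		return []byte(fmt.Sprint(b))
	}
	// The first two attempts fail, the third succeeds.
	send := func(p []byte) error {
		sends++
		if sends < 3 {
			return errTemporary
		}
		return nil
	}
	err := sendWithRetry([]float64{1, 2}, encode, send, 5)
	fmt.Println(err, encodes, sends) // prints "<nil> 1 3"
}
```

Three send attempts are made, but the encoder runs only once, which is the saving the commit message describes.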
Maria Nemtinova
8e3a39f725 Web UI QoL improvements (#5201)
1. Added an ability to resize text area on mouseclick
2. Remember selected target status button on page reload

Signed-off-by: Maria Nemtinova <nemtinovamasha@gmail.com>
2019-02-12 00:22:05 +01:00
Ganesh Vernekar
ce69dcb0e5
Propose @codesome as 2.8 release shepherd
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-02-11 23:44:01 +05:30
JoeWrightss
4cb6c202ff Fix fmt.Errorf error message (#5199)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-10 15:16:20 +05:30
Tariq Ibrahim
a2a6e24f9f show list of offending labels in the error message in many-to-many scenarios (#5189)
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-02-09 10:17:52 +01:00
Minh-Long Do
b26b5c9e96 Add rendering test of template based web endpoints (#5188)
Signed-off-by: Minh-Long Do <minhlong.langos@gmail.com>
2019-02-08 10:17:47 +00:00
Simon Pasquier
fc10f6d814
Unset GO111MODULE variable in Makefile.common (#5191)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-07 17:22:04 +01:00
Goutham Veeramachaneni
9b8bbe3246
Merge pull request #5187 from prometheus/beorn7/release
Merge v2.7 bugfixes into master
2019-02-06 21:32:06 +01:00
beorn7
d26e134bd4 Merge branch 'release-2.7' into beorn7/release
Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-06 15:22:40 +01:00
Björn Rabenstein
3db36f34ec
Merge pull request #5186 from prometheus/beorn7/metrics
Fix prometheus_rule_group_last_evaluation_timestamp_seconds
2019-02-06 15:19:08 +01:00
beorn7
2db1eeb4ec Fix prometheus_rule_group_last_evaluation_timestamp_seconds
It should be a unix timestamp, not the seconds in the minute.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-06 11:02:49 +01:00
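Note on 2db1eeb4ec: the difference between "a unix timestamp" and "the seconds in the minute" can be shown with Go's standard `time` package. The sketch below is an assumption about the kind of mix-up involved, not the code from the commit; `secondsInMinute`, `unixSeconds` and `sampleTime` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"time"
)

// sampleTime returns a fixed instant so the example is deterministic.
func sampleTime() time.Time {
	return time.Date(2019, 2, 6, 10, 2, 49, 0, time.UTC)
}

// secondsInMinute returns the second offset within the minute (0-59).
// Exporting this as a "timestamp" is the kind of bug the commit describes.
func secondsInMinute(t time.Time) float64 {
	return float64(t.Second())
}

// unixSeconds returns seconds elapsed since 1970-01-01T00:00:00Z.
func unixSeconds(t time.Time) float64 {
	return float64(t.Unix())
}

func main() {
	t := sampleTime()
	fmt.Println(secondsInMinute(t)) // prints "49"
	fmt.Println(unixSeconds(t))     // a value in the billions, not 0-59
}
```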
zhulongcheng
fd964426a7 web: predeclare and reuse errors (#5180)
Predeclare and reuse errors to reduce duplicate code

Signed-off-by: zhulongcheng <zhulongcheng.me@gmail.com>
2019-02-04 13:06:26 +01:00
zhulongcheng
a75f8a8e05 update error message in extractTimeRange (#5179)
Update error message in the extractTimeRange function
to match function's logic

Signed-off-by: zhulongcheng <zhulongcheng.me@gmail.com>
2019-02-03 09:29:23 +00:00
JoeWrightss
e158c53fa9 Fix some typos in comment (#5175)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-01 14:35:32 +00:00
Brian Brazil
c66aeb3fff
In histogram_quantile merge buckets with equivalent le values (#5158)
This makes things generally more resilient, and will
help with OpenMetrics transitions (and inconsistencies).

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-02-01 10:22:44 +00:00
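Note on c66aeb3fff: merging buckets with equivalent le values can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation from the commit: the `bucket` type and `coalesceBuckets` function are hypothetical names here, and the sketch assumes that buckets whose upper bounds parse to the same float (for example the label values "1" and "1.0") should have their counts summed.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// bucket is a hypothetical histogram bucket: an upper bound (the parsed
// le label) and the sample count reported for it.
type bucket struct {
	upperBound float64
	count      float64
}

// coalesceBuckets sorts a copy of the buckets by upper bound and sums the
// counts of buckets whose upper bounds are equal.
func coalesceBuckets(bs []bucket) []bucket {
	if len(bs) == 0 {
		return bs
	}
	sorted := append([]bucket(nil), bs...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].upperBound < sorted[j].upperBound
	})
	merged := sorted[:1]
	for _, b := range sorted[1:] {
		last := &merged[len(merged)-1]
		if b.upperBound == last.upperBound {
			last.count += b.count
		} else {
			merged = append(merged, b)
		}
	}
	return merged
}

func main() {
	// "1" and "1.0" are different label strings but the same float value.
	a, _ := strconv.ParseFloat("1", 64)
	b, _ := strconv.ParseFloat("1.0", 64)
	merged := coalesceBuckets([]bucket{{0.5, 2}, {a, 3}, {b, 4}})
	fmt.Println(merged) // prints "[{0.5 2} {1 7}]"
}
```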
Simon Pasquier
a60431f3cd Merge v2.7.1 into master (#5170)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-01 09:54:12 +01:00
Vishnunarayan K I
108b9b0e5f Limit number of metrics in Prometheus UI (#5139)
Signed-off-by: Vishnunarayan K I <appukuttancr@gmail.com>
2019-01-31 17:03:50 +00:00
Frederic Branczyk
50e1228f88
Merge pull request #5147 from prometheus/brancz-patch-1
docs: Add filesystem POSIX requirement
2019-01-31 16:20:46 +01:00
Frederic Branczyk
32079f351f
docs: Specifically call out NFS and POSIX
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2019-01-31 12:57:48 +01:00
Goutham Veeramachaneni
62e591f928
*: cut 2.7.1 (#5164)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-31 12:13:25 +01:00
Goutham Veeramachaneni
b03d6f6eff
Remove custom highlight code, it's not needed. (#5163)
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-31 11:27:18 +01:00
Ganesh Vernekar
10ae00ab9d Fix bug from #4898 (#5161)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-01-31 11:14:14 +01:00
Ganesh Vernekar
787eb1e904 Set rule_group_last_duration_seconds to seconds (#5153)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-01-31 11:07:58 +01:00
Frederic Branczyk
3de734d8de
docs: Add filesystem POSIX requirement
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2019-01-29 13:51:16 +01:00
Ganesh Vernekar
a2ef8cf2f5
Merge pull request #5146 from tariq1890/ineff
fix ineffectual assignment in dns.go
2019-01-29 11:50:34 +05:30
tariqibrahim
b173de0c26 fix ineffectual assignment in dns.go
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-01-28 17:15:43 -08:00
Jannick Fahlbusch
63f375e80a [FIX] Azure DS: Return error when request failed (#4719)
This fixes the issue that the error is swallowed when the request failed.

Signed-off-by: Jannick Fahlbusch <git@jf-projects.de>
2019-01-28 21:31:45 +00:00
Brian Brazil
1dd57765b4
Reduce time that alertmanagers are in flux when reloaded. (#5126)
This no longer waits for all of the scrape reload to complete
before getting a list of AMs again.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-28 18:34:12 +00:00
Bryan Boreham
8841692a63 Use the context associated with the inner evaluation span (#5130)
Signed-off-by: Bryan Boreham <bryan@weave.works>
2019-01-28 18:33:30 +00:00
Tariq Ibrahim
f4275d2352 Use the latest versions of azure go sdk and go-autorest (#5015)
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-01-28 18:30:29 +00:00
Tariq Ibrahim
bfcdba211f remove the prepended watch reactor from the fake k8s client (#5140)
Signed-off-by: tariqibrahim <tariq.ibrahim@microsoft.com>
2019-01-28 16:42:25 +01:00
Goutham Veeramachaneni
410ee9e04a
*: cut 2.7.0 (#5141)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-28 15:37:30 +05:30
Goutham Veeramachaneni
7f7b211047
*: cut 2.7.0-rc.2 (#5134)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-24 18:55:04 +05:30
Goutham Veeramachaneni
b454ed3ec2
*: cut 2.7.0-rc.1 (#5123)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-21 18:47:37 +05:30
Goutham Veeramachaneni
4e83f91cfd
Rollback Dockerfile to version @ 2.5.x (#5122)
Fixes https://github.com/prometheus/prometheus/issues/5043

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-21 17:27:16 +05:30
Hrishikesh Barman
9c4e258651 corrected regex string check for anyorigin(*) (#5117)
Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
2019-01-21 17:17:27 +05:30
Goutham Veeramachaneni
24f19f03db
*: cut 2.7.0-rc.0 (#5114)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 22:16:02 +05:30
Goutham Veeramachaneni
4068968e12
Protect retention from overflowing (#5112)
Also sanitise the max block duration to max a month.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 20:18:06 +05:30
Goutham Veeramachaneni
384cba1211
Add flag for size based retention (#5109)
* Add flag for size based retention

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Deprecate the old retention flag for a new one.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Add ability to take a suffix for size flag

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Address feedback

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 19:18:36 +05:30
Krasi Georgiev
3bd41cc92c Update tsdb to 0.4 (#5110)
* update tsdb to v0.4.0

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* remove unused struct field

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-18 16:32:14 +05:30
Simon Pasquier
68e4c211f2
discovery/azure: more robust handling of go routines (#5106)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-18 09:55:47 +01:00
Hrishikesh Barman
a1f34bec2e Added CORS Origin flag (#5011)
Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
2019-01-17 15:01:06 +00:00
Matt Layher
c44cd7e166
Merge pull request #5102 from prometheus/mdl-gofmt
*: apply gofmt -s
2019-01-16 19:12:43 -05:00
Matt Layher
67c43f3054
Merge pull request #5101 from prometheus/mdl-no-fatal
pkg/runtime: use panic instead of log.Fatal for system call errors
2019-01-16 19:12:29 -05:00