5564 Commits

Author SHA1 Message Date
Tom Wilkie
184f06a981 Combine the record decoding metrics into one; break out garbage collection into a separate function.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie
859cda27ff Remove some 'global' state, moving segment numbers to parameters.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie
bdc6b764b0 If reading the WAL fails, try again. Also, read from the segment containing the index for the last checkpoint, not the first segment.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie
d6f911b511 Factor out logging ratelimit & dedupe middleware.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie
a5c20642b3 Refactor WAL watcher to remove some duplication.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie
37ad4db485 Export timestamps in seconds since epoch.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
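Prometheus sample timestamps are int64 milliseconds, while exported metrics conventionally use seconds. A minimal sketch of the conversion, assuming the value being exported is such a millisecond timestamp (`msToSeconds` is a hypothetical helper):

```go
package main

import (
	"fmt"
	"time"
)

// msToSeconds converts a millisecond timestamp into fractional seconds
// since the Unix epoch.
func msToSeconds(ms int64) float64 {
	return float64(ms) / 1000
}

func main() {
	// Hypothetical "highest timestamp seen" value, in milliseconds.
	ts := time.Date(2019, 2, 28, 16, 38, 39, 0, time.UTC).UnixNano() / int64(time.Millisecond)
	fmt.Printf("%.3f\n", msToSeconds(ts))
}
```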
LongKB
84df210c41 Update prometheus.io's URL to the latest version (#5270)
Currently, the latest version is **2.7**, but the version on the web page is **2.0**.
This commit updates the URL to the latest version on **prometheus.io**.

Co-Authored-By: Nguyen Phuong An <AnNP@vn.fujitsu.com>
Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>
2019-02-27 10:39:50 +00:00
JoeWrightss
e4b88704a6 Fix misspell in manager_test.go (#5279)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-27 11:22:31 +01:00
Simon Pasquier
1d2fc95b1c discovery/marathon: pass context to the client (#5232)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-26 14:49:16 +01:00
Simon Pasquier
e60d314f43 discovery/consul: pass current context to Consul queries (#5230)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-26 14:48:19 +01:00
Simon Pasquier
8f578d9c6b discovery/ec2: pass context to the client (#5234)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-26 14:48:03 +01:00
Simon Pasquier
4997dcb4a1 discovery/gce: pass context to the client (#5233)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-26 14:47:43 +01:00
Simon Pasquier
9040dddd0c discovery/azure: pass context to the client (#5255)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-26 14:47:26 +01:00
Simon Pasquier
fe7a1bcfc6 discovery/triton: pass context to the client (#5235)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-26 14:47:04 +01:00
tuanvcw
9de0ab3c8a Update remaining deprecated links in docs (#5271)
Signed-off-by: Vu Cong Tuan <tuanvc@vn.fujitsu.com>
2019-02-26 10:16:38 +00:00
Björn Rabenstein
ad29221a7b Merge pull request #5020 from erikh/upgrade-miekg-dns
Upgrade miekg dns
2019-02-25 12:47:32 +01:00
David Symonds
46361a7c85 rules: Fix sorting of result from (*Manager).RuleGroups (#5260)
The previous code was defective in that it never sorted groups within a
file due to doing a multi-key sort incorrectly.

Signed-off-by: David Symonds <dsymonds@gmail.com>
2019-02-23 09:51:44 +01:00
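A correct multi-key sort compares the secondary key only when the primary keys are equal. The sketch below shows that pattern for groups ordered by file and then by name; the `group` type is a hypothetical stand-in, not the actual rules type. One common way to get this wrong is `a.File < b.File || a.Name < b.Name`, which is not a valid ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// group is a hypothetical stand-in for a rule group: the file it was
// loaded from and its name.
type group struct {
	File, Name string
}

// sortGroups orders groups by file and then by name within a file. The
// secondary key is only consulted when the primary keys are equal.
func sortGroups(gs []group) {
	sort.Slice(gs, func(i, j int) bool {
		if gs[i].File != gs[j].File {
			return gs[i].File < gs[j].File
		}
		return gs[i].Name < gs[j].Name
	})
}

func main() {
	gs := []group{{"b.yml", "x"}, {"a.yml", "z"}, {"a.yml", "y"}}
	sortGroups(gs)
	fmt.Println(gs) // [{a.yml y} {a.yml z} {b.yml x}]
}
```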
Simon Pasquier
e72c875e63 config: fix Kubernetes config with empty API server (#5256)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-22 15:51:47 +01:00
JoeWrightss
362873f72b Fix .Log() error message (#5257)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-22 14:39:37 +00:00
LongKB
e4a741cb7d Replace 'HTTP' with 'HTTPS' to secure links (#5252)
Currently, accessing the modified pages over **HTTP** redirects
automatically to **HTTPS**. This commit therefore replaces **HTTP**
with **HTTPS** in these links for security.

Co-Authored-By: Nguyen Phuong An <AnNP@vn.fujitsu.com>
Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>
2019-02-22 14:33:02 +01:00
LongKB
23480bef43 Remove the duplicated words (#5251)
Although these are only spelling mistakes, they may affect readability.

Co-Authored-By: Nguyen Phuong An <AnNP@vn.fujitsu.com>
Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>
2019-02-22 14:32:34 +01:00
Nguyen Hai Truong
5fbda4c9d7 Secure http links (#5244)
Change http links to https links across the project for security.
Some of the http links do not redirect to https.

Co-Authored-By: Nguyen Van Trung <trungnv@vn.fujitsu.com>
Signed-off-by: Nguyen Hai Truong <truongnh@vn.fujitsu.com>
2019-02-21 10:48:47 +01:00
Ganesh Vernekar
1d9e11a390 Merge pull request #5247 from longkb/fix_typo
Trivial fix: Fix some typos in comments
2019-02-21 10:47:07 +05:30
Ganesh Vernekar
ded80bf4a5 Merge pull request #5246 from truongnh1992/removing-redundant-words
Remove duplicated words in comments
2019-02-21 10:45:25 +05:30
Kim Bao Long
94f5352951 Trivial fix: Fix some typos in comments
Co-Authored-By: Nguyen Phuong An <AnNP@vn.fujitsu.com>
Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>
2019-02-21 09:07:49 +07:00
Nguyen Hai Truong
aed9ea144a Remove duplicated words in comments
Although these are only spelling mistakes, they may affect
readability.

Co-Authored-By: Kim Bao Long <longkb@vn.fujitsu.com>
Signed-off-by: Nguyen Hai Truong <truongnh@vn.fujitsu.com>
2019-02-20 17:41:02 -08:00
Simon Pasquier
c8a1a5a93c discovery/kubernetes: fix support for password_file and bearer_token_file (#5211)
* discovery/kubernetes: fix support for password_file

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Create and pass custom RoundTripper to Kubernetes client

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Use inline HTTPClientConfig

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-20 11:22:34 +01:00
Nguyen Van Duc
89d36a4bf6 Change http to https in links for security (#5238)
Signed-off-by: vanduc95 <ducnguyenvan.bk@gmail.com>
2019-02-20 09:50:45 +00:00
Erik Hollensbe
be3c082539 discovery/dns/dns.go: fix handling of truncated dns records
https://github.com/miekg/dns/pull/815 goes into detail, but in short,
the existing solution was no longer supported and had to be rewritten
to work with newer versions of the library. In that ticket, miekg also
states that the new approach is more correct.

Signed-off-by: Erik Hollensbe <github@hollensbe.org>
2019-02-20 00:36:41 +00:00
Julius Volz
f7332c4dcf Merge pull request #5226 from prometheus/bootstrap4
Update to Bootstrap 4
2019-02-20 00:00:31 +00:00
Julius Volz
795c989d36 Merge branch 'master' into bootstrap4
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-19 22:32:55 +00:00
Palash Nigam
09208b1a58 queryRange: Add more descriptive error messages (#5229)
Fixes: https://github.com/prometheus/prometheus/issues/4811

Signed-off-by: Palash Nigam <npalash25@gmail.com>
2019-02-19 19:16:14 +00:00
Krasi Georgiev
a3c41f4256 use the default time retention value only when no size retention is set (#5216)
fixes https://github.com/prometheus/prometheus/issues/5213

Now that there is both time- and size-based retention, time-based retention should not always have a default value. A default is set only when neither the time nor the size flag is set.

This change does not affect current installations that rely on the default time-based value, and it avoids confusion when only size-based retention is set and the user expects the default time-based setting to no longer apply.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-19 13:53:43 +02:00
Krasi Georgiev
41dee81554 Makefile.common: add check_license by default. (#5236)
2019-02-19 13:25:10 +02:00
Simon Pasquier
f9462d5d44 discovery/consul: pass current context to Consul queries
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-18 14:23:56 +01:00
Sylvain Rabot
87c79b0c81 Fix console templates (#5228)
Signed-off-by: Sylvain Rabot <s.rabot@lectra.com>
2019-02-18 13:14:58 +00:00
Julius Volz
fdbaef86df Re-add typeahead license header to minified file
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-17 22:50:20 +00:00
Julius Volz
661f0127bc Rebuild web assets
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-17 22:50:16 +00:00
Julius Volz
ed635190ba Re-add popper.js to fix target label tooltips
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-17 19:40:57 +00:00
Julius Volz
7b724cea3a Whitespace and other cleanups
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-17 19:40:52 +00:00
Julius Volz
7244ef3783 Add more top/bottom spacing for All/Unhealthy buttons
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-17 19:40:48 +00:00
Julius Volz
028e99e3d6 Remove spacing between All/Unhealthy buttons
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-17 19:40:44 +00:00
Julius Volz
45b91e8e80 Fix copy&paste button on /config, move pre style to CSS
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-17 19:40:30 +00:00
Julius Volz
cd569b51d9 Merge branch 'master' into bootstrap4
2019-02-17 17:22:41 +00:00
Simon Pasquier
b41d6d54f2 storage/remote: increase timeouts for Travis CI (#5224)
* storage/remote: adapt tests for Travis CI

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Check filesystems on Travis environment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Run remote/storage tests on CircleCI for troubleshooting

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Try using tmpfs partition

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Revert "Try using tmpfs partition"

This reverts commit 85a30deb72.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Don't store labels in writeToMock

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Fix data race

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Bump retries to 100, meaning that the total timeout is 10s

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* clean up .travis.yml

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* code fixup

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Remove unneeded empty line

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-15 16:47:41 +01:00
Goutham Veeramachaneni
b7594f650f Merge pull request #5203 from codesome/shepherd
Propose myself (Ganesh, @codesome) as 2.8 release shepherd
2019-02-14 14:21:17 +01:00
Simon Pasquier
12708acd15 scrape: catch errors when creating HTTP clients (#5182)
* scrape: catch errors when creating HTTP clients

This change makes sure that no scrape pool is created with a nil HTTP
client.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address Tariq's comment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address Brian's comment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-13 14:24:22 +01:00
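Making sure no scrape pool is created with a nil HTTP client amounts to propagating the client constructor's error instead of ignoring it. A minimal sketch under that assumption; `clientConfig`, `newClient`, and `scrapePool` are hypothetical, trimmed-down stand-ins for the real config and scrape types.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"os"
)

// clientConfig is a hypothetical, trimmed-down HTTP client config.
type clientConfig struct {
	CAFile string
}

// newClient builds an HTTP client and reports configuration errors
// instead of returning a nil client.
func newClient(cfg clientConfig) (*http.Client, error) {
	tlsCfg := &tls.Config{}
	if cfg.CAFile != "" {
		pem, err := os.ReadFile(cfg.CAFile)
		if err != nil {
			return nil, err
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pem) {
			return nil, fmt.Errorf("no certificates found in %s", cfg.CAFile)
		}
		tlsCfg.RootCAs = pool
	}
	return &http.Client{Transport: &http.Transport{TLSClientConfig: tlsCfg}}, nil
}

// scrapePool is a hypothetical pool that always holds a usable client.
type scrapePool struct {
	client *http.Client
}

// newScrapePool refuses to build a pool around a failed client.
func newScrapePool(cfg clientConfig) (*scrapePool, error) {
	client, err := newClient(cfg)
	if err != nil {
		return nil, fmt.Errorf("error creating HTTP client: %w", err)
	}
	return &scrapePool{client: client}, nil
}

func main() {
	if _, err := newScrapePool(clientConfig{CAFile: "/nonexistent/ca.pem"}); err != nil {
		fmt.Println(err)
	}
}
```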
Callum Styan
37e35f9e0c Various improvements to WAL based remote write.
- Use the queue name in WAL watcher logging.
- Don't return from watch if the reader error was EOF.
- Fix sample timestamp check logic regarding what samples we send.
- Refactor so we don't need readToEnd/readSeriesRecords
- Fix wal_watcher tests since readToEnd no longer exists

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-12 11:39:13 +00:00
Tom Wilkie
b93bafeee1 Various fixes to locking & shutdown for WAL-based remote write.
- Remove data race in the exported highest scrape timestamp.
- Backoff on enqueue should be per-sample - reset the result for each sample.
- Remove diffKeys, unused ctx and cancelfunc in WALWatcher, 'name' from writeTo interface, and pass it to constructor.
- Reorder functions in WALWatcher depth-first according to call graph.
- Fix vendor/modules.txt.
- Split out the various timer periods into consts at the top of the file.
- Move w.currentSegmentMetric.Set close to where we set the currentSegment.
- Combine r.Next() and isClosed(w.quit) into a single loop.
- Unnest some ifs in WALWatcher.watch, propagate errors in decodeRecord, add some new lines to make it easier to read.
- Reorganise checkpoint handling to reduce nesting and make it easier to follow.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-12 11:39:13 +00:00
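Combining `r.Next()` and `isClosed(w.quit)` into one loop relies on a non-blocking check of a quit channel. The sketch below shows one way to write that; `recordReader`, `sliceReader`, and `readUntilQuit` are hypothetical, and it assumes the quit channel is only ever closed, never sent on.

```go
package main

import "fmt"

// isClosed reports, without blocking, whether c has been closed.
func isClosed(c chan struct{}) bool {
	select {
	case <-c:
		return true
	default:
		return false
	}
}

// recordReader is a hypothetical minimal reader interface.
type recordReader interface {
	Next() bool
	Record() []byte
}

// sliceReader serves records from memory.
type sliceReader struct {
	recs [][]byte
	i    int
}

func (s *sliceReader) Next() bool {
	if s.i >= len(s.recs) {
		return false
	}
	s.i++
	return true
}

func (s *sliceReader) Record() []byte { return s.recs[s.i-1] }

// readUntilQuit handles records until the reader is exhausted or quit is
// closed. Checking quit first avoids advancing past a record that would
// then be skipped.
func readUntilQuit(r recordReader, quit chan struct{}, handle func([]byte)) int {
	n := 0
	for !isClosed(quit) && r.Next() {
		handle(r.Record())
		n++
	}
	return n
}

func main() {
	r := &sliceReader{recs: [][]byte{[]byte("a"), []byte("b"), []byte("c")}}
	n := readUntilQuit(r, make(chan struct{}), func(b []byte) { fmt.Println(string(b)) })
	fmt.Println("handled", n)
}
```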
Callum Styan
6f69e31398 Tail the TSDB WAL for remote_write
This change switches the remote_write API to use the TSDB WAL. This should reduce memory usage and prevent sample loss when the remote endpoint is down.

We use the new LiveReader from TSDB to tail WAL segments.  Logic for finding the tracking segment is included in this PR.  The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.

Enqueuing a sample for sending via remote_write can now block, to provide backpressure. Queues are still required to achieve parallelism and batching. We have updated the queue config with new defaults for queue capacity and pending-samples values; much smaller values are now possible. The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.

As part of this change, we attempt to guarantee that samples are not lost; however, this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (e.g. 400s).

This change also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for given series to reduce GC pressure

Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics

Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-12 11:39:13 +00:00
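A blocking enqueue is what turns a bounded queue into backpressure: when the buffer is full, the WAL reader waits instead of dropping samples. A minimal sketch with a buffered channel; `sample`, `queue`, and `newQueue` are hypothetical names, and the real queue manager also shards and batches.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair.
type sample struct {
	T int64
	V float64
}

// queue is a hypothetical bounded queue. Enqueue blocks while the buffer
// is full, pushing back on the WAL reader instead of dropping samples, and
// gives up only when quit is closed.
type queue struct {
	ch   chan sample
	quit chan struct{}
}

func newQueue(capacity int) *queue {
	return &queue{ch: make(chan sample, capacity), quit: make(chan struct{})}
}

// Enqueue reports whether s was accepted.
func (q *queue) Enqueue(s sample) bool {
	select {
	case q.ch <- s:
		return true
	case <-q.quit:
		return false
	}
}

func main() {
	q := newQueue(2)
	done := make(chan struct{})
	go func() {
		// Consumer standing in for the remote-write sender.
		for s := range q.ch {
			fmt.Println("sent", s.T)
		}
		close(done)
	}()
	for t := int64(0); t < 5; t++ {
		q.Enqueue(sample{T: t, V: 1}) // blocks whenever the buffer is full
	}
	close(q.ch)
	<-done
}
```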