Commit Graph

77 Commits

Author SHA1 Message Date
Brian Brazil
3f8e51738c
More granular locking for scrapeLoop. (#8104)
Don't lock for all of Sync/stop/reload as that holds up /metrics and the
UI when they want a list of active/dropped targets. Instead take
advantage of the fact that Sync/stop/reload cannot be called
concurrently by the scrape Manager and lock just on the targets
themselves.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-10-26 14:46:20 +00:00
Julien Pivotto
be5ba1a62d Fix wordings
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 21:44:36 +02:00
Julien Pivotto
671f7c66e5 Adjust comment
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 18:28:02 +02:00
Julien Pivotto
627ff84599 Adjust flag
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 18:25:52 +02:00
Julien Pivotto
536dfb6234 Add an experimental, hidden flag
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 17:31:46 +02:00
Julien Pivotto
b90c7a55da Simplify logic
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-06 21:17:16 +02:00
Julien Pivotto
ccc1df3140 Fix comment
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-06 13:48:24 +02:00
Julien Pivotto
98e14611a5 Move the tolerance logic in the loop function.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-05 18:20:10 +02:00
Julien Pivotto
6544f95403 Introduce timestamp tolerance in scrapes
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-05 18:20:10 +02:00
iurii
bd53b5ff37
Unnecessary go routine spawn. (#7879)
* Unnecessary go routine spawn.
* Remove unnecessary local variable creation.

Signed-off-by: iurii <iurii@coins.ph>
Co-authored-by: iurii <iurii@coins.ph>
2020-09-02 16:26:42 +01:00
Andy Bursavich
4e6a94a27d
Invert service discovery dependencies (#7701)
This also fixes a bug in query_log_file, which now is relative to the config file like all other paths.

Signed-off-by: Andy Bursavich <abursavich@gmail.com>
2020-08-20 13:48:26 +01:00
Julien Pivotto
2899773b01
Do not stop scrapes in progress during reload (#7752)
* Do not stop scrapes in progress during reload.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-08-07 15:58:16 +02:00
johncming
5578c96307
scrape: fix typo. (#7712)
Signed-off-by: johncming <johncming@yahoo.com>
2020-08-01 09:56:21 +01:00
Julien Pivotto
7b5507ce4b
Scrape: defer report (#7700)
When I started wotking on target_limit, scrapeAndReport did not exist
yet. Then I simply rebased my work without thinking.

It appears that there is a lot that can be inline if I defer() the
report.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-31 19:11:08 +02:00
Annanay
ec562f152b Merge branch 'master' into appender-context
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-31 13:03:56 +05:30
Julien Pivotto
f482c7bdd7
Add per scrape-config targets limit (#7554)
* Add per scrape-config targets limit

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-30 14:20:24 +02:00
Annanay
7f98a744e5 Add context to Appender interface
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-24 19:40:51 +05:30
johncming
490f9c664e
scrape: remove two blank lines. (#7610)
Signed-off-by: johncming <johncming@yahoo.com>
2020-07-19 07:34:04 +02:00
Julien Pivotto
754461b74f
Reuse the same appender for report and scrape (#7562)
Additionally, implement isolation in collectResultAppender.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-16 13:53:39 +02:00
Julien Pivotto
190addffd8
Change Scrape Loop mtx to Mutex (#7553)
It was still RWLock but we never use the read lock..

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-11 15:37:13 +02:00
Brian Brazil
f9d21f10ec
Only relabelling should apply for scrape_samples_scraped_post_relabelling. (#7342)
More consistent variable names.

Fixes #7298

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-06-04 16:00:37 +01:00
Brian Brazil
c9565f08aa
Pass reference to checkAddError so appendErrors is updated. (#7294)
This was preventing the warnings from being logged.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-05-26 15:14:55 +01:00
Marek Slabicki
8224ddec23
Capitalizing first letter of all log lines (#7043)
Signed-off-by: Marek Slabicki <thaniri@gmail.com>
2020-04-11 09:22:18 +01:00
Callum Styan
c453def8c5
Separate scrape add error checking out into it's own function. (#6930)
* Separate scrape add error checking out into it's own function.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* pass sampleLimitError to checkAddError instead of returning an error

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Return bool, error from checkAddError so we can properly handle
ErrNotFound for AddFast. This should in theory never happen, but the
previous code path handled this case. Adds a test for this, which master
passes and the previous commit fails.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address comment changes.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Move sampleAdded inside the loop iteration within append, since that's
the only block the variable is used in.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2020-03-25 19:31:48 -07:00
Julien Pivotto
d6ad5551c9
Scrape: do not put staleness marker when cache is reused (#7011)
* Scrape: do not put staleness marker when cache is reused

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-20 17:43:26 +01:00
Julien Pivotto
8907ba6235 Make TSDB use storage errors
This fixes #6992, which was introduced by #6777. There was an
intermediate component which translated TSDB errors into storage errors,
but that component was deleted and this bug went unnoticed, until we
were watching at the Prombench results. Without this, scrape will fail
instead of dropping samples or using "Add" when the series have been
garbage collected.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-17 22:24:25 +01:00
Björn Rabenstein
d80b0810c1
Move crucial actions to defer (#6918)
With defer having less of a performance penalty, there is no reason
not to do those crucial operations via defer.

Context: With isolation in place, if we forget to Commit/Rollback, the
low watermark will get stuck forever.

The current code should not have any bugs, but moving to defer helps
to avoid future bugs.

This is also moving the `closeAppend` in the `Commit` implementation
itself to defer. If logging to the WAL fails, we would have missed the
`closeAppend`.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-13 20:54:47 +01:00
Brian Brazil
5da8990053
Log scrape append failures as debug rather than warn. (#6852)
This is most likely due to an endpoint not producing valid
metrics output, which we should treat the same as a failed
scrape, and thus not spam the application logs with it.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-03-06 00:46:03 +00:00
李国忠
52025bd7a9
[comments] change word ‘wheter’ to ‘whether’ (#6912)
* [comments] change word ‘wheter’ to ‘whether’
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>

* [comments] change word ‘wheter’ to ‘whether’
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-03-02 13:51:24 +05:30
Julien Pivotto
ed623f69e2
tsdb: don't allow ingesting empty labelsets (#6891)
* tsdb: don't allow ingesting empty labelsets

When we ingest an empty labelset in the head, further blocks can not be
compacted, with the error:

```
level=error ts=2020-02-27T21:26:58.379Z caller=db.go:659 component=tsdb
msg="compaction failed" err="persist head block: write compaction:
add series: out-of-order series added with label set \"{}\" / prev:
\"{}\""
```

We should therefore reject those invalid empty labelsets upfront.

This can be reproduced with the following:

```
cat << END > prometheus.yml
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 1s
    basic_auth:
      username: test
      password: test
    metric_relabel_configs:
    - regex: ".*"
      action: labeldrop

    static_configs:
    - targets:
      - 127.0.1.1:9090
END
./prometheus --storage.tsdb.min-block-duration=1m
```
And wait a few minutes.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-02 07:18:05 +00:00
Bartlomiej Plotka
34426766d8 Unify Iterator interfaces. All point to storage now.
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.

* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only a unnecessary glue with incompatible structs.

No logic was changed, only different source of abstractions, so no need for benchmarks.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:54 +00:00
gotjosh
8b49c9285d
scrape: Add metrics to track bytes and entries in the metadata cache (#6675)
Signed-off-by: gotjosh <josue@grafana.com>
2020-01-29 11:13:18 +00:00
Julien Pivotto
fafb7940b1 Pass over scrape cache to the next scrape (#6670)
* Pass over scrape cache to the next scrape

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-22 12:13:47 +00:00
gotjosh
05842176a6 Make the scrape.metricMetadataStore interface public
To test the implementation of our metric metadata API, we need to represent various states of metadata in the scrape metadata store. That is currently not possible as the interface and method to set the store are private.

This changes the interface, list and get methods, and the SetMetadaStore function to be public.

Incidentally, the scrapeCache implementation needs to be renamed to match the new signature.

Signed-off-by: gotjosh <josue@grafana.com>
2019-12-05 10:29:58 +00:00
Geoffrey Beausire
5cb7987314 Fix relabaling collision when using exported label
When using both a label and the suffix+label in the
relabel config. It's possible that Prometheus remove
the suffx+label for no obvious reason. It's due to a
collision when merging labels from target and from
the sample.

Signed-off-by: Geoffrey Beausire <g.beausire@criteo.com>
2019-11-26 11:03:11 +01:00
Dustin Hooten
ca60bf298c React UI: Implement /targets page (#6276)
* Add LastScrapeDuration to targets endpoint

Signed-off-by: Dustin Hooten <dhooten@splunk.com>

* Add Scrape job name to targets endpoint

Signed-off-by: Dustin Hooten <dhooten@splunk.com>

* Implement the /targets page in react

Signed-off-by: Dustin Hooten <dhooten@splunk.com>

* Add state query param to targets endpoint

Signed-off-by: Dustin Hooten <dhooten@splunk.com>

* Use state filter in api call

Signed-off-by: Dustin Hooten <dhooten@splunk.com>

* api feedback

Signed-off-by: Dustin Hooten <dhooten@splunk.com>

* pr feedback frontend

Signed-off-by: Dustin Hooten <dhooten@splunk.com>

* Implement and use localstorage hook

Signed-off-by: Dustin Hooten <dhooten@splunk.com>

* PR feedback

Signed-off-by: Dustin Hooten <dhooten@splunk.com>
2019-11-11 22:42:24 +01:00
johncming
1fa5a75a3a Ctx name (#5961)
* scrape: rename ctx name for readability

Signed-off-by: johncming <johncming@yahoo.com>

* scrape: use self ctx instead of parent ctx.

Signed-off-by: johncming <johncming@yahoo.com>
2019-08-28 15:55:09 +02:00
Julius Volz
b5c833ca21
Update go.mod dependencies before release (#5883)
* Update go.mod dependencies before release

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Add issue for showing query warnings in promtool

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Revert json-iterator back to 1.1.6

It produced errors when marshaling Point values with special float
values.

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Fix expected step values in promtool tests after client_golang update

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Update generated protobuf code after proto dep updates

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-08-14 11:00:39 +02:00
Brian Brazil
e62f30d497
Correctly handle empty labels from alert templates. (#5845)
Fixes https://github.com/prometheus/common/issues/36

Move logic handling this into the labels package,
so all the cases are handled in one place and we're
less likely to have this come up again.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-08-13 11:19:17 +01:00
yeya24
b7bb278e95 make targets active parallel (#5740)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2019-07-29 17:08:54 +01:00
Simon Pasquier
9a1935d641
scrape: remove unused type (#5761)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-15 08:54:22 +02:00
Brian Brazil
b98e818876
Add scrape_series_added per-scrape metric. (#5546)
This is an estimate of churn, with series being added to the cache being
considered churn. This will have both false positives (e.g. series
appearing and disappearing) and false negatives (e.g. series hit
sample_limit, but still created in head block), but should be generally
useful as-is.

Relevant docs live in another repo.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-05-08 22:24:00 +01:00
Romain Baugue
95193fa027 Exhaust every request body before closing it (#5166) (#5479)
From the documentation:
> The default HTTP client's Transport may not
> reuse HTTP/1.x "keep-alive" TCP connections if the Body is
> not read to completion and closed.

This effectively enable keep-alive for the fixed requests.

Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>
2019-04-18 09:50:37 +01:00
Simon Pasquier
c1682adb2f Bump prometheus/common to v0.3.0 (#5344)
* Reload certificates from disk automatically

This change bumps github.com/prometheus/common to include
https://github.com/prometheus/common/pull/173

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* scrape: close idle connections on reload/stop

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* use v0.3.0 tag

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-10 13:20:00 +01:00
Brian Brazil
f7184978f4 Protect against memory exhaustion when scraping.
Now that we're not losing the scrape cache across failed
scrape, a scrape that continually failed but had varying
series or metadata (e.g. timestamps in metric names,
plus hitting smaple_limit) would grow the cache indefinitely.

Add some code to catch that, and flush the cache anyway.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-04-04 19:09:11 +01:00
Brian Brazil
dd3073616c Don't lose the scrape cache on a failed scrape.
This avoids CPU usage increasing when the target comes back.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-04-04 19:09:11 +01:00
Tariq Ibrahim
8fdfa8abea refine error handling in prometheus (#5388)
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors

Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-03-26 00:01:12 +01:00
Julien Pivotto
4397916cb2 Add honor_timestamps (#5304)
Fixes #5302

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2019-03-15 10:04:15 +00:00
xjewer
0d1a69353e scrape: Add global jitter for HA server (#5181)
* scrape: Add global jitter for HA server

Covers issue in https://github.com/prometheus/prometheus/pull/4926#issuecomment-449039848
where the HA setup become a problem for targets unable to be scraped simultaneously.
The new jitter per server relies on the hostname and external labels which necessarily to be uniq.

As before, scrape offset will be calculated with regard the absolute time, so even
restart/reload doesn't change scrape time per scrape target + prometheus instance.

Use fqdn if possible, otherwise fall back to the hostname. It adds extra random seed
to calculate server hash to be distinguish on machines with the same hostname, but
different DC.

Signed-off-by: Aleksei Semiglazov <xjewer@gmail.com>
2019-03-12 10:46:15 +00:00
Julien Pivotto
04ce817c49 scrape: Rewrite scrape loop options as a struct (#5314)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2019-03-12 10:26:18 +00:00