Commit Graph

59 Commits

Author SHA1 Message Date
Björn Rabenstein
9ea3897ea7 Merge pull request #1354 from prometheus/beorn7/storage
Rework the way to communicate backpressure (AKA suspended ingestion)
2016-02-01 15:10:13 +01:00
beorn7
ec08c9a391 Rework the way to communicate backpressure (AKA suspended ingestion)
This gives up on the idea to communicate throuh the Append() call (by
either not returning as it is now or returning an error as
suggested/explored elsewhere). Here I have added a Throttled() call,
which has the advantage that it can be called before a whole _batch_
of Append()'s. Scrapes will happen completely or not at all. Same for
rule group evaluations. That's a highly desired behavior (as discussed
elsewhere). The code is even simpler now as the whole ingestion buffer
could be removed.

Logging of throttled mode has been streamlined and will create at most
one message per minute.
2016-02-01 14:45:44 +01:00
beorn7
a7408bfb47 Unify duration parsing
It's actually happening in several places (and for flags, we use the
standard Go time.Duration...). This at least reduces all our
home-grown parsing to one place (in model).
2016-01-29 15:41:50 +01:00
Brian Brazil
ba6688bfce retrieval: Reduce flakiness of TestTargetRunScraperScrapes 2015-09-28 08:34:54 +01:00
Brian Brazil
50258929ac Retrieval: Show error message for failed test scrape
This is flaky, and I suspect it was due the to I/O timeout that I've
already fixed. In case that wasn't it, display the error should it
happen again.
2015-09-23 09:24:50 +01:00
Brian Brazil
93145b960a retrieval: Reduce flakiness of target tests
Bump timeouts of tests where we don't want I/O timeouts.

Adjust the full channel test to be much more reliable,
by reducing the ingestion timeout from 1ms to 0.
2015-09-22 19:23:36 +01:00
Jimmi Dyson
a1574aa2b3 Move TLS options to scrape config
Fixes #1013, fixes #989
2015-09-09 09:52:21 +01:00
Julius Volz
f63a899744 Change config regexes to full-string matches.
This anchors all regular expressions entered via the config to match a
full string vs. a substring.

THIS IS A BREAKING CHANGE!

Fixes part of https://github.com/prometheus/prometheus/issues/996
2015-09-01 15:46:41 +02:00
Julius Volz
963ad82dcb Fix "go vet" errors.
I ignored all errors of the type "composite literal uses unkeyed
fields". Most of them are wrong because of
https://github.com/golang/go/issues/9171.
2015-08-26 02:05:04 +02:00
Fabian Reinartz
3a0145c09e Reenable blocked appending tests 2015-08-22 09:47:57 +02:00
Fabian Reinartz
306e8468a0 Switch from client_golang/model to common/model 2015-08-21 13:33:38 +02:00
Jimmi Dyson
923f8111d4 Initial Kubernetes discovery
Fixes #904
2015-08-13 10:38:52 +01:00
Will Rouesnel
7810448dbe Add proxy_url parameter to allow specifying per-job HTTP proxy servers
Allow scrape_configs to have an optional proxy_url option which specifies
a proxy to be used for all connections to hosts in that config.

Internally this modifies the various client functions to take a *url.URL pointer
which currently must point to an HTTP proxy (but has been left open-ended to
allow the url format to be extended to support others, such as maybe SOCKS if
needed).
2015-08-08 04:29:27 +10:00
Jimmi Dyson
da4c50a6cf Make scheme relabelable via discovery 2015-08-06 12:00:33 +01:00
Jimmi Dyson
52cf6b3e6e Configuration options for bearer tokens, client certs & CA certs
Fixes #918, fixes #917
2015-08-04 17:18:46 +01:00
Brian Brazil
d8875d17d8 Retrieval: Make it possible to relabel query params
This only allows relabelling the first value
for a given parameter, this should be sufficient in practice.
2015-07-31 10:09:28 +01:00
Fabian Reinartz
c292979374 retrieval: double timeout in target scrape test. 2015-06-23 21:59:55 +02:00
Fabian Reinartz
dc7d27ab9a retrieval: add honor label handling and parametrized querying.
This commit adds the honor_labels and params arguments to the scrape
config. This allows to specify query parameters used by the scrapers
and handling scraped labels with precedence.
2015-06-23 13:45:14 +02:00
Brian Brazil
0dbae36d36 Allow ingested metrics to be relabeled.
The main purpose of this is to allow for blacklisting
of expensive metrics as a tactical option.
It could also find uses for renaming and removing labels
from federation.
2015-06-13 15:18:27 +01:00
Brian Brazil
58ceae82bc Revert "Allow ingested metrics to be relabeled."
This reverts commit f2f26ca08f.

Was accidentally pushed to master instead of a branch for PR.
2015-06-12 22:12:26 +01:00
Brian Brazil
f2f26ca08f Allow ingested metrics to be relabeled.
The main purpose of this is to allow for blacklisting
of expensive metrics as a tactical option.
It could also find uses for renaming and removing labels
from federation.
2015-06-12 22:06:30 +01:00
Fabian Reinartz
0de6edbdfc Move pkg/ to util/ 2015-06-01 21:12:32 +02:00
Fabian Reinartz
dfaf31a1da Move web/httputils to pkg/httputil and add DeadlineClient to it 2015-06-01 21:12:31 +02:00
Fabian Reinartz
8de50619f1 Increase target test wait times
On slow systems such as Travis CI occasionally the tests fail
because the wait times are too short.
2015-05-19 12:06:52 +02:00
Fabian Reinartz
385919a65a Avoid inter-component blocking if ingestion/scraping blocks.
Appending to the storage can block for a long time. Timing out
scrapes can also cause longer blocks. This commit avoids that those
blocks affect other compnents than the target itself.
Also the Target interface was removed.
2015-05-18 17:58:51 +02:00
Fabian Reinartz
1a2d57b45c Move template functionality out of target.
The target implementation and interface contain methods only serving a
specific purpose of the templates. They were moved to the template
as they operate on more fundamental target data.
2015-05-18 13:35:43 +02:00
Fabian Reinartz
dbc08d390e Move target status data into its own object 2015-05-18 11:15:42 +02:00
Fabian Reinartz
93548a8882 Add initial file based service discovery.
This commits adds file based service discovery which reads target
groups from specified files. It detects changes based on file watches
and regular refreshes.
2015-05-15 14:44:54 +02:00
Fabian Reinartz
5fbde88919 Switch config to YAML format. 2015-05-07 16:52:14 +02:00
Fabian Reinartz
0b619b46d6 Change JobConfig to ScrapeConfig.
This commit changes the configuration interface from job configs to scrape
configs. This includes allowing multiple ways of target definition at once
and moving DNS SD to its own config message. DNS SD can now contain multiple
DNS names per configured discovery.
2015-04-28 23:18:55 +02:00
Fabian Reinartz
5015c2a0e8 Make target manager source based.
This commit shifts responsibility for maintaining targets from providers and
pools to the target manager. Target groups have a source name that identifies
them for updates.
2015-04-24 15:49:35 +02:00
beorn7
fa1935a644 Remove /api/targets call and do not show job and instance labels on status.
/api/targets was undocumented and never used and also broken.

Showing instance and job labels on the status page (next to targets)
does not make sense as those labels are set in an obvious way.

Also add a doc comment to TargetStateToClass.
2015-03-18 18:53:43 +01:00
beorn7
be11cb2b07 Remove the sample ingestion channel.
The one central sample ingestion channel has caused a variety of
trouble. This commit removes it. Targets and rule evaluation call an
Append method directly now. To incorporate multiple storage backends
(like OpenTSDB), storage.Tee forks the Append into two different
appenders.

Note that the tsdb queue manager had its own queue anyway. It was a
queue after a queue... Much queue, so overhead...

Targets have their own little buffer (implemented as a channel) to
avoid stalling during an http scrape. But a new scrape will only be
started once the old one is fully ingested.

The contraption of three pipelined ingesters was removed. A Target is
an ingester itself now. Despite more logic in Target, things should be
less confusing now.

Also, remove lint and vet warnings in ast.go.
2015-03-15 14:08:22 +01:00
Julius Volz
140eede5e0 Rename UNREACHABLE to UNHEALTHY.
The current wording suggests that a target is not reachable at all,
although it might also get set when the target was reachable, but there
was some other error during the scrape (invalid headers or invalid
scrape content). UNHEALTHY is a more general wording that includes all
these cases.

For consistency, ALIVE is also renamed to HEALTHY.
2015-03-07 23:18:18 +01:00
Sergiusz 'q3k' Bazański
0d0bb3c030 Change instance identifiers to be host:port
This changes the PublicURL function into InstanceIdentifier, which now
returns a simple <host>:<port> string instead of a full URL.
2015-02-20 16:21:13 +01:00
Sergiusz 'q3k' Bazański
bb69a3d284 Hide HTTP auth parts from URL
This  instroduces an extra function in the Target interface (PublicURL)
which is used to populate the instance field in scraped metrics.
2015-02-19 18:58:47 +01:00
beorn7
0f191629c6 Next try to deal with backed-up ingestion.
This is now not even trying to throttle in a benign way, but creates a
fully-fledged error. Advantage: It shows up very visible on the status
page. Disadvantage: The server does not really adjusts to a lower
scraping rate. However, if your ingestion backs up, you are in a very
irregulare state, I'd say it _should_ be considered an error and not
dealt with in a more graceful way.

In different news: I'll work on optimizing ingestion so that we will
not as easily run into that situation in the first place.
2015-02-09 17:32:47 +01:00
Bjoern Rabenstein
5859b74f1b Clean up license issues.
- Move CONTRIBUTORS.md to the more common AUTHORS.
- Added the required NOTICE file.
- Changed "Prometheus Team" to "The Prometheus Authors".
- Reverted the erroneous changes to the Apache License.
2015-01-21 20:07:45 +01:00
Julius Volz
d6b9e97655 Remove extraction.Result type, simplify code. 2015-01-08 16:34:01 +01:00
Brian Brazil
e56786b221 Have scrape time as a pseudovariable, not a prometheus variable.
This ensures it has the right timestamp, and is easier to work with.

Switch sd variable away from 'outcome', using total/failed instead.
2014-12-27 00:39:33 +00:00
Johannes 'fish' Ziemke
ff95a52b0f Rename Address to URL
The "Address" is actually a URL which may contain username and
password. Calling this Address is misleading so we rename it.

Change-Id: I441c7ab9dfa2ceedc67cde7a47e6843a65f60511
2014-12-18 12:18:16 +01:00
Bjoern Rabenstein
b1e4956142 Apply a giant code cleanup.
Essentially:

- Remove unused code.

- Make it 'go vet' clean. The only remaining warnings are in generated code.

- Make it 'golint' clean. The only remaining warnings are in gerenated code.

- Smoothed out same minor things.

Change-Id: I3fe5c1fbead27b0e7a9c247fee2f5a45bc2d42c6
2014-12-10 16:16:49 +01:00
Bjoern Rabenstein
fee88a7a77 Remove the remaining races, new and old.
Also, resolve a few other TODOs.

Change-Id: Icb39b5a5e8ca22ebcb48771cd8951c5d9e112691
2014-12-03 18:07:23 +01:00
Bjoern Rabenstein
14bda4180c Changes after pair code review.
Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455
2014-11-25 17:12:59 +01:00
Brian Brazil
5edf689133 Stagger scrapes to spread out load.
Change-Id: Ib141b271e4adfb817886871f86051c207b05cf35
2014-11-25 17:02:00 +01:00
Brian Brazil
4a2b96f848 Remove backoff on scrape failure.
Having metrics with variable timestamps inconsistently
spaced when things fail will make it harder to write correct rules.

Update status page, requires some refactoring to insert a function.

Change-Id: Ie1c586cca53b8f3b318af8c21c418873063738a8
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein
814e479723 Treat non-200 HTTP response as error.
Change-Id: I2a9f3b47012b3c4839be53aa44c66d16dd41a24a
2014-11-25 17:01:59 +01:00
Julius Volz
740d448983 Use custom timestamp type for sample timestamps and related code.
So far we've been using Go's native time.Time for anything related to sample
timestamps. Since the range of time.Time is much bigger than what we need, this
has created two problems:

- there could be time.Time values which were out of the range/precision of the
  time type that we persist to disk, therefore causing incorrectly ordered keys.
  One bug caused by this was:

  https://github.com/prometheus/prometheus/issues/367

  It would be good to use a timestamp type that's more closely aligned with
  what the underlying storage supports.

- sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit
  Unix timestamp (possibly even a 32-bit one). Since we store samples in large
  numbers, this seriously affects memory usage. Furthermore, copying/working
  with the data will be faster if it's smaller.

*MEMORY USAGE RESULTS*
Initial memory usage comparisons for a running Prometheus with 1 timeseries and
100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my
tests, this advantage for some reason decreased a bit the more samples the
timeseries had (to 5-7% for millions of samples). This I can't fully explain,
but perhaps garbage collection issues were involved.

*WHEN TO USE THE NEW TIMESTAMP TYPE*
The new clientmodel.Timestamp type should be used whenever time
calculations are either directly or indirectly related to sample
timestamps.

For example:
- the timestamp of a sample itself
- all kinds of watermarks
- anything that may become or is compared to a sample timestamp (like the timestamp
  passed into Target.Scrape()).

When to still use time.Time:
- for measuring durations/times not related to sample timestamps, like duration
  telemetry exporting, timers that indicate how frequently to execute some
  action, etc.

*NOTE ON OPERATOR OPTIMIZATION TESTS*
We don't use operator optimization code anymore, but it still lives in
the code as dead code. It still has tests, but I couldn't get all of them to
pass with the new timestamp format. I commented out the failing cases for now,
but we should probably remove the dead code soon. I just didn't want to do that
in the same change as this.

Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f
2013-12-03 09:11:28 +01:00
Julius Volz
d69b85e6c9 Add global label support via Ingesters. 2013-08-13 16:54:15 +02:00
Matt T. Proud
a5141e4d0a Depointerize storage conf. and chain ingester.
The storage builders need to work with the assumption that they have
a copy of the underlying configuration data if any mutations are made.
2013-08-12 17:07:03 +02:00