Commit Graph

382 Commits

Author SHA1 Message Date
Fabian Reinartz
76076bfb47 discovery: simplify client initialization 2016-04-30 21:07:49 +02:00
Fabian Reinartz
b5bfb502df discovery: properly check context on chan send 2016-04-30 11:57:20 +02:00
Fabian Reinartz
9f8feb9ff6 discovery: consolidate Marathon SD files 2016-04-30 11:56:11 +02:00
Fabian Reinartz
086f7caceb discovery: extract Consul shouldWatch logic 2016-04-30 11:50:19 +02:00
Fabian Reinartz
e805e68c01 discovery: sanitize Consul service discovery
This commits simplifies the SD's structure and ensures that all
channel sends are checked against a canceled context.
2016-04-30 11:50:19 +02:00
Fabian Reinartz
5837e6a97f discovery: move consul SD into own package 2016-04-25 16:56:27 +02:00
beorn7
d566808d40 Bring back logging of discarded samples
But only on DEBUG level.

Also, count and report the two cases of out-of-order timestamps on the
one hand and same timestamp but different value on the other hand
separately.
2016-04-25 16:43:52 +02:00
Fabian Reinartz
585ab6b163 Merge pull request #1494 from iamseth/master
Add discovery capability for Microsoft Azure
2016-04-21 13:49:44 +02:00
Jonathan Boulle
38098f8c95 Add missing license headers
Prometheus is Apache 2 licensed, and most source files have the
appropriate copyright license header, but some were missing it without
apparent reason. Correct that by adding it.
2016-04-13 16:08:22 +02:00
Seth Miller
0988e3b937 Add support for Azure discovery
This change adds the ability to do target discovery with Microsoft's Azure platform.
2016-04-06 22:47:02 -05:00
Fabian Reinartz
769389e559 Fix potential race in ctx intialization 2016-04-05 20:27:31 +02:00
Tobias Schmidt
e82ef154ee Remove unused code leftovers 2016-04-02 20:20:55 -04:00
stuart nelson
dbe5d18b6e Instrument scrape pool sync()
Instruments:
- duration
- count
2016-03-14 18:30:16 +01:00
stuart nelson
813f61e551 Merge pull request #1484 from prometheus/instrument-retrieval
Instrument retrieval/scrape.go
2016-03-11 12:26:00 +01:00
stuart nelson
a1ee77601a Instrument the duration of the reload function 2016-03-11 12:12:42 +01:00
Fabian Reinartz
895f2f092f Fix flaky scrape test
t
2016-03-09 16:00:33 +01:00
Fabian Reinartz
f2e359962c Sort exported targets 2016-03-08 17:12:27 +01:00
Fabian Reinartz
56fc9bdff3 Handle closed target provider channel
This fixes the case where a target provider closes the update
channel and exits before the context is canceled.
This should only be true for the static provider but it's safer
to generally handle this case.
2016-03-08 15:49:03 +01:00
beorn7
d44b83690e Fix flaky file-sd test 2016-03-07 15:39:18 +01:00
Fabian Reinartz
ddc74f712b Add sortable target list 2016-03-02 09:10:20 +01:00
Fabian Reinartz
499f4af4aa Test target URL 2016-03-01 14:49:57 +01:00
Fabian Reinartz
50c2f20756 Add targetScraper tests 2016-03-01 14:33:28 +01:00
Fabian Reinartz
1ede7b9d72 Consolidate TargetStatus into Target.
This commit simplifies the TargetHealth type and moves the target
status into the target itself. This also removes a race where error
and last scrape time could have been out of sync.
2016-03-01 14:33:21 +01:00
Fabian Reinartz
2060a0a15b Turn target group members into plain lists.
As the scrape pool deduplicates targets now, it is no longer necessary
to store a hash map for members of each group.
2016-03-01 14:33:12 +01:00
Fabian Reinartz
0d7105abee Remove scrape config from Target.
This commit removes the scrapeConfig entirely from Target.
All identity defining parameters are thus immutable now and the mutex
can be removed..

Target identity is now correctly defined by the labels and the full URL.
This in particular includes URL parameters that are not specified in the
label set.

Fingerprint is also removed from hash to remove an unnecessary tight coupling
to the common/model package.
2016-03-01 14:32:57 +01:00
Fabian Reinartz
75681b691a Extract HTTP client from Target.
The HTTP client is the same across all targets with the same
scrape configuration. Thus, this commit moves it into the scrape
pool.
2016-03-01 14:31:57 +01:00
Fabian Reinartz
9bea27ae8a Add scraping tests 2016-03-01 14:00:48 +01:00
Fabian Reinartz
76a8c6160d Deduplicate targets in scrape pool.
With this commit the scrape pool deduplicates incoming
targets before scraping them. This way multiple target providers
can produce the same target but it will be scraped only once.
2016-03-01 13:50:51 +01:00
Fabian Reinartz
84f74b9a84 Apply new scrape config on reload.
This commit updates a target set's scrape configuration
on reload. This will cause all running scrape loops to be
stopped and started again with new parameters.
2016-03-01 13:50:51 +01:00
Fabian Reinartz
02f635dc24 Remove interval/timeout from Target internals 2016-03-01 13:50:51 +01:00
Fabian Reinartz
775316f8d2 Move appender construction from Target to scrapePool 2016-03-01 13:50:51 +01:00
Fabian Reinartz
fbe251c2df Fix scrape interval length calculation 2016-03-01 13:48:36 +01:00
Fabian Reinartz
1a3253e8ed Make scrape time unambigious.
This commit changes the scraper interface to accept a timestamp
so the reported timestamp by the caller and the timestamp
attached to samples does not differ.
2016-03-01 13:48:36 +01:00
Fabian Reinartz
2bb8ef99d1 Test scrape loop behavior. 2016-03-01 13:48:36 +01:00
Fabian Reinartz
c7bbe95597 Remove outdated target tests 2016-03-01 13:48:36 +01:00
Fabian Reinartz
05de8b7f8d Extract target scraping into scrape loop.
This commit factors out the scrape loop handling into
its own data structure.
For the transition it will be directly attached to the
target.
2016-03-01 13:48:36 +01:00
Fabian Reinartz
cebba3efbb Simplify and fix TargetManager reloading 2016-03-01 13:48:36 +01:00
Fabian Reinartz
da99366f85 Consolidate Target.Update into constructor.
The Target.Update method is no longer needed.
2016-03-01 13:48:36 +01:00
Fabian Reinartz
d15adfc917 Preserve target state across reloads.
This commit moves Scraper handling into a separate scrapePool type.
TargetSets only manage TargetProvider lifecycles and sync the
retrieved updates to the scrapePool.

TargetProviders are now expected to send a full initial target set
within 5 seconds. The scrapePools preserve target state across reloads
and only drop targets after the initial set was synced.
2016-03-01 13:48:36 +01:00
Fabian Reinartz
5b30bdb610 Change TargetProvider interface.
This commit changes the TargetProvider interface to use a
context.Context and send lists of TargetGroups, rather than
single ones.
2016-03-01 13:48:36 +01:00
Fabian Reinartz
bb6dc3ff78 Remove old tests 2016-03-01 13:48:36 +01:00
Fabian Reinartz
5bfa4cdd46 Simplify target update handling.
We group providers by their scrape configuration. Each provider produces
target groups with an unique identifier.

On stopping a set of target providers we cancel the target providers,
stop scraping the targets and wait for the scrapers to finish.

On configuration reload all provider sets are stopped and new ones
are created. This will make targets disappear briefly on configuration
reload. Potentially scrapes are missed but due to the consistent
scrape intervals implemented recently, the impact is minor.
2016-03-01 13:48:36 +01:00
Jimmi Dyson
e59b7c15a3 Kubernetes SD: Fix node IP discovery 2016-03-01 12:24:52 +00:00
beorn7
33a50e69f7 Fix a deadlock
Double acquisition of the RLock usually doesn't blow up, but if the
write lock is called for between the two RLock's, we are deadlocked.

This deadlock does not exist in release-0.17, BTW.
2016-02-29 16:34:29 +01:00
beorn7
fd5108b038 Fix a targetmanager test 2016-02-22 16:43:48 +01:00
Fabian Reinartz
6df1f49c13 Remove fullLabels method and fix target updating
With recent changes to a Target's internal data representation
updating by fullLabels() assigns the additional default
instance label. This breaks target identity comparison and causes
identical targets from service discovery to be constantly swapped.
2016-02-22 13:06:30 +01:00
Fabian Reinartz
825831e98f Use fingerprint for target identity comparison
So far we were using the InstanceIdentifier to compare equality of targets.
This is not always accurate, for example for the blackbox exporter where the 
actual target is in the parameter.
2016-02-17 16:34:53 +01:00
Fabian Reinartz
66767121ab Handle scrape timeout on request.
For historic reasons we were enforcing a timeout directly
via the TCP dialer. This is no longer necessary for quite a while now.
Switching to context.Context will allow us to properly terminate
requests on shutdown as well.
2016-02-16 11:46:02 +01:00
Julius Volz
293486c7b1 Remove old superfluous calls to setLastScrape().
This is called from within the scrape()->report() flow now.

See https://github.com/prometheus/prometheus/pull/1394/files#r52945817
2016-02-15 22:42:24 +01:00
Fabian Reinartz
a0078ec84c Merge pull request #1394 from prometheus/scraperef2
Refactor and test appender modifications
2016-02-15 21:19:40 +01:00
Fabian Reinartz
463dd3ea06 Refactor target scrape reporting. 2016-02-15 18:06:15 +01:00
Fabian Reinartz
cd28b88b08 Fix wrong EOF error on successful target scraping 2016-02-15 17:23:04 +01:00
Fabian Reinartz
27d71b08d1 Factor out appender wrapping 2016-02-15 16:47:39 +01:00
Fabian Reinartz
fe7e91e2eb Make scraping offset consistent.
To evenly distribute scraping load we currently rely on random
jittering. This commit hashes over the target's identity and calculates
a consistent offset. This also ensures that scrape intervals
are constantly spaced between config/target changes.
2016-02-15 16:46:29 +01:00
Fabian Reinartz
a06bc75519 Remove occurrences of 'base' labels 2016-02-15 10:36:57 +01:00
Fabian Reinartz
0d44248fb8 Cleanup cluttered test data 2016-02-13 10:13:38 +01:00
Fabian Reinartz
65eba080a0 Cleanup internal target data 2016-02-13 10:13:38 +01:00
Julius Volz
9b6d69610a Fix various typos in comments.
Helpfully reported by
https://goreportcard.com/report/github.com/prometheus/prometheus :)
2016-02-10 03:47:00 +01:00
Julius Volz
3728b5872f Fix target update error handling.
Fixes https://github.com/prometheus/prometheus/issues/1378
2016-02-08 21:42:59 +01:00
Fabian Reinartz
1f877f3d2a Fix deadlock, structure target logging 2016-02-03 10:39:34 +01:00
Fabian Reinartz
d0d2c38c68 Fix tests for append API changes 2016-02-03 10:17:08 +01:00
Fabian Reinartz
59f1e722df Return error on sample appending 2016-02-02 14:01:44 +01:00
Björn Rabenstein
9ea3897ea7 Merge pull request #1354 from prometheus/beorn7/storage
Rework the way to communicate backpressure (AKA suspended ingestion)
2016-02-01 15:10:13 +01:00
beorn7
ec08c9a391 Rework the way to communicate backpressure (AKA suspended ingestion)
This gives up on the idea to communicate throuh the Append() call (by
either not returning as it is now or returning an error as
suggested/explored elsewhere). Here I have added a Throttled() call,
which has the advantage that it can be called before a whole _batch_
of Append()'s. Scrapes will happen completely or not at all. Same for
rule group evaluations. That's a highly desired behavior (as discussed
elsewhere). The code is even simpler now as the whole ingestion buffer
could be removed.

Logging of throttled mode has been streamlined and will create at most
one message per minute.
2016-02-01 14:45:44 +01:00
beorn7
a7408bfb47 Unify duration parsing
It's actually happening in several places (and for flags, we use the
standard Go time.Duration...). This at least reduces all our
home-grown parsing to one place (in model).
2016-01-29 15:41:50 +01:00
Jimmi Dyson
9faa7515c6 Kubernetes SD: Refactor to handle missing Kubernetes events 2016-01-19 20:49:58 +00:00
Brian Brazil
4a829e63a2 Merge pull request #1299 from PrFalken/master
Support AirBnB's Smartstack Nerve client for SD
2016-01-18 13:31:04 +00:00
Julien Dehee
061fe2f364 Support AirBnB's Smartstack Nerve client for SD
nerve's registration format differs from serverset. With this commit
there is now a dedicated treecache file in util,
and two separate files for serverset and nerve.

Reference:
https://github.com/airbnb/nerve
2016-01-18 14:07:28 +01:00
Brian Brazil
7a5f019c40 Use up/down in UI for consistency with 'up' metric. 2016-01-12 12:09:20 +00:00
Brian Brazil
6b7629be27 Merge pull request #1242 from tommyulfsparre/watcher-fix
Reduces watches in serverset
2015-12-10 10:43:57 +00:00
Jimmi Dyson
c12fb447b8 Kubernetes SD: Use first TCP service port as target port & clean up
example config

Fixes #1256
2015-12-08 10:29:40 +00:00
Tommy Ulfsparre
83e09422bf skip already watched child nodes. 2015-12-02 21:31:05 +01:00
Fabian Reinartz
29a69eecb8 Do not panic in Consul SD creation 2015-11-30 18:41:48 +01:00
Jimmi Dyson
2cca07381b KubernetesSD: Create targets for services as well as service endpoints 2015-11-18 14:15:39 +00:00
Brian Brazil
427bf29db1 Add in default port after relabelling.
For the SNMP and blackbox exporters where
the ports tends to not be 80/443 and indeed
there may not be a port this makes the relabelling
a bit simpler as you don't have to figure out this
logic exists and strip off the :80.

This is a breaking change for the example configs of
those exporters.
2015-11-08 11:42:18 +00:00
Brian Brazil
fd2bd81cd8 Allow all instance labels in target groups
With the blackbox exporter, the instance label will commonly
be used for things other than hostnames so remove this restriction.
https://example.com or https://example.com/probe/me are some examples.

To prevent user error, check that urls aren't provided as targets
when there's no relabelling that could potentically fix them.
2015-11-07 14:35:20 +00:00
Fabian Reinartz
9cad147265 Merge pull request #1172 from federicobaldo/ec2_sd_improvements
Minor improvements to ec2 service discovery
2015-11-04 13:02:51 +01:00
Federico Baldo
d14d2429ea Minor improvements to ec2 sd:
1. static credentials replaced with defaults.DefaultChainCredentials.
This change ensures that credentials are sourced form all possible
providers available with the aws sdk,           in the following order:
env variables, shared awsconfig file in user folder, ec2 instance role.

2. Added a few labels: AvailabilityZone, PublicDns, VpcId (if
available), SubnetId (if in Vpc)
2015-11-02 14:55:24 +01:00
Jimmi Dyson
87940ec213 Kubernetes SD: Rename masters to api_servers in config 2015-10-24 14:41:14 +01:00
Jimmi Dyson
7ff5cc66ea Kubernetes SD authentication options cleanup 2015-10-23 16:47:52 +01:00
Jimmi Dyson
ea9a173008 Kubernetes SD: Use node name as instance label 2015-10-12 21:26:09 +01:00
Julius Volz
d88aea7e6f Fix SD mechanism source prefix handling.
The prefixed target provider changed a pointerized target group that was
reused in the wrapped target provider, causing an ever-increasing chain
of source prefixes in target groups from the Consul target provider.

We now make this bug generally impossible by switching the target group
channel from pointer to value type and thus ensuring that target groups
are copied before being passed on to other parts of the system.

I tried to not let the depointerization leak too far outside of the
channel handling (both upstream and downstream) because I tried that
initially and caused some nasty bugs, which I want to minimize.

Fixes https://github.com/prometheus/prometheus/issues/1083
2015-10-09 14:08:22 +02:00
Julius Volz
dec9fc9c32 Merge pull request #1148 from prometheus/fix-serverset-multiple-paths
Fix watching multiple Zookeeper paths in serverset SD.
2015-10-08 19:27:06 +02:00
Matt Jibson
dcb4856d72 Add SD for Amazon EC2 instances 2015-10-06 18:36:17 -04:00
Julius Volz
60cf4015a4 Fix watching multiple Zookeeper paths in serverset SD.
Fix https://github.com/prometheus/prometheus/issues/1137
2015-10-06 15:54:54 +02:00
Fabian Reinartz
e3b6ec9784 Switch to common/log 2015-10-03 10:21:43 +02:00
Jimmi Dyson
0d61605526 Kubernetes SD example: separate out cluster level components & services 2015-09-29 11:22:18 +01:00
Julius Volz
99e8fff872 Fix target manager CPU busyloop caused by bad done-channel handling.
Unfortunately this isn't nicely testable, as it's timing-dependent and
one would have to detect a stray goroutine doing a CPU busyloop...

Fixes https://github.com/prometheus/prometheus/issues/1114
2015-09-28 11:51:16 +02:00
Fabian Reinartz
097d810f37 Merge pull request #1120 from prometheus/flaky-test
retrieval: Reduce flakiness of TestTargetRunScraperScrapes
2015-09-28 09:57:16 +02:00
Brian Brazil
ba6688bfce retrieval: Reduce flakiness of TestTargetRunScraperScrapes 2015-09-28 08:34:54 +01:00
Brian Brazil
b03569267e retrieval: Add URL parameters to fullLabels too
Move all the special cases into one map, rather than
spreading the logic around.
2015-09-26 16:59:24 +01:00
Brian Brazil
50258929ac Retrieval: Show error message for failed test scrape
This is flaky, and I suspect it was due the to I/O timeout that I've
already fixed. In case that wasn't it, display the error should it
happen again.
2015-09-23 09:24:50 +01:00
Brian Brazil
4bc39dc60e retrieval: Reduce flakiness of TestTargetManagerChan
This will increase test time by a few hundred ms,
this is the 2nd most common cause of flakiness.
2015-09-23 09:00:37 +01:00
Brian Brazil
93145b960a retrieval: Reduce flakiness of target tests
Bump timeouts of tests where we don't want I/O timeouts.

Adjust the full channel test to be much more reliable,
by reducing the ingestion timeout from 1ms to 0.
2015-09-22 19:23:36 +01:00
Fabian Reinartz
cac6eea434 Merge pull request #1105 from prometheus/consulnil
Fix nil panic on consul error
2015-09-22 14:55:31 +02:00
Fabian Reinartz
327152862c Update expfmt.NewDecoder usage 2015-09-22 12:11:28 +02:00
Fabian Reinartz
1ce89a4a0b Fix nil panic on consul error 2015-09-22 09:04:31 +02:00
Julius Volz
af513468eb Fix some dead code, missing error checks, shadowings.
I applied
https://medium.com/@jgautheron/quality-pipeline-for-go-projects-497e34d6567
and was greeted with a deluge of warnings, most of which were not
applicable or really fixable realistically. These are some of the first
ones I decided to fix.
2015-09-14 12:21:34 +02:00
Jimmi Dyson
7ef9399920 Clean up kubernetes http response bodies 2015-09-11 11:44:28 +01:00
Anders Daljord Morken
9fb65a91af Close HTTP connections on HTTP errors too.
Move defer resp.Body.Close() up to make sure it's called even when the
HTTP request returns something other than 200 or Decoder construction
fails. This avoids leaking and eventually running out of file descriptors.
2015-09-10 22:41:05 +02:00