Commit Graph

50 Commits

Author SHA1 Message Date
Julien Pivotto
9da53391d1
Merge pull request #7739 from prometheus/release-2.20
Merge release-2.20 into the main branch after Consul fix
2020-08-04 20:15:43 +02:00
Julien Pivotto
3a7120bc07 Consul: Reduce WatchTimeout to 2m and set it as timeout for requests
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-08-03 00:42:55 +02:00
Julien Pivotto
93e9c010f3
Add more Go leak tests (#7652)
* Implement go leak test for promql

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Implement go leak test for Consul SD

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Implement go leak test in discovery manager

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-24 10:10:20 +01:00
John Bampton
98a69b77d1
Fix spelling (#7512)
Signed-off-by: John Bampton <jbampton@users.noreply.github.com>
2020-07-04 14:54:26 +02:00
Pierre Souchay
1508678001
Use 10m timeouts for watches (#7423)
use ?wait=10m will give results as fast as usual when data is changing
but will perform far less requests when services do not change.

On large infrastructure, this will reduce quite a lot the number of
qps on Consul servers while having the same performance for freshness
of results.

Signed-off-by: Pierre Souchay <p.souchay@criteo.com>
2020-06-20 20:22:45 +01:00
Mathilde Gilles
9b9c58aea8
[Consul] Add health label to metrics (#5313)
Label metrics with the target health using consul's /health endpoint.

Signed-off-by: Mathilde Gilles <m.gilles@criteo.com>
2020-02-25 13:32:30 +00:00
Simon Pasquier
fe76ccbfe3
discovery/consul: fix logging of tags (#6783)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-02-13 13:11:44 +01:00
Ben Ye
60527de355
keep consul service metrics in global variables (#6764)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2020-02-06 05:48:58 +00:00
Josh Soref
91d76c8023 Spelling (#6517)
* spelling: alertmanager

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: attributes

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: autocomplete

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: bootstrap

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: caught

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: chunkenc

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: compaction

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: corrupted

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: deletable

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: expected

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: fine-grained

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: initialized

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: iteration

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: javascript

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: multiple

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: number

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: overlapping

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: possible

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: postings

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: procedure

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: programmatic

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: queuing

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: querier

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: repairing

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: received

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: reproducible

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: retention

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: sample

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: segements

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: semantic

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: software [LICENSE]

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: staging

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: timestamp

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: unfortunately

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: uvarint

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: subsequently

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: ressamples

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2020-01-02 15:54:09 +01:00
Jean-Baptiste Le Duigou
5973227434 adding additional unit tests for getDataCenter() in consul (#6192)
* adding additional unit tests for getDataCenter() in consul

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>

* Consult Tests : update comments to start with uppercase and end with point

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>

* Consult Test : using table-driven tests

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>

* Consul Test : cleaner syntax

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>

* Consul Test : even cleaner syntax

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>

* Consul Test : update comments

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>

* Fixing naming convention by removing underscore in function name

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>

* Removing duplicated test case for getDatacenter()

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
2019-11-15 14:52:39 +01:00
Simon Pasquier
19ce6b7f5f
discovery: fix more error logs on context cancelation (#6133)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-10-18 11:48:51 +02:00
Ganesh Vernekar
5ecef3542d
Cleanup after merging tsdb into prometheus
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-08-13 14:04:14 +05:30
AllenZMC
41151ca8dc fix mis-spelling in consul_test.go (#5836)
Signed-off-by: czm <zhongming.chang@daocloud.io>
2019-08-06 06:11:41 +01:00
beorn7
dd81912554 Add objectives to Summaries
With the next release of client_golang, Summaries will not have
objectives by default. To not lose the objectives we have right now,
explicitly state the current default objectives.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-12 02:03:13 +02:00
Simon Pasquier
45506841e6
*: enable all default linters (#5504)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-05-03 15:11:28 +02:00
Tariq Ibrahim
8fdfa8abea refine error handling in prometheus (#5388)
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors

Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-03-26 00:01:12 +01:00
Simon Pasquier
782d00059a
discovery: factorize for SD based on refresh (#5381)
* discovery: factorize for SD based on refresh

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* discovery: use common metrics for refresh

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-03-25 11:54:22 +01:00
Mario Trangoni
5354ffff99 Fix some spelling issues (#5361)
See,
$ codespell -S './vendor/*,./.git*,./web/ui/static/vendor*' --ignore-words-list="uint,dur,ue,iff,te,wan"

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2019-03-14 14:38:54 +00:00
Callum Styan
83c46fd549 update Consul vendor code so that catalog.ServiceMultipleTags can be (#5151)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-03-12 10:31:27 +00:00
Simon Pasquier
f9462d5d44 discovery/consul: pass current context to Consul queries
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-18 14:23:56 +01:00
Simon Pasquier
f678e27eb6
*: use latest release of staticcheck (#5057)
* *: use latest release of staticcheck

It also fixes a couple of things in the code flagged by the additional
checks.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Use official release of staticcheck

Also run 'go list' before staticcheck to avoid failures when downloading packages.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-04 14:47:38 +01:00
Samuel Alfageme
240321acee Add taggedAddress to the labels in ConsulSD (#5001)
Useful when multiple (tagged) addresses for a node are exposed on the catalog API
Ref. https://www.consul.io/api/catalog.html#taggedaddresses

Signed-off-by: Samuel Alfageme <samuel@alfage.me>
2018-12-18 11:51:05 +01:00
Ben Kochie
c6399296dc
Fix spelling/typos (#4921)
* Fix spelling/typos

Fix spelling/typos reported by codespell/misspell.
* UK -> US spelling changes.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-27 17:44:29 +01:00
Simon Pasquier
1cd29f782c discovery/consul: close idle connections on stop
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-08-01 17:26:52 +02:00
Romain Baugue
b41be4ef52 Discovery consul service meta (#4280)
* Upgrade Consul client
* Add ServiceMeta to the labels in ConsulSD

Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>
2018-07-18 05:06:56 +01:00
Julius Volz
5cf0113762
Add "omitempty" to some SD config YAML field tags (#4338)
Especially for Kubernetes SD, this fixes a bug where the rendered
configuration says "api_server: null", which when read back is not
interpreted as an un-set API server (thus the default is not applied).

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-07-03 13:43:41 +02:00
Elif T. Kuş
57dcdfb15f Rewrote tests with testutil for several test files (#4086)
* promql: Rewrote tests with testutil for functions_test

Signed-off-by: Elif T. Kuş <elifkus@gmail.com>

* pkg/relabel: Rewrote tests with testutil for relabel_test

Signed-off-by: Elif T. Kuş <elifkus@gmail.com>

* discovery/consul: Rewrote tests with testutil for consul_test

Signed-off-by: Elif T. Kuş <elifkus@gmail.com>

* scrape: Rewrote tests with testutil for manager_test

Signed-off-by: Elif T. Kuş <elifkus@gmail.com>
2018-04-27 13:11:16 +01:00
Adam Shannon
809881d7f5 support reading basic_auth password_file for HTTP basic auth (#4077)
Issue: https://github.com/prometheus/prometheus/issues/4076

Signed-off-by: Adam Shannon <adamkshannon@gmail.com>
2018-04-25 18:19:06 +01:00
sev3ryn
cc917aee7f fix of endless loop while doing Consul service discovery. (#4044)
Reloading Prometheus configs doesn't make loop end.
It produced a goroutine leak
2018-04-05 10:41:09 +01:00
Manos Fokas
25f929b772 Yaml UnmarshalStrict implementation. (#4033)
* Updated yaml vendor package.

* remove checkOverflow duplicate in rulefmt

* remove duplicated HTTPClientConfig.Validate()

* Added yaml static check.
2018-04-04 09:07:39 +01:00
Corentin Chary
60dafd425c consul: improve consul service discovery (#3814)
* consul: improve consul service discovery

Related to #3711

- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services`
  allow filtering by node-meta, and returns a `map[string]string` or `service`->`tags`).
  Tags and nore-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important
  because on large cluster `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale
  reads can be costly, even more when you are doing them to a remote
  datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the
  service endpoint. This is needed because of that kind of behavior from
  consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog
  on a large cluster would basically change *all* the time. No need to discover
  targets in 1sec if we scrape them every minute.
- Added plenty of unit tests.

Benchmarks
----------

```yaml
scrape_configs:

- job_name: prometheus
  scrape_interval: 60s
  static_configs:
    - targets: ["127.0.0.1:9090"]

- job_name: "observability-by-tag"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      tag: marathon-user-observability  # Used in After
      refresh_interval: 30s             # Used in After+delay
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: ^(.*,)?marathon-user-observability(,.*)?$
      action: keep

- job_name: "observability-by-name"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - observability-cerebro
        - observability-portal-web

- job_name: "fake-fake-fake"
  scrape_interval: "15s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
|/service-discovery size|5K|85MiB|27k|27k|27k|
|`go_memstats_heap_objects`|100k|1M|120k|110k|
|`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB|
|`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s|
|`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%|
|`process_open_fds`|16|*1236*|22|22|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|*0.03*|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|*80*|0.5|0.5|
|`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms|
|Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps|

Filtering by tag using relabel_configs uses **100kiB and 23kiB/s per service per job** and quite a lot of CPU. Also sends and additional *1Mbps* of traffic to consul.
Being a little bit smarter about this reduces the overhead quite a lot.
Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: tweak `refresh_interval` behavior

`refresh_interval` now does what is advertised in the documentation,
there won't be more that one update per `refresh_interval`. It now
defaults to 30s (which was also the current waitTime in the consul query).

This also make sure we don't wait another 30s if we already waited 29s
in the blocking call by substracting the number of elapsed seconds.

Hopefully this will do what people expect it does and will be safer
for existing consul infrastructures.
2018-03-23 14:48:43 +00:00
zemek
8a01a0fbed Set consul server default to localhost:8500 (#3703) 2018-01-24 12:14:32 +00:00
Shubheksha Jalan
0471e64ad1 Use shared types from the common repo (#3674)
* refactor: use shared types from common repo, remove util/config

* vendor: add common/config

* fix nit
2018-01-11 16:10:25 +01:00
Callum Styan
97464236c7 comments with TargetProvider should read Discoverer instead (#3667) 2018-01-08 23:59:18 +00:00
Shubheksha Jalan
ec94df49d4 Refactor SD configuration to remove config dependency (#3629)
* refactor: move targetGroup struct and CheckOverflow() to their own package

* refactor: move auth and security related structs to a utility package, fix import error in utility package

* refactor: Azure SD, remove SD struct from config

* refactor: DNS SD, remove SD struct from config into dns package

* refactor: ec2 SD, move SD struct from config into the ec2 package

* refactor: file SD, move SD struct from config to file discovery package

* refactor: gce, move SD struct from config to gce discovery package

* refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil

* refactor: consul, move SD struct from config into consul discovery package

* refactor: marathon, move SD struct from config into marathon discovery package

* refactor: triton, move SD struct from config to triton discovery package, fix test

* refactor: zookeeper, move SD structs from config to zookeeper discovery package

* refactor: openstack, remove SD struct from config, move into openstack discovery package

* refactor: kubernetes, move SD struct from config into kubernetes discovery package

* refactor: notifier, use targetgroup package instead of config

* refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup

* refactor: retrieval, use targetgroup package instead of config.TargetGroup

* refactor: storage, use config util package

* refactor: discovery manager, use targetgroup package instead of config.TargetGroup

* refactor: use HTTPClient and TLS config from configUtil instead of config

* refactor: tests, use targetgroup package instead of config.TargetGroup

* refactor: fix tagetgroup.Group pointers that were removed by mistake

* refactor: openstack, kubernetes: drop prefixes

* refactor: remove import aliases forced due to vscode bug

* refactor: move main SD struct out of config into discovery/config

* refactor: rename configUtil to config_util

* refactor: rename yamlUtil to yaml_config

* refactor: kubernetes, remove prefixes

* refactor: move the TargetGroup package to discovery/

* refactor: fix order of imports
2017-12-29 21:01:34 +01:00
Callum Styan
7776527390 bump consul HTTP client timeout by 5s so it doesn't match up exactly with the consul SD watch timeout 2017-10-28 16:42:42 -07:00
Julius Volz
099df0c5f0 Migrate "golang.org/x/net/context" -> "context" (#3333)
In some places, where ctxhttp or gRPC are concerned, we still need to use the
old contexts.
2017-10-24 21:21:42 -07:00
Callum Styan
45f9f3c539 use a timeout in the HTTP client used for consul sd (#3303) 2017-10-20 16:56:30 +01:00
Marc Sluiter
6a633eece1 Added go-conntrack for monitoring http connections (#3241)
Added metrics for in- and outgoing traffic with go-conntrack.
2017-10-06 11:22:19 +01:00
Fabian Reinartz
d21f149745 *: migrate to go-kit/log 2017-09-08 22:01:51 +05:30
Joe Martin
aba41c7d0f add support for consul's node metadata 2017-07-18 16:46:16 -04:00
Roman Vynar
dbe2eb2afc Hide consul token on UI. (#2797) 2017-06-01 22:14:23 +01:00
Chris Goller
42de0ae013 Use log.Logger interface for all discovery services 2017-06-01 11:25:55 -05:00
Conor Broderick
6766123f93 Replace regex with Secret type and remarshal config to hide secrets (#2775) 2017-05-29 12:46:23 +01:00
Julius Volz
525da88c35 Merge pull request #2479 from YKlausz/consul-tls
Adding consul capability to connect via tls
2017-03-20 11:40:18 +01:00
yklausz
75880b594f Adding consul capability to connect via tls 2017-03-17 22:37:18 +01:00
Tobias Schmidt
58cd39aacd Follow golang naming conventions in discovery packages 2017-03-16 23:40:46 -03:00
Robert Neumayer
feb7670929 Add tests for consul service discovery (#2490)
* Add tests for consul service discovery

* Add license header

* Address comments

* inline variables
* check for extra error

* Fix error formatting
2017-03-15 09:33:53 +01:00
Fabian Reinartz
35da23fd82 consul: start service watch as goroutine 2016-11-27 11:01:16 +01:00
Fabian Reinartz
d19d1bcad3 discovery: move into top-level package 2016-11-22 12:56:33 +01:00