Commit Graph

93 Commits

Author SHA1 Message Date
Matej Gera 0acbe5e3f5
Tracing: Add additional options to align with the upstream exporter (#10276)
* Enhance configuration

Signed-off-by: Matej Gera <matejgera@gmail.com>
2022-02-22 17:07:30 +01:00
Matej Gera 2c61d29b2a
Tracing: Migrate to OpenTelemetry library (#9724)
Signed-off-by: Matej Gera <matejgera@gmail.com>
2022-01-25 11:08:04 +01:00
Julien Pivotto 77f411b2ec
Enable tls_config in oauth2 (#9550)
* Enable tls_config in oauth2

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-10-20 23:10:18 +02:00
Witek Bedyk cda2dbbef6
Add Uyuni service discovery (#8190)
* Add Uyuni service discovery

Signed-off-by: Witek Bedyk <witold.bedyk@suse.com>

Co-authored-by: Joao Cavalheiro <jcavalheiro@suse.de>
Co-authored-by: Marcelo Chiaradia <mchiaradia@suse.com>
Co-authored-by: Stefano Torresi <stefano@torresi.io>
Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
2021-10-19 01:00:44 +02:00
Julien Pivotto 8920024323 Add PuppetDB service discovery
We have been Puppet user for 10 years and we are users of
https://github.com/camptocamp/prometheus-puppetdb-sd

However, that file_sd implementation contains business logic and
assumptions around e.g. the modules which you are using.

This pull request adds a simple PuppetDB service discovery, which will
enable more use cases than the upstream sd.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-09-16 16:54:26 +02:00
austin ce bbc951f50b
Add config tests for kuma SD
Signed-off-by: austin ce <austin.cawley@gmail.com>
2021-07-21 12:55:02 -04:00
Michal Wasilewski 3f686cad8b
fixes yamllint errors
Signed-off-by: Michal Wasilewski <mwasilewski@gmx.com>
2021-06-12 12:47:47 +02:00
Julien Pivotto 9444698ae2
http_sd (#8839)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-06-11 18:04:45 +02:00
Julien Pivotto 20c6739adc
Merge pull request #8833 from hanjm/feature/add-scape-read-body-limit
Add body_size_limit to prevent bad targets response large body cause Prometheus server OOM (#8827)
2021-06-02 09:24:59 +02:00
TJ Hoplock dc22c65349
Add Linode Service Discovery (#8846)
* Add Linode Service Discovery

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2021-06-01 20:32:36 +02:00
hanjm 1df05bfd49 Add body_size_limit to prevent bad targets response large body cause Prometheus server OOM (#8827)
Signed-off-by: hanjm <hanjinming@outlook.com>
2021-05-29 07:05:42 +08:00
Levi Harrison fa184a5fc3
Add OAuth 2.0 Config (#8761)
* Introduced oauth2 config into the codebase

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-04-28 14:47:52 +02:00
n888 7c028d59c2
Add lightsail service discovery (#8693)
Signed-off-by: N888 <drifto@gmail.com>
2021-04-28 11:29:12 +02:00
Robert Jacob b253056163
Implement Docker discovery (#8629)
* Implement Docker discovery

Signed-off-by: Robert Jacob <xperimental@solidproject.de>
2021-03-29 22:30:23 +02:00
Rémy Léone f690b811c5
add support for scaleway service discovery (#8555)
Co-authored-by: Patrik <patrik@ptrk.io>
Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu>

Signed-off-by: Rémy Léone <rleone@scaleway.com>
2021-03-10 15:10:17 +01:00
Julien Pivotto 8787f0aed7 Update common to support credentials type
Most of the backwards compat tests is done in common.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-02-18 23:28:22 +01:00
Nándor István Krácser 509000269a
remote_write: allow passing along custom HTTP headers (#8416)
* remote_write: allow passing along custom HTTP headers

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>

* add warning

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>

* remote_write: add header valadtion

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>

* extend tests for bad remote write headers

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>

* remote_write: add note about the authorization header

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>
2021-02-04 14:18:13 -07:00
kangwoo 7c0d5ae4e7
Add Eureka Service Discovery (#3369)
Signed-off-by: kangwoo <kangwoo@gmail.com>
2020-08-26 17:36:59 +02:00
Lukas Kämmerling b6955bf1ca
Add hetzner service discovery (#7822)
Signed-off-by: Lukas Kämmerling <lukas.kaemmerling@hetzner-cloud.de>
2020-08-21 15:49:19 +02:00
Julien Pivotto f8ec72d730 Add digitalocean test
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-22 00:04:36 +02:00
Julien Pivotto a197508d09 Add docker swarm test
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-22 00:04:36 +02:00
Callum Styan 67838643ee
Add config option for remote job name (#6043)
* Track remote write queues via a map so we don't care about index.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Support a job name for remote write/read so we can differentiate between
them using the name.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Remote write/read has Name to not confuse the meaning of the field with
scrape job names.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Split queue/client label into remote_name and url labels.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Don't allow for duplicate remote write/read configs.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Ensure we restart remote write queues if the hash of their config has
not changed, but the remote name has changed.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Include name in remote read/write config hashes, simplify duplicates
check, update test accordingly.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-12-12 12:47:23 -08:00
Julien Pivotto 4397916cb2 Add honor_timestamps (#5304)
Fixes #5302

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2019-03-15 10:04:15 +00:00
Callum Styan 83c46fd549 update Consul vendor code so that catalog.ServiceMultipleTags can be (#5151)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-03-12 10:31:27 +00:00
Simon Pasquier 027d2ece14 config: resolve more file paths (#5284)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-03-12 10:24:15 +00:00
Marcel D. Juhnke c7d83b2b6a discovery: add support for Managed Identity authentication in Azure SD (#4590)
Signed-off-by: Marcel Juhnke <marrat@marrat.de>
2018-12-19 10:03:33 +00:00
Simon Pasquier ff08c40091 discovery/openstack: support tls_config
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-09-25 14:31:32 +02:00
Tariq Ibrahim f708fd5c99 Adding support for multiple azure environments (#4569)
Signed-off-by: Tariq Ibrahim <tariq.ibrahim@microsoft.com>
2018-09-04 17:55:40 +02:00
Philippe Laflamme 2aba238f31 Use common HTTPClientConfig for marathon_sd configuration (#4009)
This adds support for basic authentication which closes #3090

The support for specifying the client timeout was removed as discussed in https://github.com/prometheus/common/pull/123. Marathon was the only sd mechanism doing this and configuring the timeout is done through `Context`.

DC/OS uses a custom `Authorization` header for authenticating. This adds 2 new configuration properties to reflect this.

Existing configuration files that use the bearer token will no longer work. More work is required to make this backwards compatible.
2018-04-05 09:08:18 +01:00
Kristiyan Nikolov be85ba3842 discovery/ec2: Support filtering instances in discovery (#4011) 2018-03-31 07:51:11 +01:00
Corentin Chary 60dafd425c consul: improve consul service discovery (#3814)
* consul: improve consul service discovery

Related to #3711

- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services`
  allow filtering by node-meta, and returns a `map[string]string` or `service`->`tags`).
  Tags and nore-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important
  because on large cluster `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale
  reads can be costly, even more when you are doing them to a remote
  datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the
  service endpoint. This is needed because of that kind of behavior from
  consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog
  on a large cluster would basically change *all* the time. No need to discover
  targets in 1sec if we scrape them every minute.
- Added plenty of unit tests.

Benchmarks
----------

```yaml
scrape_configs:

- job_name: prometheus
  scrape_interval: 60s
  static_configs:
    - targets: ["127.0.0.1:9090"]

- job_name: "observability-by-tag"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      tag: marathon-user-observability  # Used in After
      refresh_interval: 30s             # Used in After+delay
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: ^(.*,)?marathon-user-observability(,.*)?$
      action: keep

- job_name: "observability-by-name"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - observability-cerebro
        - observability-portal-web

- job_name: "fake-fake-fake"
  scrape_interval: "15s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
|/service-discovery size|5K|85MiB|27k|27k|27k|
|`go_memstats_heap_objects`|100k|1M|120k|110k|
|`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB|
|`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s|
|`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%|
|`process_open_fds`|16|*1236*|22|22|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|*0.03*|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|*80*|0.5|0.5|
|`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms|
|Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps|

Filtering by tag using relabel_configs uses **100kiB and 23kiB/s per service per job** and quite a lot of CPU. Also sends and additional *1Mbps* of traffic to consul.
Being a little bit smarter about this reduces the overhead quite a lot.
Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: tweak `refresh_interval` behavior

`refresh_interval` now does what is advertised in the documentation,
there won't be more that one update per `refresh_interval`. It now
defaults to 30s (which was also the current waitTime in the consul query).

This also make sure we don't wait another 30s if we already waited 29s
in the blocking call by substracting the number of elapsed seconds.

Hopefully this will do what people expect it does and will be safer
for existing consul infrastructures.
2018-03-23 14:48:43 +00:00
Tobias Schmidt 7098c56474 Add remote read filter option
For special remote read endpoints which have only data for specific
queries, it is desired to limit the number of queries sent to the
configured remote read endpoint to reduce latency and performance
overhead.
2017-11-13 23:30:01 +01:00
Thibault Chataigner bf4a279a91 Remote storage reads based on oldest timestamp in primary storage (#3129)
Currently all read queries are simply pushed to remote read clients.
This is fine, except for remote storage for wich it unefficient and
make query slower even if remote read is unnecessary.
So we need instead to compare the oldest timestamp in primary/local
storage with the query range lower boundary. If the oldest timestamp
is older than the mint parameter, then there is no need for remote read.
This is an optionnal behavior per remote read client.

Signed-off-by: Thibault Chataigner <t.chataigner@criteo.com>
2017-10-18 12:08:14 +01:00
Fuente, Pablo Andres 902fafb8e7 Fixing tests for Windows
Fixing the config/config_test, the discovery/file/file_test and the
promql/promql_test tests for Windows. For most of the tests, the fix involved
correct handling of path separators. In the case of the promql tests, the
issue was related to the removal of the temporal directories used by the
storage. The issue is that the RemoveAll() call returns an error when it
tries to remove a directory which is not empty, which seems to be true due to
some kind of process that is still running after closing the storage. To fix
it I added some retries to the remove of the temporal directories.
Adding tags file from Universal Ctags to .gitignore
2017-07-09 01:59:30 -03:00
Roman Vynar dbe2eb2afc Hide consul token on UI. (#2797) 2017-06-01 22:14:23 +01:00
Conor Broderick 6766123f93 Replace regex with Secret type and remarshal config to hide secrets (#2775) 2017-05-29 12:46:23 +01:00
Brian Akins 27d66628a1 Allow limiting Kubernetes service discover to certain namespaces
Allow namespace discovery to be more easily extended in the future by using a struct rather than just a list.

Rename fields for kubernetes namespace discovery
2017-04-27 07:41:36 -04:00
yklausz 75880b594f Adding consul capability to connect via tls 2017-03-17 22:37:18 +01:00
Julius Volz e9476b35d5 Re-add multiple remote writers
Each remote write endpoint gets its own set of relabeling rules.

This is based on the (yet-to-be-merged)
https://github.com/prometheus/prometheus/pull/2419, which removes legacy
remote write implementations.
2017-02-20 13:23:12 +01:00
Fabian Reinartz 7eb849e6a8 Merge pull request #2307 from joyent/triton_discovery
Add Joyent Triton discovery
2017-01-18 05:08:11 +01:00
Richard Kiene f3d9692d09 Add Joyent Triton discovery 2017-01-17 20:34:32 +00:00
Björn Rabenstein ad40d0abbc Merge pull request #2288 from prometheus/limit-scrape
Add ability to limit scrape samples, and related metrics
2017-01-08 01:34:06 +01:00
Brian Brazil 30448286c7 Add sample_limit to scrape config.
This imposes a hard limit on the number of samples ingested from the
target. This is counted after metric relabelling, to allow dropping of
problemtic metrics.

This is intended as a very blunt tool to prevent overload due to
misbehaving targets that suddenly jump in sample count (e.g. adding
a label containing email addresses).

Add metric to track how often this happens.

Fixes #2137
2016-12-16 15:10:09 +00:00
Tristan Colgate-McFarlane 4d9134e6d8 Add labeldrop and labelkeep actions. (#2279)
Introduce two new relabel actions. labeldrop, and labelkeep.
These can be used to filter the set of labels by matching regex

- labeldrop: drops all labels that match the regex
- labelkeep: drops all labels that do not match the regex
2016-12-14 10:17:42 +00:00
Fabian Reinartz 183c5749b9 config: add Alertmanager configuration 2016-11-23 18:23:37 +01:00
Fabian Reinartz ec66082749 Merge branch 'ec2_sd_profile_support' of https://github.com/Ticketmaster/prometheus into Ticketmaster-ec2_sd_profile_support 2016-11-21 11:49:23 +01:00
Kraig Amador bec6870ed4 ec2_sd_configs: Support profiles for configuring the ec2 service 2016-11-03 08:38:02 -07:00
beorn7 b2f28a9e82 Merge branch 'release-1.2' 2016-11-03 14:42:15 +01:00
Brian Brazil d1ece12c70 Handle null Regex in the config as the empty regex. (#2150) 2016-11-03 13:34:15 +00:00
bekbulatov c689b35858 Merge branch 'master' into marathon_tls 2016-10-24 10:37:32 +01:00