prometheus

Author	SHA1	Message	Date
Fabian Reinartz	057a5ae2b1	Address comments Signed-off-by: Fabian Reinartz <freinartz@google.com>	2018-06-06 11:21:17 -04:00
Fabian Reinartz	ad4c33c1ff	scrape,api: provide per-target metric metadata This adds a per-target cache of scraped metadata. The metadata is only available for the lifecycle of the attached target. An API endpoint allows selecting metadata by metric name and a label selection of targets. Signed-off-by: Fabian Reinartz <freinartz@google.com>	2018-06-06 05:56:10 -04:00
Damien Lespiau	e64037053d	Expose controller kind and name to labelling rules Relabelling rules can use this information to attach the name of the controller that has created a pod. In turn, this can be used to slice metrics by workload at query time, i.e. "Give me all metrics that have been created by the $name Deployment" Signed-off-by: Damien Lespiau <damien@weave.works>	2018-05-09 11:51:37 +02:00
Nathan Graves	5b27996cb3	Include GCE labels during service discovery. Updated vendor files for Google API. (#4150) Signed-off-by: Nathan Graves <nathan.graves@kofile.us>	2018-05-08 17:37:47 +01:00
Ben Kochie	390e260bd9	Improve wording of remote write documentation. (#3817) Reduce the use of the term `long-term`, when what we're really talking about is remote clustered storage for increased capacity and durability. Signed-off-by: Ben Kochie <superq@gmail.com>	2018-05-05 16:38:45 +01:00
Daisy T	b424eb42e3	document remote write queue parameters (#4126)	2018-04-30 20:08:45 +02:00
Brian Brazil	fbe66819c5	Update ALERTS docs for 2.0 staleness changes. (#4116) Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2018-04-26 12:44:11 +01:00
Adam Shannon	809881d7f5	support reading basic_auth password_file for HTTP basic auth (#4077) Issue: https://github.com/prometheus/prometheus/issues/4076 Signed-off-by: Adam Shannon <adamkshannon@gmail.com>	2018-04-25 18:19:06 +01:00
Julius Volz	fe10b36b30	Fix curl example for deleting series (#4046)	2018-04-05 13:06:18 +01:00
Philippe Laflamme	2aba238f31	Use common HTTPClientConfig for marathon_sd configuration (#4009) This adds support for basic authentication which closes #3090. The support for specifying the client timeout was removed as discussed in https://github.com/prometheus/common/pull/123. Marathon was the only sd mechanism doing this and configuring the timeout is done through `Context`. DC/OS uses a custom `Authorization` header for authenticating. This adds 2 new configuration properties to reflect this. Existing configuration files that use the bearer token will no longer work. More work is required to make this backwards compatible.	2018-04-05 09:08:18 +01:00
albatross0	0245fd55bf	Add a machine type label to GCE SD (#4032)	2018-03-31 09:20:19 +01:00
Kristiyan Nikolov	be85ba3842	discovery/ec2: Support filtering instances in discovery (#4011)	2018-03-31 07:51:11 +01:00
Corentin Chary	60dafd425c	consul: improve consul service discovery (#3814)

* consul: improve consul service discovery

Related to #3711

- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services` allows filtering by node-meta, and returns a `map[string]string` or `service`->`tags`). Tags and node-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important because on large clusters `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale reads can be costly, even more so when you are doing them to a remote datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the service endpoint. This is needed because of this kind of behavior from consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog on a large cluster would basically change all the time. No need to discover targets in 1sec if we scrape them every minute.
- Added plenty of unit tests.

Benchmarks
----------

```yaml
scrape_configs:
  - job_name: prometheus
    scrape_interval: 60s
    static_configs:
      - targets: ["127.0.0.1:9090"]

  - job_name: "observability-by-tag"
    scrape_interval: "60s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        tag: marathon-user-observability  # Used in After
        refresh_interval: 30s  # Used in After+delay
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: ^(.*,)?marathon-user-observability(,.*)?$
        action: keep

  - job_name: "observability-by-name"
    scrape_interval: "60s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        services:
          - observability-cerebro
          - observability-portal-web

  - job_name: "fake-fake-fake"
    scrape_interval: "15s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        services:
          - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
|/service-discovery size|5K|85MiB|27k|27k|
|`go_memstats_heap_objects`|100k|1M|120k|110k|
|`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB|
|`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s|
|`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%|
|`process_open_fds`|16|1236|22|22|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|0.03|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|80|0.5|0.5|
|`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms|
|Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps|

Filtering by tag using relabel_configs uses 100kiB and 23kiB/s per service per job and quite a lot of CPU. It also sends an additional 1Mbps of traffic to consul. Being a little bit smarter about this reduces the overhead quite a lot. Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: tweak `refresh_interval` behavior

`refresh_interval` now does what is advertised in the documentation: there won't be more than one update per `refresh_interval`. It now defaults to 30s (which was also the current waitTime in the consul query). This also makes sure we don't wait another 30s if we already waited 29s in the blocking call, by subtracting the number of elapsed seconds. Hopefully this will do what people expect it to do and will be safer for existing consul infrastructures.	2018-03-23 14:48:43 +00:00
Yecheng Fu	56ed29fbf7	Map target infos of endpoints to prometheus meta labels. (#3770)	2018-03-09 10:07:00 +00:00
Fabian Reinartz	3e6c890aea	api: add flag to skip head on snapshots	2018-03-08 13:07:12 +01:00
Jeffrey Zhang	21f96caab3	Fix wrong syntax for alert field templates (#3883)	2018-02-24 09:37:43 +00:00
Conor Broderick	99006d3baf	Added dropped targets API to targets endpoint (#3870)	2018-02-21 17:26:18 +00:00
Conor Broderick	1fd20fc954	Add dropped alertmanagers to alertmanagers API (#3865)	2018-02-21 09:00:07 +00:00
Bartek Plotka	93a63ac5fd	api: Added v1/status/flags endpoint. (#3864)

Endpoint URL: /api/v1/status/flags

Example Output:

```json
{
  "status": "success",
  "data": {
    "alertmanager.notification-queue-capacity": "10000",
    "alertmanager.timeout": "10s",
    "completion-bash": "false",
    "completion-script-bash": "false",
    "completion-script-zsh": "false",
    "config.file": "my_cool_prometheus.yaml",
    "help": "false",
    "help-long": "false",
    "help-man": "false",
    "log.level": "info",
    "query.lookback-delta": "5m",
    "query.max-concurrency": "20",
    "query.timeout": "2m",
    "storage.tsdb.max-block-duration": "36h",
    "storage.tsdb.min-block-duration": "2h",
    "storage.tsdb.no-lockfile": "false",
    "storage.tsdb.path": "data/",
    "storage.tsdb.retention": "15d",
    "version": "false",
    "web.console.libraries": "console_libraries",
    "web.console.templates": "consoles",
    "web.enable-admin-api": "false",
    "web.enable-lifecycle": "false",
    "web.external-url": "",
    "web.listen-address": "0.0.0.0:9090",
    "web.max-connections": "512",
    "web.read-timeout": "5m",
    "web.route-prefix": "/",
    "web.user-assets": ""
  }
}
```

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	2018-02-21 08:49:02 +00:00
Pedro Araújo	575f665944	Add OS type meta label to Azure SD (#3863)

There is currently no way to differentiate Windows instances from Linux ones. This is needed when you have a mix of node_exporters / wmi_exporters for OS-level metrics and you want to have them in separate scrape jobs. This change allows you to do just that.

Example:

```
- job_name: 'node'
  azure_sd_configs:
    - <azure_sd_config>
  relabel_configs:
    - source_labels: [__meta_azure_machine_os_type]
      regex: Linux
      action: keep
```

The way the vendor'd AzureSDK provides to get the OsType is a bit awkward - as far as I can tell, this information can only be gotten from the startup disk. Newer versions of the SDK appear to improve this a bit (by having OS information in the InstanceView), but the current way still works.	2018-02-19 15:40:57 +00:00
Andrea Giardini	3a9637fa3c	docs: Fix remote_read/remote_timeout default (#3829)	2018-02-12 12:52:33 +00:00
Brian Brazil	66b8bdbf4a	Fix docs for #3820 (#3823)	2018-02-11 23:35:08 +00:00
Ben Kochie	40acc632bb	Merge pull request #3505 from rdemachkovych/ansible_prom2.0 Added to documentation Ansible roles for Prometheus 2.0	2018-01-26 11:30:15 +01:00
Roman Demachkovych	8bfc611616	Remove not maintained roles	2018-01-26 09:46:44 +01:00
zemek	8a01a0fbed	Set consul server default to localhost:8500 (#3703)	2018-01-24 12:14:32 +00:00
James Turnbull	00f4821178	Added missing ingress from role list (#3666)	2018-01-08 21:23:01 +00:00
James Turnbull	380cacd3a4	Readability edits to vector matching (#3624) * Added L3 headings - makes page a little easier to read * Made use of right-hand and left-hand consistent	2017-12-26 10:28:39 +00:00
Brian Brazil	fba80da635	Fix default of read_recent to be false. (#3617) This is what is documented in the migration guide, and the default settings should make sense for a true long term storage. Document the setting.	2017-12-23 17:21:38 +00:00
James Turnbull	c3f9238756	Updated alert templating docs (#3596) The docs suggest that alert templating only works in the summary and description annotation fields. Some testing and a review of the code suggests this is no longer true and that you can template any annotation field.	2017-12-19 08:04:06 +00:00
Brian Brazil	9083d41d3a	Add 2.0 stability guarantees (#3484) As discussed generally consider SDs as unstable, as realistically they are never going to be. Drop the words "experimental/beta" from most places in the docs, as users are getting the wrong impression from this.	2017-12-14 12:54:32 +00:00
Simon Pasquier	aa25dff1ea	Update the openstack_sd_config section openstack_sd_config requires a 'role' parameter which wasn't documented.	2017-12-14 12:20:28 +00:00
Krasi Georgiev	08ee713c82	example to show the difference between "sum by" and "sum without" (#3558)	2017-12-14 12:20:28 +00:00
vthriller	b4bd91958a	[minor] docs: recording_rules: fix missing key	2017-12-14 12:20:28 +00:00
Tobias Schmidt	28205f5ca9	Remove wrong statement about alertmanager URL configuration	2017-12-14 12:20:28 +00:00
Mike Rostermund	4648f4c156	New server uses read protocol, to eh, read. (#3444)	2017-12-14 12:20:28 +00:00
Brian Brazil	e0711c2e9b	Document consul sd tls_config (#3440) Fixes https://github.com/prometheus/docs/issues/681	2017-12-14 12:20:28 +00:00
Tom Wilkie	d2f6803d14	'Prometheus lifecycle' should be a subsection of 'Miscellaneous'	2017-12-14 12:20:28 +00:00
Or Elimelech	6e8d192ba0	Wrong URL for remote.proto (#3431) Change wrong URL for remote.proto	2017-12-14 12:20:28 +00:00
phyber	013dc30dee	Fix markdown in recording rules. (#3432) Resolves an issue where rendered markdown was incorrect.	2017-12-14 12:20:28 +00:00
Tobias Schmidt	87f5fe3576	Fix migration documentation title in docs menu	2017-12-14 12:20:28 +00:00
Brian Brazil	5dff97639f	Tweak migration doc (#3430)	2017-12-14 12:20:28 +00:00
Jose Donizetti	b3b6538348	Small changes to migration guide	2017-12-14 12:20:28 +00:00
Goutham Veeramachaneni	bee6864c14	Make the date returned by snapshot script friendly Fixes #3568 Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-10 15:14:31 -06:00
Goutham Veeramachaneni	e0d917e2f5	Merge pull request #3523 from Gouthamve/clean-tomb Add endpoint to cleanup tombstones	2017-12-07 14:39:24 -06:00
Goutham Veeramachaneni	f0599d4dbf	Incorporate review-feedback Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-07 09:06:04 -06:00
James Turnbull	330735aca6	Added another full link to the configuration docs (#3553)	2017-12-07 08:31:15 +00:00
Amy Holt	607a675617	Add prefix to relative 3 URLs (#3551)	2017-12-06 21:16:53 +00:00
Goutham Veeramachaneni	311edc5a38	Merge branch 'master' into clean-tomb Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-05 10:23:21 -06:00
Goutham Veeramachaneni	d8515b2580	Move Admin APIs to v1 Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-04 00:13:43 +05:30
Goutham Veeramachaneni	41b8f1f8fe	Add admin API docs Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-02 15:37:31 +05:30

83 Commits