From 0e2004f6fbe9eeddcd0c0084e401b5acb61760a4 Mon Sep 17 00:00:00 2001 From: Harold Dost Date: Mon, 4 May 2020 12:49:45 +0200 Subject: [PATCH] Simplify the Getting Started documentation. (#7193) - Reduce the level of entry to start gathering metrics with prometheus by suggesting to just download pre-built exporters instead of requiring the reader to download an entire Golang build chain and checkout a project. Fix #6956 Signed-off-by: Harold Dost --- docs/getting_started.md | 56 ++++++++++++++++------------------------- 1 file changed, 22 insertions(+), 34 deletions(-) diff --git a/docs/getting_started.md b/docs/getting_started.md index cfc77b50b..babb242c9 100644 --- a/docs/getting_started.md +++ b/docs/getting_started.md @@ -118,7 +118,7 @@ For more about the expression language, see the To graph expressions, navigate to http://localhost:9090/graph and use the "Graph" tab. -For example, enter the following expression to graph the per-second rate of chunks +For example, enter the following expression to graph the per-second rate of chunks being created in the self-scraped Prometheus: ``` @@ -132,36 +132,26 @@ Experiment with the graph range parameters and other settings. Let us make this more interesting and start some example targets for Prometheus to scrape. -The Go client library includes an example which exports fictional RPC latencies -for three services with different latency distributions. - -Ensure you have the [Go compiler installed](https://golang.org/doc/install) and -have a [working Go build environment](https://golang.org/doc/code.html) (with -correct `GOPATH`) set up. - -Download the Go client library for Prometheus and run three of these example -processes: +The Node Exporter is used as an example target, for more information on using it +[see these instructions.](https://prometheus.io/docs/guides/node-exporter/) ```bash -# Fetch the client library code and compile example. -git clone https://github.com/prometheus/client_golang.git -cd client_golang/examples/random -go get -d -go build +tar -xzvf node_exporter-*.*.tar.gz +cd node_exporter-*.* # Start 3 example targets in separate terminals: -./random -listen-address=:8080 -./random -listen-address=:8081 -./random -listen-address=:8082 +./node_exporter --web.listen-address 127.0.0.1:8080 +./node_exporter --web.listen-address 127.0.0.1:8081 +./node_exporter --web.listen-address 127.0.0.1:8082 ``` You should now have example targets listening on http://localhost:8080/metrics, http://localhost:8081/metrics, and http://localhost:8082/metrics. -## Configuring Prometheus to monitor the sample targets +## Configure Prometheus to monitor the sample targets Now we will configure Prometheus to scrape these new targets. Let's group all -three endpoints into one job called `example-random`. However, imagine that the +three endpoints into one job called `node`. However, imagine that the first two endpoints are production targets, while the third one represents a canary instance. To model this in Prometheus, we can add several groups of endpoints to a single job, adding extra labels to each group of targets. In @@ -173,7 +163,7 @@ section in your `prometheus.yml` and restart your Prometheus instance: ```yaml scrape_configs: - - job_name: 'example-random' + - job_name: 'node' # Override the global default and scrape targets from this job every 5 seconds. scrape_interval: 5s @@ -189,8 +179,7 @@ scrape_configs: ``` Go to the expression browser and verify that Prometheus now has information -about time series that these example endpoints expose, such as the -`rpc_durations_seconds` metric. +about time series that these example endpoints expose, such as `node_cpu_seconds_total`. ## Configure rules for aggregating scraped data into new time series @@ -198,27 +187,26 @@ Though not a problem in our example, queries that aggregate over thousands of time series can get slow when computed ad-hoc. To make this more efficient, Prometheus allows you to prerecord expressions into completely new persisted time series via configured recording rules. Let's say we are interested in -recording the per-second rate of example RPCs -(`rpc_durations_seconds_count`) averaged over all instances (but -preserving the `job` and `service` dimensions) as measured over a window of 5 -minutes. We could write this as: +recording the per-second rate of cpu time (`node_cpu_seconds_total`) averaged +over all cpus per instance (but preserving the `job`, `instance` and `mode` +dimensions) as measured over a window of 5 minutes. We could write this as: ``` -avg(rate(rpc_durations_seconds_count[5m])) by (job, service) +avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m])) ``` Try graphing this expression. To record the time series resulting from this expression into a new metric -called `job_service:rpc_durations_seconds_count:avg_rate5m`, create a file +called `job_instance_mode:node_cpu_seconds:avg_rate5m`, create a file with the following recording rule and save it as `prometheus.rules.yml`: ``` groups: -- name: example +- name: cpu-node rules: - - record: job_service:rpc_durations_seconds_count:avg_rate5m - expr: avg(rate(rpc_durations_seconds_count[5m])) by (job, service) + - record: job_instance_mode:node_cpu_seconds:avg_rate5m + expr: avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m])) ``` To make Prometheus pick up this new rule, add a `rule_files` statement in your `prometheus.yml`. The config should now @@ -245,7 +233,7 @@ scrape_configs: static_configs: - targets: ['localhost:9090'] - - job_name: 'example-random' + - job_name: 'node' # Override the global default and scrape targets from this job every 5 seconds. scrape_interval: 5s @@ -261,5 +249,5 @@ scrape_configs: ``` Restart Prometheus with the new configuration and verify that a new time series -with the metric name `job_service:rpc_durations_seconds_count:avg_rate5m` +with the metric name `job_instance_mode:node_cpu_seconds:avg_rate5m` is now available by querying it through the expression browser or graphing it.