Simplify the Getting Started documentation. (#7193)

- Reduce the level of entry to start gathering metrics with prometheus
  by suggesting to just download pre-built exporters instead of requiring
  the reader to download an entire Golang build chain and checkout a project.

Fix #6956

Signed-off-by: Harold Dost <h.dost@criteo.com>
This commit is contained in:
Harold Dost 2020-05-04 12:49:45 +02:00 committed by GitHub
parent 7ecd2d1c24
commit 0e2004f6fb
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 22 additions and 34 deletions

View File

@ -132,36 +132,26 @@ Experiment with the graph range parameters and other settings.
Let us make this more interesting and start some example targets for Prometheus
to scrape.
The Go client library includes an example which exports fictional RPC latencies
for three services with different latency distributions.
Ensure you have the [Go compiler installed](https://golang.org/doc/install) and
have a [working Go build environment](https://golang.org/doc/code.html) (with
correct `GOPATH`) set up.
Download the Go client library for Prometheus and run three of these example
processes:
The Node Exporter is used as an example target, for more information on using it
[see these instructions.](https://prometheus.io/docs/guides/node-exporter/)
```bash
# Fetch the client library code and compile example.
git clone https://github.com/prometheus/client_golang.git
cd client_golang/examples/random
go get -d
go build
tar -xzvf node_exporter-*.*.tar.gz
cd node_exporter-*.*
# Start 3 example targets in separate terminals:
./random -listen-address=:8080
./random -listen-address=:8081
./random -listen-address=:8082
./node_exporter --web.listen-address 127.0.0.1:8080
./node_exporter --web.listen-address 127.0.0.1:8081
./node_exporter --web.listen-address 127.0.0.1:8082
```
You should now have example targets listening on http://localhost:8080/metrics,
http://localhost:8081/metrics, and http://localhost:8082/metrics.
## Configuring Prometheus to monitor the sample targets
## Configure Prometheus to monitor the sample targets
Now we will configure Prometheus to scrape these new targets. Let's group all
three endpoints into one job called `example-random`. However, imagine that the
three endpoints into one job called `node`. However, imagine that the
first two endpoints are production targets, while the third one represents a
canary instance. To model this in Prometheus, we can add several groups of
endpoints to a single job, adding extra labels to each group of targets. In
@ -173,7 +163,7 @@ section in your `prometheus.yml` and restart your Prometheus instance:
```yaml
scrape_configs:
- job_name: 'example-random'
- job_name: 'node'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
@ -189,8 +179,7 @@ scrape_configs:
```
Go to the expression browser and verify that Prometheus now has information
about time series that these example endpoints expose, such as the
`rpc_durations_seconds` metric.
about time series that these example endpoints expose, such as `node_cpu_seconds_total`.
## Configure rules for aggregating scraped data into new time series
@ -198,27 +187,26 @@ Though not a problem in our example, queries that aggregate over thousands of
time series can get slow when computed ad-hoc. To make this more efficient,
Prometheus allows you to prerecord expressions into completely new persisted
time series via configured recording rules. Let's say we are interested in
recording the per-second rate of example RPCs
(`rpc_durations_seconds_count`) averaged over all instances (but
preserving the `job` and `service` dimensions) as measured over a window of 5
minutes. We could write this as:
recording the per-second rate of cpu time (`node_cpu_seconds_total`) averaged
over all cpus per instance (but preserving the `job`, `instance` and `mode`
dimensions) as measured over a window of 5 minutes. We could write this as:
```
avg(rate(rpc_durations_seconds_count[5m])) by (job, service)
avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))
```
Try graphing this expression.
To record the time series resulting from this expression into a new metric
called `job_service:rpc_durations_seconds_count:avg_rate5m`, create a file
called `job_instance_mode:node_cpu_seconds:avg_rate5m`, create a file
with the following recording rule and save it as `prometheus.rules.yml`:
```
groups:
- name: example
- name: cpu-node
rules:
- record: job_service:rpc_durations_seconds_count:avg_rate5m
expr: avg(rate(rpc_durations_seconds_count[5m])) by (job, service)
- record: job_instance_mode:node_cpu_seconds:avg_rate5m
expr: avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))
```
To make Prometheus pick up this new rule, add a `rule_files` statement in your `prometheus.yml`. The config should now
@ -245,7 +233,7 @@ scrape_configs:
static_configs:
- targets: ['localhost:9090']
- job_name: 'example-random'
- job_name: 'node'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
@ -261,5 +249,5 @@ scrape_configs:
```
Restart Prometheus with the new configuration and verify that a new time series
with the metric name `job_service:rpc_durations_seconds_count:avg_rate5m`
with the metric name `job_instance_mode:node_cpu_seconds:avg_rate5m`
is now available by querying it through the expression browser or graphing it.