Callum Styan 6f69e31398 Tail the TSDB WAL for remote_write
This change switches the remote_write API to use the TSDB WAL.  This should reduce memory usage and prevent sample loss when the remote endpoint is down.

We use the new LiveReader from TSDB to tail WAL segments.  Logic for finding the tracking segment is included in this PR.  The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.

Enqueuing a sample for sending via remote_write can now block, providing back pressure.  Queues are still required to achieve parallelism and batching.  We have updated the queue config with new defaults for queue capacity and pending samples; much smaller values are now possible.  The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.
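A minimal sketch of how a blocking enqueue gives back pressure while sharded queues keep parallelism. The `sample` and `shards` types, and picking a shard as series reference modulo shard count, are assumptions for illustration rather than the actual remote_write queue manager code.

```go
package main

// sample is a hypothetical stand-in for one remote_write sample.
type sample struct {
	ref uint64  // series reference, used to pick a shard
	t   int64   // timestamp in milliseconds
	v   float64 // sample value
}

// shards holds one bounded queue per sender goroutine.
type shards struct {
	queues []chan sample
}

func newShards(n, capacity int) *shards {
	s := &shards{queues: make([]chan sample, n)}
	for i := range s.queues {
		s.queues[i] = make(chan sample, capacity)
	}
	return s
}

// shardFor maps a series to a fixed shard so that samples of the same
// series are sent in order.
func (s *shards) shardFor(ref uint64) chan sample {
	return s.queues[ref%uint64(len(s.queues))]
}

// enqueue blocks while the target queue is full, so a slow remote end
// stalls the caller (the WAL reader) instead of dropping the sample.
func (s *shards) enqueue(smp sample) {
	s.shardFor(smp.ref) <- smp
}

// tryEnqueue is a non-blocking variant that reports whether there was room.
func (s *shards) tryEnqueue(smp sample) bool {
	select {
	case s.shardFor(smp.ref) <- smp:
		return true
	default:
		return false
	}
}
```

Because the queues are bounded and the producer waits, capacity can stay small: unsent samples remain in the WAL until the reader is allowed to continue.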

As part of this change, we attempt to guarantee that samples are not lost; however, this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (e.g. 400s).

This change also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for a given series to reduce GC pressure
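The label optimisation can be pictured as a map from WAL series reference to a single label slice: series records populate it once, and each sample record only needs the reference to find its labels. This sketch uses a hypothetical `labelPair` type and `seriesLabels` cache, not Prometheus' own label types.

```go
package main

// labelPair is a hypothetical stand-in for one label name/value pair.
type labelPair struct {
	Name, Value string
}

// seriesLabels holds exactly one copy of each series' labels, keyed by the
// series reference used in WAL records.
type seriesLabels struct {
	byRef map[uint64][]labelPair
}

func newSeriesLabels() *seriesLabels {
	return &seriesLabels{byRef: make(map[uint64][]labelPair)}
}

// storeSeries handles a WAL series record. It copies the labels once so the
// cache does not alias a buffer the caller may reuse.
func (s *seriesLabels) storeSeries(ref uint64, lbls []labelPair) {
	if _, ok := s.byRef[ref]; ok {
		return
	}
	cp := make([]labelPair, len(lbls))
	copy(cp, lbls)
	s.byRef[ref] = cp
}

// labelsFor handles a sample record: it returns the shared slice instead of
// allocating a fresh copy for every sample.
func (s *seriesLabels) labelsFor(ref uint64) ([]labelPair, bool) {
	l, ok := s.byRef[ref]
	return l, ok
}

// dropSeries forgets a series that is no longer needed.
func (s *seriesLabels) dropSeries(ref uint64) {
	delete(s.byRef, ref)
}
```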

Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics

Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-12 11:39:13 +00:00
.circleci *: bump gRPC dependencies (#5075) 2019-01-15 15:32:05 +01:00
.github Fix quoting in issue template (#4688) 2018-10-02 14:52:57 +02:00
cmd Tail the TSDB WAL for remote_write 2019-02-12 11:39:13 +00:00
config Tail the TSDB WAL for remote_write 2019-02-12 11:39:13 +00:00
console_libraries Cut down console template examples to just node and prometheus (#3099) 2017-08-21 16:35:20 +01:00
consoles Update example console template for node exporter 0.16.0 (#4208) 2018-06-08 14:01:05 +01:00
discovery Fix fmt.Errorf error message (#5199) 2019-02-10 15:16:20 +05:30
docs docs: Specifically call out NFS and POSIX 2019-01-31 12:57:48 +01:00
documentation update remote write path proto so that Labels/Timeseries can't be nil (#4957) 2019-01-15 19:13:39 +00:00
notifier Fix typo in comment (#5061) 2019-01-04 10:57:17 +00:00
pkg show list of offending labels in the error message in many-to-many scenarios (#5189) 2019-02-09 10:17:52 +01:00
prompb update remote write path proto so that Labels/Timeseries can't be nil (#4957) 2019-01-15 19:13:39 +00:00
promql show list of offending labels in the error message in many-to-many scenarios (#5189) 2019-02-09 10:17:52 +01:00
relabel Moved configuration into `relabel` package. (#4955) 2018-12-18 11:26:36 +00:00
rules Fix prometheus_rule_group_last_evaluation_timestamp_seconds 2019-02-06 11:02:49 +01:00
scrape Fix fmt.Errorf error message (#5199) 2019-02-10 15:16:20 +05:30
scripts *: bump gRPC dependencies (#5075) 2019-01-15 15:32:05 +01:00
storage Tail the TSDB WAL for remote_write 2019-02-12 11:39:13 +00:00
template add alert template expanding failure metric (#4747) 2018-11-06 14:39:06 +00:00
util corrected regex string check for anyorigin(*) (#5117) 2019-01-21 17:17:27 +05:30
vendor Use the latest versions of azure go sdk and go-autorest (#5015) 2019-01-28 18:30:29 +00:00
web Tail the TSDB WAL for remote_write 2019-02-12 11:39:13 +00:00
.dockerignore
.gitignore cleanup gitignore (#3869) 2018-02-20 11:03:22 +00:00
.promu.yml promu: fix ldflags for Go modules (#4929) 2018-11-30 17:10:43 +01:00
.travis.yml Merge branch 'master' into go-modules 2018-11-09 11:42:12 +01:00
CHANGELOG.md Merge v2.7.1 into master (#5170) 2019-02-01 09:54:12 +01:00
CONTRIBUTING.md Fix spelling/typos (#4921) 2018-11-27 17:44:29 +01:00
Dockerfile Rollback Dockerfile to version @ 2.5.x (#5122) 2019-01-21 17:27:16 +05:30
LICENSE
MAINTAINERS.md Update Fabian's email address 2018-11-30 09:37:40 +01:00
Makefile *: bump gRPC dependencies (#5075) 2019-01-15 15:32:05 +01:00
Makefile.common Unset GO111MODULE variable in Makefile.common (#5191) 2019-02-07 17:22:04 +01:00
NOTICE Update NOTICE for gogo/protobuf 2017-11-02 15:28:47 +01:00
README.md Move the build badge to the badge list (#5060) 2019-01-03 14:56:49 +00:00
RELEASE.md Propose myself as the 2.7 shepard 2019-01-07 18:44:05 +05:30
VERSION Merge v2.7.1 into master (#5170) 2019-02-01 09:54:12 +01:00
code-of-conduct.md Add CNCF code of conduct as the Prometheus code of conduct 2016-10-19 21:39:19 +02:00
go.mod Use the latest versions of azure go sdk and go-autorest (#5015) 2019-01-28 18:30:29 +00:00
go.sum Use the latest versions of azure go sdk and go-autorest (#5015) 2019-01-28 18:30:29 +00:00

README.md

Prometheus


Visit prometheus.io for the full documentation, examples and guides.

Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.

Prometheus' main distinguishing features as compared to other monitoring systems are:

  • a multi-dimensional data model (time series defined by metric name and set of key/value dimensions)
  • a flexible query language to leverage this dimensionality
  • no dependency on distributed storage; single server nodes are autonomous
  • time series collection happens via a pull model over HTTP
  • pushing time series is supported via an intermediary gateway
  • targets are discovered via service discovery or static configuration
  • multiple modes of graphing and dashboarding support
  • support for hierarchical and horizontal federation

[Figure: Prometheus architecture overview]

Install

There are various ways of installing Prometheus.

Precompiled binaries

Precompiled binaries for released versions are available in the download section on prometheus.io. Using the latest production release binary is the recommended way of installing Prometheus. See the Installing chapter in the documentation for all the details.

Debian packages are available.

Docker images

Docker images are available on Quay.io or Docker Hub.

You can launch a Prometheus container to try it out with:

$ docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus

Prometheus will now be reachable at http://localhost:9090/.
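To check from a program that the server is up, you can poll its health endpoint. A minimal sketch, assuming the `/-/healthy` management endpoint available in Prometheus 2.x (confirm against the documentation for your version); `checkHealthy` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// checkHealthy reports whether the Prometheus server at baseURL answers
// its /-/healthy endpoint with HTTP 200.
func checkHealthy(baseURL string) error {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/-/healthy")
	if err != nil {
		return err // e.g. connection refused while the container starts
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("prometheus not healthy: HTTP %d", resp.StatusCode)
	}
	return nil
}
```

With the container from the `docker run` command running, `checkHealthy("http://localhost:9090")` should return nil once startup has finished.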

Building from source

To build Prometheus from the source code yourself you need to have a working Go environment with version 1.11 or greater installed.

You can directly use the go tool to download and install the prometheus and promtool binaries into your GOPATH:

$ go get github.com/prometheus/prometheus/cmd/...
$ prometheus --config.file=your_config.yml

You can also clone the repository yourself and build using make:

$ mkdir -p $GOPATH/src/github.com/prometheus
$ cd $GOPATH/src/github.com/prometheus
$ git clone https://github.com/prometheus/prometheus.git
$ cd prometheus
$ make build
$ ./prometheus --config.file=your_config.yml

The Makefile provides several targets:

  • build: build the prometheus and promtool binaries
  • test: run the tests
  • test-short: run the short tests
  • format: format the source code
  • vet: check the source code for common errors
  • assets: rebuild the static assets
  • docker: build a docker container for the current HEAD

More information

  • The source code is periodically indexed: Prometheus Core.
  • You will find a Travis CI configuration in .travis.yml.
  • See the Community page for how to reach the Prometheus developers and users on various communication channels.

Contributing

Refer to CONTRIBUTING.md.

License

Apache License 2.0, see LICENSE.