
354 Commits

Author SHA1 Message Date
Simon Pasquier
08c2f50382
Merge pull request #4418 from simonpasquier/log-vm-limits
prometheus: log virtual memory limits
2018-08-07 16:27:46 +02:00
Frederic Branczyk
b0b3e3dd74
promql: Remove old and unused alerting/recording syntax
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2018-08-07 15:14:06 +02:00
Dave Henderson
73a08f0045 promtool - Adding --step flag to 'query range' subcommand (#4454)
Signed-off-by: Dave Henderson <dhenderson@gmail.com>
2018-08-05 11:03:18 +02:00
Julius Volz
90521a65f8
Remove error return value from NotifyFunc() (#4459)
It's always nil and we also forgot to check it.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-04 21:31:12 +02:00
Ganesh Vernekar
f1db699dff Persist alert 'for' state across restarts (#4061)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-08-02 11:18:24 +01:00
Simon Pasquier
a94450c288 Fix build for openbsd
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-31 14:41:30 +02:00
Simon Pasquier
141c188ae6 Enforce conversion for freebsd
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-26 14:58:56 +02:00
Simon Pasquier
208d21a393 Add comment and print units
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-26 10:26:58 +02:00
Simon Pasquier
ba22b10113 prometheus: log virtual memory limits
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-25 15:51:27 +02:00
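This is a minimal sketch of how the virtual memory and file descriptor limits could be read and logged, using the standard `syscall` package on a Unix-like system. The helper names `limitToString`, `getLimits`, `vmLimits` and `fdLimits` are hypothetical. Treating an all-bits-set value as "unlimited" assumes Linux's `RLIM_INFINITY`. The explicit `uint64(...)` conversions illustrate the kind of change the FreeBSD commit above refers to, because `Rlimit` fields are `int64` rather than `uint64` on some platforms.

```go
package main

import (
	"fmt"
	"syscall"
)

// rlimInfinity mirrors Linux's RLIM_INFINITY (all bits set). Assumption:
// other platforms, such as FreeBSD, use a different sentinel value.
const rlimInfinity = ^uint64(0)

// limitToString renders one limit, appending a unit suffix (e.g. "b" for
// bytes) unless the limit is unlimited.
func limitToString(v uint64, unit string) string {
	if v == rlimInfinity {
		return "unlimited"
	}
	return fmt.Sprintf("%d%s", v, unit)
}

// getLimits reads the soft and hard limits for a resource. Converting
// Cur/Max to uint64 explicitly keeps this compiling where they are int64.
func getLimits(resource int, unit string) (string, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(resource, &rl); err != nil {
		return "", err
	}
	return fmt.Sprintf("(soft=%s, hard=%s)",
		limitToString(uint64(rl.Cur), unit),
		limitToString(uint64(rl.Max), unit)), nil
}

// vmLimits reports the address-space (virtual memory) limits in bytes.
func vmLimits() (string, error) { return getLimits(syscall.RLIMIT_AS, "b") }

// fdLimits reports the open file descriptor limits.
func fdLimits() (string, error) { return getLimits(syscall.RLIMIT_NOFILE, "") }
```

At startup, a server might log the result of `vmLimits()` and `fdLimits()` alongside its build and host information.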
Daisy T
a3376e8f36 add query labels command to promtool (#4346)
Signed-off-by: Daisy T <daisyts@gmx.com>
2018-07-18 16:27:28 +02:00
Julius Volz
95dfb1b1dd
Add missing import to promtool, fix build (#4395)
Sorry, I used GitHub's web-based merge-conflict-resolution editor on
https://github.com/prometheus/prometheus/pull/4308 and it didn't show me
test errors afterwards, but maybe they didn't run again or I should have
waited or something.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-07-18 10:26:45 +02:00
Shubheksha
125da3b812 promtool: add command for querying series (#4308)
Signed-off-by: Shubheksha Jalan <jshubheksha@gmail.com>
2018-07-18 10:15:58 +02:00
Julius Volz
03aa3a3de8
main: Improve / clean up error messages (#4286)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-07-18 09:58:40 +02:00
Chih-Hung Yeh
912d19fb85 Add 3 commands in promtool for getting debug information from prometheus server (#4247)
`debug all` - all information
`debug metrics` - metrics information
`debug pprof` - profiling information

The final result is compressed in a `tar.gz` file.

Signed-off-by: chyeh <chyeh.taiwan@gmail.com>
2018-07-18 10:52:01 +03:00
Brian Brazil
68e8b80ffe
Reorder startup and shutdown to prevent panics. (#4321)
Start rule manager only after tsdb and config are loaded.
Stop rule manager before tsdb to avoid writing to closed storage.
Wait for any in-progress reloads to complete before shutting
down rule manager, so that rule manager doesn't get updated after
being shut down.

Remove incorrect comment around shutting down query engine.
Log when config reload is completed.

Fixes #4133
Fixes #4262

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-07-04 13:41:16 +01:00
Michael Khalil
78e0784d04 return error exit status in prometheus cli (#4296)
Signed-off-by: mikeykhalil <mikeyfkhalil@gmail.com>
2018-06-21 08:32:26 +01:00
Tom Wilkie
8acad5f3cd make it compile
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-05-24 15:40:24 +01:00
Tom Wilkie
e51d6c4b6c Make remote flush deadline a command line param.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-05-23 15:06:01 +01:00
Sneha Inguva
c1a851074b promtool: add query instant and query range commands (#4085)
* promtool: add QueryInstant and QueryRange cmds

* promtool: add more query functions

* promtool: finished query Instant

* promtool: add range query

* promtool: add query command and address arguments

* vendor client and api
2018-04-26 20:41:56 +02:00
Mario Trangoni
464e747f1e fix some comment typos (#4059) 2018-04-08 10:51:54 +01:00
Sneha Inguva
7be846754a main: actor functionality comments 2018-04-01 11:19:30 -07:00
Marek Siarkowicz
bb86c3f62b Report internal runtime information on status page (#3921)
Add information about tsdb, wal and config reload
2018-03-21 16:08:37 +00:00
James Turnbull
ba5273a0ab Minor edits to help text (#3990) 2018-03-20 16:54:36 +00:00
Simon Pasquier
e1fd96db25 cmd: fix help text (#3989) 2018-03-20 15:58:19 +00:00
ferhat elmas
ffa673f7d8 General simplifications (#3887)
Another try as in #1516
2018-02-26 07:58:10 +00:00
Bartek Plotka
93a63ac5fd api: Added v1/status/flags endpoint. (#3864)
Endpoint URL: /api/v1/status/flags
Example Output:
```json
{
  "status": "success",
  "data": {
    "alertmanager.notification-queue-capacity": "10000",
    "alertmanager.timeout": "10s",
    "completion-bash": "false",
    "completion-script-bash": "false",
    "completion-script-zsh": "false",
    "config.file": "my_cool_prometheus.yaml",
    "help": "false",
    "help-long": "false",
    "help-man": "false",
    "log.level": "info",
    "query.lookback-delta": "5m",
    "query.max-concurrency": "20",
    "query.timeout": "2m",
    "storage.tsdb.max-block-duration": "36h",
    "storage.tsdb.min-block-duration": "2h",
    "storage.tsdb.no-lockfile": "false",
    "storage.tsdb.path": "data/",
    "storage.tsdb.retention": "15d",
    "version": "false",
    "web.console.libraries": "console_libraries",
    "web.console.templates": "consoles",
    "web.enable-admin-api": "false",
    "web.enable-lifecycle": "false",
    "web.external-url": "",
    "web.listen-address": "0.0.0.0:9090",
    "web.max-connections": "512",
    "web.read-timeout": "5m",
    "web.route-prefix": "/",
    "web.user-assets": ""
  }
}
```

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-02-21 08:49:02 +00:00
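A client could decode this response with a small helper such as the hypothetical `decodeFlags` below. Every value in `data` is a string, including numbers and booleans, so callers must convert values themselves.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flagsResponse matches the envelope shown above: a "status" string and a
// "data" object mapping flag names to string values.
type flagsResponse struct {
	Status string            `json:"status"`
	Data   map[string]string `json:"data"`
}

// decodeFlags parses a /api/v1/status/flags body and returns the flag map.
func decodeFlags(body []byte) (map[string]string, error) {
	var resp flagsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("unexpected status %q", resp.Status)
	}
	return resp.Data, nil
}
```

The body would typically come from `http.Get(base + "/api/v1/status/flags")` followed by `io.ReadAll(resp.Body)`, where `base` is a placeholder for the server address.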
Fabian Reinartz
7ccd4b39b8 *: implement query params
This adds a parameter to the storage selection interface which allows
query engine(s) to pass information about the operations surrounding a
data selection.
This can for example be used by remote storage backends to infer the
correct downsampling aggregates that need to be provided.
2018-02-13 12:17:22 +01:00
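To illustrate the idea, here is a sketch of how a remote backend might use such hints to choose a downsampled aggregate. `SelectParams`, its fields, the aggregate names and the 5-minute threshold are all assumptions for this example. They are not the interface this commit adds.

```go
package main

// SelectParams is a hypothetical stand-in for the hints a query engine could
// pass with a selection: the evaluation step in milliseconds and the name of
// the function wrapping the selection.
type SelectParams struct {
	Step int64
	Func string
}

// pickAggregate chooses which pre-computed series a remote backend could
// serve. Without hints, or for fine-grained steps, raw samples are used.
func pickAggregate(p *SelectParams) string {
	const coarseStepMs = 5 * 60 * 1000 // assumed threshold for downsampling
	if p == nil || p.Step < coarseStepMs {
		return "raw"
	}
	switch p.Func {
	case "max_over_time":
		return "max"
	case "min_over_time":
		return "min"
	case "avg_over_time":
		return "avg"
	default:
		return "raw"
	}
}
```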
Conor Broderick
5169ccf258
Merge pull request #3724 from simonpasquier/fix-bad-data-error
Don't reset FiredAt for inactive alerts
2018-02-01 16:18:09 +00:00
Krasi Georgiev
b75428ec19 rename package retrieval to scrape
No functional changes, just renaming retrieval to scrape.
2018-02-01 09:55:07 +00:00
Krasi Georgiev
7858745c04 rename structs for consistency 2018-01-30 17:49:05 +00:00
Krasi Georgiev
acc4197098 remove discovery race for the context field 2018-01-29 15:18:07 +00:00
Julien Pivotto
8b20cb1e8d last config success time gauge: use SetToCurrentTime() (#3750)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2018-01-27 07:48:13 +00:00
Simon Pasquier
81c0ab69e0 Don't reset FiredAt for inactive alerts
Otherwise Alertmanager receives resolved alerts whose StartsAt is zero, which
fails validation.
2018-01-22 17:17:33 +01:00
Krasi Georgiev
719c579f7b refactor main execution reloadReady handling, update some comments 2018-01-17 18:14:24 +00:00
Krasi Georgiev
0eafaf32d3 set the correct config reloading execution for scraper and notifier 2018-01-17 13:06:56 +00:00
Krasi Georgiev
97f0461e29 refactor the config reloading execution 2018-01-17 12:02:13 +00:00
Krasi Georgiev
5260c650ec use the config hash for the map lookup 2018-01-16 11:10:54 +00:00
Krasi Georgiev
8369826808 comment to rethink the map reference for the notifier discovery 2018-01-16 09:47:53 +00:00
Krasi Georgiev
d12e6f29fc discovery manager ApplyConfig now takes a direct ServiceDiscoveryConfig so that it can be used for the notify manager
reimplement the service discovery for the notify manager

Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>
2018-01-15 13:39:44 +00:00
Shubheksha Jalan
0471e64ad1 Use shared types from the common repo (#3674)
* refactor: use shared types from common repo, remove util/config

* vendor: add common/config

* fix nit
2018-01-11 16:10:25 +01:00
Goutham Veeramachaneni
35a6ffbaf3
Merge pull request #3587 from krasi-georgiev/web-test-error-check
handle web_test webhandler errors.
2018-01-10 22:03:25 +05:30
Shubheksha Jalan
ec94df49d4 Refactor SD configuration to remove config dependency (#3629)
* refactor: move targetGroup struct and CheckOverflow() to their own package

* refactor: move auth and security related structs to a utility package, fix import error in utility package

* refactor: Azure SD, remove SD struct from config

* refactor: DNS SD, remove SD struct from config into dns package

* refactor: ec2 SD, move SD struct from config into the ec2 package

* refactor: file SD, move SD struct from config to file discovery package

* refactor: gce, move SD struct from config to gce discovery package

* refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil

* refactor: consul, move SD struct from config into consul discovery package

* refactor: marathon, move SD struct from config into marathon discovery package

* refactor: triton, move SD struct from config to triton discovery package, fix test

* refactor: zookeeper, move SD structs from config to zookeeper discovery package

* refactor: openstack, remove SD struct from config, move into openstack discovery package

* refactor: kubernetes, move SD struct from config into kubernetes discovery package

* refactor: notifier, use targetgroup package instead of config

* refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup

* refactor: retrieval, use targetgroup package instead of config.TargetGroup

* refactor: storage, use config util package

* refactor: discovery manager, use targetgroup package instead of config.TargetGroup

* refactor: use HTTPClient and TLS config from configUtil instead of config

* refactor: tests, use targetgroup package instead of config.TargetGroup

* refactor: fix targetgroup.Group pointers that were removed by mistake

* refactor: openstack, kubernetes: drop prefixes

* refactor: remove import aliases forced due to vscode bug

* refactor: move main SD struct out of config into discovery/config

* refactor: rename configUtil to config_util

* refactor: rename yamlUtil to yaml_config

* refactor: kubernetes, remove prefixes

* refactor: move the TargetGroup package to discovery/

* refactor: fix order of imports
2017-12-29 21:01:34 +01:00
Brian Brazil
ecc24b554d
Hide block duration flags. (#3618)
Users are starting to use these, mistakenly thinking they'll help
with issues, which causes confusion.
Hide them and make it clear that they're only there for testing
purposes.
2017-12-24 12:13:48 +00:00
Krasi Georgiev
c94fa731aa bypass the proxy for the tests 2017-12-20 18:21:10 +00:00
Krasi Georgiev
ad66476c4f fix flaky main.go test and simplify a bit 2017-12-19 15:07:49 +00:00
Fabian Reinartz
2881d73ed8
Merge pull request #3362 from krasi-georgiev/discovery-refactoring
Decouple the discovery and refactor the retrieval package
2017-12-19 12:56:34 +01:00
Goutham Veeramachaneni
9c9f96b2c0
Merge pull request #3529 from krasi-georgiev/main-integration-test
main.go integration test for Startup interrupting.
2017-12-18 22:12:13 -06:00
Krasi Georgiev
587dec9eb9 rebased and resolved conflicts with the new Discovery GUI page
Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>
2017-12-18 20:10:03 +00:00
Krasi Georgiev
1ec76d1950 rearrange the contexts variables and logic
split the groupsMerge function to set and get
other small nits
2017-12-18 17:23:47 +00:00
Krasi Georgiev
6ff1d5c51e add the scrape manager config reloader
handle errors with invalid scrape config
2017-12-18 17:23:47 +00:00
Krasi Georgiev
b0d4f6ee08 resolved merge conflict in main.go 2017-12-18 17:23:46 +00:00
Krasi Georgiev
c5cb0d2910 simplify naming and API. 2017-12-18 17:22:50 +00:00
Krasi Georgiev
9c61f0e8a0 scrape pool doesn't rely on context, as Stop() needs to be blocking to prevent scrape loops from trying to write to a closed TSDB storage. 2017-12-18 17:22:49 +00:00
Krasi Georgiev
e405e2f1ea refactored discovery 2017-12-18 17:22:49 +00:00
pasquier-s
2440696961 Log file descriptor limits at startup (#3567)
Fixes #3564
2017-12-11 13:01:53 +00:00
Alberto Cortés
29da2fb9cd testutil: update to go1.9 testing.Helper 2017-12-08 19:06:53 +01:00
Alberto Cortés
8f6a9f7833 config: simplify tests by using testutil.NotOk (#3289)
Also include filename in all LoadFile errors

Also add a message to testutil.NotOk so we can identify failing tests when
using table-driven tests.
2017-12-08 16:52:25 +00:00
Krasi Georgiev
740662644e write to temp dir and remove it at the end.
Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>
2017-12-06 10:45:58 +00:00
Brian Brazil
b97f4cf48c Add metrics for rule group interval and last duration. 2017-12-04 11:44:38 +00:00
Krasi Georgiev
2c2a962da3 main.go integration test for Startup interrupting. 2017-12-01 10:58:01 +00:00
Goutham Veeramachaneni
823b7f90b3
Use the globbed files and not the files in cfg
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-11-30 17:08:34 +05:30
Fabian Reinartz
62461379b7 rules: decouple notifier packages
The dependency on the notifier packages caused a transitive dependency
on discovery and with that all client libraries our service discovery
uses.
2017-11-27 16:38:14 +01:00
Fabian Reinartz
4d964a0a0d rules: make glob expansion a concern of main 2017-11-24 08:22:57 +01:00
Fabian Reinartz
bd9f7460eb rules: remove config package dependency 2017-11-24 07:57:54 +01:00
Fabian Reinartz
2d0e3746ac rules: remove dependency on promql.Engine 2017-11-24 07:57:54 +01:00
Krasi Georgiev
e2f4850fea Refactor main.go with oklog/pkg/group actors pattern 2017-11-11 12:33:15 +00:00
Thibault Chataigner
fc4406201e TSDB StartTime: Use a simpler way to compute StartTime 2017-10-25 17:41:00 +02:00
Julius Volz
099df0c5f0 Migrate "golang.org/x/net/context" -> "context" (#3333)
In some places, where ctxhttp or gRPC are concerned, we still need to use the
old contexts.
2017-10-24 21:21:42 -07:00
Julius Volz
9d43176ab3 Remove unused printVersion variable (#3335)
Kingpin now automatically does this via --version.
2017-10-23 08:50:13 +01:00
Julius Volz
82c5b98496 Capitalize Prometheus in startup message (#3332)
Hey, branding :)
2017-10-23 08:49:28 +01:00
Thibault Chataigner
bf4a279a91 Remote storage reads based on oldest timestamp in primary storage (#3129)
Currently all read queries are simply pushed to remote read clients.
This is fine, except for remote storage, for which it is inefficient and
makes queries slower even when remote read is unnecessary.
So instead we need to compare the oldest timestamp in primary/local
storage with the query range lower boundary. If the oldest timestamp
is older than the mint parameter, then there is no need for remote read.
This is an optional behavior per remote read client.

Signed-off-by: Thibault Chataigner <t.chataigner@criteo.com>
2017-10-18 12:08:14 +01:00
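The decision described above comes down to a single comparison. Below is a minimal sketch with timestamps in milliseconds. `needsRemoteRead` and its `readRecent` parameter are hypothetical names; the parameter stands in for the per-client opt-out.

```go
package main

// needsRemoteRead reports whether a query starting at mint must consult
// remote storage. localOldest is the oldest sample timestamp held locally;
// if it is at or before mint, local storage covers the whole range.
// readRecent models a client configured to always be queried.
func needsRemoteRead(mint, localOldest int64, readRecent bool) bool {
	if readRecent {
		return true
	}
	return mint < localOldest
}
```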
Julius Volz
5f715f5733 Fix typo in flag description (#3302) 2017-10-16 23:00:05 +01:00
Tobias Schmidt
3589f2f1d4 Merge pull request #3285 from jlevesy/use-testutils-in-cmd-subpackage
Use testutil assertion helpers in cmd package
2017-10-13 00:12:39 +02:00
Julien Levesy
d7b4fa8d78 use testutil assertions in the cmd/prometheus package 2017-10-12 13:45:38 +02:00
Mathieu Pasquet
38afa507bb Provide better error messages on the command line
Instead of only printing the help message, which is not always helpful.
For example, when upgrading from prometheus v1, the retention time value
format has changed and now only accepts one unit (e.g. "15d") where it
previously allowed more complex strings (e.g. "360h0m0s").

This commit provides the error message as an explanation for the parsing
failure.
2017-10-09 16:25:50 +02:00
Marc Sluiter
6a633eece1 Added go-conntrack for monitoring http connections (#3241)
Added metrics for in- and outgoing traffic with go-conntrack.
2017-10-06 11:22:19 +01:00
Fabian Reinartz
2d0b8e8b94 Merge branch 'master' into dev-2.0 2017-10-05 13:09:18 +02:00
Paul Gier
08af129b4d cmd/prometheus: don't allow quotes at beginning or end of url
This prevents an accidental copy/paste error where the web.external-url
or alertmanager.url params could have an extra set of quotes.
See also: https://github.com/prometheus/prometheus/issues/1229
2017-10-04 10:10:02 -05:00
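Here is a minimal sketch of the quote check, placed in front of `url.Parse`. `startsOrEndsWithQuote` and `parseExternalURL` are hypothetical helper names. The error text is illustrative.

```go
package main

import (
	"errors"
	"net/url"
	"strings"
)

// startsOrEndsWithQuote catches values pasted with stray quotes, such as
// --web.external-url='"http://example.com"'.
func startsOrEndsWithQuote(s string) bool {
	return strings.HasPrefix(s, "\"") || strings.HasPrefix(s, "'") ||
		strings.HasSuffix(s, "\"") || strings.HasSuffix(s, "'")
}

// parseExternalURL rejects quoted values before handing off to url.Parse.
func parseExternalURL(s string) (*url.URL, error) {
	if startsOrEndsWithQuote(s) {
		return nil, errors.New("URL must not begin or end with quotes")
	}
	return url.Parse(s)
}
```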
Paul Gier
f79b55d057 cmd/prometheus: remove govalidator for url validation
The usage of govalidator is redundant with the call to url.Parse for
url validation. Removing it has the following benefits:

 - The explicit error message is displayed instead of just a generic
   valid/invalid message
 - Slightly smaller code with one fewer external dependency
 - Speed improvement by removing duplicate call to url.Parse (inside
   govalidator.IsURL())
 - Resolves issue #2717

The only potential drawback of removing govalidator is that certain
URLs will be considered valid which were previously invalid. For example:

 - URLs with hostnames that start and/or end with an underscore (http://_example.com_)
 - URLs with hostnames that contain some special characters (http://foo&*bar.org)

These are valid URIs according to RFC 3986 and valid domain names per RFC 2181,
however they are not valid hostnames per RFC 952.
2017-10-04 10:08:34 -05:00
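The two example hostnames above can be checked directly. The sketch below relies on `url.Parse` alone, with no extra hostname validation, and returns the host it extracted. `hostOf` is a hypothetical helper.

```go
package main

import "net/url"

// hostOf parses raw with url.Parse only and returns the parsed hostname.
func hostOf(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Hostname(), nil
}
```

Both `http://_example.com_` and `http://foo&*bar.org` parse without error, because `_`, `&` and `*` are permitted in the host component that `url.Parse` accepts.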
Fabian Reinartz
7b02bfee0a web: start web handler while TSDB is starting up 2017-09-20 15:03:19 +02:00
Goutham Veeramachaneni
f5aed810f9 logging: Port to common/promlog
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-09-15 12:40:50 +05:30
Fabian Reinartz
d21f149745 *: migrate to go-kit/log 2017-09-08 22:01:51 +05:30
Fabian Reinartz
c70379e1c7 Merge branch 'dev-2.0' of github.com:prometheus/prometheus into dev-2.0 2017-09-04 13:10:50 +02:00
Fabian Reinartz
fffe51fb03 Add mutex and block profiling via envvar 2017-09-04 13:10:32 +02:00
Ben Kochie
59aca4138b Fix staticcheck issues. 2017-08-28 17:29:01 +02:00
Matt Bostock
64973f5c65 cmd/prometheus: Fix capitalisation in log line (#3123)
Change 'Ready' to 'ready'.
2017-08-28 11:03:25 +01:00
Mark Adams
77c816b309 Fix pprof endpoints when -web.route-prefix or -web.external-url is used (#3054)
Whenever a route prefix is applied, the router prepends the prefix to
the URL path on the request. For most handlers, this is not an issue
because the request's path is only used for routing and is not actually
needed by the handler itself. However, Prometheus delegates the handling
of the /debug/* endpoints to the http.DefaultServeMux, which has its own
routing logic that depends on the url.Path. As a result, whenever a
prefix is applied, the prefixed URL is passed to the DefaultServeMux
which has no awareness of the prefix and returns a 404.

This change fixes the issue by creating a new serveDebug handler which
routes /debug/* requests to the appropriate net/http/pprof handler
and removing the net/http/pprof import in cmd/prometheus since it is no
longer necessary.

Fixes #2183.
2017-08-23 00:00:56 +01:00
Callum Styan
8912f81ffe check if file_sd files exist in checkConfig 2017-08-22 15:25:30 -07:00
Fabian Reinartz
25f3e1c424 Merge branch 'master' into mergemaster 2017-08-10 17:04:25 +02:00
KalivarapuReshma
686050d816 Change -config.file to --config.file in Readme and error message 2017-08-08 12:49:35 +05:30
emluque
ff54c5c11a 2831 Add Healthy and Ready endpoints 2017-08-07 17:34:04 -03:00
Fabian Reinartz
4d3d8ee229 Merge pull request #2850 from tomwilkie/dev-2.0-remote
Remote APIs for v2
2017-08-03 13:39:09 +02:00
Julius Volz
cc50aa2c6b main: Consistently end flag descriptions with periods. (#2977) 2017-07-20 23:48:35 +02:00
Tom Wilkie
2dda5775e3 Initial port of remote storage to v2. 2017-07-12 12:27:57 +01:00
Fabian Reinartz
32226e30f5 Guard reload and quit endpoints by flag 2017-07-11 14:25:07 +02:00
Fabian Reinartz
45ac064669 web: disable Admin APIs by default 2017-07-10 09:29:41 +02:00
Fabian Reinartz
ccf9e62972 *: add admin grpc API 2017-07-10 09:14:14 +02:00
Fabian Reinartz
be32afd6df cmd/prometheus: add back tsdb.no-lockfile flag 2017-06-22 15:02:10 +02:00
Goutham Veeramachaneni
f9202c6511
Move from .yaml to .yml in update rules
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-21 18:38:37 +05:30
Goutham Veeramachaneni
e3701077c3
Move promtool to kingpin
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-21 17:42:57 +05:30
Fabian Reinartz
867b8d108f cmd/prometheus: cleanup 2017-06-21 11:38:13 +02:00
Fabian Reinartz
34ab7a885a cmd/prometheus: switch to kingpin 2017-06-20 17:38:01 +02:00
Goutham Veeramachaneni
592cb00c2f
Remove version from RuleGroups
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-19 16:38:46 +05:30
Goutham Veeramachaneni
37e7b69f56
Merge remote-tracking branch 'upstream/dev-2.0' into rulegroups
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-19 16:34:55 +05:30
Goutham Veeramachaneni
67dc73fd59
Flag changes for 2.0
Fixes: prometheus/prometheus#2087

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 20:21:41 +05:30
Goutham Veeramachaneni
d407bd150c Consolidate the duration params in CLI
* All CLI params moved to model.Duration

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 20:20:57 +05:30
Goutham Veeramachaneni
6b70a4d850
Incorporate PR feedback
* Move fingerprint to Hash()
* Move away from tsdb.MultiError
* 0777 -> 0666 for files
* checkOverflow of extra fields

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 16:44:33 +05:30
Goutham Veeramachaneni
6c1617fd13
Simplify usage string
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 15:55:13 +05:30
Goutham Veeramachaneni
507790a357
Rework logging to use explicitly passed logger
Mostly cleaned up the global logger use. Still some uses in discovery
package.

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 15:52:44 +05:30
Goutham Veeramachaneni
dc69645e92
Move back to go-yaml
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 10:46:21 +05:30
Goutham Veeramachaneni
8abb91f656
Move CLI commander to cobra
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-15 16:38:08 +05:30
Goutham Veeramachaneni
1c08743721
Update check-rules to new format.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 13:32:26 +05:30
Goutham Veeramachaneni
cea1e99f78
Add update-rules command to promtool
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 11:38:54 +05:30
Fabian Reinartz
669075c6b9 Merge branch 'master' into dev-2.0 2017-06-06 09:36:51 +02:00
Chris Goller
42de0ae013 Use log.Logger interface for all discovery services 2017-06-01 11:25:55 -05:00
Conor Broderick
6766123f93 Replace regex with Secret type and remarshal config to hide secrets (#2775) 2017-05-29 12:46:23 +01:00
Fabian Reinartz
4c31061251 Merge branch 'master' into dev-2.0 2017-05-24 15:36:17 +02:00
Fabian Reinartz
d289dc55c3 storage: update TSDB 2017-05-22 11:53:08 +02:00
Shashank Varanasi
dea60bb553 Fix malformed uname string (#2727)
* Fix malformed uname string

* Make fix better

* Reformat code for simplicity
2017-05-16 18:44:11 +02:00
Fabian Reinartz
06c2b76cd4 Merge branch 'master' into uptsdb 2017-05-16 16:48:37 +02:00
Shashank Varanasi
61235fd851 Print system information (uname) at Prometheus startup (#2709)
* Print uname on prom startup

* Make uname file linux-only

* Add missing license headers

Add missing license headers

* Print OS when uname is not available

* Print only OS name when uname not available

* Remove extra space, fix cmd/prometheus/main.go license header

* Add fix for int8 and uint8 systems

* Better formatting for build tags in cmd/prometheus/uname files

* Remove newline
2017-05-13 20:42:29 +02:00
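The `syscall.Utsname` fields are fixed-size, NUL-terminated C char arrays. Their element type is `int8` on some Linux architectures and `uint8` on others, which is what "int8 and uint8 systems" refers to. Here is a sketch of a converter that handles both and stops at the first NUL, so no trailing NUL bytes end up in the string. It uses Go 1.18 generics, which is not how the 2017 code was written. `charsToString` is a hypothetical name.

```go
package main

// charsToString converts a NUL-terminated C char array into a Go string.
func charsToString[T int8 | uint8](ca []T) string {
	b := make([]byte, 0, len(ca))
	for _, c := range ca {
		if c == 0 {
			break // stop at the terminator; ignore the zero padding after it
		}
		b = append(b, byte(c))
	}
	return string(b)
}
```

On Linux, a call such as `charsToString(uts.Sysname[:])` after `syscall.Uname(&uts)` would yield the OS name.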
Frederic Branczyk
c50a3eccce
prometheus: default max-block-duration to 10% of retention 2017-05-12 11:48:51 +02:00
Michal Witkowski
4177c35eba Fixup sighup for P2 TSDB init #2699 2017-05-09 17:00:54 +01:00
Fabian Reinartz
9b175d48cb Add flag to disable TSDB lock file 2017-05-09 12:56:51 +02:00
Fabian Reinartz
73b8ff0ddc Merge branch 'master' into dev-2.0 2017-04-27 10:19:55 +02:00
Matt Layher
283756c503 Initial commit of 'promtool check-metrics', promlint package (#2605) 2017-04-13 23:53:41 +02:00
Fabian Reinartz
757cba7c31 cmd/prometheus: Undo GOGC adjustment 2017-04-10 16:22:01 +02:00
beorn7
f20b84e816 flags: Improve doc strings for checkpoint flags 2017-04-07 13:10:12 +02:00
Fabian Reinartz
8ffc851147 Merge branch 'master' into dev-2.0 2017-04-04 15:17:56 +02:00
Julius Volz
589061919a Merge pull request #2465 from Gouthamve/alert-metrics-2429
Better Metrics For Alerts
2017-03-31 21:45:05 +02:00
Goutham Veeramachaneni
f27ce34a13
Use Registerer to Register All Metrics
* Made Metric a Gauge so that it can be registered.
2017-04-01 00:14:30 +05:30
Goutham Veeramachaneni
0d0c9d5440
Move Registerer to Config Struct in Notifier 2017-03-31 21:20:12 +05:30
Björn Rabenstein
29f05680a2 Merge pull request #2528 from prometheus/beorn7/storage2
main.go: Set GOGC to 40 by default
2017-03-27 15:00:37 +02:00
Björn Rabenstein
e63d079b59 Merge pull request #2527 from prometheus/beorn7/storage
storage: Evict chunks and calculate persistence pressure...
2017-03-27 14:49:42 +02:00
Julius Volz
b5b0e00923 Merge pull request #2499 from prometheus/remote-read
Remote Read
2017-03-27 14:43:44 +02:00
beorn7
434ab2a6a3 storage: Evict chunks and calculate persistence pressure based on target heap size
This is a fairly easy attempt to dynamically evict chunks based on the
heap size. A target heap size has to be set as a command line flag,
so that users can essentially say "utilize 4GiB of RAM, and please
don't OOM".

The -storage.local.max-chunks-to-persist and
-storage.local.memory-chunks flags are deprecated by this
change. Backwards compatibility is provided by ignoring
-storage.local.max-chunks-to-persist and using
-storage.local.memory-chunks to set the new
-storage.local.target-heap-size to a reasonable (and conservative)
value (both with a warning).

This also makes the metrics instrumentation more consistent (in
naming and implementation) and cleans up a few quirks in the tests.

Answers to anticipated comments:

There is a chance that Go 1.9 will allow programs better control over
the Go memory management. I don't expect those changes to be in
contradiction with the approach here, but I do expect them to
complement it and allow it to be more precise and controlled. In
any case, once those Go changes are available, this code has to be
revisited.

One might be tempted to let the user specify an estimated value for
the RSS usage, and then internally set a target heap size of a certain
fraction of that. (In my experience, 2/3 is a fairly safe bet.)
However, investigations have shown that RSS size and its relation to
the heap size is really complicated. It depends on so many
factors that I wouldn't even start listing them in a commit
description. It depends on many circumstances, not least on the
risk trade-off of each individual user between RAM utilization and
probability of OOMing during a RAM usage peak. To avoid adding even more to
the confusion, we need to stick to the well-defined number we also use
in the targeting here, the sum of the sizes of heap objects.
2017-03-27 14:33:50 +02:00
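In Go, "the sum of the sizes of heap objects" corresponds to `runtime.MemStats.HeapAlloc`. Here is an illustrative sketch of the check a storage layer could poll to decide when to start evicting chunks. The function and how it would be used are assumptions, not the original code.

```go
package main

import "runtime"

// heapAboveTarget reports whether the bytes of allocated heap objects
// (HeapAlloc) currently exceed the configured target heap size.
func heapAboveTarget(target uint64) bool {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.HeapAlloc > target
}
```

`runtime.ReadMemStats` briefly stops the world, so a real implementation would likely sample it periodically rather than on every append.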
beorn7
96a303b348 storage: Use staleness delta as head chunk timeout
Currently, if a series stops existing, its head chunk will be kept
open for an hour. That prevents it from being persisted, which
prevents it from being evicted, which prevents the series from being
archived.

Most of the time, once no sample has been added to a series within the
staleness limit, we can be pretty confident that this series will not
receive samples anymore. The whole chain as described above can be
started after 5m instead of 1h. In the relaxed case, this doesn't
change a lot as the head chunk timeout is only checked during series
maintenance, and usually, a series is only maintained every six
hours. However, there is the typical scenario where a large service is
deployed, the deploy turns out to be bad, and then it is deployed
again within minutes, and quite quickly the number of time series has
tripled. That's the point where the Prometheus server is stressed and
switches (rightfully) into rushed mode. In that mode, time series are
processed as quickly as possible, but all of that is in vain if all of
those recently ended time series cannot be persisted yet for another
hour. In that scenario, this change will help most, and it's exactly
the scenario where help is most desperately needed.
2017-03-26 23:44:50 +02:00
beorn7
04ccf84559 main.go: Set GOGC to 40 by default
Rationale: The default value for GOGC is 100, i.e. a garbage collection
is initiated once as much heap space has been allocated as was in
use after the last GC was done. This ratio doesn't make a lot of sense
in Prometheus, as typically about 60% of the heap is allocated for
long-lived memory chunks (most of which are around for many hours if
not days). Thus, short-lived heap objects are accumulated for quite
some time until they finally match the large amount of memory used by
bulk memory chunks and a gigantic GC cycle is invoked. With GOGC=40, we
are essentially reinstating "normal" GC behavior by acknowledging that
about 60% of the heap is used for long-term bulk storage.

The median Prometheus production server at SoundCloud runs a GC cycle
every 90 seconds. With GOGC=40, a GC cycle is run every 35 seconds
(which is still not very often). However, the effective RAM usage is
now reduced by about 30%. If settings are updated to utilize more RAM,
the time between GC cycles goes up again (as the heap size is larger
with more long-lived memory chunks, but the frequency of creating
short-lived heap objects does not change). On a quite busy large
Prometheus server, the timing changed from one GC run every 20s to one
GC run every 12s.

In the former case (just changing GOGC, leaving everything else as it
is), the CPU usage increases by about 10% (on a mid-size reference
server from 8.1 to 8.9). If settings are adjusted, the CPU
consumption increases more drastically (from 8 cores to 13 cores on a
large reference server), despite GCs happening more rarely, presumably
because a 50% larger set of memory chunks is managed now. Having more
memory chunks is good in many regards, and most servers are running
out of memory long before they run out of CPU cycles, so the tradeoff
is overwhelmingly positive in most cases.

Power users can still set the GOGC environment variable as usual, as
the implementation in this commit honors an explicitly set variable.
2017-03-26 21:55:37 +02:00
Julius Volz
8fda83ea12 Make rules only read local data 2017-03-21 00:50:04 +01:00
Julius Volz
406b65d0dc Rename remote.Storage to remote.Writer 2017-03-20 13:15:28 +01:00
Julius Volz
02395a224d [WIP] Remote Read 2017-03-20 13:13:44 +01:00
Fabian Reinartz
b586781283 *: update tsdb vendoring and add retention flag 2017-03-17 16:06:04 +01:00
Goutham Veeramachaneni
f35816613e
Refactored Notifier to use Registerer
* Brought metrics back into Notifier

Notifier still implements a Collector. Check if that is needed.
2017-03-03 02:53:16 +05:30
Fabian Reinartz
9304179ef7 Merge branch 'master' into dev-2.0 2017-03-02 08:16:58 +01:00
Fabian Reinartz
4397b4d508 *: pass Prometheus registry into storage 2017-02-28 09:33:14 +01:00
Julius Volz
beb3c4b389 Remove legacy remote storage implementations
This removes legacy support for specific remote storage systems in favor
of only offering the generic remote write protocol. An example bridge
application that translates from the generic protocol to each of those
legacy backends is still provided at:

documentation/examples/remote_storage/remote_storage_bridge

See also https://github.com/prometheus/prometheus/issues/10

The next step in the plan is to re-add support for multiple remote
storages.
2017-02-14 17:52:05 +01:00
Fabian Reinartz
ea3ba338dd main: add flags for new storage 2017-02-05 18:22:06 +01:00
Fabian Reinartz
5772f1a7ba retrieval/storage: adapt to new interface
This simplifies the interface to two add methods: appends with labels,
or faster appends with reference numbers.
2017-02-02 13:05:46 +01:00
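Here is a toy sketch of such a two-method append interface. `Add` resolves a series by its labels and returns a reference. `AddFast` skips the label lookup by using a reference the caller cached earlier. The names, the `uint64` reference type and the in-memory implementation are hypothetical, not the actual storage interface.

```go
package main

import (
	"errors"
	"sort"
	"strings"
)

var errNotFound = errors.New("not found")

type appender interface {
	Add(lset map[string]string, t int64, v float64) (uint64, error)
	AddFast(ref uint64, t int64, v float64) error
}

type sample struct {
	t int64
	v float64
}

// memAppender is a toy in-memory implementation.
type memAppender struct {
	refs   map[string]uint64 // canonical label string -> reference
	series map[uint64][]sample
}

func newMemAppender() *memAppender {
	return &memAppender{refs: map[string]uint64{}, series: map[uint64][]sample{}}
}

// key renders labels in sorted order so equal label sets map to one series.
func key(lset map[string]string) string {
	names := make([]string, 0, len(lset))
	for n := range lset {
		names = append(names, n)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, n := range names {
		b.WriteString(n + "=" + lset[n] + ",")
	}
	return b.String()
}

// Add does the (slower) label lookup, creating the series if needed.
func (a *memAppender) Add(lset map[string]string, t int64, v float64) (uint64, error) {
	k := key(lset)
	ref, ok := a.refs[k]
	if !ok {
		ref = uint64(len(a.refs) + 1)
		a.refs[k] = ref
		a.series[ref] = nil // register the series so AddFast accepts ref
	}
	return ref, a.AddFast(ref, t, v)
}

// AddFast appends by reference; unknown references yield errNotFound.
func (a *memAppender) AddFast(ref uint64, t int64, v float64) error {
	s, ok := a.series[ref]
	if !ok {
		return errNotFound
	}
	a.series[ref] = append(s, sample{t, v})
	return nil
}

var _ appender = (*memAppender)(nil)
```

A scraper would typically cache the reference returned by `Add` and use `AddFast` on later scrapes, falling back to `Add` when it gets `errNotFound`.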
Fabian Reinartz
1d3cdd0d67 Merge branch 'master' into dev-2.0-rebase 2017-01-30 17:43:01 +01:00
Fabian Reinartz
035976b275 retrieval: handle not found error correctly 2017-01-20 11:27:01 +01:00