Commit Graph

266 Commits

Author SHA1 Message Date
Brian Brazil
0509b0f2db Expand alert templates at eval time.
Fixes #1678 #1677
2016-07-12 17:13:55 +01:00
beorn7
064b57858e Consistently use the Seconds() method for conversion of durations
This also fixes one remaining case of recording integral numbers
of seconds only for a metric, i.e. this will probably fix #1796.
2016-07-07 15:24:35 +02:00
Fabian Reinartz
f7ed2ff706 Merge pull request #1644 from prometheus/beorn7/logging
Add missing logging of out-of-order samples
2016-05-20 05:52:00 -07:00
beorn7
b95c096a45 Fix style issues in rules/... 2016-05-19 16:59:53 +02:00
beorn7
45e5775f9b Add missing logging of out-of-order samples
So far, out-of-order samples during rule evaluation were not logged,
and neither scrape health samples. The latter are unlikely to cause
any errors. That's why I'm logging them always now. (It's alway highly
irregular should it happen.) For rules, I have used the same plumbing
as for samples, just with a different wording in the message to mark
them as a result of rule evaluation.
2016-05-19 16:22:53 +02:00
beorn7
4b574e8a61 Switch chunk encoding to type 2 where it was hardcoded type 1 before
The chunk encoding was hardcoded there because it mostly doesn't
matter what encoding is chosen in that test. Since type 1 is
battle-hardened enough, I'm switching to type 2 here so that we can
catch unexpected problems as a byproduct. My expectation is that the
chunk encoding doesn't matter anyway, as said, but then "unexpected
problems" contains the word "unexpected".
2016-03-20 23:32:20 +01:00
Fabian Reinartz
d89c254849 Make copying alerting state safer.
This considers static labels in the equality of alerts to
avoid falsely copying state from a different alert definition with
the same name across reloads.

To be safe, it also copies the state map rather than just its pointer
so that remaining collisions disappear after one evaluation interval.
2016-03-02 12:21:54 +01:00
Fabian Reinartz
bfa8aaa017 Rename notification to notifier 2016-03-01 12:39:08 +01:00
beorn7
663a1550d0 Fix the instrumentation fixes 2016-02-17 15:50:55 +01:00
Tobias Schmidt
f1f8317fa5 Fix detection of flapping alerts
Alerts in the resolve retention period must be transitioned to the
active state again when their condition is met.
2016-02-04 23:55:12 -05:00
Björn Rabenstein
9ea3897ea7 Merge pull request #1354 from prometheus/beorn7/storage
Rework the way to communicate backpressure (AKA suspended ingestion)
2016-02-01 15:10:13 +01:00
beorn7
ec08c9a391 Rework the way to communicate backpressure (AKA suspended ingestion)
This gives up on the idea to communicate throuh the Append() call (by
either not returning as it is now or returning an error as
suggested/explored elsewhere). Here I have added a Throttled() call,
which has the advantage that it can be called before a whole _batch_
of Append()'s. Scrapes will happen completely or not at all. Same for
rule group evaluations. That's a highly desired behavior (as discussed
elsewhere). The code is even simpler now as the whole ingestion buffer
could be removed.

Logging of throttled mode has been streamlined and will create at most
one message per minute.
2016-02-01 14:45:44 +01:00
beorn7
a7408bfb47 Unify duration parsing
It's actually happening in several places (and for flags, we use the
standard Go time.Duration...). This at least reduces all our
home-grown parsing to one place (in model).
2016-01-29 15:41:50 +01:00
Fabian Reinartz
a6935024e1 Remove old WITH clause in alert printing 2016-01-26 15:45:27 +01:00
Fabian Reinartz
b0adfea8d5 Fix swapped constants, improve instrumentation 2016-01-21 12:15:29 +01:00
Fabian Reinartz
a8c38c3ac5 Don't log rule evaluation failure on shutdown 2016-01-18 17:34:25 +01:00
Fabian Reinartz
6eee86dce8 Terminate rule groups during initial sleep
When an evaluation group runs initially, it waits a deterministic
amount of time. During that time it also has to accept
a termination singnal so shutdown doesn't hang during the first
evaluation iteration after a configuration reload.

Fixes #1307
2016-01-12 10:54:09 +01:00
Fabian Reinartz
26eb3ac2f8 Don't skip recording rule errors 2016-01-12 10:26:06 +01:00
Fabian Reinartz
37d80c4b25 Fix premature rule evaluation
This commit prevents rule evaluation from starting until after
the storage is ready.
2016-01-08 17:51:22 +01:00
Fabian Reinartz
0cf3c6a9ef Add comments, rename a method 2015-12-23 12:29:28 +01:00
Fabian Reinartz
bf6abac8f4 Send resolved notifications 2015-12-17 15:42:26 +01:00
Fabian Reinartz
f69e668fc4 Improve rules/ instrumentation
This commit adds a counter for the total number of rule evaluations
and standardizes the units to seconds.
2015-12-17 15:42:26 +01:00
Fabian Reinartz
62075aa037 Reduce noisy no-alertmanager warning 2015-12-17 15:42:26 +01:00
Fabian Reinartz
52e5224f5a Refactor rules/ package 2015-12-17 15:42:25 +01:00
Fabian Reinartz
e4fabe135a Set StartsAt to time of first firing state 2015-12-17 11:36:58 +01:00
Fabian Reinartz
7c90db22ed Use annotation based alerts in rules/
This commit breaks the previously used alert format.
2015-12-14 10:16:07 +01:00
Fabian Reinartz
e114ce0ff7 Refactor notification handler 2015-12-11 15:17:32 +01:00
Fabian Reinartz
e3b6ec9784 Switch to common/log 2015-10-03 10:21:43 +02:00
Fabian Reinartz
171f50706a Fix unkeyed field errors. 2015-09-18 17:00:08 +02:00
Brian Brazil
4d196fea6b Merge pull request #1032 from prometheus/scalar-metric
rules: Allow for setting labels on LHS on scalars
2015-08-26 16:56:16 +01:00
Brian Brazil
3bcdb2bbba rules: Allow for setting labels on LHS on scalars 2015-08-26 16:54:28 +01:00
Julius Volz
995d3b831d Fix most golint warnings.
This is with `golint -min_confidence=0.5`.

I left several lint warnings untouched because they were either
incorrect or I felt it was better not to change them at the moment.
2015-08-26 12:44:46 +02:00
Fabian Reinartz
d6b8da8d43 Switch promql types to common/model 2015-08-25 13:49:14 +02:00
Brian Brazil
fdf0d0642e Cast value to float, as that's what the console templates expect. 2015-08-24 16:59:08 +01:00
Fabian Reinartz
438e232c9b Fix grouping of import blocks 2015-08-22 09:42:45 +02:00
Fabian Reinartz
306e8468a0 Switch from client_golang/model to common/model 2015-08-21 13:33:38 +02:00
Brian Brazil
e6a67476c2 rules: Allow recorded rules expressions to be scalars.
This is useful if you want to build up a constant metric,
such as a set of alert thresholds that vary by label value.
2015-08-19 21:09:00 +01:00
Fabian Reinartz
7a67472fc1 Resolve relative paths on configuration loading
This moves the concern of resolving the files relative to the config
file into the configuration loading itself.
It also fixes #921 which did not load the cert and token files relatively.
2015-08-05 18:08:04 +02:00
Fabian Reinartz
feb8a03503 rules: load rule files relative to a base dir 2015-07-03 15:10:37 +02:00
Julius Volz
fcff35b43e Consolidate external reachability flags into one.
Besides fixing https://github.com/prometheus/prometheus/issues/805 by
making the entire externally reachable server URL configurable, this
adds tests for the "globalURL" template function and makes it easier to
test other such functions in the future.

This breaks the `web.Hostname` flag (and introduces `web.external-url`).
This flag is likely only used by few users, so I hope that's
justifiable.

Fixes https://github.com/prometheus/prometheus/issues/805
2015-07-03 13:39:10 +02:00
Fabian Reinartz
f06cf664e1 rules: cleanup alerting test 2015-06-30 18:22:24 +02:00
Fabian Reinartz
9bd4f6d017 rules: preserve alert state across reloads. 2015-06-30 11:32:07 +02:00
Fabian Reinartz
4625485b84 rules: move rules*.go contents to manager*.go 2015-06-30 11:32:07 +02:00
Fabian Reinartz
749ae450c5 promql: add runbook to alert statement.
This commit adds the RUNBOOK keyword to alert statements. The field
is optional and expected to be a link.
2015-06-25 13:00:52 +02:00
Julius Volz
d868264bb8 Improve UI of /alerts page.
Changes to the UI:
- "Active Since" timestamps are now human-readable.
- Alerting rules are now pretty-printed better.
- Labels are no longer just strings, but alert bubbles (like we do on
  the status page for base labels).
- Alert states and target health states are now capitalized in the
  presentation layer rather than at the source.
2015-06-23 18:48:45 +02:00
Fabian Reinartz
fe301d7946 promql: remove global flags 2015-06-15 19:01:06 +02:00
Fabian Reinartz
5e13880201 General cleanup of rules. 2015-06-06 21:40:52 +02:00
Fabian Reinartz
75c920c95e Remove DotGraph method from Rule interface 2015-06-06 21:35:59 +02:00
Fabian Reinartz
83d07516e8 Remove EvalRaw methods from Rule interface 2015-06-06 21:34:09 +02:00
Fabian Reinartz
280d11dca8 main: exit on invalid rule files on startup. 2015-06-02 18:44:41 +02:00