Commit Graph

3932 Commits

Author SHA1 Message Date
beorn7
69eddc9e84 storage: Correctly increase prometheus_local_storage_open_head_chunks 2017-05-08 18:20:23 +02:00
Frederic Branczyk
0c96c4b157
notifier: expose metric for number of discovered alertmanagers 2017-05-08 10:37:19 +02:00
Fabian Reinartz
aaaec6431e Merge pull request #2642 from bakins/kubernetes-namespaces
Allow limiting Kubernetes service discover to certain namespaces
2017-05-04 07:36:21 +02:00
Tom Wilkie
2195bb66f7 Ensure ewma int64s are always aligned. (#2675) 2017-05-03 14:32:50 -05:00
Tom Wilkie
4d9b917d11 Instrument Prometheus with OpenTracing (#2554)
* Use request.Context() instead of a global map of contexts.

* Add some basic opentracing instrumentation on the query path.

* Remove tracehandler endpoint.
2017-05-02 18:49:29 -05:00
Stephan Erb
0b9fca983b Fix reload of ZooKeeper service discovery config (#2669)
Rational:

* When the config is reloaded and the provider context is canceled, we need to
  exit the current ZK `TargetProvider.Run` method as a new provider will be
  instantiated.
* In case `Stop` is called on the `ZookeeperTreeCache`, the update/events
  channel may not be closed as it is shared by multiple caches and would
  thus be double closed.
* Stopping all `zookeeperTreeCacheNode`s on teardown ensures all associated
  watcher go-routines will be closed eagerly rather than implicityly on
  connection close events.
2017-05-02 18:21:37 -05:00
Fabian Reinartz
86426c0566 Merge pull request #2672 from svend/kubernetes-pods-port-comment
Document what ports are scraped by default in k8s example
2017-05-02 11:12:13 +02:00
Svend Sorensen
94a3e863e4 Document what ports are scraped by default in k8s example
The Kubernetes pod SD creates a target for each declared port, as documented:

https://prometheus.io/docs/operating/configuration/#pod

> The pod role discovers all pods and exposes their containers as targets. For
> each declared port of a container, a single target is generated. If a
> container has no specified ports, a port-free target per container is created
> for manually adding a port via relabeling.

This results in the default port being the declared port, or no port if none are
declared.
2017-05-01 15:58:48 -07:00
Conor Broderick
314b81062d Updated vendoring for log level reporting issue (#2660) 2017-04-27 14:25:13 +01:00
Brian Akins
27d66628a1 Allow limiting Kubernetes service discover to certain namespaces
Allow namespace discovery to be more easily extended in the future by using a struct rather than just a list.

Rename fields for kubernetes namespace discovery
2017-04-27 07:41:36 -04:00
Julius Volz
fe11c5933a Fix mutation of active alert elements by notifier (#2656)
This caused the external label application in the notifier to bleed back
into the rule manager's active alerting elements.
2017-04-26 10:29:42 -05:00
Fabian Reinartz
5248118b10 Merge pull request #2654 from dsymonds/master
Add maintainers' GitHub usernames to MAINTAINERS.md.
2017-04-25 08:43:36 +02:00
David Symonds
8bb07490a2 Add maintainers' GitHub usernames to MAINTAINERS.md.
CONTRIBUTING.md instructs people to loop them in using that mechanism,
but nothing lists the right username.
2017-04-25 16:32:23 +10:00
Fabian Reinartz
60d9138b6b Merge pull request #2653 from dsymonds/master
Preserve Alertmanager URLs as *url.URL.
2017-04-25 08:27:31 +02:00
David Symonds
04ad889751 Preserve Alertmanager URLs as *url.URL.
Render a nicer link in the web UI.
2017-04-25 16:17:46 +10:00
Conor Broderick
9eb1a5d6bf Handle invalid query in graph UI (#2652) 2017-04-24 10:50:57 +01:00
Brian Brazil
8b8ba26129 Merge pull request #2644 from prometheus/release-1.6
Merge 1.6.1 release from 1.6 branch
2017-04-19 15:22:24 +01:00
Brian Brazil
8097a3c523 Cut v1.6.1 (#2640) 2017-04-19 14:23:56 +01:00
beorn7
e499ef8cac Merge bug fixes from branch 'release-1.6' 2017-04-18 18:06:01 +02:00
Björn Rabenstein
872ed88166 Merge pull request #2638 from prometheus/beorn7/storage
storage: Don't panic if storage has no FPs even after initial wait
2017-04-18 17:02:07 +02:00
beorn7
1dd737d7c3 storage: Don't panic if storage has no FPs even after initial wait 2017-04-18 15:59:12 +02:00
Matt Layher
1faf33acac Add promlint check for histogram/summary reserved names (#2626) 2017-04-15 22:38:01 +01:00
Tobias Schmidt
09a977a782 Create sha256 checksums file during release 2017-04-15 12:26:51 -03:00
Tobias Schmidt
619cc0e0ff Merge pull request #2625 from mdlayher/promlint-cleanup
Simplify promlint problems gathering, use protobuf accessors
2017-04-14 22:47:30 +02:00
Matt Layher
cc4198f421
Simplify promlint problems gathering, use protobuf accessors 2017-04-14 16:40:40 -04:00
Matt Layher
34a4813464 Initial promlint counter _total suffix check (#2624) 2017-04-14 22:09:54 +02:00
Matt Layher
254cb1ec29 Use untyped metrics for some promlint tests (#2623) 2017-04-14 19:38:57 +01:00
Björn Rabenstein
67d511784d Merge pull request #2619 from prometheus/release-1.6
Cut v1.6.0
2017-04-14 20:12:22 +02:00
beorn7
10f6453829 Cut v1.6.0 2017-04-14 19:53:58 +02:00
Jack Neely
896f951e68 Force buckets in a histogram to be monotonic for quantile estimation (#2610)
* Force buckets in a histogram to be monotonic for quantile estimation

The assumption that bucket counts increase monotonically with increasing
upperBound may be violated during:

  * Recording rule evaluation of histogram_quantile, especially when rate()
     has been applied to the underlying bucket timeseries.
  * Evaluation of histogram_quantile computed over federated bucket
     timeseries, especially when rate() has been applied

This is because scraped data is not made available to RR evalution or
federation atomically, so some buckets are computed with data from the N
most recent scrapes, but the other buckets are missing the most recent
observations.

Monotonicity is usually guaranteed because if a bucket with upper bound
u1 has count c1, then any bucket with a higher upper bound u > u1 must
have counted all c1 observations and perhaps more, so that c  >= c1.

Randomly interspersed partial sampling breaks that guarantee, and rate()
exacerbates it. Specifically, suppose bucket le=1000 has a count of 10 from
4 samples but the bucket with le=2000 has a count of 7, from 3 samples. The
monotonicity is broken. It is exacerbated by rate() because under normal
operation, cumulative counting of buckets will cause the bucket counts to
diverge such that small differences from missing samples are not a problem.
rate() removes this divergence.)

bucketQuantile depends on that monotonicity to do a binary search for the
bucket with the qth percentile count, so breaking the monotonicity
guarantee causes bucketQuantile() to return undefined (nonsense) results.

As a somewhat hacky solution until the Prometheus project is ready to
accept the changes required to make scrapes atomic, we calculate the
"envelope" of the histogram buckets, essentially removing any decreases
in the count between successive buckets.

* Fix up comment docs for ensureMonotonic

* ensureMonotonic: Use switch statement

Use switch statement rather than if/else for better readability.
Process the most frequent cases first.
2017-04-14 16:21:49 +02:00
Matt Layher
283756c503 Initial commit of 'promtool check-metrics', promlint package (#2605) 2017-04-13 23:53:41 +02:00
Conor Broderick
ee62807b62 Added min/max to graph to accomodate for constant time series (#2612)
Added min/max to graph to accommodate constant time series
2017-04-12 14:25:25 +01:00
Björn Rabenstein
1fb2190eeb Merge pull request #2607 from prometheus/beorn7/storage
Vendoring update prior to 1.6 release
2017-04-11 14:31:58 +02:00
beorn7
c53f256a09 storage: Fix use of counter (Set -> Add) 2017-04-11 12:58:24 +02:00
beorn7
1ae50b1d1b vendoring: Update client_golang/prometheus
This is mostly required to enable summaries without quantiles
2017-04-11 12:58:24 +02:00
beorn7
92d4cf7663 vendoring: Remove unused packages 2017-04-11 12:58:24 +02:00
Brian Brazil
0e0fc5a7f4 Correct example name to adapter. (#2590) 2017-04-10 17:24:53 +01:00
Björn Rabenstein
acd72ae1a7 Merge pull request #2591 from prometheus/beorn7/storage
storage: Several optimizations of checkpointing
2017-04-07 20:02:14 +02:00
Goutham Veeramachaneni
cffb1acf7f Test Longer Tests in Travis (#2570)
* Test Longer Tests in Travis

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>

* Make test Target Run All Tests

* Add test-short to run short tests

test is running all the tests now as we are running make tests in
CircleCI and I think the base image is shared across Prometheus Org.

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>

* Remove Empty Line

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-04-07 13:46:06 +02:00
beorn7
f20b84e816 flags: Improve doc strings for checkpoint flags 2017-04-07 13:10:12 +02:00
beorn7
f338d791d2 storage: Several optimizations of checkpointing
- checkpointSeriesMapAndHeads accepts a context now to allow
  cancelling.

- If a shutdown is initiated, cancel the ongoing checkpoint. (We will
  create a final checkpoint anyway.)

- Always wait for at least as long as the last checkpoint took before
  starting the next checkpoint (to cap the time spending checkpointing
  at 50%).

- If an error has occurred during checkpointing, don't bother to sync
  the write.

- Make sure the temporary checkpoint file is deleted, even if an error
  has occurred.

- Clean up the checkpoint loop a bit. (The concurrent Timer.Reset(0)
  call might have cause a race.)
2017-04-07 13:10:12 +02:00
Björn Rabenstein
934d86b936 Merge pull request #2593 from prometheus/beorn7/storage2
storage: Recover from corrupted indices for archived series
2017-04-07 12:55:35 +02:00
Goutham Veeramachaneni
0f48d07f95 Fix Map Race by Moving Locking closer to the Write (#2476) 2017-04-07 08:55:01 +02:00
Julius Volz
182d7de9cd Merge pull request #2597 from richardkiene/CMON-53
Add triton zone brand metadata
2017-04-07 01:02:02 +02:00
Björn Rabenstein
38bcba11fe Merge pull request #2594 from prometheus/beorn7/storage3
storage: Guard against a corner case of data corruption
2017-04-07 00:52:28 +02:00
Björn Rabenstein
f0076aca01 Merge pull request #2595 from prometheus/beorn7/storage4
storage: Guard against appending to evicted chunk
2017-04-07 00:51:53 +02:00
Tom Wilkie
e5d7bbfc3c Remote writes: retry on recoverable errors. (#2552)
* Remote writes: retry on recoverable errors.

* Add comments

* Review feedback

* Comments

* Review feedback

* Final spelling misteak (I hope).  Plus, record failed samples correctly.
2017-04-07 00:15:41 +02:00
Richard Kiene
ec692f6161 Add triton zone brand metadata 2017-04-06 21:35:42 +00:00
beorn7
7199a9d9d4 storage: Guard against appending to evicted chunk
Fixes #2480. For certain definition of "fixes".

This is something that should never happen. Sadly, it does happen,
albeit extremely rarely. This could be some weird cornercase we
haven't covered yet. Or it happens as a consequesnce of data
corruption or a crash recovery gone bad.

This is not a "real" fix as we don't know the root cause of the
incident reported in #2480. However, this makes sure the server does
not crash, but deals gracefully with the problem: The series in
question is quarantined, which even makes it available for forensics.
2017-04-06 20:02:52 +02:00
beorn7
3d12906286 storage: Guard against a corner case of data corruption
Fixes #2475.
2017-04-06 19:50:32 +02:00