Commit Graph

211 Commits

Author SHA1 Message Date
Julius Volz
3d47f94149 Drop metric names after transformations.
After many transformations, it doesn't make sense to keep the metric
names, since the result of the transformation is no longer that metric.
This drops the metric name after such transformations and makes the web
UI deal well with missing metric names.

This depends on the current branch on the following things:

- prometheus/client_golang needs to be at
  e237cf15c6
  in branch "julius/int-fingerprints" (to be merged with new storage)

- prometheus/promdash needs to be at
  dd7691c9c2

Change-Id: Ib3c8cad8d647d9854e8c653c424b8c235ccc231d
2014-11-25 17:13:04 +01:00
Bjoern Rabenstein
14bda4180c Changes after pair code review.
Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455
2014-11-25 17:12:59 +01:00
Bjoern Rabenstein
006b5517e2 Simplify makefiles.
This removes the dependancy on C leveldb and snappy.
It also takes care of fewer dependencies as they would
anyway not work on any non-Debian, non-Brew system.

Change-Id: Ia70dce1ba8a816a003587927e0b3a3f8ad2fd28c
2014-11-25 17:10:39 +01:00
Julius Volz
c3fcea45e3 Support finer time resolutions than 1 second.
Change-Id: I4c5f1d6d2361e841999b23283d1961b1bd0c2859
2014-11-25 17:09:04 +01:00
Brian Brazil
f114bbd4e7 Make query_range more robust.
Gracefully handle decimal values, by truncating them.

Limit amount of steps, to avoid accidentally pulling too much data.
This limit returns up to ~500kB per timeseries, and allows
for 60s granularity for a week and 1h granularity for a year.

Change-Id: Ie549fc24deb2eecbc6c5d1b6088a548a6b02e849
2014-11-25 17:09:04 +01:00
Brian Brazil
75e37db55b Don't alert() when a query is aborted,
such as when you change the range.

Change-Id: I574504f97446ac5f3dda737fe054ae83f17dbbc2
2014-11-25 17:09:04 +01:00
Brian Brazil
fd34e4061d Add back consoles link.
Goes in index.html in consoles or else user data, if present.

Change-Id: I5303d30aa24ca0c20d2e0f49121e04a260b9c4f4
2014-11-25 17:08:26 +01:00
Andres Suarez
e389e63684 Focus expression after selection from dropdown
Change-Id: Id7f67e558e3611ab4c7188cc428c342d8d3e67db
2014-11-25 17:08:26 +01:00
Andres Suarez
86a447fc0e Allow selecting metric from Insert Metric
Change-Id: I99e0539cab2749a8aeabc0a13015889ff45834f7
2014-11-25 17:08:26 +01:00
Bjoern Rabenstein
71206dbc06 More code cleanups.
Add license text everywhere.
And others....

Change-Id: I11ccde267a2ef7eb366c4788ba7aeae14ba7545c
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein
f5f9f3514a Major code cleanup.
- Make it go-vet and golint clean.
- Add comments, TODOs, etc.

Change-Id: If1392d96f3d5b4cdde597b10c8dff1769fcfabe2
2014-11-25 17:02:53 +01:00
Julius Volz
e7ed39c9a6 Initial experimental snapshot of next-gen storage.
Change-Id: Ifb8709960dbedd1d9f5efd88cdd359ee9fa9d26d
2014-11-25 17:02:00 +01:00
Brian Brazil
4a2b96f848 Remove backoff on scrape failure.
Having metrics with variable timestamps inconsistently
spaced when things fail will make it harder to write correct rules.

Update status page, requires some refactoring to insert a function.

Change-Id: Ie1c586cca53b8f3b318af8c21c418873063738a8
2014-11-25 17:02:00 +01:00
Brian Brazil
f525ca5d9e Let consoles get graph links from experssions.
Rename ConsoleLinkFromExpression, as we now have consoles.

Change-Id: I7ed2c9c83863adb390b51121dd9736845f7bcdfc
2014-11-25 17:01:59 +01:00
Brian Brazil
eba205fcac Expose path used to get to console to console.
Change-Id: I72386a2d4e53863da302ecc5c7e44d6c310197e0
2014-11-25 17:01:59 +01:00
Brian Brazil
eb5d928da7 Fix console handler.
This was accidnetally broken in 2128d9d811.

Change-Id: I50ea1fdb8ae4d28ae4555410bee97e5037692aa5
2014-11-25 17:01:59 +01:00
Julius Volz
21cafe6cd7 Only evict memory series after they are on disk.
This fixes the problem where samples become temporarily unavailable for
queries while they are being flushed to disk. Although the entire
flushing code could use some major refactoring, I'm explicitly trying to
do the minimal change to fix the problem since there's a whole new
storage implementation in the pipeline.

Change-Id: I0f5393a30b88654c73567456aeaea62f8b3756d9
2014-11-25 17:01:59 +01:00
Bjoern Rabenstein
8956faeccb Migrate to new client_golang.
This change will only be submitted when the new client_golang has been
moved to the new version.

Change-Id: Ifceb59333072a08286a8ac910709a8ba2e3a1581
2014-11-25 17:01:59 +01:00
Brian Brazil
e27447da5c Remove the broken "User Dashboard" link.
Due to the lack of a </a>, this makes the entire header render badly.
Accordingly it's safe to assume noone is using it, so remove it.
With the new console template support, we'll need to something a bit
more nuanced later.

Change-Id: I3424bed6aea18cbd4c63ad48f98808098dadc3ad
2014-11-25 17:01:59 +01:00
Brian Brazil
960ede66dc Use html/template for console templates and add template libary support.
Add a function to bypass the new auto-escaping.
Add a function to workaround go's templates only allowing passing in one argument.

Change-Id: Id7aa3f95e7c227692dc22108388b1d9b1e2eec99
2014-11-25 17:01:59 +01:00
Brian Brazil
0f5874ff97 Make Prometheus in header link to status page.
This is consistent with alertmanager, and more intiutive for users.

The graphs page just has graphs, so remove mention of consoles.

Change-Id: I87780a4ade33697a6095423e1a7de47d341d2838
2014-11-25 17:01:59 +01:00
Brian Brazil
1828b1f55c Only log every query when debugging.
Change-Id: I4f988d81cda6f6deb0ed7f497de4aa75409b158f
2014-11-25 17:01:59 +01:00
Brian Brazil
e041c0cd46 Add console and alert templates with access to all data.
Move rulemanager to it's own package to break cicrular dependency.
Make NewTestTieredStorage available to tests, remove duplication.

Change-Id: I33b321245a44aa727bfc3614a7c9ae5005b34e03
2014-05-30 16:24:56 +01:00
Julius Volz
01f652cb4c Separate storage implementation from interfaces.
This was initially motivated by wanting to distribute the rule checker
tool under `tools/rule_checker`. However, this was not possible without
also distributing the LevelDB dynamic libraries because the tool
transitively depended on Levigo:

rule checker -> query layer -> tiered storage layer -> leveldb

This change separates external storage interfaces from the
implementation (tiered storage, leveldb storage, memory storage) by
putting them into separate packages:

- storage/metric: public, implementation-agnostic interfaces
- storage/metric/tiered: tiered storage implementation, including memory
                         and LevelDB storage.

I initially also considered splitting up the implementation into
separate packages for tiered storage, memory storage, and LevelDB
storage, but these are currently so intertwined that it would be another
major project in itself.

The query layers and most other parts of Prometheus now have notion of
the storage implementation anymore and just use whatever implementation
they get passed in via interfaces.

The rule_checker is now a static binary :)

Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0
2014-04-16 13:30:19 +02:00
Matt T. Proud
2064f32662 Clean up quitting behavior and add quit trigger.
The closing of Prometheus now using a sync.Once wrapper to prevent
any accidental multiple invocations of it, which could trigger
corruption or a race condition.  The shutdown process is made more
verbose through logging.

A not-enabled by default web handler has been provided to trigger a
remote shutdown if requested for debugging purposes.

Change-Id: If4fee75196bbff1fb1e4a4ef7e1cfa53fef88f2e
2014-04-15 21:40:04 +02:00
Julius Volz
cc04238a85 Switch to new "__name__" metric name label.
This also fixes the compaction test, which before worked only because
the input sample sorting was accidentally equal to the resulting on-disk
sample sorting.

Change-Id: I2a21c4b46ba562424b27058fc02eba84fa6a6006
2014-03-14 16:52:37 +01:00
Julius Volz
740d448983 Use custom timestamp type for sample timestamps and related code.
So far we've been using Go's native time.Time for anything related to sample
timestamps. Since the range of time.Time is much bigger than what we need, this
has created two problems:

- there could be time.Time values which were out of the range/precision of the
  time type that we persist to disk, therefore causing incorrectly ordered keys.
  One bug caused by this was:

  https://github.com/prometheus/prometheus/issues/367

  It would be good to use a timestamp type that's more closely aligned with
  what the underlying storage supports.

- sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit
  Unix timestamp (possibly even a 32-bit one). Since we store samples in large
  numbers, this seriously affects memory usage. Furthermore, copying/working
  with the data will be faster if it's smaller.

*MEMORY USAGE RESULTS*
Initial memory usage comparisons for a running Prometheus with 1 timeseries and
100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my
tests, this advantage for some reason decreased a bit the more samples the
timeseries had (to 5-7% for millions of samples). This I can't fully explain,
but perhaps garbage collection issues were involved.

*WHEN TO USE THE NEW TIMESTAMP TYPE*
The new clientmodel.Timestamp type should be used whenever time
calculations are either directly or indirectly related to sample
timestamps.

For example:
- the timestamp of a sample itself
- all kinds of watermarks
- anything that may become or is compared to a sample timestamp (like the timestamp
  passed into Target.Scrape()).

When to still use time.Time:
- for measuring durations/times not related to sample timestamps, like duration
  telemetry exporting, timers that indicate how frequently to execute some
  action, etc.

*NOTE ON OPERATOR OPTIMIZATION TESTS*
We don't use operator optimization code anymore, but it still lives in
the code as dead code. It still has tests, but I couldn't get all of them to
pass with the new timestamp format. I commented out the failing cases for now,
but we should probably remove the dead code soon. I just didn't want to do that
in the same change as this.

Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f
2013-12-03 09:11:28 +01:00
Conor Hennessy
eba01d1119 Remove usage of gorest.
Due to on going issues, we've decided to remove gorest. It started with gorest
not being thread-safe (it does introspection to create a new handler which is
an easy process to mess up with multiple threads of execution):
    https://code.google.com/p/gorest/issues/detail?id=15
While the issue has been marked fixed, it looks like the patch has introduced
more problems than the original issue and simply doesn't work properly.
I'm not sure the behaviour was thought through properly. If a new instance is
needed every request then a handler-factory is needed or the library needs to
set expectations about how the new objects should interact with their
constructor state.
While it was tempting to try out another routing library, I think for now
it's better to use dumb vanilla Go routing. At least until we decide which
URL format we intend to standardize on.

Change-Id: Ica3da135d05f8ab8fc206f51eeca4f684f8efa0e
2013-10-23 14:19:14 +02:00
Julius Volz
a50ee8df30 Always set CORS headers at beginning of API handler.
Change-Id: Icde9a74260c4bb919f09c3e10c6dd5f372ccdaec
2013-10-16 15:59:47 +02:00
Matt T. Proud
4a87c002e8 Update low-level i'faces to reflect wireformats.
This commit fixes a critique of the old storage API design, whereby
the input parameters were always as raw bytes and never Protocol
Buffer messages that encapsulated the data, meaning every place a
read or mutation was conducted needed to manually perform said
translations on its own.  This is taxing.

Change-Id: I4786938d0d207cefb7782bd2bd96a517eead186f
2013-09-04 17:13:58 +02:00
Julius Volz
788587426b Make scrape timeouts configurable per job.
Change-Id: I77a7514ad9e7969771f873d63d6353ec50082a62
2013-08-19 12:21:47 +02:00
Matt T. Proud
972e856d9b Kill the curation state channel.
The use of the channels for curation state were always unidiomatic.

Change-Id: I1cb1d7175ebfb4faf28dff84201066278d6a0d92
2013-08-13 17:20:22 +02:00
Julius Volz
0003027dce Add needed trailing spaces in logs. 2013-08-12 18:22:48 +02:00
Julius Volz
aa5d251f8d Use github.com/golang/glog for all logging. 2013-08-12 17:54:36 +02:00
Julius Volz
ecf0ee8f39 Transfer alerting rule and Prometheus URL to alertmanager. 2013-08-09 18:32:13 +02:00
Matt T. Proud
07ac921aec Code Review: First pass. 2013-08-05 17:31:49 +02:00
Matt T. Proud
d8792cfd86 Extract HighWatermarking.
Clean up the rest.
2013-08-05 11:03:03 +02:00
Julius Volz
fcf784c13c Fix query error notification in tabular view.
Instead of "Unsupported value type" when type="error", the delivered
error message should be shown.
2013-08-02 09:04:13 +02:00
Julius Volz
35ee2cd3cb Add alertmanager notification support to Prometheus.
Alert definitions now also have mandatory SUMMARY and DESCRIPTION fields
that get sent along a firing alert to the alert manager.
2013-07-30 17:23:41 +02:00
Julius Volz
4e941255d8 Add caching to static assets when served from blob handler. 2013-07-24 18:52:57 +02:00
Julius Volz
1b9cbaf842 Bootstrappify remaining status pages. 2013-07-24 16:09:34 +02:00
Julius Volz
481ee4096b Add no-op silencing links. 2013-07-24 15:09:42 +02:00
Julius Volz
d9f403ab7d Prettify/Bootstrapify alert tables. 2013-07-24 15:03:13 +02:00
Julius Volz
f665534b61 Make quote and semicolon usage consistent in graph.js 2013-07-24 12:29:03 +02:00
Julius Volz
c91c100102 Fix graph resize bug when no graph exists. 2013-07-24 12:29:03 +02:00
Julius Volz
9f07f8677a Generate tabular console view from JSON data. 2013-07-24 12:28:59 +02:00
Sabra Melamed
22ab2366c1 Replacing interface components with Bootstrap.
This commit includes Bootstrap 2.3.2 and swaps a multitude of graph,
status, and other components to Bootstrap-based widgets.
2013-07-23 20:58:55 +02:00
Matt T. Proud
f7704af4f8 Code Review: Formatting comments. 2013-07-15 15:12:01 +02:00
Matt T. Proud
06b4a40661 Represent targets in a tabular interface.
This commit represents a target group's endpoints in a tabular fashion for better differentiation
of their state in a concise manner.
2013-07-15 15:12:01 +02:00
Julius Volz
f42adc1cc0 Display Y-axis outside of graph. 2013-07-01 14:47:43 +02:00
Julius Volz
1aa8f071b9 Add content compression support to API HTTP responses. 2013-06-28 16:56:44 +02:00
Matt T. Proud
30b1cf80b5 WIP - Snapshot of Moving to Client Model. 2013-06-25 15:52:42 +02:00
Julius Volz
0226d1ac7a Implement alerts dashboard and expression console links. 2013-06-13 22:35:40 +02:00
Johannes 'fish' Ziemke
005d65868a Merge pull request #294 from prometheus/remove-gvm
Remove gvm
2013-06-13 05:41:29 -07:00
Johannes 'fish' Ziemke
56249320e3 Remove gvm on travis. 2013-06-13 14:36:00 +02:00
Julius Volz
1fe3d3b06b Remove obsolete argument from target handling code. 2013-06-11 17:54:58 +02:00
Julius Volz
ba29d07901 Show loaded rules in Status dashboard. 2013-06-11 11:39:31 +02:00
Matt T. Proud
a73f061d3c Persist solely Protocol Buffers.
An design question was open for me in the beginning was whether to
serialize other types to disk, but Protocol Buffers quickly won out,
which allows us to drop support for other types.  This is a good
start to cleaning up a lot of cruft in the storage stack and
can let us eventually decouple the various moving parts into
separate subsystems for easier reasoning.

This commit is not strictly required, but it is a start to making
the rest a lot more enjoyable to interact with.
2013-06-08 11:02:35 +02:00
Bernerd Schaefer
f7a2436665 Include link to user dashboard when provided 2013-06-07 11:17:17 +02:00
Bernerd Schaefer
1d794896ac Support user-provided static asset directory
[fix #159]
2013-06-07 10:25:12 +02:00
Julius Volz
51689d965d Add debug timers to instant and range queries.
This adds timers around several query-relevant code blocks. For now, the
query timer stats are only logged for queries initiated through the UI.
In other cases (rule evaluations), the stats are simply thrown away.

My hope is that this helps us understand where queries spend time,
especially in cases where they sometimes hang for unusual amounts of
time.
2013-06-05 18:32:54 +02:00
Matt T. Proud
0d2d6e9a27 Include uptime in the status console.
In order to help corroborate whether a Prometheus instance has
flapped until meta-monitoring is in-place, we ought to provide the
instance's start time in the console to aid in diagnostics.
2013-05-24 10:44:34 +02:00
Julius Volz
8586c7520c Support negative graph values.
Currently graph Y-Axes were hardcoded to start at 0. Choose the Y-scale
automatically based on the graph data instead.
2013-05-21 16:54:33 +02:00
Julius Volz
081191afb8 Remember and display last scrape errors in web UI. 2013-05-21 15:31:27 +02:00
Matt T. Proud
1a95406b81 Include forgotten databases.html. 2013-05-14 14:50:54 +02:00
Matt T. Proud
b224251981 Simplify compaction and expose database sizes.
This commit simplifies the way that compactions across a database's
keyspace occur due to reading the LevelDB internals. Secondarily it
introduces the database size estimation mechanisms.

Include database health and help interfaces.

Add database statistics; remove status goroutines.

This commit kills the use of Go routines to expose status throughout
the web components of Prometheus. It also dumps raw LevelDB status
on a separate /databases endpoint.
2013-05-14 12:29:53 +02:00
Bernerd Schaefer
cdde766f39 Embed mutex on web status handler 2013-05-07 18:15:17 +02:00
Bernerd Schaefer
7740167654 Add comments about potential race conditions 2013-05-07 18:15:17 +02:00
Bernerd Schaefer
9183302b1f Web handler returns 404 for favicon requests 2013-05-07 18:15:17 +02:00
Matt Proud
7f0d816574 Schedule the background compactors to run.
This commit introduces three background compactors, which compact
sparse samples together.

1. Older than five minutes is grouped together into chunks of 50 every 30
   minutes.

2. Older than 60 minutes is grouped together into chunks of 250 every 50
   minutes.

3. Older than one day is grouped together into chunks of 5000 every 70
   minutes.
2013-05-07 17:14:04 +02:00
Julius Volz
af7920126c Fix build errors and add default build step to "make". 2013-05-07 15:54:41 +02:00
Julius Volz
56324d8ce2 Make AST query storage non-global. 2013-05-07 13:15:10 +02:00
Matt T. Proud
3b9b1c6ab4 Define dependencies for web. stack concretely.
This commit destroys the use of AppState, which makes passing
concrete state along to various serving components onerous.
2013-05-06 11:13:12 +02:00
juliusv
cfc3b1053d Merge pull request #212 from prometheus/ui/smaller-navigation-links
Restyle navigation a bit, align content elements with it
2013-05-03 07:00:09 -07:00
juliusv
2935476818 Merge pull request #211 from prometheus/feature/reorder-hud-elements
Move build info to the top of the status HUD.
2013-05-03 06:56:45 -07:00
Julius Volz
04e661c28f Move build info to the top of the status HUD. 2013-05-03 15:54:32 +02:00
Julius Volz
f3cf8eae7e Restyle navigation a bit, align content elements with it. 2013-05-03 15:49:08 +02:00
Johannes 'fish' Ziemke
c5e507cd9c Never submit empty queries. 2013-05-02 16:55:47 +02:00
juliusv
2b9ba56d61 Merge pull request #208 from prometheus/feature/toggle-console
Add the console to the main/graph ui.
2013-05-02 04:05:53 -07:00
Johannes 'fish' Ziemke
ba289ef7cd Add the console to the main/graph ui. 2013-05-02 12:19:34 +02:00
Julius Volz
9cea5d9df8 Convert the Prometheus configuration to protocol buffers. 2013-04-30 22:26:00 +02:00
Matt T. Proud
3362bf36e2 Include curator status in web heads-up-display. 2013-04-29 12:40:33 +02:00
Matt T. Proud
a48ab34dd0 Refresh Prometheus client API usage.
The client API has been updated per https://github.com/prometheus/client_golang/pull/9.
2013-04-28 19:40:30 +02:00
juliusv
169a7dc26c Merge pull request #189 from prometheus/feature/build-info-and-startup-friendliness
Build info and startup friendliness
2013-04-26 05:45:34 -07:00
Bernerd Schaefer
19fc094362 Merge pull request #191 from prometheus/update-gitignore-files
Ignore web/static/generated and build/root/share
2013-04-25 03:59:04 -07:00
Bernerd Schaefer
169ed9d297 Ignore web/static/generated and build/root/share 2013-04-25 12:33:27 +02:00
Matt T. Proud
961ff26874 Fix positional flags for `cp` on Darwin.
Unfortunately ``cp`` on Darwin regards some flags as positional and
requires them to be in a specific place.  The new Protocol Buffer
descriptor bundling fails on Mac OS.
2013-04-25 12:16:51 +02:00
Bernerd Schaefer
45243ac2da Print flags on status page. 2013-04-25 12:12:05 +02:00
Bernerd Schaefer
862054e88b web.StartServing prints listening address 2013-04-25 11:59:39 +02:00
Bernerd Schaefer
a2a4f94aae StatusHandler renders build info 2013-04-25 11:57:08 +02:00
Johannes 'fish' Ziemke
1f96d4c822 Move protobuf descriptor and add content-type.
- move to static/generated
- set content-type based on extension '.description'
2013-04-24 18:51:07 +02:00
Matt T. Proud
9e02c2393a Include generated Protocol Buffer descriptor.
The Protocol Buffer compiler supports generating a machine-readable
descriptor file encoded as a provided Protocol Buffer message type,
which can be used to decode messages that have been encoded with it
after-the-fact.  The generated descriptor also bundles in dependent
message types.

We can use this to perform forensics on old Prometheus clients, if
necessary.
2013-04-24 16:59:40 +02:00
Matt T. Proud
e86f4d9dfd Convert time readers to represent time in UTC.
Go's time.Time represents time as UTC in its fundamental data type.
That said, when using ``time.Unix(...)``, it sets the zone for the
time representation to the local.  Unfortunately with diagnosis and
our tests, it is a PITA to jump between various zones, even though
the serialized version remains the same.

To keep things easy, all places where times are generated or read
are converted into UTC.  These conversions are cheap, for
``Time.In`` merely changes a pointer reference in the struct,
nothing more.  This enables me to diagnose test failures with fixture
data very easily.
2013-04-24 12:19:41 +02:00
Johannes 'fish' Ziemke
955708e8db Merge pull request #158 from prometheus/feature/auto-refresh
Add per graph auto-refresh option to web UI.
2013-04-22 05:10:17 -07:00
Julius Volz
a2623efcdf Register pprof /debug endpoints with custom HTTP mux. 2013-04-22 13:21:24 +02:00
Johannes 'fish' Ziemke
712bf5e2f9 Add per graph auto-refresh option to web UI.
This adds a drop-down menu to select/disable a auto-refresh interval.
2013-04-22 11:42:23 +02:00
Julius Volz
a0d311c9e6 Constantize job name label. 2013-04-15 11:47:54 +02:00
juliusv
f21b5ad12b Merge pull request #133 from bernerdschaefer/graph-display-tweaks
Graph display tweaks
2013-04-15 02:32:45 -07:00
Bernerd Schaefer
72bd585485 Revert style change to legend items 2013-04-15 10:04:09 +02:00
juliusv
f817106d6a Merge pull request #134 from prometheus/fix/set-job-label-from-targets-api
Set job label for targets registered through the API
2013-04-12 07:28:27 -07:00