Commit Graph

31 Commits

Author SHA1 Message Date
Julius Volz 94666e20b7 Minor test error reporting cleanup.
Change-Id: Ie11c16b4e60de7c179c6d2a86e063f4432e2000f
2014-02-03 12:27:01 +01:00
Julius Volz fd2158e746 Store copy of metric during fingerprint caching
Problem description:
====================
If a rule evaluation referencing a metric/timeseries M happens at a time
when M doesn't have a memory timeseries yet, looking up the fingerprint
for M (via TieredStorage.GetMetricForFingerprint()) will create a new
Metric object for M which gets both: a) attached to a new empty memory
timeseries (so we don't have to ask disk for the Metric's fingerprint
next time), and b) returned to the rule evaluation layer. However, the
rule evaluation layer replaces the name label (and possibly other
labels) of the metric with the name of the recorded rule.  Since both
the rule evaluator and the memory storage share a reference to the same
Metric object, the original memory timeseries will now also be
incorrectly renamed.

Fix:
====
Instead of storing a reference to a shared metric object, take a copy of
the object when creating an empty memory timeseries for caching
purposes.

Change-Id: I9f2172696c16c10b377e6708553a46ef29390f1e
2014-02-02 17:11:08 +01:00
Julius Volz 740d448983 Use custom timestamp type for sample timestamps and related code.
So far we've been using Go's native time.Time for anything related to sample
timestamps. Since the range of time.Time is much bigger than what we need, this
has created two problems:

- there could be time.Time values which were out of the range/precision of the
  time type that we persist to disk, therefore causing incorrectly ordered keys.
  One bug caused by this was:

  https://github.com/prometheus/prometheus/issues/367

  It would be good to use a timestamp type that's more closely aligned with
  what the underlying storage supports.

- sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit
  Unix timestamp (possibly even a 32-bit one). Since we store samples in large
  numbers, this seriously affects memory usage. Furthermore, copying/working
  with the data will be faster if it's smaller.

*MEMORY USAGE RESULTS*
Initial memory usage comparisons for a running Prometheus with 1 timeseries and
100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my
tests, this advantage for some reason decreased a bit the more samples the
timeseries had (to 5-7% for millions of samples). This I can't fully explain,
but perhaps garbage collection issues were involved.

*WHEN TO USE THE NEW TIMESTAMP TYPE*
The new clientmodel.Timestamp type should be used whenever time
calculations are either directly or indirectly related to sample
timestamps.

For example:
- the timestamp of a sample itself
- all kinds of watermarks
- anything that may become or is compared to a sample timestamp (like the timestamp
  passed into Target.Scrape()).

When to still use time.Time:
- for measuring durations/times not related to sample timestamps, like duration
  telemetry exporting, timers that indicate how frequently to execute some
  action, etc.

*NOTE ON OPERATOR OPTIMIZATION TESTS*
We don't use operator optimization code anymore, but it still lives in
the code as dead code. It still has tests, but I couldn't get all of them to
pass with the new timestamp format. I commented out the failing cases for now,
but we should probably remove the dead code soon. I just didn't want to do that
in the same change as this.

Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f
2013-12-03 09:11:28 +01:00
Matt T. Proud 30b1cf80b5 WIP - Snapshot of Moving to Client Model. 2013-06-25 15:52:42 +02:00
Julius Volz 51689d965d Add debug timers to instant and range queries.
This adds timers around several query-relevant code blocks. For now, the
query timer stats are only logged for queries initiated through the UI.
In other cases (rule evaluations), the stats are simply thrown away.

My hope is that this helps us understand where queries spend time,
especially in cases where they sometimes hang for unusual amounts of
time.
2013-06-05 18:32:54 +02:00
Julius Volz 5b105c77fc Repointerize fingerprints. 2013-05-21 14:28:14 +02:00
Julius Volz ce1ee444f1 Synchronous memory appends and more fine-grained storage locks.
This does two things:

1) Make TieredStorage.AppendSamples() write directly to memory instead of
   buffering to a channel first. This is needed in cases where a rule might
   immediately need the data generated by a previous rule.

2) Replace the single storage mutex by two new ones:
   - memoryMutex - needs to be locked at any time that two concurrent
                   goroutines could be accessing (via read or write) the
                   TieredStorage memoryArena.
   - memoryDeleteMutex - used to prevent any deletion of samples from
                         memoryArena as long as renderView is running and
                         assembling data from it.
   The LevelDB disk storage does not need to be protected by a mutex when
   rendering a view since renderView works off a LevelDB snapshot.

The rationale against adding memoryMutex directly to the memory storage: taking
a mutex does come with a small inherent time cost, and taking it is only
required in few places. In fact, no locking is required for the memory storage
instance which is part of a view (and not the TieredStorage).
2013-05-10 17:15:52 +02:00
Matt T. Proud 161c8fbf9b Include deletion processor for long-tail values.
This commit extracts the model.Values truncation behavior into the actual
tiered storage, which uses it and behaves in a peculiar way—notably the
retention of previous elements if the chunk were to ever go empty.  This is
done to enable interpolation between sparse sample values in the evaluation
cycle.  Nothing necessarily new here—just an extraction.

Now, the model.Values TruncateBefore functionality would do what a user
would expect without any surprises, which is required for the
DeletionProcessor, which may decide to split a large chunk in two if it
determines that the chunk contains the cut-off time.
2013-05-10 12:19:12 +02:00
Matt T. Proud f897164bcf Expose TieredStorage.DiskStorage. 2013-05-07 10:26:28 +02:00
Matt T. Proud ce45787dbf Storage interface to TieredStorage.
This commit drops the Storage interface and just replaces it with a
publicized TieredStorage type.  Storage had been anticipated to be
used as a wrapper for testability but just was not used due to
practicality.  Merely overengineered.  My bad.  Anyway, we will
eventually instantiate the TieredStorage dependencies in main.go and
pass them in for more intelligent lifecycle management.

These changes will pave the way for managing the curators without
Law of Demeter violations.
2013-05-03 15:54:14 +02:00
Julius Volz d8110fcd9c Send sample arrays instead of single samples over channels. 2013-04-29 17:24:17 +02:00
Julius Volz 2202cd71c9 Track alerts over time and write out alert timeseries. 2013-04-26 14:35:21 +02:00
Johannes 'fish' Ziemke 1ad41d4c00 Call closer.Close() earlier. 2013-04-25 13:29:28 +02:00
Julius Volz 99dcbe0f94 Integrate memory and disk layers in view rendering. 2013-04-19 16:01:27 +02:00
Julius Volz 95b081f9bc Stop serving tiered storage after draining it. 2013-04-15 13:30:03 +02:00
Julius Volz c59f3fc538 Fix formatting in tiered_test.go. 2013-03-28 12:16:31 +01:00
juliusv 39826d7335 Merge pull request #107 from prometheus/julius-fix-get-fingerprints
Fix bug in GetFingerprintsForLabelSet().
2013-03-27 10:54:17 -07:00
Julius Volz 2668700e54 Fix bug in GetFingerprintsForLabelSet(). 2013-03-27 18:50:30 +01:00
Matt T. Proud c53a72a894 Test data for the curator. 2013-03-27 18:13:43 +01:00
Julius Volz 55ca65aa6e More userfriendly output when we fail to create the tiered storage. 2013-03-27 11:25:05 +01:00
Matt T. Proud c4e971d7d9 Merge pull request #101 from prometheus/refactor/test/directory-extraction
Create temporary directory handler.
2013-03-26 10:46:28 -07:00
Matt T. Proud b86b0ea41a Create temporary directory handler. 2013-03-26 18:09:25 +01:00
Julius Volz 2b8f0b2cc7 Constantize metric name label name. 2013-03-26 16:20:23 +01:00
Julius Volz e096896932 PR comment fixups. 2013-03-26 15:28:00 +01:00
Julius Volz dd67ab115b Change GetAllMetricNames() to GetAllValuesForLabel(). 2013-03-26 14:47:07 +01:00
Julius Volz 42bdf921d1 Fetch integrated memory/disk data for simple Get* functions. 2013-03-26 14:47:07 +01:00
Julius Volz 4d79dc3602 Replace renderView() by cleaner and more correct reimplementation. 2013-03-21 18:11:02 +01:00
Julius Volz 3621148e7f Comment out panicking test until proper support is implemented. 2013-03-21 18:08:47 +01:00
Julius Volz 2f06b8bea6 Fix tiered storage test to trigger iterator rewinding case. 2013-03-21 18:08:47 +01:00
Matt T. Proud 62b5d7ce20 Oops. 2013-03-21 18:08:46 +01:00
Matt T. Proud 34a921e16d Checkpoint. 2013-03-21 18:08:46 +01:00