Commit Graph

68 Commits

Author SHA1 Message Date
Matt T. Proud ef1d5fd8a2 Introduce semaphores for tiered storage.
This commit wraps the tiered storage access componnets in semaphores,
since we can handle several concurrent memory reads.
2013-06-06 18:16:18 +02:00
Matt T. Proud 819045541e Code Review: Make double-drain a panic. 2013-06-06 12:40:06 +02:00
Matt T. Proud beaaf386e7 Add storage state guards and transition callbacks.
To ensure that we access tiered storage in the proper way, we have
guards now.
2013-06-06 11:52:09 +02:00
Matt T. Proud 2c3df44af6 Ensure database access waits until it is started.
This commit introduces a channel message to ensure serving
state has been reached with the storage stack before anything attempts
to use it.
2013-06-06 10:42:21 +02:00
Julius Volz 51689d965d Add debug timers to instant and range queries.
This adds timers around several query-relevant code blocks. For now, the
query timer stats are only logged for queries initiated through the UI.
In other cases (rule evaluations), the stats are simply thrown away.

My hope is that this helps us understand where queries spend time,
especially in cases where they sometimes hang for unusual amounts of
time.
2013-06-05 18:32:54 +02:00
Matt T. Proud 8339a189cb Code Review: Fix seriesPresent scope.
The seriesPresent scope should be constrained to the scope of a
scanJob, since this is keyed to given series.
2013-06-04 13:16:59 +02:00
Matt T. Proud fe41ce0b19 Conditionalize disk initializations.
This commit conditionalizes the creation of the diskFrontier and
seriesFrontier along with the iterator such that they are provisioned
once something is actually required from disk.
2013-06-04 12:53:57 +02:00
Julius Volz a8468a2e5e Fix reversed disk flush cutoff behavior. 2013-05-28 16:14:30 +02:00
Julius Volz eb1f956909 Revert "Revert "Ensure that all extracted samples are added to view.""
This reverts commit 4b30fb86b4.
2013-05-28 14:36:03 +02:00
Matt T. Proud 4b30fb86b4 Revert "Ensure that all extracted samples are added to view."
This reverts commit 008314b5a8. By
running an automated git bisection described in
https://gist.github.com/matttproud-soundcloud/22a371a8d2cba382ea64
this commit was found.
2013-05-23 13:36:22 +02:00
Julius Volz 008314b5a8 Ensure that all extracted samples are added to view.
The current behavior only adds those samples to the view that are extracted by
the last pass of the last processed op and throws other ones away. This is a
bug. We need to append all samples that are extracted by each op pass.

This also makes view.appendSamples() take an array of samples.
2013-05-22 18:14:37 +02:00
Matt T. Proud b586801830 Code Review: Fix to-disk queue infinite growth.
We discovered a bug while manually testing this branch on a live
instance, whereby the to-disk queue was never actually dumped to
disk.
2013-05-22 17:59:53 +02:00
Matt T. Proud c07abf8521 Initial move away from skiplist. 2013-05-22 17:59:53 +02:00
Julius Volz 5b105c77fc Repointerize fingerprints. 2013-05-21 14:28:14 +02:00
Matt T. Proud e5ac91222b Benchmark memory arena; simplify map generation.
The one-off keys have been replaced with ``model.LabelPair``, which is
indexable.  The performance impact is negligible, but it represents
a cognitive simplification.
2013-05-21 09:39:12 +02:00
juliusv 360477f66c Merge pull request #257 from prometheus/feature/better-memory-behaviors
Pointerize memorySeriesArena.
2013-05-16 07:36:40 -07:00
Matt T. Proud e1f20de2e9 Pointerize memorySeriesArena. 2013-05-16 17:09:28 +03:00
Matt T. Proud 8f4c7ece92 Destroy naked returns in half of corpus.
The use of naked return values is frowned upon.  This is the first
of two bulk updates to remove them.
2013-05-16 10:53:25 +03:00
Matt T. Proud 4e0c932a4f Simplify Encoder's encoding signature.
The reality is that if we ever try to encode a Protocol Buffer and it
fails, it's likely that such an error is ultimately not a runtime error
and should be fixed forthwith.  Thusly, we should rename
``Encoder.Encode`` to ``Encoder.MustEncode`` and drop the error return
value.
2013-05-16 00:54:18 +03:00
juliusv 516101f015 Merge pull request #250 from prometheus/refactor/drop-unused-storage-setting
Drop unused writeMemoryInterval
2013-05-14 08:45:59 -07:00
Bernerd Schaefer 63d9988b9c Drop unused writeMemoryInterval 2013-05-14 17:03:03 +02:00
Julius Volz 83c60ad43a Fix GetMetricForFingerprint() metric mutability.
Some users of GetMetricForFingerprint() end up modifying the returned metric
labelset. Since the memory storage's implementation of
GetMetricForFingerprint() returned a pointer to the metric (and maps are
reference types anyways), the external mutation propagated back into the memory
storage.

The fix is to make a copy of the metric before returning it.
2013-05-14 16:46:30 +02:00
Matt T. Proud 244a4a9cdb Update to go1.1.
This commit updates the documentation, Makefiles, formatting, and
code semantics to support the 1.1. runtime, which includes ...

1. ``make advice``,

2. ``make format``, and

3. ``go fix`` on various targets.
2013-05-14 12:39:08 +02:00
Matt T. Proud b224251981 Simplify compaction and expose database sizes.
This commit simplifies the way that compactions across a database's
keyspace occur due to reading the LevelDB internals. Secondarily it
introduces the database size estimation mechanisms.

Include database health and help interfaces.

Add database statistics; remove status goroutines.

This commit kills the use of Go routines to expose status throughout
the web components of Prometheus. It also dumps raw LevelDB status
on a separate /databases endpoint.
2013-05-14 12:29:53 +02:00
Julius Volz ce1ee444f1 Synchronous memory appends and more fine-grained storage locks.
This does two things:

1) Make TieredStorage.AppendSamples() write directly to memory instead of
   buffering to a channel first. This is needed in cases where a rule might
   immediately need the data generated by a previous rule.

2) Replace the single storage mutex by two new ones:
   - memoryMutex - needs to be locked at any time that two concurrent
                   goroutines could be accessing (via read or write) the
                   TieredStorage memoryArena.
   - memoryDeleteMutex - used to prevent any deletion of samples from
                         memoryArena as long as renderView is running and
                         assembling data from it.
   The LevelDB disk storage does not need to be protected by a mutex when
   rendering a view since renderView works off a LevelDB snapshot.

The rationale against adding memoryMutex directly to the memory storage: taking
a mutex does come with a small inherent time cost, and taking it is only
required in few places. In fact, no locking is required for the memory storage
instance which is part of a view (and not the TieredStorage).
2013-05-10 17:15:52 +02:00
Matt T. Proud 161c8fbf9b Include deletion processor for long-tail values.
This commit extracts the model.Values truncation behavior into the actual
tiered storage, which uses it and behaves in a peculiar way—notably the
retention of previous elements if the chunk were to ever go empty.  This is
done to enable interpolation between sparse sample values in the evaluation
cycle.  Nothing necessarily new here—just an extraction.

Now, the model.Values TruncateBefore functionality would do what a user
would expect without any surprises, which is required for the
DeletionProcessor, which may decide to split a large chunk in two if it
determines that the chunk contains the cut-off time.
2013-05-10 12:19:12 +02:00
Julius Volz caab131ada Repointerize TieredStorage method receiver types. 2013-05-07 15:12:33 +02:00
juliusv 89de116ea9 Merge pull request #225 from prometheus/refactor/fmt-cleanups
Slice expression simplifications.
2013-05-07 04:27:27 -07:00
Julius Volz 05afa970d2 Slice expression simplifications. 2013-05-07 13:22:29 +02:00
Matt T. Proud f897164bcf Expose TieredStorage.DiskStorage. 2013-05-07 10:26:28 +02:00
Matt T. Proud ce45787dbf Storage interface to TieredStorage.
This commit drops the Storage interface and just replaces it with a
publicized TieredStorage type.  Storage had been anticipated to be
used as a wrapper for testability but just was not used due to
practicality.  Merely overengineered.  My bad.  Anyway, we will
eventually instantiate the TieredStorage dependencies in main.go and
pass them in for more intelligent lifecycle management.

These changes will pave the way for managing the curators without
Law of Demeter violations.
2013-05-03 15:54:14 +02:00
Matt T. Proud a3f1d81e24 Publicize a few storage components for curation.
This commit introduces the publicization of Stop and other
components, which the compaction curator shall take advantage
of.
2013-05-02 13:16:04 +02:00
Julius Volz d8110fcd9c Send sample arrays instead of single samples over channels. 2013-04-29 17:24:17 +02:00
Matt T. Proud 6fac20c8af Harden the tests against OOMs.
This commit employs explicit memory freeing for the in-memory storage
arenas.  Secondarily, we take advantage of smaller channel buffer sizes
in the test.
2013-04-29 11:46:01 +02:00
Johannes 'fish' Ziemke 22da76e8ab Close of reportTicker to exit goroutine. 2013-04-25 13:29:22 +02:00
Johannes 'fish' Ziemke 5043c6fce7 Have goroutine exit on signal via defer block. 2013-04-25 12:14:38 +02:00
Julius Volz 9b8c671ec9 Fixes/cleanups to renderView() samples truncation. 2013-04-24 12:42:58 +02:00
Matt T. Proud 05504d3642 WIP - Truncate irrelevant chunk values.
This does not work with the view tests.
2013-04-24 11:07:22 +02:00
Matt T. Proud b1a8e51b07 Extract dto.SampleValueSeries into model.Values. 2013-04-22 13:31:11 +02:00
Matt T. Proud db4ffbb262 Wrap dto.SampleKey with business logic type.
The curator work can be done easier if dto.SampleKey is no longer
directly accessed but rather has a higher level type around it that
captures a certain modicum of business logic.  This doesn't look
terribly interesting today, but it will get more so.
2013-04-21 20:38:39 +02:00
Julius Volz 99dcbe0f94 Integrate memory and disk layers in view rendering. 2013-04-19 16:01:27 +02:00
Matt T. Proud d468271e2f Fix append queue telemetry and parameterize sizes.
The original append queue telemetry never worked, because it was
updated only upon the exit of the select statement, which would
usually liberate the queues of contents.  This has been fixed to
be reported arbitrarily.

The queue sizes are now parameterizable via flags.
2013-04-16 17:13:29 +02:00
Julius Volz 95b081f9bc Stop serving tiered storage after draining it. 2013-04-15 13:30:03 +02:00
Matt T. Proud a55602df4a Validate diskFrontier domain for series candidate.
It is the case with the benchmark tool that we thought that we
generated multiple series and saved them to the disk as such, when
in reality, we overwrote the fields of the outgoing metrics via
Go map reference behavior.  This was accidental.  In the course of
diagnosing this, a few errors were found:

1. ``newSeriesFrontier`` should check to see if the candidate fingerprint is within the given domain of the ``diskFrontier``.  If not, as the contract in the docstring stipulates, a ``nil`` ``seriesFrontier`` should be emitted.

2. In the interests of aiding debugging, the raw LevelDB ``levigoIterator`` type now includes a helpful forensics ``String()`` method.

This work produced additional cleanups:

1. ``Close() error`` with the storage stack is technically incorrect, since nowhere in the bowels of it does an error actually occur.  The interface has been simplified to remove this for now.
2013-04-09 11:47:16 +02:00
Matt T. Proud d79c932a8e Merge pull request #120 from prometheus/feature/storage/compaction
Spin up curator run in the tests.
2013-04-05 04:55:59 -07:00
Matt T. Proud c3e3460ca6 Spin up curator run in the tests.
After this commit, we'll need to add validations that it does the
desired work, which we presently know that it doesn't.  Given the
changes I made with a plethora of renamings, I want to commit this
now before it gets even larger.
2013-04-05 13:55:11 +02:00
Julius Volz 2668700e54 Fix bug in GetFingerprintsForLabelSet(). 2013-03-27 18:50:30 +01:00
Julius Volz 55ca65aa6e More userfriendly output when we fail to create the tiered storage. 2013-03-27 11:25:05 +01:00
Julius Volz 8cf2af3923 Abort view job processing on timeout. 2013-03-26 17:18:51 +01:00
Julius Volz e096896932 PR comment fixups. 2013-03-26 15:28:00 +01:00