Commit Graph

73 Commits

Author SHA1 Message Date
beorn7 3b9ab546e6 Add metrics to count inconsistencies and fp collisions. 2015-05-21 18:46:20 +02:00
Björn Rabenstein c44e7cd105 Merge pull request #706 from prometheus/beorn7/persistence2
Improve iterator performance.
2015-05-21 13:48:52 +02:00
beorn7 3b9c421a69 Weed out all the [Gg]et* method names.
The only exception is getNumChunksToPersist to avoid naming the struct
member numChunksToPersist in a weird way.
2015-05-20 19:13:06 +02:00
Julius Volz 267fd34156 Switch Prometheus to use github.com/prometheus/log.
This change is conceptually very simple, although the diff is large. It
switches logging from "github.com/golang/glog" to
"github.com/prometheus/log", while not actually changing any log
messages. V(1)-style logging has been changed to be log.Debug*().
2015-05-20 18:19:32 +02:00
beorn7 cd5574bf8a Make chunk and series iterators more efficient. 2015-05-20 16:19:34 +02:00
Fabian Reinartz d8440d75f1 Do not start storage processing before Start() is called. 2015-05-19 13:51:45 +02:00
beorn7 c36e0e05f1 Add crash recovery of fingerprint mappings. 2015-05-07 18:58:14 +02:00
beorn7 2235cec175 Handle fingerprint collisions. 2015-05-07 18:17:59 +02:00
beorn7 a052d32609 Comment improvement. 2015-04-14 10:49:43 +02:00
beorn7 66fc61f9b7 Make bufPool a member of the persistence struct. 2015-04-14 10:43:09 +02:00
beorn7 b02d900e61 Improve chunk and chunkDesc loading.
Also, clean up some things in the code (especially introduction of the
chunkLenWithHeader constant to avoid the same expression all over the place).

Benchmark results:

BEFORE
BenchmarkLoadChunksSequentially     5000            283580 ns/op          152143 B/op        312 allocs/op
BenchmarkLoadChunksRandomly        20000             82936 ns/op           39310 B/op         99 allocs/op
BenchmarkLoadChunkDescs            10000            110833 ns/op           15092 B/op        345 allocs/op

AFTER
BenchmarkLoadChunksSequentially    10000            146785 ns/op          152285 B/op        315 allocs/op
BenchmarkLoadChunksRandomly        20000             67598 ns/op           39438 B/op        103 allocs/op
BenchmarkLoadChunkDescs            20000             99631 ns/op           12636 B/op        192 allocs/op

Note that everything is obviously loaded from the page cache (as the
benchmark runs thousands of times with very small series files). In a
real-world scenario, I expect a larger impact, as the disk operations
will more often actually hit the disk. To load ~50 sequential chunks,
this reduces the iops from 100 seeks and 100 reads to 1 seek and 1
read.
2015-04-13 21:06:04 +02:00
beorn7 6a21f73898 Fixes after review. 2015-03-19 17:54:59 +01:00
beorn7 51d35f4481 Instrument series maintenance durations. 2015-03-19 17:06:16 +01:00
beorn7 12ae6e9203 Increase resilience of the storage against data corruption - step 4.
Step 4: Add a configurable sync'ing of series files after modification.
2015-03-19 15:58:02 +01:00
beorn7 11bd9ce1bd Increase resilience of the storage against data corruption - step 3.
Step 3: Remember the mtime of series files and make use of it to
detect series files that are not the one the checkpoint thinks they
are.
2015-03-19 15:44:11 +01:00
beorn7 e25cca823c Increase resilience of the storage against data corruption - step 2.
Step 2: Add a flag -storage.local.pedantic-checks to check every
series file.

Also, remove countPersistedHeadChunks channel, which is unused.
2015-03-19 12:06:15 +01:00
beorn7 da7c0461c6 Rename persist queue len/cap to num/max chunks to persist.
Remove deprecated flag storage.incoming-samples-queue-capacity.
2015-03-18 19:36:41 +01:00
beorn7 1d8fc7d56f Change minor things after code review. 2015-03-18 19:09:07 +01:00
beorn7 0056eaeb4f Redesign series maintenance and chunk persistence. 2015-03-14 22:05:23 +01:00
beorn7 5bea942d8e Improve various things around chunk encoding.
A number of mostly minor things:

- Rename chunk type -> chunk encoding.

- After all, do not carry around the chunk encoding to all parts of
  the system, but just have one place where the encoding for new
  chunks is set based on the flag. The new approach has caveats as
  well, but the polution of so many method signatures is worse.

- Use the default chunk encoding for new chunks of existing
  series. (Previously, only new _series_ would get chunks with the
  default encoding.)

- Use an enum for chunk encoding. (But keep the version number for the
  flag, for reasons discussed previously.)

- Add encoding() to the chunk interface (so that a chunk knows its own
  encoding - no need to have that in a different top-level function).

- Got rid of newFollowUpChunk (which would keep the existing encoding
  for all chunks of a time series). Now only use newChunk(), which
  will create a chunk encoding according to the flag.

- Simplified transcodeAndAdd.

- Reordered methods of deltaEncodedChunk and doubleDeltaEncoded chunk
  to match the order in the chunk interface.

- Only transcode if the chunk is not yet half full. If more than half
  full, add a new chunk instead.
2015-03-14 19:03:20 +01:00
beorn7 9ecf93526d Sync the checkpoints.
Because that's what should be done with checkpoints.
2015-03-11 19:10:51 +01:00
beorn7 13fcf1ddbc Implement double-delta encoded chunks. 2015-03-05 20:33:26 +01:00
beorn7 0167083da6 Improvements after review. 2015-03-03 18:59:39 +01:00
beorn7 ebac14eff3 Add version guard to persistence. 2015-03-03 18:34:01 +01:00
beorn7 92991026bb Fix chunkDescsTotal count in case of errors.
Only increment the counter if we actually add the memory series to the
fingerprintToSeries map.
2015-02-27 02:21:12 +01:00
beorn7 9406afad72 Do not double-count non-persisted head chunks on loading. 2015-02-27 00:06:16 +01:00
beorn7 edd716e63c Fix the embarrassing bug introduced in commit 0851945.
In that commit, the 'maintainSeries' call was accidentally removed.

This commit refactors things a bit so that there is now a clean
'maintainMemorySeries' and a 'maintainArchivedSeries' call.

Straighten the nomenclature a bit (consistently use 'drop' for
chunks and 'purge' for series/metrics).

Remove the annoying 'Completed maintenance sweep through archived
fingerprints' message if there were no archived fingerprints to do
maintenance on.
2015-02-26 18:30:33 +01:00
beorn7 af91fb8e31 Improve persisting chunks to disk.
This is done by bucketing chunks by fingerprint. If the persisting to
disk falls behind, more and more chunks are in the queue. As soon as
there are "double hits", we will now persist both chunks in one go,
doubling the disk throughput (assuming it is limited by disk
seeks). Should even more pile up so that we end wit "triple hits", we
will persist those first, and so on.

Even if we have millions of time series, this will still help,
assuming not all of them are growing with the same speed. Series that
get many samples and/or are not very compressable will accumulate
chunks faster, and they will soon get double- or triple-writes.

To improve the chance of double writes,
-storage.local.persistence-queue-capacity could be set to a higher
value. However, that will slow down shutdown a lot (as the queue has
to be worked through). So we leave it to the user to set it to a
really high value. A more fundamental solution would be to checkpoint
not only head chunks, but also chunks still in the persist queue. That
would be quite complicated for a rather limited use-case (running many
time series with high ingestion rate on slow spinning disks).
2015-02-17 16:02:09 +01:00
beorn7 e22f26bc58 Move to a queue model for appending samples after all.
Starting a goroutine takes 1-2µs on my laptop. From the "numbers every
Go programmer should know", I had 300ns for a channel send in my
mind. Turns out, on my laptop, it takes only 60ns. That's fast enough
to warrant the machinery of yet another channel with a fixed set of
worker goroutines feeding from it. The number chosen (8 for now) is
low enough to not really afflict a measurable overhead (a big
Prometheus server has >1000 goroutines running), but high enough to
not make sample ingestion a bottleneck.
2015-02-13 14:26:54 +01:00
Bjoern Rabenstein c24bfdf701 Move crash related code into separate file.
persistence.go is way too long anyway, and a lot of code is just crash
recovery, which is not important to understand the normal operation.

Also, remove unused `exists` function.
2015-01-29 13:13:16 +01:00
Bjoern Rabenstein 73f6dc4d44 Make KeyValueStore.Delete report if the key to delete was found.
Previously, it would return an error instead. Now we can distinguish
the cases 'error while deleting known key' vs. 'key not in index'
without testing for leveldb-internal kinds of errors.
2015-01-29 12:57:50 +01:00
Bjoern Rabenstein 2c8d324ca4 Remove check that did not check anything. 2015-01-26 13:48:24 +01:00
Bjoern Rabenstein 5859b74f1b Clean up license issues.
- Move CONTRIBUTORS.md to the more common AUTHORS.
- Added the required NOTICE file.
- Changed "Prometheus Team" to "The Prometheus Authors".
- Reverted the erroneous changes to the Apache License.
2015-01-21 20:07:45 +01:00
Bjoern Rabenstein baca6faa1c Add double-start protection.
This mimics the locking leveldb is performing anyway. Advantages of
doing it separately:

- Should we ever replace the leveldb implementation by one without
  double-start protection, we are still good.

- In contrast to leveldb, the new code creates a meaningful error
  message.
2015-01-14 17:13:42 +01:00
Bjoern Rabenstein 622e8350cd Fix a bug handling freshly unarchived series.
Usually, if you unarchive a series, it is to add something to it,
which will create a new head chunk. However, if a series in
unarchived, and before anything is added to it, it is handled by the
maintenance loop, it will be archived again. In that case, we have to
load the chunkDescs to know the lastTime of the series to be
archived. Usually, this case will happen only rarely (as a race, has
never happened so far, possibly because the locking around unarchiving
and the subsequent sample append is smart enough). However, during
crash recovery, we sometimes treat series as "freshly unarchived"
without directly appending a sample. We might add more cases of that
type later, so better deal with archiving properly and load chunkDescs
if required.
2015-01-08 16:25:50 +01:00
Brian Brazil e56786b221 Have scrape time as a pseudovariable, not a prometheus variable.
This ensures it has the right timestamp, and is easier to work with.

Switch sd variable away from 'outcome', using total/failed instead.
2014-12-27 00:39:33 +00:00
Bjoern Rabenstein ff24070a03 Fix embarrassing bug in crash recovery.
(And yes, we always knew we need tests for that. I have added a TODO now.)

Change-Id: I9cf52bbf98e263e0b79404bda4c442beba9696a8
2014-12-17 17:18:04 +01:00
Bjoern Rabenstein 66c80b5ebd Fix typo.
Change-Id: I72608c7841c00145458807d3c3ee29db7b5ac2bc
2014-11-28 12:50:19 +01:00
Bjoern Rabenstein 674624f1c8 Completed more TODOs.
- Documented checkpoint file format.
- High-level description of series sanitation.
- Replace fp.LoadFromString panic with an error.
  (Change in client_golang already submitted.)
- Introduced checks for series file size where appropriate.
- Removed two Law of Demeter violations.

Change-Id: I555d97a2c8f4769820c2fc8bf5d6f4e160222abc
2014-11-27 20:46:45 +01:00
Bjoern Rabenstein 7d11019aa2 Squash a few trivial TODOs.
- Delete unneeded file view_adapter.go.
- Assessed that we still need the fingerprints in nodes
  (to create iterators).
- Turned numMemChunkDescs into a metric.

Change-Id: I29be963c795a075ec00c095f76bf26405535609d
2014-11-27 18:26:06 +01:00
Bjoern Rabenstein 14bda4180c Changes after pair code review.
Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455
2014-11-25 17:12:59 +01:00
Bjoern Rabenstein c087ee35f7 Remove archiveMtx.
Change-Id: Ie8019f860bbda68621f74380c90a4e57930d3d7a
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 7af42eda65 Optimize purging.
Now only purge if there is something to purge.
Also, set savedFirstTime and archived time range appropriately.
(Which is needed for the optimization.)

Change-Id: Idcd33319a84def3ce0318d886f10c6800369e7f9
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 33b959b898 Persist savedFirstTime in checkpoint.
Change-Id: Ibdfdea16fad0608ec104fbccc749e824a171f227
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 904acd43da Add crash recovery.
Fix the behavior if preload for non-existent series is requested.

Instead of returning an error (which triggers a panic further up),
simply count those incidents. They can happen regularly, we just want
to know if they happen too frequently because that would mean the
indexing is behind or broken.

Change-Id: I4b2d1b93c4146eeea897d188063cb9574a270f8b
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein 4efc60174b Tweak and verify a few parameters.
Remove TODOs accordingly.

Change-Id: Ic062e13b6ae89a9135d3f14011114fe1cca1cef8
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein 5f8e9617ef Add more tests.
Add an end-to-end fuzz and race test.

Fix a race exposed by the above.

Change-Id: Ifaa39a90cefbde8d4c29bda197cc92592ded21bb
2014-11-25 17:09:17 +01:00
Bjoern Rabenstein d215e013b7 Fix the weird chunkDesc shuffling bug.
The root cause was that after chunkDesc eviction, the offset between
memory representation of chunk layout (via chunkDescs in memory) was
shiftet against chunks as layed out on disk. Keeping the offset up to
date is by no means trivial, so this commit is pretty involved.

Also, found a race that for some reason didn't bite us so far:
Persisting chunks was completel unlocked, so if chunks were purged on
disk at the same time, disaster would strike. However, locking the
persisting of chunk revealed interesting dead locks. Basically, never
queue under the fp lock.

Change-Id: I1ea9e4e71024cabbc1f9601b28e74db0c5c55db8
2014-11-25 17:09:17 +01:00
Bjoern Rabenstein f1de5b0c4e Run checkpointing of in-memory metrics and head chunks periodically.
Checkpointing interval is now a command line flag.

Along the way, several things were refactored.
- Restructure the way the storage is started and stopped..
- Number of series in checkpoint is now a uint64, not a varint.
  (Breaks old checkpoints, needs wipe!)
- More consistent naming and order of methods.

Change-Id: I883d9170c9a608ee716bb0ab3d0ded8ca03760d9
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 74c9b34a5e Improve storage instrumentation even more.
Add gauge for chunks and chunkdescs in memory (backed by a global
variable to be used later not only for instrumentation but also for
memory management).

Refactored instrumentation code once more (instrumentation.go is back :).

Change-Id: Ife39947e22a48cac4982db7369c231947f446e17
2014-11-25 17:09:04 +01:00