Commit Graph

128 Commits

Author SHA1 Message Date
beorn7
3d8d8928be Increase resilience of the storage against data corruption - step 1.
Step 1: Admit the problem by turning the various "panic"s into logged
errors, followed by marking the persistence as dirty.
2015-03-19 11:49:18 +01:00
beorn7
da7c0461c6 Rename persist queue len/cap to num/max chunks to persist.
Remove deprecated flag storage.incoming-samples-queue-capacity.
2015-03-18 19:36:41 +01:00
beorn7
a075900f9a Merge branch 'beorn7/persistence' into beorn7/ingestion-tweaks 2015-03-18 19:09:31 +01:00
beorn7
1d8fc7d56f Change minor things after code review. 2015-03-18 19:09:07 +01:00
beorn7
be11cb2b07 Remove the sample ingestion channel.
The one central sample ingestion channel has caused a variety of
trouble. This commit removes it. Targets and rule evaluation call an
Append method directly now. To incorporate multiple storage backends
(like OpenTSDB), storage.Tee forks the Append into two different
appenders.

Note that the tsdb queue manager had its own queue anyway. It was a
queue after a queue... Much queue, so overhead...

Targets have their own little buffer (implemented as a channel) to
avoid stalling during an http scrape. But a new scrape will only be
started once the old one is fully ingested.

The contraption of three pipelined ingesters was removed. A Target is
an ingester itself now. Despite more logic in Target, things should be
less confusing now.

Also, remove lint and vet warnings in ast.go.
2015-03-15 14:08:22 +01:00
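The fan-out described in this commit can be pictured with a minimal sketch. The `Sample` type, the one-method `Appender` interface, and `countingAppender` below are simplified stand-ins invented for illustration; only the names `Tee` and `Append` come from the commit message, and the real signatures may differ.

```go
package main

import "fmt"

// Sample is a simplified, hypothetical stand-in for a scraped sample.
type Sample struct {
	Metric string
	Value  float64
}

// Appender is an assumed minimal interface: anything that accepts a sample.
type Appender interface {
	Append(s *Sample)
}

// Tee forwards every appended sample to two appenders, e.g. local storage
// and a remote backend. Field names are illustrative assumptions.
type Tee struct {
	Appender1, Appender2 Appender
}

// Append hands the same sample to both underlying appenders.
func (t Tee) Append(s *Sample) {
	t.Appender1.Append(s)
	t.Appender2.Append(s)
}

// countingAppender is a toy backend that only counts what it receives.
type countingAppender struct{ n int }

func (c *countingAppender) Append(*Sample) { c.n++ }

func main() {
	local, remote := &countingAppender{}, &countingAppender{}
	var app Appender = Tee{Appender1: local, Appender2: remote}

	// A target or rule evaluator would call Append directly, with no
	// central channel in between.
	app.Append(&Sample{Metric: "up", Value: 1})
	fmt.Println(local.n, remote.n)
}
```

Because `Tee` is itself an `Appender`, callers need not know how many backends sit behind it.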
beorn7
0056eaeb4f Redesign series maintenance and chunk persistence. 2015-03-14 22:05:23 +01:00
beorn7
5bea942d8e Improve various things around chunk encoding.
A number of mostly minor things:

- Rename chunk type -> chunk encoding.

- After all, do not carry around the chunk encoding to all parts of
  the system, but just have one place where the encoding for new
  chunks is set based on the flag. The new approach has caveats as
  well, but the pollution of so many method signatures is worse.

- Use the default chunk encoding for new chunks of existing
  series. (Previously, only new _series_ would get chunks with the
  default encoding.)

- Use an enum for chunk encoding. (But keep the version number for the
  flag, for reasons discussed previously.)

- Add encoding() to the chunk interface (so that a chunk knows its own
  encoding - no need to have that in a different top-level function).

- Got rid of newFollowUpChunk (which would keep the existing encoding
  for all chunks of a time series). Now only use newChunk(), which
  will create a chunk encoding according to the flag.

- Simplified transcodeAndAdd.

- Reordered methods of deltaEncodedChunk and doubleDeltaEncodedChunk
  to match the order in the chunk interface.

- Only transcode if the chunk is not yet half full. If more than half
  full, add a new chunk instead.
2015-03-14 19:03:20 +01:00
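The last rule in the list above (transcode only below half full) can be sketched as a small decision function. The function name, the `action` type, and the byte-based fill measure are assumptions made for illustration; the commit message does not say how fullness is measured or what happens at exactly half, so the sketch treats "exactly half" as "start a new chunk".

```go
package main

import "fmt"

// action is a hypothetical result type for the decision below.
type action int

const (
	transcodeInPlace action = iota // re-encode the existing chunk
	startNewChunk                  // leave it alone and open a new chunk
)

// onReencode decides what to do when an existing chunk would have to be
// re-encoded. used and capacity are assumed to be measured in bytes.
func onReencode(used, capacity int) action {
	if used*2 < capacity { // not yet half full
		return transcodeInPlace
	}
	return startNewChunk // half full or more (boundary is an assumption)
}

func main() {
	for _, used := range []int{100, 512, 900} {
		fmt.Println(used, onReencode(used, 1024) == transcodeInPlace)
	}
}
```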
beorn7
9ecf93526d Sync the checkpoints.
Because that's what should be done with checkpoints.
2015-03-11 19:10:51 +01:00
beorn7
853f971540 Actually use double-delta encoding for transcoding. :-o 2015-03-11 16:52:58 +01:00
beorn7
23ba8a5516 Make floats exact again.
This should do the right thing for the old delta chunks, too.
2015-03-06 17:03:56 +01:00
beorn7
a8d4f8af9a Improve minor things after review.
The problem of float precision will be addressed in the next commit.
2015-03-06 12:53:00 +01:00
beorn7
13fcf1ddbc Implement double-delta encoded chunks. 2015-03-05 20:33:26 +01:00
beorn7
5ed8f6c205 Update persistQueueLength after chunks were persisted. 2015-03-04 18:46:16 +01:00
beorn7
0167083da6 Improvements after review. 2015-03-03 18:59:39 +01:00
beorn7
ebac14eff3 Add version guard to persistence. 2015-03-03 18:34:01 +01:00
Julius Volz
795704f0df Merge pull request #565 from fabxc/fabxc/labelmatcher_test
Tests for retrieving fingerprints for label matchers added.
2015-02-27 14:52:37 +01:00
Fabian Reinartz
4bff5d29bf Add tests for retrieving fingerprints for label matchers.
This checks for the basic behaviour of GetFingerprintsForLabelMatchers, that is, whether the different matcher types filter the correct fingerprints and intersections are correct.
2015-02-27 14:41:43 +01:00
beorn7
92991026bb Fix chunkDescsTotal count in case of errors.
Only increment the counter if we actually add the memory series to the
fingerprintToSeries map.
2015-02-27 02:21:12 +01:00
beorn7
1db7589081 Reduce the capacity of countPersistedHeadChunks.
The capacity is basically how many persisted head chunks we will count
at most while doing other things, in particular checkpointing. To
limit the amount of already counted head chunks, keep this number low,
otherwise we will easily checkpoint too often if checkpoints take long
anyway.
2015-02-27 00:53:52 +01:00
beorn7
9406afad72 Do not double-count non-persisted head chunks on loading. 2015-02-27 00:06:16 +01:00
beorn7
dbc22b972c Check last time in head chunk for head chunk timeout, not first. 2015-02-26 23:40:42 +01:00
beorn7
edd716e63c Fix the embarrassing bug introduced in commit 0851945.
In that commit, the 'maintainSeries' call was accidentally removed.

This commit refactors things a bit so that there is now a clean
'maintainMemorySeries' and a 'maintainArchivedSeries' call.

Straighten the nomenclature a bit (consistently use 'drop' for
chunks and 'purge' for series/metrics).

Remove the annoying 'Completed maintenance sweep through archived
fingerprints' message if there were no archived fingerprints to do
maintenance on.
2015-02-26 18:30:33 +01:00
beorn7
af91fb8e31 Improve persisting chunks to disk.
This is done by bucketing chunks by fingerprint. If the persisting to
disk falls behind, more and more chunks are in the queue. As soon as
there are "double hits", we will now persist both chunks in one go,
doubling the disk throughput (assuming it is limited by disk
seeks). Should even more pile up so that we end up with "triple hits", we
will persist those first, and so on.

Even if we have millions of time series, this will still help,
assuming not all of them are growing with the same speed. Series that
get many samples and/or are not very compressible will accumulate
chunks faster, and they will soon get double- or triple-writes.

To improve the chance of double writes,
-storage.local.persistence-queue-capacity could be set to a higher
value. However, that will slow down shutdown a lot (as the queue has
to be worked through). So we leave it to the user to set it to a
really high value. A more fundamental solution would be to checkpoint
not only head chunks, but also chunks still in the persist queue. That
would be quite complicated for a rather limited use-case (running many
time series with high ingestion rate on slow spinning disks).
2015-02-17 16:02:09 +01:00
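A minimal sketch of the bucketing idea, assuming a plain map from fingerprint to pending chunks. All type and function names here are hypothetical, and the linear scan for the fullest bucket is chosen for brevity only; it is not claimed to be how the commit implements it.

```go
package main

import "fmt"

// Fingerprint identifies a time series (simplified to an integer here).
type Fingerprint uint64

// chunk is a placeholder for a chunk waiting to be written to disk.
type chunk struct{ id int }

// persistQueue buckets pending chunks by the series they belong to.
type persistQueue struct {
	pending map[Fingerprint][]chunk
}

func newPersistQueue() *persistQueue {
	return &persistQueue{pending: make(map[Fingerprint][]chunk)}
}

// add queues one more chunk for the given series.
func (q *persistQueue) add(fp Fingerprint, c chunk) {
	q.pending[fp] = append(q.pending[fp], c)
}

// next removes and returns the bucket holding the most chunks, so that
// "triple hits" are written before "double hits", and so on. Ties are
// broken arbitrarily. ok is false once the queue is empty.
func (q *persistQueue) next() (fp Fingerprint, cs []chunk, ok bool) {
	for candidate, bucket := range q.pending {
		if !ok || len(bucket) > len(cs) {
			fp, cs, ok = candidate, bucket, true
		}
	}
	if ok {
		delete(q.pending, fp)
	}
	return fp, cs, ok
}

func main() {
	q := newPersistQueue()
	q.add(1, chunk{id: 1})
	q.add(2, chunk{id: 2})
	q.add(2, chunk{id: 3}) // second pending chunk for series 2: a "double hit"

	for {
		fp, cs, ok := q.next()
		if !ok {
			break
		}
		fmt.Printf("series %d: writing %d chunk(s) in one go\n", fp, len(cs))
	}
}
```

Each call to `next` corresponds to one write to a series file, so a bucket of two chunks costs one seek instead of two.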
beorn7
e22f26bc58 Move to a queue model for appending samples after all.
Starting a goroutine takes 1-2µs on my laptop. From the "numbers every
Go programmer should know", I had 300ns for a channel send in my
mind. Turns out, on my laptop, it takes only 60ns. That's fast enough
to warrant the machinery of yet another channel with a fixed set of
worker goroutines feeding from it. The number chosen (8 for now) is
low enough to not really inflict a measurable overhead (a big
Prometheus server has >1000 goroutines running), but high enough to
not make sample ingestion a bottleneck.
2015-02-13 14:26:54 +01:00
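The queue model can be sketched as a buffered channel drained by a fixed set of worker goroutines. Only the worker count of 8 comes from the commit message; the function name, the channel capacity, and the use of a counter in place of a real append are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// numWorkers mirrors the "8 for now" mentioned in the commit message.
const numWorkers = 8

// ingest pushes all samples through one channel that a fixed set of
// workers feeds from, and returns how many samples were handled.
func ingest(samples []float64) int64 {
	queue := make(chan float64, 64) // capacity is an arbitrary choice
	var appended int64
	var wg sync.WaitGroup

	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range queue {
				// A real worker would append the sample to storage here.
				atomic.AddInt64(&appended, 1)
			}
		}()
	}

	for _, s := range samples {
		queue <- s // a channel send instead of starting a goroutine
	}
	close(queue) // lets the workers' range loops terminate
	wg.Wait()
	return appended
}

func main() {
	fmt.Println(ingest(make([]float64, 1000)))
}
```

The trade-off the message describes is visible here: each sample costs one channel send, while the goroutine start-up cost is paid only `numWorkers` times.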
beorn7
fe518fdb28 Simplify AppendSamples by allowing it to be goroutine-unsafe. 2015-02-13 12:13:22 +01:00
beorn7
5d3cd65a5d Improve performance of ingestion.
- Parallelize AppendSamples as much as possible without breaking the
  contract about temporal order.

- Allocate more fingerprint locker slots.

- Do not run early checkpoints if we are behind on chunk persistence.

- Increase fpMinWaitDuration to give the disk more time for more
  important things.

Also, switch math.MaxInt64 and math.MinInt64 to the new constants.
2015-02-12 18:12:37 +01:00
beorn7
d2ab49c396 Make the persist queue length configurable.
Also, set a much higher default value.

Chunk persist requests can be quite spiky. If you collect a large
number of time series that are very similar, they will tend to finish
up a chunk at about the same time. There is no reason we need to back
up scraping just because of that. The rationale of the new default
value is "1/8 of the chunks in memory".
2015-02-06 14:54:53 +01:00
Julius Volz
9412b296d5 Remove labels on persist error counter.
This fixes https://github.com/prometheus/prometheus/issues/496
2015-02-01 14:03:34 +01:00
Bjoern Rabenstein
3948e2a7f8 Move lost files to an "orphaned" directory.
Previously, those were simply deleted. The orphaned files can now be
used for forensics if needed.
2015-01-29 14:52:12 +01:00
Bjoern Rabenstein
c24bfdf701 Move crash related code into separate file.
persistence.go is way too long anyway, and a lot of code is just crash
recovery, which is not important to understand the normal operation.

Also, remove unused `exists` function.
2015-01-29 13:13:16 +01:00
Bjoern Rabenstein
ab386d1f5d Declare storage.local.index-cache-size.* default values as tweaked. 2015-01-29 13:04:54 +01:00
Bjoern Rabenstein
73f6dc4d44 Make KeyValueStore.Delete report if the key to delete was found.
Previously, it would return an error instead. Now we can distinguish
the cases 'error while deleting known key' vs. 'key not in index'
without testing for leveldb-internal kinds of errors.
2015-01-29 12:57:50 +01:00
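The changed contract can be illustrated with an in-memory store. The `(bool, error)` return shape is inferred from the commit message, and `memStore` is a hypothetical stand-in for the LevelDB-backed implementation, whose actual signature may differ.

```go
package main

import "fmt"

// memStore is a toy key-value store used only to show the contract.
type memStore struct {
	m map[string][]byte
}

// Delete removes key and reports whether it was present. A missing key is
// not an error; the error return is reserved for genuine failures of the
// underlying store (which cannot happen in this in-memory sketch).
func (s *memStore) Delete(key string) (bool, error) {
	if _, ok := s.m[key]; !ok {
		return false, nil
	}
	delete(s.m, key)
	return true, nil
}

func main() {
	s := &memStore{m: map[string][]byte{"fp1": []byte("series")}}

	found, err := s.Delete("fp1")
	fmt.Println(found, err) // key was in the index

	found, err = s.Delete("fp1")
	fmt.Println(found, err) // key not in index, still no error
}
```

Callers can then branch on `found` instead of inspecting store-internal error values.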
Bjoern Rabenstein
2c8d324ca4 Remove check that did not check anything. 2015-01-26 13:48:24 +01:00
Bjoern Rabenstein
2c8fdcbc23 Remove a deadlock during shutdown.
If queries are still running when the shutdown is initiated, they will
finish _during_ the shutdown. In that case, they might request chunk
eviction upon unpinning their pinned chunks. That might completely
fill the evict request queue _after_ draining it during storage
shutdown. If that ever happens (which is the case if there are _many_
queries still running during shutdown), the affected queries will be
stuck while keeping a fingerprint locked. The checkpointing can then
not process that fingerprint (or one that shares the same lock). And
then we are deadlocked.
2015-01-22 14:42:15 +01:00
Bjoern Rabenstein
5859b74f1b Clean up license issues.
- Move CONTRIBUTORS.md to the more common AUTHORS.
- Added the required NOTICE file.
- Changed "Prometheus Team" to "The Prometheus Authors".
- Reverted the erroneous changes to the Apache License.
2015-01-21 20:07:45 +01:00
Bjoern Rabenstein
f298af5756 Use named returns in flock.New. 2015-01-19 14:31:16 +01:00
Bjoern Rabenstein
baca6faa1c Add double-start protection.
This mimics the locking leveldb is performing anyway. Advantages of
doing it separately:

- Should we ever replace the leveldb implementation by one without
  double-start protection, we are still good.

- In contrast to leveldb, the new code creates a meaningful error
  message.
2015-01-14 17:13:42 +01:00
Julius Volz
a6bc42bc61 Minor formatting/spelling fixups. 2015-01-09 11:04:20 +01:00
Bjoern Rabenstein
0851945054 Add a heuristic to checkpoint early if there are many "dirty" series. 2015-01-08 20:15:58 +01:00
Bjoern Rabenstein
622e8350cd Fix a bug handling freshly unarchived series.
Usually, if you unarchive a series, it is to add something to it,
which will create a new head chunk. However, if a series is
unarchived, and before anything is added to it, it is handled by the
maintenance loop, it will be archived again. In that case, we have to
load the chunkDescs to know the lastTime of the series to be
archived. Usually, this case will happen only rarely (as a race, has
never happened so far, possibly because the locking around unarchiving
and the subsequent sample append is smart enough). However, during
crash recovery, we sometimes treat series as "freshly unarchived"
without directly appending a sample. We might add more cases of that
type later, so better deal with archiving properly and load chunkDescs
if required.
2015-01-08 16:25:50 +01:00
Bjoern Rabenstein
eb932d1524 Remove a deadlock during shutdown. 2015-01-07 19:02:38 +01:00
Brian Brazil
e56786b221 Have scrape time as a pseudovariable, not a prometheus variable.
This ensures it has the right timestamp, and is easier to work with.

Switch sd variable away from 'outcome', using total/failed instead.
2014-12-27 00:39:33 +00:00
Bjoern Rabenstein
ff24070a03 Fix embarrassing bug in crash recovery.
(And yes, we always knew we need tests for that. I have added a TODO now.)

Change-Id: I9cf52bbf98e263e0b79404bda4c442beba9696a8
2014-12-17 17:18:04 +01:00
Julius Volz
c9618d11e8 Introduce copy-on-write for metrics in AST.
This depends on changes in:

https://github.com/prometheus/client_golang/tree/cow-metrics.

Change-Id: I80b94833a60ddf954c7cd92fd2cfbebd8dd46142
2014-12-12 20:34:55 +01:00
Bjoern Rabenstein
afd864e7f4 Adjust to the new version of goleveldb.
(And yes, we do want vendoring for that... This is just the quick fix.)

Change-Id: I9d347a64d96de6b3390a0e35c8d466f14bb83e4e
2014-12-10 18:04:29 +01:00
Bjoern Rabenstein
fee88a7a77 Remove the remaining races, new and old.
Also, resolve a few other TODOs.

Change-Id: Icb39b5a5e8ca22ebcb48771cd8951c5d9e112691
2014-12-03 18:07:23 +01:00
Bjoern Rabenstein
66c80b5ebd Fix typo.
Change-Id: I72608c7841c00145458807d3c3ee29db7b5ac2bc
2014-11-28 12:50:19 +01:00
Bjoern Rabenstein
674624f1c8 Completed more TODOs.
- Documented checkpoint file format.
- High-level description of series sanitation.
- Replace fp.LoadFromString panic with an error.
  (Change in client_golang already submitted.)
- Introduced checks for series file size where appropriate.
- Removed two Law of Demeter violations.

Change-Id: I555d97a2c8f4769820c2fc8bf5d6f4e160222abc
2014-11-27 20:46:45 +01:00
Bjoern Rabenstein
7d11019aa2 Squash a few trivial TODOs.
- Delete unneeded file view_adapter.go.
- Assessed that we still need the fingerprints in nodes
  (to create iterators).
- Turned numMemChunkDescs into a metric.

Change-Id: I29be963c795a075ec00c095f76bf26405535609d
2014-11-27 18:26:06 +01:00
Bjoern Rabenstein
49683c0c20 Avoid test flags in normal binary.
Change-Id: If1fba813a73bf93ea5918dcda326e3ffa81a797d
2014-11-27 18:04:48 +01:00