The integral accumulator in the remote write sharding code is just a
second way of keeping track of the number of samples pending. Remove
integralAccumulator and use the samplesPending value we already
calculate to calculate the number of shards.
This has the added benefit of fixing a bug where the integralAccumulator
was not being initialized correctly due to not taking into account the
number of ticks being counted, causing the integralAccumulator initial
value to be off by an order of magnitude in some cases.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
Also improves TestPopulateLabels: testutil.ErrorEqual just returned a
bool without failing the test.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Track remote write queues via a map so we don't care about index.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Support a job name for remote write/read so we can differentiate between
them using the name.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Remote write/read has Name to not confuse the meaning of the field with
scrape job names.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Split queue/client label into remote_name and url labels.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Don't allow for duplicate remote write/read configs.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Ensure we restart remote write queues if the hash of their config has
not changed, but the remote name has changed.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Include name in remote read/write config hashes, simplify duplicates
check, update test accordingly.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
It is possible that desired shards is always a bit higher than the
number of shards (less than 30%) and by exporting desired shards as the
raw number it will be easy to tell if a Prometheus is in that situation.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Refactor calculateDesiredShards + don't reshard if we're having issues
sending samples.
* Track lastSendTimestamp via an int64 with atomic add/load, add a test
for reshard calculation.
* Simplify conditional for skipping resharding, add samplesIn/Out to shard
testcase struct.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
The WAL Watcher replays a checkpoint after it is created in order to
garbage collect series that no longer exist in the WAL. Currently the
garbage collection process is done serially with reading from the tip of
the WAL which can cause large delays in writing samples to remote
storage just after compaction occurs.
This also fixes a memory leak where dropped series are not cleaned up as
part of the SeriesReset process.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Replaced test validations with testutils on storage/remote/codec_test.go
Signed-off-by: George Felix <george.felix@ubeeqo.com>
* gofmt
Signed-off-by: George Felix <george.felix@ubeeqo.com>
* Removed shouldPass assertion
Signed-off-by: George Felix <gfelixc@gmail.com>
* Fixes to improve readability
Signed-off-by: George Felix <george.felix@ubeeqo.com>
* Fixes based on code review comments
Signed-off-by: George Felix <george.felix@ubeeqo.com>
* Added Fatal method and used it in buffer_test
Signed-off-by: Joe Elliott <number101010@gmail.com>
* Added period to meet contributing guidelines
Signed-off-by: Joe Elliott <number101010@gmail.com>
* Removed fatal testutil method. Refactored test cases to use testutil.Assert
Signed-off-by: Joe Elliott <number101010@gmail.com>
* Added if found condition for clarity
Signed-off-by: Joe Elliott <number101010@gmail.com>
* Show warnings in UI if query have returned some warnings
+ improve warning (error) text if query to remote was finished with error
* Add prefixes for remote_read errors
Signed-off-by: Stan Putrya <root.vagner@gmail.com>
* Update go.mod dependencies before release
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Add issue for showing query warnings in promtool
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Revert json-iterator back to 1.1.6
It produced errors when marshaling Point values with special float
values.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Fix expected step values in promtool tests after client_golang update
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Update generated protobuf code after proto dep updates
Signed-off-by: Julius Volz <julius.volz@gmail.com>
The desired shards calculation now properly keeps track of the rate of
pending samples, and uses the previously unused integralAccumulator to
adjust for missing information in the desired shards calculation.
Also, configure more capacity for each shard. The default 10 capacity
causes shards to block on each other while
sending remote requests. Default to a 500 sample capacity and explain in
the documentation that having more capacity will help throughput.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Check for duplicate label names in remote read
Also add test to confirm that #5731 is fixed
* Use subtests in TestValidateLabelsAndMetricName
* Really check that expectedErr matches err
Signed-off-by: Vadym Martsynovskyy <vmartsynovskyy@gmail.com>
* Only check last directory when discovering checkpoint number
Signed-off-by: Devin Trejo <dtrejo@palantir.com>
* Comments for checkpointNum
Signed-off-by: Devin Trejo <dtrejo@palantir.com>
* Add benchmark for sample delivery
* Simplify StoreSeries to have only one loop
* Reduce allocations for pending samples in runShard
* Only allocate one send slice per segment
* Cache a buffer in each shard for snappy to use
* Remove queue manager seriesMtx
It is not possible for any of the places protected by the seriesMtx to
be called concurrently so it is safe to remove. By removing the mutex we
can simplify the Append code to one loop.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
This allows other processes to reuse just the remote write code without
having to use the remote read code as well. This will be used to create
a sidecar capable of sending remote write payloads.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Update remote write and remote read separately
* Add external labels to the remote write conf hash
* Add unit tests for remote storage lifecycle
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>