Also, fix problems in shutdown.
Starting serving and shutdown still has to be cleaned up properly.
It's a mess.
Change-Id: I51061db12064e434066446e6fceac32741c4f84c
- Always spell out the time unit (e.g. milliseconds instead of ms).
- Remove "_total" from the names of metrics that are not counters.
- Make use of the "Namespace" and "Subsystem" fields in the options.
- Removed the "capacity" facet from all metrics about channels/queues.
These are all fixed via command line flags and will never change
during the runtime of a process. Also, they should not be part of
the same metric family. I have added separate metrics for the
capacity of queues as convenience. (They will never change and are
only set once.)
- I left "metric_disk_latency_microseconds" unchanged, although that
metric measures the latency of the storage device, even if it is not
a spinning disk. "SSD" is read by many as "solid state disk", so
it's not too far off. (It should be "solid state drive", of course,
but "metric_drive_latency_microseconds" is probably confusing.)
- Brian suggested to not mix "failure" and "success" outcome in the
same metric family (distinguished by labels). For now, I left it as
it is. We are touching some bigger issue here, especially as other
parts in the Prometheus ecosystem are following the same
principle. We still need to come to terms here and then change
things consistently everywhere.
Change-Id: If799458b450d18f78500f05990301c12525197d3
Having metrics with variable timestamps inconsistently
spaced when things fail will make it harder to write correct rules.
Update status page, requires some refactoring to insert a function.
Change-Id: Ie1c586cca53b8f3b318af8c21c418873063738a8
This roughly comprises the following changes:
- index target pools by job instead of scrape interval
- make targets within a pool exchangable while preserving existing
health state for targets
- allow exchanging targets via HTTP API (PUT)
- show target lists in /status (experimental, for own debug use)
We have an open question of how long does it take for each target
pool to have the state retrieved from all participating elements.
This commit starts by providing insight into this.
client_golang was updated to support full label-oriented telemetry,
which introduced interface incompatibilities with the previous
version of Prometheus. To alleviate this, a general fetching and
processing dispatching system has been created, which discriminates
and processes according to the version of input.
Future tests around the ``TargetPool`` and ``TargetManager`` and
friends will be a lot easier when the concrete behaviors of
``Target`` can be extracted out. Plus, each ``Target``, I suspect,
will have its own resolution and query strategy.
``Target`` will be refactored down the road to support various
nuanced endpoint types. Thusly incorporating the scheduling
behavior within it will be problematic. To that end, the scheduling
behavior has been moved into a separate assistance type to improve
conciseness and testability.
``make format`` was also run.
``TargetPool`` is a pool of targets pending scraping. For now, it
uses the ``heap.Interface`` from ``container/heap`` to provide a
priority queue for the system to scrape from the next target.
It is my supposition that we'll use a model whereby we create a
``TargetPool`` for each scrape interval, into which ``Target``
instances are registered.