# Monitoring time sync with node_exporter
## `ntp` collector
NOTE: This collector is deprecated and will be removed in the next major version release.

This collector is intended for use with local NTP daemons, including [ntp.org](http://ntp.org/), [chrony](https://chrony.tuxfamily.org/comparison.html), and [OpenNTPD](http://www.openntpd.org/).

Note that some chrony packages ship with the `local stratum 10` configuration directive, which makes chrony a valid server even when it is unsynchronised. This makes one of the heuristics behind `node_ntp_sanity` unreliable.

Note that OpenNTPD does not listen for SNTP queries by default. When using this collector with OpenNTPD, add `listen on 127.0.0.1` to its configuration.
### `node_ntp_stratum`
This metric shows the [stratum](https://en.wikipedia.org/wiki/Network_Time_Protocol#Clock_strata) of the local NTP daemon.
Stratum `16` means that the clock is unsynchronised. See also the note above about the default local stratum in chrony.
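
As a rough illustration, the Go sketch below maps a `node_ntp_stratum` sample to the stratum classes defined in RFC 5905 (0 unspecified or invalid, 1 primary server, 2 to 15 secondary server, 16 unsynchronised, higher values reserved). The `describeStratum` helper is hypothetical, and flagging stratum `10` as a possible chrony local fallback is an assumption of this sketch, not something the collector does.

```go
package main

import "fmt"

// stratumUnsynchronised is the stratum reported by a daemon whose clock is
// not synchronised (RFC 5905).
const stratumUnsynchronised = 16

// chronyLocalStratum matches the `local stratum 10` directive shipped by some
// chrony packages. Treating it as suspicious is an assumption of this sketch.
const chronyLocalStratum = 10

// describeStratum is a hypothetical helper that turns a node_ntp_stratum
// sample into a human-readable verdict.
func describeStratum(stratum int) string {
	switch {
	case stratum == 0:
		return "unspecified or invalid"
	case stratum == 1:
		return "primary server"
	case stratum == chronyLocalStratum:
		return "secondary server (possibly chrony's local stratum fallback)"
	case stratum >= 2 && stratum <= 15:
		return "secondary server"
	case stratum == stratumUnsynchronised:
		return "unsynchronised"
	default:
		return "reserved"
	}
}

func main() {
	for _, s := range []int{1, 3, 10, 16} {
		fmt.Printf("stratum %d: %s\n", s, describeStratum(s))
	}
}
```

Stratum `10` is still a legal secondary stratum, so the sketch only annotates it instead of rejecting it.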
### `node_ntp_leap`
Raw leap flag value: `0` is OK, `1` means a leap second will be added at UTC midnight, `2` means a leap second will be deleted at UTC midnight, and `3` means unsynchronised.
OpenNTPD ignores leap seconds and never sets the leap flag to `1` or `2`.
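
A minimal Go sketch of decoding the raw leap flag. `describeLeap` and `leapSecondPending` are hypothetical helpers written for illustration, not part of node_exporter.

```go
package main

import "fmt"

// describeLeap translates the raw leap indicator exported as node_ntp_leap.
func describeLeap(leap int) (string, error) {
	switch leap {
	case 0:
		return "no leap second scheduled", nil
	case 1:
		return "leap second will be inserted at UTC midnight", nil
	case 2:
		return "leap second will be deleted at UTC midnight", nil
	case 3:
		return "clock unsynchronised", nil
	default:
		// The leap indicator is a 2-bit field, so anything else is invalid.
		return "", fmt.Errorf("invalid leap indicator %d", leap)
	}
}

// leapSecondPending reports whether a leap second is announced.
// OpenNTPD never sets 1 or 2, so this is always false for that daemon.
func leapSecondPending(leap int) bool {
	return leap == 1 || leap == 2
}

func main() {
	for _, leap := range []int{0, 1, 3, 4} {
		desc, err := describeLeap(leap)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("leap=%d: %s (pending: %t)\n", leap, desc, leapSecondPending(leap))
	}
}
```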
### `node_ntp_rtt`
RTT (round-trip time) from the node_exporter collector to the local NTPD. This
value is used in the sanity check as part of the causality violation estimate.
### `node_ntp_offset`
[Clock offset](https://en.wikipedia.org/wiki/Network_Time_Protocol#Clock_synchronization_algorithm) between the local time and NTPD time.
The ntp.org daemon always sets NTPD time to the local clock instead of relaying
remote NTP time, so this offset is irrelevant for that daemon.
This value is used in the sanity check as part of the causality violation estimate.
### `node_ntp_reference_timestamp_seconds`
Reference Time. This field shows the time when the last adjustment was made,
but implementation details vary from "**local** wall-clock time" to "Reference
Time field in incoming SNTP packet".

`time() - node_ntp_reference_timestamp_seconds` and
`node_time_seconds - node_ntp_reference_timestamp_seconds` give an estimate of
the "freshness" of synchronization.
### `node_ntp_root_delay` and `node_ntp_root_dispersion`
These values are used to calculate the synchronization distance, which is
limited by `collector.ntp.max-distance`.

ntp.org adds the known local offset to the announced root dispersion and
increases dispersion linearly when there are NTP connectivity problems.
OpenNTPD does not account for dispersion at all and always reports `0`.
### `node_ntp_sanity`
Aggregate NTPD health, covering stratum, leap flag, sane freshness, root
distance being less than `collector.ntp.max-distance`, and causality violation
being less than `collector.ntp.local-offset-tolerance`.

Causality violation is a lower-bound estimate of clock error obtained using
SNTP; it is calculated as the positive portion of
`abs(node_ntp_offset) - node_ntp_rtt / 2`.
## `timex` collector
This collector exports the state of the kernel time synchronization flag, which
should be maintained by the time-keeping daemon and is eventually raised by the
Linux kernel if the time-keeping daemon does not update it regularly.
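
On Linux the flag is read with the adjtimex(2) system call (in Go, for example, `syscall.Adjtimex` with `Modes` set to zero, which only reads the state). adjtimex(2) returns the `TIME_ERROR` clock state when `STA_UNSYNC` or `STA_CLOCKERR` is set, among other PPS-related conditions. The sketch below decodes those values without making the system call. The constants are copied from `<linux/timex.h>`, while `syncStatus` and `unsyncFlagSet` are hypothetical helpers, and deriving a 0/1 sync status from `TIME_ERROR` is an assumption about how such a metric is computed.

```go
package main

import "fmt"

// Constants copied from <linux/timex.h>.
const (
	timeOK    = 0      // TIME_OK: clock synchronised, no leap second pending
	timeError = 5      // TIME_ERROR: clock not synchronised
	staUnsync = 0x0040 // STA_UNSYNC bit in timex.status
)

// syncStatus turns the adjtimex(2) return value into a 0/1 flag.
func syncStatus(state int) int {
	if state == timeError {
		return 0
	}
	return 1
}

// unsyncFlagSet reports whether STA_UNSYNC is set in timex.status.
func unsyncFlagSet(status int32) bool {
	return status&staUnsync != 0
}

func main() {
	// Example values; on a real system they come from adjtimex(2).
	fmt.Println(syncStatus(timeOK), unsyncFlagSet(0x2001))    // 1 false
	fmt.Println(syncStatus(timeError), unsyncFlagSet(0x0041)) // 0 true
}
```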

Unfortunately, some daemons do not handle this flag properly. For example,
chrony-1.30 from Debian/jessie clears the `STA_UNSYNC` flag during daemon
initialisation and does not use this flag to indicate clock synchronization
status. Modern chrony versions should work better. All chrony versions require
the `rtcsync` option to maintain this flag. OpenNTPD versions before
OpenNTPD-5.9p1 do not touch this flag at all.

On the other hand, the combination of `sync_status` and `offset` exported by
the `timex` collector is the way to monitor whether systemd-timesyncd is doing
its job.
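
A hedged sketch of such a check. The kernel reports `timex.offset` in microseconds, or in nanoseconds when `STA_NANO` is set, so the raw value is converted to seconds first. `offsetSeconds` and `timesyncdHealthy` are hypothetical helpers, and the tolerance is an illustrative value to be tuned for your environment.

```go
package main

import (
	"fmt"
	"math"
)

// staNano is STA_NANO from <linux/timex.h>: when set, timex.offset is in
// nanoseconds; otherwise it is in microseconds.
const staNano = 0x2000

// offsetSeconds converts the raw timex.offset field to seconds.
func offsetSeconds(offset int64, status int32) float64 {
	if status&staNano != 0 {
		return float64(offset) / 1e9
	}
	return float64(offset) / 1e6
}

// timesyncdHealthy: the kernel reports the clock as synchronised and the
// remaining offset does not exceed the tolerance (all values in seconds).
func timesyncdHealthy(syncStatus int, offsetSec, toleranceSec float64) bool {
	return syncStatus == 1 && math.Abs(offsetSec) <= toleranceSec
}

func main() {
	off := offsetSeconds(-2500, 0) // -2500 µs in microsecond mode
	fmt.Println(off, timesyncdHealthy(1, off, 0.05))
}
```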