The cpu frequency information is not always needed and/or available.
This change allows the cpu frequency metrics to be enabled/disabled
separately from the other cpu metrics, and also prevents a frequency
metric failure (such as a parse error) from failing the main cpu
collector.
Fixes#1241
Signed-off-by: Paul Gier <pgier@redhat.com>
This reduces the system metric collection time by using a wait group
and go routines to allow the systemd metric calls happen concurrently.
Also, makes the start time, restarts, tasks_max, and tasks_current metrics disabled by default
because these can be time consuming to gather.
Signed-off-by: Paul Gier <pgier@redhat.com>
With a bond interface the state of the slave interface from the bond's
point of view is reflected in `mii_status` and is independent of the
link's `operstate`.
When a bond is monitored with `miimon`, `mii_status` will reflect the
state of the physical link as configured via the operator.
When a bond is monitored via `arp_interval` the `mii_status` will
reflect the results of the bond ARP checking. This means the link can
be down from the bond's point of view, but up from a physical
connection point of view.
If a bond is not monitored via miimon or arp, the `mii_status` should
likely be always `up`, however I have observed a case where this is not
true and the `operstate` is `up` while `mii_status` is `down`. Kernel
bond documentation stresses that a bond should not be configured without
one of `mii_mon` or `arp_interval` configured however.
This change results in the metric 'node_bonding_active' matching the
up/down state of the bond's point of view rather than operstate.
Signed-off-by: Sachi King <nakato@nakato.io>
* netclass_linux: remove varying labels from the 'up' metric
This moves the variable label values such as 'operstate' out of
the 'network_up' metric and into a separate metric called '_info'.
This allows the 'up' metric to remain continous over state changes.
Fixes#1236
Signed-off-by: Paul Gier <pgier@redhat.com>
* Rename interface to device in netclass collector
This makes it consistent with other networking metrics like node_network_receive_bytes_total
This closes#1223
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
* Add diskstats collector for OpenBSD
Tested on i386 and amd64, OpenBSD 6.4 and -current.
* Refactor diskstats collectors
This moves common descriptors from Linux, Darwin, OpenBSD
diskstats collectors into diskstats_common.go
Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
* Add fallback for missing /proc/1/mounts
On some systems, `/proc/1/mounts` is hidden from non-root users due to
the `hidepid` procfs feature. Attempt to fallback to `/proc/mounts` if
`/proc/1/mounts` is not found.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Add tests.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Add CHANGELOG entry.
Signed-off-by: Ben Kochie <superq@gmail.com>
Tcp SYN packet retransmits are a very useful signal as they affect
network performance disproportionately to regular TCP retransmits.
Signed-off-by: Ben Kochie <superq@gmail.com>
Starting with (not yet released) OpenBSD 6.4, sysctl KERN_CPTIME2 will
return ENODEV for offline CPUs.
SMT siblings are reported as offline when hw.smt is disabled, which is
the default since one of the later Spectre variants. So this might
affect a few systems.
For more details see:
https://cvsweb.openbsd.org/src/sys/kern/kern_sysctl.c#rev1.348
Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
* Change systemd unit filtering
Get all units from systemd and filter in Go.
* Improves compatibility with older versions of systemd.
* Improve debugging by printing when units pass the filter.
* Remove extraneous newlines from log messages.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Correctly cast Darwin memory info
* Cast stats to float64 before doing math on them to avoid integer
wrapping.
* Remove invalid `_total` suffix from gauge values.
* Handle counters in `meminfo.go`.
Signed-off-by: Ben Kochie <superq@gmail.com>
* If NRestarts or NRefused are not available, don't ignore the unit itself
* Don't report systemd metrics (NRestarts/NRefused) that are not available
Signed-off-by: James Hartig <james@getadmiral.com>
PIDs can vanish (exit) from /proc/ between gathering the list of PIDs
and getting all of their stats.
* Ignore file not found errors.
* Explicitly count the PIDs we find.
* Cleanup some error style issues.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Replace supervisord xmlrpc library
* Use `github.com/mattn/go-xmlrpc` that doesn't leak goroutines.
* Fix uptime metric
* Use Prometheus best practices for uptime metric.
* Use "start time" rather than "uptime".
* Don't emit a start time if the process is down.
* Add changelog entry.
* Add example compatibility rules.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Add support for NRestarts counter introduced in systemd 235
`.service` units increment this counter any time the Restart= condition is
triggered.
Signed-off-by: Matthew McGinn <mamcgi@gmail.com>
* cpu: Add a 2nd label 'package' to metric node_cpu_core_throttles_total
This commit fixes the node_cpu_core_throttles_total metrics on
multi-socket systems as the core_ids are the same for each package.
I.e. we need to count them seperately.
Rename the node_package_throttles_total metric label `node` to `package`.
Reorganize the sys.ttar archive and use the same symlinks as the Linux
kernel. Also, the new fixtures now use a dual-socket dual-core cpu w/o
HT/SMT (node0: cpu0+1, node1: cpu2+3) as well as processor-less
(memory-only) NUMA node 'node2' (this is a very rare case).
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
* cpu: Use the direct /sys path to the cpu files.
Use the direct path /sys/devices/system/cpu/cpu[0-9]* (without symlinks)
instead of /sys/bus/cpu/devices/cpu[0-9]*.
The latter path also does not exist e.g. on RHEL 6.9's kernel.
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
* cpu: Reverse core+package throttle processing order
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
* cpu: Add documentation URLs
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
Linux "guest" metrics for VMs are already accounted for in node_cpu
`user` and `nice` metrics. Separate these into their own metric to
avoid duplication of data.