Commit Graph

45 Commits

Author SHA1 Message Date
Ben Kochie
c23b76bfbb
Update exporter-toolkit
* Bump exporter-toolkit to the latest release.
* Use new toolkit landing page function.
* Update kingpin flags.

Signed-off-by: Ben Kochie <superq@gmail.com>
2023-03-07 15:18:38 +01:00
Haoyu Sun
37d49746bc Remove metrics of offline CPUs in CPU collector
Signed-off-by: Haoyu Sun <hasun@redhat.com>
2023-03-07 14:01:02 +01:00
Jia Xin
39b4556b5b fix cpustat when some cpus are offline
Signed-off-by: Jia Xin <alexjx@gmail.com>
2023-01-20 01:24:06 +00:00
david
c2085cf8ca flip branches for early return
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david
75c05f3d97 remove error from signature; update doc for function
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david
840d32622f check for nil isolatedCpus before calling updateIsolated
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david
5340d1ec37 add debug log for not existent file
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david
c05af934af warn if isolcpus cannot be read and default to an empty slice
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david
9ea9a5f029 only publish metrics for isolated cpus
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david
5d68d5b9ad move logic to procfs; create a new metric for isolation
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david
512e086dec Implement : Add "isolated" label on cpu collector on linux
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
Park Beomsu
c861ba93aa
Remove redundant nil check ()
Signed-off-by: computerphilosopher <bspark@jam2in.com>
2021-11-15 11:23:49 +01:00
Julien Pivotto
68a6c78c0d
Update go to 1.17 ()
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-10-03 13:35:24 +02:00
Sergei Semenchuk
5de46c6bac
collect flag_info and bug_info only for one core ()
Signed-off-by: binjip978 <binjip978@gmail.com>
2021-09-28 07:44:03 +02:00
Ben Kochie
84b36c4fd8
Add flag to disable guest CPU metrics
In high scale virtualized / cloud environments there are typically
no guest VMs. Add a boolean flag to allow disabling the Linux guest
CPU metrics.

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-08-17 13:04:46 +02:00
Ben Kochie
73c9a10d37
Handle small backwards jumps in CPU idle
The Linux CPU idle stat can also jump backwards slightly in some cases.
Allow the jump back up to 3 seconds before we attempt to reset the CPU
counter cache.

Fixes: https://github.com/prometheus/node_exporter/issues/1903

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-07-07 12:24:46 +02:00
Ben Kochie
3bc9a93c20
Add ErrorLog plumbing to promhttp
Fix the error logging of the promhttp handler by connecting it to the
promlog setup.
* Switch to go-kit/log.
* Cleanup CHANGELOG.

Fixes: https://github.com/prometheus/node_exporter/issues/1886

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-06-03 10:47:41 +02:00
Ben Kochie
306a365377 Downgrade CPU counter warnings
We've gathered enough evidence that the CPU counter bug workaround is
working as intended. Downgrade the message from Warning to Debug.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-10-01 12:41:15 +02:00
Julius Volz
d05aac43e4 Fix capitalization of CPU acronym throughout
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2020-09-03 23:34:33 +02:00
domchan
503e4fc848
Expose cpu bugs and flags as info metrics. ()
* Expose cpu bugs and flags as info metrics with a regexp filter.
* Automatically enable CPU info metrics when using flags or bugs feature.

Signed-off-by: domgoer <domdoumc@gmail.com>
2020-07-17 18:32:23 +02:00
Ben Kochie
3565316d7e
Linux CPU: Cache CPU metrics
Cache CPU metrics to avoid counters (ie iowait) jumping backwards.

Fixes: https://github.com/prometheus/node_exporter/issues/1686

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-05-24 16:31:26 +02:00
Benjamin Drung
34d50e15d5 Add model_name and stepping to node_cpu_info metric
The `node_cpu_info` metric contains some information like the `model`
(which is an integer), but not the human readable model name. Also the
stepping of the processor might be interesting, since different stepping
of a processor might behave differently.

Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
2020-03-20 17:27:11 +01:00
Julian Kornberger
cfcaeee145
Use strconv.Itoa() instead of fmt.Sprintf() ()
Signed-off-by: Julian Kornberger <jk+github@digineo.de>
2020-02-19 14:34:05 +01:00
Ben Ye
2477c5c67d switch to go-kit/log ()
Signed-off-by: yeya24 <yb532204897@gmail.com>
2019-12-31 17:19:37 +01:00
Julian Kornberger
043fecbfd8 Wrap errors in the Go 1.13 way
Signed-off-by: Julian Kornberger <jk+github@digineo.de>
2019-12-19 15:26:55 +01:00
Paul Gier
4d72cb8059 add node_cpu_info metric
Contains information gathered from /proc/cpuinfo

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-09-25 14:38:57 -05:00
Paul Gier
2bc133cd48 update procfs to v0.0.2 ()
Signed-off-by: Paul Gier <pgier@redhat.com>
2019-06-12 20:47:16 +02:00
Paul Gier
b1298677aa Early init of procfs ()
Minor change to match naming convention in other collectors.

Initialize the proc or sys FS instance once while initializing
each collector instead of re-creating for each metric update.

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-04-10 18:16:12 +02:00
Paul Gier
cc847f2f44 collector/cpu: split cpu freq metrics into separate collector ()
The cpu frequency information is not always needed and/or available.
This change allows the cpu frequency metrics to be enabled/disabled
separately from the other cpu metrics, and also prevents a frequency
metric failure (such as a parse error) from failing the main cpu
collector.

Fixes 

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-02-19 17:22:54 +01:00
mknapphrt
7fbdd0ae93 Update procfs vendor ()
Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>
2019-02-04 16:54:41 +01:00
Ben Kochie
a0a164defb
Update cpufreq metrics collector ()
* Update Linux cpufreq collector to use new procfs library functions.
* Split thermal throttle collection to a separate function.
* Add new required fixtures and repack ttar file.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-18 17:28:19 +02:00
Mario Trangoni
24a28fcc9e Remove unused func, var, and const ()
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2018-04-29 14:35:43 +02:00
Mario Trangoni
c9f421d0dd Fix some golint issues ()
* collector/cpu_*: rename nodeCpuSecondsDesc to nodeCPUSecondsDesc

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>

* collector/qdisc_linux.go: add NewQdiscStatCollector comment

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>

* collector/cpu_linux.go: rename core_map to coreMap

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2018-04-29 14:34:47 +02:00
Karsten Weiss
efc1fdb6d0 cpu: Add a 2nd label 'package' to metric node_cpu_core_throttles_total ()
* cpu: Add a 2nd label 'package' to metric node_cpu_core_throttles_total

This commit fixes the node_cpu_core_throttles_total metrics on
multi-socket systems as the core_ids are the same for each package.
I.e. we need to count them seperately.

Rename the node_package_throttles_total metric label `node` to `package`.

Reorganize the sys.ttar archive and use the same symlinks as the Linux
kernel. Also, the new fixtures now use a dual-socket dual-core cpu w/o
HT/SMT (node0: cpu0+1, node1: cpu2+3) as well as processor-less
(memory-only) NUMA node 'node2' (this is a very rare case).

Signed-off-by: Karsten Weiss <knweiss@gmail.com>

* cpu: Use the direct /sys path to the cpu files.

Use the direct path /sys/devices/system/cpu/cpu[0-9]* (without symlinks)
instead of /sys/bus/cpu/devices/cpu[0-9]*.

The latter path also does not exist e.g. on RHEL 6.9's kernel.

Signed-off-by: Karsten Weiss <knweiss@gmail.com>

* cpu: Reverse core+package throttle processing order

Signed-off-by: Karsten Weiss <knweiss@gmail.com>

* cpu: Add documentation URLs

Signed-off-by: Karsten Weiss <knweiss@gmail.com>
2018-04-09 18:01:52 +02:00
Rene Treffer
c504c7e264 Only report core throttles per core, not per cpu ()
* Only report core throttles per core, not per cpu

* Add topology/core_id to the cpu sysfs fixtures

* Add new cpu fixtures to ttar file

* Merge core_id reading and thermal throttle accounting

* Declare core_id
2018-02-27 19:43:15 +01:00
Ben Kochie
14d60958d6
Unify CPU collector conventions ()
* Unify CPU collector conventions

Add a common CPU metric description.
* All collectors use the same `nodeCpuSecondsDesc`.
* All collectors drop the `cpu` prefix for `cpu` label values.

* Fix subsystem string in cpu_freebsd.

* Fix Linux CPU freq label names.
2018-02-01 18:42:20 +01:00
Brian Brazil
a98067a294 Make metrics better follow guidelines ()
* Improve stat linux metric names.

cpu is no longer used.

* node_cpu -> node_cpu_seconds_total for Linux

* Improve filesystem metric names with units

* Improve units and names of linux disk stats

Remove sector metrics, the bytes metrics cover those already.

* Infiniband counters should end in _total

* Improve timex metric names, convert to more normal units.

See
3c073991eb/kernel/time/ntp.c (L909)
for what stabil means, looks like a moving average of some form.

* Update test fixture

* For meminfo metrics that had "kB" units, add _bytes

* Interrupts counter should have _total
2018-01-17 17:55:55 +01:00
Ben Kochie
2a80537547
Split out guest cpu metrics on Linux. ()
Linux "guest" metrics for VMs are already accounted for in node_cpu
`user` and `nice` metrics.  Separate these into their own metric to
avoid duplication of data.
2017-11-23 15:04:47 +01:00
Karsten Weiss
a8d7d1101a cpu: Support processor-less (memory-only) NUMA nodes ()
* cpu: Support processor-less (memory-only) NUMA nodes

Processor-less (memory-only) NUMA nodes exist e.g. in systems that use
Intel Optane drives for RAM expansion using Intel Memory Drive
Technology (IMDT).

IMDT RAM expansion supports two modes:

* "Unify Remote Memory domains": present a processor-less (memory-only)
  NUMA domain, which is the default
* "Expand local memory domains": to expand each processor’s memory domain
  with a portion of the memory made available by Optane and IMDT

This commit fixes a crash in the first case (when "cpulist" is empty).

Here's an example of such a system:

$ numastat -m|head -n5

Per-node system memory usage (in MBs):
                          Node 0          Node 1          Node 2           Total
                 --------------- --------------- --------------- ---------------
MemTotal               118239.56       130816.00       464384.00       713439.56

$ for i in {0..2}; do echo -n "$i: " ; cat /sys/bus/node/devices/node$i/cpulist ; done
0: 0-7,16-23
1: 8-15,24-31
2:

$ /opt/vsmp/bin/vsmpversion -vvv
Memory Drive Technology: 8.2.1455.74 (Sep 28 2017 13:09:59)
System configuration:
    Boards:      3
       1 x Proc. + I/O + Memory
       2 x NVM devices (Intel SSDPED1K375GAQ)
    Processors:  2, Cores: 16, Threads: 32
        Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz Stepping 01
    Memory (MB): 713472 (of 977450), Cache: 251416, Private: 12562
       1 x 249088MB   [262036/   678/12270]
       1 x 232192MB   [357707/125369/  146]  82:00.0#1
       1 x 232192MB   [357707/125369/  146]  83:00.0#1

* cpu: rename some variables (pkg => node)

* cpu: Use %v not %q in log.Debugf() format strings
2017-11-10 15:31:26 +01:00
Calle Pettersson
859a825bb8 Replace --collectors.enabled with per-collector flags ()
* Move NodeCollector into package collector

* Refactor collector enabling

* Update README with new collector enabled flags

* Fix out-of-date inline flag reference syntax

* Use new flags in end-to-end tests

* Add flag to disable all default collectors

* Track if a flag has been set explicitly

* Add --collectors.disable-defaults to README

* Revert disable-defaults flag

* Shorten flags

* Fixup timex collector registration

* Fix end-to-end tests

* Change procfs and sysfs path flags

* Fix review comments
2017-09-28 15:06:26 +02:00
Karsten Weiss
b0d5c00832 cpu: Metric 'package_throttles_total' is per package. ()
* cpu: Metric 'package_throttles_total' is per package.

'package_throttles_total' is per package, not per cpu. This also reduces
the total number of cpu time series a lot (esp for multi core cpus).

* cpu: Better handling of a cpulist edge-case.

* cpu: Extract the package number from the directory name.

Do not rely on the range index.

* cpu: Add package_throttle_count for node0 cpu1

This file must be ignored by the cpu collector.
2017-09-07 23:24:18 +02:00
Rene Treffer
56bf8d4b2d Add link to kernel documentation for sysfs/cpufreq files 2017-06-27 11:25:06 +02:00
Rene Treffer
bcc3cd92b8 Fix cpufreq statistics by converting kHz to Hz 2017-06-27 11:05:55 +02:00
Ben Kochie
182810056f Fix Linux cpu errors ()
Make the Linux cpu collector soft-error on missing `cpufreq` and
`thermal_throttle` features.
2017-06-20 07:51:26 +02:00
Rene Treffer
2e9f1913b8 Move stat_linux to cpu_linux and add cpufreq stats () 2017-06-13 11:21:53 +02:00