516e5d4beb
Add metrics that count how many running processes are linking to deleted libraries on each machine. Deleted libraries are usually outdated libraries, and outdated libraries may have known security vulnerabilities. The rationale behind storing these as metrics is to allow the rollout of security fixes to be tracked across a fleet of machines, ensuring that all affected processes are restarted (e.g. via a reboot).

I'm parsing the output from `/proc/*/maps` because using `lsof -d DEL` can be too slow, particularly if you have sockets that bind to thousands of IP addresses.

The metric labels include the library path and the base filename, which allows us to pinpoint the exact path of the deleted library but also allows us to aggregate on the library name (or approximations of it) even if library locations differ between operating system versions.

The metrics output and the CPU time consumed are as follows:

    user@host:~$ time sudo python processes.py
    # HELP node_processes_linking_deleted_libraries Count of running processes that link a deleted library
    # TYPE node_processes_linking_deleted_libraries gauge
    node_processes_linking_deleted_libraries{library_path="locale-archive", library_name="/usr/lib/locale"} 3
    node_processes_linking_deleted_libraries{library_path="libevent-2.0.so.5.1.9", library_name="/usr/lib/x86_64-linux-gnu"} 4

    real 0m0.071s
    user 0m0.030s
    sys 0m0.041s

Including the library filename and path will result in reasonably high metrics cardinality; however, I think the benefits when an urgent security patch is being deployed outweigh concerns around cardinality.

This script assumes that library files do not contain spaces in their path.

Signed-off-by: Matt Bostock <mbostock@cloudflare.com>
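The `/proc/*/maps` approach can be illustrated with a minimal Go sketch. It is not the Python script described above, only the same idea under the same assumption: paths contain no spaces, so a mapping of a deleted file has exactly seven whitespace-separated fields, the last being `(deleted)`. Each process is counted at most once per deleted path, and the path is split into its directory and base filename for the two labels. The helper names and sample maps lines are hypothetical, and reading the real `/proc/<pid>/maps` files (root is needed for other users' processes) is left out.

```go
package main

import (
	"bufio"
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// deletedMappings returns the set of deleted file paths mapped in one
// process's /proc/<pid>/maps content. It assumes paths contain no spaces,
// so "(deleted)" is the seventh whitespace-separated field.
func deletedMappings(maps string) map[string]bool {
	paths := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(maps))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) == 7 && f[6] == "(deleted)" {
			paths[f[5]] = true
		}
	}
	return paths
}

// countProcesses counts, for each deleted path, how many processes map it.
// A library mapped several times by one process still counts once.
func countProcesses(perProcessMaps []string) map[string]int {
	counts := make(map[string]int)
	for _, m := range perProcessMaps {
		for p := range deletedMappings(m) {
			counts[p]++
		}
	}
	return counts
}

func main() {
	// Hypothetical maps content for two processes.
	procs := []string{
		"7f00-7f10 r-xp 00000000 08:01 42 /usr/lib/libfoo.so.1 (deleted)\n" +
			"7f10-7f20 r--p 00010000 08:01 42 /usr/lib/libfoo.so.1 (deleted)\n",
		"7f00-7f10 r-xp 00000000 08:01 42 /usr/lib/libfoo.so.1 (deleted)\n" +
			"7f20-7f30 r-xp 00000000 08:01 43 /usr/lib/libbar.so.2\n",
	}
	counts := countProcesses(procs)

	paths := make([]string, 0, len(counts))
	for p := range counts {
		paths = append(paths, p)
	}
	sort.Strings(paths) // deterministic output order

	fmt.Println("# HELP node_processes_linking_deleted_libraries Count of running processes that link a deleted library")
	fmt.Println("# TYPE node_processes_linking_deleted_libraries gauge")
	for _, p := range paths {
		// Paths are assumed to contain no quotes, backslashes or newlines,
		// so label values are written without escaping.
		fmt.Printf("node_processes_linking_deleted_libraries{library_path=\"%s\",library_name=\"%s\"} %d\n",
			filepath.Dir(p), filepath.Base(p), counts[p])
	}
}
```

With the sample input, `libfoo.so.1` is reported with a count of 2 (two processes), while the non-deleted `libbar.so.2` is not reported.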
Node exporter
Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors.
The WMI exporter is recommended for Windows users.
Collectors
There is varying support for collectors on each operating system. The tables below list all existing collectors and the supported systems.
Collectors are enabled by providing a --collector.<name> flag.
Collectors that are enabled by default can be disabled by providing a --no-collector.<name> flag.
Enabled by default
Name | Description | OS |
---|---|---|
arp | Exposes ARP statistics from /proc/net/arp. | Linux |
bcache | Exposes bcache statistics from /sys/fs/bcache/. | Linux |
bonding | Exposes the number of configured and active slaves of Linux bonding interfaces. | Linux |
boottime | Exposes system boot time derived from the kern.boottime sysctl. | Darwin, Dragonfly, FreeBSD, NetBSD, OpenBSD |
conntrack | Shows conntrack statistics (does nothing if no /proc/sys/net/netfilter/ present). | Linux |
cpu | Exposes CPU statistics. | Darwin, Dragonfly, FreeBSD, Linux |
diskstats | Exposes disk I/O statistics. | Darwin, Linux |
edac | Exposes error detection and correction statistics. | Linux |
entropy | Exposes available entropy. | Linux |
exec | Exposes execution statistics. | Dragonfly, FreeBSD |
filefd | Exposes file descriptor statistics from /proc/sys/fs/file-nr. | Linux |
filesystem | Exposes filesystem statistics, such as disk space used. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
hwmon | Exposes hardware monitoring and sensor data from /sys/class/hwmon/. | Linux |
infiniband | Exposes network statistics specific to InfiniBand and Intel OmniPath configurations. | Linux |
ipvs | Exposes IPVS status from /proc/net/ip_vs and stats from /proc/net/ip_vs_stats. | Linux |
loadavg | Exposes load average. | Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris |
mdadm | Exposes statistics about devices in /proc/mdstat (does nothing if no /proc/mdstat present). | Linux |
meminfo | Exposes memory statistics. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
netdev | Exposes network interface statistics such as bytes transferred. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
netstat | Exposes network statistics from /proc/net/netstat. This is the same information as netstat -s. | Linux |
nfs | Exposes NFS client statistics from /proc/net/rpc/nfs. This is the same information as nfsstat -c. | Linux |
nfsd | Exposes NFS kernel server statistics from /proc/net/rpc/nfsd. This is the same information as nfsstat -s. | Linux |
sockstat | Exposes various statistics from /proc/net/sockstat. | Linux |
stat | Exposes various statistics from /proc/stat. This includes boot time, forks and interrupts. | Linux |
textfile | Exposes statistics read from local disk. The --collector.textfile.directory flag must be set. | any |
time | Exposes the current system time. | any |
timex | Exposes selected adjtimex(2) system call stats. | Linux |
uname | Exposes system information as provided by the uname system call. | Linux |
vmstat | Exposes statistics from /proc/vmstat. | Linux |
wifi | Exposes WiFi device and station statistics. | Linux |
xfs | Exposes XFS runtime statistics. | Linux (kernel 4.4+) |
zfs | Exposes ZFS performance statistics. | Linux |
Disabled by default
Name | Description | OS |
---|---|---|
buddyinfo | Exposes statistics of memory fragments as reported by /proc/buddyinfo. | Linux |
devstat | Exposes device statistics. | Dragonfly, FreeBSD |
drbd | Exposes Distributed Replicated Block Device statistics (up to version 8.4). | Linux |
interrupts | Exposes detailed interrupts statistics. | Linux, OpenBSD |
ksmd | Exposes kernel and system statistics from /sys/kernel/mm/ksm. | Linux |
logind | Exposes session counts from logind. | Linux |
meminfo_numa | Exposes memory statistics from /proc/meminfo_numa. | Linux |
mountstats | Exposes filesystem statistics from /proc/self/mountstats. Exposes detailed NFS client statistics. | Linux |
ntp | Exposes local NTP daemon health to check time | any |
qdisc | Exposes queuing discipline statistics | Linux |
runit | Exposes service status from runit. | any |
supervisord | Exposes service status from supervisord. | any |
systemd | Exposes service and system status from systemd. | Linux |
tcpstat | Exposes TCP connection status information from /proc/net/tcp and /proc/net/tcp6. (Warning: the current version has potential performance issues in high load situations.) | Linux |
Textfile Collector
The textfile collector is similar to the Pushgateway, in that it allows exporting of statistics from batch jobs. It can also be used to export static metrics, such as what role a machine has. The Pushgateway should be used for service-level metrics. The textfile module is for metrics that are tied to a machine.
To use it, set the --collector.textfile.directory flag on the Node exporter. The collector will parse all files in that directory matching the glob *.prom using the text format.
To atomically push completion time for a cron job:
echo my_batch_job_completion_time $(date +%s) > /path/to/directory/my_batch_job.prom.$$
mv /path/to/directory/my_batch_job.prom.$$ /path/to/directory/my_batch_job.prom
To statically set roles for a machine using labels:
echo 'role{role="application_server"} 1' > /path/to/directory/role.prom.$$
mv /path/to/directory/role.prom.$$ /path/to/directory/role.prom
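The role example uses the text format's `name{label="value"} value` shape. When label values come from variables rather than literals, the text format requires backslash, double quote and line feed to be escaped inside them. A minimal Go sketch for programs that generate `.prom` content; `metricLine` is a hypothetical helper, not part of node_exporter or any client library.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// labelEscaper escapes backslash, double quote and line feed in label values.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// metricLine renders one sample in the text format. Labels are sorted by
// name so the output is deterministic.
func metricLine(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(name)
	if len(keys) > 0 {
		b.WriteByte('{')
		for i, k := range keys {
			if i > 0 {
				b.WriteByte(',')
			}
			fmt.Fprintf(&b, `%s="%s"`, k, labelEscaper.Replace(labels[k]))
		}
		b.WriteByte('}')
	}
	b.WriteByte(' ')
	// 'g' with precision -1 prints the shortest exact form, e.g. 1 -> "1".
	b.WriteString(strconv.FormatFloat(value, 'g', -1, 64))
	return b.String()
}

func main() {
	fmt.Println(metricLine("role", map[string]string{"role": "application_server"}, 1))
	// prints: role{role="application_server"} 1
}
```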
Filtering enabled collectors
The node_exporter will expose all metrics from enabled collectors by default. This is the recommended way to collect metrics to avoid errors when comparing metrics of different families.
For advanced use, the node_exporter can be passed an optional list of collectors to filter metrics. The collect[] parameter may be used multiple times. In Prometheus configuration, you can use this syntax under the scrape config.
params:
  collect[]:
    - foo
    - bar
This can be useful for having different Prometheus servers collect specific metrics from nodes.
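Outside Prometheus, for example when checking a filter by hand, the same request is made by repeating `collect[]` in the query string. A minimal Go sketch; the address `localhost:9100` (the exporter's usual default port) and the helper name `scrapeURL` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// scrapeURL builds a scrape URL that asks only for the named collectors,
// adding one collect[] query parameter per collector.
func scrapeURL(base string, collectors []string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for _, c := range collectors {
		q.Add("collect[]", c)
	}
	u.RawQuery = q.Encode() // "[" and "]" are percent-encoded as %5B and %5D
	return u.String(), nil
}

func main() {
	s, err := scrapeURL("http://localhost:9100/metrics", []string{"cpu", "meminfo"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// prints: http://localhost:9100/metrics?collect%5B%5D=cpu&collect%5B%5D=meminfo
}
```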
Building and running
Prerequisites:
- Go compiler
- RHEL/CentOS: glibc-static package.
Building:
go get github.com/prometheus/node_exporter
cd ${GOPATH-$HOME/go}/src/github.com/prometheus/node_exporter
make
./node_exporter <flags>
To see all available configuration flags:
./node_exporter -h
Running tests
make test
Using Docker
The node_exporter is designed to monitor the host system. It's not recommended to deploy it as a Docker container because it requires access to the host system. Be aware that any non-root mount points you want to monitor will need to be bind-mounted into the container.
docker run -d \
--net="host" \
--pid="host" \
quay.io/prometheus/node-exporter
Using a third-party repository for RHEL/CentOS/Fedora
There is a community-supplied COPR repository. It closely follows upstream releases.