* Add CHANGELOG from existing tags.
First release under the Prometheus Community organisation.
* [CHANGE] Update build to use standard Prometheus promu/Dockerfile
* [ENHANCEMENT] Remove duplicate column in queries.yml #433
* [ENHANCEMENT] Add query for 'pg_replication_slots' #465
* [ENHANCEMENT] Allow a custom prefix for metric namespace #387
* [ENHANCEMENT] Improve PostgreSQL replication lag detection #395
* [ENHANCEMENT] Support connstring syntax when discovering databases #473
* [ENHANCEMENT] Detect SIReadLock locks in the pg_locks metric #421
* [BUGFIX] Fix pg_database_size_bytes metric in queries.yaml #357
* [BUGFIX] Don't ignore errors in parseUserQueries #362
* [BUGFIX] Fix queries.yaml for AWS RDS #370
* [BUGFIX] Recover when connection cannot be established at startup #415
* [BUGFIX] Don't retry if an error occurs #426
* [BUGFIX] Do not panic on incorrect env #457
Signed-off-by: Ben Kochie <superq@gmail.com>
* Run the query only on specific database versions if provided in the yml file.
By default, the query runs on all databases if "runonserver" is not provided.
To run the query on multiple database versions, use a comma-separated string:
runonserver: "9.5, 9.6"
Example yml file (the query below runs only on database version 9.5):
pg_replication:
  query: "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) as lag"
  master: true
  runonserver: "9.5"
  metrics:
    - lag:
        usage: "GAUGE"
        description: "Replication lag behind master in seconds"
* Address the review comments from Ashesh Vashi
Instead of taking a database version string from the yml file, the user can define
the range of database server versions on which the query is executed.
To run the query on database versions 10.0.0 and later, use the format below.
runonserver: ">=10.0.0"
Below are examples of the version ranges a user can define in the yml file.
<=10.1.0
>=12.1.0
=11.0.0
<9.6.0 || >=11.0.0
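The range syntax above can be illustrated with a small standard-library sketch. It is not the exporter's implementation; the `version` type and the `parseVersion`, `compare`, `matchClause`, and `matches` helpers are hypothetical names chosen for this example, and only three-part versions and the operators listed above are handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a hypothetical three-part server version (major.minor.patch).
type version [3]int

// parseVersion parses a string such as "10.1.0" into a version value.
func parseVersion(s string) (version, error) {
	var v version
	parts := strings.Split(strings.TrimSpace(s), ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("expected major.minor.patch, got %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid version component %q: %w", p, err)
		}
		v[i] = n
	}
	return v, nil
}

// compare returns -1, 0, or 1 when a is lower than, equal to, or higher than b.
func compare(a, b version) int {
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// matchClause evaluates a single comparison such as ">=11.0.0".
func matchClause(clause string, server version) (bool, error) {
	// Two-character operators are tried before their one-character prefixes.
	for _, op := range []string{"<=", ">=", "<", ">", "="} {
		if !strings.HasPrefix(clause, op) {
			continue
		}
		bound, err := parseVersion(strings.TrimPrefix(clause, op))
		if err != nil {
			return false, err
		}
		c := compare(server, bound)
		switch op {
		case "<=":
			return c <= 0, nil
		case ">=":
			return c >= 0, nil
		case "<":
			return c < 0, nil
		case ">":
			return c > 0, nil
		default: // "="
			return c == 0, nil
		}
	}
	return false, fmt.Errorf("no comparison operator in %q", clause)
}

// matches reports whether server satisfies expr, where expr is one or more
// comparisons joined by "||"; a single matching comparison is enough.
func matches(expr string, server version) (bool, error) {
	for _, clause := range strings.Split(expr, "||") {
		ok, err := matchClause(strings.TrimSpace(clause), server)
		if err != nil {
			return false, err
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	server, err := parseVersion("9.5.0")
	if err != nil {
		panic(err)
	}
	ok, err := matches("<9.6.0 || >=11.0.0", server)
	fmt.Println(ok, err)
}
```

A query whose runonserver range does not match the connected server's version would simply be skipped for that server.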
* Remove the calls in places where 'runOnServer' is not required.
Only the Server type holds that value.
* Fix compilation issues.
* Fix the issue with Debugln to print the database server version
* Support connstring syntax when discovering databases
Support connstring DSNs (`host=... user=... password=... dbname=...`) in
addition to URIs (`postgresql://user:pass@host/dbname`) for purposes of
database discovery.
Connstring syntax is needed to support accessing PostgreSQL via Unix
domain sockets (`host=/run/postgres`), which is not really possible with
URI syntax.
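As a rough illustration of handling both DSN forms, the sketch below extracts the database name from either one using only the standard library. The `dbNameFromDSN` helper is a hypothetical name, not a function from the exporter, and the connstring branch is simplified: it assumes values are unquoted and contain no spaces.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// dbNameFromDSN is a hypothetical helper that returns the database name
// from a URI DSN or from a connstring DSN.
func dbNameFromDSN(dsn string) (string, error) {
	if strings.HasPrefix(dsn, "postgresql://") || strings.HasPrefix(dsn, "postgres://") {
		u, err := url.Parse(dsn)
		if err != nil {
			return "", err
		}
		// In the URI form the database name is the path without its leading slash.
		return strings.TrimPrefix(u.Path, "/"), nil
	}
	// Connstring form: whitespace-separated key=value pairs.
	// Simplification: quoted values containing spaces are not handled.
	for _, field := range strings.Fields(dsn) {
		kv := strings.SplitN(field, "=", 2)
		if len(kv) == 2 && kv[0] == "dbname" {
			return kv[1], nil
		}
	}
	return "", fmt.Errorf("no dbname found in DSN")
}

func main() {
	for _, dsn := range []string{
		"postgresql://user:pass@host/dbname",
		"host=/run/postgres user=postgres dbname=metrics",
	} {
		name, err := dbNameFromDSN(dsn)
		fmt.Println(name, err)
	}
}
```

Database discovery would need the reverse operation as well (substituting a discovered name back into the DSN), which is left out of this sketch.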
* Appease gometalinter, don't shadow namespace
When the connection to the PostgreSQL instance cannot be established straight
at startup, a race condition can happen when autoDiscoverDatabases is true. If
discoverDatabaseDSNs fails, no dsn is set as the master database, and, if
scrapeDSN succeeds, checkMapVersions will have omitted the default metrics in
the server metric map. The metric map won't be updated unless the version
returned by the PostgreSQL instance changes. With this patch, scrapeDSN won't
be run unless discoverDatabaseDSNs succeeded and thus the race condition is
eliminated.
Signed-off-by: Yann Soubeyrand <yann.soubeyrand@camptocamp.com>
In some cases the master can report a pg_last_xact_replay_timestamp() from the
past, which causes the exporter to show an ever-growing value for the lag.
By checking whether the instance is in recovery, we avoid reporting a
huge lag for a master instance.
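The check described above could look roughly like the following sketch. The `lagQuery` constant is an assumed query shape built on PostgreSQL's `pg_is_in_recovery()` function, not necessarily the query the exporter ships, and `reportedLag` is a hypothetical helper that mirrors the same decision in Go.

```go
package main

import (
	"fmt"
	"time"
)

// lagQuery is an assumed query shape: report zero lag unless the
// instance is in recovery (that is, acting as a replica).
const lagQuery = `SELECT CASE WHEN NOT pg_is_in_recovery() THEN 0
  ELSE EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))
END AS lag`

// reportedLag is a hypothetical helper that applies the same rule:
// a master (not in recovery) always reports zero lag, whatever its
// last replay timestamp says.
func reportedLag(inRecovery bool, sinceLastReplay time.Duration) float64 {
	if !inRecovery {
		return 0
	}
	return sinceLastReplay.Seconds()
}

func main() {
	stale := 48 * time.Hour // a replay timestamp far in the past
	fmt.Println("master: ", reportedLag(false, stale))
	fmt.Println("replica:", reportedLag(true, stale))
	fmt.Println(lagQuery)
}
```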
Since we cannot know in advance the metrics which the exporter will generate,
the workaround is to run a Collect and return the metric descriptors. This is
problematic when the connection to the PostgreSQL instance cannot be
established straight from the start. This patch makes Describe return no
descriptors, effectively turning the collector into an unchecked one, which is
the typical use case described here:
https://pkg.go.dev/github.com/prometheus/client_golang/prometheus?tab=doc#hdr-Custom_Collectors_and_constant_Metrics.
Signed-off-by: Yann Soubeyrand <yann.soubeyrand@camptocamp.com>
* Introduce histogram support
Prior to this change, the custom queries were restricted to counters and
gauges.
This change introduces a new ColumnUsage, namely HISTOGRAM, that expects
the column to contain an array of upper inclusive bounds for each
observation bucket in the emitted metric. It also expects three more
columns to be present with the suffixes:
- `_bucket`, containing an array of cumulative counters for the
observation buckets;
- `_sum`, the total sum of all observed values; and
- `_count`, the count of events that have been observed.
A flag has been added to the MetricMap struct to easily identify metrics
that should emit a histogram and the construction of a histogram metric
is aided by the pg.Array function and a new helper dbToUint64 function.
Finally, an example of usage is given in queries.yaml.
Fixes #402
Signed-off-by: Corin Lawson <corin@responsight.com>
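To make the HISTOGRAM column layout described above more concrete, the sketch below pairs an array of upper bounds with the matching array of cumulative counts, producing the bound-to-count map that a constant histogram metric is typically built from. `buildBuckets` is a hypothetical helper, not code from this change, and it assumes the bounds arrive in ascending order.

```go
package main

import "fmt"

// buildBuckets pairs each upper bound with its cumulative count.
// Assumption: bounds are sorted in ascending order, so the counts
// must never decrease from one bucket to the next.
func buildBuckets(bounds []float64, cumulative []uint64) (map[float64]uint64, error) {
	if len(bounds) != len(cumulative) {
		return nil, fmt.Errorf("got %d bounds but %d bucket counts", len(bounds), len(cumulative))
	}
	buckets := make(map[float64]uint64, len(bounds))
	var prev uint64
	for i, bound := range bounds {
		if cumulative[i] < prev {
			return nil, fmt.Errorf("bucket counts are not cumulative at bound %v", bound)
		}
		buckets[bound] = cumulative[i]
		prev = cumulative[i]
	}
	return buckets, nil
}

func main() {
	// Values as they might come from the histogram column and its _bucket column.
	bounds := []float64{0.1, 0.5, 1}
	counts := []uint64{2, 5, 9}
	buckets, err := buildBuckets(bounds, counts)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(buckets), buckets[0.5])
}
```

The `_sum` and `_count` columns would be passed alongside this map when the histogram metric is constructed.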
* Introduces tests for histogram support
Prior to this change, the histogram support was untested.
This change introduces a new integration test that reads a user query
containing a number of histogram metrics. Also, additional checks have
been added to TestBooleanConversionToValueAndString to test dbToUint64.
Signed-off-by: Corin Lawson <corin@responsight.com>
Update query for pg_stat_user_tables:
* Split up to multi-line format to make it easier to read.
* Remove duplicate of column `COALESCE(last_vacuum, '1970-01-01Z')`.
Signed-off-by: Ben Kochie <superq@gmail.com>
* do not panic when envs are set incorrectly
* do not panic when envs are set incorrectly - fix tests
Co-authored-by: Will Rouesnel <wrouesnel@wrouesnel.com>
The existing 'pg_stat_replication' data does not
include stats for inactive replication slots. This
commit adds a minimal amount of metrics from
'pg_replication_slots' to know if a slot is
active and its lag.
This helps detect whether an inactive slot
is causing the server to run out of storage
by blocking WAL flushing.
Failures in parsing the user's queries are just being swallowed, which
makes troubleshooting YAML issues frustrating/impossible. I'm presuming
this was not intentional, since there is error handling code in the
function that calls this one, though it is unreachable as far as I can
tell without this change.
Co-authored-by: Will Rouesnel <wrouesnel@wrouesnel.com>