We need to check whether listener is empty before doing anything; in
that case, we were trying to dump listerner name while name is null. So
simply move the counters check above, which validate all possible cases
when the listener is empty. This is very similar to what is done in
stats.c
see also the trace:
Thread 1 "haproxy" received signal SIGSEGV, Segmentation fault.
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120
120 ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory.
(gdb) bt
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120
#1 0x00005555555b716b in promex_dump_listener_metrics (htx=0x5555558fadf0, appctx=0x555555926070) at contrib/prometheus-exporter/service-prometheus.c:722
#2 promex_dump_metrics (htx=0x5555558fadf0, si=0x555555925920, appctx=0x555555926070) at contrib/prometheus-exporter/service-prometheus.c:1200
#3 promex_appctx_handle_io (appctx=0x555555926070) at contrib/prometheus-exporter/service-prometheus.c:1477
#4 0x00005555556f0c94 in task_run_applet (t=0x555555926180, context=0x555555926070, state=<optimized out>) at src/applet.c:88
#5 0x00005555556bc6d8 in run_tasks_from_lists (budgets=budgets@entry=0x7fffffffe374) at src/task.c:548
#6 0x00005555556bd1a0 in process_runnable_tasks () at src/task.c:750
#7 0x0000555555696cdd in run_poll_loop () at src/haproxy.c:2870
#8 0x0000555555697025 in run_thread_poll_loop (data=data@entry=0x0) at src/haproxy.c:3035
#9 0x0000555555596c90 in main (argc=<optimized out>, argv=0x7fffffffe818) at src/haproxy.c:3723
quit)
this bug was introduced by commit
e3f7bd5ae9 ("MEDIUM:
contrib/prometheus-exporter: add listen stats"), which is present for
2.4 only, so no backport needed.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
now that htx is the default everywhere, we can remove the need to put
htx as a mandatory option to setup prometheus.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
this was a missing piece for a while now even though it was planned. This
patch adds listen stats.
Nothing in particular but we make use of the status helper previously
added. `promex_st_metrics` diff also looks scary, but I had to realign
all lines.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
I saw some people falling back to unix socket to collect some data they
could not find in prometheus exporter. One of them is base info from
stick tables (used/size).
I do not plan to extend it more for now; keys are quite a mess to
handle.
This should resolve github issue #1008.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Remove remaining descrition which are common to stats.c.
This patch is a followup of commit
82b2ce2f96 ("MINOR:
contrib/prometheus-exporter: use stats desc when possible"). I probably
messed up with one of my rebase because I'm pretty sure I removed them
at some point, but who knows what happened.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
For some metrics, several lines are produced per entity, one per label
value. For instance, the health-check status (ST_F_CHECK_STATUS) or the
entity status (ST_F_STATUS). The dump may be stopped in the middle of the
labels processing if the output buffer is full. This means the next time, we
must take care to restart on the right label value.
For now, this part is buggy and we always restart to dump all the label
values again from the beginning. To be sure to restart at the right
position, the field <ctx.stats.st_code> in the applet context is used to
save the last position. Of course, we take care to reset this value when
necessary.
This fix is specific for 2.4. No backport needed.
Since the labels are dynamically created for each metric, the "code" label
of the ST_F_HRSP_1XX field is missing. To fix the bug, this metric is
handled in the same way the other ST_F_HRSP_* field are. We only take care
to dump the metric header only once.
This bug was introduced by the commit 5a2f93873 ("MEDIUM:
contrib/prometheus-exporter: Use dynamic labels instead of static ones"). No
backport needed.
Some metrics were missing (haproxy_process_uptime_seconds and
haproxy_process_build_info). To ease the review against the service output,
the same order is used in the README.
Now that we got ride of description in prometheus code, let's assume we
no longer need to maintain it in README, and diret user to the output of
prometheus to get more info.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
- align safe_idle_connections_current field
fix minor typo added in commit
37286a5ac5 ("MEDIUM:
contrib/prometheus-exporter: Rework matrices defining Promex metrics")
- reorder info fields to be able to compare them easily
- add missing ignored info fields as comment
Signed-off-by: William Dauchy <wdauchy@gmail.com>
unless I'm wrong, those includes are no longer needed. The only recent
one I remember is ssl-sock include since commit
5d9b8f3c93 ("MINOR:
contrib/prometheus-exporter: use fill_info for process dump") where we
make use of the code from stats.c
Signed-off-by: William Dauchy <wdauchy@gmail.com>
this field was added in commit bd71510024
("MINOR: stats: report server's user-configured weight next to effective
weight")
Signed-off-by: William Dauchy <wdauchy@gmail.com>
It is a followup work of commit a191b77e54
("MINOR: contrib/prometheus-exporter: merge info description from
stats") but for all other stats fields; we however keep a way to
override them when needed (e.g. units, specific cases)
this is another step which will avoid duplicating work between stats.c
and prometheus.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Since we changed the behaviour of this metric, improve the description
to better explain what is the meaning of the new gauge value; it also
reflects the description we did for health check status.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
this patch is a breaking change between v2.3 and v2.4: we move from
using gauge value for health check states to labels values. The diff is
quite small thanks to the preparation work from Christopher to allow
more flexibility in labels, see commit
5a2f938732 ("MEDIUM:
contrib/prometheus-exporter: Use dynamic labels instead of static ones")
this is a follow up of commit c6464591a3
("MAJOR: contrib/prometheus-exporter: move ftd/bkd/srv states to
labels"). The main goal being to be better aligned with prometheus use
cases in terms of queries. More specifically to health checks, Pierre C.
mentioned the possible quirks he had to put in place in order to make
use of those metrics through prometheus:
<aggregator_function> by(proxy, check_status) (count_values by(proxy,
instance) ("check_status", haproxy_server_check_status))
I am perfectly aware this introduces a lot more metrics but I don't see
how we can improve the usability without it. The main issue remains in
the cardinality of the states which are > 20. Prometheus recommends to
stay below a cardinality of 10 for a given metric but I consider our
case very specific, because highly linked to the level of precision
haproxy exposes.
Even before this patch I saw several large production setup (a few
hundreds of MB in output) which are making use of the scope parameter to
simply ignore the server metrics, so that the scrapping can be faster,
and memory consumed on client side not too high. So I believe we should
eventually continue in that direction and offer more granularity of
filtering of the output. That being said it is already possible to
filter out the data on prometheus client side.
this is related to github issue #1029
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Instead of using static labels for metrics, we now use an array of labels,
filled for each metrics if necessary and passed to the dump function. This
way, it is easier to extend the promex service. For now, there are at most 8
labels per metrics. This limit may be raised by changing PROMEX_MAX_LABELS
value. And to ease labels addition, a label is defined as a key/value
pair. The formatting is handled by the dump function.
For the proxies and servers, the first entry of the array is always the
proxy name. In addition, for the servers, the second entry is always the
server name.
this patch is a breaking change between v2.3 and v2.4: we move from
using gauge value for frontend/backend/servers states to labels values.
the main motivation being I realised it is very difficult to make use of
it without hard coded quirks on prometheus client side; especially
because the main use is often to group by state, which is harder when
the state is the value of the metric.
in order to achieve that we iterate on the status metric to generate
labels, and so as many metrics.
this is the first step to resolve github issue #1029
A second step should address health check states.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
in preparation to change state gauge values as labels, declare them as
enum associated with the string definition
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The EOM block may be removed. The HTX_FL_EOM flags is enough. Most of time,
to know if the end of the message is reached, we just need to have an empty
HTX message with HTX_FL_EOM flag set. It may also be detected when the last
block of a message with HTX_FL_EOM flag is manipulated.
Removing EOM blocks simplifies the HTX message filling. Indeed, there is no
more edge problems when the message ends but there is no more space to write
the EOM block. However, some part are more tricky. Especially the
compression filter or the FCGI mux. The compression filter must finish the
compression on the last DATA block. Before it was performed on the EOM
block, an extra DATA block with the checksum was added. Now, we must detect
the last DATA block to be sure to finish the compression. The FCGI mux on
its part must be sure to reserve the space for the empty STDIN record on the
last DATA block while this record was inserted on the EOM block.
The H2 multiplexer is probably the part that benefits the most from this
change. Indeed, it is now fairly easier to known when to set the ES flag.
The HTX documentaion has been updated accordingly.
use `stats_fill_sv_stats` when possible to avoid duplicating code.
the following metrics have a change of behaviour:
haproxy_server_limit_sessions
haproxy_server_queue_limit
haproxy_server_check_failures_total
haproxy_server_check_up_down_total
haproxy_server_downtime_seconds_total
haproxy_server_current_throttle
haproxy_server_idle_connections_limit
depending on cases, if the limit was not configured or enabled, NaN is
returned instead. It should not be an issue for users, even better than
before as it provides more precise info.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
use `stats_fill_be_stats` when possible to avoid duplicating code; make
use of field selector to get the needed field only.
the only difference is on `haproxy_backend_downtime_seconds_total` as
stats.c is testing `px->srv`. This behaviour is present since commit
7344f47893 ("MINOR: stats: only report
backend's down time if it has servers"). The end result is a NaN instead
of a zero when no server are present.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Not necessarily mandatory but I saw a few prometheus client parsing only
`NaN`. Also most librarries do output `NaN`
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The global and stats matrices are replaced by a simpler ones. Now we have
only 2 arrays of prometheus metrics. Their flags are used to match on the
entity type. This simplify a bit the metrics definition. For now, labels and
descriptions are still outside of these arrays, because the labels must be
reworked to be more dynamic and the descrptions must be replaced by stats
ones as far as possible.
This structure will be used to define a Prometheus metric, i.e its name, its
type (gauge or counter) and its flags. The flags will be used to know for
which entities the metric is defined (global, frontend, backend and/or server).
PROMEX_FL_STATS_METRIC flag is splitted in 3 flags to easily identify the
processed entity type (frontend, backend or server). Thus, now we are using
PROMEX_FL_FRONT_METRIC, PROMEX_FL_BACK_METRIC or PROMEX_FL_SRV_METRIC. These
flags will be used to know if a metric is defined and must be exported for a
given entity type.
use `stats_fill_fe_stats` when possible to avoid duplicating code; make
use of field selector to get the needed field only.
this should not introduce any difference of output.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Now that units are coherent we can merge the info description from
haproxy stats.
Description were not always the same, but I guess we may eventually
improve them in the future.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Another patch in order to try to reconciliate haproxy stats and
prometheus. Here I'm adding a proper start time field in order to make
proper use of uptime field.
That being done we can move the calculation in `fill_info`
Signed-off-by: William Dauchy <wdauchy@gmail.com>
in order to prepare a possible merge of fields between haproxy stats and
prometheus, duplicate 3 fields:
INF_MEMMAX
INF_POOL_ALLOC
INF_POOL_USED
Those were specifically named in MB unit which is not what prometheus
recommends. We therefore used them but changed the unit while doing the
calculation. It created a specific case for that, up to the description.
This patch:
- removes some possible confusion, i.e. using MB field for bytes
- will permit an easier merge of fields such as description
First consequence for now, is that we can remove the calculation on
prometheus side and move it on `fill_info`.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
With a 2.6.8 wireshark, this module could not compile because of ws_version.h
missing header. This patch offers the possibility to compile this plugin without
having to include this header. Furthermore with my wireshark version a
"plugin_release" object is required to make it be loaded by wireshark. This is
a string which seems to have to match a dotted string made of you wireshark
major and minor versions.
it does not seem to have a reason to close connections after each
request; reflect that in tests by doing all requests within the same
client.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
commit c55a626217 ("MINOR: contrib/prometheus-exporter: Add
missing global and per-server metrics") is renaming two metrics between
v2.2 and v2.3:
server_idle_connections_current
server_idle_connections_limit
It is breaking some tools which are making use of those metrics while
supporting several haproxy versions. This build_info will permit tools
which make use of metrics to be able to match the haproxy version and
change the list of expected metrics. This was possible using the haproxy
stats socket but not with prometheus export.
This patch follows prometheus best pratices to export specific software
informations. It is adding a new field `build_info` so we can extend it
to other parameters if needed in the future.
example output:
# HELP haproxy_process_build_info HAProxy build info.
# TYPE haproxy_process_build_info gauge
haproxy_process_build_info{version="2.4-dev5-2e1a3f-5"} 1
Even though it is not a bugfix, this patch will make more sense when
backported up to >= 2.0
Signed-off-by: William Dauchy <wdauchy@gmail.com>
MSG_NOSIGNAL and MSG_MORE are not defined everywhere, let's make them
zero when not defined. It will roughly result in the same behavior,
albeit a bit less optimal, which is no big deal when debugging. This
should fix issue #1014.
%Ld isn't standard, %lld is more portable. In addition, the format
should be %llu since the printed values are unsigned. This should
address issue #1013.
This commit adds the OpenTracing filter (hereinafter we will use the
abbreviated name 'the OT filter') to the contrib tree.
The OT filter adds native support for using distributed tracing in HAProxy.
This is enabled by sending an OpenTracing compliant request to one of the
supported tracers; such as Datadog, Jaeger, Lightstep and Zipkin tracers.
Please note: tracers are not listed by any preference, but alphabetically.
The OT filter is a standard HAProxy filter, so what applies to others also
applies to this one (of course, by that I mean what is described in the
documentation, more precisely in the doc/internals/filters.txt file).
The OT filter activation is done explicitly by specifying it in the HAProxy
configuration. If this is not done, the OT filter in no way participates
in the work of HAProxy.
As for the impact on HAProxy speed, this is documented with several tests
located in the test directory, and the result is found in the README-speed-*
files. In short, the speed of operation depends on the way it is used and
the complexity of the configuration, from an almost immeasurable impact to
a significant deceleration (5x and more). I think that in some normal use
the speed of HAProxy with the filter on will be quite satisfactory with a
slowdown of less than 4%.
The OT filter allows intensive use of ACLs, which can be defined anywhere in
the configuration. Thus, it is possible to use the filter only for those
connections that are of interest to us.
More detailed documentation related to the operation, configuration and use
of the filter can be found in the contrib/opentracing directory.
To make the OpenTracing filter easier to configure and compile, several
entries have been added to the Makefile. When running the make utility,
it is possible to use several new arguments:
USE_OT=1 : enable the OpenTracing filter
OT_DEBUG=1 : compile the OpenTracing filter in debug mode
OT_INC=path : force the include path to libopentracing-c-wrapper
OT_LIB=path : force the lib path to libopentracing-c-wrapper
OT_RUNPATH=1 : add libopentracing-c-wrapper RUNPATH to haproxy executable
If USE_OT is set, then an additional Makefile from the contrib/opentracing
directory is included in the compilation process.
As per https://docs.python.org/3/c-api/none.html, None has to be treated
exactly like other objects for reference counting.
So, when we use it, we need to INCREF and when we are done, DECREF
This patch must be backported as far as 2.0.
As per https://docs.python.org/3/c-api/object.html#c.PyObject_Call,
positional arguments should be an empty tuple when not used.
Previously the code had a dictionary instead of tuple. This commit is to
fix it and use tuple to avoid unexpected consequences
This patch must be backported as far as 2.0.
This change is to ensure objects from the ipaddress module are cleaned
up when spoa module initialization fails.
In general the interpreter would just crash, but in a code where import
is conditional (try/except), then we would keep those objects around
This patch must be backported as far as 2.0.