We need to check whether listener is empty before doing anything; in
that case, we were trying to dump listerner name while name is null. So
simply move the counters check above, which validate all possible cases
when the listener is empty. This is very similar to what is done in
stats.c
see also the trace:
Thread 1 "haproxy" received signal SIGSEGV, Segmentation fault.
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120
120 ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory.
(gdb) bt
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120
#1 0x00005555555b716b in promex_dump_listener_metrics (htx=0x5555558fadf0, appctx=0x555555926070) at contrib/prometheus-exporter/service-prometheus.c:722
#2 promex_dump_metrics (htx=0x5555558fadf0, si=0x555555925920, appctx=0x555555926070) at contrib/prometheus-exporter/service-prometheus.c:1200
#3 promex_appctx_handle_io (appctx=0x555555926070) at contrib/prometheus-exporter/service-prometheus.c:1477
#4 0x00005555556f0c94 in task_run_applet (t=0x555555926180, context=0x555555926070, state=<optimized out>) at src/applet.c:88
#5 0x00005555556bc6d8 in run_tasks_from_lists (budgets=budgets@entry=0x7fffffffe374) at src/task.c:548
#6 0x00005555556bd1a0 in process_runnable_tasks () at src/task.c:750
#7 0x0000555555696cdd in run_poll_loop () at src/haproxy.c:2870
#8 0x0000555555697025 in run_thread_poll_loop (data=data@entry=0x0) at src/haproxy.c:3035
#9 0x0000555555596c90 in main (argc=<optimized out>, argv=0x7fffffffe818) at src/haproxy.c:3723
quit)
this bug was introduced by commit
e3f7bd5ae9 ("MEDIUM:
contrib/prometheus-exporter: add listen stats"), which is present for
2.4 only, so no backport needed.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
now that htx is the default everywhere, we can remove the need to put
htx as a mandatory option to setup prometheus.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
this was a missing piece for a while now even though it was planned. This
patch adds listen stats.
Nothing in particular but we make use of the status helper previously
added. `promex_st_metrics` diff also looks scary, but I had to realign
all lines.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
I saw some people falling back to unix socket to collect some data they
could not find in prometheus exporter. One of them is base info from
stick tables (used/size).
I do not plan to extend it more for now; keys are quite a mess to
handle.
This should resolve github issue #1008.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Remove remaining descrition which are common to stats.c.
This patch is a followup of commit
82b2ce2f96 ("MINOR:
contrib/prometheus-exporter: use stats desc when possible"). I probably
messed up with one of my rebase because I'm pretty sure I removed them
at some point, but who knows what happened.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
For some metrics, several lines are produced per entity, one per label
value. For instance, the health-check status (ST_F_CHECK_STATUS) or the
entity status (ST_F_STATUS). The dump may be stopped in the middle of the
labels processing if the output buffer is full. This means the next time, we
must take care to restart on the right label value.
For now, this part is buggy and we always restart to dump all the label
values again from the beginning. To be sure to restart at the right
position, the field <ctx.stats.st_code> in the applet context is used to
save the last position. Of course, we take care to reset this value when
necessary.
This fix is specific for 2.4. No backport needed.
Since the labels are dynamically created for each metric, the "code" label
of the ST_F_HRSP_1XX field is missing. To fix the bug, this metric is
handled in the same way the other ST_F_HRSP_* field are. We only take care
to dump the metric header only once.
This bug was introduced by the commit 5a2f93873 ("MEDIUM:
contrib/prometheus-exporter: Use dynamic labels instead of static ones"). No
backport needed.
Some metrics were missing (haproxy_process_uptime_seconds and
haproxy_process_build_info). To ease the review against the service output,
the same order is used in the README.
Now that we got ride of description in prometheus code, let's assume we
no longer need to maintain it in README, and diret user to the output of
prometheus to get more info.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
- align safe_idle_connections_current field
fix minor typo added in commit
37286a5ac5 ("MEDIUM:
contrib/prometheus-exporter: Rework matrices defining Promex metrics")
- reorder info fields to be able to compare them easily
- add missing ignored info fields as comment
Signed-off-by: William Dauchy <wdauchy@gmail.com>
unless I'm wrong, those includes are no longer needed. The only recent
one I remember is ssl-sock include since commit
5d9b8f3c93 ("MINOR:
contrib/prometheus-exporter: use fill_info for process dump") where we
make use of the code from stats.c
Signed-off-by: William Dauchy <wdauchy@gmail.com>
this field was added in commit bd71510024
("MINOR: stats: report server's user-configured weight next to effective
weight")
Signed-off-by: William Dauchy <wdauchy@gmail.com>
It is a followup work of commit a191b77e54
("MINOR: contrib/prometheus-exporter: merge info description from
stats") but for all other stats fields; we however keep a way to
override them when needed (e.g. units, specific cases)
this is another step which will avoid duplicating work between stats.c
and prometheus.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Since we changed the behaviour of this metric, improve the description
to better explain what is the meaning of the new gauge value; it also
reflects the description we did for health check status.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
this patch is a breaking change between v2.3 and v2.4: we move from
using gauge value for health check states to labels values. The diff is
quite small thanks to the preparation work from Christopher to allow
more flexibility in labels, see commit
5a2f938732 ("MEDIUM:
contrib/prometheus-exporter: Use dynamic labels instead of static ones")
this is a follow up of commit c6464591a3
("MAJOR: contrib/prometheus-exporter: move ftd/bkd/srv states to
labels"). The main goal being to be better aligned with prometheus use
cases in terms of queries. More specifically to health checks, Pierre C.
mentioned the possible quirks he had to put in place in order to make
use of those metrics through prometheus:
<aggregator_function> by(proxy, check_status) (count_values by(proxy,
instance) ("check_status", haproxy_server_check_status))
I am perfectly aware this introduces a lot more metrics but I don't see
how we can improve the usability without it. The main issue remains in
the cardinality of the states which are > 20. Prometheus recommends to
stay below a cardinality of 10 for a given metric but I consider our
case very specific, because highly linked to the level of precision
haproxy exposes.
Even before this patch I saw several large production setup (a few
hundreds of MB in output) which are making use of the scope parameter to
simply ignore the server metrics, so that the scrapping can be faster,
and memory consumed on client side not too high. So I believe we should
eventually continue in that direction and offer more granularity of
filtering of the output. That being said it is already possible to
filter out the data on prometheus client side.
this is related to github issue #1029
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Instead of using static labels for metrics, we now use an array of labels,
filled for each metrics if necessary and passed to the dump function. This
way, it is easier to extend the promex service. For now, there are at most 8
labels per metrics. This limit may be raised by changing PROMEX_MAX_LABELS
value. And to ease labels addition, a label is defined as a key/value
pair. The formatting is handled by the dump function.
For the proxies and servers, the first entry of the array is always the
proxy name. In addition, for the servers, the second entry is always the
server name.
this patch is a breaking change between v2.3 and v2.4: we move from
using gauge value for frontend/backend/servers states to labels values.
the main motivation being I realised it is very difficult to make use of
it without hard coded quirks on prometheus client side; especially
because the main use is often to group by state, which is harder when
the state is the value of the metric.
in order to achieve that we iterate on the status metric to generate
labels, and so as many metrics.
this is the first step to resolve github issue #1029
A second step should address health check states.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
in preparation to change state gauge values as labels, declare them as
enum associated with the string definition
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The EOM block may be removed. The HTX_FL_EOM flags is enough. Most of time,
to know if the end of the message is reached, we just need to have an empty
HTX message with HTX_FL_EOM flag set. It may also be detected when the last
block of a message with HTX_FL_EOM flag is manipulated.
Removing EOM blocks simplifies the HTX message filling. Indeed, there is no
more edge problems when the message ends but there is no more space to write
the EOM block. However, some part are more tricky. Especially the
compression filter or the FCGI mux. The compression filter must finish the
compression on the last DATA block. Before it was performed on the EOM
block, an extra DATA block with the checksum was added. Now, we must detect
the last DATA block to be sure to finish the compression. The FCGI mux on
its part must be sure to reserve the space for the empty STDIN record on the
last DATA block while this record was inserted on the EOM block.
The H2 multiplexer is probably the part that benefits the most from this
change. Indeed, it is now fairly easier to known when to set the ES flag.
The HTX documentaion has been updated accordingly.
use `stats_fill_sv_stats` when possible to avoid duplicating code.
the following metrics have a change of behaviour:
haproxy_server_limit_sessions
haproxy_server_queue_limit
haproxy_server_check_failures_total
haproxy_server_check_up_down_total
haproxy_server_downtime_seconds_total
haproxy_server_current_throttle
haproxy_server_idle_connections_limit
depending on cases, if the limit was not configured or enabled, NaN is
returned instead. It should not be an issue for users, even better than
before as it provides more precise info.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
use `stats_fill_be_stats` when possible to avoid duplicating code; make
use of field selector to get the needed field only.
the only difference is on `haproxy_backend_downtime_seconds_total` as
stats.c is testing `px->srv`. This behaviour is present since commit
7344f47893 ("MINOR: stats: only report
backend's down time if it has servers"). The end result is a NaN instead
of a zero when no server are present.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Not necessarily mandatory but I saw a few prometheus client parsing only
`NaN`. Also most librarries do output `NaN`
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The global and stats matrices are replaced by a simpler ones. Now we have
only 2 arrays of prometheus metrics. Their flags are used to match on the
entity type. This simplify a bit the metrics definition. For now, labels and
descriptions are still outside of these arrays, because the labels must be
reworked to be more dynamic and the descrptions must be replaced by stats
ones as far as possible.
This structure will be used to define a Prometheus metric, i.e its name, its
type (gauge or counter) and its flags. The flags will be used to know for
which entities the metric is defined (global, frontend, backend and/or server).
PROMEX_FL_STATS_METRIC flag is splitted in 3 flags to easily identify the
processed entity type (frontend, backend or server). Thus, now we are using
PROMEX_FL_FRONT_METRIC, PROMEX_FL_BACK_METRIC or PROMEX_FL_SRV_METRIC. These
flags will be used to know if a metric is defined and must be exported for a
given entity type.
use `stats_fill_fe_stats` when possible to avoid duplicating code; make
use of field selector to get the needed field only.
this should not introduce any difference of output.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Now that units are coherent we can merge the info description from
haproxy stats.
Description were not always the same, but I guess we may eventually
improve them in the future.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Another patch in order to try to reconciliate haproxy stats and
prometheus. Here I'm adding a proper start time field in order to make
proper use of uptime field.
That being done we can move the calculation in `fill_info`
Signed-off-by: William Dauchy <wdauchy@gmail.com>
in order to prepare a possible merge of fields between haproxy stats and
prometheus, duplicate 3 fields:
INF_MEMMAX
INF_POOL_ALLOC
INF_POOL_USED
Those were specifically named in MB unit which is not what prometheus
recommends. We therefore used them but changed the unit while doing the
calculation. It created a specific case for that, up to the description.
This patch:
- removes some possible confusion, i.e. using MB field for bytes
- will permit an easier merge of fields such as description
First consequence for now, is that we can remove the calculation on
prometheus side and move it on `fill_info`.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
it does not seem to have a reason to close connections after each
request; reflect that in tests by doing all requests within the same
client.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
commit c55a626217 ("MINOR: contrib/prometheus-exporter: Add
missing global and per-server metrics") is renaming two metrics between
v2.2 and v2.3:
server_idle_connections_current
server_idle_connections_limit
It is breaking some tools which are making use of those metrics while
supporting several haproxy versions. This build_info will permit tools
which make use of metrics to be able to match the haproxy version and
change the list of expected metrics. This was possible using the haproxy
stats socket but not with prometheus export.
This patch follows prometheus best pratices to export specific software
informations. It is adding a new field `build_info` so we can extend it
to other parameters if needed in the future.
example output:
# HELP haproxy_process_build_info HAProxy build info.
# TYPE haproxy_process_build_info gauge
haproxy_process_build_info{version="2.4-dev5-2e1a3f-5"} 1
Even though it is not a bugfix, this patch will make more sense when
backported up to >= 2.0
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The remaining proxy states were only used to distinguish an enabled
proxy from a disabled one. Due to the initialization order, both
PR_STNEW and PR_STREADY were equivalent after startup, and they
would only differ from PR_STSTOPPED when the proxy is disabled or
shutdown (which is effectively another way to disable it).
Now we just have a "disabled" field which allows to distinguish them.
It's becoming obvious that start_proxies() is only used to print a
greeting message now, that we'd rather get rid of. Probably that
zombify_proxy() and stop_proxy() should be merged once their
differences move to the right place.
Since v1.4 or so, it's almost not possible anymore to set this state. The
only exception is by using the CLI to change a frontend's maxconn setting
below its current usage. This case makes no sense, and for other cases it
doesn't make sense either because "full" is a vague concept when only
certain listeners are full and not all. Let's just remove this unused
state and make it clear that it's not reported. The "ready" or "open"
states will continue to be reported without being misleading as they
will be opposed to "stop".
Use an opaque pointer to store proxy instance. Regroup server/listener
as a single opaque pointer. This has the benefit to render the structure
more evolutive to support statistics on other types of objects in the
future.
This patch is needed to extend stat support for components other than
proxies objects.
The prometheus module has been adapted for these changes.
A workaround for some difficulties encountered to anticipate end of
messages was addressed by commit 810df0614 ("MEDIUM: htx: Add a flag on
a HTX message when no more data are expected"), but there were 3 issues
in it (with minor impact):
- the flag was mistakenly set before an EOH in Lua, which would only
cause incomplete packets to be emitted for now but could cause
truncated responses in the future. It's not needed to add it on
the next EOM block as http_forward_proxy_resp() already does it.
- one was still missing in hlua_applet_http_fct(), possibly causing
delays on Lua services
- one was missing in the Prometheus exporter.
All this simply shows that this mechanism is still quite fragile and
not trivial to use, especially in order to deal with the impossibility
to append the EOM, so we'll need to improve the solution in the future
and future backports should not be completely ruled out.
This fix must be backported where the patch above is backported,
typically 2.1 and later as it was required for a set of fixes.
Following metrics are now exported by the prometheus exporter to reflect recent
changes on HAProxy :
* haproxy_process_failed_resolutions
* haproxy_process_bytes_out_total
* haproxy_process_spliced_bytes_out_total
* haproxy_process_bytes_out_rate
and
* haproxy_server_unsafe_idle_connections_current
* haproxy_server_safe_idle_connections_current
* haproxy_server_used_connections_current
* haproxy_server_need_connections_current
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
Most of the files dealing with error reports have to include log.h in order
to access ha_alert(), ha_warning() etc. But while these functions don't
depend on anything, log.h depends on a lot of stuff because it deals with
log-formats and samples. As a result it's impossible not to embark long
dependencies when using ha_warning() or qfprintf().
This patch moves these low-level functions to errors.h, which already
defines the error codes used at the same places. About half of the users
of log.h could be adjusted, sometimes revealing other issues such as
missing tools.h. Interestingly the total preprocessed size shrunk by
4%.
There's no point splitting the file in two since only cfgparse uses the
types defined there. A few call places were updated and cleaned up. All
of them were in C files which register keywords.
There is nothing left in common/ now so this directory must not be used
anymore.
This one was not easy because it was embarking many includes with it,
which other files would automatically find. At least global.h, arg.h
and tools.h were identified. 93 total locations were identified, 8
additional includes had to be added.
In the rare files where it was possible to finalize the sorting of
includes by adjusting only one or two extra lines, it was done. But
all files would need to be rechecked and cleaned up now.
It was the last set of files in types/ and proto/ and these directories
must not be reused anymore.
extern struct dict server_name_dict was moved from the type file to the
main file. A handful of inlined functions were moved at the bottom of
the file. Call places were updated to use server-t.h when relevant, or
to simply drop the entry when not needed.
The files remained mostly unchanged since they were OK. However, half of
the users didn't need to include them, and about as many actually needed
to have it and used to find functions like srv_currently_usable() through
a long chain that broke when moving the file.