The following directories were moved from contrib/ to dev/ to make their
use case a bit clearer. In short, only developers are expected to ever
go there. The makefile was updated to build and clean from these ones.
base64/ flags/ hpack/ plug_qdisc/ poll/ tcploop/ trace/
Now poll is its own project and doesn't share the "flags" Makefile
any more. One of the issues was that it was making references to the
haproxy include path which is not needed here.
I just noticed this in the windows build after moving the file to dev/:
In file included from include/import/ist.h:32,
from include/haproxy/connection-t.h:32,
from dev/flags/flags.c:5:
dev/flags/flags.c: In function `main':
dev/flags/flags.c:442:20: error: array subscript has type `char' [-Werror=char-subscripts]
442 | (isalnum(*err) && toupper(*err) != 'U' && toupper(*err) != 'L'))
| ^~~~
LD haproxy
cc1: all warnings being treated as errors
make: *** [Makefile:932: dev/flags/flags.o] Error 1
make: *** Waiting for unfinished jobs....
Error: Process completed with exit code 2.
Let's just cast it to uchar as is done everywhere else.
Historically we've used SOL_IP/SOL_IPV6/SOL_TCP everywhere as the socket
level value in getsockopt() and setsockopt() but as we've seen over time
it regularly broke the build and required to have them defined to their
IPPROTO_* equivalent. The Linux ip(7) man page says:
Using the SOL_IP socket options level isn't portable; BSD-based
stacks use the IPPROTO_IP level.
And it indeed looks like a pure linuxism inherited from old examples and
documentation. strace also reports SOL_* instead of IPPROTO_*, which does
not help... A check to linux/in.h shows they have the same values. Only
SOL_SOCKET and other non-IP values make sense since there is no IPPROTO
equivalent.
Let's get rid of this annoying confusion by removing all redefinitions of
SOL_IP/IPV6/TCP and using IPPROTO_* instead, just like any other operating
system. This also removes duplicated tests for the same value.
Note that this should not result in exposing syscalls to other OSes
as the only ones that were still conditionned to SOL_IPV6 were for
IPV6_UNICAST_HOPS which already had an IPPROTO_IPV6 equivalent, and
IPV6_TRANSPARENT which is Linux-specific.
pool_alloc_dirty() is the version below pool_alloc() that never performs
the memory poisonning. It should only be called directly for very large
unstructured areas for which enabling memory poisonning would not bring
anything but could significantly hurt performance (e.g. buffers). Using
this function here will not provide any benefit and will hurt the ability
to debug.
It would be desirable to backport this, although it does not cause any
user-visible bug, it just complicates debugging.
We need to check whether listener is empty before doing anything; in
that case, we were trying to dump listerner name while name is null. So
simply move the counters check above, which validate all possible cases
when the listener is empty. This is very similar to what is done in
stats.c
see also the trace:
Thread 1 "haproxy" received signal SIGSEGV, Segmentation fault.
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120
120 ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory.
(gdb) bt
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120
#1 0x00005555555b716b in promex_dump_listener_metrics (htx=0x5555558fadf0, appctx=0x555555926070) at contrib/prometheus-exporter/service-prometheus.c:722
#2 promex_dump_metrics (htx=0x5555558fadf0, si=0x555555925920, appctx=0x555555926070) at contrib/prometheus-exporter/service-prometheus.c:1200
#3 promex_appctx_handle_io (appctx=0x555555926070) at contrib/prometheus-exporter/service-prometheus.c:1477
#4 0x00005555556f0c94 in task_run_applet (t=0x555555926180, context=0x555555926070, state=<optimized out>) at src/applet.c:88
#5 0x00005555556bc6d8 in run_tasks_from_lists (budgets=budgets@entry=0x7fffffffe374) at src/task.c:548
#6 0x00005555556bd1a0 in process_runnable_tasks () at src/task.c:750
#7 0x0000555555696cdd in run_poll_loop () at src/haproxy.c:2870
#8 0x0000555555697025 in run_thread_poll_loop (data=data@entry=0x0) at src/haproxy.c:3035
#9 0x0000555555596c90 in main (argc=<optimized out>, argv=0x7fffffffe818) at src/haproxy.c:3723
quit)
this bug was introduced by commit
e3f7bd5ae9 ("MEDIUM:
contrib/prometheus-exporter: add listen stats"), which is present for
2.4 only, so no backport needed.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
now that htx is the default everywhere, we can remove the need to put
htx as a mandatory option to setup prometheus.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
this was a missing piece for a while now even though it was planned. This
patch adds listen stats.
Nothing in particular but we make use of the status helper previously
added. `promex_st_metrics` diff also looks scary, but I had to realign
all lines.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
I saw some people falling back to unix socket to collect some data they
could not find in prometheus exporter. One of them is base info from
stick tables (used/size).
I do not plan to extend it more for now; keys are quite a mess to
handle.
This should resolve github issue #1008.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Remove remaining descrition which are common to stats.c.
This patch is a followup of commit
82b2ce2f96 ("MINOR:
contrib/prometheus-exporter: use stats desc when possible"). I probably
messed up with one of my rebase because I'm pretty sure I removed them
at some point, but who knows what happened.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
For some metrics, several lines are produced per entity, one per label
value. For instance, the health-check status (ST_F_CHECK_STATUS) or the
entity status (ST_F_STATUS). The dump may be stopped in the middle of the
labels processing if the output buffer is full. This means the next time, we
must take care to restart on the right label value.
For now, this part is buggy and we always restart to dump all the label
values again from the beginning. To be sure to restart at the right
position, the field <ctx.stats.st_code> in the applet context is used to
save the last position. Of course, we take care to reset this value when
necessary.
This fix is specific for 2.4. No backport needed.
Since the labels are dynamically created for each metric, the "code" label
of the ST_F_HRSP_1XX field is missing. To fix the bug, this metric is
handled in the same way the other ST_F_HRSP_* field are. We only take care
to dump the metric header only once.
This bug was introduced by the commit 5a2f93873 ("MEDIUM:
contrib/prometheus-exporter: Use dynamic labels instead of static ones"). No
backport needed.
Some metrics were missing (haproxy_process_uptime_seconds and
haproxy_process_build_info). To ease the review against the service output,
the same order is used in the README.
Now that we got ride of description in prometheus code, let's assume we
no longer need to maintain it in README, and diret user to the output of
prometheus to get more info.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
- align safe_idle_connections_current field
fix minor typo added in commit
37286a5ac595069a491c3e8a7a7e4faf3d9ea8ad ("MEDIUM:
contrib/prometheus-exporter: Rework matrices defining Promex metrics")
- reorder info fields to be able to compare them easily
- add missing ignored info fields as comment
Signed-off-by: William Dauchy <wdauchy@gmail.com>
unless I'm wrong, those includes are no longer needed. The only recent
one I remember is ssl-sock include since commit
5d9b8f3c9347a1a10b86f81d70b22c3cab0e6925 ("MINOR:
contrib/prometheus-exporter: use fill_info for process dump") where we
make use of the code from stats.c
Signed-off-by: William Dauchy <wdauchy@gmail.com>
this field was added in commit bd7151002437af1a034a9fdbb582b3cbef5a78d1
("MINOR: stats: report server's user-configured weight next to effective
weight")
Signed-off-by: William Dauchy <wdauchy@gmail.com>
It is a followup work of commit a191b77e54c26a97064cb42ab4927d4f5c95b896
("MINOR: contrib/prometheus-exporter: merge info description from
stats") but for all other stats fields; we however keep a way to
override them when needed (e.g. units, specific cases)
this is another step which will avoid duplicating work between stats.c
and prometheus.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Since we changed the behaviour of this metric, improve the description
to better explain what is the meaning of the new gauge value; it also
reflects the description we did for health check status.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
this patch is a breaking change between v2.3 and v2.4: we move from
using gauge value for health check states to labels values. The diff is
quite small thanks to the preparation work from Christopher to allow
more flexibility in labels, see commit
5a2f938732126f43bbf4cea5482c01552b0d0314 ("MEDIUM:
contrib/prometheus-exporter: Use dynamic labels instead of static ones")
this is a follow up of commit c6464591a365bfcf509b322bdaa4d608c9395d75
("MAJOR: contrib/prometheus-exporter: move ftd/bkd/srv states to
labels"). The main goal being to be better aligned with prometheus use
cases in terms of queries. More specifically to health checks, Pierre C.
mentioned the possible quirks he had to put in place in order to make
use of those metrics through prometheus:
<aggregator_function> by(proxy, check_status) (count_values by(proxy,
instance) ("check_status", haproxy_server_check_status))
I am perfectly aware this introduces a lot more metrics but I don't see
how we can improve the usability without it. The main issue remains in
the cardinality of the states which are > 20. Prometheus recommends to
stay below a cardinality of 10 for a given metric but I consider our
case very specific, because highly linked to the level of precision
haproxy exposes.
Even before this patch I saw several large production setup (a few
hundreds of MB in output) which are making use of the scope parameter to
simply ignore the server metrics, so that the scrapping can be faster,
and memory consumed on client side not too high. So I believe we should
eventually continue in that direction and offer more granularity of
filtering of the output. That being said it is already possible to
filter out the data on prometheus client side.
this is related to github issue #1029
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Instead of using static labels for metrics, we now use an array of labels,
filled for each metrics if necessary and passed to the dump function. This
way, it is easier to extend the promex service. For now, there are at most 8
labels per metrics. This limit may be raised by changing PROMEX_MAX_LABELS
value. And to ease labels addition, a label is defined as a key/value
pair. The formatting is handled by the dump function.
For the proxies and servers, the first entry of the array is always the
proxy name. In addition, for the servers, the second entry is always the
server name.
this patch is a breaking change between v2.3 and v2.4: we move from
using gauge value for frontend/backend/servers states to labels values.
the main motivation being I realised it is very difficult to make use of
it without hard coded quirks on prometheus client side; especially
because the main use is often to group by state, which is harder when
the state is the value of the metric.
in order to achieve that we iterate on the status metric to generate
labels, and so as many metrics.
this is the first step to resolve github issue #1029
A second step should address health check states.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
in preparation to change state gauge values as labels, declare them as
enum associated with the string definition
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The EOM block may be removed. The HTX_FL_EOM flags is enough. Most of time,
to know if the end of the message is reached, we just need to have an empty
HTX message with HTX_FL_EOM flag set. It may also be detected when the last
block of a message with HTX_FL_EOM flag is manipulated.
Removing EOM blocks simplifies the HTX message filling. Indeed, there is no
more edge problems when the message ends but there is no more space to write
the EOM block. However, some part are more tricky. Especially the
compression filter or the FCGI mux. The compression filter must finish the
compression on the last DATA block. Before it was performed on the EOM
block, an extra DATA block with the checksum was added. Now, we must detect
the last DATA block to be sure to finish the compression. The FCGI mux on
its part must be sure to reserve the space for the empty STDIN record on the
last DATA block while this record was inserted on the EOM block.
The H2 multiplexer is probably the part that benefits the most from this
change. Indeed, it is now fairly easier to known when to set the ES flag.
The HTX documentaion has been updated accordingly.
use `stats_fill_sv_stats` when possible to avoid duplicating code.
the following metrics have a change of behaviour:
haproxy_server_limit_sessions
haproxy_server_queue_limit
haproxy_server_check_failures_total
haproxy_server_check_up_down_total
haproxy_server_downtime_seconds_total
haproxy_server_current_throttle
haproxy_server_idle_connections_limit
depending on cases, if the limit was not configured or enabled, NaN is
returned instead. It should not be an issue for users, even better than
before as it provides more precise info.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
use `stats_fill_be_stats` when possible to avoid duplicating code; make
use of field selector to get the needed field only.
the only difference is on `haproxy_backend_downtime_seconds_total` as
stats.c is testing `px->srv`. This behaviour is present since commit
7344f47893 ("MINOR: stats: only report
backend's down time if it has servers"). The end result is a NaN instead
of a zero when no server are present.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Not necessarily mandatory but I saw a few prometheus client parsing only
`NaN`. Also most librarries do output `NaN`
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The global and stats matrices are replaced by a simpler ones. Now we have
only 2 arrays of prometheus metrics. Their flags are used to match on the
entity type. This simplify a bit the metrics definition. For now, labels and
descriptions are still outside of these arrays, because the labels must be
reworked to be more dynamic and the descrptions must be replaced by stats
ones as far as possible.
This structure will be used to define a Prometheus metric, i.e its name, its
type (gauge or counter) and its flags. The flags will be used to know for
which entities the metric is defined (global, frontend, backend and/or server).
PROMEX_FL_STATS_METRIC flag is splitted in 3 flags to easily identify the
processed entity type (frontend, backend or server). Thus, now we are using
PROMEX_FL_FRONT_METRIC, PROMEX_FL_BACK_METRIC or PROMEX_FL_SRV_METRIC. These
flags will be used to know if a metric is defined and must be exported for a
given entity type.
use `stats_fill_fe_stats` when possible to avoid duplicating code; make
use of field selector to get the needed field only.
this should not introduce any difference of output.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Now that units are coherent we can merge the info description from
haproxy stats.
Description were not always the same, but I guess we may eventually
improve them in the future.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Another patch in order to try to reconciliate haproxy stats and
prometheus. Here I'm adding a proper start time field in order to make
proper use of uptime field.
That being done we can move the calculation in `fill_info`
Signed-off-by: William Dauchy <wdauchy@gmail.com>
in order to prepare a possible merge of fields between haproxy stats and
prometheus, duplicate 3 fields:
INF_MEMMAX
INF_POOL_ALLOC
INF_POOL_USED
Those were specifically named in MB unit which is not what prometheus
recommends. We therefore used them but changed the unit while doing the
calculation. It created a specific case for that, up to the description.
This patch:
- removes some possible confusion, i.e. using MB field for bytes
- will permit an easier merge of fields such as description
First consequence for now, is that we can remove the calculation on
prometheus side and move it on `fill_info`.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
With a 2.6.8 wireshark, this module could not compile because of ws_version.h
missing header. This patch offers the possibility to compile this plugin without
having to include this header. Furthermore with my wireshark version a
"plugin_release" object is required to make it be loaded by wireshark. This is
a string which seems to have to match a dotted string made of you wireshark
major and minor versions.
it does not seem to have a reason to close connections after each
request; reflect that in tests by doing all requests within the same
client.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
commit c55a626217 ("MINOR: contrib/prometheus-exporter: Add
missing global and per-server metrics") is renaming two metrics between
v2.2 and v2.3:
server_idle_connections_current
server_idle_connections_limit
It is breaking some tools which are making use of those metrics while
supporting several haproxy versions. This build_info will permit tools
which make use of metrics to be able to match the haproxy version and
change the list of expected metrics. This was possible using the haproxy
stats socket but not with prometheus export.
This patch follows prometheus best pratices to export specific software
informations. It is adding a new field `build_info` so we can extend it
to other parameters if needed in the future.
example output:
# HELP haproxy_process_build_info HAProxy build info.
# TYPE haproxy_process_build_info gauge
haproxy_process_build_info{version="2.4-dev5-2e1a3f-5"} 1
Even though it is not a bugfix, this patch will make more sense when
backported up to >= 2.0
Signed-off-by: William Dauchy <wdauchy@gmail.com>
MSG_NOSIGNAL and MSG_MORE are not defined everywhere, let's make them
zero when not defined. It will roughly result in the same behavior,
albeit a bit less optimal, which is no big deal when debugging. This
should fix issue #1014.