The server response time is erroneously reported as -1 when the response
is intercepted by HAProxy.
As stated in the documentation, the server response time is reported as -1
when the last response header was never seen. It happens when a server
timeout is triggered before the server managed to process the request. It
also happens if the response is invalid. This may be reported by the mux
during the response parsing, but also by the HTTP analyzers. However, in
this last case, the response time must only be reported as -1 on 502.
This patch must be backported to all stable versions. It should fix the
issue #2384.
When the request is too large to fit in a buffer a 414 or a 431 error
message is returned depending on the error state of the request parser. A
414 is returned if the URI is too long, otherwise a 431 is returned.
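The selection rule above can be sketched as follows. This is only an illustration: the enum and the function name are hypothetical and do not reflect the flags actually used by the request parser.

```c
/* Hypothetical error states for a request that does not fit in a buffer.
 * These names are assumptions made for this sketch.
 */
enum req_parse_err {
    REQ_ERR_URI_TOO_LONG,    /* the URI alone is too long */
    REQ_ERR_HDRS_TOO_LARGE   /* the header fields are too large */
};

/* Returns the HTTP status to send back: 414 when the URI is too long,
 * 431 in every other "too large" case.
 */
int too_large_status(enum req_parse_err err)
{
    return (err == REQ_ERR_URI_TOO_LONG) ? 414 : 431;
}
```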
This patch should fix the issue #1309.
414-Uri-Too-Long and 431-Request-Header-Fields-Too-Large are now part of
the supported status codes that can be defined as error files. The hash
table defined in http_get_status_idx() was updated accordingly.
The "show sess" command now supports a list of options that can be set after
all other possible arguments (<id>, all...). For now, "show-uri" is the only
supported option. With this option, the captured URI, if non-null, is added
to the dump of a stream, complete or not. The URI may be anonymized if
necessary.
This patch should fix the issue #663.
Historically, an agent-check program is only able to set a weight
proportional to the server's initial weight. However, it could be handy to
also set an absolute value. It is the purpose of this patch.
Instead of changing the current way to set a server's weight, a new
agent-check command is introduced. The string "weight:", followed by a
positive integer or a positive integer percentage, can now be used. If the
value ends with the '%' sign, then the new weight will be proportional to
the initial weight of the server. Otherwise, the value is considered as an
absolute weight and must be between 0 and 256.
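As an illustration of these parsing rules, here is a minimal sketch. The function name, the -1 error convention and the clamping of proportional results to 256 are assumptions made for this example, not the actual agent-check code. It also assumes the initial weight lies between 0 and 256.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WEIGHT 256   /* upper bound stated above for absolute weights */

/* Parses an agent-check word such as "weight:50%" or "weight:128".
 * <iweight> is the server's initial weight (assumed to be in 0..256).
 * Returns the new weight, or -1 if the word is not a valid command.
 */
int parse_agent_weight(const char *word, int iweight)
{
    static const char prefix[] = "weight:";
    char *end;
    long val;

    if (strncmp(word, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    word += sizeof(prefix) - 1;
    if (*word < '0' || *word > '9')
        return -1;                      /* empty value, sign or space */

    val = strtol(word, &end, 10);
    if (end[0] == '%' && end[1] == '\0') {
        /* proportional to the initial weight; capping the percentage
         * first keeps the multiplication below from overflowing
         */
        if (val > 100L * MAX_WEIGHT)
            val = 100L * MAX_WEIGHT;
        val = val * iweight / 100;
        return val > MAX_WEIGHT ? MAX_WEIGHT : (int)val;
    }
    if (end[0] != '\0' || val > MAX_WEIGHT)
        return -1;                      /* trailing garbage or out of range */
    return (int)val;
}
```

With an initial weight of 100, "weight:50%" yields 50 while "weight:128" yields 128, whatever the initial weight is.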
This patch should fix the issue #360.
It is now possible to use a log-format string to define the "Set-Cookie"
header value of a response generated by a redirect rule. There is no special
check on the resulting format, and no such check is possible during the
configuration parsing. It is probably not a big deal because the already
existing "set-cookie" and "clear-cookie" options don't perform any check.
Here is an example:
http-request redirect location https://someurl.com/ set-cookie haproxy="%[var(txn.var)]"
This patch should fix the issue #1784.
On prefix-based redirects, there is an option to drop the query-string of
the location. Here it is the opposite: an option is added to preserve the
query-string of the original URI for a location-based redirect.
By setting the "keep-query" option, for a location-based redirect only, the
query-string of the original URI is appended to the location. If there is no
query-string, nothing is added (no empty '?'). If there is already a
non-empty query-string on the location, the original one is appended with
a '&' separator.
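The appending rules can be sketched as below. The function name and its signature are hypothetical. The handling of a location that already ends with a bare '?' is an assumption of this sketch, since the text above does not cover that case.

```c
#include <stdio.h>
#include <string.h>

/* Builds the redirect location into <out> (of <size> bytes).
 * <query> is the query-string of the original URI without its leading
 * '?', or NULL/"" when there is none.
 * Returns 0 on success, -1 if the result does not fit.
 */
int build_location(char *out, size_t size, const char *location,
                   const char *query)
{
    const char *sep = "";
    const char *qmark;
    int len;

    if (!query || !*query) {
        query = "";                     /* nothing to add, no empty '?' */
    } else {
        qmark = strchr(location, '?');
        if (!qmark)
            sep = "?";                  /* location has no query-string */
        else if (qmark[1] != '\0')
            sep = "&";                  /* non-empty query-string present */
        /* else: location ends with '?', append directly (assumption) */
    }

    len = snprintf(out, size, "%s%s%s", location, sep, query);
    return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```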
This patch should fix issue #2728.
Before this patch HAPROXY_BRANCH was unset just after configuration parsing.
Let's keep it, as it could be used in conditional blocks and some
configuration directives and it's handy to check its runtime value via "show
env".
In master-worker mode, this variable is set to the same value for both
processes.
Before this patch, we needed to put the master CLI in debug mode to be able
to issue the 'show env' command for the master process. The output of this
command is handy even in the master process context, as it allows inspecting
its environment variables, which could be used/modified in the 'global'
section.
So, let's set the ACCESS_MASTER level in the 'show env' command structure.
This makes the command visible and usable in the master CLI without putting
it in debug mode.
As the master now parses the expose-deprecated-directives option, let's emit
the warning about the deprecated 'program' section only if this option
wasn't set in the 'global' section. This allows people who prefer not to
remove the 'program' section immediately to keep starting the process in
zero-warning mode.
Adjust the warning message and the mcli_start_progs.vtc test accordingly. As
the expose-deprecated-directives option is a 'global' section keyword, this
section must always precede any 'program' section for users who still keep
one.
This doesn't need to be backported, as it is related to the latest changes
in the master-worker architecture.
The 'program' section is now considered deprecated, see commit 581c8a27d9
("MEDIUM: mworker: depreciate the 'program' section"). Since this commit,
the 'program' section parser emits a warning every time such a section is
present. This makes it impossible to launch the process in zero-warning mode.
After the master-worker refactoring, only the master process parses the
'program' section. So, at first, in order to be able to start in zero-warning
mode, the master process needs to parse the option which allows deprecated
keywords. Thus, let's set in this commit the KWF_DISCOVERY flag on the
cfg_parse_global_non_std_directives parser, which parses the
'expose-deprecated-directives' and 'expose-experimental-directives' options.
The ring size used to take only numbers and silently ignore letters (due
to atol()), resulting in tiny buffers when trying to collect traces and
using e.g. "size 10g". Let's make use of parse_size_err() to properly
parse units.
parse_size_err() is currently a function working only on a uint. It's
not convenient for certain elements such as rings on large machines.
This commit addresses this by having one function for uints and one
for ullong, and making parse_size_err() a macro that automatically
calls one or the other. It also has the benefit of automatically
supporting compatible types (long, size_t etc).
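To illustrate the idea of one parser per width behind a single dispatching macro, here is a minimal sketch. All names (size_to_ullong, size_to_uint, size_to_any) are hypothetical, and the dispatch relies on C11 _Generic, which is not necessarily the technique used by parse_size_err() and only covers the two listed types here. The parser is a simplified stand-in that accepts only the k, m and g suffixes. Unlike atol(), which stops at the first letter and returns 10 for "10g", it reports what it cannot parse.

```c
#include <limits.h>
#include <stddef.h>

/* Parses digits followed by an optional k/m/g suffix (powers of 1024).
 * Returns NULL on success, or a pointer to the offending character.
 */
static const char *size_to_ullong(const char *text, unsigned long long *ret)
{
    unsigned long long value = 0;
    int shift = 0;

    if (*text < '0' || *text > '9')
        return text;
    while (*text >= '0' && *text <= '9') {
        unsigned int digit = *text - '0';

        if (value > (ULLONG_MAX - digit) / 10)
            return text;                /* overflow */
        value = value * 10 + digit;
        text++;
    }
    switch (*text) {
    case '\0': break;
    case 'k': case 'K': shift = 10; text++; break;
    case 'm': case 'M': shift = 20; text++; break;
    case 'g': case 'G': shift = 30; text++; break;
    default: return text;               /* unknown unit */
    }
    if (*text)
        return text;                    /* garbage after the unit */
    if (shift && value > (ULLONG_MAX >> shift))
        return text - 1;                /* overflow once scaled */
    *ret = value << shift;
    return NULL;
}

/* Same as above for a uint, rejecting values that do not fit. */
static const char *size_to_uint(const char *text, unsigned int *ret)
{
    unsigned long long value;
    const char *err = size_to_ullong(text, &value);

    if (err)
        return err;
    if (value > UINT_MAX)
        return text;
    *ret = (unsigned int)value;
    return NULL;
}

/* Picks the right function from the type of the destination pointer. */
#define size_to_any(text, ret) _Generic((ret),      \
        unsigned int *:       size_to_uint,         \
        unsigned long long *: size_to_ullong)((text), (ret))
```

With this sketch, "10g" parses into an unsigned long long but is rejected for an unsigned int, instead of silently becoming 10.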
From time to time we face a configuration with very small timeouts which
look accidental because there could be expectations that they're expressed
in seconds and not milliseconds.
This commit adds a check for non-zero unitless values smaller than 100
and emits a warning suggesting to append an explicit unit if that was
the intent.
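The check can be sketched as below. The function name is hypothetical and the sketch works on the raw configuration word, which is an assumption: the actual check is performed inside the various timeout parsers.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Returns true when <text> is a bare number between 1 and 99 with no
 * unit suffix, i.e. a value that was possibly meant to be in seconds.
 */
bool timeout_looks_accidental(const char *text)
{
    char *end;
    unsigned long value;

    if (*text < '0' || *text > '9')
        return false;
    value = strtoul(text, &end, 10);
    if (*end != '\0')
        return false;           /* explicit unit (or garbage) follows */
    return value != 0 && value < 100;
}
```

With this rule, "10" would trigger the warning while "10s", "10ms", "0" and "100" would not.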
Only the common timeouts, the server check intervals and the resolvers
hold and timeout values were covered for now. All the code needs to be
manually reviewed to verify if it supports emitting warnings.
This may break some configs using "zero-warning", but greps in existing
configs indicate that these are extremely rare and solely intentionally
done during tests. At least even if a user leaves that after a test, it
will be more obvious when reading 10ms that something's probably not
correct.
Now that warnings were almost all removed, let's enable zero-warning
via -dW. All tests were adjusted, but two:
- mcli/mcli_start_progs.vtc:
the programs section currently cannot be silenced
- stats/stats-file.vtc:
the warning comes from the stats file itself on comment lines.
All other ones are now OK.
No less than 30 tests were missing timeouts, preventing them from being
started with zero-warning. Since they were not supposed to trigger, they
have been set to 30s so as never to trigger, and now they do not produce
any warning anymore.
This reg-test uses req.len in an HTTP backend. It does work but emits a
warning suggesting that this is ignored, so most likely its days are
numbered now. Let's just use req.hdrs,length instead.
The following rules are triggering warnings about content-type being
ignored:
http-request return content-type "text/plain" if { path /def-4 }
http-request return content-type "text/plain" file /dev/null hdr "x-custom-hdr" "%[url]" if { path /empty-file }
Annoyingly, the content-type is mandatory when the file is not empty.
That might be something to revisit in the future to relax at least one
of the rules so that the config doesn't strictly require knowing the
file contents upfront.
Two tests were using "timeout {client,server} 1" to forcefully trigger
them, but a forthcoming patch will emit a warning for such small unitless
values, so let's be explicit about the unit.
The regtest "h1or2_to_h1c" contains both an allow and a deny at the end,
likely to help catch rare bugs. But this triggers a warning that we can
silence by placing a condition on the penultimate rule.
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "4k". Let's make use of parse_size_err() on it so that
units are supported. This requires to turn it to uint as well, which
was verified to be OK.
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, preventing from starting when set
e.g. to "64k". Let's make use of parse_size_err() on it so that units are
supported. This requires to turn it to uint as well, and to explicitly
limit its range to INT_MAX - 2*sizeof(void*), which was previously
partially handled as part of the sign check.
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on it so that
units are supported. This requires to turn it to uint as well, and
since it's sometimes compared to an int, we limit its range to
0..INT_MAX.
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on it so that
units are supported. This requires to turn it to uint as well, which
was verified to be OK.
Till now these values were parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on them so that
units are supported. This requires to turn them to uint as well, which
is OK.
Till now these values were parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on them so that
units are supported. This requires to turn them to uint as well, which
is OK.
Sometimes the conditions used to detect an anomaly are not as easy to
define as just an error or a success. One example use case would be to
monitor the transfer time and set a threshold.
An idea suggested by Tristan would be to permit the "when" converter to
refer to a more variable or dynamic condition.
Here we make this possible by making "when" rely on a named ACL. The
ACL then needs to be specified in either the proxy or the defaults
section. Since it is evaluated inline, it may even refer to information
available at the end (at log time) such as the data transfer time. If
the ACL evaluates to true, the converter passes the data.
Example: log "dbg={-}" when fine, or "dbg={... debug info ...}" on slow
transfers:
acl slow_xfer res.timer.data ge 10000 # more than 10s is slow
log-format "$HAPROXY_HTTP_LOG_FMT \
fsdbg={%[fs.debug_str,when(acl,slow_xfer)]} \
bsdbg={%[bs.debug_str,when(acl,slow_xfer)]}"
Released version 3.1-dev13 with the following main changes :
- MEDIUM: mworker: depreciate the 'program' section
- BUILD: ot: use a cebtree instead of a list for variable names
- MINOR: startup: replace HAPROXY_LOAD_SUCCESS with global load_status
- BUG/MINOR: startup: set HAPROXY_CFGFILES in read_cfg
- BUG/MINOR: cli: don't show sockpairs in HAPROXY_CLI and HAPROXY_MASTER_CLI
- BUG/MEDIUM: stconn: Don't forward shut for SC in connecting state
- BUG/MEDIUM: resolvers: Insert a non-executed resulution in front of the wait list
- MINOR: debug: explicitly permit the counter condition to be empty
- MINOR: debug: add a new counter type for glitches
- MINOR: mux-h2: count glitches when they're reported
- BUG/MINOR: deinit: release uri_auth admin rules
- MINOR: uri_auth: add stats_uri_auth_free helper
- MEDIUM: uri_auth: implement clean uri_auth cleaning
- MINOR: mux-quic/h3: count glitches when they're reported
- BUG/MEDIUM: mux-h2: Don't send RST_STREAM frame for streams with no ID
- BUG/MINOR: Don't report early srv aborts on request forwarding in DONE state
- MINOR: promex: Expose the global node and description in process metrics
- MINOR: promex: Add global and proxies description as labels to all metrics
- OPTIM: pattern: only apply LRU cache for large enough lists
- BUG/MEDIUM: checks: make sure to always apply offsets to now_ms in expiration
- BUG/MINOR: debug: do not set task expiration to TICK_ETERNITY
- BUG/MEDIUM: mailers: make sure to always apply offsets to now_ms in expiration
- BUG/MINOR: mux_quic: make sure to always apply offsets to now_ms in expiration
- BUG/MINOR: peers: make sure to always apply offsets to now_ms in expiration
- BUG/MEDIUM: clock: make sure now_ms cannot be TICK_ETERNITY
- MINOR: debug/cli: replace "debug dev counters" with "debug counters"
- DOC: config: add tune.h2.{be,fe}.rxbuf to the global keywords index
- MINOR: chunk: add a BUG_ON upon the next init_trash_buffer()
The trash pool is initialized twice in haproxy, first during STG_POOL,
and a second time after configuration parsing.
Doing alloc_trash_chunk() between these 2 phases can lead to strange
things if the chunk is used afterwards: the pool gets destroyed, so
trying to do a free_trash_chunk() or accessing the pointer will lead to
crashes.
This patch checks that no buffer from the trash pool is still in use
before initializing the pool again.
"debug dev" commands are not meant to be used by end-users, and are
purposely not documented. Yet due to their usefulness in troubleshooting
sessions, users are increasingly invited by developers to use some of
them.
"debug dev counters" is one of them. Better move it to "debug counters"
and document it so that users can check them even if the output can look
cryptic at times. This, combined with DEBUG_GLITCHES, can be convenient
to observe suspicious activity. The doc however specifies that the format
may change between versions and that new entries/types might appear
within a stable branch.
In clock ticks, 0 is TICK_ETERNITY. Long ago we used to make sure now_ms
couldn't be zero so that it could be assigned to expiration timers, but
it has long changed after functions like tick_add() were instrumented to
make the check. The problem is that aside from the rare few accidental direct
assignments to expiration dates, it's also used to mark the beginning of
an event that's later checked against TICK_ETERNITY to know if it has
already struck. The problem in this case is that certain events may just
be replaced or dropped just because they apparently never appeared. It's
probably the case for stconn's "lra" and "fsb" fields, just like it is
for all those involving tick_add_ifset(), like h2c->idle_start.
The right approach would be to change the type of now_ms to something
else that cannot take direct computations and that represents a timestamp,
forcing to always use the conversion functions. The variables holding such
timestamps would also be distinguished from intervals. At first glance we
could have for timestamps:
- 0 = never happened (for the past), eternity (for the future)
- X = date
and for intervals:
- 0 = not set
- X = interval
However this requires significant changes. Instead for now, let's just
make sure again that now_ms is never 0 by setting it to 1 when this
happens (1 / 4 billion times, or 1ms every 49.7 days).
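The workaround can be sketched as below. The function name and the use of a 64-bit millisecond source are assumptions made for this example and do not reflect the actual clock code.

```c
#include <stdint.h>

#define TICK_ETERNITY 0   /* as described above: 0 means "eternity" */

/* Hypothetical helper: truncates a 64-bit millisecond clock to a 32-bit
 * tick and skips the reserved value by returning 1 instead of 0. The
 * 32-bit value wraps every 2^32 ms, i.e. about every 49.7 days.
 */
uint32_t ms_to_tick(uint64_t now_ms64)
{
    uint32_t tick = (uint32_t)now_ms64;

    if (tick == TICK_ETERNITY)
        tick = 1;
    return tick;
}
```

The cost of this approach is that the clock appears to stay at the same value for one extra millisecond at each wrap.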
This will need to be carefully backported to older versions. Note that
with this patch backported, the previous ones fixing the zero date are
not strictly needed.
Now_ms can be zero nowadays, so it's not suitable for direct assignment to
t->expire, as there's a risk that the timer never wakes up once assigned
(TICK_ETERNITY). Let's use tick_add(now_ms, 0) for an immediate wakeup
instead. The impact here might be a reconnect programmed upon signal
receipt at the wrapping date not having a working timeout.
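To show why tick_add(now_ms, 0) is safe where a direct assignment is not, here is a simplified stand-in for the function. It is a sketch written for this explanation, with a hypothetical name, and not a copy of the real implementation.

```c
#include <stdint.h>

#define TICK_ETERNITY 0   /* 0 means "no expiration" */

/* Adds <timeout> to <now> and never returns TICK_ETERNITY: when the sum
 * lands on the reserved value, it is pushed one millisecond later.
 */
uint32_t tick_add_sketch(uint32_t now, uint32_t timeout)
{
    now += timeout;
    if (now == TICK_ETERNITY)
        now++;
    return now;
}
```

Assigning now_ms directly would store 0 at the wrapping date and the timer would never fire, while the helper above returns 1 in that case.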
This should be backported where it applies.
Now_ms can be zero nowadays, so it's not suitable for direct assignment to
t->expire, as there's a risk that the timer never wakes up once assigned
(TICK_ETERNITY). Let's use tick_add(now_ms, 0) for an immediate wakeup
instead. The impact looks null since the task is also woken up, but better
not leave such tasks in the timer tree anyway.
This should be backported where it applies.
Now_ms can be zero nowadays, so it's not suitable for direct assignment to
t->expire, as there's a risk that the timer never wakes up once assigned
(TICK_ETERNITY). Let's use tick_add(now_ms, 0) for an immediate wakeup
instead. The impact here might be mailers suddenly stopping.
This should be backported where it applies.
Using "debug task", it's possible to change a task's expiration, but
we must be careful not to set it to TICK_ETERNITY. Let's use tick_add()
instead. The risk is basically null since it's a debugging command, so
no backport is needed.
Now_ms can be zero nowadays, so it's not suitable for direct assignment to
t->expire, as there's a risk that the timer never wakes up once assigned
(TICK_ETERNITY). Let's use tick_add(now_ms, 0) for an immediate wakeup
instead. The impact here might be health checks suddenly stopping.
This should be backported where it applies.
As shown in issue #1518, the LRU cache has a non-null cost that can
sometimes be above the match cost it's trying to avoid. After a number
of tests, it appears that:
- "simple" match operations (sub, beg, end, int etc) reach a break-even
after ~20 patterns in list
- "heavy" match operations (reg) reach a break-even after ~5 patterns in
list
Let's only consult the LRU cache when the number of patterns in the
expression is at least as large as this limit. Of course there will
always be outliers, but this is already a good start.
Another improvement consists in reducing the cache size to further
speed up lookups, which makes sense if fewer expressions use the cache.
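The decision described above can be sketched as follows. The enum, the constant names and the function are hypothetical. Only the two thresholds (about 20 and about 5 patterns) come from the measurements reported above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of match operations by cost. */
enum match_cost {
    MATCH_SIMPLE,   /* sub, beg, end, int, ... */
    MATCH_HEAVY     /* reg */
};

#define LRU_MIN_PATTERNS_SIMPLE 20   /* break-even for simple matches */
#define LRU_MIN_PATTERNS_HEAVY   5   /* break-even for heavy matches */

/* Returns true when the list is large enough for the LRU cache lookup
 * to be expected to cost less than walking the patterns.
 */
bool should_use_lru(enum match_cost cost, size_t nb_patterns)
{
    size_t limit = (cost == MATCH_HEAVY) ? LRU_MIN_PATTERNS_HEAVY
                                         : LRU_MIN_PATTERNS_SIMPLE;

    return nb_patterns >= limit;
}
```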
While the global description is exposed, when defined, in a dedicated
metric, it is not possible to dump the description defined in
frontend/listen/backend sections. So, thanks to this patch, it is now
possible to dump it as a label of all metrics of the corresponding
section. To do so, the "desc-labels" parameter must be provided on the URL:
/metrics?desc-labels
When this parameter is set, if a description is provided in a section,
including the global one, the "desc" label will be added to all metrics of
this section. For instance:
haproxy_frontend_current_sessions{proxy="front-http",desc="..."} 1
Note that server metrics inherit the description of their backend/listen
section.
This patch should solve the issue #1531.
The global node value is now exposed via the "haproxy_process_node" metric.
The metric value is always set to 1 and the node name itself is the "node"
label. The same is performed for the global description, but only if it is
defined. In that case the "haproxy_process_description" metric is defined,
with 1 as value and the description itself set in the "desc" label.