In connect_server(), when creating a new connection for which we don't yet
know the mux (because it'll be decided by the ALPN), instead of associating
the connection to the stream_interface, always create a conn_stream. This way,
we have less special-casing needed. Store the conn_stream in conn->ctx,
so that we can reach the upper layers if needed.
"set ssl cert <filename> <payload>" CLI command should have the same
result as reload HAproxy with the updated pem file (<filename>).
Is not the case, DHparams/cert-chain is kept from the previous
context if no DHparams/cert-chain is set in the context (<payload>).
This patch should be backport to 2.1
For complex stick tables with many entries/columns, it can be beneficial
to filter using multiple criteria. The maximum number of filter entries
can be controlled by defining STKTABLE_FILTER_LEN during build time.
This patch can be backported to older releases.
while working on issue #429, I encountered build failures with various
non-released openssl versions, let us improve ssl defines, switch to
features, not versions, for EVP_CTRL_AEAD_SET_IVLEN and
EVP_CTRL_AEAD_SET_TAG.
No backport is needed as there is no valid reason to build a stable haproxy
version against a development version of openssl.
It was inherited from acl_time, introduced in 1.3.10 by commit a84d374367
("[MAJOR] new framework for generic ACL support") and was never ever used.
Let's simply drop it now.
Signed bitfields of size `1` hold the values `0` and `-1`, but are
usually assigned `1`, possibly leading to subtle bugs when the value
is explicitely compared against `1`.
Most DNS servers provide A/AAAA records in the Additional section of a
response, which correspond to the SRV records from the Answer section:
;; QUESTION SECTION:
;_http._tcp.be1.domain.tld. IN SRV
;; ANSWER SECTION:
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A1.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A8.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A5.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A6.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A4.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A3.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A2.domain.tld.
_http._tcp.be1.domain.tld. 3600 IN SRV 5 500 80 A7.domain.tld.
;; ADDITIONAL SECTION:
A1.domain.tld. 3600 IN A 192.168.0.1
A8.domain.tld. 3600 IN A 192.168.0.8
A5.domain.tld. 3600 IN A 192.168.0.5
A6.domain.tld. 3600 IN A 192.168.0.6
A4.domain.tld. 3600 IN A 192.168.0.4
A3.domain.tld. 3600 IN A 192.168.0.3
A2.domain.tld. 3600 IN A 192.168.0.2
A7.domain.tld. 3600 IN A 192.168.0.7
SRV record support was introduced in HAProxy 1.8 and the first design
did not take into account the records from the Additional section.
Instead, a new resolution is associated to each server with its relevant
FQDN.
This behavior generates a lot of DNS requests (1 SRV + 1 per server
associated).
This patch aims at fixing this by:
- when a DNS response is validated, we associate A/AAAA records to
relevant SRV ones
- set a flag on associated servers to prevent them from running a DNS
resolution for said FADN
- update server IP address with information found in the Additional
section
If no relevant record can be found in the Additional section, then
HAProxy will failback to running a dedicated resolution for this server,
as it used to do.
This behavior is the one described in RFC 2782.
It is now possible to insert any attribute when a cookie is inserted by
HAProxy. Any value may be set, no check is performed except the syntax validity
(CTRL chars and ';' are forbidden). For instance, it may be used to add the
SameSite attribute:
cookie SRV insert attr "SameSite=Strict"
The attr option may be repeated to add several attributes.
This patch should fix the issue #361.
It is now possible to set the error message to use when a deny rule is
executed. It may be a specific error file, adding "errorfile <file>" :
http-request deny deny_status 400 errorfile /etc/haproxy/errorfiles/400badreq.http
It may also be an error file from an http-errors section, adding "errorfiles
<name>" :
http-request deny errorfiles my-errors # use 403 error from "my-errors" section
When defined, this error message is set in the HTTP transaction. The tarpit rule
is also concerned by this change.
It is now possible to set the error message to return to client in the HTTP
transaction. If it is defined, this error message is used instead of proxy's
errors or default errors.
It is now possible to import in a proxy, fully or partially, error files
declared in an http-errors section. It may be done using the "errorfiles"
directive, followed by a name and optionally a list of status code. If there is
no status code specified, all error files of the http-errors section are
imported. Otherwise, only error files associated to the listed status code are
imported. For instance :
http-errors my-errors
errorfile 400 ...
errorfile 403 ...
errorfile 404 ...
frontend frt
errorfiles my-errors 403 404 # ==> error 400 not imported
A new section may now be declared in the configuration to create global groups
of HTTP errors. These groups are not linked to a proxy and are referenced by
name. The section must be declared using the keyword "http-errors" followed by
the group name. This name must be unique. A list of "errorfile" directives may
be declared in such section. For instance:
http-errors website-1
errorfile 400 /path/to/site1/400.http
errorfile 404 /path/to/site1/404.http
http-errors website-2
errorfile 400 /path/to/site2/400.http
errorfile 404 /path/to/site2/404.http
For now, it is just possible to create "http-errors" sections. There is no
documentation because these groups are not used yet.
All custom HTTP errors are now stored in a global tree. Proxies use a references
on these messages. The key used for errorfile directives is the file name as
specified in the configuration. For errorloc directives, a key is created using
the redirect code and the url. This means that the same custom error message is
now stored only once. It may be used in several proxies or for several status
code, it is only parsed and stored once.
http_parse_errorloc() may now be used to create an HTTP 302 or 303 redirect
message with a specific url passed as parameter. A parameter is used to known if
it is a 302 or a 303 redirect. A status code is passed as parameter. It must be
one of the supported HTTP error codes to be valid. Otherwise an error is
returned. It aims to be used to parse "errorloc" directives. It relies on
http_load_errormsg() to do most of the job, ie converting it in HTX.
http_parse_errorfile() may now be used to parse a raw HTTP message from a
file. A status code is passed as parameter. It must be one of the supported HTTP
error codes to be valid. Otherwise an error is returned. It aims to be used to
parse "errorfile" directives. It relies on http_load_errorfile() to do most of
the job, ie reading the file content and converting it in HTX.
Now, this action is use its own dedicated function and is no longer handled "in
place" during the TCP rules evaluation. Thus the action name ACT_TCP_CAPTURE is
removed. The action type is set to ACT_CUSTOM and a check function is used to
know if the rule depends on request contents while there is no inspect-delay.
Now, these actions use their own dedicated function and are no longer handled
"in place" during the TCP/HTTP rules evaluation. Thus the action names
ACT_ACTION_TRK_SC0 and ACT_ACTION_TRK_SCMAX are removed. The action type is now
the tracking index. Thus the function trk_idx() is no longer needed.
Now, the early-hint action uses its own dedicated action and is no longer
handled "in place" during the HTTP rules evaluation. Thus the action name
ACT_HTTP_EARLY_HINT is removed. In additionn, http_add_early_hint_header() and
http_reply_103_early_hints() are also removed. This part is now handled in the
new action_ptr callback function.
Now, these actions use their own dedicated function and are no longer handled
"in place" during the HTTP rules evaluation. Thus the action names
ACT_HTTP_*_ACL and ACT_HTTP_*_MAP are removed. The action type is now mapped as
following: 0 = add-acl, 1 = set-map, 2 = del-acl and 3 = del-map.
Now, these actions use their own dedicated function and are no longer handled
"in place" during the HTTP rules evaluation. Thus the action names
ACT_HTTP_SET_HDR and ACT_HTTP_ADD_VAL are removed. The action type is now set to
0 to set a header (so remove existing ones if any and add a new one) or to 1 to
add a header (add without remove).
Now, these actions use their own dedicated function and are no longer handled
"in place" during the HTTP rules evaluation. Thus the action names
ACT_HTTP_REPLACE_HDR and ACT_HTTP_REPLACE_VAL are removed. The action type is
now set to 0 to evaluate the whole header or to 1 to evaluate every
comma-delimited values.
The function http_transform_header_str() is renamed to http_replace_hdrs() to be
more explicit and the function http_transform_header() is removed. In fact, this
last one is now more or less the new action function.
The lua code has been updated accordingly to use http_replace_hdrs().
<action> field in the act_rule structure is now an integer. The act_name values
are used for all actions without action function (but it is not a pre-requisit
though) or the action will have no effect. But for all other actions, any
integer value may used, only the action function will take care of it. The
default for such actions is ACT_CUSTOM.
Some flags can now be set on an action when it is registered. The flags are
defined in the act_flag enum. For now, only ACT_FLAG_FINAL may be set on an
action to specify if it stops the rules evaluation. It is set on
ACT_ACTION_ALLOW, ACT_ACTION_DENY, ACT_HTTP_REQ_TARPIT, ACT_HTTP_REQ_AUTH,
ACT_HTTP_REDIR and ACT_TCP_CLOSE actions. But, when required, it may also be set
on custom actions.
Consequently, this flag is checked instead of the action type during the
configuration parsing to trigger a warning when a rule inhibits all the
following ones.
The flags in the act_flag enum have been renamed act_opt. It means ACT_OPT
prefix is used instead of ACT_FLAG. The purpose of this patch is to reserve the
action flags for the actions configuration.
When TCP and HTTP rules are evaluated, if an action function (action_ptr field
in the act_rule structure) is defined for a given action, it is now always
called in priority over the test on the action type. Concretly, for now, only
custom actions define it. Thus there is no change. It just let us the choice to
extend the action type beyond the existing ones in the enum.
Info used by HTTP rules manipulating the message itself are splitted in several
structures in the arg union. But it is possible to group all of them in a unique
struct. Now, <arg.http> is used by most of these rules, which contains:
* <arg.http.i> : an integer used as status code, nice/tos/mark/loglevel or
action id.
* <arg.http.str> : an IST used as header name, reason string or auth realm.
* <arg.http.fmt> : a log-format compatible expression
* <arg.http.re> : a regular expression used by replace rules
Arguments used by actions are never released during HAProxy deinit. Now, it is
possible to specify a function to do so. ".release_ptr" field in the act_rule
structure may be set during the configuration parsing to a specific deinit
function depending on the action type.
In HTTP rules, error handling during a rewrite is now handle the same way for
all rules. First, allocation errors are reported as internal errors. Then, if
soft rewrites are allowed, rewrite errors are ignored and only the
failed_rewrites counter is incremented. Otherwise, when strict rewrites are
mandatory, interanl errors are returned.
For now, only soft rewrites are supported. Note also that the warning sent to
notify a rewrite failure was removed. It will be useless once the strict
rewrites will be possible.
the HTTP_MSGF_SOFT_RW flag must now be set on the HTTP transaction to ignore
rewrite errors on a message, from HTTP rules. The mode is called the soft
rewrites. If thes flag is not set, strict rewrites are performed. In this mode,
if a rewrite error occurred, an internal error is reported.
For now, HTTP_MSGF_SOFT_RW is always set and there is no way to switch a
transaction in strict mode.
The failed_secu counter is only used for the servers stats. It is used to report
the number of denied responses. On proxies, the same info is stored in the
denied_resp counter. So, it is more consistent to use the same field for
servers.
The stats field ST_F_EINT has been added to report internal errors encountered
per proxy, per listener and per server. It appears in the CLI export and on the
HTML stats page.
When HTTP/TCP rules are evaluated, especially HTTP ones, some results are
possible for normal actions and not for custom ones. So missing return codes
(ACT_RET_) have been added to let custom actions act as normal ones. Concretely
following codes have been added:
* ACT_RET_DENY : deny the request/response. It must be handled by the caller
* ACT_RET_ABRT : abort the request/response, handled by action itsleft.
* ACT_RET_INV : invalid request/response
Now, when HTTP rules are evaluated, HTTP_RULE_RES_ERROR must be returned when an
internal error is catched. It is a way to make the difference between a bad
request or a bad response and an error during its processing.
This counter, named 'internal_errors', has been added in frontend and backend
counters. It should be used when a internal error is encountered, instead for
failed_req or failed_resp.
Functions to deinitialize the HTTP rules are buggy. These functions does not
check the action name to release the right part in the arg union. Only few info
are released. For auth rules, the realm is released and there is no problem
here. But the regex <arg.hdr_add.re> is always unconditionally released. So it
is easy to make these functions crash. For instance, with the following rule
HAProxy crashes during the deinit :
http-request set-map(/path/to/map) %[src] %[req.hdr(X-Value)]
For now, These functions are simply removed and we rely on the deinit function
used for TCP rules (renamed as deinit_act_rules()). This patch fixes the
bug. But arguments used by actions are not released at all, this part will be
addressed later.
This patch must be backported to all stable versions.
The subscriber used to be passed as a "void *param" that was systematically
cast to a struct wait_event*. By now it appears clear that the subscribe()
call at every layer is well defined and always takes a pointer to an event
subscriber of type wait_event, so let's enforce this in the functions'
prototypes, remove the intermediary variables used to cast it and clean up
the comments to clarify what all these functions do in their context.
In practice all callers use the same wait_event notification for any I/O
so instead of keeping specific code to handle them separately, let's merge
them and it will allow us to create new events later.
For more than a decade we've kept all the sess_update_st_*() functions
in stream.c while they're only there to work in relation with what is
currently being done in backend.c (srv_redispatch_connect, connect_server,
etc). Let's move all this pollution over there and take this opportunity
to try to find slightly less confusing names for these old functions
whose role is only to handle transitions from one specific stream-int
state:
sess_update_st_rdy_tcp() -> back_handle_st_rdy()
sess_update_st_con_tcp() -> back_handle_st_con()
sess_update_st_cer() -> back_handle_st_cer()
sess_update_stream_int() -> back_try_conn_req()
sess_prepare_conn_req() -> back_handle_st_req()
sess_establish() -> back_establish()
The last one remained in stream.c because it's more or less a completion
function which does all the initialization expected on a connection
success or failure, can set analysers and emit logs.
The other ones could possibly slightly benefit from being modified to
take a stream-int instead since it's really what they're working with,
but it's unimportant here.
These ones used to serve as a set of switches between CO_FL_SOCK_* and
CO_FL_XPRT_*, and now that the SOCK layer is gone, they're always a
copy of the last know CO_FL_XPRT_* ones that is resynchronized before
I/O events by calling conn_refresh_polling_flags(), and that are pushed
back to FDs when detecting changes with conn_xprt_polling_changes().
While these functions are not particularly heavy, what they do is
totally redundant by now because the fd_want_*/fd_stop_*() actions
already perform test-and-set operations to decide to create an entry
or not, so they do the exact same thing that is done by
conn_xprt_polling_changes(). As such it is pointless to call that
one, and given that the only reason to keep CO_FL_CURR_* is to detect
changes there, we can now remove them.
Even if this does only save very few cycles, this removes a significant
complexity that has been responsible for many bugs in the past, including
the last one affecting FreeBSD.
All tests look good, and no performance regressions were observed.
CO_FL_WAIT_ROOM is set by the splicing function in raw_sock, and cleared
by the stream-int when splicing is disabled, as well as in
conn_refresh_polling_flags() so that a new call to ->rcv_pipe() could
be attempted by the I/O callbacks called from conn_fd_handler(). This
clearing in conn_refresh_polling_flags() makes no sense anymore and is
in no way related to the polling at all.
Since we don't call them from there anymore it's better to clear it
before attempting to receive, and to set it again later. So let's move
this operation where it should be, in raw_sock_to_pipe() so that it's
now symmetric. It was also placed in raw_sock_to_buf() so that we're
certain that it gets cleared if an attempt to splice is replaced with
a subsequent attempt to recv(). And these were currently already achieved
by the call to conn_refresh_polling_flags(). Now it could theorically be
removed from the stream-int.
Commit c640ef1a7d ("BUG/MINOR: stream-int: avoid calling rcv_buf() when
splicing is still possible") fixed splicing in TCP and legacy mode but
broke it badly in HTX mode.
What happens in HTX mode is that the channel's to_forward value remains
set to CHN_INFINITE_FORWARD during the whole transfer, and as such it is
not a reliable signal anymore to indicate whether more data are expected
or not. Thus, when data are spliced out of the mux using rcv_pipe(), even
when the end is reached (that only the mux knows about), the call to
rcv_buf() to get the final HTX blocks completing the message were skipped
and there was often no new event to wake this up, resulting in transfer
timeouts at the end of large objects.
All this goes down to the fact that the channel has no more information
about whether it can splice or not despite being the one having to take
the decision to call rcv_pipe() or not. And we cannot afford to call
rcv_buf() inconditionally because, as the commit above showed, this
reduces the forwarding performance by 2 to 3 in TCP and legacy modes
due to data lying in the buffer preventing splicing from being used
later.
The approach taken by this patch consists in offering the muxes the ability
to report a bit more information to the upper layers via the conn_stream.
This information could simply be to indicate that more data are awaited
but the real need being to distinguish splicing and receiving, here
instead we clearly report the mux's willingness to be called for splicing
or not. Hence the flag's name, CS_FL_MAY_SPLICE.
The mux sets this flag when it knows that its buffer is empty and that
data waiting past what is currently known may be spliced, and clears it
when it knows there's no more data or that the caller must fall back to
rcv_buf() instead.
The stream-int code now uses this to determine if splicing may be used
or not instead of looking at the rcv_pipe() callbacks through the whole
chain. And after the rcv_pipe() call, it checks the flag again to decide
whether it may safely skip rcv_buf() or not.
All this bitfield dance remains a bit complex and it starts to appear
obvious that splicing vs reading should be a decision of the mux based
on permission granted by the data layer. This would however increase
the API's complexity but definitely need to be thought about, and should
even significantly simplify the data processing layer.
The way it was integrated in mux-h1 will also result in no more calls
to rcv_pipe() on chunked encoded data, since these ones are currently
disabled at the mux level. However once the issue with chunks+splice
is fixed, it will be important to explicitly check for curr_len|CHNK
to set MAY_SPLICE, so that we don't call rcv_buf() after each chunk.
This fix must be backported to 2.1 and 2.0.
Wietse Venema reported in the thread below that we have a signedness
issue with our hashes implementations: due to the use of const char*
for the input key that's often text, the crc32, sdbm, djb2, and wt6
algorithms return a platform-dependent value for binary input keys
containing bytes with bit 7 set. This means that an ARM or PPC
platform will hash binary inputs differently from an x86 typically.
Worse, some algorithms are well defined in the industry (like CRC32)
and do not provide the expected result on x86, possibly causing
interoperability issues (e.g. a user-agent would fail to compare the
CRC32 of a message body against the one computed by haproxy).
Fortunately, and contrary to the first impression, the CRC32c variant
used in the PROXY protocol processing is not affected. Thus the impact
remains very limited (the vast majority of input keys are text-based,
such as user-agent headers for exmaple).
This patch addresses the issue by fixing all hash functions' prototypes
(even those not affected, for API consistency). A reg test will follow
in another patch.
The vast majority of users do not use these hashes. And among those
using them, very few will pass them on binary inputs. However, for the
rare ones doing it, this fix MAY have an impact during the upgrade. For
example if the package is upgraded on one LB then on another one, and
the CRC32 of a binary input is used as a stick table key (why?) then
these CRCs will not match between both nodes. Similarly, if
"hash-type ... crc32" is used, LB inconsistency may appear during the
transition. For this reason it is preferable to apply the patch on all
nodes using such hashes at the same time. Systems upgraded via their
distros will likely observe the least impact since they're expected to
be upgraded within a short time frame.
And it is important for distros NOT to skip this fix, in order to avoid
distributing an incompatible implementation of a hash. This is the
reason why this patch is tagged as MAJOR, eventhough it's extremely
unlikely that anyone will ever notice a change at all.
This patch must be backported to all supported branches since the
hashes were introduced in 1.5-dev20 (commit 98634f0c). Some parts
may be dropped since implemented later.
Link to Wietse's report:
https://marc.info/?l=postfix-users&m=157879464518535&w=2
In order to properly close connections established from Lua in case
a Lua context dies, the context currently automatically gets a flag
HLUA_MUST_GC set whenever an outgoing connection is used. This causes
the GC to be enforced on the context's death as well as on yield. First,
it does not appear necessary to do it when yielding, since if the
connections die they are already cleaned up. Second, the problem with
the flag is that even if a connection gets properly closed, the flag is
not removed and the GC continues to be called on the Lua context.
The impact on performance looks quite significant, as noticed and
diagnosed by Sadasiva Gujjarlapudi in the following thread:
https://www.mail-archive.com/haproxy@formilux.org/msg35810.html
This patch changes the flag for a counter so that each created
connection increments it and each cleanly closed connection decrements
it. That way we know we have to call the GC on the context's death only
if the count is non-null. As reported in the thread above, the Lua
performance gain is now over 20% by doing this.
Thanks to Sada and Thierry for the design discussion and tests that
led to this solution.
In tasklet_free(), to attempt to remove ourself, use MT_LIST_DEL, we can't
just use LIST_DEL(), as we theorically could be in the shared tasklet list.
This should be backported to 2.1.
In order to reduce the number of poller updates, we can benefit from
the fact that modern pollers use sampling to report readiness and that
under load they rarely report the same FD multiple times in a row. As
such it's not always necessary to disable such FDs especially when we're
almost certain they'll be re-enabled again and will require another set
of syscalls.
Now instead of creating an update for a (possibly temporary) removal,
we only perform this removal if the FD is reported again as ready while
inactive. In addition this is performed via another update so that
alternating workloads like transfers have a chance to re-enable the
FD without any syscall during the loop (typically after the data that
filled a buffer have been sent). However we only do that for single-
threaded FDs as the other ones require a more complex setup and are not
on the critical path.
This does cause a few spurious wakeups but almost totally eliminates the
calls to epoll_ctl() on connections seeing intermitent traffic like HTTP/1
to a server or client.
A typical example with 100k requests for 4 kB objects over 200 connections
shows that the number of epoll_ctl() calls doesn't depend on the number
of requests anymore but most exclusively on the number of established
connections:
Before:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
57.09 0.499964 0 654361 321190 recvfrom
38.33 0.335741 0 369097 1 epoll_wait
4.56 0.039898 0 44643 epoll_ctl
0.02 0.000211 1 200 200 connect
------ ----------- ----------- --------- --------- ----------------
100.00 0.875814 1068301 321391 total
After:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
59.25 0.504676 0 657600 323630 recvfrom
40.68 0.346560 0 374289 1 epoll_wait
0.04 0.000370 0 620 epoll_ctl
0.03 0.000228 1 200 200 connect
------ ----------- ----------- --------- --------- ----------------
100.00 0.851834 1032709 323831 total
As expected there is also a slight increase of epoll_wait() calls since
delaying de-activation of events can occasionally cause one spurious
wakeup.
For now this almost never happens but with subsequent patches it will
become more important not to uselessly call the I/O handlers if the FD
is not active.
Both flags became equal in commit 82967bf9 ("MINOR: connection: adjust
CO_FL_NOTIFY_DATA after removal of flags"), which already predicted the
overlap between xprt_done_cb() and wake() after the removal of the DATA
specific flags in 1.8. Let's simply remove CO_FL_NOTIFY_DATA since the
"_DONE" version already covers everything and explains the intent well
enough.
The function is not TCP-specific at all, it covers all FD-based sockets
so let's move this where other similar functions are, in connection.c,
and rename it conn_fd_check().