Zero-copy fast-forwading feature is a quite new and is a bit sensitive.
There is an option to disable it globally. However, all protocols have not
the same maturity. For instance, for the PT multiplexer, there is nothing
really new. The zero-copy fast-forwading is only another name for the kernel
splicing. However, for the QUIC/H3, it is pretty new, not really optimized
and it will evolved. And soon, the support will be added for the cache
applet.
In this context, it is usefull to be able to enable/disable zero-copy
fast-forwading per-protocol and applet. And when it is applicable, on sends
or receives separately. So, instead of having one flag to disable it
globally, there is now a dedicated bitfield, global.tune.no_zero_copy_fwd.
It is possible that a server's addr family is temporarily set to AF_UNSPEC
even if we're certain to be in INET context (ipv4, ipv6).
Indeed, as soon as IP address resolving is involved, srv->addr family will
be set to AF_UNSPEC when the resolution fails (could happen at anytime).
However, _srv_event_hdl_prepare_inetaddr() wrongly assumed that it would
only be called with AF_INET or AF_INET6 families. Because of that, the
function will handle AF_UNSPEC address as an IPV6 address: not only
we could risk reading from an unititialized area, but we would then
propagate false information when publishing the event.
In this patch we make sure to properly handle the AF_UNSPEC family in
both the "prev" and the "next" part for SERVER_INETADDR event and that
every members are explicitly initialized.
This bug was introduced by 6fde37e046 ("MINOR: server/event_hdl: add
SERVER_INETADDR event"), no backport needed.
preferred_address is a transport parameter specify by the server. It
specified both an IPv4 and IPv6 address. These addresses were defined as
plain array in <struct tp_preferred_address>.
Convert these adressees to use the common types in_addr/in6_addr. With
this change, dumping of preferred_address is extended. It now displays
the addresses using inet_ntop() and CID value.
retrieve_qc_conn_from_cid() requires listener as argument whereas it is
unused. This is an artifact from the old architecture where CID trees
where stored on listener instances instead of globally. Remove it to
better reflect this change.
Just like the ->ctl() callback function, used to send commands to mux
connections, the ->sctl() callback function can now be used to send commands
to mux streams. The first command, MUX_SCTL_SID, is a way to request the mux
stream ID.
It will be implemented later for each mux.
Instead of the generic MUX_, we now use MUX_CTL_ prefix for all mux_ctl_type
value. This will avoid any ambiguities with other enums, especially with a
new one that will be added to get information on mux streams.
It is now possible to retrieve the session terminate state, using
"txn.sess_term_state". The sample fetch returns the 2-character session
termation state.
Of course, the result of this sample fetch is volatile. It is subject to
change. It is also most of time useless because no termation state is set
except at the end. It should only be useful in http-after-response rule
sets. It may also be used to customize the logs using a log-format
directive.
This patch should fix the issue #2221.
The code returned by the "status" sample fetch is the one in the HTTP
response at the moment the sample is evaluated. It may be the status code in
the server response or the one of the HAProxy reply in case of error, deny,
redirect...
However, it could be handy to retrieve the status code returned by the
server, when a HTTP response was really received from it. It is the purpose
of the "server_status" sample fetch. The server status code itself is stored
in the HTTP txn.
In previous log backend implementation, we created a pseudo log target
for each declared log server, and we made the log target's address point
to the actual server address to save some time and prevent unecessary
copies.
But this was done without knowing that when FQDN is involved (more broadly
when dns/resolution is involved), the "port" part of server addr should
not be relied upon, and we should explicitly use ->svc_port for that
purpose.
With that in mind and thanks to the previous commit, some changes were
required: we allocate a dedicated addr within the log target when target
is in DGRAM mode. The addr is first initialized with known values and it
is then updated automatically by _srv_set_inetaddr() during runtime.
(the change is atomic so readers don't need to worry about it)
addr from server "log target" (INET/DGRAM mode) is made of the combination
of server's address (lacking the port part) and server's svc_port.
The local variable "event_hdl_async_max_notif_at_once" which was
introduced with the event_hdl API was left as is but with a TODO note
telling that we should make it a global tunable.
Well, we're doing this now. To prepare for upcoming tunables related to
event_hdl API, we add a dedicated struct named event_hdl_tune which is
globally exposed through the event_hdl header file so that it may be used
from everywhere. The struct is automatically initialized in
event_hdl_init() according to defaults.h.
"event_hdl_async_max_notif_at_once" now becomes
"event_hdl_tune.max_events_at_once" with it's dedicated
configuation keyword: "tune.events.max-events-at-once".
We're also taking this opportunity to raise the default value from 10
to 100 since it's seems quite reasonnable given existing async event_hdl
users.
The documentation was updated accordingly.
Implements the customized payload pattern for the master CLI.
The pattern is stored in the stream in char pcli_payload_pat[8].
The principle is basically the same as the CLI one, it looks for '<<'
then stores what's between '<<' and '\n', and look for it to exit the
payload mode.
The CLI payload syntax has some limitation, it can't handle payloads
with empty lines, which is a common problem when uploading a PEM file
over the CLI.
This patch implements a way to customize the ending pattern of the CLI,
so we can't look for other things than empty lines.
A char cli_payload_pat[8] is used in the appctx to store the customized
pattern. The pattern can't be more than 7 characters and can still empty
to match an empty line.
The cli_io_handler() identifies the pattern and stores it, and
cli_parse_request() identifies the end of the payload.
If the customized pattern between "<<" and "\n" is more than 7
characters, it is not considered as a pattern.
This patch only implements the parser for the 'stats socket', another
patch is needed for the 'master CLI'.
Move qc_notify_send() from quic_tx.c to quic_conn.c. Note that it was already
exported from both quic_conn.h and quic_tx.h. Modify this latter header
to fix the duplication.
Such a warning appeared after having added quic_retry.h which includes only
headers for types (quic_cid-t.h, clock-t.h...)
In file included from include/haproxy/quic_retry.h:12,
from src/quic_retry.c:5:
include/haproxy/quic_cid-t.h:26:26: error: field ‘seq_num’ has incomplete type
26 | struct eb64_node seq_num;
Add quic_retry.c new C file for the QUIC retry feature:
quic_saddr_cpy() moved from quic_tx.c,
quic_generate_retry_token_aad() moved from
quic_generate_retry_token() moved from
parse_retry_token() moved from
quic_retry_token_check() moved from
quic_retry_token_check() moved from
This function is in relation with the Initial packet number space which is
more linked to the QUIC TLS specifications. Let's move it to quic_tls.h
to be inlined.
Move quic_path struct from quic_conn-t.h to quic_cc-t.h and rename it to quic_cc_path.
Update the code consequently.
Also some inlined functions in relation with QUIC path to quic_cc.h
Move quic_pkt_type(), quic_saddr_cpy(), quic_write_uint32(), max_available_room(),
max_stream_data_size(), quic_packet_number_length(), quic_packet_number_encode()
and quic_compute_ack_delay_us() to quic_tx.c because only used in this file.
Also move quic_ack_delay_ms() and quic_read_uint32() to quic_tx.c because they
are used only in this file.
Move quic_rx_packet_refinc() and quic_rx_packet_refdec() to quic_rx.h header.
Move qc_el_rx_pkts(), qc_el_rx_pkts_del() and qc_list_qel_rx_pkts() to quic_tls.h
header.
Move quic_cstream struct definition from quic_conn-t.h to quic_tls-t.h.
Its pool is also moved from quic_conn module to quic_tls. Same thing for
quic_cstream_new() and quic_cstream_free().
Fix such building issues:
In file included from src/quic_tx.c:15:
include/haproxy/quic_tx.h:51:23: warning: ‘struct quic_rx_packet’
Do not know why the compiler warns about such missing header inclusions
just now. It should have complained a long time ago during the big QUIC
source code split.
Move quic_cid and quic_connnection_id from quic_conn-t.h to new quic_cid-t.h header.
Move defintions of quic_stateless_reset_token_init(), quic_derive_cid(),
new_quic_cid(), quic_get_cid_tid() and retrieve_qc_conn_from_cid() to quic_cid.c
new C file.
When zero-copy data fast-forwarding is inuse, if the opposite SC is blocked,
there is no reason to try to fast-forward more data. Worst, in some cases,
this can lead to a receive loop of the producer side while the consumer side
is blocked.
No backport needed.
Add an optional argument for "-dt". This argument is interpreted as a
list of several trace statement separated by comma. For each statement,
a specific trace name can be specifed, or none to act on all sources.
Using double-colon separator, it is possible to add specifications on
the wanted level and verbosity.
Add '-dt' haproxy process argument. This will automatically activate all
trace sources on stderr with the error level. This could be useful to
troubleshoot issues such as protocol violations.
In the pat_ref_elt struct, the pattern string is stored outside of the
node element, using a pointer to an strdup(). Not only this needlessly
wastes at least 16-24 bytes per entry (8 for the pointer, 8-16 for the
allocator), it also makes the tree descent less efficient since both
the node and the string have to be visited for each layer (hence at least
two cache lines). Let's use an ebmb storage and place the pattern right
at the end of the pat_ref_elt, making it a variable-sized element instead.
The set-map test below jumps from 173 to 182 kreq/s/core, and the memory
usage drops from 356 MB to 324 MB:
http-request set-map(/dev/null) %[rand(1000000)] 1
This is even more visible with large maps: after loading 16M IP addresses
into a map, the process uses this amount of memory:
- 3.15 GB with haproxy-2.8
- 4.21 GB with haproxy-2.9-dev11
- 3.68 GB with this patch
So that's a net saving of 32 bytes per entry here, which cuts in half the
extra cost of the tree, and loading a large map takes about 20% less time.
Task_drop_running() is used to remove the RUNNING bit and check if
while the task was running it got a new wakeup from itself. Thus
each time task_drop_running() marks itself as a caller, it in fact
removes the previous caller that woke up the task, such as below:
Tasks activity over 10.439 sec till 0.000 sec ago:
function calls cpu_tot cpu_avg lat_tot lat_avg
task_run_applet 57895273 6.396m 6.628us 2.733h 170.0us <- run_tasks_from_lists@src/task.c:658 task_drop_running
Better not mark this function as a caller and keep the original one:
Tasks activity over 13.834 sec till 0.000 sec ago:
function calls cpu_tot cpu_avg lat_tot lat_avg
task_run_applet 62424582 5.825m 5.599us 5.717h 329.7us <- sc_app_chk_rcv_applet@src/stconn.c:952 appctx_wakeup
The mworker mode never had a proper 'hard-stop' (-st) for the reload,
this is a mode which was commonly used with the daemon mode, but it was
never implemented in mworker mode.
This patch fixes the problem by implementing a "hard-reload" command
over the master CLI. It does the same as the "reload" command, but
instead of waiting for the connections to stop in the previous process,
it immediately quits the previous process after binding.
Take the px->server_rules freeing part out of free_proxy() and make it
a dedicated helper function so that it becomes possible to use it from
anywhere.
In this patch we fix the prototype for ipcmp() and ipcpy() functions so
that input pointers that are used exclusively for reads are used as const
pointers. This way, the compiler can safely assume that those variables
won't be altered by the function.
In this patch we add the support for a new SERVER event in the
event_hdl API.
SERVER_INETADDR is implemented as an advanced server event.
It is published each time the server's ip address or port is
about to change. (ie: from the cli, dns, lua...)
SERVER_INETADDR data is an event_hdl_cb_data_server_inetaddr struct
that provides additional info related to the server inet addr change,
but can be casted as a regular event_hdl_cb_data_server struct if
additional info is not needed.
When possible, we try send DATA frame without copying data. To do so, we
swap the input buffer with QCS tx buffer. It is only possible iff:
* There is only one HTX block of data at the beginning of the message
* Amount of data to send is equal to the size of the HTX data block
* The QCS tx buffer is empty
In this case, both buffers are swapped. The frame metadata are written at
the begining of the buffer, before data and where the HTX structure is
stored.
Add a new member <nb_rhttp_conns> in thread_ctx structure. Its purpose
is to count the current number of opened reverse HTTP connections
regarding from their listeners membership.
This patch will be useful to support multi-thread for active reverse
HTTP, in order to select the less loaded thread.
Note that despite access to <nb_rhttp_conns> are only done by the
current thread, atomic operations are used. This is because once
multi-thread support will be added, external threads will also retrieve
values from others.
Previous commit renames 'proto_reverse_connect' module to 'proto_rhttp'.
This commits follows this by replacing various custom prefix by 'rhttp_'
to make the code uniform.
Note that 'reverse_' prefix was kept in connection module. This is
because if a new reversable protocol not based on HTTP is implemented,
it may be necessary to reused the same connection function which are
protocol agnostic.
In Github issue #2128, @jvincze84 explained the complexity of using
external checks in some advanced setups due to the systematic purge of
environment variables, and expressed the desire to preserve the
existing environment. During the discussion an agreement was found
around having an option to "external-check" to do that and that
solution was tested and confirmed to work by user @nyxi.
This patch just cleans this up, implements the option as
"preserve-env" and documents it. The default behavior does not change,
the environment is still purged, unless "preserve-env" is passed. The
choice of not using "import-env" instead was made so that we could
later use it to name specific variables that have to be imported
instead of keeping the whole environment.
The patch is simple enough that it could be backported if needed (and
was in fact tested on 2.6 first).