Tests performed between a 1 Gbps connected server and a 100 mbps client,
distant by 95ms showed that:
- we need 1.1 MB in flight to fill the link
- rare but inevitable losses are sufficient to make cubic's window
collapse fast and long to recover
- a 100 MB object takes 69s to download
- tolerance for 1 loss between two ACKs suffices to shrink the download
time to 20-22s
- 2 losses go to 17-20s
- 4 losses reach 14-17s
At 100 concurrent connections that fill the server's link:
- 0 loss tolerance shows 2-3% losses
- 1 loss tolerance shows 3-5% losses
- 2 loss tolerance shows 10-13% losses
- 4 loss tolerance shows 23-29% losses
As such while there can be a significant gain sometimes in setting this
tolerance above zero, it can also significantly waste bandwidth by sending
far more than can be received. While it's probably not a solution to real
world problems, it repeatedly proved to be a very effective troubleshooting
tool helping to figure different root causes of low transfer speeds. In
spirit it is comparable to the no-cc congestion algorithm, i.e. it must
not be used except for experimentation.
Define a new buffer pool reserved to allocate smaller memory area. For
the moment, its usage will be restricted to QUIC, as such it is declared
in quic_stream module.
Add a new config option "tune.bufsize.small" to specify the size of the
allocated objects. A special check ensures that it is not greater than
the default bufsize to avoid unexpected effects.
QUIC MUX buffer allocation limit is now directly based on the underlying
congestion window size. previous static limit based on conn-tx-buffers
is now unused. As such, this commit adds a warning to users to prevent
that it is now obsolete.
Secondly, update max-window-size setting. It is now the main entrypoint
to limit both the maximum congestion window size and the number of QUIC
MUX allocated buffer on emission. Remove its special value '0' which was
used to automatically adjust it on now unused conn-tx-buffers.
Define a new global keyword tune.quic.frontend.max-window-size. This
allows to set globally the maximum congestion window size for each QUIC
frontend connections.
The default value is 0. It is a special value which automatically derive
the size from the configured QUIC connection buffer limit. This is
similar to the previous "quic-cc-algo" behavior, which can be used to
override the maximum window size per bind line.
Document nocc congestion algorithm as an entry of quic-cc-algo.
Highlight the fact that it is reserved for debugging and should not be
used outside of this use case.
It is possible to override the default QUIC congestion algorithm on a
bind line. With the same setting, it is also possible to specify the
maximum congestion window size.
The parser rejects values outside of the range between 10k and 4g. This
is in contradiction with the documentation which specify 1k as the lower
value. Correct this value in the documentation.
This should be backported up to 2.9.
Some systems require log formats in the CLF format and that meant that I
could not send my logs for proxies in mode tcp to those servers. This
implements a format that uses log variables that are compatble with TCP
mode frontends and replaces traditional HTTP values in the CLF format
to make them stand out. Instead of logging method and URI like this
"GET /example HTTP/1.1" it will log "TCP " and for a response code I
used "000" so it would be easy to separate from legitimate HTTP
traffic. Now your log servers that require a CLF format can see the
timings for TCP traffic as well as HTTP.
It is now possible to use "drop" keyword for "on" lines under a
log-profile section to specify that no log at all should be emitted for
the specified step (setting an empty format was not sufficient to do so
because only the log payload would be empty, not the log header, thus the
log would still be emitted).
It may be useful to selectively disable logging at specific steps for a
given log target (since the log profile may be set on log directives):
log-profile myprof
on request format "blabla" sd "custom sd"
on response drop
New testcase was added to reg-tests/log/log_profiles.vtc
Released version 3.1-dev5 with the following main changes :
- BUG/MINOR: quic: Lack of precision when computing K (cubic only cc)
- MEDIUM: ssl/quic: implement quic crypto with EVP_AEAD
- MINOR: quic: rename confusing wording aes to hp
- MEDIUM: quic: add key argument to header protection crypto functions
- MEDIUM: quic: implement CHACHA20_POLY1305 for AWS-LC
- MEDIUM: sink: assume sft appctx stickiness
- MINOR: quic: delay Retry emission on quic-force-retry
- MEDIUM: quic: implement quic-initial rules
- MINOR: quic: support ACL for quic-initial rules
- MINOR: quic: pass quic_dgram as obj_type for quic-initial rules
- MINOR: quic: implement reject quic-initial action
- MINOR: quic: implement send-retry quic-initial rules
- BUG/MEDIUM: quic: fix invalid conn reject with CONNECTION_REFUSED
- MEDIUM: h1: allow to preserve keep-alive on T-E + C-L
- MINOR: quic: Add information to "show quic" for CUBIC cc.
- MINOR: quic: Dump TX in flight bytes vs window values ratio.
- BUG/MEDIUM: jwt: Clear SSL error queue on error when checking the signature
- BUILD: cfgparse-quic: fix build error on Solaris due to missing netinet/in.h
- MINOR: queue: add a function to check for TOCTOU after queueing
- BUG/MEDIUM: queue: deal with a rare TOCTOU in assign_server_and_queue()
- DOC: config: Add documentation about spop mode for backends
- BUG/MEDIUM: stconn: Report error on SC on send if a previous SE error was set
- BUG/MEDIUM: mux-pt/mux-h1: Release the pipe on connection error on sending path
- BUILD: mux-pt: Use the right name for the sedesc variable
- BUG/MINOR: stconn: bs.id and fs.id had their dependencies incorrect
- BUG/MEDIUM: ssl: reactivate 0-RTT for AWS-LC
- BUG/MEDIUM: ssl: 0-RTT initialized at the wrong place for AWS-LC
- BUILD: ssl: replace USE_OPENSSL_AWSLC by OPENSSL_IS_AWSLC
- BUG/MEDIUM: quic: prevent conn freeze on 0RTT undeciphered content
- MINOR: tcp_sample: Move TCP low level sample fetch function to control layer
- MINOR: quic: Define ->get_info() control layer callback for QUIC
- MINOR: flags/mux-quic: decode qcc and qcs flags
- BUG/MINOR: quic: fix fc_rtt/srtt values
- BUG/MIONR: quic: fix fc_lost
- BUG/MINOR: h1: do not forward h2c upgrade header token
- BUG/MINOR: h2: reject extended connect for h2c protocol
- BUG/MEDIUM: http-ana: Report error on write error waiting for the response
- BUG/MEDIUM: h2: Only report early HTX EOM for tunneled streams
- BUG/MEDIUM: mux-h2: Propagate term flags to SE on error in h2s_wake_one_stream
- BUG/MEDIUM: peer: Notify the applet won't consume data when it waits for sync
- BUG/MINOR: quic: Too shord datagram during O-RTT handshakes (aws-lc only)
- CI: add weekly QUIC Interop regression against AWS-LC
- CI: harden NetBSD builds by ERR=1
- BUG/MINOR: quic: Too short datagram during packet building failures (aws-lc only)
- DEV: coccinelle: add a test to detect unchecked strdup()
- BUG/MINOR: fcgi-app: handle a possible strdup() failure
- BUG/MEDIUM: server/addr: fix tune.events.max-events-at-once event miss and leak
- MINOR: quic: convert qc_stream_desc release field to flags
- MINOR: quic: implement function to check if STREAM is fully acked
- BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM
- MINOR: quic: enforce ACK reception is handled in order
- DOC: configuration: fix alphabetical ordering of {bs,fs}.aborted
- MINOR: stconn: add a new pair of sf functions {bs,fs}.debug_str
- MINOR: mux-h2: implement the debug string for logs
- MINOR: mux-quic: define dump functions for QCC and QCS
- MINOR: mux-quic: implement debug string for logs
- MINOR: quic: dump quic_conn debug string for logs
- MINOR: time: define tot_time structure
- MINOR: mux-quic: measure QCS lifetime and its blocking state
- BUG/MINOR: trace/quic: enable conn/session pointer recovery from quic_conn
- BUG/MINOR: trace/quic: permit to lock on frontend/connect/session etc
- BUG/MEDIUM: trace: fix null deref in lockon mechanism since TRACE_ENABLED()
- BUG/MINOR: trace: automatically start in waiting mode with "start <evt>"
- BUG/MINOR: trace/quic: make "qconn" selectable as a lockon criterion
- BUG/MINOR: quic/trace: make quic_conn_enc_level_init() emit NEW not CLOSE
- MINOR: trace: support setting the sink and level for all sources at once
- MINOR: session/trace: enable very minimal session tracing
- MEDIUM: trace: implement a "follow" mechanism
- MINOR: trace: move the known trace context into a dedicated struct
- MINOR: trace: add a per-source helper to pre-fill the context
- MINOR: mux-h2: add a trace context filling helper
- MINOR: mux-h1: add a trace context filling helper
- MINOR: mux-quic: don't leave dangling pointer after freeing qcs->sd
- MINOR: mux-quic: add a trace context filling helper
- MINOR: mux-h1/trace: add a state trace on stream creation/upgrade
- MINOR: mux-h2/trace: add a state trace on stream creation/destruction
- MINOR: mux-h3/trace: add a state trace on stream creation/destruction
- BUG/MINOR: quic: prevent freeze after early QCS closure
- MINOR: server: ensure max_events_at_once > 0 in server_atomic_sync()
- MINOR: cfgparse: add struct cfgfile to represent config in memory
- REORG: tools: move list_append_word to cfgparse
- MINOR: startup: adapt list_append_word to use cfgfile
- MINOR: cfgparse: add load_cfg_in_mem
- MINOR: cfgparse: load_cfg_in_mem: take in account file size
- MINOR: tools: add fgets_from_mem
- MEDIUM: startup: make read_cfg() return immediately on ENOMEM
- MEDIUM: startup: load and parse configs from memory
- MINOR: startup: rename readcfgfile in parse_cfg
With "follow" from one source to another, it becomes possible for a
source to automatically follow another source's tracked pointer. The
best example is the session:
- the "session" source is enabled and has a "lockon session"
-> its lockon_ptr is equal to the session when valid
- other sources (h1,h2,h3 etc) are configured for "follow session"
and will then automatically check if session's lockon_ptr matches
its own session, in which case tracing will be enabled for that
trace (no state change).
It's not necessary to start/pause/stop traces when using this, only
"follow" followed by a source with lockon enabled is needed. Some
combinations might work better than others. At the moment the session
is almost never known from the backend, but this may improve.
The meta-source "all" is supported for the follower so that all sources
will follow the tracked one.
It's extremely painful to have to set "trace <src> sink buf1" for all
sources, then to do the same for "level developer" (for example). Let's
have a possibility via a meta-source "all" to apply the change to all
sources at once. This currently supports level and sink, which are not
dependent on the source, this is a good start.
These are passed to the underlying mux to retrieve debug information
at the mux level (stream/connection) as a string that's meant to be
added to logs.
The API is quite complex just because we can't pass any info to the
bottom function. So we construct a union and pass the argument as an
int, and expect the callee to fill that with its buffer in return.
Most likely the mux->ctl and ->sctl API should be reworked before
the release to simplify this.
The functions take an optional argument that is a bit mask of the
layers to dump:
muxs=1
muxc=2
xprt=4
conn=8
sock=16
The default (0) logs everything available.
These must be before {bs,fs}.id, not after. Should be backported wherever
068ce2d5d2 ("MINOR: stconn: Add samples to retrieve about stream aborts")
is (normally 3.0).
This low level callback may be called by several sample fetches for
frontend connections like "fc_rtt", "fc_rttvar" etc.
Define this callback for QUIC protocol as pointer to quic_get_info().
This latter supports these sample fetches:
"fc_lost", "fc_reordering", "fc_rtt" and "fc_rttvar".
Update the documentation consequently.
The SPOE was refactored. Now backends referenced by a SPOE filter must use
the spop mode to be able to use the spop multiplexer for server connections.
The "spop" mode was added in the list of supported mode for backends.
In 2.5-dev9, commit 631c7e866 ("MEDIUM: h1: Force close mode for invalid
uses of T-E header") enforced a recently arrived new security rule in the
HTTP specification aiming at preventing a class of content-smuggling
attacks involving HTTP/1.0 agents. It consists in handling the very rare
T-E + C-L requests or responses in close mode.
It happens it does have an impact of a rare few and very old clients
(probably running insecure TLS stacks by the way) that continue to send
both with their POST requests. The impact is that for each and every
request they'll have to reconnect, possibly negotiating a full TLS
handshake that becomes harmful to the machine in terms of CPU computation.
This commit adds a new option "h1-do-not-close-on-insecure-transfer-encoding"
that does exactly what it says, it just asks not to close on such messages,
even though the message continues to be sanitized and C-L dropped. It means
that the risk is only between the sender and haproxy, which is limited, and
might be the only acceptable solution for such environments having to deal
with broken implementations.
The cases are so rare that it should not need to be backported, or in the
worst case, to the latest LTS if there is any demand.
Define a new quic-initial "send-retry" rule. This allows to force the
emission of a Retry packet on an initial without token instead of
instantiating a new QUIC connection.
Define a new quic-initial action named "reject". Contrary to dgram-drop,
the client is notified of the rejection by a CONNECTION_CLOSE with
CONNECTION_REFUSED error code.
To be able to emit the necessary CONNECTION_CLOSE frame, quic_conn is
instantiated, contrary to dgram-drop action. quic_set_connection_close()
is called immediatly after qc_new_conn() which prevents the handshake
startup.
Add ACL condition support for quic-initial rules. This requires the
extension of quic_parse_quic_initial() to parse an extra if/unless
block.
Only layer4 client samples are allowed to be used with quic-initial
rules. However, due to the early execution of quic-initial rules prior
to any connection instantiation, some samples are non supported.
To be able to use the 4 described samples, a dummy session is
instantiated before quic-initial rules execution. Its src and dst fields
are set from the received datagram values.
Implement a new set of rules labelled as quic-initial.
These rules as specific to QUIC. They are scheduled to be executed early
on Initial packet parsing, prior a new QUIC connection instantiation.
Contrary to tcp-request connection, this allows to reject traffic
earlier, most notably by avoiding unnecessary QUIC SSL handshake
processing.
A new module quic_rules is created. Its main function
quic_init_exec_rules() is called on Initial packet parsing in function
quic_rx_pkt_retrieve_conn().
For the moment, only "accept" and "dgram-drop" are valid actions. Both
are final. The latter drops silently the Initial packet instead of
allocating a new QUIC connection.
Released version 3.1-dev4 with the following main changes :
- MINOR: limits: prepare to keep limits in one place
- REORG: fd: move raise_rlim_nofile to limits
- CLEANUP: fd: rm struct rlimit definition
- REORG: global: move rlim_fd_*_at_boot in limits
- MINOR: haproxy: prepare to move limits-related code
- REORG: haproxy: move limits handlers to limits
- MINOR: limits: add is_any_limit_configured
- CLEANUP: quic: remove obsolete comment on send
- MINOR: quic: extend detection of UDP API OS features
- MINOR: quic: activate UDP GSO for QUIC if supported
- MINOR: quic: define quic_cc_path MTU as constant
- MINOR: quic: add GSO parameter on quic_sock send API
- MAJOR: quic: support GSO when encoding datagrams
- MEDIUM: quic: implement GSO fallback mechanism
- MINOR: quic: add counters of sent bytes with and without GSO
- BUG/MEDIUM: bwlim: Be sure to never set the analyze expiration date in past
- CLEANUP: proto: rename TID affinity callbacks
- CLEANUP: quic: rename TID affinity elements
- BUG/MINOR: limits: fix license type in limits.h
- BUG/MINOR: session: Eval L4/L5 rules defined in the default section
- CLEANUP: stconn: Fix a typo in comments for SE_ABRT_SRC_*
- MEDIUM: spoe: Remove fragmentation support
- MEDIUM: spoe: Remove async mode support
- MINOR: spoe: Use only a global engine-id per agent
- MINOR: spoe: Remove debugging
- MAJOR: spoe: Remove idle applets and pipelining support
- MINOR: spoe: Remove the dedicated SPOE applet task
- MEDIUM: proxy/spoe: Add a SPOP mode
- MEDIUM: applet: Add a .shut callback function for applets
- MINOR: connection: No longer include stconn type header in connection-t.h
- MINOR: stconn: Use a dedicated function to get the opposite sedesc
- MINOR: spoe: Rename some flags and constant to use SPOP prefix
- MINOR: spoe: Dynamically alloc the message list per event of an agent
- MINOR: spoe: Move all stuff regarding the filter/applet in the C file
- MINOR: spoe: Move spoe_str_to_vsn() into the header file
- MEDIUM: mux-spop: Introduce the SPOP multiplexer
- MEDIUM: check/spoe: Use SPOP multiplexer to perform SPOP health-checks
- MAJOR: spoe: Rewrite SPOE applet to use the SPOP mux
- CLEANUP: spoe: Uniformize function definitions
- MINOR: spoe: Add internal sample fetch to retrieve the SPOE engine ID
- MEDIUM: spoe: Set a specific name for the connection pool of SPOP servers
- MINOR: backend: Remove test on HTX streams to reuse idle connections on connect
- MEDIUM: spoe: Force the reuse 'always' mode for SPOP backends
- MINOR: mux-spop: Use a dedicated function to update the SPOP connection timeout
- MAJOR: mux-spop: Make the SPOP connections reusable
- MINOR: stats-html: Display reuse ratio for spop connections
- MEDIUM: spoe: Directly xfer NOTIFY frame when SPOE applet is created
- MEDIUM: spoe: Directly receive ACK frame in the SPOE context buffer
- MEDIUM: mux-spop/spoe: Save negociated max-frame-size value in the mux
- MINOR: spoe: Remove the spop version from the SPOE appctx context
- MEDIUM: mux-spop: Add checks on received frames
- MEDIUM: mux-spop: Announce the pipeling support if possible
- MEDIUM: spoe: Forward SPOE context error to the SPOE applet
- MEDIUM: spoe: Make the SPOE applet use its own buffers
- DOC: spoe: Update SPOE documentation to reflect recent refactoring
- BUILD: mux-spop: fix build failure on gcc 4-10 and clang
- MINOR: fd: don't scan the full fdtab on all threads
- MINOR: server: better mt_list usage for node migration (prev_deleted handling)
- BUG/MINOR: do not close uninit FD in quic_test_socketops()
- BUG/MEDIUM: debug/cli: fix "show threads" crashing with low thread counts
- MINOR: debug: prepare feed_post_mortem_late
- CLEANUP: debug: fix indents in debug_parse_cli_show_dev
- MINOR: debug: store runtime uid/gid in postmortem
- MINOR: debug: keep runtime capabilities in post_mortem
- MINOR: debug: use LIM2A to show limits
- MINOR: debug: prepare to show runtime limits
- MINOR: debug: keep runtime limits in postmortem
- DOC: install: don't reference removed CPU arg
- BUG/MEDIUM: ssl_sock: fix deadlock in ssl_sock_load_ocsp() on error path
- BUG/MAJOR: mux-h2: force a hard error upon short read with pending error
- MEDIUM: sink: start applets asynchronously
- OPTIM: sink: balance applets accross threads
- MEDIUM: ocsp: fix ocsp when the chain is loaded from 'issuers-chain-path'
- MEDIUM: ssl: add extra_chain to ckch_data
- MINOR: ssl: change issuers-chain for show_cert_detail()
- REGTESTS: ssl: test the issuers-chain-path keyword
- DOC: configuration: issuers-chain-path not compatible with OCSP
- DOC: configuration: issuers-chain-path is compatible with OCSP
- BUG/MEDIUM: startup: fix zero-warning mode
- BUILD: tree-wide: cast arguments to tolower/toupper to unsigned char (2)
- MINOR: cfgparse-global: move mode's keywords in cfg_kw_list
- MINOR: cfgparse-global: move no<poller_name> in cfg_kw_list
- DOC: config: improve the http-keep-alive section
- BUG/MINOR: stick-table: fix crash for src_inc_gpc() without stkcounter
- BUG/MINOR: server: Don't warn fallback IP is used during init-addr resolution
- BUG/MINOR: cli: Atomically inc the global request counter between CLI commands
- MINOR: stream: Add a pointer to set the parent stream
- MINOR: vars: Fill a description instead of hash and scope when a name is parsed
- MINOR: vars: Use a description to set/unset a variable instead of its hash and scope
- MEDIUM: vars: Be able to parse parent scopes for variables
- MINOR: vars: Use a variable description to get variables of a specific scope
- MEDIUM: vars: Be able to retrieve variable of the parent stream, if any
- MEDIUM: spoe: Set the parent stream for SPOE streams
- BUG/MINOR: quic: Non optimal first datagram.
- DOC: config: Add a dedicated section about variables
- DOC: config: Add info about variable scopes referencing the parent stream
- DOC: config: Explicitly state the SPOE streams have a usable parent stream
- MINOR: quic: Avoid cc priv buffer overflow.
- MINOR: spoe: Add a function to validate a version is supported
- MINOR: spoe: export the list of SPOP error reasons
- MEDIUM: spoe/tcpcheck: Reintroduce SPOP check as a customized tcp-check
- REGTESTS: check/spoe: Re-enable the script performing SPOP health-checks
- BUG/MEDIUM: sink: properly init applet under sft lock
- MINOR: sink: unify and sink_forward_io_handler() and sink_forward_oc_io_handler()
- MINOR: sink: Remove useless test on SE_FL_SHR/SHW flags
- MINOR: sink: merge sink_forward_io_handler() with sink_forward_oc_io_handler()
- MINOR: sink: add some comments about sft->appctx usage in applet handlers
- MINOR: sink: distinguish between hard and soft close in _sink_forward_io_handler()
- MEDIUM: sink: don't set NOLINGER flag on the outgoing stream interface
- MINOR: ring: count processed messages in ring_dispatch_messages()
- MINOR: sink: add processed events counter in sft
- MEDIUM: sink: "max-reuse" support for sink servers
- OPTIM: sink: consider threads' current load when rebalancing applets
Thanks to the previous commit, it is now possible to know how many events
were processed for a given sft/server sink pair. As mentioned in commit
c454296 ("OPTIM: sink: balance applets accross threads"), let's provide
the ability to restart a server connection when a certain amount of events
were processed to help better balance the load over multiple threads.
For this, we make use the of "max-reuse" server keyword which was only
relevant under "http" context so far. Under sink context, "max-reuse"
corresponds to the number of times the tcp connection can be reused
for sending messages, which in fact means that "max-reuse + 1" is the
number of events (ie: messages) that are allowed to be sent using the
same tcp server connection: when this threshold is met, the connection
will be destroyed and a new one will be created on a random thread.
The value is not strict: it is the minimum value above which the
connection may be destroyed since the value is checked after
ring_dispatch_messages() which may process multiple messages at once.
By default, no limit is enforced (the connection will be reused for as
long as it is available).
The documentation was updated accordingly.
It is explicitly mentionned in the configuration manual that the parent of a
SPOE stream is the filtered stream. It means variables of the filtered
stream are usable from the SPOE stream.
It is now possible for a stream to have a parent and it is also possible to
retrieve variables defined in the parent stream context. To do so, some
extra scopes were introduced. The section 2.8. was updated accordingly.
The variables in the HAProxy configuration are now described in a dedicated
section. Instead of repeating the same description everywhere a variable
name can be used, the section 2.8. is now referenced.
Nathan Wehrman suggested this add-on to try to better explain the
interactions between http-keep-alive and other timeouts, and the
impacts on protocols (HTTP/1, HTTP/2 etc).
Let's check the second time a global counter of "ha_warning" messages, if
zero-warning is set. And let's do this just before forking. At this moment we
are sure, that we've already done all init operations, where we could emit
"ha_warning", and we still have stderr fd opened.
Even with the second check, we could lost some late and rare warnings
about failing to drop supplementary groups and about re-enabling core dumps.
Notes about this are added into 'zero-warning' keyword description.
Since patch f3dfd95a ("MEDIUM: ocsp: fix ocsp when the chain is loaded
from 'issuers-chain-path'") the OCSP features are compatible with
'issuers-chain-path'.
The SPOE was refactored. Several parameters were deprecated. Fragmentation
and async capabilities support were removed. The default log-format was
updated too.
So, the SPOE documentation was updated accordingly.
The related issue is #2502.
Add a startup test for GSO support in quic_test_socketopts() and
automatically activate it in qc_prep_pkts() when building datagrams as
big as MTU.
Also define a new config option tune.quic.disable-udp-gso. This is
useful to prevent warning on older platform or to debug an issue which
may be related to GSO.
Released version 3.1-dev3 with the following main changes :
- BUG/MINOR: quic: Wrong datagram building when probing.
- BUG/MEDIUM: quic: fix possible exit from qc_check_dcid() without unlocking
- BUG/MINOR: promex: Remove Help prefix repeated twice for each metric
- DOC: configuration: add details about crt-store in bind "crt" keyword
- BUG/MEDIUM: hlua/cli: Fix lua CLI commands to work with applet's buffers
- DOC: configuration: more details about the master-worker mode
- BUG/MEDIUM: server: fix race on server_atomic_sync()
- BUG/MINOR: jwt: don't try to load files with HMAC algorithm
- CLEANUP: quic: cleanup prototypes related to CIDs handling
- CLEANUP: quic: remove non-existing quic_cid_tree definition
- MINOR: quic: remove access to CID global tree outside of quic_cid module
- REORG: quic: remove quic_cid_trees reference from proto_quic
- MINOR: quic: add 2 BUG_ON() on datagram dispatch
- MINOR: quic: ensure quic_conn is never removed on thread affinity rebind
- MEDIUM: init: set default for fd_hard_limit via DEFAULT_MAXFD
- DOC: configuration: update maxconn description
- MINOR: proto: extend connection thread rebind API
- BUG/MEDIUM: quic: prevent crash on accept queue full
- BUG/MEDIUM: peers: Fix crash when syncing learn state of a peer without appctx
- CI: add weekly QUIC Interop regression against LibreSSL
- DEV: flags/quic: decode quic_conn flags
- MINOR: quic: rename "ssl error" trace
- BUG/MEDIUM: init: fix fd_hard_limit default in compute_ideal_maxconn
- BUG/MINOR: jwt: fix variable initialisation
- MINOR: ssl/sample: ssl_c_san returns a comma separated list of SAN
- OPTIM: pool: improve needed_avg cache line access pattern
- MAJOR: import: update mt_list to support exponential back-off (try #2)
- CI: weekly QUIC Interop: try to fix private image
- BUG/MINOR: h1: Fail to parse empty transfer coding names
- BUG/MINOR: h1: Reject empty coding name as last transfer-encoding value
- BUG/MEDIUM: h1: Reject empty Transfer-encoding header
- BUG/MEDIUM: spoe: Be sure to create a SPOE applet if none on the current thread
- BUILD: listener: silence a build warning about unused value without threads
- DOC: architecture: remove the totally outdated architecture manual
- SCRIPTS: create-release: no more need to skip architecture.txt
We've discussed about removing it many times and I thought it had been
removed long ago, but apparently not as William proved me. Let's get
rid of it now. It's totally outdated (last updated 18 years ago, when
laptop processors were still 32 bits), mentions keywords and external
products that don't exist anymore. It's not even on docs.haproxy.org.
At some point, old stuff must really die.
This is the second attempt at importing the updated mt_list code (commit
59459ea3). The previous one was attempted with commit c618ed5ff4 ("MAJOR:
import: update mt_list to support exponential back-off") but revealed
problems with QUIC connections and was reverted.
The problem that was faced was that elements deleted inside an iterator
were no longer reset, and that if they were to be recycled in this form,
they could appear as busy to the next user. This was trivially reproduced
with this:
$ cat quic-repro.cfg
global
stats socket /tmp/sock1 level admin
stats timeout 1h
limited-quic
frontend stats
mode http
bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3
timeout client 5s
stats uri /
$ ./haproxy -db -f quic-repro.cfg &
$ h2load -c 10 -n 100000 --npn h3 https://127.0.0.1:8443/
=> hang
This was purely an API issue caused by the simplified usage of the macros
for the iterator. The original version had two backups (one full element
and one pointer) that the user had to take care of, while the new one only
uses one that is transparent for the user. But during removal, the element
still has to be unlocked if it's going to be reused.
All of this sparked discussions with Fred and Aurlien regarding the still
unclear state of locking. It was found that the lock API does too much at
once and is lacking granularity. The new version offers a much more fine-
grained control allowing to selectively lock/unlock an element, a link,
the rest of the list etc.
It was also found that plenty of places just want to free the current
element, or delete it to do anything with it, hence don't need to reset
its pointers (e.g. event_hdl). Finally it appeared obvious that the
root cause of the problem was the unclear usage of the list iterators
themselves because one does not necessarily expect the element to be
presented locked when not needed, which makes the unlock easy to overlook
during reviews.
The updated version of the list presents explicit lock status in the
macro name (_LOCKED or _UNLOCKED suffixes). When using the _LOCKED
suffix, the caller is expected to unlock the element if it intends to
reuse it. At least the status is advertised. The _UNLOCKED variant,
instead, always unlocks it before starting the loop block. This means
it's not necessary to think about unlocking it, though it's obviously
not usable with everything. A few _UNLOCKED were used at obvious places
(i.e. where the element is deleted and freed without any prior check).
Interestingly, the tests performed last year on QUIC forwarding, that
resulted in limited traffic for the original version and higher bit
rate for the new one couldn't be reproduced because since then the QUIC
stack has gaind in efficiency, and the 100 Gbps barrier is now reached
with or without the mt_list update. However the unit tests definitely
show a huge difference, particularly on EPYC platforms where the EBO
provides tremendous CPU savings.
Overall, the following changes are visible from the application code:
- mt_list_for_each_entry_safe() + 1 back elem + 1 back ptr
=> MT_LIST_FOR_EACH_ENTRY_LOCKED() or MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
+ 1 back elem
- MT_LIST_DELETE_SAFE() no longer needed in MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
=> just manually set iterator to NULL however.
For MT_LIST_FOR_EACH_ENTRY_LOCKED()
=> mt_list_unlock_self() (if element going to be reused) + NULL
- MT_LIST_LOCK_ELT => mt_list_lock_full()
- MT_LIST_UNLOCK_ELT => mt_list_unlock_full()
- l = MT_LIST_APPEND_LOCKED(h, e); MT_LIST_UNLOCK_ELT();
=> l=mt_list_lock_prev(h); mt_list_lock_elem(e); mt_list_unlock_full(e, l)
The ssl_c_san sample fetch returns a list of Subject Alt Name which was
presented by the client certificate.
The format is the same as the "openssl x509 -text" command, it's a
Description: Value list separated by commas.
The format is directly generated by the GENERAL_NAME_print() openssl
function.
https://github.com/openssl/openssl/blob/openssl-3.0/crypto/x509/v3_san.c#L207
Example:
IP Address:127.0.0.1, IP Address:127.0.0.2, IP Address:127.0.0.3, URI:http://docs.haproxy.org/2.7/, DNS:ca.tests.haproxy.com
Let's update maxconn keyword description, in order to make it clear, which
setting has the precedence over the global.maxconn and the SYSTEM_MAXCONN if
set.
Let's provide a default value for fd_hard_limit, if it's not set in the
configuration. With this patch we could set some specific default via
compile-time variable DEFAULT_MAXFD as well. Hope, this will be helpfull for
haproxy package maintainers.
make -j 8 TARGET=linux-glibc DEBUG=-DDEFAULT_MAXFD=50000
If haproxy is comipled without DEFAULT_MAXFD defined, the default will be set
to 1048576.
This is done to avoid killing the process by its watchdog, while it started
without any limitations in its configuration or in the command line and the
hard RLIMIT_NOFILE is extremely huge (~1000000000). We use in this case
compute_ideal_maxconn() to calculate maxconn and maxsock, maxsock defines the
size of internal fdtab, which becames very-very large as well. When
the process starts to simply loop over this fdtab (0(n)), this takes a lot of
time, so watchdog does it job.
To avoid this, maxconn now is always reduced to some reasonable value either
by explicit global.fd-hard-limit from configuration, or by its default. The
default may be changed at build-time and overwritten then by
global.fd-hard-limit at runtime. Explicit global.fd-hard-limit from the
configuration has always precedence over DEFAULT_MAXFD, if set.
Must be backported in all stable versions until v2.6.0, including v2.6.0.
Released version 3.1-dev2 with the following main changes :
- BUG/MINOR: log: fix broken '+bin' logformat node option
- DEBUG: hlua: distinguish burst timeout errors from exec timeout errors
- REGTESTS: ssl: fix some regtests 'feature cmd' start condition
- BUG/MEDIUM: ssl: AWS-LC + TLSv1.3 won't do ECDSA in RSA+ECDSA configuration
- MINOR: ssl: activate sigalgs feature for AWS-LC
- REGTESTS: ssl: activate new SSL reg-tests with AWS-LC
- BUG/MEDIUM: proxy: fix email-alert invalid free
- REORG: mailers: move free_email_alert() to mailers.c
- BUG/MINOR: proxy: fix email-alert leak on deinit() (2nd try)
- DOC: configuration: fix alphabetical order of bind options
- DOC: management: document ptr lookup for table commands
- BUG/MAJOR: quic: fix padding with short packets
- BUG/MAJOR: quic: do not loop on emission on closing/draining state
- MINOR: sample: date converter takes HTTP date and output an UNIX timestamp
- SCRIPTS: git-show-backports: do not truncate git-show output
- DOC: api/event_hdl: small updates, fix an example and add some precisions
- BUG/MINOR: h3: fix crash on STOP_SENDING receive after GOAWAY emission
- BUG/MINOR: mux-quic: fix crash on qcs SD alloc failure
- BUG/MINOR: h3: fix BUG_ON() crash on control stream alloc failure
- BUG/MINOR: quic: fix BUG_ON() on Tx pkt alloc failure
- DEV: flags/show-fd-to-flags: adapt to recent versions
- MINOR: capabilities: export capget and __user_cap_header_struct
- MINOR: capabilities: prepare support for version 3
- MINOR: capabilities: use _LINUX_CAPABILITY_VERSION_3
- MINOR: cli/debug: show dev: add cmdline and version
- MINOR: cli/debug: show dev: show capabilities
- MINOR: debug: print gdb hints when crashing
- BUILD: debug: also declare strlen() in __ABORT_NOW()
- BUILD: Missing inclusion header for ssize_t type
- BUG/MINOR: hlua: report proper context upon error in hlua_cli_io_handler_fct()
- MINOR: cfgparse/log: remove leftover dead code
- BUG/MEDIUM: stick-table: Decrement the ref count inside lock to kill a session
- MINOR: stick-table: Always decrement ref count before killing a session
- REORG: init: do MODE_CHECK_CONDITION logic first
- REORG: init: encapsulate CHECK_CONDITION logic in a func
- REORG: init: encapsulate 'reload' sockpair and master CLI listeners creation
- REORG: init: encapsulate code that reads cfg files
- BUG/MINOR: server: fix first server template name lookup UAF
- MINOR: activity: make the memory profiling hash size configurable at build time
- BUG/MEDIUM: server/dns: prevent DOWN/UP flap upon resolution timeout or error
- BUG/MEDIUM: h3: ensure the ":method" pseudo header is totally valid
- BUG/MEDIUM: h3: ensure the ":scheme" pseudo header is totally valid
- BUG/MEDIUM: quic: fix race-condition in quic_get_cid_tid()
- BUG/MINOR: quic: fix race condition in qc_check_dcid()
- BUG/MINOR: quic: fix race-condition on trace for CID retrieval
Fix an example suggesting that using EVENT_HDL_SUB_TYPE(x, y) with y being
0 was valid. Then add some notes to explain how to use
EVENT_HDL_SUB_FAMILY() and EVENT_HDL_SUB_TYPE() with valid values.
Also mention that the feature is available starting from 2.8 and not 2.7.
Finally, perform some purely cosmetic updates.
This could be backported in 2.8.
Add missing documentation and examples for the optional ptr lookup method
for table {show,set,clear} commands introduced in commit 9b2717e7 ("MINOR:
stktable: use {show,set,clear} table with ptr"), as initially described in
GH #2118.
It may be backported in 3.0.
Released version 3.1-dev1 with the following main changes :
- REGTESTS: Remove REQUIRE_VERSION=2.1 from all tests
- REGTESTS: Remove REQUIRE_VERSION=2.2 from all tests
- CI: use "--no-install-recommends" for apt-get
- CI: switch to lua 5.4
- CI: use USE_PCRE2 instead of USE_PCRE
- DOC: replace the README by a markdown version
- CI: VTest: accelerate package install a bit
- ADMIN: acme.sh: remove the old acme.sh code
- BUG/MINOR: cfgparse: remove the correct option on httpcheck send-state warning
- BUG/MINOR: tcpcheck: report correct error in tcp-check rule parser
- BUG/MINOR: tools: fix possible null-deref in env_expand() on out-of-memory
- DOC: configuration: add an example for keywords from crt-store
- CI: speedup apt package install
- DOC: add the FreeBSD status badge to README.md
- DOC: change the link to the FreeBSD CI in README.md
- MINOR: stktable: avoid ambiguous stktable_data_ptr() usage in cli_io_handler_table()
- BUG/MINOR: hlua: use CertCache.set() from various hlua contexts
- CLEANUP: hlua: fix CertCache class comment
- CI: FreeBSD: upgrade image, packages
- BUG/MEDIUM: h1-htx: Don't state interim responses are bodyless
- MEDIUM: stconn: Be able to unblock zero-copy data forwarding from done_fastfwd
- BUG/MEDIUM: mux-quic: Unblock zero-copy forwarding if the txbuf can be released
- BUG/MINOR: quic: prevent crash on qc_kill_conn()
- CLEANUP: hlua: use hlua_pusherror() where relevant
- BUG/MINOR: hlua: don't use lua_pushfstring() when we don't expect LJMP
- BUG/MINOR: hlua: fix unsafe hlua_pusherror() usage
- BUG/MINOR: hlua: prevent LJMP in hlua_traceback()
- CLEANUP: hlua: get rid of hlua_traceback() security checks
- BUG/MINOR: hlua: fix leak in hlua_ckch_set() error path
- CLEANUP: hlua: simplify ambiguous lua_insert() usage in hlua_ctx_resume()
- BUG/MEDIUM: mux-quic: Don't unblock zero-copy fwding if blocked during nego
- MINOR: mux-quic: Don't send an emtpy H3 DATA frame during zero-copy forwarding
- BUG/MEDIUM: ssl: wrong priority whem limiting ECDSA ciphers in ECDSA+RSA configuration
- BUG/MEDIUM: ssl: bad auth selection with TLS1.2 and WolfSSL
- BUG/MINOR: quic: fix computed length of emitted STREAM frames
- BUG/MINOR: quic: ensure Tx buf is always purged
- BUG/MEDIUM: stconn/mux-h1: Fix suspect change causing timeouts
- BUG/MAJOR: mux-h1: Properly copy chunked input data during zero-copy nego
- BUG/MINOR: mux-h1: Use the right variable to set NEGO_FF_FL_EXACT_SIZE flag
- DOC: install: remove boringssl from the list of supported libraries
- MINOR: log: fix "http-send-name-header" ignore warning message
- BUG/MINOR: proxy: fix server_id_hdr_name leak on deinit()
- BUG/MINOR: proxy: fix log_tag leak on deinit()
- BUG/MINOR: proxy: fix email-alert leak on deinit()
- BUG/MINOR: proxy: fix check_{command,path} leak on deinit()
- BUG/MINOR: proxy: fix dyncookie_key leak on deinit()
- BUG/MINOR: proxy: fix source interface and usesrc leaks on deinit()
- BUG/MINOR: proxy: fix header_unique_id leak on deinit()
- MINOR: proxy: add proxy_free_common() helper function
- BUG/MEDIUM: proxy: fix UAF with {tcp,http}checks logformat expressions
- MINOR: log: change wording in lf_expr_postcheck() error message
- BUG/MEDIUM: log: fix lf_expr_postcheck() behavior with default section
- CLEANUP: log/proxy: fix comment in proxy_free_common()
- DOC: config: move "hash-key" from proxy to server options
- DOC: config: add missing section hint for "guid" proxy keyword
- DOC: config: add missing context hint for new server and proxy keywords
- BUG/MINOR: promex: Skip resolvers metrics when there is no resolver section
- DOC: internals: add a documentation about the master worker
- BUG/MAJOR: mux-h1: Prevent any UAF on H1 connection after draining a request
- BUG/MINOR: quic: fix padding of INITIAL packets
- OPTIM: quic: fill whole Tx buffer if needed
- MINOR: quic: refactor qc_build_pkt() error handling
- MINOR: quic: use global datagram headlen definition
- MINOR: quic: refactor qc_prep_pkts() loop
- DOC/MINOR: management: add missed -dR and -dv options
- DOC/MINOR: management: add -dZ option
- DOC: management: rename show stats domain cli "dns" to "resolvers"
- REORG: log: reorder send log helpers by dependency order
- MINOR: session: expose session_embryonic_build_legacy_err() function
- MEDIUM: log/session: handle embryonic session log within sess_log()
- MINOR: log: provide sending log context to process_send_log() when available
- MINOR: log: add log_orig_to_str() function
- MINOR: log: provide log origin in logformat expressions using '%OG'
- CLEANUP: log: remove ambiguous legacy comment for resolve_logger()
- MINOR: log/backend: always free parsing hints in resolve_logger()
- MINOR: log: make resolve_logger() static
- MINOR: log: provide proxy context to resolve_logger()
- MINOR: log: add __send_log_set_metadata_sd helper
- MINOR: log: add logger flags
- MINOR: log: add log-profile parsing logic
- MINOR: log: add log profile buildlines
- MEDIUM: log: handle log-profile in process_send_log()
- DOC: config: add documentation for log profiles
- REGTESTS: log: add a test for log-profile
- MINOR: ssl: add ssl_sock_bind_verifycbk() in ssl_sock.h
- REORG: ssl: move the SNI selection code in ssl_clienthello.c
- BUILD: ssl: fix build with wolfSSL
- CI: github: upgrade aws-lc to 1.29.0
- Revert "CI: github: upgrade aws-lc to 1.29.0"
- MEDIUM: ssl: support for ECDA+RSA certificate selection with AWS-LC
- BUILD: ssl: disable deprecated functions for AWS-LC 1.29.0
- MINOR: ssl: relax the 'ssl.default-dh-param' keyword parsing
- CI: github: upgrade aws-lc to 1.29.0
- DOC: INSTALL: minimum AWS-LC version is v1.22.0
- CI: github: do the AWS-LC weekly build with ERR=1
Now that log-profile parsing logic has been implemented in "MINOR: log:
add log-profile parsing logic" and is actually effective since "MEDIUM:
log: handle log-profile in process_send_log()", let's document the feature
and add some examples.
Log-profile section is declared like this:
log-profile myprof
log-tag "custom-tag"
on error format "%ci: error"
on any format "(custom httplog) ${HAPROXY_HTTP_LOG_FMT}" sd "[exampleSDID@1234 step=\"accept\" id=\"%ID\"]"
(check out the documentation for the full list of options, some options
are only relevant under specific contexts)
And used this way (from usual "log" directive lines):
global
log stdout format rfc5424 profile myprof local0
--------------
For now, the use of log-profiles is somewhat limited because we lack
the ability to explicitly trigger the log building process at specific
steps during the stream handling, but it should gain more traction over
the time as the feature evolves and new mechanisms allowing the emission
of logs at expected processing steps will be added.
It should partially fix GH #401
'%OG' logformat alias may be used to report the log origin (when/where)
that triggered log generation using sess_build_logline().
Possible values are:
- "sess_error": log was generated during session error handling
- "sess_killed": log was generated during session abortion (killed
embryonic session)
- "txn_accept": log was generated right after frontend conn was accepted
- "txn_request": log was generated after client request was received
- "txn_connect": log was generated after backend connection establishment
- "txn_response": log was generated during server response handling
- "txn_close": log was generated at the final txn step, before closing
- "unspec": unknown or not specified
Documentation was updated.
In commit f8642ee82 ("MEDIUM: resolvers: rename dns extra counters to
resolvers extra counters"), we renamed "dns" counters to "resolvers", but
we forgot to update the documentation accordingly.
This may be backported to all stable versions.