The program section is unreliable and should not be used, more reliable
alternatives exist outside HAProxy. Let's depreciate the section so we
could remove it completely in 3.3.
Released version 3.1-dev12 with the following main changes :
- MINOR: startup: tune.renice.{startup,runtime} allow to change priorities
- BUG/MEDIUM: promex: Fix dump of extra counters
- BUILD: import/mt_list: support building with TCC
- BUILD: compiler: define __builtin_prefetch() for tcc
- CLEANUP: quic: Remove the useless directive "tune.quic.backend.max-idle-timeou"
- DOC: config: document connection error 44 (reverse connect failure)
- CLEANUP: connection: properly name the CO_ER_SSL_FATAL enum entry
- DEBUG: cli: support closing "hard" using close() in addition to fd_delete()
- MINOR: connection: add more connection error codes to cover common errno
- MINOR: rawsock: set connection error codes when returning from recv/send/splice
- MINOR: connection: add new sample fetch functions fc_err_name and bc_err_name
- MINOR: quic: Help diagnosing malformed probing packets
- BUG/MINOR: quic: fix malformed probing packet building
- MINOR: listener: Remove useless checks on the receiver protocol existence
- MINOR: http-conv: Remove unreachable goto statement in sample_conv_q_preferred
- MINOR: http: don't %-encode the payload when not relevant
- MINOR: quic: simplify qc_parse_pkt_frms() return path
- MINOR: quic: use dynamically allocated frame on parsing
- MINOR: quic: extend return value of CRYPTO parsing
- BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO
- BUG/MINOR: mworker: do 'program' postparser checks in read_cfg_in_discovery_mode
- EXAMPLES: add "traces.cfg" with traces examples
- BUG/MEDIUM: quic: do not consider ACK on released stream as error
- CLEANUP: stats: fix misleading comment on top of stat_idx_info
- MINOR: wdt: move the local timers to a struct
- MINOR: debug: add a function to dump a stuck thread
- DEBUG: wdt: better detect apparently locked up threads and warn about them
- DEBUG: cli: make it possible for "debug dev loop" to trigger warnings
- DEBUG: wdt: make the blocked traffic warning delay configurable
- DEBUG: wdt: add a stats counter "BlockedTrafficWarnings" in show info
- DEBUG: wdt: set the default blocked task delay to 100 ms
- MINOR: debug: move the "recover now" warn message after the optional notes
- MINOR: event_hdl: add event_hdl_sub_list_empty() helper func
- MINOR: pattern: add _pat_ref_new() helper func
- OPTIM: pattern: use malloc() to initialize new pat_ref struct
- MINOR: pattern: add pat_ref_free() helper func
- CLEANUP: guid: remove global tree export
- BUG/MINOR: guid/server: ensure thread-safety on GUID insert/delete
- DOC: management: explain the change of behavior of the program section
- BUG/MEDIUM: mux-h2: try to wait for the peer to read the GOAWAY
- BUG/MEDIUM: quic: prevent crash due to CRYPTO parsing error
The warn-blocked-traffic-after can be significantly lowered. In any
case, in order to be usable it must be well below the limit to have a
chance to emit exploitable traces before the watchdog finally fires.
Even configured at 1ms it looks very difficult to trigger it on a
laptop doing SSL and compression, so applying a 100-fold factor to
cover for large configs and small machines sounds sane for 3.1. In any
case, even at 100ms, the service degradation becomes quite visible.
These functions return a symbolic error code such as ECONNRESET to keep
logs compact while making them human-readable. It's a good alternative
to the numeric code in that it's more expressive, and a good one to the
full message since it's shorter and more precise (some codes even match
errno names).
The doc was updated so that the symbolic names appear in the table. It
could be useful to backport this feature to help with troubleshooting
some issues, though backporting the doc might possibly be more annoying
in case users have local patches already, so maybe the table update does
not need to be backported in this case.
While we get reports of connection setup errors in fc_err/bc_err, we
don't have the equivalent for the recv/send/splice syscalls. Let's
add provisions for new codes that cover the common errno values that
recv/send/splice can return, i.e. ECONNREFUSED, ENOMEM, EBADF, EFAULT,
EINVAL, ENOTCONN, ENOTSOCK, ENOBUFS, EPIPE. We also add a special case
for when the poller reported the error itself. It's worth noting that
EBADF/EFAULT/EINVAL will generally indicate serious bugs in the code
and should not be reported.
The only thing is that it's quite hard to forcefully (and reliably)
trigger these errors in automated tests as the timing is critical.
Using iptables to manually reset established connections in the
middle of large transfers at least permits to see some ECONNRESET
and/or EPIPE, but the other ones are harder to trigger.
This commit introduces the tune.renice.startup and tune.renice.runtime
global keywords that allows to change the priority with setpriority().
tune.renice.startup is parsed and applied in the worker or the standalone
process for configuration parsing. If this keyword is used alone, the
nice value is changed to the previous one after configuration parsing.
tune.renice.runtime is applied after configuration parsing, so in the
worker or a standalone process. Combined with tune.renice.startup it
allows to have a different nice value during configuration parsing and
during runtime.
The feature was discussed in github issue #1919.
Example:
global
tune.renice.startup 15
tune.renice.runtime 0
Released version 3.1-dev11 with the following main changes :
- BUG/MINOR: httpclient: return NULL when no proxy available during httpclient_new()
- BUG/MEDIUM: mworker/httpclient: initialization skipped by accident in mworker mode
- BUG/MINOR: resolvers/mworker: missing default resolvers in mworker mode
- MINOR: mworker/ocsp: skip ocsp-update proxy init in master
- BUG/MEDIUM: stconn: Wait iobuf is empty to shut SE down during a check send
- MINOR: mux-h1: Show the SD iobuf in trace messages on stream send events
- MINOR: mux-h1: Add a trace on shutdown when keep-alive is not possible
- BUG/MINOR: http-ana: Don't report a server abort if response payload is invalid
- BUG/MEDIUM: stconn: Check FF data of SC to perform a shutdown in sc_notify()
- BUG/MAJOR: filters/htx: Add a flag to state the payload is altered by a filter
- REGTESTS: Never reuse server connection in http-messaging/truncated.vtc
- BUG/MINOR: quic: avoid leaking post handshake frames
- MINOR: quic: send new tokens (NEW_TOKEN) even for 1RTT sessions
- BUG/MEDIUM: quic: avoid freezing 0RTT connections
- DOC: config: fix rfc7239 forwarded typo in desc
- MINOR: http_ext: implement rfc7239_{nn,np} converters
- CLEANUP: http_ext: remove useless BUG_ON() in http_handle_xot_header()
- BUG/MINOR: sample: free err2 in smp_resolve_args for type ARGT_REG
- MINOR: arg: add an argument type for identifier
- BUILD: buffers: keep b_getblk_nc() and b_peek_varint() in buf.h
- CLEANUP: buffers: simplify b_get_varint()
- OPTIM: buffers: avoid a useless wrapping check for ofs == 0
- MINOR: debug: make mark_tainted() return the previous value
- MINOR: chunk: drop the global thread_dump_buffer
- MINOR: debug: split ha_thread_dump() in two parts
- MINOR: debug: slightly change the thread_dump_pointer signification
- MINOR: debug: make ha_thread_dump_done() take the pointer to be used
- MINOR: debug: replace ha_thread_dump() with its two components
- MEDIUM: debug: on panic, make the target thread automatically allocate its buf
- BUILD: mux-h2/traces: fix build on 32-bit due to size of the DATA frame
- CI: prepare Coverity build for Ubuntu 24
- CI: bump development builds explicitely to Ubuntu 24.04
- CI: modernize macos builds to macos-15
- BUG/MINOR: mworker: fix mworker-max-reloads parser
- MINOR: mux-quic: simplify sending of empty STREAM FIN
- BUG/MINOR: mux-quic: do not close STREAM with empty FIN if no data sent
- CLEANUP: debug: make the BUG_ON() macros check the condition in the outer one
- MEDIUM: debug: add match counters for BUG_ON/WARN_ON/CHECK_IF
- MINOR: debug: add a new debug macro COUNT_IF()
- MINOR: debug: add "debug dev counters" to list code counters
- BUG/MEDIUM: stats-html: Never dump more data than expected during 0-copy FF
- BUG/MEDIUM: mux-h2: Remove H2S from send list if data are sent via 0-copy FF
- BUG/MINOR: stconn: Pretend the SE have more data to deliver on abortonclose
- CLEANUP: stream: remove outdated comments
- DEBUG: stream: Add debug counters to track some client/server aborts
- DEBUG: mux-h1: Add debug counters to track some errors
- MINOR: mux-h1: Add support of the debug string for logs
- MINOR: stream: maintain per-stream counters of the number of passes on code
- MINOR: filters: add per-filter call counters
- MINOR: sample: add the "when" converter to condition some expressions
- BUG/MEDIUM: connection/http-reuse: fix address collision on unhandled address families
- BUILD: spoe: fix build warning on older gcc around sub-struct initialization
- Revert "OPTIM: mux-h2: make h2_send() report more accurate wake up conditions"
- DEBUG: mux-h1: Add debug counters to track errors with in/out pending data
- BUG/MINOR: mux-h1: Fix conditions on pipe in some COUNT_IF()
- MINOR: activity/memprofile: show per-DSO stats
- BUG/MINOR: mworker/cli: show master startup logs in recovery mode
- MINOR: mworker: stop MASTER proxy listener on worker mcli sockpair
- MINOR: error: simplify startup_logs_init_shm
- BUG/MINOR: mworker: show worker warnings in startup logs
- CLEANUP: mworker: clean mworker_reexec
- MINOR: mworker/cli: split mworker_cli_proxy_create
- BUG/MINOR: server: fix dynamic server leak with check on failed init
- BUG/MEDIUM: server: fix race on servers_list during server deletion
- BUG/MEDIUM: stconn: Report blocked send if sends are blocked by an error
- BUG/MINOR: http-ana: Fix wrong client abort reports during responses forwarding
- BUG/MINOR: stconn: Don't disable 0-copy FF if EOS was reported on consumer side
- MINOR: mworker/cli: add 'debug' to 'show proc'
- MINOR: mworker/cli: remove comment line for program when useless
- MINOR: mworker/cli: 'show proc debug' for old workers
- BUILD: debug: silence a build warning with threads disabled
- CLEANUP: mux-h2: remove the unused "full" variable in h2_frt_transfer_data()
- MINOR: pools: export the pools variable
- MINOR: debug: place a magic pattern at the beginning of post_mortem
- MINOR: debug: place the post_mortem struct in its own section.
- MINOR: debug: store important pointers in post_mortem
- MINOR: debug: do not limit backtraces to stuck threads
- MINOR: cli: remove non-printable characters from 'debug dev fd'
- MINOR: cli: add an 'echo' command
- MINOR: debug: also add a pointer to struct global to post_mortem
- CLEANUP: mworker: make mworker_create_master_cli more readable
- BUG/MEIDUM: mworker: fix fd leak from master to worker
- BUG/MINOR: mworker/cli: fix mworker_cli_global_proxy_new_listener
- MINOR: tools: add strnlen2() helper
- CLEANUP: log: use strnlen2() in _lf_text_len() to compute string length
- DOC: design: add notes about more detailed error reporting for logs
- MINOR: debug: also add fdtab and acitvity to struct post_mortem
- MINOR: debug: remove the redundant process.thread_info array from post_mortem
- DEV: gdb: add a number of gdb scripts to navigate in core dumps
- BUG/MINOR: trace: stop rewriting argv with -dt
- MEDIUM: protocol: make abns a custom unix socket address family
- MEDIUM: protocol: rely on AF_CUST_ABNS family to recognize ABNS sockets
- CLEANUP: tools: rely on address family to detect ABNS sockets
- MINOR: protocol: create abnsz socket address family
- MINOR: sock: restore effective UNIX family in sock_get_old_sockets()
- MEDIUM: sock: also restore effective unix family in get_{src,dst}()
- MEDIUM: sock_unix: use per-family addrcmp function
- MEDIUM: socket: add zero-terminated ABNS alternative
- BUG/MINOR: ssl/cli: 'set ssl cert' does not check the transaction name correctly
- BUG/MINOR: mworker: mworker_reexec: unset MODE_STARTING before free startup logs ring
- BUG/MINOR: errors: startup_logs_free: set global startup_logs ptr to NULL
- BUG/MINOR: errors: print_message: don't allocate startup logs ring
- BUG/MINOR: startup: don't fork worker if started with -c -W
- BUG/MINOR: startup: dump libs only in worker if started with -W -dL
- BUG/MINOR: startup: dump keywords only in worker if started with -W -dKAll
- BUG/MINOR: startup: don't dump polling info for master in verbose mode
- CI: switch QUIC Interop on AWS-LC to common docker image
- CI: switch QUIC Interop on LibreSSL to common docker image
- CI: enable chacha20 test on LibreSSL QUIC Interop
- DOC: config: add missing glitch_{cnt,rate} data types
- DOC: config: add missing glitch_{cnt,rate} sample definitions
- CI: LibreSSL QUIC Interop: fix docker context
- DEBUG: mux-h1: Add H1C expiration dates in trace messages
- BUG/MEDIUM: mux-h1: Fix how timeouts are applied on H1 connections
- BUG/MINOR: http-ana: Report internal error if an action yields on a final eval
- MINOR: stream: Save last evaluated rule on invalid yield
- MINOR: quic: complete trace in qc_may_build_pkt()
- MINOR: quic: move qc_send_mux() prototype into quic_tx.h
- MINOR: stream: Replace last_rule_file/line fields by a more generic field
- MINOR: stream: Save the last filter evaluated interrupting the processing
- MINOR: stream: Save the entity waiting to continue its processing
- MINOR: stream: Use an enum to identify last and waiting entities for streams
- MINOR: stream: Add http-buffer-request option in the waiting entities
- DOC: config: Add documentation about last_entity sample fetch
- DOC: config: Add documentation about waiting_entity sample fetch
Following previous commit, when glitch_cnt and glitch_rate data types were
implemented in c9c6b683f ("MEDIUM: stick-tables: add a new stored type for
glitch_cnt and glitch_rate"), newly exposed samples such as
table_glitch_cnt(), table_glitch_rate, src_glitch_cnt() and
src_glitch_rate() were documented but their definitions was missing in
supported keywords list.
It should be backported in 3.0 with c9c6b683f
When glitch_cnt and glitch_rate data types were implemented in
c9c6b683f ("MEDIUM: stick-tables: add a new stored type for glitch_cnt and
glitch_rate"), the data types list for "stick-table" keyword documentation
was overlooked.
This was reported by Nick Ramirez.
It should be backported in 3.0 with c9c6b683f.
When an abstract unix socket is bound by HAProxy (using "abns@" prefix),
NUL bytes are appended at the end of its path until sun_path is filled
(for a total of 108 characters).
Here we add an alternative to pass only the non-NUL length of that path
to connect/bind calls, such that the effective path of the socket's name
is as humanly written. This may be useful to interconnect with existing
softwares that implement abstract sockets with this logic instead of the
default haproxy one.
This is achieved by implementing the "abnsz" socket prefix (instead of
"abns"), which stands for "zero-terminated ABNS". "abnsz" prefix may be
used anywhere "abns" is. Internally, haproxy uses the custom socket
family (AF_CUST_ABNS vs AF_CUST_ABNSZ) to differentiate default abns
sockets from zero-terminated ones.
Documentation was updated and regtest was added.
Fixes GH issues #977 and #2479
Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
These are the notes of a day long code analysis session (CFA+WTA)
aimed at figuring what's missing during most code troubleshooting
sessions. The goal is to provide good indications about what rules/
filters were still active when the processing ended (timeout, error
etc), what subscribers are still active (indicating waiting for an
event), and what shut/abort events were met at the various levels
of each side's stack, in each direction.
Sometimes it would be desirable to include some debugging output only
under certain conditions, but the end of the transfer is too late to
apply some rules.
Here we take the approach of making a converter ("when") that takes a
condition among an arbitrary list, and decides whether or not to let
the input sample pass through or not based on the condition. This
allows for example to log debugging information only when an error
was encountered during the processing (sort of an extension of
dontlog-normal). The conditions are quite limited (stopping, error,
normal, toapplet, forwarded, processed) and can be negated. The
converter can also be chained to use more complex conditions.
A suggested example will be:
# log "dbg={-}" when fine, or "dbg={... debug info ...}" on error:
log-format "$HAPROXY_HTTP_LOG_FMT dbg={%[bs.debug_str,when(!normal)]}"
"option forwarded" provides a convenient way to automatically insert
rfc7239 forwarded header to requests sent to servers.
On the other hand, manually crafting the header is quite complicated due
to specific formatting rules that must be followed as per rfc7239.
However, sometimes it may be necessary to craft the header manually, for
instance if it has to be conditional or based on parameters that "option
forwarded" doesn't provide. To ease this task, in this patch we implement
rfc7239_nn and rfc7239_np which are respectively meant to craft nodename:
nodeport values, specifically intended to manually build rfc7239 'for'
and 'by' header fields while ensuring rfc7239 compliancy.
Example:
# build RFC-compliant 7239 header:
http-request set-var-fmt(txn.forwarded) "for=\"%[ipv6(::1),rfc7239_nn]:%[str(8888),rfc7239_np]\";host=\"haproxy.org\";proto=http"
# check RFC-compliancy:
http-request set-var(txn.test) "var(txn.forwarded),debug(ok,stderr),rfc7239_is_valid,debug(ok,stderr)"
# stderr output:
# [debug] ok: type=str <for="[::1]:_8888";host="haproxy.org";proto=http>
# [debug] ok: type=bool <1>
See documentation for more info and examples.
Released version 3.1-dev10 with the following main changes :
- BUG/MAJOR: mux-quic: do not crash on empty STREAM frame emission
- BUG/MINOR: stats: Fix the name for the total number of streams created
- MINOR: quic: strengthen qc_release_frm()
- MEDIUM: quic: decount acknowledged data for MUX txbuf window
- MINOR: quic: implement dedicated type for out-of-order stream ACK
- MEDIUM: quic: merge contiguous/overlapping buffered ack stream range
- MEDIUM: quic: decount out-of-order ACK data range for MUX txbuf window
- MINOR: log: add do_log() logging helper
- MINOR: log: add do_log_parse_act() helper func
- MINOR: action: add do-log action
- REGTESTS: add some tests for 'do-log' action
- BUG/MEDIUM: hlua: make hlua_ctx_renew() safe
- BUG/MEDIUM: hlua: properly handle sample func errors in hlua_run_sample_{fetch,conv}()
- BUG/MINOR: quic: fix discarding of already stored out-of-order ACK
- BUG/MEDIUM: quic: properly decount out-of-order ACK on stream release
- MINOR: ssl: disable server side default CRL check with WolfSSL
- MEDIUM: sink: implement sink_find_early()
- MINOR: trace: postresolve sink names
- MINOR: sample: postresolve sink names in debug() converter
- BUG/MEDIUM: mux-quic: ensure timeout server is active for short requests
- MINOR: cfgparse: simulate long configuration parsing with force-cfg-parser-pause
- BUILD: cache: silence an uninitialized warning at -Og with gcc-12.2
- BUG/MINOR: mux-h2/traces: present the correct buffer for trailers errors traces
- MINOR: mux-h2/traces: print the size of the DATA frames
- CLEANUP: muxes: remove useless inclusion of ebmbtree.h
- REORG: buffers: move some of the heavy functions from buf.h to buf.c
- MINOR: buffer: add a buffer list type with functions
- MINOR: mux-h2: split the amount of rx data from the amount to ack
- MINOR: mux-h2: create and initialize an rx offset per stream
- MEDIUM: mux-h2: start to update stream when sending WU
- MEDIUM: mux-h2: start to introduce the window size in the offset calculation
- MINOR: mux-h2: count within a connection, how many streams are receiving data
- MINOR: mux-h2: allocate the array of shared rx bufs in the h2c
- MINOR: mux-h2: add rxbuf head/tail/count management for h2s
- MINOR: mux-h2: move H2_CF_WAIT_IN_LIST flag away from the demux flags
- MINOR: mux-h2: simplify the exit code in h2_rcv_buf()
- MINOR: mux-h2: simplify the wake up code in h2_rcv_buf()
- MINOR: mux-h2: clear up H2_CF_DEM_DFULL and H2_CF_DEM_SHORT_READ ambiguity
- MAJOR: mux-h2: make streams use the connection's buffers
- MAJOR: mux-h2: permit a stream to allocate as many buffers as desired
- MAJOR: mux-h2: make the rxbuf allocation algorithm a bit smarter
- MINOR: mux-h2: add tune.h2.be.rxbuf and tune.h2.fe.rxbuf global settings
- MEDIUM: mux-h2: change the default initial window to 16kB
- DOC: design-thoughts: add diagrams illustrating an rx win groth
- MEDIUM: mux-h2: rework h2_restart_reading() to differentiate recv and demux
- OPTIM: mux-h2: make h2_send() report more accurate wake up conditions
- OPTIM: mux-h2: try to continue reading after demuxing when useful
- OPTIM: mux-h2: use tasklet_wakeup_after() in h2s_notify_recv()
- MINOR: mux-h2/traces: add missing flags and proxy ID in traces
- MINOR: mux-h2/traces: add buffer-related info to h2s and h2c
- CI: cirrus-ci: bump FreeBSD image to 14-1
- REGTESTS: fix a reload race in abns_socket.vtc
- MINOR: activity/memprofile: always return "other" bin on NULL return address
- MINOR: quic: notify connection layer on handshake completion
- BUG/MINOR: stream: unblock stream on wait-for-handshake completion
- BUG/MEDIUM: quic: support wait-for-handshake
- BUG/MEDIUM: server: server stuck in maintenance after FQDN change
- BUG/MEDIUM: queue: make sure never to queue when there's no more served conns
- DEBUG: mux-h2/flags: add H2_CF_DEM_RXBUF & H2_SF_EXPECT_RXDATA for the decoder
- REGTESTS: cli: add delay 0.1 before connect to cli
- MINOR: startup: add O_CLOEXEC flag to open /dev/null
- MEDIUM: startup: move daemonization fork in init
- MINOR: startup: refactor "daemonization" fork
- MEDIUM: startup: move PID handling in init()
- MAJOR: mworker: move master-worker fork in init()
- BUG/MINOR: mworker: fix memory leak due to master-worker fork
- REORG: mworker: set nbthread=1 for master after fork
- MINOR: init: check MODE_MWORKER before creating master CLI
- REORG: mworker: move mworker_create_master_cli in master 'case'
- MEDIUM: startup: call chroot() if needed in one place
- MEDIUM: startup: do set_identity() if needed in one place
- MINOR: startup: only worker gets capabilities from bin
- CLEANUP: haproxy: rm no longer used mworker_reexec_waitmode
- MINOR: startup: rename exit_on_waitmode_failure to exit_on_failure
- MINOR: defaults: update MASTER_MAXCONN description
- MEDIUM: startup: remove MODE_MWORKER_WAIT
- MINOR: global: add MODE_DISCOVERY flag
- MEDIUM: cfgparse: add KWF_DISCOVERY keyword flag
- MEDIUM: cfgparse: call some parsers only in MODE_DISCOVERY
- MEDIUM: cfgparse-global: parse only KWF_DISCOVERY keywords in MODE_DISCOVERY
- MEDIUM: cfgparse: parse only "global" section in MODE_DISCOVERY
- MEDIUM: startup: introduce load_cfg and read_cfg
- MINOR: cfgparse: fix *thread keywords sensitive to global section position
- MINOR: mworker/cli: rename mworker_cli_proxy_new_listener
- MINOR: mworker/cli: rename and clean mworker_cli_sockpair_new
- MINOR: mworker/cli: create master CLI sockpair before fork
- MINOR: mworker/cli: create MASTER proxy before mcli listeners
- MINOR: mworker: add and set state PROC_O_INIT for new worker
- MEDIUM: mworker/cli: close child and parent fds, setup listeners
- MINOR: mworker: mworker_catch_sigchld: use fd_delete instead of close
- MINOR: startup: rename and adapt reexec_on_failure
- MINOR: mworker: add support for case when new worker dies
- MINOR: mworker: simplify the code that sets PROC_O_LEAVING
- MINOR: mworker/cli: add _send_status to support state transition
- MEDIUM: startup: split sending oldpids_sig logic for standalone and mworker modes
- MINOR: startup: split init() into separate initialization routines
- MINOR: startup: split main: add step_init_3
- MINOR: startup: simplify check for calling sock_get_old_sockets
- MINOR: startup: encapsulate sock_get_old_sockets in a function
- MINOR: startup: add bind_listeners
- MINOR: startup: split main: add step_init_4
- MINOR: startup: encapsulate master's code in run_master
- MINOR: startup: add read_cfg_in_discovery_mode
- MINOR: mworker: adapt exit_on_failure for master recovery mode
- MEDIUM: mworker: add support of master recovery mode
- MINOR: startup: add set_verbosity
- MEDIUM: mworker: block reloads
- MINOR: mworker: slow load status delivery if worker is starting
- MINOR: mworker: readapt program support in mworker_catch_sigchld
- MINOR: mworker: deserialize process list before read_cfg_in_discovery_mode
- MINOR: mworker: parse program only in MODE_DISCOVERY
- MINOR: cfgparse: add support for program section
- MINOR: startup: reintroduce program support
- MINOR: mworker-prog: stop old programs in mworker_ext_launch_all
- MINOR: mworker: reintroduce systemd support
- MINOR: mworker: report explicitly when worker exits due to max reloads
- MINOR: cfgparse-global: parse *env keywords in MODE_DISCOVERY
- MINOR: startup: reintroduce *env keywords support
- MINOR: startup: close devnullfd, when daemon mode is applied
Let's just see on a diagram how the receiver can detect that the
window is large enough for the remote sender to fill the link. Here
it seems that a first criterion is that data are accumulating in
the rxbuf, indicating that the next hop doesn't consume them fast
enough. On the diagram it's visible when blue arrows (incoming data)
are more frequent than the magenta ones on average (outgoing data),
which happens when silence moments are less frequent and don't allow
the reader to catch up. It's also visible that there are two phases
alternating in the transfer:
- measure round trip time (i.e. how long it takes to restart
sending after a WU was sent after a long silence)
- measure the lowest rxbuf size during the previous round trip
It's worth noting that a window size change only has *observable* effect
after two RTT: the first RTT is to restart sending (opening or enlarging
the window), the second RTT to measure the lowest rxbuf size over the
period.
By turning the advertised window into an offset and comparing it to
the received quantity, it's possible to measure the RTT of the whole
chain (including the client possibly producing the data). Note that
when multiple streams compete for BW this can become tricky. Limiting
the window to available buffers and counting the number of sending
streams on a connection could work (i.e. split total buffers into
1+#senders, first one being used for tx).
Now that we're using all available rx buffers for transfers, there's
no point anymore in advertising more than the minimum value we can
safely buffer. Let's be conservative and only rely on the dynamic
buffers to improve speed beyond the configured value, and make sure
than many streams will no longer cause unfairness.
Interestingly, the total number of wakeups has further shrunk down, but
with a different distribution. From 128k for 1000 1M transfers, it went
down to 119k, with 96k from restart_reading, 10k from done_ff and 2.6k
from snd_buf. done_ff went up by 30% and restart_reading went down by
30%.
These settings allow to change the total buffer size allocated to the
backend and frontend respectively. This way it's no longer necessary to
play with tune.bufsize nor increase the number of streams to benefit from
more buffers.
Setting tune.h2.fe.rxbuf to 4m to match a sender's max tcp_wmem resulted
in 257 Mbps for a single stream at 103ms vs 121 Mbps default (or 5.1 Mbps
with a single buffer and 64kB window).
The buffer ring is problematic in multiple aspects, one of which being
that it is only usable by one entity. With multiplexed protocols, we need
to have shared buffers used by many entities (streams and connection),
and the only way to use the buffer ring model in this case is to have
each entity store its own array, and keep a shared counter on allocated
entries. But even with the default 32 buf and 100 streams per HTTP/2
connection, we're speaking about 32*101*32 bytes = 103424 bytes per H2
connection, just to store up to 32 shared buffers, spread randomly in
these tables. Some users might want to achieve much higher than default
rates over high speed links (e.g. 30-50 MB/s at 100ms), which is 3 to 5
MB storage per connection, hence 180 to 300 buffers. There it starts to
cost a lot, up to 1 MB per connection, just to store buffer indexes.
Instead this patch introduces a variant which we call a buffer list.
That's basically just a free list encoded in an array. Each cell
contains a buffer structure, a next index, and a few flags. The index
could be reduced to 16 bits if needed, in order to make room for a new
struct member. The design permits initializing a whole freelist at once
using memset(0).
The list pointer is stored at a single location (e.g. the connection)
and all users (the streams) will just have indexes referencing their
first and last assigned entries (head and tail). This means that with
a single table we can now have all our buffers shared between multiple
streams, irrelevant to the number of potential streams which would want
to use them. Now the 180 to 300 entries array only costs 7.2 to 12 kB,
or 80 times less.
Two large functions (bl_deinit() & bl_get()) were implemented in buf.c.
A basic doc was added to explain how it works.
This command is pausing the configuration parser for <timeout>
milliseconds. This is useful for development or for testing timeouts of
init scripts, particularly to simulate a very long reload. It requires
the expose-experimental-directives to be set.
A previous known limitation about traces was that parsing was performed on
the fly, meaning that when using "sink" keyword, only sinks that were
either internal or previously defined in the config could be used. Indeed,
it was not possible to use a ring section defined AFTER the traces section
when using the 'sink' keyword from traces.
This limitation was also mentioned in the config file.
Let's get rid of that limitation by implementing proper postparsing for
the sink parameter in traces section. To do this, make use of the new
sink_find_early() helper to start referencing sink by their names even
if they don't exist yet (if they are about to be defined later in the
config)
Traces commands on the cli are not concerned by this change.
Thanks to the two previous commits, we can now expose the do-log action
on all available action contexts, including the new quic-init context.
Each context is responsible for exposing the do-log action by registering
the relevant log steps, saving the idendifier, and then store it in the
rule's context so that do_log_action() automatically uses it to produce
the log during runtime.
To use the feature, it is simply needed to use "do-log" (without argument)
on an action directive, example:
tcp-request connection do-log
As mentioned before, each context where the action is exposed has its own
log step identifier. Currently known identifiers are:
quic-initial: quic-init
tcp-request connection: tcp-req-conn
tcp-request session: tcp-req-sess
tcp-request content: tcp-req-cont
tcp-response content: tcp-res-cont
http-request: http-req
http-response: http-res
http-after-response: http-after-res
Thus, these "additional" logging steps can be used as-is under log-profile
section (after "on" keyword). However, although the parser will accept
them, it makes no sense to use them with the "log-steps" proxy keyword,
since the only path for these origins to trigger a log generation is
through the explicit use of "do-log" action.
This need was described in GH #401, it should help to conditionally
trigger logs using ACL at specific key points.. and may either be used
alone or combined with "log-steps" to add additional log "trackers" during
transaction handling.
Documentation was updated and some examples were added.
Released version 3.1-dev9 with the following main changes :
- MINOR: tools: add minimal file name management
- CLEANUP: stick-table: make the file location point to a global file name
- MINOR: proxy: use the global file names for conf->file
- CLEANUP: cfgparse: factor proxy vs log-forward collisions
- BUG/MINOR: cfgparse: detect another uncaught case of duplicate defaults
- MINOR: proxy: add a list of orphaned defaults sections
- MEDIUM: cfgparse: drop duplicate named defaults sections after use
- OPTIM: cfgparse: speed up duplicate server detection
- MEDIUM: cfgparse: warn about deprecated use of duplicate server names
- BUG/MINOR: server: shut down streams under thread isolation
- BUG/MINOR: proxy: also make the cli and resolvers use the global name
- REGTESTS: log: fix log-profile.vtc
- MEDIUM: mailers: warn about deprecated legacy mailers
- BUG/MEDIUM: cli: Be sure to catch immediate client abort
- DEV: flags/applet: decode appctx flags
- BUG/MEDIUM: cli: Deadlock when setting frontend maxconn
- MINOR: log: fix indent in strm_log()
- MINOR: log: introduce extra log profile steps
- MINOR: log: handle extra log origins in _process_send_log_override()
- MINOR: log: introduce log_orig flags
- MINOR: log: explicitly handle extra log origins as error when relevant
- MINOR: log: support extra log origins for '%OG' alias
- MINOR: proxy: add log_steps struct member
- MINOR: log: introduce "log-steps" proxy keyword
- MINOR: log: add log_orig_proxy() helper function
- MEDIUM: log: consider log-steps proxy setting for existing log origins
- DOC: config: document proxy "log-steps" keyword
- REGTESTS: add a test for proxy "log-steps"
- Revert "BUG/MINOR: server: shut down streams under thread isolation"
- MINOR: task: define two new one-shot events for use with WOKEN_OTHER or MSG
- BUG/MEDIUM: stream: make stream_shutdown() async-safe
- BUG/MINOR: server: make sure the HMAINT state is part of MAINT
- BUG/MINOR: queue: make sure that maintenance redispatches server queue
- MINOR: server: make srv_shutdown_sessions() call pendconn_redistribute()
- BUILD: tools: only include execinfo.h for the real backtrace() function
- MINOR: tools: do not attempt to use backtrace() on linux without glibc
- OPTIM: channel: speed up co_getline()'s search of the end of line
- OPTIM: stconn: Don't pretend mux have more data to deliver on EOI/EOS/ERROR
- BUG/MINOR: mcli: Pretend the mux have more data to deliver between two commands
- MINOR: action: Export release_expr_int_action() release function
- MINOR: stream: Rely on a per-stream max connection retries value
- MINOR: stream: Support dynamic changes of the number of connection retries
- MINOR: stream/stats: Expose the current number of streams in stats
- MINOR: stream/stats: Expose the total number of streams ever created in stats
- BUG/MINOR: cfgparse-global: fix allowed args number for setenv
- MINOR: cfgparse-global: add dedicated parser for *env keywords
- MINOR: mux-quic: complete Tx infos for QCS dump
- MINOR: quic: ensure txbuf realloc is only performed on empty buffer
- MINOR: mux-quic: strengthen qcs_send_metadata() usage
- MINOR: quic: remove unneeded notification of txbuf room
- MINOR: quic: refactor MUX send notification
- MEDIUM: quic: strengthen MUX send notification
- MINOR: quic: refactor STREAM room notification
- MINOR: quic: do not remove qc_stream_desc automatically on ACK handling
- MINOR: quic: store streambuf in a streamdesc tree
- MINOR: quic: move buffered ACK to streambuf
- MEDIUM: quic: handle out-of-order ACK at streamdesc layer
- MEDIUM: quic: refactor buffered STREAM ACK consuming
- BUG/MEDIUM: queue: always dequeue the backend when redistributing the last server
- MINOR: config/trace: Add a 'traces' section to declare debug traces
- MINOR: trace: Be able to chain commands for a source in one line
- MINOR: tcpcheck: Add support for an option host header value for httpchk option
- BUG/MINOR: mux-h1: Fix condition to set EOI on SE during zero-copy forwarding
- MINOR: mux-h1: Use a dedicated function to conditionnaly set EOI flag on SE
- BUG/MINOR: http-ana: Disable fast-fwd for unfinished req waiting for upgrade
- BUG/MINOR: mux-quic: fix crash on qcc_init() early return
- BUG/MINOR: quic: fix trace on releasing STREAM frame after ack
Support for headers and body hidden in the version for the "option httpchk"
directive was removed. However a Host header is mandatory for HTTP/1.1
requests and some servers may return an error if it is not set. For now, to
add it, an "http-check send" rule must be added. But it is not really handy
to use an extra config line for this purpose.
So now, it is possible to set the host header value, a log-format string, as
extra argument to "option httpchk" directive. It must be the fourth argument:
option httpchk GET / HTTP/1.1 www.srv.com
While this patch is not a bug fix, it is simple enough to be backported if
necessary. On 2.9 and older, lf_init_expr() does not exist and LIST_INIT() must
be used instead.
In the configuration file or on the CLI, configuring traces for a specific
source is a bit painful because this must be done in several lines. Thanks
to this patch, it is now possible to fully configure traces for a source in
one line. For instance, the following on the CLI:
trace h1 sink stderr; trace h1 level developer; trace h1 verbosity complete; trace h1 start now
can now be replaced by:
trace h1 sink stderr level developer verbosity complete start now
The same is true for the 'trace' directives in the configuration file.
It is no longer supported to declare debug traces, via 'trace' directive, in
a global section. A 'traces' directive must be used instead. The syntax of
the 'trace' directive in these sections remains the same. But it is no
longer experimental.
The main reason for this change is to avoid to have a ring section defined
before a global one. Indeed, for now, forward declarations of ring sections
are not supported. So to configure traces, you had to add a ring section
before the global one defining the traces. Most of time, that meant to have
two global sections :
global
[...] # global settings
ring <name>
[...]
global
[...] # trace config
In addition, it will be possible to easily extend the traces section by
adding some new directives.
Thanks to the previous patch, it is now possible to add an action to
dynamically change the maxumum number of connection retires for a stream.
"set-retries" action may now be used to do so, from a "tcp-request content"
or a "http-request" rule. This action accepts an expression or an integer
between 0 and 100. The integer value is checked during the configuration
parsing and leads to an error if it is not in the expected range. However,
for the expression, the value is retrieve at runtime. So, invalid value are
just ignored.
Too high value is forbidden to avoid any trouble. 100 retries seems already
be an amazingly hight value. In addition, the option is only available on
backend or listen sections.
Because the max retries is limited to 100 at most, it can be stored as a
unsigned short. This save some space in the stream structure.
As mentioned in 2.8 announce on the mailing list [1] and on the wiki [2],
use of legacy mailers is now deprecated and will not be supported anymore
starting with version 3.3. Use of Lua script (AKA Lua mailers) is now
encouraged (and fully supported since 2.8) for this purpose, as it offers
more flexibility (e.g: alerts can be customized) and is more future-proof.
Configurations relying on legacy mailers will now raise a warning.
Users willing to keep their existing mailers config in a working state
should simply add the following line to their global section:
# mailers.lua file as provided in the git repository
# adjust path as needed
lua-load examples/lua/mailers.lua
[1]: https://www.mail-archive.com/haproxy@formilux.org/msg43600.html
[2]: https://github.com/haproxy/wiki/wiki/Breaking-changes
Released version 3.1-dev8 with the following main changes :
- DOC: configuration: place the HAPROXY_HTTP_LOG_FMT example on the correct line
- MINOR: mux-h1: Set EOI on SE during demux when both side are in DONE state
- BUG/MEDIUM: mux-h1/mux-h2: Reject upgrades with payload on H2 side only
- REGTESTS: h1/h2: Update script testing H1/H2 protocol upgrades
- BUG/MEDIUM: clock: detect and cover jumps during execution
- BUG/MINOR: pattern: prevent const sample from being tampered in pat_match_beg()
- BUG/MEDIUM: pattern: prevent uninitialized reads in pat_match_{str,beg}
- BUG/MEDIUM: pattern: prevent UAF on reused pattern expr
- MEDIUM: ssl/cli: "dump ssl cert" allow to dump a certificate in PEM format
- BUG/MAJOR: mux-h1: Wake SC to perform 0-copy forwarding in CLOSING state
- BUG/MINOR: h1-htx: Don't flag response as bodyless when a tunnel is established
- REGTESTS: fix random failures with wrong_ip_port_logging.vtc under load
- BUG/MINOR: pattern: do not leave a leading comma on "set" error messages
- REGTESTS: shorten a bit the delay for the h1/h2 upgrade test
- MINOR: server: allow init-state for dynamic servers
- DOC: server: document what to check for when adding new server keywords
- MEDIUM: h1: Accept invalid T-E values with accept-invalid-http-response option
- BUG/MINOR: polling: fix time reporting when using busy polling
- BUG/MINOR: clock: make time jump corrections a bit more accurate
- BUG/MINOR: clock: validate that now_offset still applies to the current date
- BUG/MEDIUM: queue: implement a flag to check for the dequeuing
- OPTIM: sample: don't check casts for samples of same type
- OPTIM: vars: remove the unneeded lock in vars_prune_*
- OPTIM: vars: inline vars_prune() to avoid many calls
- MINOR: vars: remove the emptiness tests in callers before pruning
- IMPORT: import cebtree (compact elastic binary trees)
- OPTIM: vars: use a cebtree instead of a list for variable names
- OPTIM: vars: use multiple name heads in the vars struct
- BUG/MINOR: peers: local entries updates may not be advertised after resync
- DOC: config: Explicitly list relaxing rules for accept-invalid-http-* options
- MINOR: proxy: Rename accept-invalid-http-* options
- DOC: configuration: Remove dangerous directives from the proxy matrix
- BUG/MEDIUM: sc_strm/applet: Wake applet after a successfull synchronous send
- BUG/MEDIUM: cache/stats: Wait to have the request before sending the response
- BUG/MEDIUM: promex: Wait to have the request before sending the response
- MINOR: clock: test all clock_gettime() return values
- MEDIUM: clock: collect the monotonic time in clock_local_update_date()
- MEDIUM: clock: opportunistically use CLOCK_MONOTONIC for the internal time
- MEDIUM: clock: use the monotonic clock for idle time calculation
- MEDIUM: clock: don't compute before_poll when using monotonic clock
- BUG/MINOR: fix missing "log-format overrides previous 'option tcplog clf'..." detection
- BUG/MINOR: fix missing "'option httpslog' overrides previous 'option tcplog clf'..." detection
- BUG/MINOR: cfgparse-listen: fix option httpslog override warning message
- BUG/MINOR: cfgparse: detect incorrect overlap of same backend names
- MEDIUM: cfgparse: warn about proxies having the same names
- DOC: management: add init-state to add server keywords
- BUG/MINOR: mux-quic: report glitches to session
- BUILD: cebtree: silence a bogus gcc warning on impossible code paths
- MEDIUM: cfgparse: warn about colliding names between defaults and proxies
- MEDIUM: cfgparse: detect collisions between defaults and log-forward
For now, that only concerns accept-invalid-http-{request/response} and
accept-unsafe-violations-in-http-{request/response}. But the idea is to make
dangerous directives hard to find. It is one more way to discourage anyone
to use it. And, optionnaly, it is also handy because it keeps the matrix
aligned on 80 columns.
With these options, it is possible to accept some invalid messages that may
considered as unsafe and may result as vulnerabilities. The naming is not
explicit enough on this point. These option must really be considered as
dangerous and only used as a temporary workaround. Unfortunately, when used,
it is probably because there are some legacy and unsupported applications in
place. Nevermind. The documentation warns about the use of these
options. Now the name of the options itself is a warning.
So now, "accept-invalid-http-request" and "accept-invalid-http-response"
options are deprecated and replaced by
"accept-unsafe-violations-in-http-request" and
"accept-unsafe-violations-in-http-response" options.
Time to time, new exceptions are added in the HTTP parsing (most of time H1)
to not reject some invalid messages sent by legacy applications. But the
documentation of accept-invalid-http-request and
accept-invalid-http-response options is not pretty clear. So, now, there is
an explicit list of relaxing rules for both options.
Commit 50322df introduced the init-state keyword, but it didn't enable
it for dynamic servers. However, this feature is perfectly desirable
for virtual servers too, where someone would like a server inlived
through "set server be1/srv1 state ready" to be put out of maintenance
in down state until the next health check succeeds.
At reading the code, it seems that it's only a matter of allowing this
keyword for dynamic servers, as current code path calls
srv_adm_set_ready() which incidentally triggers a call to
_srv_update_status_adm().
The new "dump ssl cert" CLI command allows to dump a certificate stored
into HAProxy memory. Until now it was only possible to dump the
description of the certificate using "show ssl cert", but with this new
command you can dump the PEM content on the filesystem.
This command is only available on a admin stats socket.
$ echo "@1 dump ssl cert cert.pem" | socat /tmp/master.sock -
-----BEGIN PRIVATE KEY-----
[...]
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
When HAPROXY_HTTP_LOG_FMT was added by commit 537b9e7f36 ("MINOR: config:
add environment variables for default log format"), the example was placed
by accident after the clf log format instead of the HTTP log format,
causing a bit of confusion.
This can be backported to 2.8.
Released version 3.1-dev7 with the following main changes :
- MINOR: config: Created env variables for http and tcp clf formats
- MINOR: mux-quic: add buf_in_flight to QCC debug infos
- MINOR: mux-quic: correct qcc_bufwnd_full() documentation
- MINOR: tools: add helpers to backup/clean/restore env
- MINOR: mworker: restore initial env before wait mode
- BUG/MINOR: haproxy: free init_env in deinit only if allocated
- BUILD: tools: environ is not defined in OS X and BSD
- DEV: coccinelle: add a test to detect unchecked malloc()
- DEV: coccinelle: add a test to detect unchecked calloc()
- CI: QUIC Interop AWS-LC: enable ngtcp2 client
- CI: fix missing comma introduced in 956839c0f6
- CI: QUIC Interop: do not run bandwidth measurement tests
- CI: QUIC Interop: use different artifact names for uploading logs
- BUILD: quic: 32bits build broken by wrong integer conversions for printf()
- CLEANUP: ssl: cleanup the clienthello capture
- MEDIUM: ssl: capture the supported_versions extension from Client Hello
- MEDIUM: ssl/sample: add ssl_fc_supported_versions_bin sample fetch
- MEDIUM: ssl: capture the signature_algorithms extension from Client Hello
- MEDIUM: ssl/sample: add ssl_fc_sigalgs_bin sample fetch
- MINOR: proxy: Add support of 429-Too-Many-Requests in retry-on status
- BUG/MEDIUM: mux-h2: Set ES flag when necessary on 0-copy data forwarding
- BUG/MEDIUM: stream: Prevent mux upgrades if client connection is no longer ready
- BUG/MINIR: proxy: Match on 429 status when trying to perform a L7 retry
- CLEANUP: haproxy: fix typos in code comment
- CLEANUP: mqtt: fix typo in MQTT_REMAINING_LENGHT_MAX_SIZE
- MINOR: tools: Implement ipaddrcpy().
- MINOR: quic: Implement quic_tls_derive_token_secret().
- MINOR: quic: Token for future connections implementation.
- BUG/MINOR: quic: Missing incrementation in NEW_TOKEN frame builder
- MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct)
- MINOR: quic: Implement qc_ssl_eary_data_accepted().
- MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
- BUG/MEDIUM: quic: always validate sender address on 0-RTT
- BUILD: quic: fix build errors on FreeBSD since recent GSO changes
- MINOR: tools: extend str2sa_range to add an alt parameter
- MINOR: server: add a alt_proto field for server
- MEDIUM: sock: use protocol when creating socket
- MEDIUM: protocol: add MPTCP per address support
- BUG/MINOR: quic: Crash from trace dumping SSL eary data status (AWS-LC)
- MEDIUM: stick-table: Add support of a factor for IN/OUT bytes rates
- MEDIUM: bwlim: Use a read-lock on the sticky session to apply a shared limit
- BUG/MEDIUM: mux-pt: Never fully close the connection on shutdown
- BUG/MEDIUM: cli: Always release back endpoint between two commands on the mcli
- BUG/MINOR: quic: unexploited retransmission cases for Initial pktns.
- BUG/MEDIUM: mux-h1: Properly handle empty message when an error is triggered
- MINOR: mux-h2: try to clear DEM_MROOM and MUX_MFULL at more places
- BUG/MAJOR: mux-h2: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf
- BUG/MINOR: mux-spop: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf
- BUG/MINOR: Crash on O-RTT RX packet after dropping Initial pktns
- BUG/MEDIUM: mux-pt: Fix condition to perform a shutdown for writes in mux_pt_shut()
- CLEANUP: assorted typo fixes in the code and comments
- DEV: patchbot: count the number of backported/non-backported patches
- DEV: patchbot: add direct links to show only specific categories
- DEV: patchbot: detect commit IDs starting with 7 chars
- BUG/MEDIUM: clock: also update the date offset on time jumps
- MEDIUM: server: add init-state
Allow the user to set the "initial state" of a server.
Context:
Servers are always set in an UP status by default. In
some cases, further checks are required to determine if the server is
ready to receive client traffic.
This introduces the "init-state {up|down}" configuration parameter to
the server.
- when set to 'fully-up', the server is considered immediately available
and can turn to the DOWN sate when ALL health checks fail.
- when set to 'up' (the default), the server is considered immediately
available and will initiate a health check that can turn it to the DOWN
state immediately if it fails.
- when set to 'down', the server initially is considered unavailable and
will initiate a health check that can turn it to the UP state immediately
if it succeeds.
- when set to 'fully-down', the server is initially considered unavailable
and can turn to the UP state when ALL health checks succeed.
The server's init-state is considered when the HAProxy instance
is (re)started, a new server is detected (for example via service
discovery / DNS resolution), a server exits maintenance, etc.
Link: https://github.com/haproxy/haproxy/issues/51
Add a factor parameter to stick-tables, called "brates-factor", that is
applied to in/out bytes rates to work around the 32-bits limit of the
frequency counters. Thanks to this factor, it is possible to have bytes
rates beyond the 4GB. Instead of counting each bytes, we count blocks
of bytes. Among other things, it will be useful for the bwlim filter, to be
able to configure shared limit exceeding the 4GB/s.
For now, this parameter must be in the range ]0-1024].
Multipath TCP (MPTCP), standardized in RFC8684 [1], is a TCP extension
that enables a TCP connection to use different paths.
Multipath TCP has been used for several use cases. On smartphones, MPTCP
enables seamless handovers between cellular and Wi-Fi networks while
preserving established connections. This use-case is what pushed Apple
to use MPTCP since 2013 in multiple applications [2]. On dual-stack
hosts, Multipath TCP enables the TCP connection to automatically use the
best performing path, either IPv4 or IPv6. If one path fails, MPTCP
automatically uses the other path.
To benefit from MPTCP, both the client and the server have to support
it. Multipath TCP is a backward-compatible TCP extension that is enabled
by default on recent Linux distributions (Debian, Ubuntu, Redhat, ...).
Multipath TCP is included in the Linux kernel since version 5.6 [3]. To
use it on Linux, an application must explicitly enable it when creating
the socket. No need to change anything else in the application.
This attached patch adds MPTCP per address support, to be used with:
mptcp{,4,6}@<address>[:port1[-port2]]
MPTCP v4 and v6 protocols have been added: they are mainly a copy of the
TCP ones, with small differences: names, proto, and receivers lists.
These protocols are stored in __protocol_by_family, as an alternative to
TCP, similar to what has been done with QUIC. By doing that, the size of
__protocol_by_family has not been increased, and it behaves like TCP.
MPTCP is both supported for the frontend and backend sides.
Also added an example of configuration using mptcp along with a backend
allowing to experiment with it.
Note that this is a re-implementation of Björn's work from 3 years ago
[4], when haproxy's internals were probably less ready to deal with
this, causing his work to be left pending for a while.
Currently, the TCP_MAXSEG socket option doesn't seem to be supported
with MPTCP [5]. This results in a warning when trying to set the MSS of
sockets in proto_tcp:tcp_bind_listener.
This can be resolved by adding two new variables:
sock_inet(6)_mptcp_maxseg_default that will hold the default
value of the TCP_MAXSEG option. Note that for the moment, this
will always be -1 as the option isn't supported. However, in the
future, when the support for this option will be added, it should
contain the correct value for the MSS, allowing to correctly
set the TCP_MAXSEG option.
Link: https://www.rfc-editor.org/rfc/rfc8684.html [1]
Link: https://www.tessares.net/apples-mptcp-story-so-far/ [2]
Link: https://www.mptcp.dev [3]
Link: https://github.com/haproxy/haproxy/issues/1028 [4]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/515 [5]
Co-authored-by: Dorian Craps <dorian.craps@student.vinci.be>
Co-authored-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
The "429" status can now be specified on retry-on directives. PR_RE_* flags
were updated to remains sorted.
This patch should fix the issue #2687. It is quite simple so it may safely
be backported to 3.0 if necessary.