The buffer reserve set by tune.buffers.reserve has long been unused, and
in order to deal gracefully with failed memory allocations we'll need to
resort to a few emergency buffers that are pre-allocated per thread.
These buffers are only for emergency use, so every time their count is
below the configured number a b_free() will refill them. For this reason
their count can remain pretty low. We changed the default number from 2
to 4 per thread, and the minimum value is now zero (e.g. for low-memory
systems). The tune.buffers.limit setting has always been a problem when
trying to deal with the reserve but now we could simplify it by simply
pushing the limit (if set) to match the reserve. That was already done in
the past with a static value, but now with threads it was a bit trickier,
which is why the per-thread allocators increment the limit on the fly
before allocating their own buffers. This also means that the configured
limit is saner and now corresponds to the regular buffers that can be
allocated on top of emergency buffers.
At the moment these emergency buffers are not used upon allocation
failure. The only reason is to ease bisecting later if needed, since
this commit only has to deal with resource management.
The goal is to indicate how critical the allocation is, between the
least one (growing an existing buffer ring) and the topmost one (boot
time allocation for the life of the process).
The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function
to try to allocate otherwise subscribe. There's currently no distinction
of direction nor part that tries to allocate, and this should be revisited
to improve this situation, particularly when we consider that mux-h2 can
reduce its Tx allocations if needed.
For now, 4 main levels are planned, to translate how the data travels
inside haproxy from a producer to a consumer:
- MUX_RX: buffer used to receive data from the OS
- SE_RX: buffer used to place a transformation of the RX data for
a mux, or to produce a response for an applet
- CHANNEL: the channel buffer for sync recv
- MUX_TX: buffer used to transfer data from the channel to the outside,
generally a mux but there can be a few specificities (e.g.
http client's response buffer passed to the application,
which also gets a transformation of the channel data).
The other levels are a bit different in that they don't strictly need to
allocate for the first two ones, or they're permanent for the last one
(used by compression).
At the beginning of the filter class section, we encourage the user to
check out filters.txt file to get to know how the filters API works
within haproxy.
However the file location is incorrect. The proper directory to look for
the file is: doc/internals/api.
It should be backported up to 2.5.
Thanks to the previous fix, it is now possible to get the number of opened
streams for a connection and the negociated limit. Here, corresponding
sample feches are added, in fc_ and bc_ scopes.
On frontend side, the limit of streams is imposed by HAProxy. But on the
backend side, the limit is defined by the server. it may be useful for
debugging purpose because it may explain slow-downs on some processing.
It is now possible to retrieve some info about the abort received for a
server or a client stream, if any.
* fs.aborted and bs.aborted can be used to know if an abort was received
on frontend or backend side. A boolean is returned.
* fs.rst_code and bs.rst_code return the code of the received RESET_STREAM
frame for a H2 stream or the code of the received STOP_SENDING frame for
a QUIC stream. In both cases, the error code attached to the frame is
returned. The sample fetch fails if no such frame was received or if the
stream is not an H2/QUIC stream.
Released version 3.0-dev10 with the following main changes :
- BUG/MEDIUM: cache: Vary not working properly on anything other than accept-encoding
- REGTESTS: cache: Add test on 'vary' other than accept-encoding
- BUG/MINOR: stats: replace objt_* by __objt_* macros
- CLEANUP: tools/cbor: rename cbor_encode_ctx struct members
- MINOR: log/cbor: _lf_cbor_encode_byte() explicitly requires non-NULL ctx
- BUG/MINOR: log: fix global lf_expr node options behavior
- CLEANUP: log: add a macro to know if a lf_node is configurable
- MINOR: httpclient: allow to use absolute URI with new flag HC_F_HTTPROXY
- MINOR: ssl: introduce ocsp_update.http_proxy for ocsp-update keyword
- BUG/MINOR: log/encode: consider global options for key encoding
- BUG/MINOR: log/encode: fix potential NULL-dereference in LOGCHAR()
- BUG/MINOR: log: fix global lf_expr node options behavior (2nd try)
- MINOR: log/cbor: _lf_cbor_encode_byte() explicitly requires non-NULL ctx (again)
- BUG/MEDIUM: log: don't ignore disabled node's options
- BUG/MINOR: stconn: don't wake up an applet waiting on buffer allocation
- MINOR: sock: rename sock to sock_fd in sock_create_server_socket
- MEDIUM: proto_uxst: take in account server namespace
- MEIDUM: unix sock: use my_socketat to create bind socket
- MINOR: sock_set_mark: take sock family in account
- MEDIUM: proto: make common fd checks in sock_create_server_socket
- MINOR: sock: add EPERM case in sock_handle_system_err
- MINOR: capabilities: add cap_sys_admin support
- CLEANUP: ssl: clean the includes in ssl_ocsp.c
- CLEANUP: ssl: move the global ocsp-update options parsing to ssl_ocsp.c
- MINOR: stats: fix visual alignment for stat_cols_px definition
- MINOR: stats: convert req_tot as generic column
- MINOR: stats: prepare stats-file support for values other than FN_COUNTER
- MINOR: counters: move freq-ctr from proxy/server into counters struct
- MINOR: stats: support rate in stats-file
- MINOR: stats: convert rate as generic column for proxy stats
- MINOR: counters: move last_change into counters struct
- MINOR: stats: support age in stats-file
- MINOR: stats: convert age as generic column for proxy stat
- CLEANUP: ssl: rename new_ckch_store_load_files_path() to ckch_store_new_load_files_path()
- MINOR: ssl: rename ocsp_update.http_proxy into ocsp-update.httpproxy
- REORG: stats: define stats-proxy source module
- MINOR: stats: extract proxy clear-counter in a dedicated function
- REGTESTS: stats: add test stats-file counters preload
- CI: netbsd: adjust packages after NetBSD-10 released
- CLEANUP: assorted typo fixes in the code and comments
- REGTESTS: replace REQUIRE_VERSION by version_atleast
- MEDIUM: log: optimizing tmp->type handling in sess_build_logline()
- BUG/MINOR: log: prevent double spaces emission in sess_build_logline()
- OPTIM: log: declare empty buffer as global variable
- OPTIM: log: use thread local lf_buildctx to stop pushing it on the stack
- OPTIM: log: use lf_buildctx's buffer instead of temporary stack buffers
- OPTIM: log: speedup date printing in sess_build_logline() when no encoding is used
If 'namespace' keyword is used in the backend server settings or/and in the
bind string, it means that haproxy process will call setns() to change its
default namespace to the configured one and then, it will create a
socket in this new namespace. setns() syscall requires CAP_SYS_ADMIN
capability in the process Effective set (see man 2 setns). Otherwise, the
process must be run as root.
To avoid to run haproxy as root, let's add cap_sys_admin capability in the
same way as we already added the support for some other network capabilities.
As CAP_SYS_ADMIN belongs to CAP_SYS_* capabilities type, let's add a separate
flag LSTCHK_SYSADM for it. This flag is set, if the 'namespace' keyword was
found during configuration parsing. The flag may be unset only in
prepare_caps_for_setuid() or in prepare_caps_from_permitted_set(), which
inspect process EUID/RUID and Effective and Permitted capabilities sets.
If system doesn't support Linux capabilities or 'cap_sys_admin' was not set
in 'setcap', but 'namespace' keyword is presented in the configuration, we
keep the previous strict behaviour. Process, that has changed uid to the
non-priviledged user, will terminate with alert. This alert invites the user
to recheck its configuration.
In the case, when haproxy will start and run under a non-root user and
'cap_sys_admin' is not set, but 'namespace' keyword is presented, this patch
does not change previous behaviour as well. We'll still let the user to try
its configuration, but we inform via warning, that unexpected things, like
socket creation errors, may occur.
The ocsp_update.http_proxy global option allows to set an HTTP proxy
address which will be used to send the OCSP update request with an
absolute form URI.
Released version 3.0-dev9 with the following main changes :
- BUILD: ssl: use %zd for sizeof() in ssl_ckch.c
- MINOR: backend: use be_counters for health down accounting
- BUG/MINOR: backend: use cum_sess counters instead of cum_conn
- BUG/MINOR: stats: fix stot metric for listeners
- REGTESTS: use -dI for insecure fork by default in the regtest scripts
- MINOR: stats: rename proxy stats
- MINOR: stats: rename ambiguous stat_l and stat_count
- MINOR: stats: rename info stats
- MINOR: stats: use stricter naming stats/field/line
- MINOR: stats: use STAT_F_* prefix for flags
- BUG/MEDIUM: applet: Let's applets decide if they have more data to deliver
- BUILD: stick-tables: silence build warnings when threads are disabled
- MINOR: tools: Rename `ha_generate_uuid` to `ha_generate_uuid_v4`
- MINOR: Add `ha_generate_uuid_v7`
- MINOR: Add support for UUIDv7 to the `uuid` sample fetch
- MEDIUM: shctx: Naming shared memory context
- BUG/MINOR: h1: fix detection of upper bytes in the URI
- MINOR: intops: add a pair of functions to check multi-byte ranges
- TESTS: add a unit test for the multi-byte range checks
- CLEANUP: h1: make use of the multi-byte matching functions
- REGTESTS: ssl: Remove "sleep" calls from ocsp auto update test
- BUG/MEDIUM: peers: Automatically start to learn on local peer
- BUG/MEDIUM: peers: Reprocess peer state after all session shutdowns
- MINOR: peers: Remove unused PEERS_F_RESYNC_REQUESTED flag
- MINOR: peers: Don't set TEACH flags on a peer from the sync task
- MINOR: peers: Use a peer flag to block the applet waiting ack of the sync task
- BUG/MEDIUM: peers: Wait for sync task ack when a resynchro is finished
- MINOR: peers: Remove unused PEERS_F_RESYNC_PROCESS flag
- MINOR: applet: Add a function to know the side where an applet was created
- MEDIUM: peers: Simplify the peer flags dealing with the connection state
- MEDIUM: peers: Use true states for the peer applets as seen from outside
- MEDIUM: peers: Use true states for the learn state of a peer
- MINOR: peers: Start learning for local peer before receiving messages
- MINOR: peers: Rename PEERS_F_TEACH_COMPLETE to PEERS_F_LOCAL_TEACH_COMPLETE
- MINOR: peers: Reorder and slightly rename PEER flags
- MINOR: peers: Reorder and rename PEERS flags
- REORG: peers: Move peer and peers flags in the corresponding header file
- DEV: flags/peers: Decode PEER and PEERS flags
- MINOR: peers: Add comment on processing functions of the sync task
- MINOR: peers: Use a static variable to wait a resync on reload
- BUG/MEDIUM: peers: Use atomic operations on peers flags when necessary
- REORG: peers: Rename all occurrences to 'ps' variable
- BUG/MINOR: peers: Don't wait for a remote resync if there no remote peer
- MINOR: stats: update ambiguous "metrics" naming to "stat_cols"
- MINOR: stats: introduce a more expressive stat definition method
- MINOR: stats: implement automatic metric generation from stat_col
- MINOR: stats: hide some columns in output
- MEDIUM: stats: convert counters to new column definition
- MINOR: stats: define stats-file output format support
- MEDIUM: stats: implement dump stats-file CLI
- MINOR: ist: define iststrip() new function
- MINOR: guid: define guid_is_valid_fmt()
- MINOR: stats: apply stats-file on process startup
- MINOR: stats: parse header lines from stats-file
- MINOR: stats: parse values from stats-file
- MEDIUM: stats: define stats-file keyword
- BUG/MINOR: mworker: reintroduce way to disable seamless reload with -x /dev/null
- CLEANUP: log: remove unused checks for encode_{chunk,string}
- MINOR: log: store lf_expr nodes inside substruct
- MINOR: log: global lf_expr node options
- CLEANUP: log: simplify complex values usages in sess_build_logline()
- MINOR: log: skip custom logformat_node name if empty
- MINOR: log: add lf_int() wrapper to print integers
- MINOR: log: add lf_rawtext{_len}() functions
- MEDIUM: log: pass date strings to lf_rawtext()
- MEDIUM: log: write raw strings using lf_rawtext()
- MEDIUM: log: use lf_rawtext for lf_ip() and lf_port() hex strings
- MINOR: log: explicitly handle %ts and %tsc as text strings
- MINOR: log: use LOG_VARTEXT_{START,END} to enclose text strings
- MINOR: log: make all lf_* sess build helper static
- MINOR: log: merge lf_encode_string() and lf_encode_chunk() logic
- MEDIUM: log: lf_* build helpers now take a ctx argument
- MINOR: log: expose node typecast in lf_buildctx struct
- MINOR: log: postpone conversion for sample expressions in sess_build_logline()
- MINOR: log: add LOG_OPT_NONE flag
- MINOR: log: add no_escape_map to bypass escape with _lf_encode_bytes()
- MINOR: log: add +bin logformat node option
- MINOR: log: add +json encoding option
- MINOR: tools: add cbor encode helpers
- MINOR: log: add +cbor encoding option
- MINOR: log: support true cbor binary encoding
- CLEANUP: dynbuf: move the reserve and limit parsers to dynbuf.c
- MINOR: list: add a macro to detect that a list contains at most one element
- MINOR: cli/wait: rename the condition "srv-unused" to "srv-removable"
As previously discussed, "srv-unused" is sufficiently ambiguous to cause
some trouble over the long term. Better use "srv-removable" to indicate
that the server is removable, and if the conditions to delete a server
change over time, the wait condition will be adjusted without renaming
it.
CBOR in hex format as implemented in previous commit is convenient because
the produced output is portable and can easily be embedded in regular
syslog payloads.
However, one of the goal of CBOR implementation is to be able to produce
"Concise Binary" object representation. Here is an excerpt from cbor.io
website:
"Some applications also benefit from CBOR itself being encoded in
binary. This saves bulk and allows faster processing."
Currently we don't offer that with '+cbor', quite the opposite actually
since a text string encoded with '+cbor' option will be larger than a
text string encoded with '+json' or without encoding at all, because for
each CBOR binary byte, 2 characters will be emitted.
Hopefully, the sink/log API allows for binary data to be passed as
parameter, this is because all relevant functions in the chain don't rely
on the terminating NULL byte and take a string pointer + string length as
parameter. We can actually rely on this property to support the '+bin'
option when combined with '+cbor' to produce RAW binary CBOR output.
Be careful though, as this is only intended for use with set-var-fmt or to
send binary data to capable UDP/ring endpoints.
Example:
log-format "%{+cbor,+bin}o %(test)[bin(00AABB)]"
Will produce:
bf64746573745f4300aabbffff
(output was piped to `hexdump -ve '1/1 "%.2x"'` to dump raw bytes as HEX
characters)
With cbor.me pretty printer, it gives us:
BF # map(*)
64 # text(4)
74657374 # "test"
5F # bytes(*)
43 # bytes(3)
00AABB # "\u0000\xAA\xBB"
FF # primitive(*)
FF # primitive(*)
In this patch, we make use of the CBOR (RFC8949) encode helper functions
from the previous commit to implement '+cbor' encoding option for log-
formats. The logic behind it is pretty similar to '+json' encoding option,
except that the produced output is a CBOR payload written in HEX format so
that it remains compatible to use this with regular syslog endpoints.
Example:
log-format "%{+cbor}o %[int(4)] test %(named_field)[str(ok)]"
Will produce:
BF6B6E616D65645F6669656C64626F6BFF
Detailed view (from cbor.me):
BF # map(*)
6B # text(11)
6E616D65645F6669656C64 # "named_field"
62 # text(2)
6F6B # "ok"
FF # primitive(*)
If the option isn't set globally, but on a specific node instead, then
only the value will be encoded according to CBOR specification.
Example:
log-format "test cbor bool: %{+cbor}[bool(true)]"
Will produce:
test cbor bool: F5
In this patch, we add the "+json" log format option that can be set
globally or per log format node.
What it does, it that it sets the LOG_OPT_ENCODE_JSON flag for the
current context which is provided to all lf_* log building function.
This way, all lf_* are now aware of this option and try to comply with
JSON specification when the option is set.
If the option is set globally, then sess_build_logline() will produce a
map-like object with key=val pairs for named logformat nodes.
(logformat nodes that don't have a name are simply ignored).
Example:
log-format "%{+json}o %[int(4)] test %(named_field)[str(ok)]"
Will produce:
{"named_field": "ok"}
If the option isn't set globally, but on a specific node instead, then
only the value will be encoded according to JSON specification.
Example:
log-format "{ \"manual_key\": %(named_field){+json}[bool(true)] }"
Will produce:
{"manual_key": true}
When the option is set, +E option will be ignored, and partial numerical
values (ie: because of logasap) will be encoded as-is.
Support '+bin' option argument on logformat nodes to try to preserve
binary output type with binary sample expressions.
For this, we rely on the log/sink API which is capable of conveying binary
data since all related functions don't search for a terminating NULL byte
in provided log payload as they take a string pointer and a string length
as argument.
Example:
log-format "%{+bin}o %[bin(00AABB)]"
Will produce:
00aabb
(output was piped to `hexdump -ve '1/1 "%.2x"'` to dump raw bytes as HEX
characters)
This should be used carefully, because many syslog endpoints don't expect
binary data (especially NULL bytes). This is mainly intended for use with
set-var-fmt actions or with ring/udp log endpoints that know how to deal
with such binary payloads.
Also, this option is only supported globally (for use with '%o'), it will
not have any effect when set on an individual node. (it makes no sense to
have binary data in the middle of log payload that was started without
binary data option)
Since the introduction of the automatic seamless reload using the
internal socketpair, there is no way of disabling the seamless reload.
Previously we just needed to remove -x from the startup command line,
and remove any "expose-fd" keyword on stats socket lines.
This was introduced in 2be557f7c ("MEDIUM: mworker: seamless reload use
the internal sockpairs").
The patch copy /dev/null again and pass it to the next exec so we never
try to get socket from the -x.
Must be backported as far as 2.6.
This commit is the final to implement preloading of haproxy internal
counters via stats-file parsing.
Define a global keyword "stats-file". It allows to specify the path to
the stats-file which will be parsed on process startup.
Define a new CLI command "dump stats-file" with its handler
cli_parse_dump_stat_file(). It will loop twice on proxies_list to dump
first frontend and then backend side. It reuses the common function
stats_dump_stat_to_buffer(), using STAT_F_BOUND to restrict on the
correct side.
A new module stats-file.c is added to regroup function specifics to
stats-file. It defines two main functions :
* stats_dump_file_header() to generate the list of column list prefixed
by the line context, either "#fe" or "#be"
* stats_dump_fields_file() to generate each stat lines. Object without
GUID are skipped. Each stat entry is separated by a comma.
For the moment, stats-file does not support statistics modules. As such,
stats_dump_*_line() functions are updated to prevent looping over stats
module on stats-file output.
Released version 3.0-dev8 with the following main changes :
- BUG/MINOR: cli: Don't warn about a too big command for incomplete commands
- BUG/MINOR: listener: always assign distinct IDs to shards
- BUG/MINOR: log: fix lf_text_len() truncate inconsistency
- BUG/MINOR: tools/log: invalid encode_{chunk,string} usage
- BUG/MINOR: log: invalid snprintf() usage in sess_build_logline()
- CLEANUP: log: lf_text_len() returns a pointer not an integer
- MINOR: quic: simplify qc_send_hdshk_pkts() return
- MINOR: quic: uniformize sending methods for handshake
- MINOR: quic: improve sending API on retransmit
- MINOR: quic: use qc_send_hdshk_pkts() in handshake IO cb
- MEDIUM: quic: remove duplicate hdshk/app send functions
- OPTIM: quic: do not call qc_send() if nothing to emit
- OPTIM: quic: do not call qc_prep_pkts() if everything sent
- BUG/MEDIUM: http-ana: Deliver 502 on keep-alive for fressh server connection
- BUG/MINOR: http-ana: Fix TX_L7_RETRY and TX_D_L7_RETRY values
- BUILD: makefile: warn about unknown USE_* variables
- BUILD: makefile: support USE_xxx=0 as well
- BUG/MINOR: guid: fix crash on invalid guid name
- BUILD: atomic: fix peers build regression on gcc < 4.7 after recent changes
- BUG/MINOR: debug: make sure DEBUG_STRICT=0 does work as documented
- BUILD: cache: fix non-inline vs inline declaration mismatch to silence a warning
- BUILD: debug: make DEBUG_STRICT=1 the default
- BUILD: pools: make DEBUG_MEMORY_POOLS=1 the default option
- CI: update the build options to get rid of unneeded DEBUG options
- BUILD: makefile: get rid of the config CFLAGS variable
- BUILD: makefile: allow to use CFLAGS to append build options
- BUILD: makefile: drop the SMALL_OPTS settings
- BUILD: makefile: move -O2 from CPU_CFLAGS to OPT_CFLAGS
- BUILD: makefile: get rid of the CPU variable
- BUILD: makefile: drop the ARCH variable and better document ARCH_FLAGS
- BUILD: makefile: extract ARCH_FLAGS out of LDFLAGS
- BUILD: makefile: move the fwrapv option to STD_CFLAGS
- BUILD: makefile: make the ERR variable also support 0
- BUILD: makefile: add FAILFAST to select the -Wfatal-errors behavior
- BUILD: makefile: extract -Werror/-Wfatal-errors from automatic CFLAGS
- BUILD: makefile: split WARN_CFLAGS from SPEC_CFLAGS
- BUILD: makefile: rename SPEC_CFLAGS to NOWARN_CFLAGS
- BUILD: makefile: do not pass warnings to VERBOSE_CFLAGS
- BUILD: makefile: also drop DEBUG_CFLAGS
- CLEANUP: makefile: make the output of the "opts" target more readable
- DOC: install: clarify the build process by splitting it into subsections
- BUG/MINOR: server: fix slowstart behavior
- BUG/MEDIUM: cache/stats: Handle inbuf allocation failure in the I/O handler
- MINOR: ssl: add the section parser for 'crt-store'
- DOC: configuration: Add 3.12 Certificate Storage
- REGTESTS: ssl: test simple case of crt-store
- MINOR: ssl: rename ckchs_load_cert_file to new_ckch_store_load_files_path
- MINOR: ssl/crtlist: alloc ssl_conf only when a valid keyword is found
- BUG/MEDIUM: stick-tables: fix the task's next expiration date
- CLEANUP: stick-tables: always respect the to_batch limit when trashing
- BUG/MEDIUM: peers/trace: fix crash when listing event types
- BUG/MAJOR: stick-tables: fix race with peers in entry expiration
- DEBUG: pool: improve decoding of corrupted pools
- REORG: pool: move the area dump with symbol resolution to tools.c
- DEBUG: pools: report the data around the offending area in case of mismatch
- MINOR: listener/protocol: add proto name in alerts
- MINOR: proto_quic: add proto name in alert
- BUG/MINOR: lru: fix the standalone test case for invalid revision
- DOC: management: fix typos
- CI: revert kernel addr randomization introduced in 3a0fc864
- MINOR: ring: clarify the usage of ring_size() and add ring_allocated_size()
- BUG/MAJOR: ring: use the correct size to reallocate startup_logs
- MINOR: ring: always check that the old ring fits in the new one in ring_dup()
- CLEANUP: ssl: remove dead code in cfg_parse_crtstore()
- MINOR: ssl: supports crt-base in crt-store
- MINOR: ssl: 'key-base' allows to load a 'key' from a specific path
- MINOR: net_helper: Add support for floats/doubles.
- BUG/MEDIUM: grpc: Fix several unaligned 32/64 bits accesses
- MINOR: peers: Split resync process function to separate running/stopping states
- MINOR: peers: Add 2 peer flags about the peer learn status
- MINOR: peers: Add flags to report the peer state to the resync task
- MINOR: peers: sligthly adapt part processing the stopping signal
- MINOR: peers: Add functions to commit peer changes from the resync task
- BUG/MINOR: peers: Report a resync was explicitly requested from a thread-safe manner
- BUG/MAJOR: peers: Update peers section state from a thread-safe manner
- MEDIUM: peers: Only lock one peer at a time in the sync process function
- MINOR: peer: Restore previous peer flags value to ease debugging
- BUG/MEDIUM: stconn: Don't forward channel data if input data must be filtered
- BUILD: cache: fix a build warning with gcc < 7
- BUILD: xxhash: silence a build warning on Solaris + gcc-5.5
- CI: reduce ASAN log redirection umbrella size
- CLEANUP: assorted typo fixes in the code and comments
- BUG/MEDIUM: evports: do not clear returned events list on signal
- MEDIUM: evports: permit to report multiple events at once
- MEDIUM: ssl: support aliases in crt-store
- BUG/MINOR: ssl: check on forbidden character on wrong value
- BUG/MINOR: ssl: fix crt-store load parsing
- BUG/MEDIUM: applet: Fix applet API to put input data in a buffer
- BUG/MEDIUM: spoe: Always retry when an applet fails to send a frame
- BUG/MEDIUM: peers: Fix exit condition when max-updates-at-once is reached
- BUILD: linuxcap: Properly declare prepare_caps_from_permitted_set()
- BUG/MEDIUM: peers: fix localpeer regression with 'bind+server' config style
- MINOR: peers: stop relying on srv->addr to find peer port
- MEDIUM: ssl: support a named crt-store section
- MINOR: stats: remove implicit static trash_chunk usage
- REORG: stats: extract HTML related functions
- REORG: stats: extract JSON related functions
- MEDIUM: ssl: crt-base and key-base local keywords for crt-store
- MINOR: stats: Get the right prototype for stats_dump_html_end().
- MAJOR: ssl: use the msg callback mecanism for backend connections
- MINOR: ssl: implement keylog fetches for backend connections
- BUG/MINOR: stconn: Fix sc_mux_strm() return value
- MINOR: mux-pt: Test conn flags instead of sedesc ones to perform a full close
- MINOR: stconn/connection: Move shut modes at the SE descriptor level
- MINOR: stconn: Rewrite shutdown functions to simplify the switch statements
- MEDIUM: stconn: Use only one SC function to shut connection endpoints
- MEDIUM: stconn: Explicitly pass shut modes to shut applet endpoints
- MEDIUM: stconn: Use one function to shut connection and applet endpoints
- MEDIUM: muxes: Use one callback function to shut a mux stream
- BUG/MINOR: sock: handle a weird condition with connect()
- BUG/MINOR: fd: my_closefrom() on Linux could skip contiguous series of sockets
- BUG/MEDIUM: peers: Don't set PEERS_F_RESYNC_PROCESS flag on a peer
- BUG/MEDIUM: peers: Fix state transitions of a peer
- MINOR: init: use RLIMIT_DATA instead of RLIMIT_AS
- CI: modernize macos matrix
Limiting total allocatable process memory (VSZ) via setting RLIMIT_AS limit is
no longer effective, in order to restrict memory consumption at run time.
We can see from process memory map below, that there are many holes within
the process VA space, which bumps its VSZ to 1.5G. These holes are here by
many reasons and could be explaned at first by the full randomization of
system VA space. Now it is usually enabled in Linux kernels by default. There
are always gaps around the process stack area to trap overflows. Holes before
and after shared libraries could be explained by the fact, that on many
architectures libraries have a 'preferred' address to be loaded at; putting
them elsewhere requires relocation work, and probably some unshared pages.
Repetitive holes of 65380K are most probably correspond to the header that
malloc has to allocate before asked a claimed memory block. This header is
used by malloc to link allocated chunks together and for its internal book
keeping.
$ sudo pmap -x -p `pidof haproxy`
127136: ./haproxy -f /home/haproxy/haproxy/haproxy_h2.cfg
Address Kbytes RSS Dirty Mode Mapping
0000555555554000 388 64 0 r---- /home/haproxy/haproxy/haproxy
00005555555b5000 2608 1216 0 r-x-- /home/haproxy/haproxy/haproxy
0000555555841000 916 64 0 r---- /home/haproxy/haproxy/haproxy
0000555555926000 60 60 60 r---- /home/haproxy/haproxy/haproxy
0000555555935000 116 116 116 rw--- /home/haproxy/haproxy/haproxy
0000555555952000 7872 5236 5236 rw--- [ anon ]
00007fff98000000 156 36 36 rw--- [ anon ]
00007fff98027000 65380 0 0 ----- [ anon ]
00007fffa0000000 156 36 36 rw--- [ anon ]
00007fffa0027000 65380 0 0 ----- [ anon ]
00007fffa4000000 156 36 36 rw--- [ anon ]
00007fffa4027000 65380 0 0 ----- [ anon ]
00007fffa8000000 156 36 36 rw--- [ anon ]
00007fffa8027000 65380 0 0 ----- [ anon ]
00007fffac000000 156 36 36 rw--- [ anon ]
00007fffac027000 65380 0 0 ----- [ anon ]
00007fffb0000000 156 36 36 rw--- [ anon ]
00007fffb0027000 65380 0 0 ----- [ anon ]
...
00007ffff7fce000 4 4 0 r-x-- [ anon ]
00007ffff7fcf000 4 4 0 r---- /usr/lib/x86_64-linux-gnu/ld-2.31.so
00007ffff7fd0000 140 140 0 r-x-- /usr/lib/x86_64-linux-gnu/ld-2.31.so
...
00007ffff7ffe000 4 4 4 rw--- [ anon ]
00007ffffffde000 132 20 20 rw--- [ stack ]
ffffffffff600000 4 0 0 --x-- [ anon ]
---------------- ------- ------- -------
total kB 1499288 75504 72760
This exceeded VSZ makes impossible to start an haproxy process with 200M
memory limit, set at its initialization stage as RLIMIT_AS. We usually
have in this case such cryptic output at stderr:
$ haproxy -m 200 -f haproxy_quic.cfg
(null)(null)(null)(null)(null)(null)
At the same time the process RSS (a memory really used) is only 75,5M.
So to make process memory accounting more realistic let's base the memory
limit, set by -m option, on RSS measurement and let's use RLIMIT_DATA instead
of RLIMIT_AS.
RLIMIT_AS was used before, because earlier versions of haproxy always allocate
memory buffers for new connections, but data were not written there
immediately. So these buffers were not instantly counted in RSS, but were
always counted in VSZ. Now we allocate new buffers only in the case, when we
will write there some data immediately, so using RLIMIT_DATA becomes more
appropriate.
This patch implements the backend side of the keylog fetches.
The code was ready but needed the SSL message callbacks.
This could be used like this:
log-format "CLIENT_EARLY_TRAFFIC_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_client_early_traffic_secret]\n
CLIENT_HANDSHAKE_TRAFFIC_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_client_handshake_traffic_secret]\n
SERVER_HANDSHAKE_TRAFFIC_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_server_handshake_traffic_secret]\n
CLIENT_TRAFFIC_SECRET_0 %[ssl_bc_client_random,hex] %[ssl_bc_client_traffic_secret_0]\n
SERVER_TRAFFIC_SECRET_0 %[ssl_bc_client_random,hex] %[ssl_bc_server_traffic_secret_0]\n
EXPORTER_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_exporter_secret]\n
EARLY_EXPORTER_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_early_exporter_secret]"
Add support for crt-base and key-base local keywords for the crt-store.
current_crtbase and current_keybase are filed with a copy of the global
keyword argument when a crt-store is declared, and updated with a new
path when the keywords are in the crt-store section.
The ckch_conf_kws[] array was updated with ¤t_crtbase and
¤t_keybase instead of the global_ssl ones so the parser can use
them.
The keyword must be used before any "load" line in a crt-store section.
Example:
crt-store web
crt-base /etc/ssl/certs/
key-base /etc/ssl/private/
load crt "site3.crt" alias "site3"
load crt "site4.crt" key "site4.key"
frontend in2
bind *:443 ssl crt "@web/site3" crt "@web/site4.crt"
This patch introduces named crt-store section. A named crt-store allows
to add a scope to the crt name.
For example, a crt named "foo.crt" in a crt-store named "web" will
result in a certificate called "@web/foo.crt".
The crt-store load line now allows to put an alias. This alias is used
as the key in the ckch_tree instead of the certificate. This way an
alias can be referenced in the configuration with the '@/' prefix.
This can only be define with a crt-store.
The global 'key-base' keyword allows to read the 'key' parameter of a
crt-store load line using a path prefix.
This is the equivalent of the 'crt-base' keyword but for 'key'.
It only applies on crt-store.
Released version 3.0-dev7 with the following main changes :
- BUG/MINOR: ssl: Wrong ocsp-update "incompatibility" error message
- BUG/MINOR: ssl: Detect more 'ocsp-update' incompatibilities
- MEDIUM: ssl: Add 'tune.ssl.ocsp-update.mode' global option
- REGTESTS: ssl: Add OCSP update compatibility tests
- REGTESTS: ssl: Add functional test for global ocsp-update option
- BUG/MINOR: server: reject enabled for dynamic server
- BUG/MINOR: server: fix persistence cookie for dynamic servers
- MINOR: server: allow cookie for dynamic servers
- REGTESTS: Fix script about OCSP update compatibility tests
- BUG/MINOR: cli: Report an error to user if command or payload is too big
- MINOR: sc_strm: Add generic version to perform sync receives and sends
- MEDIUM: stream: Use generic version to perform sync receives and sends
- MEDIUM: buf: Add b_getline() and b_getdelim() functions
- MEDIUM: applet: Handle applets with their own buffers in put functions
- MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands
- MINOR: applet: Always use applet API to set appctx flags
- BUG/MEDIUM: applet: State appctx have more data if its EOI/EOS/ERROR flag is set
- MAJOR: cli: Update the CLI applet to handle its own buffers
- MINOR: applet: Let's applets .snd_buf function deal with full input buffers
- MINOR: stconn: Add a connection flag to notify sending data are the last ones
- MAJOR: cli: Use a custom .snd_buf function to only copy the current command
- DOC: config: balance 'first' not usable in LOG mode
- BUG/MINOR: log/balance: detect if user tries to use unsupported algo
- MINOR: lbprm: implement true "sticky" balance algo
- MEDIUM: log/balance: leverage lbprm api for log load-balancing
- BUG/BUILD: debug: fix unused variable error
- MEDIUM: lb-chash: Deterministic node hashes based on server address
- BUG/MEDIUM: stick-tables: fix a small remaining race in expiration task
- REGTESTS: Do not use REQUIRE_VERSION for HAProxy 2.5+ (4)
- REGTESTS: Remove REQUIRE_VERSION=1.9 from all tests (2)
- CLEANUP: Reapply ist.cocci (3)
- CLEANUP: Reapply strcmp.cocci (2)
- CLEANUP: Reapply xalloc_cast.cocci
- CLEANUP: Reapply ha_free.cocci
- CI: vtest: show coredumps if any
- REGTESTS: ssl: disable ssl/ocsp_auto_update.vtc
- BUG/MINOR: backend: properly handle redispatch 0
- MINOR: quic: HyStart++ implementation (RFC 9406)
- BUG/MEDIUM: stconn: Don't forward shutdown to SE if iobuf is not empty
- BUG/MEDIUM: stick-table: use the update lock when reading tables from peers
- BUG/MAJOR: applet: fix a MIN vs MAX usage in appctx_raw_rcv_buf()
- OPTIM: peers: avoid the locking dance around peer_send_teach_process_msgs()
- BUILD: quic: 32 bits compilation issue (QUIC_MIN() usage)
- BUG/MEDIUM: server/lbprm: fix crash in _srv_set_inetaddr_port()
- MEDIUM: mworker: get rid of libsystemd
- BUILD: systemd: fix build error on non-systemd systems with USE_SYSTEMD=1
- BUG/MINOR: bwlim/config: fix missing '\n' after error messages
- MINOR: stick-tables: mark the seen stksess with a flag "seen"
- OPTIM: stick-tables: check the stksess without taking the read lock
- MAJOR: stktable: split the keys across multiple shards to reduce contention
- CI: extend Fedora Rawhide, add m32 mode
- BUG/MINOR: stick-tables: Missing stick-table key nullity check
- BUILD: systemd: enable USE_SYSTEMD by default with TARGET=linux-glibc
- MINOR: systemd: Include MONOTONIC_USEC field in RELOADING=1 message
- BUG/MINOR: proxy: fix logformat expression leak in use_backend rules
- MEDIUM: log: rename logformat var to logformat tag
- MINOR: log: expose logformat_tag struct
- MEDIUM: log: carry tag context in logformat node
- MEDIUM: tree-wide: add logformat expressions wrapper
- MINOR: proxy: add PR_FL_CHECKED flag
- MAJOR: log: implement proper postparsing for logformat expressions
- MEDIUM: log: add compiling logic to logformat expressions
- MEDIUM: proxy/log: leverage lf_expr API for logformat preparsing
- MINOR: guid: introduce global UID module
- MINOR: guid: restrict guid format
- MINOR: proxy: implement GUID support
- MINOR: server: implement GUID support
- MINOR: listener: implement GUID support
- DOC: configuration: grammar fixes for strict-sni
- BUG/MINOR: init: relax LSTCHK_NETADM checks for non root
- MEDIUM: capabilities: check process capabilities sets
- CLEANUP: global: remove LSTCHK_CAP_BIND
- BUG/MEDIUM: quic: don't blindly rely on unaligned accesses
Since the Linux capabilities support add-on (see the commit bd84387beb
("MEDIUM: capabilities: enable support for Linux capabilities")), we can also
check haproxy process effective and permitted capabilities sets, when it
starts and runs as non-root.
Like this, if needed network capabilities are presented only in the process
permitted set, we can get this information with capget and put them in the
process effective set via capset. To do this properly, let's introduce
prepare_caps_from_permitted_set().
First, it checks if binary effective set has CAP_NET_ADMIN or CAP_NET_RAW. If
there is a match, LSTCHK_NETADM is removed from global.last_checks list to
avoid warning, because in the initialization sequence some last configuration
checks are based on LSTCHK_NETADM flag and haproxy process euid may stay
unpriviledged.
If there are no CAP_NET_ADMIN and CAP_NET_RAW in the effective set, permitted
set will be checked and only capabilities given in 'setcap' keyword will be
promoted in the process effective set. LSTCHK_NETADM will be also removed in
this case by the same reason. In order to be transparent, we promote from
permitted set only capabilities given by user in 'setcap' keyword. So, if
caplist doesn't include CAP_NET_ADMIN or CAP_NET_RAW, LSTCHK_NETADM would not
be unset and warning about missing priviledges will be emitted at
initialization.
Need to call it before protocol_bind_all() to allow binding to priviledged
ports under non-root and 'setcap cap_net_bind_service' must be set in the
global section in this case.
This commit is similar with the two previous ones. Its purpose is to add
GUID support on listeners. Due to bind_conf and listeners configuration,
some specifities were required.
Its possible to define several listeners on a single bind line, for
example by specifying multiple addresses. As such, it's impossible to
support a "guid" keyword on a bind line. The problem is exacerbated by
the cloning of listeners when sharding is used.
To resolve this, a new keyword "guid-prefix" is defined for bind lines.
It allows to specify a string which will be used as a prefix for
automatically generated GUID for each listeners attached to a bind_conf.
Automatic GUID listeners generation is implemented via a new function
bind_generate_guid(). It is called on post-parsing, after
bind_complete_thread_setup(). For each listeners on a bind_conf, a new
GUID is generated with bind_conf prefix and the index of the listener
relative to other listeners in the bind_conf. This last value is stored
in a new bind_conf field named <guid_idx>. If a GUID cannot be inserted,
for example due to a non-unique value, an error is returned, startup is
interrupted with configuration rejected.
This commit is similar to previous one, except that it implements GUID
support for server instances. A guid_node field is inserted into server
structure. A new "guid" server keyword is defined.
Implement proxy identiciation through GUID. As such, a guid_node member
is inserted into proxy structure. A proxy keyword "guid" is defined to
allow user to fix its value.
This is a simple algorithm to replace the classic slow start phase of the
congestion control algorithms. It should reduce the high packet loss during
this step.
Implemented only for Cubic.
Motivation: When services are discovered through DNS resolution, the order in
which DNS records get resolved and assigned to servers is arbitrary. Therefore,
even though two HAProxy instances using chash balancing might agree that a
particular request should go to server3, it is likely the case that they have
assigned different IPs and ports to the server in that slot.
This patch adds a server option, "hash-key <key>" which can be set to "id" (the
existing behaviour, default), "addr", or "addr-port". By deriving the keys for
the chash tree nodes from a server's address and port we ensure that independent
HAProxy instances will agree on routing decisions. If an address is not known
then the key is derived from the server's puid as it was previously.
When adjusting a server's weight, we now check whether the server's hash has
changed. If it has, we have to remove all its nodes first, since the node keys
will also have to change.
log load-balancing implementation was not seamlessly integrated within
lbprm API. The consequence is that it could become harder to maintain
over time since it added some specific cases just for the log backend.
Moreover, it resulted in some code duplication since balance algorithms
that are common to logs and regular (tcp, http) backends were specifically
rewritten for log backends.
Thanks to the previous commit, we now have all the prerequisites to make
log load-balancing fully leverage lbprm logic. Thus in this patch we make
__do_send_log_backend() use existing lbprm algorithms, and we no longer
require log-specific lbprm initialization in cfgparse.c and in
postcheck_log_backend().
As a bonus, for log backends this allows weighed algorithms to properly
support weights (ie: roundrobin, random and log-hash) since we now
leverage the same lb algorithms that we use for tcp/http backends
(doc was updated).
As previously mentioned in cd352c0db ("MINOR: log/balance: rename
"log-sticky" to "sticky""), let's define a sticky algorithm that may be
used from any protocol. Sticky algorithm sticks on the same server as
long as it remains available.
The documentation was updated accordingly.
b61147fd ("MEDIUM: log/balance: merge tcp/http algo with log ones")
introduced an ambiguity because 'first' algorithm is not usable in
LOG mode but it was not specified in the doc.
This should be backported in 2.9 with b61147fd.
This commit allows "cookie" keyword for dynamic servers. After code
review, nothing was found which could prevent a dynamic server to use
it. An extra warning is added under cli_parse_add_server() if cookie
value is ignored due to a non HTTP backend.
This patch is not considered a bugfix. However, it may backported if
needed as its impact seems minimal.
Since their first implementation, dynamic servers are created into
maintenance state. This has been done purposely to avoid immediate
activation of a newly inserted server.
However, this principle is incompatible if "enabled" keyword is used on
"add server". The newly created instance will be unreacheable as proxy
load-balancing algorithm is not informed of its presence via
srv_lb_propagate(). The new server could be unblocked by toggling its
state with "disable server" / "enable server" commands, which will
trigger srv_lb_propagate() invocation.
To avoid this unexpected state, simply forbid "enabled" keyword for
dynamic servers. In the long-term, it could be possible to re authorize
it but at least this requires to call srv_lb_propagate() on dynamic
server creation.
This should fix github issue #2497.
This patch should not be backported as-is, to avoid breaking dynamic
servers API on stable versions. "enabled" should instead be ignored for
them. This will be implemented in a dedicated patch on top of 2.9.
This option can be used to set a default ocsp-update mode for all
certificates of a given conf file. It allows to activate ocsp-update on
certificates without the need to create separate crt-lists. It can still
be superseded by the crt-list 'ocsp-update' option. It takes either "on"
or "off" as value and defaults to "off".
Since setting this new parameter to "on" would mean that we try to
enable ocsp-update on any certificate, and also certificates that don't
have an OCSP URI, the checks performed in ssl_sock_load_ocsp were
softened. We don't systematically raise an error when trying to enable
ocsp-update on a certificate that does not have an OCSP URI, be it via
the global option or the crt-list one. We will still raise an error when
a user tries to load a certificate that does have an OCSP URI but a
missing issuer certificate (if ocsp-update is enabled).
Released version 3.0-dev6 with the following main changes :
- MINOR: mux-h2: always use h2c_report_glitch()
- MEDIUM: mux-h2: allow to set the glitches threshold to kill a connection
- MINOR: quic: simplify rescheduling for handshake
- MINOR: quic: remove qc_treat_rx_crypto_frms()
- DOC: configuration: clarify ciphersuites usage (V2)
- MINOR: tools: use public interface for FreeBSD get_exec_path()
- BUG/MINOR: ssl: fix possible ctx memory leak in sample_conv_aes_gcm()
- BUG/MINOR: ssl: do not set the aead_tag flags in sample_conv_aes_gcm()
- BUG/MINOR: server: fix first server template not being indexed
- MEDIUM: ssl: initialize the SSL stack explicitely
- MEDIUM: ssl: allow to change the OpenSSL security level from global section
- CLEANUP: ssl: remove useless #ifdef in openssl-compat.h
- CI: github: add -DDEBUG_LIST to the default builds
- BUG/MINOR: hlua: segfault when loading the same filter from different contexts
- BUG/MINOR: hlua: missing lock in hlua_filter_new()
- BUG/MINOR: hlua: fix missing lock in hlua_filter_delete()
- DEBUG: lua: precisely identify if stream is stuck inside lua or not
- MINOR: hlua: use accessors for stream hlua ctx
- BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread (2nd try)
- MINOR: debug: enable insecure fork on the command line
- CI: github: add -dI to haproxy arguments
- BUG/MINOR: listener: Wake proxy's mngmt task up if necessary on session release
- BUG/MINOR: listener: Don't schedule frontend without task in listener_release()
- MINOR: session: rename private conns elements
- BUG/MAJOR: server: do not delete srv referenced by session
- BUG/MEDIUM: spoe: Don't rely on stream's expiration to detect processing timeout
- BUG/MINOR: spoe: Be sure to be able to quickly close IDLE applets on soft-stop
- MAJOR: spoe: Deprecate the SPOE filter
- MINOR: cfgparse: Add a global option to expose deprecated directives
- MINOR: spoe: Add SPOE filters in the exposed deprecated directives
- CLEANUP: assorted typo fixes in the code and comments
- CI: temporarily adjust kernel entropy to work with ASAN/clang
- BUG/MEDIUM: spoe: Return an invalid frame on recv if size is too small
- BUG/MINOR: session: ensure conn owner is set after insert into session
- BUG/MEDIUM: http_ana: ignore NTLM for reuse aggressive/always and no H1
- BUG/MAJOR: connection: fix server used_conns with H2 + reuse safe
- BUG/MAJOR: ocsp: Separate refcount per instance and per store
- REGTESTS: ssl: Add OCSP related tests
- BUG/MEDIUM: ssl: Fix crash when calling "update ssl ocsp-response" when an update is ongoing
- BUG/MEDIUM: ssl: Fix crash in ocsp-update log function
- MEDIUM: ssl: Change output of ocsp-update log
- MINOR: ssl: Change level of ocsp-update logs
- CLEANUP: ssl: Remove undocumented ocsp fetches
- REGTESTS: ssl: Add checks on ocsp-update log format
- MINOR: connection: implement conn_release()
- MINOR: connection: extend takeover with release option
- MEDIUM: server: close idle conn on server deletion
- MEDIUM: mux: prepare for takeover on private connections
- MEDIUM: server: close private idle connection before server deletion
- BUG/MINOR: mux-quic: close all QCS before freeing QCC tasklet
- BUG/MEDIUM: mux-fcgi: Properly handle EOM flag on end-of-trailers HTX block
- BUILD: server: fix build regression on old compilers (<= gcc-4.4)
- OPTIM: http_ext: avoid useless copy in http_7239_extract_{ipv4,ipv6}
- MINOR: debug: add "debug dev trace" to flood with traces
- MINOR: atomic: add a read-specific variant of __ha_cpu_relax()
- MINOR: applet: add new function applet_append_line()
- MINOR: log/applet: add new function syslog_applet_append_event()
- MEDIUM: ring/sink: use applet_append_line()/syslog_applet_append_event() for readers
- REORG: dns/ring: split the ring between the generic one and the DNS one
- MEDIUM: ring: move the ring reader code to ring_dispatch_messages()
- MEDIUM: sink: move the generic ring forwarder code use ring_dispatch_messages()
- MEDIUM: log/sink: make the log forwarder code use ring_dispatch_messages()
- MINOR: buf: add b_add_ofs() to add a count to an absolute position
- MINOR: buf: add b_rel_ofs() to turn an absolute offset into a relative one
- MINOR: buf: add b_putblk_ofs() to copy a block at a specific position
- MINOR: buf: add b_getblk_ofs() that works relative to area and not head
- MINOR: ring: make the ring reader use only absolute offsets
- MINOR: ring: reserve one special value for the readers count
- MINOR: vecpair: add new vector pair based data manipulation mechanisms
- MINOR: vecpair: add necessary functions to use vecpairss from/to ring APIs
- MINOR: ring: rename totlen vs msglen in ring_write()
- MINOR: ring: add ring_data() to report the amount of data in a ring
- MINOR: ring: add ring_size() to return the ring's size
- MINOR: ring: add ring_dup() to copy a ring into another one
- MINOR: ring: also add ring_area(), ring_head(), ring_tail()
- MINOR: ring: make callers use ring_data() and ring_size(), not ring->buf
- MINOR: errors: use ring_dup() to duplicate the startup_logs
- MINOR: ring: use ring_size(), ring_area(), ring_head() and ring_tail()
- MINOR: ring: add a flag to indicate a mapped file
- MAJOR: ring: insert an intermediary ring_storage level
- MINOR: ring: resize only under thread isolation
- MINOR: ring: allow to reduce a ring size
- MEDIUM: ring: replace the buffer API in ring_write() with the vec<->ring API
- MEDIUM: ring: change the ring reader to use the new vector-based API now
- MEDIUM: ring: remove the struct buffer from the ring
- MEDIUM: ring: align the head and tail fields in the ring_storage structure
- MINOR: ring: make the reader check the readers count before inc/dec
- MEDIUM: ring: lock the tail's readers counters before proceeding with the changes
- MEDIUM: ring: protect the reader's positions against writers
- MEDIUM: ring: use the topmost bit of the tail as a lock
- MEDIUM: move the ring's lock to only protect the readers list
- MEDIUM: ring: unlock the ring's tail earlier
- MINOR: ring: don't take the readers lock if there are no readers
- MEDIUM: ring/applet: turn the wait_entry list to an mt_list instead
- MEDIUM: ring: protect the initialization of the initial reader offset
- MINOR: ring: make sure ring_dispatch waits when facing a changing message
- MAJOR: ring: drop the now unneeded lock
- OPTIM: ring: don't even try to update offset when failed to read
- OPTIM: ring: have only one thread at a time wake up all readers
- MINOR: ring: keep a few frequently used pointers in the local stack
- MINOR: ring: add the definition of a ring waiting cell
- MINOR: ring: make the number of queues configurable
- MAJOR: ring: implement a waiting queue in front of the ring
- MEDIUM: ring: significant boost in the loop by checking the ring queue ptr first
- MEDIUM: ring: improve speed in the queue waiting loop on x86_64
- MINOR: ring: simplify the write loop a little bit
- CLEANUP: ring: further simplify the write loop
- MINOR: ring: it's not x86 but all non-ARMv8.1 which needs the read before OR
- MINOR: ring: avoid writes to cells during copy
- OPTIM: ring: use relaxed stores to release the threads
- CLEANUP: ring: use only curr_cell and not next_cell in the main write loop
- BUILD: ssl: fix build error on older compilers with openssl-3.2
- BUG/MINOR: server: 'source' interface ignored from 'default-server' directive
- BUG/MAJOR: ring: free the ring storage not the ring itself when using maps
Now the rings have one wait queue per group. This should limit the
contention on systems such as EPYC CPUs where the performance drops
dramatically when using more than one CCX.
Tests were run with different numbers and it was showed that value
6 outperforms all other ones at 12, 24, 48, 64 and 80 threads on an
EPYC, a Xeon and an Ampere CPU. Value 7 sometimes comes close and
anything around these values degrades quickly. The value has been
left tunable in the global section.
This commit only introduces everything needed to set up the queue count
so that it's easier to adjust it in the forthcoming patches, but it was
initially added after the series, making it harder to compare.
It was also shown that trying to group the threads in queues by their
thread groups is counter-productive and that it was more efficient to
do that by applying a modulo on the thread number. As surprising as it
seems, it does have the benefit of well balancing any number of threads.
Since commit "BUG/MEDIUM: ssl: Fix crash in ocsp-update log function",
some information from the log line are "faked" because they can be
actually retrieved anymore (or never could). We should then remove them
from the logline all along instead of providing some useless fields.
We then only keep pure OCSP-update information in the log line:
"<certname> <status> <status str> <fail count> <success count>"
Backend connections can be marked as private to prevent their sharing by
multiple clients. Now, this has become an exception as only two reasons
for data traffic can trigger this (checks are ignored here) :
* http-reuse never
* HTTP response with NTLM header
The first case is easy to manage as the connection is flagged as private
since its inception. However, the second case is dynamic as the
connection can be flagged anytime during its lifetime. When using a
backend protocol such as HTTP/2 with reuse mode aggressive or always, we
face a design issue as the connection would be marked as private,
despite potentially being shared by several clients at the same time.
This is conceptually invalid, but worst it can trigger crashes on MUX
stream detach callback depending on the order of release of the streams,
by calling session_check_idle_conn() with a NULL session. It could also
be possible to have several NTLM responses on a single connection for
different sessions. In this case, connection owner is still being
updated without attaching the connection to its correct session, which
ultimately would cause a crash on session_check_idle_conn with an
invalid session.
Here are two backtrace examples from GDB for such cases :
Thread 1 (Thread 0x7ff73e9fc700 (LWP 648859)):
#0 session_check_idle_conn (conn=0x7ff72f597800, sess=0x0) at include/haproxy/session.h:209
#1 h2_detach (sd=<optimized out>) at src/mux_h2.c:4520
#2 0x000056151742be24 in sc_detach_endp (scp=scp@entry=0x7ff73e9f0f18) at src/stconn.c:376
#3 0x000056151742c208 in sc_destroy (sc=<optimized out>) at src/stconn.c:444
#4 0x0000561517370871 in stream_free (s=s@entry=0x7ff72a2dbd80) at src/stream.c:728
#5 0x000056151737541f in process_stream (t=t@entry=0x7ff72d5e2620, context=0x7ff72a2dbd80, state=<optimized out>) at src/stream.c:2645
#6 0x0000561517456cbb in run_tasks_from_lists (budgets=budgets@entry=0x7ff73e9f10d0) at src/task.c:632
#7 0x00005615174576b9 in process_runnable_tasks () at src/task.c:876
#8 0x000056151742275a in run_poll_loop () at src/haproxy.c:2996
#9 0x0000561517422db1 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3195
#10 0x00007ff789e081ca in start_thread () from /lib64/libpthread.so.0
#11 0x00007ff789a39e73 in clone () from /lib64/libc.so.6
(gdb)
Thread 1 (Thread 0x7ff52e7fc700 (LWP 681458)):
#0 0x0000556ebd6e7e69 in session_check_idle_conn (conn=0x7ff5787ff100, sess=0x7ff51d2539a0) at include/haproxy/session.h:209
#1 h2_detach (sd=<optimized out>) at src/mux_h2.c:4520
#2 0x0000556ebd7f3e24 in sc_detach_endp (scp=scp@entry=0x7ff52e7f0f18) at src/stconn.c:376
#3 0x0000556ebd7f4208 in sc_destroy (sc=<optimized out>) at src/stconn.c:444
#4 0x0000556ebd738871 in stream_free (s=s@entry=0x7ff520e28200) at src/stream.c:728
#5 0x0000556ebd73d41f in process_stream (t=t@entry=0x7ff565783700, context=0x7ff520e28200, state=<optimized out>) at src/stream.c:2645
#6 0x0000556ebd81ecbb in run_tasks_from_lists (budgets=budgets@entry=0x7ff52e7f10d0) at src/task.c:632
#7 0x0000556ebd81f6b9 in process_runnable_tasks () at src/task.c:876
#8 0x0000556ebd7ea75a in run_poll_loop () at src/haproxy.c:2996
#9 0x0000556ebd7eadb1 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3195
#10 0x00007ff5752081ca in start_thread () from /lib64/libpthread.so.0
#11 0x00007ff574e39e73 in clone () from /lib64/libc.so.6
(gdb)
To solve this issue, simply ignore NTLM responses when using a
multiplexer with streams support and the connection is not already
attached to the session. The connection is not marked as private and
will continue to be shared freely accross clients. This is considered
conceptually valid as NTLM usage (rfc 4559) with HTTP is broken and was
designed only with HTTP/1.1 in mind. A side-effect of the change is that
SESS_FL_PREFER_LAST is also not set anymore on NTLM detection, which
allows following requests to be load-balanced accross several server
instances.
The original behavior is kept for HTTP/1 or if the connection is already
attached to the session. This last case happens when using HTTP/2 with
default http-reuse safe mode since the following patch :
0d21deaded
MEDIUM: backend: add reused conn to sess if mux marked as HOL blocking
This should be backported up to all stable releases. Up until 2.4, it
can be taken as-is. For lesser versions, above patch is not present. In
this case the condition should be restricted only to HTTP/1 usage :
if (srv_conn && strcmp(srv_conn->mux->name, "H1") == 0) {
It is the first deprecated directive exposed via the
'expose-deprecated-directives' global option. This way, it is possible to
silent the warning about the SPOE uses.