The buffer ring is problematic in multiple aspects, one of which being
that it is only usable by one entity. With multiplexed protocols, we need
to have shared buffers used by many entities (streams and connection),
and the only way to use the buffer ring model in this case is to have
each entity store its own array, and keep a shared counter on allocated
entries. But even with the default 32 buf and 100 streams per HTTP/2
connection, we're speaking about 32*101*32 bytes = 103424 bytes per H2
connection, just to store up to 32 shared buffers, spread randomly in
these tables. Some users might want to achieve much higher than default
rates over high speed links (e.g. 30-50 MB/s at 100ms), which is 3 to 5
MB storage per connection, hence 180 to 300 buffers. There it starts to
cost a lot, up to 1 MB per connection, just to store buffer indexes.
Instead this patch introduces a variant which we call a buffer list.
That's basically just a free list encoded in an array. Each cell
contains a buffer structure, a next index, and a few flags. The index
could be reduced to 16 bits if needed, in order to make room for a new
struct member. The design permits initializing a whole freelist at once
using memset(0).
The list pointer is stored at a single location (e.g. the connection)
and all users (the streams) will just have indexes referencing their
first and last assigned entries (head and tail). This means that with
a single table we can now have all our buffers shared between multiple
streams, irrelevant to the number of potential streams which would want
to use them. Now the 180 to 300 entries array only costs 7.2 to 12 kB,
or 80 times less.
Two large functions (bl_deinit() & bl_get()) were implemented in buf.c.
A basic doc was added to explain how it works.
This is the second attempt at importing the updated mt_list code (commit
59459ea3). The previous one was attempted with commit c618ed5ff4 ("MAJOR:
import: update mt_list to support exponential back-off") but revealed
problems with QUIC connections and was reverted.
The problem that was faced was that elements deleted inside an iterator
were no longer reset, and that if they were to be recycled in this form,
they could appear as busy to the next user. This was trivially reproduced
with this:
$ cat quic-repro.cfg
global
stats socket /tmp/sock1 level admin
stats timeout 1h
limited-quic
frontend stats
mode http
bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3
timeout client 5s
stats uri /
$ ./haproxy -db -f quic-repro.cfg &
$ h2load -c 10 -n 100000 --npn h3 https://127.0.0.1:8443/
=> hang
This was purely an API issue caused by the simplified usage of the macros
for the iterator. The original version had two backups (one full element
and one pointer) that the user had to take care of, while the new one only
uses one that is transparent for the user. But during removal, the element
still has to be unlocked if it's going to be reused.
All of this sparked discussions with Fred and Aurélien regarding the still
unclear state of locking. It was found that the lock API does too much at
once and is lacking granularity. The new version offers a much more fine-
grained control allowing to selectively lock/unlock an element, a link,
the rest of the list etc.
It was also found that plenty of places just want to free the current
element, or delete it to do anything with it, hence don't need to reset
its pointers (e.g. event_hdl). Finally it appeared obvious that the
root cause of the problem was the unclear usage of the list iterators
themselves because one does not necessarily expect the element to be
presented locked when not needed, which makes the unlock easy to overlook
during reviews.
The updated version of the list presents explicit lock status in the
macro name (_LOCKED or _UNLOCKED suffixes). When using the _LOCKED
suffix, the caller is expected to unlock the element if it intends to
reuse it. At least the status is advertised. The _UNLOCKED variant,
instead, always unlocks it before starting the loop block. This means
it's not necessary to think about unlocking it, though it's obviously
not usable with everything. A few _UNLOCKED were used at obvious places
(i.e. where the element is deleted and freed without any prior check).
Interestingly, the tests performed last year on QUIC forwarding, that
resulted in limited traffic for the original version and higher bit
rate for the new one couldn't be reproduced because since then the QUIC
stack has gaind in efficiency, and the 100 Gbps barrier is now reached
with or without the mt_list update. However the unit tests definitely
show a huge difference, particularly on EPYC platforms where the EBO
provides tremendous CPU savings.
Overall, the following changes are visible from the application code:
- mt_list_for_each_entry_safe() + 1 back elem + 1 back ptr
=> MT_LIST_FOR_EACH_ENTRY_LOCKED() or MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
+ 1 back elem
- MT_LIST_DELETE_SAFE() no longer needed in MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
=> just manually set iterator to NULL however.
For MT_LIST_FOR_EACH_ENTRY_LOCKED()
=> mt_list_unlock_self() (if element going to be reused) + NULL
- MT_LIST_LOCK_ELT => mt_list_lock_full()
- MT_LIST_UNLOCK_ELT => mt_list_unlock_full()
- l = MT_LIST_APPEND_LOCKED(h, e); MT_LIST_UNLOCK_ELT();
=> l=mt_list_lock_prev(h); mt_list_lock_elem(e); mt_list_unlock_full(e, l)
Released version 3.1-dev2 with the following main changes :
- BUG/MINOR: log: fix broken '+bin' logformat node option
- DEBUG: hlua: distinguish burst timeout errors from exec timeout errors
- REGTESTS: ssl: fix some regtests 'feature cmd' start condition
- BUG/MEDIUM: ssl: AWS-LC + TLSv1.3 won't do ECDSA in RSA+ECDSA configuration
- MINOR: ssl: activate sigalgs feature for AWS-LC
- REGTESTS: ssl: activate new SSL reg-tests with AWS-LC
- BUG/MEDIUM: proxy: fix email-alert invalid free
- REORG: mailers: move free_email_alert() to mailers.c
- BUG/MINOR: proxy: fix email-alert leak on deinit() (2nd try)
- DOC: configuration: fix alphabetical order of bind options
- DOC: management: document ptr lookup for table commands
- BUG/MAJOR: quic: fix padding with short packets
- BUG/MAJOR: quic: do not loop on emission on closing/draining state
- MINOR: sample: date converter takes HTTP date and output an UNIX timestamp
- SCRIPTS: git-show-backports: do not truncate git-show output
- DOC: api/event_hdl: small updates, fix an example and add some precisions
- BUG/MINOR: h3: fix crash on STOP_SENDING receive after GOAWAY emission
- BUG/MINOR: mux-quic: fix crash on qcs SD alloc failure
- BUG/MINOR: h3: fix BUG_ON() crash on control stream alloc failure
- BUG/MINOR: quic: fix BUG_ON() on Tx pkt alloc failure
- DEV: flags/show-fd-to-flags: adapt to recent versions
- MINOR: capabilities: export capget and __user_cap_header_struct
- MINOR: capabilities: prepare support for version 3
- MINOR: capabilities: use _LINUX_CAPABILITY_VERSION_3
- MINOR: cli/debug: show dev: add cmdline and version
- MINOR: cli/debug: show dev: show capabilities
- MINOR: debug: print gdb hints when crashing
- BUILD: debug: also declare strlen() in __ABORT_NOW()
- BUILD: Missing inclusion header for ssize_t type
- BUG/MINOR: hlua: report proper context upon error in hlua_cli_io_handler_fct()
- MINOR: cfgparse/log: remove leftover dead code
- BUG/MEDIUM: stick-table: Decrement the ref count inside lock to kill a session
- MINOR: stick-table: Always decrement ref count before killing a session
- REORG: init: do MODE_CHECK_CONDITION logic first
- REORG: init: encapsulate CHECK_CONDITION logic in a func
- REORG: init: encapsulate 'reload' sockpair and master CLI listeners creation
- REORG: init: encapsulate code that reads cfg files
- BUG/MINOR: server: fix first server template name lookup UAF
- MINOR: activity: make the memory profiling hash size configurable at build time
- BUG/MEDIUM: server/dns: prevent DOWN/UP flap upon resolution timeout or error
- BUG/MEDIUM: h3: ensure the ":method" pseudo header is totally valid
- BUG/MEDIUM: h3: ensure the ":scheme" pseudo header is totally valid
- BUG/MEDIUM: quic: fix race-condition in quic_get_cid_tid()
- BUG/MINOR: quic: fix race condition in qc_check_dcid()
- BUG/MINOR: quic: fix race-condition on trace for CID retrieval
Fix an example suggesting that using EVENT_HDL_SUB_TYPE(x, y) with y being
0 was valid. Then add some notes to explain how to use
EVENT_HDL_SUB_FAMILY() and EVENT_HDL_SUB_TYPE() with valid values.
Also mention that the feature is available starting from 2.8 and not 2.7.
Finally, perform some purely cosmetic updates.
This could be backported in 2.8.
Add a documentation about the history of the master-worker and how it
was implemented in its first version and how it is currently working.
This is a global view of the architecture, and not an exhaustive
explanation of all mechanisms.
The goal is to indicate how critical the allocation is, between the
least one (growing an existing buffer ring) and the topmost one (boot
time allocation for the life of the process).
The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function
to try to allocate otherwise subscribe. There's currently no distinction
of direction nor part that tries to allocate, and this should be revisited
to improve this situation, particularly when we consider that mux-h2 can
reduce its Tx allocations if needed.
For now, 4 main levels are planned, to translate how the data travels
inside haproxy from a producer to a consumer:
- MUX_RX: buffer used to receive data from the OS
- SE_RX: buffer used to place a transformation of the RX data for
a mux, or to produce a response for an applet
- CHANNEL: the channel buffer for sync recv
- MUX_TX: buffer used to transfer data from the channel to the outside,
generally a mux but there can be a few specificities (e.g.
http client's response buffer passed to the application,
which also gets a transformation of the channel data).
The other levels are a bit different in that they don't strictly need to
allocate for the first two ones, or they're permanent for the last one
(used by compression).
Released version 2.9-dev9 with the following main changes :
- DOC: internal: filters: fix reference to entities.pdf
- BUG/MINOR: ssl: load correctly @system-ca when ca-base is define
- MINOR: lua: Add flags to configure logging behaviour
- MINOR: lua: change tune.lua.log.stderr default from 'on' to 'auto'
- BUG/MINOR: backend: fix wrong BUG_ON for avail conn
- BUG/MAJOR: backend: fix idle conn crash under low FD
- MINOR: backend: refactor insertion in avail conns tree
- DEBUG: mux-h2/flags: fix list of h2c flags used by the flags decoder
- BUG/MEDIUM: server/log: "mode log" after server keyword causes crash
- MINOR: connection: add conn_pr_mode_to_proto_mode() helper func
- BUG/MEDIUM: server: "proto" not working for dynamic servers
- MINOR: server: add helper function to detach server from proxy list
- DEBUG: add a tainted flag when ha_panic() is called
- DEBUG: lua: add tainted flags for stuck Lua contexts
- DEBUG: pools: detect that malloc_trim() is in progress
- BUG/MINOR: quic: do not consider idle timeout on CLOSING state
- MINOR: frontend: implement a dedicated actconn increment function
- BUG/MINOR: ssl: use a thread-safe sslconns increment
- MEDIUM: quic: count quic_conn instance for maxconn
- MEDIUM: quic: count quic_conn for global sslconns
- BUG/MINOR: ssl: suboptimal certificate selection with TLSv1.3 and dual ECDSA/RSA
- REGTESTS: ssl: update the filters test for TLSv1.3 and sigalgs
- BUG/MINOR: mux-quic: fix early close if unset client timeout
- BUG/MEDIUM: ssl: segfault when cipher is NULL
- BUG/MINOR: tcpcheck: Report hexstring instead of binary one on check failure
- MEDIUM: systemd: be more verbose about the reload
- MINOR: sample: Add fetcher for getting all cookie names
- BUG/MINOR: proto_reverse_connect: support SNI on active connect
- MINOR: proxy/stktable: add resolve_stick_rule helper function
- BUG/MINOR: stktable: missing free in parse_stick_table()
- BUG/MINOR: cfgparse/stktable: fix error message on stktable_init() failure
- MINOR: stktable: stktable_init() sets err_msg on error
- MINOR: stktable: check if a type should be used as-is
- MEDIUM: stktable/peers: "write-to" local table on peer updates
- CI: github: update wolfSSL to 5.6.4
- DOC: install: update the wolfSSL required version
- MINOR: server: Add parser support for set-proxy-v2-tlv-fmt
- MINOR: connection: Send out generic, user-defined server TLVs
- BUG/MEDIUM: pattern: don't trim pools under lock in pat_ref_purge_range()
- MINOR: mux-h2: always use h2_send() in h2_done_ff(), not h2_process()
- OPTIM: mux-h2: call h2_send() directly from h2_snd_buf()
- BUG/MINOR: server: remove some incorrect free() calls on null elements
This reverts commit c618ed5ff4.
The list iterator is broken. As found by Fred, running QUIC single-
threaded shows that only the first connection is accepted because the
accepter relies on the element being initialized once detached (which
is expected and matches what MT_LIST_DELETE_SAFE() used to do before).
However while doing this in the quic_sock code seems to work, doing it
inside the macro show total breakage and the unit test doesn't work
anymore (random crashes). Thus it looks like the fix is not trivial,
let's roll this back for the time it will take to fix the loop.
The new mt_list code supports exponential back-off on conflict, which
is important for use cases where there is contention on a large number
of threads. The API evolved a little bit and required some updates:
- mt_list_for_each_entry_safe() is now in upper case to explicitly
show that it is a macro, and only uses the back element, doesn't
require a secondary pointer for deletes anymore.
- MT_LIST_DELETE_SAFE() doesn't exist anymore, instead one just has
to set the list iterator to NULL so that it is not re-inserted
into the list and the list is spliced there. One must be careful
because it was usually performed before freeing the element. Now
instead the element must be nulled before the continue/break.
- MT_LIST_LOCK_ELT() and MT_LIST_UNLOCK_ELT() have always been
unclear. They were replaced by mt_list_cut_around() and
mt_list_connect_elem() which more explicitly detach the element
and reconnect it into the list.
- MT_LIST_APPEND_LOCKED() was only in haproxy so it was left as-is
in list.h. It may however possibly benefit from being upstreamed.
This required tiny adaptations to event_hdl.c and quic_sock.c. The
test case was updated and the API doc added. Note that in order to
keep include files small, the struct mt_list definition remains in
list-t.h (par of the internal API) and was ifdef'd out in mt_list.h.
A test on QUIC with both quictls 1.1.1 and wolfssl 5.6.3 on ARM64 with
80 threads shows a drastic reduction of CPU usage thanks to this and
the refined memory barriers. Please note that the CPU usage on OpenSSL
3.0.9 is significantly higher due to the excessive use of atomic ops
by openssl, but 3.1 is only slightly above 1.1.1 though:
- before: 35 Gbps, 3.5 Mpps, 7800% CPU
- after: 41 Gbps, 4.2 Mpps, 2900% CPU
More and more utility functions rely on the trash while most of the init
code doesn't have access to it because it's initialized very late (in
PRE_CHECK for the initial one). It's a pool, and it purposely supports
being reallocated, so let's initialize it in STG_POOL so that early
STG_INIT code can at least use it.
These were docs for very old design thoughts or internal subsystems
which are now totally irrelevant and even misleading. Those with some
outdated ideas mixed with useful stuff were kept though.
The conditions where ERR, EOS and EOI are found are not always
crystal clear, and the fact that there's still a good bunch of
original ones dating from the early days and that seem to test for
non-existing cases doesn't help either.
After auditing the code base and projecting the 3 main muxes' stream
termination conditions, with Christopher and Amaury we could establish
the current flags matrix which indicates both what each combination
means for each mux and when it is set by each of them (or not set and
for what reason).
It should be sufficient to void doubts when adding code or when chasing
a bug.
It *must not* be backported because it is highly specific to the latest
2.8-dev.
Released version 2.8-dev7 with the following main changes :
- BUG/MINOR: stats: Don't replace sc_shutr() by SE_FL_EOS flag yet
- BUG/MEDIUM: mux-h2: Be able to detect connection error during handshake
- BUG/MINOR: quic: Missing padding in very short probe packets
- MINOR: proxy/pool: prevent unnecessary calls to pool_gc()
- CLEANUP: proxy: remove stop_time related dead code
- DOC/MINOR: reformat configuration.txt's "quoting and escaping" table
- MINOR: http_fetch: Add support for empty delim in url_param
- MINOR: http_fetch: add case insensitive support for smp_fetch_url_param
- MINOR: http_fetch: Add case-insensitive argument for url_param/urlp_val
- REGTESTS : Add test support for case insentitive for url_param
- BUG/MEDIUM: proxy/sktable: prevent watchdog trigger on soft-stop
- BUG/MINOR: backend: make be_usable_srv() consistent when stopping
- BUG/MINOR: ssl: Remove dead code in cli_parse_update_ocsp_response
- BUG/MINOR: ssl: Fix potential leak in cli_parse_update_ocsp_response
- BUG/MINOR: ssl: ssl-(min|max)-ver parameter not duplicated for bundles in crt-list
- BUG/MINOR: quic: Wrong use of now_ms timestamps (cubic algo)
- MINOR: quic: Add recovery related information to "show quic"
- BUG/MINOR: quic: Wrong use of now_ms timestamps (newreno algo)
- BUG/MINOR: quic: Missing max_idle_timeout initialization for the connection
- MINOR: quic: Implement cubic state trace callback
- MINOR: quic: Adjustments for generic control congestion traces
- MINOR: quic: Traces adjustments at proto level.
- MEDIUM: quic: Ack delay implementation
- BUG/MINOR: quic: Wrong rtt variance computing
- MINOR: cli: support filtering on FD types in "show fd"
- MINOR: quic: Add a fake congestion control algorithm named "nocc"
- CI: run smoke tests on config syntax to check memory related issues
- CLEANUP: assorted typo fixes in the code and comments
- CI: exclude doc/{design-thoughts,internals} from spell check
- BUG/MINOR: quic: Remaining useless statements in cubic slow start callback
- BUG/MINOR: quic: Cubic congestion control window may wrap
- MINOR: quic: Add missing traces in cubic algorithm implementation
- BUG/MAJOR: quic: Congestion algorithms states shared between the connection
- BUG/MINOR: ssl: Undefined reference when building with OPENSSL_NO_DEPRECATED
- BUG/MINOR: quic: Remove useless BUG_ON() in newreno and cubic algo implementation
- MINOR: http-act: emit a warning when a header field name contains forbidden chars
- DOC: config: strict-sni allows to start without certificate
- MINOR: quic: Add trace to debug idle timer task issues
- BUG/MINOR: quic: Unexpected connection closures upon idle timer task execution
- BUG/MINOR: quic: Wrong idle timer expiration (during 20s)
- BUILD: quic: 32bits compilation issue in cli_io_handler_dump_quic()
- BUG/MINOR: quic: Possible wrong PTO computing
- BUG/MINOR: tcpcheck: Be able to expect an empty response
- BUG/MEDIUM: stconn: Add a missing return statement in sc_app_shutr()
- BUG/MINOR: stream: Fix test on channels flags to set clientfin/serverfin touts
- MINOR: applet: Uninline appctx_free()
- MEDIUM: applet/trace: Register a new trace source with its events
- CLEANUP: stconn: Remove remaining debug messages
- BUG/MEDIUM: channel: Improve reports for shut in co_getblk()
- BUG/MEDIUM: dns: Properly handle error when a response consumed
- MINOR: stconn: Remove unecessary test on SE_FL_EOS before receiving data
- MINOR: stconn/channel: Move CF_READ_DONTWAIT into the SC and rename it
- MINOR: stconn/channel: Move CF_SEND_DONTWAIT into the SC and rename it
- MINOR: stconn/channel: Move CF_NEVER_WAIT into the SC and rename it
- MINOR: stconn/channel: Move CF_EXPECT_MORE into the SC and rename it
- MINOR: mux-pt: Report end-of-input with the end-of-stream after a read
- BUG/MINOR: mux-h1: Properly report EOI/ERROR on read0 in h1_rcv_pipe()
- CLEANUP: mux-h1/mux-pt: Remove useless test on SE_FL_SHR/SE_FL_SHW flags
- MINOR: mux-h1: Report an error to the SE descriptor on truncated message
- MINOR: stconn: Always ack EOS at the end of sc_conn_recv()
- MINOR: stconn/applet: Handle EOI in the applet .wake callback function
- MINOR: applet: No longer set EOI on the SC
- MINOR: stconn/applet: Handle EOS in the applet .wake callback function
- MEDIUM: cache: Use the sedesc to report and detect end of processing
- MEDIUM: cli: Use the sedesc to report and detect end of processing
- MINOR: dns: Remove the test on the opposite SC state to send requests
- MEDIUM: dns: Use the sedesc to report and detect end of processing
- MEDIUM: spoe: Use the sedesc to report and detect end of processing
- MEDIUM: hlua/applet: Use the sedesc to report and detect end of processing
- MEDIUM: log: Use the sedesc to report and detect end of processing
- MEDIUM: peers: Use the sedesc to report and detect end of processing
- MINOR: sink: Remove the tests on the opposite SC state to process messages
- MEDIUM: sink: Use the sedesc to report and detect end of processing
- MEDIUM: stats: Use the sedesc to report and detect end of processing
- MEDIUM: promex: Use the sedesc to report and detect end of processing
- MEDIUM: http_client: Use the sedesc to report and detect end of processing
- MINOR: stconn/channel: Move CF_EOI into the SC and rename it
- MEDIUM: tree-wide: Move flags about shut from the channel to the SC
- MINOR: tree-wide: Simplifiy some tests on SHUT flags by accessing SCs directly
- MINOR: stconn/applet: Add BUG_ON_HOT() to be sure SE_FL_EOS is never set alone
- MINOR: server: add SRV_F_DELETED flag
- BUG/MINOR: server/del: fix srv->next pointer consistency
- BUG/MINOR: stats: properly handle server stats dumping resumption
- BUG/MINOR: sink: free forward_px on deinit()
- BUG/MINOR: log: free log forward proxies on deinit()
- MINOR: server: always call ssl->destroy_srv when available
- MINOR: server: correctly free servers on deinit()
- BUG/MINOR: hlua: hook yield does not behave as expected
- MINOR: hlua: properly handle hlua_process_task HLUA_E_ETMOUT
- BUG/MINOR: hlua: enforce proper running context for register_x functions
- MINOR: hlua: Fix two functions that return nothing useful
- MEDIUM: hlua: Dynamic list of frontend/backend in Lua
- MINOR: hlua_fcn: alternative to old proxy and server attributes
- MEDIUM: hlua_fcn: dynamic server iteration and indexing
- MEDIUM: hlua_fcn/api: remove some old server and proxy attributes
- CLEANUP: hlua: fix conflicting comment in hlua_ctx_destroy()
- MINOR: hlua: add simple hlua reference handling API
- MINOR: hlua: fix return type for hlua_checkfunction() and hlua_checktable()
- BUG/MINOR: hlua: fix reference leak in core.register_task()
- BUG/MINOR: hlua: fix reference leak in hlua_post_init_state()
- BUG/MINOR: hlua: prevent function and table reference leaks on errors
- CLEANUP: hlua: use hlua_ref() instead of luaL_ref()
- CLEANUP: hlua: use hlua_pushref() instead of lua_rawgeti()
- CLEANUP: hlua: use hlua_unref() instead of luaL_unref()
- MINOR: hlua: simplify lua locking
- BUG/MEDIUM: hlua: prevent deadlocks with main lua lock
- MINOR: hlua_fcn: add server->get_rid() method
- MINOR: hlua: support for optional arguments to core.register_task()
- DOC: lua: silence "literal block ends without a blank line" Sphinx warnings
- DOC: lua: silence "Unexpected indentation" Sphinx warnings
- BUG/MINOR: event_hdl: fix rid storage type
- BUG/MINOR: event_hdl: make event_hdl_subscribe thread-safe
- MINOR: event_hdl: global sublist management clarification
- BUG/MEDIUM: event_hdl: clean soft-stop handling
- BUG/MEDIUM: event_hdl: fix async data refcount issue
- MINOR: event_hdl: normal tasks support for advanced async mode
- MINOR: event_hdl: add event_hdl_async_equeue_isempty() function
- MINOR: event_hdl: add event_hdl_async_equeue_size() function
- MINOR: event_hdl: pause/resume for subscriptions
- MINOR: proxy: add findserver_unique_id() and findserver_unique_name()
- MEDIUM: hlua/event_hdl: initial support for event handlers
- MINOR: hlua/event_hdl: per-server event subscription
- EXAMPLES: add basic event_hdl lua example script
- MINOR: http-ana: Add a HTTP_MSGF flag to state the Expect header was checked
- BUG/MINOR: http-ana: Don't switch message to DATA when waiting for payload
- BUG/MINOR: quic: Possible crashes in qc_idle_timer_task()
- MINOR: quic: derive first DCID from client ODCID
- MINOR: quic: remove ODCID dedicated tree
- MINOR: quic: remove address concatenation to ODCID
- BUG/MINOR: mworker: unset more internal variables from program section
- BUG/MINOR: errors: invalid use of memprintf in startup_logs_init()
- MINOR: applet: Use unsafe version to get stream from SC in the trace function
- BUG/MUNOR: http-ana: Use an unsigned integer for http_msg flags
- MINOR: compression: Make compression offload a flag
- MINOR: compression: Prepare compression code for request compression
- MINOR: compression: Store algo and type for both request and response
- MINOR: compression: Count separately request and response compression
- MEDIUM: compression: Make it so we can compress requests as well.
- BUG/MINOR: lua: remove incorrect usage of strncat()
- CLEANUP: tcpcheck: remove the only occurrence of sprintf() in the code
- CLEANUP: ocsp: do no use strpcy() to copy a path!
- CLEANUP: tree-wide: remove strpcy() from constant strings
- CLEANUP: opentracing: remove the last two occurrences of strncat()
- BUILD: compiler: fix __equals_1() on older compilers
- MINOR: compiler: define a __attribute__warning() macro
- BUILD: bug.h: add a warning in the base API when unsafe functions are used
- BUG/MEDIUM: listeners: Use the right parameters for strlcpy2().
advanced async mode (EVENT_HDL_ASYNC_TASK) provided full support for
custom tasklets registration.
Due to the similarities between tasks and tasklets, it may be useful
to use the advanced mode with an existing task (not a tasklet).
While the API did not explicitly disallow this usage, things would
get bad if we try to wakeup a task using tasklet_wakeup() for notifying
the task about new events.
To make the API support both custom tasks and tasklets, we use the
TASK_IS_TASKLET() macro to call the proper waking function depending
on the task's type:
- For tasklets: we use tasklet_wakeup()
- For tasks: we use task_wakeup()
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
The same change was already performed for the cli. The stats applet and the
prometheus exporter are also concerned. Both use the stats API and rely on
pool functions to get total pool usage in bytes. pool_total_allocated() and
pool_total_used() must return 64 bits unsigned integer to avoid any wrapping
around 4G.
This may be backported to all versions.
Till now it was only possible to change the thread local hot cache size
at build time using CONFIG_HAP_POOL_CACHE_SIZE. But along benchmarks it
was sometimes noticed a huge contention in the lower level memory
allocators indicating that larger caches could be beneficial, especially
on machines with large L2 CPUs.
Given that the checks against this value was no longer on a hot path
anymore, there was no reason for continuing to force it to be tuned at
build time. So this patch allows to set it by tune.memory-hot-size.
It's worth noting that during the boot phase the value remains zero so
that it's possible to know if the value was set or not, which opens the
possibility that we try to automatically adjust it based on the per-cpu
L2 cache size or the use of certain protocols (none of this is done yet).
Since the massive pools cleanup that happened in 2.6, the pools
architecture was made quite more hierarchical and many alternate code
blocks could be moved to runtime flags set by -dM. One of them had not
been converted by then, DEBUG_UAF. It's not much more difficult actually,
since it only acts on a pair of functions indirection on the slow path
(OS-level allocator) and a default setting for the cache activation.
This patch adds the "uaf" setting to the options permitted in -dM so
that it now becomes possible to set or unset UAF at boot time without
recompiling. This is particularly convenient, because every 3 months on
average, developers ask a user to recompile haproxy with DEBUG_UAF to
understand a bug. Now it will not be needed anymore, instead the user
will only have to disable pools and enable uaf using -dMuaf. Note that
-dMuaf only disables previously enabled pools, but it remains possible
to re-enable caching by specifying the cache after, like -dMuaf,cache.
A few tests with this mode show that it can be an interesting combination
which catches significantly less UAF but will do so with much less
overhead, so it might be compatible with some high-traffic deployments.
The change is very small and isolated. It could be helpful to backport
this at least to 2.7 once confirmed not to cause build issues on exotic
systems, and even to 2.6 a bit later as this has proven to be useful
over time, and could be even more if it did not require a rebuild. If
a backport is desired, the following patches are needed as well:
CLEANUP: pools: move the write before free to the uaf-only function
CLEANUP: pool: only include pool-os from pool.c not pool.h
REORG: pool: move all the OS specific code to pool-os.h
CLEANUP: pools: get rid of CONFIG_HAP_POOLS
DEBUG: pool: show a few examples in -dMhelp
This is an initial work for the dedicated
event handler API internal documentation.
The file is located at doc/internals/api/event_hdl.txt
event_hdl feature has been introduced with:
MINOR: event_hdl: add event handler base api
Let's keep these notes as references for later use. Polling on connect()
can sometimes return a few unexpected state combinations that such tests
illustrate. They can serve as reminders for special error handling.
The "sequence" and "entities" diagrams have become so much outdated that
they are at best confusing, but more generally wrong. Let's simply remove
them.
The "layers" mini-doc shows how streams, stconn, sedesc, conns, applets
and muxes interact, with field names, pointers and invariants. It should
be completed but already provides a quick overview about what can be
guaranteed at any step and at different layers.
The stream connector replaced the conn_stream and the sc_conn_io_cb()
function appeared. There's no place there to mention the endpoint
descriptor, but a separate diagram showing the relation between stream
and endpoint via the connector would be nice.
This adds a call to function <fct> to the list of functions to be called at
the step just before the configuration validity checks. This is useful when you
need to create things like it would have been done during the configuration
parsing and where the initialization should continue in the configuration
check.
It could be used for example to generate a proxy with multiple servers using
the configuration parser itself. At this step the trash buffers are allocated.
Threads are not yet started so no protection is required. The function is
expected to return non-zero on success, or zero on failure. A failure will make
the process emit a succinct error message and immediately exit.
The STG_REGISTER init level is used to register known keywords and
protocol stacks. It must be called earlier because some of the init
code already relies on it to be known. For example, "haproxy -vv"
for now is constrained to start very late only because of this.
This patch moves it between STG_LOCK and STG_ALLOC, which is fine as
it's used for static registration.
The poisonning performed on pool_free() used to help a little bit with
use-after-free detection, but usually did more harm than good in that
it was never possible to perform post-mortem analysis on released
objects once poisonning was enabled on allocation. Now that there is
a dedicated DEBUG_POOL_INTEGRITY, let's get rid of this annoyance
which is not even documented in the management manual.
This new option, when set, will cause the callers of pool_alloc() and
pool_free() to be recorded into an extra area in the pool that is expected
to be helpful for later inspection (e.g. in core dumps). For example it
may help figure that an object was released to a pool with some sub-fields
not yet released or that a use-after-free happened after releasing it,
with an immediate indication about the exact line of code that released
it (possibly an error path).
This only works with the per-thread cache, and even objects refilled from
the shared pool directly into the thread-local cache will have a NULL
there. That's not an issue since these objects have not yet been freed.
It's worth noting that pool_alloc_nocache() continues not to set any
caller pointer (e.g. when the cache is empty) because that would require
a possibly undesirable API change.
The extra cost is minimal (one pointer per object) and this completes
well with DEBUG_POOL_INTEGRITY.
When enabled, objects picked from the cache are checked for corruption
by comparing their contents against a pattern that was placed when they
were inserted into the cache. Objects are also allocated in the reverse
order, from the oldest one to the most recent, so as to maximize the
ability to detect such a corruption. The goal is to detect writes after
free (or possibly hardware memory corruptions). Contrary to DEBUG_UAF
this cannot detect reads after free, but may possibly detect later
corruptions and will not consume extra memory. The CPU usage will
increase a bit due to the cost of filling/checking the area and for the
preference for cold cache instead of hot cache, though not as much as
with DEBUG_UAF. This option is meant to be usable in production.
The purpose here is to explain how memory pools work, what their
architecture is depending on the build options (4 possible combinations),
and how the various build options affect their behavior.
Two pool-specific macros that were previously documented in initcalls
were moved to pools.txt.
Another non-trivial part that is often needed. Exported functions
and flags available to applications were documented as well as some
restrictions and falltraps.
This one was missing. It should be easier to use now. It is obvious that
some functions are missing, and it looks like ist2str() and istpad() are
exactly the same.
Released version 2.5-dev13 with the following main changes :
- SCRIPTS: git-show-backports: re-enable file-based filtering
- MINOR: jwt: Make invalid static JWT algorithms an error in `jwt_verify` converter
- MINOR: mux-h2: add trace on extended connect usage
- BUG/MEDIUM: mux-h2: reject upgrade if no RFC8441 support
- MINOR: stream/mux: implement websocket stream flag
- MINOR: connection: implement function to update ALPN
- MINOR: connection: add alternative mux_ops param for conn_install_mux_be
- MEDIUM: server/backend: implement websocket protocol selection
- MINOR: server: add ws keyword
- BUG/MINOR: resolvers: fix sent messages were counted twice
- BUG/MINOR: resolvers: throw log message if trash not large enough for query
- MINOR: resolvers/dns: split dns and resolver counters in dns_counter struct
- MEDIUM: resolvers: rename dns extra counters to resolvers extra counters
- BUG/MINOR: jwt: Fix jwt_parse_alg incorrectly returning JWS_ALG_NONE
- DOC: add QUIC instruction in INSTALL
- CLEANUP: halog: Remove dead stores
- DEV: coccinelle: Add ha_free.cocci
- CLEANUP: Apply ha_free.cocci
- DEV: coccinelle: Add rule to use `istnext()` where possible
- CLEANUP: Apply ist.cocci
- REGTESTS: Use `feature cmd` for 2.5+ tests (2)
- DOC: internals: move some API definitions to an "api" subdirectory
- MINOR: quic: Allocate listener RX buffers
- CLEANUP: quic: Remove useless code
- MINOR: quic: Enhance the listener RX buffering part
- MINOR: quic: Remove a useless lock for CRYPTO frames
- MINOR: quic: Use QUIC_LOCK QUIC specific lock label.
- MINOR: backend: Get client dst address to set the server's one only if needful
- MINOR: compression: Warn for 'compression offload' in defaults sections
- MEDIUM: connection: rename fc_conn_err and bc_conn_err to fc_err and bc_err
- DOC: configuration: move the default log formats to their own section
- MINOR: ssl: make the ssl_fc_sni() sample-fetch function always available
- MEDIUM: log: add the client's SNI to the default HTTPS log format
- DOC: config: add an example of reasonably complete error-log-format
- DOC: config: move error-log-format before custom log format
It's not always easy to figure that there are some docs on internal API
stuff, let's move them to their own directory. There's a diagram for
lists that could be placed there but instead would deserve a greppable
description for quick lookups, so it was not moved there.
Right now we're using a DWCAS to atomically set the running_mask while
being constrained by the thread_mask. This DWCAS is annoying because we
may seriously need it later when adding support for thread groups, for
checking that the running_mask applies to the correct group.
It turns out that the DWCAS is not strictly necessary because we never
need it to set the thread_mask based on the running_mask, only the other
way around. And in fact, the running_mask is always cleared alone, and
the thread_mask is changed alone as well. The running_mask is only
relevant to indicate a takeover when the thread_mask matches it. Any
bit set in running and not present in thread_mask indicates a transition
in progress.
As such, it is possible to re-arrange this by using a regular CAS around a
consistency check between running_mask and thread_mask in fd_update_events
and by making a CAS on running_mask then an atomic store on the thread_mask
in fd_takeover(). The only other case is fd_delete() but that one already
sets the running_mask before clearing the thread_mask, which is compatible
with the consistency check above.
This change has happily survived 10 billion takeovers on a 16-thread
machine at 800k requests/s.
The fd-migration doc was updated to reflect this change.
This explains the traps to avoid and the sequence that leads to
consistent use of an FD known by multiple threads at once. This
was co-authored with Olivier.
Released version 2.4-dev19 with the following main changes :
- BUG/MINOR: hlua: Don't rely on top of the stack when using Lua buffers
- BUG/MEDIUM: cli: prevent memory leak on write errors
- BUG/MINOR: ssl/cli: fix a lock leak when no memory available
- MINOR: debug: add a new "debug dev sym" command in expert mode
- MINOR: pools/debug: slightly relax DEBUG_DONT_SHARE_POOLS
- CI: Github Actions: switch to LibreSSL-3.3.3
- MINOR: srv: close all idle connections on shutdown
- MINOR: connection: move session_list member in a union
- MEDIUM: mux_h1: release idling frontend conns on soft-stop
- MEDIUM: connection: close front idling connection on soft-stop
- MINOR: tools: add functions to retrieve the address of a symbol
- CLEANUP: activity: mark the profiling and task_profiling_mask __read_mostly
- MINOR: activity: add a "memory" entry to "profiling"
- MINOR: activity: declare the storage for memory usage statistics
- MEDIUM: activity: collect memory allocator statistics with USE_MEMORY_PROFILING
- MINOR: activity: clean up the show profiling io_handler a little bit
- MINOR: activity: make "show profiling" support a few arguments
- MINOR: activity: make "show profiling" also dump the memoery usage
- MINOR: activity: add the profiling.memory global setting
- BUILD: makefile: add new option USE_MEMORY_PROFILING
- MINOR: channel: Rely on HTX version if appropriate in channel_may_recv()
- BUG/MINOR: stream-int: Don't block reads in si_update_rx() if chn may receive
- MINOR: conn-stream: Force mux to wait for read events if abortonclose is set
- MEDIUM: mux-h1: Don't block reads when waiting for the other side
- BUG/MEDIUM: mux-h1: Properly report client close if abortonclose option is set
- REGTESTS: Add script to test abortonclose option
- MINOR: mux-h1: clean up conditions to enabled and disabled splicing
- MINOR: mux-h1: Subscribe for sends if output buffer is not empty in h1_snd_pipe
- MINOR: mux-h1: Always subscribe for reads when splicing is disabled
- MEDIUM: mux-h1: Wake H1 stream when both sides a synchronized
- CLEANUP: mux-h1: rename WAIT_INPUT/WAIT_OUTPUT flags
- MINOR: mux-h1: Manage processing blocking flags on the H1 stream
- BUG/MINOR: stream: Decrement server current session counter on L7 retry
- BUG/MINOR: config: fix uninitialized initial state in ".if" block evaluator
- BUG/MINOR: config: add a missing "ELIF_TAKE" test for ".elif" condition evaluator
- BUG/MINOR: config: .if/.elif should also accept negative integers
- MINOR: config: centralize the ".if"/".elif" condition parser and evaluator
- MINOR: config: keep up-to-date current file/line/section in the global struct
- MINOR: config: support some pseudo-variables for file/line/section
- BUILD: activity: do not include malloc.h
- MINOR: arg: improve the error message on missing closing parenthesis
- MINOR: global: export the build features string list
- MINOR: global: add version comparison functions
- MINOR: config: improve .if condition error reporting
- MINOR: config: make cfg_eval_condition() support predicates with arguments
- MINOR: config: add predicate "defined()" to conditional expression blocks
- MINOR: config: add predicates "streq()" and "strneq()" to conditional expressions
- MINOR: config: add predicate "feature" to detect certain built-in features
- MINOR: config: add predicates "version_atleast" and "version_before" to cond blocks
- BUG/MINOR: activity: use the new pointer to calculate the new size in realloc()
- BUG/MINOR: stream: properly clear the previous error mask on L7 retries
- MEDIUM: log: slightly refine the output format of alerts/warnings/etc
- MINOR: config: add a new message directive: .diag
- CLEANUP: cli/tree-wide: properly re-align the CLI commands' help messages
- BUG/MINOR: stream: Reset stream final state and si error type on L7 retry
- BUG/MINOR: checks: Handle synchronous connect when a tcpcheck is started
- BUG/MINOR: checks: Reschedule check on observe mode only if fastinter is set
- MINOR: global: define tainted flag
- MINOR: cfgparse: add a new field flags in cfg_keyword
- MINOR: cfgparse: implement experimental config keywords
- MINOR: action: replace match_pfx by a keyword flags field
- MINOR: action: implement experimental actions
- MINOR: cli: set tainted when using CLI expert/experimental mode
- MINOR: stats: report tainted on show info
- MINOR: http_act: mark normalize-uri as experimental
- BUILD: fix usage of ha_alert without format string
- MINOR: proxy: define PR_CAP_LB
- BUG/MINOR: server: do not report diag for peer servers with null weight
- DOC: ssl: Extra files loading now works for backends too
- ADDONS: make addons/ discoverable by git via .gitignore
- DOC: ssl: Add information about crl-file option
- MINOR: sample: improve error reporting on missing arg to strcmp() converter
- DOC: management: mention that some fields may be emitted as floats
- MINOR: tools: implement trimming of floating point numbers
- MINOR: tools: add a float-to-ascii conversion function
- MINOR: freq_ctr: add new functions to report float measurements
- MINOR: stats: avoid excessive padding of float values with trailing zeroes
- MINOR: stats: add the HTML conversion for float types
- MINOR: stats: pass the appctx flags to stats_fill_info()
- MINOR: stats: support an optional "float" option to "show info"
- MINOR: stats: use tv_remain() to precisely compute the uptime
- MINOR: stats: report uptime and start time as floats with subsecond resolution
- MINOR: stats: make "show info" able to report rates as floats when asked
- MINOR: config: mark tune.fd.edge-triggered as experimental
- REORG: vars: move the "proc" scope variables out of the global struct
- REORG: threads: move all_thread_mask() to thread.h
- BUILD: wdt: include signal-t.h
- BUILD: auth: include missing list.h
- REORG: mworker: move proc_self from global to mworker
- BUILD: ssl: ssl_utils requires chunk.h
- BUILD: config: cfgparse-ssl.c needs tools.h
- BUILD: wurfl: wurfl.c needs tools.h
- BUILD: spoe: flt_spoe.c needs tools.h
- BUILD: promex: service-prometheus.c needs tools.h
- BUILD: resolvers: include tools.h
- BUILD: config: include tools.h in cfgparse-listen.c
- BUILD: htx: include tools.h in http_htx.c
- BUILD: proxy: include tools.h in proxy.c
- BUILD: session: include tools.h in session.c
- BUILD: cache: include tools.h in cache.c
- BUILD: sink: include tools.h in sink.c
- BUILD: connection: include tools.h in connection.c
- BUILD: server-state: include tools.h from server_state.c
- BUILD: dns: include tools.h in dns.c
- BUILD: payload: include tools.h in payload.c
- BUILD: vars: include tools.h in vars.c
- BUILD: compression: include tools.h in compression.c
- BUILD: mworker: include tools.h from mworker.c
- BUILD: queue: include tools.h from queue.c
- BUILD: udp: include tools.h from proto_udp.c
- BUILD: stick-table: include freq_ctr.h from stick_table.h
- BUILD: server: include tools.h from server.c
- BUILD: server: include missing proxy.h in server.c
- BUILD: sink: include proxy.h in sink.c
- BUILD: mworker: include proxy.h in mworker.c
- BUILD: filters: include proxy.h in filters.c
- BUILD: fcgi-app: include proxy.h in fcgi-app.c
- BUILD: connection: move list_mux_proto() to connection.c
- REORG: stick-table: uninline stktable_alloc_data_type()
- REORG: stick-table: move composite address functions to stick_table.h
- REORG: config: uninline warnifnotcap() and failifnotcap()
- BUILD: task: remove unused includes from task.c
- MINOR: task: stop including stream.h from task.c
- BUILD: connection: stop including listener-t.h
- BUILD: hlua: include proxy.h from hlua.c
- BUILD: mux-h1: include proxy.h from mux-h1.c
- BUILD: mux-fcgi: include proxy.h from mux-fcgi.c
- BUILD: listener: include proxy.h from listener.c
- BUILD: http-rules: include proxy.h from http_rules.c
- BUILD: thread: include log.h from thread.c
- BUILD: comp: include proxy.h from flt_http_comp.c
- BUILD: fd: include log.h from fd.c
- BUILD: config: do not include proxy.h nor errors.h anymore in cfgparse.h
- BUILD: makefile: reorder object files by build time
- DOC: Fix a few grammar/spelling issues and casing of HAProxy
- REGTESTS: run-regtests: match both "HAProxy" and "HA-Proxy" in the version
- MINOR: version: report "HAProxy" not "HA-Proxy" in the version output
- DOC: remove last occurrences of "HA-Proxy" syntax
- DOC: peers: fix the protocol tag name in the doc
- ADMIN: netsnmp: report "HAProxy" and not "Haproxy" in output descriptions
- MEDIUM: mailers: use "HAProxy" nor "HAproxy" in the subject of messages
- DOC: fix a few remainig cases of "Haproxy" and "HAproxy" in doc and comments
- MINOR: tools/rnd: compute the result outside of the CAS loop
- BUILD: http_fetch: address a few aliasing warnings with older compilers
- BUILD: ssl: define HAVE_CRYPTO_memcmp() based on the library version
- BUILD: errors: include stdarg in errors.h
- REGTESTS: disable inter-thread idle connection sharing on sensitive tests
- MINOR: cli: make "help" support a command in argument
- MINOR: cli: sort the output of the "help" keywords
- CLEANUP: cli/mworker: properly align the help messages
- BUILD: memprof: make the old caller pointer a const in get_prof_bin()
- BUILD: compat: include malloc_np.h for USE_MEMORY_PROFILING on FreeBSD
- CI: Github Actions: enable USE_QUIC=1 for BoringSSL builds
- BUG/MEDIUM: quic: fix null deref on error path in qc_conn_init()
- BUILD: cli: appease a null-deref warning in cli_gen_usage_msg()
Some of the Lua doc and a few places still used "Haproxy" or "HAproxy".
There was even one "HA proxy". A few of them were in an example of VTest
output, indicating that VTest ought to be fixed as well. No big deal but
better address all the remaining ones so that these inconsistencies stop
spreading around.