The same change was already performed for the cli. The stats applet and the
prometheus exporter are also concerned. Both use the stats API and rely on
pool functions to get total pool usage in bytes. pool_total_allocated() and
pool_total_used() must return 64 bits unsigned integer to avoid any wrapping
around 4G.
This may be backported to all versions.
Till now it was only possible to change the thread local hot cache size
at build time using CONFIG_HAP_POOL_CACHE_SIZE. But along benchmarks it
was sometimes noticed a huge contention in the lower level memory
allocators indicating that larger caches could be beneficial, especially
on machines with large L2 CPUs.
Given that the checks against this value was no longer on a hot path
anymore, there was no reason for continuing to force it to be tuned at
build time. So this patch allows to set it by tune.memory-hot-size.
It's worth noting that during the boot phase the value remains zero so
that it's possible to know if the value was set or not, which opens the
possibility that we try to automatically adjust it based on the per-cpu
L2 cache size or the use of certain protocols (none of this is done yet).
Since the massive pools cleanup that happened in 2.6, the pools
architecture was made quite more hierarchical and many alternate code
blocks could be moved to runtime flags set by -dM. One of them had not
been converted by then, DEBUG_UAF. It's not much more difficult actually,
since it only acts on a pair of functions indirection on the slow path
(OS-level allocator) and a default setting for the cache activation.
This patch adds the "uaf" setting to the options permitted in -dM so
that it now becomes possible to set or unset UAF at boot time without
recompiling. This is particularly convenient, because every 3 months on
average, developers ask a user to recompile haproxy with DEBUG_UAF to
understand a bug. Now it will not be needed anymore, instead the user
will only have to disable pools and enable uaf using -dMuaf. Note that
-dMuaf only disables previously enabled pools, but it remains possible
to re-enable caching by specifying the cache after, like -dMuaf,cache.
A few tests with this mode show that it can be an interesting combination
which catches significantly less UAF but will do so with much less
overhead, so it might be compatible with some high-traffic deployments.
The change is very small and isolated. It could be helpful to backport
this at least to 2.7 once confirmed not to cause build issues on exotic
systems, and even to 2.6 a bit later as this has proven to be useful
over time, and could be even more if it did not require a rebuild. If
a backport is desired, the following patches are needed as well:
CLEANUP: pools: move the write before free to the uaf-only function
CLEANUP: pool: only include pool-os from pool.c not pool.h
REORG: pool: move all the OS specific code to pool-os.h
CLEANUP: pools: get rid of CONFIG_HAP_POOLS
DEBUG: pool: show a few examples in -dMhelp
This is an initial work for the dedicated
event handler API internal documentation.
The file is located at doc/internals/api/event_hdl.txt
event_hdl feature has been introduced with:
MINOR: event_hdl: add event handler base api
The "layers" mini-doc shows how streams, stconn, sedesc, conns, applets
and muxes interact, with field names, pointers and invariants. It should
be completed but already provides a quick overview about what can be
guaranteed at any step and at different layers.
This adds a call to function <fct> to the list of functions to be called at
the step just before the configuration validity checks. This is useful when you
need to create things like it would have been done during the configuration
parsing and where the initialization should continue in the configuration
check.
It could be used for example to generate a proxy with multiple servers using
the configuration parser itself. At this step the trash buffers are allocated.
Threads are not yet started so no protection is required. The function is
expected to return non-zero on success, or zero on failure. A failure will make
the process emit a succinct error message and immediately exit.
The STG_REGISTER init level is used to register known keywords and
protocol stacks. It must be called earlier because some of the init
code already relies on it to be known. For example, "haproxy -vv"
for now is constrained to start very late only because of this.
This patch moves it between STG_LOCK and STG_ALLOC, which is fine as
it's used for static registration.
The poisonning performed on pool_free() used to help a little bit with
use-after-free detection, but usually did more harm than good in that
it was never possible to perform post-mortem analysis on released
objects once poisonning was enabled on allocation. Now that there is
a dedicated DEBUG_POOL_INTEGRITY, let's get rid of this annoyance
which is not even documented in the management manual.
This new option, when set, will cause the callers of pool_alloc() and
pool_free() to be recorded into an extra area in the pool that is expected
to be helpful for later inspection (e.g. in core dumps). For example it
may help figure that an object was released to a pool with some sub-fields
not yet released or that a use-after-free happened after releasing it,
with an immediate indication about the exact line of code that released
it (possibly an error path).
This only works with the per-thread cache, and even objects refilled from
the shared pool directly into the thread-local cache will have a NULL
there. That's not an issue since these objects have not yet been freed.
It's worth noting that pool_alloc_nocache() continues not to set any
caller pointer (e.g. when the cache is empty) because that would require
a possibly undesirable API change.
The extra cost is minimal (one pointer per object) and this completes
well with DEBUG_POOL_INTEGRITY.
When enabled, objects picked from the cache are checked for corruption
by comparing their contents against a pattern that was placed when they
were inserted into the cache. Objects are also allocated in the reverse
order, from the oldest one to the most recent, so as to maximize the
ability to detect such a corruption. The goal is to detect writes after
free (or possibly hardware memory corruptions). Contrary to DEBUG_UAF
this cannot detect reads after free, but may possibly detect later
corruptions and will not consume extra memory. The CPU usage will
increase a bit due to the cost of filling/checking the area and for the
preference for cold cache instead of hot cache, though not as much as
with DEBUG_UAF. This option is meant to be usable in production.
The purpose here is to explain how memory pools work, what their
architecture is depending on the build options (4 possible combinations),
and how the various build options affect their behavior.
Two pool-specific macros that were previously documented in initcalls
were moved to pools.txt.
Another non-trivial part that is often needed. Exported functions
and flags available to applications were documented as well as some
restrictions and falltraps.
This one was missing. It should be easier to use now. It is obvious that
some functions are missing, and it looks like ist2str() and istpad() are
exactly the same.
Released version 2.5-dev13 with the following main changes :
- SCRIPTS: git-show-backports: re-enable file-based filtering
- MINOR: jwt: Make invalid static JWT algorithms an error in `jwt_verify` converter
- MINOR: mux-h2: add trace on extended connect usage
- BUG/MEDIUM: mux-h2: reject upgrade if no RFC8441 support
- MINOR: stream/mux: implement websocket stream flag
- MINOR: connection: implement function to update ALPN
- MINOR: connection: add alternative mux_ops param for conn_install_mux_be
- MEDIUM: server/backend: implement websocket protocol selection
- MINOR: server: add ws keyword
- BUG/MINOR: resolvers: fix sent messages were counted twice
- BUG/MINOR: resolvers: throw log message if trash not large enough for query
- MINOR: resolvers/dns: split dns and resolver counters in dns_counter struct
- MEDIUM: resolvers: rename dns extra counters to resolvers extra counters
- BUG/MINOR: jwt: Fix jwt_parse_alg incorrectly returning JWS_ALG_NONE
- DOC: add QUIC instruction in INSTALL
- CLEANUP: halog: Remove dead stores
- DEV: coccinelle: Add ha_free.cocci
- CLEANUP: Apply ha_free.cocci
- DEV: coccinelle: Add rule to use `istnext()` where possible
- CLEANUP: Apply ist.cocci
- REGTESTS: Use `feature cmd` for 2.5+ tests (2)
- DOC: internals: move some API definitions to an "api" subdirectory
- MINOR: quic: Allocate listener RX buffers
- CLEANUP: quic: Remove useless code
- MINOR: quic: Enhance the listener RX buffering part
- MINOR: quic: Remove a useless lock for CRYPTO frames
- MINOR: quic: Use QUIC_LOCK QUIC specific lock label.
- MINOR: backend: Get client dst address to set the server's one only if needful
- MINOR: compression: Warn for 'compression offload' in defaults sections
- MEDIUM: connection: rename fc_conn_err and bc_conn_err to fc_err and bc_err
- DOC: configuration: move the default log formats to their own section
- MINOR: ssl: make the ssl_fc_sni() sample-fetch function always available
- MEDIUM: log: add the client's SNI to the default HTTPS log format
- DOC: config: add an example of reasonably complete error-log-format
- DOC: config: move error-log-format before custom log format
It's not always easy to figure that there are some docs on internal API
stuff, let's move them to their own directory. There's a diagram for
lists that could be placed there but instead would deserve a greppable
description for quick lookups, so it was not moved there.