Sometimes when debugging it's convenient to be able to focus only on
certain pools. Just like we did for "show pools", let's add a filter
based on a prefix on "debug dev memstats match <prefix>".
This patch also adds a set of new global options:
- 51degrees-use-performance-graph { on | off }
- 51degrees-use-predictive-graph { on | off }
- 51degrees-drift <number>
- 51degrees-difference <number>
- 51degrees-allow-unmatched { on | off }
To build using the latest 51Degrees V4 engine with Hash algorithm, set
USE_51DEGREES_V4=1.
Other supported build options are 51DEGREES_INC, 51DEGREES_LIB and
51DEGREES_SRC which needs to be set to the directory that contains
headers and C files. For example:
make TARGET=<target> USE_51DEGREES_V4=1 51DEGREES_SRC='51D_REPO_PATH'/src
Released version 2.7.0 with the following main changes :
- MINOR: ssl: forgotten newline in error messages on ca-file
- BUG/MINOR: ssl: shut the ca-file errors emitted during httpclient init
- DOC: config: provide some configuration hints for "http-reuse"
- DOC: config: refer to section about quoting in the "add_item" converter
- DOC: halog: explain how to use -ac and -ad in the help message
- DOC: config: clarify the fact that SNI should not be used in HTTP scenarios
- DOC: config: mention that a single monitor-uri rule is supported
- DOC: config: explain how default matching method for ACL works
- DOC: config: clarify the fact that "retries" is not just for connections
- BUILD: halog: fix missing double-quote at end of help line
- DOC: config: clarify the -m dir and -m dom pattern matching methods
- MINOR: activity: report uptime in "show activity"
- REORG: activity/cli: move the "show activity" handler to activity.c
- DEV: poll: add support for epoll
- DEV: tcploop: centralize the polling code into wait_for_fd()
- DEV: tcploop: add support for POLLRDHUP when supported
- DEV: tcploop: do not report an error on POLLERR
- DEV: tcploop: add optional support for epoll
- SCRIPTS: announce-release: add a link to the data plane API
- CLEANUP: stick-table: fill alignment holes in the stktable struct
- MINOR: stick-table: store a per-table hash seed and use it
- MINOR: stick-table: show the shard number in each entry's "show table" output
- CLEANUP: ncbuf: remove ncb_blk args by value
- CLEANUP: ncbuf: inline small functions
- CLEANUP: ncbuf: use standard BUG_ON with DEBUG_STRICT
- BUG/MINOR: quic: Endless loop during retransmissions
- MINOR: mux-h2: add the expire task and its expiration date in "show fd"
- BUG/MINOR: peers: always initialize the stksess shard value
- REGTESTS: fix peers-related regtests regarding "show table"
- BUG/MEDIUM: mux-h1: Close client H1C on EOS when there is no output data
- MINOR: stick-table: change the API of the function used to calculate the shard
- CLEANUP: peers: factor out the key len calculation in received updates
- BUG/MINOR: peers: always update the stksess shard number on incoming updates
- CLEANUP: assorted typo fixes in the code and comments
- MINOR: mux-h1: add the expire task and its expiration date in "show fd"
- MINOR: debug: improve error handling on the memstats command parser
- BUILD: quic: allow build with USE_QUIC and USE_OPENSSL_WOLFSSL
- CLEANUP: anon: clarify the help message on "debug dev hash"
- MINOR: debug: relax access restrictions on "debug dev hash" and "memstats"
- SCRIPTS: run-regtests: add a version check
- MINOR: version: mention that it's stable now
It happens from time to time while switching between branches and/or
updating after someone else's changes that regtests are run by accident
on the wrong binary, typically the one the tests were run on during
development and not with the latest adaptations. And obviously it's
when this happens that we break the CI. There are various causes to
this but they all come down to humans context-switching a lot, and
there's no real fix for this that doesn't add even more burden hence
increases the overhead. However we can help the human detect such
mistakes very easily.
This change here will compare the version of the haproxy binary to
the version of the tree, and will emit a warning in the regtest output
if they do not match, regardless of the outcome of the test. This is
sufficient in case of failures because these are quickly glanced over,
and is sufficient as well in case of accidental success because the
warning is the last message. E.g:
########################## Starting vtest ##########################
Testing with haproxy version: 2.7-dev10-cfcdbc-38
Warning: version does not match the current tree (2.7-dev10-111c78-39)
0 tests failed, 0 tests skipped, 182 tests passed
This should not affect builds made out of a git tree because the version
is retrieved using "make version", or exactly the same way as it's passd
to the haproxy binary. We just need to know what "make" command to run,
so $MAKE is used primarily, falling back to "make" then to "gmake". In
case all of these fail, we just ignore the version check. This should be
sufficient to catch human mistakes without affecting the CI.
These two have absolutely zero impact on the process and do not need to
be restricted to the expert mode. The first one calculates a string hash
that can be used by anyone when checking a dump; the second one may be
used by anyone tracking a memory leak, and is cumbersome to use due to
the "expert-mode on" that needs to be prepended. In addition this gives
bad habits to users and needlessly taints the process. So let's drop
this restriction for these two commands.
This command is used to hash a section name using the current anon key,
it was brought in 2.7 by commit 54966dffd ("MINOR: anon: store the
anonymizing key in the CLI's appctx"). However the help message only
says "return msg hashed" which is misleading because if anon mode is
not enabled, it returns the string as-is. Let's just mention this
condition in the help message, and also fix the alphabetical ordering
and alignment on the line.
WolfSSL does not implement the TLS1_3_CK_AES_128_CCM_SHA256 cipher as
well as the SSL_ERROR_WANT_ASYNC, SSL_ERROR_WANT_ASYNC_JOB and
SSL_ERROR_WANT_CLIENT_HELLO_CB error codes.
This patch disables them for WolfSSL.
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
"debug dev memstats" supports various options but silently ignores the
unknown ones. Let's make sure it returns indications about what it
expects, as the help message is quite limited otherwise.
Just like for the H2 multiplexer, info about the H1 connection task is now
displayed in "show fd" output. The task pointer is displayed and, if not
null, its expiration date.
It may be useful to backport it.
If shards are in use, we must fill the shard number on incoming updates,
otherwise some entries are assigned shard number zero, and may be broadcast
everywhere once updated, instead of being sent only to the peers having the
same shard number.
This fixes commit 36d156564 ("MINOR: peers: Support for peer shards"). No
backport is needed.
In peer_treat_updatemsg(), the lower layers of the stick-table code are
reimplemented, and the key length is never really known for an entry
being processed, it depends on the type being parsed and the moment
where it's done. This makes it quite difficult to stuff some shard
number calculation there.
This patch adds a keylen local variable that is always set to the length
of the current key depending on its type. It takes this opportunity for
reducing redudant expressions involving this length and always using the
new variable instead, limiting the risk of errors. Arguably that code
would have been way simpler by creating a dummy stktable_key and passing
it to stksess_new() as done anywhere else, but let's not change all that
a few days before the release.
The function used to calculate the shard number currently requires a
stktable_key on input for this. Unfortunately, it happens that peers
currently miss this calculation and they do not provide stktable_key
at all, instead they're open-coding all the low-level stick-table work
(hence why it's missing). Thus we'll need to be able to calculate the
shard number in keys coming from peers as well but the current API does
not make it possible.
This commit addresses this by inverting the order where the length and
the shard number are used. Now the low-level function is independent on
stksess and stktable_key, it takes a table, pointer and length and does
all the job. The upper function takes care of the type and key to get
the its length, and is for use only from stick-table code.
This doesn't change anything except that the low-level one will be usable
from outside (hence why it's exported now).
If the client closes the connection while there is no pending outgoing data,
the H1 connection must be released. However, it was switched to CLOSING
state instead. Thus the client connection was closed on client timeout.
It is side effect of the commif d1b573059a ("MINOR: mux-h1: Avoid useless
call to h1_send() if no error is sent"). Before, the extra call to h1_send()
was able to fix the H1C state.
To fix the bug and make switch to close state (CLOSING or CLOSED) less
errorprone, h1_close() helper function is systematically used.
It is a 2.7-specific bug. No backport needed.
When I added commit 16b282f4b ("MINOR: stick-table: show the shard
number in each entry's "show table" output"), I don't know how but
I managed to mess up my reg tests since everything worked fine,
most likely by running it on a binary built in the wrong branch.
Several reg tests include some table outputs that were upset by the
new "shard=" field. This test added them and revealed at the same
time that entries learned over peers are not properly initialized,
which will be fixed in a future series of fixes.
This commit requires previous fix "BUG/MINOR: peers: always
initialize the stksess shard value" so as not to trip on entries
learned from peers.
We need to initialize the shard value in __stksess_init() because there is
not necessarily a key to make it happen later, resulting in an uninitialized
shard value appearing in the entry, typically when entries are learned from
peers. This fixes commit 36d156564 ("MINOR: peers: Support for peer shards"),
no backport is needed.
Note however that it is not sufficient to completely fix the peers code, the
shard value remains zero because the setting of the key was open-coded in
the peers code and these parts were not identified when adding support for
shards.
Some issues such as #1929 seem to involve a task without timeout but we
can't find the condition to reproduce this in the code. However, not having
this info in the output doesn't help, so this patch adds the task pointer
and its timeout (when the task is non-null). It may be useful to backport
it.
qc_dgrams_retransmit() could reuse the same local list and could splice it two
times to the packet number space list of frame to be send/resend. This creates a
loop in this list and makes qc_build_frms() possibly endlessly loop when trying
to build frames from the packet number space list of frames. Then haproxy aborts.
This issue could be easily reproduced patching qc_build_frms() function to set <dlen>
variable value to 0 after having built at least 10 CRYPTO frames and using ngtcp2
as client with 30% packet loss in both direction.
Thank you to @gabrieltz for having reported this issue in GH #1903.
Must be backported to 2.6.
ncbuf can be compiled for haproxy or standalone to run unit test suite.
For the latest mode, BUG_ON() macro has been re-implemented in a simple
version.
The inclusion of the default or the redefined macro relied on DEBUG_DEV.
Change this to now rely on DEBUG_STRICT as this is activated for the
default build.
This change is safe as only BUG_ON_HOT() macro is used in ncbuf code,
which is activated only with the default value DEBUG_STRICT=2.
This should be backported up to 2.6.
ncbuf API relies on lot of small functions. Mark these functions as
inline to reduce call invocations and facilitate compiler optimizations
to reduce code size.
This should be backported up to 2.6.
ncb_blk structure is used to represent a block of data or a gap in a
non-contiguous buffer. This is used in several functions for ncbuf
implementation. Before this patch, ncb_blk was passed by value, which is
sub-optimal. Replace this by const pointer arguments.
This has the side-effect of suppressing a compiler warning reported in
older GCC version :
CC src/http_conv.o
src/ncbuf.c: In function 'ncb_blk_next':
src/ncbuf.c:170: warning: 'blk.end' may be used uninitialized in this function
This should be backported up to 2.6.
Stick-tables support sharding to multiple peers but there was no way to
know to what shard an entry was going to be sent. Let's display this in
the "show table" output to ease debugging.
Instead of using memcpy() to concatenate the table's name to the key when
allocating an stksess, let's compute once for all a per-table seed at boot
time and use it to calculate the key's hash. This saves two memcpy() and
the usage of a chunk, it's always nice in a fast path.
When tested under extreme conditions with a 80-byte long table name, it
showed a 1% performance increase.
Since Marko announced at HAProxyConf 2022 that the data plane API is
mostly complete and will now follow the same release cycle as haproxy
starting with 2.7, it's probably the right moment to encourage users
to start trying it so that we can hope to migrate all the painful
discovery stuff there in a not too distant future.
Let's just point to the latest release for now. We'll see in the future
if we need to adapt the link depending on the branch.
When -e is passed, epoll is used instead of poll. The FD is added
then removed around the call to epoll_wait() so that we don't need
to track it. The only purpose is to compare events reported by each
syscall.
There are multiple call places for poll(), let's first centralize them
to make it easier to enhance it. All callers now use the wait_for_fd()
function which was extended to take a timeout and which can return the
indication that an error was seen.
When called with -e, epoll is used instead of poll. The poller does
very little in this code (just checks for any event without waiting) but
that's sufficient to see return values.
Initially the code was placed into cli.c to keep activity.c small and
independent of the cli stuff, but that's no longer the case anyway and
keeping that code over there makes it harder to find. Let's move it to
its more natural place now.
It happened a few times that it was difficult to figure if a counter was
normal or not in "show activity" based on the uptime. Let's just emit the
uptime value along with the date.
There's regularly some confusion about them (do they match at the
beginning, end ? do they support multiple components etc). Tim
suggested to improve the doc in issue #61, it's never too late, so
let's do it now wih a few examples.
This will tell me to change the line format after testing :-(
This was introduced with commit 286199c24 ("DOC: halog: explain how to
use -ac and -ad in the help message"), no backport is needed unless it's
backported as well.
In issue #412 it was rightfully reported that the wording in "retries"
still exclusively speaks about connection attempts, while since L7
retries with "retry-on" it's no longer a limitation. Let's update the
text.
In issue #698, it's made apparent that the default matching method for
ACL keywords can be confusing when a converter is applied, because
depending on the converters used, users may think that the default
matching method from the sample fetch name might apply to the whole
expression. It's easier to understand that this doesn't make sense
when thinking about converters turning to completely different types
(e.g. hdr_beg(host),do_resolve() returns an IP, thus it's obvious
that _beg makes no sense at all). This patch states this in the
doc to avoid future confusion.
It was reported in issue #1059 that when multiple monitor-uri rules are
specified, only the last one is used. While this was done on purpose
since a single URI is used, it was not clearly mentioned in the doc,
possibly leading to confusion or wasted time trying to establish a
working setup. Let's clarify this point.
As reported by Tim in issue #1373 some warnings are deserved to explain
why using the frontend SNI for routing or connecting to a server is
usually not correct, especially since it can be tempting and used to
make sense in pure TCP scenarios.
Tim reported in issue #1435 that halog options -ac/-ad were poorly
documented. They're indeed used to spot infrastructure outages between
the clients and haproxy by detecting abnormal periods of silence followed
by bursts, either affecting the network itself, or also a single machine
(e.g. swapping on an edge client or proxy can cause such patterns).
As requested by Nick in issue #1719, let's add a reference to the section
about quoting there, since add_item() will often be used with commas and
it's easy to mess up.
This adds some configuration hints regarding various workloads that do
not manage to achieve high reuse rates due to too low a global maxconn
or thread groups.
This fixes github issue #1472.
With an OpenSSL library which use the wrong OPENSSLDIR, HAProxy tries to
load the OPENSSLDIR/certs/ into @system-ca, but emits a warning when it
can't.
This patch fixes the issue by allowing to shut the error when the SSL
configuration for the httpclient is not explicit.
Must be backported in 2.6.
Released version 2.7-dev10 with the following main changes :
- MEDIUM: tcp-act: add parameter rst-ttl to silent-drop
- BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets
- MINOR: cli: print parsed command when not found
- BUG/MAJOR: quic: Crash after discarding packet number spaces
- CLEANUP: quic: replace "choosen" with "chosen" all over the code
- MINOR: cli/pools: store "show pools" results into a temporary array
- MINOR: cli/pools: add sorting capabilities to "show pools"
- MINOR: cli/pools: add pool name filtering capability to "show pools"
- DOC: configuration: fix quic prefix typo
- MINOR: quic: report error if force-retry without cluster-secret
- MINOR: global: generate random cluster.secret if not defined
- BUG/MINOR: resolvers: do not run the timeout task when there's no resolution
- BUG/MINOR: server/idle: at least use atomic stores when updating max_used_conns
- MINOR: server/idle: make the next_takeover index per-tgroup
- BUILD: listener: fix build warning on global_listener_rwlock without threads
- BUG/MAJOR: sched: protect task during removal from wait queue
- BUILD: sched: fix build with DEBUG_THREAD with the previous commit
- DOC: quic: add note on performance issue with listener contention
- BUG/MINOR: cfgparse-listen: fix ebpt_next_dup pointer dereference on proxy "from" inheritance
- BUG/MINOR: log: fix parse_log_message rfc5424 size check
- CLEANUP: arg: remove extra check in make_arg_list arg escaping
- CLEANUP: tools: extra check in utoa_pad
- MINOR: h1: Consider empty port as invalid in authority for CONNECT
- MINOR: http: Considere empty ports as valid default ports
- BUG/MINOR: http-htx: Normalized absolute URIs with an empty port
- BUG/MINOR: h1: Replace authority validation to conform RFC3986
- REG-TESTS: http: Add more tests about authority/host matching
- BUG/MINOR: http-htx: Don't consider an URI as normalized after a set-uri action
- BUG/MEDIUM: mux-h1: Don't release H1C on timeout if there is a SC attached
- BUG/MEDIUM: mux-h1: Subscribe for reads on error on sending path
- BUILD: http-htx: Silent build error about a possible NULL start-line
- DOC: configuration.txt: add default_value for table_idle signature
- BUILD: ssl-sock: Silent error about NULL deref in ssl_sock_bind_verifycbk()
- BUG/MEDIUM: mux-h1: Remove H1C_F_WAIT_NEXT_REQ flag on a next request
- BUG/MINOR: mux-h1: Fix handling of 408-Request-Time-Out
- MINOR: mux-h1: Remove H1C_F_WAIT_NEXT_REQ in functions handling errors
- MINOR: mux-h1: Avoid useless call to h1_send() if no error is sent
- DOC: configuration.txt: fix typo in table_idle signature
- BUILD: stick-tables: fix build breakage in xxhash on older compilers
- BUILD: compiler: include compiler's definitions before ours
- BUILD: quic: global.h is needed in cfgparse-quic
- CLEANUP: tools: do not needlessly include xxhash nor cli from tools.h
- BUILD: flags: really restrict the cases where flags are exposed
- BUILD: makefile: minor reordering of objects by build time
- BUILD: quic: silence two invalid build warnings at -O1 with gcc-6.5
- BUILD: quic: use openssl-compat.h instead of openssl/ssl.h
- MEDIUM: ssl: add minimal WolfSSL support with OpenSSL compatibility mode
- MINOR: sample: make the rand() sample fetch function use the statistical_prng
- MINOR: auth: silence null dereference warning in check_user()
- CLEANUP: peers: fix format string for status messages (int signedness)
- CLEANUP: qpack: fix format string in debugging code (int signedness)
- CLEANUP: qpack: properly use the QPACK macros not HPACK ones in debug code
- BUG/MEDIUM: quic: fix datagram dropping on queueing failed
After reading a datagram, it is enqueud for the thread attached to the
DCID. This is done via quic_lstnr_dgram_dispatch() function. If this
step fails, we remove the datagram from the buffer of quic_receiver_buf.
This step is faulty because we use b_del() instead of b_sub(). If
quic_receiver_buf was not empty, we will remove content from another
datagram while leaving the content of the last read datagram. This
probably produces valid datagram dropping and may even result in crash.
As stated, this bug can only happen if qc_lstnr_dgram_dispatch() fails
which happen on two occaions :
* on quic_dgram allocation failure, which should be pretty rare
* on datagram labelled as invalid for QUIC protocol. This may happen
more frequently depending on the network conditions. Thus, this bug
has been labelled as a medium one.
This should be backported up to 2.6.
There were a few leftovers of DEBUG_HPACK and HPACK_SHT_SIZE instead of
their QPACK equivalent in the QPACK debug code. There's no harm anyway,
but it could lead to confusing results if the tables are not sized
equally.
In issue #1939, Ilya mentions that cppchecks warned about use of "%d" to
report the QPACK table's index that's locally stored as an unsigned int.
While technically valid, this will never cause any trouble since indexes
are always small positive values, but better use %u anyway to silence
this warning.