Commit 54832b97 ("BUILD: enable several LibreSSL hacks, including")
changed empty handshake detection in OpenSSL <= 1.0.2 and LibreSSL,
from accessing packet_length directly (not available in LibreSSL) to
calling SSL_state() instead.
However, SSL_state() appears to be fully broken in both OpenSSL and
LibreSSL.
Since LibreSSL offers no way to detect an empty handshake, let's not even try
(as is already the case for BoringSSL), and restore this functionality for
OpenSSL 1.0.2 and older by reverting to the previous behavior.
Should be backported to 2.0.
In connect_server(), if there was already a conn_stream associated with the
stream, but we can't reuse it because the target is different (because a
previous connection attempt failed and, using redispatch, we switched servers),
don't forget to set srv_cs to NULL. Otherwise, if we end up reusing another
connection, we would consider we already have a conn_stream, and we wouldn't
create a new one, so we'd have a new connection but we would not be able to
use it.
This can explain frozen streams and connections stuck in CLOSE_WAIT when
using redispatch.
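As a minimal sketch of the idea, with illustrative types and names rather than
the exact HAProxy ones:

    /* Once the existing conn_stream points to another target (the
     * redispatch case), drop the reference so that a fresh conn_stream
     * gets created for the new connection.
     */
    struct conn_stream_sk { const void *target; };

    static struct conn_stream_sk *
    reuse_or_reset(struct conn_stream_sk *srv_cs, const void *new_target)
    {
        if (srv_cs && srv_cs->target != new_target)
            srv_cs = NULL; /* the fix: don't keep a stale conn_stream */
        return srv_cs;
    }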
This should be backported to 1.9 and 2.0.
In the function stream_int_notify(), when the opposite stream-interface is
blocked because there is no more room in the input buffer, it is unblocked if
the flag CF_WRITE_PARTIAL is set on this buffer. This is a way to unblock reads
on the other side because some data was sent.
But this is a problem during fast-forwarding, because only the stream is able
to remove the flag CF_WRITE_PARTIAL. So it is possible to have this flag set
because of a previous send while the input buffer of the opposite
stream-interface is now full. In that case, the opposite stream-interface will
be woken up for nothing because its input buffer is full. If the same happens
on the opposite side, we get a loop consuming all the CPU.
To fix the bug, the opposite side is now only notified, in the function
si_cs_send(), if there is some room available in its input buffer, i.e. only if
some data was sent.
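The shape of the new condition, as a rough sketch (the helper and types are
illustrative, not HAProxy's exact API):

    #include <stdbool.h>
    #include <stddef.h>

    struct channel_sk { size_t data, size; };

    /* Wake the opposite stream-interface only when the send actually made
     * progress, i.e. only when some room is now available in its input
     * buffer, instead of trusting a possibly stale CF_WRITE_PARTIAL flag.
     */
    static bool should_notify_opposite(bool data_sent,
                                       const struct channel_sk *ic)
    {
        return data_sent && ic->data < ic->size;
    }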
This patch must be backported to 2.0 and 1.9.
In the function si_cs_send(), what is done when an error occurred on the
connection or the conn_stream, or when some data was successfully sent via a
pipe or the channel's buffer, may be factorized at the end of the function. It
slightly simplifies the function.
This patch must be backported to 2.0 and 1.9 because a bugfix depends on it.
It is useless to proceed if an error already occurred. Instead, it is better to
wait for it to be caught by the stream or the connection, depending on which
one detects it first.
This patch must be backported to 2.0.
Since the commit 94b2c7 ("MEDIUM: mux-h1: refactor output processing"), the
formatting of outgoing messages is performed based on the message state and no
longer on the HTX blocks read. But the TUNNEL state was left out. So, HTTP
tunneling using the CONNECT method or switching the protocol (for instance, to
WebSocket) does not work.
This issue was reported on Github. See #131. This patch must be backported to
2.0.
In the function fas_srv_reposition(), the server's lb_tree is tested from
outside the lock. So it is possible to remove it after the test and then call
eb32_insert() in fas_queue_srv() with a NULL root pointer, which is
invalid. Moving the test into the scope of the lock fixes the bug.
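The pattern of the fix, sketched with pthread as a stand-in for HAProxy's
locking primitives:

    #include <pthread.h>

    struct server_sk { void *lb_tree; /* stand-in for the ebtree root */ };

    static pthread_mutex_t lb_lock = PTHREAD_MUTEX_INITIALIZER;

    static void fas_queue_srv_sk(struct server_sk *srv) { (void)srv; }

    static void fas_srv_reposition_sk(struct server_sk *srv)
    {
        /* test lb_tree under the same lock as the insertion, so it
         * cannot be removed between the test and eb32_insert() */
        pthread_mutex_lock(&lb_lock);
        if (srv->lb_tree)
            fas_queue_srv_sk(srv);
        pthread_mutex_unlock(&lb_lock);
    }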
This issue was reported on Github, issue #126.
This patch must be backported to 2.0, 1.9 and 1.8.
In the analyzers AN_REQ_HTTP_PROCESS_FE/BE, when a service is registered, it is
important not to interrupt the remaining processing, only the http-request
rules processing. Otherwise, the part that handles the applets installation is
skipped.
Among the several effects, if the service is registered on a frontend (not a
listen), the forwarding of the request is skipped because not all analyzers are
set on the request channel. If the service does not depend on it, the response
is still produced and forwarded to the client. But the stream is infinitely
blocked because the request is not fully consumed. This issue was reported on
Github, see #151.
So this bug is fixed thanks to the new action return code ACT_RET_DONE. Once a
service is registered, the action process_use_service() still returns
ACT_RET_STOP. But now, only the rules processing is stopped. As a side effect,
the action http_action_reject() must now return ACT_RET_DONE to really stop all
processing.
This patch must be backported to 2.0. It depends on the commit introducing the
return code ACT_RET_DONE.
This new code should now be used by actions to stop, at the same time, the
rules processing and any possible following processing. And, on its side, the
return code ACT_RET_STOP should be used to stop the rules processing only.
So concretely, for TCP rules, there is no change: ACT_RET_STOP and ACT_RET_DONE
are handled the same way. However, for HTTP rules, ACT_RET_STOP should now be
mapped to HTTP_RULE_RES_STOP and ACT_RET_DONE to HTTP_RULE_RES_DONE. This way,
an action will have the possibility to stop all processing or only the rules
processing.
Note that the changes for TCP rules are done in this commit, but the changes
for HTTP rules will be done in another one because it will fix a bug at the
same time.
This patch must be backported to 2.0 because a bugfix depends on it.
When the response buffer is full and nothing more can be inserted, it is
important not to try to insert an empty data block. Otherwise, when the
function channel_add_input() is called, the flag CF_READ_PARTIAL is set on the
response channel while nothing was read, and the stream is uselessly woken up.
Finally, we end up in a loop while the response buffer is full.
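A sketch of the guard, with a stand-in for the real channel_add_input():

    #include <stddef.h>

    struct channel_sk { unsigned int flags; };
    #define CF_READ_PARTIAL_SK 0x1

    /* stand-in: the real channel_add_input() raises CF_READ_PARTIAL,
     * which wakes the stream up */
    static void channel_add_input_sk(struct channel_sk *chn, size_t len)
    {
        (void)len;
        chn->flags |= CF_READ_PARTIAL_SK;
    }

    static void add_input_if_any(struct channel_sk *res, size_t len)
    {
        if (!len)
            return; /* the fix: an empty block must not wake the stream */
        channel_add_input_sk(res, len);
    }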
This patch must be backported to 2.0.
When deciding if we keep an idle connection in the session, check if
the number of connections currently in the session is >= the max allowed,
not >, or we'll keep an extra connection.
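In other words, as a simplified sketch (not the exact code):

    /* Keep the idle connection only while the session still has room:
     * with ">", curr == max let one extra connection slip through.
     */
    static int may_keep_idle_conn(int curr_session_conns, int max_allowed)
    {
        return curr_session_conns < max_allowed; /* refuse when >= max */
    }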
This should be backported to 1.9 and 2.0.
Just calling conn_force_unsubscribe() from conn_upgrade_mux_fe() is not
enough, as there may be multiple XPRTs involved. Instead, require that
any user of conn_upgrade_mux_fe() unsubscribe itself before calling it.
This should fix upgrading a TCP connection to HTX when using SSL.
This should be backported to 2.0.
The previous commit e6cdfe574 ("BUG/MINOR: contrib/prometheus-exporter: Don't
use channel_htx_recv_max()") is buggy. The buffer's reserve must be respected.
This patch must be backported to 2.0 and 1.9.
The previous commit 7e145b3e2 ("BUG/MINOR: hlua: Don't use
channel_htx_recv_max()") is buggy. The buffer's reserve must be respected.
This patch must be backported to 2.0 and 1.9.
The receive limit of an HTX channel must be calculated against the total size
of the HTX message. Otherwise, the buffer may never be seen as full even though
the receive limit is 0. Indeed, the function channel_htx_full() already takes
care to add a block size to the buffer's reserve (8 bytes). So if the function
channel_htx_recv_limit() also keeps a block size free in addition to the
buffer's reserve, it means that at least 2 block sizes will be kept free but
only one will be taken into account, freezing the stream if the option
http-buffer-request is enabled.
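A simplified sketch of the corrected computation (the real helper works on
HAProxy's buffer and HTX structures, so all names here are illustrative):

    #include <stddef.h>

    /* The limit leaves only the buffer's reserve free and is measured
     * against the total size used by the HTX message; channel_htx_full()
     * already keeps one extra block size on top of the reserve, so the
     * limit itself must not reserve a second one.
     */
    static size_t htx_recv_limit_sk(size_t buf_size, size_t reserve,
                                    size_t htx_used)
    {
        size_t limit = buf_size > reserve ? buf_size - reserve : 0;

        return htx_used >= limit ? 0 : limit - htx_used;
    }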
This patch fixes the Github issue #136. It should be backported to 2.0 and
1.9. Thanks to jaroslawr (Jarosław Rzeszótko) for his help.
The function htx_free_data_space() must be used instead. Otherwise, if there are
some output data not already forwarded, the maximum amount of data that may be
inserted into the buffer may be greater than what we can really insert.
This patch must be backported to 2.0 and 1.9.
The function htx_free_data_space() must be used instead. Otherwise, if there are
some output data not already forwarded, the maximum amount of data that may be
inserted into the buffer may be greater than what we can really insert.
This patch must be backported to 2.0.
wake_srv_chk() can be called from conn_fd_handler(), and may decide to
destroy the conn_stream and the connection, by calling cs_close(). If that
happens, we have to make sure the tasklet isn't scheduled to run, or it will
probably crash trying to access the connection or the conn_stream.
This fixes a crash that can be seen when using tcp checks.
This should be backported to 1.9 and 2.0.
For 1.9, the call should instead be:
    task_remove_from_tasklet_list((struct task *)check->wait_list.task);
That function was renamed in 2.0.
Revert commit fe4abe62c7.
The goal was to make sure that, for health-checks, we would not get sockets in
TIME_WAIT. To do so, we would not call shutdown() if linger_risk is set.
However that is wrong, and that means the shutw would never be forwarded to
the server, and thus we could get connections that are never properly closed.
Instead, to fix the original problem as described here:
https://www.mail-archive.com/haproxy@formilux.org/msg34080.html
Just make sure the checks code calls cs_shutr() before calling cs_shutw().
If shutr has been called, conn_sock_shutw() will make no attempt to call
shutdown(), as it knows close() will be called.
We should really review and revamp the shutr/shutw code, as described in
github issue #142.
This should be backported to 1.9 and 2.0.
HEAD responses must not have any body payload. But, because of a bug, for
chunked responses, the empty chunk was always added.
This patch fixes the Github issue #146. It must be backported to 2.0 and 1.9.
Unlike H1, H2 messages may contain trailers while the header "Content-Length"
is set. Indeed, because of the framed structure of HTTP/2, it is no longer
necessary to use the chunked transfer encoding. So trailing HEADERS frames,
after all DATA frames, may be added to messages with an explicit content
length. But in H1, it is impossible to have trailers on non-chunked
messages. So when outgoing messages are formatted by the H1 multiplexer, if the
message is not chunked, all trailers must be dropped.
This patch must be backported to 2.0 and 1.9. However, the patch will have to
be adapted for 1.9.
As discussed in issue #140, processes are forked with signals blocked
resulting in haproxy's kill being ignored. This happens when the command
takes more time to complete than the configured check timeout or interval.
Just calling "sleep 30" every second makes the problem obvious.
The fix simply consists in unblocking the signals in the child after the
fork. It needs to be backported to all stable branches containing external
checks and where signals are blocked on startup. It's unclear when it
started, but the following config exhibits the issue:
    global
        external-check

    listen www
        bind :8001
        timeout client 5s
        timeout server 5s
        timeout connect 5s
        option external-check
        external-check command "$PWD/sleep10.sh"
        server local 127.0.0.1:80 check inter 200

    $ cat sleep10.sh
    #!/bin/sh
    exec /bin/sleep 10
The "sleep" processes keep accumulating for 10 seconds and stabilize
around 25 when the bug is present. Just issuing "killall sleep" has no
effect on them, and stopping haproxy leaves these processes behind.
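The essence of the fix, as a self-contained POSIX sketch (the function names
in haproxy differ):

    #include <signal.h>
    #include <unistd.h>

    static pid_t spawn_check(char *const argv[], char *const envp[])
    {
        pid_t pid = fork();

        if (pid == 0) {
            sigset_t set;

            /* the parent blocked signals at startup and fork() keeps
             * the mask, so restore an empty one before exec'ing */
            sigemptyset(&set);
            sigprocmask(SIG_SETMASK, &set, NULL);
            execve(argv[0], argv, envp);
            _exit(127); /* exec failed */
        }
        return pid;
    }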
When using a level lower than admin on the master CLI, a \n is output
before the response. This is caused by the responses to the "operator" or
"user" commands that are sent before the actual command.
To fix this problem we introduce the flag APPCTX_CLI_ST1_NOLF, which asks
for a command response not to be followed by the final \n.
This patch makes a special case of the "operator" and "user" commands
followed by a '-', so that they are not followed by a \n.
This patch must be backported to 2.0 and 1.9.
We must take care of this when the stream is detached from the
connection. Otherwise, on the server side, the connection is inserted into the
list of idle connections of the session. But when reused, because the shutdown
for writes was already caught, nothing is sent to the server and the session is
blocked with a frozen connection.
This patch must be backported to 2.0 and 1.9. It is related to the issue #136
reported on Github.
Checks use ssl_sock_set_alpn() to set the ALPN if check-alpn is used, however
check-alpn failed to check if the connection was indeed using SSL, and thus,
would crash if check-alpn was used on a non-SSL connection. Fix this by
making sure the connection uses SSL before attempting to set the ALPN.
This should be backported to 2.0 and 1.9.
These errors are unexpected at this stage and there is not much more to do than
to close the connection and leave. So now, when it happens, the flag
H1C_F_CS_ERROR is set on the H1 connection and the flag HTX_FL_PARSING_ERROR is
set on the channel's HTX message.
This patch must be backported to 2.0 and 1.9.
During headers parsing, an error is returned if the message is too large and
does not fit in the input buffer. The H1 mux used the function b_full() to do
so. But to allow zero-copy transfers, in h1_recv(), the input buffer is
pre-aligned and thus a few bytes always remain free.
To fix the bug, as is done during trailers parsing, the function
buf_room_for_htx_data() should be used instead.
This patch must be backported to 2.0 and 1.9.
Since the commit b75b5eaf ("MEDIUM: htx: 1xx messages are now part of the final
reponses"), these messages are part of the response and should not contain
EOM. This block is skipped during response parsing, but analyzers still add it
for "100-Continue" and "103-Early-Hints". It can also be added for error files
with a 1xx status code.
Now, when HAProxy generates such transitional responses, it does not emit EOM
blocks. And informational messages are now forbidden in error files.
This patch must be backported to 2.0.
Consider a config like:
    global
        log 127.0.0.1:10001 sample :10 local0
No sampling ranges are given here, leading to NULL being passed
as the first argument to qsort.
This configuration does not make sense anyway: a log without ranges
would never log. Thus, output an error if no ranges are given.
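Roughly, the parser now guards the sort like this (all names are
illustrative):

    #include <stdlib.h>

    struct smp_range_sk { unsigned int low, high; };

    static int cmp_ranges(const void *a, const void *b)
    {
        const struct smp_range_sk *ra = a, *rb = b;

        return (ra->low > rb->low) - (ra->low < rb->low);
    }

    static int check_and_sort_ranges(struct smp_range_sk *rgs, size_t n)
    {
        if (!rgs || !n)
            return -1; /* the fix: report "no sampling ranges given" */
        qsort(rgs, n, sizeof(*rgs), cmp_ranges);
        return 0;
    }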
This bug was introduced in d95ea2897e.
This fix must be backported to HAProxy 2.0.
When a memory pool is created, it may be allocated from a static array. This
happens for the "most common" pools, allocated first. Objects of these pools
may also be cached in a pool cache. Of course, to not cache too many entries,
we track the number of cached objects and the total size of the cache.
But the object size of each pool in the cache (i.e., pool_cache[tid][idx].size,
where tid is the thread-id and idx is the index of the pool) was never set. So
the total size of the cache was never limited. Now, when a pool is created, if
its objects may be cached, we set the corresponding object size in the pool
cache.
This patch must be backported to 2.0 and 1.9.
When an outgoing HTX message is formatted before sending it, a trash chunk is
used to do the formatting. Its content is then copied into the output buffer of
the H1 connection. There are some tricks to avoid this last copy. First, if
possible we perform a zero-copy by swapping the area of the HTX buffer with the
one of the output buffer. If zero-copy is not possible, but if the output
buffer is empty, we don't use a trash chunk. To do so, we change the area of
the trash chunk to point to the one of the output buffer. But this is terribly
wrong. Trash chunks are global variables, allocated statically. If the area is
changed, the old one is lost. Worse, the area of the output buffer is
dynamically allocated, so it is released when emptied, leaving the trash chunk
with a freed area (in fact, it is a bit more complicated because buffers are
allocated from a memory pool).
So, honestly, I don't know why we never experienced any problem because of this
bug till now. To fix it, we still use a temporary buffer, but we assign it to a
trash chunk only when the other solutions have been ruled out. This way, we
never overwrite the area of a trash chunk.
This patch must be backported to 2.0 and 1.9.
The HTX start-line contains the number of bytes held by all headers as seen by
the mux during parsing. So it must not be updated during analysis. It was
updated when the start-line was replaced, so this update was removed from that
place. But we still save it from the old start-line so as not to lose it. It
should not be used outside the mux, but there is no reason to skip it. It is a
bug, however it should have no impact.
This patch must be backported to 2.0.
Since commit 829bd471 ("MEDIUM: stream: rearrange the events to remove
the loop"), the pipelining in the master CLI does not work anymore.
Indeed when doing:
echo "@1 show info; @2 show info; @3 show info" | socat /tmp/haproxy.master -
the CLI will only show the response of the first command.
When debugging we can observe that the command is sent, but the client
closes the connection before receiving the response.
The problem is that the flag CF_READ_NULL is not cleared when we
reinitialize the flags of the response, and we rely on this flag to close.
This must be backported to 2.0.
In ssl_subscribe(), make sure we have an ssl_sock_ctx before doing anything.
When ssl_sock_close() is called, it wakes any subscriber up, and that
subscriber may decide to subscribe again, for some reason. If we no longer
have a context, there's not much we can do.
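The shape of the guard, simplified:

    struct ssl_sock_ctx_sk; /* opaque stand-in for the real context */

    static int ssl_subscribe_sk(struct ssl_sock_ctx_sk *ctx)
    {
        if (!ctx)
            return -1; /* already closed: nothing left to subscribe on */
        /* ... normal subscription path ... */
        return 0;
    }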
This should be backported to 2.0.
In connect_server(), we used to only call xprt_add_hs() if CO_FL_SEND_PROXY
was set during the function call, we would not do it if the flag was set
before connect_server() was called. The rationale at the time was that if the flag
was already set, then the XPRT was already present. But now the xprt_handshake
always removes itself, so we have to re-add it each time, or it wouldn't be
done if the first connection attempt failed.
While I'm there, check for any non-SSL handshake flag, instead of just
CO_FL_SEND_PROXY, or we'd miss the SOCKS4 flags.
This should be backported to 2.0.
Only add SI_FL_ERR if the stream_interface is connected, or is attempting
a connection. We may get there because the stream_interface's tasklet
was woken up, but before it actually runs, process_stream() may be called,
detect that there was an error, and change the state of the stream_interface
to SI_ST_TAR. When the stream_interface's tasklet then runs, the connection
may still have CO_FL_ERROR, but that error was already accounted for, so
just ignore it.
This should be backported to 2.0.
Before switching to wait mode, the per-thread deinit should not be
called, because we did not initialize the threads and the fdtab.
The problem is that the master could crash if we try to reload HAProxy.
The commit 944e619 ("MEDIUM: mworker: wait mode use standard init code
path") removed the deinit code by accident, but its fix 7c756a8
("BUG/MEDIUM: mworker: fix FD leak upon reload") was incomplete and did
not take care of the WAIT_MODE.
This fix must be backported to 1.9 and 2.0.
Consider this configuration:
    frontend fe_http
        mode http
        bind *:8080
        default_backend be_http

    backend be_http
        mode http
        server example example.com:80

    program foo bar
Running with valgrind results in:
    ==16252== Invalid read of size 8
    ==16252==    at 0x52AE3F: cfg_parse_program (mworker-prog.c:233)
    ==16252==    by 0x4823B3: readcfgfile (cfgparse.c:2180)
    ==16252==    by 0x47BCED: init (haproxy.c:1649)
    ==16252==    by 0x404E22: main (haproxy.c:2714)
    ==16252== Address 0x48 is not stack'd, malloc'd or (recently) free'd
Check whether `ext_child` is valid before attempting to free it and its
contents.
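A minimal sketch of the guard, with simplified types:

    #include <stdlib.h>

    struct child_sk { char **command; };

    static void release_child(struct child_sk *ext_child)
    {
        if (!ext_child)
            return; /* the fix: error path taken before any allocation */
        free(ext_child->command);
        free(ext_child);
    }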
This bug was introduced in 9a1ee7ac31.
This fix must be backported to HAProxy 2.0.
Solaris's default shell doesn't support substitutions at the beginning or
end of variables, which are still used to determine the version based on
git. Since we added --abbrev=0 we don't need the last one. And using cut
it's trivial to replace the first one, actually simplifying the whole
expression.
This may be backported to all stable branches.
Solaris's sed doesn't take the 'p' argument on the 's' command so
nothing is printed. Just passing ';p' fixes this without affecting
other implementations. Additionally, optional characters cannot be
matched using a question mark, which is always searched verbatim, so
the leading '#' wasn't stripped. Using \{0,1\} works fine everywhere
so let's use this instead.
The 'tr' command on Solaris doesn't conform to POSIX and requires
brackets around ranges. So the sequence '0-9' is understood as the
3 characters '0', '-', and '9'. This causes tagged versions (those
with no commit after the last tag) to be numbered as an empty
string, resulting in an error being reported while computing the
version number.
All implementations support '[:space:]' to delete leading spaces,
so let's use this instead.
This may be backported to all stable versions.
getpid() is documented as returning a pid_t result, not
necessarily an int. This causes a build warning on Solaris 10
because '%d' or '%u' are used in the format passed to snprintf().
Let's just cast the result as an int (respectively unsigned int).
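For example, the portable form looks like this:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[16];

        /* pid_t is only guaranteed to be an integer type: cast
         * explicitly so the format string stays correct everywhere */
        snprintf(buf, sizeof(buf), "%d", (int)getpid());
        return 0;
    }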
This can be backported to 2.0 and possibly older versions though
it really has no impact.
This bug was introduced by commit 1b8e68e, which assumed the stick-table was
always stored in struct arg at parsing time. This is never the case with the
usage of "if/unless" conditions in stick-tables declared as backends. In this
case, it is the name of the proxy which must be considered as the stick-table
name.
This must be backported to 2.0.
In the function fwlc_srv_reposition(), the server's lb_tree is tested from
outside the lock. So it is possible to remove it after the test and then call
eb32_insert() in fwlc_queue_srv() with a NULL root pointer, which is
invalid. Moving the test into the scope of the lock fixes the bug, just like
the same fix applied to fas_srv_reposition() above.
This issue was reported on Github, issue #126.
This patch must be backported to 2.0, 1.9 and 1.8.
When a DATA frame is processed for a message with a content-length, we first
take care not to have a frame size that exceeds the amount left to read.
Otherwise, an error is triggered. But we must remove the padding length
from the frame size because the padding is not included in the announced
content-length.
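The corrected arithmetic, sketched with illustrative names (simplified: the
real code also accounts for the pad-length byte itself):

    #include <stdint.h>

    /* Padding bytes count toward the frame length but not toward the
     * message body, so exclude them before comparing against the
     * remaining content-length.
     */
    static int h2_data_fits(uint32_t frame_len, uint8_t pad_len,
                            uint64_t body_left)
    {
        uint32_t data_len = frame_len - pad_len;

        return (uint64_t)data_len <= body_left;
    }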
This patch must be backported to 2.0 and 1.9.