The HTTP dumps are now configurable in the code : "show errors" now
calls a protocol-specific function to emit the decoded output. For
now only HTTP is implemented.
The output of "show errors" was slightly reordered to split the HTTP part
in a single chunk_appendf() call. The useless buffer total input was
replaced to report the buffer's start offset, which is the offset in the
stream of the first input byte (thus not counting output). Also it was
the opportunity to stop calling the stream "session".
The idea will be to make the error snapshot feature accessible to other
protocols than just HTTP. This patch only introduces an "http_snapshot"
structure and renames a few fields to make things more explicit. The
HTTP part was installed inside a union so that we can easily add more
protocols in the future.
The snapshots have the ability to restart a partial dump and they use
the stream ID as the restart point. Since it's purely HTTP, let's use
the event ID instead.
Let's use an atomic increment for the error snapshot, as we'd rather
not assign the same ID to two errors happening in parallel. It's very
unlikely that it will ever happen though.
This patch must be backported to 1.8 with the other one it relies on
("MINOR: thread: implement HA_ATOMIC_XADD()").
On the Mailing list, Marcos Moreno reported that haproxy configuration
validation (through "haproxy -c cfgfile") does not detect when a
resolvers section does not exist for a server.
That said, this checking is done after HAProxy has started up.
The problem is that this can create production issue, since init
script can't detect the problem before starting / reloading HAProxy.
To fix this issue, this patch registers the function which validates DNS
configuration validity and run it right after configuration parsing is
finished (through cfg_register_postparser()).
Thanks to it, now "haproxy -c cfgfile" will fail when a server
points to a non-existing resolvers section (or any other validation made
by the function above).
Backport status: 1.8
This function returns the proxy associated to a connection. For front
connections it returns the frontend, and for back connections it
returns the backend. This will be used to retrieve some configuration
parameters from within a mux.
Sometimes a connection is prepared before the target is set, sometimes
after. There's no real rule since the few functions involved operate on
different and independent fields. Soon we'll benefit from knowing the
target at the connection layer, in order to figure the associated proxy
and retrieve the various parameters (timeouts etc). This patch slightly
reorders a few calls to conn_prepare() so that we can make sure that the
target is always known to the mux.
Commit 5e74b0b ("MEDIUM: h1: port to new buffer API.") introduced a
minor bug by which a buffer's head could stay shifted by the amount
of removed CRLF if it started with empty lines. This would cause the
second request (or response) not to work until it would receive a few
extra characters. This most only impacts requests sent by hand though.
This is purely 1.9, no backport is needed.
The h2 mux currently lacks some basic transparency. Some errors cause the
connection to be aborted but they couldn't be reported. With this patch,
almost all situations where an error will cause a stream or connection to
be aborted without the ability for an existing stream to report it will be
reported in the logs. This at least provides a solution to monitor the
activity and abnormal traffic.
The new function sess_log() only needs a session to emit a log. It will
ignore the parts that depend on the stream. It is usable to emit a log
to report early errors in muxes. These ones will typically mention
"<BADREQ>" for the request and 0 for the HTTP status code.
Till now it was impossible to emit logs from the lower layers only because
a stream was mandatory. From now on it will at least be possible to emit a
log to report a bad request or some timings for example. When the stream
is null, sess_build_logline() will use default values and will extract the
timing information from the session just like stream_new() does, so the
resulting log line is perfectly valid.
The termination state will indicate a proxy error during the request phase
since it is the only realistic use for such a call with no stream.
When s==NULL we don't have any assigned request counter. Ideally we
should proceed exactly like when a stream is initialized and assign
a unique value. For now we only place it into a local variable.
We'll soon support s==NULL so let's use an intermediary variable for the
logs structure. For now it only points to s->logs but will support a local
variable as an alternative later.
The current build_logline() can only be used with valid streams, which
means it is not suitable for use from muxes. We start by moving it into
another more generic function which takes the session as an argument,
to avoid complexifying all the internal API for jsut a few use cases.
This new function is not supposed to be called directly from outside so
we'll be able to instrument it to support several calling conventions.
For now the behaviour and conditions remain unchanged.
This function was split in two at commit f7d0447 ("MINOR: buffers:
split b_putblk() into __b_putblk()") but it's wrong, the first half's
length is not adjusted to the requested size so it copies more than
desired.
This is purely 1.9-specific, no backport is needed.
While parsing a headers frame, if the frame is wrapped in the buffer
and needs to be unwrapped, it will be duplicated before being processed.
But if it contains certain combinations of invalid flags, the parser
returns without releasing the temporary buffer leading to a memory
leak.
This fix needs to be backported to 1.8.
The handshake processing time used to be stored per stream, which was
valid when there was exactly one stream per session. With H2 and
multiplexing it's not the case anymore and the reported handshake times
are wrong in the logs as it's computed between the TCP accept() and the
stream creation. Let's first move the handshake where it belongs, which
is the session.
However, this is not enough because we don't want to report an excessive
idle time either for H2 (since many requests use the connection).
So the solution used here is to have the stream retrieve sess->tv_accept
and the handshake duration when the stream is created, and let the mux
immediately reset them. This way, the handshake time becomes zero for the
second and subsequent requests in H2 (which was already the case in H1),
and the idle time exactly counts how long the connection remained unused
while it could be used, so in H1 it runs from the end of the previous
response and in H2 it runs from the end of the previous request since the
channel is already available.
This patch will need to be backported to 1.8.
The request counter is incremented when creating a new stream and when
resetting a stream, preparing for a new request. Unfortunately during
the thread migration this was missed, leading to non-atomic increments
in case threads are in use. The most visible side effect is that two
requests may have the same ID from time to time in the logs. However
the SPOE also uses this ID to route responses back to the stream so it
may also lead to occasional spurious SPOE timeouts.
Note that it still doesn't guarantee temporal unicity in the stream
identifiers since a long and a short connection could technically use
the same ID. The likeliness that this happens at the same time is almost
null (roughly threads*runqueue_depth/2^32 that it happens in the same
poll loop), but it will have to be addressed later anyway.
This patch must be backported to 1.8 with the other one it relies on
("MINOR: thread: implement HA_ATOMIC_XADD()").
We've been missing it several times and now we'll need it to increment
a request counter. Let's do it once for all.
This patch will need to be backported to 1.8 with the associated fix.
With openssl >= 1.1.1 and boringssl multi-cert is natively supported.
ECDSA/RSA selection is done and work correctly with TLS >= v1.2.
TLS < v1.2 have no TLSEXT_TYPE_signature_algorithms extension: ECC
certificate can't be selected, and handshake fail if no RSA cert
is present. Safe ECC certificate selection without client announcement
can be very tricky (browser compatibilty). The safer approach is to
select ECDSA certificate if no other certificate matches, like it is
with openssl < 1.1.1: certificate selection is only done via the SNI.
Thanks to Lukas Tribus for reporting this and analysing the problem.
This patch should be backported to 1.8
Server state file has no indication that a server is currently managed
by a DNS SRV resolution.
And thus, both feature (DNS SRV resolution and server state), when used
together, does not provide the expected behavior: a smooth experience...
This patch introduce the "SRV record name" in the server state file and
loads and applies it if found and wherever required.
This patch applies to haproxy-dev branch only. For backport, a specific patch
is provided for 1.8.
If SET_SAFE_LJMP returns 0, the spinlock is already unlocked, and lua_atpanic
is already set back to hlua_panic_safe, so there's no need to call
RESET_SAFE_LJMP.
This should be MFC'd into 1.8.
When calling ->prepare_srv() callback for SSL server which
depends on global "nbthread" value, this latter was not already parsed,
so equal to 1 default value. This lead to bad memory accesses.
Thank you to Pieter (PiBa-NL) for having reported this issue and
for having provided a very helpful reg testing file to reproduce
this issue (reg-test/lua/b00002.*).
Must be backported to 1.8.
Call si_cs_send() at the beginning of si_cs_wake_cb(), instead of from
stream_int_notify-), so that if we get a connection error while trying to
send, the stream_interface will get SI_FL_ERR, the associated task will
be woken up, and the connection will be properly destroyed.
No backport needed.
Instead of calling __event_srv_chk_w, call wake_srv_chk(), which will then
either call tcpcheck_main() or __event_srv_chk_w().
Also make tcpcheck_main() subscribe if it can't send.
In hlua_applet_tcp_fct(), drain the output buffer when the applet is done
running, every time we're called.
Overwise, there's a race condition, and the output buffer could be filled
after the applet ran, and as it is never cleared, the stream interface
will never be destroyed.
This should be backported to 1.8 and 1.7.
This adds the sample fetch 'be_conn_free([<backend>])'. This sample fetch
provides the total number of unused connections across available servers in the
specified backend.
Previously LUA code would maintain the transaction state between http
requests, resulting in things like txn:get_priv() retrieving data from
a previous request. This addresses the issue by ensuring the LUA state
is reset between requests.
Co-authored-by: Tim Düsterhus <tim@bastelstu.be>
mux_pt_wake() calls data->wake() which can return -1 indicating that the
connection was just destroyed. We need to check for this condition and
immediately exit in this case otherwise we dereference a just freed
connection. Note that this mainly happens on idle connections between
two HTTP requests. It can have random implications between requests as
it may lead a wrong connection's polling to be re-enabled or disabled
for example, especially with threads.
This patch must be backported to 1.8.
HTTP LUA applet callback should not update the date on which the HTTP client requests
arrive. This was done just after the LUA applet has completed its job.
This patch simply removes the affected statement. The same fixe has been applied
to TCP LUA applet callback.
To reproduce this issue, as reported by Patrick Hemmer, implement an HTTP LUA applet
which sleeps a bit before replying:
core.register_service("foo", "http", function(applet)
core.msleep(100)
applet:set_status(200)
applet:start_response()
end)
This had as a consequence to log %TR field with approximatively the same value as
the LUA sleep time.
Thank you to Patrick Hemmer for having reported this issue.
Must be backported to 1.8, 1.7 and 1.6.
This patch improves the previous fix by implementing the socket draining
code directly in conn_sock_drain() so that it always applies regardless
of the protocol's family. Thus it gets rid of tcp_drain().
Right now conn_sock_drain() calls the protocol's ->drain() function if
it exists, otherwise it simply tries to disable polling for receiving
on the connection. This doesn't work well anymore since we've implemented
the muxes in 1.8, and it has a side effect with keep-alive backend
connections established over unix sockets. What happens is that if
during the idle time after a request, a connection reports some data,
si_idle_conn_null_cb() is called, which will call conn_sock_drain().
This one sees there's no drain() on unix sockets and will simply disable
polling for data on the connection. But it doesn't do anything on the
conn_stream. Thus while leaving the conn_fd_handler, the mux's polling
is updated and recomputed based on the conn_stream's polling state,
which is still enabled, and nothing changes, so we see the process
use 100% CPU in this case because the FD remains active in the cache.
There are several issues that need to be addressed here. The first and
most important is that we cannot expect some protocols to simply stop
reading data when asked to drain pending data. So this patch make the
unix sockets rely on tcp_drain() since the functions are the same. This
solution is appropriate for backporting, but a better one is desired for
the long term. The second issue is that si_idle_conn_null_cb() shouldn't
drain the connection but the conn_stream.
At the moment we don't have any way to drain a conn_stream, though a flag
on rcv_buf() will do it well. Until we support muxes on the server side
it is not a problem so this part can be addressed later.
This fix must be backported to 1.8.