If we decided to emit the end of stream flag on the H2 response headers
frame, we must remove the EOM block from the HTX stream, otherwise it
will lead to an extra DATA frame being sent with the ES flag and will
violate the protocol.
Wrong variable was used to know if we need to call the callback
post_section_parser() or not. We must use 'cs' and not 'pcs'.
This patch must be backported in 1.8 with the commit 7805e2b ("BUG/MINOR:
cfgparse: Fix transition between 2 sections with the same name").
This is used for uploads, we can now convert H2 DATA frames to HTX
DATA blocks. It's uncertain whether it's better to reuse the same
function or to split it in two at this point. For now the same
function was added with some paths specific to HTX. In this mode
we loop back to the same or next frame in order to try to complete
DATA blocks.
At the moment the way it's done is not optimal. We should aggregate multiple
blocks into a single DATA frame, and we should merge the ES flag with the
last one when we already know we've reached the end. For now and for an
easier tracking of the HTX stream, an individual empty DATA frame is sent
with the ES bit when EOM is met.
The DATA function is called for DATA, EOD and EOM since these stats indicate
that a previous frame was already produced without the ES flag (typically a
headers frame or another DATA frame). Thus it makes sense to handle all these
blocks there.
There's still an uncertainty on the way the EOD and EOM HTX blocks must be
accounted for, as they're counted as one byte in the HTX stream, but if we
count that byte off when parsing these blocks, we end up sending too much
and desynchronizing the HTX stream. Maybe it hides an issue somewhere else.
At least it's possible to reliably retrieve payloads up to 1 GB over H2/HTX
now. It's still unclear why larger ones are interrupted at 1 GB.
When using HTX, we need a separate function to emit a headers frame.
The code is significantly different from the H1 to H2 conversion, though
it borrows some parts there. It looks like the part building the H2 frame
from the headers list could be factored out, however some of the logic
around dealing with end of stream or block sizes remains different.
With this patch it becomes possible to retrieve bodyless HTTP responses
using H2 over HTX.
When the proxy is configured to use HTX mode, the headers frames
will be converted to HTX header blocks instead of HTTP/1 messages.
This requires very little modifications to the existing function
so it appeared better to do it this way than to duplicate it.
Only the request headers are handled, responses are not processed
yet and data frames are not processed yet either. The return value
is inaccurate but this is not an issue since we're using it as a
boolean : data received or not.
Now h2_snd_buf() will check the proxy's mode to decide whether to use
HTX-specific send functions or legacy functions. In HTX mode, the HTX
blocks of the output buffer will be parsed and the related functions
will be called accordingly based on the block type, and unimplemented
blocks will be skipped. For now all blocks are skipped, this is only
helpful for debugging.
The function needs to be slightly adapted to transfer HTX blocks, since
it may face a full buffer on the receive path, thus it needs to transfer
HTX blocks between the two sides ignoring the <count> argument in this
mode.
The H2 mux will now be called for both HTTP and HTX modes. For now the
data transferr functions are not HTX-aware so this will lead to problems
if used as-is but it's convenient for development and debugging.
Till now we could only produce an HTTP/1 request from a list of H2
request headers. Now the new function h2_make_htx_request() does the
same but using the HTX encoding instead, while respecting the H2
semantics. The code is not much different from the first version,
only the encoding differs.
For now it's not used.
Functions analyzing request and response headers have been duplicated and
adapted to support HTX messages. The callback http_payload have been implemented
to handle the data compression itself. It loops on HTX blocks and replace
uncompressed value of DATA block by compressed one. Unlike the HTTP legacy
version, there is no chunk at all. So HTX version is significantly easier.
The callback http_headers has been updated to dump HTX headers when the HTX
internal representation is in use. And the callback http_payload has been
implemented with its hexdump function.
If there is data filters registered on the stream, the function
flt_http_payload() is called before forwarding any data. And the function
flt_http_end() is called when all data are forwarded. While at least one data
filter reamins registered on the stream, no fast forwarding is used.
First, to be called on HTX streams, a filter must explicitly be declared as
compatible by setting the flag STRM_FLT_FL_HAS_FILTERS on the filter's config at
HAProxy startup. This flag is checked when a filter implementation is attached
to a stream.
Then, some changes have been made on HTTP callbacks. The callback http_payload
has been added to filter HTX data. It will be called on HTX streams only. It
replaces the callbacks http_data, http_chunk_trailers and http_forward_data,
called on legacy HTTP streams only and marked as deprecated. The documention
(once updated)) will give all information to implement this new callback. Other
HTTP callbacks will be called for HTX and HTTP legacy streams. So it is the
filter's responsibility to known which kind of data it handles. The macro
IS_HTX_STRM should be used in such cases.
There is at least a noticeable changes in the way data are forwarded. In HTX,
after the call to the callback http_headers, all the headers are considered as
forwarded. So, in http_payload, only the body and eventually the trailers will
be filtered.
First of all, an dedicated error snapshot, h1_snapshot, has been added. It
contains more or less the some info than http_snapshot but adapted for H1
messages. Then, the function h1_capture_bad_message() has been added to capture
bad H1 messages. And finally, the function h1_show_error_snapshot() is used to
dump these errors. Only Headers or data parsing are captured.
in h1_set_cli_conn_mode(), on the response path, If the response's connection
header is explicitly set to close and if the request is unfinished (state !=
DONE), then the client connection is marked as WANT_CLO.
Instead of looking for a connection header just after the start line to know if
we must process the conn_mode by hand or if we wait to parse the connection
header, we now delay this processing when the end of headers is reached. A flag
is used to know if it was already done (or skipped) or not. This save a lookup
on headers.
During startup, after the configuration parsing, all HTTP error messages
(errorloc, errorfile or default messages) are converted into HTX messages and
stored in dedicated buffers. We use it to return errors in the HTX analyzers
instead of using ugly OOB blocks.
Instead of replying by adding an OOB block in the HTX structure, we now add a
valid HTX message. The old code relied on the function http_reply_and_close() to
send 401/407 responses. Now, we push it in the response's buffer. So we take
care to drain the request's channel and to shutdown the response's channel for
the read.
Instead of replying by adding an OOB block in the HTX structure, we now add a
valid HTX message. A header block is added to each early-hint rule, prefixed by
the start line if it is the first one. The response is terminated and forwarded
when the rules execution is stopped or when a rule of another type is applied.
the flags of the HTX start-line (HTX_SL_F_*) are mapped on ones of the HTTP
message (HTTP_MSGS_*). So we can easily retrieve info from the parsing in HTX
analyzers.
Instead, we now use the htx_sl coming from the HTX message. It avoids to have
too H1 specific code in version-agnostic parts. Of course, the concept of the
start-line is higly influenced by the H1, but the structure htx_sl can be
adapted, if necessary. And many things depend on a start-line during HTTP
analyzis. Using the structure htx_sl also avoid boring conversions between HTX
version and H1 version.
If there is no start-line, this offset is set to -1. Otherwise, it is the
relative address where the start-line is stored in the data block. When the
start-line is added, replaced or removed, this offset is updated accordingly. On
remove, if the start-line is no set and if the next block is a start-line, the
offset is updated. Finally, when an HTX structure is defragmented, the offset is
also updated accordingly.
The HTX start-line is now a struct. It will be easier to extend, if needed. Same
info can be found, of course. In addition it is now possible to set flags on
it. It will be used to set some infos about the message.
Some macros and functions have been added in proto/htx.h to help accessing
different parts of the start-line.
The function htx_find_blk() returns the HTX block containing data with a given
offset, relatively to the beginning of the HTX message. It is a good way to skip
outgoing data and find the first HTX block not already processed.
the functions htx_get_next() and htx_get_prev() are used to iterate on an HTX
message using blocks position. With htx_get_next_blk() and htx_get_prev_blk(),
it is possible to do the same, but with HTX blocks. Of course, internally, we
rely on position's versions to do so. But it is handy for callers to not take
care of the blocks position.
The function htx_add_data_before() can be used to add an HTX block before
another one. For instance, it could be used to add some data before the
end-of-message marker.
With the legacy representation, keep-alive outgoing connections are added in
private/idle/safe connections list when the transaction is cleaned up. But this
stage does not exist with the HTX representaion because a new stream, and
therefore a new transaction, is created for each request. So it is now handled
when the stream is detached from the connection.
In h1_snd_buf(), the data sending is done synchronously, as much as possible. So
if some data remains in the channel's buffer, because there was not enougth
place in the output buffer, it may be good the retry after a send because some
space may have been released when sending. Most of time the output buffer is
empty and all channel's data are consumed the first time. And if no data are
sent, we don't retry to do more. So the loop is just here to optimize edge cases
without any cost for all others.
After a call to snd_buf, if some data remain in the channel's buffer, this means
the system buffers are full or we are unable to fully consume an HTX block for
any reason. In the last case, we need to wakeup the stream to process more data
as soon as possible. We do it subscribing to send at the end of h1_snd_buf().
When trailers are parsed, we must add the corrresponsing HTX block and then we
must add the block end-of-message. But this last operation can failed because
there is not enough space the HTX message. This case was left aside till
now. Now, we stay in the state H1_MSG_TRAILERS with the warranty we will be able
to restart at the right stage.
For chunked messages, during output process, the mux is now able to write the
last empty chunk and empty trailers when corrsponding blocks have not been found
in the HTX message. It is handy for filters changing a not-chunked message into
a chunked one (like the compression filter).
In h1_set_srv_conn_mode(), we need to get the frontend proxy of a server
connection. untill now, we relied on the stream to get it. But it was a bit
dirty. The stream always exists at this stage but to get it, we also need to get
the stream-interface. Since the commit 7c6f8b146 ("MAJOR: connections: Detach
connections from streams."), the connection's owner is always the session, even
for outgoing connections. So now, we rely on the session to get the frontend
proxy in h1_set_srv_conn_mode().
Use the session instead of the stream to get
the frontend on the server connection
When the connection client is accepted, the info of the client conn_stream are
filled with the session info (accept_date, tv_accept and t_handshake). For all
other conn_streams, on client and server side, their info are filled using
global values (date and now).
Time to time, the need arises to get some info owned by the multiplexer about a
connection stream from the upper layer. Today we really need to get some dates
and durations specific to the conn_stream. It is only true for the mux H1 and
H2. Otherwise it will be impossible to have correct times reported in the logs.
To do so, the structure cs_info has been defined to provide all info we ever
need on a conn_stream from the upper layer. Of course, it is the first step. So
this structure will certainly envloved. But for now, only the bare minimum is
referenced. On the mux side, the callback get_cs_info() has been added in the
structure mux_ops. Multiplexers can now implement it, if necessary, to return a
pointer on a structure cs_info. And finally, the function si_get_cs_info()
should be used from the upper layer. If the stream interface is not attached to
a connection stream, this function returns NULL, likewise if the callback
get_cs_info() is not defined for the corresponding mux.
htx_cut_data_blk() is used to cut the beginning of a DATA block after a
part of it was tranferred. It simply advances the address, reduces the
advertised length and updates the htx's total data count.
It looks like we forgot to report HTX when listing the muxes and their
respective protocols, leading to "NONE" being displayed. Let's report
"HTX" and "HTTP|HTX" since both will exist. Also fix a minor typo in
the output message.
In http_wait_for_response(), we wait that all outgoing data have really been
sent (from the channel's point of view) to start the processing of the
response. In fact, it is used to send all intermediate 10x responses. For now
the HTX api is not really handy when multiple messages are stored in the HTX
structure.
Because multiple HTTP messages can be stored in an HTX structure, it is
important to not forget to reset the H1 parser at the beginning of each
one. With the current version, this case only happens on the response, when
multiple HTTP-1XX responses are forwarded to the client (for instance
103-Early-Hints). So strickly speaking, it is the same message. But for now,
internally, each one is a standalone message. Note that it might change in a
future version of the HTX.
in h1_process_output(), before formatting the headers, we need to find and check
the "Connection: " header to update the connection mode. But, the context used
to do so was not correctly initialized. We must explicitly set ctx.value to NULL
to be sure to rescan the current header.