This is not exactly a bug but a long-time design limitation. We used not
to decode trailers in H2, resulting in broken connections each time a
trailer was sent, since it was impossible to keep the HPACK decompressor
synchronized. Now that the sequencing of operations permits it, we must
make sure to at least properly decode them.
What we try to do is to identify if a HEADERS frame was already seen and
use this indication to know if it's a headers or a trailers. For this,
h2c_decode_headers() checks if the stream indicates that a HEADERS frame
was already received. If so, it decodes it and emits the trailing
0 CRLF CRLF in case of H1, or the HTX_EOD + HTX_EOM blocks in case of HTX,
to terminate the data stream.
The trailers contents are still deleted for now but the request works, and
the connection remains synchronized and usable for subsequent streams.
The correctness may be tested using a simple config and h2spec :
h2spec -o 1000 -v -t -S -k -h 127.0.0.1 -p 4443 generic/4/4
This should definitely be backported to 1.9 given the low impact for the
benefit. However it cannot be backported to 1.8 since the operations cannot
be resumed. The following patches are also needed with this one :
MINOR: mux-h2: make h2c_decode_headers() return a status, not a count
MINOR: mux-h2: add a new dummy stream : h2_error_stream
MEDIUM: mux-h2: make h2c_decode_headers() support recoverable errors
BUG/MINOR: mux-h2: detect when the HTX EOM block cannot be added after headers
MINOR: mux-h2: check for too many streams only for idle streams
MINOR: mux-h2: set H2_SF_HEADERS_RCVD when a HEADERS frame was decoded
The HEADERS frame parser checks if we still have too many streams, but
this should only be done for idle streams, otherwise it would prevent
us from processing trailer frames.
In h2c_frt_handle_headers() and h2c_bck_handle_headers() we have an unused
error path made of the strm_err label, while send_rst is used to emit an
RST upon stream error after forcing the stream to h2_refused_stream. Let's
remove this unused strm_err block now.
In h2c_frt_handle_headers(), we test the stream for SS_ERROR just after
setting it to SS_OPEN, this makes no sense and creates confusion in the
error path. Remove this misleading test.
In case we receive a very large HEADERS frame which doesn't leave enough
room to place the EOM block after the decoded headers, we must fail the
stream. This test was missing, resulting in the loss of the EOM, possibly
leaving the stream waiting for a time-out.
Note that we also clear h2c->dfl here so that we don't attempt to clear
it twice when going back to the demux.
If this is backported to 1.9, it also requires that the following patches
are backported as well :
MINOR: mux-h2: make h2c_decode_headers() return a status, not a count
MINOR: mux-h2: add a new dummy stream : h2_error_stream
MEDIUM: mux-h2: make h2c_decode_headers() support recoverable errors
When a decoding error is recoverable, we should emit a stream error and
not a connection error. This patch does this by carefully checking the
connection state before deciding to send a connection error. If only the
stream is in error, an RST_STREAM is sent.
This function used to return a byte count for the output produced, or
zero on failure. Not only this value is not used differently than a
boolean, but it prevents us from returning stream errors when a frame
cannot be extracted because it's too large, or from parsing a frame
and producing nothing on output.
This patch modifies its API to return <0 on errors, 0 on inability to
proceed, or >0 on success, irrelevant to the amount of output data.
In h2c_decode_headers() we update the buffer's length according to the
amount of data produced (outlen). But in case of HTX this outlen value
is not a quantity, just an indicator of success, resulting in the buffer
being added one extra byte and temporarily showing .data > .size, which
is wrong. Fortunately this is overridden when leaving the function by
htx_to_buf() so the impact only exists in step-by-step debugging, but
it definitely needs to be fixed.
This must be backported to 1.9.
When dealing with a server's H2 response, we used to set the
end-of-stream flag on the conn_stream and the stream before parsing
the response, which is incorrect since we can fail to process this
response by lack of room, buffer or anything. The extend of this problem
is still limited to a few rare cases, but with trailers it will cause a
systematic failure.
This fix must be backported to 1.9.
This function handles response HEADERS frames, it is not responsible
for creating new streams thus it must not check if we've reached the
stream count limit, otherwise it could lead to some undesired pauses
which bring no benefit.
This must be backported to 1.9.
If we exit this function because some data are pending in the rxbuf, we
currently don't indicate any blocking flag, which will prevent the operation
from being attempted again. Let's set H2_CF_DEM_SFULL in this case to indicate
there's not enough room in the stream buffer so that the operation may be
attempted again once we make room. It seems that this issue cannot be
triggered right now but it definitely will with trailers.
This fix should be backported to 1.9 for completeness.
h2c_restart_reading() is used at various place to resume processing of
demux data, but this one refrains from doing so if the mux is already
subscribed for receiving. It just happens that even if some incoming
frame processing is interrupted, the mux is always subscribed for
receiving, so this condition alone is not enough, it must be combined
with the fact that the demux buffer is empty, otherwise some resume
events are lost. This typically happens when we refrain from processing
some incoming data due to missing room in the stream's rxbuf, and want
to resume in h2c_rcv_buf(). It will become even more visible with trailers
since these ones want to have an empty rxbuf before proceeding.
This must be backported to 1.9.
In h2c_decode_headers() we mistakenly check for H2_F_DATA_END_STREAM
while we should check for H2_F_HEADERS_END_STREAM. Both have the same
value (1) but better stick to the correct flag.
When a session adds a connection to its connection list, we used to remove
connections for an another server if there were not enough room for our
server. This can't work, because those lists are now the list of connections
we're responsible for, not just the idle connections.
To fix this, allow for an unlimited number of servers, instead of using
an array, we're now using a linked list.
In h2_detach(), don't add the connection to the idle list if nb_streams
is at the max. This can happen if we already closed that stream before, so
its slot became available and was used by another stream.
This should be backported to 1.9.
Now that the HEADERS frame decoding is retryable, we can safely try to
fold CONTINUATION frames into a HEADERS frame when the END_OF_HEADERS
flag is missing. In order to do this, h2c_decode_headers() moves the
frames payloads in-situ and leaves a hole that is plugged when leaving
the function. There is no limit to the number of CONTINUATION frames
handled this way provided that all of them fit into the buffer. The
error reported when meeting isolated CONTINUATION frames has now changed
from INTERNAL_ERROR to PROTOCOL_ERROR.
Now there is only one (unrelated) remaining failure in h2spec.
The H2 demux only checks for too many streams in h2c_frt_stream_new(),
then refuses to create a new stream and causes the connection to be
aborted by sending a GOAWAY frame. This will also happen if any error
happens during the stream creation (e.g. memory allocation).
RFC7540#5.1.2 says that attempts to create streams in excess should
instead be dealt with using an RST_STREAM frame conveying either the
PROTOCOL_ERROR or REFUSED_STREAM reason (the latter being usable only
if it is guaranteed that the stream was not processed). In theory it
should not happen for well behaving clients, though it may if we
configure a low enough h2.max_concurrent_streams limit. This error
however may definitely happen on memory shortage.
Previously it was not possible to use RST_STREAM due to the fact that
the HPACK decompressor would be desynchronized. But now we first decode
and only then try to allocate the stream, so the decompressor remains
synchronized regardless of policy or resources issues.
With this patch we enforce stream termination with RST_STREAM and
REFUSED_STREAM if this protocol violation happens, as well as if there
is a temporary condition like a memory allocation issue. It will allow
a client to recover cleanly.
This could possibly be backported to 1.9. Note that this requires that
these five previous patches are merged as well :
MINOR: h2: add a bit-based frame type representation
MEDIUM: mux-h2: remove padlen during headers phase
MEDIUM: mux-h2: decode HEADERS frames before allocating the stream
MINOR: mux-h2: make h2c_send_rst_stream() use the dummy stream's error code
MINOR: mux-h2: add a new dummy stream for the REFUSED_STREAM error code
This patch introduces a new dummy stream, h2_refused_stream, in CLOSED
status with the aforementioned error code. It will be usable to reject
unexpected extraneous streams.
We currently have 2 dummy streams allowing us to send an RST_STREAM
message with an error code matching this one. However h2c_send_rst_stream()
still enforces the STREAM_CLOSED error code for these dummy streams,
ignoring their respective errcode fields which however are properly
set.
Let's make the function always use the stream's error code. This will
allow to create other dummy streams for different codes.
It's hard to recover from a HEADERS frame decoding error after having
already created the stream, and it's not possible to recover from a
stream allocation error without dropping the connection since we can't
maintain the HPACK context, so let's decode it before allocating the
stream, into a temporary buffer that will then be offered to the newly
created stream.
Three types of frames may be padded : DATA, HEADERS and PUSH_PROMISE.
Currently, each of these independently deals with padding and needs to
wait for and skip the initial padlen byte. Not only this complicates
frame processing, but it makes it very hard to process CONTINUATION
frames after a padded HEADERS frame, and makes it complicated to perform
atomic calls to h2s_decode_headers(), which are needed if we want to be
able to maintain the HPACK decompressor's context even when dropping
streams.
This patch takes a different approach : the padding is checked when
parsing the frame header, the padlen byte is waited for and parsed,
and the dpl value is updated with this padlen value. This will allow
the frame parsers to decide to overwrite the padding if needed when
merging adjacent frames.
Since commit f210191 ("BUG/MEDIUM: h2: don't accept new streams if
conn_streams are still in excess") we're refraining from reading input
frames if we've reached the limit of number of CS. The problem is that
it prevents such situations from working fine. The initial purpose was
in fact to prevent from reading new HEADERS frames when this happens,
and causes some occasional transfer hiccups and pauses with large
concurrencies.
Given that we now properly reject extraneous streams before checking
this value, we can be sure never to have too many streams, and that
any higher value is only caused by a scheduling reason and will go
down after the scheduler calls the code.
This fix must be backported to 1.9 and possibly to 1.8. It may be
tested using h2spec this way with an h2spec config :
while :; do
h2spec -o 5 -v -t -S -k -h 127.0.0.1 -p 4443 http2/5.1.2
done
We were returning a stream error of type PROTOCOL_ERROR on empty HEADERS
frames, but RFC7540#4.2 stipulates that we should instead return a
connection error of type FRAME_SIZE_ERROR.
This may be backported to 1.9 and 1.8 though it's unlikely to have any
real life effect.
Commit dc57236 ("BUG/MINOR: mux-h2: advertise a larger connection window
size") caused a WINDOW_UPDATE message to be sent early with the connection
to increase the connection's window size. It turns out that it causes some
minor trouble that need to be worked around :
- varnishtest cannot transparently cope with the WU frames during the
handshake, forcing all tests to explicitly declare the handshake
sequence ;
- some vtc scripts randomly fail if the WU frame is sent after another
expected response frame, adding uncertainty to some tests ;
- h2spec doesn't correctly identify these WU at the connection level
that it believes are the responses to some purposely erroneous frames
it sends, resulting in some errors being reported
None of these are a problem with real clients but they add some confusion
during troubleshooting.
Since the fix above was intended to increase the upload bandwidth, we
have another option which is to increase the window size with the first
WU frame sent for the connection. This way, no WU frame is sent until
one is really needed, and this first frame will adjust the window to
the maximum value. It will make the window increase slightly later, so
the client will experience the first round trip when uploading data,
but this should not be perceptible, and is not worth the extra hassle
needed to maintain our debugging abilities. As an extra bonus, a few
extra bytes are saved for each connection until the first attempt to
upload data.
This should possibly be backported to 1.9 and 1.8.
In some situations, if too short a frame header is received, we may leave
h2_process_demux() waking up the task again without checking that we were
already subscribed.
In order to avoid this once for all, let's introduce an h2_restart_reading()
function which performs the control and calls the task up. This way we won't
needlessly wake the task up if it's already waiting for I/O.
Must be backported to 1.9.
In mux_h2_unsubscribe, don't forget to leave the sending_list if
SUB_CALL_UNSUBSCRIBE was set. SUB_CALL_UNSUBSCRIBE means we were about
to be woken up for writing, unless the mux was too full to get more data.
If there's an unsubscribe call in the meanwhile, we should leave the list,
or we may be put back in the send_list.
This should be backported to 1.9.
In h2_snd_buf(), if we couldn't send the data because of flow control, and
the connection got a shutr, then add CS_FL_ERROR (or CS_FL_ERR_PENDING). We
will never get any window update, so we will never be unlocked, anyway.
No backport is needed.
When dealing with early data we scan the list of stream to notify them.
We're not supposed to have h2s->cs == NULL here but it doesn't cost much
to make the scan more robust and verify it before notifying.
No backport is needed.
If we had no pending read, it could be complicated to report an
RST_STREAM to a sender since we used to only report it via the
rx side if subscribed. Similarly in h2_wake_some_streams() we
now try all methods, hoping to catch all possible events.
No backport is needed.
In order to report an error to the data layer, we have different ways
depending on the situation. At a lot of places it's open-coded and not
always correct. Let's create a new function h2s_alert() to handle this
task. It tries to wake on recv() first, then on send(), then using
wake().
In the mux h2, make sure we set CS_FL_ERR_PENDING and wake the recv task,
instead of setting CS_FL_ERROR, if CS_FL_EOS is not set, so if there's
potentially still some data to be sent.
Commiy 8519357c ("BUG/MEDIUM: mux-h2: report asynchronous errors in
h2_wake_some_streams()") addressed an issue with synchronous errors
but forgot to fix the call places to also pass CS_FL_ERR_PENDING
instead of CS_FL_ERROR.
No backport is needed.
We most often store the mux context there but it can also be something
else while setting up the connection. Better call it "ctx" and know
that it's the owner's context than misleadingly call it mux_ctx and
get caught doing suspicious tricks.
The SUB_CAN_SEND/SUB_CAN_RECV enum values have been confusing a few
times, especially when checking them on reading. After some discussion,
it appears that calling them SUB_RETRY_SEND/SUB_RETRY_RECV more
accurately reflects their purpose since these events may only appear
after a first attempt to perform the I/O operation has failed or was
not completed.
In addition the wait_reason field in struct wait_event which carries
them makes one think that a single reason may happen at once while
it is in fact a set of events. Since the struct is called wait_event
it makes sense that this field is called "events" to indicate it's the
list of events we're subscribed to.
Last, the values for SUB_RETRY_RECV/SEND were swapped so that value
1 corresponds to recv and 2 to send, as is done almost everywhere else
in the code an in the shutdown() call.
Today the demux only wakes a stream up after receiving some contents, but
not necessarily on close or error. Let's do it based on both error flags
and both EOS flags. With a bit of refinement we should be able to only do
it when the pending bits are there but not the static ones.
No backport is needed.
This function is called when dealing with a connection error or a GOAWAY
frame. It used to report a synchronous error instead of an asycnhronous
error, which can lead to data truncation since whatever is still available
in the rxbuf will be ignored. Let's correctly use CS_FL_ERR_PENDING instead
and only fall back to CS_FL_ERROR if CS_FL_EOS was already delivered.
No backport is needed.
If EOS has already been reported on the conn_stream, there won't be
any read anymore to turn ERR_PENDING into ERROR, so we have to do
report it directly.
No backport is needed.
The h2s pointer was used to scan fctl lists prior to being used to scan
the send list by ID, so it could appear non-null eventhough the list is
empty, resulting in misleading information on empty connections.
No backport is needed.
Most of the time when we issue "show fd" to dump a mux's state, it's
to figure why a transfer is frozen. Connection, stream and conn_stream
states are critical there. And most of the time when this happens there
is a single stream left in the H2 mux, so let's always dump the last
known stream on show fd, as most of the time it will be the one of
interest.
Commit 7505f94f9 ("MEDIUM: h2: Don't use a wake() method anymore.")
changed the conditions to restart demuxing so that this happens as soon
as something is read. But similar to previous fix, at an end of stream
we may be woken up with nothing to read but data still available in the
demux buffer, so we must also use this as a valid condition for demuxing.
No backport is needed, this is purely 1.9.
Commit 082f559d3 ("BUG/MEDIUM: h2: restart demuxing after releasing
buffer space") tried to address a situation where transfers could stall
after a read, but the condition was not completely covered : some stalls
may still happen at end of stream because there's nothing anymore to
receive and the last data lie in the demux buffer. Thus we must also
consider this state as a valid condition to restart demuxing.
No backport is needed.