DOC: add more design feedback on the new layering model

Introduce the distinction between structured messages and raw data,
and how to make them coexist in a buffer. This is still a design draft.
Willy Tarreau 2018-07-23 17:29:37 +02:00
parent 842ed9b1cb
commit 7cc040cc74


Both operations should return a composite status :
- number of bytes transferred
- status flags (shutr, shutw, reset, empty, full, ...)
2018-07-23 - Update after merging rxbuf
---------------------------------------
It is becoming clear that it will not always be desirable for the mux to
decode incoming data, because doing so will sometimes imply extra memory
copies and/or memory usage for no benefit.
Ideally, when a stream is instantiated based on incoming data, these
incoming data should be passed and the upper layers called, but it should then
be up to these upper layers to peek more data in certain circumstances. Typically,
if the pending connection data are larger than what is expected to be passed
above, it means some data may cause head-of-line blocking (HOL) to other
streams, and need to be pushed up through the layers to let other streams
continue to work. Similarly, very large H2 DATA frames following HEADERS frames
should probably not be passed up, as they may require copies that could be
avoided if passed later. However, if the decoded frame fits into the
conn_stream's buffer, there is an opportunity to use a single buffer for the
conn_stream and the channel. The H2 demux could set a blocking flag indicating
it's waiting for the upper stream to take over demuxing. This flag would be
purged once the upper stream starts reading, or when extra data arrive and
change the conditions.
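A minimal sketch of the blocking flag lifecycle described above. The flag H2S_BLK_UPPER, the struct and the helper functions are hypothetical names for illustration only; they are not existing haproxy code, and the HOL threshold is an assumed parameter.

```c
#include <stddef.h>

/* Hypothetical flag: the demux waits for the upper stream to take over. */
#define H2S_BLK_UPPER 0x01u

struct h2s_sketch {
    unsigned int flags;
    size_t cs_room;            /* free room in the conn_stream's buffer */
};

/* A DATA frame follows a HEADERS frame that was already passed up.
 * Returns 1 if the decoded frame may be passed now (it fits, so a single
 * buffer can serve both the conn_stream and the channel), 0 if deferred. */
static int h2_data_after_headers(struct h2s_sketch *s, size_t frame_len)
{
    if (frame_len <= s->cs_room)
        return 1;
    s->flags |= H2S_BLK_UPPER;  /* leave it to the upper stream to pull it */
    return 0;
}

/* The flag is purged once the upper stream starts reading... */
static void h2_upper_reads(struct h2s_sketch *s)
{
    s->flags &= ~H2S_BLK_UPPER;
}

/* ...or when extra connection data change the conditions, e.g. when the
 * pending data exceed what may be held without HOL-blocking other streams
 * (hol_limit is an assumed threshold). */
static void h2_extra_data(struct h2s_sketch *s, size_t conn_pending,
                          size_t hol_limit)
{
    if (conn_pending > hol_limit)
        s->flags &= ~H2S_BLK_UPPER;
}
```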
Forcing structured headers and raw data to coexist within a single buffer is
quite challenging for many code parts. For example it's perfectly possible to
see a fragmented buffer containing series of headers, then a small data chunk
that was received at the same time, then a few other headers added by request
processing, then another data block received afterwards, then possibly yet
another header added by option http-send-name-header, and yet another data
block. This is painful for compression, which still needs to know where
compressed and uncompressed data start and stop. It also makes it very difficult
to account for the exact bytes to pass through the various layers.
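To illustrate the accounting problem, here is a minimal sketch modelling the fragmented buffer above as a list of typed blocks. The block types and helpers are hypothetical and ignore any real storage layout; they only show that every consumer (forwarding, compression) must walk the whole sequence to find the payload.

```c
#include <stddef.h>

/* Hypothetical block types for a buffer mixing headers and payload. */
enum blk_type { BLK_HDR, BLK_DATA };

struct blk {
    enum blk_type type;
    size_t len;
};

/* Only DATA blocks count toward the payload bytes to pass through the
 * layers (and to feed to compression); header blocks may be interleaved
 * anywhere, so the whole sequence has to be scanned. */
static size_t payload_bytes(const struct blk *blks, size_t nb)
{
    size_t total = 0;

    for (size_t i = 0; i < nb; i++)
        if (blks[i].type == BLK_DATA)
            total += blks[i].len;
    return total;
}

/* Number of separate data segments compression would have to track. */
static size_t data_segments(const struct blk *blks, size_t nb)
{
    size_t n = 0;

    for (size_t i = 0; i < nb; i++)
        if (blks[i].type == BLK_DATA)
            n++;
    return n;
}
```

For the example sequence in the text (headers, small data chunk, added headers, data block, http-send-name-header, data block), the payload is split into 3 segments separated by header blocks.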
One solution consists of thinking about buffers using 3 representations :
- a structured message, which is used for the internal HTTP representation.
This message may only be processed atomically. It has no clear byte count;
it's a message.
- a raw stream, consisting of sequences of bytes. That's typically what
happens in data sequences or in tunnel mode.
- a pipe, which contains data to be forwarded and which haproxy cannot
access.
Processing efficiency decreases as complexity grows toward the top of the list
above, while capabilities increase. The structured message can contain
anything, including serialized data blocks to be processed or forwarded. The
raw stream contains data blocks to be processed or forwarded. The pipe only
contains data blocks to be forwarded. The latter representations are only
optimizations of the former ones.
Thus ideally a channel should have access to all 3 storage areas at once,
depending on the use case :
(1) a structured message,
(2) a raw stream,
(3) a pipe
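The capabilities of each representation can be summarized in a small sketch. The enum, predicates and channel_sketch struct are hypothetical illustrations of the list above, not haproxy types.

```c
#include <stdbool.h>

/* The three representations, in the order listed above. */
enum repr {
    REPR_MSG,   /* (1) structured message: atomic, no byte count */
    REPR_RAW,   /* (2) raw stream: sequence of bytes */
    REPR_PIPE,  /* (3) pipe: data haproxy cannot access */
};

/* Only a structured message can carry headers (and serialized data). */
static bool repr_holds_headers(enum repr r) { return r == REPR_MSG; }

/* Data blocks can be processed unless they sit in a pipe. */
static bool repr_can_process(enum repr r) { return r != REPR_PIPE; }

/* Everything can be forwarded. */
static bool repr_can_forward(enum repr r) { (void)r; return true; }

/* Hypothetical channel giving access to all 3 areas at once. */
struct channel_sketch {
    unsigned int msg_count;   /* messages are counted, not measured in bytes */
    unsigned long raw_bytes;
    unsigned long pipe_bytes;
};
```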
Right now a channel only has (2) and (3) but after the native HTTP rework, it
will only have (1) and (3). Placing a raw stream exclusively in (1) comes with
some performance drawbacks which are not easily recovered, and with rather
difficult management that still involves the reserve to ensure that a data
block doesn't prevent headers from being appended. But the payload may be
needed during header processing, so we cannot simply drop this option.
A long-term approach would consist of ensuring that a single channel may have
access to all 3 representations at once, and of enumerating priority rules that
define how they interact. That's exactly what is currently done with the pipe
and the raw buffer. Doing so would also remove the need to store payload in the
structured message and eliminate the requirement for the reserve. But it would
cost more memory to process POST data and server responses. Thus an
intermediate step consists of keeping this model in mind but not implementing
everything yet.
Short term proposal : a channel has access to a buffer and a pipe. A non-empty
buffer is either in structured message format OR raw stream format. Only the
channel knows. However, a structured buffer MAY contain raw data in a properly
formatted way (using the envelope defined by the structured message format).
By default, when a demux writes to a CS rxbuf, it will try to use the lowest
possible level for what is being done (i.e. splice if possible, otherwise raw
stream, otherwise structured message). If the buffer already contains a
structured message, then this format is exclusive. From this point the MUX has
two options : either encode the incoming data to match the structured message
format, or refrain from receiving into the CS's rxbuf and wait until the upper
layer requests those data.
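The demux's choice of format can be sketched as a small decision function. All names are hypothetical. Two assumptions go beyond the text and are labeled in the comments: splicing is only chosen when the buffer is empty (to preserve ordering), and headers arriving while the buffer holds raw data are deferred rather than mixed.

```c
#include <stdbool.h>

enum fmt { FMT_NONE = -1, FMT_PIPE, FMT_RAW, FMT_MSG };  /* NONE: refrain */
enum buf_state { BUF_EMPTY, BUF_RAW, BUF_MSG };

/* Pick the lowest possible level for what is being received. */
static enum fmt demux_pick_format(bool is_headers, bool splice_ok,
                                  bool can_encode, enum buf_state st)
{
    /* A structured message already present makes this format exclusive:
     * either encode the data into it, or wait for the upper layer. */
    if (st == BUF_MSG)
        return can_encode ? FMT_MSG : FMT_NONE;

    /* Headers only exist as structured messages; assumption: they are
     * deferred rather than mixed with raw data already in the buffer. */
    if (is_headers)
        return st == BUF_EMPTY ? FMT_MSG : FMT_NONE;

    /* Assumption: splice only into an empty buffer to preserve ordering. */
    if (splice_ok && st == BUF_EMPTY)
        return FMT_PIPE;

    return FMT_RAW;
}
```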
This opens a simplified option which could be suited even for the long term :
- cs_recv() will take one or two flags to indicate if a buffer already
contains a structured message or not ; the upper layer knows it.
- cs_recv() will take two flags to indicate what the upper layer is willing
to take :
- structured message only
- raw stream only
- any of them
From this point the mux can decide to either pass anything or refrain from
doing so.
- the demux stores what it knows about the contents into some CS flags
to indicate whether or not a structured message is still available, and
whether or not some raw data are still available. Thus the caller knows
whether or not extra data are available.
- when the demux works on its own, it refrains from passing structured data
to a non-empty buffer, unless these data are causing trouble to other
streams (HOL).
- when a demux has to encapsulate raw data into a structured message, it will
always have to respect a configured reserve so that extra header processing
can be done on the structured message inside the buffer, regardless of the
supposed available room. In addition, the upper layer may indicate using an
extra recv() flag whether it wants the demux to defragment serialized data
(for example by moving trailing headers apart) or if it's not necessary.
This flag will be set by the stream interface, for example if compression
is required or if the http-buffer-request option is set. Using
to_forward==0 is probably a stronger indication that the reserve must be
respected.
- cs_recv() and cs_send(), when fed with a message, should not return byte
counts but message counts (i.e. 0 or 1). This implies that a single call to
either of these functions cannot mix raw data and structured messages at
the same time.
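A minimal sketch of the flag exchange proposed above, using a mock in place of cs_recv(). Every flag name, struct and the cs_recv_sketch() function are hypothetical. The mock passes either one message (returning 1) or raw bytes (returning a byte count), never both in one call, refrains from putting raw data into a buffer already holding a message (no encapsulation shown here), and reports what remains in CS flags.

```c
#include <stddef.h>

/* Hypothetical cs_recv() flags (caller -> mux) */
#define RECV_BUF_HAS_MSG 0x01u /* buffer already holds a structured message */
#define RECV_WANT_MSG    0x02u /* upper layer accepts a structured message */
#define RECV_WANT_RAW    0x04u /* upper layer accepts a raw stream */
/* RECV_WANT_MSG | RECV_WANT_RAW means "any of them" */

/* Hypothetical CS flags (mux -> caller) */
#define CS_MSG_AVAIL 0x01u
#define CS_RAW_AVAIL 0x02u

struct mux_sketch { int msg_pending; size_t raw_pending; };
struct cs_sketch  { unsigned int flags; };

/* Returns a message count (0 or 1) when *is_msg is set, else a byte count. */
static size_t cs_recv_sketch(struct cs_sketch *cs, struct mux_sketch *mux,
                             size_t room, unsigned int flags, int *is_msg)
{
    size_t ret = 0;

    *is_msg = 0;
    if (mux->msg_pending && (flags & RECV_WANT_MSG)) {
        *is_msg = 1;
        mux->msg_pending = 0;
        ret = 1;
    }
    else if (mux->raw_pending && (flags & RECV_WANT_RAW) &&
             !(flags & RECV_BUF_HAS_MSG)) {
        ret = mux->raw_pending < room ? mux->raw_pending : room;
        mux->raw_pending -= ret;
    }
    /* Tell the caller whether extra data remain available at the mux. */
    cs->flags = (mux->msg_pending ? CS_MSG_AVAIL : 0u) |
                (mux->raw_pending ? CS_RAW_AVAIL : 0u);
    return ret;
}
```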
At this point it looks like the conn_stream will have some encapsulation work
to do for the payload if it needs to be encapsulated into a message. This
further magnifies the importance of *not* decoding DATA frames into the CS's
rxbuf until really needed.
The CS will probably need to hold an indication of what is available at the
mux level, not only in the CS. E.g. we know that payload is still available.
Using these elements, it should be possible to ensure that full header frames
may be received without enforcing any reserve, that frames too large to fit
will be detected because they return 0 messages and indicate that such
a message is still pending, and that data availability is correctly detected
(later we may expect that the stream-interface allocates a larger or second
buffer to place the payload).
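The two room checks can be sketched as follows: full header messages use all the free room, while payload encapsulated into the message always leaves the reserve untouched. The struct, flag and functions are hypothetical, and envelope overhead is ignored for simplicity.

```c
#include <stddef.h>

#define CS_MSG_PENDING 0x01u  /* hypothetical: a message could not be passed */

struct rx_sketch {
    size_t room;       /* free space in the CS rxbuf */
    size_t reserve;    /* configured reserve kept for header processing */
    unsigned int flags;
};

/* Full header messages: no reserve is enforced. Returns 0 or 1 messages;
 * a message too large to fit is detected by the 0 return and the flag. */
static int recv_headers(struct rx_sketch *rx, size_t msg_len)
{
    if (msg_len > rx->room) {
        rx->flags |= CS_MSG_PENDING;
        return 0;
    }
    rx->room -= msg_len;
    rx->flags &= ~CS_MSG_PENDING;
    return 1;
}

/* Raw data encapsulated into the message must leave the reserve free.
 * Returns the number of payload bytes accepted (envelope size ignored). */
static size_t recv_encap_data(struct rx_sketch *rx, size_t data_len)
{
    size_t usable = rx->room > rx->reserve ? rx->room - rx->reserve : 0;
    size_t n = data_len < usable ? data_len : usable;

    rx->room -= n;
    return n;
}
```

With 1000 bytes free and a 200-byte reserve, a 900-byte header message is accepted even though it eats into the reserve, whereas encapsulated payload is capped at 800 bytes.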
Regarding the ability for the channel to forward data, it looks like a new
function "cs_xfer(src_cs, dst_cs, count)" could be very useful to optimize
forwarding by making use of splicing when available. It is not yet
totally clear whether it will split into "cs_xfer_in(src_cs, pipe, count)"
followed by "cs_xfer_out(dst_cs, pipe, count)" or anything different, and it
still needs to be studied. The general idea seems to be that the receiver might
have to call the sender directly once they agree on how to transfer data (pipe
or buffer). If the transfer is incomplete, the cs_xfer() return value and/or
flags will indicate the current situation (src empty, dst full, etc) so that
the caller may register for notifications on the appropriate event and wait to
be called again to continue.
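cs_xfer() is only proposed here, so the following is a minimal sketch of its return convention under stated assumptions: both ends are modelled as plain byte counters, and the question of pipe versus buffer is left aside. The struct, status flags and cs_xfer_sketch() name are hypothetical.

```c
#include <stddef.h>

#define XFER_SRC_EMPTY 0x01u  /* wait for the source to have data */
#define XFER_DST_FULL  0x02u  /* wait for the destination to have room */

struct cs_end { size_t avail; size_t room; };  /* hypothetical endpoint */

/* Moves up to <count> bytes and reports the current situation in *status
 * so the caller knows which event to subscribe to before retrying. */
static size_t cs_xfer_sketch(struct cs_end *src, struct cs_end *dst,
                             size_t count, unsigned int *status)
{
    size_t n = count;

    if (n > src->avail)
        n = src->avail;
    if (n > dst->room)
        n = dst->room;
    src->avail -= n;
    dst->room  -= n;
    dst->avail += n;

    *status = 0;
    if (!src->avail)
        *status |= XFER_SRC_EMPTY;
    if (!dst->room)
        *status |= XFER_DST_FULL;
    return n;
}
```

A caller seeing XFER_DST_FULL would register for "room available" on the destination, and one seeing XFER_SRC_EMPTY would register for "data available" on the source, then call again.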
Short term implementation :
1) add new CS flags to qualify what the buffer contains and what we expect
to read into it;
2) set these flags to pretend we have a structured message when receiving
headers (after all, H1 headers are atomic as well) and see what it
implies for the code; for H1 it's unclear whether it makes sense to try
to set it without the H1 mux.
3) use these flags to refrain from sending DATA frames after HEADERS frames
in H2.
4) flush the flags at the stream interface layer when performing a cs_send().
5) use the flags to enforce receipt of data only when necessary
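Steps 2 to 4 can be sketched with a single hypothetical flag; the flag and the three hooks below are illustrative names only, not existing haproxy functions, and step 5 is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical CS flag (step 1): the buffer holds a structured message. */
#define CS_FL_BUF_MSG 0x01u

struct cs_state { unsigned int flags; };

/* Step 2: receiving headers marks the buffer as a structured message. */
static void on_headers_received(struct cs_state *cs)
{
    cs->flags |= CS_FL_BUF_MSG;
}

/* Step 3: in H2, do not decode DATA frames into a buffer holding HEADERS. */
static bool h2_may_pass_data(const struct cs_state *cs)
{
    return !(cs->flags & CS_FL_BUF_MSG);
}

/* Step 4: the stream interface flushes the flag when performing cs_send(). */
static void on_cs_send(struct cs_state *cs)
{
    cs->flags &= ~CS_FL_BUF_MSG;
}
```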
We should be able to end up with sequential receipt in H2, modelling what is
needed for other protocols, without interfering with the native H1 developments.