mirror of
http://git.haproxy.org/git/haproxy.git/
synced 2025-01-07 04:39:35 +00:00
DOC: add some description of the pending rework of the buffer structure
The "struct buffer" needs to be reworked, this new doc lists the changes and steps to do this.
This commit is contained in:
parent
633f3bffed
commit
fa9f9ccd6f
245
doc/internals/buffer-rework.txt
Normal file
245
doc/internals/buffer-rework.txt
Normal file
@ -0,0 +1,245 @@
|
||||
2018-05-18 - Buffer rework
|
||||
|
||||
1. Summary
|
||||
|
||||
The situation with the current buffer structure is becoming problematic in
|
||||
the newly introduced muxes and causes serious difficulties preventing muxes
|
||||
from being used on both sides, unless requiring that all code is duplicated
|
||||
to use buf->i on the Rx path and buf->o on the Tx path.
|
||||
|
||||
|
||||
2. History
|
||||
|
||||
A very long time ago, buffers were used to receive data using the recv() call,
|
||||
to parse them, forward them, and send them over the network using the send()
|
||||
call. Buffers were split into (buffer,channel) when some layers started to be
|
||||
introduced, and were reorganized a few times to ease content processing and
|
||||
rewriting. The current definition of the buffer structure is the following :
|
||||
|
||||
struct buffer {
|
||||
char *p;
|
||||
uint size;
|
||||
uint i;
|
||||
uint o;
|
||||
char data[0];
|
||||
};
|
||||
|
||||
data p
|
||||
| |
|
||||
V V
|
||||
+-----------+--------------------+------------+-------------+
|
||||
| |////////////////////|////////////| |
|
||||
+-----------+--------------------+------------+-------------+
|
||||
<---------------------------------------------------------> size
|
||||
<------------------> <---------->
|
||||
o i
|
||||
|
||||
Pointer (p) is initialized to (data) when the buffer is empty. Data are
|
||||
received after (p+i), increasing (i) by the number of bytes read. Data are sent
|
||||
from (p-o) for up to (o) bytes, decreasing (o) by the number of bytes sent.
|
||||
Forwarding data in the channel consists in advancing (p) by the number of bytes
|
||||
to forward, increasing (o) and decreasing (i) by the same amount.
|
||||
|
||||
This representation is convenient for channel operations because most of them
|
||||
require to parse input data between (p) and (p+i), and to have a simple way to
|
||||
forward data. Additionally, it's always easy to know if some data are scheduled
|
||||
for departure (o), or if the buffer has some room available (size-i-o).
|
||||
|
||||
|
||||
3. Problems
|
||||
|
||||
When applets were introduced, the initial code that was made to write data into
|
||||
the output part was modified to send it into the input part since we had to
|
||||
rely on the stream code to forward these data via the channel. This explains
|
||||
the flood of bi_* functions that were introduced to perform the same operations
|
||||
as the initial bo_*, to write into an input buffer from an applet.
|
||||
|
||||
Health checks however continue to use output because checks do not use streams
|
||||
nor channels. Thus the check buffers use buf->o for requests and buf->i for
|
||||
responses.
|
||||
|
||||
The introduction of muxes has changed this again by requiring that most request
|
||||
code was able to write to buf->i, pretending to be the equivalent of a socket
|
||||
recv() call. New bi_* functions had to be created to write headers and chunks
|
||||
from the HTTP/2 mux. Conversely, it was made necessary to parse HTTP traffic
|
||||
from buf->o while all the original code was made to parse this from buf->i.
|
||||
|
||||
Furthermore, implementing an outgoing mux (eg: HTTP/2) will require to
|
||||
duplicate a lot of the code to use buf->i instead of buf->o and conversely,
|
||||
just because the mux will not be placed on the same side of the buffer. Not
|
||||
only it complicates code maintenance but it also emphasizes the risk to use
|
||||
the wrong function at any moment.
|
||||
|
||||
From a performance perspective, applets have to suffer a useless copy most of
|
||||
the time, only due to API limitatoins : it is not possible to write directly to
|
||||
an input buffer, one has to write to a chunk and then copy it into a buffer. A
|
||||
compatible structure could allow to share the same data between the chunk and
|
||||
the buffer without having to perform an extra copy.
|
||||
|
||||
|
||||
4. Proposal
|
||||
|
||||
In checks and muxes, it is obvious that a single "side" of the buffer is used,
|
||||
and it generally is the one associated with the I/O to be performed. Only the
|
||||
channel requires the distinction between (i) and (o).
|
||||
|
||||
The proposal is to remove this distinction from the buffer and move ->o into
|
||||
the channel.
|
||||
|
||||
A buffer will then become only a linear (possibly wrapping) storage area with
|
||||
a beginning, and an end.
|
||||
|
||||
Given the experience gathered from past buffer API updates, we know that the
|
||||
buffer's end is not as much important as its data length. This will give the
|
||||
current representation :
|
||||
|
||||
|
||||
struct buffer {
|
||||
void *area; // start of the storage area
|
||||
uint head; // start offset of remaining data relative to area
|
||||
uint len; // contents length after head
|
||||
uint size; // size of the storage area (wrapping point)
|
||||
};
|
||||
|
||||
area
|
||||
|
|
||||
V
|
||||
+-----------+---------------------------------+-------------+
|
||||
| |/////////////////////////////////| |
|
||||
+-----------+---------------------------------+-------------+
|
||||
<---------------------------------------------------------> size
|
||||
<---------> <------------------------------->
|
||||
head len
|
||||
|
||||
The channel will contain an "output" field corresponding to the current buf->o,
|
||||
indicating how many bytes of the current buffer are actually scheduled for
|
||||
being forwarded and must not be considered anymore. It means that a stream
|
||||
parser will simply start to parse from (buf->area + buf->head + chn->output)
|
||||
and stop at (buf->area + buf->head + buf->len).
|
||||
|
||||
For esnding data, the caller of cs_send() or whatever function will have to
|
||||
pass the desired number of bytes to send, and one will not expect anymore that
|
||||
all the buffer's contents have to be sent. In general the caller will have
|
||||
access to chn->output if it needs to use this (typically from the stream
|
||||
interface code at the moment).
|
||||
|
||||
|
||||
5. First implementation step
|
||||
|
||||
The first step will consist in limiting the changes to the current buffers. The
|
||||
buffer structure will still contain both a descriptor and the storage area. A
|
||||
buffer will first be declared this way :
|
||||
|
||||
struct buffer {
|
||||
uint head; // start offset of remaining data relative to area
|
||||
uint len; // contents length after head
|
||||
uint size; // size of the storage area (wrapping point)
|
||||
void area[0]; // start of the storage area
|
||||
};
|
||||
|
||||
Thanks to this, no changes will have to be performed on memory management, and
|
||||
buffers will continue to be allocated from a pool of size (sizeof(buffer) +
|
||||
tune.bufsize).
|
||||
|
||||
The following translations will have to be performed on the code :
|
||||
- occurrences of (buf->i + buf->o) will have to be replaced with (buf->len)
|
||||
- bi_ptr() -> ci_ptr() ; bi_end() -> b_head()+b_size() ; bi_del() -> b_del()
|
||||
- bo_ptr() -> b_head() ; bo_end() -> co_end()
|
||||
- b_adv() -> c_adv() ; b_rew() -> c_rew()
|
||||
- buf->o will have to be replaced with either chn->output or a function
|
||||
argument containing a copy of chn->output. These ones should cancel out
|
||||
at the end of the operation.
|
||||
- buf->i -> (b_len(buf) - chn->output)
|
||||
|
||||
Temporary difficulties :
|
||||
- compression makes use of both (i) and (o), by taking care of only touching
|
||||
(i) and never (o). The filters know how not to touch (o), and the internal
|
||||
compression API needs a small update so that this previous ->o value is
|
||||
passed as an argument that the filter will collect from the channel. If it
|
||||
is simpler (it probably isn't), a temporary "struct oldbuf" could be
|
||||
created to emulate the old behaviour and be fed/used by the filters code.
|
||||
|
||||
- buffer_slow_realign() distinguishes input data from output data so that the
|
||||
output data is always placed at the end, leaving a clean contigous buffer
|
||||
once forwarded. Instead, a "split" argument will have to be added so that
|
||||
the caller may decide where to split the contents. Muxes will pass zero
|
||||
here while channels will pass chn->output.
|
||||
|
||||
|
||||
6. Second implementation step
|
||||
|
||||
The second step will consist in making "struct buffer" only hold the descriptor
|
||||
and not the data anymore. It will then look like this :
|
||||
|
||||
struct buffer {
|
||||
void *area; // start of the storage area
|
||||
uint head; // start offset of remaining data relative to area
|
||||
uint len; // contents length after head
|
||||
uint size; // size of the storage area (wrapping point)
|
||||
};
|
||||
|
||||
Each buffer allocation will have to atomically allocate a struct buffer and an
|
||||
area. Buffer copies will consist in exchanging the "struct buffer" contents
|
||||
only.
|
||||
|
||||
The chunk API must then be updated so that some new versions of chunk_putblk(),
|
||||
chunk_printf() etc can write to a storage area, and so that bi_putchk() and
|
||||
bo_putchk() instead can swap the storage areas when possible.
|
||||
|
||||
At this point b_size() will be used to know where to release the allocated
|
||||
area. The storage will simply consist in (start,len) which is perfectly suited
|
||||
to have slabs. Just like chunks, b_size()==0 can be used to mention that no
|
||||
free() must be done on the area. Doing so will make it much simpler to send
|
||||
pre-formated messages (eg: error messages) into a buffer because such messages
|
||||
will then be stored into a "struct ist" and sending such a message will be as
|
||||
simple as doing :
|
||||
|
||||
b->area = ist.str;
|
||||
b->len = ist.len;
|
||||
b->head = 0;
|
||||
b->size = 0;
|
||||
|
||||
The chunk struct can then be removed and replaced everywhere with a struct
|
||||
buffer. Only the functions will remain, though they will likely have to be
|
||||
renamed. Maybe the buffer manipulation functions will have to be split between
|
||||
those which support wrapping and those which don't (chunks don't support
|
||||
wrapping).
|
||||
|
||||
The buf_empty structure will then disappear since a small 20-bytes structure
|
||||
will be enough to represent an empty buffer.
|
||||
|
||||
|
||||
7. Third implementation step
|
||||
|
||||
The third step will consist in placing a struct buffer into the struct channel.
|
||||
This way no allocation is needed at all, and any storage can be used to deliver
|
||||
contents. This allows to trivially upgrade a buffer on the fly by picking from
|
||||
a different slab. It also allows to deliver error messages without ever having
|
||||
to perform a buffer allocation. Doing so removes the need for the early buffer
|
||||
allocation for the response in process_stream(), as it is only needed to have a
|
||||
reliable place to send an error message to. This will ensure the buffer
|
||||
allocator can be simplified and made more robust against the risk of deadlock
|
||||
on memory shortage.
|
||||
|
||||
|
||||
8. Caveats
|
||||
|
||||
The following points require extra care :
|
||||
- there will be some subtracts to figure the buffer "input" length (formerly
|
||||
buf->i). In the past it always used to be an unsigned value. Extreme care
|
||||
will have to be taken to always use an inline function to compute this so
|
||||
that it doesn't accidently become signed.
|
||||
|
||||
- supporting buf->size==0 to point to a special string may require some extra
|
||||
checks to avoid causing an integer underflow when calculating (size-len) or
|
||||
(size-len-head) to figure the available room.
|
||||
|
||||
- it is very likely that some further changes will be tempting to do in the
|
||||
channel to better integrate the buffer (which becomes very similar to the
|
||||
pipe), but we must not go as far as removing the visibility of the "struct
|
||||
buffer" because it will be used as entry point for many functions.
|
||||
|
||||
- it is likely that a number of the chunk_*, bi_* and bo_* variants have very
|
||||
minor variations like return codes or error checking that can make their
|
||||
replacement very tricky. Each set of such functions must be studied in
|
||||
advance, and their users as well.
|
Loading…
Reference in New Issue
Block a user