MINOR: buffer: add a buffer list type with functions

The buffer ring is problematic in multiple aspects, one of which is
that it is only usable by one entity. With multiplexed protocols, we need
to have shared buffers used by many entities (streams and connection),
and the only way to use the buffer ring model in this case is to have
each entity store its own array, and keep a shared counter on allocated
entries. But even with the default 32 buffers and 100 streams per HTTP/2
connection, we're talking about 32*101*32 bytes = 103424 bytes per H2
connection (32 entries of 32 bytes for each of the 100 streams plus the
connection), just to store up to 32 shared buffers, spread randomly in
these tables. Some users might want to achieve rates much higher than the
default over high-speed links (e.g. 30-50 MB/s at 100ms), which means 3 to
5 MB of storage per connection, hence 180 to 300 buffers (of 16 kB). There
it starts to cost a lot, up to 1 MB per connection, just to store buffer
indexes.

Instead, this patch introduces a variant which we call a buffer list.
That's basically just a free list encoded in an array. Each cell
contains a buffer structure, a next index, and a few flags. The index
could be reduced to 16 bits if needed, in order to make room for a new
struct member. The design permits initializing a whole freelist at once
using memset(0).

The list pointer is stored at a single location (e.g. the connection)
and all users (the streams) will just have indexes referencing their
first and last assigned entries (head and tail). This means that with
a single table we can now have all our buffers shared between multiple
streams, regardless of the number of potential streams which would want
to use them. Now a 180- to 300-entry array only costs 7.2 to 12 kB (40
bytes per entry), i.e. 80 times less.
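The figures above can be re-derived with a few lines of C. This is only a sketch: it assumes 32-byte ring slots and 40-byte bl_elem cells (the 64-bit sizes quoted in this patch), and both helper names are made up for the illustration.

```c
#include <stddef.h>

/* Ring model: each of <entities> users (streams plus the connection) keeps
 * its own array of <nbuf> slots of <slot_size> bytes (hypothetical helper).
 */
static size_t ring_index_cost(size_t nbuf, size_t entities, size_t slot_size)
{
	return nbuf * entities * slot_size;
}

/* Buffer list model: one shared array of <nbuf> cells of <elem_size> bytes,
 * whatever the number of streams (hypothetical helper).
 */
static size_t list_index_cost(size_t nbuf, size_t elem_size)
{
	return nbuf * elem_size;
}
```

With 300 buffers, ring_index_cost(300, 101, 32) gives 969600 bytes (about 1 MB) while list_index_cost(300, 40) gives 12000 bytes. Like the text, this ignores the extra head cell of the list.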

Two large functions (bl_deinit() & bl_get()) were implemented in buf.c.
A basic doc was added to explain how it works.
Willy Tarreau 2024-08-28 18:27:51 +02:00
parent ac66df4e2e
commit 8f09bdce10
4 changed files with 303 additions and 0 deletions


@@ -0,0 +1,128 @@
2024-09-30 - Buffer List API
1. Use case
The buffer list API allows one to share a certain number of buffers between
multiple entities, each of which sees its own share as a list of buffers, while
a single free list is kept shared. The immediate use case is for muxes, which
may want to allocate up to a certain number of buffers per connection, shared
among all streams. In this case, each stream will first request a new list for
its own use, then may request extra entries from the free list. At any moment
it will be possible to enumerate all allocated lists and to know which buffer
follows which one.
2. Representation
The buffer list is an array of struct bl_elem. It can hold up to N-1 buffers
for N elements. The first one serves as the bookkeeping head and holds the
start of the free list.
Each bl_elem contains a struct buffer, the index of the next cell, and a few
flags. The struct buffer is a real struct buffer for all cells, except the
first one where it holds useful data to describe the state of the array:
  struct bl_elem {
      struct buffer {
          size_t size;  // head: size of the array in number of elements
          char  *area;  // head: not used (0)
          size_t data;  // head: number of elements allocated
          size_t head;  // head: number of users
      } buf;
      uint32_t next;
      uint32_t flags;
  };
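The bl_elem comment in this patch states that the struct has no holes, so a cell's byte offset follows directly from its index. The sketch below checks that at compile time (C11 _Static_assert). It re-declares both structs locally, assuming struct buffer has exactly the four members shown above; bl_cell_offset() is a hypothetical helper, not part of the API.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for struct buffer, with the members shown above. */
struct buffer {
	size_t size;
	char *area;
	size_t data;
	size_t head;
};

struct bl_elem {
	struct buffer buf;
	uint32_t next;
	uint32_t flags;
};

/* No padding: 24 bytes on 32-bit and 40 bytes on 64-bit platforms. */
_Static_assert(sizeof(struct bl_elem) ==
               sizeof(struct buffer) + 2 * sizeof(uint32_t),
               "struct bl_elem must not contain holes");

/* Byte offset of cell <idx> from the start of the array (hypothetical). */
static size_t bl_cell_offset(uint32_t idx)
{
	return (size_t)idx * sizeof(struct bl_elem);
}
```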
There are a few important properties here:
- for the free list, the first element isn't part of the list, otherwise
  there wouldn't be any head storage anymore.
- the head's buf.data doesn't include the first cell of the array, thus its
  maximum value is buf.size - 1.
- allocations are always made by appending to the end of the existing list
- releases are always made by releasing the beginning of the existing list
- next == 0 for an allocatable cell implies that all the cells from this
  element to the last one of the array are free. This allows a whole new
  array to be initialized simply with memset(array, 0, sizeof(array))
- next == ~0 for an allocated cell indicates we've reached the last element
  of the current list. A released cell may carry the same marker when it is
  the last one of the free list.
- for the head of the list, next points to the first available cell, or 0 if
  the free list is depleted.
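These rules are enough to walk the free list without any other state. As a sketch, the hypothetical helper below (not part of the patch) counts the free cells: it follows <next> from the head, treats next == 0 as "every following cell is free", and stops on ~0 or at the end of the array. On a consistent array the result should equal bl_avail(), which is essentially what bl_deinit() checks once everything is released.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the structs described above. */
struct buffer { size_t size; char *area; size_t data; size_t head; };
struct bl_elem { struct buffer buf; uint32_t next; uint32_t flags; };

/* Counts the cells reachable from the free list of <head> (hypothetical). */
static uint32_t bl_count_free(const struct bl_elem *head)
{
	uint32_t elem = head->next;   /* 0 means the free list is depleted */
	uint32_t count = 0;

	while (elem && elem != ~0U && elem < head->buf.size) {
		count++;
		/* next == 0: the remaining cells were never used, walk linearly */
		elem = head[elem].next ? head[elem].next : elem + 1;
	}
	return count;
}
```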
3. Example
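The array starts like this, created with calloc(), with size set to the total
number of cells and the head's next set to 1 (as bl_init() does). The number
represented is the 'next' value. "~" here stands for ~0 (i.e. end marker). The
digits below an array mark the first cell of each stream's list (1 for strm1,
2 for strm2).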
  [1|0|0|0|0|0|0|0|0|0] => array entirely free

  strm1: bl_get(0) -> 1 = assign 1 to strm1's first cell
  [2|~|0|0|0|0|0|0|0|0] => strm1 allocated at [1]
     1

  strm1: bl_get(1) -> 2 = allocate one cell after cell 1
  [3|2|~|0|0|0|0|0|0|0]
     1

  strm1: bl_get(2) -> 3 = allocate one cell after cell 2
  [4|2|3|~|0|0|0|0|0|0]
     1

  strm2: bl_get(0) -> 4 = assign 4 to strm2's first cell
  [5|2|3|~|~|0|0|0|0|0]
     1     2

  strm1: bl_put(1) -> 2 = release cell 1, jump to next one (2)
  [1|5|3|~|~|0|0|0|0|0]
       1   2
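The sequence above can be replayed with a local copy of this patch's bl_init(), bl_get() and bl_put(). This sketch replaces BUG_ON_HOT() with assert() and uses a stand-in struct buffer; bl_dump() is a hypothetical helper that prints the 'next' values in the notation used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct buffer { size_t size; char *area; size_t data; size_t head; };
struct bl_elem { struct buffer buf; uint32_t next; uint32_t flags; };

static void bl_init(struct bl_elem *head, uint32_t nbelem)
{
	assert(nbelem >= 2);
	memset(head, 0, nbelem * sizeof(*head));
	head->buf.size = nbelem;
	head->next = 1;             /* cell 1 is the first free one */
}

/* Appends a free cell after tail <idx>, or starts a new list if idx == 0. */
static uint32_t bl_get(struct bl_elem *head, uint32_t idx)
{
	uint32_t e = head->next;    /* cell to allocate, 0 if depleted */
	uint32_t n = head[e].next;  /* its successor in the free list */

	assert(idx < head->buf.size);
	if (!n) {
		if (!e)
			return 0;           /* no more free cells */
		if (e + 1 != head->buf.size)
			n = e + 1;          /* never-used area: next cell is free */
	}
	head->next = (n == ~0U) ? 0 : n;
	head->buf.data++;
	if (idx) {
		assert(head[idx].next == ~0U);  /* idx must be a tail */
		head[idx].next = e;
	} else {
		head->buf.head++;               /* one more user */
	}
	head[e].next = ~0U;                 /* e is the new tail */
	return e;
}

/* Releases the caller's head cell <idx>; returns the new head or 0. */
static uint32_t bl_put(struct bl_elem *head, uint32_t idx)
{
	uint32_t n;

	assert(idx && idx < head->buf.size);
	n = head[idx].next;
	if (n == ~0U) {             /* last cell: the user leaves */
		assert(head->buf.head);
		head->buf.head--;
		n = 0;
	}
	head[idx].next = head->next ? head->next : ~0U;
	head->next = idx;           /* push on top of the free list */
	assert(head->buf.data);
	head->buf.data--;
	return n;
}

/* Writes the 'next' values as "[1|0|...]", with '~' for ~0. <out> must be
 * large enough (hypothetical helper for this illustration).
 */
static void bl_dump(const struct bl_elem *head, char *out)
{
	*out++ = '[';
	for (size_t i = 0; i < head->buf.size; i++) {
		if (i)
			*out++ = '|';
		if (head[i].next == ~0U)
			*out++ = '~';
		else
			out += sprintf(out, "%u", (unsigned)head[i].next);
	}
	*out++ = ']';
	*out = '\0';
}
```

After the last step, as in the diagram, strm1's list starts at cell 2, strm2's at cell 4, and cell 1 sits on top of the free list.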
4. Manipulating buffer lists
The API is very simple: it allows one to reserve a buffer for a new stream or
for an existing one, to release a stream's first buffer (the stream leaves the
list once its last buffer is released), and to initialize / release the whole
array.
====================+==================+=======================================
Function            | Arguments/Return | Description
--------------------+------------------+---------------------------------------
bl_users()          | const bl_elem *b | returns the current number of users on
                    | ret: uint32_t    | the array (i.e. buf.head).
--------------------+------------------+---------------------------------------
bl_size()           | const bl_elem *b | returns the total number of
                    | ret: uint32_t    | allocatable cells (i.e. buf.size-1)
--------------------+------------------+---------------------------------------
bl_used()           | const bl_elem *b | returns the number of cells currently
                    | ret: uint32_t    | in use (i.e. buf.data)
--------------------+------------------+---------------------------------------
bl_avail()          | const bl_elem *b | returns the number of cells still
                    | ret: uint32_t    | available.
--------------------+------------------+---------------------------------------
bl_init()           | bl_elem *b       | initializes b for n elements. All are
                    | uint32_t n       | in the free list.
--------------------+------------------+---------------------------------------
bl_put()            | bl_elem *b       | releases cell <idx>, which must be the
                    | uint32_t idx     | caller's head, to the free list,
                    | ret: uint32_t    | possibly deleting the user. Returns
                    |                  | the next cell idx or 0 if none.
--------------------+------------------+---------------------------------------
bl_deinit()         | bl_elem *b       | only when DEBUG_STRICT==2, scans the
                    |                  | array to check for leaks.
--------------------+------------------+---------------------------------------
bl_get()            | bl_elem *b       | allocates a new cell after tail <idx>,
                    | uint32_t idx     | or for a new stream if <idx> is 0.
                    | ret: uint32_t    | Returns the cell or 0 if no more
                    |                  | space.
====================+==================+=======================================
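As an illustration of how a user might combine these calls, the sketch below adds two hypothetical helpers on top of a local copy of bl_put() (BUG_ON_HOT() replaced with assert()): one measures a stream's list by following <next> until ~0, the other releases a whole stream by putting its head back until bl_put() returns 0. The buffers themselves are assumed to have been released already, as bl_put() requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct buffer { size_t size; char *area; size_t data; size_t head; };
struct bl_elem { struct buffer buf; uint32_t next; uint32_t flags; };

/* Local copy of bl_put() from this patch. */
static uint32_t bl_put(struct bl_elem *head, uint32_t idx)
{
	uint32_t n;

	assert(idx && idx < head->buf.size);
	n = head[idx].next;
	if (n == ~0U) {             /* last cell: the user leaves */
		assert(head->buf.head);
		head->buf.head--;
		n = 0;
	}
	head[idx].next = head->next ? head->next : ~0U;
	head->next = idx;           /* push on top of the free list */
	assert(head->buf.data);
	head->buf.data--;
	return n;
}

/* Number of cells in the list starting at <first> (hypothetical helper). */
static uint32_t stream_list_len(const struct bl_elem *head, uint32_t first)
{
	uint32_t len = 0;

	for (uint32_t i = first; i; i = (head[i].next == ~0U) ? 0 : head[i].next)
		len++;
	return len;
}

/* Releases every cell of the list starting at <first>, head first, as a
 * stream being destroyed might do (hypothetical helper).
 */
static void stream_release_all(struct bl_elem *head, uint32_t first)
{
	while (first)
		first = bl_put(head, first);
}
```

Starting from the [5|2|3|~|~|0|0|0|0|0] state of the example, releasing strm1 (first cell 1) pushes cells 1, 2 and 3 back in turn, leaving cell 3 on top of the free list and only strm2 as a user.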


@@ -52,6 +52,26 @@ struct buffer {
#define BUF_WANTED ((struct buffer){ .area = (char *)1 })
#define BUF_RING ((struct buffer){ .area = (char *)2 })

/* An element of a buffer list (bl_*). They're all stored in an array. The
 * holder contains a pointer to that array and a count. The first element
 * (index zero) holds the head of the free list and may never be used. All
 * owners simply have a head and a tail index pointing to their own list. In
 * order to ease initialization, for each allocatable cell, next==0 indicates
 * that all following cells till the end of the array are free. The end of a
 * list is marked by next==~0. For the head, next is always valid or is zero
 * when no more entries are available. The struct has no holes: it's 24 bytes
 * on 32-bit and 40 bytes on 64-bit platforms, so offsets are trivially
 * obtained from indexes. The <next> index may be split into two 16-bit fields
 * if needed in order to make room for something else later, since we don't
 * expect to make 64k-buffer arrays. The first element's buf stores the size,
 * the allocated count and the number of users.
 */
struct bl_elem {
	struct buffer buf;
	uint32_t next;
	uint32_t flags;
};

#endif /* _HAPROXY_BUF_T_H */
/*


@@ -55,6 +55,9 @@ int b_put_varint(struct buffer *b, uint64_t v);
int b_get_varint(struct buffer *b, uint64_t *vptr);
int b_peek_varint(struct buffer *b, size_t ofs, uint64_t *vptr);
void bl_deinit(struct bl_elem *head);
uint32_t bl_get(struct bl_elem *head, uint32_t idx);
/***************************************************************************/
/* Functions used to compute offsets and pointers. Most of them exist in */
/* both wrapping-safe and unchecked ("__" prefix) variants. Some returning */
@@ -656,6 +659,79 @@ static inline struct buffer *br_del_head(struct buffer *r)
	return br_head(r);
}

/*
 * Buffer list management.
 */

/* Returns the number of users holding at least one entry */
static inline uint32_t bl_users(const struct bl_elem *head)
{
	return head->buf.head;
}

/* Returns the number of allocatable cells */
static inline uint32_t bl_size(const struct bl_elem *head)
{
	return head->buf.size - 1;
}

/* Returns the number of cells currently in use */
static inline uint32_t bl_used(const struct bl_elem *head)
{
	return head->buf.data;
}

/* Returns the number of cells still available */
static inline uint32_t bl_avail(const struct bl_elem *head)
{
	return bl_size(head) - bl_used(head);
}

/* Initializes an array of <nbelem> elements of type bl_elem (one less will be
 * allocatable). The array must already have been allocated by the caller:
 * nothing is allocated here and nothing is returned.
 */
static inline void bl_init(struct bl_elem *head, uint32_t nbelem)
{
	BUG_ON_HOT(nbelem < 2);
	memset(head, 0, nbelem * sizeof(*head));
	head->buf.size = nbelem;
	head->next = 1;
}

/* Puts the cell at index <idx> back into the free list of <head>. Its buffer
 * must have been released before calling this, and <idx> must be the head of
 * the caller's list. It returns the new head for the caller (the next cell
 * immediately after the current one), or zero if the list is empty, in which
 * case the caller is considered as no longer belonging to the list.
 */
static inline uint32_t bl_put(struct bl_elem *head, uint32_t idx)
{
	uint32_t n;

	BUG_ON_HOT(!idx || idx >= head->buf.size);
	n = head[idx].next;

	/* if the element was the last one (head[idx].next == ~0) then the
	 * chain is entirely gone and the caller is no longer in the list.
	 */
	if (n == ~0U) {
		BUG_ON_HOT(!head->buf.head);
		head->buf.head--; // #users
		n = 0;            // no next
	}

	/* If the free list was empty (next==0), this element becomes both the
	 * first and the last one, otherwise it inserts itself before the
	 * previous first free element.
	 */
	head[idx].next = head->next ? head->next : ~0U;
	head->next = idx;
	BUG_ON_HOT(!head->buf.data);
	head->buf.data--; // one less allocated
	return n;
}

#endif /* _HAPROXY_BUF_H */
/*


@@ -730,3 +730,82 @@ int b_peek_varint(struct buffer *b, size_t ofs, uint64_t *vptr)
	size = b->data - ofs - data;
	return size;
}

/*
 * Buffer List management.
 */

/* Deinits an array of buffer list. It's the caller's responsibility to make
 * sure that all buffers were already released (this is only verified when
 * BUG_ON_HOT() checks are enabled). This should be done before any free() of
 * the array.
 */
void bl_deinit(struct bl_elem *head)
{
	BUG_ON_HOT(
		/* make sure that all elements are properly released, i.e. all
		 * are reachable from the free list. The walk starts at the head
		 * (index 0), which the first iteration counts, so the counter
		 * starts at zero and must end up equal to the array size.
		 */
		({
			uint32_t elem = 0, free = 0;

			if (head->next && !head->buf.data && !head->buf.head) {
				do {
					free++;
					elem = head[elem].next ? head[elem].next : elem + 1;
				} while (elem != ~0U && elem != head->buf.size);
			}
			free != head->buf.size;
		}), "bl_deinit() of a non-completely released list");
}

/* Gets the index of a spare entry in the buffer list, to be used after element
 * of index <idx>. It is detached, appended to the end of the existing list and
 * marked as the last one. If <idx> is zero, the caller requests the creation
 * of a new list. If no more buffer slots are available, the function returns
 * zero.
 */
uint32_t bl_get(struct bl_elem *head, uint32_t idx)
{
	uint32_t e, n;

	BUG_ON_HOT(idx >= head->buf.size);

	/* Get the first free element. In the head it's always a valid index or
	 * 0 to indicate the end of list. We can then always dereference it,
	 * and if 0 (empty, which is rare), it'll loop back to itself. This
	 * allows us to save a test in the fast path.
	 */
	e = head->next;   // element to be allocated
	n = head[e].next; // next one to replace the free list's top
	if (!n) {
		/* Happens only with a freshly initialized array, or when the
		 * free list is depleted (e==0).
		 */
		if (!e)
			goto done;

		/* n is in the free area till the end, let's report the next
		 * free entry, otherwise leave it at zero to mark the end of
		 * the free list.
		 */
		if (e + 1 != head->buf.size)
			n = e + 1;
	}

	head->next = n == ~0U ? 0 : n;
	head->buf.data++;

	if (idx) {
		/* append to a tail: idx must point to a tail */
		BUG_ON_HOT(head[idx].next != ~0U);
		head[idx].next = e;
	}
	else {
		/* allocate a new user and offer it this slot */
		head->buf.head++; // #users
	}

	head[e].next = ~0U; // mark the end of list
 done:
	/* and finally return the element's index */
	return e;
}