The function htx_find_blk() is used by only one function, htx_truncate(). So
because this function does nothing very smart, we don't use it anymore. It will
be removed by another commit.
Filters that process the HTX body in the http_payload callback must now loop on
an HTX message starting from the first block position. The offset passed as
parameter is relative to this position and not the head one. It is mandatory
because once filtered, data are now forwarded using the function
channel_htx_fwd_payload(). So the first block position is always updated.
The functions channel_htx_fwd_payload() and channel_htx_fwd_all() should now be
used to forward, respectively, a part of the HTX payload or all of it. These
functions forward data and update the first block position.
Applets must never rely on the first block position to consume an HTX
message. The head position must be used instead. For the request it is always
the start-line. At this stage, it is not a bug, because the first position of
the request is never changed by HTX analysers.
We don't store the start-line position anymore in the HTX message. Instead we
store the first block position to analyze. For now, it is almost the same. But
once all the changes on this part are made, this position will have to be used
by HTX analyzers, and only in the analysis context, to know where the analysis
should start.
When new blocks are added to an HTX message, if the first block position is not
defined, it is set. When the block it points to is removed, it is set to the
block following it. -1 remains the value used to unset the position. The first
block position is unset when the HTX message is empty. It may also be unset on a
non-empty message, meaning all blocks were already analyzed.
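
The rules above can be sketched on a simplified model. This is not the HTX
code: struct toy_htx, its fields and the toy_*() helpers are hypothetical, and
blocks are stored in a flat array instead of the real HTX layout. It only shows
when the first block position is set, moved and unset.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_BLKS 8

/* Hypothetical, simplified stand-in for an HTX message. */
struct toy_htx {
	int32_t head;                /* oldest block, -1 if empty */
	int32_t tail;                /* newest block, -1 if empty */
	int32_t first;               /* first block to analyze, -1 if unset */
	uint8_t used[TOY_MAX_BLKS];  /* 1 if the slot holds a block */
};

void toy_init(struct toy_htx *htx)
{
	htx->head = htx->tail = htx->first = -1;
	for (int i = 0; i < TOY_MAX_BLKS; i++)
		htx->used[i] = 0;
}

/* Appends a block. Returns its position, or -1 if there is no room. */
int32_t toy_add_blk(struct toy_htx *htx)
{
	if (htx->tail + 1 >= TOY_MAX_BLKS)
		return -1;
	htx->tail++;
	htx->used[htx->tail] = 1;
	if (htx->head == -1)
		htx->head = htx->tail;
	if (htx->first == -1)
		htx->first = htx->tail;  /* unset: the new block becomes the first */
	return htx->tail;
}

/* Removes the block at <pos> and keeps <first> consistent. */
void toy_remove_blk(struct toy_htx *htx, int32_t pos)
{
	assert(pos >= 0 && pos <= htx->tail && htx->used[pos]);
	htx->used[pos] = 0;

	if (pos == htx->first) {
		/* move to the following block, or unset if there is none */
		int32_t next = pos + 1;

		while (next <= htx->tail && !htx->used[next])
			next++;
		htx->first = (next <= htx->tail) ? next : -1;
	}

	/* skip the holes left on both ends */
	while (htx->head <= htx->tail && !htx->used[htx->head])
		htx->head++;
	while (htx->tail >= htx->head && !htx->used[htx->tail])
		htx->tail--;
	if (htx->head > htx->tail)
		toy_init(htx);           /* empty message: everything is unset */
}
```

In this model, removing the last block while older ones remain leaves first at
-1 on a non-empty message, which matches the "all blocks already analyzed" case.
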
From the HTX analyzers' point of view, this position is always set during headers
analysis. When they are waiting for a request or a response, if it is unset, it
means the analysis should wait. But once the analysis is started, and as long as
headers are not forwarded, it points to the message start-line.
As mentioned, outside the HTX analysis, no code must rely on the first block
position. So multiplexers and applets must always use the head position to start
a loop on an HTX message.
The function channel_htx_fwd_headers() should now be used by HTX analyzers to
forward all headers of an HTX message, from the start-line to the corresponding
EOH. It takes care to update the start-line position.
1xx informational messages (all except 101) are now part of the HTTP response,
semantically speaking. These messages are not followed by an EOM anymore,
because a final response is always expected. All these parts can also be
transferred to the channel at the same time, if possible. The HTX response
analyzer has been updated to forward them in a loop, as the legacy one does.
In the function htx_xfer_blks(), we take care to transfer all headers at
once. When the current block is a start-line, we check if there is enough space
to transfer all headers too. If not, and if the destination is empty, a parsing
error is reported on the source.
The H2 multiplexer is the only one to use this function. When a parsing error is
reported during the transfer, the flag CS_FL_EOI is also set on the conn_stream.
The field hdrs_bytes has been added to the structure htx_sl. It should be used
to set how many bytes are held by all headers, from the start-line to the
corresponding EOH block. It must be set to -1 if it is unknown.
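
As an illustration, the decision taken in htx_xfer_blks() when it meets a
start-line could look like the sketch below. The enum, the function name and
its parameters are hypothetical. Skipping the check when hdrs_bytes is -1 is an
assumption of this sketch, the text above does not state what happens in that
case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of the check done on a start-line block. */
enum toy_xfer_status {
	TOY_XFER_OK,     /* all headers fit: transfer them */
	TOY_XFER_WAIT,   /* not enough room for now: stop and retry later */
	TOY_XFER_ERROR,  /* destination is empty, it will never fit */
};

/* <hdrs_bytes> is the size from the start-line to the EOH, or -1 if unknown.
 * <dst_free> is the room left in the destination message.
 */
enum toy_xfer_status toy_check_hdrs_xfer(int32_t hdrs_bytes, uint32_t dst_free,
                                         int dst_is_empty)
{
	assert(hdrs_bytes >= -1);

	/* assumption: an unknown size disables the check */
	if (hdrs_bytes == -1 || (uint32_t)hdrs_bytes <= dst_free)
		return TOY_XFER_OK;

	/* headers are never split, so an empty destination that is still
	 * too small means a parsing error must be reported on the source
	 */
	return dst_is_empty ? TOY_XFER_ERROR : TOY_XFER_WAIT;
}
```
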
Because channel_recv_max() always returns the right value, for HTX and legacy
streams, we don't need to set this flag. The multiplexer doesn't use it anymore.
Now, the SI calls h2_rcv_buf() with the right count value. So we can rely on
it. This is much easier for the H2 multiplexer than for the H1 one because the
HTX message already exists: we only transfer blocks from the H2S to the
channel. And this part is handled by htx_xfer_blks().
Now, the SI calls h1_rcv_buf() with the right count value. So we can rely on
it. During the parsing, we now really respect this value to be sure to never
exceed it. To do so, once headers are parsed, we should estimate the size of the
HTX message before copying data.
This patch makes the function more accurate. Thanks to the function
htx_get_max_blksz(), the transfer of data has been simplified. Note that now the
total number of bytes copied (metadata + payload) is returned. This slightly
changes how the function is used in the H2 multiplexer.
This function should be used to get the maximum size for a block, not exceeding
the max amount of bytes passed in argument. max may be set to -1 to have no
limit.
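
A minimal sketch of this computation, assuming each block costs a fixed amount
of metadata on top of its payload. TOY_BLK_META and toy_max_blksz() are
hypothetical names, and the free space is passed directly instead of being
read from an HTX message.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BLK_META 8u  /* assumed per-block metadata size, in bytes */

/* Returns the largest payload that fits in <free_space> bytes without
 * consuming more than <max> bytes in total (metadata included). <max> may
 * be -1 to have no limit.
 */
uint32_t toy_max_blksz(uint32_t free_space, int32_t max)
{
	assert(max >= -1);

	if (max != -1 && free_space > (uint32_t)max)
		free_space = (uint32_t)max;
	if (free_space < TOY_BLK_META)
		return 0;  /* not even enough room for the metadata */
	return free_space - TOY_BLK_META;
}
```
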
When channel_recv_max() is called for an HTX stream, we fall back on the HTX
version. This function is called from si_cs_recv(). This will let us pass the
max amount of bytes to read to HTX multiplexers.
The first block is the start-line, if defined. Otherwise it is the head of the
HTX message. So now, during HTTP analysis, lookups are all done using the first
block instead of the head. Concretely, for now, it is the same because only one
HTTP message is stored at a time in an HTX message. 1xx informational messages
are handled separately from the final response and from each other. But it will
make sense when the 1xx informational messages and the associated final response
are stored in the same HTX message.
Since the HTX start-line is now referenced by position instead of by its payload
address, it is much easier to replace it. No need to search for the right block
to find the start-line by comparing payload addresses. It is enough to get the
block at the position sl_pos.
Now, we only return the start-line. If not found, NULL is returned. No lookup is
performed and the HTX message is no longer updated. It is now the caller's
responsibility to update the position of the start-line to the right value. So
when it is not found, i.e. sl_pos is set to -1, it means the last start-line has
already been processed and the next one has not been inserted yet.
It is mandatory to rely on this kind of guarantee to store 1xx informational
responses and the final response in the same HTX message.
In the H2 multiplexer, when a HEADERS frame is built before sending it, we have
the guarantee that the start-line is the head of the HTX message. It is safer to
rely on this fact than on the sl_pos value. For now, it's safe to use sl_pos in
muxes because HTTP 1xx messages are considered as full messages in HTX and only
one HTTP message can be stored at a time in HTX. But we are trying to handle 1xx
messages as a part of the response message. In this way, an HTTP response will
be the sum of all 1xx informational messages followed by the final response. So
it will be possible to have several start-lines in the same HTX message. And
sl_pos will point to the first unprocessed start-line from the analyzers' point
of view.
It is the first block relative to the start-line. So it is the start-line if
its position is set (sl_pos != -1), otherwise it is the head. The functions
htx_get_first() and htx_get_first_blk() can be used to get it. This change is
mandatory to consider 1xx informational messages as part of a response.
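
The selection rule is short enough to be written down. The sketch below uses a
hypothetical struct toy_msg holding only the two positions involved. It is not
the definition of htx_get_first(), only the rule described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of an HTX message: only the positions used here. */
struct toy_msg {
	int32_t head;    /* position of the oldest block */
	int32_t sl_pos;  /* position of the start-line, -1 if unset */
};

/* Returns the start-line position when it is set, the head otherwise. */
int32_t toy_get_first(const struct toy_msg *msg)
{
	assert(msg != NULL);
	return (msg->sl_pos != -1) ? msg->sl_pos : msg->head;
}
```
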
The head of an HTX message is heavily used whereas the wrap position is only
used when a block is added or removed. So it is more logical to store the head
position in the HTX message instead of the wrap one. The wrap position can be
easily deduced. To get it, the new function htx_get_wrap() may be used.
We've been emitting warnings for over 5 years (since 1.5-dev22) about
configs accidentally carrying multiple servers with the same name in the
same backend, and this starts to cause some real trouble in dynamic
environments since it's still very difficult to accurately process
a state-file and we still can't transport a server's name over the
peers protocol because of this.
It's about time to force users to fix their configs if they still
haven't, given that there is zero technical justification for doing this,
beyond the "yyp" (or copy-paste accident) when editing the config.
The message remains as clear as before, indicating the file and lines
of the conflict so that the user can easily fix it.
On armv7 haproxy doesn't work because of the fixes on the double-word
CAS. There are two issues. The first one is that the last argument in
case of dwcas is a pointer to the set of values and not a value; the
second is that it's not enough to cast the data as (void*) since it will
be a single word. Let's fix this by using the pointers as an array of
long. This was tested on i386, armv7, x86_64 and aarch64 and it is now
fine. An alternate approach using a struct was attempted as well but it
used to produce less optimal code.
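
To make the argument convention concrete, here is a deliberately NON-atomic
model of a double-word compare-and-swap. toy_dwcas() is a hypothetical name
and the code performs no atomic operation at all. It only shows the three
pointers being used as arrays of two longs, so that both words are compared
and written. Updating <compare> on failure is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* NOT atomic: illustrates the calling convention only. <target>, <compare>
 * and <set> all point to two consecutive longs. Returns 1 if the swap was
 * done, otherwise 0 after copying the current value into <compare>.
 */
int toy_dwcas(void *target, void *compare, const void *set)
{
	long *t = target;
	long *c = compare;
	const long *s = set;  /* a pointer to two words, not a value */

	assert(t != NULL && c != NULL && s != NULL);

	if (t[0] == c[0] && t[1] == c[1]) {
		t[0] = s[0];
		t[1] = s[1];
		return 1;
	}
	c[0] = t[0];
	c[1] = t[1];
	return 0;
}
```

Casting the data to a single (void *) would only carry the first word, which
is the second issue described above.
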
This fix must be backported to 1.9. This fixes github issue #105.
Cc: Olivier Houchard <ohouchard@haproxy.com>
In pendconn_redistribute() we scan the queue using eb32_next() on the
node we've just deleted, which is wrong since the node is not in the
tree anymore, and it could dereference one node that has already been
released by another thread. Note that we cannot use eb32_first() in the
loop here instead because we need to skip pendconns having SF_FORCE_PRST.
Instead, let's keep a copy of the next node before deleting it.
In addition, the pendconn retrieved there is wrong, it uses &node as
the pointer instead of node, resulting in very quick crashes when the
server list is scanned.
Fortunately this only happens when "option redispatch" is used in
conjunction with "maxconn" on server lines, "cookie" for the stickiness,
and when a server goes down with entries in its queue.
This bug was introduced by commit 0355dabd7 ("MINOR: queue: replace
the linked list with a tree") so the fix must be backported to 1.9.
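
The "keep a copy of the next node before deleting it" pattern can be shown on
a plain singly linked list. This sketch does not use ebtree: struct toy_pend,
TOY_FORCE_PRST (standing in for SF_FORCE_PRST) and the toy_*() functions are
hypothetical, and entries are simply freed instead of being redispatched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_FORCE_PRST 0x1  /* stand-in for SF_FORCE_PRST */

struct toy_pend {
	struct toy_pend *next;
	unsigned int flags;
};

/* Prepends a new entry to <queue>. Returns it, or NULL on allocation failure. */
struct toy_pend *toy_push(struct toy_pend **queue, unsigned int flags)
{
	struct toy_pend *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->flags = flags;
	p->next = *queue;
	*queue = p;
	return p;
}

/* Releases every entry not flagged TOY_FORCE_PRST. Returns their count. */
int toy_redistribute(struct toy_pend **queue)
{
	struct toy_pend **link = queue;
	struct toy_pend *node, *next;
	int released = 0;

	assert(queue != NULL);

	for (node = *queue; node; node = next) {
		next = node->next;       /* saved while the node is still valid */

		if (node->flags & TOY_FORCE_PRST) {
			link = &node->next;  /* kept: skip it */
			continue;
		}
		*link = next;            /* unlink the node */
		free(node);              /* from here on, only <next> is used */
		released++;
	}
	return released;
}
```
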
In fwrr_get_next_server(), we optionally pass a server to avoid. It
usually points to the current server during a redispatch operation. If
this server is usable, an "avoided" pointer is set and we continue to
look for another server. If in the end no other server is found, then
we fall back to this avoided one, which is still better than nothing.
The problem that may arise with threads is that in the meantime, this
avoided server might have received extra connections and might not be
usable anymore. This causes it to be queued a second time in the "full"
list and the loop to search for a server again, ending up on this one
again and so on.
This patch makes sure that we break out of the loop when we have to
pick the avoided server. It's probably what the code intended to do
as the current break statement causes fwrr_update_position() and
fwrr_dequeue_srv() to be called again on the avoided server.
It must be backported to 1.9 and 1.8, and seems appropriate for older
versions though it's unclear what the impact of this bug might be
there since the race doesn't exist and we're left with the double
update of the server's position.
The unused fd_del and fd_skip were being abused during debugging sessions
as general purpose event counters. With their removal, let's officially
have dedicated counters for such use cases. These counters are called
"ctr0".."ctr2" and are listed at the end when DEBUG_DEV is set.
Starting with OpenSSL 1.0.0, the recommended way to disable compression is
to use SSL_OP_NO_COMPRESSION when creating the context.
Manipulations with SSL_COMP_get_compression_methods and sk_SSL_COMP_num
are only required for OpenSSL < 1.0.0.
Since commit 88698d9 ("MEDIUM: connections: Add a way to control the
number of idling connections.") when building without threads, gcc
complains that the operations made on the idle_orphan_conns[] list are
out of bounds, which is always false since 1) <i> can only equal zero,
and 2) given it's equal to <tid> we never even enter the loop. But as
usual it thinks it knows better, so let's mask the origin of this <i>
value to shut it up. Another solution consists in making <i> unsigned
and adding an explicit range check.
Now when we fail to send because the mux buffer is full, before giving
up and marking MFULL, we try to allocate another buffer in the mux's
ring to try again. Thanks to this (and provided there are enough buffers
allocated to the mux's ring), a single stream picked in the send_list
cannot steal all the mux's room at once. For this, we expand the ring
size to 31 buffers as it seems to be optimal on benchmarks since it
divides the number of context switches by 3. It will inflate each H2
conn's memory by 1 kB.
The bandwidth is now much more stable. Prior to this, a test on
h2->h1 with very large objects (1 GB), a few tens of connections and
a few tens of streams per connection would show a varying performance
between 34 and 95 Gbps on 2 cores/4 threads, with h2_snd_buf() stopped
on a buffer full condition between 300000 and 600000 times per second.
Now the performance is constantly between 88 and 96 Gbps. Measures show
that buffer full conditions are met around only 159 times per second
in this case, or roughly 2000 to 4000 times less often.
This makes the code more readable and reduces the calls to br_tail().
In addition, all calls to h2_get_buf() are now made via this local
variable, which should significantly help for retries.
Now send() uses a loop to iterate over all buffers to be sent. These
buffers are released and deleted from the vector once completely sent.
If any buffer gets released, offer_buffers() is called to wake up some
waiters.
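
The loop described above can be sketched as follows. Everything here is
hypothetical: struct toy_mux models the pending buffers as a small circular
array, <room> stands for what the transport accepts during this call, and the
waiters_woken counter stands in for the call to offer_buffers().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NBUF 4

struct toy_buf {
	size_t data;  /* bytes still to be sent from this buffer */
};

struct toy_mux {
	struct toy_buf bufs[TOY_NBUF];
	unsigned int head;   /* oldest buffer to send */
	unsigned int count;  /* number of buffers in use */
	int waiters_woken;   /* stand-in for offer_buffers() */
};

/* Sends up to <room> bytes, oldest buffer first. Returns the bytes sent. */
size_t toy_send_all(struct toy_mux *mux, size_t room)
{
	size_t total = 0;
	int released = 0;

	assert(mux != NULL);

	while (mux->count && room) {
		struct toy_buf *buf = &mux->bufs[mux->head];
		size_t sent = (buf->data < room) ? buf->data : room;

		buf->data -= sent;
		room -= sent;
		total += sent;
		if (buf->data)
			break;  /* partially sent: keep it for the next call */

		/* completely sent: release it and move to the next one */
		mux->head = (mux->head + 1) % TOY_NBUF;
		mux->count--;
		released++;
	}
	if (released)
		mux->waiters_woken++;  /* wake up waiters once per call */
	return total;
}
```
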
For now it's only one buffer long so the head and tails are always the
same, thus it doesn't change what used to work. In short, br_tail(h2c->mbuf)
was inserted everywhere we used to have h2c->mbuf.
The purpose is to manipulate rings made of series of buffers so that
it is possible to continue to work on a next buffer once one is full.
This will be used by muxes to deal with contention between multiple
streams and a single output buffer. No data is expected to span over
multiple buffers, all of them will be used like a regular buffer. This
will significantly limit the amount of changes and the code complexity
while still supporting larger output buffering.
The ring is made of head and tail indexes, both of which point to a
buffer descriptor. At least one descriptor is always valid, so it could
be seen as a form of pagination always presenting one buffer. The root
of the ring is itself stored into a buffer descriptor so that the user
only has to declare a buffer array and to call br_init() on it in order
to use it.
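
A possible shape for such a ring is sketched below, with the root kept in the
first descriptor of the array. struct toy_buf and the toy_br_*() helpers are
hypothetical and simplified. Only br_init() and br_tail() are names taken from
these commits, and their real definitions may differ from this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct toy_buf {
	char  *area;
	size_t size;
	size_t head;
	size_t data;
};

/* Slot 0 is the root: <size> is the number of slots, <head> and <data> are
 * the head and tail indexes. Slots 1..size-1 are the usable buffers.
 */
void toy_br_init(struct toy_buf *r, size_t size)
{
	assert(size >= 2);
	r->area = NULL;
	r->size = size;
	r->head = r->data = 1;  /* one descriptor is always valid */
	for (size_t i = 1; i < size; i++)
		r[i] = (struct toy_buf){ 0 };
}

struct toy_buf *toy_br_head(struct toy_buf *r) { return r + r->head; }
struct toy_buf *toy_br_tail(struct toy_buf *r) { return r + r->data; }

/* Returns non-zero when no more buffer can be appended after the tail. */
int toy_br_full(const struct toy_buf *r)
{
	size_t next = r->data + 1;

	if (next >= r->size)
		next = 1;
	return next == r->head;
}

/* Moves to a new tail buffer. Returns it, or NULL if the ring is full. */
struct toy_buf *toy_br_tail_add(struct toy_buf *r)
{
	if (toy_br_full(r))
		return NULL;
	if (++r->data >= r->size)
		r->data = 1;
	return toy_br_tail(r);
}

/* Drops the head buffer and returns the new head. If it was the only one,
 * it is just reset so that one descriptor always remains valid.
 */
struct toy_buf *toy_br_del_head(struct toy_buf *r)
{
	*toy_br_head(r) = (struct toy_buf){ 0 };
	if (r->head != r->data && ++r->head >= r->size)
		r->head = 1;
	return toy_br_head(r);
}
```

With this layout, the user only declares an array such as
"struct toy_buf ring[32]" and calls toy_br_init(ring, 32) to get 31 usable
buffers.
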