Move the definition of the various _HA_ATOMIC_* macros that use
__atomic_* in the #if GCC_VERSION >= 4.7, not just after it, so that we
can build with older versions of gcc again.
Instead of abusing the SUB_CALL_UNSUBSCRIBE flag, revamp the H2 code a bit
so that it just checks if h2s->sending_list is empty to know if the tasklet
of the stream_interface has been waken up or not.
send_wait is now set to NULL in h2_snd_buf() (ideally we'd set it to NULL
as soon as we're waking the tasklet, but it can't be done, because we still
need it in case we have to remove the tasklet from the task list).
In h2_subscribe(), don't add ourself to the send_list if we're already in it.
That may happen if we try to send and fail twice, as we're only removed
from the send_list if we managed to send data, to promote fairness.
Failing to do so can lead to either an infinite loop, or some random crashes,
as we'd get the same h2s in the send_list twice.
This should be backported to 1.9.
In the h1 and h2 muxes, make sure we unsubscribed before destroying the
mux context.
Failing to do so will lead in a segfault later, as the connection will
attempt to dereference its conn->send_wait or conn->recv_wait, which pointed
to the now-free'd mux context.
This was introduced by commit 39a96ee16e, so
should only be backported if that commit gets backported.
Commit a8f57d51a ("MINOR: cli/activity: report the accept queue sizes
in "show activity"") broke the single-threaded build because the
accept-rings are not implemented there. Let's ifdef this out. Ideally
we should start to think about always having such elements initialized
even without threads to improve the test coverage.
As expected, commit cde7902ac ("MEDIUM: tasks: improve fairness between
the local and global queues") broke the build with threads disabled,
and I forgot to rerun this test before committing. No backport is
needed.
Whenever HAProxy was reloaded with rotated keys, the resumption would be
broken for previous encryption key. The bug was introduced with the addition
of 80 byte keys in 9e7547 (MINOR: ssl: add support of aes256 bits ticket keys
on file and cli.).
This fix needs to be backported to 1.9.
The allocated trash chunk is not freed properly and causes a memory leak
exhibited as the growth in the trash pool allocations. Bug was introduced
in commit 271022 (BUG/MINOR: map: fix map_regm with backref).
This should be backported to all branches where the above commit was
backported.
In the past we used to reduce the number of tasks consulted at once when
some niced tasks were present in the run queue. This was dropped in 1.8
when the scheduler started to take batches. With the recent fixes it now
becomes possible to restore this behaviour which guarantees a better
latency between tasks when niced tasks are present. Thanks to this, with
the default number of 200 for tune.runqueue-depth, with a parasitic load
of 14000 requests per second, nice 0 gives 14000 rps, nice 1024 gives
12000 rps and nice -1024 gives 16000 rps. The amplitude widens if the
runqueue depth is lowered.
The offset calculated for the nice value used to be wrong for a long
time and got even worse when the improved multi-thread sheduler was
implemented because it continued to rely on the run queue size, which
become irrelevant given that we extract tasks in batches, so the run
queue size moves following a sawtooth form.
However the offsets much better reflects insertion positions in the
queue, so it's worth dropping this rq_size component of the equation.
Last point, due to the batches made of runqueue-depth entries at once,
the higher the depth, the lower the effect of the nice setting since
values are picked together in batches and placed into a list. An
intuitive approach consists in multiplying the nice value with the
batch size to allow tasks to participate to a different batch. And
experimentation shows that this works pretty well.
With a runqueue-depth of 16 and a parasitic load of 16000 requests
per second on 100 streams, a default nice of 0 shows 16000 requests
per second for nice 0, 22000 for nice -1024 and 10000 for nice 1024.
The difference is even bigger with a runqueue depth of 5. At 200
however it's much smoother (16000-22000).
Tasks allowed to run on multiple threads, as well as those scheduled by
one thread to run on another one pass through the global queue. The
local queues only see tasks scheduled by one thread to run on itself.
The tasks extracted from the global queue are transferred to the local
queue when they're picked by one thread. This causes a priority issue
because the global tasks experience a priority contest twice while the
local ones experience it only once. Thus if a tasks returns still
running, it's immediately reinserted into the local run queue and runs
much faster than the ones coming from the global queue.
Till 1.9 the tasks going through the global queue were mostly :
- health checks initialization
- queue management
- listener dequeue/requeue
These ones are moderately sensitive to unfairness so it was not that
big an issue.
Since 2.0-dev2 with the multi-queue accept, tasks are scheduled to
remote threads on most accept() and it becomes fairly visible under
load that the accept slows down, even for the CLI.
This patch remedies this by consulting both the local and the global
run queues in parallel and by always picking the task whose deadline
is the earliest. This guarantees to maintain an excellent fairness
between the two queues and removes the cascade effect experienced
by the global tasks.
Now the CLI always continues to respond quickly even in presence of
expensive tasks running for a long time.
This patch may possibly be backported to 1.9 if some scheduling issues
are reported but at this time it doesn't seem necessary.
This one hasn't been used anymore since the scheduler changes after 1.8
but it kept being exported and maintained up to date while it's always
reset when scanning the trees. Let's stop exporting it and updating it.
When a mux context is released, we must be sure it exists before dereferencing
it. The bug was introduced in the commit 39a96ee16 ("MEDIUM: muxes: Be prepared
to don't own connection during the release").
No need to backport this patch, expect if the commit 39a96ee16 is backported
too.
Because the HTX is now the default mode in HAProxy, it is also enabled by
default in reg-tests. The option '--use-htx' is still available, but deprecated
and have concretly no effect. To run reg-tests with the legacy HTTP mode, you
should use the option '--no-htx'.
The legacy HTTP mode is no more the default one. So now, by default, without any
option in your configuration, all proxies will use the HTX mode. The line
"option http-use-htx" in proxy sections are now useless, except to cancel the
legacy HTTP mode. To fallback on legacy HTTP mode, you should use the line "no
option http-use-htx" explicitly.
Note that the reg-tests still work by default on legacy HTTP mode. The HTX will
be enabled by default in a futur commit.
The upgrade is performed when an H2 preface is detected when the first request
on a connection is parsed. The CS is destroyed by setting EOS flag on it. A
special flag is added on the HTX message to warn the HTX analyzers the stream
will be closed because of an upgrade. This way, no error and no log are
emitted. When the mux h1 is released, we create a mux h2, without any CS and
passing the buffer with the unparsed H2 preface.
It is now possible to upgrade TCP streams to HTX when an HTTP backend is set for
a TCP frontend (both with the HTX enabled). So concretely, in such case, an
upgrade is performed from the mux pt to the mux h1. The current CS and the
channel's buffer are used to initialize the mux h1.
This will be mandatory to allow upgrades from TCP to HTTP in HTX. Of course, raw
buffers will still be used by default on TCP proxies, this option sets or
not. But if you want to handle mux upgrades from a TCP proxy, you must enable
the HTX on it and on all its backends.
There is only a small change in the lua code. Because TCP proxies can be HTX
aware, to exclude TCP services only for HTTP proxies, we must also check the
mode (TCP/HTTP) now.
This function will handle mux upgrades, for frontend connections only. It will
retrieve the best mux in the same way than conn_install_mux_fe except that the
mode and optionnally the proto are forced.
The new multiplexer is initialized using a new context and a specific input
buffer. Then, the old one is destroyed. If an error occurred, everything is
rolled back.
This happens during mux upgrades. In such case, when the destroy() callback is
called, the connection points to a different mux's context than the one passed
to the callback. It means the connection is owned by another mux. The old mux is
then released but the connection is not closed.
It is mandatory to handle mux upgrades, because during a mux upgrade, the
connection will be reassigned to another multiplexer. So when the old one is
destroyed, it does not own the connection anymore. Or in other words, conn->ctx
does not point to the old mux's context when its destroy() callback is
called. So we now rely on the multiplexer context do destroy it instead of the
connection.
In addition, h1_release() and h2_release() have also been updated in the same
way.
The mux's callback init() now take a pointer to a buffer as extra argument. It
must be used by the multiplexer as its input buffer. This buffer is always NULL
when a multiplexer is initialized with a fresh connection. But if a mux upgrade
is performed, it may be filled with existing data. Note that, for now, mux
upgrades are not supported. But this commit is mandatory to do so.
Instead of using the connection context to make the difference between a
frontend connection and a backend connection, we now rely on the function
conn_is_back().
In the function flt_stream_add_filter(), if the HTX is enabled, before attaching
a filter to a stream, we test if the filter can handle it or not. If not, the
filter is ignored. Before the proxy mode was tested. Now we test if the stream
is an HTX stream or not.
In the function smp_prefetch_htx(), we must know if data in the channel's buffer
are structured or not. Before the proxy mode was tested. Now we test if the
stream is an HTX stream or not. If yes, we know the HTX is used to structure
data in the channel's buffer.
The flag SF_HTX has been added to know when a stream uses the HTX or not. It is
set when an HTX stream is created. There are 2 conditions to set it. The first
one is when the HTTP frontend enables the HTX. The second one is when the attached
conn_stream uses an HTX multiplexer.
A multiplexer must now set the flag MX_FL_HTX when it uses the HTX to structured
the data exchanged with channels. the muxes h1 and h2 set this flag. Of course,
for the mux h2, it is set on h2_htx_ops only.
Instead of using the same mux_ops structure for the legacy HTTP mode and the HTX
mode, a dedicated mux_ops is now used for the HTX mode. Same callbacks are used
for both. But the flags may be different depending on the mode used.
This flag is used to explicitly kill the connection when the CS is closed. It
may be set by tcp rules. It must be respect by the mux-h1.
This patch must be backported to 1.9.
An H1 stream is destroyed when the conn_stream is detached or when the H1
connection is destroyed. In the first case, the CS is released by the caller. In
the second one, because the connection is closed, no CS is attached anymore. In
both, there is no reason to release the conn_stream in h1s_destroy().
Connection headers are now sanitized during the parsing and the formatting. This
means "close" and "keep-alive" values are always removed but right flags are
set. This way, client side and server side are independent of each other. On the
input side, after the parsing, neither "close" nor "keep-alive" values
remain. So on the output side, if we found one of these values in a connection
headers, it means it was explicitly added by HAProxy. So it overwrites the other
rules, if applicable. Always sanitizing the output is also a way to simplifiy
conditions to update the connection header. Concretly, only additions of "close"
or "keep-alive" values remain, depending the case.
No need to backport this patch.
The flag H1_MF_CLEAN_CONN_HDR has been added to let the H1 parser sanitize
connection headers. It means it will remove all "close" and "keep-alive" values
during the parsing. One noticeable effect is that connection headers may be
unfolded. In practice, this is not a problem because it is not frequent to have
multiple values for the connection headers.
If this flag is set, during the parsing The function
h1_parse_next_connection_header() is called in a loop instead of
h1_parse_conection_header().
No need to backport this patch
On the client side, as far as possible, we will try to keep connection
alive. So, in most of cases, this header will be removed. So it is better to not
add it at all. If finally the connection must be closed, the header will be
added by the mux h1.
No need to backport this patch.
Because of previous changes on http tunneling, the synchronization of the
transaction can be simplified. Only the check on intermediate messages remains
and it only concerns the response path.
This patch must be backported to 1.9. It is not strictly speaking required but
it will ease futur backports.
In HTX, the HTTP tunneling does not work if h1 and h2 are mixed (an h1 client
sending requests to an h2 server or this opposite) because the h1 multiplexer
always adds an EOM before switching it to tunnel mode. The h2 multiplexer
interprets it as an end of stream, closing the stream as for any other
transaction.
To make it works again, we need to swith to the tunnel mode without emitting any
EOM blocks. Because of that, HTX analyzers have been updated to switch the
transaction to tunnel mode before end of the message (because there is no end of
message...).
To be consistent, the protocol switching is also handled the same way even
though the 101 responses are not supported in h2.
This patch must be backported to 1.9.
Because the option http-tunnel is now ignored in HTX, there is no longer any
need to adjust the transaction mode in HTX analyzers. A channel can still be
switch to the tunnel mode for legitimate cases (HTTP CONNECT or switching
protocols). So the function htx_adjust_conn_mode() is now useless.
This patch must be backported to 1.9. It is not strictly speaking required but
it will ease futur backports.
The option http-tunnel disables any HTTP processing past the first
transaction. In HTX, it works for full h1 transactions. As for the legacy HTTP,
it is a workaround, but it works. But it is impossible to make it works with an
h2 connection. In such case, it has no effect, the stream is closed at the end
of the transaction. So to avoid any inconsistancies between h1 and h2
connections, this option is now always ignored when the HTX is enabled. It is
also a good opportinity to deprecate an old and ugly option. A warning is
emitted during HAProxy startup to encourage users to remove this option.
Note that in legacy HTTP, this option only works with full h1 transactions
too. If an h2 connection is established on a frontend with this option enabled,
it will have no effect at all. But we keep it for the legacy HTTP for
compatibility purpose. It will be removed with the legacy HTTP.
So to be short, if you have to really (REALLY) use it, it will only work for
legacy HTTP frontends with H1 clients.
The documentation has been updated accordingly.
This patch must be backported to 1.9. It is not strictly speaking required but
it will ease futur backports.
If there is a data block when a header block is added in a HTX message, its
payload will be inserted after the data block payload. But its index will be
moved before the EOH block. So at this stage, if a new data block is added, we
will try to append its payload to the last data block (because it is also the
tail). Thus the payload of the further header block will be crushed.
This cannot happens if the payloads wrap thanks to the previous fix. But it
happens when the tail is not the front too. So now, in this case, we add a new
block instead of appending.
This patch must be backported in 1.9.
When a header is added or when a data block is added before another one, the
blocks position may be changed (but not their payloads position). For instance,
when a header is added, we move the block just before the EOH, if any. When the
payloads wraps, it is pretty annoying because we loose the last inserted
block. It is neither the tail nor the head. And it is not the front either.
It is a design problem. Waiting for fixing this problem, we force a
defragmentation in such case. Anyway, it should be pretty rare, so it's not
really critical.
This patch must be backported to 1.9.
When a message or a fragment is encoded, the date the frame processing starts
must be set if it is undefined. The test on tv_request field was wrong.
This patch must be backported to 1.9.
If the maximum frame size is very small with a large message or argument name,
it is possible to be unable to encode anything. In such case, it is important to
stop processing returning an error otherwise we will retry in loop to encode the
message, failing each time because of the too small frame size.
This patch must be backported to 1.9 and 1.8.
If a SPOE applet is already attached to a stream to handle its messages, we must
not queue them. Otherwise it could be handled by another applet leading to
errors. This happens with fragmented messages only. When the first framgnent is
sent, the SPOE applet sending it is attached to the stream. It should be used to
send all other fragments.
This patch must be backported to 1.9 and 1.8.
Seeing the size of each ring helps understand which threads are
overloaded and why some of them are less often elected than others
by the multi-queue load balancer.
The "show activity" command reports the number of incoming connections
dispatched per thread but doesn't report the number of connections
received by each thread. It is important to be able to monitor this
value as it can show that for whatever reason a smaller set of threads
is receiving the connections and dispatching them to all other ones.
It is not acceptable that the accept queues are handled with a normal
priority since they are supposed to quickly dispatch the incoming
traffic, resulting in tasks which will have their respective nice
values and place in the queue. Let's renice the accept ring tasks
to -1024.
No backport is needed, this is strictly 2.0.
The run queue offset computed from the nice value depends on the run
queue size, but for the first task to enter the run queue, this size
is zero and the task gets queued just as if its nice value was zero as
well. This is problematic for example for the CLI socket if another
higher priority task gets queued immediately after as it can steal its
place.
This patch simply adds one to the rq_size value to make sure the nice
is never multiplied by zero. The way the offset is calculated is
questionable anyway these days, since with the newer scheduler it seems
that just using the nice value as an offset should work (possibly damped
by the task's number of calls).
This fix must be backported to 1.9. It may possibly be backported to
older versions if it proves to make the CLI more interactive.
It is possible to hit a fairness issue in the scheduler when a local
task runs for a long time (i.e. process_stream() returns running), and
a global task wants to run on the same thread and remains in the global
queue. What happens in this case is that the condition to extract tasks
from the global queue will rarely be satisfied for very low task counts
since whatever non-null queue size multiplied by a thread count >1 is
always greater than the small remaining number of tasks in the queue.
In theory another thread should pick the task but we do have some mono
threaded tasks in the global queue as well during inter-thread wakeups.
Note that this can only happen with task counts lower than the thread
counts, typically one task in each queue for more than two threads.
This patch works around the problem by allowing a very small unfairness,
making sure that we can always pick at least one task from the global
queue even if there is already one in the local queue.
A better approach will consist in scanning the two trees in parallel
and always pick the best task. This will be more complex and will
constitute a separate patch.
This fix must be backported to 1.9.
If the interface is not in state SI_ST_CON or SI_ST_EST, don't bother
trying to send/recv data, we can't do it anyway, and if we're in SI_ST_TAR,
that may lead to adding the SI_FL_ERR flag back on the stream_interface,
while we don't want it.
This should be backported to 1.9.