Currently we have per-thread arrays of trees and counts, but these
ones unfortunately share cache lines and are accessed very often. This
patch moves the task-specific stuff into a structure taking a multiple
of a cache line, and has one such per thread. Just doing this has
reduced the cache miss ratio from 19.2% to 18.7% and increased the
12-thread test performance by 3%.
It starts to become visible that we really need a process-wide per-thread
storage area that would cover more than just these parts of the tasks.
The code was arranged so that it's easy to move the pieces elsewhere if
needed.
Now we still have a main contention point with the timers in the main
wait queue, but the vast majority of the tasks are pinned to a single
thread. This patch creates a per-thread wait queue and queues a task
to the local wait queue without any locking if the task is bound to a
single thread (the current one) otherwise to the shared queue using
locking. This significantly reduces contention on the wait queue. A
test with 12 threads showed 11 ms spent in the WQ lock compared to
4.7 seconds in the same test without this change. The cache miss ratio
decreased from 19.7% to 19.2% on the 12-thread test, and its performance
increased by 1.5%.
Another indirect benefit is that the average queue size is divided
by the number of threads, which roughly removes log(nbthreads) levels
in the tree and further speeds up lookups.
The vast majority of FDs are only seen by one thread. Currently the lock
on FDs costs a lot because it's touched often, though there should be very
little contention. This patch ensures that the lock is only grabbed if the
FD is shared by more than one thread, since otherwise the situation is safe.
Doing so resulted in a 15% performance boost on a 12-threads test.
peers_init_sync() doesn't check task_new()'s return value and doesn't
return any result to indicate success or failure. Let's make it return
an int and check it from the caller.
This can be backported as far as 1.6.
To ease the refactoring, the function "http_header_add_tail" have been
remove. Now, "http_header_add_tail2" is always used. And the function
"capture_headers" have been renamed into "http_capture_headers". Finally, some
functions have been exported.
HTTP_FLG_* and HTTP_IS_* were moved from "proto/proto_http.h" to "common/http.h"
but the associated comment was forgotten during the move.
This is 1.9-specific and should not be backported.
Make sure we unsubscribe from events before si_release_endpoint destroys
the conn_stream, or it will be never called. To do so, move the call to
unsubscribe to si_release_endpoint() directly.
This is 1.9-specific and shouldn't be backported.
When subscribing, we don't need to provide a list element, only the h2 mux
needs it. So instead, Add a list element to struct h2s, and use it when a
list is needed.
This forces us to use the unsubscribe method, since we can't just unsubscribe
by using LIST_DEL anymore.
This patch is larger than it should be because it includes some renaming.
As we don't know how subscriptions are handled, we can't just assume we can
use LIST_DEL() to unsubscribe, so introduce a new method to mux and connections
to do so.
The prototypes of functions find_hdr_value_end(), extract_cookie_value()
and http_header_match2() were still in proto_http.h while some of them
don't exist anymore and the others were just moved. Let's remove them.
In addition, da.c was updated to use http_extract_cookie_value() which
is the correct one.
These ones are mostly called from cfgparse.c for the parsing and do
not depend on the HTTP representation. The functions's prototypes
were moved to proto/http_rules.h, making this file work exactly like
tcp_rules. Ideally we should stop calling these functions directly
from cfgparse and register keywords, but there are a few cases where
that wouldn't work (stats http-request) so it's probably not worth
trying to go this far.
The current proto_http.c file is huge and contains different processing
domains making it very difficult to work on an alternative representation.
This commit moves some parts to other files :
- ACL registration code => http_acl.c
This code only creates some ACL mappings and doesn't know anything
about HTTP nor about the representation. This code could even have
moved to acl.c but it was not worth polluting it again.
- HTTP sample conversion => http_conv.c
This code doesn't depend on the internal representation but definitely
manipulates some HTTP elements, such as dates. It also has access to
captures.
- HTTP sample fetching => http_fetch.c
This code does depend entirely on the internal representation but is
totally independent on the analysers. Placing it into a different
file will ease the transition to the new representation and the
creation of a wrapper if required. An include file was created due
to CHECK_HTTP_MESSAGE_FIRST() being used at various places.
- HTTP action registration => http_act.c
This code doesn't directly interact with the messages nor the
transaction but it does so via some exported http functions like
http_replace_req_line() or http_set_status() so it will be easier
to change only this after the conversion.
- a few very generic parts were found and moved to http.{c,h} as
relevant.
It is worth noting that the functions moved to these new files are not
referenced anywhere outside of the files and are only called as registered
callbacks, so these files do not even require associated include files.
Instead of using si_cs_io_cb() in process_stream() use si_cs_send/si_cs_recv
instead, as si_cs_io_cb() may lead to process_stream being woken up when it
shouldn't be, and thus timeout would never get triggered.
Callers of si_appctx() always use the result without checking it because
they know by construction that it's valid. This results in unchecked null
pointer warnings at -Wextra, so let's remove this test and make it clear
that it's up to the caller to check validity first.
stktable_data_ptr() currently performs null pointer checks but most
callers don't check the result since they know by construction that
it cannot be null. This causes valid warnings when building with
-Wextra which are worth addressing since it will result in better
code. Let's provide an unguarded version of this function for use
where the check is known to be useless and untested.
Till now it was very difficult for a mux to know what proxy it was
working for. Let's pass the proxy when the mux is instanciated at
init() time. It's not yet used but the H1 mux will definitely need
it, just like the H2 mux when dealing with backend connections.
This state was only a delimiter between headers and body but it now
causes more harm than good because it requires someone to change it.
Since the H1 parser knows if we're in DATA or CHUNK_SIZE, simply let
it set the right next state so that h1m->state constantly matches
what is expected afterwards.
This will allow the parser to fill some extra fields like the method or
status without having to store them permanently in the HTTP message. At
this point however the parser cannot restart from an interrupted read.
This way we maintain the old mechanism stating that -2 means we block
on errors, -1 means we only capture them, and a positive value indicates
the position of the first error.
Currently the only user of struct h1m is the h2 mux when it has to parse
an H1 message coming from the channel. Unfortunately this is not enough
to efficiently parse HTTP/1 messages like those coming from the network
as we don't want to restart from scratch at every byte received.
This patch reintroduces the "next" offset into the H1 message so that any
H1 parser can use it to restart when called with a state that is not the
initial state.
This is the *parsing* state of an HTTP/1 message. Currently the h1_state
is composite as it's made both of parsing and control (100SENT, BODY,
DONE, TUNNEL, ENDING etc). The purpose here is to have a purely H1 state
that can be used by H1 parsers. For now it's equivalent to h1_state.
For struct connection, struct conn_stream, and for the h2 mux, add 2 new
lists, one that handles waiters for recv, and one that handles waiters for
recv and send. That way we can ask to subscribe for either recv or send.
In tasklet_free(), if we're currently in the runnable task list, don't
forget to decrement taks_list_size, or it'll end up being to big, and we may
not process tasks in the global runqueue.
This protocol is based on the uxst one, but it uses socketpair and FD
passing insteads of a connect()/accept().
The "sockpair@" prefix has been implemented for both bind and server
keywords.
When HAProxy wants to connect through a sockpair@, it creates 2 new
sockets using the socketpair() syscall and pass one of the socket
through the FD specified on the server line.
On the bind side, haproxy will receive the FD, and will use it like it
was the FD of an accept() syscall.
This protocol was designed for internal communication within HAProxy
between the master and the workers, but it's possible to use it
externaly with a wrapper and pass the FD through environment variabls.
It's possible to have several protocols per family which is a problem
with the current way the protocols are stored.
This allows to register a new protocol in HAProxy which is not a
protocol in the strict socket definition. It will be used to register a
SOCK_STREAM protocol using socketpair().
These error codes and messages are agnostic to the version, even if
they are represented as HTTP/1.0 messages. Ultimately they will have
to be transformed into internal HTTP messages to be used everywhere.
The HTTP/1.1 100 Continue message was turned to an IST and the local
copy in the Lua code was removed.
This function is purely HTTP once http_txn is put aside. So the original
one was renamed to http_txn_get_path() and it extracts the relevant offsets
from the txn to pass them to http_get_path(). One benefit of the new version
is that it returns the length at the same time so that allowed to slightly
simplify http_get_path_from_string() which had to look up the end pointer
previously and which is not needed anymore.
It's a bit painful to have to deal with HTTP semantics for each protocol
version (H1 and H2), and working on the version-agnostic code further
emphasizes the problem.
This patch creates http.h and http.c which are agnostic to the version
in use, and which borrow a few parts from proto_http and from h1. For
example the once thought h1-specific h1_char_classes array is in fact
dictated by RFC7231 and is used to parse HTTP headers. A few changes
were made to a few files which were including proto_http.h while they
only needed http.h.
Certain string definitions pre-dated the introduction of indirect
strings (ist) so some were used to simplify the definition of the known
HTTP methods. The current lookup code saves 2 kB of a heavily used table
and is faster than the previous table based lookup (typ. 14 ns vs 16
before).
This function now captures an error regardless of its side and protocol.
The caller must pass a number of elements and may pass a protocol-specific
structure and a callback to display it. Later this function may deal with
more advanced allocation techniques to avoid allocating as many buffers
as proxies.
This function returns the proxy associated to a connection. For front
connections it returns the frontend, and for back connections it
returns the backend. This will be used to retrieve some configuration
parameters from within a mux.
Sometimes a connection is prepared before the target is set, sometimes
after. There's no real rule since the few functions involved operate on
different and independent fields. Soon we'll benefit from knowing the
target at the connection layer, in order to figure the associated proxy
and retrieve the various parameters (timeouts etc). This patch slightly
reorders a few calls to conn_prepare() so that we can make sure that the
target is always known to the mux.
The new function sess_log() only needs a session to emit a log. It will
ignore the parts that depend on the stream. It is usable to emit a log
to report early errors in muxes. These ones will typically mention
"<BADREQ>" for the request and 0 for the HTTP status code.
The current build_logline() can only be used with valid streams, which
means it is not suitable for use from muxes. We start by moving it into
another more generic function which takes the session as an argument,
to avoid complexifying all the internal API for jsut a few use cases.
This new function is not supposed to be called directly from outside so
we'll be able to instrument it to support several calling conventions.
For now the behaviour and conditions remain unchanged.
This patch improves the previous fix by implementing the socket draining
code directly in conn_sock_drain() so that it always applies regardless
of the protocol's family. Thus it gets rid of tcp_drain().
Since commit 843b7cb ("MEDIUM: chunks: make the chunk struct's fields
match the buffer struct") a chunk length is unsigned so we can remove
negative size checks.
During a test it happened that a connection was deleted before the
stream it's attached to, resulting in a crash related to the fix
18a85fe ("BUG/MEDIUM: streams: Don't forget to remove the si from
the wait list.") during the LIST_DEL(). Make sure to always delete
the list's head in this case so that other elements can safely
detach later.
This is purely 1.9, no backport is needed.