In order to support all shutdown modes on the CS, we introduce the
following flags :
CS_FL_SHRD : shut read, drain extra data
CS_FL_SHRR : shut read, reset extra data
CS_FL_SHWN : shut write, normal notification
CS_FL_SHWS : shut write, silent mode (no notification)
And the following modes for shutr/shutw :
CS_SHR_DRAIN, CS_SHR_RESET, CS_SHW_NORMAL, CS_SHW_SILENT.
Note: it's possible that we won't need to distinguish the two shutw
above as they're only an action.
For now they are not used.
All the references to connections in the data path from streams and
stream_interfaces were changed to use conn_streams. Most functions named
"something_conn" were renamed to "something_cs" for this. Sometimes the
connection still is what matters (eg during a connection establishment)
and were not always renamed. The change is significant and minimal at the
same time, and was quite thoroughly tested now. As of this patch, all
accesses to the connection from upper layers go through the pass-through
mux.
Most of the functions dealing with conn_streams are here. They act at
the data layer and interact with the mux. For now they are not used yet
but everything builds.
This patch introduces a new struct conn_stream. It's the stream-side of
a multiplexed connection. A pool is created and destroyed on exit. For
now the conn_streams are not used at all.
When an incoming connection is made on an HTTP mode frontend, the
session now looks up the mux to use based on the ALPN token and the
proxy mode. This will allow easier mux registration, and we don't
need to hard-code the mux_pt_ops anymore.
Selecting a mux based on ALPN and the proxy mode will quickly become a
pain. This commit provides new functions to register/lookup a mux based
on the ALPN string and the proxy mode to make this easier. Given that
we're not supposed to support a wide range of muxes, the lookup should
not have any measurable performance impact.
For HTTP/2 and QUIC, we'll need to deal with multiplexed streams inside
a connection. After quite a long brainstorming, it appears that the
connection interface to the existing streams is appropriate just like
the connection interface to the lower layers. In fact we need to have
the mux layer in the middle of the connection, between the transport
and the data layer.
A mux can exist on two directions/sides. On the inbound direction, it
instanciates new streams from incoming connections, while on the outbound
direction it muxes streams into outgoing connections. The difference is
visible on the mux->init() call : in one case, an upper context is already
known (outgoing connection), and in the other case, the upper context is
not yet known (incoming connection) and will have to be allocated by the
mux. The session doesn't have to create the new streams anymore, as this
is performed by the mux itself.
This patch introduces this and creates a pass-through mux called
"mux_pt" which is used for all new connections and which only
calls the data layer's recv,send,wake() calls. One incoming stream
is immediately created when init() is called on the inbound direction.
There should not be any visible impact.
Note that the connection's mux is purposely not set until the session
is completed so that we don't accidently run with the wrong mux. This
must not cause any issue as the xprt_done_cb function is always called
prior to using mux's recv/send functions.
This is needed in the H2->H1 gateway so that we know how long the trailers
block is in chunked encoding. It returns the number of bytes, or 0 if some
are missing, or -1 in case of parse error.
It was a leftover from the last cleaning session; this mask applies
to threads and calling it process_mask is a bit confusing. It's the
same in fd, task and applets.
srv_set_fqdn() may be called with the DNS lock already held, but tries to
lock it anyway. So, add a new parameter to let it know if it was already
locked or not;
Commit 819fc6f ("MEDIUM: threads/stick-tables: handle multithreads on
stick tables") introduced a valid warning about an uninitialized return
value in stksess_kill_if_expired(). It just happens that this result is
never used, so let's turn the function back to void as previously.
The wrong bit was set to keep the lock on freq counter update. And the read
functions were re-worked to use volatile.
Moreover, when a freq counter is updated, it is now rotated only if the current
counter is in the past (now.tv_sec > ctr->curr_sec). It is important with
threads because the current time (now) is thread-local. So, rounded to the
second, the time may vary by more or less 1 second. So a freq counter rotated by
one thread may be see 1 second in the future. In this case, it is updated but
not rotated.
There was a flaw in the way the threads was created. the main one was just used
to create all the others and just wait to exit. Now, it is used to run a poll
loop. So we only create nbthread-1 threads.
This also fixes a bug about the compression filter when there is only 1 thread
(nbthread == 1 or no threads support). The bug was in the way thread-local
resources was initialized. per-thread init/deinit callbacks were never called
for the main process. So, with nthread set to 1, some buffers remained
uninitialized.
By default, no affinity is set for threads. To bind threads on CPU, you must
define a "thread-map" in the global section. The format is the same than the
"cpu-map" parameter, with a small difference. The process number must be
defined, with the same format than cpu-map ("all", "even", "odd" or a number
between 1 and 31/63).
A thread will be bound on the intersection of its mapping and the one of the
process on which it is attached. If the intersection is null, no specific bind
will be set for the thread.
Because there is not migration mechanism yet, all runtime information about an
SPOE agent are thread-local and async exchanges with agents are disabled when we
have serveral threads. Howerver, pipelining is still available. So for now, the
thread part of the SPOE is pretty simple.
We have two y for nsuring that the data is not concurently manipulated:
- locks
- running task on the same thread.
locks are expensives, it is better to avoid it.
This patch cecks that the Lua task run on the same thread that
the stream associated to the coprocess.
TODO: in a next version, the error should be replaced by a yield
and thread migration request.
Note that the Lua processing is not really thread safe. It provides
heavy system which consists to add our own lock function in the Lua
code and recompile the library. This system will probably not accepted
by maintainers of various distribs.
Our main excution point of the Lua is the function lua_resume(). A
quick looking on the Lua sources displays a lua_lock() a the start
of function and a lua_unlock() at the end of the function. So I
conclude that the Lua thread safe mode just perform a mutex around
all execution. So I prefer to do this in the HAProxy code, it will be
easier for distro maintainers.
Note that the HAProxy lua functions rounded by the macro SET_SAFE_LJMP
and RESET_SAFE_LJMP manipulates the Lua stack, so it will be careful
to set mutex around these functions.
Now, it is possible to define init_per_thread and deinit_per_thread callbacks to
deal with ressources allocation for each thread.
This is the filter responsibility to deal with concurrency. This is also the
filter responsibility to know if HAProxy is started with some threads. A good
way to do so is to check "global.nbthread" value. If it is greater than 1, then
_per_thread callbacks will be called.
A RW lock has been added to the vars structure to protect each list of
variables. And a global RW lock is used to protect registered names.
When a varibable is fetched, we duplicate sample data because the variable could
be modified by another thread.
When a frequency counter must be updated, we use the curr_sec/curr_tick fields
as a lock, by setting the MSB to 1 in a compare-and-swap to lock and by reseting
it to unlock. And when we need to read it, we loop until the counter is
unlocked. This way, the frequency counters are thread-safe without any external
lock. It is important to avoid increasing the size of many structures (global,
proxy, server, stick_table).
locks have been added in pat_ref and pattern_expr structures to protect all
accesses to an instance of on of them. Moreover, a global lock has been added to
protect the LRU cache used for pattern matching.
Patterns are now duplicated after a successfull matching, to avoid modification
by other threads when the result is used.
Finally, the function reloading a pattern list has been modified to be
thread-safe.
First, OpenSSL is now initialized to be thread-safe. This is done by setting 2
callbacks. The first one is ssl_locking_function. It handles the locks and
unlocks. The second one is ssl_id_function. It returns the current thread
id. During the init step, we create as much as R/W locks as needed, ie the
number returned by CRYPTO_num_locks function.
Next, The reusable SSL session in the server context is now thread-local.
Shctx is now also initialized if HAProxy is started with several threads.
And finally, a global lock has been added to protect the LRU cache used to store
generated certificates. The function ssl_sock_get_generated_cert is now
deprecated because the retrieved certificate can be removed by another threads
in same time. Instead, a new function has been added,
ssl_sock_assign_generated_cert. It must be used to search a certificate in the
cache and set it immediatly if found.
A lock is used to protect accesses to a peer structure.
A the lock is taken in the applet handler when the peer is identified
and released living the applet handler.
In the scheduling task for peers section, the lock is taken for every
listed peer and released at the end of the process task function.
The peer 'force shutdown' function was also re-worked.
A global lock has been added to protect accesses to the list of active
applets. A process mask has also been added on each applet. Like for FDs and
tasks, it is used to know which threads are allowed to process an
applet. Because applets are, most of time, linked to a session, it should be
sticky on the same thread. But in all cases, it is the responsibility of the
applet handler to lock what have to be protected in the applet context.
This is done by passing the right stream's proxy (the frontend or the backend,
depending on the context) to lock the error snapshot used to store the error
info.
The stick table API was slightly reworked:
A global spin lock on stick table was added to perform lookup and
insert in a thread safe way. The handling of refcount on entries
is now handled directly by stick tables functions under protection
of this lock and was removed from the code of callers.
The "stktable_store" function is no more externalized and users should
now use "stktable_set_entry" in any case of insertion. This last one performs
a lookup followed by a store if not found. So the code using "stktable_store"
was re-worked.
Lookup, and set_entry functions automatically increase the refcount
of the returned/stored entry.
The function "sticktable_touch" was renamed "sticktable_touch_local"
and is now able to decrease the refcount if last arg is set to true. It
is allowing to release the entry without taking the lock twice.
A new function "sticktable_touch_remote" is now used to insert
entries coming from remote peers at the right place in the update tree.
The code of peer update was re-worked to use this new function.
This function is also able to decrease the refcount if wanted.
The function "stksess_kill" also handle a parameter to decrease
the refcount on the entry.
A read/write lock is added on each entry to protect the data content
updates of the entry.
A lock for LB parameters has been added inside the proxy structure and atomic
operations have been used to update server variables releated to lb.
The only significant change is about lb_map. Because the servers status are
updated in the sync-point, we can call recalc_server_map function synchronously
in map_set_server_status_up/down function.
This list is used to save changes on the servers state. So when serveral threads
are used, it must be locked. The changes are then applied in the sync-point. To
do so, servers_update_status has be moved in the sync-point. So this is useless
to lock it at this step because the sync-point is a protected area by iteself.
For now, we have a list of each type per thread. So there is no need to lock
them. This is the easiest solution for now, but not the best one because there
is no sharing between threads. An idle connection on a thread will not be able
be used by a stream on another thread. So it could be a good idea to rework this
patch later.
Now, each proxy contains a lock that must be used when necessary to protect
it. Moreover, all proxy's counters are now updated using atomic operations.
First, we use atomic operations to update jobs/totalconn/actconn variables,
listener's nbconn variable and listener's counters. Then we add a lock on
listeners to protect access to their information. And finally, listener queues
(global and per proxy) are also protected by a lock. Here, because access to
these queues are unusal, we use the same lock for all queues instead of a global
one for the global queue and a lock per proxy for others.
2 global locks have been added to protect, respectively, the run queue and the
wait queue. And a process mask has been added on each task. Like for FDs, this
mask is used to know which threads are allowed to process a task.
For many tasks, all threads are granted. And this must be your first intension
when you create a new task, else you have a good reason to make a task sticky on
some threads. This is then the responsibility to the process callback to lock
what have to be locked in the task context.
Nevertheless, all tasks linked to a session must be sticky on the thread
creating the session. It is important that I/O handlers processing session FDs
and these tasks run on the same thread to avoid conflicts.
Many changes have been made to do so. First, the fd_updt array, where all
pending FDs for polling are stored, is now a thread-local array. Then 3 locks
have been added to protect, respectively, the fdtab array, the fd_cache array
and poll information. In addition, a lock for each entry in the fdtab array has
been added to protect all accesses to a specific FD or its information.
For pollers, according to the poller, the way to manage the concurrency is
different. There is a poller loop on each thread. So the set of monitored FDs
may need to be protected. epoll and kqueue are thread-safe per-se, so there few
things to do to protect these pollers. This is not possible with select and
poll, so there is no sharing between the threads. The poller on each thread is
independant from others.
Finally, per-thread init/deinit functions are used for each pollers and for FD
part for manage thread-local ressources.
Now, you must be carefull when a FD is created during the HAProxy startup. All
update on the FD state must be made in the threads context and never before
their creation. This is mandatory because fd_updt array is thread-local and
initialized only for threads. Because there is no pollers for the main one, this
array remains uninitialized in this context. For this reason, listeners are now
enabled in run_thread_poll_loop function, just like the worker pipe.