Commit Graph

13005 Commits

Author SHA1 Message Date
Willy Tarreau
23d3b00bdd MINOR: threads/debug: only report used lock stats
The lock stats are very verbose and more than half of them are used in
a typical test, making it hard to spot the sought values. Let's simply
report "not used" for those which have not been called at all.
2020-10-22 17:32:28 +02:00
Christopher Faulet
9a3d3fcb5d BUG/MAJOR: mux-h2: Don't try to send data if we know it is no longer possible
In h2_send(), if we are in a state where we know it is no longer possible to
send data, we must exit the sending loop to avoid any possiblity to loop
forever. It may happen if the mbuf ring is released while the H2_CF_MUX_MFULL
flag is still set. Here is a possible scenario to trigger the bug :

  1) The mbuf ring is full because we are unable to send data. The
     H2_CF_MUX_MFULL flag is set on the H2 connection.

  2) At this stage, the task timeout expires because the H2 connection is
     blocked. We enter in h2_timeout_task() function. Because the mbuf ring is
     full, we cannot send the GOAWAY frame. Thus the H2_CF_GOAWAY_FAILED flag is
     set. The H2 connection is not released yet because there is still a stream
     attached. Here we leave h2_timeout_task() function.

  3) A bit later, the H2 connection is woken up. If h2_process(), nothing is
     performed by the first attempt to send data, in h2_send(). Then, because
     the H2_CF_GOAWAY_FAILED flag is set, the mbuf ring is released. But the
     H2_CF_MUX_MFULL flag is still there. At this step a second attempt to send
     data is performed.

  4) In h2_send(), we try to send data in a loop. To exist this loop, done
     variable must be set to 1. Because the H2_CF_MUX_MFULL flag is set, we
     don't call h2_process_mux() and done is not updated. Because the mbuf ring
     is now empty, nothing is sent and the H2_CF_MUX_MFULL flag is never
     removed. Now, we loop forever... waiting for the watchdog.

To fix the bug, we now exit the loop if one of these conditions is true :

  - The H2_CF_GOAWAY_FAILED flag is set on the H2 connection
  - The CO_FL_SOCK_WR_SH flag is set on the underlying connection
  - The H2 connection is in the H2_CS_ERROR2 state

This patch should fix the issue #912 and most probably #875. It must be
backported as far as the 1.8.
2020-10-22 17:13:22 +02:00
Christopher Faulet
d6c48366b8 BUG/MINOR: http-ana: Don't send payload for internal responses to HEAD requests
When an internal response is returned to a client, the message payload must be
skipped if it is a reply to a HEAD request. The payload is removed from the HTX
message just before the message forwarding.

This bugs has been around for a long time. It was already there in the pre-HTX
versions. In legacy HTTP mode, internal errors are not parsed. So this bug
cannot be easily fixed. Thus, this patch should only be backported in all HTX
versions, as far as 2.0. However, the code has significantly changed in the
2.2. Thus in the 2.1 and 2.0, the patch must be entirely reworked.
2020-10-22 17:13:22 +02:00
Tim Duesterhus
6414cd1fc0 CLEANUP: compression: Make use of http_get_etag_type()
This commit makes the compressor use http_get_etag_type to validate the
ETag instead of using an ad-hoc condition.
2020-10-22 16:59:36 +02:00
Remi Tricot-Le Breton
5bbdc81cf1 REGTEST: cache: Add if-none-match test case
Test that if-none-match header is properly taken into account and that
when the conditions are fulfilled, a "304 Not Modified" response can be
sent to the client.

Co-authored-by: Tim Duesterhus <tim@bastelstu.be>
2020-10-22 16:10:20 +02:00
Remi Tricot-Le Breton
6cb10384a3 MEDIUM: cache: Add support for 'If-None-Match' request header
Partial support of conditional HTTP requests. This commit adds the
support of the 'If-None-Match' header (see RFC 7232#3.2).
When a client specifies a list of ETags through one or more
'If-None-Match' headers, they are all compared to the one that might have
been stored in the corresponding http cache entry until one of them
matches.
If a match happens, a specific "304 Not Modified" response is
sent instead of the cached data. This response has all the stored
headers but no other data (see RFC 7232#4.1). Otherwise, the whole cached data
is sent.
Although unlikely in a GET/HEAD request, the "If-None-Match: *" syntax is
valid and also receives a "304 Not Modified" response (RFC 7434#4.3.2).

This resolves a part of GitHub issue #821.
2020-10-22 16:10:20 +02:00
Remi Tricot-Le Breton
dbb65b5a7a MEDIUM: cache: Store the ETag information in the cache_entry
When sent by a server for a given resource, the ETag header is
stored in the coresponding cache entry (as any other header). So in
order to perform future ETag comparisons (for subsequent conditional
HTTP requests), we keep the length of the ETag and its offset
relative to the start of the cache_entry.
If no ETag header exists, the length and offset are zero.
2020-10-22 16:10:20 +02:00
Remi Tricot-Le Breton
bcced09b91 MINOR: http: Add etag comparison function
Add a function that compares two etags that might be of different types.
If any of them is weak, the 'W/' prefix is discarded and a strict string
comparison is performed.

Co-authored-by: Tim Duesterhus <tim@bastelstu.be>
2020-10-22 16:06:20 +02:00
Tim Duesterhus
2493ee81d4 MINOR: http: Add enum etag_type http_get_etag_type(const struct ist)
http_get_etag_type returns whether a given `etag` is a strong, weak, or invalid
ETag.
2020-10-22 16:02:29 +02:00
Willy Tarreau
1e690bb6c4 BUG/MEDIUM: server: support changing the slowstart value from state-file
If the slowstart value in a state file implies the latest state change
is within the slowstart period, we end up calling srv_update_status()
to reschedule the server's state change but its task is not yet
allocated and remains null, causing a crash on startup.

Make sure srv_update_status() supports being called with partially
initialized servers which do not yet have a task. If the task has to
be scheduled, it will necessarily happen after initialization since
it will result from a state change.

This should be backported wherever server-state is present.
2020-10-22 12:07:07 +02:00
Willy Tarreau
5c643f37d0 BUILD: makefile: add entries to build common debugging tools
A few tools in contrib/ such as halog, flags, poll and tcploop are
occasionally useful at least to developers, and some of them such as
halog or flags can occasionally break due to some changes in the include
files. As reported in issue #907, their inability to inherit the global
build options also causes some warnings related to some specificities
of the main include files. Let's just add entries in the main makefile
to build them.
2020-10-22 05:17:08 +02:00
Willy Tarreau
9018ca9655 CONTRIB: tcploop: remove unused local variables in tcp_pause()
Building with -Wall shows that "pollfd" and "ret" are not used. Silly
copy-paste...
2020-10-22 05:17:08 +02:00
Willy Tarreau
ef71f0194c BUG/MINOR: queue: properly report redistributed connections
In commit 5cd4bbd7a ("BUG/MAJOR: threads/queue: Fix thread-safety issues
on the queues management") the counter of transferred connections was
accidently lost, so that when a server goes down with connections in its
queue, it will always be reported that 0 connection were transferred.

This should be backported as far as 1.8 since the patch above was
backported there.
2020-10-21 12:04:53 +02:00
William Lallemand
8e8581e242 MINOR: ssl: 'ssl-load-extra-del-ext' removes the certificate extension
In issue #785, users are reporting that it's not convenient to load a
".crt.key" when the configuration contains a ".crt".

This option allows to remove the extension of the certificate before
trying to load any extra SSL file (.key, .ocsp, .sctl, .issuer etc.)

The patch changes a little bit the way ssl_sock_load_files_into_ckch()
looks for the file.
2020-10-20 18:25:46 +02:00
William Dauchy
835712ad90 BUG/MINOR: listener: close before free in listener_accept
safer to close handle before the object is put back in the global pool.

this was introduced by commit 9378bbe0be ("MEDIUM: listener:
use protocol->accept_conn() to accept a connection")

this should fix github issue #902

no backport needed.

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2020-10-20 15:40:36 +02:00
Willy Tarreau
f42d794d96 MEDIUM: config: report that "nbproc" is deprecated
As previously discussed, nbproc usage is bad, deprecated, and scheduled
for removal in 2.5.

If "nbproc" is found with more than one process while nbthread is not
set, a warning will be emitted encouraging to remove it or to migrate
to nbthread instead. This makes sure the user has an opportunity to
both see the message and silence it.
2020-10-20 11:54:49 +02:00
Christopher Faulet
96ddc8ab43 BUG/MEDIUM: connection: Never cleanup server lists when freeing private conns
When a connection is released, depending on its state, it may be detached from
the session and it may be removed from the server lists. The first case may
happen for private or unsharable active connections. The second one should only
be performed for idle or available connections. We never try to remove a
connection from the server list if it is attached to a session. But it is also
important to never try to remove a private connecion from the server lists, even
if it is not attached to a session. Otherwise, the curr_used_conn server counter
is decremented once too often.

This bug was introduced by the commit 04a24c5ea ("MINOR: connection: don't check
priv flag on free"). It is related to the issue #881. It only affects the 2.3,
no backport is needed.
2020-10-19 17:19:10 +02:00
Willy Tarreau
69a7b8fc6c CLEANUP: task: remove the unused and mishandled global_rqueue_size
This counter is only updated and never used, and in addition it's done
without any atomicity so it's very unlikely to be correct on multi-CPU
systems! Let's just remove it since it's not used.
2020-10-19 14:08:13 +02:00
Willy Tarreau
e72a3f4489 CLEANUP: tree-wide: reorder a few structures to plug some holes around locks
A few structures were slightly rearranged in order to plug some holes
left around the locks. Sizes ranging from 8 to 32 bytes could be saved
depending on the structures. No performance difference was noticed (none
was expected there), though memory usage might be slightly reduced in
some rare cases.
2020-10-19 14:08:13 +02:00
Willy Tarreau
8f1f177ed0 MINOR: threads: change lock_t to an unsigned int
We don't need to waste the size of a long for the locks: with the plocks,
even an unsigned short would offer enough room for up to 126 threads! Let's
use an unsigned int which will be easier to place in certain structures
and will more conveniently plug some holes, and Atomic ops are at least
as fast on 32-bit as on 64-bit. This will not change anything for 32-bit
platforms.
2020-10-19 14:08:13 +02:00
Willy Tarreau
3d18498645 CLEANUP: threads: don't register an initcall when not debugging
It's a bit overkill to register an initcall to call a function to set
a lock to zero when not debugging, let's just declare the lock as
pre-initialized to zero.
2020-10-19 14:08:13 +02:00
Ilya Shipitsin
fcb69d768b BUILD: ssl: make BoringSSL use its own version numbers
BoringSSL is a fork of OpenSSL 1.1.0, however in
49e9f67d8b7cbeb3953b5548ad1009d15947a523 it has changed version to 1.1.1.

Should fix issue #895.

This must be backported to 2.2, 2.1, 2.0, 1.8
2020-10-19 11:34:37 +02:00
Ilya Shipitsin
b3201a3e07 BUG/MINOR: disable dynamic OCSP load with BoringSSL
it was accidently enabled on BoringSSL while
actually it is not supported

wla: Fix part of the issue mentionned in #895.
It fixes build of boringSSL versions prior to commit
https://boringssl.googlesource.com/boringssl/+/49e9f67d8b7cbeb3953b5548ad1009d15947a523

Must be backported in 2.2.

Signed-off-by: William Lallemand <wlallemand@haproxy.org>
2020-10-19 11:00:51 +02:00
Willy Tarreau
4b6e3c284a MINOR: lb/chash: use a read lock in chash_get_server_hash()
When using a low hash-balance-factor value, it's possible to loop
many times trying to find the best server. Figures in the order of
100-300 times were observed for 1000 servers with a factor of 101
(which seems a bit excessive for such a large farm). Given that
there's nothing in that function that prevents multiple threads
from working in parallel, let's switch to a read lock. Tests on
8 threads show roughly a 2% performance increase with this.
2020-10-17 20:15:49 +02:00
Willy Tarreau
f76a21f78c MINOR: lb/first: use a read lock in fas_get_next_server()
The "first" algorithm creates a lot of contention because all threads
focus on the same server by definition (the first available one). By
turning the exclusive lock to a read lock in fas_get_next_server(),
the request rate increases by 16% for 8 threads when many servers are
getting close to their maxconn.
2020-10-17 19:49:49 +02:00
Willy Tarreau
58bc9c1ced MINOR: lb/leastconn: only take a read lock in fwlc_get_next_server()
This function doesn't change the tree, it only looks for the first
usable server, so let's do that under a read lock to limit the
situations like the ones described in issue #881 where finding a
usable server when dealing with lots of saturated ones can be
expensive. At least threads will now be able to look up in
parallel.

It's interesting to note that s->served is not incremented during the
server choice, nor is the server repositionned. So right now already,
nothing prevents multiple threads from picking the same server. This
will not cause a significant imbalance anyway given that the server
will automatically be repositionned at the right place, but this might
be something to improve in the future if it doesn't come with too high
a cost.

It also looks like the way a server's weight is updated could be
revisited so that the write lock gets tighter at the expense of a
short part of inconsistency between weights and servers still present
in the tree.
2020-10-17 19:37:40 +02:00
Willy Tarreau
ae99aeb135 MINOR: lb/map: use seek lock and read locks where appropriate
- map_get_server_hash() doesn't need a write lock since it only
  reads the array, let's only use a read lock here.

- map_get_server_rr() only needs exclusivity to adjust the rr_idx
  while looking for its entry. Since this one is not used by
  map_get_server_hash(), let's turn this lock to a seek lock that
  doesn't block reads.

With 8 threads, no significant performance difference was noticed
given that lookups are usually instant with this LB algo so the
lock contention is rare.
2020-10-17 19:04:27 +02:00
Willy Tarreau
cd10def825 MINOR: backend: replace the lbprm lock with an rwlock
It was previously a spinlock, and it happens that a number of LB algos
only lock it for lookups, without performing any modification. Let's
first turn it to an rwlock and w-lock it everywhere. This is strictly
identical.

It was carefully checked that every HA_SPIN_LOCK() was turned to
HA_RWLOCK_WRLOCK() and that HA_SPIN_UNLOCK() was turned to
HA_RWLOCK_WRUNLOCK() on this lock. _INIT and _DESTROY were updated too.
2020-10-17 18:51:41 +02:00
Willy Tarreau
9d58c9b251 [RELEASE] Released version 2.3-dev7
Released version 2.3-dev7 with the following main changes :
    - CI: travis-ci: replace not defined SSL_LIB, SSL_INC for BotringSSL builds
    - BUG/MINOR: init: only keep rlim_fd_cur if max is unlimited
    - BUG/MINOR: mux-h2: do not stop outgoing connections on stopping
    - MINOR: fd: report an error message when failing initial allocations
    - MINOR: proto-tcp: make use of connect(AF_UNSPEC) for the pause
    - MINOR: sock: add sock_accept_conn() to test a listening socket
    - MINOR: protocol: make proto_tcp & proto_uxst report listening sockets
    - MINOR: sockpair: implement the .rx_listening function
    - CLEANUP: tcp: make use of sock_accept_conn() where relevant
    - CLEANUP: unix: make use of sock_accept_conn() where relevant
    - BUG/MINOR: listener: detect and handle shared sockets stopped in other processes
    - CONTRIB: tcploop: implement a disconnect operation 'D'
    - CLEANUP: protocol: intitialize all of the sockaddr when disconnecting
    - BUG/MEDIUM: deinit: check fdtab before fdtab[fd].owner
    - BUG/MINOR: connection: fix loop iter on connection takeover
    - BUG/MEDIUM: connection: fix srv idle count on conn takeover
    - MINOR: connection: improve list api usage
    - MINOR: mux/connection: add a new mux flag for HOL risk
    - MINOR: connection: don't check priv flag on free
    - MEDIUM: backend: add new conn to session if mux marked as HOL blocking
    - MEDIUM: backend: add reused conn to sess if mux marked as HOL blocking
    - MEDIUM: h2: remove conn from session on detach
    - MEDIUM: fcgi: remove conn from session on detach
    - DOC: Describe reuse safe for HOL handling
    - MEDIUM: proxy: remove obsolete "mode health"
    - MEDIUM: proxy: remove obsolete "monitor-net"
    - CLEANUP: protocol: remove the ->drain() function
    - CLEANUP: fd: finally get rid of fd_done_recv()
    - MINOR: connection: make sockaddr_alloc() take the address to be copied
    - MEDIUM: listener: allocate the connection before queuing a new connection
    - MINOR: session: simplify error path in session_accept_fd()
    - MINOR: connection: add new error codes for accept_conn()
    - MINOR: sock: rename sock_accept_conn() to sock_accepting_conn()
    - MINOR: protocol: add a new function accept_conn()
    - MINOR: sock: implement sock_accept_conn() to accept a connection
    - MINOR: sockpair: implement sockpair_accept_conn() to accept a connection
    - MEDIUM: listener: use protocol->accept_conn() to accept a connection
    - MEDIUM: listener: remove the second pass of fd manipulation at the end
    - MINOR: protocol: add a default I/O callback and put it into the receiver
    - MINOR: log: set the UDP receiver's I/O handler in the receiver
    - MINOR: protocol: register the receiver's I/O handler and not the protocol's
    - CLEANUP: protocol: remove the now unused <handler> field of proto_fam->bind()
    - DOC: improve the documentation for "option nolinger"
    - BUG/MEDIUM: proxy: properly stop backends
    - BUG/MEDIUM: task: bound the number of tasks picked from the wait queue at once
    - MINOR: threads: augment rwlock debugging stats to report seek lock stats
    - MINOR: threads: add the transitions to/from the seek state
    - MEDIUM: task: use an upgradable seek lock when scanning the wait queue
    - BUILD: listener: avoir a build warning when threads are disabled
    - BUG/MINOR: peers: Possible unexpected peer seesion reset after collisions.
    - MINOR: ssl: add volatile flags to ssl samples
    - MEDIUM: backend: reuse connection if using a static sni
    - BUG/MEDIUM: spoe: Unset variable instead of set it if no data provided
    - BUG/MEDIUM: mux-h1: Get the session from the H1S when capturing bad messages
    - BUG/MEDIUM: lb: Always lock the server when calling server_{take,drop}_conn
    - DOC: fix typo in MAX_SESS_STKCTR
2020-10-17 10:31:50 +02:00
Matteo Contrini
1857b8cf4d DOC: fix typo in MAX_SESS_STKCTR
MAX_SESS_STKCTR is spelled wrongly a couple of times
in the configuration docs (K and C are swapped). This patch
fixes the typos.
2020-10-17 09:37:25 +02:00
Christopher Faulet
26a52af642 BUG/MEDIUM: lb: Always lock the server when calling server_{take,drop}_conn
The server lock must be held when server_take_conn() and server_drop_conn()
lbprm callback functions are called. It is a documented prerequisite but it is
not always performed. It only affects leastconn and fas lb algorithm. Others
don't use these callback functions.

A race condition on the next pending effecive weight (next_eweight) may be
encountered with the leastconn lb algorithm. An agent check may set it to 0
while fwlc_srv_reposition() is called. The server is locked during the
next_eweight update. But because the server lock is not acquired when
fwlc_srv_reposition() is called, we may use it to recompute the server key,
leading to a division by 0.

This patch must be backported as far as 1.8.
2020-10-17 09:29:43 +02:00
Christopher Faulet
db2c17da60 BUG/MEDIUM: mux-h1: Get the session from the H1S when capturing bad messages
It is not guaranteed that the backend connection has an owner. It is set when
the connection is created. But when the connection is moved in a server idle
list, the connection owner is set to NULL and may never be set again. On the
other hand, when a mux is created or when a CS is attached, the session is
always defined. The H1 stream always keep a reference on it when it is
created. Thus, when a bad message is captured we should not rely on the
connection owner to retrieve the session. Instead we should get it from the H1
stream.
2020-10-16 19:53:17 +02:00
Christopher Faulet
2469eba20f BUG/MEDIUM: spoe: Unset variable instead of set it if no data provided
If an agent try to set a variable with the NULL data type, an unset is perform
instead to avoid undefined behaviors. Once decoded, such data are translated to
a sample with the type SMP_T_ANY. It is unexpected in HAProxy. When a variable
is set with such sample, no data are attached to the variable. Thus, when the
variable is retrieved later in the transaction, the sample data are
uninitialized, leading to undefined behaviors depending on how it is used. For
instance, it leads to a crash if the debug converter is used on such variable.

This patch should fix the issue #855. It must be backported as far as 1.8.
2020-10-16 19:53:17 +02:00
Amaury Denoyelle
7239c24986 MEDIUM: backend: reuse connection if using a static sni
Detect if the sni used a constant value and if so, allow to reuse this
connection for later sessions. Use a combination of SMP_USE_INTRN +
!SMP_F_VOLATILE to consider a sample as a constant value.

This features has been requested on github issue #371.
2020-10-16 17:48:01 +02:00
Amaury Denoyelle
2f0a797631 MINOR: ssl: add volatile flags to ssl samples
The ssl samples are not constant over time and change according to the
session. Add the flag SMP_F_VOL_SESS to indicate this.
2020-10-16 17:47:29 +02:00
Frédéric Lécaille
baeb919177 BUG/MINOR: peers: Possible unexpected peer seesion reset after collisions.
During a peers session collision (two peer sessions opened on both side) we must
mark the peer the session of which will be shutdown as alive, if not ->reconnect
timer will be set with a wrong value if the synchro task expires after the peer
has been reconnected. This possibly leads to unexpected deconnections during handshakes.
Furthermore, this patch cancels any heartbeat tranmimission when a reconnection
is prepared.
2020-10-16 17:45:58 +02:00
Willy Tarreau
0aa5a5b175 BUILD: listener: avoir a build warning when threads are disabled
It's just a __decl_thread() that appeared before the last variable.
2020-10-16 17:43:04 +02:00
Willy Tarreau
d48ed6643b MEDIUM: task: use an upgradable seek lock when scanning the wait queue
Right now when running a configuration with many global timers (e.g. many
health checks), there is a lot of contention on the global wait queue
lock because all threads queue up in front of it to scan it.

With 2000 servers checked every 10 milliseconds (200k checks per second),
after 23 seconds running on 8 threads, the lock stats were this high:

  Stats about Lock TASK_WQ:
      write lock  : 9872564
      write unlock: 9872564 (0)
      wait time for write     : 9208.409 msec
      wait time for write/lock: 932.727 nsec
      read lock   : 240367
      read unlock : 240367 (0)
      wait time for read      : 149.025 msec
      wait time for read/lock : 619.991 nsec

i.e. ~5% of the total runtime spent waiting on this specific lock.

With upgradable locks we don't need to work like this anymore. We
can just try to upgade the read lock to a seek lock before scanning
the queue, then upgrade the seek lock to a write lock for each element
we want to delete there and immediately downgrade it to a seek lock.

The benefit is double:
  - all other threads which need to call next_expired_task() before
    polling won't wait anymore since the seek lock is compatible with
    the read lock ;

  - all other threads competing on trying to grab this lock will fail
    on the upgrade attempt from read to seek, and will let the current
    lock owner finish collecting expired entries.

Doing only this has reduced the wake_expired_tasks() CPU usage in a
very large servers test from 2.15% to 1.04% as reported by perf top,
and increased by 3% the health check rate (all threads being saturated).

This is expected to help against (and possibly solve) the problem
described in issue #875.
2020-10-16 17:15:54 +02:00
Willy Tarreau
61f799b8da MINOR: threads: add the transitions to/from the seek state
Since our locks are based on progressive locks, we support the upgradable
seek lock that is compatible with readers and upgradable to a write lock.
The main purpose is to take it while seeking down a tree for modification
while other threads may seek the same tree for an input (e.g. compute the
next event date).

The newly supported operations are:

  HA_RWLOCK_SKLOCK(lbl,l)         pl_take_s(l)      /* N --> S */
  HA_RWLOCK_SKTOWR(lbl,l)         pl_stow(l)        /* S --> W */
  HA_RWLOCK_WRTOSK(lbl,l)         pl_wtos(l)        /* W --> S */
  HA_RWLOCK_SKTORD(lbl,l)         pl_stor(l)        /* S --> R */
  HA_RWLOCK_WRTORD(lbl,l)         pl_wtor(l)        /* W --> R */
  HA_RWLOCK_SKUNLOCK(lbl,l)       pl_drop_s(l)      /* S --> N */
  HA_RWLOCK_TRYSKLOCK(lbl,l)      (!pl_try_s(l))    /* N -?> S */
  HA_RWLOCK_TRYRDTOSK(lbl,l)      (!pl_try_rtos(l)) /* R -?> S */

Existing code paths are left unaffected so this patch doesn't affect
any running code.
2020-10-16 16:53:46 +02:00
Willy Tarreau
8d5360ca7f MINOR: threads: augment rwlock debugging stats to report seek lock stats
We currently use only read and write lock operations with rwlocks, but
ours also support upgradable seek locks for which we do not report any
stats. Let's add them now when DEBUG_THREAD is enabled.
2020-10-16 16:51:49 +02:00
Willy Tarreau
3cfaa8d1e0 BUG/MEDIUM: task: bound the number of tasks picked from the wait queue at once
There is a theorical problem in the wait queue, which is that with many
threads, one could spend a lot of time looping on the newly expired tasks,
causing a lot of contention on the global wq_lock and on the global
rq_lock. This initially sounds bening, but if another thread does just
a task_schedule() or task_queue(), it might end up waiting for a long
time on this lock, and this wait time will count on its execution budget,
degrading the end user's experience and possibly risking to trigger the
watchdog if that lasts too long.

The simplest (and backportable) solution here consists in bounding the
number of expired tasks that may be picked from the global wait queue at
once by a thread, given that all other ones will do it as well anyway.

We don't need to pick more than global.tune.runqueue_depth tasks at once
as we won't process more, so this counter is updated for both the local
and the global queues: threads with more local expired tasks will pick
less global tasks and conversely, keeping the load balanced between all
threads. This will guarantee a much lower latency if/when wakeup storms
happen (e.g. hundreds of thousands of synchronized health checks).

Note that some crashes have been witnessed with 1/4 of the threads in
wake_expired_tasks() and, while the issue might or might not be related,
not having reasonable bounds here definitely justifies why we can spend
so much time there.

This patch should be backported, probably as far as 2.0 (maybe with
some adaptations).
2020-10-16 15:18:48 +02:00
Willy Tarreau
ba29687bc1 BUG/MEDIUM: proxy: properly stop backends
The proxy stopping mechanism was changed with commit 322b9b94e ("MEDIUM:
proxy: make stop_proxy() now use stop_listener()") so that it's now
entirely driven by the listeners. One thing was forgotten though, which
is that pure backends will not stop anymore since they don't have any
listener, and that it's necessary to stop them in order to stop the
health checks.

No backport is needed.
2020-10-16 15:16:17 +02:00
Willy Tarreau
4a32103a48 DOC: improve the documentation for "option nolinger"
This reminds the different issues caused by option nolinger, as discussed
in issue #896.
2020-10-16 04:55:19 +02:00
Willy Tarreau
233ad288cd CLEANUP: protocol: remove the now unused <handler> field of proto_fam->bind()
We don't need to specify the handler anymore since it's set in the
receiver. Let's remove this argument from the function and clean up
the remains of code that were still setting it.
2020-10-15 21:47:56 +02:00
Willy Tarreau
a74cb38e7c MINOR: protocol: register the receiver's I/O handler and not the protocol's
Now we define a new sock_accept_iocb() for socket-based stream protocols
and use it as a wrapper for listener_accept() which now takes a listener
and not an FD anymore. This will allow the receiver's I/O cb to be
redefined during registration, and more specifically to get rid of the
hard-coded hacks in protocol_bind_all() made for syslog.

The previous ->accept() callback in the protocol was removed since it
doesn't have anything to do with accept() anymore but is more generic.
A few places where listener_accept() was compared against the FD's IO
callback for debugging purposes on the CLI were updated.
2020-10-15 21:47:56 +02:00
Willy Tarreau
e140a6921f MINOR: log: set the UDP receiver's I/O handler in the receiver
The I/O handler is syslog_fd_handler(), let's set it when creating
the receivers.
2020-10-15 21:47:56 +02:00
Willy Tarreau
d2fb99f9d5 MINOR: protocol: add a default I/O callback and put it into the receiver
For now we're still using the protocol's default accept() function as
the I/O callback registered by the receiver into the poller. While
this is usable for most TCP connections where a listener is needed,
this is not suitable for UDP where a different handler is needed.

Let's make this configurable in the receiver just like the upper layer
is configurable for listeners. In order to ease stream protocols
handling, the protocols will now provide a default I/O callback
which will be preset into the receivers upon allocation so that
almost none of them has to deal with it.
2020-10-15 21:47:56 +02:00
Willy Tarreau
caa91de718 MEDIUM: listener: remove the second pass of fd manipulation at the end
The receiver FDs must not be manipulated by the listener_accept()
function anymore, it must exclusively rely on the job performed by
its listeners, as it is also the only way to keep the receivers
working for established connections regardless of the listener's
state (typically for multiplexed protocols like QUIC). This used
to be necessary when the FDs were adjusted at once only but now
that fd_done() is gone and the need for polling enabled by the
accept_conn() function which detects the EAGAIN, we have nothing
to do there to fixup any possible previous bad decision anymore.

Interestingly, as a side effect of making the code not depend on
the FD anymore, it also removes the need for a second lock, which
increase the accept rate by about 1% on 8 threads.
2020-10-15 21:47:56 +02:00
Willy Tarreau
9378bbe0be MEDIUM: listener: use protocol->accept_conn() to accept a connection
Now listener_accept() doesn't have to deal with the incoming FD anymore
(except for a little bit of side band stuff). It directly retrieves a
valid connection from the protocol layer, or receives a well-defined
error code that helps it decide how to proceed. This removes a lot of
hardly maintainable low-level code and opens the function to receive
new protocol stacks.
2020-10-15 21:47:56 +02:00
Willy Tarreau
344b8fcf87 MINOR: sockpair: implement sockpair_accept_conn() to accept a connection
This is the same as previous commit, but this time for the sockpair-
specific stuff, relying on recv_fd_uxst() instead of accept(), so the
code is simpler. The various errno cases are handled like for regular
sockets, though some of them will probably never happen, but this does
not hurt.
2020-10-15 21:47:56 +02:00