Commit Graph

51 Commits

Author SHA1 Message Date
Willy Tarreau
b073573c10 MINOR: sock: add a function to check for SO_REUSEPORT support at runtime
The new function _sock_supports_reuseport() will be used to check if a
protocol type supports SO_REUSEPORT or not. This will be useful to verify
that shards can really work.
2023-04-23 09:46:15 +02:00
Willy Tarreau
9b773ec118 CLEANUP: sock: always perform last connection updates before wakeup
Normally the task_wakeup() in sock_conn_io_cb() is expected to
happen on the same thread the FD is attached to. But due to the
way the code was arranged in the past (with synchronous callbacks)
we continue to update connections after the wakeup, which always
makes the reader have to think deeply whether it's possible or not
to call another thread there. Let's just move the tasklet_wakeup()
at the end to make sure there's no problem with that.
2023-03-08 16:07:32 +01:00
William Lallemand
708949da49 MINOR: sockpair: move send_fd_uxst() error message in caller
Move the ha_alert() in send_fd_uxst() in the callers and add the FD
numbers in the message.
2022-07-25 16:11:11 +02:00
Willy Tarreau
27a3245599 MEDIUM: fd: make fd_insert() take local thread masks
fd_insert() was already given a thread group ID and a global thread mask.
Now we're changing the few callers to take the group-local thread mask
instead. It's passed directly into the FD's thread mask. Just like for
previous commit, it must not change anything when a single group is
configured.
2022-07-15 20:16:30 +02:00
Willy Tarreau
9464bb1f05 MEDIUM: fd: add the tgid to the fd and pass it to fd_insert()
The file descriptors will need to know the thread group ID in addition
to the mask. This extends fd_insert() to take the tgid, and will store
it into the FD.

In the FD, the tgid is stored as a combination of tgid on the lower 16
bits and a refcount on the higher 16 bits. This allows to know when it's
really possible to trust the tgid and the running mask. If a refcount is
higher than 1 it indeed indicates another thread else might be in the
process of updating these values.

Since a closed FD must necessarily have a zero refcount, a test was
added to fd_insert() to make sure that it is the case.
2022-07-15 19:58:06 +02:00
Willy Tarreau
030b3e6bcc MINOR: connection: get rid of the CO_FL_ADDR_*_SET flags
Just like for the conn_stream, now that these addresses are dynamically
allocated, there is no single case where the pointer is set without the
corresponding flag, and the flag is used as a permission to dereference
the pointer. Let's just replace the test of the flag with a test of the
pointer and remove all flag assignment. This makes the code clearer
(especially in "if" conditions) and saves the need for future code to
think about properly setting the flag after setting the pointer.
2022-05-02 17:47:46 +02:00
Willy Tarreau
382474348c CLEANUP: tree-wide: use fd_set_nonblock() and fd_set_cloexec()
This gets rid of most open-coded fcntl() calls, some of which were passed
through DISGUISE() to avoid a useless test. The FD_CLOEXEC was most often
set without preserving previous flags, which could become a problem once
new flags are created. Now this will not happen anymore.
2022-04-26 10:59:48 +02:00
Willy Tarreau
acef5e27b0 MINOR: tree-wide: always consider EWOULDBLOCK in addition to EAGAIN
Some older systems may routinely return EWOULDBLOCK for some syscalls
while we tend to check only for EAGAIN nowadays. Modern systems define
EWOULDBLOCK as EAGAIN so that solves it, but on a few older ones (AIX,
VMS etc) both are different, and for portability we'd need to test for
both or we never know if we risk to confuse some status codes with
plain errors.

There were few entries, the most annoying ones are the switch/case
because they require to only add the entry when it differs, but the
other ones are really trivial.
2022-04-25 20:32:15 +02:00
Willy Tarreau
8f6c0c32b8 BUG/MINOR: sock: do not double-close the accepted socket on the error path
Coverity found in issue #1646 that I added a double-close bug in last
commit e4d09cedb ("MINOR: sock: check configured limits at the sock layer,
not the listener's") because the error path already closes the FD. No
backport needed.
2022-04-12 07:49:11 +02:00
Willy Tarreau
07ecfc5e88 MEDIUM: connection: panic when calling FD-specific functions on FD-less conns
Certain functions cannot be called on an FD-less conn because they are
normally called as part of the protocol-specific setup/teardown sequence.
Better place a few BUG_ON() to make sure none of them is called in other
situations. If any of them would trigger in ambiguous conditions, it would
always be possible to replace it with an error.
2022-04-11 19:31:47 +02:00
Willy Tarreau
e4d09cedb6 MINOR: sock: check configured limits at the sock layer, not the listener's
listener_accept() used to continue to enforce the FD limits relative to
global.maxsock by itself while it's the last FD-specific test in the
whole file. This test has nothing to do there, it ought to be placed in
sock_accept_conn() which is the one in charge of FD allocation and tests.
Similar tests are already located there by the way. The only tiny
difference is that listener_accept() used to pause for one second when
this limit was reached, while other similar conditions were pausing only
100ms, so now the same 100ms will apply. But that's not important and
could even be considered as an improvement.
2022-04-11 19:31:47 +02:00
Willy Tarreau
b510116fd2 MINOR: sock: move the unused socket cleaning code into its own function
The startup code used to scan the list of unused sockets retrieved from
an older process, and to close them one by one. This also required that
the knowledge of the internal storage of these temporary sockets was
known from outside sock.c and that the code was copy-pasted at every
call place.

This patch moves this into sock.c under the name
sock_drop_unused_old_sockets(), and removes the xfer_sock_list
definition from sock.h since the rest of the code doesn't need to know
this.

This cleanup is minimal and preliminary to a future fix that will need
to be backported to all versions featuring FD transfers over the CLI.
2022-01-28 19:04:02 +01:00
William Lallemand
2be557f7cb MEDIUM: mworker: seamless reload use the internal sockpairs
With the master worker, the seamless reload was still requiring an
external stats socket to the previous process, which is a pain to
configure.

This patch implements a way to use the internal socketpair between the
master and the workers to transfer the sockets during the reload.
This way, the master will always try to transfer the socket, even
without any configuration.

The master will still reload with the -x argument, followed by the
sockpair@ syntax. ( ex -x sockpair@4 ). Which use the FD of internal CLI
to the worker.
2021-11-24 19:00:39 +01:00
Tim Duesterhus
f897fc99bd CLEANUP: sock: Wrap accept4_broken = 1 into additional parenthesis
This makes it clear to static analysis tools that this assignment is
intentional and not a mistyped comparison.
2021-11-20 14:52:01 +01:00
Willy Tarreau
e3b4518414 MINOR: protocols: make use of the protocol type to select the protocol
Instead of using sock_type and ctrl_type to select a protocol, let's
make use of the new protocol type. For now they always match so there
is no change. This is applied to address parsing and to socket retrieval
from older processes.
2021-10-27 17:31:20 +02:00
Willy Tarreau
20b622e04b MINOR: connection: add a new CO_FL_WANT_DRAIN flag to force drain on close
Sometimes we'd like to do our best to drain pending data before closing
in order to save the peer from risking to receive an RST on close.

This adds a new connection flag CO_FL_WANT_DRAIN that is used to
trigger a call to conn_ctrl_drain() from conn_ctrl_close(), and the
sock_drain() function ignores fd_recv_ready() if this flag is set,
in order to catch latest data. It's not used for now.
2021-10-21 21:48:23 +02:00
Willy Tarreau
5d9ddc5442 BUILD: tree-wide: add several missing activity.h
A number of files currently access activity counters but rely on their
definitions to be inherited from other files (task.c, backend.c hlua.c,
sock.c, pool.c, stats.c, fd.c).
2021-10-07 01:36:51 +02:00
Willy Tarreau
5a9c637bf3 BUG/MEDIUM: sock: make sure to never miss early connection failures
As shown in issue #1251, it is possible for a connect() to report an
error directly via the poller without ever reporting send readiness,
but currentlt sock_conn_check() manages to ignore that situation,
leading to high CPU usage as poll() wakes up on these FDs.

The bug was apparently introduced in 1.5-dev22 with commit fd803bb4d
("MEDIUM: connection: add check for readiness in I/O handlers"), but
was likely only woken up by recent changes to conn_fd_handler() that
made use of wakeups instead of direct calls between 1.8 and 1.9,
voiding any chance to catch such errors in the early recv() callback.

The exact sequence that leads to this situation remains obscure though
because the poller does not report send readiness nor does it report an
error. Only HUP and IN are reported on the FD. It is also possible that
some recent kernel updates made this condition appear while it never
used to previously.

This needs to be backported to all stable branches, at least as far
as 2.0. Before 2.2 the code was in tcp_connect_probe() in proto_tcp.c.
2021-07-06 10:52:19 +02:00
Willy Tarreau
b41a6e9101 MINOR: fd: move .linger_risk into fdtab[].state
No need to keep this flag apart any more, let's merge it into the global
state. The CLI's output state was extended to 6 digits and the linger/cloned
flags moved inside the parenthesis.
2021-04-07 18:07:49 +02:00
Willy Tarreau
f509065191 MEDIUM: fd: merge fdtab[].ev and state for FD_EV_* and FD_POLL_* into state
For a long time we've had fdtab[].ev and fdtab[].state which contain two
arbitrary sets of information, one is mostly the configuration plus some
shutdown reports and the other one is the latest polling status report
which also contains some sticky error and shutdown reports.

These ones used to be stored into distinct chars, complicating certain
operations and not even allowing to clearly see concurrent accesses (e.g.
fd_delete_orphan() would set the state to zero while fd_insert() would
only set the event to zero).

This patch creates a single uint with the two sets in it, still delimited
at the byte level for better readability. The original FD_EV_* values
remained at the lowest bit levels as they are also known by their bit
value. The next step will consist in merging the remaining bits into it.

The whole bits are now cleared both in fd_insert() and _fd_delete_orphan()
because after a complete check, it is certain that in both cases these
functions are the only ones touching these areas. Indeed, for
_fd_delete_orphan(), the thread_mask has already been zeroed before a
poller can call fd_update_event() which would touch the state, so it
is certain that _fd_delete_orphan() is alone. Regarding fd_insert(),
only one thread will get an FD at any moment, and it as this FD has
already been released by _fd_delete_orphan() by definition it is certain
that previous users have definitely stopped touching it.

Strictly speaking there's no need for clearing the state again in
fd_insert() but it's cheap and will remove some doubts during some
troubleshooting sessions.
2021-04-07 18:04:39 +02:00
Willy Tarreau
61cfdf4fd8 CLEANUP: tree-wide: replace free(x);x=NULL with ha_free(&x)
This makes the code more readable and less prone to copy-paste errors.
In addition, it allows to place some __builtin_constant_p() predicates
to trigger a link-time error in case the compiler knows that the freed
area is constant. It will also produce compile-time error if trying to
free something that is not a regular pointer (e.g. a function).

The DEBUG_MEM_STATS macro now also defines an instance for ha_free()
so that all these calls can be checked.

178 occurrences were converted. The vast majority of them were handled
by the following Coccinelle script, some slightly refined to better deal
with "&*x" or with long lines:

  @ rule @
  expression E;
  @@
  - free(E);
  - E = NULL;
  + ha_free(&E);

It was verified that the resulting code is the same, more or less a
handful of cases where the compiler optimized slightly differently
the temporary variable that holds the copy of the pointer.

A non-negligible amount of {free(str);str=NULL;str_len=0;} are still
present in the config part (mostly header names in proxies). These
ones should also be cleaned for the same reasons, and probably be
turned into ist strings.
2021-02-26 21:21:09 +01:00
Remi Tricot-Le Breton
25dd0ad123 BUG/MINOR: sock: Unclosed fd in case of connection allocation failure
If allocating a connection object failed right after a successful accept
on a listener, the new file descriptor was not properly closed.

This fixes GitHub issue #905.
It can be backported to 2.3.
2021-02-05 12:14:51 +01:00
Willy Tarreau
472125bc04 MINOR: protocol: add a pair of check_events/ignore_events functions at the ctrl layer
Right now the connection subscribe/unsubscribe code needs to manipulate
FDs, which is not compatible with QUIC. In practice what we need there
is to be able to either subscribe or wake up depending on readiness at
the moment of subscription.

This commit introduces two new functions at the control layer, which are
provided by the socket code, to check for FD readiness or subscribe to it
at the control layer. For now it's not used.
2020-12-11 17:02:50 +01:00
Willy Tarreau
427c846cc9 MINOR: protocol: add a ->drain() function at the connection control layer
This is what we need to drain pending incoming data from an connection.
The code was taken from conn_sock_drain() without the connection-specific
stuff. It still takes a connection for now for API simplicity.
2020-12-11 16:26:00 +01:00
Willy Tarreau
586f71b43f REORG: connection: move the socket iocb (conn_fd_handler) to sock.c
conn_fd_handler() is 100% specific to socket code. It's about time
it moves to sock.c which manipulates socket FDs. With it comes
conn_fd_check() which tests for the socket's readiness. The ugly
connection status check at the end of the iocb was moved to an inlined
function in connection.h so that if we need it for other socket layers
it's not too hard to reuse.

The code was really only moved and not changed at all.
2020-12-11 16:26:00 +01:00
Willy Tarreau
de471c4655 MINOR: protocol: add a set of ctrl_init/ctrl_close methods for setup/teardown
Currnetly conn_ctrl_init() does an fd_insert() and conn_ctrl_close() does an
fd_delete(). These are the two only short-term obstacles against using a
non-fd handle to set up a connection. Let's have pur these into the protocol
layer, along with the other connection-level stuff so that the generic
connection code uses them instead. This will allow to define new ones for
other protocols (e.g. QUIC).

Since we only support regular sockets at the moment, the code was placed
into sock.c and shared with proto_tcp, proto_uxst and proto_sockpair.
2020-12-08 15:50:56 +01:00
Willy Tarreau
b4daeeb094 MINOR: sock: add a check against cross worker<->master socket activities
Given that the previous issues caused spurious worker socket wakeups in
the master for inherited FDs that couldn't be closed, let's add a strict
test in the I/O callback to make sure that an accept() event is always
caught by the appropriate type of process (master for master listeners,
worker for worker listeners).
2020-11-04 15:05:50 +01:00
Willy Tarreau
a4380b211f MEDIUM: listeners: make use of fd_want_recv_safe() to enable early receivers
We used to refrain from calling fd_want_recv() if fd_updt was not allocated
but it's not the right solution as this does not allow the FD to be set.
Instead, let's use the new fd_want_recv_safe() which will update the FD and
create an update entry only if possible. In addition, the equivalent test
before calling fd_stop_recv() was removed as totally useless since there's
not fd_updt creation in this case.
2020-11-04 14:22:42 +01:00
Willy Tarreau
22ccd5ebaf BUG/MEDIUM: listener: make the master also keep workers' inherited FDs
In commit 374e9af35 ("MEDIUM: listener: let do_unbind_listener() decide
whether to close or not") it didn't appear necessary to have the master
process keep open the workers' inherited FDs. But this is actually
necessary to handle the reload on "bind fd@foo" situations, otherwise
the FD may be reassigned and the new socket cannot be set up, sometimes
causing "socket operation on non-socket" or other types of errors.

William found that this was the cause for the consistent failures of the
abns regtest, which already used to fail very often before this and was
as such marked as broken.

Interestingly I didn't have this issue with my test configs because
the FD number I used was higher and within the range of other listening
sockets. But this means that one of these wouldn't work as expected.

No backport is needed, this was introduced as part of the listeners
rework in 2.3.
2020-11-04 14:22:42 +01:00
Willy Tarreau
a74cb38e7c MINOR: protocol: register the receiver's I/O handler and not the protocol's
Now we define a new sock_accept_iocb() for socket-based stream protocols
and use it as a wrapper for listener_accept() which now takes a listener
and not an FD anymore. This will allow the receiver's I/O cb to be
redefined during registration, and more specifically to get rid of the
hard-coded hacks in protocol_bind_all() made for syslog.

The previous ->accept() callback in the protocol was removed since it
doesn't have anything to do with accept() anymore but is more generic.
A few places where listener_accept() was compared against the FD's IO
callback for debugging purposes on the CLI were updated.
2020-10-15 21:47:56 +02:00
Willy Tarreau
344b8fcf87 MINOR: sockpair: implement sockpair_accept_conn() to accept a connection
This is the same as previous commit, but this time for the sockpair-
specific stuff, relying on recv_fd_uxst() instead of accept(), so the
code is simpler. The various errno cases are handled like for regular
sockets, though some of them will probably never happen, but this does
not hurt.
2020-10-15 21:47:56 +02:00
Willy Tarreau
f1dc9f2f17 MINOR: sock: implement sock_accept_conn() to accept a connection
The socket-specific accept() code in listener_accept() has nothing to
do there. Let's move it to sock.c where it can be significantly cleaned
up. It will now directly return an accepted connection and provide a
status code instead of letting listener_accept() deal with various errno
values. Note that this doesn't support the sockpair specific code.

The function is now responsible for dealing with its own receiver's
polling state and calling fd_cant_recv() when facing EAGAIN.

One tiny change from the previous implementation is that the connection's
sockaddr is now allocated before trying accept(), which saves a memcpy()
of the resulting address for each accept at the expense of a cheap
pool_alloc/pool_free on the final accept returning EAGAIN. This still
apparently slightly improves accept performance in microbencharks.
2020-10-15 21:47:56 +02:00
Willy Tarreau
7d053e4211 MINOR: sock: rename sock_accept_conn() to sock_accepting_conn()
This call was introduced by commit 5ced3e887 ("MINOR: sock: add
sock_accept_conn() to test a listening socket") but is actually quite
confusing because it makes one think the socket will accept a connection
(which is what we want to have in a new function) while it only tells
whether it's configured to accept connections. Let's call it
sock_accepting_conn() instead.

The same change was applied to sockpair which had the same issue.
2020-10-15 21:47:56 +02:00
Willy Tarreau
5ced3e8879 MINOR: sock: add sock_accept_conn() to test a listening socket
At several places we need to check if a socket is still valid and still
willing to accept connections. Instead of open-coding this, each time,
let's add a new function for this.
2020-10-13 18:15:33 +02:00
Willy Tarreau
f58b8db47b MEDIUM: receivers: add an rx_unbind() method in the protocols
This is used as a generic way to unbind a receiver at the end of
do_unbind_listener(). This allows to considerably simplify that function
since we can now let the protocol perform the cleanup. The generic code
was moved to sock.c, along with the conditional rx_disable() call. Now
the code also supports that the ->disable() function of the protocol
which acts on the listener performs the close itself and adjusts the
RX_F_BUOND flag accordingly.
2020-10-09 18:44:36 +02:00
Willy Tarreau
e70c7977f2 MINOR: sock: provide a set of generic enable/disable functions
These will be used on receivers, to enable or disable receiving on a
listener, which most of the time just consists in enabling/disabling
the file descriptor.

We have to take care of the existence of fd_updt to know if we may
or not call fd_{want,stop}_recv() since it's not permitted in very
early boot.
2020-10-09 11:27:30 +02:00
Willy Tarreau
f1f660978c MINOR: protocol: retrieve the family-specific fields from the family
We now take care of retrieving sock_family, l3_addrlen, bind(),
addrcmp(), get_src() and get_dst() from the protocol family and
not just the protocol itself. There are very few places, this was
only seldom used. Interestingly in sock_inet.c used to rely on
->sock_family instead of ->sock_domain, and sock_unix.c used to
hard-code PF_UNIX instead of using ->sock_domain.

Also it appears obvious we have something wrong it the protocol
selection algorithm because sock_domain is the one set to the custom
protocols while it ought to be sock_family instead, which would avoid
having to hard-code some conversions for UDP namely.
2020-09-16 22:08:07 +02:00
Willy Tarreau
c049c0d5ad MINOR: sock: make sock_find_compatible_fd() only take a receiver
We don't need to have a listener anymore to find an fd, a receiver with
its settings properly set is enough now.
2020-09-16 22:08:07 +02:00
Willy Tarreau
3fd3bdc836 MINOR: receiver: move the FOREIGN and V6ONLY options from listener to settings
The new RX_O_FOREIGN, RX_O_V6ONLY and RX_O_V4V6 options are now set into
the rx_settings part during the parsing, so that we don't need to adjust
them in each and every listener anymore. We have to keep both v4v6 and
v6only due to the precedence from v6only over v4v6.
2020-09-16 22:08:07 +02:00
Willy Tarreau
818a92e87a MINOR: listener: prefer to retrieve the socket's settings via the receiver
Some socket settings used to be retrieved via the listener and the
bind_conf. Now instead we use the receiver and its settings whenever
appropriate. This will simplify the removal of the dependency on the
listener.
2020-09-16 22:08:07 +02:00
Willy Tarreau
4dfabfed13 MINOR: listener: make sock_find_compatible_fd() check the socket type
sock_find_compatible_fd() can now access the protocol via the receiver
hence it can access its socket type and know whether the receiver has
dgram or stream sockets, so we don't need to hack around AF_CUST_UDP*
anymore there.
2020-09-16 22:08:07 +02:00
Willy Tarreau
b743661f04 REORG: listener: move the listener's proto to the receiver
The receiver is the one which depends on the protocol while the listener
relies on the receiver. Let's move the protocol there. Since there's also
a list element to get back to the listener from the proto list, this list
element (proto_list) was moved as well. For now when scanning protos, we
still see listeners which are linked by their rx.proto_list part.
2020-09-16 22:08:05 +02:00
Willy Tarreau
371590661e REORG: listener: move the listening address to a struct receiver
The address will be specific to the receiver so let's move it there.
2020-09-16 22:08:01 +02:00
Willy Tarreau
be56c1038f MINOR: listener: move the network namespace to the struct settings
The netns is common to all listeners/receivers and is used to bind the
listening socket so it must be in the receiver settings and not in the
listener. This removes some yet another set of unnecessary loops.
2020-09-16 20:13:13 +02:00
Willy Tarreau
7e307215e8 MINOR: listener: move the interface to the struct settings
The interface is common to all listeners/receivers and is used to bind
the listening socket so it must be in the receiver settings and not in
the listener. This removes some unnecessary loops.
2020-09-16 20:13:13 +02:00
Willy Tarreau
9dbb6c43ce MINOR: sock: distinguish dgram from stream types when retrieving old sockets
For now we still don't retrieve dgram sockets, but the code must be able
to distinguish them before we switch to receivers. This adds a new flag
to the xfer_sock_list indicating that a socket is of type SOCK_DGRAM. The
way to set the flag for now is by looking at the dummy address family which
equals AF_CUST_UDP{4,6} in this case (given that other dgram sockets are not
yet supported).
2020-08-28 19:26:39 +02:00
Willy Tarreau
a2c17877b3 MINOR: sock: do not use LI_O_* in xfer_sock_list anymore
We'll want to store more info there and some info that are not represented
in listener options at the moment (such as dgram vs stream) so let's get
rid of these and instead use a new set of options (SOCK_XFER_OPT_*).
2020-08-28 19:26:38 +02:00
Willy Tarreau
429617459d REORG: sock: move get_old_sockets() from haproxy.c
The new function was called sock_get_old_sockets() and was left as-is
except a minimum amount of style lifting to make it more readable. It
will never be awesome anyway since it's used very early in the boot
sequence and needs to perform socket I/O without any external help.
2020-08-28 19:24:55 +02:00
Willy Tarreau
2d34a710b1 MINOR: sock: implement sock_find_compatible_fd()
This is essentially a merge from tcp_find_compatible_fd() and
uxst_find_compatible_fd() that relies on a listener's address and
compare function and still checks for other variations. For AF_INET6
it compares a few of the listener's bind options. A minor change for
UNIX sockets is that transparent mode, interface and namespace used
to be ignored when trying to pick a previous socket while now if they
are changed, the socket will not be reused. This could be refined but
it's still better this way as there is no more risk of using a
differently bound socket by accident.

Eventually we should not pass a listener there but a set of binding
parameters (address, interface, namespace etc...) which ultimately will
be grouped into a receiver. For now this still doesn't exist so let's
stick to the listener to break dependencies in the rest of the code.
2020-08-28 18:51:36 +02:00
Willy Tarreau
063d47d136 REORG: listener: move xfer_sock_list to sock.{c,h}.
This will be used for receivers as well thus it is not specific to
listeners but to sockets.
2020-08-28 18:51:36 +02:00