There's a test after a successful synchronous connect() consisting
in waking the data layer up asap if there's no more handshake.
Unfortunately this test is run before setting the CO_FL_SEND_PROXY
flag and before the transport layer adds its own flags, so it can
indicate a willingness to send data while it's not the case and it
will have to be handled later.
This has no visible effect except a useless call to a function in
case of health checks making use of the proxy protocol for example.
Additionally a corner case where EALREADY was returned and considered
equivalent to EISCONN was fixed so that it's considered equivalent to
EINPROGRESS given that the connection is not complete yet. But this
code should never return on the first call anyway so it's mostly a
cleanup.
This fix should be backported to 1.7 and 1.6 at least to avoid
headaches during some debugging.
When a connect() to a unix socket returns EAGAIN we talk about
"no free ports" in the error/debug message, which only makes
sense when using TCP.
Explain connect() failure and suggest troubleshooting server
backlog size.
Abstract namespace sockets ignore the shutdown() call and do not make
it possible to temporarily stop listening. The issue it causes is that
during a soft reload, the new process cannot bind, complaining that the
address is already in use.
This change registers a new pause() function for unix sockets and
completely unbinds the abstract ones since it's possible to rebind
them later. It requires the two previous patches as well as preceeding
fixes.
This fix should be backported into 1.5 since the issue apperas there.
Jan Seda noticed that abstract sockets are incompatible with soft reload,
because the new process cannot bind and immediately fails. This patch marks
the binding as retryable and not fatal so that the new process can try to
bind again after sending a signal to the old process.
Note that this fix is not enough to completely solve the problem, but it
is necessary. This patch should be backported to 1.5.
When bind() fails (function uxst_bind_listener()), the fail path doesn't
consider the abstract namespace and tries to unlink paths held in
uninitiliazed memory (tempname and backname). See the strace excerpt;
the strings still hold the path from test1.
===============================================================================================
23722 bind(5, {sa_family=AF_FILE, path=@"test2"}, 110) = -1 EADDRINUSE (Address already in use)
23722 unlink("/tmp/test1.sock.23722.tmp") = -1 ENOENT (No such file or directory)
23722 close(5) = 0
23722 unlink("/tmp/test1.sock.23722.bak") = -1 ENOENT (No such file or directory)
===============================================================================================
This patch should be backported to 1.5.
Plain "tcp" health checks sent to a unix socket cause two connect()
calls to be made, one to connect, and a second one to verify that the
connection properly established. But with unix sockets, we get
immediate notification of success, so we can avoid this second
attempt. However we need to ensure that we'll visit the connection
handler even if there's no remaining handshake pending, so for this
we claim we have some data to send in order to enable polling for
writes if there's no more handshake.
These sockets are the same as Unix sockets except that there's no need
for any filesystem access. The address may be whatever string both sides
agree upon. This can be really convenient for inter-process communications
as well as for chaining backends to frontends.
These addresses are forced by prepending their address with "abns@" for
"abstract namespace".
We've had everything in place for this for a while now, we just missed
the connect function for UNIX sockets. Note that in order to connect to
a UNIX socket inside a chroot, the path will have to be relative to the
chroot.
UNIX sockets connect about twice as fast as TCP sockets (or consume
about half of the CPU at the same rate). This is interesting for
internal communications between SSL processes and HTTP processes
for example, or simply to avoid allocating source ports on the
loopback.
The tcp_connect_probe() function is still used to probe a dataless
connection, but it is compatible so that's not an issue for now.
Health checks are not yet fully supported since they require a port.
Using the address syntax "fd@<num>", a listener may inherit a file
descriptor that the caller process has already bound and passed as
this number. The fd's socket family is detected using getsockname(),
and the usual initialization is performed through the existing code
for that family, but the socket creation is skipped.
Whether the parent has performed the listen() call or not is not
important as this is detected.
For UNIX sockets, we immediately clear the path after preparing a
socket so that we never remove it in case an abort would happen due
to a late error during startup.
Unix permissions are per-bind configuration line and not per listener,
so let's concretize this in the way the config is stored. This avoids
some unneeded loops to set permissions on all listeners.
The access level is not part of the unix perms so it has been moved
away. Once we can use str2listener() to set all listener addresses,
we'll have a bind keyword parser for this one.
Navigating through listeners was very inconvenient and error-prone. Not to
mention that listeners were linked in reverse order and reverted afterwards.
In order to definitely get rid of these issues, we now do the following :
- frontends have a dual-linked list of bind_conf
- frontends have a dual-linked list of listeners
- bind_conf have a dual-linked list of listeners
- listeners have a pointer to their bind_conf
This way we can now navigate from anywhere to anywhere and always find the
proper bind_conf for a given listener, as well as find the list of listeners
for a current bind_conf.
The "mode", "uid", "gid", "user" and "group" bind options were moved to
proto_uxst as they are unix-specific.
Note that previous versions had a bug here, only the last listener was
updated with the specified settings. However, it almost never happens
that bind lines contain multiple UNIX socket paths so this is not that
much of a problem anyway.
The "raw_sock" prefix will be more convenient for naming functions as
it will be prefixed with the data layer and suffixed with the data
direction. So let's rename the files now to avoid any further confusion.
The #include directive was also removed from a number of files which do
not need it anymore.
fdtab[].state was only used to know whether a connection was in progress
or an error was encountered. Instead we now use connection->flags to store
a flag for both. This way, connection management will be able to update the
connection status on I/O.
The destination address is purely a connection thing and not an fd thing.
It's also likely that later the address will be stored into the connection
and linked to by the SI.
struct fdinfo only keeps the pointer to the port range and the local port
for now. All of this also needs to move to the connection but before this
the release of the port range must move from fd_delete() to a new function
dedicated to the connection.
These pointers were used to hold pointers to buffers in the past, but
since we introduced the stream interface, they're no longer used but
they were still sometimes set.
Removing them shrink the struct fdtab from 32 to 24 bytes on 32-bit machines,
and from 52 to 36 bytes on 64-bit machines, which is a significant saving. A
quick tests shows a steady 0.5% performance gain, probably due to the better
cache efficiency.
Commit e164e7a removed get_src/get_dst setting in the stream interfaces but
forgot to set it in proto_tcp. Get the feature back because we need it for
logging, transparent mode, ACLs etc... We now rely on the stream interface
direction to know what syscall to use.
One benefit of doing it this way is that we don't use getsockopt() anymore
on outgoing stream interfaces nor on UNIX sockets.
We'll soon have an SSL socket layer, and in order to ease the difference
between the two, we use the name "sock_raw" to designate the one which
directly talks to the sockets without any conversion.
The previous sockstream_accept() function uses nothing from sockstream, and
is totally irrelevant to stream interfaces. Move this to the protocols.c
file which handles listeners and protocols, and call it listener_accept().
It now makes much more sense that the code dealing with listen() also handles
accept() and passes it to upper layers.
Managing listeners state is difficult because they have their own state
and can at the same time have theirs dictated by their proxy. The pause
is not done properly, as the proxy code is fiddling with sockets. By
introducing new functions such as pause_listener()/resume_listener(), we
make it a bit more obvious how/when they're supposed to be used. The
listen_proxies() function was also renamed to resume_proxies() since
it's only used for pause/resume.
This patch is the first in a series aiming at getting rid of the maintain_proxies
mess. In the end, proxies should not call enable_listener()/disable_listener()
anymore.
Since unix sockets are supported for bind, the default backlog size was not
enough to accept the traffic. The size is now inherited from the listener
to behave like the tcp listeners.
This also affects the "stats socket" backlog, which is now determined by
"stats maxconn".
There were a lot of snprintf() everywhere in the UNIX bind code. Now we
proceed as for tcp and indicate the socket path at the end between square
brackets. The code is smaller and more readable.
MAXPATHLEN may be used at other places, it's unconvenient to have it
redefined in a few files. Also, since checking it requires including
sys/param.h, some versions of it cause a macro declaration conflict
with MIN/MAX which are defined in tools.h. The solution consists in
including sys/param.h in both files so that we ensure it's loaded
before the macros are defined and MAXPATHLEN is checked.
For a long time we had two large accept() functions, one for TCP
sockets instanciating proxies, and another one for UNIX sockets
instanciating the stats interface.
A lot of code was duplicated and both did not work exactly the same way.
Now we have a stream_sock layer accept() called for either TCP or UNIX
sockets, and this function calls the frontend-specific accept() function
which does the rest of the frontend-specific initialisation.
Some code is still duplicated (session & task allocation, stream interface
initialization), and might benefit from having an intermediate session-level
accept() callback to perform such initializations. Still there are some
minor differences that need to be addressed first. For instance, the monitor
nets should only be checked for proxies and not for other connection templates.
Last, we renamed l->private as l->frontend. The "private" pointer in
the listener is only used to store a frontend, so let's rename it to
eliminate this ambiguity. When we later support detached listeners
(eg: FTP), we'll add another field to avoid the confusion.
Right now we count the incoming connection only once everything has
been allocated. Since we're planning on considering early ACL rules,
we need to count the connection earlier.
The response analyser was not emptied upon creation of a new session. In
fact it was always zero just because last session leaved it in a zero state,
but in case of shared pools this cannot be guaranteed. The net effect is
that it was possible to have some HTTP (or any other) analysers on the
response path of a stats unix socket, which would reject the response.
This fix must be backported to 1.4.
Since the last documentation cleanups, I've found more typos that I kept
in a corner instead of sending you a mail just for one character :)
--
Cyril Bont
Some rarely information are stored in fdtab, making it larger for no
reason (source port ranges, remote address, ...). Such information
lie there because the checks can't find them anywhere else. The goal
will be to move these information to the stream interface once the
checks make use of it.
For now, we move them to an fdinfo array. This simple change might
have improved the cache hit ratio a little bit because a 0.5% of
performance increase has measured.
The stats handler used to store internal states in s->ana_state. Now
we only rely on si->st0 in which we can store as many states as we
have possible outputs. This cleans up the stats code a lot and makes
it more maintainable. It has also reduced code size by a few hundred
bytes.
By default, when data is sent over a socket, both the write timeout and the
read timeout for that socket are refreshed, because we consider that there is
activity on that socket, and we have no other means of guessing if we should
receive data or not.
While this default behaviour is desirable for almost all applications, there
exists a situation where it is desirable to disable it, and only refresh the
read timeout if there are incoming data. This happens on sessions with large
timeouts and low amounts of exchanged data such as telnet session. If the
server suddenly disappears, the output data accumulates in the system's
socket buffers, both timeouts are correctly refreshed, and there is no way
to know the server does not receive them, so we don't timeout. However, when
the underlying protocol always echoes sent data, it would be enough by itself
to detect the issue using the read timeout. Note that this problem does not
happen with more verbose protocols because data won't accumulate long in the
socket buffers.
When this option is set on the frontend, it will disable read timeout updates
on data sent to the client. There probably is little use of this case. When
the option is set on the backend, it will disable read timeout updates on
data sent to the server. Doing so will typically break large HTTP posts from
slow lines, so use it with caution.
We can get rid of the stats analyser by moving all the stats code
to a stream interface applet. Above being cleaner, it provides new
advantages such as the ability to process requests and responses
from the same function and work only with simple state machines.
There's no need for any hijack hack anymore.
The direct advantage for the user are the interactive mode and the
ability to chain several commands delimited by a semi-colon. Now if
the user types "prompt", he gets a prompt from which he can send
as many requests as he wants. All outputs are terminated by a
blank line followed by a new prompt, so this can be used from
external tools too.
The code is not very clean, it needs some rework, but some part
of the dirty parts are due to the remnants of the hijack mode used
in the old functions we call.
The old AN_REQ_STATS_SOCK analyser flag is now unused and has been
removed.
When stream interfaces will embedded applets running as part as their
holding task, we'll need a new callback to process them from the
session processor.
We used to call stream_sock_data_finish() directly at the end of
a session update, but if we want to support non-socket interfaces,
we need to have this function configurable. Now we access it via
->update().