These sockets are the same as Unix sockets except that there's no need
for any filesystem access. The address may be whatever string both sides
agree upon. This can be really convenient for inter-process communications
as well as for chaining backends to frontends.
These addresses are forced by prepending their address with "abns@" for
"abstract namespace".
We've had everything in place for this for a while now, we just missed
the connect function for UNIX sockets. Note that in order to connect to
a UNIX socket inside a chroot, the path will have to be relative to the
chroot.
UNIX sockets connect about twice as fast as TCP sockets (or consume
about half of the CPU at the same rate). This is interesting for
internal communications between SSL processes and HTTP processes
for example, or simply to avoid allocating source ports on the
loopback.
The tcp_connect_probe() function is still used to probe a dataless
connection, but it is compatible so that's not an issue for now.
Health checks are not yet fully supported since they require a port.
We used to have is_addr() in place to validate sometimes the existence
of an address, sometimes a valid IPv4 or IPv6 address. Replace them
carefully so that is_inet_addr() is used wherever we can only use an
IPv4/IPv6 address.
The is_addr() function indicates if an address is set and is an IPv4
or IPv6 address. Let's rename it is_inet_addr() and make is_addr()
also accept AF_UNIX addresses.
Currently, mixing an IPv4 and an IPv6 address in checks happens to
work by pure luck because the two protocols use the same functions
at the socket level and both use IPPROTO_TCP. However, they're
definitely wrong as the protocol for the check address is retrieved
from the server's address.
Now the protocol assigned to the connection is the same as the one
the address in use belongs to (eg: the server's address or the
explicit check address).
The RDP cookie extractor compares the 32-bit address from the request
to the address of each server in the farm without first checking that
the server's address is IPv4. This is a leftover from the IPv4 to IPv6
conversion. It's harmless as it's unlikely that IPv4 and IPv6 servers
will be mixed in an RDP farm, but better fix it.
This patch does not need to be backported.
Till now a warning was emitted if the "stats bind-process" was not
specified when nbproc was greater than 1. Now we can be much finer
and only emit a warning when at least of the stats socket is bound
to more than one process at a time.
Now that we know what processes a "bind" statement is attached to, we
have the ability to avoid starting some of them when they're not on the
proper process. This feature is disabled when running in foreground
however, so that debug mode continues to work with everything bound to
the first and only process.
The main purpose of this change is to finally allow the global stats
sockets to be each bound to a different process.
It can also be used to force haproxy to use different sockets in different
processes for the same IP:port. The purpose is that under Linux 3.9 and
above (and possibly other OSes), when multiple processes are bound to the
same IP:port via different sockets, the system is capable of performing
a perfect round-robin between the socket queues instead of letting any
process pick all the connections from a queue. This results in a smoother
load balancing and may achieve a higher performance with a large enough
maxaccept setting.
When a process list is specified on either the proxy or the bind lines,
the latter is refined to the intersection of the two. A warning is emitted
if no intersection is found, and the situation is fixed by either falling
back to the first process of the proxy or to all processes.
When a bind-process setting is present in a frontend or backend, we
now verify that the specified process range at least shares one common
process with those defined globally by nbproc. Then if the value is
set, it is reduced to the one enforced by nbproc.
A warning is emitted if process count does not match, and the fix is
done the following way :
- if a single process was specified in the range, it's remapped to
process #1
- if more than one process was specified, the binding is removed
and all processes are usable.
Note that since backends may inherit their settings from frontends,
depending on the declaration order, they may or may not be reported
as warnings.
Some consistency checks cannot be performed between frontends, backends
and peers at the moment because there is no way to check for intersection
between processes bound to some processes when the number of processes is
higher than the number of bits in a word.
So first, let's limit the number of processes to the machine's word size.
This means nbproc will be limited to 32 on 32-bit machines and 64 on 64-bit
machines. This is far more than enough considering that configs rarely go
above 16 processes due to scalability and management issues, so 32 or 64
should be fine.
This way we'll ensure we can always build a mask of all the processes a
section is bound to.
By default, a proxy's bind_proc is zero, meaning "bind to all processes".
It's only when not zero that its process list is restricted. So we don't
want the frontends to enforce the value on the backends when the backends
are still set to zero.
Now, haproxy exit an error saying:
Unable to initialize the lock for the shared SSL session cache. You can retry using
the global statement 'tune.ssl.force-private-cache' but it could increase the CPU
usage due to renegotiation if nbproc > 1.
They could match different strings as equal if the chunk was shorter
than the string. Those functions are currently only used for SSL's
certificate DN entry extract.
This commit modifies the PROXY protocol V2 specification to support headers
longer than 255 bytes allowing for optional extensions. It implements the
PROXY protocol V2 which is a binary representation of V1. This will make
parsing more efficient for clients who will know in advance exactly how
many bytes to read. Also, it defines and implements some optional PROXY
protocol V2 extensions to send information about downstream SSL/TLS
connections. Support for PROXY protocol V1 remains unchanged.
John-Paul Bader reported a nasty segv which happens after a few hours
when SSL is enabled under a high load. Fortunately he could catch a
stack trace, systematically looking like this one :
(gdb) bt full
level = 6
conn = (struct connection *) 0x0
err_msg = <value optimized out>
s = (struct session *) 0x80337f800
conn = <value optimized out>
flags = 41997063
new_updt = <value optimized out>
old_updt = 1
e = <value optimized out>
status = 0
fd = 53999616
nbfd = 279
wait_time = <value optimized out>
updt_idx = <value optimized out>
en = <value optimized out>
eo = <value optimized out>
count = 78
sr = <value optimized out>
sw = <value optimized out>
rn = <value optimized out>
wn = <value optimized out>
The variable "flags" in conn_fd_handler() holds a copy of connection->flags
when entering the function. These flags indicate 41997063 = 0x0280d307 :
- {SOCK,DATA,CURR}_RD_ENA=1 => it's a handshake, waiting for reading
- {SOCK,DATA,CURR}_WR_ENA=0 => no need for writing
- CTRL_READY=1 => FD is still allocated
- XPRT_READY=1 => transport layer is initialized
- ADDR_FROM_SET=1, ADDR_TO_SET=0 => clearly it's a frontend connection
- INIT_DATA=1, WAKE_DATA=1 => processing a handshake (ssl I guess)
- {DATA,SOCK}_{RD,WR}_SH=0 => no shutdown
- ERROR=0, CONNECTED=0 => handshake not completed yet
- WAIT_L4_CONN=0 => normal
- WAIT_L6_CONN=1 => waiting for an L6 handshake to complete
- SSL_WAIT_HS=1 => the pending handshake is an SSL handshake
So this is a handshake is in progress. And the only way to reach line 88
is for the handshake to complete without error. So we know for sure that
ssl_sock_handshake() was called and completed the handshake then removed
the CO_FL_SSL_WAIT_HS flag from the connection. With these flags,
ssl_sock_handshake() does only call SSL_do_handshake() and retruns. So
that means that the problem is necessarily in data->init().
The fd is wrong as reported but is simply mis-decoded as it's the lower
half of the last function pointer.
What happens in practice is that there's an issue with the way we deal
with embryonic sessions during their conversion to regular sessions.
Since they have no stream interface at the beginning, the pointer to
the connection is temporarily stored into s->target. Then during their
conversion, the first stream interface is properly initialized and the
connection is attached to it, then s->target is set to NULL.
The problem is that if anything fails in session_complete(), the
session is left in this intermediate state where s->target is NULL,
and kill_mini_session() is called afterwards to perform the cleanup.
It needs the connection, that it finds in s->target which is NULL,
dereferences it and dies. The only reasons for dying here are a problem
on the TCP connection when doing the setsockopt(TCP_NODELAY) or a
memory allocation issue.
This patch implements a solution consisting in restoring s->target in
session_complete() on the error path. That way embryonic sessions that
were valid before calling it are still valid after.
The bug was introduced in 1.5-dev20 by commit f8a49ea ("MEDIUM: session:
attach incoming connection to target on embryonic sessions"). No backport
is needed.
Special thanks to John for his numerous tests and traces.
Prevously pthread process shared lock were used by default,
if USE_SYSCALL_FUTEX is not specified.
This patch implements an OS independant kind of lock:
An active spinlock is usedf if USE_SYSCALL_FUTEX is not specified.
The old behavior is still available if USE_PTHREAD_PSHARED=1.
Process shared mutex seems not supported on some OSs (FreeBSD).
This patch checks errors on mutex lock init to fallback
on a private session cache (per process cache) in error cases.
1.5-dev24 introduced SSL_CTX_set_msg_callback(), which came with OpenSSL
0.9.7. A build attempt with an older one failed and we're still compatible
with 0.9.6 in 1.5.
Trying to build with an old gcc and glibc revealed that we must not
state "inline" in our _syscall* definitions since it's already present
in the declaration making use of the _syscall* macros.
During some tests in multi-process mode under Linux, it appeared that
issuing "disable frontend foo" on the CLI to pause a listener would
make the shutdown(read) of certain processes disturb another process
listening on the same socket, resulting in a 100% CPU loop. What
happens is that accept() returns EAGAIN without accepting anything.
Fortunately, we see that epoll_wait() reports EPOLLIN+EPOLLRDHUP
(likely because the FD points to the same file in the kernel), so we
can use that to stop the other process from trying to accept connections
for a short time and try again later, hoping for the situation to change.
We must not disable the FD otherwise there's no way to re-enable it.
Additionally, during these tests, a loop was encountered on EINVAL which
was not caught. Now if we catch an EINVAL, we proceed the same way, in
case the socket is re-enabled later.
It's the final part of the 2 previous patches. We prevent the server from
timing out if we still have some data to pass to it. That way, even if the
server runs with a short timeout and the client with a large one, the server
side timeout will only start to count once the client sends everything. This
ensures we don't report a 504 before the server gets the whole request.
It is not certain whether the 1.4 state machine is fully compatible with
this change. Since the purpose is only to ensure that we never report a
server error before a client error if some data are missing from the client
and when the server-side timeout is smaller than or equal to the client's,
it's probably not worth attempting the backport.
This is the continuation of previous patch "BUG/MEDIUM: http/session:
disable client-side expiration only after body".
This one takes care of properly reporting the client-side read timeout
when waiting for a body from the client. Since the timeout may happen
before or after the server starts to respond, we have to take care of
the situation in three different ways :
- if the server does not read our data fast enough, we emit a 504
if we're waiting for headers, or we simply break the connection
if headers were already received. We report either sH or sD
depending on whether we've seen headers or not.
- if the server has not yet started to respond, but has read all of
the client's data and we're still waiting for more data from the
client, we can safely emit a 408 and abort the request ;
- if the server has already started to respond (thus it's a transfer
timeout during a bidirectional exchange), then we silently break
the connection, and only the session flags will indicate in the
logs that something went wrong with client or server side.
This bug is tagged MEDIUM because it touches very sensible areas, however
its impact is very low. It might be worth performing a careful backport
to 1.4 once it has been confirmed that everything is correct and that it
does not introduce any regression.
For a very long time, back in the v1.3 days, we used to rely on a trick
to avoid expiring the client side while transferring a payload to the
server. The problem was that if a client was able to quickly fill the
buffers, and these buffers took some time to reach the server, the
client should not expire while not sending anything.
In order to cover this situation, the client-side timeout was disabled
once the connection to the server was OK, since it implied that we would
at least expire on the server if required.
But there is a drawback to this : if a client stops uploading data before
the end, its timeout is not enforced and we only expire on the server's
timeout, so the logs report a 504.
Since 1.4, we have message body analysers which ensure that we know whether
all the expected data was received or not (HTTP_MSG_DATA or HTTP_MSG_DONE).
So we can fix this problem by disabling the client-side or server-side
timeout at the end of the transfer for the respective side instead of
having it unconditionally in session.c during all the transfer.
With this, the logs now report the correct side for the timeout. Note that
this patch is not enough, because another issue remains : the HTTP body
forwarders do not abort upon timeout, they simply rely on the generic
handling from session.c. So for now, the session is still aborted when
reaching the server timeout, but the culprit is properly reported. A
subsequent patch will address this specific point.
This bug was tagged MEDIUM because of the changes performed. The issue
it fixes is minor however. After some cooling down, it may be backported
to 1.4.
It was reported by and discussed with Rachel Chavez and Patrick Hemmer
on the mailing list.
Previously ssl_fc_unique_id and ssl_bc_unique_id return a string encoded
in base64 of the RFC 5929 TLS unique identifier. This patch modify those fetches
to return directly the ID in the original binary format. The user can make the
choice to encode in base64 using the converter.
i.e. : ssl_fc_unique_id,base64
ssl_f_sha1 is a binary binary fetch used to returns the SHA-1 fingerprint of
the certificate presented by the frontend when the incoming connection was
made over an SSL/TLS transport layer. This can be used to know which
certificate was chosen using SNI.
Adds ssl fetchs and ACLs for outgoinf SSL/Transport layer connection with their
docs:
ssl_bc, ssl_bc_alg_keysize, ssl_bc_cipher, ssl_bc_protocol, ssl_bc_unique_id,
ssl_bc_session_id and ssl_bc_use_keysize.
On the mailing list, seri0528@naver.com reported an issue when
using balance url_param or balance uri. The request would sometimes
stall forever.
Cyril Bonté managed to reproduce it with the configuration below :
listen test :80
mode http
balance url_param q
hash-type consistent
server s demo.1wt.eu:80
and found it appeared with this commit : 80a92c0 ("BUG/MEDIUM: http:
don't start to forward request data before the connect").
The bug is subtle but real. The problem is that the HTTP request
forwarding analyzer refrains from starting to parse the request
body when some LB algorithms might need the body contents, in order
to preserve the data pointer and avoid moving things around during
analysis in case a redispatch is later needed. And in order to detect
that the connection establishes, it watches the response channel's
CF_READ_ATTACHED flag.
The problem is that a request analyzer is not subscribed to a response
channel, so it will only see changes when woken for other (generally
correlated) reasons, such as the fact that part of the request could
be sent. And since the CF_READ_ATTACHED flag is cleared once leaving
process_session(), it is important not to miss it. It simply happens
that sometimes the server starts to respond in a sequence that validates
the connection in the middle of process_session(), that it is detected
after the analysers, and that the newly assigned CF_READ_ATTACHED is
not used to detect that the request analysers need to be called again,
then the flag is lost.
The CF_WAKE_WRITE flag doesn't work either because it's cleared upon
entry into process_session(), ie if we spend more than one call not
connecting.
Thus we need a new flag to tell the connection initiator that we are
specifically interested in being notified about connection establishment.
This new flag is CF_WAKE_CONNECT. It is set by the requester, and is
cleared once the connection succeeds, where CF_WAKE_ONCE is set instead,
causing the request analysers to be scanned again.
For future versions, some better options will have to be considered :
- let all analysers subscribe to both request and response events ;
- let analysers subscribe to stream interface events (reduces number
of useless calls)
- change CF_WAKE_WRITE's semantics to persist across calls to
process_session(), but that is different from validating a
connection establishment (eg: no data sent, or no data to send)
The bug was introduced in 1.5-dev23, no backport is needed.
Commit fc6c032 ("MEDIUM: global: add support for CPU binding on Linux ("cpu-map")")
merged into 1.5-dev13 involves a useless test that clang reports as a warning. The
"low" variable cannot be negative here. Issue reported by Charles Carter.
Commit 5338eea ("MEDIUM: pattern: The match function browse itself the
list or the tree") changed the return type of pattern matching functions.
One enum was left over in pat_match_auth(). Fortunately, this one equals
zero where a null pointer is expected, so it's cast correctly.
This detected and reported by Charles Carter was introduced in 1.5-dev23,
no backport is needed.
Till now we used to return a pointer to a rule, but that makes it
complicated to later add support for registering new actions which
may fail. For example, the redirect may fail if the response is too
large to fit into the buffer.
So instead let's return a verdict. But we needed the pointer to the
last rule to get the address of a redirect and to get the realm used
by the auth page. So these pieces of code have moved into the function
and they produce a verdict.
Both use exactly the same mechanism, except for the choice of the
default realm to be emitted when none is selected. It can be achieved
by simply comparing the ruleset with the stats' for now. This achieves
a significant code reduction and further, removes the dependence on
the pointer to the final rule in the caller.
The "block" rules are redundant with http-request rules because they
are performed immediately before and do exactly the same thing as
"http-request deny". Moreover, this duplication has led to a few
minor stats accounting issues fixed lately.
Instead of keeping the two rule sets, we now build a list of "block"
rules that we compile as "http-request block" and that we later insert
at the beginning of the "http-request" rules.
The only user-visible change is that in case of a parsing error, the
config parser will now report "http-request block rule" instead of
"blocking condition".