haproxy/reg-tests/seamless-reload/abns_socket.vtc

# commit b4dd15b
# BUG/MINOR: unix: Make sure we can transfer abns sockets on seamless reload.
#
# When checking if a socket we got from the parent is suitable for a listener,
# we just checked that the path matched sockname.tmp; however, this is
# unsuitable for abns sockets, where we don't have to create a temporary
# file and rename it later.
# To detect that, check that the first character of the sun_path is 0 for
# both, and if so, that &sun_path[1] is the same too.
#
# Note: there are some tricks here. One of them is that we must not bind the
# same abns address to multiple processes that may run in parallel. Since
# vtest cannot provide abns sockets, we're instead concatenating the number
# of the listening port that vtest allocated for another frontend to the abns
# path, which guarantees they are unique on the system.
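#
# As a rough illustration of the check described above (this is only a
# sketch, not HAProxy's actual code: the helper name same_abns_path is made
# up, only the sockaddr_un fields come from <sys/un.h>, and it assumes the
# unused part of sun_path is zero-padded):
#
#     #include <string.h>
#     #include <sys/un.h>
#
#     /* Return 1 when both addresses are abstract-namespace (abns) sockets
#      * carrying the same name. Abstract sockets are marked by a leading
#      * NUL byte and the name starts at sun_path[1], so check the first
#      * byte of both and then compare the remaining bytes. */
#     static int same_abns_path(const struct sockaddr_un *a,
#                               const struct sockaddr_un *b)
#     {
#         if (a->sun_path[0] != 0 || b->sun_path[0] != 0)
#             return 0;
#         return memcmp(&a->sun_path[1], &b->sun_path[1],
#                       sizeof(a->sun_path) - 1) == 0;
#     }
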
varnishtest "Seamless reload issue with abns sockets"
feature ignore_unknown_macro
# abns@ sockets are not available on freebsd
#EXCLUDE_TARGETS=freebsd,osx,generic
#REQUIRE_VERSION=1.8
#REGTEST_TYPE=broken

haproxy h1 -W -conf {
  global
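    # "expose-fd listeners" lets the process started on reload retrieve the
    # listening sockets (including the abns one) over this stats socket
    # instead of re-binding them, which is what makes the reload seamless.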
    stats socket "${tmpdir}/h1/stats" level admin expose-fd listeners

  defaults
    mode http
    log global
    option httplog

    # Note on the 1s timeouts below, from the commit "REGTEST: increase
    # timeouts on the seamless-reload test" (2020-03-23):
    #
    # With the previous 20ms timeouts, this test regularly failed in
    # Travis-CI on smaller machines only (typically ppc64le, sometimes
    # s390x). The error always reported an incomplete HTTP header as seen
    # from the client, and it could occasionally be reproduced on the
    # minicloud ppc64le image when setting a huge file descriptor limit
    # (1 million).
    #
    # What happens is the following: depending on the binding order, some
    # connections from the client might reach the TCP listener on the old
    # instance and be forwarded to the abns listener of the second instance
    # just being prepared to start up. Due to the huge number of FDs,
    # setting them up takes slightly more time and the 20ms server timeout
    # may expire before the new instance finishes its startup. This can
    # result in an occasional 504, except that since the client timeout is
    # the same as the server timeout, both sides are closed at the same time
    # and the client doesn't receive the 504.
    #
    # A second problem plugs onto this: http-reuse is enabled by default,
    # so some requests forwarded to the older instance are sent over an
    # already established connection. But the CPU used by the starting
    # process is taken away from the older process, whose abns listener
    # then sees no request for more than 20ms and decides to kill the idle
    # client connection. If the TCP proxy forwards a request over this
    # closing connection at that very moment, it detects the close and
    # silently closes the other side to let the client retry, which the
    # vtest client again reports as an empty header. This is easier to
    # reproduce in VMs with few CPUs (2 or less) and some noisy neighbours
    # such as a few spinning loops in the background.
    #
    # While a few milliseconds would be close to the scheduler's
    # granularity, this test is never supposed to trigger the timeouts, so
    # it is safe to go higher without impact on the execution time. At one
    # second the problem could no longer be reproduced on the minicloud VMs.
    timeout connect 1s
    timeout client 1s
    timeout server 1s
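
  # The "testme" listener is bound to a TCP port allocated by vtest and
  # forwards every request, with a PROXY protocol v2 header, to the abns
  # socket of the "test_abns" frontend below.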
  listen testme
    bind "fd@${testme}"
    server test_abns_server abns@wpproc1_${h1_testme_port} send-proxy-v2
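
  # The abns frontend accepts the PROXY protocol header and replies to every
  # request itself with a 200 ("http-request deny" with deny_status 200), so
  # any failed or truncated response points at a problem with the abns
  # socket across the reload.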
  frontend test_abns
    bind abns@wpproc1_${h1_testme_port} accept-proxy
    http-request deny deny_status 200
} -start
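
# In master-worker mode (-W), USR2 makes the master re-execute itself and
# reload the configuration; thanks to "expose-fd listeners" the new process
# takes over the existing listening sockets instead of re-binding them,
# which is the seamless reload this test exercises.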
shell {
    kill -USR2 $(cat "${tmpdir}/h1/pid")
}
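
# Send 50 requests in a row while the reload is happening; if the abns
# socket is not properly transferred to the new process, some of them
# would fail.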
client c1 -connect ${h1_testme_sock} {
    txreq -url "/"
    rxresp
} -repeat 50 -run