197 lines
8.1 KiB
Plaintext
197 lines
8.1 KiB
Plaintext
|
Using Linux TCP Splicing with HAProxy
|
||
|
Willy Tarreau <w@1wt.eu>
|
||
|
- 2007/01/06 -
|
||
|
|
||
|
|
||
|
Alexandre Cassen has started a project called Linux Layer7 Switching (L7SW),
|
||
|
whose goal is to provide kernel services to help userland proxies achieving
|
||
|
very high performance. Right now, the project consists in a loadable kernel
|
||
|
module providing TCP Splicing under Linux.
|
||
|
|
||
|
TCP Splicing is a method by which a userland proxy can tell the kernel that
|
||
|
it considers it has no added value on the data part of a connection, and that
|
||
|
the kernel can perform the transfers it itself, thus relieving the proxy from
|
||
|
a potentially heavy job. There are two advantages to this method :
|
||
|
|
||
|
- it reduces the number of process wakeups
|
||
|
- it reduces the number of data copies between user-space and kernel buffers
|
||
|
|
||
|
This method is particularly suited to protocols in which data is sent till
|
||
|
the end of the session. This is the case for FTP data for instance, and it
|
||
|
is also the case for the BODY part of HTTP/1.0.
|
||
|
|
||
|
The great news is that haproxy has been designed from the beginning with a
|
||
|
clear distinction between the headers and the DATA phase, so it was a child's
|
||
|
game to add hooks to Alex's library in it
|
||
|
|
||
|
Be careful! Both versions are to be considered BETA software ! Run them on
|
||
|
your systems if you want, but do not complain if it crashes twice a day !
|
||
|
Anyway, it seems stable on our test machines.
|
||
|
|
||
|
In order to use TCP Splicing on haproxy, you need :
|
||
|
|
||
|
- Linux Layer7 Switching code version 0.1.1 : [ http://linux-l7sw.sf.net/ ]
|
||
|
- Haproxy version 1.3.5 : [ http://haproxy.1wt.eu/download/1.3/src/ ]
|
||
|
|
||
|
Then, you must untar both packages in any location, let's assume you'll
|
||
|
be using /tmp. First extract l7sw and :
|
||
|
|
||
|
$ cd /tmp
|
||
|
$ tar zxf layer7switch-0.1.1.tar.gz
|
||
|
$ cd layer7switch-0.1.1
|
||
|
|
||
|
L7SW currently only supports Linux kernel 2.6.19+. If you prefer to use it
|
||
|
on a more stable kernel, such as 2.6.16.X, you can apply this patch to the
|
||
|
L7SW directory :
|
||
|
|
||
|
[ http://haproxy.1wt.eu/download/patches/tcp_splice-0.1.1-linux-2.6.16.diff ]
|
||
|
|
||
|
$ patch -p1 -d kernel < tcp_splice-0.1.1-linux-2.6.16.diff
|
||
|
|
||
|
Alternatively, if you prefer to run it on 2.4.33+, you can apply this patch
|
||
|
to the L7SW directory :
|
||
|
|
||
|
[ http://haproxy.1wt.eu/download/patches/tcp_splice-0.1.1-linux-2.4.33.diff ]
|
||
|
|
||
|
$ patch -p1 -d kernel < tcp_splice-0.1.1-linux-2.4.33.diff
|
||
|
|
||
|
Then build the kernel module as described in the L7SW README. Basically, you
|
||
|
just have to do this once your tree has been patched :
|
||
|
|
||
|
$ cd kernel
|
||
|
$ make
|
||
|
|
||
|
You can either install the resulting module (tcp_splice) or load it now. During
|
||
|
early testing periods, it might be preferable to avoid installing anything and
|
||
|
just load it manually :
|
||
|
|
||
|
$ sudo insmod tcp_splice.*o
|
||
|
$ cd ..
|
||
|
|
||
|
Now that the module is loaded, you need to build the libtcpsplice library on
|
||
|
which haproxy currently relies :
|
||
|
|
||
|
$ cd userland/libtcpsplice
|
||
|
$ make
|
||
|
$ cd ..
|
||
|
|
||
|
For the adventurous, there's also a proof of concept in the userlan/switchd
|
||
|
directory, it may be useful if you encounter problems with haproxy for
|
||
|
instance. But it is not needed at all here.
|
||
|
|
||
|
OK, L7SW is ready. Now you have to extract haproxy and tell it to build using
|
||
|
libtcpsplice :
|
||
|
|
||
|
$ cd /tmp
|
||
|
$ tar zxf haproxy-1.3.5.tar.gz
|
||
|
$ cd haproxy-1.3.5
|
||
|
$ make USE_TCPSPLICE=1 TCPSPLICEDIR=/tmp/layer7switch-0.1.1/userland/libtcpsplice
|
||
|
|
||
|
There are other options to make, which are hugely recommended, such as
|
||
|
CPU=, REGEX=, and above all, TARGET= so that you use the best syscalls and
|
||
|
functions for your system. Generally you will use TARGET=linux26, but 2.4 users
|
||
|
with an epoll-patched kernel will use TARGET=linux24e. This is very important
|
||
|
because failing to specify those options will disable important optimizations
|
||
|
which might hide the tcpsplice benefits ! Please consult the haproxy's README.
|
||
|
|
||
|
Now that you have haproxy built with support for tcpsplice, and that the module
|
||
|
is loaded, you have to write a config. There is an example in the 'examples'
|
||
|
directory. Basically, you just have to add the "option tcpsplice" keyword BOTH
|
||
|
in the frontend AND in the backend sections that you want to accelerate.
|
||
|
|
||
|
If the option is specified only in the frontend or in the backend, then no
|
||
|
acceleration will be used. It is designed this way to allow some front-back
|
||
|
combinations to use it without forcing others to use it. Of course, if you use
|
||
|
a single "listen" section, you just have to specify it once.
|
||
|
|
||
|
As of now (l7sw-0.1.1 and haproxy-1.3.5), you need the CAP_NETADMIN capability
|
||
|
to START and to RUN. For human beings, it means that you have to start haproxy
|
||
|
as root and keep it running as root, so it must not drop its priviledges. This
|
||
|
is somewhat annoying, but we'll try to find a solution later.
|
||
|
|
||
|
Also, l7sw-0.1.1 does not yet support TCP window scaling nor SACK. So you have
|
||
|
to disable both features on the proxy :
|
||
|
|
||
|
$ sudo sysctl -w net.ipv4.tcp_window_scaling=0
|
||
|
$ sudo sysctl -w net.ipv4.tcp_sack=0
|
||
|
$ sudo sysctl -w net.ipv4.tcp_dsack=0
|
||
|
$ sudo sysctl -w net.ipv4.tcp_tw_recycle=1
|
||
|
|
||
|
You can now check that everything works as expected. Run "vmstat 1" or "top"
|
||
|
in one terminal, and haproxy in another one :
|
||
|
|
||
|
$ sudo ./haproxy -f examples/tcp-splicing-sample.cfg
|
||
|
|
||
|
Transfering large file through it should not affect it much. You should observe
|
||
|
something like 10% CPU instead of 95% when transferring 1 MB files at full
|
||
|
speed. You can play with the tcpsplice option in the configuration to see the
|
||
|
effects.
|
||
|
|
||
|
|
||
|
Troubleshooting
|
||
|
---------------
|
||
|
|
||
|
This software is still beta, and you will probably encounter some caveats.
|
||
|
I personnally ran into a few issues that we'll try to address with Alex. First
|
||
|
of all, I had occasionnal lockups on my SMP machine which I never had on an UP
|
||
|
one. So if you get problems on an SMP machine, please reboot it in UP and do
|
||
|
not lose your time on this.
|
||
|
|
||
|
I also noticed that sometimes, some sessions remained established even after
|
||
|
the end of the program. You might also see some situtations where even after
|
||
|
the proxy's exit, the traffic still passes through the system. It may happen
|
||
|
when you have a limited source port range and that you reuse a TIME_WAIT
|
||
|
session matching exactly the same source and destinations. This will need
|
||
|
to be addressed too.
|
||
|
|
||
|
You can play with tcp_splice variables and timeouts here in /proc/sys/net/ :
|
||
|
|
||
|
$ ls /proc/sys/net/tcp_splice/
|
||
|
debug_level timeout_established timeout_listen timeout_synsent
|
||
|
timeout_close timeout_finwait timeout_synack timeout_timewait
|
||
|
timeout_closewait timeout_lastack timeout_synrecv
|
||
|
|
||
|
$ sysctl net/tcp_splice
|
||
|
net.tcp_splice.debug_level = 0
|
||
|
net.tcp_splice.timeout_synack = 120
|
||
|
net.tcp_splice.timeout_listen = 120
|
||
|
net.tcp_splice.timeout_lastack = 30
|
||
|
net.tcp_splice.timeout_closewait = 60
|
||
|
net.tcp_splice.timeout_close = 10
|
||
|
net.tcp_splice.timeout_timewait = 120
|
||
|
net.tcp_splice.timeout_finwait = 120
|
||
|
net.tcp_splice.timeout_synrecv = 60
|
||
|
net.tcp_splice.timeout_synsent = 120
|
||
|
net.tcp_splice.timeout_established = 900
|
||
|
|
||
|
You can also consult the full session list here :
|
||
|
|
||
|
$ head /proc/net/tcp_splice_conn
|
||
|
FromIP FPrt ToIP TPrt LocalIP LPrt DestIP DPrt State Expires
|
||
|
0A000301 4EBB 0A000302 1F40 0A000302 817B 0A000301 0050 CLOSE 7
|
||
|
0A000301 4E9B 0A000302 1F40 0A000302 8165 0A000301 0050 CLOSE 7
|
||
|
|
||
|
Since a session exists at least in CLOSE state for 10 seconds, you just have
|
||
|
to consult this entry less than 10 seconds after a test to see a session.
|
||
|
|
||
|
Please report your successes, failures, suggestions or fixes to the L7SW
|
||
|
mailing list here (do not use the list to report other haproxy bugs) :
|
||
|
|
||
|
https://lists.sourceforge.net/lists/listinfo/linux-l7sw-devel
|
||
|
|
||
|
|
||
|
Motivations
|
||
|
-----------
|
||
|
|
||
|
I've always wanted haproxy to be the fastest and most reliable software load
|
||
|
balancer available. L7SW is an opportunity to make get a huge performance boost
|
||
|
on high traffic sites (eg: photo sharing, streaming, ...). In turn, I find it a
|
||
|
shame that Alex wastes his time redevelopping a proxy as a proof of concept for
|
||
|
his kernel code. While it is a fun game to enter into, it really becomes harder
|
||
|
when you need to get close to customers' needs. So by porting haproxy early to
|
||
|
L7SW, I get both the opportunity to get an idea of what it will soon be capable
|
||
|
of, and help Alex spend more time on the complex kernel part.
|
||
|
|
||
|
Have fun !
|
||
|
Willy
|