haproxy public development tree
Willy Tarreau 5bd8c376ad [MAJOR] complete support for linux 2.6 kernel splicing
This code provides support for linux 2.6 kernel splicing. This feature
appeared in kernel 2.6.25, but initial implementations were awkward and
buggy. A kernel >= 2.6.29-rc1 is recommended, as well as some optimization
patches.

Using pipes, this code is able to pass network data directly between
sockets. The pipes are a bit annoying to manage (fd creation, release,
...) but finally work quite well.
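
As an illustration of the pipe-based transfer described above, here is a minimal sketch and not the code from this commit. The names relay_once() and relay_demo() are hypothetical, a pipe is created and released for every transfer, and AF_UNIX socket pairs stand in for real TCP connections. It assumes a Linux kernel recent enough to support splice() on such sockets.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not haproxy code: move up to <max> bytes from
 * src_fd to dst_fd through a freshly created pipe, so that the payload
 * never goes through a user-space buffer.
 * Returns the number of bytes forwarded, or -1 on error.
 */
ssize_t relay_once(int src_fd, int dst_fd, size_t max)
{
    int p[2];
    ssize_t in, out, done = 0;

    if (pipe(p) < 0)
        return -1;

    /* splice in: socket -> pipe */
    in = splice(src_fd, NULL, p[1], NULL, max, SPLICE_F_MOVE);

    /* splice out: pipe -> socket, until the pipe is drained */
    while (in > 0 && done < in) {
        out = splice(p[0], NULL, dst_fd, NULL, (size_t)(in - done),
                     SPLICE_F_MOVE);
        if (out <= 0)
            break;
        done += out;
    }

    /* the pipe fds must be released on every path */
    close(p[0]);
    close(p[1]);
    return (in < 0 || done < in) ? -1 : done;
}

/* Self-check: push 5 bytes through two socket pairs. Returns 0 on success. */
int relay_demo(void)
{
    int a[2], b[2];
    char buf[16] = { 0 };
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, a) < 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, b) < 0)
        return -1;
    if (write(a[0], "hello", 5) != 5)
        return -1;
    if (relay_once(a[1], b[0], 4096) != 5)
        return -1;

    n = read(b[1], buf, sizeof(buf));
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return (n == 5 && memcmp(buf, "hello", 5) == 0) ? 0 : -1;
}
```

Calling relay_demo() from a test program is enough to exercise both directions of the pipe.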

Preliminary tests show that on high bandwidths, there's a substantial
gain (approx +50%, only +20% with kernel workarounds for corruption
bugs). With 2000 concurrent connections, with Myricom NICs, haproxy
now more easily achieves 4.5 Gbps for 1 process and 6 Gbps for two
processes. 8-9 Gbps are easily reached with smaller numbers
of connections.

We also try to splice out immediately after a splice in by taking
advantage of the new ability for a data producer to notify the
consumer that data are available. Doing this ensures that the
data are immediately transferred between sockets without latency,
and without having to re-poll. Performance on small packets has
considerably increased due to this method.

Earlier kernels return only one TCP segment at a time in non-blocking
splice-in mode, while newer ones return as many segments as may fit in the
pipe. To work around this limitation without hurting more recent kernels,
we try to collect as much data as possible, but we stop when we believe
we have read 16 segments, then we forward everything at once. It also
ensures that even upon shutdown or EAGAIN the data will be forwarded.
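
The collection loop can be sketched as follows. This is only an illustration under stated assumptions: the byte-based hint COLLECT_HINT (16 segments of 1448 bytes) is a guess at how "16 segments" might be estimated, the names collect_segments() and collect_demo() are hypothetical, and AF_UNIX socket pairs replace TCP sockets.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed hint, not taken from the commit: 16 segments of 1448 bytes. */
#define COLLECT_HINT (16 * 1448)

/* Hypothetical helper: splice from src_fd into the pipe's write end until
 * no more data is available, the source is shut down, or about 16
 * segments are pending. Returns the number of bytes waiting in the pipe.
 */
size_t collect_segments(int src_fd, int pipe_wr)
{
    size_t pending = 0;

    while (pending < COLLECT_HINT) {
        ssize_t ret = splice(src_fd, NULL, pipe_wr, NULL,
                             COLLECT_HINT - pending,
                             SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
        if (ret <= 0)
            break;  /* EAGAIN, shutdown (0) or error: stop collecting */
        pending += (size_t)ret;
    }
    return pending; /* the caller forwards all of it at once */
}

/* Self-check: three small writes are collected, then forwarded together.
 * Returns the number of bytes forwarded, or -1 on failure.
 */
int collect_demo(void)
{
    int s[2], d[2], p[2], i;
    char chunk[100];
    size_t pending, sent = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, s) < 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, d) < 0 || pipe(p) < 0)
        return -1;

    /* the source must never block once it has been drained */
    fcntl(s[1], F_SETFL, fcntl(s[1], F_GETFL) | O_NONBLOCK);

    memset(chunk, 'x', sizeof(chunk));
    for (i = 0; i < 3; i++)
        if (write(s[0], chunk, sizeof(chunk)) != (ssize_t)sizeof(chunk))
            return -1;

    pending = collect_segments(s[1], p[1]);

    /* forward everything that was collected */
    while (sent < pending) {
        ssize_t out = splice(p[0], NULL, d[0], NULL, pending - sent,
                             SPLICE_F_MOVE);
        if (out <= 0)
            break;
        sent += (size_t)out;
    }

    close(s[0]); close(s[1]); close(d[0]); close(d[1]);
    close(p[0]); close(p[1]);
    return (sent == pending) ? (int)pending : -1;
}
```

Because the loop leaves on EAGAIN or on a zero return as well as on the hint, whatever was collected is still handed to the caller for forwarding.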

Some tricks were necessary because the splice() syscall does not
distinguish between missing data and a full pipe: it returns EAGAIN in
both cases. The trick consists in stopping polling when EAGAIN is
returned and the pipe is not empty.
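
The decision can be written as a small pure function. This is a sketch only: the enum and the name classify_splice_in() are hypothetical and do not come from the haproxy sources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical result codes for one splice-in attempt. */
enum splice_in_action {
    SPLICE_IN_CONTINUE,   /* data was read, the caller may try again      */
    SPLICE_IN_WAIT_DATA,  /* nothing to read yet: keep polling the socket */
    SPLICE_IN_STOP_POLL,  /* pipe may be full: stop polling, flush first  */
    SPLICE_IN_SHUTDOWN,   /* the peer closed its side                     */
    SPLICE_IN_ERROR       /* any other error                              */
};

/* <ret> and <err> are the return value and errno of a splice() call whose
 * output is the pipe; <pipe_bytes> is the amount of data already pending
 * in that pipe.
 */
enum splice_in_action classify_splice_in(ssize_t ret, int err,
                                         size_t pipe_bytes)
{
    if (ret > 0)
        return SPLICE_IN_CONTINUE;
    if (ret == 0)
        return SPLICE_IN_SHUTDOWN;
    if (err == EAGAIN)
        /* EAGAIN alone is ambiguous, so the pipe state decides */
        return pipe_bytes ? SPLICE_IN_STOP_POLL : SPLICE_IN_WAIT_DATA;
    return SPLICE_IN_ERROR;
}
```

With an empty pipe, EAGAIN can only mean that the socket has no data, so polling goes on. With a non-empty pipe the cause is unknown, and flushing the pipe first is the safe choice.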

The receiver waits for the buffer to be empty before using the pipe.
This is in order to avoid confusion between buffer data and pipe data.
The BF_EMPTY flag now covers the pipe too.
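
A simplified model of these two rules is shown here. The structure and function names are hypothetical and much smaller than the real buffer structure. It only shows that splicing waits for the buffer to drain, and that "empty" means nothing is pending in either place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one direction of a session. */
struct chan_sketch {
    size_t buf_bytes;   /* bytes held in the regular buffer */
    size_t pipe_bytes;  /* bytes held in the kernel pipe    */
};

/* The receiver may only use the pipe once the regular buffer is empty,
 * so that buffer data and pipe data are never mixed up.
 */
int may_splice_in(const struct chan_sketch *c)
{
    assert(c != NULL);
    return c->buf_bytes == 0;
}

/* "Empty" covers the pipe too: nothing pending in the buffer or the pipe. */
int chan_is_empty(const struct chan_sketch *c)
{
    assert(c != NULL);
    return c->buf_bytes == 0 && c->pipe_bytes == 0;
}
```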

Right now the code is disabled by default. It needs to be built with
CONFIG_HAP_LINUX_SPLICE, and the instances intended to use splice()
must have "option splice-response" (or "option splice-request") enabled.

It is probably desirable to keep a pool of pre-allocated pipes to
avoid having to create them for every session. This will be worked
on later.

Preliminary tests show very good results, even with the kernel
workaround causing one memcpy(). At 3000 connections, performance
has moved from 3.2 Gbps to 4.7 Gbps.
2009-01-19 00:32:22 +01:00
contrib/netsnmp-perl [MAJOR] proto_uxst rework -> SNMP support 2008-03-04 06:32:16 +01:00
doc [BUG] "option transparent" is for backend, not frontend ! 2008-12-23 23:13:55 +01:00
examples [RELEASE] Released version 1.3.15 2008-04-19 21:25:12 +02:00
include [MINOR] introduce structures required to support Linux kernel splicing 2009-01-18 21:56:21 +01:00
src [MAJOR] complete support for linux 2.6 kernel splicing 2009-01-19 00:32:22 +01:00
tests [MINOR] redirect: in prefix mode a "/" means not to change the URI 2008-12-07 23:48:39 +01:00
.gitignore [CLEANUP] update .gitignore to ignore more temporary files 2008-03-07 09:39:37 +01:00
CHANGELOG [RELEASE] Released version 1.3.15 2008-04-19 21:25:12 +02:00
CONTRIB [DOC] Update a "contrib" file with a hint about a scheme used for formathing subjects 2008-02-04 21:34:59 +01:00
LICENSE [LICENSE] licensing clarifications 2006-06-15 21:48:13 +02:00
Makefile [BUILD] fix MANDIR default location to match documentation 2008-12-07 23:37:20 +01:00
Makefile.bsd [MINOR] store the build options to report with -vv 2007-12-02 11:28:59 +01:00
Makefile.osx [BUILD] update MacOS Makefile to build on newer versions 2008-02-10 17:00:13 +01:00
README [DOC] update the README file with new build options 2008-05-25 10:32:50 +02:00
ROADMAP [MEDIUM] implemented the 'monitor-uri' keyword. 2006-07-09 17:01:40 +02:00
SUBVERS [BUILD] centralize version and date into one file for each 2007-09-09 23:31:11 +02:00
TODO [MEDIUM] Implement "track [<backend>/]<server>" 2008-02-27 10:39:53 +01:00
VERDATE [RELEASE] Released version 1.3.15 2008-04-19 21:25:12 +02:00
VERSION [RELEASE] Released version 1.3.15 2008-04-19 21:25:12 +02:00

                           -------------------
                             H A - P r o x y
                             How to build it
                           -------------------
                              version 1.3.15
                              willy tarreau
                                2008/05/25


To build haproxy, you will need :
  - GNU make. Neither Solaris' nor OpenBSD's make works with this makefile.
    However, specific Makefiles for BSD and OSX are provided.
  - GCC between 2.91 and 4.3. Others may work, but have not been tested.
  - GNU ld

Also, you might want to build with libpcre support, which will provide a very
efficient regex implementation and will also avoid some bugs in Solaris's own.

To build haproxy, you have to choose your target OS amongst the following ones
and assign it to the TARGET variable :

  - linux22     for Linux 2.2
  - linux24     for Linux 2.4 and above (default)
  - linux24e    for Linux 2.4 with support for a working epoll (> 0.21)
  - linux24eold for Linux 2.4 with support for a broken  epoll (<= 0.21)
  - linux26     for Linux 2.6 and above
  - solaris     for Solaris 8 or 10 (others untested)
  - freebsd     for FreeBSD 5 to 6.2 (others untested)
  - openbsd     for OpenBSD 3.1 to 3.7 (others untested)
  - generic     for any other OS.
  - custom      to manually adjust every setting

You may also choose your CPU to benefit from some optimizations. This is
particularly important on UltraSparc machines. For this, you can assign
one of the following choices to the CPU variable :

  - i686       for Intel PentiumPro, Pentium 2 and above, AMD Athlon
  - i586       for Intel Pentium, AMD K6, VIA C3
  - ultrasparc for Sun UltraSparc I/II/III/IV processors
  - generic    for any other processor or no specific optimization (default)

Alternatively, you may just set the CPU_CFLAGS value to the optimal GCC options
for your platform.

If your system supports PCRE (Perl Compatible Regular Expressions), then you
really should build with libpcre, which is between 2 and 10 times faster than
the regex implementations shipped with libc. Regexes are used for header
processing (deletion, rewriting, allow, deny). The only drawback of libpcre is
that it is not yet widely deployed, so if you build for other systems, you
might get into
trouble if they don't have the dynamic library. In this situation, you should
statically link libpcre into haproxy so that it will not be necessary to
install it on target systems. Available build options for PCRE are :

  - USE_PCRE=1 to use libpcre, in whatever form is available on your system
    (shared or static)

  - USE_STATIC_PCRE=1 to use a static version of libpcre even if the dynamic
    one is available. This will enhance portability.

  - with no option, use your OS libc's standard regex implementation (default).
    Warning! group references on Solaris seem broken. Use static-pcre whenever
    possible.

By default, the DEBUG variable is set to '-g' to enable debug symbols. It is
not wise to disable it on uncommon systems, because it's often the only way to
get a complete core when you need one. Otherwise, you can set DEBUG to '-s' to
strip the binary.

For example, I use this to build for Solaris 8 :

    $ make TARGET=solaris CPU=ultrasparc USE_STATIC_PCRE=1

And I build it this way on OpenBSD or FreeBSD :

    $ make -f Makefile.bsd REGEX=pcre DEBUG= COPTS.generic="-Os -fomit-frame-pointer -mgnu"

If you need to pass other defines, includes, libraries, etc., then please
check the Makefile to see which ones will be available in your case, and
use the USE_* variables in the GNU Makefile, or ADDINC, ADDLIB, and DEFINE
variables in the BSD makefiles.

-- end