When trying to build with various versions of openssl, forcing the
path is still cumbersome. Let's add SSL_INC and SSL_LIB similar to
PCRE_INC and PCRE_LIB to allow forcing the path to the SSL includes
and libs.
JIT was introduced in 8.20 but it's said everywhere that it was
significantly improved in 8.32. Let's not tempt users of older
versions then. BTW the patch was developped on 8.32.
haproxy -vv shows build informations about USE flags and lib versions.
This patch introduces informations about PCRE and the new JIT feature.
It also makes USE_PCRE_JIT=1 appear in the haproxy -vv "OPTIONS".
This is useful since with the introduction of JIT we will see libpcre
related issues.
The file acl.c is a real mess, it both contains functions to parse and
process ACLs, and some sample extraction functions which act on buffers.
Some other payload analysers were arbitrarily dispatched to proto_tcp.c.
So now we're moving all payload-based fetches and ACLs to payload.c
which is capable of extracting data from buffers and rely on everything
that is protocol-independant. That way we can safely inflate this file
and only use the other ones when some fetches are really specific (eg:
HTTP, SSL, ...).
As a result of this cleanup, the following new sample fetches became
available even if they're not really useful :
always_false, always_true, rep_ssl_hello_type, rdp_cookie_cnt,
req_len, req_ssl_hello_type, req_ssl_sni, req_ssl_ver, wait_end
The function 'acl_fetch_nothing' was wrong and never used anywhere so it
was removed.
The "rdp_cookie" sample fetch used to have a mandatory argument while it
was optional in ACLs, which are supposed to iterate over RDP cookies. So
we're making it optional as a fetch too, and it will return the first one.
TCP Fast Open is supported in server mode since Linux 3.7, but current
libc's don't define TCP_FASTOPEN=23. Introduce the new USE flag USE_TFO
to define it manually in compat.h. Also note this in the TFO related
documentation.
The "osx" target may now be passed in the TARGET variable. It supports
the same features as FreeBSD and allows its users to use the GNU makefile
instead of the platform-specific makefile which lacks some features.
This allows to build haproxy for unknown targets and still have poll().
If for any reason a target does not support it, just passing USE_POLL=""
disables it.
Currently when cross-compiling, it's generally necessary to force
PCREDIR which the Makefile automatically appends /include and /lib to.
Unfortunately on most 64-bit linux distros, the lib path is instead
/lib64, which is really annoying to fix in the makefile.
So now we're computing PCRE_INC and PCRE_LIB from PCREDIR and using
these ones instead. If one wants to force paths individually, it is
possible to set them instead of setting PCREDIR. The old behaviour
of not passing anything to the compiler when PCREDIR is forced to blank
is conserved.
Currently, to reload haproxy configuration, you have to use "-sf".
There is a problem with this way of doing things. First of all, in the systemd world,
reload commands should be "oneshot" ones, which means they should not be the new main
process but rather a tool which makes a call to it and then exits. With the current approach,
the reload command is the new main command and moreover, it makes the previous one exit.
Systemd only tracks the main program, seeing it ending, it assumes it either finished or failed,
and kills everything remaining as a grabage collector. We then end up with no haproxy running
at all.
This patch adds wrapper around haproxy, no changes at all have been made into it,
so it's not intrusive and doesn't change anything for other hosts. What this wrapper does
is basically launching haproxy as a child, listen to the SIGUSR2 (not to conflict with
haproxy itself) signal, and spawing a new haproxy with "-sf" as a child to relay the
first one.
Signed-off-by: Marc-Antoine Perennou <Marc-Antoine@Perennou.com>
Versions of splice between 2.6.25 and 2.6.27.12 were bogus and would return EAGAIN
on incoming shutdowns. On these versions, we have to call recv() after such a return
in order to find whether splice is OK or not. Since 2.6.27.13 we don't need to do
this anymore, saving one useless recv() call after each splice() returning EAGAIN,
and we can avoid this logic by defining ASSUME_SPLICE_WORKS.
Building with linux2628 automatically enables splice and the flag above since the
kernel is safe. People enabling splice for custom kernels will be able to disable
this logic by hand too.
The new "cpu-map" directive allows one to assign the CPU sets that
a process is allowed to bind to. This is useful in combination with
the "nbproc" and "bind-process" directives.
The support is implicit on Linux 2.6.28 and above.
Now that all pollers make use of speculative I/O, there is no point
having two epoll implementations, so replace epoll with the sepoll code
and remove sepoll which has just become the standard epoll method.
This commit introduces HTTP compression using the zlib library.
http_response_forward_body has been modified to call the compression
functions.
This feature includes 3 algorithms: identity, gzip and deflate:
* identity: this is mostly for debugging, and it was useful for
developping the compression feature. With Content-Length in input, it
is making each chunk with the data available in the current buffer.
With chunks in input, it is rechunking, the output chunks will be
bigger or smaller depending of the size of the input chunk and the
size of the buffer. Identity does not apply any change on data.
* gzip: same as identity, but applying a gzip compression. The data
are deflated using the Z_NO_FLUSH flag in zlib. When there is no more
data in the input buffer, it flushes the data in the output buffer
(Z_SYNC_FLUSH). At the end of data, when it receives the last chunk in
input, or when there is no more data to read, it writes the end of
data with Z_FINISH and the ending chunk.
* deflate: same as gzip, but with deflate algorithm and zlib format.
Note that this algorithm has ambiguous support on many browsers and
no support at all from recent ones. It is strongly recommended not
to use it for anything else than experimentation.
You can't choose the compression ratio at the moment, it will be set to
Z_BEST_SPEED (1), as tests have shown very little benefit in terms of
compression ration when going above for HTML contents, at the cost of
a massive CPU impact.
Compression will be activated depending of the Accept-Encoding request
header. With identity, it does not take care of that header.
To build HAProxy with zlib support, use USE_ZLIB=1 in the make
parameters.
This work was initially started by David Du Colombier at Exceliance.
On Linux, accept4() does the same as accept() except that it allows
the caller to specify some flags to set on the resulting socket. We
use this to set the O_NONBLOCK flag and thus to save one fcntl()
call in each connection. The effect is a small performance gain of
around 1%.
The option is automatically enabled when target linux2628 is set, or
when the USE_ACCEPT4 Makefile variable is set. If the libc is too old
to provide the equivalent function, this is automatically detected and
our own function is used instead. In any case it is possible to force
the use of our implementation with USE_MY_ACCEPT4.
These ones are used to set the default ciphers suite on "bind" lines and
"server" lines respectively, instead of using OpenSSL's defaults. These
are probably mainly useful for distro packagers.
It removes dependencies with futex or mutex but ssl performances decrease
using nbproc > 1 because switching process force session renegotiation.
This can be useful on small systems which never intend to run in multi-process
mode.
This SSL session cache was developped at Exceliance and is the same that
was proposed for stunnel and stud. It makes use of a shared memory area
between the processes so that sessions can be handled by any process. It
is only useful when haproxy runs with nbproc > 1, but it does not hurt
performance at all with nbproc = 1. The aim is to totally replace OpenSSL's
internal cache.
The cache is optimized for Linux >= 2.6 and specifically for x86 platforms.
On Linux/x86, it makes use of futexes for inter-process locking, with some
x86 assembly for the locked instructions. On other architectures, GCC
builtins are used instead, which are available starting from gcc 4.1.
On other operating systems, the locks fall back to pthread mutexes so
libpthread is automatically linked. It is not recommended since pthreads
are much slower than futexes. The lib is only linked if SSL is enabled.
When this flag is set, the SSL data layer is enabled.
At the moment, only the GNU makefile was touched, the other ones
make the option handling a bit tricky.
The "raw_sock" prefix will be more convenient for naming functions as
it will be prefixed with the data layer and suffixed with the data
direction. So let's rename the files now to avoid any further confusion.
The #include directive was also removed from a number of files which do
not need it anymore.
This feature relies on GCC's ability to call helpers at function entry/exit
points. We define these helpers to quickly dump the minimum info into a trace
file that can be converted to a human readable format using a script in the
contrib/trace directory. This has only been implemented in the GNU makefile
for now on as it is unsure whether it's supported on all OSes.
The feature is enabled by building with "TRACE=1". The performance impact is
huge, so this feature should only be used when debugging. To limit the loss
of performance, fprintf() has been disabled and the output is hand-crafted
and emitted using fwrite(), resulting in doubling the performance. Using the
TSC instead of gettimeofday() also doubles the performance. Around 1200 conns/s
may be achieved on a Pentium-M 1.7 GHz which leads to around 50 MB/s of traces.
The entry and exits of all functions will be dumped into a file designated
by the HAPROXY_TRACE environment variable, or by default "trace.out". If the
trace file name is empty or "/dev/null", then traces are disabled. If
opening the trace file fails, then stderr is used. If HAPROXY_TRACE_FAST is
used, then the time is taken from the global <now> variable. Last, if
HAPROXY_TRACE_TSC is used, then the machine's TSC is used instead of the
real time (almost twice as fast).
The output format is :
<sec.usec> <level> <caller_ptr> <dir> <callee_ptr>
or :
<tsc> <level> <caller_ptr> <dir> <callee_ptr>
where <dir> is '>' when entering a function and '<' when leaving.
The awk script in contrib/trace provides a nicer indented output :
6f74989e6f8 ->->-> run_poll_loop > signal_process_queue [src/haproxy.c:1097:0x804bd69] > [include/proto/signal.h:32:0x8049cd0]
6f74989eb00 run_poll_loop < signal_process_queue [src/haproxy.c:1097:0x804bd69] < [include/proto/signal.h:32:0x8049cd0]
6f74989ef44 ->->-> run_poll_loop > wake_expired_tasks [src/haproxy.c:1100:0x804bd72] > [src/task.c:123:0x8055060]
6f74989f3a6 ->->->-> wake_expired_tasks > eb32_lookup_ge [src/task.c:128:0x8055091] > [ebtree/eb32tree.c:138:0x80a8c70]
6f74989f7e9 wake_expired_tasks < eb32_lookup_ge [src/task.c:128:0x8055091] < [ebtree/eb32tree.c:138:0x80a8c70]
6f74989fc0d ->->->-> wake_expired_tasks > eb32_first [src/task.c:134:0x80550d5] > [ebtree/eb32tree.h:55:0x8054ad0]
6f7498a003d ->->->->-> eb32_first > eb_first [ebtree/eb32tree.h:56:0x8054af1] > [ebtree/ebtree.h:520:0x8054a10]
6f7498a0436 ->->->->->-> eb_first > eb_walk_down [ebtree/ebtree.h:521:0x8054a33] > [ebtree/ebtree.h:442:0x80549a0]
6f7498a0843 ->->->->->->-> eb_walk_down > eb_gettag [ebtree/ebtree.h:445:0x80549d6] > [ebtree/ebtree.h:418:0x80548e0]
6f7498a0c2b eb_walk_down < eb_gettag [ebtree/ebtree.h:445:0x80549d6] < [ebtree/ebtree.h:418:0x80548e0]
6f7498a1042 ->->->->->->-> eb_walk_down > eb_untag [ebtree/ebtree.h:447:0x80549e2] > [ebtree/ebtree.h:412:0x80548a0]
6f7498a1498 eb_walk_down < eb_untag [ebtree/ebtree.h:447:0x80549e2] < [ebtree/ebtree.h:412:0x80548a0]
6f7498a18c6 ->->->->->->-> eb_walk_down > eb_root_to_node [ebtree/ebtree.h:448:0x80549e7] > [ebtree/ebtree.h:432:0x8054960]
6f7498a1cd4 eb_walk_down < eb_root_to_node [ebtree/ebtree.h:448:0x80549e7] < [ebtree/ebtree.h:432:0x8054960]
6f7498a20c4 eb_first < eb_walk_down [ebtree/ebtree.h:521:0x8054a33] < [ebtree/ebtree.h:442:0x80549a0]
6f7498a24b4 eb32_first < eb_first [ebtree/eb32tree.h:56:0x8054af1] < [ebtree/ebtree.h:520:0x8054a10]
6f7498a289c wake_expired_tasks < eb32_first [src/task.c:134:0x80550d5] < [ebtree/eb32tree.h:55:0x8054ad0]
6f7498a2c8c run_poll_loop < wake_expired_tasks [src/haproxy.c:1100:0x804bd72] < [src/task.c:123:0x8055060]
6f7498a3095 ->->-> run_poll_loop > process_runnable_tasks [src/haproxy.c:1103:0x804bd7a] > [src/task.c:190:0x8055150]
A nice improvement would possibly consist in trying to get the function's
arguments in the stack and to dump a few more infor for some well-known
functions (eg: the session's status for process_session).
We'll soon have an SSL socket layer, and in order to ease the difference
between the two, we use the name "sock_raw" to designate the one which
directly talks to the sockets without any conversion.
make_arg_list() builds an array of typed arguments with their values,
that the caller describes how to parse. This will be used to support
multiple arguments for ACLs and patterns, which is currently problematic
and prevents ACLs and patterns from being merged. Up to 7 arguments types
may be enumerated in a single 32-bit word, including their number of
mandatory parts.
At the moment, these files are not used yet, they're only built. Note that
the 4-bit encoding for the type has left only one unused type!
The principle behind this load balancing algorithm was first imagined
and modeled by Steen Larsen then iteratively refined through several
work sessions until it would totally address its original goal.
The purpose of this algorithm is to always use the smallest number of
servers so that extra servers can be powered off during non-intensive
hours. Additional tools may be used to do that work, possibly by
locally monitoring the servers' activity.
The first server with available connection slots receives the connection.
The servers are choosen from the lowest numeric identifier to the highest
(see server parameter "id"), which defaults to the server's position in
the farm. Once a server reaches its maxconn value, the next server is used.
It does not make sense to use this algorithm without setting maxconn. Note
that it can however make sense to use minconn so that servers are not used
at full load before starting new servers, and so that introduction of new
servers requires a progressively increasing load (the number of servers
would more or less follow the square root of the load until maxconn is
reached). This algorithm ignores the server weight, and is more beneficial
to long sessions such as RDP or IMAP than HTTP, though it can be useful
there too.
Some older libc don't define splice() and and don't define _syscall*()
either, which causes build errors if splicing is enabled.
To solve this, we now split the syscall redefinition into two layers :
- one file per syscall (epoll, splice)
- one common file to declare the _syscall*() macros
The code is cleaner because files using the syscalls just have to include
their respective file. It's not adviced to merge multiple syscall families
into a same file if all are not intended to be used simultaneously, because
defining unused static functions causes warnings to be emitted during build.
As a result, the new USE_MY_SPLICE parameter was added in order to be able
to define the splice() syscall separately.
Gcc 4.4 enables strict aliasing by default, resuling in complaints
when casting struct sockaddr_storage to sockaddr_in. Not only doing
this does not provide any noticeable performance improvement, it also
presents a risk of strange bugs even when the compiler does not emit
a warning, so let's disable this optimization !
Hank A. Paulson suggested to add CPU=native to optimize the code for
the build machine. This makes sense in a lot of situations. Since it
is often possible to have both 32 and 64 bits supported on recent
systems, the ARCH=32 and ARCH=64 build options were also added.
Some distros' libc are built for CPUs earlier than i686 and as such do
not offer support for Linux kernel's faster vsyscalls. This code adds
a new build option USE_VSYSCALLS to bypass libc for most commonly used
system calls. A net gain of about 10% can be observed with this change
alone.
It only works when /proc/sys/abi/vsyscall32 equals exactly 2. When it's
set to 1, the VDSO is randomized and cannot be used.
The 'client.c' file now only contained frontend-specific functions,
so it has naturally be renamed 'frontend.c'. Same for client.h. This
has also been an opportunity to remove some cross references from
files that should not have depended on it.
In the end, this file should contain a protocol-agnostic accept()
code, which would initialize a session, task, etc... based on an
accept() from a lower layer. Right now there are still references
to TCP.
Holger Just and Ross West reported build issues on FreeBSD and
Solaris that were initially caused by the definition of
_XOPEN_SOURCE at the top of auth.c, which was required on Linux
to avoid a build warning.
Krzysztof Oledzki found that using _GNU_SOURCE instead also worked
on Linux and did not cause any issue on several versions of FreeBSD.
Solaris still reported a warning this time, which was fixed by
including <crypt.h>, which itself is not present on FreeBSD nor on
all Linux toolchains.
So by adding a new build option (NEED_CRYPT_H), we can get Solaris
to get crypt() working and stop complaining at the same time, without
impacting other platforms.
This fix was tested at least on several linux toolchains (at least
uclibc, glibc 2.2.5, 2.3.6 and 2.7), on FreeBSD 4 to 8, Solaris 8
(which needs crypt.h), and AIX 5.3 (without crypt.h).
Every time it builds without a warning.
Add generic authentication & authorization support.
Groups are implemented as bitmaps so the count is limited to
sizeof(int)*8 == 32.
Encrypted passwords are supported with libcrypt and crypt(3), so it is
possible to use any method supported by your system. For example modern
Linux/glibc instalations support MD5/SHA-256/SHA-512 and of course classic,
DES-based encryption.
It's a pain to enable regparm because ebtree is built in its corner
and does not depend on the rest of the config. This causes no problem
except that if the regparm settings are not exactly similar, then we
can get inconsistent function interfaces and crashes.
One solution realized in this patch consists in externalizing all
compiler settings and changing CONFIG_XXX_REGPARM into CONFIG_REGPARM
so that we ensure that any sub-component uses the same setting. Since
ebtree used a value here and not a boolean, haproxy's config has been
set to use a number too. Both haproxy's core and ebtree currently use
the same copy of the compiler.h file. That way we don't have any issue
anymore when one setting changes somewhere.
All files referencing the previous ebtree code were changed to point
to the new one in the ebtree directory. A makefile variable (EBTREE_DIR)
is also available to use files from another directory.
The ability to build the libebtree library temporarily remains disabled
because it can have an impact on some existing toolchains and does not
appear worth it in the medium term if we add support for multi-criteria
stickiness for instance.
Consistent hashing provides some interesting advantages over common
hashing. It avoids full redistribution in case of a server failure,
or when expanding the farm. This has a cost however, the hashing is
far from being perfect, as we associate a server to a request by
searching the server with the closest key in a tree. Since servers
appear multiple times based on their weights, it is recommended to
use weights larger than approximately 10-20 in order to smoothen
the distribution a bit.
In some cases, playing with weights will be the only solution to
make a server appear more often and increase chances of being picked,
so stats are very important with consistent hashing.
In order to indicate the type of hashing, use :
hash-type map-based (default, old one)
hash-type consistent (new one)
Consistent hashing can make sense in a cache farm, in order not
to redistribute everyone when a cache changes state. It could also
probably be used for long sessions such as terminal sessions, though
that has not be attempted yet.
More details on this method of hashing here :
http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/
It was becoming painful to have all the LB algos in backend.c.
Let's move them to their own files. A few hashing functions still
need be broken in two parts, one for the contents and one for the
map position.
This Linux-specific option was never really used in production and
has since been superseded by new splicing options brought by recent
Linux kernels.
It caused several particular cases in the code because the kernel
would take care of the session without haproxy being able to do
anything on it, which became hard to handle in the new architecture.
Let's simply get rid of it now that there is a replacement available.
Newer GIT versions do not support "git-cmd" anymore, so date and version
can be wrong during development builds. Use "git cmd" now. Also fix
git-tar to use "git archive" instead of "git-tar-tree".
By default, when building from a git tree, haproxy's release date is
set to the last commit's date. But it was the wrong date which was
used, the initial patch's date, which can cause time jumps in the
past when an old patch gets merged. What we want is the commit date,
which reflects the correct code history.
After considering various possibilities, we compiled haproxy under cygwin.
Attached is an updated full diff that also has the TARGET=cygwin documented.
The whole thing compiles and installs with this diff only.
In cygwin 1.7 (now in beta), there is apparently support for ipv6. Cygwin
1.5 (later versions, anyway) already includes some support in the form of a
define USE_IPV6. When defined, it declares the sockaddr_in6 struct and
possibly other things. The above definition AF_INET6=23 is taken from
their /usr/include/socket.h file (where it is #if 0'd out).
We are running into a socket limit. It appears that Cygwin (running on
Windows 2003 Server) will only allow us to set ulimit -n (maximum open
files) to 3200, which means we're a little short of 1600 connections.
The limit of 3200 is an internal Cygwin limit. Perhaps they can raise it in
the future. Using the nbproc option, I was able to bring up 10 servers. It
seems to me that they were able to handle over 2000 connections (even though
each had maxconn 1500 set, and the hard Cygwin fd limit).
When trying to build a 32-bit binary on a 64-bit platform, we generally
need to pass "-m32" to gcc, which is not convenient with current makefile.
Note that this option requires gcc >= 3.
In order to ease parameter passing, a new ARCH= makefile option has been
added. If it receives a target architecture, according "-m32"/"-m64" and
"-march=xxxx" will be passed to gcc. Only the generic makefile has been
changed to support this option right now as the need only appeared on Linux.
The spec file now makes use of this option so that rpmbuild can automatically
build with the proper architecture.
If both make parameters USE_PCRE and USE_STATIC_PCRE are set to 1
while building haproxy, pcre gets linked in dynamically.
Therefore we check if USE_STATIC_PCRE was explicitely enabled to
ommit the CFLAGS and LDFLAGS normally set if USE_PCRE is enabled.
With this change, all frontends, backends, and servers maintain a session
counter and a timer to compute a session rate over the last second. This
value will be very useful because it varies instantly and can be used to
check thresholds. This value is also reported in the stats in a new "rate"
column.
This will provide high performance data forwarding between sockets,
but it is broken on many kernels and will sometimes forward corrupted
data without some kernel patches. Consider this experimental for now.
A new data type has been added : pipes. Some pre-allocated empty pipes
are maintained in a pool for users such as splice which use them a lot
for very short times.
Pipes are allocated using get_pipe() and released using put_pipe().
Pipes which are released with pending data are immediately killed.
The struct pipe is small (16 to 20 bytes) and may even be further
reduced by unifying ->data and ->next.
It would be nice to have a dedicated cleanup task which would watch
for the pipes usage and destroy a few of them from time to time.
Tracking connection status changes was hard, and some code was
redundant. A new SI_ST_CER state was added to the stream interface
to indicate a past connection error, and an SI_FL_ERR flag was
added to report past I/O error. The stream_sock code does not set
the connection to SI_ST_CLO anymore in case of I/O error, it's
the upper layer which does it. This makes it possible to know
exactly when the file descriptors are allocated.
The new SI_ST_CER state permitted to split tcp_connection_status()
in two parts, one processing SI_ST_CON and the other one SI_ST_CER.
Synchronous connection errors now make use of this last state, hence
eliminating duplicate code.
Some ib<->ob copy paste errors were found and fixed, and all entities
setting SI_ST_CLO also shut the buffers down.
Some of these stream_interface specific functions and structures
have migrated to a new stream_interface.c file.
Some types of errors are still not detected by the buffers. For
instance, let's assume the following scenario in one single pass
of process_session: a connection sits in SI_ST_TAR state during
a retry. At TAR expiration, a new connection attempt is made, the
connection is obtained and srv->cur_sess is increased. Then the
buffer timeout is fires and everything is cleared, the new state
becomes SI_ST_CLO. The cleaning code checks that previous state
was either SI_ST_CON or SI_ST_EST to release the connection. But
that's wrong because last state is still SI_ST_TAR. So the
server's connection count does not get decreased.
This means that prev_state must not be used, and must be replaced
by some transition detection instead of level detection.
The following debugging line was useful to track state changes :
fprintf(stderr, "%s:%d: cs=%d ss=%d(%d) rqf=0x%08x rpf=0x%08x\n", __FUNCTION__, __LINE__,
s->si[0].state, s->si[1].state, s->si[1].err_type, s->req->flags, s-> rep->flags);
Reported by Cherife Li : just doing a "make install" fails because it
depends on "all" which is equivalent to "help" if no TARGET was specified.
Make it depend on "haproxy" instead.
haproxy relies on linking the binary using gcc, so there is no real need to
hardcode both (CC and LD). Setting 'LD = $(CC)' will make the build system
a bit more cross-compile friendly because only the right cross-compiler has
to be passed via make.
To be flexible while installing haproxy following variables have been
added to the Makefile:
- DESTDIR useful i.e. while installing in a sandbox (not set by default)
- PREFIX defines the default install prefix (default: /usr/local)
- SBINDIR defines the dir the haproxy binary gets installed
(default: $PREFIX/sbin)
Too often, people report performance issues on Linux 2.6 because they don't
use the available optimizations. We need to ensure that people are aware of
the available features, and for this, we must force them to choose a target
OS (or "generic"), but at least prevent them from blindly building for a
generic target.
Using some Linux kernel patches, it is possible to redirect non-local
traffic to local sockets when IP forwarding is enabled. In order to
enable this option, we introduce the "transparent" option keyword on
the "bind" command line. It will make the socket reachable by remote
sources even if the destination address does not belong to the machine.
The build process was getting annoying under some conditions,
especially on platforms which are used to set CFLAGS, as well
as those which set a lot of complex defines. The new Makefile
takes care of this situation by not mixing TARGET, CPU and user
values, and by making privileging the pre-setting of common
variables with the ability to override them.
Now CFLAGS and LDFLAGS are set by default and may be overridden
without the risk of breaking useful defines. Options are better
dealt with, and as a bonus, it was possible to merge the FreeBSD
and OpenBSD targets into the common GNU Makefile.
The report of build options by "haproxy -vv" has been slightly
adapted to the new mode. Options implied by architecture are not
reported, only user-specified options are. It is also possible to
add options which will not be reported in order not to mangle the
output when specifying dirty informations such as URLs...
The Makefile was copiously documented and it should be easier to
build for any target now. Backwards compatibility with older
build processes was kept, and warnings are emitted for deprecated
build options.
Sometimes it is useful to find out how a given binary version was
built. The build compiler and options are now provided for this,
and it's possible to get them with the -vv option.
Proxy listeners were very special and not very easy to manipulate.
A proto_tcp file has been created with all that is required to
manage TCPv4/TCPv6 as raw protocols, and provide generic listeners.
The code of start_proxies() and maintain_proxies() now looks less
like spaghetti. Also, event_accept will need a serious lifting in
order to use more of the information provided by the listener.
A new file, proto_uxst.c, implements support of PF_UNIX sockets
of type SOCK_STREAM. It relies on generic stream_sock_read/write
and uses its own accept primitive which also tries to be generic.
Right now it only implements an echo service in sight of a general
support for start dumping via unix socket. The echo code is more
of a proof of concept than useful code.
A new generic protocol mechanism has been added. It provides
an easy method to implement new protocols with different
listeners (eg: unix sockets).
The listeners are automatically started at the right moment
and enabled after the possible fork().
The version does not appear anymore in the Makefiles nor in
the include files. It was a nightmare to maintain. Now there
is a VERSION file which contains the major version, a VERDATE
file which contains the date for this version and a SUBVERS
file which may contain a sub-version.
A "make version" target has been added to all makefiles to
check the version. The GNU Makefile also has an update-version
target to update those files. This should never be used.
It is still possible to override those values by specifying
them in the equivalent make variables. By default, the GNU
makefile tries to detect a GIT repository and always uses the
version and date from the current repository. This can be
disabled by setting IGNOREGIT to a non-void value.
src/chtbl.c, src/hashpjw.c and src/list.c are distributed under
an obscure license. While Aleks and I believe that this license
is OK for haproxy, other people think it is not compatible with
the GPL.
Whether it is or not is not the problem. The fact that it rises
a doubt is sufficient for this problem to be addressed. Arnaud
Cornet rewrote the unclear parts with clean GPLv2 and LGPL code.
The hash algorithm has changed too and the code has been slightly
simplified in the process. A lot of care has been taken in order
to respect the original API as much as possible, including the
LGPL for the exportable parts.
The new code has not been thoroughly tested but it looks OK now.
It's now as easy as passing "DLMALLOC_SRC=<path_to_dlmalloc.c>" to
build with support for dlmalloc. The dlmalloc source is not provided
with haproxy in order to ensure that people will use either the most
recent, or the most suited version for their platform. The minimal
mmap size is specified in DLMALLOC_THRES, which defaults to 4096. It
should be increased on platforms with larger pages (eg: 8 kB on some
64 bit systems).
- acl: smarter integer comparison support in ACLs
- acl: specify the direction during fetches
- acl: provide the argument length for fetch functions
- acl: provide a reference to the expr to fetch()
- acl: implement matching on header values
- acl: support maching on 'path' component
- acl: permit to return any header when no name specified
- errorfile: use a local file to feed error messages
- negation in ACL conds was not cleared between terms
- fix segfault at exit when using captures
- improve memory freeing upon exit
- acl: support '-i' to ignore case when matching
- str2net() must not change the const char *
- provide default ACLs
- acl: distinguish between request and response headers
- added the 'use_backend' keyword for full content-switching
- acl: added the TRUE and FALSE ACLs.
- shut warnings 'is*' macros from ctype.h on solaris
- do not re-arm read timeout in SHUTR state
- optimize I/O by detecting system starvation
- the epoll FD must not be shared between processes
- limit the number of events returned by *poll*
- fixed ev_sepoll again by rewriting the state machine
- switched all timeouts to timevals instead of milliseconds
- improved memory management using mempools v2.
- several minor optimizations
- several fixes in ev_sepoll
- fixed some expiration dates on some tasks
- fixed a bug in connection establishment detection due to speculative I/O
- fixed rare bug occuring on TCP with early close (reported by Andy Smith)
- implemented URI hashing algorithm (Guillaume Dallaire)
- implemented SMTP health checks (Peter van Dijk)
- replaced the rbtree with ul2tree from old scheduler project
- new framework for generic ACL support
- added the 'acl' and 'block' keywords to the config language
- added several ACL criteria and matches (IP, port, URI, ...)
- cleaned up and better modularization for some time functions
- fixed list macros
- fixed useless memory allocation in str2net()
- store the original destination address in the session
This framework offers all other subsystems the ability to register
ACL matching criteria. Some generic matching functions are already
provided. Others will come soon and the framework shall evolve.
- modularized the polling mechanisms and use function pointers instead
of macros at many places
- implemented support for FreeBSD's kqueue() polling mechanism
- fixed a warning on OpenBSD : MIN/MAX redefined
- change socket registration order at startup to accomodate kqueue.
- several makefile cleanups to support old shells
- fix build with limits.h once for all
- ev_epoll: do not rely on fd_sets anymore, use changes stacks instead.
- fdtab now holds the results of polling
- implemented support for speculative I/O processing with epoll()
- remove useless calls to shutdown(SHUT_RD), resulting in small speed boost
- auto-registering of pollers at load time
The principle behind speculative I/O is to speculatively try to
perform I/O before registering the events in the system. This
considerably reduces the number of calls to epoll_ctl() and
sometimes even epoll_wait(), and manages to increase overall
performance by about 10%.
The new poller has been called "sepoll". It is used by default
on Linux when it works. A corresponding option "nosepoll" and
the command line argument "-ds" allow to disable it.
select, poll and epoll now have their dedicated functions and have
been split into distinct files. Several FD manipulation primitives
have been provided with each poller.
The rest of the code needs to be cleaned to remove traces of
StaticReadEvent/StaticWriteEvent. A trick involving a macro has
temporarily been used right now. Some work needs to be done to
factorize tests and sets everywhere.
- rewriting either the status line or request line could crash the
process due to a pointer which ought to be reset before parsing.
- rewriting the status line in the response did not work, it caused
a 502 Bad Gateway due to an erroneous state during parsing
- fix reqadd when no option httpclose is used.
- removed now unused fiprm and beprm from proxies
- split logs into two versions : TCP and HTTP
- added some docs about http headers storage and acls
- added a VIM script for syntax color highlighting (Bruno Michel)
- fixed several bugs which might have caused a crash with bad configs
- several optimizations in header processing
- many progresses towards transaction-based processing
- option forwardfor may be used in frontends
- completed HTTP response processing
- some code refactoring between request and response processing
- new HTTP header manipulation functions
- optimizations on the recv() patch to reduce CPU usage under very
high data rates.
- more user-friendly help about the 'usesrc' keyword (CTTPROXY)
- username/groupname support from Marcus Rueckert
- added the "except" keyword to the "forwardfor" option (Bryan German)
- support for health-checks on other addresses (Fabrice Dulaunoy)
- makefile for MacOS 10.4 / Darwin (Dan Zinngrabe)
- do not insert "Connection: close" in HTTP/1.0 messages
Previously, use of the "usesrc" keyword could silently fail if
either the module was not loaded, or the user did not have enough
permissions. Now the errors are better diagnosed and more appropriate
advices are given.
- fix critical bug introduced with 1.3.6 : an empty request header
may lead to a crash due to missing pointer assignment
- hdr_idx might be left uninitialized in debug mode
- fixed build on FreeBSD due to missing fd_set declaration
- stats now support the HEAD method too
- extracted http request from the session
- huge rework of the HTTP parser which is now a 28-state FSM.
- linux-style likely/unlikely macros for optimization hints
- do not create a server socket when there's no server
- added complete support and doc for TCP Splicing
- replaced the wait-queue linked list with an rbtree.
- stats: swap color sets for active and backup servers
- try to guess server check port when unset
- a few bugfixes and cleanups
This patch from Sin Yu makes use of an rbtree for the wait queue,
which will solve the slowdown problem encountered when timeouts
are heterogenous in the configuration. The next step will be to
turn maintain_proxies() into a per-proxy task so that we won't
have to scan them all after each poll() loop.
The tcp-splicing code has been merged, and a doc has been written.
A configuration example has been derived from the previous content
switching sample.
Released 1.3.4 with the following major changes :
- support for cttproxy on the server side to present the client
address to the server.
- added support for SO_REUSEPORT on Linux (needs kernel patch)
- new RFC2616-compliant HTTP request parser with header indexing
- split proxies in frontends, rulesets and backends
- implemented the 'req[i]setbe' to select a backend depending
on the contents
- added the 'default_backend' keyword to select a default BE.
- new stats page featuring FEs and BEs + bytes in both dirs
- improved log format to indicate the backend and the time in ms.
- lots of cleanups
If git is found during the build process, then it will be used
to set the version, the commit number and the commit date. This
way, it will not be needed anymore to update the code to change
the version. The version is the last tag, and the commit number
is the number of commits since the last tag.
This structure will consume 4 bytes per header to keep track of
headers within a request or a response without having to parse
the whole request for each regex. As it's not possible to allocate
only 4 bytes, we define a max number of HTTP headers. We set it
to (BUFSIZE+79)/80 so that 8kB buffers can contain 100 headers
(like Apache), resulting in 400 bytes dedicated to indexation,
or about 400/(2*8kB) ~= 2.4% of the memory usage.
Using the cttproxy kernel patch, it's possible to bind to any source
address. It is highly recommended to use the 03-natdel patch with the
other ones.
A new keyword appears as a complement to the "source" keyword : "usesrc".
The source address is mandatory and must be valid on the interface which
will see the packets. The "usesrc" option supports "client" (for full
client_ip:client_port spoofing), "client_ip" (for client_ip spoofing)
and any 'IP[:port]' combination to pretend to be another machine.
Right now, the source binding is missing from server health-checks if
set to another address. It must be implemented (think restricted firewalls).
The doc is still missing too.
Released 1.3.3 with the following changes :
- fix broken redispatch option in case the connection has already
been marked "in progress" (ie: nearly always).
- support regparm on x86 to speed up some often called functions
- removed a few useless calls to gettimeofday() in log functions.
- lots of 'const char*' cleanups
- turn every FD_* into functions which are faster on recent CPUs
- builds again on OpenBSD and Solaris
- started the changes towards I/O completion callbacks. stream_sock* have
replaced event_*.
- added the new "reqtarpit" and "reqitarpit" protection features
Released 1.3.1 with the following changes from 1.2.15 :
- now, haproxy warns about missing timeout during startup to try to
eliminate all those buggy configurations.
- added "Content-Type: text/html" in responses wherever appropriate, as
suggested by Cameron Simpson.
- implemented "option ssl-hello-chk" to use SSLv3 CLIENT HELLO messages to
test server's health
- implemented "monitor-uri" so that haproxy can reply to a specific URI with
an "HTTP/1.0 200 OK" response. This is useful to validate multiple proxies
at once.
The files are now stored under :
- include/haproxy for the generic includes
- include/types.h for the structures needed within prototypes
- include/proto.h for function prototypes and inline functions
- src/*.c for the C files
Most include files are now covered by LGPL. A last move still needs
to be done to put inline functions under GPL and not LGPL.
Version has been set to 1.3.0 in the code but some control still
needs to be done before releasing.
Released 1.2.14 with the following changes :
- new HTML status report with the 'stats' keyword.
- added the 'abortonclose' option to better resist traffic surges
- implemented dynamic traffic regulation with the 'minconn' option
- show request time on denied requests
- definitely fixed hot reconf on OpenBSD by the use of SO_REUSEPORT
- now a proxy instance is allowed to run without servers, which is
useful to dedicate one instance to stats
- added lots of error counters
- a missing parenthesis preventd matching of cacheable cookies
- a missing parenthesis in poll_loop() might have caused missed events.
Right now it only validates the user/passwd according to a specified list,
and lets the user pass through the proxy if the authentication is OK, and
it refuses any invalid access with a 401 Unauthorized response.
- an uninitialized field in the struct session could cause a crash when
the session was freed. This has been encountered on Solaris only.
- Solaris and OpenBSD no not support shutdown() on listening socket. Let's
be nice to them by performing a soft stop if pause fails.
Summary of changes :
- 'maxconn' server parameter to do per-server session limitation
- queueing to support non-blocking session limitation
- fixed removal of cookies for cookie-less servers such as backup servers
- two separate wait queues for expirable and non-expirable tasks provide
better performance with lots of sessions.
- some code cleanups and performance improvements
- made state dumps a bit more verbose
- fixed missing checks for NULL srv in dispatch mode
- load balancing on backup servers was not possible in source hash mode.
- two session flags shared the same bit, but fortunately they were not
compatible.
* second batch of socklen_t changes.
* clean-ups from Cameron Simpson.
* because tv_remain() does not know about eternity, using no timeout can
make select() spin around a null time-out. Bug reported by Cameron Simpson.
* client read timeout was not properly set to eternity initialized after an
accept() if it was not set in the config. It remained undetected so long
because eternity is 0 and newly allocated pages are zeroed by the system.
* do not call get_original_dst() when not in transparent mode.
* implemented a workaround for a bug in certain epoll() implementations on
linux-2.4 kernels (epoll-lt <= 0.21).
* implemented TCP keepalive with new options : tcpka, clitcpka, srvtcpka.
* changed the runtime argument to disable epoll() to '-de'
* changed the runtime argument to disable poll() to '-dp'
* added global options 'nopoll' and 'noepoll' to do the same at the
configuration level.
* added a 'linux24e' target to the Makefile for Linux 2.4 systems patched to
support epoll().
* changed default FD_SETSIZE to 65536 on Solaris (default=1024)
* conditionned signals redirection to #ifdef DEBUG_MEMORY
* made epoll() support a compile-time option : ENABLE_EPOLL
* provided a very little libc replacement for a possibly missing epoll()
implementation which can be enabled by -DUSE_MY_EPOLL
* implemented the poll() poller, which can be enabled with -DENABLE_POLL.
The equivalent runtime argument becomes '-P'. A few tests show that it
performs like select() with many fds, but slightly slower (certainly
because of the higher amount of memory involved).
* separated the 3 polling methods and the tasks scheduler into 4 distinct
functions which makes the code a lot more modular.
* moved some event tables to private static declarations inside the poller
functions.
* the poller functions can now initialize themselves, run, and cleanup.
* changed the runtime argument to enable epoll() to '-E'.
* removed buggy epoll_ctl() code in the client_retnclose() function. This
function was never meant to remove anything.
* fixed a typo which caused glibc to yell about a double free on exit.
* removed error checking after epoll_ctl(DEL) because we can never know if
the fd is still active or already closed.
* added a few entries in the makefile
* merged Alexander Lazic's and Klaus Wagner's work on application
cookie-based persistence. Since this is the first merge, this version is
not intended for general use and reports are more than welcome. Some
documentation is really needed though.
* add an architecture guide to the documentation
* released without any changes
* increased default BUFSIZE to 16 kB to accept max headers of 8 kB which is
compatible with Apache. This limit can be configured in the makefile now.
Thanks to Eric Fehr for the checks.
* added a per-server "source" option which now makes it possible to bind to
a different source for each (potentially identical) server.
* changed cookie-based server selection slightly to allow several servers to
share a same cookie, thus making it possible to associate backup servers to
live servers and ease soft-stop for maintenance periods. (Alexander Lazic)
* added the cookie 'prefix' mode which makes it possible to use persistence
with thin clients which support only one cookie. The server name is prefixed
before the application cookie, and restore back.
* fixed the order of servers within an instance to match documentation. Now
the servers are *really* used in the order of their declaration. This is
particularly important when multiple backup servers are in use.
* add the "logasap" option which produces a log without waiting for the data
to be transferred from the server to the client.
* add the "httpclose" option which removes any "connection:" header and adds
"Connection: close" in both direction.
* fixed a stupid bug introduced in 1.1.22 which caused second and subsequent
'default' sections to keep previous parameters, and not initialize logs
correctly.
* fixed a second stupid bug introduced in 1.1.22 which caused configurations
relying on 'dispatch' mode to segfault at the first connection.
* 'option httpchk' now supports method, HTTP version and a few headers.
* now, 'option httpchk', 'cookie' and 'capture' can be specified in
'defaults' section
* a fresh new english documentation
* large Makefile cleanup for increased portability
* new build script 'build.cfg' for Formilux-0.1.8
* new startup script 'init.haproxy.flx0' for Formilux-0.1.8
* 'listen' now supports optionnal address:port-range lists
* 'bind' introduced to add new listen addresses
* fixed a bug which caused a session to be kept established on a server till
it timed out if the client closed during the DATA phase.
* the port part of each server address can now be empty to make the proxy
connect to the server on the same port it was connected to, be an absolute
unsigned number to reflect a single port (as in older versions), or an
explicitly signed number (+N/-N) to indicate that this offset must be
applied to the port the proxy was connected to, when connecting to the
server.
* the 'port' server option allows the user to specify a different
health-check port than the service one. It is mandatory when only relative
ports have been specified and check is required. By default, the checks are
sent to the service port.
* new 'defaults' section which is rather similar to 'listen' except that all
values are only used as default values for future 'listen' sections, until
a new 'defaults' resets them. At the moment, server options, regexes,
cookie names and captures cannot be set in the 'defaults' section.
* Makefile now optimizes for Ultrasparc by default on Solaris/Sparc
* large documentation updates and fixes
* new 'tests' directory with some debug files
* changed the debug output format so that it now includes the session unique
ID followed by the instance name at the beginning of each line.
* in debug mode, accept now shows the client's IP and port.
* added one 3 small debugging scripts to search and pretty print debug output
* changed the default health check request to "OPTIONS /" instead of
"OPTIONS *" since not all servers implement the later one.
* "option httpchk" now accepts an optional parameter allowing the user to
specify and URI other than '/' during health-checks.
* made Makefile more robust to pcre-config errors
* added 3 new pretty-print scripts : debug2ansi, debug2html and debugfind
* upgraded Formilux package to haproxy-1.1.21-flx.1.pkg
* removed the now obsolete haproxy2html.sh
* haproxy was NOT RFC compliant because it was case-sensitive on HTTP
"Cookie:" and "Set-Cookie:" headers. This caused JVM 1.4 to fail on
cookie persistence because it uses "cookie:". Two memcmp() have been
replaced with strncasecmp().
* added the haproxy2html.sh script
* removed the now useless NOTES file
* made pcre-config quiet in the makefile.
* Haproxy can be compiled with PCRE regex instead of libc regex, by setting
REGEX=pcre on the make command line.
* HTTP health-checks now use "OPTIONS *" instead of "OPTIONS /".
* when explicit source address binding is required, it is now also used for
health-checks.
* added 'reqpass' and 'reqipass' to allow certain headers but not the request
itself.
* factored several strings to reduce binary size by about 2 kB.
* replaced setreuid() and setregid() with more standard setuid() and setgid().
* added 4 status flags to the log line indicating who ended the connection
first, the sessions state, the validity of the cookie, and action taken on
the set-cookie header.
* rearranged the changelog and removed it from haproxy.c
* large documentation updates
* fixed stats monitoring, and optimized some tv_* for most common cases.
* replaced temporary 'newhdr' with 'trash' to reduce stack size
* made HTTP errors more HTML-fiendly.
* renamed strlcpy() to strlcpy2() because of a slightly difference between
their behaviour (return value), to avoid confusion.
* restricted HTTP messages to HTTP proxies only
* added a 502 message when the connection has been refused by the server,
to prevent clients from believing this is a zero-byte HTTP 0.9 reply.
* changed 'Cache-control:' from 'no-cache="set-cookie"' to 'private' when
inserting a cookie, because some caches (apache) don't understand it.
* fixed processing of server headers when client is in SHUTR state
* automatically close fd's 0,1 and 2 when going daemon ; setpgrp() after
setpgid()
* updated the Makefile and the Formilux build script
* don't use snprintf()'s return value as an end of message since it may
be larger. This caused bus errors and segfaults in internal libc's
getenv() during localtime() in send_log().
* removed dead insecure send_syslog() function and all references to it.
* fixed warnings on Solaris due to buggy implementation of isXXXX().
* option "dontlognull"
* fixed "double space" bug in config parser
* fixed an uninitialized server field in case of dispatch
with no existing server which could cause a segfault during
logging.
* the pid logged was always the father's, which was wrong for daemons.
* fixed wrong level "LOG_INFO" for message "proxy started".
* http logging is now complete :
- ip:port, date, proxy, server
- req_time, conn_time, hdr_time, tot_time
- status, size, request
* source address binding
* added OpenBSD, Linux-2.2 and Linux-2.4 targets to the Makefile
* added a Formilux init script
* fixed a few timeout bugs
* rearranged the task scheduler subsystem to improve performance,
add new tasks, and make it easier to later port to librt ;
* allow multiple accept() for one select() wake up ;
* implemented internal load balancing with basic health-check ;
* cookie insertion and header add/replace/delete, with better strings
support.
* reworked buffer handling to fix a few rewrite bugs, and
improve overall performance.
* implement the "purge" option to delete server cookies in direct mode.
* fixed some error cases where the maxfd was not decreased.
* now supports transparent proxying, at least on linux 2.4.
* soft stop works again (fixed select timeout computation).
* it seems that TCP proxies sometimes cannot timeout.
* added a "quiet" mode.
* enforce file descriptor limitation on socket() and accept().