Consistent hashing provides some interesting advantages over common
hashing. It avoids full redistribution in case of a server failure,
or when expanding the farm. This has a cost however, the hashing is
far from being perfect, as we associate a server to a request by
searching the server with the closest key in a tree. Since servers
appear multiple times based on their weights, it is recommended to
use weights larger than approximately 10-20 in order to smoothen
the distribution a bit.
In some cases, playing with weights will be the only solution to
make a server appear more often and increase chances of being picked,
so stats are very important with consistent hashing.
In order to indicate the type of hashing, use :
hash-type map-based (default, old one)
hash-type consistent (new one)
Consistent hashing can make sense in a cache farm, in order not
to redistribute everyone when a cache changes state. It could also
probably be used for long sessions such as terminal sessions, though
that has not be attempted yet.
More details on this method of hashing here :
http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/
It was becoming painful to have all the LB algos in backend.c.
Let's move them to their own files. A few hashing functions still
need be broken in two parts, one for the contents and one for the
map position.
This Linux-specific option was never really used in production and
has since been superseded by new splicing options brought by recent
Linux kernels.
It caused several particular cases in the code because the kernel
would take care of the session without haproxy being able to do
anything on it, which became hard to handle in the new architecture.
Let's simply get rid of it now that there is a replacement available.
Newer GIT versions do not support "git-cmd" anymore, so date and version
can be wrong during development builds. Use "git cmd" now. Also fix
git-tar to use "git archive" instead of "git-tar-tree".
By default, when building from a git tree, haproxy's release date is
set to the last commit's date. But it was the wrong date which was
used, the initial patch's date, which can cause time jumps in the
past when an old patch gets merged. What we want is the commit date,
which reflects the correct code history.
After considering various possibilities, we compiled haproxy under cygwin.
Attached is an updated full diff that also has the TARGET=cygwin documented.
The whole thing compiles and installs with this diff only.
In cygwin 1.7 (now in beta), there is apparently support for ipv6. Cygwin
1.5 (later versions, anyway) already includes some support in the form of a
define USE_IPV6. When defined, it declares the sockaddr_in6 struct and
possibly other things. The above definition AF_INET6=23 is taken from
their /usr/include/socket.h file (where it is #if 0'd out).
We are running into a socket limit. It appears that Cygwin (running on
Windows 2003 Server) will only allow us to set ulimit -n (maximum open
files) to 3200, which means we're a little short of 1600 connections.
The limit of 3200 is an internal Cygwin limit. Perhaps they can raise it in
the future. Using the nbproc option, I was able to bring up 10 servers. It
seems to me that they were able to handle over 2000 connections (even though
each had maxconn 1500 set, and the hard Cygwin fd limit).
When trying to build a 32-bit binary on a 64-bit platform, we generally
need to pass "-m32" to gcc, which is not convenient with current makefile.
Note that this option requires gcc >= 3.
In order to ease parameter passing, a new ARCH= makefile option has been
added. If it receives a target architecture, according "-m32"/"-m64" and
"-march=xxxx" will be passed to gcc. Only the generic makefile has been
changed to support this option right now as the need only appeared on Linux.
The spec file now makes use of this option so that rpmbuild can automatically
build with the proper architecture.
If both make parameters USE_PCRE and USE_STATIC_PCRE are set to 1
while building haproxy, pcre gets linked in dynamically.
Therefore we check if USE_STATIC_PCRE was explicitely enabled to
ommit the CFLAGS and LDFLAGS normally set if USE_PCRE is enabled.
With this change, all frontends, backends, and servers maintain a session
counter and a timer to compute a session rate over the last second. This
value will be very useful because it varies instantly and can be used to
check thresholds. This value is also reported in the stats in a new "rate"
column.
This will provide high performance data forwarding between sockets,
but it is broken on many kernels and will sometimes forward corrupted
data without some kernel patches. Consider this experimental for now.
A new data type has been added : pipes. Some pre-allocated empty pipes
are maintained in a pool for users such as splice which use them a lot
for very short times.
Pipes are allocated using get_pipe() and released using put_pipe().
Pipes which are released with pending data are immediately killed.
The struct pipe is small (16 to 20 bytes) and may even be further
reduced by unifying ->data and ->next.
It would be nice to have a dedicated cleanup task which would watch
for the pipes usage and destroy a few of them from time to time.
Tracking connection status changes was hard, and some code was
redundant. A new SI_ST_CER state was added to the stream interface
to indicate a past connection error, and an SI_FL_ERR flag was
added to report past I/O error. The stream_sock code does not set
the connection to SI_ST_CLO anymore in case of I/O error, it's
the upper layer which does it. This makes it possible to know
exactly when the file descriptors are allocated.
The new SI_ST_CER state permitted to split tcp_connection_status()
in two parts, one processing SI_ST_CON and the other one SI_ST_CER.
Synchronous connection errors now make use of this last state, hence
eliminating duplicate code.
Some ib<->ob copy paste errors were found and fixed, and all entities
setting SI_ST_CLO also shut the buffers down.
Some of these stream_interface specific functions and structures
have migrated to a new stream_interface.c file.
Some types of errors are still not detected by the buffers. For
instance, let's assume the following scenario in one single pass
of process_session: a connection sits in SI_ST_TAR state during
a retry. At TAR expiration, a new connection attempt is made, the
connection is obtained and srv->cur_sess is increased. Then the
buffer timeout is fires and everything is cleared, the new state
becomes SI_ST_CLO. The cleaning code checks that previous state
was either SI_ST_CON or SI_ST_EST to release the connection. But
that's wrong because last state is still SI_ST_TAR. So the
server's connection count does not get decreased.
This means that prev_state must not be used, and must be replaced
by some transition detection instead of level detection.
The following debugging line was useful to track state changes :
fprintf(stderr, "%s:%d: cs=%d ss=%d(%d) rqf=0x%08x rpf=0x%08x\n", __FUNCTION__, __LINE__,
s->si[0].state, s->si[1].state, s->si[1].err_type, s->req->flags, s-> rep->flags);
Reported by Cherife Li : just doing a "make install" fails because it
depends on "all" which is equivalent to "help" if no TARGET was specified.
Make it depend on "haproxy" instead.
haproxy relies on linking the binary using gcc, so there is no real need to
hardcode both (CC and LD). Setting 'LD = $(CC)' will make the build system
a bit more cross-compile friendly because only the right cross-compiler has
to be passed via make.
To be flexible while installing haproxy following variables have been
added to the Makefile:
- DESTDIR useful i.e. while installing in a sandbox (not set by default)
- PREFIX defines the default install prefix (default: /usr/local)
- SBINDIR defines the dir the haproxy binary gets installed
(default: $PREFIX/sbin)
Too often, people report performance issues on Linux 2.6 because they don't
use the available optimizations. We need to ensure that people are aware of
the available features, and for this, we must force them to choose a target
OS (or "generic"), but at least prevent them from blindly building for a
generic target.
Using some Linux kernel patches, it is possible to redirect non-local
traffic to local sockets when IP forwarding is enabled. In order to
enable this option, we introduce the "transparent" option keyword on
the "bind" command line. It will make the socket reachable by remote
sources even if the destination address does not belong to the machine.
The build process was getting annoying under some conditions,
especially on platforms which are used to set CFLAGS, as well
as those which set a lot of complex defines. The new Makefile
takes care of this situation by not mixing TARGET, CPU and user
values, and by making privileging the pre-setting of common
variables with the ability to override them.
Now CFLAGS and LDFLAGS are set by default and may be overridden
without the risk of breaking useful defines. Options are better
dealt with, and as a bonus, it was possible to merge the FreeBSD
and OpenBSD targets into the common GNU Makefile.
The report of build options by "haproxy -vv" has been slightly
adapted to the new mode. Options implied by architecture are not
reported, only user-specified options are. It is also possible to
add options which will not be reported in order not to mangle the
output when specifying dirty informations such as URLs...
The Makefile was copiously documented and it should be easier to
build for any target now. Backwards compatibility with older
build processes was kept, and warnings are emitted for deprecated
build options.
Sometimes it is useful to find out how a given binary version was
built. The build compiler and options are now provided for this,
and it's possible to get them with the -vv option.
Proxy listeners were very special and not very easy to manipulate.
A proto_tcp file has been created with all that is required to
manage TCPv4/TCPv6 as raw protocols, and provide generic listeners.
The code of start_proxies() and maintain_proxies() now looks less
like spaghetti. Also, event_accept will need a serious lifting in
order to use more of the information provided by the listener.
A new file, proto_uxst.c, implements support of PF_UNIX sockets
of type SOCK_STREAM. It relies on generic stream_sock_read/write
and uses its own accept primitive which also tries to be generic.
Right now it only implements an echo service in sight of a general
support for start dumping via unix socket. The echo code is more
of a proof of concept than useful code.
A new generic protocol mechanism has been added. It provides
an easy method to implement new protocols with different
listeners (eg: unix sockets).
The listeners are automatically started at the right moment
and enabled after the possible fork().
The version does not appear anymore in the Makefiles nor in
the include files. It was a nightmare to maintain. Now there
is a VERSION file which contains the major version, a VERDATE
file which contains the date for this version and a SUBVERS
file which may contain a sub-version.
A "make version" target has been added to all makefiles to
check the version. The GNU Makefile also has an update-version
target to update those files. This should never be used.
It is still possible to override those values by specifying
them in the equivalent make variables. By default, the GNU
makefile tries to detect a GIT repository and always uses the
version and date from the current repository. This can be
disabled by setting IGNOREGIT to a non-void value.
src/chtbl.c, src/hashpjw.c and src/list.c are distributed under
an obscure license. While Aleks and I believe that this license
is OK for haproxy, other people think it is not compatible with
the GPL.
Whether it is or not is not the problem. The fact that it rises
a doubt is sufficient for this problem to be addressed. Arnaud
Cornet rewrote the unclear parts with clean GPLv2 and LGPL code.
The hash algorithm has changed too and the code has been slightly
simplified in the process. A lot of care has been taken in order
to respect the original API as much as possible, including the
LGPL for the exportable parts.
The new code has not been thoroughly tested but it looks OK now.
It's now as easy as passing "DLMALLOC_SRC=<path_to_dlmalloc.c>" to
build with support for dlmalloc. The dlmalloc source is not provided
with haproxy in order to ensure that people will use either the most
recent, or the most suited version for their platform. The minimal
mmap size is specified in DLMALLOC_THRES, which defaults to 4096. It
should be increased on platforms with larger pages (eg: 8 kB on some
64 bit systems).
- acl: smarter integer comparison support in ACLs
- acl: specify the direction during fetches
- acl: provide the argument length for fetch functions
- acl: provide a reference to the expr to fetch()
- acl: implement matching on header values
- acl: support maching on 'path' component
- acl: permit to return any header when no name specified
- errorfile: use a local file to feed error messages
- negation in ACL conds was not cleared between terms
- fix segfault at exit when using captures
- improve memory freeing upon exit
- acl: support '-i' to ignore case when matching
- str2net() must not change the const char *
- provide default ACLs
- acl: distinguish between request and response headers
- added the 'use_backend' keyword for full content-switching
- acl: added the TRUE and FALSE ACLs.
- shut warnings 'is*' macros from ctype.h on solaris
- do not re-arm read timeout in SHUTR state
- optimize I/O by detecting system starvation
- the epoll FD must not be shared between processes
- limit the number of events returned by *poll*
- fixed ev_sepoll again by rewriting the state machine
- switched all timeouts to timevals instead of milliseconds
- improved memory management using mempools v2.
- several minor optimizations