Add generic authentication & authorization support.
Groups are implemented as bitmaps so the count is limited to
sizeof(int)*8 == 32.
Encrypted passwords are supported with libcrypt and crypt(3), so it is
possible to use any method supported by your system. For example modern
Linux/glibc instalations support MD5/SHA-256/SHA-512 and of course classic,
DES-based encryption.
Cyril Bonté found that when an error is detected in one config file, it
is also reported in all other ones, which is wrong. The fix obviously
consists in checking the return code from readcfgfile() and not the
accumulator.
Cameron Simpson reported an annoying case where haproxy simply reports
"Error(s) found in configuration file" when the file is not found or
not readable.
Fortunately the parsing function still returns -1 in case of open
error, so we're able to detect the issue from the caller and report
the corresponding errno message.
Some rarely information are stored in fdtab, making it larger for no
reason (source port ranges, remote address, ...). Such information
lie there because the checks can't find them anywhere else. The goal
will be to move these information to the stream interface once the
checks make use of it.
For now, we move them to an fdinfo array. This simple change might
have improved the cache hit ratio a little bit because a 0.5% of
performance increase has measured.
During troubleshooting, it's often useful to get the list of supported
pollers but until now it was required to have a working configuration
first. Since the pollers are known before main() is called, let's list
them with the build options.
This patch implements "description" (proxy and global) and "node" (global)
options, removes "node-name" and adds "show-node" & "show-desc" options
for "stats". It also changes the way the header lines (with proxy name) and
the statistics are displayed, so stats no longer look so clumsy with very
long names.
Instead of "node-name" it is possible to use show-node/show-desc with
an optional parameter that overrides a default node/description.
backend cust-0045
# report specific values for this customer
stats show-node Europe
stats show-desc Master node for Europe, Asia, Africa
Commit 27a674efb8 introduced the ability
to configure buffer sizes. Unfortunately, the pool was created before
the conf was read, so that is was always set to the default size.
In order to fix that, we delay the call to init_buffer(), which is not
a problem since nothing uses it during the initialization.
The new tune.bufsize and tune.maxrewrite global directives allow one to
change the buffer size and the maxrewrite size. Right now, setting bufsize
too low will block stats sockets which will not be able to write at all.
An error checking must be added to buffer_write_chunk() so that if it
cannot write its message to an empty buffer, it causes the caller to abort.
Creating a frontend for the global stats socket will help merge
unix sockets management with the other socket management. Since
frontends are huge structs, we only allocate it if required.
This Linux-specific option was never really used in production and
has since been superseded by new splicing options brought by recent
Linux kernels.
It caused several particular cases in the code because the kernel
would take care of the session without haproxy being able to do
anything on it, which became hard to handle in the new architecture.
Let's simply get rid of it now that there is a replacement available.
Do not exit early at the first error found while checking configuration
validity. This particularly helps spotting multiple wrong tracked server
names at once.
Try not to immediately exit on non-fatal errors while parsing the
global section, so that the user has a chance to get most of the
errors at once, which is quite convenient especially during config
checks with the -c argument. Some other errors such as unresolved
server names also don't make the parser exit too early.
We now support up to 10 distinct configuration files. They are
all loaded in the order defined by -f <file1> -f <file2> ...
This can be useful in order to store global, private, public,
etc... configurations in distinct files.
This is a first step towards support of multiple configuration files.
Now readcfgfile() only reads a file in memory and performs very minimal
parsing. The checks are performed afterwards.
When a new process fails to grab some ports, it sends a signal to
the old process in order to release them. Then it tries to bind
again. If it still fails (eg: one of the ports is bound to a
completely different process), it must send the continue signal
to the old process so that this one re-binds to the ports. This
is correctly done, but the newly bound ports are not released
first, which sometimes causes the old process to remain running
with no port bound. The fix simply consists in unbinding all
ports before sending the signal to the old process.
It is recommended to have -D in init scripts, but -D also implies
quiet mode, which hides warning messages, and both options are now
completely unrelated. Remove the implication to get warnings with
-D.
The small list of signals currently handled by haproxy were processed
as soon as they were received. This has caused trouble with calls to
pool_gc2() occuring in the middle of libc's memory management functions
seldom causing deadlocks preventing the old process from leaving.
Now these signals use the new async signal framework and are called
asynchronously, when there is no risk of recursion. This ensures more
reliable operation, especially for sensible processing such as memory
management.
The byte counters have long been 64-bit to avoid overflows. But with
several sites nowadays, we see session counters wrap around every 10-days
or so. So it was the moment to switch counters to 64-bit, including
error and warning counters which can theorically rise as fast as session
counters even if in practice there is very low risk.
The performance impact should not be noticeable since those counters are
only updated once per session. The stats output have been carefully checked
for proper types on both 32- and 64-bit platforms.
If we get very large data at once, it's almost certain that it's
worthless trying to read again, because we got everything we could
get.
Doing this has made all -EAGAIN disappear from splice reads. The
threshold has been put in the global tunable structures so that if
we one day want to make it accessible from user config, it will be
easy to do so.
On overloaded systems, it sometimes happens that hundreds or thousands
of incoming connections are queued in the system's backlog, and all get
dequeued at once. The problem is that when haproxy processes them and
does not apply any limit, this can take some time and the internal date
does not progress, resulting in wrong timer measures for all sessions.
The most common effect of this is that all of these sessions report a
large request time (around several hundreds of ms) which is in fact
caused by the time spent accepting other connections. This might happen
on shared systems when the machine swaps.
For this reason, we finally apply a reasonable limit even in mono-process
mode. Accepting 100 connections at once is fast enough for extreme cases
and will not cause that much of a trouble when the system is saturated.
The "bind-process" keyword lets the admin select which instances may
run on which process (in multi-process mode). It makes it easier to
more evenly distribute the load across multiple processes by avoiding
having too many listen to the same IP:ports.
Setting "nosplice" in the global section will disable the use of TCP
splicing (both tcpsplice and linux 2.6 splice). The same will be
achieved using the "-dS" parameter on the command line.
The global tuning options right now only concern the polling mechanisms,
and they are not in the global struct itself. It's not very practical to
add other options so let's move them to the global struct and remove
types/polling.h which was not used for anything else.
global.maxconn/4 seems to be a good hint for global.maxpipes when that
one must be guessed. If the limit is reached, it's still possible to
set it manually in the configuration.
Using pipe pools makes pipe management a lot easier. It also allows to
remove quite a bunch of #ifdefs in areas which depended on the presence
or not of support for kernel splicing.
The buffer now holds a pointer to a pipe structure which is always NULL
except if there are still data in the pipe. When it needs to use that
pipe, it dynamically allocates it from the pipe pool. When the data is
consumed, the pipe is immediately released.
That way, there is no need anymore to care about pipe closure upon
session termination, nor about pipe creation when trying to use
splice().
Another immediate advantage of this method is that it considerably
reduces the number of pipes needed to use splice(). Tests have shown
that even with 0.2 pipe per connection, almost all sessions can use
splice(), because the same pipe may be used by several consecutive
calls to splice().
If splicing is enabled in a backend, we need to guess how many
pipes will be needed. We used to rely on fullconn, but this leads
to non-working splicing when fullconn is not specified. So we now
fallback to global.maxconn.
This code provides support for linux 2.6 kernel splicing. This feature
appeared in kernel 2.6.25, but initial implementations were awkward and
buggy. A kernel >= 2.6.29-rc1 is recommended, as well as some optimization
patches.
Using pipes, this code is able to pass network data directly between
sockets. The pipes are a bit annoying to manage (fd creation, release,
...) but finally work quite well.
Preliminary tests show that on high bandwidths, there's a substantial
gain (approx +50%, only +20% with kernel workarounds for corruption
bugs). With 2000 concurrent connections, with Myricom NICs, haproxy
now more easily achieves 4.5 Gbps for 1 process and 6 Gbps for two
processes buffers. 8-9 Gbps are easily reached with smaller numbers
of connections.
We also try to splice out immediately after a splice in by making
profit from the new ability for a data producer to notify the
consumer that data are available. Doing this ensures that the
data are immediately transferred between sockets without latency,
and without having to re-poll. Performance on small packets has
considerably increased due to this method.
Earlier kernels return only one TCP segment at a time in non-blocking
splice-in mode, while newer return as many segments as may fit in the
pipe. To work around this limitation without hurting more recent kernels,
we try to collect as much data as possible, but we stop when we believe
we have read 16 segments, then we forward everything at once. It also
ensures that even upon shutdown or EAGAIN the data will be forwarded.
Some tricks were necessary because the splice() syscall does not make
a difference between missing data and a pipe full, it always returns
EAGAIN. The trick consists in stop polling in case of EAGAIN and a non
empty pipe.
The receiver waits for the buffer to be empty before using the pipe.
This is in order to avoid confusion between buffer data and pipe data.
The BF_EMPTY flag now covers the pipe too.
Right now the code is disabled by default. It needs to be built with
CONFIG_HAP_LINUX_SPLICE, and the instances intented to use splice()
must have "option splice-response" (or option splice-request) enabled.
It is probably desirable to keep a pool of pre-allocated pipes to
avoid having to create them for every session. This will be worked
on later.
Preliminary tests show very good results, even with the kernel
workaround causing one memcpy(). At 3000 connections, performance
has moved from 3.2 Gbps to 4.7 Gbps.
Three new options have been added when CONFIG_HAP_LINUX_SPLICE is
set :
- splice-request
- splice-response
- splice-auto
They are used to enable splicing per frontend/backend. They are also
supported in defaults sections. The "splice-auto" option is meant to
automatically turn splice on for buffers marked as fast streamers.
This should save quite a bunch of file descriptors.
It was required to add a new "options2" field to the proxy structure
because the original "options" is full.
When global.maxpipes is not set, it is automatically adjusted to
the max of the sums of all frontend's and backend's maxconns for
those which have at least one splice option enabled.
Josh Goebel reported that haproxy silently dies when it fails to
chroot. In fact, it does so when in daemon mode, because daemon
mode has been disabling output for ages.
Since the code has been reworked, this could have been changed
because there is no reason for this anymore, hence this patch.
(cherry picked from commit 304d6fb00f)
(cherry picked from commit 50b7f7f12c67322c793f50a6be009f0fd0eec1bb)
The INTBITS macro was found to be already defined on some platforms,
and to equal 32 (while INTBITS was 5 here). Due to pure luck, there
was no declaration conflict, but it's nonetheless a problem to fix.
Looking at the code showed that this macro was only used for left
shifts and nothing else anymore. So the replacement is obvious. The
new macro, BITS_PER_INT is more obviously correct.
It should be stated as a rule that a C file should never
include types/xxx.h when proto/xxx.h exists, as it gives
less exposure to declaration conflicts (one of which was
caught and fixed here) and it complicates the file headers
for nothing.
Only types/global.h, types/capture.h and types/polling.h
have been found to be valid includes from C files.
This is the first attempt at moving all internal parts from
using struct timeval to integer ticks. Those provides simpler
and faster code due to simplified operations, and this change
also saved about 64 bytes per session.
A new header file has been added : include/common/ticks.h.
It is possible that some functions should finally not be inlined
because they're used quite a lot (eg: tick_first, tick_add_ifset
and tick_is_expired). More measurements are required in order to
decide whether this is interesting or not.
Some function and variable names are still subject to change for
a better overall logics.
We now insert tasks in a certain sequence in the run queue.
The sorting key currently is the arrival order. It will now
be possible to apply a "nice" value to any task so that it
goes forwards or backwards in the run queue.
The calls to wake_expired_tasks() and maintain_proxies()
have been moved to the main run_poll_loop(), because they
had nothing to do in process_runnable_tasks().
The task_wakeup() function is not inlined anymore, as it was
only used at one place.
The qlist member of the task structure has been removed now.
The run_queue list has been replaced for an integer indicating
the number of tasks in the run queue.
The following config makes haproxy segfault on exit :
defaults
mode http
balance roundrobin
listen no-stats
bind :8001
listen stats
bind :8002
stats uri /stats
The simple fix is to ensure that p->uri_auth is not NULL
before dereferencing it.
The ultree code has been removed in favor of a simpler and
cleaner ebtree implementation. The eternity queue does not
need to exist anymore, and the pool_tree64 has been removed.
The ebtree node is stored in the task itself. The qlist list
header is still used by the run-queue, but will be able to
disappear once the run-queue uses ebtree too.
The first implementation of the monotonic clock did not verify
forward jumps. The consequence is that a fast changing time may
expire a lot of tasks. While it does seem minor, in fact it is
problematic because most machines which boot with a wrong date
are in the past and suddenly see their time jump by several
years in the future.
The solution is to check if we spent more apparent time in
a poller than allowed (with a margin applied). The margin
is currently set to 1000 ms. It should be large enough for
any poll() to complete.
Tests with randomly jumping clock show that the result is quite
accurate (error less than 1 second at every change of more than
one second).
If the system date is set backwards while haproxy is running,
some scheduled events are delayed by the amount of time the
clock went backwards. This is particularly problematic on
systems where the date is set at boot, because it seldom
happens that health-checks do not get sent for a few hours.
Before switching to use clock_gettime() on systems which
provide it, we can at least ensure that the clock is not
going backwards and maintain two clocks : the "date" which
represents what the user wants to see (mostly for logs),
and an internal date stored in "now", used for scheduled
events.
The dequeuing logic was completely wrong. First, a task was assigned
to all servers to process the queue, but this task was never scheduled
and was only woken up on session free. Second, there was no reservation
of server entries when a task was assigned a server. This means that
as long as the task was not connected to the server, its presence was
not accounted for. This was causing trouble when detecting whether or
not a server had reached maxconn. Third, during a redispatch, a session
could lose its place at the server's and get blocked because another
session at the same moment would have stolen the entry. Fourth, the
redispatch option did not work when maxqueue was reached for a server,
and it was not possible to do so without indefinitely hanging a session.
The root cause of all those problems was the lack of pre-reservation of
connections at the server's, and the lack of tracking of servers during
a redispatch. Everything relied on combinations of flags which could
appear similarly in quite distinct situations.
This patch is a major rework but there was no other solution, as the
internal logic was deeply flawed. The resulting code is cleaner, more
understandable, uses less magics and is overall more robust.
As an added bonus, "option redispatch" now works when maxqueue has
been reached on a server.
A new "redirect" keyword adds the ability to send an HTTP 301/302/303
redirection to either an absolute location or to a prefix followed by
the original URI. The redirection is conditionned by ACL rules, so it
becomes very easy to move parts of a site to another site using this.
This work was almost entirely done at Exceliance by Emeric Brun.
A test-case has been added in the tests/ directory.
- free oldpids
- call free(exp->preg), not only regfree(exp->preg): req_exp, rsp_exp
- build a list of unique uri_auths and eventually free it
- prune_acl_cond/free for switching_rules
- add a callback pointer to free ptr from acl_pattern (used for regexs) and execute it
==1180== malloc/free: in use at exit: 0 bytes in 0 blocks.
==1180== malloc/free: 5,599 allocs, 5,599 frees, 4,220,556 bytes allocated.
==1180== All heap blocks were freed -- no leaks are possible.
New functions implemented:
- deinit_pollers: called at the end of deinit())
- prune_acl: called via list_for_each_entry_safe
Add missing pool_destroy2 calls:
- p->hdr_idx_pool
- pool2_tree64
Implement all task stopping:
- health-check: needs new "struct task" in the struct server
- queue processing: queue_mgt
- appsess_refresh: appsession_refresh
before (idle system):
==6079== LEAK SUMMARY:
==6079== definitely lost: 1,112 bytes in 75 blocks.
==6079== indirectly lost: 53,356 bytes in 2,090 blocks.
==6079== possibly lost: 52 bytes in 1 blocks.
==6079== still reachable: 150,996 bytes in 504 blocks.
==6079== suppressed: 0 bytes in 0 blocks.
after (idle system):
==6945== LEAK SUMMARY:
==6945== definitely lost: 7,644 bytes in 137 blocks.
==6945== indirectly lost: 9,913 bytes in 587 blocks.
==6945== possibly lost: 0 bytes in 0 blocks.
==6945== still reachable: 0 bytes in 0 blocks.
==6945== suppressed: 0 bytes in 0 blocks.
before (running system for ~2m):
==9343== LEAK SUMMARY:
==9343== definitely lost: 1,112 bytes in 75 blocks.
==9343== indirectly lost: 54,199 bytes in 2,122 blocks.
==9343== possibly lost: 52 bytes in 1 blocks.
==9343== still reachable: 151,128 bytes in 509 blocks.
==9343== suppressed: 0 bytes in 0 blocks.
after (running system for ~2m):
==11616== LEAK SUMMARY:
==11616== definitely lost: 7,644 bytes in 137 blocks.
==11616== indirectly lost: 9,981 bytes in 591 blocks.
==11616== possibly lost: 0 bytes in 0 blocks.
==11616== still reachable: 4 bytes in 1 blocks.
==11616== suppressed: 0 bytes in 0 blocks.
Still not perfect but significant improvement.
Released version 1.3.15 with the following main changes :
- [BUILD] Added support for 'make install'
- [BUILD] Added 'install-man' make target for installing the man page
- [BUILD] Added 'install-bin' make target
- [BUILD] Added 'install-doc' make target
- [BUILD] Removed "/" after '$(DESTDIR)' in install targets
- [BUILD] Changed 'install' target to install the binaries first
- [BUILD] Replace hardcoded 'LD = gcc' with 'LD = $(CC)'
- [MEDIUM]: Inversion for options
- [MEDIUM]: Count retries and redispatches also for servers, fix redistribute_pending, extend logs, %d->%u cleanup
- [BUG]: Restore clearing t->logs.bytes
- [MEDIUM]: rework checks handling
- [DOC] Update a "contrib" file with a hint about a scheme used for formathing subjects
- [MEDIUM] Implement "track [<backend>/]<server>"
- [MINOR] Implement persistent id for proxies and servers
- [BUG] Don't increment server connections too much + fix retries
- [MEDIUM]: Prevent redispatcher from selecting the same server, version #3
- [MAJOR] proto_uxst rework -> SNMP support
- [BUG] appsession lookup in URL does not work
- [BUG] transparent proxy address was ignored in backend
- [BUG] hot reconfiguration failed because of a wrong error check
- [DOC] big update to the configuration manual
- [DOC] large update to the configuration manual
- [DOC] document more options
- [BUILD] major rework of the GNU Makefile
- [STATS] add support for "show info" on the unix socket
- [DOC] document options forwardfor to logasap
- [MINOR] add support for the "backlog" parameter
- [OPTIM] introduce global parameter "tune.maxaccept"
- [MEDIUM] introduce "timeout http-request" in frontends
- [MINOR] tarpit timeout is also allowed in backends
- [BUG] increment server connections for each connect()
- [MEDIUM] add a turn-around state of one second after a connection failure
- [BUG] fix typo in redispatched connection
- [DOC] document options nolinger to ssl-hello-chk
- [DOC] added documentation for "option tcplog" to "use_backend"
- [BUG] connect_server: server might not exist when sending error report
- [MEDIUM] support fully transparent proxy on Linux (USE_LINUX_TPROXY)
- [MEDIUM] add non-local bind to connect() on Linux
- [MINOR] add transparent proxy support for balabit's Tproxy v4
- [BUG] use backend's source and not server's source with tproxy
- [BUG] fix overlapping server flags
- [MEDIUM] fix server health checks source address selection
- [BUG] build failed on CONFIG_HAP_LINUX_TPROXY without CONFIG_HAP_CTTPROXY
- [DOC] added "server", "source" and "stats" keywords
- [DOC] all server parameters have been documented
- [DOC] document all req* and rsp* keywords.
- [DOC] added documentation about HTTP header manipulations
- [BUG] log response byte count, not request
- [BUILD] code did not build in full debug mode
- [BUG] fix truncated responses with sepoll
- [MINOR] use s->frt_addr as the server's address in transparent proxy
- [MINOR] fix configuration hint about timeouts
- [DOC] minor cleanup of the doc and notice to contributors
- [MINOR] report correct section type for unknown keywords.
- [BUILD] update MacOS Makefile to build on newer versions
- [DOC] fix erroneous "useallbackups" option in the doc
- [DOC] applied small fixes from early readers
- [MINOR] add configuration support for "redir" server keyword
- [MEDIUM] completely implement the server redirection method
- [TESTS] add a test case for the server redirection mechanism
- [DOC] add a configuration entry for "server ... redir <prefix>"
- [BUILD] backend.c and checks.c did not build without tproxy !
- Revert "[BUILD] backend.c and checks.c did not build without tproxy !"
- [BUILD] backend.c and checks.c did not build without tproxy !
- [OPTIM] used unsigned ints for HTTP state and message offsets
- [OPTIM] GCC4's builtin_expect() is suboptimal
- [BUG] failed conns were sometimes incremented in the frontend!
- [BUG] timeout.check was not pre-set to eternity
- [TESTS] add test-pollers.cfg to easily report pollers in use
- [BUG] do not apply timeout.connect in checks if unset
- [BUILD] ensure that makefile understands USE_DLMALLOC=1
- [MINOR] silent gcc for a wrong warning
- [CLEANUP] update .gitignore to ignore more temporary files
- [CLEANUP] report dlmalloc's source path only if explictly specified
- [BUG] str2sun could leak a small buffer in case of error during parsing
- [BUG] option allbackups was not working anymore in roundrobin mode
- [MAJOR] implementation of the "leastconn" load balancing algorithm
- [BUILD] ensure that users don't build without setting the target anymore.
- [DOC] document the leastconn LB algo
- [MEDIUM] fix stats socket limitation to 16 kB
- [DOC] fix unescaped space in httpchk example.
- [BUG] fix double-decrement of server connections
- [TESTS] add a test case for port mapping
- [TESTS] add a benchmark for integer hashing
- [TESTS] add new methods in ip-hash test file
- [MAJOR] implement parameter hashing for POST requests
This new parameter makes it possible to override the default
number of consecutive incoming connections which can be
accepted on a socket. By default it is not limited on single
process mode, and limited to 8 in multi-process mode.
The build process was getting annoying under some conditions,
especially on platforms which are used to set CFLAGS, as well
as those which set a lot of complex defines. The new Makefile
takes care of this situation by not mixing TARGET, CPU and user
values, and by making privileging the pre-setting of common
variables with the ability to override them.
Now CFLAGS and LDFLAGS are set by default and may be overridden
without the risk of breaking useful defines. Options are better
dealt with, and as a bonus, it was possible to merge the FreeBSD
and OpenBSD targets into the common GNU Makefile.
The report of build options by "haproxy -vv" has been slightly
adapted to the new mode. Options implied by architecture are not
reported, only user-specified options are. It is also possible to
add options which will not be reported in order not to mangle the
output when specifying dirty informations such as URLs...
The Makefile was copiously documented and it should be easier to
build for any target now. Backwards compatibility with older
build processes was kept, and warnings are emitted for deprecated
build options.
The error check in return of start_proxies checked for exact ERR_RETRYABLE
but did not consider the return as a bit field. The function returned both
ERR_RETRYABLE and ERR_ALERT, hence the problem.
Sometimes it is useful to find out how a given binary version was
built. The build compiler and options are now provided for this,
and it's possible to get them with the -vv option.
Under certain circumstances, it is very useful to be able to fail some
monitor requests. One specific case is when the number of servers in
the backend falls below a certain level. The new "monitor fail" construct
followed by either "if"/"unless" <condition> makes it possible to specify
ACL-based conditions which will make the monitor return 503 instead of
200. Any number of conditions can be passed. Another use may be to limit
the requests to local networks only.
It is very convenient for SNMP monitoring to have unique process ID,
proxy ID and server ID. Those have been added to the CSV outputs.
The numbers start at 1. 0 is reserved. For servers, 0 means that the
reported name is not a server name but half a proxy (FRONTEND/BACKEND).
A remaining hidden "-" in the CSV output has been eliminated too.
Some applications do not have a strict persistence requirement, yet
it is still desirable for performance considerations, due to local
caches on the servers. For some reasons, there are some applications
which cannot rely on cookies, and for which the last resort is to use
a parameter passed in the URL.
The new 'url_param' balance method is there to solve this issue. It
accepts a parameter name which is looked up from the URL and which
is then hashed to select a server. If the parameter is not found,
then the round robin algorithm is used in order to provide a normal
load balancing across the servers for the first requests. It would
have been possible to use a source IP hash instead, but since such
applications are generally buried behind multiple levels of
reverse-proxies, it would not provide a good balance.
The doc has been updated, and two regression testing configurations
have been added.
localtime() was called with pointers to tv_sec, which is time_t on
some platforms and long on others. A problem was encountered on
Sparc64 under OpenBSD where tv_sec is long (64 bits) and time_t is
32 bits. Since this architecture is big-endian, it exhibited the
bug because localtime() always worked with the high part of the
value which is always zero. This problem was identified and debugged
by Thierry Fournier.
The correct solution is to pass the date by value and not by pointer,
through an intermediate function. The use of localtime_r() instead of
localtime() also made it possible to get rid of the first call to
localtime() since it does not need to allocate memory anymore.
Removed old unused MODE_LOG and MODE_STATS, and replaced the "stats"
keyword in the global section. The new "stats" keyword in the global
section is used to create a UNIX socket on which the statistics will
be accessed. The client must issue a "show stat\n" command in order
to get a CSV-formated output similar to the output on the HTTP socket
in CSV mode.
A new generic protocol mechanism has been added. It provides
an easy method to implement new protocols with different
listeners (eg: unix sockets).
The listeners are automatically started at the right moment
and enabled after the possible fork().
This must come from a copy-paste typo: in the unlikely event that
fork() would fail, the parent process would only exit(1) if there
were old pids. That's non-sense.
When one server appears at the same position in multiple backends, it
receives all the checks from all the backends exactly at the same time
because the health-checks are only spread within a backend but not
globally.
Attached patch implements per-server start delay in a different way.
Checks are now spread globally - not locally to one backend. It also makes
them start faster - IMHO there is no need to add a 'server->inter' when
calculating first execution. Calculation were moved from cfgparse.c to
checks.c. There is a new function start_checks() and now it is not called
when haproxy is started in MODE_CHECK.
With this patch it is also possible to set a global 'spread-checks'
parameter. It takes a percentage value (1..50, probably something near
5..10 is a good idea) so haproxy adds or removes that many percent to the
original interval after each check. My test shows that with 18 backends,
54 servers total and 10000ms/5% it takes about 45m to mix them completely.
I decided to use rand/srand pseudo-random number generator. I am aware it
is not recommend for a good randomness but a) we do not need a good random
generator here b) it is probably the most portable one.
The following patch will give the ability to tweak socket linger mode.
You can use this option with "option nolinger" inside fronted or backend
configuration declaration.
This will help in environments where lots of FIN_WAIT sockets are
encountered.
This patch fixes a nasty bug raported by both glibc and valgrind, which
leads into a problem that haproxy does not exit when a new instace
starts ap (-sf/-st).
==9299== Invalid free() / delete / delete[]
==9299== at 0x401D095: free (in
/usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==9299== by 0x804A377: deinit (haproxy.c:721)
==9299== by 0x804A883: main (haproxy.c:1014)
==9299== Address 0x41859E0 is 0 bytes inside a block of size 21 free'd
==9299== at 0x401D095: free (in
/usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==9299== by 0x804A84B: main (haproxy.c:985)
==9299==
6542 open("/dev/tty", O_RDWR|O_NONBLOCK|O_NOCTTY) = -1 ENOENT (No such file
or directory)
6542 writev(2, [{"*** glibc detected *** ", 23}, {"corrupted double-linked
list", 28}, {": 0x", 4}, {"6ff91878", 8}, {" ***\n", 5}], 5) = -1 EBADF (Bad
file descriptor)
I found this bug trying to find why, after one week with many restarts, I
finished with >100 haproxy process running. ;)
The deinit() function is specialized in memory area freeing.
There were a ton of information that were not released at the
exit time, which made valgrind complain. Now, most of the entries
are freed. However, it seems like regfree() does not completely
free a regex (12 bytes lost per regex).
By default, epoll/kqueue used to return as many events as possible.
This could sometimes cause huge latencies (latencies of up to 400 ms
have been observed with many thousands of fds at once). Limiting the
number of events returned also reduces the latency by avoiding too
many blind processing. The value is set to 200 by default and can be
changed in the global section using the tune.maxpollevents parameter.
When we're interrupted by another instance, it is very likely
that the other one will need some memory. Now we know how to
free what is not used, so let's do it.
Also only free non-null pointers. Previously, pool_destroy()
did implicitly check for this case which was incidentely
needed.
Also during this process, a bug was found in appsession_refresh().
It would not automatically requeue the task in the queue, so the
old sessions would not vanish.
The timeout functions were difficult to manipulate because they were
rounding results to the millisecond. Thus, it was difficult to compare
and to check what expired and what did not. Also, the comparison
functions were heavy with multiplies and divides by 1000. Now, all
timeouts are stored in timevals, reducing the number of operations
for updates and leading to cleaner and more efficient code.
The rbtree-based wait queue consumes a lot of CPU. Use the ul2tree
instead. Lots of cleanups and code reorganizations made it possible
to reduce the task struct and simplify the code a bit.
The principle behind speculative I/O is to speculatively try to
perform I/O before registering the events in the system. This
considerably reduces the number of calls to epoll_ctl() and
sometimes even epoll_wait(), and manages to increase overall
performance by about 10%.
The new poller has been called "sepoll". It is used by default
on Linux when it works. A corresponding option "nosepoll" and
the command line argument "-ds" allow to disable it.
Gcc provides __attribute__((constructor)) which is very convenient
to execute functions at startup right before main(). All the pollers
have been converted to have their register() function declared like
this, so that it is not necessary anymore to call them from a centralized
file.
Some pollers such as kqueue lose their FD across fork(), meaning that
the registered file descriptors are lost too. Now when the proxies are
started by start_proxies(), the file descriptors are not registered yet,
leaving enough time for the fork() to take place and to get a new pollfd.
It will be the first call to maintain_proxies that will register them.
select, poll and epoll now have their dedicated functions and have
been split into distinct files. Several FD manipulation primitives
have been provided with each poller.
The rest of the code needs to be cleaned to remove traces of
StaticReadEvent/StaticWriteEvent. A trick involving a macro has
temporarily been used right now. Some work needs to be done to
factorize tests and sets everywhere.
logs are handled better with dedicated functions. The HTTP implementation
moved to proto_http.c. It has been cleaned up a bit. Now a frontend with
option httplog and no log will not call the function anymore.
Previously, use of the "usesrc" keyword could silently fail if
either the module was not loaded, or the user did not have enough
permissions. Now the errors are better diagnosed and more appropriate
advices are given.
- stats now support the HEAD method too
- extracted http request from the session
- huge rework of the HTTP parser which is now a 28-state FSM.
- linux-style likely/unlikely macros for optimization hints
- do not create a server socket when there's no server
This patch from Sin Yu makes use of an rbtree for the wait queue,
which will solve the slowdown problem encountered when timeouts
are heterogenous in the configuration. The next step will be to
turn maintain_proxies() into a per-proxy task so that we won't
have to scan them all after each poll() loop.
The tcp-splicing code has been merged, and a doc has been written.
A configuration example has been derived from the previous content
switching sample.
It is now possible to define an errorloc in the backend as well as
in the frontend. The backend's will be used first, and if undefined,
then the frontend's will be used instead. If none is used, then the
original error messages will be used.
The nbconn attribute in the proxies was not relevant anymore because
a frontend A may use backend B and both of them must account for their
respective connections. For this reason, there now are two separate
counters for frontend and backend connections.
The stats page has been updated to reflect the backend, but a separate
line entry for the frontend with error counts would be good.
Note that as of now, beconn may be higher than maxconn, because maxconn
applies to the frontend, while beconn may be increased due to sessions
passed from another frontend.
The files are now stored under :
- include/haproxy for the generic includes
- include/types.h for the structures needed within prototypes
- include/proto.h for function prototypes and inline functions
- src/*.c for the C files
Most include files are now covered by LGPL. A last move still needs
to be done to put inline functions under GPL and not LGPL.
Version has been set to 1.3.0 in the code but some control still
needs to be done before releasing.