This patch has 2 goals :
1. I wanted to test the appsession feature with a small PHP code,
using PHPSESSID. The problem is that when PHP gets an unknown session
id, it creates a new one with this ID. So, when sending an unknown
session to PHP, persistence is broken : haproxy won't see any new
cookie in the response and will never attach this session to a
specific server.
This also happens when you restart haproxy : the internal hash becomes
empty and all sessions lose their persistence (load balancing the
requests on all backend servers, creating a new session on each one).
For a user, it's like the service is unusable.
The patch modifies the code to make haproxy also learn the persistence
from the client : if no session is sent from the server, then the
session id found in the client part (using the URI or the client cookie)
is used to associate the session with the server that gave the response.
As it's probably not a feature usable in all cases, I added an option
to enable it (by default it's disabled). The syntax of appsession becomes :
    appsession <cookie> len <length> timeout <holdtime> [request-learn]
This helps haproxy repair the persistence (with the risk of the user
losing the session at the next request, as the user will probably not
be load balanced to the same server the first time).
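The learning rule can be sketched as follows. This is a simplified Python model, not haproxy's C code; the class name `SessionTable`, its methods and the dictionary layout are assumptions made for illustration only.

```python
# Simplified model of appsession learning; all names are illustrative.
class SessionTable:
    def __init__(self, request_learn=False):
        self.request_learn = request_learn
        self.sessions = {}  # session id -> server name

    def lookup(self, request_sid):
        """Return the server tied to a session id, or None if unknown."""
        return self.sessions.get(request_sid)

    def learn(self, request_sid, response_sid, server):
        """Record the association once the server has answered."""
        if response_sid is not None:
            # Normal case: the server issued a session id in its response.
            self.sessions[response_sid] = server
        elif self.request_learn and request_sid is not None:
            # request-learn: fall back to the id sent by the client.
            self.sessions[request_sid] = server
```

Without `request_learn`, a response carrying no session id leaves the table untouched, which is the behaviour that breaks persistence after a restart.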
2. This patch also tries to reduce the memory usage.
Here is a little example to explain the current behaviour :
- Take a Tomcat server where /session.jsp is valid.
- Send a request using a cookie with an unknown value AND a path
parameter with another unknown value :
    curl -b "JSESSIONID=12345678901234567890123456789012" "http://<haproxy>/session.jsp;jsessionid=00000000000000000000000000000001"
(I know, it's unexpected to have a request like that on a live service)
Here, haproxy finds the URI session ID and stores it in its internal
hash (with no server associated). But it also finds the cookie session
ID and stores it again.
- As a result, session.jsp sends a new session ID also stored in the
internal hash, with a server associated.
=> For 1 request, haproxy has stored 3 entries, only 1 of which will be usable.
The patch modifies the behaviour to store only 1 entry (maximum).
This can ensure that data is readily available on a socket when
we accept it, but a bug in the kernel ignores the timeout so the
socket can remain pending as long as the client does not talk.
Use with care.
Released version 1.4-dev4 with the following main changes :
- [DOC] add missing rate_lim and rate_max
- [MAJOR] struct chunk rework
- [MEDIUM] Health check reporting code rework + health logging, v3
- [BUG] check if rise/fall has an argument and it is > 0
- [MINOR] health checks logging unification
- [MINOR] add "description", "node" and "show-node"/"show-desc", remove "node-name", v2
- [MINOR] Allow dots in show-node & add "white-space: nowrap" in th.pxname.
- [DOC] Add information about http://haproxy.1wt.eu/contrib.html
- [MINOR] Introduce include/types/counters.h
- [CLEANUP] Move counters to dedicated structures
- [MINOR] Add "clear counters" to clear statistics counters
- [MEDIUM] Collect & provide separate statistics for sockets, v2
- [BUG] Fix NULL pointer dereference in stats_check_uri_auth(), v2
- [MINOR] acl: don't report valid acls as potential mistakes
- [MINOR] Add cut_crlf(), ltrim(), rtrim() and alltrim()
- [MINOR] Add chunk_htmlencode and chunk_asciiencode
- [MINOR] Capture & display more data from health checks, v2
- [BUG] task.c: don't assign last_timer to node-less entries
- [BUG] http stats: large outputs sometimes got some parts chopped off
- [MINOR] backend: export some functions to recount servers
- [MINOR] backend: uninline some LB functions
- [MINOR] include time.h from freq_ctr.h as it uses "now".
- [CLEANUP] backend: move LB algos to individual files
- [MINOR] lb_map: reorder code in order to ease integration of new hash functions
- [CLEANUP] proxy: move last lb-specific bits to their respective files
- [MINOR] backend: separate declarations of LB algos from their lookup method
- [MINOR] backend: reorganize the LB algorithm selection
- [MEDIUM] backend: introduce the "static-rr" LB algorithm
- [MINOR] report list of supported pollers with -vv
- [DOC] log-health-checks is an option, not a directive
- [MEDIUM] new option "independant-streams" to stop updating read timeout on writes
- [BUG] stats: don't call buffer_shutw(), but ->shutw() instead
- [MINOR] stats: strip CR and LF from the input command line
- [BUG] don't refresh timeouts late after detected activity
- [MINOR] stats_dump_errors_to_buffer: use buffer_feed_chunk()
- [MINOR] stats_dump_sess_to_buffer: use buffer_feed_chunk()
- [MINOR] stats: make stats_dump_raw_to_buffer() use buffer_feed_chunk
- [MEDIUM] stats: don't use s->ana_state anymore
- [MINOR] remove now obsolete ana_state from the session struct
- [MEDIUM] stats: make HTTP stats use an I/O handler
- [MEDIUM] stream_int: adjust WAIT_ROOM handling
- [BUG] config: look for ID conflicts in all sockets, not only last ones.
- [MINOR] config: reference file and line with any listener/proxy/server declaration
- [MINOR] config: report places of duplicate names or IDs
- [MINOR] config: add pointer to file name in block/redirect/use_backend/monitor rules
- [MINOR] tools: add a new get_next_id() function
- [MEDIUM] config: automatically find unused IDs for proxies, servers and listeners
- [OPTIM] counters: move some max numbers to the counters struct
- [BUG] counters: fix segfault on missing counters for a listener
- [MEDIUM] backend: implement consistent hashing variation
- [MINOR] acl: add fe_conn, be_conn, queue, avg_queue
- [MINOR] stats: use 'clear counters all' to clear all values
- [MEDIUM] add access restrictions to the stats socket
- [MINOR] buffers: add buffer_feed2() and make buffer_feed() measure string length
- [MINOR] proxy: provide function to retrieve backend/server pointers
- [MINOR] add the "initial weight" to the server struct.
- [MEDIUM] stats: add the "get weight" command to report a server's weight
- [MEDIUM] stats: add the "set weight" command
- [BUILD] add a 'make tags' target
- [MINOR] stats: add support for numeric IDs in set weight/get weight
- [MINOR] stats: use a dedicated state to output static data
- [OPTIM] stats: check free space before trying to print
Krzysztof reported that using names only for get weight/set weight
was not enough because it's still possible to have multiple servers
with the same name (and my test config is one of those). He suggested
designating them by their unique numeric IDs, prefixing each ID with a
hash sign ("#").
That way we can have :
    get weight #120/#2
as well as
    set weight static/srv1 10
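A minimal sketch of how such a reference could be split into names and numeric IDs is shown below. It is written in Python for brevity; `parse_ref` and the `("id", n)` / `("name", s)` tuples are hypothetical and do not mirror the actual parser.

```python
def parse_ref(ref):
    """Split 'backend/server' into two (kind, value) pairs.

    A leading '#' marks a numeric ID; anything else is taken as a name.
    """
    backend, sep, server = ref.partition("/")
    if not sep or not backend or not server:
        raise ValueError("expected <backend>/<server>")

    def classify(token):
        if token.startswith("#"):
            return ("id", int(token[1:]))  # int() rejects a bare '#'
        return ("name", token)

    return classify(backend), classify(server)
```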
The stats socket can now run at 3 different levels :
- user
- operator (default one)
- admin
These levels are used to restrict access to some information
and commands. Only the admin can clear all stats. A user can neither
clear anything nor access sensitive data such as sessions or
errors.
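The level check can be pictured as a simple comparison. The sketch below is an assumption-laden Python model: the `REQUIRED` table is derived from the description above, not copied from the code, and unlisted commands are treated as unrestricted for the sake of the example.

```python
from enum import IntEnum

class Level(IntEnum):
    USER = 1
    OPERATOR = 2   # default level
    ADMIN = 3

# Minimum level per command; assumed mapping derived from the text above.
REQUIRED = {
    "show sess": Level.OPERATOR,
    "show errors": Level.OPERATOR,
    "clear counters": Level.OPERATOR,
    "clear counters all": Level.ADMIN,
}

def allowed(level, command):
    """True if a socket running at `level` may run `command`."""
    # Commands that are not listed are treated as unrestricted here.
    return level >= REQUIRED.get(command, Level.USER)
```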
The most common use of "clear counters" should be to only clear
max values without affecting cumulated values, for instance,
after an incident. So we change "clear counters" to only clear
max values, and add "clear counters all" to clear all counters.
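The difference between the two commands can be modelled in a few lines. This is only a sketch: the `Counters` class and its field names are hypothetical and cover a single session counter.

```python
class Counters:
    def __init__(self):
        self.cum_sess = 0   # cumulated value
        self.cur_sess = 0   # current value, never cleared
        self.max_sess = 0   # highest value seen so far

    def session_start(self):
        self.cum_sess += 1
        self.cur_sess += 1
        self.max_sess = max(self.max_sess, self.cur_sess)

    def session_end(self):
        self.cur_sess -= 1

    def clear(self, everything=False):
        """'clear counters' resets max values; 'clear counters all' resets all."""
        self.max_sess = 0
        if everything:
            self.cum_sess = 0
```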
These ACLs are used to check the number of active connections on the
frontend, backend or in a backend's queue. The avg_queue returns the
average number of queued connections per server, and for this, divides
the total number of queued connections by the number of alive servers.
The dst_conn ACL has been slightly changed to better reflect its name and
original usage, which is to return the number of connections on the
destination address/port (the socket) and not the whole frontend.
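The avg_queue computation amounts to one division. The Python sketch below assumes integer division and simply refuses to answer when no server is alive; both choices are assumptions of the example, not statements about the real code.

```python
def avg_queue(total_queued, alive_servers):
    """Average number of queued connections per alive server."""
    if alive_servers <= 0:
        # Assumption: the sketch raises rather than guessing a value
        # for a farm with no alive server.
        raise ValueError("no alive server")
    # Assumption: the result is truncated to an integer.
    return total_queued // alive_servers
```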
Consistent hashing provides some interesting advantages over common
hashing. It avoids full redistribution in case of a server failure,
or when expanding the farm. This has a cost, however : the hashing is
far from perfect, as we associate a server with a request by
searching for the server with the closest key in a tree. Since servers
appear multiple times based on their weights, it is recommended to
use weights larger than approximately 10-20 in order to smooth
the distribution a bit.
In some cases, playing with weights will be the only solution to
make a server appear more often and increase chances of being picked,
so stats are very important with consistent hashing.
In order to indicate the type of hashing, use :
    hash-type map-based  (default, old one)
    hash-type consistent (new one)
Consistent hashing can make sense in a cache farm, in order not
to redistribute everyone when a cache changes state. It could also
probably be used for long sessions such as terminal sessions, though
that has not been attempted yet.
More details on this method of hashing here :
http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/
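The idea can be illustrated with a small Python model. It is a sketch under several assumptions: it uses a sorted list instead of a tree, md5 as an arbitrary hash, and the common "next node on the ring" rule rather than the closest-key lookup described above. `ConsistentRing` and `_key` are hypothetical names.

```python
import bisect
import hashlib

def _key(data):
    """32-bit key from a string (md5 is an arbitrary choice for the sketch)."""
    return int.from_bytes(hashlib.md5(data.encode()).digest()[:4], "big")

class ConsistentRing:
    def __init__(self):
        self.keys = []     # sorted node keys
        self.owner = {}    # node key -> server name

    def add(self, server, weight):
        # A server appears `weight` times, so larger weights smooth the spread.
        for i in range(weight):
            k = _key(f"{server}#{i}")
            if k not in self.owner:
                bisect.insort(self.keys, k)
                self.owner[k] = server

    def remove(self, server):
        self.keys = [k for k in self.keys if self.owner[k] != server]
        self.owner = {k: s for k, s in self.owner.items() if s != server}

    def pick(self, request_key):
        """Return the server owning the first node at or after the hash."""
        if not self.keys:
            return None
        i = bisect.bisect_left(self.keys, _key(request_key))
        return self.owner[self.keys[i % len(self.keys)]]
```

When a server is removed, only the requests that were mapped to it move elsewhere; all other requests keep their server, which is the property that matters for a cache farm.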
Until now it was required that every custom ID was above 1000 in order to
avoid conflicts. Now we have the list of all assigned IDs and can automatically
pick the first unused one. This means that it is perfectly possible to interleave
automatic IDs with persistent IDs and the parser will automatically allocate
unused values starting with 1.
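A minimal Python sketch of the allocation follows. The helper names `next_free_id` and `assign_ids` are hypothetical; the real parser works on its own structures.

```python
def next_free_id(used, start=1):
    """Return the smallest ID >= start that is not in `used`."""
    candidate = start
    while candidate in used:
        candidate += 1
    return candidate

def assign_ids(explicit_ids, count_automatic):
    """Give automatic IDs to `count_automatic` objects around explicit ones."""
    used = set(explicit_ids)
    assigned = []
    for _ in range(count_automatic):
        new_id = next_free_id(used)
        used.add(new_id)
        assigned.append(new_id)
    return assigned
```

With explicit IDs 1, 2 and 5 already taken, three automatic objects would receive 3, 4 and 6.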
This patch makes it possible to collect & provide separate statistics for
each socket. It can be very useful if you would like to distinguish between
traffic generated by local and remote users or between different types of
remote clients (peerings, domestic, foreign).
Currently no "Session rate" is supported, but adding it should be possible
if we find it useful.
By default, when data is sent over a socket, both the write timeout and the
read timeout for that socket are refreshed, because we consider that there is
activity on that socket, and we have no other means of guessing if we should
receive data or not.
While this default behaviour is desirable for almost all applications, there
exists a situation where it is desirable to disable it, and only refresh the
read timeout if there are incoming data. This happens on sessions with large
timeouts and low amounts of exchanged data such as telnet sessions. If the
server suddenly disappears, the output data accumulates in the system's
socket buffers, both timeouts are correctly refreshed, and there is no way
to know the server does not receive them, so we don't time out. However, when
the underlying protocol always echoes sent data, it would be enough by itself
to detect the issue using the read timeout. Note that this problem does not
happen with more verbose protocols because data won't accumulate long in the
socket buffers.
When this option is set on the frontend, it will disable read timeout updates
on data sent to the client. There is probably little use for this case. When
the option is set on the backend, it will disable read timeout updates on
data sent to the server. Doing so will typically break large HTTP posts from
slow lines, so use it with caution.
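The effect of the option on timeout refreshing can be sketched in Python as below. `SocketTimers` and its `independent` flag are hypothetical names, and time is a plain number rather than a real clock.

```python
class SocketTimers:
    def __init__(self, read_timeout, write_timeout, independent=False):
        self.read_timeout = read_timeout
        self.write_timeout = write_timeout
        self.independent = independent  # models the option being set
        self.read_expire = None
        self.write_expire = None

    def on_read(self, now):
        self.read_expire = now + self.read_timeout

    def on_write(self, now):
        self.write_expire = now + self.write_timeout
        if not self.independent:
            # Default behaviour: a write also counts as read activity.
            self.read_expire = now + self.read_timeout

    def read_timed_out(self, now):
        return self.read_expire is not None and now >= self.read_expire
```

With the default behaviour, a peer that keeps writing into a dead connection never trips the read timeout; with the option set, the read timeout expires as soon as nothing comes back.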
The "static-rr" is just the old round-robin algorithm. It is still
in use when a hash algorithm is used and the data to hash is not
present, but it was impossible to configure it explicitly. This one
is cheaper in terms of CPU and supports unlimited numbers of servers,
so it makes sense to be able to use it.
This patch implements "description" (proxy and global) and "node" (global)
options, removes "node-name" and adds "show-node" & "show-desc" options
for "stats". It also changes the way the header lines (with proxy name) and
the statistics are displayed, so stats no longer look so clumsy with very
long names.
Instead of "node-name" it is possible to use show-node/show-desc with
an optional parameter that overrides a default node/description.
    backend cust-0045
        # report specific values for this customer
        stats show-node Europe
        stats show-desc Master node for Europe, Asia, Africa
This patch adds health logging so it is possible to check what
was happening before a crash. Failed health checks are logged if
the server is UP and successful health checks if the server is DOWN,
so the amount of additional information is limited.
I also reworked the code a little:
- check_status_description[] and check_status_info[] are now
merged into check_statuses[]
- set_server_check_status updates not only s->check_status and
s->check_duration but also s->result making the code simpler
Changes in v3:
- for now calculate and use local versions of health/rise/fall/state,
it is a slow path, no harm should be done. One day we may centralize
processing of the checks and remove the duplicated code.
- also log checks that are restoring current state
- use "conditionally succeeded" for 404 with disable-on-404
We can get rid of the stats analyser by moving all the stats code
to a stream interface applet. Besides being cleaner, it provides new
advantages such as the ability to process requests and responses
from the same function and work only with simple state machines.
There's no need for any hijack hack anymore.
The direct advantages for the user are the interactive mode and the
ability to chain several commands delimited by a semi-colon. Now if
the user types "prompt", he gets a prompt from which he can send
as many requests as he wants. All outputs are terminated by a
blank line followed by a new prompt, so this can be used from
external tools too.
The code is not very clean and needs some rework, but some of the
dirty parts are due to the remnants of the hijack mode used
in the old functions we call.
The old AN_REQ_STATS_SOCK analyser flag is now unused and has been
removed.
Collect information about the last health check result,
including the L7 code if possible (for example the http or smtp
return code) and the time taken to finish the last check.
Health check info is provided on both stats pages (html & csv)
and logged when a server is marked UP or DOWN. Currently active
checks are marked with an asterisk, but only in html mode.
Currently there are 14 status codes:
  UNK     -> unknown
  INI     -> initializing
  SOCKERR -> socket error
  L4OK    -> check passed on layer 4, no upper layers testing enabled
  L4TOUT  -> layer 1-4 timeout
  L4CON   -> layer 1-4 connection problem, for example "Connection refused"
             (tcp rst) or "No route to host" (icmp)
  L6OK    -> check passed on layer 6
  L6TOUT  -> layer 6 (SSL) timeout
  L6RSP   -> layer 6 invalid response - protocol error
  L7OK    -> check passed on layer 7
  L7OKC   -> check conditionally passed on layer 7, for example
             404 with disable-on-404
  L7TOUT  -> layer 7 (HTTP/SMTP) timeout
  L7RSP   -> layer 7 invalid response - protocol error
  L7STS   -> layer 7 response error, for example HTTP 5xx
HTTP supports status codes 100 and 101 to report protocol indications,
which are followed by the request's response. Until now, haproxy would
only see those responses without parsing subsequent ones. That means
that cookie additions were only performed on 1xx messages for instance,
which does not work since headers must be ignored with 1xx messages.
Also, logs were not terribly useful with the common 100 status code
in response to "Expect: 100-continue" during some POST requests.
This change adds support for such messages. Now haproxy sees them,
forwards them and skips them until it finds a correct response, which
it logs and processes. As an exception, header removal/rewriting still
works on 1xx responses in order to be able to strip out sensitive
information that may have accidentally been left by another equipment
(possibly an older haproxy itself). Header additions, however, are
disabled on these responses.
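The handling of interim responses can be summarised by the following Python sketch. It models messages as `(status, headers)` tuples, which is an assumption of the example; `forward_responses` is a hypothetical name and header removal/rewriting is left out.

```python
def forward_responses(responses, extra_headers):
    """Forward every message, adding headers only to the final response.

    `responses` is a list of (status, headers) tuples in arrival order.
    Returns (forwarded, logged_status).
    """
    forwarded = []
    for status, headers in responses:
        if 100 <= status < 200:
            # Interim message: forwarded without any header addition.
            forwarded.append((status, dict(headers)))
            continue
        final = dict(headers)
        final.update(extra_headers)
        forwarded.append((status, final))
        return forwarded, status   # the final status is the one logged
    return forwarded, None         # only interim messages seen so far
```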
This change brings the ability to loop on response without data, which
is a starting point to support keepalive. The change is marked as major
as a few fixes had to be performed in the HTTP message parser.
The new tune.bufsize and tune.maxrewrite global directives allow one to
change the buffer size and the maxrewrite size. Right now, setting bufsize
too low will block stats sockets which will not be able to write at all.
Error checking must be added to buffer_write_chunk() so that if it
cannot write its message to an empty buffer, it causes the caller to abort.
This Linux-specific option was never really used in production and
has since been superseded by new splicing options brought by recent
Linux kernels.
It caused several particular cases in the code because the kernel
would take care of the session without haproxy being able to do
anything on it, which became hard to handle in the new architecture.
Let's simply get rid of it now that there is a replacement available.
The new "node-name" stats setting enables reporting of a node ID on
the stats page. It is possible to return the system's host name as
well as a specific name.
Released version 1.4-dev1 with the following main changes :
- [MINOR] acl: add support for matching of RDP cookies
- [MEDIUM] add support for RDP cookie load-balancing
- [MEDIUM] add support for RDP cookie persistence
- [MINOR] add a new CLF log format
- [MINOR] startup: don't imply -q with -D
- [BUG] ensure that we correctly re-start old process in case of error
- [MEDIUM] add support for binding to source port ranges during connect
- [MINOR] config: track "no option"/"option" changes
- [MINOR] config: support resetting options to default values
- [MEDIUM] implement option tcp-smart-accept at the frontend
- [MEDIUM] stream_sock: implement tcp-cork for use during shutdowns on Linux
- [MEDIUM] implement tcp-smart-connect option at the backend
- [MEDIUM] add support for TCP MSS adjustment for listeners
- [MEDIUM] support setting a server weight to zero
- [MINOR] make DEFAULT_MAXCONN user-configurable at build time
- [MAJOR] session: don't clear buffer status flags anymore
- [MAJOR] session: only check for timeouts when they have just occurred.
- [MAJOR] session: simplify buffer error handling
- [MEDIUM] config: split parser and checker in two functions
- [MEDIUM] config: support loading multiple configuration files
- [MEDIUM] stream_sock: don't close prematurely when nolinger is set
- [MEDIUM] session: rework buffer analysis to permit permanent analysers
- [MEDIUM] splice: set the capability on each stream_interface
- [BUG] http: redirect rules were processed too early
- [CLEANUP] remove unused DEBUG_PARSE_NO_SPEEDUP define
- [MEDIUM] http: split request waiter from request processor
- [MEDIUM] session: tell analysers what bit they were called for
- [MAJOR] http: complete splitting of the remaining stages
- [MINOR] report in the proxies the requirements for ACLs
- [MINOR] http: rely on proxy->acl_requires to allocate hdr_idx
- [MINOR] acl: add HTTP protocol detection (req_proto_http)
- [MINOR] prepare callers of session_set_backend to handle errors
- [BUG] default ACLs did not properly set the ->requires flag
- [MEDIUM] allow a TCP frontend to switch to an HTTP backend
- [MINOR] ensure we can jump from switching rules to http without data
- [MINOR] http: take http request timeout from the backend
- [MINOR] allow TCP inspection rules to make use of HTTP ACLs
- [BUILD] report commit date and not author's date as build date
- [MINOR] acl: don't complain anymore when using L7 acls in TCP
- [BUG] stream_sock: always shutdown(SHUT_WR) before closing
- [BUG] stream_sock: don't stop reading when the poller reports an error
- [BUG] config: tcp-request content only accepts "if" or "unless"
- [BUG] task: fix possible timer drift after update
- [MINOR] apply tcp-smart-connect option for the checks too
- [MINOR] stats: better displaying in MSIE
- [MINOR] config: improve error reporting in global section
- [MINOR] config: improve error reporting in listen sections
- [MINOR] config: the "capture" keyword is not allowed in backends
- [MINOR] config: improve error reporting when checking configuration
- [BUILD] fix a minor build warning on AIX
- [BUILD] use "git cmd" instead of "git-cmd"
- [CLEANUP] report 2009 not 2008 in the copyright banner.
- [MINOR] print usage on the stats sockets upon invalid commands
- [MINOR] acl: detect and report potential mistakes in ACLs
- [BUILD] fix incorrect printf arg count with tcp_splice
- [BUG] fix random pauses on last segment of a series
- [BUILD] add support for build under Cygwin
The new statement "persist rdp-cookie" enables RDP cookie
persistence. The RDP cookie is then extracted from the RDP
protocol, and compared against available servers. If a server
matches the RDP cookie, then it gets the connection.
This patch adds support for hashing RDP cookies in order to
use them as a load-balancing key. The new "rdp-cookie(name)"
load-balancing metric has to be used for this. It is still
mandatory to wait for an RDP cookie in the frontend, otherwise
it will only work randomly.
The RDP protocol is quite simple and documented, which permits
an easy detection and extraction of cookies. It can be useful
to match the MSTS cookie which can contain the username specified
by the client.
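As an illustration of the extraction, here is a rough Python sketch. It assumes the cookie line directly follows a 4-byte TPKT header and a 7-byte X.224 header in the connection request, so that the text starts at offset 11; the function name `rdp_cookie` is hypothetical and no other validation is attempted.

```python
def rdp_cookie(payload, name="mstshash"):
    """Return the value of an RDP cookie from a connection request, or None."""
    # Assumed layout: 4-byte TPKT header + 7-byte X.224 header, then
    # an ASCII line such as b"Cookie: mstshash=user\r\n".
    body = payload[11:]
    prefix = b"Cookie: " + name.encode("ascii") + b"="
    if not body.startswith(prefix):
        return None
    end = body.find(b"\r\n", len(prefix))
    if end < 0:
        return None          # line not complete yet
    return body[len(prefix):end].decode("ascii", "replace")
```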
Since we can now switch from TCP to HTTP, we need to be able to apply
the HTTP request timeout after switching. That means we need to take
it from the backend and not from the frontend. Since the backend points
to the frontend before switching, that changes nothing for the normal
case.
This patch allows a TCP frontend to switch to an HTTP backend.
During the switch, missing structures are automatically allocated.
The HTTP parser is enabled so that the backend first waits for a
full HTTP request.
Now that we can perform TCP-based content switching, it makes sense
to be able to detect HTTP traffic and act accordingly. We already
have an HTTP decoder, we just have to call it in order to detect HTTP
protocol. Note that since the decoder will automatically fill in the
interesting fields of the HTTP transaction, it would make sense to
use this parsing to extend HTTP matching to TCP.
The HTTP processing has been split into 7 steps, one of which
is no longer HTTP-specific (content-switching). That way, it
becomes possible to use "use_backend" rules in TCP mode. A new
"use_server" directive should follow soon.
Sometimes it can be useful to limit the advertised TCP MSS on
incoming connections, for instance when requests come through
a VPN or when the system is running with jumbo frames enabled.
Passing the "mss <value>" arguments to a "bind" line will set
the value. This works under Linux >= 2.6.28, and maybe a few
earlier ones, though due to an old kernel bug most earlier
versions will probably ignore it. It is also possible that some
other OSes will support this.
This new option enables combining of request buffer data with
the initial ACK of an outgoing TCP connection. Doing so saves
one packet per connection which is quite noticeable on workloads
mostly consisting of small objects. The option is not enabled by
default.
This option disables TCP quick ack upon accept. It is also
automatically enabled in HTTP mode, unless the option is
explicitly disabled with "no option tcp-smart-accept".
This saves one packet per connection which can save a reasonable
amount of bandwidth for servers processing small requests.
A new keyword prefix "default" has been introduced in order to
reset some options to their default values. This can be needed
for instance when an option is forced disabled or enabled in a
defaults section and when later sections want to use automatic
settings regardless of what was specified there. Right now it
is only supported by options, just like the "no" prefix.
Some users are already hitting the 64k source port limit when
connecting to servers. The system usually maintains a list of
unused source ports, regardless of the source IP they're bound
to. So in order to go beyond the 64k concurrent connections, we
have to manage the source ip:port lists ourselves.
The solution consists in assigning a source port range to each
server and using a free port in that range when connecting to that
server, either for a proxied connection or for a health check.
The port must then be put back into the server's range when the
connection is closed.
This mechanism is used only when a port range is specified on
a server. It makes it possible to reach 64k connections per
server, possibly all from the same IP address. Right now it
should be more than enough even for huge deployments.
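The bookkeeping boils down to taking a port from a per-server pool and giving it back on close. The Python sketch below uses a FIFO queue; `PortRange`, `acquire` and `release` are hypothetical names and the real allocator may order ports differently.

```python
from collections import deque

class PortRange:
    def __init__(self, low, high):
        # All ports of the range start out free.
        self.free = deque(range(low, high + 1))

    def acquire(self):
        """Take a free source port, or None when the range is exhausted."""
        return self.free.popleft() if self.free else None

    def release(self, port):
        """Put the port back once the connection is closed."""
        self.free.append(port)
```

Each server would own one such range, used both for proxied connections and for health checks.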
Released version 1.3.18 with the following main changes :
- [MEDIUM] add support for "balance hdr(name)"
- [CLEANUP] give a little bit more information in error message
- [MINOR] add X-Original-To: header
- [BUG] x-original-to: fix missing initialization to default value
- [BUILD] spec file: fix broken pipe during rpmbuild and add man file
- [MINOR] improve reporting of misplaced acl/reqxxx rules
- [MEDIUM] http: add options to ignore invalid header names
- [MEDIUM] http: capture invalid requests/responses even if accepted
- [BUILD] add format(printf) to printf-like functions
- [MINOR] fix several printf formats and missing arguments
- [BUG] stats: total and lbtot are unsigned
- [MINOR] fix a few remaining printf-like formats on 64-bit platforms
- [CLEANUP] remove unused make option from haproxy.spec
- [BUILD] make it possible to pass alternative arch at build time
- [MINOR] switch all stat counters to 64-bit
- [MEDIUM] ensure we don't recursively call pool_gc2()
- [CRITICAL] uninitialized response field can sometimes cause crashes
- [BUG] fix wrong pointer arithmetics in HTTP message captures
- [MINOR] rhel init script : support the reload operation
- [MINOR] add basic signal handling functions
- [BUILD] add signal.o to all makefiles
- [MEDIUM] call signal_process_queue from run_poll_loop
- [MEDIUM] pollers: don't wait if a signal is pending
- [MEDIUM] convert all signals to asynchronous signals
- [BUG] O(1) pollers should check their FD before closing it
- [MINOR] don't close stdio fds twice
- [MINOR] add options dontlog-normal and log-separate-errors
- [DOC] minor fixes and rearrangements
- [BUG] fix parser crash on unconditional tcp content rules
- [DOC] rearrange the configuration manual and add a summary
- [MINOR] standard: provide a new 'my_strndup' function
- [MINOR] implement per-logger log level limitation
- [MINOR] compute the max of sessions/s on fe/be/srv
- [MINOR] stats: report max sessions/s and limit in CSV export
- [MINOR] stats: report max sessions/s and limit in HTML stats
- [MINOR] stats/html: use the arial font before helvetica
Some people are using haproxy in a shared environment where the
system logger by default sends alert and emerg messages to all
consoles, which happens when all servers go down on a backend for
instance. These people can not always change the system configuration
and would like to limit the outgoing messages level in order not to
disturb the local users.
The addition of an optional 4th field on the "log" line permits
exactly this. The minimal log level ensures that all outgoing logs
will have at least this level. So the logs are not filtered out,
just set to this level.
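In terms of syslog severities, where a lower number means a more severe message, the rule is a simple clamp. The Python sketch below uses the standard syslog level names; `outgoing_level` is a hypothetical helper.

```python
# Standard syslog severities, from most severe (0) to least severe (7).
LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

def outgoing_level(level, min_level):
    """Clamp a message level so it is never more severe than `min_level`."""
    # A higher index means a less severe level, so the clamp is a max().
    return LEVELS[max(LEVELS.index(level), LEVELS.index(min_level))]
```

With the limit set to "notice", an "alert" message goes out as "notice" while an "info" message is left unchanged.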
Here is a patch of mine that allows balancing on any http header
field.
[WT:
made minor changes:
- turned 'balance header name' into 'balance hdr(name)' to match more
closely the ACL syntax for easier future convergence
- renamed the proxy structure fields header_* => hh_*
- made it possible to apply the domain name reduction to any header, not
only "host" since it makes sense to do it with other ones.
Otherwise patch looks good.
/WT]
Several people have asked for a summary in order to ease finding
of sections in the configuration manual. It was the opportunity to
tidy it up a bit and rearrange some sections.
Some big traffic sites have trouble dealing with logs and tend to
disable them. Here are two new options to help cope with massive
logs.
- dontlog-normal only disables logging for 100% successful
connections, other ones will still be logged
- log-separate-errors will cause non-100% successful connections
to be logged at level "err" instead of level "info" so that a
properly configured syslog daemon can send them to a different
file for longer retention.
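The combined effect of the two options can be written as a small decision function. This is a Python sketch with hypothetical names; `successful` stands for a 100% successful connection.

```python
def log_decision(successful, dontlog_normal=False, log_separate_errors=False):
    """Return the syslog level to use, or None when nothing is logged."""
    if successful:
        # dontlog-normal only silences fully successful connections.
        return None if dontlog_normal else "info"
    # log-separate-errors raises the other ones from "info" to "err".
    return "err" if log_separate_errors else "info"
```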