In TCP, we don't want to forward chunks of data, we want to forward
indefinitely. This patch introduces a special value for the amount
of data to be forwarded. When buffer_forward() is called with
BUF_INFINITE_FORWARD, it configures the buffer to never stop
forwarding until the end.
An abort during a connect would go to the SI_ST_CLO state without
the buffers shut. This was causing some sessions to never end if
they would abort before the connect request was initiated. This
bug has been introduced after 1.4-dev2.
The doc has been extended to reflect that too.
The BF_EMPTY flag was once used to indicate an empty buffer. However,
it was used half the time as meaning the buffer is empty for the reader,
and half the time as meaning there is nothing left to send.
"nothing to send" is only indicated by "->send_max=0 && !pipe". Once
we fix this, we discover that the flag is not used anymore. So the
flags has been renamed BF_OUT_EMPTY and means exactly the condition
above, ie, there is nothing to send.
Doing so has allowed us to remove some unused tests for emptiness,
but also to uncover a certain amount of situations where the flag
was not correctly set or tested.
The BF_WRITE_ENA buffer flag became very complex to deal with, because
it was used to :
- enable automatic connection
- enable close forwarding
- enable data forwarding
The last point was not very true anymore since we introduced ->send_max,
but still the test remained everywhere. This was causing issues such as
impossibility to connect without forwarding data, impossibility to prevent
closing when data was forwarded, etc...
This patch clarifies the situation by getting rid of this multi-purpose
flag and replacing it with :
- data forwarding based only on ->send_max || ->pipe ;
- a new BF_AUTO_CONNECT flag to allow automatic connection and only
that ;
- ability to perform an automatic connection when ->send_max or ->pipe
indicate that data is waiting to leave the buffer ;
- a new BF_AUTO_CLOSE flag to let the producer automatically set the
BF_SHUTW_NOW flag when it gets a BF_SHUTR.
During this cleanup, it was discovered that some tests were performed
twice, or that the BF_HIJACK flag was still tested, which is not needed
anymore since ->send_max replcaed it. These places have been fixed too.
These cleanups have also revealed a few areas where the other flags
such as BF_EMPTY are not cleanly used. This will be an opportunity for
a second patch.
By inlining this function and slightly reordering it, we can double
the getchar/putchar test throughput, and reduce its footprint by about
40 bytes. Also, it was the only non-inlined char-based function, which
now makes it more consistent this time.
HTTP supports status codes 100 and 101 to report protocol indications,
which are followed by the requests's response. Till now, haproxy would
only see those responses without parsing subsequent ones. That means
that cookie additions were only performed on 1xx messages for instance,
which does not work since headers must be ignored with 1xx messages.
Also, logs were not terribly useful with the common 100 status code
in response to "Expect: 100-continue" during POST some requests.
This change adds support for such messages. Now haproxy sees them,
forwards them and skips them until it finds a correct response, which
it logs and processes. As an exception, header removal/rewriting still
work on 1xx responses in order to be able to strip out sensible
information that may have accidentely been left by another equipment
(possibly an older haproxy itself). But headers addition are disabled
however.
This change brings the ability to loop on response without data, which
is a starting point to support keepalive. The change is marked as major
as a few fixes had to be performed in the HTTP message parser.
The stream_int_return() function used to call buffer_erase() on the response
buffer, which completely wipes it without taking care about whatever could
have been there. Now we more carefully strip only data not scheduled to be
sent.
This function is used to cut the "tail" of a buffer, which means strip it
to the length of unsent data only, and kill any remaining unsent data. Any
scheduled forwarding is stopped. This is mainly to be used to send error
messages after existing data. It does the same as buffer_erase() for buffers
without pending outgoing data.
The computations in buffer_forward() were only valid if buffer_forward()
was used on a buffer which had no more data scheduled for forwarding.
This is always the case right now so this bug is not yet triggered but
it will soon be. Now we correctly discount the bytes to be forwarded
from the data already present in the buffer.
This function works like a traditional putchar() except that it
can return 0 if the output buffer is full.
Now a basic character-based echo function would look like this, from
a stream interface :
while (1) {
c = buffer_si_peekchar(req);
if (c < 0)
break;
if (!buffer_si_putchar(res, c)) {
si->flags |= SI_FL_WAIT_ROOM;
break;
}
buffer_skip(req, 1);
req->flags |= BF_WRITE_PARTIAL;
res->flags |= BF_READ_PARTIAL;
}
The buffer_si_peekline() function is sort of a fgets() to be used from a
stream interface. It returns a complete line whenever possible, and does
not update the buffer's pointer, so that the reader is free to consume
what it wants to.
buffer_si_peekchar() only returns one character, and also needs a call
to buffer_skip() once the character is definitely consumed.
This functions act like their buffer_write*() counter-parts,
except that they're specifically designed to be used from a
stream interface handler, as they carefully check size limits
and automatically advance the read pointer depending on the
to_forward attribute.
buffer_feed_chunk() is an inline calling buffer_feed() as both
are the sames. For this reason, buffer_write_chunk() has also
been turned into an inline which calls buffer_write().
buffer_contig_space(), buffer_contig_data() and buffer_skip()
provide easy methods to extract/insert data from/into a buffer.
buffer_write() and buffer_write_chunk() currently do not check
max_len nor to_forward, so they will quickly become embarrassing
to use or will need an equivalent. The reason is that they are
used to build error messages which currently are not subject to
analysis.
This flag was incorrectly used as meaning "close immediately",
while it needs to say "close ASAP". ASAP here means when unsent
data pending in the buffer are sent. This helps cleaning up some
dirty tricks where the buffer output was checking the BF_SHUTR
flag combined with EMPTY and other such things. Now we have a
clearly defined semantics :
- producer sets SHUTR and *may* set SHUTW_NOW if WRITE_ENA is
set, otherwise leave it to the session processor to set it.
- consumer only checks SHUTW_NOW to decide whether or not to
call shutw().
This also induced very minor changes at some locations which were
not protected against buffer changes while the SHUTW_NOW flag was
set. Now we prevent send_max from changing when the flag is set.
Several tests have been run without any unexpected behaviour detected.
Some more cleanups are needed, as it clearly appears that some tests
could be removed with stricter semantics.
Tarpit was broken by recent splitting of analysers. It would still
let the connection go to the server due to a missing buffer_write_dis().
Also, it was performed too late (after content switching rules).
Please consider the following patches. They are required to
compile haproxy-1.4-dev2 on FreeBSD.
Summary:
1) include <sys/types.h> before <netinet/tcp.h>
2) Use IPPROTO_TCP instead of SOL_TCP
(they are both defined as 6, TCP protocol number)
send() supports the MSG_MORE flag on Linux, which does the same
as TCP_CORK except that we don't have to remove TCP_NODELAY before
and we don't need any syscall to set/remove it. This can save up
to 4 syscalls around a send() (two for setting it, two for removing
it), and it's much cleaner since it is not persistent. So make use
of it instead.
We used to call stream_sock_data_finish() directly at the end of
a session update, but if we want to support non-socket interfaces,
we need to have this function configurable. Now we access it via
->update().
Since it's now possible to change the buffer size by configuration,
we have to take special measures against writes that are larger than
the buffer size. Before this patch, the writers would indefinitely
block, waiting for some space to free up.
With this patch, the functions simply reject the data with an
appropriate code so that the writers can either detect and process
the error or go on, but never remain blocked.
This has been tested on the stats page which does no longer hang
with buffer sizes smaller than 2.5 kB (256 bytes is even OK for
the CSV version).
The new tune.bufsize and tune.maxrewrite global directives allow one to
change the buffer size and the maxrewrite size. Right now, setting bufsize
too low will block stats sockets which will not be able to write at all.
An error checking must be added to buffer_write_chunk() so that if it
cannot write its message to an empty buffer, it causes the caller to abort.
The first step towards dynamic buffer size consists in removing
all static definitions of the buffer size. Instead, we store a
buffer's size in itself. Right now they're all preinitialized
to BUFSIZE, but we will change that.
sess_establish() used to resort to protocol-specific guesses
in order to set rep->analysers. This is no longer needed as it
gets set from the frontend and the backend as a copy of what
was defined in the configuration.
s->srv_error was set depending on the frontend's protocol. Now it is
set by the HTTP analyser, so that even when switching from a TCP
frontend to an HTTP backend, we can have HTTP error messages.
Analyser bitmaps are now stored in the frontend and backend, and
combined at configuration time. That way, set_session_backend()
does not need to perform any protocol-specific combinations.
Since the listener is the one indicating what analyser and session
handlers to call, it makes sense that it also sets the task's nice
value. This also helps getting rid of the last trace of the stats
in the proto_uxst file.
The remains of the stats socket code has nothing to do in proto_uxst
anymore and must move to dumpstats. The code is much cleaner and more
structured. It was also an opportunity to rename AN_REQ_UNIX_STATS
as AN_REQ_STATS_SOCK as the stats socket is no longer unix-specific
either.
The last item refering to stats in proto_uxst is the setting of the
task's nice value which should in fact come from the listener.
process_session() is now ready to handle unix stats sockets. This
first step works and old code has not been removed. A cleanup is
required. The stats handler is not unix socket-centric anymore and
should move to dumpstats.c.
When a stream interface has no connect() function, it means it is
immediately connected, so we don't need any connection request.
This will be used with unix sockets.
In order to merge the unix session handling code, we have to maintain
the number of per-listener connections in the session. This was only
performed for unix sockets till now.
Creating a frontend for the global stats socket will help merge
unix sockets management with the other socket management. Since
frontends are huge structs, we only allocate it if required.
The connection establishment was completely handled by backend.c which
normally just handles LB algos. Since it's purely TCP, it must move to
proto_tcp.c. Also, instead of calling it directly, we now call it via
the stream interface, which will later help us unify session handling.
Andrew Azarov reported that haproxy-1.4-dev1 does not build
under FreeBSD 7.2 because SOL_TCP is not defined. So add a
check for its definition before using it. This only impacts
network optimisations anyway.
This Linux-specific option was never really used in production and
has since been superseded by new splicing options brought by recent
Linux kernels.
It caused several particular cases in the code because the kernel
would take care of the session without haproxy being able to do
anything on it, which became hard to handle in the new architecture.
Let's simply get rid of it now that there is a replacement available.
The new "node-name" stats setting enables reporting of a node ID on
the stats page. It is possible to return the system's host name as
well as a specific name.
Released version 1.4-dev2 with the following main changes :
- [BUG] task: fix possible crash when some timeouts are not configured
- [BUG] log: option tcplog would log to global if no logger was defined
Romuald du Song reported a strange bug causing "option tcplog" to
unexpectedly use global log parameters if no log server was declared.
Eventhough it can be useful in some circumstances, it only hides
configuration bugs and can even cause traffic logs to be sent to
the wrong logger, since global settings are just for the process.
This has been fixed and a warning has been added for configurations
where tcplog or httplog are set without any logger. This fix must
be backported to 1.3.20, but not to 1.3.15.X in order not to risk
any regression on old configurations.
Cristian Ditoiu reported a major regression when testing 1.3.19 at
transfer.ro. It would crash within a few minutes while 1.3.15.10
was OK. He offered to help so we could run gdb and debug the crash
live. We finally found that the crash was the result of a regression
introduced by recent fix 814c978fb6
(task: fix possible timer drift after update) which makes it possible
for a tree walk to start from a detached task if this task has got
its timeout disabled due to a missing timeout.
The trivial fix below has been extensively tested and confirmed not
to crash anymore.
Special thanks to Cristian who spontaneously provided a lot of help
and trust to debug this issue which at first glance looked impossible
after reading the code and traces, but took less than an hour to spot
and fix when caught live in gdb ! That's really appreciated !
Released version 1.4-dev1 with the following main changes :
- [MINOR] acl: add support for matching of RDP cookies
- [MEDIUM] add support for RDP cookie load-balancing
- [MEDIUM] add support for RDP cookie persistence
- [MINOR] add a new CLF log format
- [MINOR] startup: don't imply -q with -D
- [BUG] ensure that we correctly re-start old process in case of error
- [MEDIUM] add support for binding to source port ranges during connect
- [MINOR] config: track "no option"/"option" changes
- [MINOR] config: support resetting options do default values
- [MEDIUM] implement option tcp-smart-accept at the frontend
- [MEDIUM] stream_sock: implement tcp-cork for use during shutdowns on Linux
- [MEDIUM] implement tcp-smart-connect option at the backend
- [MEDIUM] add support for TCP MSS adjustment for listeners
- [MEDIUM] support setting a server weight to zero
- [MINOR] make DEFAULT_MAXCONN user-configurable at build time
- [MAJOR] session: don't clear buffer status flags anymore
- [MAJOR] session: only check for timeouts when they have just occurred.
- [MAJOR] session: simplify buffer error handling
- [MEDIUM] config: split parser and checker in two functions
- [MEDIUM] config: support loading multiple configuration files
- [MEDIUM] stream_sock: don't close prematurely when nolinger is set
- [MEDIUM] session: rework buffer analysis to permit permanent analysers
- [MEDIUM] splice: set the capability on each stream_interface
- [BUG] http: redirect rules were processed too early
- [CLEANUP] remove unused DEBUG_PARSE_NO_SPEEDUP define
- [MEDIUM] http: split request waiter from request processor
- [MEDIUM] session: tell analysers what bit they were called for
- [MAJOR] http: complete splitting of the remaining stages
- [MINOR] report in the proxies the requirements for ACLs
- [MINOR] http: rely on proxy->acl_requires to allocate hdr_idx
- [MINOR] acl: add HTTP protocol detection (req_proto_http)
- [MINOR] prepare callers of session_set_backend to handle errors
- [BUG] default ACLs did not properly set the ->requires flag
- [MEDIUM] allow a TCP frontend to switch to an HTTP backend
- [MINOR] ensure we can jump from swiching rules to http without data
- [MINOR] http: take http request timeout from the backend
- [MINOR] allow TCP inspection rules to make use of HTTP ACLs
- [BUILD] report commit date and not author's date as build date
- [MINOR] acl: don't complain anymore when using L7 acls in TCP
- [BUG] stream_sock: always shutdown(SHUT_WR) before closing
- [BUG] stream_sock: don't stop reading when the poller reports an error
- [BUG] config: tcp-request content only accepts "if" or "unless"
- [BUG] task: fix possible timer drift after update
- [MINOR] apply tcp-smart-connect option for the checks too
- [MINOR] stats: better displaying in MSIE
- [MINOR] config: improve error reporting in global section
- [MINOR] config: improve error reporting in listen sections
- [MINOR] config: the "capture" keyword is not allowed in backends
- [MINOR] config: improve error reporting when checking configuration
- [BUILD] fix a minor build warning on AIX
- [BUILD] use "git cmd" instead of "git-cmd"
- [CLEANUP] report 2009 not 2008 in the copyright banner.
- [MINOR] print usage on the stats sockets upon invalid commands
- [MINOR] acl: detect and report potential mistakes in ACLs
- [BUILD] fix incorrect printf arg count with tcp_splice
- [BUG] fix random pauses on last segment of a series
- [BUILD] add support for build under Cygwin
During a direct data transfer from the server to the client, if the
system did not have enough buffers anymore, haproxy would not enable
write polling again if it could write at least one data chunk. Under
normal conditions, this would remain undetected because the remaining
data would be pushed by next data chunks.
However, when this happens on the last chunk of a session, or the last
in a series in an interactive bidirectional TCP transfer, haproxy would
only start sending again when the read timeout was reached on the side
it stopped writing, causing long pauses on some protocols such as SQL.
This bug was reported by an Exceliance customer who generously offered
to help us by sending large amounts of traces and running various tests
on production systems.
It is quite hard to trigger it but it becomes easier with a ping-pong
TCP service which transfers random data sizes, with a modified version
of send() able to send packets smaller than the average transfer size.
A cleaner fix would imply only updating the write timeout when data
transfers are *attempted*, not succeeded, but that requires more
sensible code changes without fixing the result. It is a candidate
for a later patch though.
I've discovered a configuration with lots of occurrences of the
following :
acl xxx hdr_beg (host) xxx
The problem is that hdr_beg will match every header against patterns
(host) and xxx due to the space between both, which certainly is not
what the user wanted. Now we detect such ACLs and report a warning
with a suggestion to add "--" between "hdr_beg" and "(host)" if this
is definitely what is wanted.
When issuing commands on the unix socket, there's no way to
know if the result is empty or if the command is wrong. This
patch makes invalid command return a help message.
Newer GIT versions do not support "git-cmd" anymore, so date and version
can be wrong during development builds. Use "git cmd" now. Also fix
git-tar to use "git archive" instead of "git-tar-tree".
AIX wants string.h in signal.c (and is right to do so) :
gcc -Iinclude -Wall -O2 -g -DTPROXY -DENABLE_POLL -DCONFIG_HAPROXY_VERSION=\"1.3.18\" -DCONFIG_HAPROXY_DATE=\"2009/05/10\" -c -o src/signal.o src/signal.c
src/signal.c: In function 'signal_init':
src/signal.c:32: warning: implicit declaration of function 'memset'
src/signal.c:32: warning: incompatible implicit declaration of built-in function 'memset'