Commit Graph

1617 Commits

Author SHA1 Message Date
Cyril Bonté
c9f825f060 [OPTIM] config: only allocate check buffer when checks are enabled
To save a little memory, the check_data buffer is only allocated
for the servers that are checked.

[WT: this patch saves 80 MB of RAM on the test config with 5000 servers]
2010-03-17 22:05:23 +01:00
Willy Tarreau
039381855d [BUG] checks: don't wait for a close to start parsing the response
Cyril Bonté reported a regression introduced with very last changes
on the checks code, which causes failed checks if the server does
not close the connection in time. This happens on HTTP/1.1 checks or
on SMTP checks for instance.

This fix consists in restoring the old behaviour of parsing as soon
as something is available in the response buffer, and waiting for
more data if some are missing. This also helps releasing connections
earlier (eg: a GET check will not have to download the whole object).
2010-03-17 21:52:07 +01:00
Willy Tarreau
e437c44483 [BUG] init: unconditionally catch SIGPIPE
Apparently some systems define MSG_NOSIGNAL but do not necessarily
check it (or maybe binaries are built somewhere and used on older
versions). There were reports of very recent FreeBSD setups causing
SIGPIPEs, while older ones catch the signal. Recent FreeBSD manpages
indeed define MSG_NOSIGNAL.

So let's now unconditionally catch the signal. It's useless not to do
it for the rare cases where it's not needed (linux 2.4 and below).
2010-03-17 18:02:46 +01:00
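A minimal sketch of the idea behind the SIGPIPE commit above (e437c44483), assuming the signal is neutralised by ignoring it process-wide. The helper names `ignore_sigpipe()` and `sigpipe_is_ignored()` are hypothetical and are not taken from the HAProxy sources; the commit message does not say whether the actual patch ignores the signal or installs a handler.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical helper: ignore SIGPIPE for the whole process, whether or
 * not MSG_NOSIGNAL is honoured by the running kernel.
 * Returns 0 on success, -1 on failure (errno set by sigaction()).
 */
int ignore_sigpipe(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Hypothetical helper: returns 1 if SIGPIPE is currently ignored. */
int sigpipe_is_ignored(void)
{
    struct sigaction cur;

    if (sigaction(SIGPIPE, NULL, &cur) != 0)
        return 0;
    return cur.sa_handler == SIG_IGN;
}
```

With the signal ignored, a write on a closed socket fails with EPIPE instead of killing the process, so the caller only has to check the return value.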
Willy Tarreau
bf3f1de5b5 [BUG] http: fix truncated responses on chunk encoding when size divides buffer size
Bernhard Krieger reported truncated HTTP responses in presence of some
specific chunk-encoded data, and kindly offered complete traces of the
issue which made it easy to reproduce it.

Those traces showed that the chunks were of exactly 8192 bytes, chunk
size and CRLF included, which was exactly half the size of the buffer.
In this situation, the function http_chunk_skip_crlf() could erroneously
try to parse a CRLF after the chunk believing there were more data
pending, because the number of bytes present in the buffer was considered
instead of the number of remaining bytes to be parsed.
2010-03-17 15:54:24 +01:00
Willy Tarreau
6315d914b6 [MINOR] checks: make shutdown() silently fail
Shutdown may fail for instance after an RST. So we must not report
any error for that.
2010-03-16 22:57:27 +01:00
Willy Tarreau
659d7bc4e3 [BUG] checks: don't abort when second poll returns an error
Now that the response may be fragmented, we may receive early notifications
of aborts in return of poll(), as indicated below, which currently cause
an early error detection :

  21:11:21.036600 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 993) = 1
  21:11:21.054361 gettimeofday({1268770281, 54467}, NULL) = 0
  21:11:21.054540 recv(7, "H"..., 8030, 0) = 1
  21:11:21.054694 recv(7, 0x967e759, 8029, 0) = -1 EAGAIN (Resource temporarily unavailable)
  21:11:21.054843 epoll_wait(3, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}}, 8, 975) = 1
  21:11:21.060274 gettimeofday({1268770281, 60386}, NULL) = 0
  21:11:21.060454 close(7)                = 0

Just as in stream_sock, we must not believe poll() without attempting to receive,
which fixes the issue :

  21:11:59.402207 recv(7, "H"..., 8030, 0) = 1
  21:11:59.402362 recv(7, 0x8b5c759, 8029, 0) = -1 EAGAIN (Resource temporarily unavailable)
  21:11:59.402511 epoll_wait(3, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}}, 8, 974) = 1
  21:11:59.407242 gettimeofday({1268770319, 407353}, NULL) = 0
  21:11:59.407425 recv(7, "TTP/1.0 200 OK\r\n"..., 8029, 0) = 16
  21:11:59.407606 recv(7, 0x8b5c769, 8013, 0) = -1 ECONNRESET (Connection reset by peer)
  21:11:59.407753 shutdown(7, 2 /* send and receive */) = -1 ENOTCONN (Transport endpoint is not connected)
2010-03-16 22:57:27 +01:00
Willy Tarreau
c1a07960a6 [BUG] checks: don't report an error when recv() returns an error after data
This happens when a server immediately closes the connection after
the response without lingering or when we close before the end of
the data. We get an RST which translates into a late error. We must
not declare an error without checking that the contents are OK.
2010-03-16 22:57:27 +01:00
Willy Tarreau
2c7ace07ad [OPTIM] checks: try to detect the end of response without polling again
Since the recv() call returns every time it succeeds, we always need
two calls with one intermediate poll before detecting the end of response :

  20:20:03.958207 recv(7, "HTTP/1.1 200\r\nConnection: close\r\n"..., 8030, 0) = 145
  20:20:03.958365 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
  20:20:03.958543 gettimeofday({1268767203, 958626}, NULL) = 0
  20:20:03.958694 recv(7, ""..., 7885, 0) = 0
  20:20:03.958833 shutdown(7, 2 /* send and receive */) = 0

Let's read as long as we can, that way we can detect end of connections
in the same call, which is much more efficient especially for LBs with
hundreds of servers :

  20:29:58.797019 recv(7, "HTTP/1.1 200\r\nConnection: close\r\n"..., 8030, 0) = 145
  20:29:58.797182 recv(7, ""..., 7885, 0) = 0
  20:29:58.797356 shutdown(7, 2 /* send and receive */) = 0
2010-03-16 22:57:27 +01:00
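The four checks commits above (6315d914b6, 659d7bc4e3, c1a07960a6, 2c7ace07ad) share one pattern: keep calling recv() until it reports end of stream, EAGAIN or an error, and only then decide, so that data received before a late RST is still parsed. The sketch below illustrates that pattern under assumptions: `drain_response()` and the `RD_*` flags are hypothetical names, and `MSG_DONTWAIT` is assumed to be available (it is on Linux and the BSDs). It is not the HAProxy check code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result flags */
#define RD_EOF   0x1   /* peer closed: recv() returned 0 */
#define RD_ERROR 0x2   /* recv() failed with something other than EAGAIN */

/* Read as much as possible in one go. Returns the number of bytes stored
 * in <buf>. A late error does not discard what was already received:
 * the caller is expected to parse buf[0..ret) before looking at <flags>.
 */
size_t drain_response(int fd, char *buf, size_t size, int *flags)
{
    size_t done = 0;

    assert(buf != NULL && flags != NULL);
    *flags = 0;
    while (done < size) {
        ssize_t ret = recv(fd, buf + done, size - done, MSG_DONTWAIT);

        if (ret > 0) {
            done += (size_t)ret;
            continue;            /* try again without polling */
        }
        if (ret == 0) {
            *flags |= RD_EOF;    /* end of response seen in the same call */
            break;
        }
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            *flags |= RD_ERROR;  /* e.g. ECONNRESET after the data */
        break;
    }
    return done;
}
```

A caller would validate the received bytes first and treat `RD_ERROR` as a failure only when the contents are not a complete, acceptable response.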
Nick Chalk
57b1bf7785 [MEDIUM] checks: support multi-packet health check responses
We are seeing both real servers repeatedly going on- and off-line with
a period of tens of seconds. Packet tracing, stracing, and adding
debug code to HAProxy itself have revealed that the real servers are
always responding correctly, but HAProxy is sometimes receiving only
part of the response.

It appears that the real servers are sending the test page as three
separate packets. HAProxy receives the contents of one, two, or three
packets, apparently randomly. Naturally, the health check only
succeeds when all three packets' data are seen by HAProxy. If HAProxy
and the real servers are modified to use a plain HTML page for the
health check, the response is in the form of a single packet and the
checks do not fail.

(...)
I've added buffer and length variables to struct server, and allocated
space with the rest of the server initialisation.

(...)
It seems to be working fine in my tests, and handles check responses
that are bigger than the buffer.
2010-03-16 22:57:26 +01:00
Willy Tarreau
3965040898 [MINOR] http: don't mark a server as failed when it returns 501/505
Those two codes can be triggered on demand by client requests.
We must not fail a server on them.

Ideally we should ignore a certain amount of status codes which do
not indicate life nor death.
2010-03-15 19:44:39 +01:00
Willy Tarreau
f53b25d7c1 [BUG] config: fix endless loop when parsing "on-error"
An arg index increment was missing causing the same arg to be parsed
in an endless loop. Probably a merge conflict that remained undetected.
2010-03-15 19:40:37 +01:00
Willy Tarreau
2a56c5e1c3 [BUG] don't merge anonymous ACLs !
The new anonymous ACL feature was buggy. If several ones are
declared, the first rule is always matched because all of them
share the same internal name (".noname"). Now we simply declare
them with an empty name and ensure that we disable any merging
when the name is empty.
2010-03-15 16:13:29 +01:00
Cyril Bonté
7f2c53938c [BUG] clf logs segfault when capturing a non existant header
Hi Willy,

Please find a small patch to prevent haproxy segfaulting when logging captured headers in CLF format.

Example config to reproduce the bug :
listen test :10080
	log 127.0.0.1 local7 debug err
	mode	http
	option	httplog clf
	capture request header NonExistantHeader len 16

--
Cyril Bonté
2010-03-14 20:02:10 +01:00
Willy Tarreau
296897f2c6 [MEDIUM] connect to servers even when the input has already been closed
The BF_AUTO_CLOSE flag prevented a connection from establishing on
a server if the other side's input channel was already closed. This
is wrong because there may be pending data to be sent.

This was causing an issue with stats, as noticed and reported by
Cyril Bonté. Since the stats are now handled as a server, sometimes
concurrent accesses were causing one of the connections to send the
shutdown(write) before the connection to the stats function was
established, which aborted it early.

This fix causes the BF_AUTO_CLOSE flag to be checked only when the
connection on the outgoing stream interface has reached an established
state. That way we're still able to connect, send the request then
close.
2010-03-14 19:21:34 +01:00
Willy Tarreau
1d21e0a28e [MINOR] force null-termination of hostname
Marcello Gorlani reported that at least on FreeBSD, a long hostname
was reported with garbage on the stats page. POSIX does not make it
mandatory for gethostname() to NULL-terminate the string in case of
truncation, and at least FreeBSD appears not to do it. So let's
force null-termination to keep safe.
2010-03-12 21:58:54 +01:00
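The hostname commit above (1d21e0a28e) boils down to writing the terminator yourself after gethostname(). A minimal sketch, where `get_hostname_safe()` is a hypothetical wrapper name and not the function used in HAProxy:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: fetch the hostname and guarantee that <buf> is
 * null-terminated even if the name was truncated or the call failed.
 * Returns whatever gethostname() returned.
 */
int get_hostname_safe(char *buf, size_t size)
{
    int ret;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';               /* defined contents if the call fails */
    ret = gethostname(buf, size);
    buf[size - 1] = '\0';        /* forced termination, truncated or not */
    return ret;
}
```

The last byte of the buffer is sacrificed unconditionally, which costs one character of capacity but removes any dependency on the platform's truncation behaviour.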
Cyril Bonté
78caf8449d [DOC] Some more documentation cleanups
Since the last documentation cleanups, I've found more typos that I kept
in a corner instead of sending you a mail just for one character :)

--
Cyril Bonté
2010-03-12 06:46:06 +01:00
Cyril Bonté
8f27090b84 [CLEANUP] product branch update
Today I've noticed that the stats page still displays v1.3 in the
"Updates" link, due to the PRODUCT_BRANCH value in version.h, then
it's maybe time to send you the result (notice that the patch updates
PRODUCT_BRANCH to "1.4").

--
Cyril Bonté
2010-03-12 06:45:26 +01:00
Willy Tarreau
4256463b5a [BUG] url_param hash may return a down server
Jozef Hovan reported a bug sometimes causing a down server to be
used in url_param hashing mode.

This happens if the following conditions are met :
  - the backend contains more than one server with at least two
    of different weights
  - all servers but one are down
  - the server which is not down has a weight which does not divide
    all the other ones

Example: 3 servers with 20,20,10, the first one remains up.

The problem is caused by an optimisation in recalc_server_map()
which only fills the first map slot when only one server is up,
because all LB algorithms are optimized to use entry zero when
only one server is up... All but url_param. When doing the modulus,
we can return a position which is greater than zero and use an
entry which still refers to a server which has since been stopped.

One solution could be to optimize the url_param algo to proceed
as the other ones, but the fact that it was wrong implies that we
can repeat the same bug later. So let's first correctly initialize
the map in order to avoid that trap.
2010-03-12 06:22:16 +01:00
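The url_param commit above (4256463b5a) fixes the map rather than the algorithm: every slot a hash can reach must point to a usable server. The sketch below is a simplified illustration under assumptions; `struct srv`, `fill_server_map()` and `pick_by_hash()` are hypothetical, and the real recalc_server_map() interleaves servers differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified server description */
struct srv {
    const char *name;
    int up;        /* non-zero when the server is usable */
    int weight;    /* number of map slots the server should get */
};

/* Fill every reachable slot of <map> with servers that are up, each one
 * repeated <weight> times. Returns the number of slots filled, which is
 * also the modulus to use for hashing.
 */
size_t fill_server_map(const struct srv *servers, size_t nsrv,
                       const struct srv **map, size_t map_size)
{
    size_t used = 0;

    assert(servers != NULL && map != NULL);
    for (size_t i = 0; i < nsrv; i++) {
        if (!servers[i].up)
            continue;
        for (int w = 0; w < servers[i].weight && used < map_size; w++)
            map[used++] = &servers[i];
    }
    return used;
}

/* Hash-based pick: any position below <used> is valid after the fill. */
const struct srv *pick_by_hash(const struct srv **map, size_t used,
                               unsigned int hash)
{
    if (used == 0)
        return NULL;
    return map[hash % used];
}
```

With weights 2, 2 and 1 and only the first server up, both reachable slots point to that server, so a non-zero modulus result can no longer land on a stale entry.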
Willy Tarreau
6a8573ef68 [BUG] stats: connection reset counters must be plain ascii, not HTML
Using U2H*() to put numbers in title tags is a bad idea as those
appear in the HTML itself. Problem reported by Laurent Dolosor.
2010-03-05 18:15:23 +01:00
Willy Tarreau
66dc20a17b [MINOR] stats socket: add show sess <id> to dump details about a session
When trying to spot some complex bugs, it's often needed to access
information on stuck sessions, which is quite difficult. This new
command helps one get detailed information about a session, with
flags, timers, states, etc... The buffer data are not dumped yet.
2010-03-05 17:58:04 +01:00
Willy Tarreau
d426a18a04 [MINOR] stats: make the data dump function reusable for other purposes
The dump_error_line() function was limited to dump error buffers while
it's perfectly suitable to dump anything else.
2010-03-05 17:57:41 +01:00
Willy Tarreau
8811f8e82e [MINOR] stats: don't send empty lines in "show errors"
Empty lines indicate end of dump, so it's important not to do that.
Send a single space instead.
2010-03-05 17:56:56 +01:00
Willy Tarreau
bc77530456 [MINOR] proto_uxst: set accept_date upon accept() to the wall clock time
This accept_date field was not set and will be reported in the stats as
the connection's accept date.
2010-03-05 17:56:31 +01:00
Willy Tarreau
6464841769 [BUG] http: don't wait for response data to leave buffer is client has left
In case of pipelined requests, if the client aborts before reading response
N-1, haproxy waits forever for the data to leave the buffer before parsing
the next response.
2010-03-05 10:57:48 +01:00
Willy Tarreau
15e5554467 [CLEANUP] session: remove duplicate test
This duplicate test should have been removed with the loop rework but was forgotten.
It was harmless, but disassembly shows that it prevents gcc from correctly optimizing
the loop.
2010-03-05 10:12:01 +01:00
Willy Tarreau
c5e60c3360 [RELEASE] Released version 1.4.1
Released version 1.4.1 with the following main changes :
    - [BUG] Clear-cookie path issue
    - [DOC] fix typo on stickiness rules
    - [BUILD] fix BSD and OSX makefiles for missing files
    - [BUILD] includes order breaks OpenBSD build
    - [BUILD] fix some build warnings on Solaris with is* macros
    - [BUG] logs: don't report "last data" when we have just closed after an error
    - [BUG] logs: don't report "proxy request" when server closes early
    - [BUILD] fix platform-dependant build issues related to crypt()
    - [STATS] count transfer aborts caused by client and by server
    - [STATS] frontend requests were not accounted for failed requests
    - [MINOR] report total number of processed connections when stopping a proxy
    - [DOC] be more clear about the limitation to one single monitor-net entry
2010-03-04 23:39:19 +01:00
Willy Tarreau
95cd28309b [DOC] be more clear about the limitation to one single monitor-net entry
It was not clear in the doc that only one monitor-net entry is supported.
2010-03-04 23:36:33 +01:00
William Turner
d986526d98 [BUG] Clear-cookie path issue
We have been using haproxy to balance a not very well written application
(http://www.blackboard.com/). Using the "insert postonly indirect" cookie
method, I was attempting to remove the cookie when users would logout,
allowing the machine to re-balance for the next user (this application is
used in school computer labs, so a computer might stay on the whole day
but be used on and off).

I was having a lot of trouble because when the cookie was set, it was with
"Path=/", but when being cleared there was no "Path" in the set cookie
header, and because the logout page was in a different place of the
website (which I couldn't change), the cookie would not be cleared. I
don't know if this would be a problem for anyone other than me (as our
HTTP application is so un-adjustable), but just in case, I have included
the patch I used. Maybe it will help someone else.

[ WT: this was a correct fix, and I also added the same missing path to
  the set-cookie option ]
2010-03-04 23:16:42 +01:00
Willy Tarreau
1104614b57 [MINOR] report total number of processed connections when stopping a proxy
It's sometimes convenient to know if a proxy has processed any connection
at all when stopping it. Since a soft restart causes the "Proxy stopped"
message to be logged for each proxy, let's add the number of connections
so that it's possible afterwards to check whether a proxy had received
any connection.
2010-03-04 23:07:28 +01:00
Willy Tarreau
3e1b6d1ed0 [STATS] frontend requests were not accounted for failed requests
But failed requests were accounted for, resulting in more failures
than requests.
2010-03-04 23:02:38 +01:00
Willy Tarreau
ae52678444 [STATS] count transfer aborts caused by client and by server
Often we need to understand why some transfers were aborted or what
constitutes server response errors. With those two counters, it is
now possible to detect an unexpected transfer abort during a data
phase (eg: too short HTTP response), and to know what part of the
server response errors may in fact be assigned to aborted transfers.
2010-03-04 20:34:23 +01:00
Willy Tarreau
890a33e41f [BUILD] fix platform-dependant build issues related to crypt()
Holger Just and Ross West reported build issues on FreeBSD and
Solaris that were initially caused by the definition of
_XOPEN_SOURCE at the top of auth.c, which was required on Linux
to avoid a build warning.

Krzysztof Oledzki found that using _GNU_SOURCE instead also worked
on Linux and did not cause any issue on several versions of FreeBSD.
Solaris still reported a warning this time, which was fixed by
including <crypt.h>, which itself is not present on FreeBSD nor on
all Linux toolchains.

So by adding a new build option (NEED_CRYPT_H), we can get Solaris
to get crypt() working and stop complaining at the same time, without
impacting other platforms.

This fix was tested at least on several linux toolchains (at least
uclibc, glibc 2.2.5, 2.3.6 and 2.7), on FreeBSD 4 to 8, Solaris 8
(which needs crypt.h), and AIX 5.3 (without crypt.h).

Every time it builds without a warning.
2010-03-04 19:10:14 +01:00
Willy Tarreau
40dba09343 [BUG] logs: don't report "proxy request" when server closes early
A copy-paste typo and a missing check were causing the logs to
report "PR" instead of "SD" when a server closes before sending
full data. Also, the log would erroneously report 502 while in
fact the correct response will already have been transmitted.
2010-03-04 18:45:47 +01:00
Willy Tarreau
033b2dbeb3 [BUG] logs: don't report "last data" when we have just closed after an error
Some people have reported seeing "SL" flags in their logs quite often while
this should never happen. The reason was that when a server error is detected,
we close the connection to that server and when we decide what state we were
in, we see the connection is closed, and deduce it was the last data transfer,
which is wrong. We should report DATA if the previous state was an established
state, which this patch does.

Now logs correctly report "SD" and not "SL" when a server resets a connection
before the end of the transfer.
2010-03-04 18:45:47 +01:00
Willy Tarreau
88e058164a [BUILD] fix some build warnings on Solaris with is* macros
isalnum, isdigit and friends are really annoying because they take
an int in which we should pass an unsigned char, while strings
everywhere use chars. Solaris uses macros relying on an array for
those functions, which easily triggers some warnings showing where
we have mistakenly passed a char instead of an unsigned char or an
int. Those warnings may indicate real bugs on some platforms
depending on the implementation.
2010-03-03 00:16:00 +01:00
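The Solaris warnings commit above (88e058164a) is about the argument type of the is* functions. A minimal sketch of the safe calling convention, with `count_alnum()` as a hypothetical example function:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical example: count alphanumeric bytes in a string.
 * The cast to unsigned char matters: passing a plain char that happens
 * to be negative (bytes >= 0x80 where char is signed) is undefined
 * behaviour and may index before the start of a lookup table.
 */
size_t count_alnum(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s; s++) {
        if (isalnum((unsigned char)*s))
            n++;
    }
    return n;
}
```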
Willy Tarreau
0e996c681f [BUILD] includes order breaks OpenBSD build
Jeff Buchbinder reported that OpenBSD build broke on compat.h,
and that this patch fixes the issue.
2010-02-26 22:00:19 +01:00
Willy Tarreau
f2d9d84e96 [BUILD] fix BSD and OSX makefiles for missing files
Jeff Buchbinder reported that OpenBSD build broke because recently
added files were not ported to BSD and OSX makefiles.
2010-02-26 21:36:32 +01:00
Willy Tarreau
ec579d83f7 [DOC] fix typo on stickiness rules
2010-02-26 19:15:04 +01:00
Willy Tarreau
e18fdfdb85 [RELEASE] Released version 1.4.0
Released version 1.4.0 with the following main changes :
    - [MINOR] stats: report maint state for tracking servers too
    - [DOC] fix summary to add pattern extraction
    - [DOC] Documentation cleanups
    - [BUG] cfgparse memory leak and missing free calls in deinit()
    - [BUG] pxid/puid/luid: don't shift IDs when some of them are forced
    - [EXAMPLES] add auth.cfg
    - [BUG] uri_auth: ST_SHLGNDS should be 0x00000008 not 0x0000008
    - [BUG] uri_auth: do not attemp to convert uri_auth -> http-request more than once
    - [BUILD] auth: don't use unnamed unions
    - [BUG] config: report unresolvable host names as errors
    - [BUILD] fix build breakage with DEBUG_FULL
    - [DOC] fix a typo about timeout check and clarify the explanation.
    - [MEDIUM] http: don't use trash to realign large buffers
    - [STATS] report HTTP requests (total and rate) in frontends
    - [STATS] separate frontend and backend HTTP stats
    - [MEDIUM] http: revert to use a swap buffer for realignment
    - [MINOR] stats: report the request rate in frontends as cell titles
    - [MINOR] stats: mark areas with an underline when tooltips are available
    - [DOC] reorder some entries to maintain the alphabetical order
    - [DOC] cleanup of the keyword matrix
2010-02-26 14:55:22 +01:00
Willy Tarreau
5c6f7b360e [DOC] cleanup of the keyword matrix
The keyword matrix was barely readable due to the long lines.
Also let's repeat the legend every 24 lines.
2010-02-26 13:34:29 +01:00
Willy Tarreau
d63335a861 [DOC] reorder some entries to maintain the alphabetical order
2010-02-26 12:56:52 +01:00
Willy Tarreau
e0454096c0 [MINOR] stats: mark areas with an underline when tooltips are available
There is a lot of information available in the stats page that can only
be seen when the mouse hovers over it. But it's hard to know where
that information is. Now with a discreet dotted underline it's easier
to spot those areas.
2010-02-26 12:29:07 +01:00
Willy Tarreau
b44939a66f [MINOR] stats: report the request rate in frontends as cell titles
The current and max request rates are now reported when the mouse hovers
over the session rate cur/max. The total requests is displayed with the
status codes over the total sessions cell.
2010-02-26 11:35:39 +01:00
Willy Tarreau
8096de9a99 [MEDIUM] http: revert to use a swap buffer for realignment
The bounce realign function was algorithmically good but as expected
it was not cache-friendly. Using it with large requests caused so much
cache thrashing that the function itself could drain 70% of the total
CPU time for only 0.5% of the calls !

Revert back to a standard memcpy() using a specially allocated swap
buffer. We're now back to 2M req/s on pipelined requests.
2010-02-26 11:12:27 +01:00
Willy Tarreau
2465779459 [STATS] separate frontend and backend HTTP stats
It is wrong to merge FE and BE stats for a proxy because when we consult a
BE's stats, it reflects the FE's stats even though the BE has received no
traffic. The most common example happens with listen instances, where the
backend gets credited for all the traffic even when a use_backend rule makes
use of another backend.
2010-02-26 10:30:28 +01:00
Willy Tarreau
d9b587f260 [STATS] report HTTP requests (total and rate) in frontends
Now that we support keep-alive, it's important to report a separate
counter for requests. Right now it just appears in the CSV output.
2010-02-26 10:05:55 +01:00
Willy Tarreau
b97f199d4b [MEDIUM] http: don't use trash to realign large buffers
The trash buffer may now be smaller than a buffer because we can tune
it at run time. This causes a risk when we're trying to use it as a
temporary buffer to realign unaligned requests, because we may have to
put up to a full buffer into it.

Instead of doing a double copy, we're now relying on an open-coded
bouncing copy algorithm. The principle is that we move one byte at
a time to its final place, and if that place also holds a byte, then
we move it too, and so on. We finish when we've moved all the buffer.
It limits the number of memory accesses, but since it proceeds one
byte at a time and with random walk, it's not cache friendly and
should be slower than a double copy. However, it's only used in
extreme situations and the difference will not be noticeable.

It has been extensively tested and works reliably.
2010-02-25 23:54:31 +01:00
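The bouncing copy described in the commit above (b97f199d4b) resembles the classic cycle-following in-place rotation. The sketch below shows that textbook technique under assumptions: `rotate_left_inplace()` and `gcd_size()` are hypothetical names, and this is not the code that was merged. Each byte is written exactly once, but successive accesses jump through the buffer by `shift` bytes, which is the cache-unfriendly pattern the commit message warns about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Greatest common divisor: the number of independent cycles to walk. */
static size_t gcd_size(size_t a, size_t b)
{
    while (b) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Rotate <buf> left by <shift> bytes without a temporary buffer, so that
 * the byte at offset <shift> ends up at offset 0. Each cycle saves one
 * byte, then pulls every other byte of the cycle into its final place.
 */
void rotate_left_inplace(char *buf, size_t len, size_t shift)
{
    size_t cycles;

    assert(buf != NULL || len == 0);
    if (len == 0)
        return;
    shift %= len;
    if (shift == 0)
        return;

    cycles = gcd_size(len, shift);
    for (size_t start = 0; start < cycles; start++) {
        char saved = buf[start];
        size_t dst = start;

        for (;;) {
            size_t src = dst + shift;

            if (src >= len)
                src -= len;
            if (src == start)
                break;
            buf[dst] = buf[src];   /* one byte lands in its final slot */
            dst = src;
        }
        buf[dst] = saved;
    }
}
```

For a wrapped buffer holding "defgabc" whose data logically starts at offset 4, `rotate_left_inplace(buf, 7, 4)` realigns it to "abcdefg".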
Krzysztof Piotr Oledzki
329f74d463 [BUG] uri_auth: do not attemp to convert uri_auth -> http-request more than once
Bug reported by Laurent Dolosor.
2010-02-23 12:36:10 +01:00
Krzysztof Piotr Oledzki
15f0ac4829 [BUG] uri_auth: ST_SHLGNDS should be 0x00000008 not 0x0000008
2010-02-23 12:36:10 +01:00
Willy Tarreau
d7550a229f [DOC] fix a typo about timeout check and clarify the explanation.
2010-02-10 05:10:19 +01:00