Commit Graph

439 Commits

Author SHA1 Message Date
Willy Tarreau
a9fb08317f [MINOR] report in the proxies the requirements for ACLs
This patch propagates the ACL conditions' "requires" bitfield
to the proxies. This makes it possible to know exactly what a
proxy might have to support for any request, which helps knowing
whether we have to allocate some space for certain types of
structures or not (eg: the hdr_idx struct).

The concept might be extended to a lot more types of information,
such as detecting whether we need to allocate some space for some
request ACLs which need a result in the response, etc...
2009-07-10 23:09:39 +02:00
Willy Tarreau
1d0dfb155d [MAJOR] http: complete splitting of the remaining stages
The HTTP processing has been splitted into 7 steps, one of which
is not anymore HTTP-specific (content-switching). That way, it
becomes possible to use "use_backend" rules in TCP mode. A new
"use_server" directive should follow soon.
2009-07-07 15:10:31 +02:00
Willy Tarreau
3a816293e9 [MEDIUM] session: tell analysers what bit they were called for
Some stream analysers might become generic enough to be called
for several bits. So we cannot have the analyser bit hard coded
into the analyser itself. Let's make the caller inform the callee.
2009-07-07 10:55:49 +02:00
Willy Tarreau
d787e6648c [MEDIUM] http: split request waiter from request processor
We want to split several steps in HTTP processing so that
we can call individual analysers depending on what processing
we want to perform. The first step consists in splitting the
part that waits for a request from the rest.
2009-07-07 10:14:51 +02:00
Willy Tarreau
dc340a900d [MEDIUM] splice: set the capability on each stream_interface
The splice code did not consider compatibility between both ends
of the connection. Now we set different capabilities on each
stream interface, depending on what the protocol can splice to/from.
Right now, only TCP is supported. Thanks to this, we're now able to
automatically detect when splice() is not implemented and automatically
disable it on one end instead of reporting errors to the upper layer.
2009-06-28 23:10:19 +02:00
Willy Tarreau
5d707e1aaa [MEDIUM] stream_sock: don't close prematurely when nolinger is set
When the nolinger option is used, we must not close too fast because
some data might be left unsent. Instead we must proceed with a normal
shutdown first, then a close. Also, we want to avoid merging FIN with
the last segment if nolinger is set, because if that one gets lost,
there is no chance for it to be retransmitted.
2009-06-28 11:09:07 +02:00
Willy Tarreau
5d01a63b78 [MEDIUM] config: support loading multiple configuration files
We now support up to 10 distinct configuration files. They are
all loaded in the order defined by -f <file1> -f <file2> ...

This can be useful in order to store global, private, public,
etc... configurations in distinct files.
2009-06-23 08:17:17 +02:00
Willy Tarreau
915e1ebe63 [MEDIUM] config: split parser and checker in two functions
This is a first step towards support of multiple configuration files.
Now readcfgfile() only reads a file in memory and performs very minimal
parsing. The checks are performed afterwards.
2009-06-23 08:17:17 +02:00
Willy Tarreau
c9fe4562c2 [MINOR] make DEFAULT_MAXCONN user-configurable at build time
The only way to set this previously was to set SYSTEM_MAXCONN
which serves a different purpose.
2009-06-15 16:34:03 +02:00
Willy Tarreau
be1b91842a [MEDIUM] add support for TCP MSS adjustment for listeners
Sometimes it can be useful to limit the advertised TCP MSS on
incoming connections, for instance when requests come through
a VPN or when the system is running with jumbo frames enabled.

Passing the "mss <value>" arguments to a "bind" line will set
the value. This works under Linux >= 2.6.28, and maybe a few
earlier ones, though due to an old kernel bug most of earlier
versions will probably ignore it. It is also possible that some
other OSes will support this.
2009-06-14 18:48:19 +02:00
Willy Tarreau
d88edf2e52 [MEDIUM] implement tcp-smart-connect option at the backend
This new option enables combining of request buffer data with
the initial ACK of an outgoing TCP connection. Doing so saves
one packet per connection which is quite noticeable on workloads
mostly consisting in small objects. The option is not enabled by
default.
2009-06-14 15:48:17 +02:00
Willy Tarreau
fb14edc215 [MEDIUM] stream_sock: implement tcp-cork for use during shutdowns on Linux
Setting TCP_CORK on a socket before sending the last segment enables
automatic merging of this segment with the FIN from the shutdown()
call. Playing with TCP_CORK is not easy though as we have to track
the status of the TCP_NODELAY flag since both are mutually exclusive.
Doing so saves one more packet per session and offers about 5% more
performance.

There is no reason not to do it, so there is no associated option.
2009-06-14 15:24:37 +02:00
Willy Tarreau
9ea05a790f [MEDIUM] implement option tcp-smart-accept at the frontend
This option disables TCP quick ack upon accept. It is also
automatically enabled in HTTP mode, unless the option is
explicitly disabled with "no option tcp-smart-accept".

This saves one packet per connection which can bring reasonable
amounts of bandwidth for servers processing small requests.
2009-06-14 12:07:01 +02:00
Willy Tarreau
84b57dae4a [MINOR] config: track "no option"/"option" changes
Sometimes we would want to implement implicit default options,
but for this we need to be able to disable them, which requires
to keep track of "no option" settings. With this change, an option
explicitly disabled in a defaults section will still be seen as
explicitly disabled. There should be no regression as nothing makes
use of this yet.
2009-06-14 11:10:45 +02:00
Willy Tarreau
c6f4ce8fc4 [MEDIUM] add support for binding to source port ranges during connect
Some users are already hitting the 64k source port limit when
connecting to servers. The system usually maintains a list of
unused source ports, regardless of the source IP they're bound
to. So in order to go beyond the 64k concurrent connections, we
have to manage the source ip:port lists ourselves.

The solution consists in assigning a source port range to each
server and use a free port in that range when connecting to that
server, either for a proxied connection or for a health check.
The port must then be put back into the server's range when the
connection is closed.

This mechanism is used only when a port range is specified on
a server. It makes it possible to reach 64k connections per
server, possibly all from the same IP address. Right now it
should be more than enough even for huge deployments.
2009-06-10 12:23:32 +02:00
Willy Tarreau
13a34bd110 [MINOR] compute the max of sessions/s on fe/be/srv
Some users want to keep the max sessions/s seen on servers, frontends
and backends for capacity planning. It's easy to grab it while the
session count is updated, so let's keep it.
2009-05-10 18:52:49 +02:00
Willy Tarreau
f7edefa413 [MINOR] implement per-logger log level limitation
Some people are using haproxy in a shared environment where the
system logger by default sends alert and emerg messages to all
consoles, which happens when all servers go down on a backend for
instance. These people can not always change the system configuration
and would like to limit the outgoing messages level in order not to
disturb the local users.

The addition of an optional 4th field on the "log" line permits
exactly this. The minimal log level ensures that all outgoing logs
will have at least this level. So the logs are not filtered out,
just set to this level.
2009-05-10 17:20:05 +02:00
Benoit
affb481f1a [MEDIUM] add support for "balance hdr(name)"
There is a patch made by me that allow for balancing on any http header
field.

[WT:
  made minor changes:
  - turned 'balance header name' into 'balance hdr(name)' to match more
    closely the ACL syntax for easier future convergence
  - renamed the proxy structure fields header_* => hh_*
  - made it possible to use the domain name reduction to any header, not
    only "host" since it makes sense to do it with other ones.
  Otherwise patch looks good.
/WT]
2009-05-10 15:50:15 +02:00
Willy Tarreau
946ba59190 [MINOR] standard: provide a new 'my_strndup' function
This function is only offered by GNU extensions and is sometimes
useful during configuration parsing.
2009-05-10 15:41:18 +02:00
Willy Tarreau
c9bd0cc224 [MINOR] add options dontlog-normal and log-separate-errors
Some big traffic sites have trouble dealing with logs and tend to
disable them. Here are two new options to help cope with massive
logs.

  - dontlog-normal only disables logging for 100% successful
    connections, other ones will still be logged

  - log-separate-errors will cause non-100% successful connections
    to be logged at level "err" instead of level "info" so that a
    properly configured syslog daemon can send them to a different
    file for longer conservation.
2009-05-10 11:57:02 +02:00
Willy Tarreau
8f38bd0497 [MINOR] add basic signal handling functions
These functions will be used to deliver asynchronous signals in order
to make the signal handling functions more robust. The goal is to keep
the same interface to signal handlers.
2009-05-10 09:24:23 +02:00
Maik Broemme
2850cb42b6 [MINOR] add X-Original-To: header
I have attached a patch which will add on every http request a new
header 'X-Original-To'. If you have HAProxy running in transparent mode
with a big number of SQUID servers behind it, it is very nice to have
the original destination ip as a common header to make decisions based
on it.

The whole thing is configurable with a new option 'originalto'. I have
updated the sourcecode as well as the documentation. The 'haproxy-en.txt'
and 'haproxy-fr.txt' files are untouched, due to lack of my french
language knowledge. ;)

Also the patch adds this header for IPv4 only. I haven't any IPv6 test
environment running here and don't know if getsockopt() with SO_ORIGINAL_DST
will work on IPv6. If someone knows it and wants to test it I can modify
the diff. Feel free to ask me questions or things which should be changed. :)

--Maik
2009-05-01 16:22:33 +02:00
Willy Tarreau
3b88d441e9 [MINOR] switch all stat counters to 64-bit
The byte counters have long been 64-bit to avoid overflows. But with
several sites nowadays, we see session counters wrap around every 10-days
or so. So it was the moment to switch counters to 64-bit, including
error and warning counters which can theorically rise as fast as session
counters even if in practice there is very low risk.

The performance impact should not be noticeable since those counters are
only updated once per session. The stats output have been carefully checked
for proper types on both 32- and 64-bit platforms.
2009-04-11 20:44:08 +02:00
Willy Tarreau
40d2516371 [BUILD] add format(printf) to printf-like functions
Doing this helps catching warnings about wrong output formats.
2009-04-03 12:01:47 +02:00
Willy Tarreau
4076a15255 [MEDIUM] http: capture invalid requests/responses even if accepted
It's useful to be able to accept an invalid header name in a request
or response but still be able to monitor further such errors. Now,
when an invalid request/response is received and accepted due to
an "accept-invalid-http-{request|response}" option, the invalid
request will be captured for later analysis with "show errors" on
the stats socket.
2009-04-02 21:36:37 +02:00
Willy Tarreau
32a4ec0ed7 [MEDIUM] http: add options to ignore invalid header names
Sometimes it is required to let invalid requests pass because
applications sometimes take time to be fixed and other servers
do not care. Thus we provide two new options :

     option accept-invalid-http-request  (for the frontend)
     option accept-invalid-http-response (for the backend)

When those options are set, invalid requests or responses do
not cause a 403/502 error to be generated.
2009-04-02 21:36:34 +02:00
Willy Tarreau
61d188920e [MINOR] improve reporting of misplaced acl/reqxxx rules
Now we can detect improper ordering of "block", "reqxxx", "reqadd",
"redirect" and "use_backend", and warn the user accordingly.
2009-03-31 10:49:21 +02:00
Willy Tarreau
e7239b5152 [MINOR] implement ulltoh() to write HTML-formatted numbers
This function sets CSS letter spacing after each 3rd digit. The page must
create a class "rls" (right letter spacing) with style "letter-spacing: 0.3em"
in order to use it.
2009-03-29 13:41:58 +02:00
Willy Tarreau
3884cbaae6 [MINOR] show sess: report number of calls to each task
For debugging purposes, it can be useful to know how many times each
task has been called.
2009-03-28 17:54:35 +01:00
Willy Tarreau
1b194fe03e [OPTIM] buffer: new BF_READ_DONTWAIT flag reduces EAGAIN rates
When the reader does not expect to read lots of data, it can
set BF_READ_DONTWAIT on the request buffer. When it is set,
the stream_sock_read callback will not try to perform multiple
reads, it will return after only one, and clear the flag.
That way, we can immediately return when waiting for an HTTP
request without trying to read again.

On pure request/responses schemes such as monitor-uri or
redirects, this has completely eliminated the EAGAIN occurrences
and the epoll_ctl() calls, resulting in a performance increase of
about 10%. Similar effects should be observed once we support
HTTP keep-alive since we'll immediately disable reads once we
get a full request.
2009-03-21 21:57:30 +01:00
Willy Tarreau
6f4a82c7af [OPTIM] stream_sock: don't retry to read after a large read
If we get very large data at once, it's almost certain that it's
worthless trying to read again, because we got everything we could
get.

Doing this has made all -EAGAIN disappear from splice reads. The
threshold has been put in the global tunable structures so that if
we one day want to make it accessible from user config, it will be
easy to do so.
2009-03-21 20:43:57 +01:00
Willy Tarreau
c7bdf09f9f [MINOR] stats: report number of tasks (active and running)
It may be useful for statistics purposes to report the number of
tasks.
2009-03-21 18:33:52 +01:00
Willy Tarreau
a461318f97 [MINOR] task: keep a task count and clean up task creators
It's sometimes useful at least for statistics to keep a task count.
It's easy to do by forcing the rare task creators to always use the
same functions to create/destroy a task.
2009-03-21 18:13:21 +01:00
Willy Tarreau
26ca34e66e [BUG] scheduler: fix improper handling of duplicates __task_queue()
The top of a duplicate tree is not where bit == -1 but at the most
negative bit. This was causing tasks to be queued in reverse order
within duplicates. While this is not dramatic, it's incorrect and
might lead to longer than expected duplicate depths under some
circumstances.
2009-03-21 12:57:06 +01:00
Willy Tarreau
e35c94a748 [MEDIUM] scheduler: get rid of the 4 trees thanks and use ebtree v4.1
Since we're now able to search from a precise expiration date in
the timer tree using ebtree 4.1, we don't need to maintain 4 trees
anymore. Not only does this simplify the code a lot, but it also
ensures that we can always look 24 days back and ahead, which
doubles the ability of the previous scheduler. Indeed, while based
on absolute values, the timer tree is now relative to <now> as we
can always search from <now>-31 bits.

The run queue uses the exact same principle now, and is now simpler
and a bit faster to process. With these changes alone, an overall
0.5% performance gain was observed.

Tests were performed on the few wrapping cases and everything works
as expected.
2009-03-21 10:25:14 +01:00
Willy Tarreau
5804434a0f [MINOR] update ebtree to version 4.1
Ebtree version 4.1 brings lookup by ranges. This will be useful for
the scheduler.
2009-03-21 10:23:36 +01:00
Willy Tarreau
844553303d [BUG] session: errors were not reported in termination flags in TCP mode
In order to get termination flags properly updated, the session was
relying a bit too much on http_return_srv_error() which is http-centric.

A generic srv_error function was implemented in the session in order to
catch all connection abort situations. It was then noticed that a request
abort during a connection attempt was not reported, which is now fixed.

Read and write errors/timeouts were not logged either. It was necessary
to add those tests at 4 new locations.

Now it looks like everything is correctly logged. Most likely some error
checking code could now be removed from some analysers.
2009-03-15 22:34:05 +01:00
Willy Tarreau
5af24efee9 [CLEANUP] config: catch and report some possibly wrong rule ordering
There are some configurations in which redirect rules are declared
after use_backend rules. We can also find "block" rules after any
of these ones. The processing sequence is :
  - block
  - redirect
  - use_backend

So as of now we try to detect wrong ordering to warn the user about
a possibly undesired behaviour.
2009-03-15 15:23:16 +01:00
Willy Tarreau
ff01a21ebe [MINOR] cfgparse: some cleanups in the consistency checks
Check for servers in health mode, for health mode in pure-backends.
Some code have been refactored for better organization.
2009-03-15 13:46:16 +01:00
Willy Tarreau
e8a28bf165 [MINOR] buffers: implement buffer_flush()
This function will flush the buffer's data, which means that all data
remaining in the buffer will be scheduled for sending.
2009-03-08 21:12:04 +01:00
Willy Tarreau
6f0aa476bd [CLEANUP] buffer_flush() was misleading, rename it as buffer_erase 2009-03-08 20:33:29 +01:00
Willy Tarreau
531cf0cf8d [OPTIM] task: reduce the number of calls to task_queue()
Most of the time, task_queue() will immediately return. By extracting
the preliminary checks and putting them in an inline function, we can
significantly reduce the number of calls to the function itself, and
most of the tests can be optimized away due to the caller's context.

Another minor improvement in process_runnable_tasks() consisted in
taking benefit from the processor's branch prediction unit by making
a special case of the process_session() callback which is by far the
most common one.

All this improved performance by about 1%, mainly during the call
from process_runnable_tasks().
2009-03-08 16:35:27 +01:00
Willy Tarreau
d0a201b35c [CLEANUP] task: distinguish between clock ticks and timers
Timers are unsigned and used as tree positions. Ticks are signed and
used as absolute date within current time frame. While the two are
normally equal (except zero), it's important not to confuse them in
the code as they are not interchangeable.

We add two inline functions to turn each one into the other.

The comments have also been moved to the proper location, as it was
not easy to understand what was a tick and what was a timer unit.
2009-03-08 15:58:07 +01:00
Willy Tarreau
26c250683f [MEDIUM] minor update to the task api: let the scheduler queue itself
All the tasks callbacks had to requeue the task themselves, and update
a global timeout. This was not convenient at all. Now the API has been
simplified. The tasks callbacks only have to update their expire timer,
and return either a pointer to the task or NULL if the task has been
deleted. The scheduler will take care of requeuing the task at the
proper place in the wait queue.
2009-03-08 09:38:41 +01:00
Willy Tarreau
4726f53794 [OPTIM] task: don't unlink a task from a wait queue when waking it up
In many situations, we wake a task on an I/O event, then queue it
exactly where it was. This is a real waste because we delete/insert
tasks into the wait queue for nothing. The only reason for this is
that there was only one tree node in the task struct.

By adding another tree node, we can have one tree for the timers
(wait queue) and one tree for the priority (run queue). That way,
we can have a task both in the run queue and wait queue at the
same time. The wait queue now really holds timers, which is what
it was designed for.

The net gain is at least 1 delete/insert cycle per session, and up
to 2-3 depending on the workload, since we save one cycle each time
the expiration date is not changed during a wake up.
2009-03-08 07:59:18 +01:00
Willy Tarreau
79584225e5 [OPTIM] rate-limit: cleaner behaviour on low rates and reduce consumption
The rate-limit was applied to the smoothed value which does a special
case for frequencies below 2 events per period. This caused irregular
limitations when set to 1 session per second.

The proper way to handle this is to compute the number of remaining
events that can occur without reaching the limit. This is what has
been added. It also has the benefit that the frequency calculation
is now done once when entering event_accept(), before the accept()
loop, and not once per accept() loop anymore, thus saving a few CPU
cycles during very high loads.

With this fix, rate limits of 1/s are perfectly respected.
2009-03-06 09:18:27 +01:00
Willy Tarreau
3a7d20781d [MEDIUM] implement "rate-limit sessions" for the frontend
The new "rate-limit sessions" statement sets a limit on the number of
new connections per second on the frontend. As it is extremely accurate
(about 0.1%), it is efficient at limiting resource abuse or DoS.
2009-03-05 23:48:25 +01:00
Willy Tarreau
7f062c4193 [MEDIUM] measure and report session rate on frontend, backends and servers
With this change, all frontends, backends, and servers maintain a session
counter and a timer to compute a session rate over the last second. This
value will be very useful because it varies instantly and can be used to
check thresholds. This value is also reported in the stats in a new "rate"
column.
2009-03-05 18:43:00 +01:00
Willy Tarreau
755905857a [MINOR] add curr_sec_ms and curr_sec_ms_scaled for current second.
Several algorithms will need to know the millisecond value within
the current second. Instead of doing a divide every time it is needed,
it's better to compute it when it changes, which is when now and now_ms
are recomputed.

curr_sec_ms_scaled is the same multiplied by 2^32/1000, which will be
useful to compute some ratios based on the position within last second.
2009-03-05 16:56:16 +01:00
Willy Tarreau
776cd87e32 [MINOR] time: add __usec_to_1024th to convert usecs to 1024th of second
This function performs a fast conversion from usec to 1024th of a second,
and will be useful for many fast sub-second computations.
2009-03-05 00:34:01 +01:00