Commit Graph

2137 Commits

Author SHA1 Message Date
Willy Tarreau
9d87ca0685 BUILD: tcp: define SOL_TCP when only IPPROTO_TCP exists
FreeBSD prefers to use IPPROTO_TCP over SOL_TCP, just like it does
with their *_IP counterparts. It's worth noting that there are a few
inconsistencies between SOL_TCP and IPPROTO_TCP in the code, eg on
TCP_QUICKACK. The two values are the same but it's worth applying
what implementations recommend.

No backport is needed, this was uncovered by the recent tcp_info stuff.
2016-08-10 21:11:38 +02:00
Willy Tarreau
d2629f293e BUILD: connection: fix build breakage on openbsd due to missing in_systm.h
Recent commit 93b227d ("MINOR: listener: add the "accept-netscaler-cip"
option to the "bind" keyword") introduced an include of netinet/ip.h
which requires in_systm.h on OpenBSD. No backport is needed.
2016-08-10 19:32:33 +02:00
Willy Tarreau
16e015635c MINOR: tcp: add dst_is_local and src_is_local
It is sometimes needed in application server environments to easily tell
if a source is local to the machine or a remote one, without necessarily
knowing all the local addresses (dhcp, vrrp, etc). Similarly in transparent
proxy configurations it is sometimes desired to tell the difference between
local and remote destination addresses.

This patch adds two new sample fetch functions for this :

dst_is_local : boolean
  Returns true if the destination address of the incoming connection is local
  to the system, or false if the address doesn't exist on the system, meaning
  that it was intercepted in transparent mode. It can be useful to apply
  certain rules by default to forwarded traffic and other rules to the traffic
  targetting the real address of the machine. For example the stats page could
  be delivered only on this address, or SSH access could be locally redirected.
  Please note that the check involves a few system calls, so it's better to do
  it only once per connection.

src_is_local : boolean
  Returns true if the source address of the incoming connection is local to the
  system, or false if the address doesn't exist on the system, meaning that it
  comes from a remote machine. Note that UNIX addresses are considered local.
  It can be useful to apply certain access restrictions based on where the
  client comes from (eg: require auth or https for remote machines). Please
  note that the check involves a few system calls, so it's better to do it only
  once per connection.
2016-08-09 16:50:08 +02:00
Willy Tarreau
77128f585c MINOR: sample: provide smp_is_rw() and smp_make_rw()
At some places, smp_dup() is inappropriately called to ensure a modification
is possible while in fact we only need to ensure the sample may be modified
in place. Let's provide smp_is_rw() to check for this capability and
smp_make_rw() to perform the smp_dup() when it is not the case.

Note that smp_is_rw() will also try to add the trailing zero on strings when
needed if possible, to avoid a useless duplication.
2016-08-09 14:30:57 +02:00
Willy Tarreau
2c594794dd MINOR: sample: implement smp_is_safe() and smp_make_safe()
These functions ensure that the designated sample is "safe for use",
which means that its size is known, its length is correct regarding its
size, and that strings are properly zero-terminated.

smp_is_safe() only checks (and optionally sets the trailing zero when
needed and possible). smp_make_safe() will call smp_dup() after
smp_is_safe() fails.
2016-08-09 14:03:36 +02:00
Willy Tarreau
ad63582eb9 BUG/MEDIUM: samples: make smp_dup() always duplicate the sample
Vedran Furac reported a strange problem where the "base" sample fetch
would not always work for tracking purposes.

In fact, it happens that commit bc8c404 ("MAJOR: stick-tables: use sample
types in place of dedicated types") merged in 1.6 exposed a fundamental
bug related to the way samples use chunks as strings. The problem is that
chunks convey a base pointer, a length and an optional size, which may be
zero when unknown or when the chunk is allocated from a read-only location.
The sole purpose of this size is to know whether or not the chunk may be
appended new data. This size cause some semantics issue in the sample,
which has its own SMP_F_CONST flag to indicate read-only contents.

The problem was emphasized by the commit above because it made use of new
calls to smp_dup() to convert a sample to a table key. And since smp_dup()
would only check the SMP_F_CONST flag, it would happily return read-write
samples indicating size=0.

So some tests were added upon smp_dup() return to ensure that the actual
length is smaller than size, but this in fact made things even worse. For
example, the "sni" server directive does some bad stuff on many occasions
because it limits len to size-1 and effectively sets it to -1 and writes
the zero byte before the beginning of the string!

It is therefore obvious that smp_dup() needs to be modified to take this
nature of the chunks into account. It's not enough but is needed. The core
of the problem comes from the fact that smp_dup() is called for 5 distinct
needs which are not always fulfilled :

  1) duplicate a sample to keep a copy of it during some operations
  2) ensure that the sample is rewritable for a converter like upper()
  3) ensure that the sample is terminated with a \0
  4) set a correct size on the sample
  5) grow the sample in case it was extracted from a partial chunk

Case 1 is not used for now, so we can ignore it. Case 2 indicates the wish
to modify the sample, so its R/O status must be removed if any, but there's
no implied requirement that the chunk becomes larger. Case 3 is used when
the sample has to be made compatible with libc's str* functions. There's no
need to make it R/W nor to duplicate it if it is already correct. Case 4
can happen when the sample's size is required (eg: before performing some
changes that must fit in the buffer). Case 5 is more or less similar but
will happen when the sample by be grown but we want to ensure we're not
bound by the current small size.

So the proposal is to have different functions for various operations. One
will ensure a sample is safe for use with str* functions. Another one will
ensure it may be rewritten in place. And smp_dup() will have to perform an
inconditional duplication to guarantee at least #5 above, and implicitly
all other ones.

This patch only modifies smp_dup() to make the duplication inconditional. It
is enough to fix both the "base" sample fetch and the "sni" server directive,
and all use cases in general though not always optimally. More patches will
follow to address them more optimally and even better than the current
situation (eg: avoid a dup just to add a \0 when possible).

The bug comes from an ambiguous design, so its roots are old. 1.6 is affected
and a backport is needed. In 1.5, the function already existed but was only
used by two converters modifying the data in place, so the bug has no effect
there.
2016-08-09 14:03:23 +02:00
Dragan Dosen
1a5d06032b MINOR: standard: add function "escape_string"
Similar to "escape_chunk", this function tries to prefix all characters
tagged in the <map> with the <escape> character. The specified <string>
contains the input to be escaped.
2016-07-26 15:25:32 +02:00
Ruoshan Huang
e4edc6b628 MEDIUM: http: implement http-response track-sc* directive
This enables tracking of sticky counters from current response. The only
difference from "http-request track-sc" is the <key> sample expression
can only make use of samples in response (eg. res.*, status etc.) and
samples below Layer 6.
2016-07-26 14:31:14 +02:00
Thierry FOURNIER
9bd52d478b BUG/MEDIUM: lua: the function txn_done() from action wrapper can crash
If an action wrapper stops the processing of the transaction
with a txn_done() function, the return code of the action is
"continue". So the continue can implies the processing of other
like adding headers. However, the HTTP content is flushed and
a segfault occurs.

This patchs add a flag indicating that the Lua code want to
stop the processing, ths flags is forwarded to the haproxy core,
and other actions are ignored.

Must be backported in 1.6
2016-07-14 16:14:32 +02:00
Thierry FOURNIER
ab00df6cf6 BUG/MEDIUM: lua: the function txn_done() from sample fetches can crash
The function txn_done() ends a transaction. It does not make
sense to call this function from a lua sample-fetch wrapper,
because the role of a sample-fetch is not to terminate a
transaction.

This patch modify the role of the fucntion txn_done() if it
is called from a sample-fetch wrapper, now it just ends the
execution of the Lua code like the done() function.

Must be backported in 1.6
2016-07-14 16:14:24 +02:00
Nenad Merdanovic
8ab79420ba BUG/MINOR: Fix endiness issue in DNS header creation code
Alexander Lebedev reported that the response bit is set on SPARC when
DNS queries are sent. This has been tracked to the endianess issue, so
this patch makes the code portable.

Signed-off-by: Nenad Merdanovic <nmerdan@anine.io>
2016-07-13 14:47:58 +02:00
Willy Tarreau
eec1d3869d BUG/MEDIUM: dns: fix alignment issues in the DNS response parser
Alexander Lebedev reported that the DNS parser crashes in 1.6 with a bus
error on Sparc when it receives a response. This is obviously caused by
some alignment issues. The issue can also be reproduced on ARMv5 when
setting /proc/cpu/alignment to 4 (which helps debugging).

Two places cause this crash in turn, the first one is when the IP address
from the packet is compared to the current one, and the second place is
when the address is assigned because an unaligned address is passed to
update_server_addr().

This patch modifies these places to properly use memcpy() and memcmp()
to manipulate the unaligned data.

Nenad Merdanovic found another set of places specific to 1.7 in functions
in_net_ipv4() and in_net_ipv6(), which are used to compare networks. 1.6
has the functions but does not use them. There we perform a temporary copy
to a local variable to fix the problem. The type of the function's argument
is wrong since it's not necessarily aligned, so we change it for a const
void * instead.

This fix must be backported to 1.6. Note that in 1.6 the code is slightly
different, there's no rec[] array, the pointer is used directly from the
buffer.
2016-07-13 12:13:24 +02:00
David Carlier
3015a2eebd CLEANUP: connection: using internal struct to hold source and dest port.
Originally, tcphdr's source and dest from Linux were used to get the
source and port which led to a build issue on BSD oses.
To avoid side problems related to network then we just use an internal
struct as we need only those two fields.
2016-07-05 14:43:05 +02:00
Hubert Verstraete
2eae3a0497 MINOR: new function my_realloc2 = realloc + free upon failure
When realloc fails to allocate memory, the original pointer is not
freed. Sometime people override the original pointer with the pointer
returned by realloc which is NULL in case of failure. This results
in a memory leak because the memory pointed by the original pointer
cannot be freed.
2016-06-29 10:45:15 +02:00
Bertrand Jacquin
9075968356 MINOR: tcp: add "tcp-request connection expect-netscaler-cip layer4"
This configures the client-facing connection to receive a NetScaler
Client IP insertion protocol header before any byte is read from the
socket. This is equivalent to having the "accept-netscaler-cip" keyword
on the "bind" line, except that using the TCP rule allows the PROXY
protocol to be accepted only for certain IP address ranges using an ACL.
This is convenient when multiple layers of load balancers are passed
through by traffic coming from public hosts.
2016-06-20 23:02:47 +02:00
Bertrand Jacquin
93b227db95 MINOR: listener: add the "accept-netscaler-cip" option to the "bind" keyword
When NetScaler application switch is used as L3+ switch, informations
regarding the original IP and TCP headers are lost as a new TCP
connection is created between the NetScaler and the backend server.

NetScaler provides a feature to insert in the TCP data the original data
that can then be consumed by the backend server.

Specifications and documentations from NetScaler:
  https://support.citrix.com/article/CTX205670
  https://www.citrix.com/blogs/2016/04/25/how-to-enable-client-ip-in-tcpip-option-of-netscaler/

When CIP is enabled on the NetScaler, then a TCP packet is inserted just after
the TCP handshake. This is composed as:

  - CIP magic number : 4 bytes
    Both sender and receiver have to agree on a magic number so that
    they both handle the incoming data as a NetScaler Client IP insertion
    packet.

  - Header length : 4 bytes
    Defines the length on the remaining data.

  - IP header : >= 20 bytes if IPv4, 40 bytes if IPv6
    Contains the header of the last IP packet sent by the client during TCP
    handshake.

  - TCP header : >= 20 bytes
    Contains the header of the last TCP packet sent by the client during TCP
    handshake.
2016-06-20 23:02:47 +02:00
Emmanuel Hocdet
5e0e6e409b MINOR: ssl: crt-list parsing factor
LINESIZE and MAX_LINE_ARGS are too low for parsing crt-list.
2016-06-20 17:29:56 +02:00
William Lallemand
72a8a18e89 MEDIUM: dumpstats: make stats_tlskeys_list() yield-aware during tls-keys dump
The previous dump algorithm was not trying to yield when the buffer is
full, it's not a problem with the TLS_TICKETS_NO which is 3 by default
but it can become one if the buffer size is lowered and if the
TLS_TICKETS_NO is increased.

The index of the latest ticket dumped is now stored to ensure we can
resume the dump after a yield.
2016-06-14 19:42:08 +02:00
William Lallemand
cf9e788790 BUG/MEDIUM: dumpstats: undefined behavior in stats_tlskeys_list()
The function stats_tlskeys_list() can meet an undefined behavior when
called with appctx->st2 == STAT_ST_LIST, indeed the ref pointer is used
uninitialized.

However this function was using NULL in appctx->ctx.tlskeys.ref as a
flag to dump every tickets from every references.  A real flag
appctx->ctx.tlskeys.dump_all is now used for this behavior.

This patch delete the 'ref' variable and use appctx->ctx.tlskeys.ref
directly.
2016-06-14 19:41:58 +02:00
Dragan Dosen
e984a0e4fb MINOR: stream: export the function 'smp_create_src_stkctr'
Could be useful outside of this file.
2016-06-13 21:21:51 +02:00
William Lallemand
2e785f23cb MEDIUM: tcp: add 'set-src' to 'tcp-request connection'
The 'set-src' action was not available for tcp actions The action code
has been converted into a function in proto_tcp.c to be used for both
'http-request' and 'tcp-request connection' actions.

Both http and tcp keywords are registered in proto_tcp.c
2016-06-01 11:44:11 +02:00
Willy Tarreau
5f6e9054b9 BUILD: fix build on Solaris 11
htonll()/ntohll() already exist on Solaris 11 with a different declaration,
causing a build error as reported by Jonathan Fisher. They used to exist on
OSX with a #define which allowed us to detect them. It was a bad idea to give
these functions a name subject to conflicts like this. Simply rename them
my_htonll()/my_ntohll() to definitely get rid of the conflict.

This patch must be backported to 1.6.
2016-05-26 07:15:57 +02:00
Lukas Tribus
f2ebcb47cb BUG/MEDIUM: dns: unbreak DNS resolver after header fix
DNS requests (using the internal resolver) are corrupted since commit
e2f8497716 ("BUG/MINOR: dns: fix DNS header definition").

Fix it by defining the struct in network byte order, while complying
with RFC 2535, section 6.1.

First reported by Eduard Vopicka on discourse.

This must be backported to 1.6 (1.6.5 is affected).
2016-05-25 22:39:37 +02:00
Willy Tarreau
58727ec088 BUG/MAJOR: http: fix breakage of "reqdeny" causing random crashes
Commit 108b1dd ("MEDIUM: http: configurable http result codes for
http-request deny") introduced in 1.6-dev2 was incomplete. It introduced
a new field "rule_deny_status" into struct http_txn, which is filled only
by actions "http-request deny" and "http-request tarpit". It's then used
in the deny code path to emit the proper error message, but is used
uninitialized when the deny comes from a "reqdeny" rule, causing random
behaviours ranging from returning a 200, an empty response, or crashing
the process. Often upon startup only 200 was returned but after the fields
are used the crash happens. This can be sped up using -dM.

There's no need at all for storing this status in the http_txn struct
anyway since it's used immediately after being set. Let's store it in
a temporary variable instead which is passed as an argument to function
http_req_get_intercept_rule().

As an extra benefit, removing it from struct http_txn reduced the size
of this struct by 8 bytes.

This fix must be backported to 1.6 where the bug was detected. Special
thanks to Falco Schmutz for his detailed report including an exploitable
core and a reproducer.
2016-05-25 16:23:59 +02:00
Vincent Bernat
6e61589573 BUG/MAJOR: fix listening IP address storage for frontends
When compiled with GCC 6, the IP address specified for a frontend was
ignored and HAProxy was listening on all addresses instead. This is
caused by an incomplete copy of a "struct sockaddr_storage".

With the GNU Libc, "struct sockaddr_storage" is defined as this:

    struct sockaddr_storage
      {
        sa_family_t ss_family;
        unsigned long int __ss_align;
        char __ss_padding[(128 - (2 * sizeof (unsigned long int)))];
      };

Doing an aggregate copy (ss1 = ss2) is different than using memcpy():
only members of the aggregate have to be copied. Notably, padding can be
or not be copied. In GCC 6, some optimizations use this fact and if a
"struct sockaddr_storage" contains a "struct sockaddr_in", the port and
the address are part of the padding (between sa_family and __ss_align)
and can be not copied over.

Therefore, we replace any aggregate copy by a memcpy(). There is another
place using the same pattern. We also fix a function receiving a "struct
sockaddr_storage" by copy instead of by reference. Since it only needs a
read-only copy, the function is converted to request a reference.
2016-05-19 10:43:24 +02:00
Christopher Faulet
3a394fa7cd MEDIUM: filters: Add pre and post analyzer callbacks
'channel_analyze' callback has been removed. Now, there are 2 callbacks to
surround calls to analyzers:

  * channel_pre_analyze: Called BEFORE all filterable analyzers. it can be
    called many times for the same analyzer, once at each loop until the
    analyzer finishes its processing. This callback is resumable, it returns a
    negative value if an error occurs, 0 if it needs to wait, any other value
    otherwise.

  * channel_post_analyze: Called AFTER all filterable analyzers. Here, AFTER
    means when an analyzer finishes its processing. This callback is NOT
    resumable, it returns a negative value if an error occurs, any other value
    otherwise.

Pre and post analyzer callbacks are not automatically called. 'pre_analyzers'
and 'post_analyzers' bit fields in the filter structure must be set to the right
value using AN_* flags (see include/types/channel.h).

The flag AN_RES_ALL has been added (AN_REQ_ALL already exists) to ease the life
of filter developers. AN_REQ_ALL and AN_RES_ALL include all filterable
analyzers.
2016-05-18 15:11:54 +02:00
Christopher Faulet
a9215b7206 MINOR: filters: Simplify calls to analyzers using 2 new macros
Now, to call an analyzer in 'process_stream' function, we should use
FLT_ANALAYZE or ANALYZE macros, depending if this is a filterable analyzer or
not.
2016-05-18 15:11:54 +02:00
Christopher Faulet
1339d744d5 MEDIUM: filters: Move HTTP headers filtering in its own callback
Instead of calling 'channel_analyze' callback with the flag AN_FLT_HTTP_HDRS,
now we use the new callback 'http_headers'. This change is done because
'channel_analyze' callback will be removed in a next commit.
2016-05-18 15:11:54 +02:00
Willy Tarreau
27b639d37f MINOR: log: add the %Td log-format specifier
As suggested by Pavlos, it's too bad that we didn't have a %Td log
format tag given that there are a few mentions of Td corresponding
to the data transmission time already in the doc, so this is now done.
Just like the other specifiers, we report -1 if the connection failed
before reaching the data transmission state.
2016-05-17 18:04:30 +02:00
Maxime de Roucy
dc88785f9c MINOR: add list_append_word function
int list_append_word(struct list *li, const char *str, char **err)

Append a copy of string <str> (inside a wordlist) at the end of
the list <li>.
The caller is responsible for freeing the <err> and <str> copy memory
area using free().

On failure : return 0 and <err> filled with an error message.
2016-05-14 00:00:54 +02:00
Vincent Bernat
e2f8497716 BUG/MINOR: dns: fix DNS header definition
Conforming to RFC 2535, section 6.1. This is not an important bug as
those fields don't seem to be set to something else than 0 and to be
checked on answers.
2016-05-09 11:01:08 +02:00
Cyril Bont
6ca9e01ab2 BUG/MEDIUM: stats: show backend may show an empty or incomplete result
This is the same issue as "show servers state", where the result is incorrect
it the data can't fit in one buffer. The similar fix is applied, to restart
the data processing where it stopped as buffers are sent to the client.

This fix should be backported to haproxy 1.6
2016-05-06 12:28:43 +02:00
Cyril Bont
76a99784f4 BUG/MEDIUM: stats: show servers state may show an empty or incomplete result
It was reported that the unix socket command "show servers state" returned an
empty response while "show servers state <backend>" worked.
In fact, both cases can reproduce the issue. It happens when the response can't
fit in one buffer.

The fix consists in processing the response in several steps, as it is done in
some others commands, by restarting where it was stopped after the buffer is
sent to the client.

This fix should be backported to haproxy 1.6
2016-05-06 12:28:43 +02:00
Willy Tarreau
8bf242b764 BUG/MEDIUM: channel: fix inconsistent handling of 4GB-1 transfers
In 1.4-dev3, commit 31971e5 ("[MEDIUM] add support for infinite forwarding")
made it possible to configure the lower layer to forward data indefinitely
by setting the forward size to CHN_INFINITE_FORWARD (4GB-1). By then larger
chunk sizes were not supported so there was no confusion in the usage of the
function.

Since 1.5 we support 64-bit content-lengths and chunk sizes and the function
has grown to support 64-bit arguments, though it still limits a single pass
to 32-bit quantities (what fit in the channel's to_forward field). The issue
now becomes that a 4GB-1 content-length can be confused with infinite
forwarding (in fact it's 4GB-1+what was already in the buffer). It causes a
visible effect when transferring this exact size because the transfer rate
is lower than with other sizes due in part to the disabling of the Nagle
algorithm on the sendto() call.

In theory with keep-alive it should prevent a second request from being
processed after such a transfer, but since the analysers are still present,
the forwarding analyser properly counts down the remaining size to transfer
and ultimately the transaction gets correctly reset so there is no visible
effect.

Since the root cause of the issue is an API problem (lack of distinction
between a real valid length and a magic value), this patch modifies the API
to have a new dedicated function called channel_forward_forever() to program
a permanent forwarding. The existing function __channel_forward() was modified
to properly take care of the requested sizes and ensure it 1) never overflows
and 2) never reaches CHN_INFINITE_FORWARD by accident.

It is worth noting that the function used to have a bug causing a 2GB
forward to be scheduled if it was called with less data than what is present
in buf->i. Fortunately this bug couldn't be triggered with existing code.

This fix should be backported to 1.6 and 1.5. While it also theorically
affects 1.4, it's better not to backport it there, as the risk of breaking
large object transfers due to significant API differences is high, compared
to the fact that the largest supported objects (4GB-1) are just slower to
transfer.
2016-05-04 15:26:37 +02:00
Willy Tarreau
ef907fee12 BUG/MAJOR: channel: fix miscalculation of available buffer space (4th try)
Unfortunately, commit 169c470 ("BUG/MEDIUM: channel: fix miscalculation of
available buffer space (3rd try)") was still not enough to completely
address the issue. It fell into an integer comparison trap. Contrary to
expectations, chn->to_forward may also have the sign bit set when
forwarding regular data having a large content-length, resulting in
an incomplete check of the result and of the reserve because the with
to_forward very large, to_forward+o could become very small and also
the reserve could become positive again and make channel_recv_limit()
return a negative value.

One way to reproduce this situation is to transfer a large file (> 2GB)
with http-keep-alive or http-server-close, without splicing, and ensure
that the server uses content-length instead of chunks. The transfer
should stall very early after the first buffer has been transferred
to the client.

This fix now properly checks 1) for an overflow caused by summing o and
to_forward, and 2) for o+to_forward being smaller or larger than maxrw
before performing the subtract, so that all sensitive operations are
properly performed on 33-bit arithmetics.

The code was subjected again to a series of tests using inject+httpterm
scanning a wide range of object sizes (+10MB after each new request) :

  $ printf "new page 1\nget 127.0.0.1:8002 / s=%%s0m\n" | \
      inject64 -o 1 -u 1 -f /dev/stdin

With previous fix, the transfer would suddenly stop when reaching 2GB :

   hits ^hits hits/s  ^h/s     bytes  kB/s  last  errs  tout htime  sdht ptime
    203     1      2     1 216816173354 2710202 3144892     0     0 685.0 0.0 685.0
    205     2      2     2 219257283186 2706880 2441109     0     0 679.5 6.5 679.5
    205     0      2     0 219257283186 2673836     0     0     0 0.0 0.0 0.0
    205     0      2     0 219257283186 2641622     0     0     0 0.0 0.0 0.0
    205     0      2     0 219257283186 2610174     0     0     0 0.0 0.0 0.0

Now it's fine even past 4 GB.

Many thanks to Vedran Furac for reporting this issue early with a common
access pattern helping to troubleshoot this.

This fix must be backported to 1.6 and 1.5 where the commit above was
already backported.
2016-05-03 17:58:03 +02:00
Willy Tarreau
55e58f2334 MINOR: channel: add new function channel_congested()
This function returns non-zero if the channel is congested with data in
transit waiting for leaving, indicating to the caller that it should wait
for the reserve to be released before starting to process new data in
case it needs the ability to append data. This is meant to be used while
waiting for a clean response buffer before processing a request.
2016-05-02 16:39:22 +02:00
Thierry Fournier
3610c39c8c MINOR: filters: add opaque data
Add opaque data between the filter keyword registrering and the parsing
function. This opaque data allow to use the same parser with differents
registered keywords. The opaque data is used for giving data which mainly
makes difference between the two keywords.

It will be used with Lua keywords registering.
2016-04-27 10:48:15 +02:00
Nenad Merdanovic
174dd37d88 MINOR: Add ability for agent-check to set server maxconn
This is very useful in complex architecture systems where HAproxy
is balancing DB connections for example. We want to keep the maxconn
high in order to avoid issues with queueing on the LB level when
there is slowness on another part of the system. Example is a case of
an architecture where each thread opens multiple DB connections, which
if get stuck in queue cause a snowball effect (old connections aren't
closed, new ones cannot be established). These connections are mostly
idle and the DB server has no problem handling thousands of them.

Allowing us to dynamically set maxconn depending on the backend usage
(LA, CPU, memory, etc.) enables us to have high maxconn for situations
like above, but lowering it in case there are real issues where the
backend servers become overloaded (cache issues, DB gets hit hard).
2016-04-25 17:23:50 +02:00
Willy Tarreau
169c47028a BUG/MEDIUM: channel: fix miscalculation of available buffer space (3rd try)
Latest fix 8a32106 ("BUG/MEDIUM: channel: fix miscalculation of available
buffer space (2nd try)") did happen to fix some observable issues but not
all of them in fact, some corner cases still remained and at least one user
reported a busy loop that appeared possible, though not easily reproducible
under experimental conditions.

The remaining issue is that we still consider min(i, to_fwd) as the number
of bytes in transit, but in fact <i> is not relevant here. Indeed, what
matters is that we can read everything we want at once provided that at
the end, <i> cannot be larger than <size-maxrw> (if it was not already).

This is visible in two cases :
  - let's have i=o=max/2 and to_fwd=0. Then i+o >= max indicates that the
    buffer is already full, while it is not since once <o> is forwarded,
    some space remains.

  - when to_fwd is much larger than i, it's obvious that we can fill the
    buffer.

The only relevant part in fact is o + to_fwd. to_fwd will ensure that at
least this many bytes will be moved from <i> to <o> hence will leave the
buffer, whatever the number of rounds it takes.

Interestingly, the fix applied here ensures that channel_recv_max() will
now equal (size - maxrw - i + to_fwd), which is indeed what remains
available below maxrw after to_fwd bytes are forwarded from i to o and
leave the buffer.

Additionally, the latest fix made it possible to meet an integer overflow
that was not caught by the range test when forwarding in TCP or tunnel
mode due to to_forward being added to an existing value, causing the
buffer size to be limited when it should not have been, resulting in 2
to 3 recv() calls when a single one was enough. The first one was limited
to the unreserved buffer size, the second one to the size of the reserve
minus 1, and the last one to the last byte. Eg with a 2kB buffer :

recvfrom(22, "HTTP/1.1 200\r\nConnection: close\r"..., 1024, 0, NULL, NULL) = 1024
recvfrom(22, "23456789.123456789.123456789.123"..., 1023, 0, NULL, NULL) = 1023
recvfrom(22, "5", 1, 0, NULL, NULL)     = 1

This bug is still present in 1.6 and 1.5 so the fix should be backported
there.
2016-04-21 18:06:08 +02:00
Willy Tarreau
93dc478a04 BUG/MEDIUM: channel: incorrect polling condition may delay event delivery
The condition to poll for receive as implemented in channel_may_recv()
is still incorrect. If buf->o is null and buf->i is slightly larger than
chn->to_forward and at least as large as buf->size - maxrewrite, then
reading will be disabled. It may slightly delay some data delivery by
having first to forward pending bytes, but may also cause some random
issues with analysers that wait for some data before starting to forward
what they correctly parsed. For instance, a body analyser may be prevented
from seeing the data that only fits in the reserve.

This bug may also prevent an applet's chk_rcv() function from being called
when part of a buffer is released. It is possible (though not verified)
that this participated to some peers frozen session issues some people
have been facing.

This fix should be backported to 1.6 and 1.5 to ensure better coherency
with channel_recv_limit().
2016-04-21 17:03:46 +02:00
Willy Tarreau
4b46a3e8cc BUG/MEDIUM: channel: don't allow to overwrite the reserve until connected
Commit 9c06ee4 ("BUG/MEDIUM: channel: don't schedule data in transit for
leaving until connected") took care of an issue involving POST in conjunction
with http-send-name-header, where we absolutely never want to touch the
reserve until we're sure not to touch the buffer contents anymore, which
is indicated by the output stream-interface being connected.

But channel_may_recv() was not equipped with such a test, so in some
situations it might decide that it is possible to poll for reads, and
later channel_recv_limit() will decide it's not possible to read,
causing a loop. So we must add a similar test there.

Since the fix above was backported to 1.6 and 1.5, this fix must as well.
2016-04-21 15:31:22 +02:00
Christopher Faulet
b3f4e14932 MINOR: filters: Print the list of existing filters during HA startup
This is done  in verbose/debug mode and when build options are reported.
2016-04-21 06:58:08 +02:00
Willy Tarreau
7a798e5d6b CLEANUP: fix inconsistency between fd->iocb, proto->accept and accept()
There's quite some inconsistency in the internal API. listener_accept()
which is the main accept() function returns void but is declared as int
in the include file. It's assigned to proto->accept() for all stream
protocols where an int is expected but the result is never checked (nor
is it documented by the way). This proto->accept() is in turn assigned
to fd->iocb() which is supposed to return an int composed of FD_WAIT_*
flags, but which is never checked either.

So let's fix all this mess :
  - nobody checks accept()'s return
  - nobody checks iocb()'s return
  - nobody sets a return value

=> let's mark all these functions void and keep the current ones intact.

Additionally we now include listener.h from listener.c to ensure we won't
silently hide this incoherency in the future.

Note that this patch could/should be backported to 1.6 and even 1.5 to
simplify debugging sessions.
2016-04-14 11:18:22 +02:00
Willy Tarreau
8a32106fff BUG/MEDIUM: channel: fix miscalculation of available buffer space (2nd try)
Commit 999f643 ("BUG/MEDIUM: channel: fix miscalculation of available buffer
space.") introduced a bug which made output data to be ignored when computing
the remaining room in a buffer. The problem is that channel_may_recv()
properly considers them and may declare that the FD may be polled for read
events, but once the even strikes, channel_recv_limit() called before recv()
says the opposite. In 1.6 and later this case is automatically caught by
polling loop detection at the connection level and is harmless. But the
backport in 1.5 ends up with a busy polling loop as soon as it becomes
possible to have a buffer with this conflict. In order to reproduce it, it
is necessary to have less than [maxrewrite] bytes available in a buffer, no
forwarding enabled (end of transfer) and [buf->o >= maxrewrite - free space].

Since this heavily depends on socket buffers, it will randomly strike users.
On 1.5 with 8kB buffers it was possible to reproduce it with httpterm using
the following command line :

   $ (printf "GET /?s=675000 HTTP/1.0\r\n\r\n"; sleep 60) | \
       nc6 --rcvbuf-size 1 --send-only 127.0.0.1 8002

This bug is only medium in 1.6 and later but is major in the 1.5 backport,
so it must be backported there.

Thanks to Nenad Merdanovic and Janusz Dziemidowicz for reporting this issue
with enough elements to help understand it.
2016-04-11 17:13:35 +02:00
Willy Tarreau
f3764b7993 MEDIUM: proxy: use dynamic allocation for error dumps
There are two issues with error captures. The first one is that the
capture size is still hard-coded to BUFSIZE regardless of any possible
tune.bufsize setting and of the fact that frontends only capture request
errors and that backends only capture response errors. The second is that
captures are allocated in both directions for all proxies, which start to
count a lot in configs using thousands of proxies.

This patch changes this so that error captures are allocated only when
needed, and of the proper size. It also refrains from dumping a buffer
that was not allocated, which still allows to emit all relevant info
such as flags and HTTP states. This way it is possible to save up to
32 kB of RAM per proxy in the default configuration.
2016-03-31 13:49:23 +02:00
Thierry Fournier
ff480424ab MINOR: lua: add class listener
This class provides the access to the listener struct, it allows
some manipulations and retrieve informations.
2016-03-30 18:43:47 +02:00
Thierry Fournier
f2fdc9dc39 MINOR: lua: add class server
This class provides the access to the server struct, it allows
some manipulations and retrieve informations.
2016-03-30 18:43:47 +02:00
Thierry Fournier
f61aa6356e MINOR: lua: add class proxy
This class provides the access to the proxy struct, it allows
some manipulations and retrieve informations.
2016-03-30 18:43:42 +02:00
Thierry Fournier
d0a56c2953 MINOR: dumpstats: split stats_dump_be_stats() in two parts
This patch splits the function stats_dump_be_stats() in two parts. The
part is called stats_fill_be_stats(), and just fill the stats buffer.
This split allows the usage of preformated stats in other parts of HAProxy
like the Lua.
2016-03-30 17:26:19 +02:00
Thierry Fournier
61fe6c0adb MINOR: dumpstats: split stats_dump_sv_stats() in two parts
This patch splits the function stats_dump_sv_stats() in two parts. The
extracted part is called stats_fill_sv_stats(), and just fill the stats buffer.
This split allows the usage of preformated stats in other parts of HAProxy
like the Lua.
2016-03-30 17:26:09 +02:00