The previous dump algorithm was not trying to yield when the buffer is
full, it's not a problem with the TLS_TICKETS_NO which is 3 by default
but it can become one if the buffer size is lowered and if the
TLS_TICKETS_NO is increased.
The index of the latest ticket dumped is now stored to ensure we can
resume the dump after a yield.
The function stats_tlskeys_list() can meet an undefined behavior when
called with appctx->st2 == STAT_ST_LIST, indeed the ref pointer is used
uninitialized.
However this function was using NULL in appctx->ctx.tlskeys.ref as a
flag to dump every tickets from every references. A real flag
appctx->ctx.tlskeys.dump_all is now used for this behavior.
This patch delete the 'ref' variable and use appctx->ctx.tlskeys.ref
directly.
The 'set-src' action was not available for tcp actions The action code
has been converted into a function in proto_tcp.c to be used for both
'http-request' and 'tcp-request connection' actions.
Both http and tcp keywords are registered in proto_tcp.c
htonll()/ntohll() already exist on Solaris 11 with a different declaration,
causing a build error as reported by Jonathan Fisher. They used to exist on
OSX with a #define which allowed us to detect them. It was a bad idea to give
these functions a name subject to conflicts like this. Simply rename them
my_htonll()/my_ntohll() to definitely get rid of the conflict.
This patch must be backported to 1.6.
DNS requests (using the internal resolver) are corrupted since commit
e2f8497716 ("BUG/MINOR: dns: fix DNS header definition").
Fix it by defining the struct in network byte order, while complying
with RFC 2535, section 6.1.
First reported by Eduard Vopicka on discourse.
This must be backported to 1.6 (1.6.5 is affected).
Commit 108b1dd ("MEDIUM: http: configurable http result codes for
http-request deny") introduced in 1.6-dev2 was incomplete. It introduced
a new field "rule_deny_status" into struct http_txn, which is filled only
by actions "http-request deny" and "http-request tarpit". It's then used
in the deny code path to emit the proper error message, but is used
uninitialized when the deny comes from a "reqdeny" rule, causing random
behaviours ranging from returning a 200, an empty response, or crashing
the process. Often upon startup only 200 was returned but after the fields
are used the crash happens. This can be sped up using -dM.
There's no need at all for storing this status in the http_txn struct
anyway since it's used immediately after being set. Let's store it in
a temporary variable instead which is passed as an argument to function
http_req_get_intercept_rule().
As an extra benefit, removing it from struct http_txn reduced the size
of this struct by 8 bytes.
This fix must be backported to 1.6 where the bug was detected. Special
thanks to Falco Schmutz for his detailed report including an exploitable
core and a reproducer.
When compiled with GCC 6, the IP address specified for a frontend was
ignored and HAProxy was listening on all addresses instead. This is
caused by an incomplete copy of a "struct sockaddr_storage".
With the GNU Libc, "struct sockaddr_storage" is defined as this:
struct sockaddr_storage
{
sa_family_t ss_family;
unsigned long int __ss_align;
char __ss_padding[(128 - (2 * sizeof (unsigned long int)))];
};
Doing an aggregate copy (ss1 = ss2) is different than using memcpy():
only members of the aggregate have to be copied. Notably, padding can be
or not be copied. In GCC 6, some optimizations use this fact and if a
"struct sockaddr_storage" contains a "struct sockaddr_in", the port and
the address are part of the padding (between sa_family and __ss_align)
and can be not copied over.
Therefore, we replace any aggregate copy by a memcpy(). There is another
place using the same pattern. We also fix a function receiving a "struct
sockaddr_storage" by copy instead of by reference. Since it only needs a
read-only copy, the function is converted to request a reference.
'channel_analyze' callback has been removed. Now, there are 2 callbacks to
surround calls to analyzers:
* channel_pre_analyze: Called BEFORE all filterable analyzers. it can be
called many times for the same analyzer, once at each loop until the
analyzer finishes its processing. This callback is resumable, it returns a
negative value if an error occurs, 0 if it needs to wait, any other value
otherwise.
* channel_post_analyze: Called AFTER all filterable analyzers. Here, AFTER
means when an analyzer finishes its processing. This callback is NOT
resumable, it returns a negative value if an error occurs, any other value
otherwise.
Pre and post analyzer callbacks are not automatically called. 'pre_analyzers'
and 'post_analyzers' bit fields in the filter structure must be set to the right
value using AN_* flags (see include/types/channel.h).
The flag AN_RES_ALL has been added (AN_REQ_ALL already exists) to ease the life
of filter developers. AN_REQ_ALL and AN_RES_ALL include all filterable
analyzers.
Now, to call an analyzer in 'process_stream' function, we should use
FLT_ANALAYZE or ANALYZE macros, depending if this is a filterable analyzer or
not.
Instead of calling 'channel_analyze' callback with the flag AN_FLT_HTTP_HDRS,
now we use the new callback 'http_headers'. This change is done because
'channel_analyze' callback will be removed in a next commit.
As suggested by Pavlos, it's too bad that we didn't have a %Td log
format tag given that there are a few mentions of Td corresponding
to the data transmission time already in the doc, so this is now done.
Just like the other specifiers, we report -1 if the connection failed
before reaching the data transmission state.
int list_append_word(struct list *li, const char *str, char **err)
Append a copy of string <str> (inside a wordlist) at the end of
the list <li>.
The caller is responsible for freeing the <err> and <str> copy memory
area using free().
On failure : return 0 and <err> filled with an error message.
Conforming to RFC 2535, section 6.1. This is not an important bug as
those fields don't seem to be set to something else than 0 and to be
checked on answers.
This is the same issue as "show servers state", where the result is incorrect
it the data can't fit in one buffer. The similar fix is applied, to restart
the data processing where it stopped as buffers are sent to the client.
This fix should be backported to haproxy 1.6
It was reported that the unix socket command "show servers state" returned an
empty response while "show servers state <backend>" worked.
In fact, both cases can reproduce the issue. It happens when the response can't
fit in one buffer.
The fix consists in processing the response in several steps, as it is done in
some others commands, by restarting where it was stopped after the buffer is
sent to the client.
This fix should be backported to haproxy 1.6
In 1.4-dev3, commit 31971e5 ("[MEDIUM] add support for infinite forwarding")
made it possible to configure the lower layer to forward data indefinitely
by setting the forward size to CHN_INFINITE_FORWARD (4GB-1). By then larger
chunk sizes were not supported so there was no confusion in the usage of the
function.
Since 1.5 we support 64-bit content-lengths and chunk sizes and the function
has grown to support 64-bit arguments, though it still limits a single pass
to 32-bit quantities (what fit in the channel's to_forward field). The issue
now becomes that a 4GB-1 content-length can be confused with infinite
forwarding (in fact it's 4GB-1+what was already in the buffer). It causes a
visible effect when transferring this exact size because the transfer rate
is lower than with other sizes due in part to the disabling of the Nagle
algorithm on the sendto() call.
In theory with keep-alive it should prevent a second request from being
processed after such a transfer, but since the analysers are still present,
the forwarding analyser properly counts down the remaining size to transfer
and ultimately the transaction gets correctly reset so there is no visible
effect.
Since the root cause of the issue is an API problem (lack of distinction
between a real valid length and a magic value), this patch modifies the API
to have a new dedicated function called channel_forward_forever() to program
a permanent forwarding. The existing function __channel_forward() was modified
to properly take care of the requested sizes and ensure it 1) never overflows
and 2) never reaches CHN_INFINITE_FORWARD by accident.
It is worth noting that the function used to have a bug causing a 2GB
forward to be scheduled if it was called with less data than what is present
in buf->i. Fortunately this bug couldn't be triggered with existing code.
This fix should be backported to 1.6 and 1.5. While it also theorically
affects 1.4, it's better not to backport it there, as the risk of breaking
large object transfers due to significant API differences is high, compared
to the fact that the largest supported objects (4GB-1) are just slower to
transfer.
Unfortunately, commit 169c470 ("BUG/MEDIUM: channel: fix miscalculation of
available buffer space (3rd try)") was still not enough to completely
address the issue. It fell into an integer comparison trap. Contrary to
expectations, chn->to_forward may also have the sign bit set when
forwarding regular data having a large content-length, resulting in
an incomplete check of the result and of the reserve because the with
to_forward very large, to_forward+o could become very small and also
the reserve could become positive again and make channel_recv_limit()
return a negative value.
One way to reproduce this situation is to transfer a large file (> 2GB)
with http-keep-alive or http-server-close, without splicing, and ensure
that the server uses content-length instead of chunks. The transfer
should stall very early after the first buffer has been transferred
to the client.
This fix now properly checks 1) for an overflow caused by summing o and
to_forward, and 2) for o+to_forward being smaller or larger than maxrw
before performing the subtract, so that all sensitive operations are
properly performed on 33-bit arithmetics.
The code was subjected again to a series of tests using inject+httpterm
scanning a wide range of object sizes (+10MB after each new request) :
$ printf "new page 1\nget 127.0.0.1:8002 / s=%%s0m\n" | \
inject64 -o 1 -u 1 -f /dev/stdin
With previous fix, the transfer would suddenly stop when reaching 2GB :
hits ^hits hits/s ^h/s bytes kB/s last errs tout htime sdht ptime
203 1 2 1 216816173354 2710202 3144892 0 0 685.0 0.0 685.0
205 2 2 2 219257283186 2706880 2441109 0 0 679.5 6.5 679.5
205 0 2 0 219257283186 2673836 0 0 0 0.0 0.0 0.0
205 0 2 0 219257283186 2641622 0 0 0 0.0 0.0 0.0
205 0 2 0 219257283186 2610174 0 0 0 0.0 0.0 0.0
Now it's fine even past 4 GB.
Many thanks to Vedran Furac for reporting this issue early with a common
access pattern helping to troubleshoot this.
This fix must be backported to 1.6 and 1.5 where the commit above was
already backported.
This function returns non-zero if the channel is congested with data in
transit waiting for leaving, indicating to the caller that it should wait
for the reserve to be released before starting to process new data in
case it needs the ability to append data. This is meant to be used while
waiting for a clean response buffer before processing a request.
Add opaque data between the filter keyword registrering and the parsing
function. This opaque data allow to use the same parser with differents
registered keywords. The opaque data is used for giving data which mainly
makes difference between the two keywords.
It will be used with Lua keywords registering.
This is very useful in complex architecture systems where HAproxy
is balancing DB connections for example. We want to keep the maxconn
high in order to avoid issues with queueing on the LB level when
there is slowness on another part of the system. Example is a case of
an architecture where each thread opens multiple DB connections, which
if get stuck in queue cause a snowball effect (old connections aren't
closed, new ones cannot be established). These connections are mostly
idle and the DB server has no problem handling thousands of them.
Allowing us to dynamically set maxconn depending on the backend usage
(LA, CPU, memory, etc.) enables us to have high maxconn for situations
like above, but lowering it in case there are real issues where the
backend servers become overloaded (cache issues, DB gets hit hard).
Latest fix 8a32106 ("BUG/MEDIUM: channel: fix miscalculation of available
buffer space (2nd try)") did happen to fix some observable issues but not
all of them in fact, some corner cases still remained and at least one user
reported a busy loop that appeared possible, though not easily reproducible
under experimental conditions.
The remaining issue is that we still consider min(i, to_fwd) as the number
of bytes in transit, but in fact <i> is not relevant here. Indeed, what
matters is that we can read everything we want at once provided that at
the end, <i> cannot be larger than <size-maxrw> (if it was not already).
This is visible in two cases :
- let's have i=o=max/2 and to_fwd=0. Then i+o >= max indicates that the
buffer is already full, while it is not since once <o> is forwarded,
some space remains.
- when to_fwd is much larger than i, it's obvious that we can fill the
buffer.
The only relevant part in fact is o + to_fwd. to_fwd will ensure that at
least this many bytes will be moved from <i> to <o> hence will leave the
buffer, whatever the number of rounds it takes.
Interestingly, the fix applied here ensures that channel_recv_max() will
now equal (size - maxrw - i + to_fwd), which is indeed what remains
available below maxrw after to_fwd bytes are forwarded from i to o and
leave the buffer.
Additionally, the latest fix made it possible to meet an integer overflow
that was not caught by the range test when forwarding in TCP or tunnel
mode due to to_forward being added to an existing value, causing the
buffer size to be limited when it should not have been, resulting in 2
to 3 recv() calls when a single one was enough. The first one was limited
to the unreserved buffer size, the second one to the size of the reserve
minus 1, and the last one to the last byte. Eg with a 2kB buffer :
recvfrom(22, "HTTP/1.1 200\r\nConnection: close\r"..., 1024, 0, NULL, NULL) = 1024
recvfrom(22, "23456789.123456789.123456789.123"..., 1023, 0, NULL, NULL) = 1023
recvfrom(22, "5", 1, 0, NULL, NULL) = 1
This bug is still present in 1.6 and 1.5 so the fix should be backported
there.
The condition to poll for receive as implemented in channel_may_recv()
is still incorrect. If buf->o is null and buf->i is slightly larger than
chn->to_forward and at least as large as buf->size - maxrewrite, then
reading will be disabled. It may slightly delay some data delivery by
having first to forward pending bytes, but may also cause some random
issues with analysers that wait for some data before starting to forward
what they correctly parsed. For instance, a body analyser may be prevented
from seeing the data that only fits in the reserve.
This bug may also prevent an applet's chk_rcv() function from being called
when part of a buffer is released. It is possible (though not verified)
that this participated to some peers frozen session issues some people
have been facing.
This fix should be backported to 1.6 and 1.5 to ensure better coherency
with channel_recv_limit().
Commit 9c06ee4 ("BUG/MEDIUM: channel: don't schedule data in transit for
leaving until connected") took care of an issue involving POST in conjunction
with http-send-name-header, where we absolutely never want to touch the
reserve until we're sure not to touch the buffer contents anymore, which
is indicated by the output stream-interface being connected.
But channel_may_recv() was not equipped with such a test, so in some
situations it might decide that it is possible to poll for reads, and
later channel_recv_limit() will decide it's not possible to read,
causing a loop. So we must add a similar test there.
Since the fix above was backported to 1.6 and 1.5, this fix must as well.
There's quite some inconsistency in the internal API. listener_accept()
which is the main accept() function returns void but is declared as int
in the include file. It's assigned to proto->accept() for all stream
protocols where an int is expected but the result is never checked (nor
is it documented by the way). This proto->accept() is in turn assigned
to fd->iocb() which is supposed to return an int composed of FD_WAIT_*
flags, but which is never checked either.
So let's fix all this mess :
- nobody checks accept()'s return
- nobody checks iocb()'s return
- nobody sets a return value
=> let's mark all these functions void and keep the current ones intact.
Additionally we now include listener.h from listener.c to ensure we won't
silently hide this incoherency in the future.
Note that this patch could/should be backported to 1.6 and even 1.5 to
simplify debugging sessions.
Commit 999f643 ("BUG/MEDIUM: channel: fix miscalculation of available buffer
space.") introduced a bug which made output data to be ignored when computing
the remaining room in a buffer. The problem is that channel_may_recv()
properly considers them and may declare that the FD may be polled for read
events, but once the even strikes, channel_recv_limit() called before recv()
says the opposite. In 1.6 and later this case is automatically caught by
polling loop detection at the connection level and is harmless. But the
backport in 1.5 ends up with a busy polling loop as soon as it becomes
possible to have a buffer with this conflict. In order to reproduce it, it
is necessary to have less than [maxrewrite] bytes available in a buffer, no
forwarding enabled (end of transfer) and [buf->o >= maxrewrite - free space].
Since this heavily depends on socket buffers, it will randomly strike users.
On 1.5 with 8kB buffers it was possible to reproduce it with httpterm using
the following command line :
$ (printf "GET /?s=675000 HTTP/1.0\r\n\r\n"; sleep 60) | \
nc6 --rcvbuf-size 1 --send-only 127.0.0.1 8002
This bug is only medium in 1.6 and later but is major in the 1.5 backport,
so it must be backported there.
Thanks to Nenad Merdanovic and Janusz Dziemidowicz for reporting this issue
with enough elements to help understand it.
There are two issues with error captures. The first one is that the
capture size is still hard-coded to BUFSIZE regardless of any possible
tune.bufsize setting and of the fact that frontends only capture request
errors and that backends only capture response errors. The second is that
captures are allocated in both directions for all proxies, which start to
count a lot in configs using thousands of proxies.
This patch changes this so that error captures are allocated only when
needed, and of the proper size. It also refrains from dumping a buffer
that was not allocated, which still allows to emit all relevant info
such as flags and HTTP states. This way it is possible to save up to
32 kB of RAM per proxy in the default configuration.
This patch splits the function stats_dump_be_stats() in two parts. The
part is called stats_fill_be_stats(), and just fill the stats buffer.
This split allows the usage of preformated stats in other parts of HAProxy
like the Lua.
This patch splits the function stats_dump_sv_stats() in two parts. The
extracted part is called stats_fill_sv_stats(), and just fill the stats buffer.
This split allows the usage of preformated stats in other parts of HAProxy
like the Lua.
This patch splits the function stats_dump_li_stats() in two parts. The
extracted part is called stats_fill_li_stats(), and just fill the stats buffer.
This split allows the usage of preformated stats in other parts of HAProxy
like the Lua.
This patch splits the function stats_dump_fe_stats() in two parts. The
extracted part is called stats_fill_fe_stats(), and just fill the stats buffer.
This split allows the usage of preformated stats in other parts of HAProxy
like the Lua.
This patch splits the function stats_dump_info_to_buffer() in two parts. The
extracted part is called stats_fill_info(), and just fill the stats buffer.
This split allows the usage of preformated stats in other parts of HAProxy
like the Lua.
This patch adds a Lua post initialisation wrapper. It already exists for
pure Lua function, now it executes also C. It is useful for doing things
when the configuration is ready to use. For example we can can browse and
register all the proxies.
All the HAProxy Lua object are declared with the same pattern:
- Add the function __tosting which dumps the object name
- Register the name in the Lua REGISTRY
- Register the reference ID
These action are refactored in on function. This remove some
lines of code.
The functions
- hlua_class_const_int()
- hlua_class_const_str()
- hlua_class_function()
are use for common class registration actions.
The function 'hlua_dump_object()' is generic dump name function.
These functions can be used by all the HAProxy objects, so I move
it into the safe functions file.
Avoiding harmful memcpy call if the allocation failed.
Resetting the size which avoids further harmful freeing
invalid pointer. Closer to the comment behavior description.
The strftime() function can call tzset() internally on some platforms.
When haproxy is chrooted, the /etc/localtime file is not found, and some
implementations will clobber the content of the current timezone.
The GMT offset is computed by diffing the times returned by gmtime_r() and
localtime_r(). These variants are guaranteed to not call tzset() and were
already used in haproxy while chrooted, so they should be safe.
This patch must be backported to 1.6 and 1.5.
GMT offset used in local time formats was computed at startup, but was not updated when DST status changed while running.
For example these two RFC5424 syslog traces where emitted 5 seconds apart, just before and after DST changed:
<14>1 2016-03-27T01:59:58+01:00 bunch-VirtualBox haproxy 2098 - - Connect ...
<14>1 2016-03-27T03:00:03+01:00 bunch-VirtualBox haproxy 2098 - - Connect ...
It looked like they were emitted more than 1 hour apart, unlike with the fix:
<14>1 2016-03-27T01:59:58+01:00 bunch-VirtualBox haproxy 3381 - - Connect ...
<14>1 2016-03-27T03:00:03+02:00 bunch-VirtualBox haproxy 3381 - - Connect ...
This patch should be backported to 1.6 and partially to 1.5 (no fix needed in log.c).
This emits the field positions, names and types. It is more convenient
than the default output for a parser that doesn't know all the fields. It
simply relies on stats_emit_typed_data_field() and stats_emit_field_tags()
added by previous patch for the output. A new stats format flag was added,
STAT_FMT_TYPED, which is set when the "typed" keyword is specified on the
CLI.
New function stats_emit_typed_data_field() does exactly like
stats_emit_raw_data_field() except that it also prints the data
type after a colon. This will be used to print using the typed
format.
And function stats_emit_field_tags() appends a 3-letter code
describing the origin, nature, and scope, followed by an optional
delimiter. This will be particularly convenient to dump typed
data.
It's easier to have a new flag in <flags> to indicate whether or not we
want to display the admin column in HTML dumps. We already have similar
flags to show the version or the legends.
We're preparing for various data types for each stats field as they
appear in the CSV output. For now we only cover the regular types handled
by printf, so we have 32 and 64 bit ints and counters, strings, and of
course "empty" to indicate that there's nothing in the field and which
guarantees that any accessed entry will return 0.
More types will surely come later so that some fields are properly
represented. For example, we could see limits where only the value 0
doesn't show up, or human time, etc.
This is the continuation of previous patch called "BUG/MAJOR: samples:
check smp->strm before using it".
It happens that variables may have a session-wide scope, and that their
session is retrieved by dereferencing the stream. But nothing prevents them
from being used from a streamless context such as tcp-request connection,
thus crashing the process. Example :
tcp-request connection accept if { src,set-var(sess.foo) -m found }
In order to fix this, we have to always ensure that variable manipulation
only happens via the sample, which contains the correct owner and context,
and that we never use one from a different source. This results in quite a
large change since a lot of functions are inderctly involved in the call
chain, but the change is easy to follow.
This fix must be backported to 1.6, and requires the last two patches.