Until now the callers had to know which one to call for each specific use case.
Let's fuse them now, since a single one will remain after the API migration.
Given that bi_del() may only be used where o==0, we can simply combine the
two tests by first removing output data, then only input data.
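A minimal sketch of the fused function under the old buffer model (fields
p/i/o plus a data[] storage area); illustrative only, not the committed code:

    static inline void b_del(struct buffer *b, unsigned int del)
    {
        unsigned int out = (del <= b->o) ? del : b->o;

        b->o -= out;                  /* output data is removed first... */
        del  -= out;
        b->i -= del;                  /* ...then only input data */
        b->p += del;                  /* input starts at p: advance it */
        if (b->p >= b->data + b->size)
            b->p -= b->size;          /* wrap back to the storage origin */
    }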
This will be important so that we can parse a buffer without touching it.
Now we indicate the offset from the buffer's head at which we plan to start
copying, and for how many bytes. This will be used by send functions to loop
at the end of the buffer without having to update the buffer's output byte
count.
This new function limits itself to the amount of data available in the
buffer and doesn't care about the direction anymore. It's only called
from co_getblk(), which already checks that no more than the available
output bytes are requested.
These ones were merged into a single b_contig_space() that covers both
(the bo_ case was a simplified version of the other one). The function
doesn't use ->i nor ->o anymore.
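A rough sketch of the merged function, expressed only with the
head/tail/wrap/full helpers introduced elsewhere in this series (assumed
code, the committed version may differ):

    static inline size_t b_contig_space(const struct buffer *b)
    {
        const char *tail = b_tail(b);   /* first free byte */
        const char *head = b_head(b);   /* first data byte */

        if (b_full(b))
            return 0;
        if (tail >= head)               /* free area runs to the wrap point */
            return b_wrap(b) - tail;
        return head - tail;             /* free area stops at the data */
    }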
This function was sometimes used from a channel and sometimes from a buffer.
In both cases it requires knowledge of the size of the output data (to skip
them). Here the split ensures the channel can deal with this point, and that
other places not having output data can continue to work.
These ones manipulate the output data count which will be specific to
the channel soon, so prepare the call points to use the channel only.
The b_* functions are now unused and were removed.
The few call places where it's used can use the trash as a swap buffer,
which is made for this exact purpose. This way we can rely on the
generic b_slow_realign() call.
Where relevant, the channel version is used instead. The buffer version
was ported to be more generic and now takes a swap buffer and the output
byte count to know where to set the alignment point. The H2 mux still
uses buffer_slow_realign() with buf->o but it will change later.
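The generic version may be sketched as follows (assumed code under the old
model): the contents are linearized into the swap area, then laid back so
that the <output> bytes end at the wrap point and the input starts at the
origin.

    #include <string.h>

    static void b_slow_realign(struct buffer *b, char *swap, size_t output)
    {
        char  *head   = b_head(b);
        size_t block1 = b_data(b);            /* total bytes to move */
        size_t block2 = 0;

        if (block1 > (size_t)(b_wrap(b) - head)) {
            block2  = block1 - (b_wrap(b) - head);
            block1 -= block2;                 /* the contents wrap: 2 copies */
        }
        memcpy(swap, head, block1);           /* linearize into the swap */
        memcpy(swap + block1, b_orig(b), block2);

        /* lay everything back: output ends at the wrap point, input
         * starts at the origin */
        memcpy(b_wrap(b) - output, swap, output);
        memcpy(b_orig(b), swap + output, b_data(b) - output);
        b->p = b_orig(b);                     /* input restarts at the origin */
    }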
This adds :
- c_orig() : channel buffer's origin
- c_size() : channel buffer's size
- c_wrap() : channel buffer's wrapping location
- c_data() : channel buffer's total data count
- c_room() : room left in the channel's buffer
- c_empty() : true if channel buffer is empty
- c_full() : true if channel buffer is full
- c_ptr() : pointer to an offset relative to input data in the buffer
- c_adv() : advances the channel's buffer (bytes become part of output)
- c_rew() : rewinds the channel's buffer (output bytes not output anymore)
- c_realign_if_empty() : realigns the buffer if it's empty
- co_data() : # of output data
- co_head() : beginning of output data
- co_tail() : end of output data
- ci_data() : # of input data
- ci_head() : beginning of input data
- ci_tail() : end of input data
- ci_stop() : location after ci_tail()
- ci_next() : pointer to next input byte
And for the ci_* / co_* functions above, the "__*" variants which disable
wrapping checks, and the "_ofs" variants which return an offset relative to
the buffer's origin instead.
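For illustration, a few of these can be sketched over the old model (a
channel c whose buffer still exposes p/i/o); these are assumptions, not the
committed implementations:

    static inline size_t c_data(const struct channel *c)
    {
        return c->buf->i + c->buf->o;   /* total data in the buffer */
    }

    static inline size_t co_data(const struct channel *c)
    {
        return c->buf->o;               /* output data only */
    }

    static inline void c_adv(struct channel *c, size_t adv)
    {
        c->buf->p  = b_peek(c->buf, c->buf->o + adv); /* new input start */
        c->buf->i -= adv;               /* input bytes become output */
        c->buf->o += adv;
    }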
Many places deal with buffer realignment after data removal. The method
is always the same : if the buffer is empty, set its pointer to the origin.
Let's have a function for this so that we have less code to change with the
new API.
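Under the old model the function boils down to something like this sketch:

    static inline void b_realign_if_empty(struct buffer *b)
    {
        if (!b->i && !b->o)
            b->p = b->data;    /* empty: reset the pointer to the origin */
    }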
Add a new function that lets you set the amount of input in a buffer.
For now it extends/truncates b->i except if the total length is
below b->o in which case it clears i and adjusts o.
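A sketch matching this description (assumed code):

    static inline void b_set_data(struct buffer *b, size_t len)
    {
        if (len >= b->o) {
            b->i = len - b->o;  /* extend/truncate the input part */
        } else {
            b->o = len;         /* total below output: shrink output... */
            b->i = 0;           /* ...and clear input */
        }
    }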
Instead of doing b->i -= directly, introduce b_sub(), which does the job, to
make it easier to switch to the future API.
Also add b_add(), which increases b->i (instead of manipulating it directly),
and bo_add(), which increases b->o.
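These helpers are trivial wrappers, sketched here for clarity:

    static inline void b_sub(struct buffer *b, size_t count)
    {
        b->i -= count;          /* remove <count> bytes from the input */
    }

    static inline void b_add(struct buffer *b, size_t count)
    {
        b->i += count;          /* account for <count> new input bytes */
    }

    static inline void bo_add(struct buffer *b, size_t count)
    {
        b->o += count;          /* account for <count> new output bytes */
    }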
Here's the list of newly introduced functions :
- b_data(), returning the total amount of data in the buffer (currently i+o)
- b_orig(), returning the origin of the storage area, that is, the place of
position 0.
- b_wrap(), pointer to wrapping point (currently data+size)
- b_size(), returning the size of the buffer
- b_room(), returning the amount of bytes left available
- b_full(), returning true if the buffer is full, otherwise false
- b_stop(), pointer to end of data mark (currently p+i), used to compute
distances or a stop pointer for a loop.
- b_peek(), this one will help make the transition to the new buffer model.
It returns a pointer to a position in the buffer known from an offset
relative to the beginning of the data in the buffer. Thus, we can replace
the following occurrences :
    bo_ptr(b)     => b_peek(b, 0);
    bo_end(b)     => b_peek(b, b->o);
    bi_ptr(b)     => b_peek(b, b->o);
    bi_end(b)     => b_peek(b, b->i + b->o);
    b_ptr(b, ofs) => b_peek(b, b->o + ofs);
- b_head(), pointer to the beginning of data (currently bo_ptr())
- b_tail(), pointer to first free place (currently bi_ptr())
- b_next() / b_next_ofs(), pointer to the next byte, taking wrapping
into account.
- b_dist(), returning the distance between two pointers belonging to a buffer
- b_reset(), which resets the buffer
- b_space_wraps(), indicating if the free space wraps around the buffer
- b_almost_full(), indicating if 3/4 or more of the buffer are used
Some of these are provided with the unchecked variants using the "__"
prefix, or with the "_ofs" suffix indicating they return a relative
position to the buffer's origin instead of a pointer.
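As an illustration, b_peek() may be sketched like this over the old model
(p = start of input, preceded by o output bytes); assumed code:

    static inline char *b_peek(const struct buffer *b, size_t ofs)
    {
        char *ptr = b->p - b->o + ofs;  /* data start is o bytes before p */

        if (ptr < b->data)
            ptr += b->size;             /* data start wrapped backwards */
        else if (ptr >= b->data + b->size)
            ptr -= b->size;             /* offset wrapped past the end */
        return ptr;
    }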
Cc: Olivier Houchard <ohouchard@haproxy.com>
Passing unsigned ints everywhere is painful, and will cause some headaches
later when we want to integrate better with struct ist, which already
uses size_t. Let's switch buffers to use size_t instead.
The buffer code currently depends on pools and other stuff and is not
really autonomous anymore. The rewrite of the new API is an opportunity
to clean this up. This patch creates a new file (buf.h) which does not
depend on other elements and which will only contain what is needed to
perform the most basic buffer operations. The new API will be introduced
in this file and the conversion will be finished once buffer.h is empty.
The definition of struct buffer was moved to this new file, using more
explicit stdint types for the sizes and offsets.
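The moved definition may look like the following sketch (the exact stdint
types are an assumption):

    #include <stdint.h>

    struct buffer {
        char    *p;          /* beginning of input data */
        uint32_t size;       /* buffer size in bytes */
        uint32_t i;          /* number of input bytes pending */
        uint32_t o;          /* number of output bytes to forward */
        char     data[0];    /* bytes of the buffer itself */
    };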
Most new functions will be implemented in two variants :
  __b_something() : unchecked variant, no wrapping is expected
  b_something()   : wrapping-checked variant
This way callers will be able to select which one to use depending on
the use cases.
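For instance, with a hypothetical byte-stepping helper (b_wrap()/b_orig()
as introduced later in the series), the convention gives:

    /* unchecked variant: the caller guarantees p never reaches the wrap */
    static inline char *__b_next(const struct buffer *b, const char *p)
    {
        return (char *)p + 1;
    }

    /* checked variant: wraps back to the origin when needed */
    static inline char *b_next(const struct buffer *b, const char *p)
    {
        p++;
        if (p == b_wrap(b))
            p = b_orig(b);
        return (char *)p;
    }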
If a timeout strikes on the connection side with some active streams,
there is a corner case which can sometimes cause the following sequence
to happen :
- There are active streams and there are data in the mux buffer
(eg: a client suddenly disconnected during a download with pending
requests). The timeout is active.
- The timeout strikes, h2_timeout_task() is called, kills the task and
doesn't close the connection since there are streams left ; The
connection is marked in H2_CS_ERROR ;
- the streams are woken up and closed ;
- when the last stream closes, calling h2_detach(), it sees the
stream tree is empty, but since there are still data in the mux
buffer (mbuf->o > 0), no condition allows the connection to be
closed, thus it does nothing ;
- since the task is dead, there's no more hope to clear this
situation later
For now we can take care of this by adding a test for the presence of
H2_CS_ERROR and !task, implying the timeout task triggered already
and will not be able to handle this again.
Over the long term it seems like a more reliable test should be
designed, so that it is possible to know whether or not someone is
still able to close this connection.
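The added test may be sketched like this in h2_detach() (a rough sketch
only; the exact fields and release call are assumptions based on the mux
code of that era):

    /* last stream is being detached */
    if (eb_is_empty(&h2c->streams_by_id) &&
        h2c->st0 == H2_CS_ERROR && !h2c->task) {
        /* the timeout task already triggered and cannot handle this
         * again, so release the connection from here */
        h2_release(h2c->conn);
    }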
A big thanks to Janusz Dziemidowicz and Milan Petruzelka for providing
many details helping in figuring this bug.
We currently don't process trailers on H2, but this has an impact : on
chunked HTTP/1 responses, we decide to emit the ES bit once we see the
0CRLF. From this point the stream switches to the CLOSED state, which
aborts processing of the remaining bytes. Thus the extra CRLF which ends
trailers is not processed and remains in the buffer. This prevents the
stream from being notified about end of transmission, which in turn keeps
the mux busy and prevents the connection from quitting.
The case of the trailers is not the root cause of this issue, though it
is what triggers it. The root cause is that upon error and/or close, once
we know we're not going to process any more data, we must absolutely flush
any remaining bytes from the output buffer, otherwise there is no way the
stream can quit. This is what this patch does.
It looks very likely related to the issues reported and debugged by
Janusz Dziemidowicz and Milan Petruzelka.
One way to reproduce it is to chain two proxies with the last one emitting
chunked data (typically using the stats page) :
    global
        stats socket /tmp/sock1 mode 666 level admin
        stats timeout 1h
        tune.ssl.default-dh-param 1024
        tune.bufsize 16384

    defaults
        mode http
        timeout connect 4s
        timeout client 10s
        timeout server 20s

    listen px1
        bind :4443 ssl crt rsa+dh2048.pem npn h2 alpn h2
        server s1 127.0.0.1:4445

    listen px2
        bind :4444 ssl crt rsa+dh2048.pem npn h2 alpn h2
        bind :4445
        stats uri /
Then use curl to fetch the stats through px1 :
    curl --http2 -k "https://127.0.0.1:4443/"
When curl is sent to the first one, "show sess" issued to the CLI will
show a remaining session during the client timeout. When curl is aimed at
port 4444 (px2), there is no such remaining session.
This fix needs to be backported to 1.8.
The streams bookkeeping made in H2 is used for protocol compliance only
but it doesn't consider the number of conn_streams still attached to the
mux. It causes an issue when http-request set-nice rules are applied on
H2 requests processed on a saturated machine. Indeed, in this case, the
requests are accepted and assigned a default nice value of zero. When
they are processed, their nice value changes to a higher one (say 1024).
The response is sent through the H2 mux, which detects the end of stream
and decrements the protocol-level stream count (h2c->nb_streams). The
client may then send a new request. But the conn_stream is still attached
and will require a new call to process_stream() to finish, which is made
through the scheduler. Given that the machine is saturated, it is assumed
that many tasks are present in the scheduler. Thus the closing tasks holding
a higher nice value will pass after the new stream creations. If the client
is fast enough with a low latency link, it may add a lot of new stream
creations before the stream terminations have a chance to disappear due
to their high nice value, resulting in a huge amount of memory being used.
The solution consists in letting a mux always monitor its conn_streams and
refrain from creating new ones when it is full. Here the H2 mux checks the
nb_cs counter and sets a new blocked flag (H2_CF_DEM_TOOMANY) if the limit
was reached, so that the frame parser requests a pause in the new stream
creation, leaving some time for the pending conn_streams to vanish.
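In the demux path this may be sketched as follows (illustrative; the exact
condition in the patch may differ):

    /* refrain from creating new streams while too many conn_streams
     * are still attached */
    if (h2c->nb_cs >= h2_settings_max_concurrent_streams) {
        h2c->flags |= H2_CF_DEM_TOOMANY;
        return 0;    /* pause stream creation until some of them vanish */
    }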
Several experiments were made using varying thresholds to see if
overbooking would provide any benefit here but it turned out not to be
the case, so the conn_stream limit remains set to the exact streams
limit. Interestingly various performance measurements showed that the
code tends to be slightly faster now than without the limit, probably
due to the smoother memory usage.
This commit requires previous patch ("MINOR: h2: keep a count of the number
of conn_streams attached to the mux"). It needs to be backported to 1.8.
The h2 mux only knows about the number of H2 streams which are not in a
CLOSED state. This is used for protocol compliance. But it doesn't hold
the number of really attached streams. It is a problem because depending
on scheduling, it is possible that more streams are attached to the mux
than the ones seen at the protocol level, due to some streams taking some
time to be detached. Let's add this count based on the conn_streams.
Note: this patch is part of a series of fixes which will have to be
backported to 1.8.
Commit 200b0fa ("MEDIUM: Add support for updating TLS ticket keys via
socket") introduced support for updating TLS ticket keys from the CLI,
but missed a small corner case : if multiple bind lines reference the
same tls_keys file, the same reference is used (as expected), but during
the clean shutdown, it will lead to a double free when destroying the
bind_conf contexts since none of the lines knows if others still use
it. The impact is very low however, mostly a core and/or a message in
the system's log upon old process termination.
Let's introduce some basic refcounting to prevent this from happening,
so that only the last bind_conf frees it.
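A sketch of the idea (the struct and field names are assumptions):

    struct tls_keys_ref {
        /* ... */
        unsigned int refcount;   /* number of bind_conf still using it */
    };

    /* at bind_conf destruction time, only the last user frees it */
    if (keys_ref && !--keys_ref->refcount) {
        free(keys_ref->tlskeys);
        free(keys_ref->filename);
        free(keys_ref);
    }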
Thanks to Janusz Dziemidowicz and Thierry Fournier for both reporting
the same issue with an easy reproducer.
This fix needs to be backported to all versions from 1.6 to 1.8.
With certain curl versions URLs which contain brackets may be interpreted
by the "URL globbing parser". This patch ensures that such brackets
are escaped.
Thank you to Ilya Shipitsin for having reported this issue.
By default, HAProxy's DNS resolution at runtime ensures that there is no
IP address duplication in a backend (for servers being resolved by the
same hostname).
There are a few cases where people want, on purpose, to disable this
feature.
This patch introduces a couple of new server side options for this purpose:
"resolve-opts allow-dup-ip" or "resolve-opts prevent-dup-ip".
dns_get_ip_from_response() is used to compare the caller's current IP to
the IPs available in the records returned by the DNS server.
A scoring system is in place to get the best IP address available.
That said, in the current implementation, there are a couple of issues:
1. a comment does not match what the code does
2. the code does not match what the comment says (the score value is
not incremented by '2')
This patch fixes both issues.
Backport status: 1.8
The comment was about the "prefered network ip version" while it's
actually the "prefered ip version" that the code deals with. This is
now fixed.
Backport status: 1.7 and 1.8
Be careful, this patch may not apply on 1.7, since the score was '4'
for this item at that time.
Ilya Shipitsin reported that with some curl versions this reg test
may fail due to a wrong URI syntax with the ::1 ipv6 local address in
this varnishtest script. This patch fixes this syntax issue and
replaces the iteration of "process" commands by a "shell" command
to start the curl processes (which should also be faster).
Thanks to Ilya Shipitsin for having reported this VTC file bug.
The master process will exit with the status of the last worker. When
the worker is killed with SIGTERM, it is expected to get 143 as an
exit status. Therefore, we consider this exit status as normal from a
systemd point of view. If it happens when not stopping, the systemd
unit is configured to always restart, so it has no adverse effect.
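This boils down to declaring the status as successful in the unit file,
along these lines (sketch of the relevant directive):

    [Service]
    # SIGTERM leads to exit status 143 (128+15): treat it as a clean exit
    SuccessExitStatus=143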
This has mostly a cosmetic effect. Without the patch, stopping HAProxy
leads to the following status:
    ● haproxy.service - HAProxy Load Balancer
       Loaded: loaded (/lib/systemd/system/haproxy.service; disabled; vendor preset: enabled)
       Active: failed (Result: exit-code) since Fri 2018-06-22 20:35:42 CEST; 8min ago
         Docs: man:haproxy(1)
               file:/usr/share/doc/haproxy/configuration.txt.gz
      Process: 32715 ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE $EXTRAOPTS (code=exited, status=143)
      Process: 32714 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS (code=exited, status=0/SUCCESS)
     Main PID: 32715 (code=exited, status=143)
After the patch:
    ● haproxy.service - HAProxy Load Balancer
       Loaded: loaded (/lib/systemd/system/haproxy.service; disabled; vendor preset: enabled)
       Active: inactive (dead)
         Docs: man:haproxy(1)
               file:/usr/share/doc/haproxy/configuration.txt.gz
Change the way the process groups are set. Indeed setsid() was called
for every process, which caused the worker to have a different process
group than the master.
This patch behaves in a better way:
- In daemon mode only, each child does a setsid()
- In master-worker + daemon mode, the setsid() is done in the master before
forking the children
- In any foreground mode, we don't do a setsid()
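A rough sketch of the intended logic (the identifiers are illustrative,
not HAProxy's actual ones):

    #include <stdbool.h>
    #include <unistd.h>

    static void fork_children(bool daemon_mode, bool master_worker, int nbchild)
    {
        if (daemon_mode && master_worker)
            setsid();            /* master leads the group before forking */

        for (int i = 0; i < nbchild; i++) {
            if (fork() == 0) {
                if (daemon_mode && !master_worker)
                    setsid();    /* plain daemon: each child on its own */
                return;          /* the child continues from here */
            }
        }
        /* any foreground mode: no setsid() at all */
    }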
Could be backported to 1.8, but the master-worker mode is mostly used
with systemd, which relies on cgroups, so this won't affect many people.
The Lua parser doesn't take into account an end-of-headers containing
only '\n'; it always expects '\r\n'. If a bare '\n' is processed, the
Lua parser considers that 1 byte is missing and waits indefinitely for
new data.
When the client reaches its timeout, it closes the connection. This
close is not detected and the connection remains in CLOSE-WAIT state.
I guess that this patch fixes only a visible part of the problem:
if the Lua HTTP parser waits for data, the server timeout or the
connection being closed by the client may stop the applet.
How to reproduce the problem:
HAProxy conf:

    global
        lua-load bug38.lua

    frontend frt
        timeout client 2s
        timeout server 2s
        mode http
        bind *:8080
        http-request use-service lua.donothing

Lua conf:

    core.register_service("donothing", "http", function(applet) end)

Client request:

    echo -ne 'GET / HTTP/1.1\n\n' | nc 127.0.0.1 8080
Look for CLOSE-WAIT connections with "netstat" or "ss". I use this
script:

    while sleep 1; do ss | grep CLOSE-WAIT; done
This patch must be backported in 1.6, 1.7 and 1.8
Workaround: enable the "hard-stop-after" directive, and perform
periodic reload.
stktable_release() has been involved in two recent crashes by being
used without enough care. Just like any free() function, this one is
often called on an exit path with a possibly unsafe argument. Given
that there is another case (smp_fetch_sc_trackers()) which could
theoretically call it with an unchecked NULL (though this cannot
happen since the function doesn't support being called with src_* and
hence cannot make use of tmpstkctr), let's rather move the check into
the function itself to make it safer for the long term.
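The pattern is the same as free(): tolerate a NULL argument so that exit
paths may call it unconditionally (sketch; the exact signature is assumed):

    void stktable_release(struct stktable *t, struct stksess *ts)
    {
        if (!ts)
            return;    /* nothing tracked, nothing to release */
        /* ... existing refcount decrement and expiry handling ... */
    }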
This patch could be backported to 1.8 as a strengthening measure.
The build without threads was once again broken.
This issue was introduced in commit ba86c6c ("MINOR: threads: Be sure to
remove threads from all_threads_mask on exit").
This is exactly the same problem as last time it happened, because of
all_threads_mask not being defined with USE_THREAD=
This must be backported in 1.8
When a lookup is done on a key not present in the stick-table, the "st"
pointer is NULL and it is used to return the converter result, but it
is passed untested to stktable_release().
This regression was introduced in 1.8.10 here:

    BUG/MEDIUM: stick-tables: Decrement ref_cnt in table_* converters
    commit d7bd88009d88dd413e01bc0baa90d6662a3d7718
    Author: Daniel Corbett <dcorbett@haproxy.com>
    Date:   Sun May 27 09:47:12 2018 -0400
Minimal conf for reproducing the problem:

    frontend test
        mode http
        stick-table type ip size 1m expire 1h store gpc0
        bind *:8080
        http-request redirect location /a if { src,in_table(test) }
The segfault is triggered using:

    curl -i http://127.0.0.1:8080/
This patch must be backported in 1.8
With this patch we can provide the LEVEL environment variable when
running the reg-tests Makefile target, to set the execution level of
the reg tests to run.
LEVEL default value is 1.
LEVEL=1 is to run all h*.vtc files which are the most important
reg testing files (to test haproxy core, HTTP compliance etc).
LEVEL=2 is to run all s*.vtc files, which are somewhat slower tests,
for instance tests requiring external programs (curl, socat etc).
LEVEL=3 is to run all l*.vtc files, which are test files that are
slower again or of little interest.
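For instance, the following command runs the s*.vtc files:

    $ LEVEL=2 make reg-tests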
With this patch, we set the HAPROXY_PROGRAM environment variable
default value to the haproxy executable of the current working directory.
So, if the current directory is the haproxy source directory,
the reg-tests Makefile target may be run with this shorter command:

    $ VARNISHTEST_PROGRAM=<...> make reg-tests

in place of

    $ VARNISHTEST_PROGRAM=<...> HAPROXY_PROGRAM=<...> make reg-tests
When HAProxy is started with several threads, each running thread holds
a bit in the bitfield all_threads_mask. This bitfield is used here and
there to check which threads are registered to take part in a specific
processing. So when a thread exits, it seems normal to remove it from
all_threads_mask.
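The exit path then simply clears the thread's bit, along these lines
(sketch; details may differ):

    /* this thread no longer takes part in any processing */
    HA_ATOMIC_AND(&all_threads_mask, ~tid_bit);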
No direct impact could be identified with this right now but it would
be better to backport it to 1.8 as a preventive measure to avoid complex
situations like the one in the previous bug.