Commit Graph

9783 Commits

Author SHA1 Message Date
Christopher Faulet
85db3212b8 MINOR: spoe: Use the sample context to pass frag_ctx info during encoding
This simplifies the API and hide the details in the sample. This way, only
string and binary are aware of these info, because other types cannot be
partially encoded.

This patch may be backported to 1.9 and 1.8.
2019-04-29 16:02:05 +02:00
Kevin Zhu
f7f54280c8 BUG/MEDIUM: spoe: arg len encoded in previous frag frame but len changed
Fragmented arg will do fetch at every encode time, each fetch may get
different result if SMP_F_MAY_CHANGE, for example res.payload, but
the length already encoded in first fragment of the frame, that will
cause SPOA decode failed and waste resources.

This patch must be backported to 1.9 and 1.8.
2019-04-29 16:02:05 +02:00
Christopher Faulet
1907ccc2f7 BUG/MINOR: http: Call stream_inc_be_http_req_ctr() only one time per request
The function stream_inc_be_http_req_ctr() is called at the beginning of the
analysers AN_REQ_HTTP_PROCESS_FE/BE. It as an effect only on the backend. But we
must be careful to call it only once. If the processing of HTTP rules is
interrupted in the middle, when the analyser is resumed, we must not call it
again. Otherwise, the tracked counters of the backend are incremented several
times.

This bug was reported in github. See issue #74.

This fix should be backported as far as 1.6.
2019-04-29 16:01:47 +02:00
Willy Tarreau
97215ca284 BUG/MEDIUM: mux-h2: properly deal with too large headers frames
In h2c_decode_headers(), now that we support CONTINUATION frames, we
try to defragment all pending frames at once before processing them.
However if the first is exactly full and the second cannot be parsed,
we don't detect the problem and we wait for the next part forever due
to an incorrect check on exit; we must abort the processing as soon as
the current frame remains full after defragmentation as in this case
there is no way to make forward progress.

Thanks to Yves Lafon for providing traces exhibiting the problem.

This must be backported to 1.9.
2019-04-29 10:20:21 +02:00
David CARLIER
4de0eba848 MEDIUM: da: HTX mode support.
The DeviceAtlas module now can support both the legacy
mode and the new HTX's with the known set of support headers
for the latter.
2019-04-26 17:06:32 +02:00
David Carlier
0470d704a7 BUILD/MEDIUM: contrib: Dummy DeviceAtlas API.
Creating a "mocked" version mainly for testing purposes.
2019-04-26 17:06:32 +02:00
Willy Tarreau
4ad574fbe2 MEDIUM: streams: measure processing time and abort when detecting bugs
On some occasions we've had loops happening when processing actions
(e.g. a yield not being well understood) resulting in analysers being
called in loops until the analysis timeout without incrementing the
stream's call count, thus this type of bug cannot be caught by the
current protection system.

What this patch proposes is to start to measure the time spent in analysers
when profiling is enabled on the thread, in order to detect if a stream is
really misbehaving. In this case we measured the consumed CPU time, not the
wall clock time, so as not to be affected by possible noisy neighbours
sharing the same CPU. When more than 100ms are spent in an analyser, we
trigger the stream_dump_and_crash() function to report the anomaly.

The choice of 100ms comes from the fact that regular calls only take around
1 microsecond and it seems reasonable to accept a degradation factor of
100000, which covers very slow machines such as home gateways running on
sub-ghz processors, with extremely heavy configurations. Some complete
tests show that even this common bogus map_regm() entry supposedly designed
to extract a port from an IP:port entry does not trigger the timeout (25 ms
evaluation time for a 4kB header, exercise left to the reader to spot the
mistake) :

   ([0-9]{0,3}).([0-9]{0,3}).([0-9]{0,3}).([0-9]{0,3}):([0-9]{0,5}) \5

However this one purposely designed to kill haproxy definitely dies as it
manages to completely freeze the whole process for more than one second
on a 4 GHz CPU for only 120 bytes in :

   (.{0,20})(.{0,20})(.{0,20})(.{0,20})(.{0,20})b \1

This protection will definitely help during the code stabilization period
and may possibly be left enabled later depending on reported issues or not.

If you've noticed that your workload is affected by this patch, please
report it as you have very likely found a bug. And in the mean time you
can turn profiling off to disable it.
2019-04-26 14:30:59 +02:00
Willy Tarreau
3d07a16f14 MEDIUM: stream/debug: force a crash if a stream spins over itself forever
If a stream is caught spinning over itself at more than 100000 loops per
second and for more than one second, the process will be aborted and the
offender reported on the console and logs. Typical figures usually are just
a few tens to hundreds per second over a very short time so there is a huge
margin here. Using even higher values could also work but there is the risk
of not being able to catch offenders if multiple ones start to bug at the
same time and share the load. This code should ideally be disabled for
stable releases, though in theory nothing should ever trigger it.
2019-04-26 13:16:14 +02:00
Willy Tarreau
dcb0e1d37d MEDIUM: appctx/debug: force a crash if an appctx spins over itself forever
If an appctx is caught spinning over itself at more than 100000 loops per
second and for more than one second, the process will be aborted and the
offender reported on the console and logs. Typical figures usually are just
a few tens to hundreds per second over a very short time so there is a huge
margin here. Using even higher values could also work but there is the risk
of not being able to catch offenders if multiple ones start to bug at the
same time and share the load. This code should ideally be disabled for
stable releases, though in theory nothing should ever trigger it.
2019-04-26 13:15:56 +02:00
Willy Tarreau
71c07ac65a MINOR: stream/debug: make a stream dump and crash function
During 1.9 development (and even a bit after) we've started to face a
significant number of situations where streams were abusively spinning
due to an uncaught error flag or complex conditions that couldn't be
correctly identified. Sometimes streams wake appctx up and conversely
as well. More importantly when this happens the only fix is to restart.

This patch adds a new function to report a serious error, some relevant
info and to crash the process using abort() so that a core dump is
available. The purpose will be for this function to be called in various
situations where the process is unfixable. It will help detect these
issues much earlier during development and may even help fixing test
platforms which are able to automatically restart when such a condition
happens, though this is not the primary purpose.

This patch only provides the function and doesn't use it yet.
2019-04-26 13:15:56 +02:00
Willy Tarreau
5e6a5b3a6e MINOR: connection: make the debugging helper functions safer
We have various functions like conn_get_ctrl_name() to retrieve
some information reported in "show sess" for debugging, which
assume that the connection is valid. This is really not convenient
in code aimed at debugging and is error-prone. Let's add a validity
test first.
2019-04-25 18:35:49 +02:00
Willy Tarreau
5e370daa52 BUG/MINOR: proto_http: properly reset the stream's call rate on keep-alive
The stream's call rate measurement was added by commit 2e9c1d296 ("MINOR:
stream: measure and report a stream's call rate in "show sess"") but it
forgot to reset it in case of HTTP keep-alive (legacy mode), resulting
in incorrect measurements.

No backport is needed, unless the patch above is backported.
2019-04-25 18:33:37 +02:00
Willy Tarreau
d5ec4bfe85 CLEANUP: standard: use proper const to addr_to_str() and port_to_str()
The input parameter was not marked const, making it painful for some calls.
2019-04-25 17:48:16 +02:00
Willy Tarreau
d2d3348acb MINOR: activity: enable automatic profiling turn on/off
Instead of having to manually turn task profiling on/off in the
configuration, by default it will work in "auto" mode, which
automatically turns on on any thread experiencing sustained loop
latencies over one millisecond averaged over the last 1024 samples.

This may happen with configs using lots of regex (thing map_reg for
example, which is the lazy way to convert Apache's rewrite rules but
must not be abused), and such high latencies affect all the process
and the problem is most often intermittent (e.g. hitting a map which
is only used for certain host names).

Thus now by default, with profiling set to "auto", it remains off all
the time until something bad happens. This also helps better focus on
the issues when looking at the logs as well as in "show sess" output.
It automatically turns off when the average loop latency over the last
1024 calls goes below 990 microseconds (which typically takes a while
when in idle).

This patch could be backported to stable versions after a bit more
exposure, as it definitely improves observability and the ability to
quickly spot the culprit. In this case, previous patch ("MINOR:
activity: make the profiling status per thread and not global") must
also be taken.
2019-04-25 17:26:46 +02:00
Willy Tarreau
d9add3acc8 MINOR: activity: make the profiling status per thread and not global
In order to later support automatic profiling turn on/off, we need to
have it per-thread. We're keeping the global option to know whether to
turn it or on off, but the profiling status is now set per thread. We're
updating the status in activity_count_runtime() which is called before
entering poll(). The reason is that we'll extend this with run time
measurement when deciding to automatically turn it on or off.
2019-04-25 17:26:19 +02:00
Willy Tarreau
d636675137 BUG/MINOR: activity: always initialize the profiling variable
It happens it was only set if present in the configuration. It's
harmless anyway but can still cause doubts when comparing logs and
configurations so better correctly initialize it.

This should be backported to 1.9.
2019-04-25 17:26:19 +02:00
Willy Tarreau
a0abc8f2be BUILD: travis: remove the "allow_failures" entry
Now that OSX passes all regtests as well, remove this temporary entry.
2019-04-25 08:58:02 +02:00
Willy Tarreau
084354f0be REGTEST: exclude OSX and generic targets from abns_socket.vtc
This one relies on Linux's abstract namespace sockets which are not
available there. FreeBSD used to already be excluded.
2019-04-25 08:50:25 +02:00
Willy Tarreau
4fd376d51d REGTEST: relax the IPv6 address format checks in converters_ipmask_concat_strcmp_field_word
In Travis build https://travis-ci.com/haproxy/haproxy/jobs/195477767 we
can see that OSX tends to pad zeroes at a different position than Linux
in compact IPv6 addresses, resulting in a failure in the checks which
were developped on Linux. This patch uses [0:]* in holes and [0:]+ at the
end of addresses to allow the different variants. It will unfortunately
also accept impossible addresses but there is no reason that we have to
care about for such crap to be emitted.
2019-04-25 08:47:15 +02:00
Willy Tarreau
03c6ab0cbb REGTEST: exclude osx and generic targets for 40be_2srv_odd_health_checks
As explained in the commit below, this test relies on TCP_DEFER_ACCEPT
which is not available everywhere, and as such fails on OSX as well :
15685c791 ("REGTEST: Exclude freebsd target for some reg tests.")
2019-04-25 08:39:48 +02:00
Tim Duesterhus
88c63a6e55 BUILD: extend Travis CI config to support more platforms
This commit extends the Travis CI configuration to build HAProxy
with gcc on Linux, clang on Mac and cleans up the build flag
configuration to be easier extendable.

Note: At the moment HAProxy fails on Travis for configurations
on OS X
2019-04-25 08:24:29 +02:00
Willy Tarreau
22d63a24d9 MINOR: applet: measure and report an appctx's call rate in "show sess"
Very similarly to previous commit doing the same for streams, we now
measure and report an appctx's call rate. This will help catch applets
which do not consume all their data and/or which do not properly report
that they're waiting for something else. Some of them like peers might
theorically be able to exhibit some occasional peeks when teaching a
full table to a nearby peer (e.g. the new replacement process), but
nothing close to what a bogus service can do so there is no risk of
confusion.
2019-04-24 16:04:23 +02:00
Willy Tarreau
2e9c1d2960 MINOR: stream: measure and report a stream's call rate in "show sess"
Quite a few times some bugs have made a stream task incorrectly
handle a complex combination of events, which was often reported as
"100% CPU", and was usually caused by the event not being properly
identified and flushed, and the stream's handler called in loops.

This patch adds a call rate counter to the stream struct. It's not
huge, it's really inexpensive (especially compared to the rest of the
processing function) and will easily help spot such tasks in "show sess"
output, possibly even allowing to kill them.

A future patch should probably consist in alerting when they're above a
certain threshold, possibly sending a dump and killing them. Some options
could also consist in aborting in order to get an analyzable core dump
and let a service manager restart a fresh new process.
2019-04-24 16:04:23 +02:00
Willy Tarreau
0212fadd65 MINOR: tasks/activity: report the context switch and task wakeup rates
It's particularly useful to spot runaway tasks to see this. The context
switch rate covers all tasklet calls (tasks and I/O handlers) while the
task wakeups only covers tasks picked from the run queue to be executed.
High values there will indicate either an intense traffic or a bug that
mades a task go wild.
2019-04-24 16:04:23 +02:00
Willy Tarreau
69b5a7f1a3 CLEANUP: task: report calls as unsigned in show sess
The "show sess" output used signed ints to report the number of calls,
which is confusing for runaway tasks where the call count can turn
negative.
2019-04-24 16:04:23 +02:00
Christopher Faulet
4904058661 BUG/MINOR: htx: Exclude TCP proxies when the HTX mode is handled during startup
When tests are performed on the HTX mode during HAProxy startup, only HTTP
proxies are considered. It is important because, since the commit 1d2b586cd
("MAJOR: htx: Enable the HTX mode by default for all proxies"), the HTX is
enabled on all proxies by default. But for TCP proxies, it is "deactivated".

This patch must be backported to 1.9.
2019-04-24 15:40:02 +02:00
Christopher Faulet
c1918d1a8f BUG/MAJOR: muxes: Use the HTX mode to find the best mux for HTTP proxies only
Since the commit 1d2b586cd ("MAJOR: htx: Enable the HTX mode by default for all
proxies"), the HTX is enabled by default for all proxies, HTTP and TCP, but also
CLI and HEALTH proxies. But when the best mux is retrieved, only HTTP and TCP
modes are checked. If the TCP mode is not explicitly set, it is considered as an
HTTP proxy. It is an hidden bug introduced when the option "http-use-htx" was
added. It has no effect until the commit 1d2b586cd. But now, when a stats socket
is created for the master process, the mux h1 is installed on all incoming
connections to the CLI proxy, leading to segfaults because HTX operations are
performed on raw buffers.

So to fix the buf, when a mux is installed, all proxies are considered as TCP
proxies, except HTTP ones. This way, CLI and HEALTH proxies will be handled as
TCP proxies.

This patch must be backported to 1.9 although it has no effect. It is safer to
not keep hidden bugs.
2019-04-24 15:40:02 +02:00
Willy Tarreau
274ba67862 BUG/MAJOR: lb/threads: fix AB/BA locking issue in round-robin LB
An occasional divide by zero in the round-robin scheduler was addressed
in commit 9df86f997 ("BUG/MAJOR: lb/threads: fix insufficient locking on
round-robin LB") by grabing the server's lock in fwrr_get_server_from_group().

But it happens that this is not the correct approach as it introduces a
case of AB/BA deadlock reported by Maksim Kupriianov. This happens when
a server weight changes from/to zero while another thread extracts this
server from the tree. The reason is that the functions used to manipulate
the state work under the server's lock and grab the LB lock while the ones
used in LB work under the LB lock and grab the server's lock when needed.

This commit mostly reverts the changes above and instead further completes
the locking analysis performed on this code to identify areas that really
need to be protected by the server's lock, since this is the only algorithm
which happens to have this requirement. This audit showed that in fact all
locations which require the server's lock are already protected by the LB
lock. This was not noticed the first time due to the server's lock being
taken instead and due to some functions misleadingly using atomic ops to
modify server fields which are under the LB lock protection (these ones
were now removed).

The change consists in not taking the server's lock anymore here, and
instead making sure that the aforementioned function which used to
suffer from the server's weight becoming zero only uses a copy of the
weight which was preliminary verified to be non-null (when the weight
is null, the server will be removed from the tree anyway so there is
no need to recalculate its position).

With this change, the code survived an injection at 200k req/s split
on two servers with weights changing 50 times a second.

This commit must be backported to 1.9 only.
2019-04-24 14:23:40 +02:00
Olivier Houchard
a28454ee21 BUG/MEDIUM: ssl: Return -1 on recv/send if we got EAGAIN.
In ha_ssl_read()/ha_ssl_write(), if we couldn't send/receive data because
we got EAGAIN, return -1 and not 0, as older SSL versions expect that.
This should fix the problems with OpenSSL < 1.1.0.
2019-04-24 12:06:08 +02:00
Christopher Faulet
371723b0c2 BUG/MINOR: spoe: Don't systematically wakeup SPOE stream in the applet handler
This can lead to wakeups in loop between the SPOE stream and the SPOE applets
waiting to receive agent messages (mainly AGENT-HELLO and AGENT-DISCONNECT).

This patch must be backported to 1.9 and 1.8.
2019-04-23 21:20:47 +02:00
Christopher Faulet
5e1a9d715e BUG/MEDIUM: stream: Fix the way early aborts on the client side are handled
A regression was introduced with the commit c9aecc8ff ("BUG/MEDIUM: stream:
Don't request a server connection if a shutw was scheduled"). Among other this,
it breaks the CLI when the shutr on the client side is handled with the client
data. To depend on the flag CF_SHUTW_NOW to not establish the server connection
when an error on the client side is detected is the right way to fix the bug,
because this flag may be set without any error on the client side.

So instead, we abort the request where the error is handled and only when the
backend stream-interface is in the state SI_ST_INI. This way, there is no
ambiguity on the reason why the abort accurred. The stream-interface is also
switched to the state SI_ST_CLO.

This patch must be backported to 1.9. If the commit c9aecc8ff is backported to
previous versions, this one MUST also be backported. Otherwise, it MAY be
backported to older versions that 1.9 with caution.
2019-04-23 21:20:47 +02:00
Frédéric Lécaille
bed883abe8 BUG/MAJOR: stream: Missing DNS context initializations.
Fix some missing initializations wich came with 333939c commit (MINOR: action:
new '(http-request|tcp-request content) do-resolve' action). The DNS contexts of
streams which were allocated were not initialized by stream_new(). This leaded to
accesses to non-allocated memory when freeing these contexts with stream_free().
2019-04-23 20:24:11 +02:00
Willy Tarreau
ca8df4c074 REGTEST: make the "run-regtests" script search for tests in reg-tests by default
It happens almost daily to me that make regtests fails because the script
found a temporary, old, or broken VTC file that was lying in my work dir,
leaving me no place to hide it. This is a real pain as some tests take ages
to fail, so let's make this script only look up for tests where they are
expected to be stored, under reg-tests only. It remains possible to force
the location on the command line though.
2019-04-23 16:09:50 +02:00
Frédéric Lécaille
b894f9230c REGTEST: adapt some reg tests after renaming.
Some reg tests and their dependencies have been renamed. They may be
referenced by the .vtc files. So, this patch modifies also the references
to these dependencies.
2019-04-23 15:37:11 +02:00
Frédéric Lécaille
d7a8f14145 REGTEST: rename the reg test files.
We rename all the VTC files to avoid name collisions when importing/backporting.
2019-04-23 15:37:03 +02:00
Frédéric Lécaille
dc1a3bd999 REGTEST: replace LEVEL option by a more human readable one.
This patch replaces LEVEL variable by REGTESTS_TYPES variable which is more
mnemonic and human readable. It is uses as a filter to run the reg tests scripts
where a commented #REGTEST_TYPE may be defined to designate their types.
Running the following command:

    $ REGTESTS_TYPES=slow,default

will start all the reg tests where REGTEST_TYPE is defines as 'slow' or 'default'.
Note that 'default' is also the default value of REGTEST_TYPE when not specified
dedicated to run all the current h*.vtc files. When REGTESTS_TYPES is not specified
there is no filter at all. All the tests are run.

This patches also defines REGTEST_TYPE with 'slow' value for all the s*.vtc files,
'bug' value for al the b*.vtc files, 'broken' value for all the k*.vtc files.
2019-04-23 15:14:52 +02:00
Frédéric Lécaille
0bad840b4d MINOR: log: Extract some code to send syslog messages.
This patch extracts the code of __send_log() responsible of sending a syslog
message to a syslog destination represented as a logsrv struct to define
__do_send_log() function. __send_log() calls __do_send_log() for each syslog
destination of a proxy after having prepared some of its parameters.
2019-04-23 14:16:51 +02:00
Baptiste Assmann
333939c2ee MINOR: action: new '(http-request|tcp-request content) do-resolve' action
The 'do-resolve' action is an http-request or tcp-request content action
which allows to run DNS resolution at run time in HAProxy.
The name to be resolved can be picked up in the request sent by the
client and the result of the resolution is stored in a variable.
The time the resolution is being performed, the request is on pause.
If the resolution can't provide a suitable result, then the variable
will be empty. It's up to the admin to take decisions based on this
statement (return 503 to prevent loops).

Read carefully the documentation concerning this feature, to ensure your
setup is secure and safe to be used in production.

This patch creates a global counter to track various errors reported by
the action 'do-resolve'.
2019-04-23 11:41:52 +02:00
Baptiste Assmann
0b9ce82dfa MINOR: obj_type: new object type for struct stream
This patch creates a new obj_type for the struct stream in HAProxy.
2019-04-23 11:35:56 +02:00
Baptiste Assmann
db4c8521ca MINOR: dns: move callback affection in dns_link_resolution()
In dns.c, dns_link_resolution(), each type of dns requester is managed
separately, that said, the callback function is affected globaly (and
points to server type callbacks only).
This design prevents the addition of new dns requester type and this
patch aims at fixing this limitation: now, the callback setting is done
directly into the portion of code dedicated to each requester type.
2019-04-23 11:34:11 +02:00
Baptiste Assmann
dfd35fd71a MINOR: dns: dns_requester structures are now in a memory pool
dns_requester structure can be allocated at run time when servers get
associated to DNS resolution (this happens when SRV records are used in
conjunction with service discovery).
Well, this memory allocation is safer if managed in an HAProxy pool,
furthermore with upcoming HTTP action which can perform DNS resolution
at runtime.

This patch moves the memory management of the dns_requester structure
into its own pool.
2019-04-23 11:33:48 +02:00
paulborile
cd9b9bd3e4 MINOR: contrib: dummy wurfl library
This is dummy version of the Scientiamobile WURFL C API that can be used
to successfully build/run haproxy compiled with USE_WURFL=1.
It is marked as version 1.11.2.100 to distinguish it from any real version
of the lib. It has no external dependencies so it should work out of the
box by building it like this :

   $ make -C contrib/wurfl

In order to use it, simply reference this directory as the WURFL include
and library paths :

   $ make TARGET=<target> USE_WURFL=1 WURFL_INC=$PWD/contrib/wurfl WURFL_LIB=$PWD/contrib/wurfl
2019-04-23 11:00:23 +02:00
paulborile
7714b12604 MINOR: wurfl: enabled multithreading mode
Initially excluded multithreaded mode is completely supported (libwurfl is fully MT safe).
Internal tests now are run also with multithreading enabled.
2019-04-23 11:00:23 +02:00
paulborile
c81b9bf7b4 DOC: wurfl: added point of contact in MAINTAINERS file 2019-04-23 11:00:23 +02:00
paulborile
bad132c384 CLEANUP: wurfl: removed deprecated methods
last 2 major releases of libwurfl included a complete review of engine options with
the result of deprecating many features. The patch removes unecessary code and fixes
the documentation.
Can be backported on any version of haproxy.

[wt: must not be backported since it removes config keywords and would
 thus break existing configurations]
Signed-off-by: Willy Tarreau <w@1wt.eu>
2019-04-23 11:00:23 +02:00
paulborile
59d50145dc BUILD: wurfl: build fix for 1.9/2.0 code base
This applies the required changes for the new buffer API that came in 1.9.
This patch must be backported to 1.9.
2019-04-23 11:00:23 +02:00
Willy Tarreau
b518823f1b MINOR: wurfl: indicate in haproxy -vv the wurfl version in use
It also explicitly mentions that the library is the dummy one when it
is detected.

We have this output now :

$ ./haproxy  -vv |grep -i wurfl
Built with WURFL support (dummy library version 1.11.2.100)
2019-04-23 11:00:23 +02:00
Willy Tarreau
1426198efb BUILD: add USE_WURFL to the list of known build options
Since the removal of WURFL we've improved the build system to report
known build options, let's reference this option there as well.
2019-04-23 11:00:23 +02:00
Willy Tarreau
b3cc9f2887 Revert "CLEANUP: wurfl: remove dead, broken and unmaintained code"
This reverts commit 8e5e1e7bf0.

The following patches will fix this code and may be backported.
2019-04-23 10:34:43 +02:00
Emeric Brun
d0e095c2aa MINOR: ssl/cli: async fd io-handlers printable on show fd
This patch exports the async fd iohandlers and make them printable
doing a 'show fd' on cli.
2019-04-19 17:27:01 +02:00