2991 Commits

Author SHA1 Message Date
Olivier Houchard
53216e7db9 MEDIUM: connections: Don't directly mess with the polling from the upper layers.
Avoid using conn_xprt_want_send/recv, and totally nuke cs_want_send/recv,
from the upper layers. The polling is now directly handled by the connection
layer: it is activated on subscribe(), and deactivated once we got the event
and woke the related task.
2018-10-21 05:58:40 +02:00
Olivier Houchard
1fddc9b7bb BUG/MEDIUM: connections: Remove subscription if going in idle mode.
Make sure we don't have any subscription when the connection is going in
idle mode, otherwise there's a race condition when the connection is
reused: if there are still old subscriptions, new ones won't be done.

No backport is needed.
2018-10-21 05:55:20 +02:00
Olivier Houchard
62975a7740 BUG/MEDIUM: pools: Fix the usage of mmap() with DEBUG_UAF.
When mapping memory with mmap(), we should use a fd of -1, not 0. 0 may
work on linux, but it doesn't work on FreeBSD, and probably other OSes.

It would be nice to backport this to 1.8 to help debugging there.
2018-10-21 05:43:33 +02:00
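The mmap() fix described in 62975a7740 can be illustrated with a minimal sketch. The helper names area_alloc() and area_free() are hypothetical and are not taken from the HAProxy sources; the only point shown is that an anonymous mapping passes -1 as the file descriptor.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map <size> bytes of private anonymous memory.
 * Returns NULL on failure.
 */
static void *area_alloc(size_t size)
{
    void *p;

    assert(size > 0); /* mmap() rejects zero-length mappings */

    /* No file backs the mapping, so the fd argument is -1, not 0. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: release an area obtained from area_alloc(). */
static void area_free(void *p, size_t size)
{
    if (p)
        munmap(p, size);
}
```

Passing 0 happens to be tolerated on Linux, as the message notes, but -1 is the portable choice for anonymous mappings.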
Willy Tarreau
4e7cc3381b BUILD: compiler: rename __unreachable() to my_unreachable()
Olivier reported that on FreeBSD __unreachable is already defined
and causes build warnings. Let's rename it then.
2018-10-20 17:45:48 +02:00
Willy Tarreau
7a6ad88b02 BUILD: memory: fix free_list pointer declaration again for atomic CAS
Commit ac6c880 ("BUILD: memory: fix pointer declaration for atomic CAS")
attempted to fix a build warning affecting the lock-free version of the
pool allocator. But the fix tried to hide the cause instead of addressing
it, thus clang still complains about (void **) not matching (void ***).

The real solution is to declare free_list (void **) and not to use a cast.
Now this builds fine with gcc/clang with and without threads.

No backport is needed.
2018-10-20 17:37:38 +02:00
Willy Tarreau
ed72d82827 MEDIUM: time: measure the time stolen by other threads
The purpose is to detect if threads or processes are competing for the
same CPU. This can happen when threads are incorrectly bound, or after a
reload if the previous process still has an important activity. With
threads this situation is problematic because a preempted thread holding
a lock will block other ones waiting for this lock to be released.

A first attempt consisted in measuring the cumulated lost time more
precisely but the system's scheduler is smart enough to try to limit the
thread preemption rate by mostly context switching during poll()'s blank
periods, so most of the time lost is not seen. In essence this is good
because it means a thread is not preempted with a lock held, and even
regarding the rendez-vous point it cannot prevent the other ones from
making progress. But still it happens tens to hundreds of times per
second that a thread might be preempted, so it's still possible to detect
that the situation is happening, thus it's interesting to measure and
report its frequency.

Each time we enter the poller, we check the CPU time spent working and
see if we've lost time doing something else. To limit false positives,
we're only interested in losses of 500 microseconds or more (i.e. half
a clock tick on a 1 kHz system). If so, it indicates that some time was
stolen by another thread or process. Note that we purposely store some
sub-millisecond counters so that under heavy traffic with a 1 kHz clock,
it's still possible to measure something without being subject to the
risk of rounding errors (i.e. if exactly 1 ms is stolen it's possible
that the time difference could often be slightly lower).

This counter of lost CPU time is reported in "show activity"
in numbers of milliseconds of CPU lost per second, per 15s, and total
over the process' life. By definition, the per-second counter cannot
report values larger than 1000 per thread per second and the 15s one
will be limited to 15000/s in the worst case, but it's possible that
peak values exceed such thresholds after long pauses.
2018-10-19 08:51:59 +02:00
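The detection rule described in ed72d82827 can be sketched as a pure function. This is only an illustration under stated assumptions: the name stolen_time_us(), the nanosecond inputs and the STOLEN_MIN_US constant are hypothetical, and the real code may account for time differently. It compares the wall-clock time elapsed over a working period with the CPU time consumed over the same period, and ignores differences below 500 microseconds.

```c
#include <assert.h>
#include <stdint.h>

#define STOLEN_MIN_US 500 /* assumed threshold: half a tick at 1 kHz */

/* Hypothetical helper. All inputs are in nanoseconds and delimit the same
 * working period, measured once on the monotonic clock and once on the
 * per-thread CPU clock. Returns the time presumably stolen by another
 * thread or process, in microseconds, or 0 if below the threshold.
 */
static uint64_t stolen_time_us(uint64_t mono_start, uint64_t mono_end,
                               uint64_t cpu_start, uint64_t cpu_end)
{
    uint64_t wall, cpu, lost_us;

    assert(mono_end >= mono_start && cpu_end >= cpu_start);

    wall = mono_end - mono_start;
    cpu = cpu_end - cpu_start;
    if (wall <= cpu)
        return 0; /* nothing lost, or clock granularity noise */

    lost_us = (wall - cpu) / 1000;
    return lost_us >= STOLEN_MIN_US ? lost_us : 0;
}
```

Working in microseconds rather than milliseconds mirrors the message's remark about keeping sub-millisecond counters to avoid rounding errors.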
Willy Tarreau
5ceeb15002 MINOR: time: add now_mono_time() and now_cpu_time()
These two functions retrieve respectively the monotonic clock time and
the per-thread CPU time when available on the platform, or return zero.
These syscalls may require linking with -lrt on certain libcs, which is
enabled in the Makefile with USE_RT=1 (default on Linux systems).
2018-10-18 16:39:48 +02:00
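A possible shape for the two functions introduced in 5ceeb15002 is sketched below using the POSIX clock_gettime() call. The names mono_time_ns() and thread_cpu_time_ns() and the nanosecond unit are assumptions made for this example, not the actual HAProxy signatures. As in the message, zero is returned when a clock is not available.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() in strict modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read clock <id> and return its value in nanoseconds, or 0 on failure. */
static uint64_t clock_ns(clockid_t id)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical name: monotonic time in ns, or 0 if unsupported. */
static uint64_t mono_time_ns(void)
{
#if defined(CLOCK_MONOTONIC)
    return clock_ns(CLOCK_MONOTONIC);
#else
    return 0;
#endif
}

/* Hypothetical name: CPU time consumed by the calling thread in ns,
 * or 0 if unsupported.
 */
static uint64_t thread_cpu_time_ns(void)
{
#if defined(CLOCK_THREAD_CPUTIME_ID)
    return clock_ns(CLOCK_THREAD_CPUTIME_ID);
#else
    return 0;
#endif
}
```

On older C libraries clock_gettime() lives in librt, which is why the message mentions linking with -lrt.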
Willy Tarreau
ac6c8805be BUILD: memory: fix pointer declaration for atomic CAS
The calls to HA_ATOMIC_CAS() on the lockfree version of the pool allocator
were mistakenly done on (void*) for the old value instead of (void **).
While this has no impact on "recent" gcc, it does have one for gcc < 4.7
since the CAS was open coded and it's not possible to assign a temporary
variable of type "void".

No backport is needed, this only affects 1.9.
2018-10-18 16:12:28 +02:00
Willy Tarreau
7e9c4ae4de MINOR: poller: move time and date computation out of the pollers
By placing this code into time.h (tv_entering_poll() and tv_leaving_poll())
we can remove the logic from the pollers and prepare for extending this to
offer more accurate time measurements.
2018-10-17 19:59:43 +02:00
Willy Tarreau
f37ba94768 MINOR: fd: centralize poll timeout computation in compute_poll_timeout()
The 4 pollers all contain the same code used to compute the poll timeout.
This is pointless, let's centralize this into fd.h. This also gets rid of
the useless SCHEDULER_RESOLUTION macro which used to work around a very old
linux 2.2 bug causing select() to wake up slightly before the timeout.
2018-10-17 19:59:43 +02:00
Willy Tarreau
e18db9e984 MEDIUM: pools: implement a thread-local cache for pool entries
Each thread now keeps the last ~512 kB of freed objects into a local
cache. There are some heuristics involved so that a specific pool cannot
use more than 1/8 of the total cache in number of objects. Tests have
shown that 512 kB is an optimal size on a 24-thread test running on a
dual-socket machine, resulting in an overall 7.5% performance increase
and a cache miss ratio reducing from 19.2 to 17.7%. Anyway it seems
pointless to keep more than an L2 cache, which probably explains why
sizes between 256 and 512 kB are optimal.

Cached objects appear in two lists, one per pool and one LRU to help
with fair eviction. Currently there is no way to check each thread's
cache state nor to flush it. This cache cannot be disabled and is
enabled as soon as the lockless pools are enabled (i.e.: threads are
enabled, no pool debugging is in use and the CPU supports a double word
CAS).
2018-10-16 13:46:08 +02:00
Willy Tarreau
146794dc4f MINOR: pools: split pool_free() in the lockfree variant
This separates the validity tests from the code committing the object
to the pool, in order to ease insertion of the thread-local cache.
2018-10-16 10:29:28 +02:00
Willy Tarreau
0a93b6413f MINOR: pools: allocate most memory pools from an array
For caching it will be convenient to have indexes associated with pools,
without having to dereference the pool itself. One solution could consist
in replacing all pool pointers with integers but this would limit the
number of allocatable pools. Instead here we allocate the 32 first pools
from a pre-allocated array whose base address is known so that it's trivial
to convert a pool to an index in this array. Pools that cannot fit there
will be allocated normally.
2018-10-16 10:29:26 +02:00
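The pool-to-index conversion described in 0a93b6413f can be sketched as follows. The struct layout and the names base_pools, MAX_BASE_POOLS and pool_index() are hypothetical; only the idea comes from the message: pools taken from a fixed array can be turned into an index by pointer arithmetic, while pools allocated elsewhere have no index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BASE_POOLS 32 /* the message mentions the first 32 pools */

/* Hypothetical, heavily reduced pool descriptor. */
struct pool {
    size_t size;
};

/* Pre-allocated array whose base address is known at build time. */
static struct pool base_pools[MAX_BASE_POOLS];

/* Return the index of <p> in base_pools, or -1 if <p> was allocated
 * elsewhere. Addresses are compared as integers so that the range check
 * is also meaningful for pointers outside the array.
 */
static ptrdiff_t pool_index(const struct pool *p)
{
    uintptr_t base = (uintptr_t)base_pools;
    uintptr_t end = (uintptr_t)(base_pools + MAX_BASE_POOLS);
    uintptr_t addr = (uintptr_t)p;

    if (addr < base || addr >= end)
        return -1;
    return p - base_pools;
}
```

The lookup needs no dereference of the pool itself, which is the property the message asks for.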
Bertrand Jacquin
d5e4de8e5f DOC: Fix a few typos
These are mostly spelling mistakes; some of them might be candidates for
backporting as well.
2018-10-15 19:38:15 +02:00
Willy Tarreau
8d8747abe0 OPTIM: tasks: group all tree roots per cache line
Currently we have per-thread arrays of trees and counts, but these
ones unfortunately share cache lines and are accessed very often. This
patch moves the task-specific stuff into a structure taking a multiple
of a cache line, and has one such per thread. Just doing this has
reduced the cache miss ratio from 19.2% to 18.7% and increased the
12-thread test performance by 3%.

It starts to become visible that we really need a process-wide per-thread
storage area that would cover more than just these parts of the tasks.
The code was arranged so that it's easy to move the pieces elsewhere if
needed.
2018-10-15 19:06:13 +02:00
Willy Tarreau
b20aa9eef3 MAJOR: tasks: create per-thread wait queues
Now we still have a main contention point with the timers in the main
wait queue, but the vast majority of the tasks are pinned to a single
thread. This patch creates a per-thread wait queue and queues a task
to the local wait queue without any locking if the task is bound to a
single thread (the current one) otherwise to the shared queue using
locking. This significantly reduces contention on the wait queue. A
test with 12 threads showed 11 ms spent in the WQ lock compared to
4.7 seconds in the same test without this change. The cache miss ratio
decreased from 19.7% to 19.2% on the 12-thread test, and its performance
increased by 1.5%.

Another indirect benefit is that the average queue size is divided
by the number of threads, which roughly removes log(nbthreads) levels
in the tree and further speeds up lookups.
2018-10-15 19:04:40 +02:00
Willy Tarreau
87d54a9a6d MEDIUM: fd/threads: only grab the fd's lock if the FD has more than one thread
The vast majority of FDs are only seen by one thread. Currently the lock
on FDs costs a lot because it's touched often, though there should be very
little contention. This patch ensures that the lock is only grabbed if the
FD is shared by more than one thread, since otherwise the situation is safe.
Doing so resulted in a 15% performance boost on a 12-thread test.
2018-10-15 13:25:06 +02:00
Willy Tarreau
98d334bd94 MINOR: tools: add a new function atleast2() to test masks for more than 1 bit
For threads it's common to have to check if a mask contains more than
one bit set. Let's have this "atleast2()" function report this.
2018-10-15 13:25:06 +02:00
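One common way to write the test named in 98d334bd94 is the classic "clear the lowest set bit" trick. The function name comes from the message; the body below is an assumption about how such a check is usually written, not a copy of the HAProxy implementation.

```c
#include <assert.h>

/* Return non-zero if <a> has at least two bits set. Subtracting 1 flips
 * the lowest set bit and everything below it, so the AND clears that bit;
 * anything left over means another bit was set.
 */
static inline int atleast2(unsigned long a)
{
    return (a & (a - 1)) != 0;
}
```

Applied to a thread mask, a zero result means the object is seen by at most one thread, which is the condition the FD locking change relies on.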
Willy Tarreau
d944344f01 BUILD: peers: check allocation error during peers_init_sync()
peers_init_sync() doesn't check task_new()'s return value and doesn't
return any result to indicate success or failure. Let's make it return
an int and check it from the caller.

This can be backported as far as 1.6.
2018-10-15 13:24:43 +02:00
Willy Tarreau
8d26f02e69 BUILD: compiler: add a new statement "__unreachable()"
This statement is used as a hint for the compiler so that it knows that
the location where it's placed cannot be reached. It will mostly be used
after longjmp() or equivalent statements that deal with error processing
and that the compiler doesn't know will not return on certain conditions,
so that it doesn't complain about null dereferences on error paths.
2018-10-15 13:24:43 +02:00
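The kind of hint described in 8d26f02e69 can be sketched with the GCC/clang builtin. The macro name hint_unreachable() and the helper functions are hypothetical and chosen for this example; the fallback for other compilers simply expands to nothing.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define hint_unreachable() __builtin_unreachable()
#else
#define hint_unreachable() do { } while (0)
#endif

static jmp_buf on_error;

/* Never returns, but the compiler is not told so. */
static void raise_error(void)
{
    longjmp(on_error, 1);
}

/* Without the hint, a compiler may warn about dereferencing a null <s>
 * below, since it cannot know that raise_error() does not return.
 */
static int first_char(const char *s)
{
    if (!s) {
        raise_error();
        hint_unreachable();
    }
    return s[0];
}

/* Returns the first character of <s>, or -1 if <s> is NULL. */
static int safe_first_char(const char *s)
{
    if (setjmp(on_error))
        return -1;
    return first_char(s);
}
```

The hint must only be placed where control truly cannot arrive, since reaching it is undefined behaviour.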
Willy Tarreau
c1f40b38a6 MINOR: chunk: add chunk_cpy() and chunk_cat()
Sometimes we need to concatenate constant chunks to existing ones, but
no function currently exists to do this easily, hence these two new ones.
2018-10-12 16:58:01 +02:00
Christopher Faulet
25da9e34f1 MINOR: h1: Add the flag H1_MF_NO_PHDR to not add pseudo-headers during parsing
Some pseudo-headers are added during the headers parsing, mainly for the mux
H2. With this flag, it is possible to not add them. This avoids some boring
filtering in the mux H1.
2018-10-12 16:15:18 +02:00
Christopher Faulet
1dc2b49556 MINOR: h1: Change the union h1_sl to use indirect strings to store infos
Instead of using offsets relative to the parsed buffer to store start line
information, we now use indirect strings. This information therefore remains
valid only if the origin buffer remains untouched. But it's not a real problem
because this union is used during the parsing and never stored for later use.
2018-10-12 16:14:57 +02:00
Christopher Faulet
08088e77c6 MINOR: conn-stream: Add CL_FL_NOT_FIRST flag
This flag will be used by multiplexers to warn a conn-stream (and, by
transitivity, a stream) it is not the first one created by the mux. It will help
mux H1 to handle keep-alive connections.
2018-10-12 16:09:26 +02:00
Christopher Faulet
315b39c391 MINOR: http: Use same flag for httpclose and forceclose options
Since keep-alive mode is the default mode, the passive close has disappeared,
and in the code, httpclose and forceclose options are handled the same way:
connections with the client and the server are closed as soon as the request and
the response are received and the missing "Connection: close" header is added in
each direction.

So to make things clearer, forceclose is now an alias for httpclose. And
httpclose is explicitly an active close. So the old passive close does not exist
anymore. Internally, the flag PR_O_HTTP_PCL has been removed and PR_O_HTTP_FCL
has been replaced by PR_O_HTTP_CLO. In HTTP analyzers, the checks done to find
the right mode to use, depending on proxies options and "Connection: " header
value, have been simplified.

This should only be a cleanup and no changes are expected.
2018-10-12 16:07:56 +02:00
Christopher Faulet
10079f59b7 MINOR: http: Export some functions and do cleanup to prepare HTTP refactoring
To ease the refactoring, the function "http_header_add_tail" has been
removed. Now, "http_header_add_tail2" is always used. The function
"capture_headers" has been renamed to "http_capture_headers". Finally, some
functions have been exported.
2018-10-12 16:00:45 +02:00
Christopher Faulet
702226c827 MINOR: stats: Add missing include
"proto/stats.h" must include "types/stats.h".
2018-10-12 16:00:32 +02:00
Christopher Faulet
7e266c7936 MINOR: http: Move comment about some HTTP macros in the right header file
HTTP_FLG_* and HTTP_IS_* were moved from "proto/proto_http.h" to "common/http.h"
but the associated comment was forgotten during the move.

This is 1.9-specific and should not be backported.
2018-10-12 16:00:24 +02:00
Olivier Houchard
4fdec7aafa BUG/MEDIUM: stream: Make sure to unsubscribe before si_release_endpoint.
Make sure we unsubscribe from events before si_release_endpoint destroys
the conn_stream, or it will never be called. To do so, move the call to
unsubscribe to si_release_endpoint() directly.

This is 1.9-specific and shouldn't be backported.
2018-10-11 17:16:43 +02:00
Olivier Houchard
fa8aa867b9 MEDIUM: connections: Change struct wait_list to wait_event.
When subscribing, we don't need to provide a list element; only the h2 mux
needs it. So instead, add a list element to struct h2s, and use it when a
list is needed.
This forces us to use the unsubscribe method, since we can't just unsubscribe
by using LIST_DEL anymore.
This patch is larger than it should be because it includes some renaming.
2018-10-11 15:34:39 +02:00
Olivier Houchard
83a0cd8a36 MINOR: connections: Introduce an unsubscribe method.
As we don't know how subscriptions are handled, we can't just assume we can
use LIST_DEL() to unsubscribe, so introduce a new method to mux and connections
to do so.
2018-10-11 15:34:21 +02:00
Willy Tarreau
27346b01aa OPTIM: tools: optimize my_ffsl() for x86_64
This call is now used quite a bit in the fd cache, to decide which cache
to add/remove the fd to/from, when waking up a task for a single thread
in __task_wakeup(), in fd_cant_recv() and in fd_process_cached_events(),
and we can replace it with a single instruction, removing ~30 instructions
and ~80 bytes from the inner loop of some of these functions.

In addition the test for zero value was replaced with a comment saying
that it is illegal and leads to an undefined behaviour. The code does
not make use of this useless case today.
2018-10-10 19:24:23 +02:00
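The optimization described in 27346b01aa replaces a bit-scanning loop with a single instruction. A portable way to get the same effect on GCC and clang is the count-trailing-zeros builtin, sketched below. The name lowest_bit_pos() is hypothetical, and the 1-based result follows the usual ffsl() convention, which is an assumption about my_ffsl()'s contract.

```c
#include <assert.h>

/* Return the 1-based position of the lowest set bit of <x>. As in the
 * commit message, passing 0 is illegal; the assert() is only a debugging
 * aid and disappears when NDEBUG is defined.
 */
static inline unsigned int lowest_bit_pos(unsigned long x)
{
    assert(x != 0);
#if defined(__GNUC__) || defined(__clang__)
    /* Typically compiles to a single bsf/tzcnt instruction on x86_64. */
    return (unsigned int)__builtin_ctzl(x) + 1;
#else
    /* Generic fallback: shift until the lowest bit is set. */
    unsigned int pos = 1;

    while (!(x & 1UL)) {
        x >>= 1;
        pos++;
    }
    return pos;
#endif
}
```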
Willy Tarreau
2325d8af93 BUG/MINOR: threads: move declaration of capabilities to config.h
In commit f161d0f51 ("BUG/MINOR: pools/threads: don't ignore DEBUG_UAF
on double-word CAS capable archs") I moved some defines and accidentally
messed up with lockfree pools. The problem is that the HA_HAVE_CAS_DW
macro is not defined anymore where the CONFIG_HAP_LOCKLESS_POOLS macro
is set, so this fix implicitly disabled lockfree pools.

This patch fixes this by moving the capabilities definition to config.h
(we'd probably benefit from having an "arch.h" file to declare the
capabilities offered by the architecture). In a test on a 12-core machine,
we used to measure 19s spent in the pool lock for 1M requests without
this patch, and 0 with it so that's definitely a net saving.

No backport is required, this is only for 1.9.
2018-10-10 18:29:23 +02:00
Dirkjan Bussink
c26c72d89b CLEANUP: h1: Fix debug warnings for h1 headers
The wrong method was used to debug the h1m state here. This fixes both
the signature of the h1m method and also fixes the invocation to be
correct.
2018-10-09 15:09:29 +02:00
Dirkjan Bussink
415150f764 MEDIUM: ssl: add support for ciphersuites option for TLSv1.3
OpenSSL released support for TLSv1.3. It also added a separate function
SSL_CTX_set_ciphersuites that is used to set the ciphers used in the
TLS 1.3 handshake. This change adds support for that new configuration
option by adding a ciphersuites configuration variable that works
essentially the same as the existing ciphers setting.

Note that it should likely be backported to 1.8 in order to ease usage
of the now released openssl-1.1.1.
2018-10-08 19:20:13 +02:00
Olivier Houchard
363c745569 BUG/MEDIUM: buffers: Make sure we don't wrap in ci_insert_line2/b_rep_blk.
In ci_insert_line2() and b_rep_blk(), we can't afford to wrap, so don't use
b_tail() to check if we do, use __b_tail() instead.

This should be backported to previous versions.
2018-10-08 16:11:54 +02:00
Emmanuel Hocdet
747ca61693 MINOR: ssl: generate-certificates for BoringSSL
2018-10-08 09:42:34 +02:00
Willy Tarreau
491cec20be CLEANUP: http: remove some leftovers from recent cleanups
The prototypes of functions find_hdr_value_end(), extract_cookie_value()
and http_header_match2() were still in proto_http.h while some of them
don't exist anymore and the others were just moved. Let's remove them.
In addition, da.c was updated to use http_extract_cookie_value() which
is the correct one.
2018-10-02 18:37:27 +02:00
Willy Tarreau
61c112aa5b REORG: http: move HTTP rules parsing to http_rules.c
These ones are mostly called from cfgparse.c for the parsing and do
not depend on the HTTP representation. The functions' prototypes
were moved to proto/http_rules.h, making this file work exactly like
tcp_rules. Ideally we should stop calling these functions directly
from cfgparse and register keywords, but there are a few cases where
that wouldn't work (stats http-request) so it's probably not worth
trying to go this far.
2018-10-02 18:28:05 +02:00
Willy Tarreau
79e57336b5 REORG: http: move the code to different files
The current proto_http.c file is huge and contains different processing
domains making it very difficult to work on an alternative representation.
This commit moves some parts to other files :

  - ACL registration code => http_acl.c
    This code only creates some ACL mappings and doesn't know anything
    about HTTP nor about the representation. This code could even have
    moved to acl.c but it was not worth polluting it again.

  - HTTP sample conversion => http_conv.c
    This code doesn't depend on the internal representation but definitely
    manipulates some HTTP elements, such as dates. It also has access to
    captures.

  - HTTP sample fetching => http_fetch.c
    This code does depend entirely on the internal representation but is
    totally independent of the analysers. Placing it into a different
    file will ease the transition to the new representation and the
    creation of a wrapper if required. An include file was created due
    to CHECK_HTTP_MESSAGE_FIRST() being used at various places.

  - HTTP action registration => http_act.c
    This code doesn't directly interact with the messages nor the
    transaction but it does so via some exported http functions like
    http_replace_req_line() or http_set_status() so it will be easier
    to change only this after the conversion.

  - a few very generic parts were found and moved to http.{c,h} as
    relevant.

It is worth noting that the functions moved to these new files are not
referenced anywhere outside of the files and are only called as registered
callbacks, so these files do not even require associated include files.
2018-10-02 18:26:59 +02:00
Adis Nezirovic
8878f8eb3d MEDIUM: lua: Add stick table support for Lua.
This adds support for accessing stick tables from Lua. The supported
operations are reading general table info, lookup by string/IP key, and
dumping the table.

Similar to "show table", a data filter is available during dump, and as
an improvement over "show table" it's possible to use up to 4 filter
expressions instead of just one (with implicit AND clause binding the
expressions). Dumping with/without filters can take a long time for
large tables, and should be used sparingly.
2018-09-29 20:15:01 +02:00
Olivier Houchard
0e367bbb01 BUG/MEDIUM: process_stream: Don't use si_cs_io_cb() in process_stream().
Use si_cs_send/si_cs_recv instead of si_cs_io_cb() in process_stream(),
as si_cs_io_cb() may lead to process_stream being woken up when it
shouldn't be, and thus the timeout would never get triggered.
2018-09-26 14:21:54 +02:00
Willy Tarreau
7f2a44d319 BUG/CRITICAL: hpack: fix improper sign check on the header index value
Tim Düsterhus found using afl-fuzz that some parts of the HPACK decoder
use incorrect bounds checking which do not catch negative values after
a type cast. The first culprit is hpack_valid_idx() which takes a signed
int and is fed with an unsigned one, but a few others are affected as
well due to being designed to work with an uint16_t as in the table
header, thus not being able to detect the high offset bits, though they
are not exposed if hpack_valid_idx() is fixed.

The impact is that the HPACK decoder can be crashed by an out-of-bounds
read. The only work-around without this patch is to disable H2 in the
configuration.

CVE-2018-14645 was assigned to this bug.

This patch addresses all of these issues at once. It must be backported
to 1.8.
2018-09-20 11:45:56 +02:00
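The class of bug described in 7f2a44d319 can be shown with a generic sketch. Both functions below are hypothetical and are not the HPACK decoder's code: they only show why an upper-bound check on a signed index lets negative values through, and why the same check on an unsigned index does not.

```c
#include <assert.h>
#include <stdint.h>

/* Flawed pattern: only the upper bound is checked, so any negative
 * index is accepted.
 */
static int idx_ok_signed(int idx, int max)
{
    return idx <= max;
}

/* Safer pattern: with an unsigned type there is no negative range, so a
 * value that would have been negative becomes very large and fails the
 * upper-bound check.
 */
static int idx_ok_unsigned(uint32_t idx, uint32_t max)
{
    return idx <= max;
}
```

An index accepted by the flawed check can then be used to read outside the table, which matches the out-of-bounds read mentioned in the message.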
Willy Tarreau
55e0da664e BUILD: connection: silence a couple of null-deref build warnings at -Wextra
These ones don't need to be checked either.
2018-09-20 11:42:15 +02:00
Willy Tarreau
4ae4923c3e MINOR: stream-int: make si_appctx() never fail
Callers of si_appctx() always use the result without checking it because
they know by construction that it's valid. This results in unchecked null
pointer warnings at -Wextra, so let's remove this test and make it clear
that it's up to the caller to check validity first.
2018-09-20 11:42:15 +02:00
Willy Tarreau
babc15e8cf MINOR: stktable: provide an unchecked version of stktable_data_ptr()
stktable_data_ptr() currently performs null pointer checks but most
callers don't check the result since they know by construction that
it cannot be null. This causes valid warnings when building with
-Wextra which are worth addressing since it will result in better
code. Let's provide an unguarded version of this function for use
where the check is known to be useless and untested.
2018-09-20 11:42:15 +02:00
Willy Tarreau
4c0fcc2314 BUG/MINOR: tools: fix set_net_port() / set_host_port() on IPv4
These two functions were apparently written on the same model as their
parents when added by commit 11bcb6c4f ("[MEDIUM] IPv6 support for syslog")
except that they perform an assignment instead of a return, and as a
result fall through the next case where the assigned value may possibly
be partially overwritten. At least under Linux the port offset is the
same in both sockaddr_in and sockaddr_in6 so the value is written twice
without side effects.

This needs to be backported as far as 1.5.
2018-09-20 10:52:48 +02:00
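The fall-through issue described in 4c0fcc2314 concerns a switch on the address family where the IPv4 case performs an assignment without leaving the switch. A corrected shape could look like the sketch below; the function name set_port_sketch() and its signature are hypothetical and differ from the real set_net_port() and set_host_port().

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: store <port> (host byte order) into <addr>
 * according to its address family. Other families are left untouched.
 */
static void set_port_sketch(struct sockaddr_storage *addr, uint16_t port)
{
    switch (addr->ss_family) {
    case AF_INET:
        ((struct sockaddr_in *)addr)->sin_port = htons(port);
        break; /* without this, execution falls into the AF_INET6 case */
    case AF_INET6:
        ((struct sockaddr_in6 *)addr)->sin6_port = htons(port);
        break;
    }
}
```

When the two port fields share the same offset, as the message reports for Linux, the missing break is harmless in practice, which explains why the bug went unnoticed.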
Willy Tarreau
2557f6a3e2 MEDIUM: h1: better handle transfer-encoding vs content-length
The transfer-encoding header processing was a bit lenient in this part
because it was made to read messages already validated by haproxy. We
absolutely need to reinstate the strict processing defined in RFC7230
as is currently being done in proto_http.c. That is, transfer-encoding
presence alone is enough to cancel content-length, and must be
terminated by the "chunked" token, except in the response where we
can fall back to the close mode if it's not last.

For this we now use a specific parsing function which updates the
flags and we introduce a new flag H1_MF_XFER_ENC indicating that the
transfer-encoding header is present.

Last, if such a header is found, we delete all content-length header
fields found in the message.
2018-09-14 17:40:35 +02:00
Willy Tarreau
e2c418e94b MINOR: http: add http_hdr_del() to remove a header from a list
This one removes all occurrences of the specified header field name from
a complete list and returns the new count.
2018-09-14 17:40:35 +02:00
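The behaviour described in e2c418e94b (remove every occurrence of a header name and return the new count) can be sketched over a plain array. The struct hdr layout and the name hdr_del_sketch() are hypothetical; the real http_hdr_del() works on HAProxy's own header list type, which is not reproduced here.

```c
#include <assert.h>
#include <strings.h> /* strcasecmp() */

/* Hypothetical, simplified header entry. */
struct hdr {
    const char *name;
    const char *value;
};

/* Remove all entries of <list> whose name matches <name>, ignoring case
 * since header field names are case-insensitive. Remaining entries keep
 * their order and are compacted in place. Returns the new count.
 */
static int hdr_del_sketch(struct hdr *list, int count, const char *name)
{
    int src, dst = 0;

    for (src = 0; src < count; src++) {
        if (strcasecmp(list[src].name, name) == 0)
            continue; /* drop this occurrence */
        if (dst != src)
            list[dst] = list[src];
        dst++;
    }
    return dst;
}
```

This is the operation needed by the transfer-encoding change in the next entry, which deletes all content-length header fields once a transfer-encoding header is found.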
Christopher Faulet
c4e53f4ad7 MINOR: h1: Add H1_MF_XFER_LEN flag
This flag is useful to handle cases where there is no body, regardless of CL or
TE headers (for instance, responses to HEAD requests). It will not be set by the
parser itself.
2018-09-14 16:02:40 +02:00