In spoe_release_appctx(), the SPOE applet may be used after it was released to
get its exit status code. Of course, HAProxy crashes when this happens.
This patch must be backported to 1.9 and 1.8.
As fat as possible, we try to keep the connections alive on redirect. It's
possible when the request has no body or when the request parsing is finished.
No backport is needed.
The stats page now reports the per-process output bit rate and applies
the usual conversions needed to turn the TCP payload rate to an Ethernet
bit rate in order to give a reasonably accurate estimate of how far from
interface saturation we are.
Many times we've been missing per-process traffic statistics. While it
didn't make sense in multi-process mode, with threads it does. Thus we
now have a counter of bytes emitted by raw_sock, and a freq counter for
these as well. However, freq_ctr are limited to 32 bits, and given that
loads of 300 Gbps have already been reached over a loopback using
splicing, we need to downscale this a bit. Here we're storing 1/32 of
the byte rate, which gives a theorical limit of 128 GB/s or ~1 Tbps,
which is more than enough. Let's have fun re-reading this sentence in
2029 :-) The values can be read in "show info" output on the CLI.
It's needed on Linux to have access to timerfd_*, and on FreeBSD this
lib is needed as well, though not enabled in our default build. We can
see later if it's OK to enable it, for now let's fix the build issues.
SI_TKILL is for Linux. We're again in the non-portable area. Both OSes
use macros to define these values so we can #ifdef them. Let's make
SI_TKILL defined based on SI_LWP when only the latter is defined.
Bah, the linux manpage suggests to use si_int but it's a fake, it's only
a define on sigval.sival_int where sigval is defined as si_value. Let's
use si_value.sival_int, at least it builds on both Linux and FreeBSD. It's
likely that this code will have to be limited to a small subset of OSes
if it causes difficulties like this.
Released version 2.0-dev4 with the following main changes :
- BUILD: enable freebsd builds on cirrus-ci
- BUG/MINOR: http_fetch: Rely on the smp direction for "cookie()" and "hdr()"
- MEDIUM: Make 'option forceclose' actually warn
- MEDIUM: Make 'resolution_pool_size' directive fatal
- DOC: management: place "show activity" at the right place
- MINOR: cli/activity: show the dumping thread ID starting at 1
- MINOR: task: export global_task_mask
- MINOR: cli/debug: add a thread dump function
- BUG/MEDIUM: streams: Don't use CF_EOI to decide if the request is complete.
- BUG/MEDIUM: streams: Try to L7 retry before aborting the connection.
- BUG/MINOR: debug: make ha_task_dump() always check the task before dumping it
- BUG/MINOR: debug: make ha_task_dump() actually dump the requested task
- MINOR: debug: make ha_thread_dump() and ha_task_dump() take a buffer
- BUG/MINOR: debug: don't check the call date on tasklets
- MINOR: thread: implement ha_thread_relax()
- MINOR: task: put barriers after each write to curr_task
- MINOR: task: always reset curr_task when freeing a task or tasklet
- MINOR: stream: detach the stream from its own task on stream_free()
- MEDIUM: debug/threads: implement an advanced thread dump system
- REGTEST: extend the check duration on tls_health_checks and mark it slow
- DOC: fix "successful" typo
- MINOR: init: setenv HAPROXY_CFGFILES
- MINOR: threads/init: synchronize the threads startup
- MEDIUM: init/mworker: make the pipe register function a regular initcall
- CLEANUP: memory: make the fault injection code use the OTHER_LOCK label
- CLEANUP: threads: remove the now unused START_LOCK label
- MINOR: init/threads: make the global threads an array of structs
- MINOR: threads: add each thread's clockid into the global thread_info
- CLEANUP: stream: remove an obsolete debugging test
- MINOR: tools: add dump_hex()
- MINOR: debug: implement ha_panic()
- MINOR: debug/cli: add some debugging commands for developers
- MINOR: tools: provide a may_access() function and make dump_hex() use it
- MINOR: debug: make ha_panic() report threads starting at 1
- REORG: compat: move some integer limit definitions from standard.h to compat.h
- REORG: threads: move the struct thread_info from global.h to hathreads.h
- MINOR: compat: make sure to always define clockid_t
- MINOR: threads: always place the clockid in the struct thread_info
- MINOR: threads: add a thread-local thread_info pointer "ti"
- MINOR: time: move the cpu, mono, and idle time to thread_info
- MINOR: time: add a function to retrieve another thread's cputime
- MINOR: debug: report each thread's cpu usage in "show thread"
- BUILD: threads: only assign the clock_id when supported
- BUILD: makefile: use USE_OBSOLETE_LINKER for solaris
- BUILD: makefile: remove -fomit-frame-pointer optimisation (solaris)
- MAJOR: polling: add event ports support (Solaris)
- BUG/MEDIUM: streams: Don't switch from SI_ST_CON to SI_ST_DIS on read0.
- CLEANUP: time: refine the test on _POSIX_TIMERS
- MINOR: compat: define a new empty type empty_t for non-implemented fields
- CLEANUP: time: switch clockid_t to empty_t when not available
- BUG/MINOR: mworker: Fix memory leak of mworker_proc members
- CLEANUP: objtype: make obj_type() and obj_type_name() take consts
- MINOR: debug: switch to SIGURG for thread dumps
- CLEANUP: threads: really move thread_info to hathreads.c
- MINOR: threads: make threads_{harmless|want_rdv}_mask constant 0 without threads
- CLEANUP: debug: always report harmless/want_rdv even without threads
- MINOR: threads: implement ha_tkill() and ha_tkillall()
- CLEANUP: debug: make use of ha_tkill() and remove ifdefs
- MINOR: stream: introduce a stream_dump() function and use it in stream_dump_and_crash()
- MINOR: debug: dump streams when an applet, iocb or stream is known
- MINOR: threads: add a "stuck" flag to the thread_info struct
- MINOR: threads: add a timer_t per thread in thread_info
- MAJOR: watchdog: implement a thread lockup detection mechanism
- MINOR: stream: remove the cpu time detection from process_stream()
- MINOR: connection: report the mux names in "haproxy -vv"
- CLEANUP: mux-h1: use "H1" and not "h1" as the mux's name
- BUG/MEDIUM: WURFL: segfault in wurfl-get() with missing info.
- MINOR: WURFL: call header_retireve_callback() in dummy library
- MINOR: WURFL: fixed Engine load failed error when wurfl-information-list contains wurfl_root_id
- MINOR: WURFL: shows log messages during module initialization
- MINOR: WURFL: removes heading wurfl-information-separator from wurfl-get-all() and wurfl-get() results
- MINOR: WURFL: wurfl_get() and wurfl_get_all() now return an empty string if device detection fails
- MEDIUM: WURFL: HTX awareness.
- MINOR: WURFL: module version bump to 2.0
- MINOR: WURFL: do not emit warnings when not configured
- CONTRIB: wurfl: address 3 build issues in the wurfl dummy library
- BUG/MEDIUM: init/threads: provide per-thread alloc/free function callbacks
- BUILD: travis: add sanitizers to travis-ci builds
- BUILD: time: remove the test on _POSIX_C_SOURCE
- CLEANUP: build: rename some build macros to use the USE_* ones
- CLEANUP: raw_sock: remove support for very old linux splice bug workaround
- BUG/MEDIUM: dns: make the port numbers unsigned
- MEDIUM: config: deprecate the antique req* and rsp* commands
These commands don't follow the same flow as the rest of the commands,
each of them iterates over all header lines before switching to the
next directive. In addition they make no distinction between start
line and headers and can lead to unparsable rewrites which are very
difficult to deal with internally.
Most of them are still occasionally found in configurations, mainly
because of the usual "we've always done this way". By marking them
deprecated and emitting a warning and recommendation on first use of
each of them, we will raise users' awareness of users regarding the
cleaner, faster and more reliable alternatives.
Some use cases of "reqrep" still appear from time to time for URL
rewriting that is not so convenient with other rules. But at least
users facing this requirement will explain their use case so that we
can best serve them. Some discussion started on this subject in a
thread linked to from github issue #100.
The goal is to remove them in 2.1 since they require to reparse the
result before indexing it and we don't want this hack to live long.
The following directives were marked deprecated :
-reqadd
-reqallow
-reqdel
-reqdeny
-reqiallow
-reqidel
-reqideny
-reqipass
-reqirep
-reqitarpit
-reqpass
-reqrep
-reqtarpit
-rspadd
-rspdel
-rspdeny
-rspidel
-rspideny
-rspirep
-rsprep
Mustafa Yildirim reported in Discourse that ports >32767 advertised
in SRV records are wrong. Given the high value they definitely
correspond to a sign extension of a negative number. The cause was
indeed that the port is declared as a signed int in the dns_answer_item
structure, and Lukas confirmed in github issue #103 that turning it to
unsigned addresses the issue.
It is worth noting that there are other such fields in this structure
that don't look right (ttl, priority, class, type) and that someone
should audit this part to be certain they are properly typed.
This fix must be backported to 1.9 and likely to 1.8 as well.
We've been dealing with a workaround for a bug in splice that used to
affect version 2.6.25 to 2.6.27.12 and which was fixed 10 years ago
in kernel versions which are not supported anymore. Given that people
who would use a kernel in such a range would face much more serious
stability and security issues, it's about time to get rid of this
workaround and of the ASSUME_SPLICE_WORKS build option used to disable
it.
We still have quite a number of build macros which are mapped 1:1 to a
USE_something setting in the makefile but which have a different name.
This patch cleans this up by renaming them to use the USE_something
one, allowing to clean up the makefile and make it more obvious when
reading the code what build option needs to be added.
The following renames were done :
ENABLE_POLL -> USE_POLL
ENABLE_EPOLL -> USE_EPOLL
ENABLE_KQUEUE -> USE_KQUEUE
ENABLE_EVPORTS -> USE_EVPORTS
TPROXY -> USE_TPROXY
NETFILTER -> USE_NETFILTER
NEED_CRYPT_H -> USE_CRYPT_H
CONFIG_HAP_CRYPT -> USE_LIBCRYPT
CONFIG_HAP_NS -> DUSE_NS
CONFIG_HAP_LINUX_SPLICE -> USE_LINUX_SPLICE
CONFIG_HAP_LINUX_TPROXY -> USE_LINUX_TPROXY
CONFIG_HAP_LINUX_VSYSCALL -> USE_LINUX_VSYSCALL
It seems it's not defined on FreeBSD while it's mentioned on Linux that
clock_gettime() can be detected using this. Given that we also have the
test for _POSIX_TIMERS>0 that should cover it well enough. If it breaks
on other systems, we'll see.
Report was here :
https://github.com/haproxy/haproxy/runs/133866993
full list of changes:
use TARGET=osx instead of generic for osx builds,
add USE_PCRE_JIT=1, USE_GETADDRINFO=1 to build matrix,
enable address sanitizer for clang
enable device detection: WURFL, DEVICEATLAS
We currently have the ability to register functions to be called early
on thread creation and at thread deinitialization. It turns out this is
not sufficient because certain such functions may use resources that are
being allocated by the other ones, thus creating a race condition depending
only on the linking order. For example the mworker needs to register a
file descriptor while the pollers will reallocate the fd_updt[] array.
Similarly logs and trashes may be used by some init functions while it's
unclear whether they have been deduplicated.
The same issue happens on deinit, if the fd_updt[] or trash is released
before some functions finish to use them, we'll get into trouble.
This patch creates a couple of early and late callbacks for per-thread
allocation/freeing of resources. A few init functions were moved there,
and the fd init code was split between the two (since it used to both
allocate and initialize at once). This way the init/deinit sequence is
expected to be safe now.
This patch should be backported to 1.9 as at least the trash/log issue
seems to be present. The run_thread_poll_loop() code is a bit different
there as the mworker is not a callback, but it will have no effect and
it's enough to drop the mworker changes.
This bug was reported by Ilya Shipitsin in github issue #104.
At the moment the WURFL module emits 3 lines of warnings upon startup
when it is not referenced in the configuration file, which is quite
confusing. Let's make sure to keep it silent when not configured, as
detected by the absence of the wurfl-data-file statement.
The current coverage of the dummy library was limited because the callbacks
passed to wurfl_lookup() were not called. Now we do call them with one existing
and one non-existing headers to make sure that ha_wurfl_retrieve_header() is
covered by the tests as well.
A segfault may happen in ha_wurfl_get() when dereferencing information not
present in wurfl-information-list. Check the node retrieved from the tree,
not its container.
This fix must be backported to 1.9.
The mux's name is the only one reported in lower case in "show sess"
or "haproxy -vv" while the other ones are upper case, so it loses and
the other ones win :-)
It was not as efficient as the watchdog in that it would only trigger
after the problem resolved by itself, and still required a huge margin
to make sure we didn't trigger for an invalid reason. This used to leave
little indication about the cause. Better use the watchdog now and
improve it if needed.
The detector of unkillable tasks remains active though.
Since threads were introduced, we've naturally had a number of bugs
related to locking issues. In addition we've also got some issues
with corrupted lists in certain rare cases not necessarily involving
threads. Not only these events cause a lot of trouble to the production
as it is very hard to detect that the process is stuck in a loop and
doesn't deliver the service anymore, but it's often difficult (or too
late) to collect more debugging information.
The patch presented here implements a lockup detection mechanism, also
known as "watchdog". The principle is that (on systems supporting it),
each thread will have its own CPU timer which progresses as the thread
consumes CPU cycles, and when a deadline is met, a signal is delivered
(SIGALRM here since it doesn't interrupt gdb by default).
The thread handling this signal (which is not necessarily the one which
triggered the timer) figures the thread ID from the signal arguments and
checks if it's really stuck by looking at the time spent since last exit
from poll() and by checking that the thread's scheduler is still alive
(so that even when dealing with configuration issues resulting in insane
amount of tasks being called in turn, it is not possible to accidently
trigger it). Checking the scheduler's activity will usually result in a
second chance, thus doubling the detecting time.
In order not to incorrectly flag a thread as being the cause of the
lockup, the thread_harmless_mask is checked : a thread could very well
be spinning on itself waiting for all other threads to join (typically
what happens when issuing "show sess"). In this case, once all threads
but one (or two) have joined, all the innocent ones are marked harmless
and will not trigger the timer. Only the ones not reacting will.
The deadline is set to one second, which already appears impossible to
reach, especially since it's 1 second of CPU usage, not elapsed time
with the CPU being preempted by other threads/processes/hypervisor. In
practice due to the scheduler's health verification it takes up to two
seconds to decide to panic.
Once all conditions are met, the goal is to crash from the offending
thread. So if it's the current one, we call ha_panic() otherwise the
signal is bounced to the offending thread which deals with it. This
will result in all threads being woken up in turn to dump their context,
the whole state is emitted on stderr in hope that it can be logged, and
the process aborts, leaving a chance for a core to be dumped and for a
service manager to restart it.
An alternative mechanism could be implemented for systems unable to
wake up a thread once its CPU clock reaches a deadline (e.g. FreeBSD).
Instead of waking the timer each and every deadline, it is possible to
use a standard timer which is reset each time we leave poll(). Since
the signal handler rechecks the CPU consumption this will also work.
However a totally idle process may trigger it from time to time which
may or may not confuse some debugging sessions. The same is true for
alarm() which could be another option for systems not having such a
broad choice of timers (but it seems that in this case they will not
have per-thread CPU measurements available either).
The feature is currently implemented only when threads are enabled in
order to keep the code clean, since the main purpose is to detect and
address inter-thread deadlocks. But if it proves useful for other
situations this condition might be relaxed.
This will be used by the watchdog to detect that a thread locked up.
It's only defined on platforms supporting it. This patch only reserves
the room for the timer in the struct. A special value was reserved for
the uninitialized timer. The problem is that the POSIX API was horribly
designed, defining no invalid value, thus for each timer it is required
to keep a second variable to indicate whether it's valid. A quick check
shows that defining a 32-bit invalid value is not something uncommon
across other implementations, with ~0 being common. Let's try with this
and if it causes issues we can revisit this decision.
This flag is constantly cleared by the scheduler and will be set by the
watchdog timer to detect stuck threads. It is also set by the "show
threads" command so that it is easy to spot if the situation has evolved
between two subsequent calls : if the first "show threads" shows no stuck
thread and the second one shows such a stuck thread, it indicates that
this thread didn't manage to make any forward progress since the previous
call, which is extremely suspicious.
Whenever we can retrieve a valid stream pointer, we now call stream_dump()
to get a detailed dump of the stream currently running on the processor.
This is used by "show threads" and by ha_panic().
This function dumps a lot of information about a stream into the provided
buffer. It is now used by stream_dump_and_crash() and will be used by the
debugger as well.
These functions are used respectively to signal one thread or all threads.
When multithreading is disabled, it's always the current thread which is
signaled.
Some code starts to add ifdefs everywhere to work around the lack of
threads_harmless_mask when threads are not compiled in. This one is
often used to indicate a thread having joined the rendez-vous point or
a thread sleeping in the poller. By setting it to zero we translate
what usually is required in debugging code (i.e. the only thread is
currently working) and for signal handlers we can use a combination of
threads_harmless_mask and sleeping_threads_mask to detect the polling
cases as well. Similarly do the same with threads_want_rdv_mask which
is less often used though.
Commit 5a6e2245f ("REORG: threads: move the struct thread_info from
global.h to hathreads.h") didn't hold its promise well, as the thread_info
struct was still declared and initialized in haproxy.c in addition to being
in hathreads.c. Let's move it for real now.
The current choice of SIGPWR has the adverse effect of stopping gdb each
time it is triggered using "show threads" or example, which is not really
convenient. Let's switch to SIGURG instead, which we don't use either.
The struct mworker_proc is not uniformly freed everywhere, sometimes leading
to leaks of the `id` string (and possibly the other strings).
Introduce a mworker_free_child function instead of duplicating the freeing
logic everywhere to prevent this kind of issues.
This leak was reported in issue #96.
It looks like the leaks have been introduced in commit 9a1ee7ac31,
which is specific to 2.0-dev. Backporting `mworker_free_child` might be
helpful to ease backporting other fixes, though.
Some structures have optional fields which depend on availability of
certain features on certain platforms, and having to stuff lots of
ifdefs in these structs makes them unreadable. Using real values like
ints requires some initialization and adds even more confusion.
Here we take a different approach : we create an empty type called
empty_t to use as a substitute for the real type that is not implemented
and which doesn't contain any value (it's an empty struct). Thus it has
a size of zero but an address, thus a pointer may point to it. It will
not have to be initialized though. Some initialization code might even
continue to work and do nothing like initializing it using memset with
its sizeof which is zero.
The clock_gettime() man page says we must check that _POSIX_TIMERS is
defined to a value greater than zero, not just that it's simply defined
so let's fix this right now.
When we receive a read0, and we're still in SI_ST_CON state (so on an
outgoing conneciton), don't immediately switch to SI_ST_DIS, or, we would
never call sess_establish(), and so the analysers will never run.
Instead, let sess_establish() handle that case, and switch to SI_ST_DIS if
we already have CF_SHUTR on the channel.
This should be backported to 1.9.
Event ports are kqueue/epoll polling class for Solaris. Code is based
on https://github.com/joyent/haproxy-1.8/tree/joyent/dev-v1.8.8.
Event ports are available only on SunOS systems derived from
Solaris 10 and later (including illumos systems).
-fomit-frame-pointer is commonly avoided because tools like dtrace
needs frame-pointer. Remove it from Makefile and let builder's env
do the job.
This patch could be backported to 1.9.
I took extreme care to always check for _POSIX_THREAD_CPUTIME before
manipulating clock_id, except at one place (run_thread_poll_loop) as
found by Manu, breaking Solaris. Now fixed, no backport needed.
Now we can report each thread's CPU time, both at wake up (poll) and
retrieved while dumping (now), then the difference, which directly
indicates how long the thread has been running uninterrupted. A very
high value for the diff could indicate a deadlock, especially if it
happens between two threads. Note that it may occasionally happen
that a wrong value is displayed since nothing guarantees that the
date is read atomically.