MX_FL_NO_UPG flag may now be set on a multiplexer to explicitly disable
upgrades from this mux. For now, it is set on the FCGI multiplexer because
it is not supported and there is no upgrade on backend-only multiplexers. It
is also set on the H2 multiplexer because it is clearly not supported.
Historically we've used SOL_IP/SOL_IPV6/SOL_TCP everywhere as the socket
level value in getsockopt() and setsockopt() but as we've seen over time
it regularly broke the build and required to have them defined to their
IPPROTO_* equivalent. The Linux ip(7) man page says:
Using the SOL_IP socket options level isn't portable; BSD-based
stacks use the IPPROTO_IP level.
And it indeed looks like a pure linuxism inherited from old examples and
documentation. strace also reports SOL_* instead of IPPROTO_*, which does
not help... A check to linux/in.h shows they have the same values. Only
SOL_SOCKET and other non-IP values make sense since there is no IPPROTO
equivalent.
Let's get rid of this annoying confusion by removing all redefinitions of
SOL_IP/IPV6/TCP and using IPPROTO_* instead, just like any other operating
system. This also removes duplicated tests for the same value.
Note that this should not result in exposing syscalls to other OSes
as the only ones that were still conditionned to SOL_IPV6 were for
IPV6_UNICAST_HOPS which already had an IPPROTO_IPV6 equivalent, and
IPV6_TRANSPARENT which is Linux-specific.
Very often we use "int" where negative numbers are not needed (and can
further cause trouble) just because it's painful to type "unsigned int"
or "unsigned", or ugly to use in function arguments. Similarly sometimes
chars would absolutely need to be signed but nobody types "signed char".
Let's add a few aliases for such types and make them part of the standard
internal API so that over time we can get used to them and get rid of
horrible definitions. A comment also reminds some commonly available
types and their properties regarding other types.
In order to process samples from the command line interface we'll need
rules as well, and these rules will have to be marked as coming from
the CLI parser. This new origin is used for this.
In order to prepare for supporting calling sample expressions from the
CLI, let's create a new CLI_PARSER parsing context. This one supports
constants and internal samples only.
In order to process samples from the config file we'll need rules as
well, and these rules will have to be marked as coming from the
config parser. This new origin is used for this.
We'd sometimes like to be able to process samples while parsing
the configuration based on purely internal thing but that's not
possible right now. Let's add a new CFG_PARSER context for samples
which only permits constant samples (i.e. those which do not change
in the process' life and which are stable during config parsing).
This level indicates that everything it constant in the expression during
the whole process' life and that it may safely be used at config parsing
time.
For now smp_resolve_args() complains on stderr via ha_alert(), but if we
want to make it a bit more dynamic, we need it to return errors in an
allocated message. Let's pass it an error pointer and have it fill it.
On return we indent the output if it contains more than one line.
Define a new cap PR_CAP_LUA. It can be used to allocate the internal
proxy for lua Socket class. This cap overrides default settings for
preferable values in the lua context.
Move all liberation code related to a proxy in a dedicated function
free_proxy in proxy.c. For now, this function is only called in
haproxy.c. In the future, it will be used to free the lua proxy.
This helps to clean up haproxy.c.
Create a new function parse_new_proxy specifically designed to allocate
a new proxy from the configuration file and copy settings from the
default proxy.
The function alloc_new_proxy is reduced to a minimal allocation. It is
used for default proxy allocation and could also be used for internal
proxies such as the lua Socket proxy.
Move deinit_acl_cond and deinit_act_rules from haproxy.c respectively in
acl.c and action.c. The name of the functions has been slightly altered,
replacing the prefix deinit_* by free_* to reflect their purpose more
clearly.
This change has been made in preparation to the implementation of a free
proxy function. As a side-effect, it helps to clean up haproxy.c.
Create a new module init which contains code related to REGISTER_*
macros for initcalls. init.h is included in api.h to make init code
available to all modules.
It's a step to clean up a bit haproxy.c/global.h.
The default SSL_CTX used by a specific frontend is the one of the first
ckch instance created for this frontend. If this instance has SNIs, then
the SSL context is linked to the instance through the list of SNIs
contained in it. If the instance does not have any SNIs though, then the
SSL_CTX is only referenced by the bind_conf structure and the instance
itself has no link to it.
When trying to update a certificate used by the default instance through
a cli command, a new version of the default instance was rebuilt but the
default SSL context referenced in the bind_conf structure would not be
changed, resulting in a buggy behavior in which depending on the SNI
used by the client, he could either use the new version of the updated
certificate or the original one.
This patch adds a reference to the default SSL context in the default
ckch instances so that it can be hot swapped during a certificate
update.
This should fix GitHub issue #1143.
It can be backported as far as 2.2.
Christopher discovered an issue mostly affecting 2.2 and to a less extent
2.3 and above, which is that it's possible to deadlock a soft-stop when
several threads are using a same listener:
thread1 thread2
unbind_listener() fd_set_running()
lock(listener) listener_accept()
fd_delete() lock(listener)
while (running_mask); -----> deadlock
unlock(listener)
This simple case disappeared from 2.3 due to the removal of some locked
operations at the end of listener_accept() on the regular path, but the
architectural problem is still here and caused by a lock inversion built
around the loop on running_mask in fd_clr_running_excl(), because there
are situations where the caller of fd_delete() may hold a lock that is
preventing other threads from dropping their bit in running_mask.
The real need here is to make sure the last user deletes the FD. We have
all we need to know the last one, it's the one calling fd_clr_running()
last, or entering fd_delete() last, both of which can be summed up as
the last one calling fd_clr_running() if fd_delete() calls fd_clr_running()
at the end. And we can prevent new threads from appearing in running_mask
by removing their bits in thread_mask.
So what this patch does is that it sets the running_mask for the thread
in fd_delete(), clears the thread_mask, thus marking the FD as orphaned,
then clears the running mask again, and completes the deletion if it was
the last one. If it was not, another thread will pass through fd_clr_running
and will complete the deletion of the FD.
The bug is easily reproducible in 2.2 under high connection rates during
soft close. When the old process stops its listener, occasionally two
threads will deadlock and the old process will then be killed by the
watchdog. It's strongly believed that similar situations do exist in 2.3
and 2.4 (e.g. if the removal attempt happens during resume_listener()
called from listener_accept()) but if so, they should be much harder to
trigger.
This should be backported to 2.2 as the issue appeared with the FD
migration. It requires previous patches "fd: make fd_clr_running() return
the remaining running mask" and "MINOR: fd: remove the unneeded running
bit from fd_insert()".
Notes for backport: in 2.2, the fd_dodelete() function requires an extra
argument "do_close" indicating whether we want to remove and close the FD
(fd_delete) or just delete it (fd_remove). While this information is not
conveyed along the chain, we know that late calls always imply do_close=1
become do_close=0 exclusively results from fd_remove() which is only used
by the config parser and the master, both of which are single-threaded,
hence are always the last ones in the running_mask. Thus it is safe to
assume that a postponed FD deletion always implies do_close=1.
Thanks to Olivier for his help in designing this optimal solution.
There's no point taking the running bit in fd_insert() since by
definition there will never be more than one thread inserting the FD,
and that fd_insert() may only be done after the fd was allocated by
the system, indicating the end of use by any other thread.
This will need to be backported to 2.2 to fix an issue.
We'll need to know that a thread is the last one to use an fd, so let's
make fd_clr_running() return the remaining bits after removal. Note that
in practice we're only interested in knowing if it's zero but the compiler
doesn't make use of the clags after the AND and emits a CMPXCHG anyway :-/
This will need to be backported to 2.2 to fix an issue.
The commit reverts following commits:
* 83926a04 BUG/MEDIUM: debug/lua: Don't dump the lua stack if not dumpable
* a61789a1 MEDIUM: lua: Use a per-thread counter to track some non-reentrant parts of lua
Instead of relying on a Lua function to print the lua traceback into the
debugger, we are now using our own internal function (hlua_traceback()).
This one does not allocate memory and use a chunk instead. This avoids any
issue with a possible deadlock in the memory allocator because the thread
processing was interrupted during a memory allocation.
This patch relies on the commit "BUG/MEDIUM: debug/lua: Use internal hlua
function to dump the lua traceback". Both must be backported wherever the
patches above are backported, thus as far as 2.0
The separator string is now configurable, passing it as parameter when the
function is called. In addition, the message have been slightly changed to
be a bit more readable.
If an unknown CA file was first mentioned in an "add ssl crt-list" CLI
command, it would result in a call to X509_STORE_load_locations which
performs a disk access which is forbidden during runtime. The same would
happen if a "ca-verify-file" or "crl-file" was specified. This was due
to the fact that the crt-list file parsing and the crt-list related CLI
commands parsing use the same functions.
The patch simply adds a new parameter to all the ssl_bind parsing
functions so that they know if the call is made during init or by the
CLI, and the ssl_store_load_locations function can then reject any new
cafile_entry creation coming from a CLI call.
It can be backported as far as 2.2.
str2sa_range function options PA_O_DGRAM and PA_O_STREAM are used to
define the supported address types but also to set the default type
if it is not explicit. If the used address support both STREAM and DGRAM,
the default was always set to STREAM.
This patch introduce a new option PA_O_DEFAULT_DGRAM to force the
default to DGRAM type if it is not explicit in the address field
and both STREAM and DGRAM are supported. If only DGRAM or only STREAM
is supported, it continues to be considered as the default.
In commit a1ecbca0a ("BUG/MINOR: freq_ctr/threads: make use of the last
updated global time"), for period-based counters, the millisecond part
of the global_now variable was used as the date for the new period. But
it's wrong, it only works with sub-second periods as it wraps every
second, and for other periods the counters never rotate anymore.
Let's make use of the newly introduced global_now_ms variable instead,
which contains the global monotonic time expressed in milliseconds.
This patch needs to be backported wherever the patch above is backported.
It depends on previous commit "MINOR: time: also provide a global,
monotonic global_now_ms timer".
The period-based freq counters need the global date in milliseconds,
so better calculate it and expose it rather than letting all call
places incorrectly retrieve it.
Here what we do is that we maintain a new globally monotonic timer,
global_now_ms, which ought to be very close to the global_now one,
but maintains the monotonic approach of now_ms between all threads
in that global_now_ms is always ahead of any now_ms.
This patch is made simple to ease backporting (it will be needed for
a subsequent fix), but it also opens the way to some simplifications
on the time handling: instead of computing the local time and trying
to force it to the global one, we should soon be able to proceed in
the opposite way, that is computing the new global time an making the
local one just the latest snapshot of it. This will bring the benefit
of making sure that the global time is always ahead of the local one.
The pool_alloc_dirty() function was renamed to __pool_alloc() and now
takes a set of flags indicating whether poisonning is permitted or not
and whether zeroing the area is needed or not. The pool_alloc() function
is now just a wrapper calling __pool_alloc(pool, 0).
This one used to maintain a shortcut in the pools allocation path that
was only justified by b_alloc_fast() which was not used! Let's get rid
of it as well so that the allocator becomes a bit more straight forward.
It is never used anymore since 1.7 where it was used by b_alloc_margin()
then replaced by direct calls to the pools function, and it maintains a
dependency on the exposed pools functions. It's time to get rid of it,
as it's not even certain it still works.
The channel's buffer allocator, channel_alloc_buffer(), was still relying
on the principle of a margin for the request and not for the response.
But this margin stopped working around 1.7 with the introduction of the
content filters such as SPOE, and was completely anihilated with the
local pools that came with threads. Let's simplify this and just use
b_alloc().
Right now there is a discrepancy beteween b_alloc() and b_allow_margin():
the former forcefully overwrites the target pointer while the latter tests
it and returns it as-is if already allocated.
As a matter of fact, all callers of b_alloc() either preliminary test the
buffer, or assume it's already null.
Let's remove this pain and make the function test the buffer's allocation
before doing it again, and match call places' expectations.
Some parts of the Lua are non-reentrant. We must be sure to carefully track
these parts to not dump the lua stack when it is interrupted inside such
parts. For now, we only identified the custom lua allocator. If the thread
is interrupted during the memory allocation, we must not try to print the
lua stack wich also allocate memory. Indeed, realloc() is not
async-signal-safe.
In this patch we introduce a thread-local counter. It is incremented before
entering in a non-reentrant part and decremented when exiting. It is only
performed in hlua_alloc() for now.
Now that connections aren't being reused when they failed, remove the
reset() method. It was unimplemented anywhere, except for H1 where it did
nothing, anyway.
Introduce a new XPRT method, start(). The init() method will now only
initialize whatever is needed for the XPRT to run, but any action the XPRT
has to do before being ready, such as handshakes, will be done in the new
start() method. That way, we will be sure the full stack of xprt will be
initialized before attempting to do anything.
The init() call is also moved to conn_prepare(). There's no longer any reason
to wait for the ctrl to be ready, any action will be deferred until start(),
anyway. This means conn_xprt_init() is no longer needed.
Remove static qualifier on stats_allocate_proxy_counters_internal. This
function will be used to allocate extra counters at runtime for dynamic
servers.
Prepare the server parsing API to support dynamic servers.
- define a new parsing flag to be used for dynamic servers
- each keyword contains a new field dynamic_ok to indicate if it can be
used for a dynamic server. For now, no keyword are supported.
- do not copy settings from the default server for a new dynamic server.
- a dynamic server is created in a maintenance mode and requires an
explicit 'enable server' command.
- a new server flag named SRV_F_DYNAMIC is created. This flag is set for
all servers created at runtime. It might be useful later, for example
to know if a server can be purged.
Modify the API of parse_server function. Use flags to describe the type
of the parsed server instead of discrete arguments. These flags can be
used to specify if a server/default-server/server-template is parsed.
Additional parameters are also specified (parsing of the address
required, resolve of a name must be done immediately).
It is now unneeded to use strcmp on args[0] in parse_server. Also, the
calls to parse_server are more explicit thanks to the flags.
The freq counters were using the thread's own time as the start of the
current period. The problem is that in case of contention, it was
occasionally possible to perform non-monotonic updates on the edge of
the next second, because if the upfront thread updates a counter first,
it causes a rotation, then the second thread loses the race from its
older time, and tries again, and detects a different time again, but
in the past so it only updates the counter, then a third thread on the
new date would detect a change again, thus provoking a rotation again.
The effect was triple:
- rare loss of stored values during certain transitions from one
period to the next one, causing counters to report 0
- half of the threads forced to go through the slow path every second
- difficult convergence when using many threads where the CAS can fail
a lot and we can observe N(N-1) attempts for N threads to complete
This patch fixes this issue in two ways:
- first, it now makes use og the monotonic global_now value which also
happens to be volatile and to carry the latest known time; this way
time will never jump backwards anymore and only the first thread
updates it on transition, the other ones do not need to.
- second, re-read the time in the loop after each failure, because
if the date changed in the counter, it means that one thread knows
a more recent one and we need to update. In this case if it matches
the new current second, the fast path is usable.
This patch relies on previous patch "MINOR: time: export the global_now
variable" and must be backported as far as 1.8.
This is the process-wide monotonic time that is used to update each
thread's own time. It may be required at a few places where a strictly
monotonic clock is required such as freq_ctr. It will be have to be
backported as a dependency of a forthcoming fix.
Some are not always easy to spot with "chk" vs "check" or hyphens at
some places and not at others. Now entering "option http-close" properly
suggests "httpclose" and "option tcp-chk" suggests "tcp-check". There's
no need to consider the proxy's capabilities, what matters is to figure
what related word the user tried to spell, and there are not that many
options anyway.
The distance between two words can be high due to a sub-word being missing
and in this case it happens that other totally unrealted words are proposed
because their average score looks lower thanks to being shorter. Here we're
introducing the notion of presence of each character so that word sequences
that contain existing sub-words are favored against the shorter ones having
nothing in common. In addition we do not distinguish being/end from a
regular delimitor anymore. That made it harder to spot inverted words.