No need to keep this flag apart any more, let's merge it into the global
state. The CLI's output state was extended to 6 digits and the linger/cloned
flags moved inside the parenthesis.
For a long time we've had fdtab[].ev and fdtab[].state which contain two
arbitrary sets of information, one is mostly the configuration plus some
shutdown reports and the other one is the latest polling status report
which also contains some sticky error and shutdown reports.
These ones used to be stored into distinct chars, complicating certain
operations and not even allowing to clearly see concurrent accesses (e.g.
fd_delete_orphan() would set the state to zero while fd_insert() would
only set the event to zero).
This patch creates a single uint with the two sets in it, still delimited
at the byte level for better readability. The original FD_EV_* values
remained at the lowest bit levels as they are also known by their bit
value. The next step will consist in merging the remaining bits into it.
The whole bits are now cleared both in fd_insert() and _fd_delete_orphan()
because after a complete check, it is certain that in both cases these
functions are the only ones touching these areas. Indeed, for
_fd_delete_orphan(), the thread_mask has already been zeroed before a
poller can call fd_update_event() which would touch the state, so it
is certain that _fd_delete_orphan() is alone. Regarding fd_insert(),
only one thread will get an FD at any moment, and it as this FD has
already been released by _fd_delete_orphan() by definition it is certain
that previous users have definitely stopped touching it.
Strictly speaking there's no need for clearing the state again in
fd_insert() but it's cheap and will remove some doubts during some
troubleshooting sessions.
In preparation of merging FD_POLL* and FD_EV*, this only changes the
value of FD_POLL_* to use bits 8-15 (the second byte). The size of the
field has been temporarily extended to 32 bits already, as well as
the temporary variables that carry the new composite value inside
fd_update_events(). The resulting fdtab entry becomes temporarily
unaligned. All places making access to .ev or FD_POLL_* were carefully
inspected to make sure they were safe regarding this change. Only one
temporary update was needed for the "show fd" code. The code was only
slightly inflated at this step.
The former was not used and the second was used only as a positive mask
of the flags to keep instead of having the flags that are updated. Both
were removed in favor of a new FD_POLL_UPDT_MASK that only mentions the
updated flags. This will ease merging of state and ev later.
This patch registers the parsed file and the line where a log server
is declared to make those information available in configuration
post check.
Those new informations were added on error messages probed resolving
ring names on post configuration check.
Define MODE_DIAG which is used to run haproxy in diagnostic mode. This
mode is used to output extra warnings about possible configuration
blunder or sub-optimal usage. It can be activated with argument '-dD'.
A new output function ha_diag_warning is implemented reserved for
diagnostic output. It serves to standardize the format of diagnostic
messages.
A macro HA_DIAG_WARN_COND is also available to automatically check if
diagnostic mode is on before executing the diagnostic check.
Historically, an option was added to wait for the request payload (option
http-buffer-request). This option has 2 drawbacks. First, it is an ON/OFF
option for the whole proxy. It cannot be enabled on demand depending on the
message. Then, as its name suggests, it only works on the request side. The
only option to wait for the response payload was to write a dedicated
filter. While it is an acceptable solution for complex applications, it is a
bit overkill to simply match strings in the body.
To make everyone happy, this patch adds a dedicated HTTP action to wait for
the message payload, for the request or the response depending it is used in
an http-request or an http-response ruleset. The time to wait is
configurable and, optionally, the minimum payload size to have before stop
to wait.
Both the http action and the old http analyzer rely on the same internal
function.
L6 sample fetches are now ignored when called from an HTTP proxy. Thus, a
warning is emitted during the startup if such usage is detected. It is true
for most ACLs and for log-format strings. Unfortunately, it is a bit painful
to do so for sample expressions.
This patch relies on the commit "MINOR: action: Use a generic function to
check validity of an action rule list".
The check_action_rules() function is now used to check the validity of an
action rule list. It is used from check_config_validity() function to check
L5/6/7 rulesets.
It is now possible to perform HTTP upgrades on a TCP stream from the
frontend side. To do so, a tcp-request content rule must be defined with the
switch-mode action, specifying the mode (for now, only http is supported)
and optionnaly the proto (h1 or h2).
This way it could be possible to set HTTP directives on a TCP frontend which
will only be evaluated if an upgrade is performed. This new way to perform
HTTP upgrades should replace progressively the old way, consisting to route
the request to an HTTP backend. And it should be also a good start to remove
all HTTP processing from tcp-request content rules.
This action is terminal, it stops the ruleset evaluation. It is only
available on proxy with the frontend capability.
The configuration manual has been updated accordingly.
The code responsible to perform an HTTP upgrade from a TCP stream is moved
in a dedicated function, stream_set_http_mode().
The stream_set_backend() function is slightly updated, especially to
correctly set the request analysers.
Now allocation and initialization of HTTP transactions are performed in a
unique function. Historically, there were two functions because the same TXN
was reset for K/A connections in the legacy HTTP mode. Now, in HTX, K/A
connections are handled at the mux level. A new stream, and thus a new TXN,
is created for each request. In addition, the function responsible to end
the TXN is now also reponsible to release it.
So, now, http_create_txn() and http_destroy_txn() must be used to create and
destroy an HTTP transaction.
MX_FL_NO_UPG flag may now be set on a multiplexer to explicitly disable
upgrades from this mux. For now, it is set on the FCGI multiplexer because
it is not supported and there is no upgrade on backend-only multiplexers. It
is also set on the H2 multiplexer because it is clearly not supported.
Historically we've used SOL_IP/SOL_IPV6/SOL_TCP everywhere as the socket
level value in getsockopt() and setsockopt() but as we've seen over time
it regularly broke the build and required to have them defined to their
IPPROTO_* equivalent. The Linux ip(7) man page says:
Using the SOL_IP socket options level isn't portable; BSD-based
stacks use the IPPROTO_IP level.
And it indeed looks like a pure linuxism inherited from old examples and
documentation. strace also reports SOL_* instead of IPPROTO_*, which does
not help... A check to linux/in.h shows they have the same values. Only
SOL_SOCKET and other non-IP values make sense since there is no IPPROTO
equivalent.
Let's get rid of this annoying confusion by removing all redefinitions of
SOL_IP/IPV6/TCP and using IPPROTO_* instead, just like any other operating
system. This also removes duplicated tests for the same value.
Note that this should not result in exposing syscalls to other OSes
as the only ones that were still conditionned to SOL_IPV6 were for
IPV6_UNICAST_HOPS which already had an IPPROTO_IPV6 equivalent, and
IPV6_TRANSPARENT which is Linux-specific.
Very often we use "int" where negative numbers are not needed (and can
further cause trouble) just because it's painful to type "unsigned int"
or "unsigned", or ugly to use in function arguments. Similarly sometimes
chars would absolutely need to be signed but nobody types "signed char".
Let's add a few aliases for such types and make them part of the standard
internal API so that over time we can get used to them and get rid of
horrible definitions. A comment also reminds some commonly available
types and their properties regarding other types.
In order to process samples from the command line interface we'll need
rules as well, and these rules will have to be marked as coming from
the CLI parser. This new origin is used for this.
In order to prepare for supporting calling sample expressions from the
CLI, let's create a new CLI_PARSER parsing context. This one supports
constants and internal samples only.
In order to process samples from the config file we'll need rules as
well, and these rules will have to be marked as coming from the
config parser. This new origin is used for this.
We'd sometimes like to be able to process samples while parsing
the configuration based on purely internal thing but that's not
possible right now. Let's add a new CFG_PARSER context for samples
which only permits constant samples (i.e. those which do not change
in the process' life and which are stable during config parsing).
This level indicates that everything it constant in the expression during
the whole process' life and that it may safely be used at config parsing
time.
For now smp_resolve_args() complains on stderr via ha_alert(), but if we
want to make it a bit more dynamic, we need it to return errors in an
allocated message. Let's pass it an error pointer and have it fill it.
On return we indent the output if it contains more than one line.
Define a new cap PR_CAP_LUA. It can be used to allocate the internal
proxy for lua Socket class. This cap overrides default settings for
preferable values in the lua context.
Move all liberation code related to a proxy in a dedicated function
free_proxy in proxy.c. For now, this function is only called in
haproxy.c. In the future, it will be used to free the lua proxy.
This helps to clean up haproxy.c.
Create a new function parse_new_proxy specifically designed to allocate
a new proxy from the configuration file and copy settings from the
default proxy.
The function alloc_new_proxy is reduced to a minimal allocation. It is
used for default proxy allocation and could also be used for internal
proxies such as the lua Socket proxy.
Move deinit_acl_cond and deinit_act_rules from haproxy.c respectively in
acl.c and action.c. The name of the functions has been slightly altered,
replacing the prefix deinit_* by free_* to reflect their purpose more
clearly.
This change has been made in preparation to the implementation of a free
proxy function. As a side-effect, it helps to clean up haproxy.c.
Create a new module init which contains code related to REGISTER_*
macros for initcalls. init.h is included in api.h to make init code
available to all modules.
It's a step to clean up a bit haproxy.c/global.h.
The default SSL_CTX used by a specific frontend is the one of the first
ckch instance created for this frontend. If this instance has SNIs, then
the SSL context is linked to the instance through the list of SNIs
contained in it. If the instance does not have any SNIs though, then the
SSL_CTX is only referenced by the bind_conf structure and the instance
itself has no link to it.
When trying to update a certificate used by the default instance through
a cli command, a new version of the default instance was rebuilt but the
default SSL context referenced in the bind_conf structure would not be
changed, resulting in a buggy behavior in which depending on the SNI
used by the client, he could either use the new version of the updated
certificate or the original one.
This patch adds a reference to the default SSL context in the default
ckch instances so that it can be hot swapped during a certificate
update.
This should fix GitHub issue #1143.
It can be backported as far as 2.2.
Christopher discovered an issue mostly affecting 2.2 and to a less extent
2.3 and above, which is that it's possible to deadlock a soft-stop when
several threads are using a same listener:
thread1 thread2
unbind_listener() fd_set_running()
lock(listener) listener_accept()
fd_delete() lock(listener)
while (running_mask); -----> deadlock
unlock(listener)
This simple case disappeared from 2.3 due to the removal of some locked
operations at the end of listener_accept() on the regular path, but the
architectural problem is still here and caused by a lock inversion built
around the loop on running_mask in fd_clr_running_excl(), because there
are situations where the caller of fd_delete() may hold a lock that is
preventing other threads from dropping their bit in running_mask.
The real need here is to make sure the last user deletes the FD. We have
all we need to know the last one, it's the one calling fd_clr_running()
last, or entering fd_delete() last, both of which can be summed up as
the last one calling fd_clr_running() if fd_delete() calls fd_clr_running()
at the end. And we can prevent new threads from appearing in running_mask
by removing their bits in thread_mask.
So what this patch does is that it sets the running_mask for the thread
in fd_delete(), clears the thread_mask, thus marking the FD as orphaned,
then clears the running mask again, and completes the deletion if it was
the last one. If it was not, another thread will pass through fd_clr_running
and will complete the deletion of the FD.
The bug is easily reproducible in 2.2 under high connection rates during
soft close. When the old process stops its listener, occasionally two
threads will deadlock and the old process will then be killed by the
watchdog. It's strongly believed that similar situations do exist in 2.3
and 2.4 (e.g. if the removal attempt happens during resume_listener()
called from listener_accept()) but if so, they should be much harder to
trigger.
This should be backported to 2.2 as the issue appeared with the FD
migration. It requires previous patches "fd: make fd_clr_running() return
the remaining running mask" and "MINOR: fd: remove the unneeded running
bit from fd_insert()".
Notes for backport: in 2.2, the fd_dodelete() function requires an extra
argument "do_close" indicating whether we want to remove and close the FD
(fd_delete) or just delete it (fd_remove). While this information is not
conveyed along the chain, we know that late calls always imply do_close=1
become do_close=0 exclusively results from fd_remove() which is only used
by the config parser and the master, both of which are single-threaded,
hence are always the last ones in the running_mask. Thus it is safe to
assume that a postponed FD deletion always implies do_close=1.
Thanks to Olivier for his help in designing this optimal solution.
There's no point taking the running bit in fd_insert() since by
definition there will never be more than one thread inserting the FD,
and that fd_insert() may only be done after the fd was allocated by
the system, indicating the end of use by any other thread.
This will need to be backported to 2.2 to fix an issue.
We'll need to know that a thread is the last one to use an fd, so let's
make fd_clr_running() return the remaining bits after removal. Note that
in practice we're only interested in knowing if it's zero but the compiler
doesn't make use of the clags after the AND and emits a CMPXCHG anyway :-/
This will need to be backported to 2.2 to fix an issue.
The commit reverts following commits:
* 83926a04 BUG/MEDIUM: debug/lua: Don't dump the lua stack if not dumpable
* a61789a1 MEDIUM: lua: Use a per-thread counter to track some non-reentrant parts of lua
Instead of relying on a Lua function to print the lua traceback into the
debugger, we are now using our own internal function (hlua_traceback()).
This one does not allocate memory and use a chunk instead. This avoids any
issue with a possible deadlock in the memory allocator because the thread
processing was interrupted during a memory allocation.
This patch relies on the commit "BUG/MEDIUM: debug/lua: Use internal hlua
function to dump the lua traceback". Both must be backported wherever the
patches above are backported, thus as far as 2.0
The separator string is now configurable, passing it as parameter when the
function is called. In addition, the message have been slightly changed to
be a bit more readable.
If an unknown CA file was first mentioned in an "add ssl crt-list" CLI
command, it would result in a call to X509_STORE_load_locations which
performs a disk access which is forbidden during runtime. The same would
happen if a "ca-verify-file" or "crl-file" was specified. This was due
to the fact that the crt-list file parsing and the crt-list related CLI
commands parsing use the same functions.
The patch simply adds a new parameter to all the ssl_bind parsing
functions so that they know if the call is made during init or by the
CLI, and the ssl_store_load_locations function can then reject any new
cafile_entry creation coming from a CLI call.
It can be backported as far as 2.2.
str2sa_range function options PA_O_DGRAM and PA_O_STREAM are used to
define the supported address types but also to set the default type
if it is not explicit. If the used address support both STREAM and DGRAM,
the default was always set to STREAM.
This patch introduce a new option PA_O_DEFAULT_DGRAM to force the
default to DGRAM type if it is not explicit in the address field
and both STREAM and DGRAM are supported. If only DGRAM or only STREAM
is supported, it continues to be considered as the default.
In commit a1ecbca0a ("BUG/MINOR: freq_ctr/threads: make use of the last
updated global time"), for period-based counters, the millisecond part
of the global_now variable was used as the date for the new period. But
it's wrong, it only works with sub-second periods as it wraps every
second, and for other periods the counters never rotate anymore.
Let's make use of the newly introduced global_now_ms variable instead,
which contains the global monotonic time expressed in milliseconds.
This patch needs to be backported wherever the patch above is backported.
It depends on previous commit "MINOR: time: also provide a global,
monotonic global_now_ms timer".
The period-based freq counters need the global date in milliseconds,
so better calculate it and expose it rather than letting all call
places incorrectly retrieve it.
Here what we do is that we maintain a new globally monotonic timer,
global_now_ms, which ought to be very close to the global_now one,
but maintains the monotonic approach of now_ms between all threads
in that global_now_ms is always ahead of any now_ms.
This patch is made simple to ease backporting (it will be needed for
a subsequent fix), but it also opens the way to some simplifications
on the time handling: instead of computing the local time and trying
to force it to the global one, we should soon be able to proceed in
the opposite way, that is computing the new global time an making the
local one just the latest snapshot of it. This will bring the benefit
of making sure that the global time is always ahead of the local one.
The pool_alloc_dirty() function was renamed to __pool_alloc() and now
takes a set of flags indicating whether poisonning is permitted or not
and whether zeroing the area is needed or not. The pool_alloc() function
is now just a wrapper calling __pool_alloc(pool, 0).
This one used to maintain a shortcut in the pools allocation path that
was only justified by b_alloc_fast() which was not used! Let's get rid
of it as well so that the allocator becomes a bit more straight forward.
It is never used anymore since 1.7 where it was used by b_alloc_margin()
then replaced by direct calls to the pools function, and it maintains a
dependency on the exposed pools functions. It's time to get rid of it,
as it's not even certain it still works.
The channel's buffer allocator, channel_alloc_buffer(), was still relying
on the principle of a margin for the request and not for the response.
But this margin stopped working around 1.7 with the introduction of the
content filters such as SPOE, and was completely anihilated with the
local pools that came with threads. Let's simplify this and just use
b_alloc().