The idle conn task is is a global task used to cleanup backend
connections marked for deletion. Previously, it was only only allocated
if at least one server in the configuration has idle connections.
This assumption won't be valid anymore when new servers can be created
at runtime with idle connections. Always allocate the global idle conn
task.
Declaring a master CLI socket without activating the master-worker mode
is likely a user error, so we issue a warning.
This patch can be backported as far as 1.8.
If the configuration file contains a 'unix-bind prefix' directive, and
if we use the -S option and specify a UNIX socket path, the path of the
socket will be prepended with the value of the unix-bind prefix.
For instance, if we have 'unix-bind prefix /tmp/sockets/' and we use
'-S /tmp/master-socket' on the command line, we will get this error:
Starting proxy MASTER:
cannot bind UNIX socket (No such file or directory) [/tmp/sockets/tmp/master-socket]
So this patch adds an exception, and will ignore the unix-bind prefix
for the master CLI socket.
This patch can be backported as far as 1.9.
The freq counters were using the thread's own time as the start of the
current period. The problem is that in case of contention, it was
occasionally possible to perform non-monotonic updates on the edge of
the next second, because if the upfront thread updates a counter first,
it causes a rotation, then the second thread loses the race from its
older time, and tries again, and detects a different time again, but
in the past so it only updates the counter, then a third thread on the
new date would detect a change again, thus provoking a rotation again.
The effect was triple:
- rare loss of stored values during certain transitions from one
period to the next one, causing counters to report 0
- half of the threads forced to go through the slow path every second
- difficult convergence when using many threads where the CAS can fail
a lot and we can observe N(N-1) attempts for N threads to complete
This patch fixes this issue in two ways:
- first, it now makes use og the monotonic global_now value which also
happens to be volatile and to carry the latest known time; this way
time will never jump backwards anymore and only the first thread
updates it on transition, the other ones do not need to.
- second, re-read the time in the loop after each failure, because
if the date changed in the counter, it means that one thread knows
a more recent one and we need to update. In this case if it matches
the new current second, the fast path is usable.
This patch relies on previous patch "MINOR: time: export the global_now
variable" and must be backported as far as 1.8.
This is the process-wide monotonic time that is used to update each
thread's own time. It may be required at a few places where a strictly
monotonic clock is required such as freq_ctr. It will be have to be
backported as a dependency of a forthcoming fix.
DNS hostname comparisons were fixed to be case-insensitive (see b17b88487
"BUG/MEDIUM: dns: Consider the fact that dns answers are
case-insensitive"). However 2 comparisons are still case-sensitive.
This patch must be backported as far as 1.8.
Some are not always easy to spot with "chk" vs "check" or hyphens at
some places and not at others. Now entering "option http-close" properly
suggests "httpclose" and "option tcp-chk" suggests "tcp-check". There's
no need to consider the proxy's capabilities, what matters is to figure
what related word the user tried to spell, and there are not that many
options anyway.
Now the suggested keywords are sorted with the most relevant ones first
instead of scanning them all in registration order and only dumping the
proposed ones:
- "tra"
trace <module> [cmd [args...]] : manage live tracing
operator : lower the level of the current CLI session to operator
user : lower the level of the current CLI session to user
show trace [<module>] : show live tracing state
- "pool"
show pools : report information about the memory pools usage
add acl : add acl entry
del map : delete map entry
user : lower the level of the current CLI session to user
del acl : delete acl entry
- "sh ta"
show stat : report counters for each proxy and server [desc|json|no-maint|typed|up]*
show tasks : show running tasks
set table [id] : update or create a table entry's data
show table [id]: report table usage stats or dump this table's contents
trace <module> [cmd [args...]] : manage live tracing
- "sh state"
show stat : report counters for each proxy and server [desc|json|no-maint|typed|up]*
set table [id] : update or create a table entry's data
show table [id]: report table usage stats or dump this table's contents
show servers state [id]: dump volatile server information (for backend <id>)
show sess [id] : report the list of current sessions or dump this session
Till now the fuzzy matching would only work on the same number of words,
but this doesn't account for commands like "show servers conn" which
involve 3 words and were not proposed when entering only "show conn".
Let's improve the situation by building the two fingerprints separately
for the correct keyword sequence and the entered one, then compare them.
This can result in slightly larger variations due to the different string
lengths but is easily compensated for. Thanks to this, we can now see
"show servers conn" when entering "show conn", and the following choices
are relevant to correct typos:
- "show foo"
show sess [id] : report the list of current sessions or dump this session
show info : report information about the running process [desc|json|typed]*
show env [var] : dump environment variables known to the process
show fd [num] : dump list of file descriptors in use
show pools : report information about the memory pools usage
- "show stuff"
show sess [id] : report the list of current sessions or dump this session
show info : report information about the running process [desc|json|typed]*
show stat : report counters for each proxy and server [desc|json|no-maint|typed|up]*
show fd [num] : dump list of file descriptors in use
show tasks : show running tasks
- "show stafe"
show sess [id] : report the list of current sessions or dump this session
show stat : report counters for each proxy and server [desc|json|no-maint|typed|up]*
show fd [num] : dump list of file descriptors in use
show table [id]: report table usage stats or dump this table's contents
show tasks : show running tasks
- "show state"
show stat : report counters for each proxy and server [desc|json|no-maint|typed|up]*
show servers state [id]: dump volatile server information (for backend <id>)
It's still visible that the shorter ones continue to easily match, such
as "show sess" not having much in common with "show foo" but what matters
is that the best candidates are definitely relevant. Probably that listing
them in match order would further help.
While sums of squares usually give excellent results in fixed-sise
patterns, they don't work well to compare different sized ones such
as when some sub-words are missing, because a word such as "server"
contains "er" twice, which will rsult in an extra distance of at
least 4 for just this e->r transition compared to another one missing
it. This is one of the main reasons why "show conn" only proposes
"show info" on the CLI. Maybe an improved approach consisting in
using squares only for exact same lengths would work, but it would
still make it difficult to spot reversed characters.
The distance between two words can be high due to a sub-word being missing
and in this case it happens that other totally unrealted words are proposed
because their average score looks lower thanks to being shorter. Here we're
introducing the notion of presence of each character so that word sequences
that contain existing sub-words are favored against the shorter ones having
nothing in common. In addition we do not distinguish being/end from a
regular delimitor anymore. That made it harder to spot inverted words.
In commit a0e8eb8ca ("MINOR: cfgparse: suggest correct spelling for
unknown words in global section") we got the ability to locate a better
matching word in case of error. But it mistakenly used the CFG_LISTEN
class of words instead of CFG_GLOBAL, resulting in proposing unsuitable
matches in addition to the long hard-coded list. Now, "tune.dh-param"
correctly proposes "tune.ssl.default-dh-param".
No backport is needed.
I somehow managed to re-break the "help" command in b736458bf ("MEDIUM:
cli: apply spelling fixes for known commands before listing them")
after fixing it once. A null-deref happens when checking the args
early in the processing.
No backport is needed as this was introduced in 2.4-dev12.
Released version 2.4-dev12 with the following main changes :
- CLEANUP: connection: Use `VAR_ARRAY` in `struct tlv` definition
- CLEANUP: connection: Remove useless test for NULL before calling `pool_free()`
- CLEANUP: connection: Use istptr / istlen for proxy_unique_id
- MINOR: connection: Use a `struct ist` to store proxy_authority
- CLEANUP: connection: Consistently use `struct ist` to process all TLV types
- BUILD: task: fix build at -O0 with threads disabled
- BUILD: bug: refine HA_LINK_ERROR() to only be used on gcc and derivatives
- CLEANUP: config: make the cfg_keyword parsers take a const for the defproxy
- BUILD: connection: do not use VAR_ARRAY in struct tlv
- BUG/MEDIUM: session: NULL dereference possible when accessing the listener
- MINOR: build: force CC to set a return code when probing options
- CLEANUP: stream: rename a few remaining occurrences of "stream *sess"
- BUG/MEDIUM: resolvers: handle huge responses over tcp servers.
- CLEANUP: config: also address the cfg_keyword API change in the compression code
- BUG/MEDIUM: ssl: properly remove the TASK_HEAVY flag at end of handshake
- BUG/MINOR: sample: Rename SenderComID/TargetComID to SenderCompID/TargetCompID
- MINOR: task: give the scheduler a bit more flexibility in the runqueue size
- OPTIM: task: automatically adjust the default runqueue-depth to the threads
- BUG/MINOR: connection: Missing QUIC initialization
- BUG/MEDIUM: stick-tables: fix ref counter in table entry using multiple http tracksc.
- BUILD: atomic/arm64: force the register pairs to use in __ha_cas_dw()
- BUG/MEDIUM: filters: Set CF_FL_ANALYZE on channels when filters are attached
- BUG/MINOR: tcpcheck: Update .health threshold of agent inside an agent-check
- BUG/MINOR: proxy/session: Be sure to have a listener to increment its counters
- BUG/MINOR: tcpcheck: Fix double free on error path when parsing tcp/http-check
- BUG/MINOR: server-state: properly handle the case where the base is not set
- BUG/MINOR: server-state: use the argument, not the global state
- CLEANUP: tcp-rules: add missing actions in the tcp-request error message
- CLEANUP: vars: make the error message clearer on missing arguments for set-var
- CLEANUP: http-rules: remove the unexpected comma before the list of action keywords
- CLEANUP: actions: the keyword must always be const from the rule
- MINOR: tools: add simple word fingerprinting to find similar-looking words
- MINOR: cfgparse: add cfg_find_best_match() to suggest an existing word
- MINOR: cfgparse: suggest correct spelling for unknown words in proxy sections
- MINOR: cfgparse: suggest correct spelling for unknown words in global section
- MINOR: cfgparse/server: try to fix spelling mistakes on server lines
- MINOR: cfgparse/bind: suggest correct spelling for unknown bind keywords
- MINOR: actions: add a function to suggest an action ressembling a given word
- MINOR: http-rules: suggest approaching action names on mismatch
- MINOR: tcp-rules: suggest approaching action names on mismatch
- BUG/MINOR: cfgparse/server: increment the extra keyword counter one at a time
- Revert "BUG/MINOR: resolvers: Only renew TTL for SRV records with an additional record"
- BUG/MINOR: resolvers: Consider server to have no IP on DNS resolution error
- BUG/MINOR: resolvers: Reset server address on DNS error only on status change
- BUG/MINOR: resolvers: Unlink DNS resolution to set RMAINT on SRV resolution
- BUG/MEDIUM: resolvers: Don't set an address-less server as UP
- BUG/MEDIUM: resolvers: Fix the loop looking for an existing ADD item
- MINOR: resolvers: new function find_srvrq_answer_record()
- BUG/MINOR; resolvers: Ignore DNS resolution for expired SRV item
- BUG/MEDIUM: resolvers: Trigger a DNS resolution if an ADD item is obsolete
- MINOR: resolvers: Use a function to remove answers attached to a resolution
- MINOR: resolvers: Purge answer items when a SRV resolution triggers an error
- MINOR: resolvers: Add function to change the srv status based on SRV resolution
- MINOR: resolvers: Directly call srvrq_update_srv_state() when possible
- BUG/MEDIUM: resolvers: Don't release resolution from a requester callbacks
- BUG/MEDIUM: resolvers: Skip DNS resolution at startup if SRV resolution is set
- MINOR: resolvers: Use milliseconds for cached items in resolver responses
- MINOR: resolvers: Don't try to match immediatly renewed ADD items
- CLEANUP: resolvers: Use ha_free() in srvrq_resolution_error_cb()
- CLEANUP: resolvers: Perform unsafe loop on requester list when possible
- BUG/MINOR: cli: make sure "help", "prompt", "quit" are enabled at master level
- CLEANUP: cli: fix misleading comment and better indent the access level flags
- MINOR: cli: set the ACCESS_MASTER* bits on the master bind_conf
- MINOR: cli: test the appctx level for master access instead of comparing pointers
- MINOR: cli: print the error message in the parser function itself
- MINOR: cli: filter the list of commands to the matching part
- MEDIUM: cli: apply spelling fixes for known commands before listing them
- MINOR: tools: add the ability to update a word fingerprint
- MINOR: cli: apply the fuzzy matching on the whole command instead of words
- CLEANUP: cli: rename MAX_STATS_ARGS to MAX_CLI_ARGS
- CLEANUP: cli: rename the last few "stats_" to "cli_"
- CLEANUP: task: make sure tasklet handlers always indicate their statuses
- CLEANUP: assorted typo fixes in the code and comments
When tasklets were derived from tasks, there was no immediate need for
the scheduler to know their status after execution, and in a spirit of
simplicity they just started to always return NULL. The problem is that
it simply prevents the scheduler from 1) accounting their execution time,
and 2) keeping track of their current execution status. Indeed, a remote
wake-up could very well end up manipulating a tasklet that's currently
being executed. And this is the reason why those handlers have to take
the idle lock before checking their context.
In 2.5 we'll take care of making tasklets and tasks work more similarly,
but trouble is to be expected if we continue to propagate the trend of
returning NULL everywhere, especially if some fixes relying on a stricter
model later need to be backported. For this reason this patch updates all
known tasklet handlers to make them return NULL only when the tasklet was
freed. It has no effect for now and isn't even guaranteed to always be
100% safe but it puts the code into the right direction for this.
There were still a very small list of functions, variables and fields
called "stats_" while they were really purely CLI-centric. There's the
frontend called "stats_fe" in the global section, which instantiates a
"cli_applet" called "<CLI>" so it was renamed "cli_fe".
The "alloc_stats_fe" function cas renamed to "cli_alloc_fe" which also
better matches the naming convention of all cli-specific functions.
Finally the "stats_permission_denied_msg" used to return an error on
the CLI was renamed "cli_permission_denied_msg".
Now there's no more "stats_something" that designates the CLI.
This is the number of args accepted on a command received on the CLI,
is has long been totally independent of stats and should not carry
this misleading "stats" name anymore.
Now instead of comparing words at an exact position, we build a fingerprint
made of all of them, so that we can check for them in any position. For
example, "show conn serv" finds "show servers conn" and that "set servers
maxconn" proposes both "set server" and "set maxconn servers".
Instead of making a new one from scratch, let's support not wiping the
existing fingerprint and updating it, and to do the same char by char.
The word-by-word one will still result in multiple beginnings and ends,
but that will accurately translate word boundaries. The char-based one
has more flexibility and requires that the caller maintains the previous
char to indicate the transition, which also allows to insert delimiters
for example.
Entering "show tls" would still emit 35 entries. By measuring the distance
between all unknown words and the candidates, we can sort them and pick the
10 most likely candidates. This works reasonably well, as now "show tls"
only proposes "show tls-keys", "show threads", "show pools" and "show tasks".
If the distance is still too high or if a word is missing, the whole
prefix list continues to be dumped, thus "show" alone will still report
the entire list of commands beginning with "show".
It's still impossible to skip a word, for example "show conn" will not
propose "show servers conn" because the distance is calculated for each
word individually. Some changes to the distance calculation to support
updating an existing map could easily address this. But this is already
a great improvement.
The error message on the CLI has become unreadable due to the long list
and it's not even sorted, making it even harder to figure the right
command.
This patch starts by looking if some of the words match something known,
and if so, will limit the listing only to those commands that start like
the current one. The "help", "prompt" and "quit" commands are always
shown to help the user try something else. Now thanks to this, typing
"add" or "del" will only list "add acl", "add map" and not 50 lines
anymore.
As a small bonus, we won't print "Unknown command" anymore in response
to the "help" command.
By doing so we can report more accurate information about what's wrong.
As a first step, we already distinguish the case of expert-only commands
from other ones.
Now that the appctx contains the master level, it greatly simplifies
all the tests, as we can simply verify that keyword levels match the
effective level without having to cheat with applet pointers. This
also allows to fold the expert test in them.
Right now the code is a bit hackish, it tests for the keyword's level
flags but checks the applet's origin to compare the bits. Let's start
by properly setting the ACCESS_MASTER_ONLY and ACCESS_MASTER flags on
the master CLI's bind_conf so that they are automatically present
all the time.
It was mentioned that ACCESS_MASTER_ONLY as for workers only instead of
master-only. And it wasn't clear that all ACCESS_* would belong to the
same thing.
These 3 commands are functionally valid both in master and worker CLIs.
However, while they do have a valid handler, they are not permitted by
the code and work partially by chance in the master:
- "prompt" and "quit" are intercepted by the request analyser
- "help" triggers an error, which results in displaying the error
message
Let's make sure they are permitted so that we don't count errors there and
that we can report appropriate help.
This bug has always been there but it doesn't have any functional effect
at the moment since "help" can only show the error message. As such, there
is no need to backport it.
The loop looking for existing ADD items to renew their last_seen must ignore
the items already renewed in the same loop. To do so, we rely on the
last_seen time. because it is now based on now_ms, it is safe.
Doing so avoid to match several time the same ADD item when the same IP
address is found in several ADD item. This reduces the number of extra DNS
resolutions.
This patch depends on "MINOR: resolvers: Use milliseconds for cached items
in resolver responses". Both may be backported as far as 2.2 if necessary.
The last time when an item was seen in a resolver responses is now stored in
milliseconds instead of seconds. This avoid some corner-cases at the
edges. This also simplifies time comparisons.
At startup, if a SRV resolution is set for a server, no DNS resolution is
created. We must wait the first SRV resolution to know if it must be
triggered. It is important to do so for two reasons.
First, during a "classical" startup, a server based on a SRV resolution has
no hostname. Thus the created DNS resolution is useless. Best waiting the
first SRV resolution. It is not really a bug at this stage, it is just
useless.
Second, in the same situation, if the server state is loaded from a file,
its hosname will be set a bit later. Thus, if there is no additionnal record
for this server, because there is already a DNS resolution, it inhibits any
new DNS resolution. But there is no hostname attached to the existing DNS
resolution. So no resolution is performed at all for this server.
To avoid any problem, it is fairly easier to handle this special case during
startup. But this means we must be prepared to have no "resolv_requester"
field for a server at runtime.
This patch must be backported as far as 2.2.
Another way to say it: "Safely unlink requester from a requester callbacks".
Requester callbacks must never try to unlink a requester from a resolution, for
the current requester or another one. First, these callback functions are called
in a loop on a request list, not necessarily safe. Thus unlink resolution at
this place, may be unsafe. And it is useless to try to make these loops safe
because, all this stuff is placed in a loop on a resolution list. Unlink a
requester may lead to release a resolution if it is the last requester.
However, the unkink is necessary because we cannot reset the server state
(hostname and IP) with some pending DNS resolution on it. So, to workaround
this issue, we introduce the "safe" unlink. It is only performed from a
requester callback. In this case, the unlink function never releases the
resolution, it only reset it if necessary. And when a resolution is found
with an empty requester list, it is released.
This patch depends on the following commits :
* MINOR: resolvers: Purge answer items when a SRV resolution triggers an error
* MINOR: resolvers: Use a function to remove answers attached to a resolution
* MINOR: resolvers: Directly call srvrq_update_srv_state() when possible
* MINOR: resolvers: Add function to change the srv status based on SRV resolution
All the series must be backported as far as 2.2. It fixes a regression
introduced by the commit b4badf720 ("BUG/MINOR: resolvers: new callback to
properly handle SRV record errors").
don't release resolution from requester cb
When the server status must be updated from the result of a SRV resolution,
we can directly call srvrq_update_srv_state(). It is simpler and this avoid
a test on the server DNS resolution.
This patch is mandatory for the next commit. It also rely on "MINOR:
resolvers: Directly call srvrq_update_srv_state() when possible".
srvrq_update_srv_status() update the server status based on result of SRV
resolution. For now, it is only used from snr_update_srv_status() when
appropriate.
When a SRV request trigger an error, if we decide to handle the error
because last_valid duration is expired, the answer list may be purged. All
items are considered as obsolete.
resolv_purge_resolution_answer_records() must be used to removed all answers
attached to a resolution. For now, it is only used when a resolution is
released.
When a ADD item attached to a SRV item is removed because it is obsolete, we
must trigger a DNS resolution to be sure the hostname still resolves or
not. There is no other way to be the entry is still valid. And we cannot set
the server in RMAINT immediatly, because a DNS server may be inconsitent and
may stop to add some additionnal records.
The opposite is also true. If a valid ADD item is still attached to a SRV
item, any DNS resolution must be stopped. There is no reason to perform
extra resolution in this case.
This patch must be backported as far as 2.2.
If no ADD item is found for a SRV item in a SRV response, a DNS resolution
is triggered. When it succeeds, we must be sure the SRV item is still
alive. Otherwise the DNS resolution must be ignored.
This patch depends on the commit "MINOR: resolvers: Move last_seen time of
an ADD into its corresponding SRV item". Both must be backported as far as
2.2.
This function search for a SRV answer item associated to a requester
whose type is server.
This is mainly useful to "link" a server to its SRV record when no
additional record were found to configure the IP address.
This patch is required by a bug fix.
For each ADD item found in a SRV response, we try to find a corresponding
ADD item already attached to an existing SRV item. If found, the ADD
last_seen time is updated, otherwise we try to find a SRV item with no ADD
to attached the new one.
However, the loop is buggy. Instead of comparing 2 ADD items, it compares
the new ADD item with the SRV item. Because of this bug, we are unable to
renew last_seen time of existing ADD.
This patch must be backported as far as 2.2.
when a server status is updated based on a SRV item, it is always set to UP,
regardless it has an IP address defined or not. For instance, if only a SRV
item is received, with no additional record, only the server hostname is
defined. We must wait to have an IP address to set the server as UP.
This patch must be backported as far as 2.2.
When a server is set in RMAINT becaues of a SRV resolution failure, the
server DNS resolution, if any, must be unlink first. It is mandatory to
handle the change in the context of a SRV resolution.
This patch must be backported as far as 2.2.
When a DNS resolution error is detected, in snr_resolution_error_cb(), the
server address must be reset only if the server status has changed. It this
case, it means the server is set to RMAINT. Thus the server address may by
reset.
This patch fixes a bug introduced by commit d127ffa9f ("BUG/MEDIUM:
resolvers: Reset address for unresolved servers"). It must be backported as
far as 2.0.
When an error is received for a DNS resolution, for instance a NXDOMAIN
error, the server must be considered to have no address when its status is
updated, not the opposite.
Concretly, because this parameter is not used on error path in
snr_update_srv_status(), there is no impact.
This patch must be backported as far as 1.8.
This reverts commit a331a1e8eb.
This commit fixes a real bug, but it also reveals some hidden bugs, mostly
because of some design issues. Thus, in itself, it create more problem than
it solves. So revert it for now. All known bugs will be addressed in next
commits.
This patch should be backported as far as 2.2.