There's no point in reporting a backend's up/down time if it has no
servers. The CSV output used to report "0" for a serverless backend
while the HTML version already removed the field. For servers, this
field is already omitted if checks are disabled. Let's uniformize
all of this and remove the field in CSV as well when irrelevant.
The HTML version doesn't report a check status when the server is in
maintenance since it can be quite old and irrelevant. The CSV forgot
to care about that, so let's do it here as well.
We don't want the HTML dump to rely on the server state. We
already have this piece of information in the status field by
checking that it starts with "DOWN".
It currently is really not convenient to have a state and a color detection
outside of the function and to use these ones inside. It makes it harder to
adjust the stats output based on the server state exactly. Let's move the
logic into the dump function itself.
Here we still have a huge amount of stuff to extract from the HTML code
and even from the caller. Indeed, the calling function computes the
server state and prepares a color code that will be used to determine
what style to use. The operations needed to decide what field to present
or not depend a lot on the server's state, admin state, health value,
rise and fall etc... all of which are not easily present in the table.
We also have to check the reference's values for all of the above.
There are also a number of differences between the CSV and HTML outputs :
- CSV always reports check duration, HTML only if not zero
- regarding last_change, CSV always report the server's while the HTML
considers either the server's or the reference based on the admin state.
- agent and health are separate in the CSV but mixed in the HTML.
- too few info on agent anyway.
After careful code inspection it happens that both sv->last_change and
ref->last_change are identical and can both derive from [LASTCHG].
Also, the following info are missing from the array to complete the HTML
code :
- cookie, address, status description, check-in-progress, srv->admin
At least for now it still works but a lot of info now need to be added.
This function now only fills the relevant fields with raw values and
calls stats_dump_fields_csv() for the CSV part. The output remains
exactly the same for now.
Some limits are only emitted if set, so the HTML part will have to
check for these being set.
A number of fields had to be built using printf-format strings, so
instead of allocating strings that risk not being freed, we use
chunk_newstr() and chunk_appendf().
Text strings are now copied verbatim in the stats fields so that only
the CSV dump encodes them as CSV. A single "out" chunk is used and cut
into multiple substrings using chunk_newstr() so that we don't need
distinct chunks for each stats field holding a string. The total amount
of data being emitted at once will never overflow the "out" chunk since
only a small part of it goes into the trash buffer which is the same size
and will thus overflow first.
One point of care is that failed_checks and down_trans were logged as
64-bit quantities on servers but they're 32-bit on proxies. This may
have to be changed later to unify them.
Some fields are still needed to complete the conversion :
- px->srv : used to take decisions when backend has no server (eg: print down or not)
- algo string (useful for CSV as well) // only if SHLGNDS
- cookie_name (useful for CSV as well) // only if SHLGNDS
- px->mode == HTTP (or px->mode as a string) // same for frontend
- px->be_counters.intercepted_req (stats and redirects ?)
The following field already has a place but was not presented in the
CSV output, so it should simply be added afterwards :
- px->be_counters.http.cum_req (was in HTML and missing from CSV)
This function now only fills the relevant fields with raw values and
calls stats_dump_fields_csv() for the CSV part. The output remains
exactly the same for now. It's worth noting that there are some
ambiguities between connections and sessions, for example cum_conn
is dumped into cum_sess. Additionally, there is a naming ambiguity
in that the internal "d_time" (time where the beginning of data
appeared) is called "rtime" in the output (response time) and they
actually are indeed the same.
The conversion still requires some elements which are not present in the
current fields :
- the HTML status may emit "WAITING"/"OPEN"/"FULL" while the CSV format
doesn't propose "WAITING", so this last one will have to be added.
- the HTML output emits the listening adresses when the ST_SHLGNDS flag
is set but this address field doesn't exist in the CSV format
- it's interesting to note that when the ST_SHLGNDS flag is not set, the
HTML output doesn't provide the listener's ID while it's present in the
CSV output accessible from the same interface.
This function now only fills the relevant fields with raw values and
calls stats_dump_fields_csv() for the CSV part. The output remains
exactly the same for now.
It is worth mentionning that l->cum_conn is being dumped into a cum_sess
field and that once we introduce an official cum_conn field we may have
to dump the same value at both places to maintain compatibility with the
existing stats.
Now we avoid directly accessing the proxy and instead we pick the values
from the stats fields. This unveils that only a few fields are missing to
complete the job :
- know whether or not the checkbox column needs to be displayed. This
is not directly relevant to the stats but rather to the fact that the
HTML dump is also a control interface. This doesn't need a field, just
a function argument.
- px->mode == HTTP (or px->mode as a string)
- px->fe_counters.intercepted_req (stats and redirects ?)
- px->fe_counters.cum_conn
- px->fe_counters.cps_max
- px->fe_conn_per_sec
All the last ones make sense in the CSV, so they'll have to be added as well.
This function now only fills the relevant fields with raw values and
calls stats_dump_fields_csv() for the CSV part. The output remains
exactly the same for now.
The new function stats_dump_fields_csv() currenty walks over all CSV
fields and prints all non-empty ones as one line. Strings are csv-encoded
on the fly.
This is in preparation for a unifying of the stats output between the
multiple formats. The long-term goal will be that HTML stats are built
from the array used to produce the CSV output in order to ensure that
no information is missing in any format.
This function dumps non-empty fields, one per line with their name and
values, in the same format as is currently used by "show info". It relies
on previously added stats_emit_raw_data_field().
The table is completely filled with all relevant information. Only the
fields that should appear are presented. The description field is now
properly omitted if not set, instead of being reported as empty.
We're preparing for various data types for each stats field as they
appear in the CSV output. For now we only cover the regular types handled
by printf, so we have 32 and 64 bit ints and counters, strings, and of
course "empty" to indicate that there's nothing in the field and which
guarantees that any accessed entry will return 0.
More types will surely come later so that some fields are properly
represented. For example, we could see limits where only the value 0
doesn't show up, or human time, etc.
Technically speaking, many SSL sample fetch functions act on the
connection and depend on USE_L5CLI on the client side, which means
they're usable as soon as a handshake is completed on a connection.
This means that the test consisting in refusing to call them when
the stream is NULL will prevent them from working when we implement
the tcp-request session ruleset. Better fix this now. The fix consists
in using smp->sess->origin when they're called for the front connection,
and smp->strm->si[1].end when called for the back connection.
There is currently no known side effect for this issue, though it would
better be backported into 1.6 so that the code base remains consistend.
The buffer could not be null by definition since we moved the stream out
of the session. It's the stream which gained the ability to be NULL (hence
the recent fix for this case). The checks in place are useless and make one
think this situation can happen.
This is the continuation of previous patch called "BUG/MAJOR: samples:
check smp->strm before using it".
It happens that variables may have a session-wide scope, and that their
session is retrieved by dereferencing the stream. But nothing prevents them
from being used from a streamless context such as tcp-request connection,
thus crashing the process. Example :
tcp-request connection accept if { src,set-var(sess.foo) -m found }
In order to fix this, we have to always ensure that variable manipulation
only happens via the sample, which contains the correct owner and context,
and that we never use one from a different source. This results in quite a
large change since a lot of functions are inderctly involved in the call
chain, but the change is easy to follow.
This fix must be backported to 1.6, and requires the last two patches.
Some functions like sample_conv_var2smp(), var_get_byname(), and
var_set_byname() directly or indirectly need to access the current
stream and/or session and must find it in the sample itself and not
as a distinct argument. Thus we first need to call smp_set_owner()
prior to each such calls.
Since commit 6879ad3 ("MEDIUM: sample: fill the struct sample with the
session, proxy and stream pointers") merged in 1.6-dev2, the sample
contains the pointer to the stream and sample fetch functions as well
as converters use it heavily. This requires from a lot of call places
to initialize 4 fields, and it was even forgotten at a few places.
This patch provides a convenient helper to initialize all these fields
at once, making it easy to prepare a new sample from a previous one for
example.
A few call places were cleaned up to make use of it. It will be needed
by further fixes.
At one place in the Lua code, it was moved earlier because we used to
call sample casts with a non completely initialized sample, which is
not clean eventhough at the moment there are no consequences.
Since commit 6879ad3 ("MEDIUM: sample: fill the struct sample with the
session, proxy and stream pointers") merged in 1.6-dev2, the sample
contains the pointer to the stream and sample fetch functions as well
as converters use it heavily.
The problem is that earlier commit 87b0966 ("REORG/MAJOR: session:
rename the "session" entity to "stream"") had split the session and
stream resulting in the possibility for smp->strm to be NULL before
the stream was initialized. This is what happens in tcp-request
connection rulesets, as discovered by Baptiste.
The sample fetch functions must now check that smp->strm is valid
before using it. An alternative could consist in using a dummy stream
with nothing in it to avoid some checks but it would only result in
deferring them to the next step anyway, and making it harder to detect
that a stream is valid or the dummy one.
There is still an issue with variables which requires a complete
independant fix. They use strm->sess to find the session with strm
possibly NULL and passed as an argument. All call places indirectly
use smp->strm to build strm. So the problem is there but the API needs
to be changed to remove this duplicate argument that makes it much
harder to know what pointer to use.
This fix must be backported to 1.6, as well as the next one fixing
variables.
Commit baf9794 ("BUG/MINOR: tcpcheck: conf parsing error when no port
configured on server and first rule(s) is (are) COMMENT") was wrong, it
incorrectly implemented a list access by dereferencing a pointer of an
incorrect type resulting in checking the next element in the list. The
consequence is that it stops before the last comment instead of at the
last one and skips the first rule. In the end, rules starting with
comments are not affected, but if a sequence of checks directly starts
with connect, it is then skipped and this is visible when no port is
configured on the server line as the config refuses to load.
There was another occurence of the same bug a few lines below, both
of them were fixed. Tests were made on different configs and confirm
the new fix is OK.
This fix must be backported to 1.6.
This memory leak of about 100 bytes occurs only if there is an error
condidtion during evaluation of a "map" directive in the configuration
file. This evaluation only happens once on startup because haproxy
does not have a mechanism for re-loading the configuration file during
run-time. The startup will be aborted anyway due to error conditions
raised.
Nevertheless fix it to silence warnings of static code analysis tools
and be safe against future revisions of the code.
Found in haproxy 1.5.14.
Ignore samples that are neither SMP_T_IPV4 nor SMP_T_IPV6 instead of
matching with an uninitialized value in this case.
This situation should not occur in the current codebase but triggers
warnings in static code analysis tools.
Found in haproxy 1.5.
stats_map_lookup() sets bit SMP_F_CONST in the uninitialized member
flags of a stack-allocated sample, leaving the other bits
uninitialized. All code paths that can access the struct only ever
check for this specific flag, so there is no risk of unintended
behavior.
Nevertheless fix it as it triggers warnings in static code analysis
tools and might become a problem on future revisions of the code.
Problem found in version 1.5.
Owen Marshall reported an issue depending on the server keywords order in the
configuration.
Working line :
server dev1 <ip>:<port> check inter 5000 ssl verify none sni req.hdr(Host)
Non working line :
server dev1 <ip>:<port> check inter 5000 ssl sni req.hdr(Host) verify none
Indeed, both parse_server() and srv_parse_sni() modified the current argument
offset at the same time. To fix the issue, srv_parse_sni() can work on a local
copy ot the offset, leaving parse_server() responsible of the actual value.
This fix must be backported to 1.6.
If a SIGHUP/SIGUSR2 is sent immediately after a SIGTERM/SIGINT and
before wait() is notified, it will mask it since there's no queue,
only a copy of the last received signal. Let's add a special check
before overwriting the signal so that SIGTERM/SIGINT are not masked.
Some people report that sometimes there's a collection of old processes
after a restart of the systemd wrapper. It's not surprizing when reading
the code, the SIGTERM is propagated as a SIGINT which asks for a graceful
stop instead. If people ask for termination, we should terminate.
Commit c54bdd2 ("MINOR: Also accept SIGHUP/SIGTERM in systemd-wrapper")
added support for extra signals but did not adapt the debug message,
which continues to report incorrect signal types. Let's pass the signal
number to the do_restart() and do_shutdown() functions so that the output
matches the signal that was received.
There's a race condition in the systemd wrapper. People who restart it
too fast report old processes remaining there. Here if the signal is
caught before entering the "while" loop we block indefinitely on the
wait() call. Let's check the caught signal before calling wait(). It
doesn't completely close the race, a window still exists between the
test of the flag and the call to wait(), but it's much smaller. A
safer solution could involve pselect() to wait for the signal
delivery.
The recent addition of "show env" on the CLI has revealed an interesting
design bug. Chunks are supposed to support a negative length to indicate
that they carry no data. chunk_printf() sets this size to -1 if the string
is too large for the buffer. At a few places in the http engine we may end
up with trash.len = -1. But bi_putchk(), chunk_appendf() and a few other
chunks consumers don't consider this case as possible and will use such a
chunk, possibly restoring an invalid string or trying to copy -1 bytes.
This fix takes care of clarifying the situation in a backportable way
where such sizes are used, so that a negative length indicating an error
remains present until the chunk is reinitialized or overwritten. But a
cleaner design adjustment needs to be done so that there's a clear contract
on how to use these chunks. At first glance it doesn't seem *that* useful
to support negative sizes, so probably this is what should change.
This fix must be backported to 1.6 and 1.5.
the function server_parse_addr_change_request() contain an hardcoded
updater source "stats command". this function can be called from other
sources than the "stats command", so this patch make this argument
generic.
When the server address is changed, a message with unrequired '\n' or
'.' is displayed, like this:
[WARNING] 054/101137 (3229) : zzzz/s3 changed its IP from 127.0.0.1 to ::55 by stats command
.
This patch remove the '\n' which is sent before the '.'.
This patch must be backported in 1.6
With nbproc > 1, it is possible to specify on which process the stats socket
will be bound using "stats bind-process", but the behaviour was not correct,
ignoring the value in some configurations.
Example :
global
nbproc 4
stats bind-process 1
stats socket /var/run/haproxy.sock
With such a configuration, all the processes will listen on the stats socket.
As a workaround, it is also possible to declare a "process" keyword on
the "stats stocket" line.
The patch must be applied to 1.7, 1.6 and 1.5
A value is copied two time in teh stack, but only one is usefull. The
second copy leaves unused in the stack and take some room for noting.
This path removes the second copy.
Must be backported in 1.6
The commit 87b096 renames the functions srv_shutdown_backup_sessions()
and srv_shutdown_sessions() to srv_shutdown_backup_streams() and
srv_shutdown_streams().
The header file <proto/servers.h> does not repport these changes.
This bug should be repported in the 1.6 branch, even if it is useless
because new dev are frozen.
This patch must be backported in 1.6
hlua_yield() function returns the required sleep time. The Lua core must
be resume the execution after the required time. The core dedicated to
the http and tcp applet doesn't implement the wake up function. It is a
miss.
This patch fix this.
This patch introduces a configurable connection timeout for mailers
with a new "timeout mail <time>" directive.
Acked-by: Simon Horman <horms@verge.net.au>
This options prioritize th choice of an ip address matching a network. This is
useful with clouds to prefer a local ip. In some cases, a cloud high
avalailibility service can be announced with many ip addresses on many
differents datacenters. The latency between datacenter is not negligible, so
this patch permitsto prefers a local datacenter. If none address matchs the
configured network, another address is selected.
DNS selection preferences are actually declared inline in the
struct server. There are copied from the server struct to the
dns_resolution struct for each resolution.
Next patchs adds new preferences options, and it is not a good
way to copy all the configuration information before each dns
resolution.
This patch extract the configuration preference from the struct
server and declares a new dedicated struct. Only a pointer to this
new striuict will be copied before each dns resolution.