This new function stats_dump_fields_html() checks the type of the object
being dumped from the stats table, and emits it in HTML format. It uses
an argument indicating if the HTML page is also used as an admin page,
and for now still takes the proxy in argument as a few entries still
need it.
The code was simply moved as-is to the new function. There's no
functional change.
HTML output used to have it but not the CSV output. It indicates that the
listener is not full but was forced to wait because the max connection
rate was reached.
The color code requires a complex logic, and we use it only in the
HTML part. So let's compute it there based on the server state, its
health and its weight. The thing is tricky but OK. There's a 1-to-1
mapping of down servers, but not of up servers, hence the need for
the weight and health.
The server's cookie value is now reported in the "cookie" column and
used as-is from the HTML dump. It was the last reference to the sv
pointer from this place.
The same was done for the backend's dump.
This new field "addr" presents the server's address:port if the client
is either enabled via "stats show legends" in case of HTTP dumps, or
has at least level operator on the CLI. The address formats might be :
- ipv4:port
- [ipv6]:port
- unix
- (error message)
The HTML dump over HTTP request may have several flags including
ST_SHLGNDS (to show legends), ST_SHNODE (to show node name),
ST_SHDESC (to show some descriptions).
There's no such thing over the CLI so we need to have an equivalent.
Let's compute the flags earlier so that we can make use of these flags
regardless of the call point.
Now instead of recomputing the state based on the health, rise etc,
we reuse the same state as in the CSV file, and optionally complete
it with a down or an up arrow if a change is occurring. We could
have parsed the strings to detect a '/' indicating a state change,
but it was easier to check the health against rise and fall.
This adds the following fields :
- check_rise [...S]: server's "rise" parameter used by checks
- check_fall [...S]: server's "fall" parameter used by checks
- check_health [...S]: server's health check value between 0 and rise+fall-1
- agent_rise [...S]: agent's "rise" parameter, normally 1
- agent_fall [...S]: agent's "fall" parameter, normally 1
- agent_health [...S]: agent's health parameter, between 0 and rise+fall-1
Added these two new fields to the CSV output :
- check_desc : short human-readable description of check_status
- agent_desc : short human-readable description of agent_status
Also factor two tests for enabled checks.
The agent check status is now reported :
- agent_status : status of last agent check
- agent_code : numeric code reported by agent if any (unused for now)
- agent_duration : time in ms taken to finish last check
There's no point in reporting a backend's up/down time if it has no
servers. The CSV output used to report "0" for a serverless backend
while the HTML version already removed the field. For servers, this
field is already omitted if checks are disabled. Let's uniformize
all of this and remove the field in CSV as well when irrelevant.
The HTML version doesn't report a check status when the server is in
maintenance since it can be quite old and irrelevant. The CSV forgot
to care about that, so let's do it here as well.
We don't want the HTML dump to rely on the server state. We
already have this piece of information in the status field by
checking that it starts with "DOWN".
It currently is really not convenient to have a state and a color detection
outside of the function and to use these ones inside. It makes it harder to
adjust the stats output based on the server state exactly. Let's move the
logic into the dump function itself.
Here we still have a huge amount of stuff to extract from the HTML code
and even from the caller. Indeed, the calling function computes the
server state and prepares a color code that will be used to determine
what style to use. The operations needed to decide what field to present
or not depend a lot on the server's state, admin state, health value,
rise and fall etc... all of which are not easily present in the table.
We also have to check the reference's values for all of the above.
There are also a number of differences between the CSV and HTML outputs :
- CSV always reports check duration, HTML only if not zero
- regarding last_change, CSV always report the server's while the HTML
considers either the server's or the reference based on the admin state.
- agent and health are separate in the CSV but mixed in the HTML.
- too few info on agent anyway.
After careful code inspection it happens that both sv->last_change and
ref->last_change are identical and can both derive from [LASTCHG].
Also, the following info are missing from the array to complete the HTML
code :
- cookie, address, status description, check-in-progress, srv->admin
At least for now it still works but a lot of info now need to be added.
This function now only fills the relevant fields with raw values and
calls stats_dump_fields_csv() for the CSV part. The output remains
exactly the same for now.
Some limits are only emitted if set, so the HTML part will have to
check for these being set.
A number of fields had to be built using printf-format strings, so
instead of allocating strings that risk not being freed, we use
chunk_newstr() and chunk_appendf().
Text strings are now copied verbatim in the stats fields so that only
the CSV dump encodes them as CSV. A single "out" chunk is used and cut
into multiple substrings using chunk_newstr() so that we don't need
distinct chunks for each stats field holding a string. The total amount
of data being emitted at once will never overflow the "out" chunk since
only a small part of it goes into the trash buffer which is the same size
and will thus overflow first.
One point of care is that failed_checks and down_trans were logged as
64-bit quantities on servers but they're 32-bit on proxies. This may
have to be changed later to unify them.
Some fields are still needed to complete the conversion :
- px->srv : used to take decisions when backend has no server (eg: print down or not)
- algo string (useful for CSV as well) // only if SHLGNDS
- cookie_name (useful for CSV as well) // only if SHLGNDS
- px->mode == HTTP (or px->mode as a string) // same for frontend
- px->be_counters.intercepted_req (stats and redirects ?)
The following field already has a place but was not presented in the
CSV output, so it should simply be added afterwards :
- px->be_counters.http.cum_req (was in HTML and missing from CSV)
This function now only fills the relevant fields with raw values and
calls stats_dump_fields_csv() for the CSV part. The output remains
exactly the same for now. It's worth noting that there are some
ambiguities between connections and sessions, for example cum_conn
is dumped into cum_sess. Additionally, there is a naming ambiguity
in that the internal "d_time" (time where the beginning of data
appeared) is called "rtime" in the output (response time) and they
actually are indeed the same.
The conversion still requires some elements which are not present in the
current fields :
- the HTML status may emit "WAITING"/"OPEN"/"FULL" while the CSV format
doesn't propose "WAITING", so this last one will have to be added.
- the HTML output emits the listening adresses when the ST_SHLGNDS flag
is set but this address field doesn't exist in the CSV format
- it's interesting to note that when the ST_SHLGNDS flag is not set, the
HTML output doesn't provide the listener's ID while it's present in the
CSV output accessible from the same interface.
This function now only fills the relevant fields with raw values and
calls stats_dump_fields_csv() for the CSV part. The output remains
exactly the same for now.
It is worth mentionning that l->cum_conn is being dumped into a cum_sess
field and that once we introduce an official cum_conn field we may have
to dump the same value at both places to maintain compatibility with the
existing stats.
Now we avoid directly accessing the proxy and instead we pick the values
from the stats fields. This unveils that only a few fields are missing to
complete the job :
- know whether or not the checkbox column needs to be displayed. This
is not directly relevant to the stats but rather to the fact that the
HTML dump is also a control interface. This doesn't need a field, just
a function argument.
- px->mode == HTTP (or px->mode as a string)
- px->fe_counters.intercepted_req (stats and redirects ?)
- px->fe_counters.cum_conn
- px->fe_counters.cps_max
- px->fe_conn_per_sec
All the last ones make sense in the CSV, so they'll have to be added as well.
This function now only fills the relevant fields with raw values and
calls stats_dump_fields_csv() for the CSV part. The output remains
exactly the same for now.
The new function stats_dump_fields_csv() currenty walks over all CSV
fields and prints all non-empty ones as one line. Strings are csv-encoded
on the fly.
This is in preparation for a unifying of the stats output between the
multiple formats. The long-term goal will be that HTML stats are built
from the array used to produce the CSV output in order to ensure that
no information is missing in any format.
This function dumps non-empty fields, one per line with their name and
values, in the same format as is currently used by "show info". It relies
on previously added stats_emit_raw_data_field().
The table is completely filled with all relevant information. Only the
fields that should appear are presented. The description field is now
properly omitted if not set, instead of being reported as empty.
We're preparing for various data types for each stats field as they
appear in the CSV output. For now we only cover the regular types handled
by printf, so we have 32 and 64 bit ints and counters, strings, and of
course "empty" to indicate that there's nothing in the field and which
guarantees that any accessed entry will return 0.
More types will surely come later so that some fields are properly
represented. For example, we could see limits where only the value 0
doesn't show up, or human time, etc.
Technically speaking, many SSL sample fetch functions act on the
connection and depend on USE_L5CLI on the client side, which means
they're usable as soon as a handshake is completed on a connection.
This means that the test consisting in refusing to call them when
the stream is NULL will prevent them from working when we implement
the tcp-request session ruleset. Better fix this now. The fix consists
in using smp->sess->origin when they're called for the front connection,
and smp->strm->si[1].end when called for the back connection.
There is currently no known side effect for this issue, though it would
better be backported into 1.6 so that the code base remains consistend.
The buffer could not be null by definition since we moved the stream out
of the session. It's the stream which gained the ability to be NULL (hence
the recent fix for this case). The checks in place are useless and make one
think this situation can happen.
This is the continuation of previous patch called "BUG/MAJOR: samples:
check smp->strm before using it".
It happens that variables may have a session-wide scope, and that their
session is retrieved by dereferencing the stream. But nothing prevents them
from being used from a streamless context such as tcp-request connection,
thus crashing the process. Example :
tcp-request connection accept if { src,set-var(sess.foo) -m found }
In order to fix this, we have to always ensure that variable manipulation
only happens via the sample, which contains the correct owner and context,
and that we never use one from a different source. This results in quite a
large change since a lot of functions are inderctly involved in the call
chain, but the change is easy to follow.
This fix must be backported to 1.6, and requires the last two patches.
Some functions like sample_conv_var2smp(), var_get_byname(), and
var_set_byname() directly or indirectly need to access the current
stream and/or session and must find it in the sample itself and not
as a distinct argument. Thus we first need to call smp_set_owner()
prior to each such calls.
Since commit 6879ad3 ("MEDIUM: sample: fill the struct sample with the
session, proxy and stream pointers") merged in 1.6-dev2, the sample
contains the pointer to the stream and sample fetch functions as well
as converters use it heavily. This requires from a lot of call places
to initialize 4 fields, and it was even forgotten at a few places.
This patch provides a convenient helper to initialize all these fields
at once, making it easy to prepare a new sample from a previous one for
example.
A few call places were cleaned up to make use of it. It will be needed
by further fixes.
At one place in the Lua code, it was moved earlier because we used to
call sample casts with a non completely initialized sample, which is
not clean eventhough at the moment there are no consequences.
Since commit 6879ad3 ("MEDIUM: sample: fill the struct sample with the
session, proxy and stream pointers") merged in 1.6-dev2, the sample
contains the pointer to the stream and sample fetch functions as well
as converters use it heavily.
The problem is that earlier commit 87b0966 ("REORG/MAJOR: session:
rename the "session" entity to "stream"") had split the session and
stream resulting in the possibility for smp->strm to be NULL before
the stream was initialized. This is what happens in tcp-request
connection rulesets, as discovered by Baptiste.
The sample fetch functions must now check that smp->strm is valid
before using it. An alternative could consist in using a dummy stream
with nothing in it to avoid some checks but it would only result in
deferring them to the next step anyway, and making it harder to detect
that a stream is valid or the dummy one.
There is still an issue with variables which requires a complete
independant fix. They use strm->sess to find the session with strm
possibly NULL and passed as an argument. All call places indirectly
use smp->strm to build strm. So the problem is there but the API needs
to be changed to remove this duplicate argument that makes it much
harder to know what pointer to use.
This fix must be backported to 1.6, as well as the next one fixing
variables.
Commit baf9794 ("BUG/MINOR: tcpcheck: conf parsing error when no port
configured on server and first rule(s) is (are) COMMENT") was wrong, it
incorrectly implemented a list access by dereferencing a pointer of an
incorrect type resulting in checking the next element in the list. The
consequence is that it stops before the last comment instead of at the
last one and skips the first rule. In the end, rules starting with
comments are not affected, but if a sequence of checks directly starts
with connect, it is then skipped and this is visible when no port is
configured on the server line as the config refuses to load.
There was another occurence of the same bug a few lines below, both
of them were fixed. Tests were made on different configs and confirm
the new fix is OK.
This fix must be backported to 1.6.
This memory leak of about 100 bytes occurs only if there is an error
condidtion during evaluation of a "map" directive in the configuration
file. This evaluation only happens once on startup because haproxy
does not have a mechanism for re-loading the configuration file during
run-time. The startup will be aborted anyway due to error conditions
raised.
Nevertheless fix it to silence warnings of static code analysis tools
and be safe against future revisions of the code.
Found in haproxy 1.5.14.
Ignore samples that are neither SMP_T_IPV4 nor SMP_T_IPV6 instead of
matching with an uninitialized value in this case.
This situation should not occur in the current codebase but triggers
warnings in static code analysis tools.
Found in haproxy 1.5.
stats_map_lookup() sets bit SMP_F_CONST in the uninitialized member
flags of a stack-allocated sample, leaving the other bits
uninitialized. All code paths that can access the struct only ever
check for this specific flag, so there is no risk of unintended
behavior.
Nevertheless fix it as it triggers warnings in static code analysis
tools and might become a problem on future revisions of the code.
Problem found in version 1.5.
Owen Marshall reported an issue depending on the server keywords order in the
configuration.
Working line :
server dev1 <ip>:<port> check inter 5000 ssl verify none sni req.hdr(Host)
Non working line :
server dev1 <ip>:<port> check inter 5000 ssl sni req.hdr(Host) verify none
Indeed, both parse_server() and srv_parse_sni() modified the current argument
offset at the same time. To fix the issue, srv_parse_sni() can work on a local
copy ot the offset, leaving parse_server() responsible of the actual value.
This fix must be backported to 1.6.
If a SIGHUP/SIGUSR2 is sent immediately after a SIGTERM/SIGINT and
before wait() is notified, it will mask it since there's no queue,
only a copy of the last received signal. Let's add a special check
before overwriting the signal so that SIGTERM/SIGINT are not masked.
Some people report that sometimes there's a collection of old processes
after a restart of the systemd wrapper. It's not surprizing when reading
the code, the SIGTERM is propagated as a SIGINT which asks for a graceful
stop instead. If people ask for termination, we should terminate.