With the CI occasionally slowing down, we're starting to see again some
spurious failures despite the long 1-second timeouts. This reports false
positives that are disturbing and doesn't provide as much value as this
could. However at this delay it already becomes a pain for developers
to wait for the tests to complete.
This commit adds support for the new environment variable
HAPROXY_TEST_TIMEOUT that will allow anyone to modify the connect,
client and server timeouts. It was set to 5 seconds by default, which
should be plenty for quite some time in the CI. All relevant values
that were 200ms or above were replaced by this one. A few larger
values were left as they are special. One test for the set-timeout
action that used to rely on a fixed 1-sec value was extended to a
fixed 5-sec, as the timeout is normally not reached, but it needs
to be known to compare the old and new values.
Some tests expect a 503, typically those that check that wrong CA/CRL
will not be accepted between a server and a frontend. But such tests
tend to last very long simply because of the 1-second turn-around on
connection retries that happens during the failure. Let's properly set
the retries count to zero for these ones. One test purposely wants to
exhaust the retries so the retries was set to 1 instead.
The feature cmd was incorrectly set to:
feature cmd "$HAPROXY_PROGRAM -cc 'feature(OPENSSL)' && 'openssl_version_atleast(1.1.1)'"
Which was incorrect since the quotes must surrendered the -cc argument.
Also the test requires openssl and does not work with libressl.
During a troublehooting it came obvious that the SNI always ought to
be logged on httpslog, as it explains errors caused by selection of
the default certificate (or failure to do so in case of strict-sni).
This expectation was also confirmed on the mailing list.
Since the field may be empty it appeared important not to leave an
empty string in the current format, so it was decided to place the
field before a '/' preceding the SSL version and ciphers, so that
in the worst case a missing field leads to a field looking like
"/TLSv1.2/AES...", though usually a missing element still results
in a "-" in logs.
This will change the log format for users who already deployed the
2.5-dev versions (hence the medium level) but no released version
was using this format yet so there's no harm for stable deployments.
The reg-test was updated to check for "-" there since we don't send
SNI in reg-tests.
Link: https://www.mail-archive.com/haproxy@formilux.org/msg41410.html
Cc: William Lallemand <wlallemand@haproxy.org>
Commit 3d2093af9 ("MINOR: connection: Add a connection error code sample
fetch") added these convenient sample-fetch functions but it appears that
due to a misunderstanding the redundant "conn" part was kept in their
name, causing confusion, since "fc" already stands for "front connection".
Let's simply call them "fc_err" and "bc_err" to match all other related
ones before they appear in a final release. The VTC they appeared in were
also updated, and the alpha sort in the keywords table updated.
Cc: William Lallemand <wlallemand@haproxy.org>
In order for the test to run with OpenSSL 1.0.2 the test will now mostly
use TLSv1.2 and use TLS 1.3 only on some specific tests (covered by
preconditions).
The test is strongly dependent on the way the errors are output by the
SSL library so it is not possible to perform the same checks when using
OpenSSL or LibreSSL. It is then reenabled for OpenSSL (whatever the
version) but still disabled for LibreSSL.
This limitation is added thanks to the new ssllib_name_startswith
precondition check.
The OpenSSL error codes for the same errors are not consistent between
OpenSSL versions. The ssl_errors test needs to be modified to only take
into account a fixed part of those error codes.
This patch focuses on the reason part of the error code by applying a
mask on the error code (whose size varies depending on the lib version).
The log-error-via-logformat option was removed in commit
3d6350e108 and was replaced by a dedicated
error-log-format option. The references to this option need to be
removed from the test's description.
ssl_crt-list_filters.vtc was deactivated because they were not compatible with
previous version of OpenSSL and it was not possible to
filter by versions.
Activate it again with a openssl_version_atleast(1.1.1)
check.
The ssl_bc_hsk_err sample fetch will need to raise more errors than only
handshake related ones hence its renaming to a more generic ssl_bc_err.
This patch is required because some handshake failures that should have
been caught by this fetch (verify error on the server side for instance)
were missed. This is caused by a change in TLS1.3 in which the
'Finished' state on the client is reached before its certificate is sent
(and verified) on the server side (see the "Protocol Overview" part of
RFC 8446).
This means that the SSL_do_handshake call is finished long before the
server can verify and potentially reject the client certificate.
The ssl_bc_hsk_err will then need to be expanded to catch other types of
errors.
This change is also applied to the frontend fetches (ssl_fc_hsk_err
becomes ssl_fc_err) and to their string counterparts.
Those fetches are used to identify connection errors and SSL handshake
errors on the backend side of a connection. They can for instance be
used in a log-format line as in the regtest.
This option can be used to define a specific log format that will be
used in case of error, timeout, connection failure on a frontend... It
will be used for any log line concerned by the log-separate-errors
option. It will also replace the format of specific error messages
decribed in section 8.2.6.
If no "error-log-format" is defined, the legacy error messages are still
emitted and the other error logs keep using the regular log-format.
Most of the SSL sample fetches related to the client certificate were
based on the SSL_get_peer_certificate function which returns NULL when
the verification process failed. This made it impossible to use those
fetches in a log format since they would always be empty.
The patch adds a reference to the X509 object representing the client
certificate in the SSL structure and makes use of this reference in the
fetches.
The reference can only be obtained in ssl_sock_bind_verifycbk which
means that in case of an SSL error occurring before the verification
process ("no shared cipher" for instance, which happens while processing
the Client Hello), we won't ever start the verification process and it
will be impossible to get information about the client certificate.
This patch also allows most of the ssl_c_XXX fetches to return a usable
value in case of connection failure (because of a verification error for
instance) by making the "conn->flags & CO_FL_WAIT_XPRT" test (which
requires a connection to be established) less strict.
Thanks to this patch, a log-format such as the following should return
usable information in case of an error occurring during the verification
process :
log-format "DN=%{+Q}[ssl_c_s_dn] serial=%[ssl_c_serial,hex] \
hash=%[ssl_c_sha1,hex]"
It should answer to GitHub issue #693.
Rename the 'dontloglegacyconnerr' option to 'log-error-via-logformat'
which is much more self-explanatory and readable.
Note: only legacy keywords don't use hyphens, it is recommended to
separate words with them in new keywords.
Disable the new ssl_errors.vtc reg-tests because in does not work
correctly on the CI since it requires a version of OpenSSL which is
compatible with TLSv1.3 and the ciphersuites keyword.
This reg-test checks that the connection and SSL sample fetches related
to errors are functioning properly. It also tests the proper behaviour
of the default HTTPS log format and of the log-legacy-conn-error option
which enables or disables the output of a special error message in case
of connection failure (otherwise a line following the configured
log-format is output).
When a default-server line specified a client certificate to use, the
frontend would not take it into account and create an empty SSL context,
which would raise an error on the backend side ("peer did not return a
certificate").
This bug was introduced by d817dc733e in
which the SSL contexts are created earlier than before (during the
default-server line parsing) without setting it in the corresponding
server structures. It then made the server create an empty SSL context
in ssl_sock_prepare_srv_ctx because it thought it needed one.
It was raised on redmine, in Bug #3906.
It can be backported to 2.4.
The `show ssl ocsp-response` feature is not available with BoringSSL,
but we don't have a way to disable this feature only with boringSSL on
the CI. Disable the reg-test until we do.
This file adds tests for the new "show ssl ocsp-response" command and
the new "show ssl cert foo.pem.ocsp" and "show ssl cert *foo.pem.ocsp"
special cases. They are all used to display information about an OCSP
response, committed or not.
This vtc tests the "new ssl crl-file" which allows to create a new empty
CRL file that can then be set through a "set+commit ssl crl-file"
command pair. It also tests the "del ssl crl-file" command which allows
to delete an unused CRL file.
This vtc tests the "new ssl ca-file" which allows to create a new empty
CA file that can then be set through a "set+commit ssl ca-file" command
pair. It also tests the "del ssl ca-file" command which allows to delete
an unused CA file.
This patch adds the "show ssl ca-file [<cafile>[:index]]" CLI command.
This command can be used to display the list of all the known CA files
when no specific file name is specified, or to display the details of a
specific CA file when a name is given. If an index is given as well, the
command will only display the certificate having the specified index in
the CA file (if it exists).
The details displayed for each certificate are the same as the ones
showed when using the "show ssl cert" command on a single certificate.
This fixes a subpart of GitHub issue #1057.
The "abort" command aborts an ongoing transaction started by a "set ssl
ca-file" command. Since the updated CA file data is not pushed into the
cafile tree until a "commit ssl ca-file" call is performed, the abort
command simply clears the new cafile_entry that was stored in the
cafile_transaction.
This fixes a subpart of GitHub issue #1057.
set_ssl_cert_bundle.vtc requires at least OpenSSL 1.1.0 and we don't
have any way to check this when launching the reg-tests suite.
Mark the reg-test as broken since it will fails on old versions of
openSSL and libreSSL.
This test loads a configuration which uses multi-certificates bundle and
tries to change them over the CLI.
Could be backported as far as 2.2, however the 2.2 version must be
adapted to commit the bundle and not each certificate individually.
If the first active line of a crt-list file is also the first mentioned
certificate of a frontend that does not have the strict-sni option
enabled, then its certificate will be used as the default one. We then
do not want this instance to be removable since it would make a frontend
lose its default certificate.
Considering that a crt-list file can be used by multiple frontends, and
that its first mentioned certificate can be used as default certificate
for only a subset of those frontends, we do not want the line to be
removable for some frontends and not the others. So if any of the ckch
instances corresponding to a crt-list line is a default instance, the
removal of the crt-list line will be forbidden.
It can be backported as far as 2.2.
The default SSL_CTX used by a specific frontend is the one of the first
ckch instance created for this frontend. If this instance has SNIs, then
the SSL context is linked to the instance through the list of SNIs
contained in it. If the instance does not have any SNIs though, then the
SSL_CTX is only referenced by the bind_conf structure and the instance
itself has no link to it.
When trying to update a certificate used by the default instance through
a cli command, a new version of the default instance was rebuilt but the
default SSL context referenced in the bind_conf structure would not be
changed, resulting in a buggy behavior in which depending on the SNI
used by the client, he could either use the new version of the updated
certificate or the original one.
This patch adds a reference to the default SSL context in the default
ckch instances so that it can be hot swapped during a certificate
update.
This should fix GitHub issue #1143.
It can be backported as far as 2.2.
If an unknown CA file was first mentioned in an "add ssl crt-list" CLI
command, it would result in a call to X509_STORE_load_locations which
performs a disk access which is forbidden during runtime. The same would
happen if a "ca-verify-file" or "crl-file" was specified. This was due
to the fact that the crt-list file parsing and the crt-list related CLI
commands parsing use the same functions.
The patch simply adds a new parameter to all the ssl_bind parsing
functions so that they know if the call is made during init or by the
CLI, and the ssl_store_load_locations function can then reject any new
cafile_entry creation coming from a CLI call.
It can be backported as far as 2.2.
Flush the SSL session cache when updating a certificate which is used on a
server line. This prevent connections to be established with a cached
session which was using the previous SSL_CTX.
This patch also replace the ha_barrier with a thread_isolate() since there
are more operations to do. The reg-test was also updated to remove the
'no-ssl-reuse' keyword which is now uneeded.
The "abort ssl cert" command is buggy and removes the current ckch store,
and instances, leading to SNI removal. It must only removes the new one.
This patch also adds a check in set_ssl_cert.vtc and
set_ssl_server_cert.vtc.
Must be backported as far as 2.2.
In a previous commit this test was disabled because I though the
feature was broken, but in fact this is the test which is broken.
Indeed the connection between the server and the client was not
renegociated and was using the SSL cache or a ticket. To be work
correctly these 2 features must be disabled or a new connection must be
established after the ticket timeout, which is too long for a regtest.
Also a "nbthread 1" was added as it was easier to reproduce the problem
with it.