smp_fetch_sc0_clr_gpc0, smp_fetch_sc1_clr_gpc0, smp_fetch_sc2_clr_gpc0,
smp_fetch_src_clr_gpc0 and smp_fetch_clr_gpc0 were merged into a single
function which relies on the fetch name to decide what to return.
smp_fetch_sc0_inc_gpc0, smp_fetch_sc1_inc_gpc0, smp_fetch_sc2_inc_gpc0,
smp_fetch_src_inc_gpc0 and smp_fetch_inc_gpc0 were merged into a single
function which relies on the fetch name to decide what to return.
smp_fetch_sc0_gpc0, smp_fetch_sc1_gpc0, smp_fetch_sc2_gpc0,
smp_fetch_src_gpc0 and smp_fetch_gpc0 were merged into a single
function which relies on the fetch name to decide what to return.
smp_fetch_sc0_get_gpc0, smp_fetch_sc1_get_gpc0, smp_fetch_sc2_get_gpc0,
smp_fetch_src_get_gpc0 and smp_fetch_get_gpc0 were merged into a single
function which relies on the fetch name to decide what to return.
This function aims at simplifying the prefetching of the table and entry
when using any of the session counters fetches. The principle is that the
src_* variant produces a stkctr that is used instead of the one from the
session. That way we can call the same function from all session counter
fetch functions and always have a single function to support sc[0-9]_/src_.
We're having a lot of duplicate code just because of minor variants between
fetch functions that could be dealt with if the functions had the pointer to
the original keyword, so let's pass it as the last argument. An earlier
version used to pass a pointer to the sample_fetch element, but this is not
the best solution for two reasons :
- fetch functions will solely rely on the keyword string
- some other smp_fetch_* users do not have the pointer to the original
keyword and were forced to pass NULL.
So finally we're passing a pointer to the keyword as a const char *, which
perfectly fits the original purpose.
Lukas Benes reported that http-send-name-header causes a segfault if no
server is available because we're dereferencing the session's target which
is NULL. The tiniest reproducer looks like this :
listen foo
bind :1234
mode http
http-send-name-header srv
This obvious fix must be backported to 1.4 which is affected as well.
Commit e25c917a introduced a third tracking counter bug forgot
to check it when storing values at the end of the session. The
impact is that if neither the first nor the second one are
changed, none of them are saved.
Benoit Dolez reported a failure to start haproxy 1.5-dev19. The
process would immediately report an internal error with missing
fetches from some crap instead of ACL names.
The cause is that some versions of gcc seem to trim static structs
containing a variable array when moving them to BSS, and only keep
the fixed size, which is just a list head for all ACL and sample
fetch keywords. This was confirmed at least with gcc 3.4.6. And we
can't move these structs to const because they contain a list element
which is needed to link all of them together during the parsing.
The bug indeed appeared with 1.5-dev19 because it's the first one
to have some empty ACL keyword lists.
One solution is to impose -fno-zero-initialized-in-bss to everyone
but this is not really nice. Another solution consists in ensuring
the struct is never empty so that it does not move there. The easy
solution consists in having a non-null list head since it's not yet
initialized.
A new "ILH" list head type was thus created for this purpose : create
an Initialized List Head so that gcc cannot move the struct to BSS.
This fixes the issue for this version of gcc and does not create any
burden for the declarations.
When abortonclose is used and an error is detected on the client side,
better force an RST to the server. That way we propagate to the server
the same vision we got from the client, and we ensure that we won't keep
TIME_WAITs.
It was a bit inconsistent to have gpc start at 0 and sc start at 1,
so make sc start at zero like gpc. No previous release was issued
with sc3 anyway, so no existing setup should be affected.
Some users want to disable logging for certain non-important requests such as
stats requests or health-checks coming from another equipment. Other users want
to log with a higher importance (eg: notice) some special traffic (POST requests,
authenticated requests, requests coming from suspicious IPs) or some abnormally
large responses.
This patch responds to all these needs at once by adding a "set-log-level" action
to http-request/http-response. The 8 syslog levels are supported, as well as "silent"
to disable logging.
Since commit cfd97c6f was merged into 1.5-dev14 (BUG/MEDIUM: checks:
prevent TIME_WAITs from appearing also on timeouts), some valid health
checks sometimes used to show some TCP resets. For example, this HTTP
health check sent to a local server :
19:55:15.742818 IP 127.0.0.1.16568 > 127.0.0.1.8000: S 3355859679:3355859679(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:15.742841 IP 127.0.0.1.8000 > 127.0.0.1.16568: S 1060952566:1060952566(0) ack 3355859680 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:15.742863 IP 127.0.0.1.16568 > 127.0.0.1.8000: . ack 1 win 257
19:55:15.745402 IP 127.0.0.1.16568 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
19:55:15.745488 IP 127.0.0.1.8000 > 127.0.0.1.16568: FP 1:146(145) ack 23 win 257
19:55:15.747109 IP 127.0.0.1.16568 > 127.0.0.1.8000: R 23:23(0) ack 147 win 257
After some discussion with Chris Huang-Leaver, it appeared clear that
what we want is to only send the RST when we have no other choice, which
means when the server has not closed. So we still keep SYN/SYN-ACK/RST
for pure TCP checks, but don't want to see an RST emitted as above when
the server has already sent the FIN.
The solution against this consists in implementing a "drain" function at
the protocol layer, which, when defined, causes as much as possible of
the input socket buffer to be flushed to make recv() return zero so that
we know that the server's FIN was received and ACKed. On Linux, we can make
use of MSG_TRUNC on TCP sockets, which has the benefit of draining everything
at once without even copying data. On other platforms, we read up to one
buffer of data before the close. If recv() manages to get the final zero,
we don't disable lingering. Same for hard errors. Otherwise we do.
In practice, on HTTP health checks we generally find that the close was
pending and is returned upon first recv() call. The network trace becomes
cleaner :
19:55:23.650621 IP 127.0.0.1.16561 > 127.0.0.1.8000: S 3982804816:3982804816(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:23.650644 IP 127.0.0.1.8000 > 127.0.0.1.16561: S 4082139313:4082139313(0) ack 3982804817 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:23.650666 IP 127.0.0.1.16561 > 127.0.0.1.8000: . ack 1 win 257
19:55:23.651615 IP 127.0.0.1.16561 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
19:55:23.651696 IP 127.0.0.1.8000 > 127.0.0.1.16561: FP 1:146(145) ack 23 win 257
19:55:23.652628 IP 127.0.0.1.16561 > 127.0.0.1.8000: F 23:23(0) ack 147 win 257
19:55:23.652655 IP 127.0.0.1.8000 > 127.0.0.1.16561: . ack 24 win 257
This change should be backported to 1.4 which is where Chris encountered
this issue. The code is different, so probably the tcp_drain() function
will have to be put in the checks only.
We're often missin a third counter to track base, src and base+src at
the same time. Here we introduce track_sc3 to have this third counter.
It would be wise not to add much more counters because that slightly
increases the session size and processing time though the real issue
is more the declaration of the keywords in the code and in the doc.
By properly affecting the flags and values, it becomes easier to add
more tracked counters, for example for experimentation. It also slightly
reduces the code and the number of tests. No counters were added with
this patch.
Till now we used to call the function until the connection established, which
means that the header rewriting was performed for nothing upon each even (eg:
uploaded contents) until the server responded or timed out.
Now we only call the function when we assign the server.
Now that ACLs solely rely on sample fetch functions, make them use the
same arg mask. All inconsistencies have been fixed separately prior to
this patch, so this patch almost only adds a new pointer indirection
and removes all references to ARG*() in the definitions.
The parsing is still performed by the ACL code though.
ACL fetch functions used to directly reference a fetch function. Now
that all ACL fetches have their sample fetches equivalent, we can make
ACLs reference a sample fetch keyword instead.
In order to simplify the code, a sample keyword name may be NULL if it
is the same as the ACL's, which is the most common case.
A minor change appeared, http_auth always expects one argument though
the ACL allowed it to be missing and reported as such afterwards, so
fix the ACL to match this. This is not really a bug.
option abortonclose may cause a valid connection to be aborted just
after the request has been sent. This is because we check for it
during the session establishment sequence before checking for write
activity. So if the abort and the connect complete at the same time,
the abort is still considered. Let's check for an explicity partial
write before aborting.
This fix should be backported to 1.4 too.
We now have http_apply_redirect_rule() which does all the redirect-specific
job instead of having this inside http_process_req_common().
Also one of the benefit gained from uniformizing this code is that both
keep-alive and close response do emit the PR-- flags. The fix for the
flags could probably be backported to 1.4 though it's very minor.
The previous function http_perform_redirect() was becoming confusing
so it was renamed http_perform_server_redirect() since it only applies
to server-based redirection.
It happens that all of them call parse_logformat_line() which sets
proxy->to_log with a number of flags affecting the line format for
all three users. For example, having a unique-id specified disables
the default log-format since fe->to_log is tested when the session
is established.
Similarly, having "option logasap" will cause "+" to be inserted in
unique-id or headers referencing some of the fields depending on
LW_BYTES.
This patch first removes most of the dependency on fe->to_log whenever
possible. The first possible cleanup is to stop checking fe->to_log
for being null, considering that it always contains at least LW_INIT
when any such usage is made of the log-format!
Also, some checks are wrong. s->logs.logwait cannot be nulled by
"logwait &= ~LW_*" since LW_INIT is always there. This results in
getting the wrong log at the end of a request or session when a
unique-id or add-header is set, because logwait is still not null
but the log-format is not checked.
Further cleanups are required. Most LW_* flags should be removed or at
least replaced with what they really mean (eg: depend on client-side
connection, depend on server-side connection, etc...) and this should
only affect logging, not other mechanisms.
This patch fixes the default log-format and tries to limit interferences
between the log formats, but does not pretend to do more for the moment,
since it's the most visible breakage.
The stick counters were in two distinct sets of struct members,
causing some code to be duplicated. Now we use an array, which
enables some processing to be performed in loops. This allowed
the code to be shrunk by 700 bytes.
Returns the current amount of concurrent connections tracking the same
tracked counters. This number is automatically incremented when tracking
begins and decremented when tracking stops. It differs from sc1_conn_cur in
that it does not rely on any stored information but on the table's reference
count (the "use" value which is returned by "show table" on the CLI). This
may sometimes be more suited for layer7 tracking.
Until now it was only possible to use track-sc1/sc2 with "src" which
is the IPv4 source address. Now we can use track-sc1/sc2 with any fetch
as well as any transformation type. It works just like the "stick"
directive.
Samples are automatically converted to the correct types for the table.
Only "tcp-request content" rules may use L7 information, and such information
must already be present when the tracking is set up. For example it becomes
possible to track the IP address passed in the X-Forwarded-For header.
HTTP request processing now also considers tracking from backend rules
because we want to be able to update the counters even when the request
was already parsed and tracked.
Some more controls need to be performed (eg: samples do not distinguish
between L4 and L6).
Commit 2b199c9a attempted to fix all places where the transport layer
is improperly closed, but it missed one place in session_free(). If
SSL ciphers are logged, the close() is delayed post-log and performed
in session_free(). However, conn_xprt_close() only closes the transport
layer but not the file descriptor, resulting in a slow FD leak which is
hardly noticeable until the process cannot accept any new connection.
A workaround consisted in disabling %sslv/%sslc in log-format.
So use conn_full_close() instead of conn_xprt_close() to fix this there
too.
A similar pending issue existed in the close during outgoing connection
failure, though on this side, the transport layer is never tracked at the
moment.
When the PROXY protocol header is expected and fails, leading to an
abort of the incoming connection, we now emit a log message. If option
dontlognull is set and it was just a port probe, then nothing is logged.
Since the introduction of SSL, it became quite annoying not to get any useful
info in logs about handshake failures. Let's improve reporting for embryonic
sessions by checking a per-connection error code and reporting it into the logs
if an error happens before the session is completely instanciated.
The "dontlognull" option is supported in that if a connection does not talk
before being aborted, nothing will be emitted.
At the moment, only timeouts are considered for SSL and the PROXY protocol,
but next patches will handle more errors.
To ensure that we only count when a response was compressed, we also
check for the SN_COMP_READY flag which indicates that the compression
was effectively initialized. Comp_algo alone is meaningless.
Depending on the content-types and accept-encoding fields, some responses
might or might not be compressed. Let's have a counter of the number of
compressed responses and report it in the stats to help improve compression
usage.
Some cosmetic issues were fixed in the CSV output too (missing commas at the
end).
Several places got the connection close sequence wrong because it
was not obvious. In practice we always need the same sequence when
aborting, so let's have a common function for this.
There was a possible memory leak in the zlib code when the first response of
a keep-alive session was compressed, because the next request would reset the
compression algo, preventing a later call to session_free() from releasing it.
The reason is that it is necessary to release the assigned resources in
http_end_txn_clean_session().
Instead of storing a couple of (int, ptr) in the struct connection
and the struct session, we use a different method : we only store a
pointer to an integer which is stored inside the target object and
which contains a unique type identifier. That way, the pointer allows
us to retrieve the object type (by dereferencing it) and the object's
address (by computing the displacement in the target structure). The
NULL pointer always corresponds to OBJ_TYPE_NONE.
This reduces the size of the connection and session structs. It also
simplifies target assignment and compare.
In order to improve the generated code, we try to put the obj_type
element at the beginning of all the structs (listener, server, proxy,
si_applet), so that the original and target pointers are always equal.
A lot of code was touched by massive replaces, but the changes are not
that important.
Hijackers were functions designed to inject data into channels in the
distant past. They became unused around 1.3.16, and since there has
not been any user of this mechanism to date, it's uncertain whether
the mechanism still works (and it's not really useful anymore). So
better remove it as well as the pointer it uses in the channel struct.
si_fd() is not used a lot, and breaks builds on OpenBSD 5.2 which
defines this name for its own purpose. It's easy enough to remove
this one-liner function, so let's do it.
There is a small waste of CPU cycles when no handshake is required on an
accepted connection, because we had to perform one call to conn_fd_handler()
to mark the connection CONNECTED and to call process_session() again to say
that nothing happened.
By marking the connection CONNECTED when there is no pending handshake, we
avoid this extra call to process_session().
Having a global expiration timer for a task means that the tasks are regularly
woken up (at least after each expiration timer). It's totally useless and counter
productive to process the whole session upon each such wakeup, and it's fairly
easy to detect such wakeups, so let's just update the task's timer and return
to sleep when this happens.
For 100k concurrent connections with 10s of timeouts, this can save 10k wakeups
per second, which is not bad.
With extra-large buffers, it is possible that a lot of data are sent upon
connection establishment before the session is notified. The issue is how
to handle a send() error after some data were actually sent.
At the moment, only a connection error is reported, causing a new connection
attempt and send() to restart after the last data. We absolutely don't want
to retry the connect() if at least one byte was sent, because those data are
lost.
The solution consists in reporting exactly what happens, which is :
- a successful connection attempt
- a read/write error on the channel
That way we go on with sess_establish(), the response analysers are called
and report the appropriate connection state for the error (typically a server
abort while waiting for a response). This mechanism also guarantees that we
won't retry since it's a success. The logs also report the correct connect
time.
Note that 1.4 is not directly affected because it only attempts one send(),
so it cannot detect a send() failure here and distinguish it form a failed
connection attempt. So no backport is needed. Also, this is just a safe belt
we're taking, since this issue should not happen anymore since previous commit.
The trash is used everywhere to store the results of temporary strings
built out of s(n)printf, or as a storage for a chunk when chunks are
needed.
Using global.tune.bufsize is not the most convenient thing either.
So let's replace trash with a chunk and directly use it as such. We can
then use trash.size as the natural way to get its size, and get rid of
many intermediary chunks that were previously used.
The patch is huge because it touches many areas but it makes the code
a lot more clear and even outlines places where trash was used without
being that obvious.
We will need to be able to switch server connections on a session and
to keep idle connections. In order to achieve this, the preliminary
requirement is that the connections can survive the session and be
detached from them.
Right now they're still allocated at exactly the same place, so when
there is a session, there are always 2 connections. We could soon
improve on this by allocating the outgoing connection only during a
connect().
This current patch touches a lot of code and intentionally does not
change any functionnality. Performance tests show no regression (even
a very minor improvement). The doc has not yet been updated.
This commit introduces HTTP compression using the zlib library.
http_response_forward_body has been modified to call the compression
functions.
This feature includes 3 algorithms: identity, gzip and deflate:
* identity: this is mostly for debugging, and it was useful for
developping the compression feature. With Content-Length in input, it
is making each chunk with the data available in the current buffer.
With chunks in input, it is rechunking, the output chunks will be
bigger or smaller depending of the size of the input chunk and the
size of the buffer. Identity does not apply any change on data.
* gzip: same as identity, but applying a gzip compression. The data
are deflated using the Z_NO_FLUSH flag in zlib. When there is no more
data in the input buffer, it flushes the data in the output buffer
(Z_SYNC_FLUSH). At the end of data, when it receives the last chunk in
input, or when there is no more data to read, it writes the end of
data with Z_FINISH and the ending chunk.
* deflate: same as gzip, but with deflate algorithm and zlib format.
Note that this algorithm has ambiguous support on many browsers and
no support at all from recent ones. It is strongly recommended not
to use it for anything else than experimentation.
You can't choose the compression ratio at the moment, it will be set to
Z_BEST_SPEED (1), as tests have shown very little benefit in terms of
compression ration when going above for HTML contents, at the cost of
a massive CPU impact.
Compression will be activated depending of the Accept-Encoding request
header. With identity, it does not take care of that header.
To build HAProxy with zlib support, use USE_ZLIB=1 in the make
parameters.
This work was initially started by David Du Colombier at Exceliance.
This field was used to trace precisely where a session was terminated
but it did not survive code rearchitecture and was not used at all
anymore. Let's get rid of it.
With this commit, we now separate the channel from the buffer. This will
allow us to replace buffers on the fly without touching the channel. Since
nobody is supposed to keep a reference to a buffer anymore, doing so is not
a problem and will also permit some copy-less data manipulation.
Interestingly, these changes have shown a 2% performance increase on some
workloads, probably due to a better cache placement of data.
This flag will have to be set on log tags which require transport layer
information. They will prevent the conn_xprt_close() call from releasing
the transport layer too early.
When we start logging SSL information, we need the SSL struct to be
present even past the conn_xprt_close() call. In order to achieve this,
we should use refcounting on the connection and the transport layer. At
the moment it's not worth using plain refcounting as only the logs require
this, so instead of real refcounting we just use a flag which will be set
by the log subsystem when SSL data need to be logged.
What happens then is that the xprt->close() call is ignored and the
transport layer is closed again during session_free(), after the log
line is emitted.
This callback was introduced by commit 9683e9a0 but never enabled because
the CO_FL_WAKE_DATA flag was not set. The result is that this function is
never called when an SSL handshake fails, so the connection is only closed
on timeout.
Commit 82569f91 moved the health and monitor-net checks to session.c
but a debug test introduced 0& to disable MSG_DONTWAIT in the recv()
call and this debug code remained there. Since the socket is marked
non-blocking, there should be no effect but it's dangerous to keep
such a thing here.
On Linux, accept4() does the same as accept() except that it allows
the caller to specify some flags to set on the resulting socket. We
use this to set the O_NONBLOCK flag and thus to save one fcntl()
call in each connection. The effect is a small performance gain of
around 1%.
The option is automatically enabled when target linux2628 is set, or
when the USE_ACCEPT4 Makefile variable is set. If the libc is too old
to provide the equivalent function, this is automatically detected and
our own function is used instead. In any case it is possible to force
the use of our implementation with USE_MY_ACCEPT4.
The connection layer will soon call ->wake() only when errors happen, and
not ->init(). So make the session layer use this callback to detect errors
and abort connections.
The generic data-layer init callback is now used after the transport
layer is complete and before calling the data layer recv/send callbacks.
This allows the session to switch from the embryonic session data layer
to the complete stream interface data layer, by making conn_session_complete()
the data layer's init callback.
It sill looks awkwards that the init() callback must be used opon error,
but except by adding yet another one, it does not seem to be mergeable
into another function (eg: it should probably not be merged with ->wake
to avoid unneeded calls during the handshake, though semantically that
would make sense).
We don't want to have the recv or send callbacks in embryonic
sessions, and we want the stream interface to be referenced as
the connection owner only once the session is instanciated. So
let's first have the embryonic session be the owner, then replaced
later by the stream interface once the transport layer is ready.
Instead of calling conn_notify_si() from the connection handler, we
now call data->wake(), which will allow us to use a different callback
with health checks.
Note that we still rely on a flag in order to decide whether or not
to call this function. The reason is that with embryonic sessions,
the callback is already initialized to si_conn_cb without the flag,
and we can't call the SI notify function in the leave path before
the stream interface is initialized.
This issue should be addressed by involving a different data_cb for
embryonic sessions and for stream interfaces, that would be changed
during session_complete() for the final data_cb.
While working on the changes required to make the health checks use the
new connections, it started to become obvious that some naming was not
logical at all in the connections. Specifically, it is not logical to
call the "data layer" the layer which is in charge for all the handshake
and which does not yet provide a data layer once established until a
session has allocated all the required buffers.
In fact, it's more a transport layer, which makes much more sense. The
transport layer offers a medium on which data can transit, and it offers
the functions to move these data when the upper layer requests this. And
it is the upper layer which iterates over the transport layer's functions
to move data which should be called the data layer.
The use case where it's obvious is with embryonic sessions : an incoming
SSL connection is accepted. Only the connection is allocated, not the
buffers nor stream interface, etc... The connection handles the SSL
handshake by itself. Once this handshake is complete, we can't use the
data functions because the buffers and stream interface are not there
yet. Hence we have to first call a specific function to complete the
session initialization, after which we'll be able to use the data
functions. This clearly proves that SSL here is only a transport layer
and that the stream interface constitutes the data layer.
A similar change will be performed to rename app_cb => data, but the
two could not be in the same commit for obvious reasons.
We were having several different behaviours with monitor-net and
"mode health" :
- monitor-net on TCP connections was evaluated just after accept(),
did not count a connection on the frontend and were not subject
to tcp-request connection rules, and caused an immediate close().
- monitor-net in HTTP mode was evaluated once the session was
accepted (eg: on top of SSL), returned "HTTP/1.0 200 OK\r\n\r\n"
over the connection's data layer and instanciated a session which
was responsible for closing this connection. A connection AND a
session were counted for the frontend ;
- "mode health" with "option httpchk" would do exactly the same as
monitor-net in HTTP mode ;
- "mode health" without "option httpchk" would do the same as above
except that "OK" was returned instead of "HTTP/1.0 200 OK\r\n\r\n".
None of them took care of cleaning the input buffer, sometimes resulting
in a TCP reset to be emitted after the last packet if a request was received
over the connection.
Given the inconsistencies and the complexity in keeping all these features
handled at the right position, we now slightly changed the way they are
handled :
- all of them are handled just after the "tcp-request connection" rules,
so that all of them may be blocked using such rules, offering more
flexibility and consistency ;
- no connection handshake is performed anymore for non-TCP modes
- all of them send the response as raw data over the socket, there is no
more difference between TCP and HTTP mode for example (these rules were
never meant to be served over SSL connections and were never documented
as able to do that).
- any possible pending data on the incoming socket is drained before the
response is sent, in order to avoid the risk of a reset.
- none of them exactly did what was documented !
This results in more consistent, more flexible and more accurate handling of
monitor rules, with smaller and more robust code.
It appears that fd.h includes a number of unneeded files and was
included from standard.h, and as such served as an intermediary
to provide almost everything to everyone.
By removing its useless includes, a long dependency chain broke
but could easily be fixed.
These flags were added for TCP_CORK. They were only set at various places
but never checked by any user since TCP_CORK was replaced with MSG_MORE.
Simply get rid of this now.
The PROXY protocol is now decoded in the connection before other
handshakes. This means that it may be extracted from a TCP stream
before SSL is decoded from this stream.
When an incoming connection request is accepted, a connection
structure is needed to store its state. However we don't want to
fully initialize a session until the data layer is about to be
ready.
As long as the connection is physically stored into the session,
it's not easy to split both allocations.
As such, we only initialize the minimum requirements of a session,
which results in what we call an embryonic session. Then once the
data layer is ready, we can complete the function's initialization.
Doing so avoids buffers allocation and ensures that a session only
sees ready connections.
The frontend's client timeout is used as the handshake timeout. It
is likely that another timeout will be used in the future.
SSL need to initialize the data layer before proceeding with data. At
the moment, this data layer is automatically initialized from itself,
which will not be possible once we extract connection from sessions
since we'll only create the data layer once the handshake is finished.
So let's have the application layer initialize the data layer before
using it.
Make it more obvious that this function does not depend on any knowledge
of the session. This is important to plan for TCP rules that can run on
connection without any initialized session yet.
These ones are implicitly handled by the connection's data layer, no need
to rely on them anymore and reaching them maintains undesired dependences
on stream-interface.
We need to have the source and destination addresses in the connection.
They were lying in the stream interface so let's move them. The flags
SI_FL_FROM_SET and SI_FL_TO_SET have been moved as well.
It's worth noting that tcp_connect_server() almost does not use the
stream interface anymore except for a few flags.
It has been identified that once we detach the connection from the SI,
it will probably be needed to keep a copy of the server-side addresses
in the SI just for logging purposes. This has not been implemented right
now though.
This is a massive rename of most functions which should make use of the
word "channel" instead of the word "buffer" in their names.
In concerns the following ones (new names) :
unsigned long long channel_forward(struct channel *buf, unsigned long long bytes);
static inline void channel_init(struct channel *buf)
static inline int channel_input_closed(struct channel *buf)
static inline int channel_output_closed(struct channel *buf)
static inline void channel_check_timeouts(struct channel *b)
static inline void channel_erase(struct channel *buf)
static inline void channel_shutr_now(struct channel *buf)
static inline void channel_shutw_now(struct channel *buf)
static inline void channel_abort(struct channel *buf)
static inline void channel_stop_hijacker(struct channel *buf)
static inline void channel_auto_connect(struct channel *buf)
static inline void channel_dont_connect(struct channel *buf)
static inline void channel_auto_close(struct channel *buf)
static inline void channel_dont_close(struct channel *buf)
static inline void channel_auto_read(struct channel *buf)
static inline void channel_dont_read(struct channel *buf)
unsigned long long channel_forward(struct channel *buf, unsigned long long bytes)
Some functions provided by channel.[ch] have kept their "buffer" name because
they are really designed to act on the buffer according to some information
gathered from the channel. They have been moved together to the same place in
the file for better readability but they were not changed at all.
The "buffer" memory pool was also renamed "channel".
Get rid of these confusing BF_* flags. Now channel naming should clearly
be used everywhere appropriate.
No code was changed, only a renaming was performed. The comments about
channel operations was updated.
This flag is quite complex to get right and updating it everywhere is a
major pain, especially since the buffer/channel split. This is the first
step of getting rid of it. Instead now it's dynamically computed whenever
needed.
This flag was very problematic because it was composite in that both changes
to the pipe or to the buffer had to cause this flag to be updated, which is
not always simple (eg: there may not even be a channel attached to a buffer
at all).
There were not that many users of this flags, mostly setters. So the flag got
replaced with a macro which reports whether the channel is empty or not, by
checking both the pipe and the buffer.
One part of the change is sensible : the flag was also part of BF_MASK_STATIC,
which is used by process_session() to rescan all analysers in case the flag's
status changes. At first glance, none of the analysers seems to change its
mind base on this flag when it is subject to change, so it seems fine not to
add variation checks here. Otherwise it's possible that checking the buffer's
output size is more useful than checking the flag's replacement.
Some parts of the sock_ops structure were only used by the stream
interface and have been moved into si_ops. Some of them were callbacks
to the stream interface from the connection and have been moved into
app_cp as they're the application seen from the connection (later,
health-checks will need to use them). The rest has moved to data_ops.
Normally at this point the connection could live without knowing about
stream interfaces at all.
The splicing is now provided by the data-layer rcv_pipe/snd_pipe functions
which in turn are called by the stream interface's recv and send callbacks.
The presence of the rcv_pipe/snd_pipe functions is used to attest support
for splicing at the data layer. It looks like the stream-interface's
SI_FL_CAP_SPLICE flag does not make sense anymore as it's used as a proxy
for the pointers above.
It also appears that we call chk_snd() from the recv callback and then
try to call it again in update_conn(). It is very likely that this last
function will progressively slip into the recv/send callbacks in order
to avoid duplicate check code.
The code works right now with and without splicing. Only raw_sock provides
support for it and it is automatically selected when the various splice
options are set. However it looks like splice-auto doesn't enable it, which
possibly means that the streamer detection code does not work anymore, or
that it's only called at a time where it's too late to enable splicing (in
process_session).
The "raw_sock" prefix will be more convenient for naming functions as
it will be prefixed with the data layer and suffixed with the data
direction. So let's rename the files now to avoid any further confusion.
The #include directive was also removed from a number of files which do
not need it anymore.
At the moment, the struct is still embedded into the struct channel, but
all the functions have been updated to use struct buffer only when possible,
otherwise struct channel. Some functions would likely need to be splitted
between a buffer-layer primitive and a channel-layer function.
Later the buffer should become a pointer in the struct buffer, but doing so
requires a few changes to the buffer allocation calls.
This is a massive rename. We'll then split channel and buffer.
This change needs a lot of cleanups. At many locations, the parameter
or variable is still called "buf" which will become ambiguous. Also,
the "struct channel" is still defined in buffers.h.
Now the connection handler, the handshake callbacks and the I/O callbacks
make use of the connection-layer polling functions to enable or disable
polling on a file descriptor.
Some changes still need to be done to avoid using the FD_WAIT_* constants.
These functions have a more explicity meaning and will offer provisions
for explicit polling.
EV_FD_ISSET() has been left for now as it is still in use in checks.
The socket data layer code must only focus on moving data between a
socket and a buffer. We need a special stream interface handler to
update the stream interface and the file descriptor status.
At the moment the code works but suffers from a race condition caused
by its API : the read/write callbacks still make use of the fd instead
of using the connection. And when a double shutdown is performed, a call
to ->write() after ->read() processed an error results in dereferencing
a NULL fdtab[]->owner. This is only a temporary issue which doesn't need
to be fixed now since this will automatically go away when the functions
change to use the connection instead.
fdtab[].state was only used to know whether a connection was in progress
or an error was encountered. Instead we now use connection->flags to store
a flag for both. This way, connection management will be able to update the
connection status on I/O.
In an attempt to get rid of fdtab[].state, and to move the relevant
parts to the connection struct, we remove the FD_STCLOSE state which
can easily be deduced from the <owner> pointer as there is a 1:1 match.
The destination address is purely a connection thing and not an fd thing.
It's also likely that later the address will be stored into the connection
and linked to by the SI.
struct fdinfo only keeps the pointer to the port range and the local port
for now. All of this also needs to move to the connection but before this
the release of the port range must move from fd_delete() to a new function
dedicated to the connection.
The listener struct is now aware of the socket layer to use upon accept().
At the moment, only sock_raw is supported so this patch should not change
anything.
When the target is a client, it will be convenient to have a pointer to the
original listener so that we can retrieve some configuration information at
the stream interface level.
We start to move everything needed to manage a connection to a special
entity "struct connection". We have the data layer operations and the
control operations there. We'll also have more info in the future such
as file descriptors and applet contexts, so that in the end it becomes
detachable from the stream interface, which will allow connections to
be reused between sessions.
For now on, we start with minimal changes.
It does not make sense anymore to wait for a session creation to process
a TCP monitor check which only closes the connection and returns. Better
to process this immediately after the accept() return. It also saves us
from counting a connection for monitor checks, which is much more logical.
Calling the init() function in sess_establish was a bad idea, it is
too late to allow it to fail on lack of resource and does not help at
all. Remove it for now before it's used.
This flag indicates that we're not interested in keeping half-open
connections on a stream interface. It has the benefit of allowing
the socket layer to cause an immediate write close when detecting
an incoming read close. This releases resources much faster and
saves one syscall (either a shutdown or setsockopt).
This flag is only set by HTTP on the interface going to the server
since we don't want to continue pushing data there when it has
closed.
Another benefit is that it responds with a FIN to a server's FIN
instead of responding with an RST as it used to, which is much
cleaner.
Performance gains of 7.5% have been measured on HTTP connection
rate on empty objects.
These pointers were used to hold pointers to buffers in the past, but
since we introduced the stream interface, they're no longer used but
they were still sometimes set.
Removing them shrink the struct fdtab from 32 to 24 bytes on 32-bit machines,
and from 52 to 36 bytes on 64-bit machines, which is a significant saving. A
quick tests shows a steady 0.5% performance gain, probably due to the better
cache efficiency.
Tunnel timeouts are used when TCP connections are forwarded, or
when forwarding upgraded HTTP connections (WebSocket) as well as
CONNECT requests to proxies.
This timeout allows long-lived sessions to be supported without
having to set large timeouts to normal requests.
In sess_establish, once we've prepared everythin, we can call the socket layer
init function. We pass an argument for targets which have one (eg: servers). At
the moment, the existing socket layers don't have init functions, but SSL will
need one.
Similarly to the previous patch, we don't need the socket-layer functions
outside of stream_interface. They could even move to a file dedicated to
applets, though that does not seem particularly useful at the moment.
We'll soon have an SSL socket layer, and in order to ease the difference
between the two, we use the name "sock_raw" to designate the one which
directly talks to the sockets without any conversion.
All keywords registered using a cfg_kw_list now make use of the new error reporting
framework. This allows easier and more precise error reporting without having to
deal with complex buffer allocation issues.
Last memory poisonning patch immediately made this issue appear.
The unique_id field is released but not properly initialized. The
feature was introduced very recently, no backport is needed.
Commit b22e55bc introduced send_proxy_ofs but forgot to initialize it,
which remained unnoticed since it's always at the same place in the
stream interface. On a machine with dirty RAM returned by malloc(),
some responses were holding a PROXY header, which normally is not
possible.
The problem goes away after properly initializing the field upon each
new session_accept().
This fix does not need to be backported except if any code makes use of
a backport of this feature.
These operators are used regardless of the socket protocol family. Move
them to a "sock_ops" struct. ->read and ->write have been moved there too
as they have no reason to remain at the protocol level.
This is mainly a massive renaming in the code to get it in line with the
calling convention. Next patch will rename a few files to complete this
operation.
Patterns were using a bitmask to indicate if request or response was desired
in fetch functions and keywords. ACLs were using a bitmask in fetch keywords
and a single bit in fetch functions. ACLs were also using an ACL_PARTIAL bit
in fetch functions indicating that a non-final fetch was performed, which was
an abuse of the existing direction flag.
The change now consists in using :
- a capabilities field for fetch keywords => SMP_CAP_REQ/RES to indicate
if a keyword supports requests, responses, both, etc...
- an option field for fetch functions to indicate what the caller expects
(request/response, final/non-final)
The ACL_PARTIAL bit was reversed to get SMP_OPT_FINAL as it's more explicit
to know we're working on a final buffer than on a non-final one.
ACL_DIR_* were removed, as well as PATTERN_FETCH_*. L4 fetches were improved
to support being called on responses too since they're still available.
The <dir> field of all fetch functions was changed to <opt> which is now
unsigned.
The patch is large but mostly made of cosmetic changes to accomodate this, as
almost no logic change happened.
Having the args everywhere will make it easier to share fetch functions
between patterns and ACLs. The only place where we could have needed
the expr was in the http_prefetch function which can do well without.
This one is not needed anymore as we can return the data and its type in the
sample provided by the caller. ACLs now always return the proper type. BOOL
is already returned when the result is expected to be processed as a boolean.
temp_pattern has been unexported now.
The new sample types are necessary for the acl-pattern convergence.
These types are boolean and signed int. Some types were renamed for
less ambiguity (ip->ipv4, integer->uint).
A large number of ACLs make use of frontend, backend or table names in their
arguments, and fall back to the current proxy when no argument is passed. If
the expected capability is not available, the ACL silently fails at runtime.
Now we make all those names mandatory in the parser and we rely on
acl_find_targets() to replace the missing names with the holding proxy,
then to perform the appropriate tests, and to reject errors at parsing
time.
It is possible that some faulty configurations will get rejected from now
on, while they used to silently fail till now. This is the reason why this
change is marked as MAJOR.
Proxy names are now resolved when the config is parsed and not at runtime.
This means that errors will be caught for real instead of having an ACL
silently never match. Another benefit is that the fetch will be much faster
since the lookup will not have to be performed anymore, eg for all ACLs
based on explicitly named stick-tables.
However some buggy configurations which used to silently fail in the past
will now refuse to load, hence the MAJOR tag.
The types and minimal number of ACL keyword arguments are now stored in
their declaration. This will allow many more fantasies if some ACL use
several arguments or types.
Doing so required to rework all ACL keyword declarations to add two
parameters. So this was a good opportunity for a general cleanup and
to sort all entries in alphabetical order.
We still have two pending issues :
- parse_acl_expr() checks for errors but has no way to report them to
the user ;
- the types of some arguments are still not resolved and kept as strings
(eg: ARGT_FE/BE/TAB) for compatibility reasons, which must be resolved
in acl_find_targets()
The ACL parser now uses the argument parser to build a typed argument list.
Right now arguments are all strings and only one argument is supported since
this is what ACLs currently support.
ACLs and patterns only rely on a struct http_msg and don't know the pointer
to the actual data. struct http_msg will soon only hold relative references
so that's not possible. We need http_msg to hold a reference to the struct
buffer before having relative pointers everywhere.
It is likely that doing so will also result in opportunities to simplify
a number of functions arguments. The following functions are already
candidate :
http_buffer_heavy_realign
http_capture_bad_message
http_change_connection_header
http_forward_trailers
http_header_add_tail
http_header_add_tail2
http_msg_analyzer
http_parse_chunk_size
http_parse_connection_header
http_remove_header2
http_send_name_header
http_skip_chunk_crlf
http_upgrade_v09_to_v10
We don't have buf->l anymore. We have buf->i for pending data and
the total length is retrieved by adding buf->o. Some computation
already become simpler.
Despite extreme care, bugs are not excluded.
It's worth noting that msg->err_pos as set by HTTP request/response
analysers becomes relative to pending data and not to the beginning
of the buffer. This has not been completed yet so differences might
occur when outgoing data are left in the buffer.
Too many flags are stored in the transaction structure. Some flags are
clearly message-specific and exist in two versions (request and response).
Move them to a new "flags" field in the http_message struct instead.
There were a few unchecked write() calls in the debug code that cause
gcc 4.x to emit warnings on recent libc. We don't want to check them
as we can't make anything from the result, let's simply surround them
with an empty if statement.
Note that one of the warnings was for chdir("/") which normally cannot
fail since it follows a successful chroot (which means the perms are
necessarily there). Anyway let's move the call uppe to protect it too.
These callbacks are used to retrieve the source and destination address
of a socket. The address flags are not hold on the stream interface and
not on the session anymore. The addresses are collected when needed.
This still needs to be improved to store the IP and port separately so
that it is not needed to perform a getsockname() when only the IP address
is desired for outgoing traffic.
Sometimes it is desirable to forward a particular request to a specific
server without having to declare a dedicated backend for this server. This
can be achieved using the "use-server" rules. These rules are evaluated after
the "redirect" rules and before evaluating cookies, and they have precedence
on them. There may be as many "use-server" rules as desired. All of these
rules are evaluated in their declaration order, and the first one which
matches will assign the server.
I downloaded version 1.4.19 this morning. While merging the code changes
to a custom build that we have here for our project I noticed a typo in
'session.c', in the new code for inserting the server name in the HTTP
header. The fix that I did is shown in the patch below.
[WT: the bug is harmless, it is only suboptimal]
New option "http-send-name-header" specifies the name of a header which
will hold the server name in outgoing requests. This is the name of the
server the connection is really sent to, which means that upon redispatches,
the header's value is updated so that it always matches the server's name.
All ACL fetches which return integer value now store the result into
the temporary pattern struct. All ACL matches which rely on integer
also get their value there.
Note: the pattern data types are not set right now.
It makes no sense to have one pointer to the hdr_idx pool in each proxy
struct since these pools do not depend on the proxy. Let's have a common
pool instead as it is already the case for other types.
Stream interfaces used to distinguish between client and server addresses
because they were previously of different types (sockaddr_storage for the
client, sockaddr_in for the server). This is not the case anymore, and this
distinction is confusing at best and has caused a number of regressions to
be introduced in the process of converting everything to full-ipv6. We can
now remove this and have a much cleaner code.
We already had the ability to kill a connection, but it was only
for the checks. Now we can do this for any session, and for this we
add a specific flag "K" to the logs.