There is no reason to inline functions which are used to grab a server
depending on an LB algo. They are large and used at several places.
Uninlining them saves 400 bytes of code.
We can get rid of the stats analyser by moving all the stats code
to a stream interface applet. Above being cleaner, it provides new
advantages such as the ability to process requests and responses
from the same function and work only with simple state machines.
There's no need for any hijack hack anymore.
The direct advantage for the user are the interactive mode and the
ability to chain several commands delimited by a semi-colon. Now if
the user types "prompt", he gets a prompt from which he can send
as many requests as he wants. All outputs are terminated by a
blank line followed by a new prompt, so this can be used from
external tools too.
The code is not very clean, it needs some rework, but some part
of the dirty parts are due to the remnants of the hijack mode used
in the old functions we call.
The old AN_REQ_STATS_SOCK analyser flag is now unused and has been
removed.
It will soon be necessary to have stream interfaces running as part of
the current task, or as independant tasks. For instance when we want to
implement compression or SSL. It will also be used for applets running
as stream interfaces.
These new functions are used to perform exactly that. Note that it's
still not easy to write a simple echo applet and more functions will
likely be needed.
Those two functions did not correctly deal with full buffers and/or
buffers that wrapped around. Buffer_skip() was even able to incorrectly
set buf->w further than the end of buffer if its len argument was wrong,
and buffer_si_getline() was able to incorrectly return a length larger
than the effective buffer data available.
It's important that these functions set these flags themselves, otherwise
the callers will always have to do this, and there is no valid reason for
not doing it.
Collect information about last health check result,
including L7 code if possible (for example http or smtp
return code) and time took to finish last check.
Health check info is provided on both stats pages (html & csv)
and logged when a server is marked UP or DOWN. Currently active
check are marked with an asterisk, but only in html mode.
Currently there are 14 status codes:
UNK -> unknown
INI -> initializing
SOCKERR -> socket error
L4OK -> check passed on layer 4, no upper layers testing enabled
L4TOUT -> layer 1-4 timeout
L4CON -> layer 1-4 connection problem, for example "Connection refused"
(tcp rst) or "No route to host" (icmp)
L6OK -> check passed on layer 6
L6TOUT -> layer 6 (SSL) timeout
L6RSP -> layer 6 invalid response - protocol error
L7OK -> check passed on layer 7
L7OKC -> check conditionally passed on layer 7, for example
404 with disable-on-404
L7TOUT -> layer 7 (HTTP/SMTP) timeout
L7RSP -> layer 7 invalid response - protocol error
L7STS -> layer 7 response error, for example HTTP 5xx
In TCP, we don't want to forward chunks of data, we want to forward
indefinitely. This patch introduces a special value for the amount
of data to be forwarded. When buffer_forward() is called with
BUF_INFINITE_FORWARD, it configures the buffer to never stop
forwarding until the end.
The BF_EMPTY flag was once used to indicate an empty buffer. However,
it was used half the time as meaning the buffer is empty for the reader,
and half the time as meaning there is nothing left to send.
"nothing to send" is only indicated by "->send_max=0 && !pipe". Once
we fix this, we discover that the flag is not used anymore. So the
flags has been renamed BF_OUT_EMPTY and means exactly the condition
above, ie, there is nothing to send.
Doing so has allowed us to remove some unused tests for emptiness,
but also to uncover a certain amount of situations where the flag
was not correctly set or tested.
The BF_WRITE_ENA buffer flag became very complex to deal with, because
it was used to :
- enable automatic connection
- enable close forwarding
- enable data forwarding
The last point was not very true anymore since we introduced ->send_max,
but still the test remained everywhere. This was causing issues such as
impossibility to connect without forwarding data, impossibility to prevent
closing when data was forwarded, etc...
This patch clarifies the situation by getting rid of this multi-purpose
flag and replacing it with :
- data forwarding based only on ->send_max || ->pipe ;
- a new BF_AUTO_CONNECT flag to allow automatic connection and only
that ;
- ability to perform an automatic connection when ->send_max or ->pipe
indicate that data is waiting to leave the buffer ;
- a new BF_AUTO_CLOSE flag to let the producer automatically set the
BF_SHUTW_NOW flag when it gets a BF_SHUTR.
During this cleanup, it was discovered that some tests were performed
twice, or that the BF_HIJACK flag was still tested, which is not needed
anymore since ->send_max replcaed it. These places have been fixed too.
These cleanups have also revealed a few areas where the other flags
such as BF_EMPTY are not cleanly used. This will be an opportunity for
a second patch.
By inlining this function and slightly reordering it, we can double
the getchar/putchar test throughput, and reduce its footprint by about
40 bytes. Also, it was the only non-inlined char-based function, which
now makes it more consistent this time.
This function is used to cut the "tail" of a buffer, which means strip it
to the length of unsent data only, and kill any remaining unsent data. Any
scheduled forwarding is stopped. This is mainly to be used to send error
messages after existing data. It does the same as buffer_erase() for buffers
without pending outgoing data.
The computations in buffer_forward() were only valid if buffer_forward()
was used on a buffer which had no more data scheduled for forwarding.
This is always the case right now so this bug is not yet triggered but
it will soon be. Now we correctly discount the bytes to be forwarded
from the data already present in the buffer.
This function works like a traditional putchar() except that it
can return 0 if the output buffer is full.
Now a basic character-based echo function would look like this, from
a stream interface :
while (1) {
c = buffer_si_peekchar(req);
if (c < 0)
break;
if (!buffer_si_putchar(res, c)) {
si->flags |= SI_FL_WAIT_ROOM;
break;
}
buffer_skip(req, 1);
req->flags |= BF_WRITE_PARTIAL;
res->flags |= BF_READ_PARTIAL;
}
The buffer_si_peekline() function is sort of a fgets() to be used from a
stream interface. It returns a complete line whenever possible, and does
not update the buffer's pointer, so that the reader is free to consume
what it wants to.
buffer_si_peekchar() only returns one character, and also needs a call
to buffer_skip() once the character is definitely consumed.
This functions act like their buffer_write*() counter-parts,
except that they're specifically designed to be used from a
stream interface handler, as they carefully check size limits
and automatically advance the read pointer depending on the
to_forward attribute.
buffer_feed_chunk() is an inline calling buffer_feed() as both
are the sames. For this reason, buffer_write_chunk() has also
been turned into an inline which calls buffer_write().
buffer_contig_space(), buffer_contig_data() and buffer_skip()
provide easy methods to extract/insert data from/into a buffer.
buffer_write() and buffer_write_chunk() currently do not check
max_len nor to_forward, so they will quickly become embarrassing
to use or will need an equivalent. The reason is that they are
used to build error messages which currently are not subject to
analysis.
The first step towards dynamic buffer size consists in removing
all static definitions of the buffer size. Instead, we store a
buffer's size in itself. Right now they're all preinitialized
to BUFSIZE, but we will change that.
The remains of the stats socket code has nothing to do in proto_uxst
anymore and must move to dumpstats. The code is much cleaner and more
structured. It was also an opportunity to rename AN_REQ_UNIX_STATS
as AN_REQ_STATS_SOCK as the stats socket is no longer unix-specific
either.
The last item refering to stats in proto_uxst is the setting of the
task's nice value which should in fact come from the listener.
process_session() is now ready to handle unix stats sockets. This
first step works and old code has not been removed. A cleanup is
required. The stats handler is not unix socket-centric anymore and
should move to dumpstats.c.
The connection establishment was completely handled by backend.c which
normally just handles LB algos. Since it's purely TCP, it must move to
proto_tcp.c. Also, instead of calling it directly, we now call it via
the stream interface, which will later help us unify session handling.
The new statement "persist rdp-cookie" enables RDP cookie
persistence. The RDP cookie is then extracted from the RDP
protocol, and compared against available servers. If a server
matches the RDP cookie, then it gets the connection.
The RDP protocol is quite simple and documented, which permits
an easy detection and extraction of cookies. It can be useful
to match the MSTS cookie which can contain the username specified
by the client.
The HTTP processing has been splitted into 7 steps, one of which
is not anymore HTTP-specific (content-switching). That way, it
becomes possible to use "use_backend" rules in TCP mode. A new
"use_server" directive should follow soon.
Some stream analysers might become generic enough to be called
for several bits. So we cannot have the analyser bit hard coded
into the analyser itself. Let's make the caller inform the callee.
We want to split several steps in HTTP processing so that
we can call individual analysers depending on what processing
we want to perform. The first step consists in splitting the
part that waits for a request from the rest.
This is a first step towards support of multiple configuration files.
Now readcfgfile() only reads a file in memory and performs very minimal
parsing. The checks are performed afterwards.
Some users are already hitting the 64k source port limit when
connecting to servers. The system usually maintains a list of
unused source ports, regardless of the source IP they're bound
to. So in order to go beyond the 64k concurrent connections, we
have to manage the source ip:port lists ourselves.
The solution consists in assigning a source port range to each
server and use a free port in that range when connecting to that
server, either for a proxied connection or for a health check.
The port must then be put back into the server's range when the
connection is closed.
This mechanism is used only when a port range is specified on
a server. It makes it possible to reach 64k connections per
server, possibly all from the same IP address. Right now it
should be more than enough even for huge deployments.
Some users want to keep the max sessions/s seen on servers, frontends
and backends for capacity planning. It's easy to grab it while the
session count is updated, so let's keep it.
These functions will be used to deliver asynchronous signals in order
to make the signal handling functions more robust. The goal is to keep
the same interface to signal handlers.
It's useful to be able to accept an invalid header name in a request
or response but still be able to monitor further such errors. Now,
when an invalid request/response is received and accepted due to
an "accept-invalid-http-{request|response}" option, the invalid
request will be captured for later analysis with "show errors" on
the stats socket.
It's sometimes useful at least for statistics to keep a task count.
It's easy to do by forcing the rare task creators to always use the
same functions to create/destroy a task.
The top of a duplicate tree is not where bit == -1 but at the most
negative bit. This was causing tasks to be queued in reverse order
within duplicates. While this is not dramatic, it's incorrect and
might lead to longer than expected duplicate depths under some
circumstances.
Since we're now able to search from a precise expiration date in
the timer tree using ebtree 4.1, we don't need to maintain 4 trees
anymore. Not only does this simplify the code a lot, but it also
ensures that we can always look 24 days back and ahead, which
doubles the ability of the previous scheduler. Indeed, while based
on absolute values, the timer tree is now relative to <now> as we
can always search from <now>-31 bits.
The run queue uses the exact same principle now, and is now simpler
and a bit faster to process. With these changes alone, an overall
0.5% performance gain was observed.
Tests were performed on the few wrapping cases and everything works
as expected.
In order to get termination flags properly updated, the session was
relying a bit too much on http_return_srv_error() which is http-centric.
A generic srv_error function was implemented in the session in order to
catch all connection abort situations. It was then noticed that a request
abort during a connection attempt was not reported, which is now fixed.
Read and write errors/timeouts were not logged either. It was necessary
to add those tests at 4 new locations.
Now it looks like everything is correctly logged. Most likely some error
checking code could now be removed from some analysers.
Most of the time, task_queue() will immediately return. By extracting
the preliminary checks and putting them in an inline function, we can
significantly reduce the number of calls to the function itself, and
most of the tests can be optimized away due to the caller's context.
Another minor improvement in process_runnable_tasks() consisted in
taking benefit from the processor's branch prediction unit by making
a special case of the process_session() callback which is by far the
most common one.
All this improved performance by about 1%, mainly during the call
from process_runnable_tasks().
Timers are unsigned and used as tree positions. Ticks are signed and
used as absolute date within current time frame. While the two are
normally equal (except zero), it's important not to confuse them in
the code as they are not interchangeable.
We add two inline functions to turn each one into the other.
The comments have also been moved to the proper location, as it was
not easy to understand what was a tick and what was a timer unit.
All the tasks callbacks had to requeue the task themselves, and update
a global timeout. This was not convenient at all. Now the API has been
simplified. The tasks callbacks only have to update their expire timer,
and return either a pointer to the task or NULL if the task has been
deleted. The scheduler will take care of requeuing the task at the
proper place in the wait queue.
In many situations, we wake a task on an I/O event, then queue it
exactly where it was. This is a real waste because we delete/insert
tasks into the wait queue for nothing. The only reason for this is
that there was only one tree node in the task struct.
By adding another tree node, we can have one tree for the timers
(wait queue) and one tree for the priority (run queue). That way,
we can have a task both in the run queue and wait queue at the
same time. The wait queue now really holds timers, which is what
it was designed for.
The net gain is at least 1 delete/insert cycle per session, and up
to 2-3 depending on the workload, since we save one cycle each time
the expiration date is not changed during a wake up.
The rate-limit was applied to the smoothed value which does a special
case for frequencies below 2 events per period. This caused irregular
limitations when set to 1 session per second.
The proper way to handle this is to compute the number of remaining
events that can occur without reaching the limit. This is what has
been added. It also has the benefit that the frequency calculation
is now done once when entering event_accept(), before the accept()
loop, and not once per accept() loop anymore, thus saving a few CPU
cycles during very high loads.
With this fix, rate limits of 1/s are perfectly respected.
With this change, all frontends, backends, and servers maintain a session
counter and a timer to compute a session rate over the last second. This
value will be very useful because it varies instantly and can be used to
check thresholds. This value is also reported in the stats in a new "rate"
column.
The new "show errors" command sent on a unix socket will dump
all captured request and response errors for all proxies. It is
also possible to bound the log to frontends and backends whose
ID is passed as an optional parameter.
The output provides information about frontend, backend, server,
session ID, source address, error type, and error position along
with a complete dump of the request or response which has caused
the error.
If a new error scratches the one currently being reported, then
the dump is aborted with a warning message, and processing goes
on to next error.
Using pipe pools makes pipe management a lot easier. It also allows to
remove quite a bunch of #ifdefs in areas which depended on the presence
or not of support for kernel splicing.
The buffer now holds a pointer to a pipe structure which is always NULL
except if there are still data in the pipe. When it needs to use that
pipe, it dynamically allocates it from the pipe pool. When the data is
consumed, the pipe is immediately released.
That way, there is no need anymore to care about pipe closure upon
session termination, nor about pipe creation when trying to use
splice().
Another immediate advantage of this method is that it considerably
reduces the number of pipes needed to use splice(). Tests have shown
that even with 0.2 pipe per connection, almost all sessions can use
splice(), because the same pipe may be used by several consecutive
calls to splice().
A new data type has been added : pipes. Some pre-allocated empty pipes
are maintained in a pool for users such as splice which use them a lot
for very short times.
Pipes are allocated using get_pipe() and released using put_pipe().
Pipes which are released with pending data are immediately killed.
The struct pipe is small (16 to 20 bytes) and may even be further
reduced by unifying ->data and ->next.
It would be nice to have a dedicated cleanup task which would watch
for the pipes usage and destroy a few of them from time to time.
When CONFIG_HAP_LINUX_SPLICE is defined, the buffer structure will be
slightly enlarged to support information needed for kernel splicing
on Linux.
A first attempt consisted in putting this information into the stream
interface, but in the long term, it appeared really awkward. This
version puts the information into the buffer. The platform-dependant
part is conditionally added and will only enlarge the buffers when
compiled in.
One new flag has also been added to the buffers: BF_KERN_SPLICING.
It indicates that the application considers it is appropriate to
use splicing to forward remaining data.
In the buffers, the read limit used to leave some place for header
rewriting was set by a pointer to the end of the buffer. Not only
this required subtracts at every place in the code, but this will
also soon not be usable anymore when we want to support keepalive.
Let's replace this with a length limit, comparable to the buffer's
length. This has also sightly reduced the code size.
The way the buffers and stream interfaces handled ->to_forward was
really not handy for multiple reasons. Now we've moved its control
to the receive-side of the buffer, which is also responsible for
keeping send_max up to date. This makes more sense as it now becomes
possible to send some pre-formatted data followed by forwarded data.
The following explanation has also been added to buffer.h to clarify
the situation. Right now, tests show that the I/O is behaving extremely
well. Some work will have to be done to adapt existing splice code
though.
/* Note about the buffer structure
The buffer contains two length indicators, one to_forward counter and one
send_max limit. First, it must be understood that the buffer is in fact
split in two parts :
- the visible data (->data, for ->l bytes)
- the invisible data, typically in kernel buffers forwarded directly from
the source stream sock to the destination stream sock (->splice_len
bytes). Those are used only during forward.
In order not to mix data streams, the producer may only feed the invisible
data with data to forward, and only when the visible buffer is empty. The
consumer may not always be able to feed the invisible buffer due to platform
limitations (lack of kernel support).
Conversely, the consumer must always take data from the invisible data first
before ever considering visible data. There is no limit to the size of data
to consume from the invisible buffer, as platform-specific implementations
will rarely leave enough control on this. So any byte fed into the invisible
buffer is expected to reach the destination file descriptor, by any means.
However, it's the consumer's responsibility to ensure that the invisible
data has been entirely consumed before consuming visible data. This must be
reflected by ->splice_len. This is very important as this and only this can
ensure strict ordering of data between buffers.
The producer is responsible for decreasing ->to_forward and increasing
->send_max. The ->to_forward parameter indicates how many bytes may be fed
into either data buffer without waking the parent up. The ->send_max
parameter says how many bytes may be read from the visible buffer. Thus it
may never exceed ->l. This parameter is updated by any buffer_write() as
well as any data forwarded through the visible buffer.
The consumer is responsible for decreasing ->send_max when it sends data
from the visible buffer, and ->splice_len when it sends data from the
invisible buffer.
A real-world example consists in part in an HTTP response waiting in a
buffer to be forwarded. We know the header length (300) and the amount of
data to forward (content-length=9000). The buffer already contains 1000
bytes of data after the 300 bytes of headers. Thus the caller will set
->send_max to 300 indicating that it explicitly wants to send those data,
and set ->to_forward to 9000 (content-length). This value must be normalised
immediately after updating ->to_forward : since there are already 1300 bytes
in the buffer, 300 of which are already counted in ->send_max, and that size
is smaller than ->to_forward, we must update ->send_max to 1300 to flush the
whole buffer, and reduce ->to_forward to 8000. After that, the producer may
try to feed the additional data through the invisible buffer using a
platform-specific method such as splice().
*/
In preparation of splice support, let's add the splice_len member
to the buffer struct. An earlier implementation made it conditional,
which made the whole logics very complex due to a large number of
ifdefs.
Now BF_EMPTY is only set once both buf->l and buf->splice_len are
null. Splice_len is initialized to zero during buffer creation and
is currently not changed, so the whole logics remains unaffected.
When splice gets merged, splice_len will reflect the number of bytes
in flight out of the buffer but not yet sent, typically in a pipe for
the Linux case.
If an analyser sets buf->to_forward to a given value, that many
data will be forwarded between the two stream interfaces attached
to a buffer without waking the task up. The same applies once all
analysers have been released. This saves a large amount of calls
to process_session() and a number of task_dequeue/queue.
By letting the producer tell the consumer there is data to check,
and the consumer tell the producer there is some space left again,
we can cut in half the number of session wakeups.
This is also an important starting point for future splicing support.
Sometimes we don't care about a read timeout, for instance, from the
client when waiting for the server, but we still want the client to
be able to read.
Till now it was done by articially forcing the read timeout to ETERNITY.
But this will cause trouble when we want the low level stream sock to
communicate without waking the session up. So we add a BF_READ_NOEXP
flag to indicate that when the read timeout is to be set, it might
have to be set to ETERNITY.
Since BF_READ_ENA was not used, we replaced this flag.
We don't want to report a buffer timeout if there was I/O activity
for the same events. That way we'll not have to always re-arm timeouts
on I/O, without the fear of a timeout triggering too fast.
For keep-alive, line-mode protocols and splicing, we will need to
limit the sender to process a certain amount of bytes. The limit
is automatically set to the buffer size when analysers are detached
from the buffer.
Kai Krueger found that previous patch was incomplete, because there is
an unconditionnal call to process_srv_queue() in session_free() which
still causes a dead server to consume pending connections from the
backend.
This call was made unconditionnal so that we don't leave unserved
connections in the server queue, for instance connections coming
in with "option persist" which can bypass the server status check.
However, the server must not touch the backend's queue if it is down.
Another fear was that some connections might remain unserved when
the server is using a dynamic maxconn if the number of connections
to the backend is too low. Right now, srv_dynamic_maxconn() ensures
this cannot happen, so the call can remain conditionnal.
The fix consists in allowing a server to process it own queue whatever
its state, but not to touch the backend's queue if it is down. Its
queue should normally be empty when the server is down because it is
redistributed when the server goes down. The only remaining cases are
precisely the persistent connections with "option persist" set, coming
in after the queue has been redispatched. Those ones must still be
processed when a connection terminates.
(cherry picked from commit cd485c4480)
Kai Krueger reported a problem when a server goes down with active
connections. A lot of connections were drained by that server. Kai
did an amazing job at tracking this bug down to the dequeuing
mechanism which forgets to check the server state before allowing
a request to be sent to a server.
The problem occurs more often with long requests, which have a chance
to complete after the server is completely marked down, and to find
requests in the global queue which have not yet been fetched by other
servers.
The fix consists in ensuring that a server is up before sending it
any new request from the queue.
(cherry picked from commit 80b286a064)
(cherry picked from commit 2e5e0d2853f059a1d09dc81fdbbad9fd03124a98)
There is a problem when an instance is marked "disabled". Its ports are
still bound but will not be unbound upon termination. This causes processes
to accumulate during soft restarts, and might even cause failures to restart
new ones due to the inability to bind to the same port.
The ideal solution would be to bind all ports at the end of the configuration
parsing. An acceptable workaround is to unbind all listeners of disabled
proxies. This is what the current patch does.
(cherry picked from commit a944218e9c)
(cherry picked from commit 8cfebbb82b87345bade831920177077e7d25840a)
It is now possible to list all known sessions by issuing "show sess"
on the unix stats socket. The format is not much evolved but it is
very useful for debugging.
The doc has been updated to reflect the new keyword.
Both should process the response buffer equally. They now both
clear the hijack bit once done, and both receive a pointer to
the response buffer in their arguments.
Instead of calling a hard-coded function to produce data, let's
reference this function into the buffer and call it from there
when BF_HIJACK is set. This goes in the direction of more generic
session management code.
The unix protocol handler had not been updated during the last
stream_sock changes. This has been done now. There is still a
lot of duplicated code between session.c and proto_uxst.c due
to the way the session is handled. Session.c relies on the existence
of a frontend while it does not exist here.
It is easier to see the difference between the stats part (placed
in dumpstats.c) and the unix-stream part (in proto_uxst.c).
The hijacking function still needs to be dynamically set into the
response buffer, and some cleanup is still required, then all those
changes should be forward-ported to the HTTP part. Adding support
for new keywords should not cause trouble now.
The TCP analyser has moved to proto_tcp.c. Breaking the function
has required finer use of the return value and adding some tests
to process_session().
It was a bit awkward to have session.c call return_srv_error() for
HTTP error messages related to servers. The function has been adapted
to be passed a pointer to the faulty stream interface, and is now a
pointer in the session. It is possible that in the future, it will
become a callback in the stream interface itself.
The new function looks like the previous one except that it operates
at the stream interface level and assumes an already closed SI.
Also remove some old unused occurrences of srv_close_with_err().
proto_http.c was not suitable for session-related processing, it was
just convenient for the tranformation.
Some more splitting must occur: process_request/response in proto_http.c
must be split again per protocol, and the caller must run a list.
Some functions should be directly attached to the session or the buffer
(eg: perform_http_redirect, return_srv_error, http_sess_log).
All the processing has now completely been split in layers. As of
now, everything is still in process_session() which is not the right
place, but the code sequence works. Timeouts, retries, errors, all
work.
The shutdown sequence has been strictly applied: BF_SHUTR/BF_SHUTW
are only assigned by lower layers. Upper layers can only indicate
their wish to close using BF_SHUTR_NOW and BF_SHUTW_NOW.
When a shutdown is performed on a stream interface, the buffer flags
are updated accordingly and re-checked by upper layers. A lot of care
has been taken to ensure that aborts during intermediate connection
setups are correctly handled and shutdowns correctly propagated to
both buffers.
A future evolution would consist in ensuring that BF_SHUT?_NOW may
be set at any time, and applies only when the buffer is empty. This
might help with error messages, but might complicate the processing
of data remaining in buffers.
Some useless buffer flag combinations have been removed.
Stat counters are still broken (eg: per-server total number of sessions).
Error messages should be delayed to the close instant and be produced by
protocol.
Many functions must now move to proper locations.
Now the global variable 'sessions' will be a dual-linked list of all
known sessions. The list element is set at the beginning of the session
so that it's easier to follow them all with gdb.
Two new functions are used instead : buffer_check_{shutr,shutw}.
It is indeed more adequate to check for new closures only when the
buffer reports them.
Several remaining unclosed connections were detected after a test,
even before this patch, so a bug remains. To reproduce, try the
following during 30 seconds :
inject30l4 -n 20000 -l -t 1000 -P 10 -o 4 -u 100 -s 100 -G 127.0.0.1:8000/
Tracking connection status changes was hard, and some code was
redundant. A new SI_ST_CER state was added to the stream interface
to indicate a past connection error, and an SI_FL_ERR flag was
added to report past I/O error. The stream_sock code does not set
the connection to SI_ST_CLO anymore in case of I/O error, it's
the upper layer which does it. This makes it possible to know
exactly when the file descriptors are allocated.
The new SI_ST_CER state permitted to split tcp_connection_status()
in two parts, one processing SI_ST_CON and the other one SI_ST_CER.
Synchronous connection errors now make use of this last state, hence
eliminating duplicate code.
Some ib<->ob copy paste errors were found and fixed, and all entities
setting SI_ST_CLO also shut the buffers down.
Some of these stream_interface specific functions and structures
have migrated to a new stream_interface.c file.
Some types of errors are still not detected by the buffers. For
instance, let's assume the following scenario in one single pass
of process_session: a connection sits in SI_ST_TAR state during
a retry. At TAR expiration, a new connection attempt is made, the
connection is obtained and srv->cur_sess is increased. Then the
buffer timeout is fires and everything is cleared, the new state
becomes SI_ST_CLO. The cleaning code checks that previous state
was either SI_ST_CON or SI_ST_EST to release the connection. But
that's wrong because last state is still SI_ST_TAR. So the
server's connection count does not get decreased.
This means that prev_state must not be used, and must be replaced
by some transition detection instead of level detection.
The following debugging line was useful to track state changes :
fprintf(stderr, "%s:%d: cs=%d ss=%d(%d) rqf=0x%08x rpf=0x%08x\n", __FUNCTION__, __LINE__,
s->si[0].state, s->si[1].state, s->si[1].err_type, s->req->flags, s-> rep->flags);
The connection setup code has been refactored in order to
make it run only on low level (stream interface). Several
complicated functions have been removed from backend.c,
and we now have sess_update_stream_int() to manage
an assigned connection, sess_prepare_conn_req() to assign a
server to a connection request, perform_http_redirect() to
redirect instead of connecting to server, and return_srv_error()
to return connection error status messages.
The stream_interface status changes are checked before adjusting
buffer flags, so that the buffers can be informed about this lower
level update.
A new connection is initiated by changing si->state from SI_ST_INI
to SI_ST_REQ.
The code seems to work but is awfully dirty. Some functions need
to be moved, and the layering is not yet quite clear.
A lot of dead old code has simply been removed.
Those entries were really needed for cleaner and better code. Using them
has permitted to automatically close a file descriptor during a shut write,
reducing by 20% the number of calls to process_session() and derived
functions.
Process_session() does not need to know the file descriptor anymore, though
it still remains very complicated due to the special case for the connect
mode.
As of now, a stream socket does not directly wake up the task
but it does contact the stream interface which itself knows the
task. This allows us to perform a few cleanups upon errors and
shutdowns, which reduces the number of calls to data_update()
from 8 per session to 2 per session, and make all the functions
called in the process_session() loop completely swappable.
Some improvements are required. We need to provide a shutw()
function on stream interfaces so that one side which closes
its read part on an empty buffer can propagate the close to
the remote side.
It's very frequent to require some information about the
reason why a task is running. Some flags have been added
so that a task now knows if it got woken up due to I/O
completion, timeout, etc...
A test has shown that more than 16% of the calls to task_wakeup()
could be avoided because the task is already woken up. So make it
inline and move the test to the inline part.
The buffer flags became a big bazaar. Re-arrange them
so that their names are more explicit and so that they
are more easily readable in hex form. Some aggregates
have also been adjusted.
It was a waste to constantly update the file descriptor's status
and timeouts during a flags update. So stream_sock_process_data
has been slit in two parts :
stream_sock_data_update() => computes updated flags
stream_sock_data_finish() => computes timeouts
Only the first one is called during flag updates. The second one
is only called upon completion. The number of calls to fd_set/fd_clr
has now significantly dropped.
Also, it's useless to check for errors and timeouts in the
process_session() loop, it's enough to check for them at the
beginning.
srv_state has been removed from HTTP state machines, and states
have been split in either TCP states or analyzers. For instance,
the TARPIT state has just become a simple analyzer.
New flags have been added to the struct buffer to compensate this.
The high-level stream processors sometimes need to force a disconnection
without touching a file-descriptor (eg: report an error). But if
they touched BF_SHUTW or BF_SHUTR, the file descriptor would not
be closed. Thus, the two SHUT?_NOW flags have been added so that
an application can request a forced close which the stream interface
will be forced to obey.
During this change, a new BF_HIJACK flag was added. It will
be used for data generation, eg during a stats dump. It
prevents the producer on a buffer from sending data into it.
BF_SHUTR_NOW /* the producer must shut down for reads ASAP */
BF_SHUTW_NOW /* the consumer must shut down for writes ASAP */
BF_HIJACK /* the producer is temporarily replaced */
BF_SHUTW_NOW has precedence over BF_HIJACK. BF_HIJACK has
precedence over BF_MAY_FORWARD (so that it does not need it).
New functions buffer_shutr_now(), buffer_shutw_now(), buffer_abort()
are provided to manipulate BF_SHUT* flags.
A new type "stream_interface" has been added to describe both
sides of a buffer. A stream interface has states and error
reporting. The session now has two stream interfaces (one per
side). Each buffer has stream_interface pointers to both
consumer and producer sides.
The server-side file descriptor has moved to its stream interface,
so that even the buffer has access to it.
process_srv() has been split into three parts :
- tcp_get_connection() obtains a connection to the server
- tcp_connection_failed() tests if a previously attempted
connection has succeeded or not.
- process_srv_data() only manages the data phase, and in
this sense should be roughly equivalent to process_cli.
Little code has been removed, and a lot of old code has been
left in comments for now.
It is not always convenient to run checks on req->l in functions to
check if a buffer is empty or full. Now the stream_sock functions
set flags BF_EMPTY and BF_FULL according to the buffer contents. Of
course, functions which touch the buffer contents adjust the flags
too.