The registerable http_req_rules / http_res_rules used to require a
struct http_txn at the end. It's redundant with struct stream and
propagates very deep into some parts (ie: it was the reason for lua
requiring l7). Let's remove it now.
All of them can now retrieve the HTTP transaction *if it exists* from
the stream and be sure to get NULL there when called with an embryonic
session.
The patch is a bit large because many locations were touched (all fetch
functions had to have their prototype adjusted). The opportunity was
taken to also uniformize the call names (the stream is now always "strm"
instead of "l4") and to fix indent where it was broken. This way when
we later introduce the session here there will be less confusion.
Now this one is dynamically allocated. It means that 280 bytes of memory
are saved per TCP stream, but more importantly that it will become
possible to remove the l7 pointer from fetches and converters since
it will be deduced from the stream and will support being null.
A lot of care was taken because it's easy to forget a test somewhere,
and the previous code used to always trust s->txn for being valid, but
all places seem to have been visited.
All HTTP fetch functions check the txn first so we shouldn't have any
issue there even when called from TCP. When branching from a TCP frontend
to an HTTP backend, the txn is properly allocated at the same time as the
hdr_idx.
The header captures are now general purpose captures since tcp rules
can use them to capture various contents. That removes a dependency
on http_txn that appeared in some sample fetch functions and in the
order by which captures and http_txn were allocated.
Interestingly the reset of the header captures were done at too many
places as http_init_txn() used to do it while it was done previously
in every call place.
Just like for the listener, the frontend is session-wide so let's move
it to the session. There are a lot of places which were changed but the
changes are minimal in fact.
There is now a pointer to the session in the stream, which is NULL
for now. The session pool is created as well. Some parts will move
from the stream to the session now.
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.
In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.
The files stream.{c,h} were added and session.{c,h} removed.
The session will be reintroduced later and a few parts of the stream
will progressively be moved overthere. It will more or less contain
only what we need in an embryonic session.
Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.
Once all changes are completed, we should see approximately this :
L7 - http_txn
L6 - stream
L5 - session
L4 - connection | applet
There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.
Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
This library is designed to emit a zlib-compatible stream with no
memory usage and to favor resource savings over compression ratio.
While zlib requires 256 kB of RAM per compression context (and can only
support 4000 connections per GB of RAM), the stateless compression
offered by libslz does not need to retain buffers between subsequent
calls. In theory this slightly reduces the compression ratio but in
practice it does not have that much of an effect since the zlib
window is limited to 32kB.
Libslz is available at :
http://git.1wt.eu/web?p=libslz.git
It was designed for web compression and provides a lot of savings
over zlib in haproxy. Here are the preliminary results on a single
core of a core2-quad 3.0 GHz in 32-bit for only 300 concurrent
sessions visiting the home page of www.haproxy.org (76 kB) with
the default 16kB buffers :
BW In BW Out BW Saved Ratio memory VSZ/RSS
zlib 237 Mbps 92 Mbps 145 Mbps 2.58 84M / 69M
slz 733 Mbps 380 Mbps 353 Mbps 1.93 5.9M / 4.2M
So while the compression ratio is lower, the bandwidth savings are
much more important due to the significantly lower compression cost
which allows to consume even more data from the servers. In the
example above, zlib became the bottleneck at 24% of the output
bandwidth. Also the difference in memory usage is obvious.
More tests run on a single core of a core i5-3320M, with 500 concurrent
users and the default 16kB buffers :
At 100% CPU (no limit) :
BW In BW Out BW Saved Ratio memory VSZ/RSS hits/s
zlib 480 Mbps 188 Mbps 292 Mbps 2.55 130M / 101M 744
slz 1700 Mbps 810 Mbps 890 Mbps 2.10 23.7M / 9.7M 2382
At 85% CPU (limited) :
BW In BW Out BW Saved Ratio memory VSZ/RSS hits/s
zlib 1240 Mbps 976 Mbps 264 Mbps 1.27 130M / 100M 1738
slz 1600 Mbps 976 Mbps 624 Mbps 1.64 23.7M / 9.7M 2210
The most important benefit really happens when the CPU usage is
limited by "maxcompcpuusage" or the BW limited by "maxcomprate" :
in order to preserve resources, haproxy throttles the compression
ratio until usage is within limits. Since slz is much cheaper, the
average compression ratio is much higher and the input bandwidth
is quite higher for one Gbps output.
Other tests made with some reference files :
BW In BW Out BW Saved Ratio hits/s
daniels.html zlib 1320 Mbps 163 Mbps 1157 Mbps 8.10 1925
slz 3600 Mbps 580 Mbps 3020 Mbps 6.20 5300
tv.com/listing zlib 980 Mbps 124 Mbps 856 Mbps 7.90 310
slz 3300 Mbps 553 Mbps 2747 Mbps 5.97 1100
jquery.min.js zlib 430 Mbps 180 Mbps 250 Mbps 2.39 547
slz 1470 Mbps 764 Mbps 706 Mbps 1.92 1815
bootstrap.min.css zlib 790 Mbps 165 Mbps 625 Mbps 4.79 777
slz 2450 Mbps 650 Mbps 1800 Mbps 3.77 2400
So on top of saving a lot of memory, slz is constantly 2.5-3.5 times
faster than zlib and results in providing more savings for a fixed CPU
usage. For links smaller than 100 Mbps, zlib still provides a better
compression ratio, at the expense of a much higher CPU usage.
Larger input files provide slightly higher bandwidth for both libs, at
the expense of a bit more memory usage for zlib (it converges to 256kB
per connection).
This function used to take a zlib-specific flag as argument to indicate
whether a buffer flush or end of contents was met, let's split it in two
so that we don't depend on zlib anymore.
Thanks to MSIE/IIS, the "deflate" name is ambigous. According to the RFC
it's a zlib-wrapped deflate stream, but IIS used to send only a raw deflate
stream, which is the only format MSIE understands for "deflate". The other
widely used browsers do support both formats. For this reason some people
prefer to emit a raw deflate stream on "deflate" to serve more users even
it that means violating the standards. Haproxy only follows the standard,
so they cannot do this.
This patch makes it possible to have one algorithm name in the configuration
and another one in the protocol. This will make it possible to have a new
configuration token to add a different algorithm so that users can decide if
they want a raw deflate or the standard one.
This class is accessible via the TXN object. It is created only if
the attached proxy have HTTP mode. It contain all the HTTP
manipulation functions:
- req_get_headers
- req_del_header
- req_rep_header
- req_rep_value
- req_add_header
- req_set_header
- req_set_method
- req_set_path
- req_set_query
- req_set_uri
- res_get_headers
- res_del_header
- res_rep_header
- res_rep_value
- res_add_header
- res_set_header
This will be useful later to state that some listeners have to use
certain decoders (typically an HTTP/2 decoder) regardless of the
regular processing applied to other listeners. For now it simply
defaults to the frontend's default target, and it is used by the
session.
Listerner->timeout is a vestigal thing going back to 2007 or so. It
used to only be used by stats and peers frontends to hold a pointer
to the proxy's client timeout. Now that we use regular frontends, we
don't use it anymore.
Some services such as peers and CLI pre-set the target applet immediately
during accept(), and for this reason they're forced to have a dedicated
accept() function which does not even properly follow everything the regular
one does (eg: sndbuf/rcvbuf/linger/nodelay are not set, etc).
Let's store the default target when known into the frontend's config so that
it's session_accept() which automatically sets it.
The default value is stored in a special struct that describe the
map. This default value is parsed with special parser. This is
useless because HAProxy provides a space to store the default
value (the args) and HAProxy provides also standard parser for
the input types (args again).
This patch remove this special storage and replace it by an argument.
In other way the args of maps are declared as the expected type in
place of strings.
This struct used to carry only a sample fetch function. Thanks to
lua_pushuserdata(), we don't need to have the Lua engine allocate
that struct for us and we can simply push our pointer onto the stack.
This makes the code more readable by removing several occurrences of
"f->f->". Just like the previous patch, it comes with the nice effect
of saving about 1.3% of performance when fetching samples from Lua.
Since last cleanups, this one was only used to carry a struct channel.
Removing it makes the code a bit cleaner (no more chn->chn) and easier
to follow (no more abstraction for a common type). Interestingly it
happens to also make the Lua code slightly faster (about 1.5%) when
using channels, probably thanks to less pointer dereferences and maybe
the use of lua_pushlightuserdata().
This new flag "SI_FL_ISBACK" is set only on the back SI and is cleared
on the front SI. That way it's possible only by looking at the SI to
know what side it is.
The channels were pointers to outside structs and this is not needed
anymore since the buffers have moved, but this complicates operations.
Move them back into the session so that both channels and stream interfaces
are always allocated for a session. Some places (some early sample fetch
functions) used to validate that a channel was NULL prior to dereferencing
it. Now instead we check if chn->buf is NULL and we force it to remain NULL
until the channel is initialized.
In some cases we don't want to known if a fetch or converter
fails. We just want a valid string. After this patch, we
have two sets of fetches and two sets of converters. There are:
txn.f, txn.sf, txn.c, txn.sc. The version prefixed by 's' always
returns strings for any type, and returns an empty string in the
error case or when the data are not available. This is particularly
useful when manipulating headers or cookies.
This patch implements a wrapper to give access to the converters
in the Lua code. The converters are used with the transaction.
The automatically created function are prefixed by "conv_".
HAProxy proposes many sample fetches. It is possible that the
automatic registration of the sample fetches causes a collision
with an existing Lua function. This patch sets a namespace for
the sample fetches.
If we are writing in the request buffer, we are not waked up
when the data are forwarded because it is useles. The request
analyzers are waked up only when data is incoming. So, if the
request buffer is full, we set the WAKE_ON_WRITE flag.
Before this patch, each yield in a Lua action set a flags to be
waked up when some activity were detected on the response channel.
This behavior causes loop in the analyzer process.
This patch set the wake up on response buffer activity only if we
really want to be waked up on this activity.
This flag indicate that the current yield is returned by the Lua
execution task control. If this flag is set, the current task may
quit but will be set in the run queue to be re-executed immediatly.
This patch modify the "hlua_yieldk()" function, it adds an argument
that contain a field containing yield options.
This is used to ensure that the task doesn't become a zombie
when the Lua returns a yield. The yield wrapper ensure that an
timer used for waking up the task will be set.
The timer is reseted to TICK_ETERNITY if the Lua execution is
done.
This first patch permits to cofigure the Lua execution exipiration.
This expiration is configured but it is not yet avalaible, it will
be add in a future patch.
In the future, the lua execution must return scheduling informations.
We want more than one flag, so I convert an integer used with an
enum into an interer used as bitfield.
The channel class permits manipulation of channels. A channel is
an FIFO buffer between the client and the server. This class provides
function to read, write, forward, destroy and alter data between
the input and the ouput of the buffer.
This patch adds the TCP I/O functionnality. The class implemented
provides the same functions than the "lua socket" project. This
make network compatibility with another LUA project. The documentation
is located here:
http://w3.impa.br/~diego/software/luasocket/tcp.html
This version of sleep is based on a coroutine. A sleeping
task is started and a signal is registered. This sleep version
must disapear to be replaced by a version using the internal
timers.
This patch adds the browsing of all the HAProxy fetches and
create associated LUA functions. The HAProxy internal fetches
can be used in LUA trough the class "TXN".
Note that the symbols "-", "+" and "." in the name of current
sample fetch are rewrited as "_" in LUA because ".", "-" and "+"
are operators.
This class of functions permit to access to all the functions
associated with the transaction like http header, HAProxy internal
fetches, etc ...
This patch puts the skeleton of this class. The class will be
enhanced later.
This system permits to execute some lua function after than HAProxy
complete his initialisation. These functions are executed between
the end of the configuration parsing and check and the begin of the
scheduler.