The sock_ops I/O callbacks made use of an FD till now. This has become
inappropriate and the struct connection is much more useful. It also
fixes the race condition introduced by previous change.
The socket data layer code must only focus on moving data between a
socket and a buffer. We need a special stream interface handler to
update the stream interface and the file descriptor status.
At the moment the code works but suffers from a race condition caused
by its API : the read/write callbacks still make use of the fd instead
of using the connection. And when a double shutdown is performed, a call
to ->write() after ->read() processed an error results in dereferencing
a NULL fdtab[]->owner. This is only a temporary issue which doesn't need
to be fixed now since this will automatically go away when the functions
change to use the connection instead.
Use a single tcp_connect_probe() instead of tcp_connect_write() and
tcp_connect_read(). We call this one only when no data layer function
have been processed, so this is a fallback to test for completion of
a connection attempt.
With this done, we don't have the need for any direct I/O callback
anymore.
The function still relies on ->write() to wake the stream interface up,
so it's not finished.
This handshake handler must be independant, so move it away from
proto_tcp. It has a dedicated connection flag. It is tested before
I/O handlers and automatically removes the CO_FL_WAIT_L4_CONN flag
upon success.
It also sets the BF_WRITE_NULL flag on the stream interface and
stops the SI timeout. However it does not perform the task_wakeup(),
and relies on the data handler to do so for now. The SI wakeup will
have to be moved elsewhere anyway.
It's inappropriate to remove FD_POLL_IN and FD_POLL_OUT in the IO callback
handlers, first because they shouldn't care about this, and second because
it will make it harder to chain multiple callers.
So let's flush these flags only once for all in the connection handler.
Right now, the HUP and ERR flags are still flushed in each IO handler to
avoid multiple calls. This will probably have to be fixed later.
fdtab[].state was only used to know whether a connection was in progress
or an error was encountered. Instead we now use connection->flags to store
a flag for both. This way, connection management will be able to update the
connection status on I/O.
This test is present only in this poller as an optimization, but this
optimization adds some complexity to remove fdtab[].state. Let's get
rid of it for now.
In an attempt to get rid of fdtab[].state, and to move the relevant
parts to the connection struct, we remove the FD_STCLOSE state which
can easily be deduced from the <owner> pointer as there is a 1:1 match.
The correct spelling is "independent", not "independant". This patch
fixes the doc and the configuration parser to accept the correct form.
The config parser still allows the old naming for backwards compatibility.
I came across a couple of typos in configuration.txt and made this patch.
Also, there is an inconsistency between using the word analys/ze in
configuration.txt as well. However, I did not provide a patch for that.
-- Jamie Gloudon
[wt: won't fix the us/uk language mistakes, they'll always exist anyway]
This is used to enter values for stick tables. The most likely usage
is to set gpc0 for a specific IP address in order to block traffic
for abusers without having to reload. Since all data types are
supported, other usages are possible (eg: replace a users's assigned
server).
Right now we only support show/clear on a table. In order to introduce
the "set" keyword we need to get rid of the "show" boolean arg. There
is no functional change up to this commit.
Source addresses of non-TCP families were not correctly handled by
tcp_src_to_stktable_key() as it forgot to return NULL and instead left
the previous value in the stick-table buffer.
This bug is 1.5-specific and was introduced by commit 4f92d320 in 1.5-dev6
so it does not need any backport.
fdtab[].ev was only set in ev_sepoll. Unfortunately, some I/O handling
functions now rely on this, so depending on the polling mechanism, some
useless operations might have been performed, such as performing a useless
recv() when a HUP was reported.
This is a very old issue, the flags were only added to the fdtab and not
propagated into any poller. Then they were used in ev_sepoll which needed
them for the cache. It is unsure whether a backport to 1.4 is appropriate
or not.
Commit fa7e1025 (1.3.16-rc1) introduced a minor bug by comparing req->flags
with BF_READ_ERROR instead of checking for the bit. The result is that the
error message is always returned even in case of client error. This has no
real impact but this must be fixed.
It may be backported to 1.4 and 1.3.
If haproxy is built with support for USE_VSYSCALL_DLSYM, it's very
easy to check for KML availability. So let's enable it. Tests show
a small overall performance improvement around 1%. Other tests show
that the syscall overhead is divided by 4 on a Geode LX using this
method.
This one returns the concatenation of the first Host header entry with
the path. It can make content-switching rules easier, help with fighting
DDoS on certain URLs and improve shared caches efficiency.
Doing so allows us to support sticking on URL, URL's IP, URL's port and
path.
Both fetch functions should be improved to support an optional depth
allowing to stick to a server depending on just a few directory
components. This would help with portals, some prefetch-capable
caches and with outgoing connections using multiple internet links.
Default value for maxconn in the context of a proxy is 2000 and is
unrelated to any other value (like global ulimit-n or global
maxconn). Without an explicit a user may think that the default value
is either no limit or equal to the global maxconn value.
Commit 496aa0 fixed a design issue by adding an "unresolved" flag to the
ACL arguments. Unfortunately this unresolved flag was not set when building
the fake argument some ACL need when using an implicit argument pointing to
the local proxy.
Special thanks to Michael Kearey who reported the issue with a reproducer
and the commit introducing the bug.
Using posix_fadvise() it is possible to tell the system that we're
going to read a whole file at once. The kernel then doubles the
read-ahead size for this file. On Linux with an SSD, this has improved
cold-cache performance by around 20%. Hot-cache is not affected at all.
glibc-2.11 on x86_64 provides a machine-specific memchr() which is faster
than the generic C implementation by around 40%, so let's make it possible
to use it instead of the hand-coded version.
Otherwise we end up comparing the byte past the end, resulting
in duplicate values still being inserted into the tree even if
undesired.
This generally has low impact, though it can sometimes cause one new entry
to be added next to an existing one for stick tables, preventing the results
from being merged.
(cherry picked from commit 12e54ac493a91bb02064568f410592c2700d3933)
This version implements both 32 and 64 bit versions at once, it
avoids the need to have two separate output files. It also improves
efficiency on i386 platforms by adding a little bit of assembly where
gcc isn't efficient.
The destination address is purely a connection thing and not an fd thing.
It's also likely that later the address will be stored into the connection
and linked to by the SI.
struct fdinfo only keeps the pointer to the port range and the local port
for now. All of this also needs to move to the connection but before this
the release of the port range must move from fd_delete() to a new function
dedicated to the connection.
Commit 827aee91 merged in 1.5-dev5 introduced a regression causing
the srv pointer to be tested twice instead of srv then srv->cookie.
The result is that if a server has no cookie in prefix mode, haproxy
will crash when trying to modify it.
Such a config is very unlikely to happen, except maybe with a backup
server, which would cause haproxy to die with the last server in the
farm.
No backport is needed, only 1.5-dev was affected.
Released version 1.5-dev11 with the following main changes :
- BUG/MEDIUM: option forwardfor if-none doesn't work with some configurations
- BUG/MAJOR: trash must always be the size of a buffer
- DOC: fix minor regex example issue and improve doc on stats
- MINOR: stream_interface: add a pointer to the listener for TARG_TYPE_CLIENT
- MEDIUM: protocol: add a pointer to struct sock_ops to the listener struct
- MINOR: checks: add on-marked-up option
- MINOR: balance uri: added 'whole' parameter to include query string in hash calculation
- MEDIUM: stream_interface: remove the si->init
- MINOR: buffers: add a rewind function
- BUG/MAJOR: fix regression on content-based hashing and http-send-name-header
- MAJOR: http: stop using msg->sol outside the parsers
- CLEANUP: http: make it more obvious that msg->som is always null outside of chunks
- MEDIUM: http: get rid of msg->som which is not used anymore
- MEDIUM: http: msg->sov and msg->sol will never wrap
- BUG/MAJOR: checks: don't call set_server_status_* when no LB algo is set
- BUG/MINOR: stop connect timeout when connect succeeds
- REORG: move the send-proxy code to tcp_connect_write()
- REORG/MINOR: session: detect the TCP monitor checks at the protocol accept
- MINOR: stream_interface: introduce a new "struct connection" type
- REORG/MINOR: stream_interface: move si->fd to struct connection
- REORG/MEDIUM: stream_interface: move applet->state and private to connection
- MINOR: stream_interface: add a data channel close function
- MEDIUM: stream_interface: call si_data_close() before releasing the si
- MINOR: peers: use the socket layer operations from the peer instead of sock_raw
- BUG/MINOR: checks: expire on timeout.check if smaller than timeout.connect
- MINOR: add a new function call tracer for debugging purposes
- BUG/MINOR: perform_http_redirect also needs to rewind the buffer
- BUG/MAJOR: b_rew() must pass a signed offset to b_ptr()
- BUG/MEDIUM: register peer sync handler in the proper order
- BUG/MEDIUM: buffers: fix bi_putchr() to correctly advance the pointer
- BUG/MINOR: fix option httplog validation with TCP frontends
- BUG/MINOR: log: don't report logformat errors in backends
- REORG/MINOR: use dedicated proxy flags for the cookie handling
- BUG/MINOR: config: do not report twice the incompatibility between cookie and non-http
- MINOR: http: add support for "httponly" and "secure" cookie attributes
- BUG/MEDIUM: ensure that unresolved arguments are freed exactly once
- BUG/MINOR: commit 196729ef used wrong condition resulting in freeing constants
- MEDIUM: stats: add support for soft stop/soft start in the admin interface
- MEDIUM: stats: add the ability to kill sessions from the admin interface
- BUILD: add support for linux kernels >= 2.6.28
It was not possible to kill remaining sessions from the admin interface,
which is annoying especially when switching to maintenance mode. Now it's
possible.
This implements the feature discussed in the earlier thread of killing
connections on backup servers when a non-backup server comes back up. For
example, you can use this to route to a mysql master & slave and ensure
clients don't stay on the slave after the master goes from down->up. I've done
some minimal testing and it seems to work.
[WT: added session flag & doc, moved the killing after logging the server UP,
and ensured that the new server is really usable]
When passing arguments to ACLs and samples, some types are stored as
strings then resolved later after config parsing is done. Upon exit,
the arguments need to be freed only if the string was not resolved
yet. At the moment we can encounter double free during deinit()
because some arguments (eg: userlists) are freed once as their own
type and once as a string.
The solution consists in adding an "unresolved" flag to the args to
say whether the value is still held in the <str> part or is final.
This could be debugged thanks to a useful bug report from Sander Klein.