Commit Graph

3803 Commits

Author SHA1 Message Date
Willy Tarreau
25002d206b MINOR: polling: create function fd_compute_new_polled_status()
This function is used to compute the new polling state based on
the previous state. All pollers have to do this in their update
loop, so better centralize the logic for it.
2014-01-26 00:42:32 +01:00
Willy Tarreau
e852545594 MEDIUM: polling: centralize polled events processing
Currently, each poll loop handles the polled events the same way,
resulting in a lot of duplicated, complex code. Additionally, epoll
was the only one to handle newly created FDs immediately.

So instead, let's move that code to fd.c in a new function dedicated
to this task : fd_process_polled_events(). All pollers now use this
function.
2014-01-26 00:42:32 +01:00
Willy Tarreau
6c11bd2f89 OPTIM: raw-sock: don't speculate after a short read if polling is enabled
This is the reimplementation of the "done" action : when we experience
a short read, we're almost certain that we've exhausted the system's
buffers and that we'll meet an EAGAIN if we attempt to read again. If
the FD is not yet polled, the stream interface already takes care of
stopping the speculative read. When the FD is already being polled, we
have two options :
  - either we're running from a level-triggered poller, in which case
    we'd rather report that we've reached the end so that we don't
    speculate over the poller and let it report next time data are
    available ;

  - or we're running from an edge-triggered poller in which case we
    have no choice and have to see the EAGAIN to re-enable events.

At the moment we don't have any edge-triggered poller, so it's desirable
to avoid speculative I/O that we know will fail.

Note that this must not be ported to SSL since SSL hides the real
readiness of the file descriptor.

Thanks to this change, we observe no EAGAIN anymore during keep-alive
transfers, and failed recvfrom() are reduced by half in http-server-close
mode (the client-facing side is always being polled and the second recv
can be avoided). Doing so results in about 5% performance increase in
keep-alive mode. Similarly, we used to have up to about 1.6% of EAGAIN
on accept() (1/maxaccept), and these have completely disappeared under
high loads.
2014-01-26 00:42:32 +01:00
Willy Tarreau
baf5b9b445 CLEANUP: connection: fix comments in connection.h to reflect new behaviour.
The polling has substantially changed, better fix the comments.
2014-01-26 00:42:31 +01:00
Willy Tarreau
aad69387ac CLEANUP: connection: use conn_xprt_ready() instead of checking the flag
It's easier and safer to rely on conn_xprt_ready() everywhere than to
check the flag itself. It will also simplify adding extra checks later
if needed. Some useless controls for !xprt have been removed, as the
XPRT_READY flag itself guarantees xprt is set.
2014-01-26 00:42:31 +01:00
Willy Tarreau
3c72872da1 CLEANUP: connection: use conn_ctrl_ready() instead of checking the flag
It's easier and safer to rely on conn_ctrl_ready() everywhere than to
check the flag itself. It will also simplify adding extra checks later
if needed. Some useless controls for !ctrl have been removed, as the
CTRL_READY flag itself guarantees ctrl is set.
2014-01-26 00:42:31 +01:00
Willy Tarreau
d8375891fc MEDIUM: connection: no need to recheck FD state
We already have everything in the connection flags using the
CO_FL_DATA_*_ENA bits combined with the fd's ready state, so
we do not need to check fdtab[fd].ev anymore. This considerably
simplifies the connection handling logic since it doesn't
have to mix connection flags with past polling states.
2014-01-26 00:42:31 +01:00
Willy Tarreau
798c3c9c41 MINOR: stream-interface: no need to call fd_stop_both() on error
We don't need to call fd_stop_both() since we already call
conn_cond_update_polling() which will do it. This call was introduced by
commit d29a066 ("BUG/MAJOR: connection: always recompute polling status
upon I/O").
2014-01-26 00:42:31 +01:00
Willy Tarreau
708e717251 MEDIUM: stream-interface: the polling flags must always be updated in chk_snd_conn
We used to only update the polling flags in data phase, but after that
we could update other flags. It does not seem possible to trigger a
bug here but it's not very safe either. Better always keep them up to
date.
2014-01-26 00:42:30 +01:00
Willy Tarreau
fd803bb4d7 MEDIUM: connection: add check for readiness in I/O handlers
The recv/send callbacks must check for readiness themselves instead of
having their callers do it. This will strengthen the test and will also
ensure we never refrain from calling a handshake handler because a
direction is being polled while the other one is ready.
2014-01-26 00:42:30 +01:00
Willy Tarreau
e1f50c4b02 MEDIUM: connection: remove conn_{data,sock}_poll_{recv,send}
We simply remove these functions and replace their calls with the
appropriate ones :

  - if we're in the data phase, we can simply report wait on the FD
  - if we're in the socket phase, we may also have to signal the
    desire to read/write on the socket because it might not be
    active yet.
2014-01-26 00:42:30 +01:00
Willy Tarreau
310987a038 MAJOR: connection: remove the CO_FL_WAIT_{RD,WR} flags
These flags were used to report the readiness of the file descriptor.
Now this readiness is directly checked at the file descriptor itself.
This removes the need for constantly synchronizing updates between the
file descriptor and the connection and ensures that all layers share
the same level of information.

For now, the readiness is updated in conn_{sock,data}_poll_* by directly
touching the file descriptor. This must move to the lower layers instead
so that these functions can disappear as well. In this state, the change
works but is incomplete. It's sensible enough to avoid making it more
complex.

Now the sock/data updates become much simpler because they just have to
enable/disable access to a file descriptor and not to care anymore about
its readiness.
2014-01-26 00:42:30 +01:00
Willy Tarreau
f817e9f473 MAJOR: polling: rework the whole polling system
This commit heavily changes the polling system in order to definitely
fix the frequent breakage of SSL which needs to remember the last
EAGAIN before deciding whether to poll or not. Now we have a state per
direction for each FD, as opposed to a previous and current state
previously. An FD can have up to 8 different states for each direction,
each of which being the result of a 3-bit combination. These 3 bits
indicate a wish to access the FD, the readiness of the FD and the
subscription of the FD to the polling system.

This means that it will now be possible to remember the state of a
file descriptor across disable/enable sequences that generally happen
during forwarding, where enabling reading on a previously disabled FD
would result in forgetting the EAGAIN flag it met last time.

Several new state manipulation functions have been introduced or
adapted :
  - fd_want_{recv,send} : enable receiving/sending on the FD regardless
    of its state (sets the ACTIVE flag) ;

  - fd_stop_{recv,send} : stop receiving/sending on the FD regardless
    of its state (clears the ACTIVE flag) ;

  - fd_cant_{recv,send} : report a failure to receive/send on the FD
    corresponding to EAGAIN (clears the READY flag) ;

  - fd_may_{recv,send}  : report the ability to receive/send on the FD
    as reported by poll() (sets the READY flag) ;

Some functions are used to report the current FD status :

  - fd_{recv,send}_active
  - fd_{recv,send}_ready
  - fd_{recv,send}_polled

Some functions were removed :
  - fd_ev_clr(), fd_ev_set(), fd_ev_rem(), fd_ev_wai()

The POLLHUP/POLLERR flags are now reported as ready so that the I/O layers
knows it can try to access the file descriptor to get this information.

In order to simplify the conditions to add/remove cache entries, a new
function fd_alloc_or_release_cache_entry() was created to be used from
pollers while scanning for updates.

The following pollers have been updated :

   ev_select() : done, built, tested on Linux 3.10
   ev_poll()   : done, built, tested on Linux 3.10
   ev_epoll()  : done, built, tested on Linux 3.10 & 3.13
   ev_kqueue() : done, built, tested on OpenBSD 5.2
2014-01-26 00:42:30 +01:00
Willy Tarreau
033cd9d78c REORG: polling: rename "fd_process_spec_events()" to "fd_process_cached_events()"
This is in order to be coherent with the rest.
2014-01-26 00:42:29 +01:00
Willy Tarreau
899d95757e REORG: polling: rename the cache allocation functions
- alloc_spec_entry() becomes fd_alloc_cache_entry()
- release_spec_entry() becomes fd_release_cache_entry()
2014-01-26 00:42:29 +01:00
Willy Tarreau
16f649c82c REORG: polling: rename "fd_spec" to "fd_cache"
So fd_spec was renamed "fd_cache" as it's becoming an event cache, and
fd_nbspec becomes fd_cache_num.
2014-01-26 00:42:29 +01:00
Willy Tarreau
15a4dec87e REORG: polling: rename "spec_e" to "state" and "spec_p" to "cache"
We're completely changing the way FDs will be polled. There will be no
more speculative I/O since we'll know the exact FD state, so these will
only be cached events.

First, let's fix a few field names which become confusing. "spec_e" was
used to store a speculative I/O event state. Now we'll store the whole
R/W states for the FD there. "spec_p" was used to store a speculative
I/O cache position. Now let's clearly call it "cache".
2014-01-26 00:42:29 +01:00
Willy Tarreau
8e84c637d1 DOC: add a diagram showing polling state transitions
This is internal stuff.
2014-01-26 00:42:29 +01:00
Willy Tarreau
69a41fa8a3 CLEANUP: polling: rename "spec_e" to "state"
We're completely changing the way FDs will be polled. First, let's fix
a few field names which become confusing. "spec_e" was used to store a
speculative I/O event state. Now we'll store the whole R/W states for
the FD there.
2014-01-26 00:42:28 +01:00
Willy Tarreau
f1ed327a7a BUILD: fix VERDATE exclusion regex
A backslash was missing. It used to work well with GNU grep anyway but
better fix it.
2014-01-26 00:39:22 +01:00
Willy Tarreau
e6300be8f8 BUG/MEDIUM: stream-interface: don't wake the task up before end of transfer
Recent commit d7ad9f5 ("MAJOR: channel: add a new flag CF_WAKE_WRITE to
notify the task of writes") was not correct. It used to wake up the task
as soon as there was some write activity and the flag was set, even if there
were still some data to be forwarded. This resulted in process_session()
being called a lot when transfering chunk-encoded HTTP responses made of
very large chunks.

The purpose of the flag is to wake up only a task waiting for some
room and not the other ones, so it's totally counter-productive to
wake it up as long as there are data to forward because the task
will not be allowed to write anyway.

Also, the commit above was taking some risks by not considering
certain events anymore (eg: state != SI_ST_EST). While such events
are not used at the moment, if some new features were developped
in the future relying on these, it would be better that they could
be notified when subscribing to the WAKE_WRITE event, so let's
restore the condition.
2014-01-25 22:28:22 +01:00
Willy Tarreau
4afd70aeab BUG/MAJOR: fix freezes during compression
Recent commit d7ad9f5 ("MAJOR: channel: add a new flag CF_WAKE_WRITE to
notify the task of writes") introduced this new CF_WAKE_WRITE flag that
an analyser which requires some free space to write must set if it wants
to be notified.

Unfortunately, some places were missing. More specifically, the
compression engine can rarely be stuck by a lack of output space,
especially when dealing with non-compressible data. It then has to
stop until some pending data are flushed and for this it must set
the CF_WAKE_WRITE flag. But these cases were missed by the commit
above.

Fortunately, this change was introduced very recently and never
released, so the impact was limited.

Huge thanks to Sander Klein who first reported this issue and who kindly
and patiently provided lots of traces and test data that made it possible
to reproduce, analyze, then fix this issue.
2014-01-25 22:28:22 +01:00
Willy Tarreau
d153b3b096 DOC: fix misleading information about SIGQUIT
SIGQUIT dumps the pools state to stderr, not to the logs. Thanks to
Jim Freeman for reporting this.
2014-01-25 18:19:32 +01:00
Willy Tarreau
1f0da2485e BUG/MEDIUM: unique_id: HTTP request counter is not stable
Patrick Hemmer reported that using unique_id_format and logs did not
report the same unique ID counter since commit 9f09521 ("BUG/MEDIUM:
unique_id: HTTP request counter must be unique!"). This is because
the increment was done while producing the log message, so it was
performed twice.

A better solution consists in fetching a new value once per request
and saving it in the request or session context for all of this
request's life.

It happens that sessions already have a unique ID field which is used
for debugging and reporting errors, and which differs from the one
sent in logs and unique_id header.

So let's change this to reuse this field to have coherent IDs everywhere.
As of now, a session gets a new unique ID once it is instanciated. This
means that TCP sessions will also benefit from a unique ID that can be
logged. And this ID is renewed for each extra HTTP request received on
an existing session. Thus, all TCP sessions and HTTP requests will have
distinct IDs that will be stable along all their life, and coherent
between all places where they're used (logs, unique_id header,
"show sess", "show errors").

This feature is 1.5-specific, no backport to 1.4 is needed.
2014-01-25 11:07:06 +01:00
Thierry FOURNIER
e7054c7177 BUG/MINOR: payload: the patterns of the acl "req.ssl_ver" are no parsed with the good function.
The fetch "req.ssl_ver" is not declared as explicit acl. If it is used
as implicit ACL, the acl engine detect SMP_T_UINT output type and choose
to use the default interger parser: pat_parse_int(). This fetch needs the
parser pat_parse_dotted_ver().

This patch declare explicit ACL named "req.ssl_ver" that use the good
parser function pat_parse_dotted_ver().
2014-01-25 02:50:58 +01:00
Willy Tarreau
4bd07de087 MEDIUM: checks: make use of chk_report_conn_err() for connection errors
Checks used not to precisely report the errors that were detected at the
connection layer (eg: too many SSL connections). Using chk_report_conn_err()
makes this possible.
2014-01-24 16:15:04 +01:00
Willy Tarreau
9ce7013429 MEDIUM: tcp: report connection error at the connection level
Now when a connection error happens, it is reported in the connection
so that upper layers know exactly what happened. This is particularly
useful with health checks and resources exhaustion.
2014-01-24 16:15:04 +01:00
Willy Tarreau
45b34e8abc MINOR: connection: add more error codes to report connection errors
It is quite often that an connection error only reports "socket error" with
no more information. This is especially problematic with health checks where
many causes are possible, including resource exhaustion which do not lead to
a valid errno code. So let's add explicit codes to cover these cases.
2014-01-24 16:15:04 +01:00
Thierry FOURNIER
ee330afba0 MINOR: standard: The parse_binary() returns the length consumed and his documentation is updated
Actually the values returned by this function is never used. All the
callers just check if the resultat is non-zero. Before this patch, the
function returns the length of the produced content. This value is not
useful because is returned twice: the first time in the return value and
the second time in the <binstrlen> argument. Now the function returns
the number of bytes consumed from <source>.
2014-01-21 22:14:44 +01:00
Thierry FOURNIER
e7ba23633b MINOR: pattern: move functions for grouping pat_match_* and pat_parse_* and add documentation. 2014-01-21 22:14:21 +01:00
Thierry FOURNIER
46ceb01c24 BUG/MEDIUM: pattern: Segfault in binary parser
The functions pat_parse_* must return 0 if fail and the number of
elements eated from **text if not fail. The function pat_parse_bin()
returns 0 or the length parsed. This causes a segfault. I just apply the
double operator "!" on the result of the function pat_parse_bin() and
the return value value match the expected value.
2014-01-21 22:14:21 +01:00
Willy Tarreau
46be2e5039 MEDIUM: connection: update callers of ctrl->drain() to use conn_drain()
Now we can more safely rely on the connection state to decide how to
drain and what to do when data are drained. Callers don't need to
manipulate the file descriptor's state anymore.

Note that it also removes the need for the fix ea90063 ("BUG/MEDIUM:
stream-int: fix the keep-alive idle connection handler") since conn_drain()
correctly sets the polling flags.
2014-01-20 22:27:17 +01:00
Willy Tarreau
3bd3e57a9b MEDIUM: tcp: report in tcp_drain() that lingering is already disabled on close
When an incoming shutdown or error is detected, we know that we
can safely close without disabling lingering. Do it in tcp_drain()
so that we don't have to do it from each and every caller.
2014-01-20 22:27:17 +01:00
Willy Tarreau
2aefad5df7 MINOR: connection: add a new conn_drain() function
Till now there was no way to know from a connection if a previous
call to drain() had done any change. This function is used to drain
incoming data and to update the connection's flags at the same time.
It also correctly sets the polling flags on the connection if the
drain function indicates inability to receive. This function will
be used preferably over ctrl->drain() when a connection is used.
2014-01-20 22:27:16 +01:00
Willy Tarreau
7f4bcc312d MINOR: protocol: improve the proto->drain() API
It was not possible to know if the drain() function had hit an
EAGAIN, so now we change the API of this function to return :
  < 0 if EAGAIN was met
  = 0 if some data remain
  > 0 if a shutdown was received
2014-01-20 22:27:16 +01:00
Willy Tarreau
a593ec5bf4 MEDIUM: listener: fix polling management in the accept loop
The accept loop used to force fd_poll_recv() even in places where it
was not completely appropriate (eg: unexpected errors). It does not
yet cause trouble but will do with the upcoming polling changes. Let's
use it only where relevant now. EINTR/ECONNABORTED do not result in
poll() anymore but the failed connection is simply skipped (this code
dates from 1.1.32 when error codes were first considered).
2014-01-20 22:27:16 +01:00
Willy Tarreau
fa7fc95e16 BUG/MEDIUM: polling: ensure we update FD status when there's no more activity
Some rare unexplained busy loops were observed on versions up to 1.5-dev19.
It happens that if a file descriptor happens to be disabled for both read and
write while it was speculatively enabled for both and this without creating a
new update entry, there will be no way to remove it from the speculative I/O
list until some other changes occur. It is suspected that a double sequence
such as enable_both/disable_both could have led to this situation where an
update cancels itself and does not clear the spec list in the poll loop.
While it is unclear what I/O sequence may cause this situation to arise, it
is safer to always add the FD to the update list if nothing could be done on
it so that the next poll round will automatically take care of it.

This is 1.5-specific, no backport is needed.
2014-01-20 20:57:02 +01:00
Willy Tarreau
00b0fb9349 BUG/MAJOR: ssl: fix breakage caused by recent fix abf08d9
Recent commit abf08d9 ("BUG/MAJOR: connection: fix mismatch between rcv_buf's
API and usage") accidentely broke SSL by relying on an uninitialized value to
enter the read loop.

Many thanks to Cyril Bonté and Steve Ruiz for reporting this issue.
2014-01-17 11:09:40 +01:00
Thierry FOURNIER
410f8101ae BUG/MEDIUM: map: segmentation fault with the stats's socket command "set map ..."
The value of the variable "appctx->ctx.map.ent" is used after the loop,
but its value has changed. The variable "value" is initialized and
contains the good value.

This is a recent bug, no backport is needed.
2014-01-15 18:39:38 +01:00
Willy Tarreau
2317976daa BUILD: listener: fix recent accept4() again
Recent commit 4448925 ("BUILD/MINOR: listener: remove a glibc warning on accept4()")
broke accept4() on some systems because the glibc's version may now conflict with
the local one.
2014-01-15 16:45:17 +01:00
Willy Tarreau
abf08d9365 BUG/MAJOR: connection: fix mismatch between rcv_buf's API and usage
Steve Ruiz reported some reproducible crashes with HTTP health checks
on a certain page returning a huge length. The traces he provided
clearly showed that the recv() call was performed twice for a total
size exceeding the buffer's length.

Cyril Bonté tracked down the problem to be caused by the full buffer
size being passed to rcv_buf() in event_srv_chk_r() instead of passing
just the remaining amount of space. Indeed, this change happened during
the connection rework in 1.5-dev13 with the following commit :

f150317 MAJOR: checks: completely use the connection transport layer

But one of the problems is also that the comments at the top of the
rcv_buf() functions suggest that the caller only has to ensure the
requested size doesn't overflow the buffer's size.

Also, these functions already have to care about the buffer's size to
handle wrapping free space when there are pending data in the buffer.
So let's change the API instead to more closely match what could be
expected from these functions :

- the caller asks for the maximum amount of bytes it wants to read ;
This means that only the caller is responsible for enforcing the
reserve if it wants to (eg: checks don't).

- the rcv_buf() functions fix their computations to always consider
this size as a max, and always perform validity checks based on
the buffer's free space.

As a result, the code is simplified and reduced, and made more robust
for callers which now just have to care about whether they want the
buffer to be filled or not.

Since the bug was introduced in 1.5-dev13, no backport to stable versions
is needed.
2014-01-15 01:09:48 +01:00
Willy Tarreau
4448925930 BUILD/MINOR: listener: remove a glibc warning on accept4()
The accept4() Linux syscall requires _GNU_SOURCE on ix86, otherwise
it emits a warning. On other archs including x86_64, this problem
doesn't happen. Thanks to Charles Carter from Sigma Software for
reporting this.
2014-01-14 17:54:12 +01:00
Thierry FOURNIER
35249cb045 BUG/MINOR: pattern: pattern comparison executed twice
If the pattern is set as case insensitive, the string comparison
is executed twice. The first time is insensitive comparison, the
second is sensitive.

This is a recent bug, no backport is needed.
2014-01-14 15:42:59 +01:00
Willy Tarreau
8663105095 BUG: Revert "OPTIM: poll: restore polling after a poll/stop/want sequence"
This reverts commit 1208266356.

It randomly breaks SSL. What happens is that if the SSL response is
read at once by the SSL stack and is partially delivered to the buffer,
then there's no way to read the next parts because we wait for some
polling first.

So we'll fix this after the polling rework.
2014-01-13 11:34:42 +01:00
Willy Tarreau
17edc81e7e MEDIUM: config: report a warning when multiple servers have the same name
A config where multiple servers have the same name in the same backend is
prone to a number of issues : logs are not really exploitable, stats get
really tricky and even harder to change, etc...

In fact, it can be safe to have the same name between multiple servers only
when their respective IDs are known and used. So now we detect this situation
and emit a warning for the first conflict detected per server if any of the
servers uses an automatic ID.
2014-01-03 12:20:22 +01:00
Willy Tarreau
2b028dd828 OPTIM: session: put unlikely() around the freewheeling code
The code which enables tunnel mode or TCP transfers is rarely used
and at most once per session. Putting it in an unlikely() clause
reduces the length of the hot path of process_session() which is
already quite long, and also slightly reduces its overall size.
Some measurements show a steady gain of about 0.2% thanks to this.
2013-12-31 23:56:46 +01:00
Willy Tarreau
9fe7aae6eb MINOR: checks: use an inline function for health_adjust()
This function is called twice per request, and does almost always nothing.
Better use an inline version to avoid entering it when we can.

About 0.5% additional performance was gained this way.
2013-12-31 23:47:37 +01:00
Willy Tarreau
9e5a3aacf4 MEDIUM: stream-int: make si_connect() return an established state when possible
si_connect() used to only return SI_ST_CON. But it already detect the
connection reuse and is the function which avoids calling connect().
So it already knows the connection is valid and reuse. Thus we make it
return SI_ST_EST when a connection is reused. This means that
connect_server() can return this state and sess_update_stream_int()
as well.

Thanks to this change, we don't need to leave process_session() in
SI_ST_CON state to immediately enter it again to switch to SI_ST_EST.
Implementing this removes one call to process_session() per request
in keep-alive mode. We're now at 2 calls per request, which is the
minimum (one for the request and another one for the response). The
number of calls to http_wait_for_response() has also dropped from 2
to one.

Tests indicate a performance gain of about 2.6% in request rate in
keep-alive mode. There should be no gain in http-server-close() since
we don't use this faster path.
2013-12-31 23:32:12 +01:00
Willy Tarreau
b44c873d61 MEDIUM: session: prepare to support earlier transitions to the established state
At the moment it is possible in sess_prepare_conn_req() to switch to the
established state when the target is an applet. But sess_update_stream_int()
will soon also have the ability to set the established state via
connect_server() when a connection is reused, leading to a synchronous
connect.

So prepare the code to handle this SI_ST_ASS -> SI_ST_EST transition, which
really matches what's done in the lower layers.
2013-12-31 23:16:50 +01:00
Willy Tarreau
0e37f1c40e MINOR: session: factor out the connect time measurement
Currently there are 3 places in the code where t_connect is set after
switching to state SI_ST_EST, and a fourth one will soon come. Since
all these places lead to an immediate call to sess_establish() to
complete the session establishment, better move that measurement
there.
2013-12-31 23:06:46 +01:00