This flag is set on the stream interface when we should wait for more space in
the channel's buffer to store more incoming data. This means we should wait some
outgoing data are sent before retrying to receive more data.
But in stream interface functions, at many places, instead of checking this
flag, we use the function channel_may_recv to know if we can (re)start
reading. This currently works but it is not really consistent. And, it works
because only raw data are stored in buffers. But it will be a problem when we
start to store structured data in buffers.
So to avoid any problems with futur implementations, we now rely only on
SI_FL_WAIT_ROOM. The function channel_may_recv can still be called, but only
when we are sure to handle raw data (for instance in functions ci_put*). To do
so, among other things, we must be sure to unset SI_FL_WAIT_ROOM and offer an
opportunity to call chk_rcv() on a stream interface when some data are sent
on the other end, which is now granted by the previous patch series.
We exclusively use stream_int_update() now, the lower layers are not
called anymore so let's remove them, as well as si_update() which used
to be their wrapper.
It's far from being clean, but at least it allows to resync both CS and
applets from the same place, taking into account the fact that CS are
processed synchronously for the send side while appletx are processed
outside of the process_stream() loop. The arrangement is optimised to
minimize the amount of iteration by handling send first, then updating
the SI_FL_WAIT_ROOM flags and only then dealing with si_chk_rcv() on
both sides. The SI_FL_WANT_PUT flag is set if needed before calling
si_chk_rcv() since this is done prior to calling stream_int_update().
Now there's no risk that stream_int_notify() is called anymore during
such operations, thus we cannot have any spurious wake-up anymore. The
case where a successful send() could complete a pending connect() is
handled by taking any stream-int state changes into account at the
call place, which is normal since process_stream() is designed to
iterate till stabilisation.
Doing this solves most of the remaining inconsistencies between CS and
applets.
The function used to be called in turn for each side of the stream, but
since it's called exclusively from process_stream(), it prevents us from
making use of the knowledge we have of the operations in progress for
each side, resulting in having to go all the way through functions like
stream_int_notify() which are not appropriate there.
That patch creates a new function, si_update_both() which takes two
stream interfaces expected to belong to the same stream, and processes
their flags in a more suitable order, but for now doesn't change the
logic at all.
The next step will consist in trying to reinsert the rest of the socket
layer-specific update code to ultimately update the flags correctly at
the end of the operation.
After careful inspection, it now seems OK to call si_chk_rcv() only when
SI_FL_WAIT_ROOM is cleared and SI_FL_WANT_PUT is set, since all identified
call places have already taken care of this.
Instead of clearing the SI_FL_WAIT_ROOM flag and losing the information
about the need from the producer to be woken up, we now call si_chk_rcv()
immediately. This is cheap to do and it could possibly be further improved
by only doing it when SI_FL_WAIT_ROOM was still set, though this will
require some extra auditing of the code paths.
The only remaining place where the flag was cleared without a call to
si_chk_rcv() is si_alloc_ibuf(), but since this one is called from a
receive path woken up from si_chk_rcv() or not having failed, the
clearing was not necessary anymore either.
And there was one place in stream_int_notify() where si_chk_rcv() was
called with SI_FL_WAIT_ROOM still explicitly set so this place was
adjusted in order to clear the flag prior to calling si_chk_rcv().
Now we don't have any situation where we randomly clear SI_FL_WAIT_ROOM
without trying to wake the other side up, nor where we call si_chk_rcv()
with the flag set, so this flag should accurately represent a failed
attempt at putting data into the buffer.
When CF_DONT_READ is set, till now we used to set SI_FL_WAIT_ROOM, which
is not appropriate since it would lose the subscribe status. Instead let's
clear SI_FL_WANT_PUT (just like applets do), and set the flag only when
CF_DONT_READ is cleared.
We have to do this in stream_int_update(), and in si_cs_io_cb() after
returning from si_cs_recv() since it would be a bit invasive to hack
this one for now. It must not be done in stream_int_notify() otherwise
it would re-enable blocked applets.
Last, when si_chk_rcv() is called, it immediately clears the flag before
calling ->chk_rcv() so that we are not tempted to uselessly loop on the
same call until the receive function is called. This is the same principle
as what is done with the applet scheduler.
This flag should already be cleared before calling the *chk_rcv() functions.
Before adapting all call places, let's first make sure si_chk_rcv() clears
it before calling them so that these functions do not have to check it again
and so that they do not adjust it. This function will only call the lower
layers if the SI_FL_WANT_PUT flag is present so that the endpoint can decide
not to be called (as done with applets).
There was an ambiguity in which functions of the si_ops struct could be
null or not. only ->update doesn't exist in one of the si_ops (the
embedded one), all others are always defined. ->shutr and ->shutw were
never tested. However ->chk_rcv() and ->chk_snd() were tested, causing
confusion about the proper way to wake the other side up if undefined
(which never happens).
Let's update the comments to state these functions are mandatory and
remove the offending checks.
We now do this on the si_cs_recv() path so that we always have
SI_FL_WANT_PUT properly set when there's a need to receive and
SI_FL_WAIT_ROOM upon failure.
It doesn't make sense to limit this code to applets, as any stream
interface can use it. Let's rename it by simply dropping the "applet_"
part of the name. No other change was made except updating the comments.
The buffer allocation callback appctx_res_wakeup() used to rely on old
tricks to detect if a buffer was already granted to an appctx, namely
by checking the task's state. Not only this test is not valid anymore,
but it's inaccurate.
Let's solely on SI_FL_WAIT_ROOM that is now set on allocation failure by
the functions trying to allocate a buffer. The buffer is now allocated on
the fly and the flag removed so that the consistency between the two
remains granted. The patch also fixes minor issues such as the function
being improperly declared inline(!) and the fact that using appctx_wakeup()
sets the wakeup reason to TASK_WOKEN_OTHER while we try to use TASK_WOKEN_RES
when waking up consecutive to a ressource allocation such as a buffer.
This function replaces stream_res_available(), which is used as a callback
for the buffer allocator. It now carefully checks which stream interface
was blocked on a buffer allocation, tries to allocate the input buffer to
this stream interface, and wakes the task up once such a buffer was found.
It will automatically remove the SI_FL_WAIT_ROOM flag upon success since
the info this flag indicates becomes wrong as soon as the buffer is
allocated.
The code is still far from being perfect because if a call to si_cs_recv()
fails to allocate a buffer, we'll still end up passing via process_stream()
again, but this could be improved in the future by using finer-grained
wake-up notifications.
The independant -> independent error was fixed in 801a0a35 ("DOC: fix
name for "option independant-streams"), but the note about the wrong
name was erroneously fixed in 0e82b92a ("DOC: fix a few config typos").
Restore the "wrong" name so that when reasearching this option people
can actually find it.
Could be backported to 1.8.
When using the CLI proxy of the master and trying to access a worker
with the @ prefix, the worker just crash.
The commit 7216032 ("MEDIUM: mworker: leave when the master die")
reintroduced the old code of the pipe, which was not trying to access
the pointers before. The owner of the FD was modified to a different
value, this is a problem since we call listener_accept() in most cases
now from the mworker_accept_wrapper() and it casts the owner variable to
get the listener.
This patch fix the issue by setting back the previous owner of the FD.
Commit eafd8ebcf ("MEDIUM: stream-int: call si_cs_process() in
stream_int_update_conn") uncovered a sleeping bug. By calling
si_cs_process() within si_update(), we end up calling stream_int_notify().
We rely on it to update the stream-int before quitting as a hack, but
it happens to immediately wake the task up while the stream int's
state is still SI_ST_CON (during the connection establishment). The
observable effect is that an unreachable server causes haproxy to
use 100% CPU until the connection timeout strikes.
This patch fixes this by not causing the wake up for the SI_ST_CON
state. It would equally be possible to check for states higher than
SI_ST_EST as is done in other places, but for now better stay on the
safe side by covering the only issue that can be triggered. It's
suspected that this issue slightly affects older versions by causing
one extra call to process_stream() during the connection setup for
each activity change on the other side, but this should not have
any observable effect.
No backport is needed.
The process was aborting with nbthread > 1.
The mworker_pipe_register() could be called several time in multithread
mode, we don't want to abort() there.
It took me 17 minutes this morning to figure where si->wait_event was
set (it's in si_reset() which should now probably be renamed since it
doesn't just perform a reset anymore but also an allocation) and what
its task was assigned to (si_cs_io_cb() even for applets and empty SI).
This is too confusing and not intuitive enough, let's at least add a
few comments for now to help figure how this stuff works next time.
When the master die, the worker should exit too, this is achieved by
checking if the FD of the socketpair/pipe was closed between the master
and the worker.
In the former architecture of the master-worker, there was only a pipe
between the master and the workers, and it was easy to check an EOF on
the pipe FD to exit() the worker.
With the new architecture, we use a socketpair by process, and this
socketpair is also used to accept new connections with the
listener_accept() callback.
This accept callback can't handle the EOF and the exit of the process,
because it's very specific to the master worker. This is why we
transformed the mworker_pipe_handler() function in a wrapper which check
if there is an EOF and exit the process, and if not call
listener_accept() to achieve the accept.
The former behavior was to exit() the master process with the latest
status code known, which was the one of the last process to exit.
The problem is that the master process was not exiting with the status
code which provoked the exit-on-failure.
The active peers output indicates both the number of established peers
connections and the number of peers connection attempts. The new counter
"ConnectedPeers" also indicates the number of currently connected peers.
This helps detect that some peers cannot be reached for example. It's
worth mentioning that this value changes over time because unused peers
are often disconnected and reconnected. Most of the time it should be
equal to ActivePeers.
Peers are the last type of activity which can maintain a job present, so
it's important to report that such an entity is still active to explain
why the job count may be higher than zero. Here by "ActivePeers" we report
peers sessions, which include both established connections and outgoing
connection attempts.
When an haproxy process doesn't stop after a reload, it's because it
still has some active "jobs", which mainly are active sessions, listeners,
peers or other specific activities. Sometimes it's difficult to troubleshoot
the cause of these issues (which generally are the result of a bug) only
because some indicators are missing.
This patch add the number of listeners, the number of jobs, and the stopping
status to the output of "show info". This way it becomes a bit easier to try
to narrow down the cause of such an issue should it happen. A typical use
case is to connect to the CLI before reloading, then issuing the "show info"
command to see what happens. In the normal situation, stopping should equal
1, jobs should equal 1 (meaning only the CLI is still active) and listeners
should equal zero.
The patch is so trivial that it could make sense to backport it to 1.8 in
order to help with troubleshooting.
The tasks API was changed in 1.9-dev1 with commit 9f6af3322 ("MINOR: tasks:
Change the task API so that the callback takes 3 arguments."), causing the
task's state not to be usable anymore and to have been replaced with an
explicit argument in the callee. The task's state doesn't contain any trace
of the wakeup cause anymore. But there were two places where the old task's
state remained in use :
- sessions, used to more accurately report timeouts in logs when seeing
TASK_WOKEN_TIMEOUT ;
- peers, used to finish resynchronization when seeing TASK_WOKEN_SIGNAL
This commit fixes both occurrences by making sure we don't access task->state
directly (should we rename it by the way ?).
No backport is needed.
This one causes some events to be lost. It has already been tested in
an experimental branch but was not merged until being certain it was
needed. Fred figured that requesting /?k=1&s=447392 from httpterm through
haproxy-master was enough to stall the transfer.
No backport is needed, this only affects 1.9-dev5.
Since http-request was first introduced, more and more actions have been
added over time. This makes the "http-request" difficult to read and some
actions were forgotten in the list.
This is an attempt to make the documenation cleaner. In future steps, it
would be great to provide at least one example for each action.
It was reported here that authentication may fail when threads are
enabled :
https://bugzilla.redhat.com/show_bug.cgi?id=1643941
While I couldn't reproduce the issue, it's obvious that there is a
problem with the use of the non-reentrant crypt() function there.
On Linux systems there's crypt_r() but not on the vast majority of
other ones. Thus a first approach consists in placing a lock around
this crypt() call. Another patch may relax it when crypt_r() is
available.
This fix must be backported to 1.8. Thanks to Ryan O'Hara for the
quick notification.
A bug occurs when the CLI proxy of the master received a command which
is prefixed by some spaces but without a routing prefix (@).
In this case the pcli_parse_request() was returning a wrong number of
data to forward.
The response analyzer was called twice and the prompt displayed twice.
Commit 27346b01a ("OPTIM: tools: optimize my_ffsl() for x86_64") optimized
my_ffsl() for intensive use cases in the scheduler, but as half of the times
I got it wrong so it counted bits the reverse way. It doesn't matter for the
scheduler nor fd cache but it broke cpu-map with threads which heavily relies
on proper ordering.
We should probably consider dropping support for gcc < 3.4 and switching
to builtins for these ones, though often they are as ambiguous.
No backport is needed.
Released version 1.9-dev5 with the following main changes :
- BUILD: Makefile: add the new ERR variable to force -Werror
- MINOR: freq_ctr: add swrate_add_scaled() to work with large samples
- MINOR: stream_interface: Avoid calling si_cs_send/recv if not needed.
- CLEANUP: http: Remove the unused function http_find_header
- MINOR: h1: Export some functions parsing the value of some HTTP headers
- BUG/MEDIUM: stream-int: don't set SI_FL_WAIT_ROOM on CF_READ_DONTWAIT
- MINOR: proxy: add a new option "http-use-htx"
- BUG/MEDIUM: pools: fix the minimum allocation size
- MINOR: shctx: Shared objects block by block allocation.
- MINOR: cache: Larger HTTP objects caching.
- MINOR: shctx: Add a maximum object size parameter.
- MINOR: cache: Add "max-object-size" option.
- DOC: Update about the cache support for big objects.
- BUG/MINOR: cache: Crashes with "total-max-size" > 2047(MB).
- BUG/MINOR: cache: Wrong usage of shctx_init().
- BUG/MINOR: ssl: Wrong usage of shctx_init().
- MINOR: cache: Avoid usage of atoi() when parsing "max-object-size".
- MINOR: shctx: Change max. object size type to unsigned int.
- DOC: cache: Missing information about "total-max-size" and "max-object-size"
- CLEANUP: tools: fix misleading comment above function LIM2A
- MEDIUM: channel: merge back flags CF_WRITE_PARTIAL and CF_WRITE_EVENT
- BUG/MINOR: only mark connections private if NTLM is detected
- BUG/MINOR: only auto-prefer last server if lb-alg is non-deterministic
- MINOR: stream: don't prune variables if the list is empty
- MINOR: stream-int: add si_alloc_ibuf() to ease input buffer allocation
- MEDIUM: stream-int: replace channel_alloc_buffer() with si_alloc_ibuf() everywhere
- MEDIUM: stream: always call si_cs_recv() after a failed buffer allocation
- MEDIUM: stream: don't try to send first in process_stream()
- MEDIUM: stream-int: make si_update() synchronize flag changes before the I/O
- MEDIUM: stream-int: call si_cs_process() in stream_int_update_conn
- MINOR: stream-int: don't needlessly call tasklet_wakeup() in stream_int_chk_snd_conn()
- MINOR: stream-int: make stream_int_notify() not wake the tasklet up
- MINOR: stream-int: don't needlessly call si_cs_send() in si_cs_process()
- MINOR: mworker: number of reload in the life of a worker
- MEDIUM: mworker: each worker socketpair is a CLI listener
- REORG: mworker: move struct mworker_proc to global.h
- MINOR: server: export new_server() function
- MEDIUM: mworker: move proc_list gen before proxies startup
- MEDIUM: mworker: add proc_list in global.h
- MEDIUM: mworker: proxy for the master CLI
- MEDIUM: mworker: create CLI listeners from argv[]
- MEDIUM: cli: disable some keywords in the master
- MEDIUM: mworker: find the server ptr using a CLI prefix
- MEDIUM: cli: 'show proc' displays processus
- MEDIUM: cli: implement 'mode cli' proxy analyzers
- MINOR: cli: displays sockpair@ in "show cli sockets"
- MEDIUM: cli: enable "show cli sockets" for the master
- MINOR: cli: put @master @<relative pid> @!<pid> in the help
- MEDIUM: listeners: set O_CLOEXEC on the accepted FDs
- MEDIUM: mworker: stop the master proxy in the workers
- MEDIUM: channel: reorder the channel analyzers for the cli
- MEDIUM: cli: write a prompt for the CLI proxy of the master
- MINOR: cli: helper to write an response message and close
- MINOR: cache: Add "Age" header.
- REGTEST: make the IP+port logging test more reliable
- BUG/MINOR: memory: make the thread-local cache allocator set the debugging link
- BUG/MAJOR: http: http_txn_get_path() may deference an inexisting buffer
- BUG/MINOR: backend: assign the wait list after the error check
Commit 85b73e9 ("BUG/MEDIUM: stream: Make sure polling is right on retry.")
introduced a possible null dereference on the error path detected by gcc-7.
Let's simply assign srv_conn after checking the error and not before.
No backport is needed.
When the "path" sample fetch function is called without any path, the
function doesn't check that the request buffer is allocated. While this
doesn't happen with the request during processing, it can definitely
happen when mistakenly trying to reference a path from the response
since the request channel is not allocated anymore.
It's certain that this bug was emphasized by the buffer changes that
went in 1.9 and the HTTP refactoring, but at first glance, 1.8 doesn't
seem 100% safe either so it's possible that older version are affected
as well.
Thanks to PiBa-NL for reporting this bug with a reproducer.
When building with DEBUG_MEMORY_POOLS, an element returned from the
cache would not have its pool link initialized unless it's allocated
using pool_alloc(). This is problematic for buffer allocators which
use pool_alloc_dirty(), as freeing this object will make the code
think it was allocated from another pool. This patch does two things :
- make __pool_get_from_cache() set the link
- remove the extra initialization from pool_alloc() since it's always
done in either __pool_get_first() or __pool_refill_alloc()
This patch is marked MINOR since it only affects code explicitly built
for debugging. No backport is needed.
On my machine, test log/b00000.vtc fails ~9/10 times. Apparently, the
connection is often marked as reset before the timeout strikes, so the
log shows "CD" flags instead of "cD". This fix does two things :
1) shorten the client timeout to 1 millisecond instead of 5
2) accept both "cD" and "CD" as valid termination states since the
purpose is to validate the source address and port, and not the
status itself.
This patch makes the cache capable of adding an "Age" header as defined by
rfc7234.
During the storage of new HTTP objects we memorize ->eoh value and
the value of the "Age" header coming from the origin server.
These information may then be reused to return the cached HTTP objects
with a new "Age" header.
May be backported to 1.8.