To be used, error messages declared in a default section must be copied when the
parsing of a proxy section starts. But this was only done for frontends.
This patch may be backported to older versions.
There are still some unwelcome synchronous calls to si_cs_recv() in
process_stream(). Let's have a new function si_sync_recv() to perform
a synchronous receive call on a stream interface regardless of the type
of its endpoint, and move these calls there. For now it only implements
conn_streams since it doesn't seem useful to support applets there. The
function implements an extra check for the stream interface to be in an
established state before attempting anything.
In commit f26c26c ("BUG/MEDIUM: stream-int: change the way buffer room
is requested by a stream-int") we used to call si_want_put() at the
end of sess_update_st_con_tcp(), when switching to SI_ST_EST state.
But this is incorrect as there are a few other situations where we
can switch to this state, such as in si_connect() where a connection
reuse is detected, or when directly calling an applet (in which case
that was already covered anyway). For now it doesn't have any side
effect but it could impact connection reuse after the stream-int
changes by stalling an immediately reused connection.
Let's move this flag change to sess_establish() instead, which is the
only place which is always called exactly once on connection setup.
No backport is needed, this is purely 1.9.
In master-worker mode, the socketpair CLI listener of the worker is now
marked unstoppable, which makes it possible to connect to the CLI of an
old process which is in a leaving state in order to debug it.
An unstoppable listener is a listener which won't be stopped during a soft
stop. The unstoppable_jobs variable is incremented and the listener
won't prevent the process from leaving properly.
It is not a good idea to use this feature (the LI_O_NOSTOP flag) with a
listener that needs to be bound again on another process during a soft
reload.
This patch allows a process to properly quit when some jobs are still
active. This feature is handled by the unstoppable_jobs variable, which
must be atomically incremented.
During each new iteration of run_poll_loop() the break condition of the
loop is now (jobs - unstoppable_jobs) == 0.
The only use of this at the moment is to handle the socketpair CLI
of the worker while the process is stopping. During the soft
stop, we can mark the CLI listener as an unstoppable job and still
handle new connections until all other jobs are stopped.
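The exit condition described above can be sketched with C11 atomics. The
two counter names come from the text; the helper functions (job_start(),
job_stop(), mark_unstoppable(), may_exit()) are hypothetical and only
illustrate the arithmetic, they are not the actual run_poll_loop() code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Counters named after the commit message; everything else is illustrative. */
static atomic_int jobs;              /* all active jobs (listeners, sessions...) */
static atomic_int unstoppable_jobs;  /* subset of jobs that must not block exit */

/* Register a new job. */
static void job_start(void)
{
	atomic_fetch_add(&jobs, 1);
}

/* Terminate a job. */
static void job_stop(void)
{
	assert(atomic_load(&jobs) > 0);
	atomic_fetch_sub(&jobs, 1);
}

/* Flag one already-registered job as unstoppable (e.g. an LI_O_NOSTOP listener). */
static void mark_unstoppable(void)
{
	atomic_fetch_add(&unstoppable_jobs, 1);
}

/* Break condition evaluated on each iteration of the poll loop. */
static bool may_exit(void)
{
	return (atomic_load(&jobs) - atomic_load(&unstoppable_jobs)) == 0;
}
```

With two jobs registered and one of them marked unstoppable, may_exit()
only becomes true once the other job has stopped.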
The previous commit fedceaf33 ("MINOR: http: Regroup return statements of
http_req_get_intercept_rule at the end") partly fixes the problem, but not
entirely. Because HTTP 103 responses were sent line by line, it is possible
to mix them with others. For instance, an early-hint rule followed by a
redirect rule leaves the response buffer totally messed up. Furthermore, if
we fail to add the last CRLF to finish the HTTP 103 response because there
is no more space in the buffer, it leaves the buffer with an unfinished and
invalid message.
This patch fixes the bug by creating a fully formed HTTP 103 response before
trying to push it in the response buffer. If an error occurred during the copy
or if another response was already sent, the HTTP 103 response is
ignored. However, the last point should never happen because, for redirects
and authentication errors, we first try to copy any pending HTTP 103 response.
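The "build first, then push" approach can be illustrated as follows. This
is only a sketch under stated assumptions: struct resp_buf, build_103() and
push_103() are hypothetical stand-ins for the real response buffer and
functions, and the check for an already sent response is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size response buffer, standing in for the real one. */
struct resp_buf {
	char   data[256];
	size_t len;
};

/*
 * Format a complete 103 response carrying nhdr header lines into tmp.
 * Returns the message length, or 0 if it does not fit in size bytes.
 */
static size_t build_103(char *tmp, size_t size, const char *const *hdrs, size_t nhdr)
{
	size_t len;
	int ret;

	assert(tmp != NULL && size > 0);
	ret = snprintf(tmp, size, "HTTP/1.1 103 Early Hints\r\n");
	if (ret < 0 || (size_t)ret >= size)
		return 0;
	len = (size_t)ret;

	for (size_t i = 0; i < nhdr; i++) {
		ret = snprintf(tmp + len, size - len, "%s\r\n", hdrs[i]);
		if (ret < 0 || (size_t)ret >= size - len)
			return 0;
		len += (size_t)ret;
	}

	/* Final CRLF: without it the message would be incomplete. */
	ret = snprintf(tmp + len, size - len, "\r\n");
	if (ret < 0 || (size_t)ret >= size - len)
		return 0;
	return len + (size_t)ret;
}

/* Push the whole message or nothing at all. Returns 1 on success, 0 otherwise. */
static int push_103(struct resp_buf *buf, const char *const *hdrs, size_t nhdr)
{
	char tmp[sizeof(buf->data)];
	size_t len = build_103(tmp, sizeof(tmp), hdrs, nhdr);

	if (len == 0 || len > sizeof(buf->data) - buf->len)
		return 0; /* response ignored, buffer left untouched */
	memcpy(buf->data + buf->len, tmp, len);
	buf->len += len;
	return 1;
}
```

Since the destination buffer is only touched by the final memcpy(), a
failure at any step leaves it exactly as it was.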
Instead of having multiple return statements spread here and there in the
middle of the function, we just exit from the loop after setting the right
return code. This leaves a chance to do some work before leaving the
function. It is also less error prone.
This patch fixes a bug introduced in the commit 6b952c810 ("REORG: http: move
http_get_path() to http.c"). In the reorg, the code responsible for skipping
the version to only extract the path in the HTTP request was dropped.
No backport is needed, this only affects 1.9.
Commit 53216e7db ("MEDIUM: connections: Don't directly mess with the
polling from the upper layers.") removed the CS_FL_DATA_RD_ENA and
CS_FL_DATA_WR_ENA flags without updating flags.c, thus breaking the
build. This patch also adds flag CS_FL_NOT_FIRST which was brought
by commit 08088e77c.
When splice() reports a pipe full condition, we go through the common
code used to release a possibly empty pipe (which we don't have) and which
immediately tries to allocate a buffer that will never be used. Further,
it may even subscribe to get this buffer if the resources are low. Let's
simply get out of this way if the pipe is full.
This fix could be backported to 1.8 though the code is a bit different
over there.
Since we don't necessarily pass through conn_fd_handler() when reading,
conn_refresh_polling_flags() is not necessarily called when performing
a recv() operation, thus flags like CO_FL_WAIT_ROOM are not cleared.
It happens that si_cs_recv() checks CO_FL_WAIT_ROOM before deciding to
receive into a buffer, to see if the previous rcv_pipe() call failed by
lack of pipe room. The combined effect of these two statements is that
at the end of a file transmission, when there's too little data to
warrant the use of a pipe and the pipe is empty, we refrain from using
rcv_pipe() for the last few bytes, but since CO_FL_WAIT_ROOM is still
present, we don't use rcv_buf() either, and the connection remains
frozen in this state with si_cs_recv() called in loops.
In order to fix this we can simply manually clear CO_FL_WAIT_ROOM when
not using pipe so that the next check sees the result of the previous
operation and not an old one. We could equally call
conn_refresh_polling_flags() but that would be overkill and dangerous
given that it would manipulate the connection's flags under the mux.
By the way, ideally the mux should report this flag into the conn_stream
for cleaner manipulation.
No backport is needed as this is only post 1.9-dev2.
As part of the changes that went into 1.9-dev2 regarding the polling
modifications, the changes consecutive to the removal of the wait_list
from the conn_streams (commit 71384551a) made si_cs_recv() occasionally
return without subscribing to receive events, causing spliced transfers
to randomly fail if the client was at least as fast as the server. This
may remain unnoticed on most deployments since servers are usually close
to haproxy with higher bandwidth than clients have, resulting in buffers
always being full.
In order to reproduce this effect, it is better to do it on the local
machine and to transfer very large objects (hundreds of gigs) over a
single connection, to see it suddenly stall after a few tens of gigs.
Now with this fix it's fine even after 3 TB over a single connection.
No backport is needed.
When we allocate struct stksess, we also allocate memory to store the
associated data before the struct itself.
As the data can be of different types, they can have different sizes. However,
we need the struct stksess to be properly aligned, as it can be accessed with
64-bit loads/stores (including atomic ones) on 64-bit platforms, and some of
them don't support unaligned accesses.
So, when allocating the struct stksess, round the size up to the next
multiple of sizeof(void *), and make sure the struct stksess itself is
properly aligned.
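The rounding can be sketched as below. struct entry is a simplified
stand-in for struct stksess, and round_ptr_size() and entry_alloc() are
hypothetical names; the sketch also assumes that sizeof(void *) is a power
of two, which holds on common platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct stksess; the real one has more fields. */
struct entry {
	void  *ref;
	size_t count;
};

/* Round size up to the next multiple of sizeof(void *) (a power of two). */
static size_t round_ptr_size(size_t size)
{
	return (size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
}

/*
 * Allocate data_size bytes of data followed by one struct entry.
 * Returns the struct; the data area sits right before it. base receives
 * the pointer that must later be passed to free().
 */
static struct entry *entry_alloc(size_t data_size, void **base)
{
	size_t offset = round_ptr_size(data_size);
	char *area;

	assert(base != NULL);
	area = malloc(offset + sizeof(struct entry));
	if (!area)
		return NULL;
	*base = area;
	return (struct entry *)(area + offset);
}
```

Because malloc() returns suitably aligned memory and the offset is a
multiple of sizeof(void *), the struct always starts on a pointer-aligned
address whatever the data size is.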
Many thanks to Paul Martin for investigating and reporting that bug.
This should be backported to earlier releases.
When configuring the logs with a FD and using the master worker, the FD
was closed upon a reload because it was configured with CLOEXEC. This
led to the wrong FD being used for the logs, and to the logs being closed,
which is unfortunate since the master relies on the FD being left open
during a reload.
The fix is to stop doing a CLOEXEC when the FD is inherited.
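A minimal sketch of that rule, using the standard fcntl() interface.
prepare_log_fd() and its "inherited" argument are hypothetical; how the
real code decides that a FD is inherited is not shown here.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Prepare a log FD. inherited is non-zero when the FD was handed over by
 * the parent (e.g. "stdout"); such FDs must survive the exec() on reload.
 * Returns 0 on success, -1 on error.
 */
static int prepare_log_fd(int fd, int inherited)
{
	int flags;

	assert(fd >= 0);
	if (inherited)
		return 0; /* leave close-on-exec untouched */

	flags = fcntl(fd, F_GETFD);
	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```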
No backport needed.
This patch implements the http_apply_early_hint_rule() function, which is
responsible for building HTTP 103 Early Hints responses each time an
"early-hint" rule is matched.
This patch adds an "early_hint" struct to the "arg" union of the "act_rule"
struct and parses the "early-hint" http-request keyword with it using the
same code as for the "(add|set)-header" parser.
When namespaces are disabled, support is still reported because the file
is built with almost nothing in it but built anyway. Instead of extending
the scope of the numerous ifdefs in this file, better avoid building it
when namespaces are disabled. In this case we define my_socketat() as an
inline function mapping directly to socket(). The struct netns_entry
still needs to be defined because it's used by various other functions
in the code.
Splicing was in great part broken over the last few development versions
due to the use of co_data() to detect if data are available in the channel.
But co_data() only looks at buffered data, not spliced data.
channel_is_empty() takes care of both and should be used. With this,
splicing works again but there are still a few cases where transfers
may stall.
No backport is needed.
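The difference between the two checks can be shown on a toy channel. The
structure and both toy_* functions below are hypothetical simplifications
and do not reproduce the real channel layout.

```c
#include <assert.h>
#include <stddef.h>

/* Toy channel: output data may sit in the buffer or in a splice pipe. */
struct toy_channel {
	size_t buf_output;   /* bytes pending in the buffer */
	size_t pipe_data;    /* bytes pending in the pipe (spliced) */
};

/* Counterpart of co_data(): only sees buffered output. */
static size_t toy_co_data(const struct toy_channel *c)
{
	assert(c != NULL);
	return c->buf_output;
}

/* Counterpart of channel_is_empty(): accounts for both locations. */
static int toy_channel_is_empty(const struct toy_channel *c)
{
	assert(c != NULL);
	return c->buf_output == 0 && c->pipe_data == 0;
}
```

When data only sit in the pipe, toy_co_data() reports 0 although the
channel is not empty, which is the mistake described above.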
Subsequent to the recent stream-int updates, we started to consider that
SI_FL_WANT_PUT needs to be set when receipt is enabled, but this is wrong
and results in 100% CPU when an HTTP client stays idle after a keep-alive
request because the stream-int has nothing to provide and nothing to send.
In fact just like for applets this flag should reflect the continuation
of an attempt. So it's si_cs_recv() which should set the flag, and clear
it if it has nothing more to provide. This function is called the first
time in process_stream(), and called again during transfers, so it will
always be up to date during stream_int_update() and stream_int_notify().
As a special case, it should also be set when a connection switches to
the established state. And we should absolutely refrain from calling
si_cs_recv() to re-enable reading, normally just setting this flag
(from within the stream-int's handler or prior to calling si_chk_rcv())
is expected to be OK.
A corner case remains where it was observed that in stream_int_notify() we
can sometimes be called with an empty output channel with SI_FL_WAIT_ROOM
and no CF_WRITE_PARTIAL, so there's no way to detect that we should
re-enable receiving. It's easy to also take care of this condition
there for the time it takes to figure out whether this situation is
expected or not.
Now it becomes more obvious that relying on a single flag to request
room (or on two flags to arbitrate activity) is not workable given the
autonomy of both sides. The mux_h2 has taught us that blocking flags
are much more reliable, require far fewer conditions and are much easier
to deal with. That's probably something to consider quickly in this
area.
No backport is needed.
This format is pretty similar to the previous "short" format except
that it also removes the severity level. Thus only the raw message is
sent. This is suitable for use in containers, where only the raw
information is expected and where the severity is supposed to come
from the file descriptor used.
This format is meant to be used with local file descriptors. It emits
messages only prefixed with a level, removing all the process name,
system name, date and so on. It is similar to the printk() format used
on Linux. It's suitable to be sent to a local logger compatible with
systemd's output format.
Note that the facility is still required but not used, hence it is
suggested to use "daemon" to remind that it's a local logger.
Example :
    log stdout format short daemon          # send everything to stdout
    log stderr format short daemon notice   # send important events to stderr
In certain situations it would be desirable to log to an existing file
descriptor, the most common case being a pipe between containers or
processes. The main issue with pipes is that using write() on them will
randomly truncate messages. But there is a trick. By using writev(), we
can atomically deliver or drop a message, which perfectly fits the
purpose. The only caveat is that large messages (more than 4096 bytes on
modern operating systems) may be interleaved with messages from other
processes if using nbproc for example. In practice such messages are rare and most
of the time when users need such type of logging, the load is low enough
for a single process to be running so this is not really a problem.
This logging method thus uses unbuffered writev() calls and uses more
CPU than if it used its own buffer with large writes at once, though this
is not a problem for moderate loads.
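The principle of emitting one message per writev() call can be sketched as
follows. log_to_fd() is a hypothetical helper; the real logger builds its
iovec from more parts than a message and a line feed.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Emit one log line on fd with a single writev() call so that the message
 * and its trailing line feed are handed to the kernel in one operation.
 * Returns the number of bytes written, or -1 on error (message dropped).
 */
static ssize_t log_to_fd(int fd, const char *msg)
{
	struct iovec iov[2];

	assert(msg != NULL);
	iov[0].iov_base = (void *)msg;   /* writev() does not modify the data */
	iov[0].iov_len  = strlen(msg);
	iov[1].iov_base = (void *)"\n";
	iov[1].iov_len  = 1;

	return writev(fd, iov, 2);
}
```

For example, log_to_fd(1, "hello") would write "hello" followed by a line
feed to stdout in a single system call.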
Logging to a file descriptor attached to a file also works with the side
effect that the process is significantly slowed down during disk accesses
and that it's not possible to rotate the file without restarting the
process. For this reason this option is not offered as a configuration
option, since it would confuse most users, but one could decide to
redirect haproxy's output to a file during debugging sessions. Two aliases
"stdout" and "stderr" are provided, but keep in mind that these are closed
by default in daemon mode.
When logging to a pipe or socket at a high enough rate, some logs will be
dropped and the number of dropped messages is reported in "show info".
It's easy to detect when logs on some paths are lost as sendmsg() will
return EAGAIN. This is particularly true when sending to /dev/log, which
often doesn't support a big logging capacity. Let's keep track of these
and report the total number of dropped messages in "show info".
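Counting the drops could look like the sketch below. The dropped_logs
counter, account_send() and send_log() are hypothetical names, and
MSG_DONTWAIT is used here only to make the send non-blocking in the
example; the real code may configure its sockets differently.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical counter; the real one is what "show info" would report. */
static unsigned int dropped_logs;

/* Account for the outcome of one send attempt. Returns 1 if it was a drop. */
static int account_send(ssize_t ret, int err)
{
	if (ret < 0 && (err == EAGAIN || err == EWOULDBLOCK)) {
		dropped_logs++;
		return 1;
	}
	return 0;
}

/* Send one message without blocking and track it if the path is saturated. */
static ssize_t send_log(int fd, const char *msg)
{
	struct iovec iov;
	struct msghdr hdr;
	ssize_t ret;

	assert(msg != NULL);
	iov.iov_base = (void *)msg;
	iov.iov_len  = strlen(msg);
	memset(&hdr, 0, sizeof(hdr));
	hdr.msg_iov    = &iov;
	hdr.msg_iovlen = 1;

	ret = sendmsg(fd, &hdr, MSG_DONTWAIT);
	account_send(ret, errno); /* errno is only examined when ret < 0 */
	return ret;
}
```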
The "log" statement appears both in the global section and in listeners.
The "format" directive was only documented in the first one. Maybe we
should think about moving this definition to the log section by now.
The error messages used to say something along the lines of "socket logger 2
failed" or "sendmsg logger 2 failed", which is confusing. Let's rephrase this
as "sendmsg() failed for logger 2".
A few typos detected by misspell in the README and CONTRIBUTING, even
though one of them is in a listing of commits. I'm assuming that
if we want to enforce fewer typos in the commits, having one in the
contributing guide is not the best example.
This fixes a typo in the README of the peers section of this subsystem
and 2 typos in code comments. Grouped together as cleanup to avoid too
many 1 char patches.
Building on 32 bit gives this :
src/cache.c: In function 'http_action_store_cache':
src/cache.c:466:4: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c:467:5: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c: In function 'cache_channel_append_age_header':
src/cache.c:578:2: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c:579:3: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
It's because of the definition below added in commit e7a770c ("MINOR:
cache: Add "Age" header.") :
#define CACHE_ENTRY_MAX_AGE 2147483648
Just appending "U" to mark it unsigned is enough to fix it. This only
affects 1.9, no backport is needed.
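For illustration, here is the fixed definition with a small usage. The
value 2147483648 does not fit in a 32-bit signed int or long, so without a
suffix its type depends on the C standard in use (unsigned long in C90,
long long in C99), hence the warning. clamp_age() is a hypothetical helper
and not part of the cache code.

```c
#include <assert.h>
#include <limits.h>

/* With the "U" suffix the constant is explicitly unsigned. */
#define CACHE_ENTRY_MAX_AGE 2147483648U

/* Clamp an age expressed in seconds to the maximum (illustrative helper). */
static unsigned int clamp_age(unsigned long long age)
{
	assert(UINT_MAX >= CACHE_ENTRY_MAX_AGE);
	return age > CACHE_ENTRY_MAX_AGE ? CACHE_ENTRY_MAX_AGE : (unsigned int)age;
}
```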
Released version 1.9-dev6 with the following main changes :
- BUG/MEDIUM: tools: fix direction of my_ffsl()
- BUG/MINOR: cli: forward the whole command on master CLI
- BUG/MEDIUM: auth/threads: use of crypt() is not thread-safe
- MINOR: compat: automatically detect support for crypt_r()
- MEDIUM: auth/threads: make use of crypt_r() on systems supporting it
- DOC: split the http-request actions in their own section
- DOC: split the http-response actions in their own section
- BUG/MAJOR: stream-int: don't call si_cs_recv() in stream_int_chk_rcv_conn()
- BUG/MINOR: tasks: make sure wakeup events are properly reported to subscribers
- MINOR: stats: report the number of active jobs and listeners in "show info"
- MINOR: stats: report the number of active peers in "show info"
- MINOR: stats: report the number of currently connected peers
- MINOR: cli: show the number of reload in 'show proc'
- MINOR: cli: can't connect to the target CLI
- MEDIUM: mworker: does not create the CLI proxy when no listener
- MINOR: mworker: displays more information when leaving
- MEDIUM: mworker: exit with the incriminated exit code
- MINOR: mworker: displays a message when a worker is forked
- MEDIUM: mworker: leave when the master die
- CLEANUP: stream-int: retro-document si_cs_io_cb()
- BUG/MEDIUM: mworker: does not abort() in mworker_pipe_register()
- BUG/MEDIUM: stream-int: don't wake up for nothing during SI_ST_CON
- BUG/MEDIUM: cli: crash when trying to access a worker
- DOC: restore note about "independant" typo
- MEDIUM: stream: implement stream_buf_available()
- MEDIUM: appctx: check for allocation attempts in buffer allocation callbacks
- MINOR: stream-int: rename si_applet_{want|stop|cant}_{get|put}
- MINOR: stream-int: add si_done_{get,put} to indicate that we won't do it anymore
- MINOR: stream-int: use si_cant_put() instead of setting SI_FL_WAIT_ROOM
- MINOR: stream-int: make use of si_done_{get,put}() in shut{w,r}
- MINOR: stream-int: make it clear that si_ops cannot be null
- MEDIUM: stream-int: temporarily make si_chk_rcv() take care of SI_FL_WAIT_ROOM
- MINOR: stream-int: factor the SI_ST_EST state test into si_chk_rcv()
- MEDIUM: stream-int: make SI_FL_WANT_PUT reflect CF_DONT_READ
- MEDIUM: stream-int: always call si_chk_rcv() when we make room in the buffer
- MEDIUM: stream-int: make si_chk_rcv() check that SI_FL_WAIT_ROOM is cleared
- MINOR: stream-int: replace si_update() with si_update_both()
- MEDIUM: stream-int: make stream_int_update() aware of the lower layers
- CLEANUP: stream-int: remove the now unused si->update() function
- MEDIUM: stream-int: Rely only on SI_FL_WAIT_ROOM to stop data receipt
- MEDIUM: stream-int: Try to read data even if channel's buffer seems to be full
- BUG/MINOR: config: better detect the presence of the h2 pattern in npn/alpn
In 1.8, commit 45a66cc ("MEDIUM: config: ensure that tune.bufsize is at
least 16384 when using HTTP/2") tried to avoid an annoying issue making
H2 fail when haproxy is built with default buffer sizes smaller than 16kB,
which used to be the case for a very long time. Sadly, the test only detects
the case where NPN/ALPN exactly matches "h2" and not when it's combined with
other protocols, as in "h2,http/1.1" or "http/1.1,h2". We can safely use
strstr() there because
the string is prefixed by the token's length (0x02) which is unambiguous
as it cannot be part of any other token.
This fix should be backported to 1.8 as a safety guard against bad
configurations.
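The strstr() check can be sketched as below, assuming the protocol list is
held as a NUL-terminated string in wire format, where each token is
preceded by a one-byte length. alpn_has_h2() is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/*
 * alpn is the protocol list in wire format: each token is preceded by a
 * one-byte length. Returns non-zero if "h2" is one of the tokens.
 */
static int alpn_has_h2(const char *alpn)
{
	assert(alpn != NULL);
	/* "\002" is the length byte of the 2-character token "h2" */
	return strstr(alpn, "\002h2") != NULL;
}
```

The list "\010http/1.1\002h2" matches, while "\003h2c" does not because
its length byte differs.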
Before calling the mux to get incoming data, we get the amount of space
available at the input of the buffer. If there is no space, we don't try to read
more data. This is good enough when raw data are stored in the buffer. But this
info has no meaning when structured data are stored. Since such data will be
stored in buffers with the HTTP refactoring, this is a bit annoying.
So, to avoid any problems, we always call the mux. It is the mux's responsibility
to notify the stream interface that it needs more space to store more data. This
must be done by setting the flag CS_FL_RCV_MORE on the conn_stream.
This is exactly what we do in the pass-through mux when <count> is null.
The SI_FL_WAIT_ROOM flag is set on the stream interface when we should wait for
more space in the channel's buffer to store more incoming data. This means we
should wait until some outgoing data are sent before retrying to receive more data.
But in stream interface functions, in many places, instead of checking this
flag, we use the function channel_may_recv() to know if we can (re)start
reading. This currently works but it is not really consistent, and it only
works because only raw data are stored in buffers. It will be a problem when we
start to store structured data in buffers.
So to avoid any problems with future implementations, we now rely only on
SI_FL_WAIT_ROOM. The function channel_may_recv() can still be called, but only
when we are sure to handle raw data (for instance in functions ci_put*). To do
so, among other things, we must be sure to unset SI_FL_WAIT_ROOM and offer an
opportunity to call chk_rcv() on a stream interface when some data are sent
on the other end, which is now granted by the previous patch series.
We exclusively use stream_int_update() now, the lower layers are not
called anymore so let's remove them, as well as si_update() which used
to be their wrapper.
It's far from being clean, but at least it allows resyncing both CS and
applets from the same place, taking into account the fact that CS are
processed synchronously for the send side while applets are processed
outside of the process_stream() loop. The arrangement is optimised to
minimize the amount of iteration by handling send first, then updating
the SI_FL_WAIT_ROOM flags and only then dealing with si_chk_rcv() on
both sides. The SI_FL_WANT_PUT flag is set if needed before calling
si_chk_rcv() since this is done prior to calling stream_int_update().
Now there's no risk that stream_int_notify() is called anymore during
such operations, thus we cannot have any spurious wake-up anymore. The
case where a successful send() could complete a pending connect() is
handled by taking any stream-int state changes into account at the
call place, which is normal since process_stream() is designed to
iterate till stabilisation.
Doing this solves most of the remaining inconsistencies between CS and
applets.
The function used to be called in turn for each side of the stream, but
since it's called exclusively from process_stream(), it prevents us from
making use of the knowledge we have of the operations in progress for
each side, resulting in having to go all the way through functions like
stream_int_notify() which are not appropriate there.
This patch creates a new function, si_update_both(), which takes two
stream interfaces expected to belong to the same stream, and processes
their flags in a more suitable order, but for now doesn't change the
logic at all.
The next step will consist in trying to reinsert the rest of the socket
layer-specific update code to ultimately update the flags correctly at
the end of the operation.
After careful inspection, it now seems OK to call si_chk_rcv() only when
SI_FL_WAIT_ROOM is cleared and SI_FL_WANT_PUT is set, since all identified
call places have already taken care of this.
Instead of clearing the SI_FL_WAIT_ROOM flag and losing the information
about the need from the producer to be woken up, we now call si_chk_rcv()
immediately. This is cheap to do and it could possibly be further improved
by only doing it when SI_FL_WAIT_ROOM was still set, though this will
require some extra auditing of the code paths.
The only remaining place where the flag was cleared without a call to
si_chk_rcv() is si_alloc_ibuf(), but since this one is called from a
receive path woken up from si_chk_rcv() or not having failed, the
clearing was not necessary anymore either.
And there was one place in stream_int_notify() where si_chk_rcv() was
called with SI_FL_WAIT_ROOM still explicitly set so this place was
adjusted in order to clear the flag prior to calling si_chk_rcv().
Now we don't have any situation where we randomly clear SI_FL_WAIT_ROOM
without trying to wake the other side up, nor where we call si_chk_rcv()
with the flag set, so this flag should accurately represent a failed
attempt at putting data into the buffer.
When CF_DONT_READ is set, till now we used to set SI_FL_WAIT_ROOM, which
is not appropriate since it would lose the subscribe status. Instead let's
clear SI_FL_WANT_PUT (just like applets do), and set the flag only when
CF_DONT_READ is cleared.
We have to do this in stream_int_update(), and in si_cs_io_cb() after
returning from si_cs_recv() since it would be a bit invasive to hack
this one for now. It must not be done in stream_int_notify() otherwise
it would re-enable blocked applets.
Last, when si_chk_rcv() is called, it immediately clears the flag before
calling ->chk_rcv() so that we are not tempted to uselessly loop on the
same call until the receive function is called. This is the same principle
as what is done with the applet scheduler.