Since the fix "BUG/MINOR: mworker: don't exit with an ambiguous value"
we are leaving with a EXIT_SUCCESS upon a SIGINT.
We still need to quit with a SIGINT when a worker leaves with a SIGINT.
This is done this way because vtest expect a 130 during the process
stop, haproxy without mworker returns a 130, so it should be the same in
mworker mode.
This should be backported in 1.9, with the previous patch ("BUG/MINOR:
mworker: don't exit with an ambiguous value").
Code has moved, mworker_catch_sigchld() is in haproxy.c.
When the sigchld handler is called and waitpid() returns -1,
the behavior of waitpid() with the status variable is undefined.
It is not a good idea to exit with the value contained in it.
Since this exit path does not use the exitcode variable, it means that
this is an expected and successful exit.
This should be backported in 1.9, code has moved,
mworker_catch_sigchld() is in haproxy.c.
Commit 3f12887 ("MINOR: mworker: don't use children variable anymore")
introduced a regression.
The previous behavior was to send a signal to every children, whether or
not they are former children. Instead of this, we only send a signal to
the current children, so we don't try to kill -INT or -TERM all
processes during a reload.
No backport needed.
When iterating on the CLI using "show activity" and no other load, it
was visible that the last thread was always skipped. This was caused by
the way the thread bits were walking : t1 was updated after t2 to make
sure it never equals t2 (thus it skips t2), and in case of a tie we
choose t1. This results in the chosen thread never to equal t2 unless
the other ones already have one connection. In addition to this, t2 was
recalulated upon each pass due to the fact that only the 31th bit was
looked at instead of looking at the t2'th bit.
This patch fixes this by updating t2 after t1 so that t1 is free to
walk over all positions under equal load. No measurable performance
gains are expected from this though, but it at least removes one
strange indicator which could lead to some suspicion.
No backport is needed.
It's always a pain to get a core dump when enabling user/group setting
(which disables the dumpable flag on Linux), when using a chroot and/or
when haproxy is started by a service management tool which requires
complex operations to just raise the core dump limit.
This patch introduces a new "set-dumpable" global directive to work
around these troubles by doing the following :
- remove file size limits (equivalent of ulimit -f unlimited)
- remove core size limits (equivalent of ulimit -c unlimited)
- mark the process dumpable again (equivalent of suid_dumpable=1)
Some of these will depend on the operating system. This way it becomes
much easier to retrieve a core file. Temporarily moving the chroot to
a user-writable place generally enough.
Since the introduction of the options field, we can use it to store the
type of process.
type = 'm' is replaced by PROC_O_TYPE_MASTER
type = 'w' is replaced by PROC_O_TYPE_WORKER
type = 'e' is replaced by PROC_O_TYPE_PROG
The old values are still used in the HAPROXY_PROCESSES environment
variable to pass the information during a reload.
This option is already the default, but its opposite 'no option
start-on-reload' allows the master to keep a previous instance of a
program and don't start a new one upon a reload.
The old program will then appear as a current one in "show proc" and
could also trigger an exit-on-failure upon a segfault.
Previously we were assuming than a process was in a leaving state when
its number of reload was greater than 0. With mworker programs it's not
the case anymore so we need to store a leaving state.
Maksim Kupriianov reported very strange crashes in fwrr_update_position()
which didn't make sense because of an apparent divide overflow except that
the value was not null in the core.
It happens that while the locking is correct in all the functions' call
graph, the uppermost one (fwrr_get_next_server()) incorrectly expected
that its target server was already locked when called. This stupid
assumption causd the server lock not to be held when calling the other
ones, explaining how it was possible to change the server's eweight by
calling srv_lb_commit_status() under the server lock yet collide with
its unprotected usage.
This commit makes sure that fwrr_get_server_from_group() retrieves a
locked server and that fwrr_get_next_server() is responsible for
unlocking the server before returning it. There is one subtlety in
this function which is that it builds a list of avoided servers that
were full while scanning the tree, and all of them are queued in a
full state so they must be unlocked upon return.
Many thanks to Maksim for providing detailed info allowing to narrow
down this bug.
This fix must be backported to 1.9. In 1.8 the lock seems much wider
and changes to the server's state are performed under the rendez-vous
point so this it doesn't seem possible that it happens there.
Implements "show peers [peers section]" new CLI command to dump information
about the peers and their stick-tables to be synchronized and others internal.
May be backported as far as 1.5.
gcc-3.4 reports this which actually looks like a valid warning when
looking at the code, it's unsure why others didn't notice it :
src/proto_htx.c: In function `htx_manage_server_side_cookies':
src/proto_htx.c:4266: warning: 'is_cookie2' might be used uninitialized in this function
Older compilers (like gcc-3.4) warn about the use of "const" on functions
returning a struct, which makes sense since the return may only be copied :
include/common/htx.h:233: warning: type qualifiers ignored on function return type
Let's simply drop "const" here.
Older compilers don't like to see "inline" placed after the type in a
function declaration, it must be "static inline <type>" only. This
patch touches various areas. The warnings were seen with gcc-3.4.
Move the definition of the various _HA_ATOMIC_* macros that use
__atomic_* in the #if GCC_VERSION >= 4.7, not just after it, so that we
can build with older versions of gcc again.
Instead of abusing the SUB_CALL_UNSUBSCRIBE flag, revamp the H2 code a bit
so that it just checks if h2s->sending_list is empty to know if the tasklet
of the stream_interface has been waken up or not.
send_wait is now set to NULL in h2_snd_buf() (ideally we'd set it to NULL
as soon as we're waking the tasklet, but it can't be done, because we still
need it in case we have to remove the tasklet from the task list).
In h2_subscribe(), don't add ourself to the send_list if we're already in it.
That may happen if we try to send and fail twice, as we're only removed
from the send_list if we managed to send data, to promote fairness.
Failing to do so can lead to either an infinite loop, or some random crashes,
as we'd get the same h2s in the send_list twice.
This should be backported to 1.9.
In the h1 and h2 muxes, make sure we unsubscribed before destroying the
mux context.
Failing to do so will lead in a segfault later, as the connection will
attempt to dereference its conn->send_wait or conn->recv_wait, which pointed
to the now-free'd mux context.
This was introduced by commit 39a96ee16e, so
should only be backported if that commit gets backported.
Commit a8f57d51a ("MINOR: cli/activity: report the accept queue sizes
in "show activity"") broke the single-threaded build because the
accept-rings are not implemented there. Let's ifdef this out. Ideally
we should start to think about always having such elements initialized
even without threads to improve the test coverage.
As expected, commit cde7902ac ("MEDIUM: tasks: improve fairness between
the local and global queues") broke the build with threads disabled,
and I forgot to rerun this test before committing. No backport is
needed.
Whenever HAProxy was reloaded with rotated keys, the resumption would be
broken for previous encryption key. The bug was introduced with the addition
of 80 byte keys in 9e7547 (MINOR: ssl: add support of aes256 bits ticket keys
on file and cli.).
This fix needs to be backported to 1.9.
The allocated trash chunk is not freed properly and causes a memory leak
exhibited as the growth in the trash pool allocations. Bug was introduced
in commit 271022 (BUG/MINOR: map: fix map_regm with backref).
This should be backported to all branches where the above commit was
backported.
In the past we used to reduce the number of tasks consulted at once when
some niced tasks were present in the run queue. This was dropped in 1.8
when the scheduler started to take batches. With the recent fixes it now
becomes possible to restore this behaviour which guarantees a better
latency between tasks when niced tasks are present. Thanks to this, with
the default number of 200 for tune.runqueue-depth, with a parasitic load
of 14000 requests per second, nice 0 gives 14000 rps, nice 1024 gives
12000 rps and nice -1024 gives 16000 rps. The amplitude widens if the
runqueue depth is lowered.
The offset calculated for the nice value used to be wrong for a long
time and got even worse when the improved multi-thread sheduler was
implemented because it continued to rely on the run queue size, which
become irrelevant given that we extract tasks in batches, so the run
queue size moves following a sawtooth form.
However the offsets much better reflects insertion positions in the
queue, so it's worth dropping this rq_size component of the equation.
Last point, due to the batches made of runqueue-depth entries at once,
the higher the depth, the lower the effect of the nice setting since
values are picked together in batches and placed into a list. An
intuitive approach consists in multiplying the nice value with the
batch size to allow tasks to participate to a different batch. And
experimentation shows that this works pretty well.
With a runqueue-depth of 16 and a parasitic load of 16000 requests
per second on 100 streams, a default nice of 0 shows 16000 requests
per second for nice 0, 22000 for nice -1024 and 10000 for nice 1024.
The difference is even bigger with a runqueue depth of 5. At 200
however it's much smoother (16000-22000).
Tasks allowed to run on multiple threads, as well as those scheduled by
one thread to run on another one pass through the global queue. The
local queues only see tasks scheduled by one thread to run on itself.
The tasks extracted from the global queue are transferred to the local
queue when they're picked by one thread. This causes a priority issue
because the global tasks experience a priority contest twice while the
local ones experience it only once. Thus if a tasks returns still
running, it's immediately reinserted into the local run queue and runs
much faster than the ones coming from the global queue.
Till 1.9 the tasks going through the global queue were mostly :
- health checks initialization
- queue management
- listener dequeue/requeue
These ones are moderately sensitive to unfairness so it was not that
big an issue.
Since 2.0-dev2 with the multi-queue accept, tasks are scheduled to
remote threads on most accept() and it becomes fairly visible under
load that the accept slows down, even for the CLI.
This patch remedies this by consulting both the local and the global
run queues in parallel and by always picking the task whose deadline
is the earliest. This guarantees to maintain an excellent fairness
between the two queues and removes the cascade effect experienced
by the global tasks.
Now the CLI always continues to respond quickly even in presence of
expensive tasks running for a long time.
This patch may possibly be backported to 1.9 if some scheduling issues
are reported but at this time it doesn't seem necessary.
This one hasn't been used anymore since the scheduler changes after 1.8
but it kept being exported and maintained up to date while it's always
reset when scanning the trees. Let's stop exporting it and updating it.
When a mux context is released, we must be sure it exists before dereferencing
it. The bug was introduced in the commit 39a96ee16 ("MEDIUM: muxes: Be prepared
to don't own connection during the release").
No need to backport this patch, expect if the commit 39a96ee16 is backported
too.
Because the HTX is now the default mode in HAProxy, it is also enabled by
default in reg-tests. The option '--use-htx' is still available, but deprecated
and have concretly no effect. To run reg-tests with the legacy HTTP mode, you
should use the option '--no-htx'.
The legacy HTTP mode is no more the default one. So now, by default, without any
option in your configuration, all proxies will use the HTX mode. The line
"option http-use-htx" in proxy sections are now useless, except to cancel the
legacy HTTP mode. To fallback on legacy HTTP mode, you should use the line "no
option http-use-htx" explicitly.
Note that the reg-tests still work by default on legacy HTTP mode. The HTX will
be enabled by default in a futur commit.
The upgrade is performed when an H2 preface is detected when the first request
on a connection is parsed. The CS is destroyed by setting EOS flag on it. A
special flag is added on the HTX message to warn the HTX analyzers the stream
will be closed because of an upgrade. This way, no error and no log are
emitted. When the mux h1 is released, we create a mux h2, without any CS and
passing the buffer with the unparsed H2 preface.
It is now possible to upgrade TCP streams to HTX when an HTTP backend is set for
a TCP frontend (both with the HTX enabled). So concretely, in such case, an
upgrade is performed from the mux pt to the mux h1. The current CS and the
channel's buffer are used to initialize the mux h1.
This will be mandatory to allow upgrades from TCP to HTTP in HTX. Of course, raw
buffers will still be used by default on TCP proxies, this option sets or
not. But if you want to handle mux upgrades from a TCP proxy, you must enable
the HTX on it and on all its backends.
There is only a small change in the lua code. Because TCP proxies can be HTX
aware, to exclude TCP services only for HTTP proxies, we must also check the
mode (TCP/HTTP) now.
This function will handle mux upgrades, for frontend connections only. It will
retrieve the best mux in the same way than conn_install_mux_fe except that the
mode and optionnally the proto are forced.
The new multiplexer is initialized using a new context and a specific input
buffer. Then, the old one is destroyed. If an error occurred, everything is
rolled back.
This happens during mux upgrades. In such case, when the destroy() callback is
called, the connection points to a different mux's context than the one passed
to the callback. It means the connection is owned by another mux. The old mux is
then released but the connection is not closed.
It is mandatory to handle mux upgrades, because during a mux upgrade, the
connection will be reassigned to another multiplexer. So when the old one is
destroyed, it does not own the connection anymore. Or in other words, conn->ctx
does not point to the old mux's context when its destroy() callback is
called. So we now rely on the multiplexer context do destroy it instead of the
connection.
In addition, h1_release() and h2_release() have also been updated in the same
way.
The mux's callback init() now take a pointer to a buffer as extra argument. It
must be used by the multiplexer as its input buffer. This buffer is always NULL
when a multiplexer is initialized with a fresh connection. But if a mux upgrade
is performed, it may be filled with existing data. Note that, for now, mux
upgrades are not supported. But this commit is mandatory to do so.
Instead of using the connection context to make the difference between a
frontend connection and a backend connection, we now rely on the function
conn_is_back().
In the function flt_stream_add_filter(), if the HTX is enabled, before attaching
a filter to a stream, we test if the filter can handle it or not. If not, the
filter is ignored. Before the proxy mode was tested. Now we test if the stream
is an HTX stream or not.
In the function smp_prefetch_htx(), we must know if data in the channel's buffer
are structured or not. Before the proxy mode was tested. Now we test if the
stream is an HTX stream or not. If yes, we know the HTX is used to structure
data in the channel's buffer.
The flag SF_HTX has been added to know when a stream uses the HTX or not. It is
set when an HTX stream is created. There are 2 conditions to set it. The first
one is when the HTTP frontend enables the HTX. The second one is when the attached
conn_stream uses an HTX multiplexer.
A multiplexer must now set the flag MX_FL_HTX when it uses the HTX to structured
the data exchanged with channels. the muxes h1 and h2 set this flag. Of course,
for the mux h2, it is set on h2_htx_ops only.
Instead of using the same mux_ops structure for the legacy HTTP mode and the HTX
mode, a dedicated mux_ops is now used for the HTX mode. Same callbacks are used
for both. But the flags may be different depending on the mode used.
This flag is used to explicitly kill the connection when the CS is closed. It
may be set by tcp rules. It must be respect by the mux-h1.
This patch must be backported to 1.9.
An H1 stream is destroyed when the conn_stream is detached or when the H1
connection is destroyed. In the first case, the CS is released by the caller. In
the second one, because the connection is closed, no CS is attached anymore. In
both, there is no reason to release the conn_stream in h1s_destroy().