The run queue scheduler now considers task->nice to queue a task and
to pick a task out of the queue. This makes it possible to boost the
access to statistics (both via HTTP and UNIX socket). The UNIX socket
receives twice as much a boost as the HTTP socket because it is more
sensible.
The ultree code has been removed in favor of a simpler and
cleaner ebtree implementation. The eternity queue does not
need to exist anymore, and the pool_tree64 has been removed.
The ebtree node is stored in the task itself. The qlist list
header is still used by the run-queue, but will be able to
disappear once the run-queue uses ebtree too.
The dequeuing logic was completely wrong. First, a task was assigned
to all servers to process the queue, but this task was never scheduled
and was only woken up on session free. Second, there was no reservation
of server entries when a task was assigned a server. This means that
as long as the task was not connected to the server, its presence was
not accounted for. This was causing trouble when detecting whether or
not a server had reached maxconn. Third, during a redispatch, a session
could lose its place at the server's and get blocked because another
session at the same moment would have stolen the entry. Fourth, the
redispatch option did not work when maxqueue was reached for a server,
and it was not possible to do so without indefinitely hanging a session.
The root cause of all those problems was the lack of pre-reservation of
connections at the server's, and the lack of tracking of servers during
a redispatch. Everything relied on combinations of flags which could
appear similarly in quite distinct situations.
This patch is a major rework but there was no other solution, as the
internal logic was deeply flawed. The resulting code is cleaner, more
understandable, uses less magics and is overall more robust.
As an added bonus, "option redispatch" now works when maxqueue has
been reached on a server.
Due to the way the stats socket work, it was not possible to
maintain the information related to the command entered, so
after filling a whole buffer, the request was lost and it was
considered that there was nothing to write anymore.
The major reason was that some flags were passed directly
during the first call to stats_dump_raw() instead of being
stored persistently in the session.
To definitely fix this problem, flags were added to the stats
member of the session structure.
A second problem appeared. When the stats were produced, a first
call to client_retnclose() was performed, then one or multiple
subsequent calls to buffer_write_chunks() were done. But once the
stats buffer was full and a reschedule operated, the buffer was
flushed, the write flag cleared from the buffer and nothing was
done to re-arm it.
For this reason, a check was added in the proto_uxst_stats()
function in order to re-call the client FSM when data were added
by stats_dump_raw(). Finally, the whole unix stats dump FSM was
rewritten to avoid all the magics it depended on. It is now
simpler and looks more like the HTTP one.
Currently there is a ~16KB limit for a data size passed via unix socket.
It is caused by a trivial bug ttat is going to fixed soon, however
in most cases there is no need to dump a full stats.
This patch makes possible to select a scope of dumped data by extending
current "show stat" to "show stat [<iid> <type> <sid>]":
- iid is a proxy id, -1 to dump all proxies
- type selects type of dumpable objects: 1 for frontend, 2 for backend, 4 for
server, -1 for all types. Values can be ORed, for example:
1+2=3 -> frontend+backend.
1+2+4=7 -> frontend+backend+server.
- sid is a service id, -1 to dump everything from the selected proxy.
To do this I implemented a new session flag (SN_STAT_BOUND), added three
variables in data_ctx.stats (iid, type, sid), modified dumpstats.c and
completely revorked the process_uxst_stats: now it waits for a "\n"
terminated string, splits args and uses them. BTW: It should be quite easy
to add new commands, for example to enable/disable servers, the only problem
I can see is a not very lucky config name (*stats* socket). :|
During the work I also fixed two bug:
- s->flags were not initialized for proto_uxst
- missing comma if throttling not enabled (caused by a stupid change in
"Implement persistent id for proxies and servers")
Other changes:
- No more magic type valuse, use STATS_TYPE_FE/STATS_TYPE_BE/STATS_TYPE_SV
- Don't memset full s->data_ctx (it was clearing s->data_ctx.stats.{iid/type/sid},
instead initialize stats.sv & stats.sv_st (stats.px and stats.px_st were already
initialized)
With all that changes it was extremely easy to write a short perl plugin
for a perl-enabled net-snmp (also included in this patch).
29385 is my PEN (Private Enterprise Number) and I'm willing to donate
the SNMPv2-SMI::enterprises.29385.106.* OIDs for HAProxy if there
is nothing assigned already.
Due to the way Linux delivers EPOLLIN and EPOLLHUP, a closed connection
received after some server data sometimes results in truncated responses
if the client disconnects before server starts to respond. The reason
is that the EPOLLHUP flag is processed as an indication of end of
transfer while some data may remain in the system's socket buffers.
This problem could only be triggered with sepoll, although nothing should
prevent it from happening with normal epoll. In fact, the work factoring
performed by sepoll increases the risk that this bug appears.
The fix consists in making FD_POLL_HUP and FD_POLL_ERR sticky and that
they are only checked if FD_POLL_IN is not set, meaning that we have
read all pending data.
That way, the problem is definitely fixed and sepoll still remains about
17% faster than epoll since it can take into account all information
returned by the kernel.
It is sometimes required to know some informations such as the
process uptime when consulting statistics. This patch adds the
"show info" command to query those informations on the UNIX
socket.
By default, counters used for statistics calculation are incremented
only when a session finishes. It works quite well when serving small
objects, but with big ones (for example large images or archives) or
with A/V streaming, a graph generated from haproxy counters looks like
a hedgehog.
This patch implements a contstats (continous statistics) option.
When set counters get incremented continuously, during a whole session.
Recounting touches a hotpath directly so it is not enabled by default,
as it has small performance impact (~0.5%).
It is important that unbind_listener() calls fd_delete() to remove
a file descriptor, because only this one can update the fdtab and
the maxfd entries.
There was a missing state for listeners, when they are not listening
but still attached to the protocol. The LI_ASSIGNED state was added
for this purpose. This permitted to clean up the assignment/release
workflow quite a bit. Generic enable/enable_all/disable/disable_all
primitives were added, and a disable_all entry was added to the
struct protocol.
A unix socket can now access the statistics. It currently only
recognizes the "show stat\n" command at the beginning of the
input, then returns the statistics in CSV format.
A new file, proto_uxst.c, implements support of PF_UNIX sockets
of type SOCK_STREAM. It relies on generic stream_sock_read/write
and uses its own accept primitive which also tries to be generic.
Right now it only implements an echo service in sight of a general
support for start dumping via unix socket. The echo code is more
of a proof of concept than useful code.