Commit Graph

1865 Commits

Author SHA1 Message Date
Willy Tarreau
2e9c1d2960 MINOR: stream: measure and report a stream's call rate in "show sess"
Quite a few times some bugs have made a stream task incorrectly
handle a complex combination of events, which was often reported as
"100% CPU", and was usually caused by the event not being properly
identified and flushed, and the stream's handler called in loops.

This patch adds a call rate counter to the stream struct. It's not
huge, it's really inexpensive (especially compared to the rest of the
processing function) and will easily help spot such tasks in "show sess"
output, possibly even allowing to kill them.

A future patch should probably consist in alerting when they're above a
certain threshold, possibly sending a dump and killing them. Some options
could also consist in aborting in order to get an analyzable core dump
and let a service manager restart a fresh new process.
2019-04-24 16:04:23 +02:00
Willy Tarreau
0212fadd65 MINOR: tasks/activity: report the context switch and task wakeup rates
It's particularly useful to spot runaway tasks to see this. The context
switch rate covers all tasklet calls (tasks and I/O handlers) while the
task wakeups only covers tasks picked from the run queue to be executed.
High values there will indicate either an intense traffic or a bug that
mades a task go wild.
2019-04-24 16:04:23 +02:00
Baptiste Assmann
333939c2ee MINOR: action: new '(http-request|tcp-request content) do-resolve' action
The 'do-resolve' action is an http-request or tcp-request content action
which allows to run DNS resolution at run time in HAProxy.
The name to be resolved can be picked up in the request sent by the
client and the result of the resolution is stored in a variable.
The time the resolution is being performed, the request is on pause.
If the resolution can't provide a suitable result, then the variable
will be empty. It's up to the admin to take decisions based on this
statement (return 503 to prevent loops).

Read carefully the documentation concerning this feature, to ensure your
setup is secure and safe to be used in production.

This patch creates a global counter to track various errors reported by
the action 'do-resolve'.
2019-04-23 11:41:52 +02:00
Baptiste Assmann
0b9ce82dfa MINOR: obj_type: new object type for struct stream
This patch creates a new obj_type for the struct stream in HAProxy.
2019-04-23 11:35:56 +02:00
Baptiste Assmann
dfd35fd71a MINOR: dns: dns_requester structures are now in a memory pool
dns_requester structure can be allocated at run time when servers get
associated to DNS resolution (this happens when SRV records are used in
conjunction with service discovery).
Well, this memory allocation is safer if managed in an HAProxy pool,
furthermore with upcoming HTTP action which can perform DNS resolution
at runtime.

This patch moves the memory management of the dns_requester structure
into its own pool.
2019-04-23 11:33:48 +02:00
Olivier Houchard
88698d966d MEDIUM: connections: Add a way to control the number of idling connections.
As by default we add all keepalive connections to the idle pool, if we run
into a pathological case, where all client don't do keepalive, but the server
does, and haproxy is configured to only reuse "safe" connections, we will
soon find ourself having lots of idling, unusable for new sessions, connections,
while we won't have any file descriptors available to create new connections.

To fix this, add 2 new global settings, "pool_low_ratio" and "pool_high_ratio".
pool-low-fd-ratio  is the % of fds we're allowed to use (against the maximum
number of fds available to haproxy) before we stop adding connections to the
idle pool, and destroy them instead. The default is 20. pool-high-fd-ratio is
the % of fds we're allowed to use (against the maximum number of fds available
to haproxy) before we start killing idling connection in the event we have to
create a new outgoing connection, and no reuse is possible. The default is 25.
2019-04-18 19:52:03 +02:00
Olivier Houchard
e179d0e88f MEDIUM: connections: Provide a xprt_ctx for each xprt method.
For most of the xprt methods, provide a xprt_ctx.  This will be useful later
when we'll want to be able to stack xprts.
The init() method now has to create and provide the said xprt_ctx if needed.
2019-04-18 14:56:24 +02:00
Olivier Houchard
7b5fd1ec26 MEDIUM: connections: Move some fields from struct connection to ssl_sock_ctx.
Move xprt_st, tmp_early_data and sent_early_data from struct connection to
struct ssl_sock_ctx, as they are only used in the SSL code.
2019-04-18 14:56:24 +02:00
Willy Tarreau
636848aa86 MINOR: init: add a "set-dumpable" global directive to enable core dumps
It's always a pain to get a core dump when enabling user/group setting
(which disables the dumpable flag on Linux), when using a chroot and/or
when haproxy is started by a service management tool which requires
complex operations to just raise the core dump limit.

This patch introduces a new "set-dumpable" global directive to work
around these troubles by doing the following :

  - remove file size limits     (equivalent of ulimit -f unlimited)
  - remove core size limits     (equivalent of ulimit -c unlimited)
  - mark the process dumpable again (equivalent of suid_dumpable=1)

Some of these will depend on the operating system. This way it becomes
much easier to retrieve a core file. Temporarily moving the chroot to
a user-writable place generally enough.
2019-04-16 14:31:23 +02:00
William Lallemand
8f7069a389 CLEANUP: mworker: remove the type field in mworker_proc
Since the introduction of the options field, we can use it to store the
type of process.

type = 'm' is replaced by PROC_O_TYPE_MASTER
type = 'w' is replaced by PROC_O_TYPE_WORKER
type = 'e' is replaced by PROC_O_TYPE_PROG

The old values are still used in the HAPROXY_PROCESSES environment
variable to pass the information during a reload.
2019-04-16 13:26:43 +02:00
William Lallemand
bd3de3efb7 MEDIUM: mworker-prog: implements 'option start-on-reload'
This option is already the default, but its opposite 'no option
start-on-reload' allows the master to keep a previous instance of a
program and don't start a new one upon a reload.

The old program will then appear as a current one in "show proc" and
could also trigger an exit-on-failure upon a segfault.
2019-04-16 13:26:43 +02:00
William Lallemand
4528611ed6 MEDIUM: mworker: store the leaving state of a process
Previously we were assuming than a process was in a leaving state when
its number of reload was greater than 0. With mworker programs it's not
the case anymore so we need to store a leaving state.
2019-04-16 13:26:43 +02:00
Frédéric Lécaille
95679dc096 MINOR: peers: Add a new command to the CLI for peers.
Implements "show peers [peers section]" new CLI command to dump information
about the peers and their stick-tables to be synchronized and others internal.

May be backported as far as 1.5.
2019-04-16 09:58:40 +02:00
Olivier Houchard
e5eef1f1b4 MINOR: connections: Remove the SUB_CALL_UNSUBSCRIBE flag.
Garbage collect SUB_CALL_UNSUBSCIRBE, as it's now unused.
2019-04-15 19:27:57 +02:00
Nenad Merdanovic
8ef706502a BUG/MINOR: ssl: Fix 48 byte TLS ticket key rotation
Whenever HAProxy was reloaded with rotated keys, the resumption would be
broken for previous encryption key. The bug was introduced with the addition
of 80 byte keys in 9e7547 (MINOR: ssl: add support of aes256 bits ticket keys
on file and cli.).

This fix needs to be backported to 1.9.
2019-04-15 10:09:54 +02:00
Christopher Faulet
73c1207c71 MINOR: muxes: Pass the context of the mux to destroy() instead of the connection
It is mandatory to handle mux upgrades, because during a mux upgrade, the
connection will be reassigned to another multiplexer. So when the old one is
destroyed, it does not own the connection anymore. Or in other words, conn->ctx
does not point to the old mux's context when its destroy() callback is
called. So we now rely on the multiplexer context do destroy it instead of the
connection.

In addition, h1_release() and h2_release() have also been updated in the same
way.
2019-04-12 22:06:53 +02:00
Christopher Faulet
51f73eb11a MEDIUM: muxes: Add an optional input buffer during mux initialization
The mux's callback init() now take a pointer to a buffer as extra argument. It
must be used by the multiplexer as its input buffer. This buffer is always NULL
when a multiplexer is initialized with a fresh connection. But if a mux upgrade
is performed, it may be filled with existing data. Note that, for now, mux
upgrades are not supported. But this commit is mandatory to do so.
2019-04-12 22:06:53 +02:00
Christopher Faulet
0e160ff5bb MINOR: stream: Set a flag when the stream uses the HTX
The flag SF_HTX has been added to know when a stream uses the HTX or not. It is
set when an HTX stream is created. There are 2 conditions to set it. The first
one is when the HTTP frontend enables the HTX. The second one is when the attached
conn_stream uses an HTX multiplexer.
2019-04-12 22:06:53 +02:00
Christopher Faulet
9f38f5aa80 MINOR: muxes: Add a flag to specify a multiplexer uses the HTX
A multiplexer must now set the flag MX_FL_HTX when it uses the HTX to structured
the data exchanged with channels. the muxes h1 and h2 set this flag. Of course,
for the mux h2, it is set on h2_htx_ops only.
2019-04-12 22:06:53 +02:00
Willy Tarreau
64a9c05f37 MINOR: cli/listener: report the number of accepts on "show activity"
The "show activity" command reports the number of incoming connections
dispatched per thread but doesn't report the number of connections
received by each thread. It is important to be able to monitor this
value as it can show that for whatever reason a smaller set of threads
is receiving the connections and dispatching them to all other ones.
2019-04-12 15:54:15 +02:00
William Lallemand
9a1ee7ac31 MEDIUM: mworker-prog: implement program for master-worker
This patch implements the external binary support in the master worker.

To configure an external process, you need to use the program section,
for example:

	program dataplane-api
		command ./dataplane_api

Those processes are launched at the same time as the workers.

During a reload of HAProxy, those processes are dealing with the same
sequence as a worker:

  - the master is re-executed
  - the master sends a USR1 signal to the program
  - the master launches a new instance of the program

During a stop, or restart, a SIGTERM is sent to the program.
2019-04-01 14:45:37 +02:00
William Lallemand
e25473c846 REORG: mworker: move signal handlers and related functions
Move the following functions to mworker.c:

void mworker_catch_sighup(struct sig_handler *sh);
void mworker_catch_sigterm(struct sig_handler *sh);
void mworker_catch_sigchld(struct sig_handler *sh);

static void mworker_kill(int sig);
int current_child(int pid);
2019-04-01 14:45:37 +02:00
Christopher Faulet
87a8f353f1 CLEANUP: muxes/stream-int: Remove flags CS_FL_READ_NULL and SI_FL_READ_NULL
Since the flag CF_SHUTR is no more set to mark the end of the message, these
flags become useless.

This patch should be backported to 1.9.
2019-03-25 06:55:23 +01:00
Christopher Faulet
297d3e2e0f MINOR: channel: Report EOI on the input channel if it was reached in the mux
The flag CF_EOI is now set on the input channel when the flag CS_FL_EOI is set
on the corresponding conn_stream. In addition, if a read activity is reported
when this flag is set, the stream is woken up.

This patch should be backported to 1.9.
2019-03-25 06:24:43 +01:00
Christopher Faulet
5311a9255d MINOR: connection: and new flag to mark end of input (EOI)
Since the begining, in the H2 multiplexer, when the end of a message is reached,
the flag CS_FL_(R)EOS is set on the conn_stream to notify the upper layer that
all data were received and consumed and there is no longer any expected. The
stream-interface converts it into a shutdown read. But it leads to some
ambiguities with the real shutr. Once it was reported at the end of the message,
there is no way to report it when the read0 is received. For this reason, aborts
after the message was fully received cannot be reported. And on the channel
side, it is hard to make the difference between a shutr because the end of the
message was reached and a shutr because of an abort.

For these reasons, there is now a flag to mark the end of the message. It is
called CS_FL_EOI (end-of-input) because it is only used on the receipt path.
This flag is only declared and not used yet.

This patch will be used by future bug fixes and will have to be backported
to 1.9.
2019-03-25 06:24:25 +01:00
Willy Tarreau
0f22299435 CLEANUP: cache: don't export http_cache_applet anymore
This one can become static since it's not used by http/htx anymore.
2019-03-19 09:58:35 +01:00
Christopher Faulet
3a78aa6e95 BUG/MINOR: stats: Fully consume large requests in the stats applet
In the stats applet (in HTX and legacy HTTP), after a response is fully sent to
a client, the request is consumed. It is done at the end, after all the response
was copied into the channel's buffer. But only outgoing data at time the applet
is called are consumed. Then the applet is closed. If a request with a huge body
is sent, an error is triggerred because a SHUTW is catched for an unfinisehd
request.

Now, we consume request data until the end. In fact, we don't try to shutdown
the request's channel for write anymore.

This patch must be backported to 1.9 after some observation period. It should
probably be backported in prior versions too. But honnestly, with refactoring
on the connection layer and the stream interface in 1.9, it is probably safer
to not do so.
2019-03-19 09:49:29 +01:00
Christopher Faulet
203b2b0a5a MINOR: muxes: Report the Last read with a dedicated flag
For conveniance, in HTTP muxes (h1 and h2), the end of the stream and the end of
the message are reported the same way to the stream, by setting the flag
CS_FL_EOS. In the stream-interface, when CS_FL_EOS is detected, a shutdown for
read is reported on the channel side. This is historical. With the legacy HTTP
layer, because the parsing is done by the stream in HTTP analyzers, the EOS
really means a shutdown for read.

Most of time, for muxes h1 and h2, it works pretty well, especially because the
keep-alive is handled by the muxes. The stream is only used for one
transaction. So mixing EOS and EOM is good enough. But not everytime. For now,
client aborts are only reported if it happens before the end of the request. It
is an error and it is properly handled. But because the EOS was already
reported, client aborts after the end of the request are silently
ignored. Eventually an error can be reported when the response is sent to the
client, if the sending fails. Otherwise, if the server does not reply fast
enough, an error is reported when the server timeout is reached. It is the
expected behaviour, excpect when the option abortonclose is set. In this case,
we must report an error when the client aborts. But as said before, this event
can be ignored. So to be short, for now, the abortonclose is broken.

In fact, it is a design problem and we have to rethink all channel's flags and
probably the conn-stream ones too. It is important to split EOS and EOM to not
loose information anymore. But it is not a small job and the refactoring will be
far from straightforward.

So for now, temporary flags are introduced. When the last read is received, the
flag CS_FL_READ_NULL is set on the conn-stream. This way, we can set the flag
SI_FL_READ_NULL on the stream interface. Both flags are persistant. And to be
sure to wake the stream, the event CF_READ_NULL is reported. So the stream will
always have the chance to handle the last read.

This patch must be backported to 1.9 because it will be used by another patch to
fix the option abortonclose.
2019-03-18 15:50:23 +01:00
Christopher Faulet
2b9b6784b9 MINOR: stats: Move stuff about the stats status codes in stats files
The status codes definition (STAT_STATUS_*) and their string representation
stat_status_codes) have been moved in stats files. There is no reason to keep
them in proto_http files.
2019-03-15 14:34:59 +01:00
Christopher Faulet
3c2ecf75c8 MINOR: stats: Add the status code STAT_STATUS_IVAL to handle invalid requests
This patch must be backported to 1.9 because a bug fix depends on it.
2019-03-15 14:34:52 +01:00
Willy Tarreau
0cf33176bd MINOR: listener: move thr_idx from the bind_conf to the listener
Tests show that it's slightly faster to have this field in the listener.
The cache walk patterns are under heavy stress and having only this field
written to in the bind_conf was wasting a cache line that was heavily
read. Let's move this close to the other entries already written to in
the listener. Warning, the position does have an impact on peak performance.
2019-03-07 14:08:26 +01:00
Willy Tarreau
9f1d4e7f7f CLEANUP: listener: remove old thread bit mapping
Now that the P2C algorithm for the accept queue is removed, we don't
need to map a number to a thread bit anymore, so let's remove all
these fields which are taking quite some space for no reason.
2019-03-07 13:59:04 +01:00
Willy Tarreau
fc630bd373 MINOR: listener: improve incoming traffic distribution
By picking two randoms following the P2C algorithm, we seldom observe
asymmetric loads on bursts of small session counts. This is typically
what makes h2load take a bit of time to complete the last 100% because
if a thread gets two connections while the other ones only have one,
it takes twice the time to complete its work.

This patch proposes a modification of the p2c algorithm which seems
more suitable to this case : it mixes a rotating index with a random.
This way, we're certain that all threads are consulted in turn and at
the same time we're not forced to use the ones we're giving a chance.

This significantly increases the traffic rate. Now h2load shows faster
completion and the average request rates on H2 and the TLS resume rate
increases by a bit more than 5% compared to pure p2c.

The index was placed into the struct bind_conf because 1) it's faster
there and it's the best place to optimally distribute traffic among a
group of listeners. It's the only runtime-modified element there and
it will be quite cache-hot.
2019-03-07 13:48:04 +01:00
Frédéric Lécaille
756d97f205 MINOR: sample: Rework gRPC converter code.
For now on, "ungrpc" may take a second optional argument to provide
the protocol buffers types used to encode the field value to be extracted.
When absent the field value is extracted as a binary sample which may then
followed by others converters like "hex" which takes binary as input sample.
When this second argument is a type which does not match the one found by "ungrpc",
this field is considered as not found even if present.

With this patch we also remove the useless "varint" and "svarint" converters.

Update the documentation about "ungrpc" converters.
2019-03-05 11:04:23 +01:00
Frédéric Lécaille
7c93e88d0c MINOR: sample: Code factorization "ungrpc" converter.
Parsing protocol buffer fields always consists in skip the field
if the field is not found or store the field value if found.
So, with this patch we factorize a little bit the code for "ungrpc" converter.
2019-03-05 11:03:53 +01:00
Willy Tarreau
bf6964007a MINOR: global: keep a copy of the initial rlim_fd_cur and rlim_fd_max values
Let's keep a copy of these initial values. They will be useful to
compute automatic maxconn, as well as to restore proper limits when
doing an execve() on external checks.
2019-03-01 10:40:30 +01:00
Frédéric Lécaille
645635da84 MINOR: peers: Add a message for heartbeat.
This patch implements peer heartbeat feature to prevent any haproxy peer
from reconnecting too often, consuming sockets for nothing.

To do so, we add PEER_MSG_CTRL_HEARTBEAT new message to PEER_MSG_CLASS_CONTROL peers
control class of messages. A ->heartbeat field is added to peer structs
to store the heatbeat timeout value which is handled by the same function as for ->reconnect
to control the session timeouts. A 2-bytes heartbeat message is sent every 3s when
no updates have to be sent. This way, the peer which receives such a message is sure
the remote peer is still alive. So, it resets the ->reconnect peer session
timeout to its initial value (5s). This prevents any reconnection to an
already connected alive peer.
2019-03-01 09:33:26 +01:00
Willy Tarreau
7ac908bf8c MINOR: config: add global tune.listener.multi-queue setting
tune.listener.multi-queue { on | off }
  Enables ('on') or disables ('off') the listener's multi-queue accept which
  spreads the incoming traffic to all threads a "bind" line is allowed to run
  on instead of taking them for itself. This provides a smoother traffic
  distribution and scales much better, especially in environments where threads
  may be unevenly loaded due to external activity (network interrupts colliding
  with one thread for example). This option is enabled by default, but it may
  be forcefully disabled for troubleshooting or for situations where it is
  estimated that the operating system already provides a good enough
  distribution and connections are extremely short-lived.
2019-02-27 14:27:07 +01:00
Willy Tarreau
8a03408d81 MINOR: activity: add accept queue counters for pushed and overflows
It's important to monitor the accept queues to know if some incoming
connections had to be handled by their originating thread due to an
overflow. It's also important to be able to confirm thread fairness.
This patch adds "accq_pushed" to activity reporting, which reports
the number of connections that were successfully pushed into each
thread's queue, and "accq_full", which indicates the number of
connections that couldn't be pushed because the thread's queue was
full.
2019-02-27 14:27:07 +01:00
Willy Tarreau
1efafce61f MINOR: listener: implement multi-queue accept for threads
There is one point where we can migrate a connection to another thread
without taking risk, it's when we accept it : the new FD is not yet in
the fd cache and no task was created yet. It's still possible to assign
it a different thread than the one which accepted the connection. The
only requirement for this is to have one accept queue per thread and
their respective processing tasks that have to be woken up each time
an entry is added to the queue.

This is a multiple-producer, single-consumer model. Entries are added
at the queue's tail and the processing task is woken up. The consumer
picks entries at the head and processes them in order. The accept queue
contains the fd, the source address, and the listener. Each entry of
the accept queue was rounded up to 64 bytes (one cache line) to avoid
cache aliasing because tests have shown that otherwise performance
suffers a lot (5%). A test has shown that it's important to have at
least 256 entries for the rings, as at 128 it's still possible to fill
them often at high loads on small thread counts.

The processing task does almost nothing except calling the listener's
accept() function and updating the global session and SSL rate counters
just like listener_accept() does on synchronous calls.

At this point the accept queue is implemented but not used.
2019-02-27 14:27:07 +01:00
Willy Tarreau
b2b50a7784 MINOR: listener: pre-compute some thread counts per bind_conf
In order to quickly pick a thread ID when accepting a connection, we'll
need to know certain pre-computed values derived from the thread mask,
which are counts of bits per position multiples of 1, 2, 4, 8, 16 and
32. In practice it is sufficient to compute only the 4 first ones and
store them in the bind_conf. We update the count every time the
bind_thread value is adjusted.

The fields in the bind_conf struct have been moved around a little bit
to make it easier to group all thread bit values into the same cache
line.

The function used to return a thread number is bind_map_thread_id(),
and it maps a number between 0 and 31/63 to a thread ID between 0 and
31/63, starting from the left.
2019-02-27 14:27:07 +01:00
Willy Tarreau
9e85318417 MINOR: listener: maintain a per-thread count of the number of connections on a listener
Having this information will help us improve thread-level distribution
of incoming traffic.
2019-02-27 14:27:07 +01:00
Willy Tarreau
a36b324777 MEDIUM: listener: keep a single thread-mask and warn on "process" misuse
Now that nbproc and nbthread are exclusive, we can still provide more
detailed explanations about what we've found in the config when a bind
line appears on multiple threads and processes at the same time, then
ignore the setting.

This patch reduces the listener's thread mask to a single mask instead
of an array of masks per process. Now we have only one thread mask and
one process mask per bind-conf. This removes ~504 bytes of RAM per
bind-conf and will simplify handling of thread masks.

If a "bind" line only refers to process numbers not found by its parent
frontend or not covered by the global nbproc directive, or to a thread
not covered by the global nbthread directive, a warning is emitted saying
what will be used instead.
2019-02-27 14:27:07 +01:00
Olivier Houchard
9ea5d361ae MEDIUM: servers: Reorganize the way idle connections are cleaned.
Instead of having one task per thread and per server that does clean the
idling connections, have only one global task for every servers.
That tasks parses all the servers that currently have idling connections,
and remove half of them, to put them in a per-thread list of connections
to kill. For each thread that does have connections to kill, wake a task
to do so, so that the cleaning will be done in the context of said thread.
2019-02-26 18:17:32 +01:00
Frédéric Lécaille
3a463c92cf MINOR: arg: Add support for ARGT_PBUF_FNUM arg type.
This new argument type is used to parse Protocol Buffers field number
with dotted notation (e.g: 1.2.3.4).
2019-02-26 16:27:05 +01:00
Olivier Houchard
f131481a0a BUG/MEDIUM: servers: Add a per-thread counter of idle connections.
Add a per-thread counter of idling connections, and use it to determine
how many connections we should kill after the timeout, instead of using
the global counter, or we're likely to just kill most of the connections.

This should be backported to 1.9.
2019-02-21 19:07:45 +01:00
Willy Tarreau
ff9c9140f4 MINOR: config: make MAX_PROCS configurable at build time
For some embedded systems, it's pointless to have 32- or even 64- large
arrays of processes when it's known that much fewer processes will be
used in the worst case. Let's introduce this MAX_PROCS define which
contains the highest number of processes allowed to run at once. It
still defaults to LONGBITS but may be lowered.
2019-02-07 15:10:19 +01:00
Willy Tarreau
2415727a00 MINOR: global: add proc_mask() and thread_mask()
These two functions return either all_{proc,threads}_mask, or the argument.
This is used to default to all_proc_mask or all_threads_mask when not set
on bind_conf or proxies.
2019-02-04 05:09:15 +01:00
Willy Tarreau
a38a7175b1 MINOR: config: keep an all_proc_mask like we have all_threads_mask
This simplifies some mask comparisons at various places where
nbits(global.nbproc) was used.
2019-02-04 05:09:15 +01:00
Willy Tarreau
4ed84c96cf OPTIM: listener: optimize cache-line packing for struct listener
Some unused fields were placed early and some important ones were on
the second cache line. Let's move the proto_list and name closer to
the end of the structure to bring accept() and default_target() into
the first cache line.
2019-02-04 05:09:14 +01:00