This function returns true is some notifications are registered.
This function is usefull for the following patch
BUG/MEDIUM: lua/socket: Sheduling error on write: may dead-lock
It should be backported in 1.6, 1.7 and 1.8
Don't forget to increase tasks_run_queue when we're adding a task to the
tasklet list, and to decrease it when we remove a task from a runqueue,
or its value won't be accurate, and could lead to tasks not being executed
when put in the global run queue.
1.9-dev only, no backport is needed.
This patch adds a warning if an http-(request|reponse) (add|set)-header
rewrite fails to change the respective header in a request or response.
This usually happens when tune.maxrewrite is not sufficient to hold all
the headers that should be added.
There's no real reason to have a specific scheduler for applets anymore, so
nuke it and just use tasks. This comes with some benefits, the first one
being that applets cannot induce high latencies anymore since they share
nice values with other tasks. Later it will be possible to configure the
applets' nice value. The second benefit is that the applet scheduler was
not very thread-friendly, having a big lock around it in prevision of this
change. Thus applet-intensive workloads should now scale much better with
threads.
Some more improvement is possible now : some applets also use a task to
handle timers and timeouts. These ones could now be simplified to use only
one task.
Introduce tasklets, lightweight tasks. They have no notion of priority,
they are just run as soon as possible, and will probably be used for I/O
later.
For the moment they're used to replace the temporary thread-local list
that was used in the scheduler. The first part of the struct is common
with tasks so that tasks can be cast to tasklets and queued in this list.
Once a task is in the tasklet list, it has its leaf_p set to 0x1 so that
it cannot accidently be confused as not in the queue.
Pure tasklets are identifiable by their nice value of -32768 (which is
normally not possible).
A lot of tasks are run on one thread only, so instead of having them all
in the global runqueue, create a per-thread runqueue which doesn't require
any locking, and add all tasks belonging to only one thread to the
corresponding runqueue.
The global runqueue is still used for non-local tasks, and is visited
by each thread when checking its own runqueue. The nice parameter is
thus used both in the global runqueue and in the local ones. The rare
tasks that are bound to multiple threads will have their nice value
used twice (once for the global queue, once for the thread-local one).
In preparation for thread-specific runqueues, change the task API so that
the callback takes 3 arguments, the task itself, the context, and the state,
those were retrieved from the task before. This will allow these elements to
change atomically in the scheduler while the application uses the copied
value, and even to have NULL tasks later.
A few users reported that building without threads was accidently broken
after commit 6b96f72 ("BUG/MEDIUM: pollers: Use a global list for fd
shared between threads.") due to all_threads_mask not being defined.
It's OK to set it to zero as other code parts do when threads are
enabled but only one thread is used.
This needs to be backported to 1.8.
The function hlua_ctx_resume return less text message and more error
code. These error code allow the caller to return appropriate
message to the user.
The polled_mask is only used in the pollers, and removing it from the
struct fdtab makes it fit in one 64B cacheline again, on a 64bits machine,
so make it a separate array.
With the old model, any fd shared by multiple threads, such as listeners
or dns sockets, would only be updated on one threads, so that could lead
to missed event, or spurious wakeups.
To avoid this, add a global list for fd that are shared, using the same
implementation as the fd cache, and only remove entries from this list
when every thread as updated its poller.
[wt: this will need to be backported to 1.8 but differently so this patch
must not be backported as-is]
Modify fd_add_to_fd_list() and fd_rm_from_fd_list() so that they take an
offset in the fdtab to the list entry, instead of hardcoding the fd cache,
so we can use them with other lists.
While running a task, we may try to delete and free a task that is about to
be run, because it's part of the local tasks list, or because rq_next points
to it.
So flag any task that is in the local tasks list to be deleted, instead of
run, by setting t->process to NULL, and re-make rq_next a global,
thread-local variable, that is modified if we attempt to delete that task.
Many thanks to PiBa-NL for reporting this and analysing the problem.
This should be backported to 1.8.
For large farms where servers are regularly added or removed, picking
a random server from the pool can ensure faster load transitions than
when using round-robin and less traffic surges on the newly added
servers than when using leastconn.
This commit introduces "balance random". It internally uses a random as
the key to the consistent hashing mechanism, thus all features available
in consistent hashing such as weights and bounded load via hash-balance-
factor are usable. It is extremely convenient because one common concern
when using random is what happens when a server is hammered a bit too
much. Here that can trivially be avoided, like in the configuration below :
backend bk0
balance random
hash-balance-factor 110
server-template s 1-100 127.0.0.1:8000 check inter 1s
Note that while "balance random" internally relies on a hash algorithm,
it holds the same properties as round-robin and as such is compatible with
reusing an existing server connection with "option prefer-last-server".
In order to use arbitrary data in the CLI (multiple lines or group of words
that must be considered as a whole, for example), it is now possible to add a
payload to the commands. To do so, the first line needs to end with a special
pattern: <<\n. Everything that follows will be left untouched by the CLI parser
and will be passed to the commands parsers.
Per-command support will need to be added to take advantage of this
feature.
Signed-off-by: Aurlien Nephtali <aurelien.nephtali@corp.ovh.com>
We'll need this in order to support uploading chunks. The h2 to h1
converter checks for the presence of the content-length header field
as well as the CONNECT method and returns these information to the
caller. The caller indicates whether or not a body is detected for
the message (presence of END_STREAM or not). No transfer-encoding
header is emitted yet.
In some cases, we call cs_destroy() very early, so early the connection
doesn't yet have a mux, so we can't call mux->detach(). In this case,
just destroy the associated connection.
This should be backported to 1.8.
With gcc < 4.7, when HAProxy is built with threads, the macros
HA_ATOMIC_CAS/XCHG/STORE relies on the legacy __sync builtins. These macros
are slightly complicated than the versions relying on the '_atomic'
builtins. Internally, some local variables are defined, prefixed with '__' to
avoid name clashes with the caller.
On the other hand, the macros HA_ATOMIC_UPDATE_MIN/MAX call HA_ATOMIC_CAS. Some
local variables are also definied in these macros, following the same naming
rule as below. The problem is that '__new' variable is used in
HA_ATOMIC_MIN/_MAX and in HA_ATOMIC_CAS. Obviously, the behaviour is undefined
because '__new' in HA_ATOMIC_CAS is left uninitialized. Unfortunatly gcc fails
to detect this error.
To fix the problem, all internal variables to macros are now suffixed with name
of the macros to avoid clashes (for instance, '__new_cas' in HA_ATOMIC_CAS).
This patch must be backported in 1.8.
In addition to metrics about time spent in the SPOE, following counters have
been added:
* applets : number of SPOE applets.
* idles : number of idle applets.
* nb_sending : number of streams waiting to send data.
* nb_waiting : number of streams waiting for a ack.
* nb_processed : number of events/groups processed by the SPOE (from the
stream point of view).
* nb_errors : number of errors during the processing (from the stream point of
view).
Log messages has been updated to report these counters. Following pattern has
been added at the end of the log message:
... <idles>/<applets> <nb_sending>/<nb_waiting> <nb_error>/<nb_processed>
Now it is possible to configure a logger in a spoe-agent section using a "log"
line, as for a proxy. "no log", "log global" and "log <address> ..." syntaxes
are supported.
With "log global" line, the global list of loggers are copied into the proxy's
struct. The list coming from the default section is also copied when a frontend
or a backend section is parsed. So it is possible to have duplicate entries in
the proxy's list. For instance, with this following config, all messages will be
logged twice:
global
log 127.0.0.1 local0 debug
daemon
defaults
mode http
log global
option httplog
frontend front-http
log global
bind *:8888
default_backend back-http
backend back-http
server www 127.0.0.1:8000
Now, the function parse_logsrv should be used to parse a "log" line. This
function will update the list of loggers passed in argument. It can release all
log servers when "no log" line was parsed (by the caller) or it can parse "log
global" or "log <address> ... " lines. It takes care of checking the caller
context (global or not) to prohibit "log global" usage in the global section.
"set-process-time" and "set-total-time" options have been added to store
processing times in the transaction scope, at each event and group processing,
the current one and the total one. So it is possible to get them.
TODO: documentation
Following metrics are added for each event or group of messages processed in the
SPOE:
* processing time: the delay to process the event or the group. From the
stream point of view, it is the latency added by the SPOE
processing.
* request time : It is the encoding time. It includes ACLs processing, if
any. For fragmented frames, it is the sum of all fragments.
* queue time : the delay before the request gets out the sending queue. For
fragmented frames, it is the sum of all fragments.
* waiting time: the delay before the reponse is received. No fragmentation
supported here.
* response time: the delay to process the response. No fragmentation supported
here.
* total time: (unused for now). It is the sum of all events or groups
processed by the SPOE for a specific threads.
Log messages has been updated. Before, only errors was logged (status_code !=
0). Now every processing is logged, following this format:
SPOE: [AGENT] <TYPE:NAME> sid=STREAM-ID st=STATUC-CODE reqT/qT/wT/resT/pT
where:
AGENT is the agent name
TYPE is EVENT of GROUP
NAME is the event or the group name
STREAM-ID is an integer, the unique id of the stream
STATUS_CODE is the processing's status code
reqT/qT/wT/resT/pT are delays descrive above
For all these delays, -1 means the processing was interrupted before the end. So
-1 for the queue time means the request was never dequeued. For fragmented
frames it is harder to know when the interruption happened.
For now, messages are logged using the same logger than the backend of the
stream which initiated the request.
Clearing the update_mask bit in fd_insert may lead to duplicate insertion
of fd in fd_updt, that could lead to a write past the end of the array.
Instead, make sure the update_mask bit is cleared by the pollers no matter
what.
This should be backported to 1.8.
[wt: warning: 1.8 doesn't have the lockless fdcache changes and will
require some careful changes in the pollers]
This function will be called from the CLI's "show fd" command to append some
extra mux-specific information that only the mux handler can decode. This is
supposed to help collect various hints about what is happening when facing
certain anomalies.
This patch add option crc32c (PP2_TYPE_CRC32C) to proxy protocol v2.
It compute the checksum of proxy protocol v2 header as describe in
"doc/proxy-protocol.txt".
Commit 4815c8c ("MAJOR: fd/threads: Make the fdcache mostly lockless.")
made the fd cache lockless, but after a few iterations, a subtle part was
lost, consisting in setting the bit on the fd_cache_mask immediately when
adding an event. Now it was done only when the cache started to process
events, but the problem it causes is that fd_cache_mask isn't reliable
anymore as an indicator of presence of events to be processed with no
delay outside of fd_process_cached_events(). This results in some spurious
delays when processing inter-thread wakeups between tasks. Just restoring
the flag when the event is added is enough to fix the problem.
Kudos to Christopher for spotting this one!
No backport is needed as this is only in the development version.
The management of the servers and the proxies queues was not thread-safe at
all. First, the accesses to <strm>->pend_pos were not protected. So it was
possible to release it on a thread (for instance because the stream is released)
and to use it in same time on another one (because we redispatch pending
connections for a server). Then, the accesses to stream's information (flags and
target) from anywhere is forbidden. To be safe, The stream's state must always
be updated in the context of process_stream.
So to fix these issues, the queue module has been refactored. A lock has been
added in the pendconn structure. And now, when we try to dequeue a pending
connection, we start by unlinking it from the server/proxy queue and we wake up
the stream. Then, it is the stream reponsibility to really dequeue it (or
release it). This way, we are sure that only the stream can create and release
its <pend_pos> field.
However, be careful. This new implementation should be thread-safe
(hopefully...). But it is not optimal and in some situations, it could be really
slower in multi-threaded mode than in single-threaded one. The problem is that,
when we try to dequeue pending connections, we process it from the older one to
the newer one independently to the thread's affinity. So we need to wait the
other threads' wakeup to really process them. If threads are blocked in the
poller, this will add a significant latency. This problem happens when maxconn
values are very low.
This patch must be backported in 1.8.
When a listener is temporarily disabled, we start by locking it and then we call
.pause callback of the underlying protocol (tcp/unix). For TCP listeners, this
is not a problem. But listeners bound on an unix socket are in fact closed
instead. So .pause callback relies on unbind_listener function to do its job.
Unfortunatly, unbind_listener hold the listener's lock and then call an internal
function to unbind it. So, there is a deadlock here. This happens during a
reload. To fix the problemn, the function do_unbind_listener, which is lockless,
is now exported and is called when a listener bound on an unix socket is
temporarily disabled.
This patch must be backported in 1.8.
This patch implement proxy protocol v2 options related to crypto information:
ssl-cipher (PP2_SUBTYPE_SSL_CIPHER), cert-sig (PP2_SUBTYPE_SSL_SIG_ALG) and
cert-key (PP2_SUBTYPE_SSL_KEY_ALG).
ssl_sock_get_pkey_algo can be used to report pkey algorithm to log
and ppv2 (RSA2048, EC256,...).
Extract pkey information is not free in ssl api (lock/alloc/free):
haproxy can use the pkey information computed in load_certificate.
Store and use this information in a SSL ex_data when available,
compute it if not (SSL multicert bundled and generated cert).
Private key information is used in switchctx to implement native multicert
selection (ecdsa/rsa/anonymous). This patch extract and store full pkey
information: dsa type and pkey size in bits. This can be used for switchctx
or to report pkey informations in ppv2 and log.
When the block of data need to be split to support the wrapping, the start of
the second block of data was wrong. We must be sure to skup data copied during
the first memcpy.
This patch must be backported to 1.8.
When the block of data need to be split to support the wrapping, the start of
the second block of data was wrong. We must be sure to skip data copied during
the first memcpy.
This patch must be backported to 1.8, 1.7, 1.6 and 1.5.
Since we use padding before the allocated page, it's trivial to place
the allocated address there and see if it gets mangled once we release
it.
This may be backported to stable releases already using DEBUG_UAF.
Commit 158fa75 ("MINOR: pools: implement DEBUG_UAF to detect use after free")
implemented pool use-after-free detection, but the mmap() return value isn't
properly checked, preventing the call to pool_alloc_area() from returning
NULL. So on out-of-memory a mangled pointer is returned, causing a crash on
the pool_alloc() site instead of forcing a GC. It doesn't affect regular
operations however, just complicates complex bug investigations.
This fix should be backported to 1.8 and to 1.7.
Since commit cf975d4 ("MINOR: pools/threads: Implement lockless memory
pools."), we support lockless pools. However the parts dedicated to
detecting use-after-free are not present in this part, making DEBUG_UAF
useless in this situation.
The present patch sets a new define CONFIG_HAP_LOCKLESS_POOLS when such
a compatible architecture is detected, and when pool debugging is not
requested, then makes use of this everywhere in pools and buffers
functions. This way enabling DEBUG_UAF will automatically disable the
lockless version.
No backport is needed as this is purely 1.9-dev.
This removes the end label from memory.h.
The labels are unused as of cf975d46bc
which is unreleased (and incidentally the first commit containing
those labels, thus they never have been used).