This reverts previous commit 347bbf79d20e1cff57075a8a378355dfac2475e2i.
The original code was correct. This patch resulted from a mistaken analysis
and breaks the scheduler:
########################## Starting vtest ##########################
Testing with haproxy version: 2.2-dev11-90b7d9-23
# top TEST reg-tests/lua/close_wait_lf.vtc TIMED OUT (kill -9)
# top TEST reg-tests/lua/close_wait_lf.vtc FAILED (10.008) signal=9
1 tests failed, 0 tests skipped, 88 tests passed
Program terminated with signal SIGABRT, Aborted.
[Current thread is 1 (Thread 0x7fb0dac2c700 (LWP 11292))]
(gdb) bt
#0 0x00007fb0e7c143f8 in raise () from /lib64/libc.so.6
#1 0x00007fb0e7c15ffa in abort () from /lib64/libc.so.6
#2 0x000000000053f5d6 in ha_panic () at src/debug.c:269
#3 0x00000000005a6248 in wdt_handler (sig=14, si=<optimized out>, arg=<optimized out>) at src/wdt.c:119
#4 <signal handler called>
#5 0x00000000004fbccd in tasklet_wakeup (tl=0x1b5abc0) at include/haproxy/task.h:351
#6 listener_accept (fd=<optimized out>) at src/listener.c:999
#7 0x00000000004262df in fd_update_events (evts=<optimized out>, fd=6) at include/haproxy/fd.h:418
#8 _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:251
#9 0x0000000000548d0f in run_poll_loop () at src/haproxy.c:2949
#10 0x000000000054908b in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3067
#11 0x00007fb0e902b684 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fb0e7ce5eed in clone () from /lib64/libc.so.6
(gdb) up
#5 0x00000000004fbccd in tasklet_wakeup (tl=0x1b5abc0) at include/haproxy/task.h:351
351 if (MT_LIST_ADDQ(&task_per_thread[tl->tid].shared_tasklet_list, (struct mt_list *)&tl->list) == 1) {
If the commit above is ever backported, this one must be as well!
In MT_LIST_ADDQ() and MT_LIST_ADD() we can't just check if the element is
already in a list, because there's a small race condition, it could be added
between the time we checked, and the time we actually set its next and prev.
So we have to lock it first.
This should be backported to 2.1.
The max_used_conns value is used as an estimate of the needed number of
connections on a server to know how many to keep open. But this one is
not reported, making it hard to troubleshoot reuse issues. Let's export
it in the sessions/current column.
Starting with commit 079cb9a ("MEDIUM: connections: Revamp the way idle
connections are killed") we started to improve the way to compute the
need for idle connections. But the condition to keep a connection idle
or drop it when releasing it was not updated. This often results in
storms of close when certain thresholds are met, and long series of
takeover() when there aren't enough connections left for a thread on
a server.
This patch tries to improve the situation this way:
- it keeps an estimate of the number of connections needed for a server.
This estimate is a copy of the max over previous purge period, or is a
max of what is seen over current period; it differs from max_used_conns
in that this one is a counter that's reset on each purge period ;
- when releasing, if the number of current idle+used connections is
lower than this last estimate, then we'll keep the connection;
- when releasing, if the current thread's idle conns head is empty,
and we don't exceed the estimate by the number of threads, then
we'll keep the connection.
- when cleaning up connections, we consider the max of the last two
periods to avoid killing too many idle conns when facing bursty
traffic.
Thanks to this we can better converge towards a situation where, provided
there are enough FDs, each active server keeps at least one idle connection
per thread all the time, with a total number close to what was needed over
the previous measurement period (as defined by pool-purge-delay).
On tests with large numbers of concurrent connections (30k) and many
servers (200), this has quite smoothed the CPU usage pattern, increased
the reuse rate and roughly halved the takeover rate.
The FD takeover operation might have certain impacts explaining
unexpected activities, so it's important to report such a counter
there. We thus count the number of times a thread has stolen an
FD from another thread.
The servers have internal states describing the status of idle connections,
unfortunately these were not exported in the stats. This patch adds the 3
following gauges:
- idle_conn_cur : Current number of unsafe idle connections
- safe_conn_cur : Current number of safe idle connections
- used_conn_cur : Current number of connections in use
The LRU cache head was an array of list, which causes false sharing
between 4 to 8 threads in the same cache line. Let's move it to the
thread_info structure instead. There's no need to do the same for the
pool_cache[] array since it's already quite large (32 pointers each).
By doing this the request rate increased by 1% on a 16-thread machine.
pool-t.h was mistakenly including the full-blown includes for threads,
lists and api instead of the types, and as such, CONFIG_HAP_LOCAL_POOLS
and CONFIG_HAP_LOCKLESS_POOLS were not visible everywhere.
The thread_info struct is convenient to store various per-thread info
without having to resort to a painful thread_local storage which is
slow and painful to initialize.
The problem is, by having this one in thread.h it's very difficult to
add more entries there because everyone already includes thread.h so
conversely thread.h cannot reference certain types.
There's no point in having this there, instead let's create a new pair
of files, tinfo{,-t}.h, which declare the structure. This way it will
become possible to extend them with other includes and have certain
files store their own types there.
We used to have 3 thread-based arrays for toremove_lock, idle_cleanup,
and toremove_connections. The problem is that these items are small,
and that this creates false sharing between threads since it's possible
to pack up to 8-16 of these values into a single cache line. This can
cause real damage where there is contention on the lock.
This patch creates a new array of struct "idle_conns" that is aligned
on a cache line and which contains all three members above. This way
each thread has access to its variables without hindering the other
ones. Just doing this increased the HTTP/1 request rate by 5% on a
16-thread machine.
The definition was moved to connection.{c,h} since it appeared a more
natural evolution of the ongoing changes given that there was already
one of them declared in connection.h previously.
It looked strange to see pool_evict_from_cache() always very present
on "perf top", but there was actually a reason to this: while b_free()
uses pool_free() which properly disposes the buffer into the local cache
and b_alloc_fast() allocates using pool_get_first() which considers the
local cache, b_alloc_margin() does not consider the local cache as it
only uses __pool_get_first() which only allocates from the shared pools.
The impact is that basically everywhere a buffer is allocated (muxes,
streams, applets), it's always picked from the shared pool (hence
involves locking) and is released to the local one and makes it grow
until it's required to trigger a flush using pool_evict_from_cache().
Buffers usage are thus not thread-local at all, and cause eviction of
a lot of possibly useful objects from the local caches.
Just fixing this results in a 10% request rate increase in an HTTP/1 test
on a 16-thread machine.
This bug was caused by recent commit ed891fd ("MEDIUM: memory: make local
pools independent on lockless pools") merged into 2.2-dev9, so not backport
is needed.
BoringSSL does not have X509_get_X509_PUBKEY
let our emulation level define that for BoringSSL as well
Build log:
src/ssl_sample.o: In function `smp_fetch_ssl_x_key_alg':
/home/travis/build/haproxy/haproxy/src/ssl_sample.c:592: undefined reference to `X509_get_X509_PUBKEY'
clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
Makefile:860: recipe for target 'haproxy' failed
make: *** [haproxy] Error 1
travis-ci: https://travis-ci.com/github/haproxy/haproxy/jobs/351670996
With the rework of the config line parser, we've started to emit a dump
of the initial line underlined by a caret character indicating the error
location. But with extremely large lines it starts to take time and can
even cause trouble to slow terminals (e.g. over ssh), and this becomes
useless. In addition, control characters could be dumped as-is which is
bad, especially when the input file is accidently wrong (an executable).
This patch adds a string sanitization function which isolates an area
around the error position in order to report only that area if the string
is too large. The limit was set to 80 characters, which will result in
roughly 40 chars around the error being reported only, prefixed and suffixed
with "..." as needed. In addition, non-printable characters in the line are
now replaced with '?' so as not to corrupt the terminal. This way invalid
variable names, unmatched quotes etc will be easier to spot.
A typical output is now:
[ALERT] 176/092336 (23852) : parsing [bad.cfg:8]: forbidden first char in environment variable name at position 811957:
...c$PATH$PATH$d(xlc`%?$PATH$PATH$dgc?T$%$P?AH?$PATH$PATH$d(?$PATH$PATH$dgc?%...
^
Now that all tasklet queues are scanned at once by run_tasks_from_lists(),
it becomes possible to always check for lower priority classes and jump
back to them when they exist.
This patch adds tune.sched.low-latency global setting to enable this
behavior. What it does is stick to the lowest ranked priority list in
which tasks are still present with an available budget, and leave the
loop to refill the tasklet lists if the trees got new tasks or if new
work arrived into the shared urgent queue.
Doing so allows to cut the latency in half when running with extremely
deep run queues (10k-100k), thus allowing forwarding of small and large
objects to coexist better. It remains off by default since it does have
a small impact on large traffic by default (shorter batches).
Now process_runnable_tasks is responsible for calculating the budgets
for each queue, dequeuing from the tree, and calling run_tasks_from_lists().
This latter one scans the queues, picking tasks there and respecting budgets.
Note that its name was updated with a plural "s" for this reason.
It is neither convenient nor scalable to check each and every tasklet
queue to figure whether it's empty or not while we often need to check
them all at once. This patch introduces a tasklet class mask which gets
a bit 1 set for each queue representing one class of service. A single
test on the mask allows to figure whether there's still some work to be
done. It will later be usable to better factor the runqueue code.
Bits are set when tasklets are queued. They're cleared when queues are
emptied. It is possible that a queue is empty but has a bit if a tasklet
was added then removed, but this is not a problem as this is properly
checked for in run_tasks_from_list().
It will be convenient to have the tasklet queue number soon, better make
current_queue an index rather than a pointer to the queue. When not currently
running (e.g. from I/O), the index is -1.
Add some functions to deinit the whole crtlist and ckch architecture.
It will free all crtlist, crtlist_entry, ckch_store, ckch_inst and their
associated SNI, ssl_conf and SSL_CTX.
The SSL_CTX in the default_ctx and initial_ctx still needs to be free'd
separately.
Since commit 2954c47 ("MEDIUM: ssl: allow crt-list caching"), the
ssl_bind_conf is allocated directly in the crt-list, and the crt-list
can be shared between several bind_conf. The deinit() code wasn't
changed to handle that.
This patch fixes the issue by removing the free of the ssl_conf in
ssl_sock_free_all_ctx().
It should be completed with a patch that free the ssl_conf and the
crt-list.
Fix issue #700.
A test on large objects revealed a big performance loss from 2.1. The
cause was found to be related to cache locality between scheduled
operations that are batched using tasklets. It happens that we now
have several layers of tasklets and that queuing all these operations
leaves time to let memory objects cool down in the CPU cache, effectively
resulting in halving the performance.
A quick test consisting in putting most unknown tasklets into the BULK
queue almost fixed the performance regression, but this is a wrong
approach as it can also slow down some low-latency transfers or access
to applets like the CLI.
What this patch does instead is to queue unknown tasklets into the same
queue as the current one when tasklet_wakeup() is itself called from a
task/tasklet, otherwise it uses urgent for real I/O (when sched->current
is NULL). This results in the called tasklet being woken up much sooner,
often at the end of the current batch of tasklets.
By doing so, a test on 2 cores 4 threads with 256 concurrent H1 conns
transferring 16m objects with 256kB buffers jumped from 55 to 88 Gbps.
It's even possible to go as high as 101 Gbps by evaluating the URGENT
queue after the BULK one, though this was not done as considered
dangerous for latency sensitive operations.
This reinforces the importance of getting back the CPU transfer
mechanisms based on tasklet_wakeup_after() to work at the tasklet level
by supporting an immediate wakeup in certain cases.
No backport is needed, this is strictly 2.2.
In task_per_thread[] we now have current_queue which is a pointer to
the current tasklet_list entry being evaluated. This will be used to
know the class under which the current task/tasklet is currently
running.
When DEBUG_FD is set at build time, we'll keep a counter of per-FD events
in the fdtab. This counter is reported in "show fd" even for closed FDs if
not zero. The purpose is to help spot situations where an apparently closed
FD continues to be reported in loops, or where some events are dismissed.
As reported in issue #419, a "clear map" operation on a very large map
can take a lot of time and freeze the entire process for several seconds.
This patch makes sure that pat_ref_prune() can regularly yield after
clearing some entries so that the rest of the process continues to work.
The first part, the removal of the patterns, can take quite some time
by itself in one run but it's still relatively fast. It may block for
up to 100ms for 16M IP addresses in a tree typically. This change needed
to declare an I/O handler for the clear operation so that we can get
back to it after yielding.
The second part can be much slower because it deconstructs the elements
and its users, but it iterates progressively so we can yield less often
here.
The patch was tested with traffic in parallel sollicitating the map being
released and showed no problem. Some traffic will definitely notice an
incomplete map but the filling is already not atomic anyway thus this is
not different.
It may be backported to stable versions once sufficiently tested for side
effects, at least as far as 2.0 in order to avoid the watchdog triggering
when the process is frozen there. For a better behaviour, all these
prune_* functions should support yielding so that the callers have a
chance to continue also yield in turn.
Some of the recent optimizations around the polling to save a few
epoll_ctl() calls have shown that they could also cause some trouble.
However, over time our code base has become totally asynchronous with
I/Os always attempted from the upper layers and only retried at the
bottom, making it look like we're getting closer to EPOLLET support.
There are showstoppers there such as the listeners which cannot support
this. But given that most of the epoll_ctl() dance comes from the
connections, we can try to enable edge-triggered polling on connections.
What this patch does is to add a new global tunable "tune.fd.edge-triggered",
that makes fd_insert() automatically set an et_possible bit on the fd if
the I/O callback is conn_fd_handler. When the epoll code sees an update
for such an FD, it immediately registers it in both directions the first
time and doesn't update it anymore.
On a few tests it proved quite useful with a 14% request rate increase in
a H2->H1 scenario, reducing the epoll_ctl() calls from 2 per request to
2 per connection.
The option is obviously disabled by default as bugs are still expected,
particularly around the subscribe() code where it is possible that some
layers do not always re-attempt reading data after being woken up.
localpeer <name>
Sets the local instance's peer name. It will be ignored if the "-L"
command line argument is specified or if used after "peers" section
definitions. In such cases, a warning message will be emitted during
the configuration parsing.
This option will also set the HAPROXY_LOCALPEER environment variable.
See also "-L" in the management guide and "peers" section in the
configuration manual.
This one was confusingly called, I thought it was the cumulated number
of streams but it's the number of calls to process_stream(). Let's make
this clearer.
We have poll_drop, poll_dead and poll_skip which are confusingly named
like their poll_io and poll_exp counterparts except that they are not
per poll() call but per-fd. This patch renames them to poll_drop_fd(),
poll_dead_fd() and poll_skip_fd() for this reason.
The "show activity" output mentions a number of indicators to explain
wake up reasons but doesn't have the number of times poll() sees some
I/O. And given that multiple events can happen simultaneously, it's
not always possible to deduce this metric by subtracting.
This patch adds a new "poll_io" counter that allows one to see how
often poll() returns with at least one active FD. This should help
detect stuck events and measure various ratios of poll sub-metrics.
fd_set_running() and fd_takeover() may both use a double-word CAS on the
(running_mask, thread_mask) couple and as such they expect the fields to
be exactly arranged like this. It's critical not to reorder them, so add
a comment to avoid such a potential mistake later.
Since 2.1-dev2, with commit 305d5ab46 ("MAJOR: fd: Get rid of the fd cache.")
we don't have the fd_lock anymore and as such its acitvity counter is always
zero. Let's remove it from the struct and from "show activity" output, as
there are already plenty of indicators to look at.
The cache line comment in the struct activity was updated to reflect
reality as it looks like another one already got removed in the past.
This macro is provided by clang but gcc lacks it. Not having it makes
it painful to test features on both compilers. Better define it to zero
when not available so that __has_feature(foo) never errors.
This function takes on input a string to tokenize, an output storage
(which may be the same) and a number of options indicating how to handle
certain characters (single & double quote support, backslash support,
end of line on '#', environment variables etc). On output it will provide
a list of pointers to individual words after having possibly unescaped
some character sequences, handled quotes and resolved environment
variables, and it will also indicate a status made of:
- a list of failures (overlap between src/dst, wrong quote etc)
- the pointer to the first sequence in error
- the required output length (a-la snprintf()).
This allows a caller to freely unescape/unquote a string by using a
pre-allocated temporary buffer and expand it as necessary. It takes
extreme care at avoiding expensive operations and intentionally does
not use memmove() when removing escapes, hence the reason for the
different input and output buffers. The goal is to use it as the basis
for the config parser.
As reported in issue #689, there is a subtle bug in the ebtree code used
to compared memory blocks. It stems from the platform-dependent memcmp()
implementation. Original implementations used to perform a byte-per-byte
comparison and to stop at the first non-matching byte, as in this old
example:
https://www.retro11.de/ouxr/211bsd/usr/src/lib/libc/compat-sys5/memcmp.c.html
The ebtree code has been relying on this to detect the first non-matching
byte when comparing keys. This is made so that a zero-terminated string
can fail to match against a longer string.
Over time, especially with large busses and SIMD instruction sets,
multi-byte comparisons have appeared, making the processor fetch bytes
past the first different byte, which could possibly be a trailing zero.
This means that it's possible to read past the allocated area for a
string if it was allocated by strdup().
This is not correct and definitely confuses address sanitizers. In real
life the problem doesn't have visible consequences. Indeed, multi-byte
comparisons are implemented so that aligned words are loaded (e.g. 512
bits at once to process a cache line at a time). So there is no way such
a multi-byte access will cross a page boundary and end up reading from
an unallocated zone. This is why it was never noticed before.
This patch addresses this by implementing a one-byte-at-a-time memcmp()
variant for ebtree, called eb_memcmp(). It's optimized for both small and
long strings and guarantees to stop after the first non-matching byte. It
only needs 5 instructions in the loop and was measured to be 3.2 times
faster than the glibc's AVX2-optimized memcmp() on short strings (1 to
257 bytes), since that latter one comes with a significant setup cost.
The break-even seems to be at 512 bytes where both version perform
equally, which is way longer than what's used in general here.
This fix should be backported to stable versions and reintegrated into
the ebtree code.
Commit 0a3b43d9c ("MINOR: haproxy: Make use of deinit_and_exit() for
clean exits") introduced this build warning:
src/haproxy.c: In function 'main':
src/haproxy.c:3775:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
This is because the new deinit_and_exit() is not marked as "noreturn"
so depending on the optimizations, the noreturn attribute of exit() will
either leak through it and silence the warning or not and confuse the
compiler. Let's just add the attribute to fix this.
No backport is needed, this is purely 2.2.
As reported in issue #686, ARM64 build fails since the include files
reorganization. This is caused by the lack of string.h while a memcpy()
is present in __ha_cas_dw().
clang just failed on fd.c with this error:
src/fd.c:491:9: error: logical not is only applied to the left hand side of this comparison [-Werror,-Wlogical-not-parentheses]
while (HA_SPIN_TRYLOCK(OTHER_LOCK, &log_lock) != 0) {
^ ~~
That's because this expands to this:
while (!pl_try_s(&log_lock) != 0) {
Let's just add parenthesis in the TRYLOCK macros to avoid this.
This may need to be backported if commit df187875d ("BUG/MEDIUM: log:
don't hold the log lock during writev() on a file descriptor") is
backported as well as it seems to be the first one to trigger it.
The set of files proto_udp.{c,h} were misleadingly named, as they do not
provide anything related to the UDP protocol but to datagram handling
instead, since currently all UDP processing is hard-coded where it's used
(dns, logs). They are to UDP what connection.{c,h} are to proto_tcp. This
was causing confusion about how to insert UDP socket management code,
so let's rename them right now to dgram.{c,h} which more accurately
matches what's inside since every function and type is already prefixed
with "dgram_".
There are list definitions everywhere in the code, let's drop the need
for including list-t.h to declare them. The rest of the list manipulation
is huge however and not needed everywhere so using the list walking macros
still requires to include list.h.
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
Since these are used as type attributes or conditional clauses, they
are used about everywhere and should not require a dependency on
thread.h. Moving them to compiler.h along with other similar statements
like ALIGN() etc looks more logical; this way they become part of the
base API. This allowed to remove thread-t.h from ~12 files, one was
found to only require thread-t and not thread and dict.c was found to
require thread.h.
That's already where MAX_PROCS is set, and we already handle the case of
the default value so there is no reason for placing it in thread.h given
that most call places don't need the rest of the threads definitions. The
include was removed from global-t.h and activity.c.
Sometimes we need to align a struct member or a struct's size only when
threads are enabled. This is the case on fdtab for example. Instead of
using ugly ifdefs in the code itself, let's have a THREAD_ALIGNED() macro
performing the alignment only when threads are enabled. For now this was
only applied to fd-t.h as it was the only place found.
Most of the files dealing with error reports have to include log.h in order
to access ha_alert(), ha_warning() etc. But while these functions don't
depend on anything, log.h depends on a lot of stuff because it deals with
log-formats and samples. As a result it's impossible not to embark long
dependencies when using ha_warning() or qfprintf().
This patch moves these low-level functions to errors.h, which already
defines the error codes used at the same places. About half of the users
of log.h could be adjusted, sometimes revealing other issues such as
missing tools.h. Interestingly the total preprocessed size shrunk by
4%.
The struct sample_data is used by pattern, map and vars, and currently
requires to include sample-t which comes with many other dependencies.
Let's move sample_data into its own file to shorten the dependency tree.
This revealed a number of issues in adjacent files which were hidden by
the fact that sample-t.h brought everything that was missing.
Directly including stddef.h in many files results in it being processed
multiple times while it can be centralized in api-t.h and be guarded
against multiple inclusions. Doing so reduces the number of preprocessed
lines by 1200!
Checks.c remains one of the largest file of the project and it contains
too many things. The tcpchecks code represents half of this file, and
both parts are relatively isolated, so let's move it away into its own
file. We now have tcpcheck.c, tcpcheck{,-t}.h.
Doing so required to export quite a number of functions because check.c
has almost everything made static, which really doesn't help to split!
check.c is one of the largest file and contains too many things. The
e-mail alerting code is stored there while nothing is in mailers.c.
Let's move this code out. That's only 4% of the code but a good start.
In order to do so, a few tcp-check functions had to be exported.
When building contrib/hpack there is a warning about an unused static
function. Actually it makes no sense to make it static, instead it must
be regularly exported. Similarly there is hpack_dht_get_tail() which is
inlined in the C file and which would make more sense with all other ones
in the H file.
There's no point splitting the file in two since only cfgparse uses the
types defined there. A few call places were updated and cleaned up. All
of them were in C files which register keywords.
There is nothing left in common/ now so this directory must not be used
anymore.
This one was not easy because it was embarking many includes with it,
which other files would automatically find. At least global.h, arg.h
and tools.h were identified. 93 total locations were identified, 8
additional includes had to be added.
In the rare files where it was possible to finalize the sorting of
includes by adjusting only one or two extra lines, it was done. But
all files would need to be rechecked and cleaned up now.
It was the last set of files in types/ and proto/ and these directories
must not be reused anymore.
extern struct dict server_name_dict was moved from the type file to the
main file. A handful of inlined functions were moved at the bottom of
the file. Call places were updated to use server-t.h when relevant, or
to simply drop the entry when not needed.
The files remained mostly unchanged since they were OK. However, half of
the users didn't need to include them, and about as many actually needed
to have it and used to find functions like srv_currently_usable() through
a long chain that broke when moving the file.
This one is particularly difficult to split because it provides all the
functions used to manipulate a proxy state and to retrieve names or IDs
for error reporting, and as such, it was included in 73 files (down to
68 after cleanup). It would deserve a small cleanup though the cut points
are not obvious at the moment given the number of structs involved in
the struct proxy itself.
The current state of the logging is a real mess. The main problem is
that almost all files include log.h just in order to have access to
the alert/warning functions like ha_alert() etc, and don't care about
logs. But log.h also deals with real logging as well as log-format and
depends on stream.h and various other things. As such it forces a few
heavy files like stream.h to be loaded early and to hide missing
dependencies depending where it's loaded. Among the missing ones is
syslog.h which was often automatically included resulting in no less
than 3 users missing it.
Among 76 users, only 5 could be removed, and probably 70 don't need the
full set of dependencies.
A good approach would consist in splitting that file in 3 parts:
- one for error output ("errors" ?).
- one for log_format processing
- and one for actual logging.
It was moved without any change, however many callers didn't need it at
all. This was a consequence of the split of proto_http.c into several
parts that resulted in many locations to still reference it.
Almost no change except moving the cli_kw struct definition after the
defines. Almost all users had both types&proto included, which is not
surprizing since this code is old and it used to be the norm a decade
ago. These places were cleaned.
Just some minor reordering, and the usual cleanup of call places for
those which didn't need it. We don't include the whole tools.h into
stats-t anymore but just tools-t.h.
The type file was slightly tidied. The cli-specific APPCTX_CLI_ST1_* flag
definitions were moved to cli.h. The type file was adjusted to include
buf-t.h and not the huge buf.h. A few call places were fixed because they
did not need this include.
Initially it looked like this could have been placed into auth.h or
stats.h but it's not the case as it's what makes the link between them
and the HTTP layer. However the file needed to be split in two. Quite
a number of call places were dropped because these were mostly leftovers
from the early days where the stats and cli were packed together.
The files were moved almost as-is, just dropping arg-t and auth-t from
acl-t but keeping arg-t in acl.h. It was useful to revisit the call places
since a handful of files used to continue to include acl.h while they did
not need it at all. Struct stream was only made a forward declaration
since not otherwise needed.
The stktable_types[] array declaration was moved to the main file as
it had nothing to do in the types. A few declarations were reordered
in the types file so that defines were before the structs. Thread-t
was added since there are a few __decl_thread(). The loss of peers.h
revealed that cfgparse-listen needed it.
The cfg_peers external declaration was moved to the main file instead
of the type one. A few types were still missing from the proto, causing
warnings in the functions prototypes (proxy, stick_table).
All includes that were not absolutely necessary were removed because
checks.h happens to very often be part of dependency loops. A warning
was added about this in check-t.h. The fields, enums and structs were
a bit tidied because it's particularly tedious to find anything there.
It would make sense to split this in two or more files (at least
extract tcp-checks).
The file was renamed to the singular because it was one of the rare
exceptions to have an "s" appended to its name compared to the struct
name.
The type file is becoming a mess, half of it is for the proxy protocol,
another good part describes conn_streams and mux ops, it would deserve
being split again. At least it was reordered so that elements are easier
to find, with the PP-stuff left at the end. The MAX_SEND_FD macro was moved
to compat.h as it's said to be the value for Linux.
The TASK_IS_TASKLET() macro was moved to the proto file instead of the
type one. The proto part was a bit reordered to remove a number of ugly
forward declaration of static inline functions. About a tens of C and H
files had their dependency dropped since they were not using anything
from task.h.
global.h was one of the messiest files, it has accumulated tons of
implicit dependencies and declares many globals that make almost all
other file include it. It managed to silence a dependency loop between
server.h and proxy.h by being well placed to pre-define the required
structs, forcing struct proxy and struct server to be forward-declared
in a significant number of files.
It was split in to, one which is the global struct definition and the
few macros and flags, and the rest containing the functions prototypes.
The UNIX_MAX_PATH definition was moved to compat.h.
There is no C file for this one, the code was placed into sample.c which
thus has a dependency on this file which itself includes sample.h. Probably
that it would be wise to split that later.
This one is particularly tricky to move because everyone uses it
and it depends on a lot of other types. For example it cannot include
arg-t.h and must absolutely only rely on forward declarations to avoid
dependency loops between vars -> sample_data -> arg. In order to address
this one, it would be nice to split the sample_data part out of sample.h.
There's no type file, it only contains fetch_rdp_cookie_name() and
val_payload_lv() which probably ought to move somewhere else instead
of staying there.
It was moved as-is, except for extern declaration of pattern_reference.
A few C files used to include it but didn't need it anymore after having
been split apart so this was cleaned.
One function prototype makes reference to struct mworker_proc which was
not defined there but in global.h instead. This definition, along with
the PROC_O_* fields were moved to mworker-t.h instead.
The file mostly contained struct definitions but there was also a
variable export. Most of the stuff currently lies in checks.h and
should definitely move here!
The STATS_DEFAULT_REALM and STATS_DEFAULT_URI were moved to defaults.h.
It was required to include types/pattern.h and types/sample.h since they
are mentioned in function prototypes.
It would be wise to merge this with uri_auth.h later.
List.h was missing for LIST_ADDQ(). A few unneeded includes of action.h
were removed from certain files.
This one still relies on applet.h and stick-table.h.
A few includes had to be added, namely list-t.h in the type file and
types/proxy.h in the proto file. actions.h was including http-htx.h
but didn't need it so it was dropped.
The sink files could be moved with almost no change at since they
didn't rely on anything fancy. ssize_t required sys/types.h and
thread.h was needed for the locks.
A few includes were missing in each file. A definition of
struct polled_mask was moved to fd-t.h. The MAX_POLLERS macro was
moved to defaults.h
Stdio used to be silently inherited from whatever path but it's needed
for list_pollers() which takes a FILE* and which can thus not be
forward-declared.
And also rename standard.c to tools.c. The original split between
tools.h and standard.h dates from version 1.3-dev and was mostly an
accident. This patch moves the files back to what they were expected
to be, and takes care of not changing anything else. However this
time tools.h was split between functions and types, because it contains
a small number of commonly used macros and structures (e.g. name_desc)
which in turn cause the massive list of includes of tools.h to conflict
with the callers.
They remain the ugliest files of the whole project and definitely need
to be cleaned and split apart. A few types are defined there only for
functions provided there, and some parts are even OS-specific and should
move somewhere else, such as the symbol resolution code.
The protocol.h files are pretty low in the dependency and (sadly) used
by some files from common/. Almost nothing was changed except lifting a
few comments.
The file was moved almost verbatim (only stdio.h was dropped as useless).
It was not split between types and functions because it's only included
from direct C code (fcgi.c and mux_fcgi.c) as well as fcgi_app.h, included
from the same ones, which should also be remerged as a single one.
The various hpack files are self-contained, but hpack-tbl was one of
those showing difficulties when pools were added because that began
to add quite some dependencies. Now when built in standalone mode,
it still uses the bare minimum pool definitions and doesn't require
to know the prototypes anymore when only the structures are needed.
Thus the files were moved verbatim except for hpack-tbl which was
split between types and prototypes.
Most of the file was a large set of HTX elements manipulation functions
and few types, so splitting them allowed to further reduce dependencies
and shrink the build time. Doing so revealed that a few files (h2.c,
mux_pt.c) needed haproxy/buf.h and were previously getting it through
htx.h. They were fixed.
The file was moved as-is. There was a wrong dependency on dynbuf.h
instead of buf.h which was addressed. There was no benefit to
splitting this between types and functions.
There's only one struct and 2 inline functions. It could have been
merged into http.h but that would have added a massive dependency on
the hpack parts for nothing, so better keep it this way since hpack
is already freestanding and portable.
So the enums and structs were placed into http-t.h and the functions
into http.h. This revealed that several files were dependeng on http.h
but not including it, as it was silently inherited via other files.
The type is the only element needed by applet.h and hlua.h, while hlua.c
needs the various functions. XREF_BUSY was placed into the types as well
since it's better to have the special values there.
Regex are essentially included for myregex_t but it turns out that
several of the C files didn't include it directly, relying on the
one included by their own .h. This has been cleanly addressed so
that only the type is included by H files which need it, and adding
the missing includes for the other ones.
The type was moved out as it's used by standard.h for netns_entry.
Instead of just being a forward declaration when not used, it's an
empty struct, which makes gdb happier (the resulting stripped executable
is the same).
The pretty confusing "buffer.h" was in fact not the place to look for
the definition of "struct buffer" but the one responsible for dynamic
buffer allocation. As such it defines the struct buffer_wait and the
few functions to allocate a buffer or wait for one.
This patch moves it renaming it to dynbuf.h. The type definition was
moved to its own file since it's included in a number of other structs.
Doing this cleanup revealed that a significant number of files used to
rely on this one to inherit struct buffer through it but didn't need
anything from this file at all.
This moves types/activity.h to haproxy/activity-t.h and
proto/activity.h to haproxy/activity.h.
The macros defining the bit field values for the profiling variable
were moved to the type file to be more future-proof.
Now the file is ready to be stored into its final destination. A few
minor reorderings were performed to keep the file properly organized,
making the various sections more visible (cache & lockless).
In addition and to stay consistent, memory.c was renamed to pool.c.
Till now the local pool caches were implemented only when lockless pools
were in use. This was mainly due to the difficulties to disentangle the
code parts. However the locked pools would further benefit from the local
cache, and having this would reduce the variants in the code.
This patch does just this. It adds a new debug macro DEBUG_NO_LOCAL_POOLS
to forcefully disable local pool caches, and makes sure that the high
level functions are now strictly the same between locked and lockless
(pool_alloc(), pool_alloc_dirty(), pool_free(), pool_get_first()). The
pool index calculation was moved inside the CONFIG_HAP_LOCAL_POOLS guards.
This allowed to move them out of the giant #ifdef and to significantly
reduce the code duplication.
A quick perf test shows that with locked pools the performance increases
by roughly 10% on 8 threads and gets closer to the lockless one.
pool_free() was not identical between locked and lockless pools. The
different was the call to __pool_free() in one case versus open-coded
accesses in the other, and the poisoning brought by commit da52035a45
("MINOR: memory: also poison the area on freeing") which unfortunately
did if only for the lockless path.
Let's now have __pool_free() to work on the global pool also in the
locked case so that the code is architected similarly.
Just as for the allocation path, the release path was not symmetrical.
It was not logical to have pool_put_to_cache() free the objects itself,
it was pool_free's job. In addition, just because of a variable export
issue, it the insertion of the object to free back into the local cache
couldn't be inlined while it was very cheap.
This patch just slightly reorganizes this code path by making pool_free()
decide whether or not to put the object back into the cache via
pool_put_to_cache() otherwise place it back to the global pool using
__pool_free(). Then pool_put_to_cache() adds the item to the local
cache and only calls pool_evict_from_cache() if the cache is too big.
When building with the local cache support, we have an asymmetry in
the allocation path which is that __pool_get_first() picks from the
cache while when no cache support is used, this one directly accesses
the shared area. It looks like it was done this way only to centralize
the call to __pool_get_from_cache() but this was not a good idea as it
complicates the splitting the code.
Let's move the cache access to the upper layer so thatt __pool_get_first()
remains agnostic to the cache support.
The call tree now looks like this with the cache enabled :
pool_get_first()
__pool_get_from_cache() // if cache enabled
__pool_get_first()
pool_alloc()
pool_alloc_dirty()
__pool_get_from_cache() // if cache enabled
__pool_get_first()
__pool_refill_alloc()
__pool_free()
pool_free_area()
pool_put_to_cache()
__pool_free()
__pool_put_to_cache()
pool_free()
pool_put_to_cache()
With cache disabled, the pool_free() path still differs:
pool_free()
__pool_free_area()
__pool_put_to_cache()
The memory.h file is particularly complex due to the combination of
debugging options. This patch extracts the OS-level interface and
places it into a new file: pool-os.h.
Doing this also moves pool_alloc_area() and pool_free_area() out of
the #ifndef CONFIG_HAP_LOCKLESS_POOLS, making them usable from
__pool_refill_alloc(), pool_free(), pool_flush() and pool_gc() instead
of having direct calls to malloc/free there that are hard to wrap for
debugging purposes.
This is the beginning of the move and cleanup of memory.h. This first
step only extracts type definitions and basic macros that are needed
by the files which reference a pool. They're moved to pool-t.h (since
"pool" is more obvious than "memory" when looking for pool-related
stuff). 3 files which didn't need to include the whole memory.h were
updated.
In memory.h we had to reimplement the swrate* functions just because of
a broken circular dependency around freq_ctr.h. Now that this one is
solved, let's get rid of this copy and use the original ones instead.
types/freq_ctr.h was moved to haproxy/freq_ctr-t.h and proto/freq_ctr.h
was moved to haproxy/freq_ctr.h. Files were updated accordingly, no other
change was applied.
Some of them were simply removed as unused (possibly some leftovers
from an older cleanup session), some were turned to haproxy/bitops.h
and a few had to be added (hlua.c and stick-table.h need standard.h
for parse_time_err; htx.h requires chunk.h but used to get it through
standard.h).
There are quite a number of integer manipulation functions defined in
standard.h, which is one of the reasons why standard.h is included from
many places and participates to the dependencies loop.
Let's just have a new file, intops.h to place all these operations.
These are a few bitops, 32/64 bit mul/div/rotate, integer parsing and
encoding (including varints), the full avalanche hash function, and
the my_htonll/my_ntohll functions.
For now no new C file was created for these yet.
This one is included almost everywhere and used to rely on a few other
.h that are not needed (unistd, stdlib, standard.h). It could possibly
make sense to split it into multiple parts to distinguish operations
performed on timers and the internal time accounting, but at this point
it does not appear much important.
This splits the hathreads.h file into types+macros and functions. Given
that most users of this file used to include it only to get the definition
of THREAD_LOCAL and MAXTHREADS, the bare minimum was placed into thread-t.h
(i.e. types and macros).
All the thread management was left to haproxy/thread.h. It's worth noting
the drop of the trailing "s" in the name, to remove the permanent confusion
that arises between this one and the system implementation (no "s") and the
makefile's option (no "s").
For consistency, src/hathreads.c was also renamed thread.c.
A number of files were updated to only include thread-t which is the one
they really needed.
Some future improvements are possible like replacing empty inlined
functions with macros for the thread-less case, as building at -O0 disables
inlining and causes these ones to be emitted. But this really is cosmetic.
A few files were including it while not needing it (anymore). Some
only required access to the atomic ops and got haproxy/atomic.h in
exchange. Others didn't need it at all. A significant number of
files still include it only for THREAD_LOCAL definition.
The hathreads.h file has quickly become a total mess because it contains
thread definitions, atomic operations and locking operations, all this for
multiple combinations of threads, debugging and architectures, and all this
done with random ordering!
This first patch extracts all the atomic ops code from hathreads.h to move
it to haproxy/atomic.h. The code there still contains several sections
based on non-thread vs thread, and GCC versions in the latter case. Each
section was arranged in the exact same order to ease finding.
The redundant HA_BARRIER() which was the same as __ha_compiler_barrier()
was dropped in favor of the latter which follows the naming convention
of all other barriers. It was only used in freq_ctr.c which was updated.
Additionally, __ha_compiler_barrier() was defined inconditionally but
used only for thread-related operations, so it was made thread-only like
HA_BARRIER() used to be. We'd still need to have two types of compiler
barriers, one for the general case (e.g. signals) and another one for
concurrency, but this was not addressed here.
Some comments were added at the beginning of each section to inform about
the use case and warn about the traps to avoid.
Some files which continue to include hathreads.h solely for atomic ops
should now be updated.
Half of the users of this include only need the type definitions and
not the manipulation macros nor the inline functions. Moves the various
types into mini-clist-t.h makes the files cleaner. The other one had all
its includes grouped at the top. A few files continued to reference it
without using it and were cleaned.
In addition it was about time that we'd rename that file, it's not
"mini" anymore and contains a bit more than just circular lists.
File buf.h is one common cause of pain in the dependencies. Many files in
the code need it to get the struct buffer definition, and a few also need
the inlined functions to manipulate a buffer, but the file used to depend
on a long chain only for BUG_ON() (addressed by last commit).
Now buf.h is split into buf-t.h which only contains the type definitions,
and buf.h for all inlined functions. Callers who don't care can continue
to use buf.h but files in types/ must only use buf-t.h. sys/types.h had
to be added to buf.h to get ssize_t as used by b_move(). It's worth noting
that ssize_t is only supposed to be a size_t supporting -1, so b_move()
ought to be rethought regarding this.
The files were moved to haproxy/ and all their users were updated
accordingly. A dependency issue was addressed on fcgi whose C file didn't
include buf.h.
This one was introduced 5 years ago for debugging and never really used.
It is the one which used to cause circular dependencies issues. Let's drop
it instead of starting to split the debug include in two.
This one used to be stored into debug.h but the debug tools got larger
and require a lot of other includes, which can't use BUG_ON() anymore
because of this. It does not make sense and instead this macro should
be placed into the lower includes and given its omnipresence, the best
solution is to create a new bug.h with the few surrounding macros needed
to trigger bugs and place assertions anywhere.
Another benefit is that it won't be required to add include <debug.h>
anymore to use BUG_ON, it will automatically be covered by api.h. No
less than 32 occurrences were dropped.
The FSM_PRINTF macro was dropped since not used at all anymore (probably
since 1.6 or so).
Fortunately that file wasn't made dependent upon haproxy since it was
integrated, better isolate it before it's too late. Its dependency on
api.h was the result of the change from config.h, which in turn wasn't
correct. It was changed back to stddef.h for size_t and sys/types.h for
ssize_t. The recently added reference to MAX() was changed as it was
placed only to avoid a zero length in the non-free-standing version and
was causing a build warning in the hpack encoder.
This file is to openssl what compat.h is to the libc, so it makes sense
to move it to haproxy/. It could almost be part of api.h but given the
amount of openssl stuff that gets loaded I fear it could increase the
build time.
Note that this file contains lots of inlined functions. But since it
does not depend on anything else in haproxy, it remains safe to keep
all that together.
There is one "template.h" per include subdirectory to show how to create
a new file but in practice nobody knows they're here so they're useless.
Let's simply remove them.
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:
- common/config.h
- common/compat.h
- common/compiler.h
- common/defaults.h
- common/initcall.h
- common/tools.h
The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.
In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.
No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
This file includes everything that must be guaranteed to be available to
any buildable file in the project (including the contrib/ subdirs). For
now it includes <haproxy/api-t.h> so that standard integer types and
compiler macros are known, <common/initcall.h> to ease dynamic registration
of init functions, and <common/tools.h> for a few MIN/MAX macros.
version.h should probably also be added, though at the moment it doesn't
bring a great value.
All files which currently include the ones above should now switch to
haproxy/api.h or haproxy/api-t.h instead. This should also reduce build
time by having a single guard for several files at once.
This file is at the lowest level of the include tree. Its purpose is to
make sure that common types are known pretty much everywhere, particularly
in structure declarations. It will essentially cover integer types such as
uintXX_t via inttypes.h, "size_t" and "ptrdiff_t" via stddef.h, and various
type modifiers such as __maybe_unused or ALIGN() via compiler.h, compat.h
and defaults.h.
It could be enhanced later if required, for example if some macros used
to compute array sizes are needed.
This is where other imported components are located. All files which
used to directly include ebtree were touched to update their include
path so that "import/" is now prefixed before the ebtree-related files.
The ebtree.h file was slightly adjusted to read compiler.h from the
common/ subdirectory (this is the only change).
A build issue was encountered when eb32sctree.h is loaded before
eb32tree.h because only the former checks for the latter before
defining type u32. This was addressed by adding the reverse ifdef
in eb32tree.h.
No further cleanup was done yet in order to keep changes minimal.
By default, HAProxy is able to implicitly upgrade an H1 client connection to an
H2 connection if the first request it receives from a given HTTP connection
matches the HTTP/2 connection preface. This way, it is possible to support H1
and H2 clients on a non-SSL connections. It could be a problem if for any
reason, the H2 upgrade is not acceptable. "option disable-h2-upgrade" may now be
used to disable it, per proxy. The main puprose of this option is to let an
admin to totally disable the H2 support for security reasons. Recently, a
critical issue in the HPACK decoder was fixed, forcing everyone to upgrade their
HAProxy version to fix the bug. It is possible to disable H2 for SSL
connections, but not on clear ones. This option would have been a viable
workaround.
The support for reqrep and friends was removed in 2.1 but the
chain_regex() function and the "action" field in the regex struct
was still there. This patch removes them.
One point worth mentioning though. There is a check_replace_string()
function whose purpose was to validate the replacement strings passed
to reqrep. It should also be used for other replacement regex, but is
never called. Callers of exp_replace() should be checked and a call to
this function should be added to detect the error early.
log-proto <logproto>
The "log-proto" specifies the protocol used to forward event messages to
a server configured in a ring section. Possible values are "legacy"
and "octet-count" corresponding respectively to "Non-transparent-framing"
and "Octet counting" in rfc6587. "legacy" is the default.
Notes: a separated io_handler was created to avoid per messages test
and to prepare code to set different log protocols such as
request- response based ones.
This patch adds new statement "server" into ring section, and the
related "timeout connect" and "timeout server".
server <name> <address> [param*]
Used to configure a syslog tcp server to forward messages from ring buffer.
This supports for all "server" parameters found in 5.2 paragraph.
Some of these parameters are irrelevant for "ring" sections.
timeout connect <timeout>
Set the maximum time to wait for a connection attempt to a server to succeed.
Arguments :
<timeout> is the timeout value specified in milliseconds by default, but
can be in any other unit if the number is suffixed by the unit,
as explained at the top of this document.
timeout server <timeout>
Set the maximum time for pending data staying into output buffer.
Arguments :
<timeout> is the timeout value specified in milliseconds by default, but
can be in any other unit if the number is suffixed by the unit,
as explained at the top of this document.
Example:
global
log ring@myring local7
ring myring
description "My local buffer"
format rfc3164
maxlen 1200
size 32764
timeout connect 5s
timeout server 10s
server mysyslogsrv 127.0.0.1:6514
Commit 04f5fe87d3 introduced an rwlock in the pools to deal with the risk
that pool_flush() dereferences an area being freed, and commit 899fb8abdc
turned it into a spinlock. The pools already contain a spinlock in case of
locked pools, so let's use the same and simplify the code by removing ifdefs.
At this point I'm really suspecting that if pool_flush() would instead
rely on __pool_get_first() to pick entries from the pool, the concurrency
problem could never happen since only one user would get a given entry at
once, thus it could not be freed by another user. It's not certain this
would be faster however because of the number of atomic ops to retrieve
one entry compared to a locked batch.
HTTP_1XX, HTTP_3XX and HTTP_4XX message templates are no longer used. Only
HTTP_302 and HTTP_303 are used during configuration parsing by "errorloc" family
directives. So these templates are removed from the generic http code. And
HTTP_302 and HTTP_303 templates are moved as static strings in the function
parsing "errorloc" directives.
Now http-request auth rules are evaluated in a dedicated function and no longer
handled "in place" during the HTTP rules evaluation. Thus the action name
ACT_HTTP_REQ_AUTH is removed. In additionn, http_reply_40x_unauthorized() is
also removed. This part is now handled in the new action_ptr callback function.
There is no reason to not use proxy's error replies to emit 401/407
responses. The function http_reply_40x_unauthorized(), responsible to emit those
responses, is not really complex. It only adds a
WWW-Authenticate/Proxy-Authenticate header to a generic message.
So now, error replies can be defined for 401 and 407 status codes, using
errorfile or http-error directives. When an http-request auth rule is evaluated,
the corresponding error reply is used. For 401 responses, all occurrences of the
WWW-Authenticate header are removed and replaced by a new one with a basic
authentication challenge for the configured realm. For 407 responses, the same
is done on the Proxy-Authenticate header. If the error reply must not be
altered, "http-request return" rule must be used instead.
During pool_free(), when the ->allocated value is 125% of needed_avg or
more, instead of putting the object back into the pool, it's immediately
freed using free(). By doing this we manage to significantly reduce the
amount of memory pinned in pools after transient traffic spikes.
During a test involving a constant load of 100 concurrent connections
each delivering 100 requests per second, the memory usage was a steady
21 MB RSS. Adding a 1 minute parallel load of 40k connections all looping
on 100kB objects made the memory usage climb to 938 MB before this patch.
With the patch it was only 660 MB. But when this parasit load stopped,
before the patch the RSS would remain at 938 MB while with the patch,
it went down to 480 then 180 MB after a few seconds, to stabilize around
69 MB after about 20 seconds.
This can be particularly important to improve reloads where the memory
has to be shared between the old and new process.
Another improvement would be welcome, we ought to have a periodic task
to check pools usage and continue to free up unused objects regardless
of any call to pool_free(), because the needed_avg value depends on the
past and will not cover recently refilled objects.
This adds a sliding estimate of the pools' usage. The goal is to be able
to use this to start to more aggressively free memory instead of keeping
lots of unused objects in pools. The average is calculated as a sliding
average over the last 1024 consecutive measures of ->used during calls to
pool_free(), and is bumped up for 1/4 of its history from ->allocated when
allocation from the pool fails and results in a call to malloc().
The result is a floating value between ->used and ->allocated, that tries
to react fast to under-estimates that result in expensive malloc() but
still maintains itself well in case of stable usage, and progressively
goes down if usage shrinks over time.
This new metric is reported as "needed_avg" in "show pools".
Sadly due to yet another include dependency hell, we couldn't reuse the
functions from freq_ctr.h so they were temporarily duplicated into memory.h.
It is possible to globally declare ring-buffers, to be used as target for log
servers or traces.
ring <ringname>
Creates a new ring-buffer with name <ringname>.
description <text>
The descritpition is an optional description string of the ring. It will
appear on CLI. By default, <name> is reused to fill this field.
format <format>
Format used to store events into the ring buffer.
Arguments:
<format> is the log format used when generating syslog messages. It may be
one of the following :
iso A message containing only the ISO date, followed by the text.
The PID, process name and system name are omitted. This is
designed to be used with a local log server.
raw A message containing only the text. The level, PID, date, time,
process name and system name are omitted. This is designed to be
used in containers or during development, where the severity
only depends on the file descriptor used (stdout/stderr). This
is the default.
rfc3164 The RFC3164 syslog message format. This is the default.
(https://tools.ietf.org/html/rfc3164)
rfc5424 The RFC5424 syslog message format.
(https://tools.ietf.org/html/rfc5424)
short A message containing only a level between angle brackets such as
'<3>', followed by the text. The PID, date, time, process name
and system name are omitted. This is designed to be used with a
local log server. This format is compatible with what the systemd
logger consumes.
timed A message containing only a level between angle brackets such as
'<3>', followed by ISO date and by the text. The PID, process
name and system name are omitted. This is designed to be
used with a local log server.
maxlen <length>
The maximum length of an event message stored into the ring,
including formatted header. If an event message is longer than
<length>, it will be truncated to this length.
size <size>
This is the optional size in bytes for the ring-buffer. Default value is
set to BUFSIZE.
Example:
global
log ring@myring local7
ring myring
description "My local buffer"
format rfc3164
maxlen 1200
Note: ring names are resolved during post configuration processing.
With "MINOR: lua: Use vars_unset_by_name_ifexist()" the last user was
removed and as outlined in that commit there is no good reason for this
function to exist.
May be backported together with the commit mentioned above.
Recent commit 2bdcc70fa7 ("MEDIUM: hpack: use a pool for the hpack table")
made the hpack code finally use a pool with very unintrusive code that was
assumed to be trivial enough to adjust if the code needed to be reused
outside of haproxy. Unfortunately the code in contrib/hpack already uses
it and broke the oss-fuzz tests as it doesn't build anymore.
This patch adds an HPACK_STANDALONE macro to decide if we should use the
pools or malloc+free. The resulting macros are called hpack_alloc() and
hpack_free() respectively, and the size must be passed into the pool
itself.
The http-error directive can now be used instead of errorfile to define an error
message in a proxy section (including default sections). This directive uses the
same syntax that http return rules. The only real difference is the limitation
on status code that may be specified. Only status codes supported by errorfile
directives are supported for this new directive. Parsing of errorfile directive
remains independent from http-error parsing. But functionally, it may be
expressed in terms of http-errors :
errorfile <status> <file> ==> http-errror status <status> errorfile <file>
The htx_copy_msg() function can now be used to copy the HTX message stored in a
buffer in an existing HTX message. It takes care to not overwrite existing
data. If the destination message is empty, a raw copy is performed. All the
message is copied or nothing.
This function is used instead of channel_htx_copy_msg().
When HAProxy returns an http error message, the corresponding http reply is now
used instead of the buffer containing the corresponding HTX message. So,
http_error_message() function now returns the http reply to use for a given
stream. And the http_reply_and_close() function now relies on
http_reply_message() to send the response to the client.
The txn flag TX_CONST_REPLY may now be used to prevent after-response ruleset
evaluation. It is used if this ruleset evaluation failed on an internal error
response. Before, it was done incrementing the parameter <final>. But it is not
really convenient if an intermediary function is used to produce the
response. Using a txn flag could also be a good way to prevent after-response
ruleset evaluation in a different context.
When an http reply is configured to use an error message from an http-errors
section, instead of referencing the error message, the http reply is used. To do
so the new http reply type HTTP_REPLY_INDIRECT has been added.
Error messages defined in proxy section or inherited from a default section are
now also referenced using an array of http replies. This is done during the
configuration validity check.
During configuration parsing, error messages resulting of parsing of errorloc
and errorfile directives are now also stored as an http reply. So, for now,
these messages are stored as a buffer and as an http reply. To be able to
release all these http replies when haproxy is stopped, a global list is
used. We must do that because the same http reply may be referenced several
times by different proxies if it is defined in a default section.
Error messages specified in an http-errors section is now also stored in an
array of http replies. So, for now, these messages are stored as a buffer and as
a http reply.
Default error messages are stored as a buffer, in http_err_chunks global array.
Now, they are also stored as a http reply, in http_err_replies global array.
"http-request deny", "http-request tarpit" and "http-response deny" rules now
use the same syntax than http return rules and internally rely on the http
replies. The behaviour is not the same when no argument is specified (or only
the status code). For http replies, a dummy response is produced, with no
payload. For old deny/tarpit rules, the proxy's error messages are used. Thus,
to be compatible with existing configuration, the "default-errorfiles" parameter
is implied. For instance :
http-request deny deny_status 404
is now an alias of
http-request deny status 404 default-errorfiles
The http_reply_message() function may be used to send an http reply to a
client. This function is responsile to convert the reply in HTX, to push it in
the response buffer and to forward it to the client. It is also responsible to
terminate the transaction.
This function is used during evaluation of http return rules.
A dedicated function is added to check the validity of an http reply object,
after parsing. It is used to check the validity of http return rules.
For now, this function is only used to find the right error message in an
http-errors section for http replies of type HTTP_REPLY_ERRFILES (using
"errorfiles" argument). On success, such replies are updated to point on the
corresponding error message and their type is set to HTTP_REPLY_ERRMSG. If an
unknown http-errors section is referenced, anx error is returned. If a unknown
error message is referenced inside an existing http-errors section, a warning is
emitted and the proxy's error messages are used instead.
A dedicated function to parse arguments and create an http_reply object is
added. It is used to parse http return rule. Thus, following arguments are
parsed by this function :
... [status <code>] [content-type <type>]
[ { default-errorfiles | errorfile <file> | errorfiles <name> |
file <file> | lf-file <file> | string <str> | lf-string <fmt> } ]
[ hdr <name> <fmt> ]*
Because the status code argument is optional, a default status code must be
defined when this function is called.
No real change here. Instead of using an internal structure to the action rule,
the http return rules are now stored as an http reply. The main change is about
the action type. It is now always set to ACT_CUSTOM. The http reply type is used
to know how to evaluate the rule.
The structure owns an error message, most of time loaded from a file, and
converted to HTX. It is created when an errorfile or errorloc directive is
parsed. It is renamed to avoid ambiguities with http_reply structure.
The http_reply structure is added. It represents a generic HTTP message used as
internal response by HAProxy. It is based on the structure used to store http
return rules. The aim is to store all error messages using this structure, as
well as http return and http deny rules.
TX_CLDENY, TX_CLALLOW, TX_SVDENY and TX_SVALLOW flags are unused. Only
TX_CLTARPIT is used to make the difference between an http deny rule and an http
tarpit rule. So these unused flags are removed.
In the CLI command 'show ssl crt-list', the ssl-min-ver and the
ssl-min-max arguments were always displayed because the dumped versions
were the actual version computed and used by haproxy, instead of the
version found in the configuration.
To fix the problem, this patch separates the variables to have one with
the configured version, and one with the actual version used. The dump
only shows the configured version.
This reverts commit 957ec59571.
As discussed with Emeric, the current syntax is not extensible enough,
this will be turned to a section instead in a forthcoming patch.
The ring to applet communication was only made to deal with CLI functions
but it's generic. Let's have generic appctx functions and have the CLI
rely on these instead. This patch introduces ring_attach_appctx() and
ring_detach_appctx().
A few fields, including a generic list entry, were added to the CLI context
by commit 300decc8d9 ("MINOR: cli: extend the CLI context with a list and
two offsets"). It turns out that the list entry (l0) is solely used to
consult rings and that the generic ring_write() code is restricted to a
consumer on the CLI due to this, which was not the initial intent. Let's
make it a general purpose wait_entry field that is properly initialized
during appctx_init(). This will allow any applet to wait on a ring, not
just the CLI.
Instead of using malloc/free to allocate an HPACK table, let's declare
a pool. However the HPACK size is configured by the H2 mux, so it's
also this one which allocates it after post_check.
This patch adds the new global statement:
ring <name> [desc <desc>] [format <format>] [size <size>] [maxlen <length>]
Creates a named ring buffer which could be used on log line for instance.
<desc> is an optionnal description string of the ring. It will appear on
CLI. By default, <name> is reused to fill this field.
<format> is the log format used when generating syslog messages. It may be
one of the following :
iso A message containing only the ISO date, followed by the text.
The PID, process name and system name are omitted. This is
designed to be used with a local log server.
raw A message containing only the text. The level, PID, date, time,
process name and system name are omitted. This is designed to be
used in containers or during development, where the severity only
depends on the file descriptor used (stdout/stderr). This is
the default.
rfc3164 The RFC3164 syslog message format. This is the default.
(https://tools.ietf.org/html/rfc3164)
rfc5424 The RFC5424 syslog message format.
(https://tools.ietf.org/html/rfc5424)
short A message containing only a level between angle brackets such as
'<3>', followed by the text. The PID, date, time, process name
and system name are omitted. This is designed to be used with a
local log server. This format is compatible with what the systemd
logger consumes.
timed A message containing only a level between angle brackets such as
'<3>', followed by ISO date and by the text. The PID, process
name and system name are omitted. This is designed to be
used with a local log server.
<length> is the maximum length of event message stored into the ring,
including formatted header. If the event message is longer
than <length>, it would be truncated to this length.
<name> is the ring identifier, which follows the same naming convention as
proxies and servers.
<size> is the optionnal size in bytes. Default value is set to BUFSIZE.
Note: Historically sink's name and desc were refs on const strings. But with new
configurable rings a dynamic allocation is needed.
Before this path, they rely directly on ring_write bypassing
a part of the sink API.
Now the maxlen parameter of the log will apply only on the text
message part (and not the header, for this you woud prefer
to use the maxlen parameter on the sink/ring).
sink_write prototype was also reviewed to return the number of Bytes
written to be compliant with the other write functions.
This patch extends the sink_write prototype and code to
handle the rfc5424 and rfc3164 header.
It uses header building tools from log.c. Doing this some
functions/vars have been externalized.
facility and minlevel have been removed from the struct sink
and passed to args at sink_write because they depends of the log
and not of the sink (they remained unused by rest of the code
until now).
since commit c0cdaffaa3 ("REORG: ssl: move ssl_sock_ctx and fix
cross-dependencies issues"), `struct ssl_sock_ctx` was moved in
ssl_sock.h. As it contains a `struct buffer`, including
`common/buffer.h` is now mandatory. I encountered an issue while
including ssl_sock.h on another patch:
include/types/ssl_sock.h:240:16: error: field ‘early_buf’ has incomplete type
240 | struct buffer early_buf; /* buffer to store the early data received */
no backport needed.
Fixes: c0cdaffaa3 ("REORG: ssl: move ssl_sock_ctx and fix
cross-dependencies issues")
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Add swrate_add_dynamic function which is similar to swrate_add, but more
accurate when calculating moving averages when not enough samples have
been processed yet.
In order to move all SSL sample fetches in another file, moving the
ssl_sock_ctx definition in a .h file is required.
Unfortunately it became a cross dependencies hell to solve, because of
the struct wait_event field, so <types/connection.h> is needed which
created other problems.
Add forward declarations in types/ssl_crtlist.h in order to avoid
circular dependencies. Also remove the listener.h include which is not
needed anymore.
The ssl_sock.c file contains a lot of macros and structure definitions
that should be in a .h. Move them to the more appropriate
types/ssl_sock.h file.
This patch adds the ability to register callbacks for SSL/TLS protocol
messages by using the function ssl_sock_register_msg_callback().
All registered callback functions will be called when observing received
or sent SSL/TLS protocol messages.
The EVP_MD_CTX_create() and EVP_MD_CTX_destroy() functions were renamed to
EVP_MD_CTX_new() and EVP_MD_CTX_free() in OpenSSL 1.1.0, respectively. These
functions are used by the digest converter, introduced by the commit 8e36651ed
("MINOR: sample: Add digest and hmac converters"). So for prior versions of
openssl, macros are used to fallback on old functions.
This patch must only be backported if the commit 8e36651ed is backported too.
This one really ought to be defined in hathreads.h like all other thread
definitions, which is what this patch does. As expected, all files but
one (regex.h) were already including hathreads.h when using THREAD_LOCAL;
regex.h was fixed for this.
This was the last entry in config.h which is now useless.
The setting of CONFIG_HAP_LOCKLESS_POOLS depending on threads and
compat was done in config.h for use only in memory.h and memory.c
where other settings are dealt with. Further, the default pool cache
size was set there from a fixed value instead of being set from
defaults.h
Let's move the decision to enable lockless pools via
CONFIG_HAP_LOCKLESS_POOLS to memory.h, and set the default pool
cache size in defaults.h like other default settings.
This was the next-to-last setting in config.h.