The ring lock was initially mostly used for the logs and used to inherit
its name in lock stats. Now that it's exclusively used by rings, let's
rename it accordingly.
fd.state is reported without the "0x" prefix in show sess, let's support
this during decoding.
This may be backported to all versions supporting this utility.
This script can be used through a http-request rules to log SSL keys for
traffic on a dedicated frontend. The resulting file can then be injected
into wireshark to decipher the corresponding network capture.
Highlighting a few fields helps spot them, but only if there are not too
many. What is done here is the following:
- the first line of each stream is highlighted in white (helps find
beginning/end in long dumps
- fields in the form name=value where value starts with upper case
letters are considered as a state dump (e.g. stconn state) and are
also highlighted. This results in ~20 pairs. In this case the name
and value use two different colors (cyan vs yellow) to further help
find what is being looked for
This is only done when the output is a terminal or when --color=always
is passed. It's also possible to disable it with --color=never or
--no-color.
Some fields are followed by a comma or a closing parenthesis and we
take them because we read everything that's not a space. Better be
stricter, we're causing warnings about incorrect hex format when
they're passed to printf.
Ideally haring should be compiled with the same options as haproxy so
that ring headers have the same size (e.g. with/without locks, with/
without lock debugging). But when enabling DEBUG_STRICT, BUG_ON() is
enabled and breaks the build by making references to complain() and
ha_backtrace_to_stderr().
Let's just disable DEBUG_STRICT before opening include files. This is
sufficient to address the problem.
This may be backorted to older versions that include haring.
Now we can build a series of data frames by reading from a file and
chunking it into frames of requested length. It's mostly useful for
data frames (e.g. post). One way to announce these upfront is to
capture the output of curl use without content-length:
$ nc -lp4446 > post-h2-nocl.bin
$ curl -v --http2-prior-knowledge http://127.0.0.1:4446/url -H "content-length:" -d @/dev/null
Then just change the 5th byte from the end from 1 to 0 to remove the
end-of-stream bit, it will allow to chain a file, then to send an
empty DATA frame with ES set :
$ (dev/h2/mkhdr.sh -i 1 -t data -d CHANGELOG;
dev/h2/mkhdr.sh -i 1 -t data -l 0 -f es) > h2-data-changelog.bin
Then post that to the server:
$ cat post-h2-nocl.bin h2-data-changelog.bin | nc 0 4446
The ring's offset currently contains a perpetually growing custor which
is the number of bytes written from the start. It's used by readers to
know where to (re)start reading from. It was made absolute because both
the head and the tail can change during writes and we needed a fixed
position to know where the reader was attached. But this is complicated,
error-prone, and limits the ability to reduce the lock's coverage. In
fact what is needed is to know where the reader is currently waiting, if
at all. And this location is exactly where it stored its count, so the
absolute position in the buffer (the seek offset from the first storage
byte) does represent exactly this, as it doesn't move (we don't realign
the buffer), and is stable regardless of how head/tail changes with writes.
This patch modifies this so that the application code now uses this
representation instead. The most noticeable change is the initialization,
where we've kept ~0 as a marker to go to the end, and it's now set to
the tail offset instead of trying to resolve the current write offset
against the current ring's position.
The offset was also used at the end of the consuming loop, to detect
if a new write had happened between the lock being released and taken
again, so as to wake the consumer(s) up again. For this we used to
take a copy of the ring->ofs before unlocking and comparing with the
new value read in the next lock. Since it's not possible to write past
the current reader's location, there's no risk of complete rollover, so
it's sufficient to check if the tail has changed.
Note that the change also has an impact on the haring consumer which
needs to adapt as well. But that's good in fact because it will rely
on one less variable, and will use offsets relative to the buffer's
head, and the change remains backward-compatible.
Since 7d84439 ("BUILD: hpack: include global.h for the trash that is needed
in debug mode"), hpack decode tool fails to compile on targets that enable
USE_THREAD. (ie: linux-glibc target as reported by Christian Ruppert)
When building hpack devtool, we are including src/hpack-dec.c as a dependency.
src/hpack-dec.c relies on the global trash whe debug mode is enabled.
But as we're building hpack tool with a limited scope of haproxy
sources, global trash (which is declared in src/chunk.c) is not available.
Thus, src/hpack-dec.c relies on a local 'trash' variable declared within
dev/hpack/decode.c
This used to work fine until 7d84439.
But now that global.h is explicitely included in src/hpack-dec.c,
trash variable definition from decode.c conflicts with the one from global.h:
In file included from include/../src/hpack-dec.c:35,
from dev/hpack/decode.c:87:
include/haproxy/global.h:52:35: error: thread-local declaration of 'trash' follows non-thread-local declaration
52 | extern THREAD_LOCAL struct buffer trash;
Adding THREAD_LOCAL attribute to 'decode.c' local trash variable definition
makes the compiler happy again.
This should fix GH issue #2009 and should be backported to 2.7.
In case a file-backed ring was not properly synced before being dumped,
the output can look bogus due to the head pointer not being perfectly
up to date. In this case, passing "-r" will make haring automatically
skip entries not starting with a zero, and resynchronize with the rest
of the messages.
This should be backported to 2.6.
Since the tool permits to pass an FD bound for listening, it's convenient
to test haproxy's "bind fd@". Let's add support for UNIX sockets the same
way. -U needs to be passed to change the default address family, and the
address must contain a "/".
E.g.
$ dev/tcploop/tcploop -U /tmp/ux L Xi ./haproxy -f fd1.cfg
When -e is passed, epoll is used instead of poll. The FD is added
then removed around the call to epoll_wait() so that we don't need
to track it. The only purpose is to compare events reported by each
syscall.
There are multiple call places for poll(), let's first centralize them
to make it easier to enhance it. All callers now use the wait_for_fd()
function which was extended to take a timeout and which can return the
indication that an error was seen.
When called with -e, epoll is used instead of poll. The poller does
very little in this code (just checks for any event without waiting) but
that's sufficient to see return values.
A number of internal flags started to be exposed to external programs
at the location of their definition since commit 77acaf5af ("MINOR:
flags: add a new file to host flag dumping macros"). This allowed the
"flags" utility to decode many more of them and always correctly. The
condition to expose them was to rely on the preliminary definition of
EOF that indicates that stdio is already included. But this was a
wrong approach. It only guarantees that snprintf() can safely be used
but still causes large functions to be built. But stdio is often
included before some of these includes, so these heavy inline functions
actually have to be compiled in many cases. The result is that the
build time significantly increased, especially with fast compilers
like gcc -O0 which took +50% or TCC which took +100%!
This patch addresses the problem by instead relying on an explicit
macro HA_EXPOSE_FLAGS that the calling program must explicitly define
before including these files. flags.c does this and that's all. The
previous build time is now restored with a speed up of 20 to 50%
depending on the build options.
Now the connect() step becomes an action. It's still implicit before
any -c/-s but it allows the listener to close() before connect()
happens, showing the polling status for this condition:
$ dev/poll/poll -v -l clo -c pol
#### BEGIN ####
cmd #1 stp #1: do_clo(3): ret=0
cmd #2 stp #0: do_con(4): ret=-1 (Connection refused)
cmd #2 stp #1: do_pol(4): ret=1 ev=0x14 (OUT HUP)
#### END ####
which differs from a case where the server closes the just accepted
connection:
$ dev/poll/poll -v -s clo -c pol
#### BEGIN ####
cmd #1 stp #0: do_con(4): ret=0
cmd #1 stp #0: do_acc(3): ret=5
cmd #1 stp #1: do_clo(5): ret=0
cmd #2 stp #1: do_pol(4): ret=1 ev=0x2005 (IN OUT RDHUP)
#### END ####
It's interesting to see OUT+HUP since HUP indicates that both directions
were closed, hence nothing may be written now, thus OUT just wants the
write handler to be notified.
The $(Q), $(V), $(cmd_xx) handling needs to be reused in sub-project
makefiles and it's a pain to maintain inside the main makefile. Let's
just move that into a new subdir include/make/ with a dedicated file
"verbose.mk". It slightly cleans up the makefile in addition.
The "poll" and "tcploop" sub-projects have their own makefiles. But
since the cmd_* commands were migrated from "echo" to $(info) with
make 3.81, the command is confusingly displayed in the top-level
makefile before entering the directory, even making one think that
the build occurred.
Let's instead propagate the verbosity level through the sub-projects
and let them adapt their own cmd_CC. For now this peans a little bit
of duplication for poll and tcploop.
The new functions fconn_show_flags() and fstrm_show_flags() decode the flags
state into a string, and are used by dev/flags:
$ /dev/flags/flags fconn 0x3100
fconn->flags = FCGI_CF_GET_VALUES | FCGI_CF_KEEP_CONN | FCGI_CF_MPXS_CONNS
./dev/flags/flags fstrm 0x3300
fstrm->flags = FCGI_SF_WANT_SHUTW | FCGI_SF_WANT_SHUTR | FCGI_SF_OUTGOING_DATA | FCGI_SF_BEGIN_SENT
The new functions h1c_show_flags() and h1s_show_flags() decode the flags
state into a string, and are used by dev/flags:
$ /dev/flags/flags h1c 0x2200
h1c->flags = H1C_F_ST_READY | H1C_F_ST_ATTACHED
./dev/flags/flags h1s 0x190
h1s->flags = H1S_F_BODYLESS_RESP | H1S_F_NOT_FIRST | H1S_F_WANT_KAL
The new functions h2c_show_flags() and h2s_show_flags() decode the flags
state into a string, and are used by dev/flags:
$ ./dev/flags/flags h2c 0x0600
h2c->flags = H2_CF_DEM_IN_PROGRESS | H2_CF_DEM_SHORT_READ
$ ./dev/flags/flags h2s 0x7003
h2s->flags = H2_SF_HEADERS_RCVD | H2_SF_OUTGOING_DATA | H2_SF_HEADERS_SENT \
| H2_SF_ES_SENT | H2_SF_ES_RCVD
The new function is fd_show_flags() and it reports known FD flags:
$ ./dev/flags/flags fd 0x000121
fd->flags = FD_POLL_IN | FD_EV_READY_W | FD_EV_ACTIVE_R
There's no more point keeping functions that are just wrappers around
other ones, let's directly call them from the main entry point. It helps
visually control the mapping between output formats and their definition
and doesn't require to invent long names. For a bit more readability, the
tmpbuf and its size adopted slightly shorter names.
The new function is strm_et_show_flags(). Only the error type is
handled at the moment, as a bit more complex logic is needed to
mix the values and enums present in some fields.
The two new functions are se_show_flags() and sc_show_flags().
Maybe something could be done for SC_ST_* values but as it's a
small enum, a simple switch/case should work fine.
The two new functions are chn_show_analysers() and chn_show_flags().
They work on an existing buffer so one was declared in flags.c for this
purpose. File flags.c does not have to know about channel flags anymore.
This was added in 2.6 by commit c78a9698e ("MINOR: connection: add a new
flag CO_FL_FDLESS on fd-less connections") but forgotten in flags.c.
This must be backported to 2.6.
The proposed decoding options were not updated after the changes in 2.6,
let's fix that by taking the names from the existing declaration. This
should be backported to 2.6.
Harden the UDP datagram corruption applying it on both sides. This approaches
the conditions of some tests run by the QUIC interop runner developed by
Marten Seeman.
addrlen was not preset to sizeof(addr) on rx, resulting in the address
often not being filled and response packets not always flowing back.
Let's also consistently use "addr" in the bind call (it points to
frt_addr there but it's a bit confusing).
Some traces may contain LF characters which are quite cumbersome to
deal with using the common tools. Given that the utility still has
access to the raw traces and knows where the delimiters are, let's
offer the possibility to remap LF characters to a different sequence.
Here we're using CR VT which will have the same visual appearance but
will remain on the same line for grep etc. This behavior is enabled by
the -l option. It's not enabled by default because it's 50% slower due
to content processing.
With the ability to back a memory ring into an mmapped file, it makes
sense to be able to dump these files. That's what this utility does.
The entire ring is dumped to stdout. It's well suited to large dumps,
it converts roughly 6 GB of logs per second.
The utility is really meant for developers at the moment. It might
evolve into a more general tool but at the moment it's still possible
that it might need to be run under gdb to process certain crash dumps.
Also at the moment it must not be used on a ring being actively written
to or it will dump garbage.
The code is made so that we can envision later to attach to a live
ring and dump live contents, but this requires that the utility is
built with the exact same options (threads etc), and that the file
is opened read-write. For now these parts have been commented out,
waiting for a reasonably balanced and non-intrusive solution to be
found (e.g. signals must be intercepted so that the tool cannot
leave the ring with a watcher present).
If it is detected that the memory layout of the ring struct differs,
a warning is emitted. At the end, if an error occurs, a warning is
printed as well (this does happen when the process is not cleanly
stopped, but it indicates the end was reached).
TASK_SHARED_WQ was set upon task creation and never changed afterwards.
Thus if a task was created to run anywhere (e.g. a check or a Lua task),
all its timers would always pass through the shared timers queue with a
lock. Now we know that tid<0 indicates a shared task, so we can use that
to decide whether or not to use the shared queue. The task might be
migrated using task_set_affinity() but it's always dequeued first so
the check will still be valid.
Not only this removes a flag that's difficult to keep synchronized with
the thread ID, but it should significantly lower the load on systems with
many checks. A quick test with 5000 servers and fast checks that were
saturating the CPU shows that the check rate increased by 20% (hence the
CPU usage dropped by 17%). It's worth noting that run_task_lists() almost
no longer appears in perf top now.