Some architectures, like x86_64 and aarch64, support efficient
unaligned 64-bit reads. On such architectures we already know that
each string passed to field_start() has some margin at the end,
because it was read by fgets2(), which looks for the trailing LF
using the same method. Thus let's skip spaces in packs of 8 bytes.
This increases the parsing speed by 35%.
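As an illustration, here is a minimal sketch of the technique; it is
not the exact halog code, and skip_spaces8() is a name of our own
choosing. It relies on the margin guarantee described above so an
8-byte read past the current position is always safe:

```c
#include <stdint.h>
#include <string.h>

/* Minimal sketch, not the exact halog code: skip spaces eight bytes
 * at a time. Assumes fgets2() leaves at least 8 bytes of readable
 * margin past the end of the string, as stated above. */
static inline const char *skip_spaces8(const char *p)
{
	const uint64_t spaces = 0x2020202020202020ULL; /* eight ASCII spaces */
	uint64_t x;

	while (1) {
		memcpy(&x, p, 8);  /* one unaligned load on x86_64/aarch64 */
		x ^= spaces;       /* non-space bytes become non-zero */
		if (x)
			break;
		p += 8;            /* all eight bytes were spaces */
	}
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
	/* lowest non-zero byte == first non-space character */
	return p + (__builtin_ctzll(x) / 8);
#else
	while (*p == ' ')
		p++;
	return p;
#endif
}
```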
Modern compilers were producing less efficient code in the
field_start() loop by not emitting two conditional jumps for a single
test. However, by reordering the test we can merge the optimal case
and the default one and get back to good performance, so let's
simplify the test. This improves the parsing speed by 5%.
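The exact diff is not shown here, so the following is only an
illustrative sketch of the idea, with a name of our own choosing: keep
a single test per character on the hot path and let the rare cases
(start of the next field, end of line) share the same loop exit:

```c
/* Illustrative only, not the halog diff: both "start of next field"
 * and "end of line" leave the loop through the same single test, so
 * the hot path (skipping spaces) costs one compare-and-branch per
 * character instead of two. */
static inline const char *skip_spaces(const char *p)
{
	while (*p == ' ')
		p++;
	return p; /* caller checks *p once: '\0'/'\n' vs. field start */
}
```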
The usage message was starting to have long lines; it's preferable
that it still fits within a default 80-column display so that options
are easy to find. Also cut it into 3 parts (input filter, modifier,
output format) for improved legibility.
This patch adds support to halog for extracting captured header
fields. A field can be extracted by passing the `-hdr <block>:<field>`
output filter.
Both `<block>` and `<field>` are 1-indexed.
`<block>` refers to the index of the brace-delimited list of headers. If both
request and response headers are captured, then request headers are referenced
by `<block> = 1`, response headers are `2`. If only one direction is captured,
there will only be a single block `1`.
`<field>` refers to a single field within the selected block.
The output will contain one line, possibly empty, per log line processed.
Passing a non-existent `<block>` or `<field>` will result in an empty line.
Example:
capture request header a len 50
capture request header b len 50
capture request header c len 50
capture response header d len 50
capture response header e len 50
capture response header f len 50
`-hdr 1:1` will extract request header `a`
`-hdr 1:2` will extract request header `b`
`-hdr 1:3` will extract request header `c`
`-hdr 2:3` will extract response header `f`
This resolves GitHub issue #1146.
Our use case for this is a dynamic application that performs routing
based on the query string. Without this option, all URLs would just
point to the central entry point of this location, making the output
completely useless.
Dmitry reported this warning on FreeBSD since the introduction of
-Wundef:

```
admin/halog/fgets2.c:38:30: warning: '__GLIBC__' is not defined, evaluates to 0 [-Wundef]
#if defined(__x86_64__) && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 15))
                            ^
```
A defined() was missing.
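The corrected guard tests defined(__GLIBC__) explicitly, so the macro
is never evaluated where it is not defined (the guarded body is
elided here):

```c
#if defined(__x86_64__) && defined(__GLIBC__) && \
    (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 15))
/* ... glibc-specific fast path ... */
#endif
```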
halog currently emits lots of warnings because it does not benefit
from the default build flags. Let's update the main makefile to build
it by itself and remove the other one. The sub-project's makefile was
replaced with a README indicating how to build it.
There has been a USE_MEMCHR option for ages that was mostly never
enabled because it was unclear when glibc's memchr() became faster. A
quick look at the code indicates that this came with the SSE
implementation of memchr(), added in commit 093ecf92998de2 between
2.14 and 2.15, so let's automatically enable it on x86_64 with glibc
>= 2.15.
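For illustration, a sketch of what USE_MEMCHR selects between; the
function name is ours, not the exact fgets2.c code:

```c
#include <string.h>
#include <stddef.h>

/* Sketch of the two LF-search strategies USE_MEMCHR chooses between;
 * find_lf() is an illustrative name, not the halog function. */
static inline const char *find_lf(const char *buf, size_t len)
{
#ifdef USE_MEMCHR
	/* glibc >= 2.15 ships an SSE memchr() on x86_64 which beats a
	 * byte-by-byte scan on long lines */
	return memchr(buf, '\n', len);
#else
	const char *end = buf + len;

	while (buf < end && *buf != '\n')
		buf++;
	return (buf < end) ? buf : NULL;
#endif
}
```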
This results in ~6 GB of logs read per second (20 million lines),
~2.5 GB/s (8 million lines) parsed for error or status code
classification, and ~1 GB/s (3 million lines) for time percentiles.