There were several unused variables in halog.c that each caused a
compiler warning [-Wunused-but-set-variable]. This patch simply
removes the declaration of said vairables and any instance where the
unused variable was assigned a value.
Dmitry Sivachenko reported that "uint" doesn't build on FreeBSD 10.
On Linux it's defined in sys/types.h and indicated as "old". Just
get rid of the very few occurrences.
The last commit provides time-based filtering. Unfortunately, it wastes
90% of the time calling the expensive time()/localtime()/mktime()
functions.
This patch does 3 things :
- call time()/localtime() only once to initialize the correct
struct timeinfo ;
- call mktime() only when the time has changed regardless of
the current second.
- manually add the current second to the cached result.
Doing just this is enough to multiply the parsing speed by 8.
I wanted to make a graph with average answer time in nagios that takes only
the last 5 mn of the log. Filtering the log before using halog was too
slow, so I added that filter to halog.
The patch attached to this mail is a proposal to add a new option : -time
[min][:max]
The values are min timestamp and/or max timestamp of the lines to be used
for stats. The date and time of the log lines between '[' and ']' are
converted to timestamp and compared to these values.
Here is an exemple of usage :
cat /var/log/haproxy.log | ./halog -srv -H -q -time $(date --date '-5 min' +%s)
This is the same as -uc except that instead of counting URLs, it
counts source addresses. The reported times are request times and
not response times.
The code becomes heavily ugly, the url struct is being abused to
store an address, and there are no more bit fields available. The
code needs a major revamp.
Commit 667c905f introduced parameter -m to halog which limits the size
of the output. Unfortunately it is completely broken in that it doesn't
check that the limit was previously set or not, and also prevents a
simple counting operation from returning anything if a limit is not set.
Note that the -gt and -pct outputs behave differently in face of this
limit, since they count the valid output lines BEFORE actually producing
the data, so the limit really applies to valid input lines.
Sometimes it's useful to limit the output to a number of lines, for
example when output is already sorted (eg: 10 slowest URLs, ...). Now
we can use -m for this.
There was a lines_out++ left from earlier code, causing each input
line to be counted as an output line.
This fix also affects 1.4 and should be backported.
Using posix_fadvise() it is possible to tell the system that we're
going to read a whole file at once. The kernel then doubles the
read-ahead size for this file. On Linux with an SSD, this has improved
cold-cache performance by around 20%. Hot-cache is not affected at all.
glibc-2.11 on x86_64 provides a machine-specific memchr() which is faster
than the generic C implementation by around 40%, so let's make it possible
to use it instead of the hand-coded version.
This version implements both 32 and 64 bit versions at once, it
avoids the need to have two separate output files. It also improves
efficiency on i386 platforms by adding a little bit of assembly where
gcc isn't efficient.
Using "halog -c" is still something quite common to perform on logs,
but unfortunately since the recent added controls, it was sensibly
slowed down due to the parsing of the accept date field.
Now we use a specific loop for the case where nothing is needed from
the input, and this sped up the line counting by 2.5x. A 2.4 GHz Xeon
now counts lines at a rate of 2 GB of logs per second.
Gcc tries to be a bit too smart in these small loops and the result is
that on i386 we waste a lot of time there. By recoding these loops in
assembly, we save up to 23% total processing time on i386! The savings
on x86_64 are much lower, probably because there are more registers and
gcc has to do less tricks. However, those savings vary a lot between gcc
versions and even cause harm on some of them (eg: 4.4) because gcc does
not know how to optimize the code once inlined.
However, by recoding field_start() in C to try to match the assembly
code as much as possible, we can significantly reduce its execution
time without risking the negative impacts. Thus, the assembly version
is less interesting there but still worth being used on some compilers.
By adding a "landing area" at the end of the buffer, it becomes safe to
parse more bytes at once. On 32-bit this makes fgets run about 4% faster
but it does not save anything on 64-bit.
A bug in the algorithm used to find an LF in multiple bytes at once
made byte 0x80 trigger detection of byte 0x00, thus 0x8A matches byte
0x0A. In practice, this issue never happens since byte 0x8A won't be
displayed in logs (or it will be encoded). This could still possibly
happen in mixed logs.
Some syslog servers escape quotes, which make the resulting logs unusable
for URL processing since the parser looks for the first field beginning
with a quote. It now supports also fields starting with backslash and
quote in order to address this. No performance impact was measured.
The code was merged with the error code checking which is very similar and
which shares the same information. The new test adds about 1% slowdown to
error checking but makes it more reliable when facing wrongly formated
status codes.
It is now possible to filter by termination code with -tcn <termcode>, to be
able to track one kind of errors, for example after counting it with -tc.
Use -TCN <termcode> gives you the opposite.
There were too many filters, we were losing time in all the "if" statements.
By moving all the filters to independant functions, we made the code cleaner
and slightly faster (3%).
One minor bug was found, the -tc and -st options did not report the number
of output lines, but always zero.
Almost all filters first check the line format, which takes a lot of code
and requires parsing back and forth. By centralizing this test, we can
save about 15-20 more percent of performance for all filters.
Also, the test was wrong, it was checking that the source IP address was
starting with a digit, which is not always true with local IPv6 addresses.
Instead, we now check that the next field (accept field) starts with an
opening bracket and is followed by a digit between 0 and 3 (day of the
month). Doing this has contributed a 2% speedup because all other field
calculations were relative to a closer field.
Since many fields are relative and some are used a lot, try to cache them
the first time they're used in order to avoid skipping them twice. The
status counts with HTTP pre-check enabled has sped up by 40%.
The SKIP_CHAR fix caused a measurable performance drop. Since we can
consider all chars below 0x20 as delimiters, we can avoid a cache lookup
which requires a char to pointer conversion.
The timer parser looks for the next slash after the last timer, which is
very far away. Those 4 occurrences have been fixed to match the way it's
done in URL sorting, which is faster. Average speed gain is 5-6% on -srv
and -pct.
(cherry picked from commit 3555671c93695f48c02ef05c8bb228523f17ca20)
Using -u{,c,e,t,a,to,ao} it is possible to get per-URL statistics, sorted by
URL, request count, error count, total time, avg time, total time on OK requests,
avg time on OK requests.
Since it has to parse URLs and store a number of fields, it's quite slower
than other methods, but still correct for production usage (typically 800000
lines or 270 MB per second on a 2 GHz system).
Results are sorted in reverse order so that it's easy to catch them by piping
the output to the "head" command.
(cherry picked from commit 15ce7f56d15f839ce824279b84ffe14c58e41fda)
It's sometimes very useful to be able to monitor a production status in real
time by comparing servers behaviours. Now halog is able to do this when called
with "-srv". It reports various fields for each server found in a log, including
statuses, total reqs, valid reqs, percent of valid reqs, average connection time,
average response time.
A new idea came up to detect the presence of a null byte in a word.
It saves several operations compared to the previous one, and eliminates
the jumps (about 6 instructions which can run 2-by-2 in parallel).
This sole optimisation improved the line count speed by about 30%.
All files referencing the previous ebtree code were changed to point
to the new one in the ebtree directory. A makefile variable (EBTREE_DIR)
is also available to use files from another directory.
The ability to build the libebtree library temporarily remains disabled
because it can have an impact on some existing toolchains and does not
appear worth it in the medium term if we add support for multi-criteria
stickiness for instance.