The code was merged with the error code checking, which is very similar and
shares the same information. The new test adds about 1% slowdown to error
checking, but makes it more reliable when facing wrongly formatted status
codes.
It is now possible to filter by termination code with -tcn <termcode>, in
order to track one kind of error, for example after counting occurrences
with -tc. Using -TCN <termcode> instead gives the opposite result (excludes
that termination code).
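For example (the termination code below is only illustrative):

    halog -tc < access.log        # count requests per termination code
    halog -tcn PR < access.log    # keep only lines with this code
    halog -TCN PR < access.log    # keep all lines except this code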
There were too many filters, and time was being lost in all the "if"
statements. By moving all the filters to independent functions, as sketched
below, we made the code cleaner and slightly faster (3%).
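A minimal C sketch of the idea, with hypothetical names (line_filter,
filt_count and filt_term_code are illustrations, not halog's actual
identifiers): the filter is selected once at startup, so the per-line loop
performs a single indirect call instead of walking a chain of "if"
statements.

    #include <stdio.h>
    #include <string.h>

    typedef void (*line_filter)(const char *line);

    static unsigned long output_lines;

    /* hypothetical filter: just count the lines */
    static void filt_count(const char *line)
    {
        (void)line;
        output_lines++;
    }

    /* hypothetical filter: keep only lines with one termination code */
    static void filt_term_code(const char *line)
    {
        if (strstr(line, " PR-- "))
            fputs(line, stdout);
    }

    int main(int argc, char **argv)
    {
        char buf[16384];
        /* the selection happens once, outside the per-line loop */
        line_filter filter = (argc > 1 && !strcmp(argv[1], "-tcn"))
                             ? filt_term_code : filt_count;

        while (fgets(buf, sizeof(buf), stdin))
            filter(buf);
        if (filter == filt_count)
            printf("%lu\n", output_lines);
        return 0;
    }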
One minor bug was found: the -tc and -st options did not report the number
of output lines, but always reported zero.
Almost all filters first check the line format, which takes a lot of code
and requires parsing back and forth. By centralizing this test, we save
roughly another 15-20% of performance across all filters.
Also, the test was wrong: it checked that the source IP address started
with a digit, which is not always true with local IPv6 addresses. Instead,
we now check that the next field (the accept field) starts with an opening
bracket followed by a digit between 0 and 3 (the first digit of the day of
the month). Doing this contributed a 2% speedup because all other field
calculations are now relative to a closer field.
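A minimal sketch of the new test, assuming p points to the accept field of
a haproxy log line such as "[10/Oct/2000:13:55:36.123]" (the function name
is illustrative):

    /* '[' followed by the first digit of the day of month (0..3) */
    static inline int valid_accept_field(const char *p)
    {
        return p[0] == '[' && p[1] >= '0' && p[1] <= '3';
    }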
Since many fields are relative and some are used a lot, we now try to cache
them the first time they are used, in order to avoid skipping over them
twice. Status counts with the HTTP pre-check enabled sped up by 40%.
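A hypothetical sketch of the caching idea (the names and the helper are
illustrative, not halog's code): once a field's position in the current
line is known, later lookups resume from it instead of re-skipping the same
fields from the start of the line. The cache has to be reset (cached_field
= -1) whenever a new line is read.

    /* assumed helper: skip the current field and the spaces after it */
    static const char *next_field(const char *p)
    {
        while (*p && *p != ' ')
            p++;
        while (*p == ' ')
            p++;
        return p;
    }

    static const char *cached_pos;  /* position found on the current line */
    static int cached_field = -1;   /* index of that field, -1 when empty */

    /* return a pointer to space-delimited field <n> (0-based) of <line> */
    static const char *get_field(const char *line, int n)
    {
        const char *p = line;
        int f = 0;

        if (cached_field >= 0 && cached_field <= n) {
            p = cached_pos;         /* resume from the cached position */
            f = cached_field;
        }
        while (f < n) {
            p = next_field(p);
            f++;
        }
        cached_pos = p;
        cached_field = n;
        return p;
    }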
The SKIP_CHAR fix caused a measurable performance drop. Since all
characters below 0x20 can be considered delimiters, we can avoid a cache
lookup that requires a char-to-pointer conversion.
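A minimal sketch of the cheaper test, assuming that space (0x20) and every
character below it count as delimiters; the macro names below are
illustrative (the real code is the SKIP_CHAR macro):

    /* one unsigned comparison instead of a table lookup */
    #define IS_DELIM(c)   ((unsigned char)(c) <= ' ')

    /* advance past the current field using the arithmetic test;
     * stops on space, control characters and the trailing '\0' */
    #define SKIP_FIELD(p) do { while (!IS_DELIM(*(p))) (p)++; } while (0)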
The timer parser used to look for the next slash after the last timer,
which is very far away. Those 4 occurrences have been fixed to match the
way it is done in URL sorting, which is faster. The average speed gain is
5-6% on -srv and -pct.
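The commit does not show the new code, but a hypothetical sketch of the
idea is to parse each timer in place and stop at its own delimiter, rather
than first scanning ahead for the next '/' (which, after the last timer,
lies far away in the line). Assuming timers appear as slash-separated
integers such as "0/0/3/2/5":

    static const char *parse_timer(const char *p, int *val)
    {
        int v = 0, neg = 0;

        if (*p == '-') {    /* haproxy logs -1 for unset timers */
            neg = 1;
            p++;
        }
        while (*p >= '0' && *p <= '9')
            v = v * 10 + (*p++ - '0');
        if (*p == '/')
            p++;            /* step over the separator to the next timer */
        *val = neg ? -v : v;
        return p;
    }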
(cherry picked from commit 3555671c93695f48c02ef05c8bb228523f17ca20)
Using -u{,c,e,t,a,to,ao}, it is possible to get per-URL statistics, sorted
by URL, request count, error count, total time, average time, total time on
OK requests, or average time on OK requests.
Since it has to parse URLs and store a number of fields, this is noticeably
slower than the other methods, but still fast enough for production usage
(typically 800000 lines or 270 MB per second on a 2 GHz system).
Results are sorted in reverse order, so the top entries are easy to grab by
piping the output to the "head" command.
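For example, to get the ten URLs with the largest total time on OK requests
(the log file name is illustrative):

    halog -uto < haproxy.log | head -10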
(cherry picked from commit 15ce7f56d15f839ce824279b84ffe14c58e41fda)
It is sometimes very useful to be able to monitor production status in real
time by comparing server behaviours. halog can now do this when called with
"-srv". It reports various fields for each server found in a log, including
statuses, total requests, valid requests, the percentage of valid requests,
the average connection time, and the average response time.
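A hypothetical sketch of the per-server accumulator such a report needs
(the field names are illustrative, not halog's actual struct); the averages
and the percentage of valid requests are derived from the sums when the
report is dumped:

    struct srv_stats {
        unsigned long tot_req;      /* total requests seen */
        unsigned long valid_req;    /* requests with a parsable status */
        unsigned long status[6];    /* counts for 1xx..5xx plus other */
        unsigned long long tot_ct;  /* cumulated connection time (ms) */
        unsigned long long tot_rt;  /* cumulated response time (ms) */
    };
    /* e.g. avg_rt = tot_rt / valid_req;
     *      pct_valid = 100.0 * valid_req / tot_req; */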
A new idea came up for detecting the presence of a null byte in a word.
It saves several operations compared to the previous method and eliminates
the jumps (about 6 instructions, which can run 2-by-2 in parallel). This
optimisation alone improved the line counting speed by about 30%.
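The commit does not show the exact expression, but the classic branchless
test for a zero byte in a 32-bit word looks like this (a sketch, not
necessarily the code as committed):

    /* non-zero iff some byte of x is 0x00: subtracting 1 from each byte
     * borrows only where the byte was zero, and ~x masks out bytes whose
     * high bit was already set */
    static inline unsigned int has_zero_byte(unsigned int x)
    {
        return (x - 0x01010101U) & ~x & 0x80808080U;
    }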
All files referencing the previous ebtree code were changed to point
to the new one in the ebtree directory. A makefile variable (EBTREE_DIR)
is also available to use files from another directory.
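For example (the path is illustrative):

    make EBTREE_DIR=/opt/ebtree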
The ability to build the libebtree library temporarily remains disabled,
because it can have an impact on some existing toolchains and does not
appear worth it in the medium term, even if we add support for
multi-criteria stickiness for instance.