59 Commits

Author SHA1 Message Date
Willy Tarreau 03ca6054d0 CONTRIB: halog: fix signed/unsigned build warnings on counts and timestamps
Some variables were signed while they were compared to unsigned ones,
causing warnings to be issued when -Wextra is enabled.
2020-12-21 08:43:09 +01:00
Willy Tarreau f531dfff18 CONTRIB: halog: mark the has_zero* functions unused
These ones will depend on the use of memchr() or not, let's mark them unused
to avoid the warning reported in issue #1013.
2020-12-21 08:43:09 +01:00
Willy Tarreau 2df860cb13 CONTRIB: halog: fix build issue caused by %L printf format
%Ld isn't standard, %lld is more portable. In addition, the format
should be %llu since the printed values are unsigned. This should
address issue #1013.
2020-12-21 08:43:09 +01:00
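
To illustrate the format fix described in 2df860cb13, here is a minimal sketch that prints an unsigned 64-bit counter with the standard %llu conversion. The helper name format_count is hypothetical and is not taken from halog.

```c
#include <stdio.h>

/* Hypothetical helper (not from halog): formats an unsigned counter.
 * "%llu" is the standard conversion for unsigned long long, whereas
 * "%Ld" is a non-standard spelling that not every libc accepts. */
static int format_count(char *buf, size_t size, unsigned long long count)
{
    return snprintf(buf, size, "%llu", count);
}
```

A caller would pass a buffer of at least 21 bytes to hold any 64-bit value plus the terminating NUL.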
Willy Tarreau fc80e30217 REORG: ebtree: clean up remains of the ebtree/ directory
The only leftovers were the unused compiler.h file and the LICENSE file
which is already mentioned in each and every ebtree file header.

A few build paths were updated in the contrib/ directory not to mention
this directory anymore, and all its occurrences were dropped from the
main makefile. From now on no other include path but include/ will be
needed anymore to build any file.
2020-06-11 09:31:11 +02:00
Willy Tarreau 8d2b777fe3 REORG: ebtree: move the include files from ebtree to include/import/
This is where other imported components are located. All files which
used to directly include ebtree were touched to update their include
path so that "import/" is now prefixed before the ebtree-related files.

The ebtree.h file was slightly adjusted to read compiler.h from the
common/ subdirectory (this is the only change).

A build issue was encountered when eb32sctree.h is loaded before
eb32tree.h because only the former checks for the latter before
defining type u32. This was addressed by adding the reverse ifdef
in eb32tree.h.

No further cleanup was done yet in order to keep changes minimal.
2020-06-11 09:31:11 +02:00
Willy Tarreau ff0e8a44a4 REORG: ebtree: move the C files from ebtree/ to src/
As part of the include files cleanup, we're going to kill the ebtree
directory. For this we need to host its C files in a different location
and src/ is the right one.
2020-06-11 09:31:11 +02:00
Aleksandar Lazic 6112f5ccd2 DOC/MINOR: halog: Add long help info for ic flag
Add missing long help text for the ic (ip count) flag
2020-05-18 09:30:43 +02:00
Joseph Herlant 42172bdc97 CLEANUP: fix a typo in a comment for the contrib/halog subsystem
Typo in comment, not visible by end-users.
2018-11-12 08:52:16 +01:00
Ryan O'Hara 8cb9993469 CONTRIB: halog: Fix compiler warnings in halog.c
There were several unused variables in halog.c that each caused a
compiler warning [-Wunused-but-set-variable]. This patch simply
removes the declaration of said variables and any instance where the
unused variable was assigned a value.
2017-12-20 09:36:58 +01:00
Aleksandar Lazic f2b5d75ae2 CONTRIB: halog: Add help text for -s switch in halog program
It was not documented. May be backported to older releases.
2017-12-07 19:27:47 +01:00
Ilya Shipitsin 4473a2e9aa BUG/MINOR: contrib/halog: fixing small memory leak
Issue was identified by cppcheck
2017-10-03 13:52:45 +02:00
Willy Tarreau c874653bb4 BUILD: don't use type "uint" which is not portable
Dmitry Sivachenko reported that "uint" doesn't build on FreeBSD 10.
On Linux it's defined in sys/types.h and indicated as "old". Just
get rid of the very few occurrences.
2014-05-28 23:05:07 +02:00
Willy Tarreau 9f66aa9cc4 CONTRIB: halog: avoid calling time/localtime/mktime for each line
The last commit provides time-based filtering. Unfortunately, it wastes
90% of the time calling the expensive time()/localtime()/mktime()
functions.

This patch does 3 things:
  - call time()/localtime() only once to initialize the correct
    struct timeinfo;

  - call mktime() only when the time has changed regardless of
    the current second;

  - manually add the current second to the cached result.

Doing just this is enough to multiply the parsing speed by 8.
2014-05-23 16:40:25 +02:00
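
The caching idea from 9f66aa9cc4 can be sketched as follows. This is only an illustration under assumptions: the struct and function names are hypothetical, the cache key is the minute, and the actual halog code may organise this differently. mktime() runs only when the date, hour or minute changes, and the second is added by hand.

```c
#include <time.h>

/* Hypothetical cache (names are not from halog): remembers the epoch
 * value of the last minute seen so that mktime() only runs when the
 * date, hour or minute changes. */
struct minute_cache {
    int year, mon, mday, hour, min; /* fields of the cached minute */
    time_t base;                    /* epoch of that minute at second 0 */
    int valid;                      /* non-zero once base is set */
    unsigned long mktime_calls;     /* for illustration only */
};

static time_t cached_epoch(struct minute_cache *c, const struct tm *t)
{
    if (!c->valid || c->min != t->tm_min || c->hour != t->tm_hour ||
        c->mday != t->tm_mday || c->mon != t->tm_mon ||
        c->year != t->tm_year) {
        struct tm tmp = *t;

        tmp.tm_sec = 0;
        tmp.tm_isdst = -1;       /* let the C library decide on DST */
        c->base = mktime(&tmp);  /* the expensive call */
        c->mktime_calls++;
        c->year = t->tm_year; c->mon = t->tm_mon; c->mday = t->tm_mday;
        c->hour = t->tm_hour; c->min = t->tm_min;
        c->valid = 1;
    }
    /* manually add the current second to the cached result */
    return c->base + t->tm_sec;
}
```

Since consecutive log lines usually share the same minute, most calls take the cheap path. The sketch does not handle a mktime() failure.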
Olivier Burgard e97b904801 CONTRIB: halog: Filter input lines by date and time through timestamp
I wanted to make a graph with average answer time in Nagios that takes only
the last 5 min of the log. Filtering the log before using halog was too
slow, so I added that filter to halog.

The patch attached to this mail is a proposal to add a new option:
-time [min][:max]

The values are min timestamp and/or max timestamp of the lines to be used
for stats. The date and time of the log lines between '[' and ']' are
converted to timestamp and compared to these values.

Here is an example of usage:
cat /var/log/haproxy.log | ./halog -srv -H -q -time $(date --date '-5 min' +%s)
2014-05-23 16:18:48 +02:00
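
A "[min][:max]" argument such as the one proposed in e97b904801 could be parsed roughly as shown below. This is a sketch under assumptions, not halog's option parser: the function names are hypothetical, a missing bound is treated as open on that side, and the range is assumed to be inclusive.

```c
#include <stdlib.h>
#include <limits.h>

/* Hypothetical parser for a "[min][:max]" argument; halog's real
 * option parsing may differ. A missing bound leaves the range open
 * on that side. Returns 0 on success, -1 on malformed input. */
static int parse_time_range(const char *arg, unsigned long *min,
                            unsigned long *max)
{
    char *end;

    *min = 0;
    *max = ULONG_MAX;

    if (*arg != ':') {
        *min = strtoul(arg, &end, 10);
        if (end == arg)
            return -1;           /* no digits where min was expected */
        arg = end;
    }
    if (*arg == ':') {
        arg++;
        if (*arg != '\0') {
            *max = strtoul(arg, &end, 10);
            if (end == arg)
                return -1;       /* no digits where max was expected */
            arg = end;
        }
    }
    return *arg == '\0' ? 0 : -1;
}

/* Assumption: a line is kept when its timestamp lies inside the
 * inclusive range. */
static int in_time_range(unsigned long ts, unsigned long min,
                         unsigned long max)
{
    return ts >= min && ts <= max;
}
```

With the usage line above, the argument contains only a min value, so every line newer than five minutes ago would pass the check.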
Willy Tarreau 7cf479cc09 MEDIUM: halog: add support for counting per source address (-ic)
This is the same as -uc except that instead of counting URLs, it
counts source addresses. The reported times are request times and
not response times.

The code is becoming very ugly: the url struct is being abused to
store an address, and there are no more bit fields available. The
code needs a major revamp.
2013-02-16 23:49:04 +01:00
Willy Tarreau a1629a59d1 BUG: halog: fix broken output limitation
Commit 667c905f introduced parameter -m to halog which limits the size
of the output. Unfortunately it is completely broken in that it doesn't
check whether the limit was previously set, and also prevents a
simple counting operation from returning anything if a limit is not set.

Note that the -gt and -pct outputs behave differently in the face of this
limit, since they count the valid output lines BEFORE actually producing
the data, so the limit really applies to valid input lines.
2012-11-13 20:48:15 +01:00
Willy Tarreau 667c905fe5 MINOR: halog: add a parameter to limit output line count
Sometimes it's useful to limit the output to a number of lines, for
example when output is already sorted (eg: 10 slowest URLs, ...). Now
we can use -m for this.
2012-10-10 16:49:28 +02:00
Willy Tarreau 4201df77df BUG/MINOR: halog: fix help message for -ut/-uto
Erroneous copy-paste suggesting wrong option.
2012-10-10 14:57:35 +02:00
Willy Tarreau 0a70688016 BUG/MINOR: halog: -ad/-ac report the correct number of output lines
There was a lines_out++ left from earlier code, causing each input
line to be counted as an output line.

This fix also affects 1.4 and should be backported.
2012-10-10 13:43:17 +02:00
Willy Tarreau 8a09b663a8 MINOR: halog: sort output by cookie code
It's sometimes useful to have the output sorted by cookie code to see
the ratios of NI vs VN for example. This is now possible with -cc.
2012-10-10 10:27:18 +02:00
Baptiste 61aaad06e8 CONTRIB: halog: sort URLs by avg bytes_read or total bytes_read
The patch attached to this mail brings the ability to sort URLs by
average bytes read and total bytes read in the HALog tool.
In most cases, bytes read is also the object size.
The purpose of this patch is to know which URLs consume the most
bandwidth, on average or in total.
It may be interesting as well to know the standard deviation (écart
type in French) for some counters (like bytes_read).

The results:
- Sorting by average bytes read per URL:
./halog -uba <~/tmp/haproxy.log | column -t | head
2246 lines in, 302 lines out, 194 parsing errors
18    0    5101     283    5101   283    126573  2278327  /lib/exe/js.php
1     0    1        1      1      1      106734  106734   /wp-admin/images/screenshots/theme-customizer.png
2     0    2        1      2      1      106511  213022   /wp-admin/css/wp-admin.css
1     0    1        1      1      1      96698   96698    /wp-admin/images/screenshots/captions-1.png
1     0    1        1      1      1      73165   73165    /wp-admin/images/screenshots/flex-header-1.png
4     0    0        0      0      0      64832   259328   /cuisine/wp-content/plugins/stats/open-flash-chart.swf
1     0    0        0      0      0      48647   48647    /wp-admin/images/screenshots/flex-header-3.png
1     0    0        0      0      0      44046   44046    /wp-admin/images/screenshots/captions-2.png
1     0    1        1      1      1      38830   38830    /wp-admin/images/screenshots/flex-header-2.png

- Sorting by total bytes read per URL:
./halog -ubt <~/tmp/haproxy.log | column -t | head
2246 lines in, 302 lines out, 194 parsing errors
18    0    5101     283    5101   283    126573  2278327  /lib/exe/js.php
60    0    14387    239    14387  239    10081   604865   /lib/exe/css.php
64    2    8820     137    8819   142    7742    495524   /doku.php
14    0    250      17     250    17     24045   336632   /wp-admin/load-scripts.php
71    0    6422     90     6422   90     4048    287419   /wp-admin/
4     0    0        0      0      0      64832   259328   /cuisine/wp-content/plugins/stats/open-flash-chart.swf
2     0    2        1      2      1      106511  213022   /wp-admin/css/wp-admin.css
31    3    5423     174    5040   180    6804    210931   /index
10    0    429      42     429    42     18009   180093   /cuisine/files/2011/10/tarte_figue_amande-e1318281546905-225x300.jpg
2012-09-09 08:44:01 +02:00
Willy Tarreau f8c95d2a25 OPTIM: halog: improve cold-cache behaviour when loading a file
Using posix_fadvise() it is possible to tell the system that we're
going to read a whole file at once. The kernel then doubles the
read-ahead size for this file. On Linux with an SSD, this has improved
cold-cache performance by around 20%. Hot-cache is not affected at all.
2012-06-12 09:16:56 +02:00
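
The hint described in f8c95d2a25 can be issued with a single call. The sketch below assumes the POSIX_FADV_SEQUENTIAL advice, which the commit message does not name explicitly, and the helper name advise_sequential is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose posix_fadvise() */
#endif
#include <fcntl.h>

/* Hint that fd will be read sequentially from start to end. An offset
 * and length of 0 cover the whole file. Returns 0 on success or an
 * error number on failure (posix_fadvise() does not set errno). */
static int advise_sequential(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}
```

The call is purely advisory, so a tool reading from a pipe can ignore a non-zero return and carry on.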
Willy Tarreau 419a598eae OPTIM: halog: make use of memchr() on platforms which provide a fast one
glibc-2.11 on x86_64 provides a machine-specific memchr() which is faster
than the generic C implementation by around 40%, so let's make it possible
to use it instead of the hand-coded version.
2012-06-12 08:52:22 +02:00
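
The choice between the libc memchr() and a hand-coded scan, as in 419a598eae, can be made at build time. In this sketch the macro USE_LIBC_MEMCHR and the function find_lf are hypothetical names, not the ones halog uses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical build switch, not the macro halog actually uses. */
#ifndef USE_LIBC_MEMCHR
#define USE_LIBC_MEMCHR 1
#endif

/* Returns a pointer to the first '\n' in buf[0..len), or NULL. */
static const char *find_lf(const char *buf, size_t len)
{
#if USE_LIBC_MEMCHR
    return memchr(buf, '\n', len);   /* libc may use a tuned version */
#else
    size_t i;

    for (i = 0; i < len; i++)        /* portable byte-by-byte fallback */
        if (buf[i] == '\n')
            return buf + i;
    return NULL;
#endif
}
```

Building with -DUSE_LIBC_MEMCHR=0 selects the fallback loop, which makes it easy to benchmark both variants on a given platform.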
Willy Tarreau 8ad4193100 CLEANUP: halog: make clean should also remove .o files
2012-06-12 07:59:16 +02:00
Willy Tarreau de5dc0509c MINOR: halog: use the more recent dual-mode fgets2 implementation
This version implements both 32 and 64 bit versions at once, it
avoids the need to have two separate output files. It also improves
efficiency on i386 platforms by adding a little bit of assembly where
gcc isn't efficient.
2012-06-09 11:22:27 +02:00
Willy Tarreau 615674cdec MINOR: halog: add some help on the command line
2012-01-23 08:17:59 +01:00
Willy Tarreau e1a908c369 OPTIM: halog: keep a fast path for the lines-count only
Using "halog -c" is still something quite common to perform on logs,
but unfortunately since the recently added controls, it was noticeably
slowed down due to the parsing of the accept date field.

Now we use a specific loop for the case where nothing is needed from
the input, and this sped up the line counting by 2.5x. A 2.4 GHz Xeon
now counts lines at a rate of 2 GB of logs per second.
2012-01-03 09:28:05 +01:00
Willy Tarreau 08911ff896 MINOR: halog: add support for matching queued requests
-Q outputs all requests which went through at least one queue.
-QS outputs all requests which went through a server queue.
2011-10-13 13:28:36 +02:00
Willy Tarreau 6ee71754e2 BUILD: halog: make halog build on solaris
Solaris' "rm" command does not support -v. Also, specify CC=gcc
because "cc" generally is not gcc there.
2011-09-16 15:03:37 +02:00
Willy Tarreau f9042060c9 [OPTIM] halog: add assembly version of the field lookup code
Gcc tries to be a bit too smart in these small loops and the result is
that on i386 we waste a lot of time there. By recoding these loops in
assembly, we save up to 23% total processing time on i386! The savings
on x86_64 are much lower, probably because there are more registers and
gcc has to do fewer tricks. However, those savings vary a lot between gcc
versions and even cause harm on some of them (eg: 4.4) because gcc does
not know how to optimize the code once inlined.

However, by recoding field_start() in C to try to match the assembly
code as much as possible, we can significantly reduce its execution
time without risking the negative impacts. Thus, the assembly version
is less interesting there but still worth being used on some compilers.
2011-09-10 12:39:30 +02:00
Willy Tarreau 31a02e9c5b [OPTIM] halog: make fgets parse more bytes by blocks
By adding a "landing area" at the end of the buffer, it becomes safe to
parse more bytes at once. On 32-bit this makes fgets run about 4% faster
but it does not save anything on 64-bit.
2011-09-10 10:46:39 +02:00
Willy Tarreau 96c148b0d2 [MINOR] halog: do not consider byte 0x8A as end of line
A bug in the algorithm used to find an LF in multiple bytes at once
made byte 0x80 trigger detection of byte 0x00, thus 0x8A matches byte
0x0A. In practice, this issue never happens since byte 0x8A won't be
displayed in logs (or it will be encoded). This could still possibly
happen in mixed logs.
2011-09-09 08:21:55 +02:00
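
The kind of false positive described in 96c148b0d2 can be reproduced with a word-at-a-time zero-byte test that ignores bit 7 of each byte. The functions below are an illustration of that class of bug and of an exact variant. They are not the code halog used, and all names are hypothetical.

```c
#include <stdint.h>

/* Illustrative only: a zero-byte test that ignores bit 7 of each byte,
 * so 0x80 is reported as if it were 0x00. */
static uint32_t has_zero_flawed(uint32_t x)
{
    return ~((x & 0x7F7F7F7FU) + 0x7F7F7F7FU) & 0x80808080U;
}

/* Exact variant: OR-ing x back in clears the flag for any byte that
 * has bit 7 set. */
static uint32_t has_zero_exact(uint32_t x)
{
    return ~(((x & 0x7F7F7F7FU) + 0x7F7F7F7FU) | x) & 0x80808080U;
}

/* XOR with 0x0A in every byte turns each LF into 0x00, so the
 * zero-byte test becomes an LF test. */
static uint32_t has_lf_flawed(uint32_t x)
{
    return has_zero_flawed(x ^ 0x0A0A0A0AU);
}

static uint32_t has_lf_exact(uint32_t x)
{
    return has_zero_exact(x ^ 0x0A0A0A0AU);
}
```

With the flawed test, a word containing 0x8A becomes 0x80 after the XOR and is reported as containing an LF, while the exact test rejects it.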
Willy Tarreau 61a40c7402 [MINOR] halog: support backslash-escaped quotes
Some syslog servers escape quotes, which makes the resulting logs unusable
for URL processing since the parser looks for the first field beginning
with a quote. It now also supports fields starting with a backslash and a
quote in order to address this. No performance impact was measured.
2011-09-06 08:11:27 +02:00
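
The check described in 61a40c7402 amounts to accepting two possible field openings. A minimal sketch, with a hypothetical helper name and no claim to match halog's parser:

```c
#include <stddef.h>

/* Hypothetical helper (not halog's parser): returns a pointer to the
 * first character after the opening quote when the field starts with
 * either '"' or a backslash followed by '"', and NULL otherwise. */
static const char *skip_opening_quote(const char *field)
{
    if (field[0] == '"')
        return field + 1;
    if (field[0] == '\\' && field[1] == '"')
        return field + 2;
    return NULL;
}
```

Reading field[1] is safe because it is only evaluated when field[0] is a backslash, so the string has at least one more byte (possibly the terminating NUL).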
Willy Tarreau d3007ffa6f [MINOR] halog: add -hs/-HS to filter by HTTP status code range
The code was merged with the error code checking which is very similar and
which shares the same information. The new test adds about 1% slowdown to
error checking but makes it more reliable when facing wrongly formatted
status codes.
2011-09-05 02:09:24 +02:00
Hervé COMMOWICK 927cdddf9c [MINOR] halog: add support for termination code matching (-tcn/-TCN)
It is now possible to filter by termination code with -tcn <termcode>, to be
able to track one kind of error, for example after counting it with -tc.
Using -TCN <termcode> gives you the opposite.
2011-08-10 18:04:50 +02:00
Willy Tarreau 14389e7036 [OPTIM] halog: remove support for tab delimiters in input data
Haproxy does not use tabs when sending logs, and checking for them
wastes no less than 4% of CPU cycles. Better get rid of these tests.
2011-07-11 06:48:04 +02:00
Willy Tarreau a2b39fb5c5 [OPTIM] halog: remove many 'if' by using a function pointer for the filters
There were too many filters, and we were losing time in all the "if" statements.
By moving all the filters to independent functions, we made the code cleaner
and slightly faster (3%).

One minor bug was found: the -tc and -st options did not report the number
of output lines, but always zero.
2011-07-11 06:48:04 +02:00
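
The restructuring described in a2b39fb5c5 can be pictured as selecting one filter function before the loop and calling it through a pointer. The filters and names below are hypothetical stand-ins. Real halog filters do much more work per line.

```c
#include <stddef.h>

typedef int (*line_filter)(const char *line);

/* Two hypothetical filters; real halog filters are more involved. */
static int filter_count_all(const char *line) { (void)line; return 1; }
static int filter_non_empty(const char *line) { return line[0] != '\0'; }

/* The filter is chosen once, outside the loop ... */
static line_filter select_filter(int skip_empty)
{
    return skip_empty ? filter_non_empty : filter_count_all;
}

/* ... so the loop makes one indirect call per line instead of
 * re-evaluating every option. */
static size_t count_matching(const char *const *lines, size_t n,
                             line_filter f)
{
    size_t i, out = 0;

    for (i = 0; i < n; i++)
        out += (size_t)(f(lines[i]) != 0);
    return out;
}
```

Adding a new filter then means writing one function and extending the selection step, without touching the main loop.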
Willy Tarreau 26deaf51d9 [OPTIM] halog: check once for correct line format and reuse the pointer
Almost all filters first check the line format, which takes a lot of code
and requires parsing back and forth. By centralizing this test, we can
save about 15-20 more percent of performance for all filters.

Also, the test was wrong: it was checking that the source IP address was
starting with a digit, which is not always true with local IPv6 addresses.
Instead, we now check that the next field (accept field) starts with an
opening bracket and is followed by a digit between 0 and 3 (day of the
month). Doing this has contributed a 2% speedup because all other field
calculations were relative to a closer field.
2011-07-11 06:48:04 +02:00
Willy Tarreau 758a6ea46c [OPTIM] halog: cache some common fields positions
Since many fields are relative and some are used a lot, try to cache them
the first time they're used in order to avoid skipping them twice. The
status counts with HTTP pre-check enabled have sped up by 40%.
2011-07-11 06:48:03 +02:00
Willy Tarreau df6f0d1e49 [MINOR] halog: gain back performance before SKIP_CHAR fix
The SKIP_CHAR fix caused a measurable performance drop. Since we can
consider all chars below 0x20 as delimiters, we can avoid a cache lookup
which requires a char to pointer conversion.
2011-07-11 06:48:03 +02:00
Willy Tarreau 70c428f7c6 [MINOR] halog: add support for HTTP log matching (-H)
Now it's possible to restrict analysis to HTTP-looking logs when passing -H.
-H -v gives the opposite (most likely TCP logs).
2011-07-11 06:48:03 +02:00
Willy Tarreau c82570edec [MINOR] halog: make SKIP_CHAR stop on field delimiters
The SKIP_CHAR() macro did not consider field delimiters, allowing the timer parser
to search for timers in the wrong places when fed with TCP logs.
2011-07-11 06:48:02 +02:00
Willy Tarreau 812e7a73b2 [BUG] halog: correctly handle truncated last line
If the last line is truncated (eg: truncated file), then halog would loop on
it forever.
2011-07-11 06:48:02 +02:00
Willy Tarreau 24bcb4f2ff [CONTRIB] halog: minor speed improvement in timer parser
The timer parser looks for the next slash after the last timer, which is
very far away. Those 4 occurrences have been fixed to match the way it's
done in URL sorting, which is faster. Average speed gain is 5-6% on -srv
and -pct.
(cherry picked from commit 3555671c93695f48c02ef05c8bb228523f17ca20)
2010-10-30 19:04:37 +02:00
Willy Tarreau abe45b6bb3 [CONTRIB] halog: report per-url counts, errors and times
Using -u{,c,e,t,a,to,ao} it is possible to get per-URL statistics, sorted by
URL, request count, error count, total time, avg time, total time on OK requests,
avg time on OK requests.

Since it has to parse URLs and store a number of fields, it's quite a bit slower
than other methods, but still correct for production usage (typically 800000
lines or 270 MB per second on a 2 GHz system).

Results are sorted in reverse order so that it's easy to catch them by piping
the output to the "head" command.
(cherry picked from commit 15ce7f56d15f839ce824279b84ffe14c58e41fda)
2010-10-30 19:04:37 +02:00
Willy Tarreau 5417081c79 [MINOR] halog: skip non-traffic logs for -st and -tc
Those were reporting stupid results in presence of administrative logs.
2010-09-13 22:50:49 +02:00
Willy Tarreau d8fc1103a5 [MINOR] halog: add '-tc' to sort by termination codes
This output lists all encountered termination codes by number of
occurrences.
2010-09-12 17:56:16 +02:00
Willy Tarreau d220106092 [CONTRIB] halog: report per-server status codes, errors and response times
It's sometimes very useful to be able to monitor a production status in real
time by comparing servers' behaviours. Now halog is able to do this when called
with "-srv". It reports various fields for each server found in a log, including
statuses, total reqs, valid reqs, percent of valid reqs, average connection time,
average response time.
2010-06-04 14:37:01 +02:00
Willy Tarreau d2c142c7ee [OPTIM] halog: speed up fgets2-64 by about 10%
This version uses more 64-bit lookups and two 32-bit lookups
to converge faster. This saves about 10% performance.
2010-05-05 12:22:08 +02:00
Willy Tarreau 2651ac3302 [OPTIM] halog: minor speedup by using unlikely()
By moving the filter-specific code out of the loop, we can slightly
speed it up (3%).
2010-05-05 12:20:19 +02:00
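
The unlikely() hint named in 2651ac3302 is commonly built on the GCC/Clang builtin __builtin_expect. The sketch below assumes such a compiler and falls back to a plain expression elsewhere. The loop and the function name count_lf are hypothetical and only show where a rare branch would be marked.

```c
#include <stddef.h>

/* Assumes GCC or Clang; expands to the bare condition elsewhere. */
#if defined(__GNUC__)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define unlikely(x) (x)
#endif

/* Counts LF bytes in buf[0..len) and, separately, embedded NUL bytes,
 * which are assumed to be rare in log data. */
static size_t count_lf(const char *buf, size_t len, size_t *nul_bytes)
{
    size_t i, lines = 0;

    *nul_bytes = 0;
    for (i = 0; i < len; i++) {
        if (unlikely(buf[i] == '\0')) {  /* rare: keep it off the hot path */
            (*nul_bytes)++;
            continue;
        }
        lines += (buf[i] == '\n');
    }
    return lines;
}
```

The hint does not change the result, only the code layout the compiler is encouraged to produce.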