Commit Graph

37 Commits

Author SHA1 Message Date
David Carlier
7adf8f35df OPTIM: regex: PCRE2 use JIT match when JIT optimisation occured.
When a regex had been succesfully compiled by the JIT pass, it is better
 to use the related match, thanksfully having same signature, for better
 performance.

Signed-off-by: David Carlier <devnexen@gmail.com>
2020-08-14 07:53:40 +02:00
Ilya Shipitsin
46a030cdda CLEANUP: assorted typo fixes in the code and comments
This is 11th iteration of typo fixes
2020-07-06 14:34:32 +02:00
Willy Tarreau
f278eec37a BUILD: tree-wide: cast arguments to tolower/toupper to unsigned char
NetBSD apparently uses macros for tolower/toupper and complains about
the use of char for array subscripts. Let's properly cast all of them
to unsigned char where they are used.

This is needed to fix issue #729.
2020-07-05 21:50:02 +02:00
Willy Tarreau
36979d9ad5 REORG: include: move the error reporting functions to from log.h to errors.h
Most of the files dealing with error reports have to include log.h in order
to access ha_alert(), ha_warning() etc. But while these functions don't
depend on anything, log.h depends on a lot of stuff because it deals with
log-formats and samples. As a result it's impossible not to embark long
dependencies when using ha_warning() or qfprintf().

This patch moves these low-level functions to errors.h, which already
defines the error codes used at the same places. About half of the users
of log.h could be adjusted, sometimes revealing other issues such as
missing tools.h. Interestingly the total preprocessed size shrunk by
4%.
2020-06-11 10:18:59 +02:00
Willy Tarreau
dfd3de8826 REORG: include: move stream.h to haproxy/stream{,-t}.h
This one was not easy because it was embarking many includes with it,
which other files would automatically find. At least global.h, arg.h
and tools.h were identified. 93 total locations were identified, 8
additional includes had to be added.

In the rare files where it was possible to finalize the sorting of
includes by adjusting only one or two extra lines, it was done. But
all files would need to be rechecked and cleaned up now.

It was the last set of files in types/ and proto/ and these directories
must not be reused anymore.
2020-06-11 10:18:58 +02:00
Willy Tarreau
aeed4a85d6 REORG: include: move log.h to haproxy/log{,-t}.h
The current state of the logging is a real mess. The main problem is
that almost all files include log.h just in order to have access to
the alert/warning functions like ha_alert() etc, and don't care about
logs. But log.h also deals with real logging as well as log-format and
depends on stream.h and various other things. As such it forces a few
heavy files like stream.h to be loaded early and to hide missing
dependencies depending where it's loaded. Among the missing ones is
syslog.h which was often automatically included resulting in no less
than 3 users missing it.

Among 76 users, only 5 could be removed, and probably 70 don't need the
full set of dependencies.

A good approach would consist in splitting that file in 3 parts:
  - one for error output ("errors" ?).
  - one for log_format processing
  - and one for actual logging.
2020-06-11 10:18:58 +02:00
Willy Tarreau
f268ee8795 REORG: include: split global.h into haproxy/global{,-t}.h
global.h was one of the messiest files, it has accumulated tons of
implicit dependencies and declares many globals that make almost all
other file include it. It managed to silence a dependency loop between
server.h and proxy.h by being well placed to pre-define the required
structs, forcing struct proxy and struct server to be forward-declared
in a significant number of files.

It was split in to, one which is the global struct definition and the
few macros and flags, and the rest containing the functions prototypes.

The UNIX_MAX_PATH definition was moved to compat.h.
2020-06-11 10:18:58 +02:00
Willy Tarreau
48fbcae07c REORG: tools: split common/standard.h into haproxy/tools{,-t}.h
And also rename standard.c to tools.c. The original split between
tools.h and standard.h dates from version 1.3-dev and was mostly an
accident. This patch moves the files back to what they were expected
to be, and takes care of not changing anything else. However this
time tools.h was split between functions and types, because it contains
a small number of commonly used macros and structures (e.g. name_desc)
which in turn cause the massive list of includes of tools.h to conflict
with the callers.

They remain the ugliest files of the whole project and definitely need
to be cleaned and split apart. A few types are defined there only for
functions provided there, and some parts are even OS-specific and should
move somewhere else, such as the symbol resolution code.
2020-06-11 10:18:57 +02:00
Willy Tarreau
7cd8b6e3a4 REORG: include: split common/regex.h into haproxy/regex{,-t}.h
Regex are essentially included for myregex_t but it turns out that
several of the C files didn't include it directly, relying on the
one included by their own .h. This has been cleanly addressed so
that only the type is included by H files which need it, and adding
the missing includes for the other ones.
2020-06-11 10:18:57 +02:00
Willy Tarreau
4c7e4b7738 REORG: include: update all files to use haproxy/api.h or api-t.h if needed
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:

  - common/config.h
  - common/compat.h
  - common/compiler.h
  - common/defaults.h
  - common/initcall.h
  - common/tools.h

The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.

In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.

No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
2020-06-11 10:18:42 +02:00
Willy Tarreau
39bd740d00 CLEANUP: regex: remove outdated support for regex actions
The support for reqrep and friends was removed in 2.1 but the
chain_regex() function and the "action" field in the regex struct
was still there. This patch removes them.

One point worth mentioning though. There is a check_replace_string()
function whose purpose was to validate the replacement strings passed
to reqrep. It should also be used for other replacement regex, but is
never called. Callers of exp_replace() should be checked and a call to
this function should be added to detect the error early.
2020-06-02 17:17:13 +02:00
Dragan Dosen
2674303912 MEDIUM: regex: modify regex_comp() to atomically allocate/free the my_regex struct
Now we atomically allocate the my_regex struct within function
regex_comp() and compile the regex or free both in case of failure. The
pointer to the allocated my_regex struct is returned directly. The
my_regex* argument to regex_comp() is removed.

Function regex_free() was modified so that it systematically frees the
my_regex entry. The function does nothing when called with a NULL as
argument (like free()). It will avoid existing risk of not properly
freeing the initialized area.

Other structures are also updated in order to be compatible (the ones
related to Lua and action rules).
2019-05-07 06:58:15 +02:00
Willy Tarreau
8071338c78 MINOR: initcall: apply initcall to all register_build_opts() calls
Most register_build_opts() calls use static strings. These ones were
replaced with a trivial REGISTER_BUILD_OPTS() statement adding the string
and its call to the STG_REGISTER section. A dedicated section could be
made for this if needed, but there are very few such calls for this to
be worth it. The calls made with computed strings however, like those
which retrieve OpenSSL's version or zlib's version, were moved to a
dedicated function to guarantee they are called late in the process.
For example, the SSL call probably requires that SSL_library_init()
has been called first.
2018-11-26 19:50:32 +01:00
Joseph Herlant
eda75484a8 CLEANUP: Fix typos in the regex subsystem
Fix typos in the code comment of the regex subsystem.
2018-11-18 22:26:42 +01:00
Christopher Faulet
767a84bcc0 CLEANUP: log: Rename Alert/Warning in ha_alert/ha_warning 2017-11-24 17:19:12 +01:00
Emeric Brun
272e252e61 MINOR: threads/regex: Change Regex trash buffer into a thread local variable 2017-10-31 13:58:31 +01:00
David Carlier
f2592b29f1 MEDIUM: regex: pcre2 support
this adds a support of the newest pcre2 library,
more secure than its older sibling in a cost of a
more complex API.
It works pretty similarly to pcre's part to keep
the overall change smooth,  except :

- we define the string class supported at compile time.
- after matching the ovec data is properly sized, althought
we do not take advantage of it here.
- the lack of jit support is treated less 'dramatically'
as pcre2_jit_compile in this case is 'no-op'.
2016-12-28 12:51:51 +01:00
Willy Tarreau
7a9ac6dac6 CLEANUP: regex: use the build options list to report the regex type
This removes 3 #ifdef from haproxy.c.
2016-12-21 21:30:54 +01:00
Vincent Bernat
02779b6263 CLEANUP: uniformize last argument of malloc/calloc
Instead of repeating the type of the LHS argument (sizeof(struct ...))
in calls to malloc/calloc, we directly use the pointer
name (sizeof(*...)). The following Coccinelle patch was used:

@@
type T;
T *x;
@@

  x = malloc(
- sizeof(T)
+ sizeof(*x)
  )

@@
type T;
T *x;
@@

  x = calloc(1,
- sizeof(T)
+ sizeof(*x)
  )

When the LHS is not just a variable name, no change is made. Moreover,
the following patch was used to ensure that "1" is consistently used as
a first argument of calloc, not the last one:

@@
@@

  calloc(
+ 1,
  ...
- ,1
  )
2016-04-03 14:17:42 +02:00
Willy Tarreau
15a53a4384 MEDIUM: regex: add support for passing regex flags to regex_exec_match()
This function (and its sister regex_exec_match2()) abstract the regex
execution but make it impossible to pass flags to the regex engine.
Currently we don't use them but we'll need to support REG_NOTBOL soon
(to indicate that we're not at the beginning of a line). So let's add
support for this flag and update the API accordingly.
2015-01-22 14:24:53 +01:00
Christian Ruppert
de898712a0 MEDIUM: regex: Use pcre_study always when PCRE is used, regardless of JIT
pcre_study() has been around long before JIT has been added. It also seems to
affect the performance in some cases (positive).

Below I've attached some test restults. The test is based on
http://sljit.sourceforge.net/regex_perf.html (see bottom). It has been modified
to just test pcre_study vs. no pcre_study. Note: This test does not try to
match specific header it's instead run over a larger text with more and less
complex patterns to make the differences more clear.

% ./runtest
'mark.txt' loaded. (Length: 19665221 bytes)
-----------------
Regex: 'Twain'
[pcre-nostudy] time:    14 ms (2388 matches)
[pcre-study] time:    21 ms (2388 matches)
-----------------
Regex: '^Twain'
[pcre-nostudy] time:   109 ms (100 matches)
[pcre-study] time:   109 ms (100 matches)
-----------------
Regex: 'Twain$'
[pcre-nostudy] time:    14 ms (127 matches)
[pcre-study] time:    16 ms (127 matches)
-----------------
Regex: 'Huck[a-zA-Z]+|Finn[a-zA-Z]+'
[pcre-nostudy] time:   695 ms (83 matches)
[pcre-study] time:    26 ms (83 matches)
-----------------
Regex: 'a[^x]{20}b'
[pcre-nostudy] time:    90 ms (12495 matches)
[pcre-study] time:    91 ms (12495 matches)
-----------------
Regex: 'Tom|Sawyer|Huckleberry|Finn'
[pcre-nostudy] time:  1236 ms (3015 matches)
[pcre-study] time:    34 ms (3015 matches)
-----------------
Regex: '.{0,3}(Tom|Sawyer|Huckleberry|Finn)'
[pcre-nostudy] time:  5696 ms (3015 matches)
[pcre-study] time:  5655 ms (3015 matches)
-----------------
Regex: '[a-zA-Z]+ing'
[pcre-nostudy] time:  1290 ms (95863 matches)
[pcre-study] time:  1167 ms (95863 matches)
-----------------
Regex: '^[a-zA-Z]{0,4}ing[^a-zA-Z]'
[pcre-nostudy] time:   136 ms (4507 matches)
[pcre-study] time:   134 ms (4507 matches)
-----------------
Regex: '[a-zA-Z]+ing$'
[pcre-nostudy] time:  1334 ms (5360 matches)
[pcre-study] time:  1214 ms (5360 matches)
-----------------
Regex: '^[a-zA-Z ]{5,}$'
[pcre-nostudy] time:   198 ms (26236 matches)
[pcre-study] time:   197 ms (26236 matches)
-----------------
Regex: '^.{16,20}$'
[pcre-nostudy] time:   173 ms (4902 matches)
[pcre-study] time:   175 ms (4902 matches)
-----------------
Regex: '([a-f](.[d-m].){0,2}[h-n]){2}'
[pcre-nostudy] time:  1242 ms (68621 matches)
[pcre-study] time:   690 ms (68621 matches)
-----------------
Regex: '([A-Za-z]awyer|[A-Za-z]inn)[^a-zA-Z]'
[pcre-nostudy] time:  1215 ms (675 matches)
[pcre-study] time:   952 ms (675 matches)
-----------------
Regex: '"[^"]{0,30}[?!\.]"'
[pcre-nostudy] time:    27 ms (5972 matches)
[pcre-study] time:    28 ms (5972 matches)
-----------------
Regex: 'Tom.{10,25}river|river.{10,25}Tom'
[pcre-nostudy] time:   705 ms (2 matches)
[pcre-study] time:    68 ms (2 matches)

In some cases it's more or less the same but when it's faster than by a huge margin.
It always depends on the pattern, the string(s) to match against etc.

Signed-off-by: Christian Ruppert <c.ruppert@babiel.com>
2014-11-18 13:26:18 +01:00
Christian Ruppert
955f4613cb BUG/MEDIUM: regex: fix pcre_study error handling
pcre_study() may return NULL even though it succeeded. In this case error is
NULL otherwise error is not NULL. Also see man 3 pcre_study.

Previously a ACL pattern of e.g. ".*" would cause error because pcre_study did
not found anything to speed up matching and returned regex->extra = NULL and
error = NULL which in this case was a false-positive. That happend only when
PCRE_JIT was enabled for HAProxy but libpcre has been built without JIT.

Signed-off-by: Christian Ruppert <c.ruppert@babiel.com>

[wt: this needs to be backported to 1.5 as well]
2014-10-29 17:44:31 +01:00
Thierry FOURNIER
26202760a4 MINOR: regex: Use native PCRE API.
The pcreposix layer (in the pcre projetc) execute strlen to find
thlength of the string. When we are using the function "regex_exex*2",
the length is used to add a final \0, when pcreposix is executed a
strlen is executed to compute the length.

If we are using a native PCRE api, the length is provided as an
argument, and these operations disappear.

This is useful because PCRE regex are more used than POSIC regex.
2014-06-18 15:14:00 +02:00
Thierry FOURNIER
09af0d6d43 MEDIUM: regex: replace all standard regex function by own functions
This patch remove all references of standard regex in haproxy. The last
remaining references are only in the regex.[ch] files.

In the file src/checks.c, the original function uses a "pmatch" array.
In fact this array is unused. This patch remove it.
2014-06-18 15:07:57 +02:00
Thierry FOURNIER
b8f980cc19 MINOR: regex: Create JIT compatible function that return match strings
This patchs rename the "regex_exec" to "regex_exec2". It add a new
"regex_exec", "regex_exec_match" and "regex_exec_match2" function. This
function can match regex and return array containing matching parts.
Otherwise, this function use the compiled method (JIT or PCRE or POSIX).

JIT require a subject with length. PCREPOSIX and native POSIX regex
require a null terminted subject. The regex_exec* function are splited
in two version. The first version take a null terminated string, but it
execute strlen() on the subject if it is compiled with JIT. The second
version (terminated by "2") take the subject and the length. This
version adds a null character in the subject if it is compiled with
PCREPOSIX or native POSIX functions.

The documentation of posix regex and pcreposix says that the function
returns 0 if the string matche otherwise it returns REG_NOMATCH. The
REG_NOMATCH macro take the value 1 with posix regex and the value 17
with the pcreposix. The documentaion of the native pcre API (used with
JIT) returns a negative number if no match, otherwise, it returns 0 or a
positive number.

This patch fix also the return codes of the regex_exec* functions. Now,
these function returns true if the string match, otherwise it returns
false.
2014-06-18 15:07:50 +02:00
Willy Tarreau
c874653bb4 BUILD: don't use type "uint" which is not portable
Dmitry Sivachenko reported that "uint" doesn't build on FreeBSD 10.
On Linux it's defined in sys/types.h and indicated as "old". Just
get rid of the very few occurrences.
2014-05-28 23:05:07 +02:00
Sasha Pachev
c600204ddf BUG/MEDIUM: regex: fix risk of buffer overrun in exp_replace()
Currently exp_replace() (which is used in reqrep/reqirep) is
vulnerable to a buffer overrun. I have been able to reproduce it using
the attached configuration file and issuing the following command:

  wget -O - -S -q http://localhost:8000/`perl -e 'print "a"x4000'`/cookie.php

Str was being checked only in in while (str) and it was possible to
read past that when more than one character was being accessed in the
loop.

WT:
   Note that this bug is only marked MEDIUM because configurations
   capable of triggering this bug are very unlikely to exist at all due
   to the fact that most rewrites consist in static string additions
   that largely fit into the reserved area (8kB by default).

   This fix should also be backported to 1.4 and possibly even 1.3 since
   it seems to have been present since 1.1 or so.

Config:
-------

  global
        maxconn         500
        stats socket /tmp/haproxy.sock mode 600

  defaults
        timeout client      1000
        timeout connect      5000
        timeout server      5000
        retries         1
        option redispatch

  listen stats
        bind :8080
        mode http
        stats enable
        stats uri /stats
        stats show-legends

  listen  tcp_1
        bind :8000
        mode            http
        maxconn 400
        balance roundrobin
        reqrep ^([^\ :]*)\ /(.*)/(.*)\.php(.*) \1\ /\3.php?arg=\2\2\2\2\2\2\2\2\2\2\2\2\2\4
        server  srv1 127.0.0.1:9000 check port 9000 inter 1000 fall 1
        server  srv2 127.0.0.1:9001 check port 9001 inter 1000 fall 1
2014-05-27 14:36:06 +02:00
Thierry FOURNIER
0b6d15fdc8 MINOR: regex: The pointer regstr in the struc regex is no longer used.
The pointer <regstr> is only used to compare and identify the original
regex string with the patterns. Now the patterns have a reference map
containing this original string. It is useless to store this value two
times.
2014-03-17 18:06:08 +01:00
Thierry FOURNIER
39e258fcee MINOR: regex: Copy the original regex expression into string.
This is useful for the debug or for search regex in maps.
2013-12-12 15:43:34 +01:00
Thierry FOURNIER
799c042daa MINOR: regex: Change the struct containing regex
This change permits to remove the typedef. The original regex structs
are set in haproxy's struct.
2013-12-12 15:42:58 +01:00
Thierry FOURNIER
ed5a4aefae CLEANUP: regex: Create regex_comp function that compiles regex using compilation options
The current file "regex.h" define an abstraction for the regex. It
provides the same struct name and the same "regexec" function for the
3 regex types supported: standard libc, basic pcre and jit pcre.

The regex compilation function is not provided by this file. If the
developper wants to use regex, he must write regex compilation code
containing "#define *JIT*".

This patch provides a unique regex compilation function according to
the compilation options.

In addition, the "regex.h" file checks the presence of the "#define
PCRE_CONFIG_JIT" when "USE_PCRE_JIT" is enabled. If this flag is not
present, the pcre lib doesn't support JIT and "#error" is emitted.
2013-10-14 14:42:50 +02:00
Willy Tarreau
f4f04125d4 [MINOR] prepare req_*/rsp_* to receive a condition
It will be very handy to be able to pass conditions to req_* and rsp_*.
For now, we just add the pointer to the condition in the affected
structs.
2010-01-28 18:10:50 +01:00
Willy Tarreau
8f8e645066 [CLEANUP] shut warnings 'is*' macros from ctype.h on solaris
Solaris visibly uses an array for is*, which returns warnings
about the use of signed chars as indexes. Good opportunity to
put casts everywhere.
2007-06-17 21:51:38 +02:00
Willy Tarreau
b17916e89b [CLEANUP] add a few "const char *" where appropriate
As suggested by Markus Elfring, a few "const char *" have replaced
some "char *" declarations where a function is not expected to
modify a value. It does not change the code but it helps detecting
coding errors.
2006-10-15 15:17:57 +02:00
Willy Tarreau
e3ba5f0aaa [CLEANUP] included common/version.h everywhere 2006-06-29 18:54:54 +02:00
Willy Tarreau
2dd0d4799e [CLEANUP] renamed include/haproxy to include/common 2006-06-29 17:53:05 +02:00
Willy Tarreau
baaee00406 [BIGMOVE] exploded the monolithic haproxy.c file into multiple files.
The files are now stored under :
  - include/haproxy for the generic includes
  - include/types.h for the structures needed within prototypes
  - include/proto.h for function prototypes and inline functions
  - src/*.c for the C files

Most include files are now covered by LGPL. A last move still needs
to be done to put inline functions under GPL and not LGPL.

Version has been set to 1.3.0 in the code but some control still
needs to be done before releasing.
2006-06-26 02:48:02 +02:00