mirror of
http://git.haproxy.org/git/haproxy.git/
synced 2024-12-25 22:22:11 +00:00
DOC: better document the config file format and escaping/quoting rules
It's always a pain to figure how to proceed when special characters need to be embedded inside arguments of an expression. Let's document the configuration file format and how unquoting/unescaping works at each level (top level and argument level) so that everyone hopefully finds suitable reminders or examples for complex cases. This is related to github issue #200 and addresses issues #712 and #966.
This commit is contained in:
parent
4f7308335e
commit
6f1129d14d
@@ -404,28 +404,137 @@ details.
|
||||
HAProxy's configuration process involves 3 major sources of parameters :
|
||||
|
||||
- the arguments from the command-line, which always take precedence
|
||||
- the "global" section, which sets process-wide parameters
|
||||
- the proxies sections which can take form of "defaults", "listen",
|
||||
"frontend" and "backend".
|
||||
- the configuration file(s), whose format is described here
|
||||
- the running process' environment, in case some environment variables are
|
||||
explicitly referenced
|
||||
|
||||
The configuration file syntax consists in lines beginning with a keyword
|
||||
referenced in this manual, optionally followed by one or several parameters
|
||||
delimited by spaces.
|
||||
The configuration file follows a fairly simple hierarchical format which obeys
|
||||
a few basic rules:
|
||||
|
||||
1. a configuration file is an ordered sequence of statements
|
||||
|
||||
2. a statement is a single non-empty line before any unprotected "#" (hash)
|
||||
|
||||
3. a line is a series of tokens or "words" delimited by unprotected spaces or
|
||||
tab characters
|
||||
|
||||
4. the first word or sequence of words of a line is one of the keywords or
|
||||
keyword sequences listed in this document
|
||||
|
||||
5. all other words are all arguments of the first one, some being well-known
|
||||
keywords listed in this document, others being values, references to other
|
||||
parts of the configuration, or expressions
|
||||
|
||||
6. certain keywords delimit a section inside which only a subset of keywords
|
||||
are supported
|
||||
|
||||
7. a section ends at the end of a file or on a special keyword starting a new
|
||||
section
|
||||
|
||||
This is all that is needed to know to write a simple but reliable configuration
|
||||
generator, but this is not enough to reliably parse any configuration nor to
|
||||
figure out how to deal with certain corner cases.
|
||||
|
||||
First, there are a few consequences of the rules above. Rules 6 and 7 imply that
|
||||
the keywords used to define a new section are valid everywhere and cannot have
|
||||
a different meaning in a specific section. These keywords are always a single
|
||||
word (as opposed to a sequence of words), and traditionally the section that
|
||||
follows them is designated using the same name. For example when speaking about
|
||||
the "global section", it designates the section of configuration that follows
|
||||
the "global" keyword. This convention is used a lot in error messages to help locate
|
||||
the parts that need to be addressed.
|
||||
|
||||
A number of sections create an internal object or configuration space, which
|
||||
needs to be distinguished from other ones. In this case they will take an
|
||||
extra word which will set the name of this particular section. For some of them
|
||||
the section name is mandatory. For example "frontend foo" will create a new
|
||||
section of type "frontend" named "foo". Usually a name is specific to its
|
||||
section and two sections of different types may use the same name, but this is
|
||||
not recommended as it tends to complicate configuration management.
|
||||
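As a small illustration of section naming, the sketch below declares a
frontend and a backend that share one name. The name "www", the port and the
server address are placeholders chosen for this example and do not come from
this document:

```haproxy
# two sections of different types using the same name "www"
frontend www
    bind :8080
    default_backend www

backend www
    server app1 192.0.2.10:8080
```

This is accepted because the two sections are of different types, but distinct
names (for example "fe_www" and "be_www") are usually easier to manage.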
|
||||
A direct consequence of rule 7 is that when multiple files are read at once,
|
||||
each of them must start with a new section, and the end of each file will end
|
||||
a section. A file cannot contain sub-sections nor end an existing section and
|
||||
start a new one.
|
||||
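For instance, a configuration may be split so that each file opens its own
sections. The file names below are hypothetical, and the two files are assumed
to be passed in that order using repeated "-f" options on the command line:

```haproxy
# --- first file (hypothetical name: 00-global.cfg) ---
global
    daemon

# --- second file (hypothetical name: 10-web.cfg) ---
# it starts with its own section keyword; it cannot continue the "global"
# section opened in the previous file
frontend foo
    bind :8080
```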
|
||||
Rule 1 mentioned that ordering matters. Indeed, some keywords create directives
|
||||
that can be repeated multiple times to create ordered sequences of rules to be
|
||||
applied in a certain order. For example "tcp-request" can be used to alternate
|
||||
"accept" and "reject" rules on varying criteria. As such, a configuration file
|
||||
processor must always preserve a section's ordering when editing a file. The
|
||||
ordering of sections usually does not matter except for the global section
|
||||
which must be placed before other sections, but it may be repeated if needed.
|
||||
In addition, some automatic identifiers may automatically be assigned to some
|
||||
of the created objects (e.g. proxies), and by reordering sections, their
|
||||
identifiers will change. These identifiers appear in the statistics for example. As
|
||||
such, the configuration below will assign "foo" ID number 1 and "bar" ID number
|
||||
2, which will be swapped if the two sections are reversed:
|
||||
|
||||
listen foo
|
||||
bind :80
|
||||
|
||||
listen bar
|
||||
bind :81
|
||||
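The same care applies to rules inside a section. In the sketch below (section
name and addresses are illustrative only), the "accept" rule is evaluated
before the "reject" rule, so connections from 192.0.2.0/24 are accepted while
the rest of 192.0.0.0/16 is rejected. Swapping the two lines would reject the
whole 192.0.0.0/16 range, including 192.0.2.0/24:

```haproxy
frontend fe_main
    bind :8080
    tcp-request connection accept if { src 192.0.2.0/24 }
    tcp-request connection reject if { src 192.0.0.0/16 }
```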
|
||||
Another important point is that according to rules 2 and 3 above, empty lines,
|
||||
spaces, tabs, and comments following an unprotected "#" character are not part
|
||||
of the configuration as they are just used as delimiters. This implies that the
|
||||
following configurations are strictly equivalent:
|
||||
|
||||
global#this is the global section
|
||||
daemon#daemonize
|
||||
frontend foo
|
||||
mode http # or tcp
|
||||
|
||||
and:
|
||||
|
||||
global
|
||||
daemon
|
||||
|
||||
# this is the public web frontend
|
||||
frontend foo
|
||||
mode http
|
||||
|
||||
The common practice is to align to the left only the keyword that initiates a
|
||||
new section, and indent (i.e. prepend a tab character or a few spaces) all
|
||||
other keywords so that it's instantly visible that they belong to the same
|
||||
section (as done in the second example above). Placing comments before a new
|
||||
section helps the reader decide if it's the desired one. Leaving a blank line
|
||||
at the end of a section also visually helps spotting the end when editing it.
|
||||
|
||||
Tabs are very convenient for indent but they do not copy-paste well. If spaces
|
||||
are used instead, it is recommended to use only a few (2 to 4) so that
|
||||
editing in the field doesn't become a burden with limited editors that do not
|
||||
support automatic indent.
|
||||
|
||||
In the early days it used to be common to see arguments split at fixed tab
|
||||
positions because most keywords would not take more than two arguments. With
|
||||
modern versions featuring complex expressions this practice does not stand
|
||||
anymore, and is not recommended.
|
||||
|
||||
|
||||
2.2. Quoting and escaping
|
||||
-------------------------
|
||||
|
||||
HAProxy's configuration introduces a quoting and escaping system similar to
|
||||
many programming languages. The configuration file supports 3 types: escaping
|
||||
with a backslash, weak quoting with double quotes, and strong quoting with
|
||||
single quotes.
|
||||
In modern configurations, some arguments require the use of some characters
|
||||
that were previously considered as pure delimiters. In order to make this
|
||||
possible, HAProxy supports character escaping by prepending a backslash ('\')
|
||||
in front of the character to be escaped, weak quoting within double quotes
|
||||
('"') and strong quoting within single quotes ("'").
|
||||
|
||||
If spaces have to be entered in strings, then they must be escaped by preceding
|
||||
them by a backslash ('\') or by quoting them. Backslashes also have to be
|
||||
escaped by doubling or strong quoting them.
|
||||
This is pretty similar to what is done in a number of programming languages and
|
||||
very close to what is commonly encountered in Bourne shell. The principle is
|
||||
the following: while the configuration parser cuts the lines into words, it
|
||||
also takes care of quotes and backslashes to decide whether a character is a
|
||||
delimiter or is the raw representation of this character within the current
|
||||
word. The escape character is then removed, the quotes are removed, and the
|
||||
remaining word is used as-is as a keyword or argument for example.
|
||||
|
||||
Escaping is achieved by preceding a special character by a backslash ('\'):
|
||||
If a backslash is needed in a word, it must either be escaped using itself
|
||||
(i.e. double backslash) or be strongly quoted.
|
||||
|
||||
Escaping outside quotes is achieved by preceding a special character by a
|
||||
backslash ('\'):
|
||||
|
||||
\ to mark a space and differentiate it from a delimiter
|
||||
\# to mark a hash and differentiate it from a comment
|
||||
@@ -433,39 +542,161 @@ Escaping is achieved by preceding a special character by a backslash ('\'):
|
||||
\' to use a single quote and differentiate it from strong quoting
|
||||
\" to use a double quote and differentiate it from weak quoting
|
||||
|
||||
Weak quoting is achieved by using double quotes (""). Weak quoting prevents
|
||||
the interpretation of:
|
||||
In addition, a few non-printable characters may be emitted using their usual
|
||||
C-language representation:
|
||||
|
||||
space as a parameter separator
|
||||
\n to insert a line feed (LF, character \x0a or ASCII 10 decimal)
|
||||
\r to insert a carriage return (CR, character \x0d or ASCII 13 decimal)
|
||||
\t to insert a tab (character \x09 or ASCII 9 decimal)
|
||||
\xNN to insert character having ASCII code hex NN (e.g. \x0a for LF).
|
||||
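As an illustration, the health-check sketch below relies on "\r\n" to
terminate each string sent to the server, and on an escaped space to keep
"info replication" in a single argument. The backend name, the server address
and the expected replies are assumptions made for this example:

```haproxy
backend be_redis
    option tcp-check
    # "\r\n" is turned into CR LF by the configuration parser
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    # the escaped space keeps both words in the same argument
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    server redis1 192.0.2.20:6379 check
```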
|
||||
Weak quoting is achieved by placing double quotes ("") around the character
|
||||
or sequence of characters to protect. Weak quoting prevents the interpretation
|
||||
of:
|
||||
|
||||
space or tab as a word separator
|
||||
' single quote as a strong quoting delimiter
|
||||
# hash as a comment start
|
||||
|
||||
Weak quoting permits the interpretation of variables, if you want to use a non
|
||||
-interpreted dollar within a double quoted string, you should escape it with a
|
||||
backslash ("\$"), it does not work outside weak quoting.
|
||||
Weak quoting permits the interpretation of environment variables (which are not
|
||||
evaluated outside of quotes) by preceding them with a dollar sign ('$'). If a
|
||||
dollar character is needed inside double quotes, it must be escaped using a
|
||||
backslash.
|
||||
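A short sketch of both cases follows. The variables LOCAL_SYSLOG and FD_APP1
are assumed to be set in the environment before the process starts; these
names, the section name and the header name are placeholders for this example:

```haproxy
global
    # ${LOCAL_SYSLOG} is expanded because it sits inside double quotes
    log "${LOCAL_SYSLOG}:514" local0 notice

frontend fe_app
    mode http
    bind "fd@${FD_APP1}"
    # "\$" yields a literal dollar sign instead of starting a variable name
    http-request set-header X-Price "5\$"
```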
|
||||
Interpretation of escaping and special characters are not prevented by weak
|
||||
quoting.
|
||||
Strong quoting is achieved by placing single quotes ('') around the
|
||||
character or sequence of characters to protect. Inside single quotes, nothing
|
||||
is interpreted, which makes it an efficient way to quote regular expressions.
|
||||
|
||||
Strong quoting is achieved by using single quotes (''). Inside single quotes,
|
||||
nothing is interpreted, it's the efficient way to quote regexes.
|
||||
As a result, here is the matrix indicating how special characters can be
|
||||
entered in different contexts (unprintable characters are replaced with their
|
||||
name within angle brackets). Note that some characters that may only be
|
||||
represented escaped have no possible representation inside single quotes,
|
||||
hence the '-' there:
|
||||
|
||||
Quoted and escaped strings are replaced in memory by their interpreted
|
||||
equivalent, it allows you to perform concatenation.
|
||||
Character  |  Unquoted     |  Weakly quoted              |  Strongly quoted
-----------+---------------+-----------------------------+-----------------
<TAB>      |  \<TAB>, \x09 |  "<TAB>", "\<TAB>", "\x09"  |  '<TAB>'
<LF>       |  \n, \x0a     |  "\n", "\x0a"               |  -
<CR>       |  \r, \x0d     |  "\r", "\x0d"               |  -
<SPC>      |  \<SPC>, \x20 |  "<SPC>", "\<SPC>", "\x20"  |  '<SPC>'
"          |  \", \x22     |  "\"", "\x22"               |  '"'
#          |  \#, \x23     |  "#", "\#", "\x23"          |  '#'
$          |  $, \$, \x24  |  "\$", "\x24"               |  '$'
'          |  \', \x27     |  "'", "\'", "\x27"          |  -
\          |  \\, \x5c     |  "\\", "\x5c"               |  '\'
|
||||
|
||||
Example:
|
||||
# those are equivalents:
|
||||
# those are all strictly equivalent:
|
||||
log-format %{+Q}o\ %t\ %s\ %{-Q}r
|
||||
log-format "%{+Q}o %t %s %{-Q}r"
|
||||
log-format '%{+Q}o %t %s %{-Q}r'
|
||||
log-format "%{+Q}o %t"' %s %{-Q}r'
|
||||
log-format "%{+Q}o %t"' %s'\ %{-Q}r
|
||||
|
||||
# those are equivalents:
|
||||
reqrep "^([^\ :]*)\ /static/(.*)" \1\ /\2
|
||||
reqrep "^([^ :]*)\ /static/(.*)" '\1 /\2'
|
||||
reqrep "^([^ :]*)\ /static/(.*)" "\1 /\2"
|
||||
reqrep "^([^ :]*)\ /static/(.*)" "\1\ /\2"
|
||||
There is one particular case where a second level of quoting or escaping may be
|
||||
necessary. Some keywords take arguments within parentheses, sometimes delimited
|
||||
by commas. These arguments are commonly integers or predefined words, but when
|
||||
they are arbitrary strings, it may be required to perform a separate level of
|
||||
escaping to disambiguate the characters that belong to the argument from the
|
||||
characters that are used to delimit the arguments themselves. A pretty common
|
||||
case is the "regsub" converter. It takes a regular expression as an argument, and
|
||||
if a closing parenthesis is needed inside, it will need to have its
|
||||
own quotes.
|
||||
|
||||
The keyword argument parser is exactly the same as the top-level one regarding
|
||||
quotes, except that it will not make special cases of backslashes. But what is
|
||||
not always obvious is that the delimiters used inside must first be escaped or
|
||||
quoted so that they are not resolved at the top level.
|
||||
|
||||
Let's take this example making use of the "regsub" converter which takes 3
|
||||
arguments, one regular expression, one replacement string and one set of flags:
|
||||
|
||||
# replace all occurrences of "foo" with "blah" in the path:
|
||||
http-request set-path %[path,regsub(foo,blah,g)]
|
||||
|
||||
Here no special quoting was necessary. But if now we want to replace either
|
||||
"foo" or "bar" with "blah", we'll need the regular expression "(foo|bar)". We
|
||||
cannot write:
|
||||
|
||||
http-request set-path %[path,regsub((foo|bar),blah,g)]
|
||||
|
||||
because we would like the string to cut like this:
|
||||
|
||||
http-request set-path %[path,regsub((foo|bar),blah,g)]
                                   |---------|----|-|
                               arg1 _/         /    /
                               arg2 __________/    /
                               arg3 ______________/
|
||||
|
||||
but actually what is passed is a string between the opening and closing
|
||||
parentheses, then garbage:
|
||||
|
||||
http-request set-path %[path,regsub((foo|bar),blah,g)]
                                   |--------|--------|
                      arg1=(foo|bar _/           /
                      trailing garbage _________/
|
||||
|
||||
The obvious solution here seems to be that the closing parenthesis needs to be
|
||||
quoted, but alone this will not work, because as mentioned above, quotes are
|
||||
processed by the top-level parser which will resolve them before processing
|
||||
this word:
|
||||
|
||||
http-request set-path %[path,regsub("(foo|bar)",blah,g)]
------------ -------- ----------------------------------
word1        word2    word3=%[path,regsub((foo|bar),blah,g)]
|
||||
|
||||
So we didn't change anything for the argument parser at the second level which
|
||||
still sees a truncated regular expression as the only argument, and garbage at
|
||||
the end of the string. By escaping the quotes they will be passed unmodified to
|
||||
the second level:
|
||||
|
||||
http-request set-path %[path,regsub(\"(foo|bar)\",blah,g)]
------------ -------- ------------------------------------
word1        word2    word3=%[path,regsub("(foo|bar)",blah,g)]
                                          |---------||----|-|
                                   arg1=(foo|bar) _/     / /
                                   arg2=blah ___________/ /
                                   arg3=g _______________/
|
||||
|
||||
Another approach consists in using single quotes outside the whole string and
|
||||
double quotes inside (so that the double quotes are not stripped again):
|
||||
|
||||
http-request set-path '%[path,regsub("(foo|bar)",blah,g)]'
------------ -------- ------------------------------------
word1        word2    word3=%[path,regsub("(foo|bar)",blah,g)]
                                          |---------||----|-|
                                   arg1=(foo|bar) _/     / /
                                   arg2=blah ___________/ /
                                   arg3=g _______________/
|
||||
|
||||
When using regular expressions, it can happen that the dollar ('$') character
|
||||
appears in the expression or that a backslash ('\') is used in the replacement
|
||||
string. In this case they will also be processed inside double quotes, and
|
||||
thus single quotes are preferred (or double escaping). Example:
|
||||
|
||||
http-request set-path '%[path,regsub("^/(here)(/|$)","my/\1",g)]'
------------ -------- -------------------------------------------
word1        word2    word3=%[path,regsub("^/(here)(/|$)","my/\1",g)]
                                          |-------------| |-----||-|
                              arg1=^/(here)(/|$) _/          /    /
                              arg2=my/\1 ___________________/    /
                              arg3=g ___________________________/
|
||||
|
||||
Remember that backslashes are not escape characters within single quotes and
|
||||
that the whole word3 above is already protected against them using the single
|
||||
quotes. Conversely, if double quotes had been used around the whole expression,
|
||||
the dollar character and the backslashes would have been resolved at top
|
||||
level, breaking the argument contents at the second level.
|
||||
|
||||
When in doubt, simply do not use quotes anywhere, and start to place single or
|
||||
double quotes around arguments that require a comma or a closing parenthesis,
|
||||
and think about escaping these quotes using a backslash if the string contains
|
||||
a dollar or a backslash. Again, this is pretty similar to what is used under
|
||||
a Bourne shell when double-escaping a command passed to "eval". For API writers
|
||||
the best is probably to place escaped quotes around each and every argument,
|
||||
regardless of their contents. Users will probably find that using single quotes
|
||||
around the whole expression and double quotes around each argument provides
|
||||
more readable configurations.
|
||||
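The two styles recommended above can be compared on the same rule. This is
only a sketch reusing the earlier "regsub" example, with every argument quoted
regardless of its contents; both lines are expected to pass the same three
arguments (the regular expression "(foo|bar)", the replacement "blah" and the
flag "g") to the converter:

```haproxy
# generated configuration: escaped double quotes around each argument
http-request set-path %[path,regsub(\"(foo|bar)\",\"blah\",\"g\")]

# hand-written configuration: single quotes around the whole expression,
# double quotes around each argument
http-request set-path '%[path,regsub("(foo|bar)","blah","g")]'
```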
|
||||
|
||||
2.3. Environment variables
|
||||
|