BUG/MINOR: h1: do not accept '#' as part of the URI component

Seth Manesse and Paul Plasil reported that the "path" sample fetch
function incorrectly accepts '#' as part of the path component. This
can in some cases lead to misrouted requests for rules that would apply
on the suffix:

    use_backend static if { path_end .png .jpg .gif .css .js }

Note that this behavior can be selectively configured using
"normalize-uri fragment-encode" and "normalize-uri fragment-strip".

The problem is that while the RFC says that this '#' must never be
emitted, as often it doesn't suggest how servers should handle it. A
diminishing number of servers still do accept it and trim it silently,
while others are rejecting it, as indicated in the conversation below
with other implementers:

   https://lists.w3.org/Archives/Public/ietf-http-wg/2023JulSep/0070.html

Looking at logs from publicly exposed servers, such requests appear at
a rate of roughly 1 per million and only come from attacks or poorly
written web crawlers incorrectly following links found on various pages.

Thus it looks like the best solution to this problem is to simply reject
such ambiguous requests by default, and include this in the list of
controls that can be disabled using "option accept-invalid-http-request".

We're already rejecting URIs containing any control char anyway, so we
should also reject '#'.

In the H1 parser for the H1_MSG_RQURI state, there is an accelerated
parser for bytes 0x21..0x7e that has been tightened to 0x24..0x7e (it
should not impact perf since 0x21..0x23 are not supposed to appear in
a URI anyway). This way '#' falls through the fine-grained filter and
we can add the special case for it also conditionned by a check on the
proxy's option "accept-invalid-http-request", with no overhead for the
vast majority of valid URIs. Here this information is available through
h1m->err_pos that's set to -2 when the option is here (so we don't need
to change the API to expose the proxy). Example with a trivial GET
through netcat:

  [08/Aug/2023:16:16:52.651] frontend layer1 (#2): invalid request
    backend <NONE> (#-1), server <NONE> (#-1), event #0, src 127.0.0.1:50812
    buffer starts at 0 (including 0 out), 16361 free,
    len 23, wraps at 16336, error at position 7
    H1 connection flags 0x00000000, H1 stream flags 0x00000810
    H1 msg state MSG_RQURI(4), H1 msg flags 0x00001400
    H1 chunk len 0 bytes, H1 body len 0 bytes :

    00000  GET /aa#bb HTTP/1.0\r\n
    00021  \r\n

This should be progressively backported to all stable versions along with
the following patch:

    REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests

Similar fixes for h2 and h3 will come in followup patches.

Thanks to Seth Manesse and Paul Plasil for reporting this problem with
detailed explanations.
This commit is contained in:
Willy Tarreau 2023-08-08 16:17:22 +02:00
parent 069d0e221e
commit 2eab6d3543

View File

@ -551,13 +551,13 @@ int h1_headers_to_hdr_list(char *start, const char *stop,
case H1_MSG_RQURI:
http_msg_rquri:
#ifdef HA_UNALIGNED_LE
/* speedup: skip bytes not between 0x21 and 0x7e inclusive */
/* speedup: skip bytes not between 0x24 and 0x7e inclusive */
while (ptr <= end - sizeof(int)) {
int x = *(int *)ptr - 0x21212121;
int x = *(int *)ptr - 0x24242424;
if (x & 0x80808080)
break;
x -= 0x5e5e5e5e;
x -= 0x5b5b5b5b;
if (!(x & 0x80808080))
break;
@ -569,8 +569,15 @@ int h1_headers_to_hdr_list(char *start, const char *stop,
goto http_msg_ood;
}
http_msg_rquri2:
if (likely((unsigned char)(*ptr - 33) <= 93)) /* 33 to 126 included */
if (likely((unsigned char)(*ptr - 33) <= 93)) { /* 33 to 126 included */
if (*ptr == '#') {
if (h1m->err_pos < -1) /* PR_O2_REQBUG_OK not set */
goto invalid_char;
if (h1m->err_pos == -1) /* PR_O2_REQBUG_OK set: just log */
h1m->err_pos = ptr - start + skip;
}
EAT_AND_JUMP_OR_RETURN(ptr, end, http_msg_rquri2, http_msg_ood, state, H1_MSG_RQURI);
}
if (likely(HTTP_IS_SPHT(*ptr))) {
sl.rq.u.len = ptr - sl.rq.u.ptr;