Commit Graph

198 Commits

Author SHA1 Message Date
Christopher Faulet
bda8397fba BUG/MINOR: cache/htx: Fix the counting of data already sent by the cache applet
Since the commit 8f3c256f7 ("MEDIUM: cache/htx: Always store info about HTX
blocks in the cache"), it is possible to read info about a data block without
sending anything. It is possible because we rely on the function htx_add_data(),
which will try to add data without any defragmentation. In such case, info about
the data block are skipped but don't count in data sent.

No need to backport this patch, expect if the commit 8f3c256f7 is backported
too.
2019-06-11 14:05:25 +02:00
Christopher Faulet
2d7c5395ed MEDIUM: htx: Add the parsing of trailers of chunked messages
HTTP trailers are now parsed in the same way headers are. It means trailers are
converted to K/V blocks followed by an end-of-trailer marker. For now, to make
things simple, the type for trailer blocks are not the same than for header
blocks. But the aim is to make no difference between headers and trailers by
using the same type. Probably for the end-of marker too.
2019-06-05 10:12:11 +02:00
Christopher Faulet
8f3c256f7e MEDIUM: cache/htx: Always store info about HTX blocks in the cache
It was only done for the headers (including the EOH marker). data were prefixed
by the info field of these blocks. The payload and the trailers of the messages
were stored in raw. The total size of headers and payload were kept in the
cached object state to help output formatting.

Now, info about each HTX block is store in the cache. Only data are allowed to
be splitted. Otherwise, all blocks of an HTX message are handled the same way,
both when storing a message in the cache and when delivering it from the
cache. This will help the cache implementation to be more robust to internal
changes in the HTX. Especially for the upcoming parsing of trailers. There is
also no more need to keep extra info in the cached object state.
2019-06-05 10:12:11 +02:00
Willy Tarreau
0a7ef02074 MINOR: htx: make htx_add_data() return the transmitted byte count
In order to later allow htx_add_data() to transmit partial blocks and
avoid defragmenting the buffer, we'll need to return the number of bytes
consumed. This first modification makes the function do this and its
callers take this into account. At the moment the function still works
atomically so it returns either the block size or zero. However all
call places have been adapted to consider any value between zero and
the block size.
2019-05-28 14:48:59 +02:00
Christopher Faulet
ee847d45d0 MEDIUM: filters/htx: Filter body relatively to the first block
The filters filtering HTX body, in the callback http_payload, must now loop on
an HTX message starting from the first block position. The offset passed as
parameter is relative to this position and not the head one. It is mandatory
because once filtered, data are now forwarded using the function
channel_htx_fwd_payload(). So the first block position is always updated.
2019-05-28 07:42:33 +02:00
Christopher Faulet
29f1758285 MEDIUM: htx: Store the first block position instead of the start-line one
We don't store the start-line position anymore in the HTX message. Instead we
store the first block position to analyze. For now, it is almost the same. But
once all changes will be made on this part, this position will have to be used
by HTX analyzers, and only in the analysis context, to know where the analyse
should start.

When new blocks are added in an HTX message, if the first block position is not
defined, it is set. When the block pointed by it is removed, it is set to the
block following it. -1 remains the value to unset the position. the first block
position is unset when the HTX message is empty. It may also be unset on a
non-empty message, meaning every blocks were already analyzed.

From HTX analyzers point of view, this position is always set during headers
analysis. When they are waiting for a request or a response, if it is unset, it
means the analysis should wait. But once the analysis is started, and as long as
headers are not forwarded, it points to the message start-line.

As mentionned, outside the HTX analysis, no code must rely on the first block
position. So multiplexers and applets must always use the head position to start
a loop on an HTX message.
2019-05-28 07:42:33 +02:00
Christopher Faulet
a3f1550dfa MEDIUM: http/htx: Perform analysis relatively to the first block
The first block is the start-line, if defined. Otherwise it the head of the HTX
message. So now, during HTTP analysis, lookup are all done using the first block
instead of the head. Concretely, for now, it is the same because only one HTTP
message is stored at a time in an HTX message. 1xx informational messages are
handled separatly from the final reponse and from each other. But it will make
sense when the 1xx informational messages and the associated final reponse will
be stored in the same HTX message.
2019-05-28 07:42:12 +02:00
Christopher Faulet
297fbb45fe MINOR: htx: Replace the function http_find_stline() by http_get_stline()
Now, we only return the start-line. If not found, NULL is returned. No lookup is
performed and the HTX message is no more updated. It is now the caller
responsibility to update the position of the start-line to the right value. So
when it is not found, i.e sl_pos is set to -1, it means the last start-line has
been already processed and the next one has not been inserted yet.

It is mandatory to rely on this kind of warranty to store 1xx informational
responses and final reponse in the same HTX message.
2019-05-28 07:42:12 +02:00
Christopher Faulet
9c66b980fa MINOR: htx: Store start-line block's position instead of address of its payload
Nothing much to say. This change is just mandatory to consider 1xx informational
messages as part of a response.
2019-05-28 07:42:12 +02:00
Willy Tarreau
2231b63887 BUILD: cache: avoid a build warning with some compilers/linkers
The struct http_cache_applet was fully declared at the beginning
instead of just doing a forward declaration using an extern modifier.
Some linkers report warnings about a redefined symbol since these
really are two complete declarations.

The proper way to do this is to use extern on the first one and to
have a full declaration later. However it's not permitted to have
both static and extern so the change done in commit 0f2229943
("CLEANUP: cache: don't export http_cache_applet anymore") has to
be partially undone.

This should be backported to 1.9 for sanity but has no effet on
most platforms. However on 1.9 the extern keyword must also be
added to include/types/cache.h.
2019-03-29 21:03:24 +01:00
Willy Tarreau
0f22299435 CLEANUP: cache: don't export http_cache_applet anymore
This one can become static since it's not used by http/htx anymore.
2019-03-19 09:58:35 +01:00
Christopher Faulet
adb363135c BUG/MINOR: cache: Fully consume large requests in the cache applet
In the cache applet (in HTX and legacy HTTP), when an cached object is sent to a
client, the request must be consumed. It is done at the end, after all the
response was copied into the channel's buffer. But only outgoing data at time
the applet is called are consumed. Then the applet is closed. If a request with
a huge body is sent, an error is triggerred because a SHUTW is catched on an
unfinished request.

Now, we consume request data as soon as possible and we do it until the end. In
fact, we don't try to shutdown the request's channel for write anymore.

This patch must be backported to 1.9 after some observation period.
2019-03-19 09:49:08 +01:00
Olivier Houchard
aa090d46fe MEDIUM: cache: Use the new _HA_ATOMIC_* macros.
Use the new _HA_ATOMIC_* macros and add barriers where needed.
2019-03-11 17:02:38 +01:00
Christopher Faulet
f0dd037456 BUG/MINOR: cache/htx: Return only the headers of cached objects to HEAD requests
The body of a cached object must not be sent in response to a HEAD request. This
works for the legacy HTTP because the parsing is performed by HTTP analyzers
_AND_ because the connection is closed at the end of the transaction. So the
body is ignored. But the applet send it. For the HTX, the applet must skip the
body explicitly.

This patch must be backported to 1.9.
2019-02-26 14:04:23 +01:00
Christopher Faulet
b3d4bca415 BUG/MEDIUM: cache: Get objects from the cache only for GET and HEAD requests
Only responses for GET requests are stored in the cache. But there is no check
on the method during the lookup. So it is possible to retrieve an object from
the cache independently of the method, from the time the key of the object
matches. Now, lookups are performed only for GET and HEAD requests.

This patch must be backportedi in 1.9.
2019-02-26 14:04:23 +01:00
Christopher Faulet
a0df957471 BUG/MAJOR: cache/htx: Set the start-line offset when a cached object is served
When the function htx_add_stline() is used, this offset is automatically set
when necessary. But the HTX cache applet adds all header blocks of the responses
manually, including the start-line. So its offset must be explicitly set by the
applet.

When everything goes well, the HTTP analyzer http_wait_for_response() looks for
the start-line in the HTX messages, calling http_find_stline(). If necessary,
the start-line offet will also be automatically set during this stage. So the
bug of the HTX cache applet does not hurt most of the time. But, when an error
occurred, HTTP responses analyzers can be bypassed. In such caese, the
start-line offset of cached responses remains unset.

Some part of the code relies on the start-line offset to process the HTX
messages. Among others, when H2 responses are sent to clients, the H2
multiplexer read the start-line without any check, because it _MUST_ always be
there. if its offset is not set, a NULL pointer is dereferenced leading to a
segfault.

The patch must be backported to 1.9.
2019-02-26 14:04:23 +01:00
Willy Tarreau
c9036c0004 BUG/MAJOR: cache: fix confusion between zero and uninitialized cache key
The cache uses the first 32 bits of the uri's hash as the key to reference
the object in the cache. It makes a special case of the value zero to mean
that the object is not in the cache anymore. The problem is that when an
object hashes as zero, it's still inserted but the eb32_delete() call is
skipped, resulting in the object still being chained in the memory area
while the block has been reclaimed and used for something else. Then when
objects which were chained below it (techically any object since zero is
at the root) are deleted, the walk through the upper object may encounter
corrupted values where valid pointers were expected.

But while this should only happen statically once on 4 billion, the problem
gets worse when the cache-use conditions don't match the cache-store ones,
because cache-store runs with an uninitialized key, which can create objects
that will never be found by the lookup code, or worse, entries with a zero
key preventing eviction of the tree node and resulting in a crash. It's easy
to accidently end up on such a config because the request rules generally
can't be used to decide on the response :

  http-request  cache-use cache   if { path_beg /images }
  http-response cache-store cache

In this test, mixing traffic with /images/$RANDOM and /foo/$RANDOM will
result in random keys being inserted, some of them possibly being zero,
and crashes will quickly happen.

The fix consists in 1) always initializing the transaction's cache_hash
to zero, and 2) never storing a response for which the hash has not been
calculated, as indicated by the value zero.

It is worth noting that objects hashing as value zero will never be cached,
but given that there's only one chance among 4 billion that this happens,
this is totally harmless.

This fix must be backported to 1.9 and 1.8.
2019-01-14 10:31:31 +01:00
Christopher Faulet
839791af0d BUG/MINOR: cache: Disable the cache if any compression filter precedes it
We need to check if any compression filter precedes the cache filter. This is
only possible when the compression is configured in the frontend while the cache
filter is configured on the backend (via a cache-store action or
explicitly). This case cannot be detected during HAProxy startup. So in such
cases, the cache is disabled.

The patch must be backported to 1.9.
2019-01-08 11:32:23 +01:00
Christopher Faulet
cc156623b2 BUG/MEDIUM: cache/htx: Respect the reserve when cached objects are served
It is only true for HTX streams. The legacy code relies on ci_putblk() which is
already aware of the reserve. It is mandatory to not fill the reserve to let
other filters analysing data. It is especially true for the compression
filter. It needs at least 20 bytes of free space, plus at most 5 bytes per 32kB
block. So if the cache fully fills the channel's buffer, the compression will
not have enough space to do its job and it will block the data forwarding,
waiting for more free space. But if the buffer fully filled with input data (ie
no outgoing data), the stream will be frozen infinitely.

This patch must be backported to 1.9. It depends on the following patches:

  * BUG/MEDIUM: cache/htx: Respect the reserve when cached objects are served
    from the cache
  * MINOR: channel/htx: Add HTX version for some helper functions
2019-01-07 16:32:07 +01:00
Christopher Faulet
74b41ba025 BUG/MINOR: cache/htx: Be sure to count partial trailers
When a chunked object is served from the cache, If the trailers are not pushed
in the channel's buffer in one time, we still have to count them in the total
written bytes in the buffer.

This patch must be backported to 1.9.
2019-01-04 16:23:03 +01:00
Christopher Faulet
6112391f81 BUG/MEDIUM: cache: Be sure to end the forwarding when XFER length is unknown
This bug exists in the HTX code and in the legacy one. When the body length is
unknown, the applet hangs. For the legacy code, it hangs because the end of the
cached object is not correctly handled and the applet is never recalled. For the
HTX code, only the begining of the response (the 1st buffer) is sent then the
applet hangs. To work in HTX, The fast forwarding must be correctly handled.

This patch must be backported to 1.9.

[cf: the patch adding the function channel_add_input must be backported with
this one. It does not exist in 1.8 because only responses with a C-L are cached.]
2019-01-02 20:12:49 +01:00
Willy Tarreau
14bfe9af12 CLEANUP: stream-int: consistently call the si/stream_int functions
As long-time changes have accumulated over time, the exported functions
of the stream-interface were almost all prefixed "si_<something>" while
most private ones (mostly callbacks) were called "stream_int_<something>".
There were still a few confusing exceptions, which were addressed to
follow this shcme :
  - stream_sock_read0(), only used internally, was renamed stream_int_read0()
    and made static
  - stream_int_notify() is only private and was made static
  - stream_int_{check_timeouts,report_error,retnclose,register_handler,update}
    were renamed si_<something>.

Now it is clearer when checking one of these if it risks to be used outside
or not.
2018-12-19 15:25:43 +01:00
Willy Tarreau
efef323783 BUG/MINOR: cache: also consider CF_SHUTR to abort delivery
The cache runs in an applet, so it delivers data into the input side
of the channel's buffer. Thus it must also abort feeding the buffer
as soon as CF_SHUTR is present, not just CF_SHUTW*, since these last
ones may only appear later. There doesn't seem to be an observable
side effect of this bug, the fix probably doesn't even need to be
backported.
2018-12-16 00:40:31 +01:00
Willy Tarreau
273e964f6e BUG/MEDIUM: htx/cache: use the correct class of error codes on abort
The HTX-specific cache code uses HTX_CACHE_* states which overlap with
the legacy HTTP states. A typo in the error handling made the state
become HTTP_CACHE_END, which equals 3 and is the value for HTX_CACHE_EOD,
which explains why we were seeing a transition to trailers and memory
corruption.

no backport needed.
2018-12-16 00:40:30 +01:00
Christopher Faulet
27d93c3f94 BUG/MAJOR: compression/cache: Make it really works with these both filters
Caching the response with the compression enabled was totally broken. To fix the
problem, the compression must be done after caching the response. Otherwise it
needs to change the cache to store compressed and uncompressed objects for the
same ressource. So, because it is not possible for now, it is forbidden to
declare the compression filter before the cache one. To ease the configuration,
both can be implicitly declared (without "filter" keyword). The compression will
automatically be inserted after the cache.

Then, to make it works this way, the compression filter has been slighly
modified. Now, the response headers are updated after http-response rules
evaluations, instead of before. So, if the response contains a "Content-length"
header, it will be kept with the response stored in the cache. So this cached
response will be able to be served to clients not supporting the compression at
all.
2018-12-15 23:50:07 +01:00
Willy Tarreau
a1214a501f MINOR: cache: report the number of cache lookups and cache hits
The cache lookups and hits is now accounted per frontend and per backend,
and reported on the stats page.
2018-12-14 14:00:25 +01:00
Willy Tarreau
a73da1ed25 BUG/MEDIUM: cache: fix random crash on filter parser's error path
The cconf variable was not initialized before the two first possible
error exits before being freed, resulting in random crashes instead
of displaying an error message if the cache ID was missing from the
filter declaration.

No backport is needed, this is exclusively 1.9.
2018-12-14 10:19:28 +01:00
Willy Tarreau
b96b77ed6e REORG: htx: merge types+proto into common/htx.h
All the HTX definition is self-contained and doesn't really depend on
anything external since it's a mostly protocol. In addition, some
external similar files (like h2) also placed in common used to rely
on it, making it a bit awkward.

This patch moves the two htx.h files into a single self-contained one.
The historical dependency on sample.h could be also removed since it
used to be there only for http_meth_t which is now in http.h.
2018-12-11 17:15:04 +01:00
Christopher Faulet
99a17a2d91 MEDIUM: cache: Require an explicit filter declaration if other filters are used
As for the compression filter, the cache filter must be explicitly declared
(using the filter keyword) if other filters than cache are used. It is mandatory
to explicitly define the filters order.

Documentation has been updated accordingly.
2018-12-11 17:09:31 +01:00
Christopher Faulet
afd819c54a MEDIUM: cache/compression: Add a way to safely combined compression and cache
This is only true for HTX proxies. On legacy HTTP proxy, if the compression and
the cache are both enabled, an error during HAProxy startup is triggered.

With the HTX, now you can use both in any order. If the compression is defined
before the cache, then the responses will be stored compressed. If the
compression is defined after the cache, then the responses will be stored
uncompressed. So in the last case, when a response is served from the cache, it
will compressed too like any response.
2018-12-11 17:09:31 +01:00
Christopher Faulet
f4a4ef7d7c MINOR: filters: Export the name of known filters
It could be useful to know if some filter is declared on a proxy or if it is
enabled on a stream.
2018-12-11 17:09:31 +01:00
Christopher Faulet
95220e2ed8 MINOR: cache: Improve and simplify the cache configuration check
To do so, a dedicated configuration has been added on cache filters. Before the
cache filter configuration pointed directly to the cache it used. Now, it is the
dedicated structure cache_flt_conf. Store and use rules also point to this
structure. It is linked to the cache the filter must used. It also contains a
flags field. This will allow us to define the behavior of a cache filter when a
response is stored in the cache or delivered from it.

And now, Store and use rules uses a common parsing function. So if it does not
already exists, a filter is always created for both kind of rules. The cache
filters configuration is checked using their check callback. In the postparser
function, we only check the caches configuration. This removes the loop on all
proxies in the postparser function.
2018-12-11 17:09:31 +01:00
Christopher Faulet
54a8d5a4a0 MEDIUM: cache/htx: Add the HTX support into the cache
The cache is now able to store and resend HTX messages. When an HTX message is
stored in the cache, the headers are prefixed with their block's info (an
uint32_t), containing its type and its length. Data, on their side, are stored
without any prefix. Only the value is copied in the cache. 2 fields have been
added in the structure cache_entry, hdrs_len and data_len, to known the size, in
the cache, of the headers part and the data part. If the message is chunked, the
trailers are also copied, the same way as data. When the HTX message is
recreated in the cache applet, the trailers size is known removing the headers
length and the data lenght from the total object length.
2018-12-11 17:09:31 +01:00
Christopher Faulet
67658c9c9a MINOR: cache: Register the cache as a data filter only if response is cacheable
Instead of calling register_data_filter() when the stream analyze starts, we now
call it when we are sure the response is cacheable. It is done in the
http_headers callback, just before the body analyzis, and only if the headers
was already been cached. And during the body analyzis, if an error occurred or
if the response is too big, we unregistered the cache immediatly.

This patch may be backported in 1.8. It is not a bug but a significant
improvement.
2018-12-11 17:09:31 +01:00
Christopher Faulet
1f672c536d MINOR: cache/htx: Don't use the same cache on HTX and legacy HTTP proxies
It is not possible to mix the format of messages stored in a cache. So we reject
the configurations with a cache used by an HTX proxy and a legacy HTTP proxy in
same time.
2018-12-11 17:09:31 +01:00
Willy Tarreau
8ceae72d44 MEDIUM: init: use initcall for all fixed size pool creations
This commit replaces the explicit pool creation that are made in
constructors with a pool registration. Not only this simplifies the
pools declaration (it can be done on a single line after the head is
declared), but it also removes references to pools from within
constructors. The only remaining create_pool() calls are those
performed in init functions after the config is parsed, so there
is no more user of potentially uninitialized pool now.

It has been the opportunity to remove no less than 12 constructors
and 6 init functions.
2018-11-26 19:50:32 +01:00
Willy Tarreau
e655251e80 MINOR: initcall: use initcalls for section parsers
The two calls to cfg_register_section() and cfg_register_postparser()
are now supported by initcalls. This allowed to remove two other
constructors.
2018-11-26 19:50:32 +01:00
Willy Tarreau
0108d90c6c MEDIUM: init: convert all trivial registration calls to initcalls
This switches explicit calls to various trivial registration methods for
keywords, muxes or protocols from constructors to INITCALL1 at stage
STG_REGISTER. All these calls have in common to consume a single pointer
and return void. Doing this removes 26 constructors. The following calls
were addressed :

- acl_register_keywords
- bind_register_keywords
- cfg_register_keywords
- cli_register_kw
- flt_register_keywords
- http_req_keywords_register
- http_res_keywords_register
- protocol_register
- register_mux_proto
- sample_register_convs
- sample_register_fetches
- srv_register_keywords
- tcp_req_conn_keywords_register
- tcp_req_cont_keywords_register
- tcp_req_sess_keywords_register
- tcp_res_cont_keywords_register
- flt_register_keywords
2018-11-26 19:50:32 +01:00
Joseph Herlant
8dae5b38b8 CLEANUP: Fix typos in the cache subsystem
Fix common misspells in the code comments of the cache subsystem.
2018-11-18 22:26:42 +01:00
Willy Tarreau
db398435aa MINOR: stream-int: replace si_cant_put() with si_rx_room_{blk,rdy}()
Remaining calls to si_cant_put() were all for lack of room and were
turned to si_rx_room_blk(). A few places where SI_FL_RXBLK_ROOM was
cleared by hand were converted to si_rx_room_rdy().

The now unused si_cant_put() function was removed.
2018-11-18 21:41:50 +01:00
Willy Tarreau
4b962a4179 MEDIUM: stream-int: fix the si_cant_put() calls used for buffer readiness
A number of calls to si_cant_put() were used in fact to request being
called back once a buffer is available. These ones are not needed anymore
since si_alloc_ibuf() already sets the SI_FL_RXBLK_BUFF flag when called
in appctx context. Those called with a foreign stream-int are simply turned
to si_rx_buff_blk().
2018-11-18 21:41:48 +01:00
Willy Tarreau
96062a181d BUILD: cache: fix a build warning regarding too large an integer for the age
Building on 32 bit gives this :

  src/cache.c: In function 'http_action_store_cache':
  src/cache.c:466:4: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
  src/cache.c:467:5: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
  src/cache.c: In function 'cache_channel_append_age_header':
  src/cache.c:578:2: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
  src/cache.c:579:3: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]

It's because of the definition below added in commit e7a770c ("MINOR:
cache: Add "Age" header.") :

  #define CACHE_ENTRY_MAX_AGE 2147483648

Just appending "U" to mark it unsigned is enough to fix it. This only
affects 1.9, no backport is needed.
2018-11-11 14:03:02 +01:00
Willy Tarreau
0cd3bd628a MINOR: stream-int: rename si_applet_{want|stop|cant}_{get|put}
It doesn't make sense to limit this code to applets, as any stream
interface can use it. Let's rename it by simply dropping the "applet_"
part of the name. No other change was made except updating the comments.
2018-11-11 10:18:37 +01:00
Frdric Lcaille
e7a770ce80 MINOR: cache: Add "Age" header.
This patch makes the cache capable of adding an "Age" header as defined by
rfc7234.

During the storage of new HTTP objects we memorize ->eoh value and
the value of the "Age" header coming from the origin server.
These information may then be reused to return the cached HTTP objects
with a new "Age" header.

May be backported to 1.8.
2018-10-28 19:06:59 +01:00
Frdric Lcaille
4eba544e24 MINOR: cache: Avoid usage of atoi() when parsing "max-object-size".
With this patch we avoid parsing "max-object-size" with atoi() and we store its
value as an unsigned int to prevent bad implicit conversion issues especially
when we compare it with others unsigned value (content length).
2018-10-26 04:54:40 +02:00
Frdric Lcaille
bc584494e6 BUG/MINOR: cache: Wrong usage of shctx_init().
With this patch we check that shctx_init() does not returns 0.
This is possible if the maxblocks argument, which is passed as an
int, is negative due to an implicit conversion.

Must be backported to 1.8.
2018-10-26 04:54:40 +02:00
Frdric Lcaille
b9b8b6b6be BUG/MINOR: cache: Crashes with "total-max-size" > 2047(MB).
With this patch we support cache size larger than 2047 (MB) and prevent haproxy from crashing when "total-max-size" is parsed as negative values by atoi().

The limit at parsing time is 4095 MB (UINT_MAX >> 20).

May be backported to 1.8.
2018-10-26 04:54:40 +02:00
Frdric Lcaille
a2219f5e3b MINOR: cache: Add "max-object-size" option.
This patch adds "max-object-size" option to the cache to limit
the size in bytes of the HTTP objects to be cached. When not provided,
the maximum size of an HTTP object is a 256th of the cache size.
2018-10-24 04:40:03 +02:00
Frdric Lcaille
b7838afe6f MINOR: shctx: Add a maximum object size parameter.
This patch adds a new parameter to shctx_init() function to be used to
limit the size of each shared object, -1 value meaning "no limit".
2018-10-24 04:39:44 +02:00
Frdric Lcaille
8df65ae5e2 MINOR: cache: Larger HTTP objects caching.
This patch makes the capable of storing HTTP objects larger than a buffer.
It makes usage of the "block by block shared object allocation" new shctx API.

A new pointer to struct shared_block has been added to the cache applet
context to memorize the next block to be used by the HTTP cache I/O handler
http_cache_io_handler() to emit the data. Another member, named "sent" memorize
the number of bytes already sent by this handler. So, to send an object from cache,
http_cache_io_handler() must be called until "sent" counter reaches the size
of this object.
2018-10-24 04:37:12 +02:00
Frdric Lcaille
0bec807e08 MINOR: shctx: Shared objects block by block allocation.
This patch makes shctx capable of storing objects in several parts,
each parts being made of several blocks. There is no more need to
walk through until reaching the end of a row to append new blocks.

A new pointer to a struct shared_block member, named last_reserved,
has been added to struct shared_block so that to memorize the last block which was
reserved by shctx_row_reserve_hot(). Same thing about "last_append" pointer which
is used to memorize the last block used by shctx_row_data_append() to store the data.
2018-10-24 04:35:53 +02:00
Willy Tarreau
61c112aa5b REORG: http: move HTTP rules parsing to http_rules.c
These ones are mostly called from cfgparse.c for the parsing and do
not depend on the HTTP representation. The functions's prototypes
were moved to proto/http_rules.h, making this file work exactly like
tcp_rules. Ideally we should stop calling these functions directly
from cfgparse and register keywords, but there are a few cases where
that wouldn't work (stats http-request) so it's probably not worth
trying to go this far.
2018-10-02 18:28:05 +02:00
Willy Tarreau
6b952c8101 REORG: http: move http_get_path() to http.c
This function is purely HTTP once http_txn is put aside. So the original
one was renamed to http_txn_get_path() and it extracts the relevant offsets
from the txn to pass them to http_get_path(). One benefit of the new version
is that it returns the length at the same time so that allowed to slightly
simplify http_get_path_from_string() which had to look up the end pointer
previously and which is not needed anymore.
2018-09-11 10:30:25 +02:00
Willy Tarreau
83061a820e MAJOR: chunks: replace struct chunk with struct buffer
Now all the code used to manipulate chunks uses a struct buffer instead.
The functions are still called "chunk*", and some of them will progressively
move to the generic buffer handling code as they are cleaned up.
2018-07-19 16:23:43 +02:00
Willy Tarreau
843b7cbe9d MEDIUM: chunks: make the chunk struct's fields match the buffer struct
Chunks are only a subset of a buffer (a non-wrapping version with no head
offset). Despite this we still carry a lot of duplicated code between
buffers and chunks. Replacing chunks with buffers would significantly
reduce the maintenance efforts. This first patch renames the chunk's
fields to match the name and types used by struct buffers, with the goal
of isolating the code changes from the declaration changes.

Most of the changes were made with spatch using this coccinelle script :

  @rule_d1@
  typedef chunk;
  struct chunk chunk;
  @@
  - chunk.str
  + chunk.area

  @rule_d2@
  typedef chunk;
  struct chunk chunk;
  @@
  - chunk.len
  + chunk.data

  @rule_i1@
  typedef chunk;
  struct chunk *chunk;
  @@
  - chunk->str
  + chunk->area

  @rule_i2@
  typedef chunk;
  struct chunk *chunk;
  @@
  - chunk->len
  + chunk->data

Some minor updates to 3 http functions had to be performed to take size_t
ints instead of ints in order to match the unsigned length here.
2018-07-19 16:23:43 +02:00
Willy Tarreau
c9fa0480af MAJOR: buffer: finalize buffer detachment
Now the buffers only contain the header and a pointer to the storage
area which can be anywhere. This will significantly simplify buffer
swapping and will make it possible to map chunks on buffers as well.

The buf_empty variable was removed, as now it's enough to have size==0
and area==NULL to designate the empty buffer (thus a non-allocated head
is the empty buffer by default). buf_wanted for now is indicated by
size==0 and area==(void *)1.

The channels and the checks now embed the buffer's head, and the only
pointer is to the storage area. This slightly increases the unallocated
buffer size (3 extra ints for the empty buffer) but considerably
simplifies dynamic buffer management. It will also later permit to
detach unused checks.

The way the struct buffer is arranged has proven quite efficient on a
number of tests, which makes sense given that size is always accessed
and often first, followed by the othe ones.
2018-07-19 16:23:43 +02:00
Willy Tarreau
178b987025 MINOR: cache: use the new buffer API
A few direct accesses to buf->p now use ci_head() instead.
2018-07-19 16:23:42 +02:00
Olivier Houchard
acd1403794 MINOR: buffer: Use b_add()/bo_add() instead of accessing b->i/b->o.
Use the newly available functions instead of using the buffer fields directly.
2018-07-19 16:23:42 +02:00
Willy Tarreau
8f9c72d301 MINOR: buffer: remove bi_end()
It was replaced by ci_tail() when the channel is known, or b_tail() in
other cases.
2018-07-19 16:23:40 +02:00
Willy Tarreau
dda2e41881 MINOR: buffer: remove bi_ptr()
It's now been replaced by b_head() when b->o is null, ci_head() when
the channel is known, or b_peek(b, b->o) in other situations.
2018-07-19 16:23:40 +02:00
Willy Tarreau
7194d3cc3b MINOR: buffer: split bi_contig_data() into ci_contig_data and b_config_data()
This function was sometimes used from a channel and sometimes from a buffer.
In both cases it requires knowledge of the size of the output data (to skip
them). Here the split ensures the channel can deal with this point, and that
other places not having output data can continue to work.
2018-07-19 16:23:40 +02:00
Willy Tarreau
bcbd39370f MINOR: channel/buffer: replace b_{adv,rew} with c_{adv,rew}
These ones manipulate the output data count which will be specific to
the channel soon, so prepare the call points to use the channel only.
The b_* functions are now unused and were removed.
2018-07-19 16:23:40 +02:00
Aurlien Nephtali
abbf607105 MEDIUM: cli: Add payload support
In order to use arbitrary data in the CLI (multiple lines or group of words
that must be considered as a whole, for example), it is now possible to add a
payload to the commands. To do so, the first line needs to end with a special
pattern: <<\n. Everything that follows will be left untouched by the CLI parser
and will be passed to the commands parsers.

Per-command support will need to be added to take advantage of this
feature.

Signed-off-by: Aurlien Nephtali <aurelien.nephtali@corp.ovh.com>
2018-04-26 14:19:33 +02:00
Willy Tarreau
1093a4586c BUG/MAJOR: cache: always initialize newly created objects
Recent commit 5bd37fa ("BUG/MAJOR: cache: fix random crashes caused by
incorrect delete() on non-first blocks") addressed an issue where
dangling objects could be deleted in the cache, but even after this fix
some similar segfaults were reported at the same place (cache_free_blocks()).
The tree was always corrupted as well. Placing some traces revealed
that this time it's caused by a missing initialization in
http_action_store_cache() : while object->eb.key is used to note that
the object is not in the tree, the first retrieved block may contain
random data and is not initialized. Further, this entry can be updated
later without the object being inserted into the tree. Thus, if at the
end the object is not stored and the blocks are put back to the avail
list, the next attempt to use them will find eb.key != 0 and will try
to delete the uninitialized block, will see that eb.node.leaf_p is not
NULL (random data), and will dereference it as well as a few other
uninitialized pointers. It was harder to trigger than the previous one,
despite being very closely related. This time the following config
was used :

  listen l1
    mode http
    bind :8888
    http-request cache-use c1
    http-response cache-store c1
    server s1 127.0.0.1:8000

  cache c1
    total-max-size 4
    max-age 10

Httpterm was running on port 8000.

And it was stressed this way :

  $ inject -o 1 -u 500 -P 1 -G '127.0.0.1:8888/?s=4097&p=1&x=%s'
  ... wait 5 seconds then Ctrl-C ...
  # wait 3 seconds doing nothing
  $ inject -o 1 -u 500 -P 1 -G '127.0.0.1:8888/?s=4097&p=1&x=%s'
  => segfault

Other values don't work well. The size and the small pieces in the
responses (p=1) are critical to make it work.

Here the fix consists in pre-zeroing object->eb.key AND object->eb.leaf_p
just after the object is allocated so as to stay consistent with other
locations. Ideally this could be simplified later by only relying on
eb->node.leaf_p everywhere since in the end the key alone is not a
reliable indicator, so that we use only one indicator of being part of
the tree or not.

This fix needs to be backported to 1.8.
2018-04-06 19:02:25 +02:00
Willy Tarreau
5bd37fa625 BUG/MAJOR: cache: fix random crashes caused by incorrect delete() on non-first blocks
Several segfaults were reported in the cache, each time in eb_delete()
called from cache_free_blocks() itself called from shctx_row_reserve_hot().
Each time the tree node was corrupted with random cached data (often JS or
HTML contents).

The problem comes from an incompatibility between the cache's expectations
and the recycling algorithm used in the shctx. The shctx allocates and
releases a chain of blocks at once. And when it needs to allocate N blocks
from the avail list while a chain of M>N is found, it picks the first N
from the list, moves them to the hot list, and marks all remaining M-N
blocks as isolated blocks (chains of 1).

For each such released block, the shctx->free_block() callback is used
and passed a pointer to the first and current block of the chain. For
the cache, it's cache_free_blocks(). What this function does is check
that the current block is the first one, and in this case delete the
object from the tree and mark it as not in tree by setting key to zero.

The problem this causes is that the tail blocks when M>N become first
blocks for the next call to shctx_row_reserve_hot(), these ones will
be passed to cache_free_blocks() as list heads, and will be sent to
eb_delete() despite containing only cached data.

The simplest solution for now is to mark each block as holding no cache
object by setting key to zero all the time. It keeps the principle used
elsewhere in the code. The SSL code is not subject to this problem
because it relies on the block's len not being null, which happens
immediately after a block was released. It was uncertain however whether
this method is suitable for the cache. It is not critical though since
this code is going to change soon in 1.9 to dynamically allocate only
the number of required blocks.

This fix must be backported to 1.8. Thanks to Thierry for providing
exploitable cores.
2018-04-04 20:17:03 +02:00
Willy Tarreau
afe1de5d98 BUG/MINOR: cache: fix "show cache" output
The "show cache" command used to dump the header for each entry into into
the handler loop, making it repeated every ~16kB of output data. Additionally
chunk_appendf() was used instead of chunk_printf(), causing the output to
repeat already emitted lines, and the output size to grow in O(n^2). It used
to take several minutes to report tens of millions of objects from a small
cache containing only a few thousands. There was no more impact though.

This fix must be backported to 1.8.
2018-04-04 11:56:43 +02:00
Willy Tarreau
d4569d1937 BUG/MEDIUM: cache: don't cache the response on no-cache="set-cookie"
If the server mentions no-cache="set-cookie" in the response headers,
we must guarantee that any set-cookie field will not be stored. We
cannot edit the stored response on the fly to trim the set-cookie
header so we can refrain from storing a response containing such a
header. In theory we could use TX_SCK_PRESENT for this but this one
is only set when the cookie is being watched by the configuration.
Since these responses are not very frequent and often accompanied
with a set-cookie header, let's simply refrain from caching whenever
such directive is present.

This needs to be backported to 1.8.
2017-12-22 18:03:04 +01:00
Willy Tarreau
504455c533 BUG/MEDIUM: cache: respect the request cache-control header
Till now if a client emitted a request featureing a cache-control header,
this one was not respected and a stale object could still be delievered.r
 This patch ensures that :
  - cache-control: no-cache disables retrieval from the cache but does
    not prevent the newly fetched object from being stored ;
  - cache-control: no-store can safely retrieve from the cache but prevents
    from storing any fetched object
  - cache-control: max-age/max-stale/min-fresh act like no-cache
  - pragma: no-cache acts like cache-control: no-cache.

This needs to be backported to 1.8.
2017-12-22 17:56:18 +01:00
Willy Tarreau
c9bd34c7e0 BUG/MEDIUM: cache: replace old object on store
Currently the cache aborts a store operation if the object to store
already exists in the cache. This is used to avoid storing multiple
copies at the same time on concurrent accesses. It causes an issue
though, which is that existing unexpired objects cannot be updated.
This happens when any request criterion disables the retrieval from
the cache (eg: with max-age or any other cache-control condition).

For now, let's simply replace the previous existing entry by unlinking
it from the index. This could possibly be improved in the future if
needed.

This fix needs to be backported to 1.8.
2017-12-22 17:56:18 +01:00
Willy Tarreau
7704b1e89a BUG/MEDIUM: cache: do not try to retrieve host-less requests from the cache
All HTTP/1.1 requests the Host header share the same hash key 0 and
will be return the first cached object. Let's add the check on the call
to sha1_hosturi() to prevent this from happening.

This must be backported to 1.8.
2017-12-22 17:56:17 +01:00
Willy Tarreau
faf2909f9f BUG/MINOR: cache: do not force the TX_CACHEABLE flag before checking cacheability
The cache used to set this flag before calling
check_response_for_cacheability() due to the way the flags were previously
set (too late), but this is a bad idea as it loses the information of the
implicit caching rules related to the method and the status code. Let's
only rely on what was determined during the request and response parsing
instead and not change it.

This fix must be backported to 1.8, and it requires that the following
patches are also merged :
 - MINOR: http: adjust the list of supposedly cacheable methods
 - MINOR: http: update the list of cacheable status codes as per RFC7231
 - MINOR: http: start to compute the transaction's cacheability from the request
 - BUG/MINOR: http: do not ignore cache-control: public
2017-12-22 15:49:15 +01:00
William Lallemand
bcd9101a66 BUG/MEDIUM: cache: bad computation of the remaining size
The cache was not setting the hdrs_len to zero when we are called
in the http_forward_data with headers + body.

The consequence is to always try to store a size - the size of headers,
during the calls to http_forward_data even when it has already forwarded
the headers.

Thanks to Cyril Bont for reporting this bug.

Must be backported to 1.8.
2017-11-28 12:06:06 +01:00
Willy Tarreau
fd5efb5936 CLEANUP: cache: more efficiently pack the struct cache
By having the cache id on 33 bytes as the first member, it was
creating a hole and forcing the "hot" remaining part to be split
across two cache lines. Let's move the id at the end as it's used
only during config parsing.
2017-11-26 11:10:53 +01:00
William Lallemand
49b4453b58 MEDIUM: cache: max-age configuration keyword
Add a configuration keyword to change the max-age.
The default one is still 60s.
2017-11-24 19:31:01 +01:00
William Lallemand
a71cd1d407 MINOR: cache: replace a fprint() by an abort()
In the applet I/O handler we can never get an object bigger than a
buffer, so we should never reach this case.
2017-11-24 19:00:07 +01:00
Willy Tarreau
bafbe01028 CLEANUP: pools: rename all pool functions and pointers to remove this "2"
During the migration to the second version of the pools, the new
functions and pool pointers were all called "pool_something2()" and
"pool2_something". Now there's no more pool v1 code and it's a real
pain to still have to deal with this. Let's clean this up now by
removing the "2" everywhere, and by renaming the pool heads
"pool_head_something".
2017-11-24 17:49:53 +01:00
Olivier Houchard
fbc74e8556 MINOR/CLEANUP: proxy: rename "proxy" to "proxies_list"
Rename the global variable "proxy" to "proxies_list".
There's been multiple proxies in haproxy for quite some time, and "proxy"
is a potential source of bugs, a number of functions have a "proxy" argument,
and some code used "proxy" when it really meant "px" or "curproxy". It worked
by pure luck, because it usually happened while parsing the config, and thus
"proxy" pointed to the currently parsed proxy, but we should probably not
rely on this.

[wt: some of these are definitely fixes that are worth backporting]
2017-11-24 17:21:27 +01:00
Christopher Faulet
767a84bcc0 CLEANUP: log: Rename Alert/Warning in ha_alert/ha_warning 2017-11-24 17:19:12 +01:00
William Lallemand
ecb73b12c1 MINOR: cache: move the refcount decrease in the applet release
Move the refcount decrease of the cache in the release callback of the
applet. We don't need to decrease it in the applet code.
2017-11-24 15:04:36 +01:00
William Lallemand
49dc048c25 BUG/MEDIUM: cache: free ressources in chn_end_analyze
Upon an aborted HTTP connection, or an error, the filter cache does not
decrement the refcount and does not free the allocated ressources.
2017-11-24 15:04:36 +01:00
William Lallemand
f528fff46b MEDIUM: cache: store sha1 for hashing the cache key
The cache was relying on the txn->uri for creating its key, which was a
big problem when there was no log activated.

This patch does a sha1 of the host + uri, and stores it in the txn.
When a object is stored, the eb32node uses the first 32 bits of the hash
as a key, and the whole hash is stored in the cache entry.

During a lookup, the truncated hash is used, and when it matches an
entry we check the real sha1.
2017-11-23 20:20:04 +01:00
William Lallemand
e899af89b5 BUG/MEDIUM: cache fix cli_kws structure
The cli_kws structure was not ended and was causing undefined behavior.
2017-11-22 16:56:58 +01:00
William Lallemand
55e7674bc4 BUG/MEDIUM: cache: refcount forbids to free the objects
Some refcount decrementation were forgotten and they were forbidding to
reuse the objects in some cases.
2017-11-22 15:13:54 +01:00
William Lallemand
0872766e31 BUG/MEDIUM: cache: use key=0 as a condition for freeing
The cache was trying to remove objects from the tree while they were
already removed from it. We set the key to 0 as a check for not trying
to remove the object from the tree when we are still using the object.
2017-11-22 15:13:54 +01:00
William Lallemand
1f49a366fd MEDIUM: cache: "show cache" on the cli
The cli command "show cache" displays the status of the cache, the first
displayed line is the shctx informations with how much blocks available
blocks it contains (blocks are 1k by default).

The next lines are the objects stored in the cache tree, the pointer,
the size of the object and how much blocks it uses, a refcount for the
number of users of the object, and the remaining expiration time (which
can be negative if expired)

Example:

    $ echo "show cache" | socat - /run/haproxy.sock
    0x7fa54e9ab03a: foobar (shctx:0x7fa54e9ab000, available blocks:3921)
    0x7fa54ed65b8c (size: 43190 (43 blocks), refcount:2, expire: 2)
    0x7fa54ecf1b4c (size: 45238 (45 blocks), refcount:0, expire: 2)
    0x7fa54ed70cec (size: 61622 (61 blocks), refcount:0, expire: 2)
    0x7fa54ecdbcac (size: 42166 (42 blocks), refcount:1, expire: 2)
    0x7fa54ec9736c (size: 44214 (44 blocks), refcount:2, expire: 2)
    0x7fa54eca28ec (size: 46262 (46 blocks), refcount:2, expire: -2)
2017-11-21 21:35:04 +01:00
William Lallemand
75d93291c9 CLEANUP: cache: reorder includes 2017-11-21 21:35:04 +01:00
William Lallemand
eee5c39715 CLEANUP: cache: remove wrong comment 2017-11-20 19:22:27 +01:00
William Lallemand
a400a3a6d0 BUG/MEDIUM: cache: free callback to remove from tree
Call the shctx free_blocks callback in order to remove the row from the
cache tree.

Put the row in the hot list during allocation, forbid the blocks to be
stolen by a free or a row_reserve
2017-11-20 19:22:27 +01:00
William Lallemand
e1533f5790 MINOR: cache: disable cache if shctx_row_data_append fail
Disable the cache if the append of data failed, it should never happen
because the allocated row size is at least equal to the size of the
object to allocate.
2017-11-14 15:20:44 +01:00
William Lallemand
10935bc547 MINOR: cache: forward data with headers
Forward the remaining headers with the data in the first call of
cache_store_http_forward_data().

Previously the headers were forwarded first, and the function left,
implying an additionnal call to cache_store_http_forward_data() for the
data.

Cc: Christopher Faulet <cfaulet@haproxy.com>
2017-11-14 15:20:44 +01:00
William Lallemand
9d5f54daad BUG/MEDIUM: cache: use msg->sov to forward header
Use msg->sov to forward headers instead of msg->eoh. It can causes some
problem because eoh does not contains the last \r\n, and the filter does
not support to send the headers partially.

Cc: Christopher Faulet <cfaulet@haproxy.com>
2017-11-14 15:20:44 +01:00
William Lallemand
18f133adb3 BUG/MEDIUM: cache: does not cache if no Content-Length
In the case of Transfer-Encoding: chunked, there is no Content-Length
which causes the cache to allocate a too small shctx row for the data.

It's not possible to allocate a shctx row for the chunks, we need to be
able to allocate on-the-fly the shctx blocks during the data transfer.
2017-11-11 14:01:21 +01:00
William Lallemand
9c54c53f2f BUG/MEDIUM: cache: don't try to resolve wrong filters
Don't try to resolve wrong filters which are not cache filters during
the post configuration callback.
2017-11-02 16:58:25 +01:00
Olivier Houchard
fccf840cdf MINOR: cache: Don't confuse act_return and act_parse_ret. 2017-11-01 15:10:51 +01:00
Olivier Houchard
cd2867a012 MINOR: cache: Remove useless test for nonzero.
Don't bother testing if len is nonzero, we know it is, as we're in the
"else" part of a if (!len), and testing it confuses clang into thinking
ret may be left uninitialized.
2017-11-01 15:10:51 +01:00
William Lallemand
77c1197bfb MEDIUM: cache: deliver objects from cache
Lookup objects in the cache and deliver them using the http-request
action "cache-use".
2017-10-31 21:17:19 +01:00
William Lallemand
4da3f8a1f2 MEDIUM: cache: store objects in cache
Store object in the cache. The cache use an shctx for storage.

It uses an http-response action to store the headers and a filter to
store the body. The http-response action is used in order to allow
modifications by other actions before caching.
2017-10-31 21:17:19 +01:00
William Lallemand
41db46035e MEDIUM: cache: configuration parsing and initialization
Parse a configuration section "cache" and a http-{response,request}
actions.

Example:

    listen frt
        mode http
        http-response cache-store foobar
        http-request cache-use foobar

    cache foobar
        total-max-size 4   # size in megabytes
2017-10-31 21:17:19 +01:00