In order to manage "If-Modified-Since" requests, we need to keep a
reference time for our cache entries (to which the conditional request's
date will be compared).
This reference is either extracted from the "Last-Modified" header, or
the "Date" header, or the reception time of the response (in decreasing
order of priority).
The date values are converted into seconds since epoch in order to ease
comparisons and to limit storage space.
Partial support of conditional HTTP requests. This commit adds the
support of the 'If-None-Match' header (see RFC 7232#3.2).
When a client specifies a list of ETags through one or more
'If-None-Match' headers, they are all compared to the one that might have
been stored in the corresponding http cache entry until one of them
matches.
If a match happens, a specific "304 Not Modified" response is
sent instead of the cached data. This response has all the stored
headers but no other data (see RFC 7232#4.1). Otherwise, the whole cached data
is sent.
Although unlikely in a GET/HEAD request, the "If-None-Match: *" syntax is
valid and also receives a "304 Not Modified" response (RFC 7434#4.3.2).
This resolves a part of GitHub issue #821.
When sent by a server for a given resource, the ETag header is
stored in the coresponding cache entry (as any other header). So in
order to perform future ETag comparisons (for subsequent conditional
HTTP requests), we keep the length of the ETag and its offset
relative to the start of the cache_entry.
If no ETag header exists, the length and offset are zero.
During the config check, the post parsing is not performed. Thus, cache filters
are not fully initialized and their cache name are never released. To be able to
release them, a flag is now set when a cache filter is fully initialized. On
deinit, if the flag is not set, it means the cache name must be freed.
The patch should fix#849. No backport needed.
[Cf: Tim is the patch author, but I added the commit message]
Using a duplicate cache name most likely is the result of a misgenerated
configuration. There is no good reason to allow this, as the duplicate
caches can't be referred to.
This commit resolves GitHub issue #820.
It can be argued whether this is a fix for a bug or not. I'm erring on the
side of caution and marking this as a "new feature". It can be considered for
backporting to 2.2, but for other branches the risk of accidentally breaking
some working (but non-ideal) configuration might be too large.
When the cache name is left out in 'filter cache' the error message refers
to a missing '<id>'. The name of the cache is called 'name' within the docs.
Adjust the error message for consistency.
The error message was introduced in 99a17a2d91.
This commit first appeared in 1.9, thus the patch must be backported to 1.9+.
The HTX_FL_EOI flag must now be set on a HTX message when no more data are
expected. Most of time, it must be set before adding the EOM block. Thus, if
there is no space for the EOM, there is still an information to know all data
were received and pushed in the HTX message. There is only an exception for the
HTTP replies (deny, return...). For these messages, the flag is set after all
blocks are pushed in the message, including the EOM block, because, on error,
we remove all inserted data.
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
Most of the files dealing with error reports have to include log.h in order
to access ha_alert(), ha_warning() etc. But while these functions don't
depend on anything, log.h depends on a lot of stuff because it deals with
log-formats and samples. As a result it's impossible not to embark long
dependencies when using ha_warning() or qfprintf().
This patch moves these low-level functions to errors.h, which already
defines the error codes used at the same places. About half of the users
of log.h could be adjusted, sometimes revealing other issues such as
missing tools.h. Interestingly the total preprocessed size shrunk by
4%.
There's no point splitting the file in two since only cfgparse uses the
types defined there. A few call places were updated and cleaned up. All
of them were in C files which register keywords.
There is nothing left in common/ now so this directory must not be used
anymore.
This one was not easy because it was embarking many includes with it,
which other files would automatically find. At least global.h, arg.h
and tools.h were identified. 93 total locations were identified, 8
additional includes had to be added.
In the rare files where it was possible to finalize the sorting of
includes by adjusting only one or two extra lines, it was done. But
all files would need to be rechecked and cleaned up now.
It was the last set of files in types/ and proto/ and these directories
must not be reused anymore.
This one is particularly difficult to split because it provides all the
functions used to manipulate a proxy state and to retrieve names or IDs
for error reporting, and as such, it was included in 73 files (down to
68 after cleanup). It would deserve a small cleanup though the cut points
are not obvious at the moment given the number of structs involved in
the struct proxy itself.
The current state of the logging is a real mess. The main problem is
that almost all files include log.h just in order to have access to
the alert/warning functions like ha_alert() etc, and don't care about
logs. But log.h also deals with real logging as well as log-format and
depends on stream.h and various other things. As such it forces a few
heavy files like stream.h to be loaded early and to hide missing
dependencies depending where it's loaded. Among the missing ones is
syslog.h which was often automatically included resulting in no less
than 3 users missing it.
Among 76 users, only 5 could be removed, and probably 70 don't need the
full set of dependencies.
A good approach would consist in splitting that file in 3 parts:
- one for error output ("errors" ?).
- one for log_format processing
- and one for actual logging.
It was moved without any change, however many callers didn't need it at
all. This was a consequence of the split of proto_http.c into several
parts that resulted in many locations to still reference it.
Almost no change except moving the cli_kw struct definition after the
defines. Almost all users had both types&proto included, which is not
surprizing since this code is old and it used to be the norm a decade
ago. These places were cleaned.
List.h was missing for LIST_ADDQ(). A few unneeded includes of action.h
were removed from certain files.
This one still relies on applet.h and stick-table.h.
A few includes had to be added, namely list-t.h in the type file and
types/proxy.h in the proto file. actions.h was including http-htx.h
but didn't need it so it was dropped.
Most of the file was a large set of HTX elements manipulation functions
and few types, so splitting them allowed to further reduce dependencies
and shrink the build time. Doing so revealed that a few files (h2.c,
mux_pt.c) needed haproxy/buf.h and were previously getting it through
htx.h. They were fixed.
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:
- common/config.h
- common/compat.h
- common/compiler.h
- common/defaults.h
- common/initcall.h
- common/tools.h
The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.
In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.
No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
This is where other imported components are located. All files which
used to directly include ebtree were touched to update their include
path so that "import/" is now prefixed before the ebtree-related files.
The ebtree.h file was slightly adjusted to read compiler.h from the
common/ subdirectory (this is the only change).
A build issue was encountered when eb32sctree.h is loaded before
eb32tree.h because only the former checks for the latter before
defining type u32. This was addressed by adding the reverse ifdef
in eb32tree.h.
No further cleanup was done yet in order to keep changes minimal.
parse_cache_flt() is the registered callback for the "cache" filter keyword. It
is only called when the "cache" keyword is found on a filter line. So, it is
useless to test the filter name in the callback function.
This patch should fix the issue #634. It may be backported as far as 1.9.
Since the HTX mode is the only mode to process HTTP messages, the stream is
created for a uniq transaction. The keep-alive is handled at the mux level. So,
the cache filter can be initialized when the stream is created and released with
the stream. Concretly, .channel_start_analyze and .channel_end_analyze callback
functions are replaced by .attach and .detach ones.
With this change, it is no longer necessary to call FLT_START_FE/BE and FLT_END
analysers for the cache filter.
During the payload filtering, the offset is relative to the head of the HTX
message and not its first index. This index is the position of the first block
to (re)start the HTTP analysis. It must be used during HTTP analysis but not
during the payload forwarding.
So, from the cache point of view, when we loop on the HTX blocks to cache the
response payload, we must start from the head of the HTX message. To ease the
loop, we use the function htx_find_offset().
This patch must be backported as far as 2.0. It depends on the commit "MINOR:
htx: Add a function to return a block at a specific an offset". So this one must
be backported first.
Enabling strict aliasing fails on the cache's hash which is a series of
20 bytes cast as u32. And in practice it could even fail on some archs
if the http_txn didn't guarantee the hash was properly aligned. Let's
use read_u32() to read the value and write_u32() to set it, this makes
sure the compiler emits the correct code to access these and knows about
the intentional aliasing.
We previously relied on chunk_cat(dst, b_fromist(src)) for this but it
is not reliable as the allocated buffer is inside the expression and
may be on a temporary stack. While it's possible to allocate stack space
for a struct and return a pointer to it, it's not possible to initialize
it form a temporary variable to prevent arguments from being evaluated
multiple times. Since this is only used to append an ist after a chunk,
let's instead have a chunk_istcat() function to perform exactly this
from a native ist.
The only call place (URI computation in the cache) was updated.
When running haproxy -c, the cache parser is trying to allocate the size
of the cache. This can be a problem in an environment where the RAM is
limited.
This patch moves the cache allocation in the post_check callback which
is not executed during a -c.
This patch may be backported at least to 2.0 and 1.9. In 1.9, the callbacks
registration mechanism is not the same. So the patch will have to be adapted. No
need to backport it to 1.8, the code is probably too different.
The recent changes to address URI issues mixed with the recent fix to
stop caching absolute URIs have caused the cache not to cache H2 requests
anymore since these ones come with a scheme and authority. Let's unbreak
this by using absolute URIs all the time, now that we keep host and
authority in sync. So what is done now is that if we have an authority,
we take the whole URI as it is as the cache key. This covers H2 and H1
absolute requests. If no authority is present (most H1 origin requests),
then we prepend "https://" and the Host header. The reason for https://
is that most of the time we don't care about the scheme, but since about
all H2 clients use this scheme, at least we can share the cache between
H1 and H2.
No backport is needed since the breakage only affects 2.1-dev.
If a request contains an absolute URI and gets its Host header field
rewritten, or just the request's URI without touching the Host header
field, it can lead to different Host and authority parts. The cache
will always concatenate the Host and the path while a server behind
would instead ignore the Host and use the authority found in the URI,
leading to incorrect content possibly being cached.
Let's simply refrain from caching absolute requests for now, which
also matches what the comment at the top of the function says. Later
we can improve this by having a special handling of the authority.
This should be backported as far as 1.8.
This reverts commit 1263540fe8.
As discussed in issues #214 and #251, this is not the correct way to
cache CORS responses, since it relies on hacking the cache to cache
the OPTIONS method which is explicitly non-cacheable and for which
we cannot rely on any standard caching semantics (cache headers etc
are not expected there). Let's roll this back for now and keep that
for a more reliable and flexible CORS-specific solution later.
The FCGI application handles all the configuration parameters used to format
requests sent to an application. The configuration of an application is grouped
in a dedicated section (fcgi-app <name>) and referenced in a backend to be used
(use-fcgi-app <name>). To be valid, a FCGI application must at least define a
document root. But it is also possible to set the default index, a regex to
split the script name and the path-info from the request URI, parameters to set
or unset... In addition, this patch also adds a FCGI filter, responsible for
all processing on a stream.
HTTP responses with headers than impinge upon the reserve must not be
cached. Otherwise, there is no warranty to have enough space to add the header
"Age" when such cached responses are delivered.
This patch must be backported to 2.0 and 1.9. For these versions, the same must
be done for the legacy HTTP mode.
In the cache, huge HTTP headers will use several shctx blocks. When a response
is returned from the cache, these headers must be properly copied in the
corresponding HTX message by updating the pointer where to copied a header
part.
This patch must be backported to 2.0 and 1.9.
Allow HAProxy to cache responses to OPTIONS HTTP requests.
This is useful in the use case of "Cross-Origin Resource Sharing" (cors)
to cache CORS responses from API servers.
Since HAProxy does not support Vary header for now, this would be only
useful for "access-control-allow-origin: *" use case.
Current HTTP cache hash contains only the Host header and the url path.
That said, request method should also be added to the mix to support
caching other request methods on the same URL. IE GET and OPTIONS.
Default HTTP error messages are stored in an array of chunks. And since the HTX
was added, these messages are also converted in HTX and stored in another
array. But now, the first array is not used anymore because the legacy HTTP mode
was removed.
So now, only the array with the HTX messages are kept. The other one was
removed.
The old module proto_http does not exist anymore. All code dedicated to the HTTP
analysis is now grouped in the file proto_htx.c. So, to finish the polishing
after removing the legacy HTTP code, proto_htx.{c,h} files have been moved in
http_ana.{c,h} files.
In addition, all HTX analyzers and related functions prefixed with "htx_" have
been renamed to start with "http_" instead.
First of all, all legacy HTTP analyzers and all functions exclusively used by
them were removed. So the most of the functions in proto_http.{c,h} were
removed. Only functions to deal with the HTTP transaction have been kept. Then,
http_msg and hdr_idx modules were entirely removed. And finally the structure
http_msg was lightened of all its useless information about the legacy HTTP. The
structure hdr_ctx was also removed because unused now, just like unused states
in the enum h1_state. Note that the memory pool "hdr_idx" was removed and
"http_txn" is now smaller.
The applet delivering cached objects based on the legacy HTTP code was removed
as the filter callback cache_store_http_forward_data(). And the action analyzing
the response coming from the server to store it in the cache or not was purged
of the legacy HTTP code.
The function http_calc_maxage() was not updated to be HTX aware. So the header
"Cache-Control" on the response was never parsed to find "max-age" or "s-maxage"
values.
This patch must be backported to 2.0 and 1.9.
Since the commit 8f3c256f7 ("MEDIUM: cache/htx: Always store info about HTX
blocks in the cache"), it is possible to read info about a data block without
sending anything. It is possible because we rely on the function htx_add_data(),
which will try to add data without any defragmentation. In such case, info about
the data block are skipped but don't count in data sent.
No need to backport this patch, expect if the commit 8f3c256f7 is backported
too.
HTTP trailers are now parsed in the same way headers are. It means trailers are
converted to K/V blocks followed by an end-of-trailer marker. For now, to make
things simple, the type for trailer blocks are not the same than for header
blocks. But the aim is to make no difference between headers and trailers by
using the same type. Probably for the end-of marker too.
It was only done for the headers (including the EOH marker). data were prefixed
by the info field of these blocks. The payload and the trailers of the messages
were stored in raw. The total size of headers and payload were kept in the
cached object state to help output formatting.
Now, info about each HTX block is store in the cache. Only data are allowed to
be splitted. Otherwise, all blocks of an HTX message are handled the same way,
both when storing a message in the cache and when delivering it from the
cache. This will help the cache implementation to be more robust to internal
changes in the HTX. Especially for the upcoming parsing of trailers. There is
also no more need to keep extra info in the cached object state.
In order to later allow htx_add_data() to transmit partial blocks and
avoid defragmenting the buffer, we'll need to return the number of bytes
consumed. This first modification makes the function do this and its
callers take this into account. At the moment the function still works
atomically so it returns either the block size or zero. However all
call places have been adapted to consider any value between zero and
the block size.
The filters filtering HTX body, in the callback http_payload, must now loop on
an HTX message starting from the first block position. The offset passed as
parameter is relative to this position and not the head one. It is mandatory
because once filtered, data are now forwarded using the function
channel_htx_fwd_payload(). So the first block position is always updated.
We don't store the start-line position anymore in the HTX message. Instead we
store the first block position to analyze. For now, it is almost the same. But
once all changes will be made on this part, this position will have to be used
by HTX analyzers, and only in the analysis context, to know where the analyse
should start.
When new blocks are added in an HTX message, if the first block position is not
defined, it is set. When the block pointed by it is removed, it is set to the
block following it. -1 remains the value to unset the position. the first block
position is unset when the HTX message is empty. It may also be unset on a
non-empty message, meaning every blocks were already analyzed.
From HTX analyzers point of view, this position is always set during headers
analysis. When they are waiting for a request or a response, if it is unset, it
means the analysis should wait. But once the analysis is started, and as long as
headers are not forwarded, it points to the message start-line.
As mentionned, outside the HTX analysis, no code must rely on the first block
position. So multiplexers and applets must always use the head position to start
a loop on an HTX message.
The first block is the start-line, if defined. Otherwise it the head of the HTX
message. So now, during HTTP analysis, lookup are all done using the first block
instead of the head. Concretely, for now, it is the same because only one HTTP
message is stored at a time in an HTX message. 1xx informational messages are
handled separatly from the final reponse and from each other. But it will make
sense when the 1xx informational messages and the associated final reponse will
be stored in the same HTX message.
Now, we only return the start-line. If not found, NULL is returned. No lookup is
performed and the HTX message is no more updated. It is now the caller
responsibility to update the position of the start-line to the right value. So
when it is not found, i.e sl_pos is set to -1, it means the last start-line has
been already processed and the next one has not been inserted yet.
It is mandatory to rely on this kind of warranty to store 1xx informational
responses and final reponse in the same HTX message.
The struct http_cache_applet was fully declared at the beginning
instead of just doing a forward declaration using an extern modifier.
Some linkers report warnings about a redefined symbol since these
really are two complete declarations.
The proper way to do this is to use extern on the first one and to
have a full declaration later. However it's not permitted to have
both static and extern so the change done in commit 0f2229943
("CLEANUP: cache: don't export http_cache_applet anymore") has to
be partially undone.
This should be backported to 1.9 for sanity but has no effet on
most platforms. However on 1.9 the extern keyword must also be
added to include/types/cache.h.
In the cache applet (in HTX and legacy HTTP), when an cached object is sent to a
client, the request must be consumed. It is done at the end, after all the
response was copied into the channel's buffer. But only outgoing data at time
the applet is called are consumed. Then the applet is closed. If a request with
a huge body is sent, an error is triggerred because a SHUTW is catched on an
unfinished request.
Now, we consume request data as soon as possible and we do it until the end. In
fact, we don't try to shutdown the request's channel for write anymore.
This patch must be backported to 1.9 after some observation period.
The body of a cached object must not be sent in response to a HEAD request. This
works for the legacy HTTP because the parsing is performed by HTTP analyzers
_AND_ because the connection is closed at the end of the transaction. So the
body is ignored. But the applet send it. For the HTX, the applet must skip the
body explicitly.
This patch must be backported to 1.9.
Only responses for GET requests are stored in the cache. But there is no check
on the method during the lookup. So it is possible to retrieve an object from
the cache independently of the method, from the time the key of the object
matches. Now, lookups are performed only for GET and HEAD requests.
This patch must be backportedi in 1.9.
When the function htx_add_stline() is used, this offset is automatically set
when necessary. But the HTX cache applet adds all header blocks of the responses
manually, including the start-line. So its offset must be explicitly set by the
applet.
When everything goes well, the HTTP analyzer http_wait_for_response() looks for
the start-line in the HTX messages, calling http_find_stline(). If necessary,
the start-line offet will also be automatically set during this stage. So the
bug of the HTX cache applet does not hurt most of the time. But, when an error
occurred, HTTP responses analyzers can be bypassed. In such caese, the
start-line offset of cached responses remains unset.
Some part of the code relies on the start-line offset to process the HTX
messages. Among others, when H2 responses are sent to clients, the H2
multiplexer read the start-line without any check, because it _MUST_ always be
there. if its offset is not set, a NULL pointer is dereferenced leading to a
segfault.
The patch must be backported to 1.9.
The cache uses the first 32 bits of the uri's hash as the key to reference
the object in the cache. It makes a special case of the value zero to mean
that the object is not in the cache anymore. The problem is that when an
object hashes as zero, it's still inserted but the eb32_delete() call is
skipped, resulting in the object still being chained in the memory area
while the block has been reclaimed and used for something else. Then when
objects which were chained below it (techically any object since zero is
at the root) are deleted, the walk through the upper object may encounter
corrupted values where valid pointers were expected.
But while this should only happen statically once on 4 billion, the problem
gets worse when the cache-use conditions don't match the cache-store ones,
because cache-store runs with an uninitialized key, which can create objects
that will never be found by the lookup code, or worse, entries with a zero
key preventing eviction of the tree node and resulting in a crash. It's easy
to accidently end up on such a config because the request rules generally
can't be used to decide on the response :
http-request cache-use cache if { path_beg /images }
http-response cache-store cache
In this test, mixing traffic with /images/$RANDOM and /foo/$RANDOM will
result in random keys being inserted, some of them possibly being zero,
and crashes will quickly happen.
The fix consists in 1) always initializing the transaction's cache_hash
to zero, and 2) never storing a response for which the hash has not been
calculated, as indicated by the value zero.
It is worth noting that objects hashing as value zero will never be cached,
but given that there's only one chance among 4 billion that this happens,
this is totally harmless.
This fix must be backported to 1.9 and 1.8.
We need to check if any compression filter precedes the cache filter. This is
only possible when the compression is configured in the frontend while the cache
filter is configured on the backend (via a cache-store action or
explicitly). This case cannot be detected during HAProxy startup. So in such
cases, the cache is disabled.
The patch must be backported to 1.9.
It is only true for HTX streams. The legacy code relies on ci_putblk() which is
already aware of the reserve. It is mandatory to not fill the reserve to let
other filters analysing data. It is especially true for the compression
filter. It needs at least 20 bytes of free space, plus at most 5 bytes per 32kB
block. So if the cache fully fills the channel's buffer, the compression will
not have enough space to do its job and it will block the data forwarding,
waiting for more free space. But if the buffer fully filled with input data (ie
no outgoing data), the stream will be frozen infinitely.
This patch must be backported to 1.9. It depends on the following patches:
* BUG/MEDIUM: cache/htx: Respect the reserve when cached objects are served
from the cache
* MINOR: channel/htx: Add HTX version for some helper functions
When a chunked object is served from the cache, If the trailers are not pushed
in the channel's buffer in one time, we still have to count them in the total
written bytes in the buffer.
This patch must be backported to 1.9.
This bug exists in the HTX code and in the legacy one. When the body length is
unknown, the applet hangs. For the legacy code, it hangs because the end of the
cached object is not correctly handled and the applet is never recalled. For the
HTX code, only the begining of the response (the 1st buffer) is sent then the
applet hangs. To work in HTX, The fast forwarding must be correctly handled.
This patch must be backported to 1.9.
[cf: the patch adding the function channel_add_input must be backported with
this one. It does not exist in 1.8 because only responses with a C-L are cached.]
As long-time changes have accumulated over time, the exported functions
of the stream-interface were almost all prefixed "si_<something>" while
most private ones (mostly callbacks) were called "stream_int_<something>".
There were still a few confusing exceptions, which were addressed to
follow this shcme :
- stream_sock_read0(), only used internally, was renamed stream_int_read0()
and made static
- stream_int_notify() is only private and was made static
- stream_int_{check_timeouts,report_error,retnclose,register_handler,update}
were renamed si_<something>.
Now it is clearer when checking one of these if it risks to be used outside
or not.
The cache runs in an applet, so it delivers data into the input side
of the channel's buffer. Thus it must also abort feeding the buffer
as soon as CF_SHUTR is present, not just CF_SHUTW*, since these last
ones may only appear later. There doesn't seem to be an observable
side effect of this bug, the fix probably doesn't even need to be
backported.
The HTX-specific cache code uses HTX_CACHE_* states which overlap with
the legacy HTTP states. A typo in the error handling made the state
become HTTP_CACHE_END, which equals 3 and is the value for HTX_CACHE_EOD,
which explains why we were seeing a transition to trailers and memory
corruption.
no backport needed.
Caching the response with the compression enabled was totally broken. To fix the
problem, the compression must be done after caching the response. Otherwise it
needs to change the cache to store compressed and uncompressed objects for the
same ressource. So, because it is not possible for now, it is forbidden to
declare the compression filter before the cache one. To ease the configuration,
both can be implicitly declared (without "filter" keyword). The compression will
automatically be inserted after the cache.
Then, to make it works this way, the compression filter has been slighly
modified. Now, the response headers are updated after http-response rules
evaluations, instead of before. So, if the response contains a "Content-length"
header, it will be kept with the response stored in the cache. So this cached
response will be able to be served to clients not supporting the compression at
all.
The cconf variable was not initialized before the two first possible
error exits before being freed, resulting in random crashes instead
of displaying an error message if the cache ID was missing from the
filter declaration.
No backport is needed, this is exclusively 1.9.
All the HTX definition is self-contained and doesn't really depend on
anything external since it's a mostly protocol. In addition, some
external similar files (like h2) also placed in common used to rely
on it, making it a bit awkward.
This patch moves the two htx.h files into a single self-contained one.
The historical dependency on sample.h could be also removed since it
used to be there only for http_meth_t which is now in http.h.
As for the compression filter, the cache filter must be explicitly declared
(using the filter keyword) if other filters than cache are used. It is mandatory
to explicitly define the filters order.
Documentation has been updated accordingly.
This is only true for HTX proxies. On legacy HTTP proxy, if the compression and
the cache are both enabled, an error during HAProxy startup is triggered.
With the HTX, now you can use both in any order. If the compression is defined
before the cache, then the responses will be stored compressed. If the
compression is defined after the cache, then the responses will be stored
uncompressed. So in the last case, when a response is served from the cache, it
will compressed too like any response.
To do so, a dedicated configuration has been added on cache filters. Before the
cache filter configuration pointed directly to the cache it used. Now, it is the
dedicated structure cache_flt_conf. Store and use rules also point to this
structure. It is linked to the cache the filter must used. It also contains a
flags field. This will allow us to define the behavior of a cache filter when a
response is stored in the cache or delivered from it.
And now, Store and use rules uses a common parsing function. So if it does not
already exists, a filter is always created for both kind of rules. The cache
filters configuration is checked using their check callback. In the postparser
function, we only check the caches configuration. This removes the loop on all
proxies in the postparser function.
The cache is now able to store and resend HTX messages. When an HTX message is
stored in the cache, the headers are prefixed with their block's info (an
uint32_t), containing its type and its length. Data, on their side, are stored
without any prefix. Only the value is copied in the cache. 2 fields have been
added in the structure cache_entry, hdrs_len and data_len, to known the size, in
the cache, of the headers part and the data part. If the message is chunked, the
trailers are also copied, the same way as data. When the HTX message is
recreated in the cache applet, the trailers size is known removing the headers
length and the data lenght from the total object length.
Instead of calling register_data_filter() when the stream analyze starts, we now
call it when we are sure the response is cacheable. It is done in the
http_headers callback, just before the body analyzis, and only if the headers
was already been cached. And during the body analyzis, if an error occurred or
if the response is too big, we unregistered the cache immediatly.
This patch may be backported in 1.8. It is not a bug but a significant
improvement.
It is not possible to mix the format of messages stored in a cache. So we reject
the configurations with a cache used by an HTX proxy and a legacy HTTP proxy in
same time.
This commit replaces the explicit pool creation that are made in
constructors with a pool registration. Not only this simplifies the
pools declaration (it can be done on a single line after the head is
declared), but it also removes references to pools from within
constructors. The only remaining create_pool() calls are those
performed in init functions after the config is parsed, so there
is no more user of potentially uninitialized pool now.
It has been the opportunity to remove no less than 12 constructors
and 6 init functions.
This switches explicit calls to various trivial registration methods for
keywords, muxes or protocols from constructors to INITCALL1 at stage
STG_REGISTER. All these calls have in common to consume a single pointer
and return void. Doing this removes 26 constructors. The following calls
were addressed :
- acl_register_keywords
- bind_register_keywords
- cfg_register_keywords
- cli_register_kw
- flt_register_keywords
- http_req_keywords_register
- http_res_keywords_register
- protocol_register
- register_mux_proto
- sample_register_convs
- sample_register_fetches
- srv_register_keywords
- tcp_req_conn_keywords_register
- tcp_req_cont_keywords_register
- tcp_req_sess_keywords_register
- tcp_res_cont_keywords_register
- flt_register_keywords
Remaining calls to si_cant_put() were all for lack of room and were
turned to si_rx_room_blk(). A few places where SI_FL_RXBLK_ROOM was
cleared by hand were converted to si_rx_room_rdy().
The now unused si_cant_put() function was removed.
A number of calls to si_cant_put() were used in fact to request being
called back once a buffer is available. These ones are not needed anymore
since si_alloc_ibuf() already sets the SI_FL_RXBLK_BUFF flag when called
in appctx context. Those called with a foreign stream-int are simply turned
to si_rx_buff_blk().
Building on 32 bit gives this :
src/cache.c: In function 'http_action_store_cache':
src/cache.c:466:4: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c:467:5: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c: In function 'cache_channel_append_age_header':
src/cache.c:578:2: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c:579:3: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
It's because of the definition below added in commit e7a770c ("MINOR:
cache: Add "Age" header.") :
#define CACHE_ENTRY_MAX_AGE 2147483648
Just appending "U" to mark it unsigned is enough to fix it. This only
affects 1.9, no backport is needed.
It doesn't make sense to limit this code to applets, as any stream
interface can use it. Let's rename it by simply dropping the "applet_"
part of the name. No other change was made except updating the comments.
This patch makes the cache capable of adding an "Age" header as defined by
rfc7234.
During the storage of new HTTP objects we memorize ->eoh value and
the value of the "Age" header coming from the origin server.
These information may then be reused to return the cached HTTP objects
with a new "Age" header.
May be backported to 1.8.
With this patch we avoid parsing "max-object-size" with atoi() and we store its
value as an unsigned int to prevent bad implicit conversion issues especially
when we compare it with others unsigned value (content length).
With this patch we check that shctx_init() does not returns 0.
This is possible if the maxblocks argument, which is passed as an
int, is negative due to an implicit conversion.
Must be backported to 1.8.
With this patch we support cache size larger than 2047 (MB) and prevent haproxy from crashing when "total-max-size" is parsed as negative values by atoi().
The limit at parsing time is 4095 MB (UINT_MAX >> 20).
May be backported to 1.8.
This patch adds "max-object-size" option to the cache to limit
the size in bytes of the HTTP objects to be cached. When not provided,
the maximum size of an HTTP object is a 256th of the cache size.
This patch makes the capable of storing HTTP objects larger than a buffer.
It makes usage of the "block by block shared object allocation" new shctx API.
A new pointer to struct shared_block has been added to the cache applet
context to memorize the next block to be used by the HTTP cache I/O handler
http_cache_io_handler() to emit the data. Another member, named "sent" memorize
the number of bytes already sent by this handler. So, to send an object from cache,
http_cache_io_handler() must be called until "sent" counter reaches the size
of this object.