Commit Graph

277 Commits

Author SHA1 Message Date
Aurelien DARRAGON
0a47e6bccc MEDIUM: stktable/cli: simplify entry key handling
Make use of smp_to_stkey() in table_process_entry_per_key() to simplify
key handling and leverage auto type conversions from sample API.

One noticeable side effect is that integer input checks will be relaxed,
given that the c_str2int() sample converter is more permissive than the
integer parser previously built into table_process_entry_per_key().
2023-11-08 16:38:06 +01:00
Aurelien DARRAGON
c6826b9570 BUG/MINOR: stick-table/cli: Check for invalid ipv4 key
When an ipv4 key is used to filter a CLI command on a stick table
(clear/set/show table ...), the inetaddr_host()+htonl() combination was
used with no error checking.

Instead, we now use inet_pton(), which is what we use for ipv6 addresses
since b7c962b0c0 ("BUG/MINOR: stick-table/cli: Check for invalid ipv6 key").

Doing this allows us to easily check for parsing errors: we're trading off
some parsing efficiency to better catch input errors and ensure we get
similar behavior between ipv4 and ipv6 address handling.
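
A minimal sketch of the kind of check this enables, using the standard
inet_pton() call (illustrative, not the exact HAProxy code):

  | #include <arpa/inet.h>
  |
  | /* Returns 1 and fills <addr> if <key> is a valid dotted-quad ipv4
  |  * address, 0 otherwise. The former inetaddr_host()+htonl() path
  |  * offered no such error reporting.
  |  */
  | static int parse_ipv4_key(const char *key, struct in_addr *addr)
  | {
  |         return inet_pton(AF_INET, key, addr) == 1;
  | }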

This patch may be backported to all supported versions.
2023-11-08 16:38:06 +01:00
Aurelien DARRAGON
5158c0ff69 MEDIUM: stktable/peers: "write-to" local table on peer updates
In this patch, we add the possibility to declare on a table definition
("table" in a peers section, or "stick-table" in a proxy section) that we
want the remote/peer updates on that table to be pushed to a local
haproxy table in addition to the source table.

Consider this example:

  |peers mypeers
  |        peer local 127.0.0.1:3334
  |        peer clust 127.0.0.1:3333
  |        table t1.local type string size 10m store server_id,server_key expire 30s
  |        table t1.clust type string size 10m store server_id,server_key write-to mypeers/t1.local expire 30s

With this setup, we consider that haproxy uses t1.local as the cache/local
table for read and write operations, and that t1.clust is a remote table
containing data processed from t1.local and from similar tables on other
haproxy peers in a cluster setup. The t1.clust table will be used to
refresh the local/cache one via the "write-to" statement.

Every time haproxy sees an entry update on the t1.clust table, it will
overwrite the matching t1.local entry with the fresh data and update that
entry's expiration timer. If the t1.local entry doesn't exist yet (the key
is not found), it will automatically be created. Note that only types that
cannot be used for arithmetic ops will be handled, in order to prevent
processed values from the remote table from interfering with computations
based on values from the local table (e.g. to prevent cumulative counters
from growing indefinitely).

"write-to" will only push supported types if they both exist in the source
and the target table. Be careful with server_id and server_key storage
because they are often declared implicitly when referencing a table in
sticking rules but it is required to declare them explicitly for them to
be pushed between a remote and a local table through "write-to" option.

Also note that the "write-to" target table should have the same type as
the source one, and that the key length should be strictly equal,
otherwise haproxy will raise an error due to the tables being
incompatible. A table that is already being written to cannot be used
as a source table for a "write-to" target.

Thanks to this patch, it will now be possible to use sticking rules in a
peer cluster context by using a local table as a local cache which will
be automatically refreshed by one or more remote tables.

This commit depends on:
 - "MINOR: stktable: stktable_init() sets err_msg on error"
 - "MINOR: stktable: check if a type should be used as-is"
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
db0cb54f81 MINOR: stktable: check if a type should be used as-is
stick table types now have an extra bit named 'as_is' that allows us to
check whether such a type should be used as-is or whether it may be
involved in arithmetic operations such as counters. This can be useful
since those types are not common and may require specific handling.

e.g.: stktable_data_types[data_type].as_is will be set to 1 if the type
cannot be used in arithmetic operations.
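
A simplified sketch of how such a flag may be consulted (field names follow
the commit message, the surrounding structure is illustrative only):

  | /* Illustrative type descriptor: one entry per stick-table data type. */
  | struct data_type_desc {
  |         const char *name;
  |         unsigned int as_is:1;   /* 1: opaque value, no arithmetic allowed */
  | };
  |
  | /* Only data types flagged "as_is" are copied verbatim from a remote
  |  * table; counters and rates are skipped so that local computations
  |  * are not corrupted.
  |  */
  | static int may_copy_as_is(const struct data_type_desc *types, int data_type)
  | {
  |         return types[data_type].as_is;
  | }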
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
b8c19f877a MINOR: stktable: stktable_init() sets err_msg on error
stktable_init() now sets err_msg when an error occurs so that the caller
is able to precisely report the cause of the failure.
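
The general shape of this error-reporting pattern, as a generic sketch
using asprintf() (HAProxy has its own helpers for this):

  | #define _GNU_SOURCE
  | #include <stdio.h>
  | #include <stddef.h>
  |
  | struct table_sketch {
  |         const char *id;
  |         unsigned int size;
  | };
  |
  | /* Returns 0 on success. On failure, *err_msg receives an allocated,
  |  * human-readable cause that the caller can log and then free().
  |  */
  | static int table_init_sketch(struct table_sketch *t, char **err_msg)
  | {
  |         *err_msg = NULL;
  |         if (!t->size) {
  |                 if (asprintf(err_msg, "table '%s': size must not be zero",
  |                              t->id) < 0)
  |                         *err_msg = NULL;
  |                 return 1;
  |         }
  |         return 0;
  | }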
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
6376fe9142 BUG/MINOR: stktable: missing free in parse_stick_table()
When "peers" keyword is encountered within a stick table definition,
peers.name hint gets replaced with a new copy of the provided name using
strdup(). However, there is no detection on whether the name was
previously set or not, so it is currently allowed to reuse the keyword
multiple time to overwrite previous value, but here we forgot to free
previous value for peers.name before assigning it to a new one.
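
The fix follows the usual pattern when a keyword may legitimately be
repeated (generic sketch):

  | #include <stdlib.h>
  | #include <string.h>
  |
  | /* The keyword may appear several times: release any previously
  |  * stored name before keeping the new copy.
  |  */
  | static int set_peers_name(char **stored_name, const char *new_name)
  | {
  |         free(*stored_name);             /* no-op when never set (NULL) */
  |         *stored_name = strdup(new_name);
  |         return *stored_name != NULL;
  | }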

This should be backported to every stable version.
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
7eb05891d8 BUG/MINOR: stktable: allow sc-add-gpc from tcp-request connection
Following the previous commit's logic, we enable the use of sc-add-gpc
from tcp-request connection: this was probably forgotten in the first
place for sc-set-gpt0, and since sc-add-gpc was inspired by it, it
inherited the same omission.

As sc-add-gpc was implemented in 5a72d03a58 ("MINOR: stick-table: implement
the sc-add-gpc() action"), this should only be backported to 2.8.
2023-08-14 09:03:49 +02:00
Aurelien DARRAGON
6c79309fda BUG/MINOR: stktable: allow sc-set-gpt(0) from tcp-request connection
Both the documentation and original developer intents seem to suggest
that sc-set-gpt/sc-set-gpt0 actions should be available from
tcp-request connection.

Yet because it was probably forgotten when expr support was added to
sc-set-gpt0 in 0d7712dff0 ("MINOR: stick-table: allow sc-set-gpt0 to
set value from an expression") it doesn't work and will report this
kind of errors:
 "internal error, unexpected rule->from=0, please report this bug!"

Fixing the code to comply with the documentation and the expected
behavior.

This must be backported to every stable version.

[for < 2.5, as only sc-set-gpt0 existed back then, the patch must be
manually applied to skip irrelevant parts]
2023-08-14 09:03:44 +02:00
Willy Tarreau
cfeca3a3a3 MEDIUM: stick-table: touch updates under an upgradable read lock
Instead of taking the update's write lock in stktable_touch_with_exp(),
while most of the time under high load there is nothing to update because
the entry was touched again before having been synchronized and is thus
already present, let's do the check under a read lock and upgrade it to
perform the update if needed. These updates are rare and the contention
is not expected to be very high, so at the first failure to upgrade we
retry directly with a write lock.

By doing so the performance has almost doubled again, from 1140k to 2050k
with a peers section enabled. The contention is now on taking the read
lock itself, so there's little to be gained beyond this in this function.
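
As a rough illustration of the check-then-upgrade pattern with plain POSIX
rwlocks (pthreads have no in-place upgrade, so this sketch simply retries
with a write lock, which is also the commit's fallback path):

  | #include <pthread.h>
  |
  | /* Illustrative entry: already_queued stands for "nothing to update". */
  | struct upd_entry {
  |         int already_queued;
  | };
  |
  | static pthread_rwlock_t updt_lock = PTHREAD_RWLOCK_INITIALIZER;
  |
  | static void touch_entry(struct upd_entry *e)
  | {
  |         pthread_rwlock_rdlock(&updt_lock);
  |         if (e->already_queued) {
  |                 /* common case under load: nothing to do */
  |                 pthread_rwlock_unlock(&updt_lock);
  |                 return;
  |         }
  |         pthread_rwlock_unlock(&updt_lock);
  |
  |         /* rare case: redo the check and update under the write lock,
  |          * since the state may have changed in between.
  |          */
  |         pthread_rwlock_wrlock(&updt_lock);
  |         if (!e->already_queued)
  |                 e->already_queued = 1;  /* stands for the real requeue work */
  |         pthread_rwlock_unlock(&updt_lock);
  | }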
2023-08-11 19:03:35 +02:00
Willy Tarreau
87e072eea5 MEDIUM: stick-table: use a distinct lock for the updates tree
Updating an entry in the updates tree is currently performed under the
table's write lock, which causes huge contention with other accesses
such as lookups and frees. Aside from the updates tree and the update,
localupdate and commitupdate variables, nothing else is manipulated, so
let's create a distinct lock (updt_lock) to protect these together and
remove this contention. This required adding an extra lock in the few
places where we delete the update (though only if we're really going to
delete it) in order to protect the tree. This is very convenient because
now peer_send_teachmsgs() only needs to take this read lock, and there
is very little contention left on the stick-table.

With this alone, the performance jumped from 614k to 1140k/s on an
80-thread machine with a peers section! Stick-table updates with no
peers however now have to take two locks and slightly regressed from
4.0-4.1M/s to 3.9-4.0M/s. This is fairly minimal compared to the
significant unlocking of the peers updates and is considered totally
acceptable.
2023-08-11 19:03:35 +02:00
Willy Tarreau
7968fe3889 MEDIUM: stick-table: change the ref_cnt atomically
Due to the ts->ref_cnt being manipulated and checked inside wrlocks,
we continue to have it updated under plenty of read locks, which have
an important cost on many-thread machines.

This patch turns them all to atomic ops and carefully moves them outside
of locks every time this is possible:
  - the ref_cnt is incremented before write-unlocking on creation otherwise
    the element could vanish before we can do it
  - the ref_cnt is decremented after write-locking on release
  - for all other cases it's updated out of locks since it's guaranteed by
    the sequence that it cannot vanish
  - checks are done before locking every time it's used to decide
    whether we're going to release the element (saves several write locks)
  - expiration tests are just done using atomic loads, since there's no
    particular ordering constraint there, we just want consistent values.

For Lua, the loop that is used to dump stick-tables could switch to read
locks only, but this was not done.

For peers, the loop that builds updates in peer_send_teachmsgs is extremely
expensive in write locks and it doesn't seem this is really needed since
the only updated variables are last_pushed and commitupdate, the first
one being on the shared table (thus not used by other threads) and the
commitupdate could likely be changed using a CAS. Thus all of this could
theoretically move under a read lock, but that was not done here.

On an 80-thread machine with a peers section enabled, the request rate
increased from 415k to 520k rps.
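
For illustration only, the same rules expressed with C11 atomics (HAProxy
uses its own atomic wrappers, so this is a sketch of the idea, not the
actual code):

  | #include <stdatomic.h>
  |
  | struct sess_sketch {
  |         atomic_uint ref_cnt;
  | };
  |
  | /* Creation path: the reference is taken before the write lock is
  |  * released, so the element cannot vanish in between.
  |  */
  | static void take_ref(struct sess_sketch *ts)
  | {
  |         atomic_fetch_add_explicit(&ts->ref_cnt, 1, memory_order_relaxed);
  | }
  |
  | /* Elsewhere the refcount is updated atomically, out of the locks,
  |  * whenever the sequence guarantees the element cannot vanish.
  |  */
  | static unsigned int drop_ref(struct sess_sketch *ts)
  | {
  |         return atomic_fetch_sub_explicit(&ts->ref_cnt, 1,
  |                                          memory_order_release) - 1;
  | }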
2023-08-11 19:03:35 +02:00
Willy Tarreau
73b1dea4d1 MINOR: stick-table: move the task_wakeup() call outside of the lock
The write lock in stktable_touch_with_exp() is quite expensive and should
be shortened as much as possible. There's no need for it when calling
task_wakeup() so let's move it out.

On an 80-thread machine with a peers section, the request rate increased
from 397k to 415k rps.
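
The transformation is the classic "shrink the critical section" move,
sketched here with a plain mutex and stand-in callbacks (task_wakeup()
itself is not shown):

  | #include <pthread.h>
  |
  | static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
  |
  | /* Before: lock(); update_expiry(); wake_task(); unlock();
  |  * After: the wakeup does not touch protected state, so it is moved
  |  * out of the locked region.
  |  */
  | static void touch_and_wake(void (*update_expiry)(void),
  |                            void (*wake_task)(void))
  | {
  |         pthread_mutex_lock(&table_lock);
  |         update_expiry();                 /* needs the lock */
  |         pthread_mutex_unlock(&table_lock);
  |         wake_task();                     /* safe outside the lock */
  | }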
2023-08-11 19:03:35 +02:00
Willy Tarreau
322e4ab9d2 MINOR: stick-table: move the task_queue() call outside of the lock
The write lock in stktable_requeue_exp() is quite expensive and should
be shortened as much as possible. There's no need for it when calling
task_queue() so let's move it out.

On a 80-thread machine with a peers section, the request rate increased
from 368k to 397k rps.
2023-08-11 19:03:35 +02:00
Christopher Faulet
208c712b40 MINOR: stconn: Rename SC_FL_SHUTW in SC_FL_SHUT_DONE
Here again, it is just a flag renaming. In SC flags, there is no longer a
notion of shutdown for writes, only of shutdowns.
2023-04-14 15:01:21 +02:00
William Lallemand
3f210970bf BUG/MINOR: stick_table: alert when type len has incorrect characters
Alert when the len argument of a stick table type contains incorrect
characters.

Replace atol() with strtol().
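
A sketch of the difference in plain C: atol() silently ignores trailing
garbage, while strtol()'s end pointer lets the parser reject it:

  | #include <stdlib.h>
  |
  | /* Returns the parsed length, or -1 if <arg> is not a plain number
  |  * (e.g. "32foo", which atol() would silently have accepted as 32).
  |  */
  | static long parse_type_len(const char *arg)
  | {
  |         char *end;
  |         long len = strtol(arg, &end, 10);
  |
  |         if (end == arg || *end != '\0')
  |                 return -1;
  |         return len;
  | }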

Could be backported to every maintained version.
2023-04-13 14:46:08 +02:00
Christopher Faulet
7faac7cf34 MINOR: tree-wide: Simplifiy some tests on SHUT flags by accessing SCs directly
At many places, we simplify the tests on SHUT flags to remove calls to the
chn_prod() or chn_cons() functions because the corresponding SC is available.
2023-04-05 08:57:06 +02:00
Christopher Faulet
87633c3a11 MEDIUM: tree-wide: Move flags about shut from the channel to the SC
The purpose of this patch is only a one-to-one replacement, as far as
possible.

CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the
stream-connector. The CF_ prefix is replaced by the SC_FL_ one. Of course,
it is not so simple because at many places we were testing if a channel was
shut for reads and writes at the same time. To do the same, shut for reads
must be tested on one side on the SC and shut for writes on the other side
on the opposite SC. Special care was taken with process_stream(): the flags
of the SCs must be saved to be able to detect changes, just like for the
channels.
2023-04-05 08:57:06 +02:00
Aurelien DARRAGON
e2907c7ee3 MINOR: stick-table: add sc-add-gpc() to http-after-response
sc-add-gpc() was implemented in 5a72d03 ("MINOR: stick-table:
implement the sc-add-gpc() action").

This new action was exposed everywhere sc-inc-gpc() is available,
except for http-after-response.
But there doesn't seem to be a technical constraint that prevents us from
exposing it in http-after-response.
It was probably overlooked, let's add it.

No backport needed, unless 5a72d03 ("MINOR: stick-table: implement the
sc-add-gpc() action") is being backported.
2023-03-17 13:09:09 +01:00
Aleksey Ponomaryov
593802128c BUG/MEDIUM: stick-table: do not leave entries in end of window during purge
At some moments expired stick table records stop being removed. This
happens when the internal time wraps around the 32-bit limit, or every
49.7 days. What precisely happens is that some elements that are collected
close to the end of the time window (2^32 - table's "expire" setting)
might have been updated and will be requeued further, at the beginning
of the next window. Here, three bad situations happen:

  - the incorrect integer-based comparison that is not aware of wrapping
    will cause the scan to restart from the freshly requeued element,
    skipping all those at the end of the window. The net effect of this
    is that at each wakeup of the expiration task, only one element from
    the end of the window will be expired, and other ones will remain
    there for a very long time, especially if they have to wait for all
    the predecessors to be picked one at a time after slow wakeups due
    to a long expiration ; this is what was observed in issue #2034
    making the table fill up and appear as not expiring at all, and it
    seems that issue #2024 reports the same problem at the same moment
    (since such issues happen for everyone roughly at the same time
    when the clock doesn't drift too much).

  - the elements that were placed at the beginning of the next window
    are skipped as well for as long as there are refreshed entries at
    the end of the previous window, so these entries contribute to filling
    the table as well. This is caused by the restart from the current,
    updated node, which is generally placed after most other less recently
    updated elements.

  - once the last element at the end of the window is picked, suddenly
    there is a large amount of expired entries at the beginning of the
    next window that all have to be requeued. If the expiration delay
    is large, the number can be big and it can take a long time, which
    can very likely explain the periodic crashes reported in issue #2025.
    Limiting the batch size as done in commit dfe79251d ("BUG/MEDIUM:
    stick-table: limit the time spent purging old entries") would make
    sense for process_table_expire() as well.

This patch addresses the incorrect tree scan algorithm to make sure that:
  - there's always a next element to compare against, even when dealing
    with the last one in the tree, the first one must be used ;

  - time comparisons used to decide whether to restart from the current
    element use tick_is_lt() as it is the only case where we know the
    current element will be placed before any other one (since the tree
    respects insertion ordering for duplicates)

In order to reproduce the issue, it was found that injecting traffic on
a random key that spans over half of the size of a table whose expiration
is set to 15s while the date is going to wrap in 20s does exhibit an
increase of the table's size 5s after startup, when entries start to be
pushed to the next window. It's more effective when a second load
generator constantly hammers a same key to be certain that none of them
is ready to expire. This doesn't happen anymore after this patch.
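
The key ingredient is a wrap-aware comparison of 32-bit ticks. Conceptually
(an illustrative helper; see the real tick_is_lt() in HAProxy's ticks.h for
the actual implementation):

  | #include <stdint.h>
  |
  | /* Wrap-aware "t1 is strictly before t2" for 32-bit millisecond ticks:
  |  * the signed difference stays meaningful as long as the two dates are
  |  * less than 2^31 ms (~24.8 days) apart, which covers any sane "expire"
  |  * setting. A plain "t1 < t2" breaks every ~49.7 days when the internal
  |  * clock wraps.
  |  */
  | static inline int tick_is_before(uint32_t t1, uint32_t t2)
  | {
  |         return (int32_t)(t1 - t2) < 0;
  | }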

This fix needs to be backported to all stable versions. The bug has been
there ever since stick tables were introduced in 1.4-dev7 with
commit 3bd697e07 ("[MEDIUM] Add stick table (persistence) management
functions and types"). A cleanup could consist in deduplicating that
code by having process_table_expire() call __stktable_trash_oldest(),
with that one improved to support an optional time check.
2023-02-08 08:55:02 +01:00
Christopher Faulet
da89e9b95b MINOR: channel/applets: Stop to test CF_WRITE_ERROR flag if CF_SHUTW is enough
In applets, we stop processing when a write error (CF_WRITE_ERROR) or a shutdown
for writes (CF_SHUTW) is detected. However, any write error leads to an
immediate shutdown for writes. Thus, it is enough to only test if CF_SHUTW is
set.
2023-01-09 18:41:08 +01:00
Willy Tarreau
5a72d03a58 MINOR: stick-table: implement the sc-add-gpc() action
This action increments the General Purpose Counter at the index <idx> of
the array associated to the sticky counter designated by <sc-id> by the
value of either integer <int> or the integer evaluation of expression
<expr>. Integers and expressions are limited to unsigned 32-bit values.
If an error occurs, this action silently fails and the actions evaluation
continues. <idx> is an integer between 0 and 99 and <sc-id> is an integer
between 0 and 2. It also silently fails if there is no GPC stored at
this index. The entry in the table is refreshed even if the value is zero.
The 'gpc_rate' is automatically adjusted to reflect the average growth
rate of the gpc value.

The main use of this action is to count scores or total volumes (e.g.
estimated danger per source IP reported by the server or a WAF, total
uploaded bytes, etc).
2023-01-07 09:11:22 +01:00
Willy Tarreau
6c0117168e MEDIUM: stick-table: set the track-sc limit at boottime via tune.stick-counters
The number of stick-counter entries usable by track-sc rules is currently
set at build time. There is no good value for this since the vast majority
of users don't need any, most need only a few and rare users need more.
Adding more counters for everyone increases memory and CPU usage for no
reason.

This patch moves the per-session and per-stream arrays to a pool of a size
defined at boot time. This way it becomes possible to set the number of
entries at boot time via a new global setting "tune.stick-counters" that
sets the limit for the whole process. When not set, the MAX_SESS_STR_CTR
value still applies, or 3 if not set, as before.

It is also possible to lower the value to 0 to save a bit of memory if
not used at all.

Note that a few low-level sample-fetch functions had to be protected due
to the ability to use sample-fetches in the global section to set some
variables.
2023-01-06 18:08:49 +01:00
Christopher Faulet
a92480462c MINOR: http-rules: Add missing actions in http-after-response ruleset
This patch adds support for the following actions in the
http-after-response ruleset:

  * set-map, del-map and del-acl
  * set-log-level
  * sc-inc-gpc, sc-inc-gpc0 and sc-inc-gpc1
  * sc-set-gpt and sc-set-gpt0

This patch should solve issue #1980.
2023-01-05 11:23:59 +01:00
Willy Tarreau
20391519c3 BUG/MINOR: stick-table: report the correct action name in error message
sc-inc-gpc() learned to use arrays in 2.5 with commit 4d7ada8f9 ("MEDIUM:
stick-table: add the new arrays of gpc and gpc_rate"), but the error
message says "sc-set-gpc" instead of "sc-inc-gpc". Let's fix this to
avoid confusion.

This can be backported to 2.5.
2023-01-02 17:35:50 +01:00
Willy Tarreau
d5cae6a0c7 MINOR: stick-table: change the API of the function used to calculate the shard
The function used to calculate the shard number currently requires a
stktable_key as input. Unfortunately, it happens that peers currently
miss this calculation and do not provide a stktable_key at all; instead
they're open-coding all the low-level stick-table work (hence why it's
missing). Thus we'll need to be able to calculate the shard number for
keys coming from peers as well, but the current API does not make it
possible.

This commit addresses this by inverting the order in which the length and
the shard number are used. Now the low-level function is independent of
stksess and stktable_key: it takes a table, a pointer and a length and does
all the work. The upper function takes care of the type and key to get its
length, and is for use only from stick-table code.

This doesn't change anything except that the low-level one will be usable
from outside (hence why it's exported now).
2022-11-29 18:06:42 +01:00
Willy Tarreau
e548a7af45 BUG/MINOR: peers: always initialize the stksess shard value
We need to initialize the shard value in __stksess_init() because there is
not necessarily a key to make it happen later, resulting in an uninitialized
shard value appearing in the entry, typically when entries are learned from
peers. This fixes commit 36d156564 ("MINOR: peers: Support for peer shards"),
no backport is needed.

Note however that it is not sufficient to completely fix the peers code:
the shard value remains zero because the setting of the key was open-coded
in the peers code and these parts were not identified when adding support
for shards.
2022-11-29 16:33:37 +01:00
Willy Tarreau
16b282f4b0 MINOR: stick-table: show the shard number in each entry's "show table" output
Stick-tables support sharding to multiple peers but there was no way to
know to what shard an entry was going to be sent. Let's display this in
the "show table" output to ease debugging.
2022-11-29 12:00:49 +01:00
Willy Tarreau
56460ee52a MINOR: stick-table: store a per-table hash seed and use it
Instead of using memcpy() to concatenate the table's name to the key when
allocating an stksess, let's compute once and for all a per-table seed at
boot time and use it to calculate the key's hash. This saves two memcpy()
calls and the usage of a chunk, which is always nice in a fast path.

When tested under extreme conditions with an 80-byte long table name, it
showed a 1% performance increase.
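
Schematically, using the public xxHash API (a sketch of the idea, not the
exact HAProxy code; function names are illustrative):

  | #include <xxhash.h>
  |
  | /* Boot time: derive one 64-bit seed per table from its name. */
  | static XXH64_hash_t table_seed(const char *name, size_t name_len)
  | {
  |         return XXH64(name, name_len, 0);
  | }
  |
  | /* Fast path: hash only the key; the seed already encodes the table,
  |  * so no concatenation buffer or memcpy() is needed anymore.
  |  */
  | static XXH64_hash_t key_hash(const void *key, size_t key_len,
  |                              XXH64_hash_t seed)
  | {
  |         return XXH64(key, key_len, seed);
  | }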
2022-11-28 18:58:06 +01:00
Willy Tarreau
a4728584ff BUILD: stick-tables: fix build breakage in xxhash on older compilers
Commit 36d156564 ("MINOR: peers: Support for peer shards") reintroduced
a direct dependency on import/xxhash.h which was previously dropped by
commit d5fc8fcb8 ("CLEANUP: Add haproxy/xxhash.h to avoid modifying
import/xxhash.h"). This results in xxhash.h being included twice, which
breaks the build on older compilers which do not like seeing XXH32_hash_t
being defined twice.

Let's just use include/haproxy/xxhash.h instead.

No backport is needed.
2022-11-24 07:38:13 +01:00
Willy Tarreau
fbb934da90 BUG/MEDIUM: stick-table: fix a race condition when updating the expiration task
Pierre Cheynier reported a rare crash that can affect stick-tables. When
an entry is created, the stick-table's expiration date is updated. But if
at exactly the same time the expiration task runs, it finishes by updating
its expiration timer without any protection, which may collide with the
call to task_queue() in another thread. In this case, it sometimes happens
that the first test for TICK_ETERNITY in task_queue() passes, then the
"expire" field is reset, then the BUG_ON() triggers, like below:

  FATAL: bug condition "task->expire == 0" matched at src/task.c:279
    call trace(13):
    |       0x649d86 [c6 04 25 01 00 00 00 00]: __task_queue+0xc6/0xce
    |       0x596bef [eb 90 ba 03 00 00 00 be]: stktable_requeue_exp+0x1ef/0x258
    |       0x596c87 [48 83 bb 90 00 00 00 00]: stktable_touch_with_exp+0x27/0x312
    |       0x563698 [48 8b 4c 24 18 4c 8b 4c]: stream_process_counters+0x3a8/0x6a2
    |       0x569344 [49 8b 87 f8 00 00 00 48]: process_stream+0x3964/0x3b4f
    |       0x64a80b [49 89 c7 e9 23 ff ff ff]: run_tasks_from_lists+0x3ab/0x566
    |       0x64ad66 [29 44 24 14 8b 7c 24 14]: process_runnable_tasks+0x396/0x71e
    |       0x6184b2 [83 3d 47 b3 a6 00 01 0f]: run_poll_loop+0x92/0x4ff
    |       0x618acf [48 8b 1d aa 20 7d 00 48]: main+0x1877ef
    | 0x7fc7d6ec1e45 [64 48 89 04 25 30 06 00]: libpthread:+0x7e45
    | 0x7fc7d6c9e4af [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a

This one is extremely difficult to reproduce in practice, but adding a
printf() in process_table_expire() before assigning the value, while
running with an expire delay of 1ms helps a lot and may trigger the
crash in less than one minute on an 8-thread machine. Interestingly,
depending on the sequencing, this bug could also have made a table fail
to expire if the expire field got reset after the last update but before
the call to task_queue(). That would require being quite unlucky though,
since the table would have to never be touched again after the race.

The solution taken by this patch is to take the table's lock when
updating its expire value in stktable_requeue_exp(), enclosing the call
to task_queue(), and to update the task->expire field while still under
the lock in process_table_expire(). Note that thanks to previous changes,
taking the table's lock for the update in stktable_requeue_exp() costs
almost nothing since we now have the guarantee that this is not done more
than 1000 times a second.

Since process_table_expire() sets the timeout after returning from
stktable_trash_expired() which just released the lock, the two functions
were merged so that the task's expire field is updated while still under
the lock. Note that this heavily depends on the two previous patches
below:

  CLEANUP: stick-table: remove the unused table->exp_next
  OPTIM: stick-table: avoid atomic ops in stktable_requeue_exp() when possible

This is a bit complicated due to the fact that in 2.7 some parts were
made lockless. In 2.6 and older, the second part (the merge of the
two functions) will be sufficient since the task_queue() call was
already performed under the table's lock, and the patches above are
not needed.

This needs to be backported as far as 1.8, scrupulously following the
instructions above.
2022-11-14 18:20:38 +01:00
Willy Tarreau
3238f79d12 OPTIM: stick-table: avoid atomic ops in stktable_requeue_exp() when possible
Since the task's time resolution is the millisecond, we know there
will not be more than 1000 useful updates per second, so there's no
point in doing a CAS and a task_queue() for each call; better to first
check whether we're going to change the date.
perform such operations more than 1000 times a second for a given
table.

The loop was modified because this improvement will also be used to fix a
bug later.
2022-11-14 18:20:38 +01:00
Willy Tarreau
6342714052 CLEANUP: stick-table: remove the unused table->exp_next
The ->exp_next field of the stick-table was probably useful in 1.5 but
it currently only carries a copy of what the future value of the table's
task's expire value will be, and it is systematically copied over there
immediately after being assigned. As such it acts exactly like a local
variable. Let's remove it, as it costs atomic operations.
2022-11-14 18:20:38 +01:00
Frédéric Lécaille
36d1565640 MINOR: peers: Support for peer shards
Add "shards" new keyword for "peers" section to configure the number
of peer shards attached to such secions. This impact all the stick-tables
attached to the section.
Add "shard" new "server" parameter to configure the peers which participate to
all the stick-tables contents distribution. Each peer receive the stick-tables updates
only for keys with this shard value as distribution hash. The "shard" value
is stored in ->shard new server struct member.
cfg_parse_peers() which is the function which is called to parse all
the lines of a "peers" section is modified to parse the "shards" parameter
stored in ->nb_shards new peers struct member.
Add srv_parse_shard() new callback into server.c to pare the "shard"
parameter.
Implement stksess_getkey_hash() to compute the distribution hash for a
stick-table key as the 64-bits xxhash of the key concatenated to the stick-table
name. This function is called by stksess_setkey_shard(), itself
called by the already implemented function which create a new stick-table
key (stksess_new()).
Add ->idlen new stktable struct member to store the stick-table name length
to not have to compute it each time a stick-table key hash is computed.
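
The distribution idea, schematically (illustrative code; the exact mapping
from hash to shard used by HAProxy may differ):

  | #include <stddef.h>
  | #include <xxhash.h>
  |
  | /* Hash the key together with the table name and map the result onto
  |  * one of <nb_shards> shards; only the peer(s) owning that shard will
  |  * receive updates for this key.
  |  */
  | static unsigned int key_shard(const char *table, size_t table_len,
  |                               const void *key, size_t key_len,
  |                               unsigned int nb_shards)
  | {
  |         XXH64_state_t *st = XXH64_createState();
  |         XXH64_hash_t h;
  |
  |         if (!st)
  |                 return 0;
  |         XXH64_reset(st, 0);
  |         XXH64_update(st, key, key_len);        /* the key first ... */
  |         XXH64_update(st, table, table_len);    /* ... then the table name */
  |         h = XXH64_digest(st);
  |         XXH64_freeState(st);
  |         return (unsigned int)(h % nb_shards);
  | }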
2022-10-24 10:55:53 +02:00
Amaury Denoyelle
3e0648837c BUG/MINOR: stick-table: fix build with DEBUG_THREAD
Compilation is broken with DEBUG_THREAD since the following patch
  76642223f0
  MEDIUM: stick-table: switch the table lock to rwlock

Fix this by updating a legacy HA_SPIN_INIT() to HA_RWLOCK_INIT().

No backport needed unless the mentioned patch is backported.
2022-10-12 16:54:59 +02:00
Willy Tarreau
cbdb528a76 MEDIUM: stick-table: requeue the wakeup task out of the write lock
We don't need to call stktable_requeue_exp() with the table's lock
held anymore, so let's move it out. It should slightly reduce the
contention on the write lock, though it is now already quite low.
2022-10-12 14:19:05 +02:00
Willy Tarreau
dbae89e09c MEDIUM: stick-table: always use atomic ops to requeue the table's task
We're generalizing the change performed in the previous commit "MEDIUM:
stick-table: requeue the expiration task out of the exclusive lock"
to stktable_requeue_exp() so that it can also be used by callers of
__stktable_store(). At the moment there's still no visible change
since it's still called under the write lock. However, the previous
code in stktable_touch_with_exp() was updated to use this function.
2022-10-12 14:19:05 +02:00
Willy Tarreau
eb23e3e243 MINOR: stick-table: split stktable_store() between key and requeue
__stktable_store() performs two distinct things: one is to insert a key
and the other one is to requeue the task's expiration date. Since the
latter might be done without a lock, let's first split the function in
two halves. For now this has no impact.
2022-10-12 14:19:05 +02:00
Willy Tarreau
e3f5ae895a MEDIUM: stick-table: requeue the expiration task out of the exclusive lock
With 48 threads, a heavily loaded table with plenty of trackers and
rules and a short expiration timer of 10ms saturates the CPU at 232k
rps. By carefully using atomic ops we can make sure that t->exp_next
and t->task->expire converge to the earliest next expiration date and
that all of this can be performed under atomic ops without any lock.
That's what this patch is doing in stktable_touch_with_exp(). This is
sufficient to double the performance and reach 470k rps.
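
The convergence idea, sketched with C11 atomics and a wrap-aware tick
comparison (illustrative only, not the actual HAProxy code):

  | #include <stdatomic.h>
  | #include <stdint.h>
  |
  | /* Lower <*expire> towards <new_exp> without taking a lock: stop as soon
  |  * as the current value is already earlier (wrap-aware signed comparison
  |  * on 32-bit ticks), otherwise CAS and retry with the refreshed value.
  |  */
  | static void converge_to_earliest(_Atomic uint32_t *expire, uint32_t new_exp)
  | {
  |         uint32_t old = atomic_load_explicit(expire, memory_order_relaxed);
  |
  |         while ((int32_t)(new_exp - old) < 0) {
  |                 if (atomic_compare_exchange_weak_explicit(expire, &old,
  |                                                           new_exp,
  |                                                           memory_order_relaxed,
  |                                                           memory_order_relaxed))
  |                         break;
  |                 /* <old> was refreshed by the failed CAS; loop re-checks */
  |         }
  | }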

It's worth noting that __stktable_store() uses a mix of eb32_insert()
and task_queue, and that the second part of it could possibly benefit
from this, even though sometimes it's called under a lock that was
already held.
2022-10-12 14:19:05 +02:00
Willy Tarreau
e62885237c MEDIUM: stick-table: make stktable_set_entry() look up under a read lock
On a 24-core machine having some "stick-store response" rules, a lot of
time is spent in the write lock in stktable_set_entry(). Let's apply the
same mechanism as for the stktable_get_entry() consisting in looking up
the value under the read lock and upgrading it to a write lock only to
perform modifications. Here we even have the luxury of upgrading the
lock since there are no alloc/free in the path. All this increases the
performance by 40% (from 363k to 510k rps).
2022-10-12 14:19:05 +02:00
Willy Tarreau
996f1a5124 MEDIUM: stick-table: do not take a lock to update t->current anymore.
We don't need to be protected by the table's lock when touching t->current
if we do it using atomics, and that's great because it allows us to have
a cleaner stksess_new() that doesn't require a lock either, and to avoid
manipulating pools under a lock.

That's another 1% performance gain from 2.07 to 2.10M req/s under 48
threads.
2022-10-12 14:19:05 +02:00
Willy Tarreau
47f229702e MEDIUM: stick-table: make stktable_get_entry() look up under a read lock
On a 24-core machine doing lots of track-sc, it was found that the lock
in stktable_get_entry() was responsible for 25% of the CPU alone. It's
sad because most of its job is to protect the table during the lookup.

Here we're taking a slightly different approach: the lock is first taken
for reads during the lookup, and only in case of failure we switch it for
a write lock. We don't even perform an upgrade here since an allocation
is needed between the two; it would be wasted to do it under the lock,
and is generally not a good idea, so it's better to release the read lock
and try again.
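
The shape of this lookup-then-allocate-then-insert-or-reuse flow, with a
toy table (a plain linked list stands in for the real tree, and all names
are illustrative):

  | #include <pthread.h>
  | #include <stdlib.h>
  |
  | struct entry {
  |         int key;
  |         struct entry *next;
  | };
  |
  | static struct entry *head;
  | static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
  |
  | static struct entry *lookup(int key)
  | {
  |         struct entry *e;
  |
  |         for (e = head; e; e = e->next)
  |                 if (e->key == key)
  |                         return e;
  |         return NULL;
  | }
  |
  | /* Look up under the read lock; on a miss, allocate outside any lock,
  |  * then insert under the write lock, re-checking for a concurrent
  |  * insertion and dropping our copy if we lost the race.
  |  */
  | static struct entry *get_entry(int key)
  | {
  |         struct entry *e, *news;
  |
  |         pthread_rwlock_rdlock(&table_lock);
  |         e = lookup(key);
  |         pthread_rwlock_unlock(&table_lock);
  |         if (e)
  |                 return e;
  |
  |         news = malloc(sizeof(*news));
  |         if (!news)
  |                 return NULL;
  |         news->key = key;
  |
  |         pthread_rwlock_wrlock(&table_lock);
  |         e = lookup(key);
  |         if (!e) {
  |                 news->next = head;
  |                 head = news;
  |                 e = news;
  |         }
  |         pthread_rwlock_unlock(&table_lock);
  |
  |         if (e != news)
  |                 free(news);
  |         return e;
  | }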

Here the performance under 48 threads with 3 trackers on the same table
jumped from 455k to 2.07M, or 4.55x! Note that the same approach should
be possible for stktable_set_entry().
2022-10-12 14:19:05 +02:00
Willy Tarreau
a7d6a1396e MEDIUM: stick-table: switch to rdlock in stktable_lookup() and lookup_key()
These functions do not modify anything in the table except the refcount
on success. Let's just lock the table for shared accesses and make use of
atomic ops to update the refcount. This brings a nice gain from 425k to
455k under 48 threads (7%), but some contention remains on the exclusive
locks in other parts.

Note that the refcount continues to be updated under the lock because it's
not yet certain whether there are races between it and some of the exclusive
lock on the table. The difference is marginal and we prefer to stay on the
safe side for now.
2022-10-12 14:19:05 +02:00
Willy Tarreau
175aa06232 MEDIUM: stick-table: free newly allocated stkess if it couldn't be inserted
In __stktable_get_entry() now we're planning for the possibility that the
call to __stktable_store() doesn't add the newly allocated entry and instead
finds a previously inserted one. At the moment this doesn't exist because
the lookup + insert passes are made under the same lock. But it will soon
change.
2022-10-12 14:19:05 +02:00
Willy Tarreau
d2d3fd9b5e MEDIUM: stick-table: return inserted entry in __stktable_store()
This function is used to create an entry in the table. But it doesn't
consider the possibility that the entry already exists, because right
now it's only called in situations where it was verified under a lock
that it doesn't exist. Since we'll soon need to break that assumption
we need it to verify that the requested entry was added and to return
a pointer to the one in the tree so that the caller can detect any
possible conflict. At the moment this is not used.
2022-10-12 14:19:05 +02:00
Willy Tarreau
8d3c3336f9 MEDIUM: stick-table: make stksess_kill_if_expired() avoid the exclusive lock
stream_store_counters() calls stksess_kill_if_expired() for each active
counter. And this one takes an exclusive lock on the table before
checking if it has any work to do (hint: it almost never has since it
only wants to delete expired entries). However a lock is still needed for
now to protect the ref_cnt, but we can do it atomically under the read
lock.

Let's change the mechanism. Now what we do is to check out of the lock
if the entry is expired. If it is, we take the write lock, expire it,
and decrement the refcount. Otherwise we just decrement the refcount
under a read lock. With this change alone, the config based on 3
trackers without the previous patches saw a 2.6x improvement, but here
it doesn't yet change anything because some heavy contention remains
on the lookup part.
2022-10-12 14:19:05 +02:00
Willy Tarreau
a7536ef9e1 MEDIUM: stick-table: only take the lock when needed in stktable_touch_with_exp()
As previously mentioned, this function currently holds an exclusive lock
on the table during all the time it takes to check if the entry needs to
be updated and synchronized with peers. The reality is that many setups
do not use peers and that on highly loaded setups, the same entries are
hammered all the time so the key's expiration doesn't change between a
number of consecutive accesses.

With this patch we take a different approach. The function starts
without taking the lock, and will take it only if needed, keeping track
of it. This way we can avoid it most of the time, or even entirely.
Finally if the decrefcnt argument requires that the refcount is
decremented, we either do it using a non-atomic op if the table was
locked (since no other thread may touch it) or via an atomic under the
read lock only.

With this change alone, a 48-thread test with 3 trackers increased
from 193k req/s to 425k req/s, which is a 2.2x factor.
2022-10-12 14:19:05 +02:00
Willy Tarreau
9f5cb435b6 MINOR: stick-table: move the write lock inside stktable_touch_with_exp()
Taking the write lock prior to entering that function is a problem
because this function is full of conditions that most of the time can
lead to eliminating the lock.

This commit first moves the write lock inside the function and passes
the extra argument required to implement stktable_touch_remote() and
stktable_touch_local(). It also renames the function to remove the
underscores since there's no other variant and it's exported under
this name (probably an old rename that was not propagated). The code
was stressed under 48 threads using 3 trackers on the same table. It
already shows a tiny 3% improvement from 187k to 193k rps.
2022-10-12 14:19:05 +02:00
Willy Tarreau
4be073b99b MINOR: stick-table: do not take an exclusive lock when downing ref_cnt
At plenty of places we decrement ts->ref_cnt under the write lock
because it's held. We don't technically need it to be done that way
if there's contention and an atomic could suffice. However until all
places are turned to atomic, we at least need to do that under a
read lock for now, so that we don't mix atomic and non-atomic uses.
Regardless it already brings ~1.5% req rate improvement with 3 trackers
on the same table under 48 threads at 184k->187k rps.
2022-10-12 14:19:05 +02:00
Willy Tarreau
76642223f0 MEDIUM: stick-table: switch the table lock to rwlock
Right now a spinlock is used, but most accesses are for reads, so let's
switch the lock to an rwlock and switch all accesses to exclusive locks
for now. There should be no visible difference at this point.
2022-10-12 14:19:05 +02:00
Frédéric Lécaille
bbeec37b31 MINOR: stick-table: Add table_expire() and table_idle() new converters
table_expire() returns the expiration delay for a stick-table entry
associated with an input sample. Its counterpart table_idle() returns the
time the entry has remained idle since the last time it was updated.
Both converters may take a default value as a second argument, which is
returned when the entry is not present.
2022-08-17 10:52:15 +02:00