mirror of http://git.haproxy.org/git/haproxy.git/
synced 2025-02-01 02:52:00 +00:00
5bd73063ab
A test on large objects revealed a big performance loss from 2.1. The cause was found to be related to cache locality between scheduled operations that are batched using tasklets. It happens that we now have several layers of tasklets and that queuing all these operations leaves time to let memory objects cool down in the CPU cache, effectively resulting in halving the performance.

A quick test consisting in putting most unknown tasklets into the BULK queue almost fixed the performance regression, but this is a wrong approach as it can also slow down some low-latency transfers or access to applets like the CLI.

What this patch does instead is to queue unknown tasklets into the same queue as the current one when tasklet_wakeup() is itself called from a task/tasklet, otherwise it uses urgent for real I/O (when sched->current is NULL). This results in the called tasklet being woken up much sooner, often at the end of the current batch of tasklets.

By doing so, a test on 2 cores 4 threads with 256 concurrent H1 conns transferring 16m objects with 256kB buffers jumped from 55 to 88 Gbps. It's even possible to go as high as 101 Gbps by evaluating the URGENT queue after the BULK one, though this was not done as considered dangerous for latency sensitive operations.

This reinforces the importance of getting back the CPU transfer mechanisms based on tasklet_wakeup_after() to work at the tasklet level by supporting an immediate wakeup in certain cases.

No backport is needed, this is strictly 2.2.
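The queue-selection rule described above can be sketched in a few lines of C. This is a simplified, hypothetical model of the decision only, not the real haproxy scheduler: the struct layout and the pick_queue() helper are invented for illustration, while the class names mirror haproxy's tasklet queue classes.

```c
#include <assert.h>
#include <stddef.h>

/* Tasklet queue classes, named after haproxy's (simplified). */
enum tl_class { TL_URGENT, TL_NORMAL, TL_BULK };

/* Hypothetical per-thread scheduler context for this sketch. */
struct sched_ctx {
    const void *current;          /* task/tasklet being executed; NULL in the I/O path */
    enum tl_class current_class;  /* class of the queue currently being processed */
};

/* Pick the class an "unknown" tasklet is queued into on wakeup:
 * reuse the caller's queue when woken from within a task/tasklet
 * (keeping the object hot in the CPU cache, often running it at the
 * end of the current batch), otherwise treat it as urgent real I/O. */
static enum tl_class pick_queue(const struct sched_ctx *sched)
{
    if (sched->current)
        return sched->current_class;
    return TL_URGENT;
}
```

A wakeup issued from the I/O path (sched->current == NULL) lands in TL_URGENT, while a wakeup issued from inside a running BULK tasklet stays in TL_BULK, which is the cache-locality behavior the patch is after.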