mirror of
http://git.haproxy.org/git/haproxy.git/
synced 2025-01-18 11:40:50 +00:00
BUG/MINOR: checks: postpone the startup of health checks by the boot time
When health checks are started at boot, now_ms could be off by the boot time. In general it's not even noticeable, but with very large configs taking up to one or even a few seconds to start, this can result in a part of the servers' checks being scheduled slightly in the past. As such all of them will start grouped, partially defeating the purpose of the spread-checks setting. For example, this can cause a burst of connections for the network, or an excess of CPU usage during SSL handshakes, possibly even causing some timeouts to expire early.

Here in order to compensate for this, we simply add the known boot time to the computed delay when scheduling the startup of checks. That's very simple and particularly efficient. For example, a config with 5k servers in 800 backends checked every 5 seconds, which was taking 3.8 seconds to start, used to show this distribution of health checks despite the spread-checks 50:

   3690 08:59:25
    417 08:59:26
    213 08:59:27
     71 08:59:28
    428 08:59:29
    860 08:59:30
    918 08:59:31
    938 08:59:32
   1124 08:59:33
    904 08:59:34
    647 08:59:35
    890 08:59:36
    973 08:59:37
    856 08:59:38
    893 08:59:39
    154 08:59:40

Now with the fix it shows this:

    470 08:59:59
    929 09:00:00
    896 09:00:01
    937 09:00:02
    854 09:00:03
    827 09:00:04
    906 09:00:05
    863 09:00:06
    913 09:00:07
    873 09:00:08
    162 09:00:09

This should be backported to all supported versions. It depends on this commit:

    MINOR: clock: measure the total boot time

For 2.8, where the internal clock is now totally independent of the human one, a more generic fix will consist in simply updating now_ms to reflect the startup time.
parent
5723b382ed
commit
8e978a094d
@@ -1475,6 +1475,7 @@ int start_check_task(struct check *check, int mininter,
 			    int nbcheck, int srvpos)
 {
 	struct task *t;
+	ulong boottime = tv_ms_remain(&start_date, &ready_date);
 
 	/* task for the check. Process-based checks exclusively run on thread 1. */
 	if (check->type == PR_O2_EXT_CHK)
@@ -1504,7 +1505,7 @@ int start_check_task(struct check *check, int mininter,
 		mininter = global.max_spread_checks;
 
 	/* check this every ms */
-	t->expire = tick_add(now_ms, MS_TO_TICKS(mininter * srvpos / nbcheck));
+	t->expire = tick_add(now_ms, MS_TO_TICKS(boottime + mininter * srvpos / nbcheck));
 	check->start = now_ns;
 	task_queue(t);
 