From 3b8f9b7b880397db03bfa54b26c8b6dbf6b5dc5e Mon Sep 17 00:00:00 2001 From: Willy Tarreau Date: Thu, 9 Jul 2020 05:01:27 +0200 Subject: [PATCH] BUG/MEDIUM: lists: add missing store barrier in MT_LIST_ADD/MT_LIST_ADDQ The torture test run for previous commit 787dc20 ("BUG/MEDIUM: lists: add missing store barrier on MT_LIST_BEHEAD()") finally broke again after 34M connections. It appeared that MT_LIST_ADD and MT_LIST_ADDQ were suffering from the same missing barrier when restoring the original pointers before giving up, when checking if the element was already added. This is indeed something which seldom happens with the shared scheduler, in case two threads simultaneously try to wake up the same tasklet. With a store barrier there after reverting the pointers, the torture test survived 750M connections on the NanoPI-Fire3, so it looks good this time. Probably that MT_LIST_BEHEAD should be added to test-list.c since it seems to be more sensitive to concurrent accesses with MT_LIST_ADDQ. It's worth noting that there is no barrier between the last two pointers update, while there is one in MT_LIST_POP and MT_LIST_BEHEAD, the latter having shown to be needed, but I cannot demonstrate why we would need one here. Given that the code seems solid here, let's stick to what is shown to work. This fix should be backported to 2.1, just for the sake of safety since the issue couldn't be triggered there, but it could change with the compiler or when backporting a fix for example. --- include/haproxy/list.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/haproxy/list.h b/include/haproxy/list.h index 1e45b63a0..9d376494e 100644 --- a/include/haproxy/list.h +++ b/include/haproxy/list.h @@ -244,6 +244,7 @@ if ((el)->next != (el) || (el)->prev != (el)) { \ (n)->prev = p; \ (lh)->next = n; \ + __ha_barrier_store(); \ break; \ } \ (el)->next = n; \ @@ -283,6 +284,7 @@ if ((el)->next != (el) || (el)->prev != (el)) { \ p->next = n; \ (lh)->prev = p; \ + __ha_barrier_store(); \ break; \ } \ (el)->next = n; \