From 5064ab6a982a82c275393220716b75c13789de89 Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Wed, 17 Feb 2021 16:26:55 +0100
Subject: [PATCH] OPTIM: lb-leastconn: do not unlink the server if it did not change

Due to the two-phase server reservation, there are 3 calls to
fwlc_srv_reposition() per request: one during assign_server() to
reserve the slot, one in connect_server() to commit it, and one in
process_stream() to release it. However, only one of the first two
will change the key, so it's needlessly costly to take the lock,
remove a server and insert it again at the same place when we can
already figure out that we ought not to even have taken the lock.

Furthermore, even when the server needs to move, there can be quite
some contention on the lbprm lock forcing the thread to wait. During
this time the server's served and nbpend values might have changed,
just like the lb_node.key itself. Thus we measure the values again
under the lock before updating the tree. Measurements have shown that
under contention with 16 servers and 16 threads, 50% of the updates
can be avoided there.

This patch makes the function compute the new key and compare it to
the current one before deciding to move the entry (and it does so
again under the lock for the second test). This removes between 40
and 50% of the tree updates depending on the thread contention and
the number of servers. The performance gain due to this (on 16
threads) was:

   16 servers: 415 krps -> 440 krps (+6%, contention on lbprm)
    4 servers: 554 krps -> 714 krps (+29%, little contention)

One point worth thinking about is that it's not logical to update the
tree 2-3 times per request while it's only read once. Half to 2/3 of
these updates are not needed. An experiment consisting of subscribing
the servers to a list and letting the readers reinsert them on the fly
showed further degradation instead of an improvement.

A better approach would probably consist of avoiding writes to shared
cache lines by keeping the leastconn nodes distinct from the servers,
with one node per key value, each holding an mt-list of all the
servers having that number of connections. The connection count tree
would then be read-mostly instead of facing heavy writes, and most
write operations would be performed on 1-3 list heads, which are way
cheaper to migrate than a tree node and do not require updating the
cache lines of the last two updated neighbors.
---
 src/lb_fwlc.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/src/lb_fwlc.c b/src/lb_fwlc.c
index 7bbeb621b7..4e1b7307f9 100644
--- a/src/lb_fwlc.c
+++ b/src/lb_fwlc.c
@@ -72,10 +72,29 @@ static inline void fwlc_queue_srv(struct server *s, unsigned int eweight)
  */
 static void fwlc_srv_reposition(struct server *s, int locked)
 {
+	unsigned int inflight = _HA_ATOMIC_LOAD(&s->served) + _HA_ATOMIC_LOAD(&s->nbpend);
+	unsigned int new_key = inflight ? (inflight + 1) * SRV_EWGHT_MAX / s->cur_eweight : 0;
+
+	/* some calls will be made for no change (e.g connect_server() after
+	 * assign_server(). Let's check that first.
+	 */
+	if (s->lb_node.node.leaf_p && s->lb_node.key == new_key)
+		return;
+
 	HA_RWLOCK_WRLOCK(LBPRM_LOCK, &s->proxy->lbprm.lock);
 	if (s->lb_tree) {
-		fwlc_dequeue_srv(s);
-		fwlc_queue_srv(s, s->cur_eweight);
+		/* we might have been waiting for a while on the lock above
+		 * so it's worth testing again because other threads are very
+		 * likely to have released a connection or taken one leading
+		 * to our target value (50% of the case in measurements).
+		 */
+		inflight = _HA_ATOMIC_LOAD(&s->served) + _HA_ATOMIC_LOAD(&s->nbpend);
+		new_key = inflight ? (inflight + 1) * SRV_EWGHT_MAX / s->cur_eweight : 0;
+		if (!s->lb_node.node.leaf_p || s->lb_node.key != new_key) {
+			eb32_delete(&s->lb_node);
+			s->lb_node.key = new_key;
+			eb32_insert(s->lb_tree, &s->lb_node);
+		}
 	}
 	HA_RWLOCK_WRUNLOCK(LBPRM_LOCK, &s->proxy->lbprm.lock);
 }
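
Editor's note, not part of the patch: below is a minimal, self-contained sketch of
the same "compute the key, skip the lock when nothing changed, re-measure under the
lock" pattern described above, written with C11 atomics and a pthread rwlock instead
of HAProxy's _HA_ATOMIC_LOAD(), HA_RWLOCK_*() and ebtree primitives. The struct entry
type, the tree_remove()/tree_insert() helpers and the KEY_SCALE constant are
hypothetical stand-ins, not HAProxy APIs.

/* Sketch of the double-checked reposition pattern (assumptions noted above). */
#include <pthread.h>
#include <stdatomic.h>

struct entry {
	unsigned int key;      /* current position key in the shared structure */
	int          linked;   /* non-zero when present in that structure      */
	atomic_uint  served;   /* connections currently being served           */
	atomic_uint  nbpend;   /* connections pending in the queue             */
	unsigned int eweight;  /* effective weight, assumed > 0                */
};

#define KEY_SCALE 1000u        /* hypothetical stand-in for SRV_EWGHT_MAX */

static pthread_rwlock_t tree_lock = PTHREAD_RWLOCK_INITIALIZER;

/* hypothetical stand-ins for eb32_delete()/eb32_insert(); left empty here,
 * a real implementation would unlink/insert the entry in a structure
 * ordered by ->key.
 */
static void tree_remove(struct entry *e) { (void)e; }
static void tree_insert(struct entry *e) { (void)e; }

static unsigned int compute_key(struct entry *e)
{
	unsigned int inflight = atomic_load(&e->served) + atomic_load(&e->nbpend);

	return inflight ? (inflight + 1) * KEY_SCALE / e->eweight : 0;
}

void reposition(struct entry *e)
{
	unsigned int new_key = compute_key(e);

	/* cheap lockless test: already linked at the right place, nothing to do */
	if (e->linked && e->key == new_key)
		return;

	pthread_rwlock_wrlock(&tree_lock);

	/* we may have waited a while on the lock; other threads may have moved
	 * the counters back to our current key in the meantime, so measure
	 * again before touching the shared structure.
	 */
	new_key = compute_key(e);
	if (!e->linked || e->key != new_key) {
		if (e->linked)
			tree_remove(e);
		e->key = new_key;
		tree_insert(e);
		e->linked = 1;
	}

	pthread_rwlock_unlock(&tree_lock);
}

The second measurement is the part that pays off under contention: waiting on the
write lock is long enough for other threads to bring the counters back to the
current key, which is why roughly half of the moves can still be skipped once the
lock is held, as reported in the commit message.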