OPTIM: lb-leastconn: do not unlink the server if it did not change

Due to the two-phase server reservation, there are 3 calls to
fwlc_srv_reposition() per request, one during assign_server() to reserve
the slot, one in connect_server() to commit it, and one in process_stream()
to release it. However, only one of the first two will change the key, so
it is needlessly costly to take the lock, remove the server and insert it
again at the same place when we can already tell that the lock did not
even need to be taken.

Furthermore, even when the server needs to move, there can be quite some
contention on the lbprm lock forcing the thread to wait. During this time
the served and nbpend server values might have changed, just like the
lb_node.key itself. Thus we measure the values again under the lock
before updating the tree. Measurements have shown that under contention
with 16 servers and 16 threads, 50% of the updates can be avoided there.

This patch makes the function compute the new key and compare it to
the current one before deciding to move the entry (and does it again
under the lock for the second test).
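
For reference, the key mentioned above is the one computed in the hunk
below from the server's in-flight load; a quick illustration (the example
numbers are made up):

    /* with served=3 and nbpend=1, inflight = 4; if cur_eweight equals
     * SRV_EWGHT_MAX, new_key = (4 + 1) * SRV_EWGHT_MAX / SRV_EWGHT_MAX = 5.
     * An idle server (inflight == 0) keeps key 0.
     */
    unsigned int inflight = _HA_ATOMIC_LOAD(&s->served) + _HA_ATOMIC_LOAD(&s->nbpend);
    unsigned int new_key  = inflight ? (inflight + 1) * SRV_EWGHT_MAX / s->cur_eweight : 0;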

This removes between 40 and 50% of the tree updates depending on the
thread contention and the number of servers. The performance gain due
to this (on 16 threads) was:

   16 servers:  415 krps -> 440 krps (+6%, contention on lbprm)
    4 servers:  554 krps -> 714 krps (+29%, little contention)

One point worth thinking about is that it's not logical to update the
tree 2-3 times per request while it's only read once. Half to 2/3 of
these updates are not needed. An experiment consisting in subscribing
the servers to a list and letting the readers reinsert them on the fly
showed further degradation instead of an improvement.

A better approach would probably consist in avoiding writes to shared
cache lines by having the leastconn nodes distinct from the servers,
with one node per value, and having them hold an mt-list with all the
servers having that number of connections. The connection count tree
would then be read-mostly instead of facing heavy writes, and most
write operations would be performed on 1-3 list heads which are way
cheaper to migrate than a tree node, and do not require updating the
last two updated neighbors' cache lines.
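
A rough, hypothetical sketch of that layout (the type and field names are
invented here and are not part of this patch):

    /* one read-mostly tree node per distinct connection count, holding an
     * mt-list of all servers currently at that count; moving a server then
     * mostly means relinking a list element instead of a tree node.
     */
    struct fwlc_count_node {
            struct eb32_node node;   /* key derived from the connection count */
            struct mt_list servers;  /* servers sharing this count */
    };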

commit 5064ab6a98 (parent 85b2fb0358)
Author: Willy Tarreau
Date:   2021-02-17 16:26:55 +01:00

1 changed file with 21 additions and 2 deletions

@@ -72,10 +72,29 @@ static inline void fwlc_queue_srv(struct server *s, unsigned int eweight)
  */
 static void fwlc_srv_reposition(struct server *s, int locked)
 {
+	unsigned int inflight = _HA_ATOMIC_LOAD(&s->served) + _HA_ATOMIC_LOAD(&s->nbpend);
+	unsigned int new_key = inflight ? (inflight + 1) * SRV_EWGHT_MAX / s->cur_eweight : 0;
+
+	/* some calls will be made for no change (e.g connect_server() after
+	 * assign_server(). Let's check that first.
+	 */
+	if (s->lb_node.node.leaf_p && s->lb_node.key == new_key)
+		return;
+
 	HA_RWLOCK_WRLOCK(LBPRM_LOCK, &s->proxy->lbprm.lock);
 	if (s->lb_tree) {
-		fwlc_dequeue_srv(s);
-		fwlc_queue_srv(s, s->cur_eweight);
+		/* we might have been waiting for a while on the lock above
+		 * so it's worth testing again because other threads are very
+		 * likely to have released a connection or taken one leading
+		 * to our target value (50% of the case in measurements).
+		 */
+		inflight = _HA_ATOMIC_LOAD(&s->served) + _HA_ATOMIC_LOAD(&s->nbpend);
+		new_key = inflight ? (inflight + 1) * SRV_EWGHT_MAX / s->cur_eweight : 0;
+		if (!s->lb_node.node.leaf_p || s->lb_node.key != new_key) {
+			eb32_delete(&s->lb_node);
+			s->lb_node.key = new_key;
+			eb32_insert(s->lb_tree, &s->lb_node);
+		}
 	}
 	HA_RWLOCK_WRUNLOCK(LBPRM_LOCK, &s->proxy->lbprm.lock);
 }