mirror of
http://git.haproxy.org/git/haproxy.git/
synced 2025-02-04 12:32:23 +00:00
Revert "OPTIM: queue: don't call pendconn_unlink() when the pendconn is not queued"
This reverts commitb7ba1d9011
. Actually this test had already been removed in the past by commitfac0f645d
("BUG/MEDIUM: queue: make pendconn_cond_unlink() really thread-safe"), but the condition to reproduce the bug mentioned there was not clear. Now after analysis and a certain dose of code cleanup, things start to appear more obvious. what happens is that if we check the presence of the node in the tree without taking the lock, we can see the NULL at the instant the node is being unlinked by another thread in pendconn_process_next_strm() as part of __pendconn_unlink_prx() or __pendconn_unlink_srv(). Till now there is no issue except that the pendconn is not removed from the queue during this operation and that the task is scheduled to be woken up by pendconn_process_next_strm() with the stream being added to the list of the server's active connections by __stream_add_srv_conn(). The first thread finishes faster and gets back to stream_free() faster than the second one sets the srv_conn on the stream, so stream_free() skips the s->srv_conn test and doesn't try to dequeue the freshly queued entry. At the very least a barrier would be needed there but we can't afford to free the stream while it's being queued. So there's no other solution than making sure that either __pendconn_unlink_prx() or pendconn_cond_unlink() get the entry but never both, which is why the lock is required around the test. A possible solution would be to set p->target before unlinking the entry and using it to complete the test. This would leave no dead period where the pendconn is not seen as attached. It is possible, yet extremely difficult, to reproduce this bug, which was first noticed in bug #880. Running 100 servers with maxconn 1 and maxqueue 1 on leastconn and a connect timeout of 30ms under 16 threads with DEBUG_UAF, with a traffic making the backend's queue oscillate around zero (typically using 250 connections with a local httpterm server) may rarely manage to trigger a use-after-free. No backport is needed.
This commit is contained in:
parent
8ae8c48eb0
commit
670119955b
@ -43,16 +43,17 @@ void pendconn_unlink(struct pendconn *p);
|
||||
/* Removes the pendconn from the server/proxy queue. It supports being called
|
||||
* with NULL for pendconn and with a pendconn not in the list. It is the
|
||||
* function to be used by default when unsure. Do not call it with server
|
||||
* or proxy locks held however.
|
||||
*
|
||||
* The unlocked check on leaf_p is safe because one must own the pendconn to
|
||||
* add it, thus it may only switch from non-null to null, which allows to first
|
||||
* test if it's worth taking the lock to go further. This test is performed
|
||||
* again under a lock in pendconn_unlink() to get the final verdict.
|
||||
* or proxy locks held however. Warning: this is called from stream_free()
|
||||
* which may run concurrently with pendconn_process_next_strm() which can be
|
||||
* dequeing the entry. The function must not return until the pendconn is
|
||||
* guaranteed not to be known, which means that we must check its presence
|
||||
* in the tree under the queue's lock so that penconn_process_next_strm()
|
||||
* finishes before we return in case it would have grabbed this pendconn. See
|
||||
* github bugs #880 and #908, and the commit log for this fix for more details.
|
||||
*/
|
||||
static inline void pendconn_cond_unlink(struct pendconn *p)
|
||||
{
|
||||
if (p && p->node.node.leaf_p)
|
||||
if (p)
|
||||
pendconn_unlink(p);
|
||||
}
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user