mirror of
http://git.haproxy.org/git/haproxy.git/
synced 2025-03-29 22:57:06 +00:00
BUG/MAJOR: stick-tables: fix race with peers in entry expiration
In 2.9 with commit7968fe3889
("MEDIUM: stick-table: change the ref_cnt atomically") we significantly relaxed the stick-tables locking when dealing with peers by adjusting the ref_cnt atomically and moving it out of the lock. However it opened a tiny window that became problematic in 3.0-dev7 when the table's contention was lowered by commit1a088da7c2
("MAJOR: stktable: split the keys across multiple shards to reduce contention"). What happens is that some peers may access the entry for reading at the moment it's about to expire, and while the read accesses to push the data remain unnoticed (possibly that from time to time we push crap), but the releasing of the refcount causes a new write that may damage anything else. The scenario is the following: process_table_expire() peer_send_teachmsgs() RDLOCK(&updt_lock); tick_is_expired() != 0 ebmb_delete(ts->key); if (ts->upd.node.leaf_p) { HA_ATOMIC_INC(&ts->ref_cnt); RDUNLOCK(&updt_lock); WRLOCK(&updt_lock); eb32_delete(&ts->upd); } __stksess_free(t, ts); peer_send_updatemsg(ts); RDLOCK(&updt_lock); HA_ATOMIC_DEC(&ts->ref_cnt); Here it's clear that the bottom part of peer_send_teachmsgs() believes to be protected but may act on freed data. This is more visible when enabling -dMtag,no-merge,integrity because the ATOMIC_DEC(&ref_cnt) decrements one byte in the area, that makes the eviction check fail while the tag has the address of the left __stksess_free(), proving a completed pool_free() before the decrement, and the anomaly there is pretty visible in the crash dump. Changing INC()/DEC() with ADD(2)/DEC(2) shows that the byte is now off by two, confirming that the operation happened there. The solution is not very hard, it consists in checking for the ref_cnt on the left after grabbing the lock, and doing both before deleting the element, so that we have the guarantee that either the peer will not take it or that it has already started taking it. This was proven to be sufficient, as instead of crashing after 3s of injection with 4 peers, 16 threads and 130k RPS, it survived for 15mn. In order to stress the setup, a config involving 4+ peers, tracking HTTP request with randoms and applying a bwlim-out filter with a random key, with a client made of 160 h2 conns downloading 10 streams of 4MB objects in parallel managed to trigger it within a few seconds: frontend ft http-request track-sc0 rand(100000) table tbl filter bwlim-out lim-out limit 2047m key rand(100000000),ipmask(32) min-size 1 table tbl http-request set-bandwidth-limit lim-out use_backend bk backend bk server s1 198.18.0.30:8000 server s2 198.18.0.34:8000 backend tbl stick-table type ip size 1000k expire 1s store http_req_cnt,bytes_in_rate(1s),bytes_out_rate(1s) peers peers This seems to be very dependent on the timing and setup though. This will need to be backported to 2.9. This part of the code was reindented with shards but the block should remain mostly unchanged. The logic to apply is the same.
This commit is contained in:
parent
d8c2f5c586
commit
21447b1dd4
@ -878,15 +878,27 @@ struct task *process_table_expire(struct task *task, void *context, unsigned int
|
||||
continue;
|
||||
}
|
||||
|
||||
/* session expired, trash it */
|
||||
ebmb_delete(&ts->key);
|
||||
/* if the entry is in the update list, we must be extremely careful
|
||||
* because peers can see it at any moment and start to use it. Peers
|
||||
* will take the table's updt_lock for reading when doing that, and
|
||||
* with that lock held, will grab a ref_cnt before releasing the
|
||||
* lock. So we must take this lock as well and check the ref_cnt.
|
||||
*/
|
||||
if (ts->upd.node.leaf_p) {
|
||||
if (!updt_locked) {
|
||||
updt_locked = 1;
|
||||
HA_RWLOCK_WRLOCK(STK_TABLE_LOCK, &t->updt_lock);
|
||||
}
|
||||
eb32_delete(&ts->upd);
|
||||
/* now we're locked, new peers can't grab it anymore,
|
||||
* existing ones already have the ref_cnt.
|
||||
*/
|
||||
if (HA_ATOMIC_LOAD(&ts->ref_cnt))
|
||||
continue;
|
||||
}
|
||||
|
||||
/* session expired, trash it */
|
||||
ebmb_delete(&ts->key);
|
||||
eb32_delete(&ts->upd);
|
||||
__stksess_free(t, ts);
|
||||
}
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user