BUG/MEDIUM: stick-table: fix a race condition when updating the expiration task

Pierre Cheynier reported a rare crash that can affect stick-tables. When
a entry is created, the stick-table's expiration date is updated. But if
at exactly the same time the expiration task runs, it finishes by updating
its expiration timer without any protection, which may collide with the
call to task_queue() in another thread. In this case, it sometimes happens
that the first test for TICK_ETERNITY in task_queue() passes, then the
"expire" field is reset, then the BUG_ON() triggers, like below:

  FATAL: bug condition "task->expire == 0" matched at src/task.c:279
    call trace(13):
    |       0x649d86 [c6 04 25 01 00 00 00 00]: __task_queue+0xc6/0xce
    |       0x596bef [eb 90 ba 03 00 00 00 be]: stktable_requeue_exp+0x1ef/0x258
    |       0x596c87 [48 83 bb 90 00 00 00 00]: stktable_touch_with_exp+0x27/0x312
    |       0x563698 [48 8b 4c 24 18 4c 8b 4c]: stream_process_counters+0x3a8/0x6a2
    |       0x569344 [49 8b 87 f8 00 00 00 48]: process_stream+0x3964/0x3b4f
    |       0x64a80b [49 89 c7 e9 23 ff ff ff]: run_tasks_from_lists+0x3ab/0x566
    |       0x64ad66 [29 44 24 14 8b 7c 24 14]: process_runnable_tasks+0x396/0x71e
    |       0x6184b2 [83 3d 47 b3 a6 00 01 0f]: run_poll_loop+0x92/0x4ff
    |       0x618acf [48 8b 1d aa 20 7d 00 48]: main+0x1877ef
    | 0x7fc7d6ec1e45 [64 48 89 04 25 30 06 00]: libpthread:+0x7e45
    | 0x7fc7d6c9e4af [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a

This one is extremely difficult to reproduce in practice, but adding a
printf() in process_table_expire() before assigning the value, while
running with an expire delay of 1ms helps a lot and may trigger the
crash in less than one minute on a 8-thread machine. Interestingly,
depending on the sequencing, this bug could also have made a table fail
to expire if the expire field got reset after the last update but before
the call to task_queue(). It would require to be quite unlucky so that
the table is never touched anymore after the race though.

The solution taken by this patch is to take the table's lock when
updating its expire value in stktable_requeue_exp(), enclosing the call
to task_queue(), and to update the task->expire field while still under
the lock in process_table_expire(). Note that thanks to previous changes,
taking the table's lock for the update in stktable_requeue_exp() costs
almost nothing since we now have the guarantee that this is not done more
than 1000 times a second.

Since process_table_expire() sets the timeout after returning from
stktable_trash_expired() which just released the lock, the two functions
were merged so that the task's expire field is updated while still under
the lock. Note that this heavily depends on the two previous patches
below:

  CLEANUP: stick-table: remove the unused table->exp_next
  OPTIM: stick-table: avoid atomic ops in stktable_requeue_exp() when possible

This is a bit complicated due to the fact that in 2.7 some parts were
made lockless. In 2.6 and older, the second part (the merge of the
two functions) will be sufficient since the task_queue() call was
already performed under the table's lock, and the patches above are
not needed.

This needs to be backported as far as 1.8 scrupulously following
instructions above.
This commit is contained in:
Willy Tarreau 2022-11-14 18:02:44 +01:00
parent 3238f79d12
commit fbb934da90

View File

@ -556,6 +556,8 @@ void stktable_requeue_exp(struct stktable *t, const struct stksess *ts)
if (new_exp == old_exp)
return;
HA_RWLOCK_WRLOCK(STK_TABLE_LOCK, &t->lock);
while (new_exp != old_exp &&
!HA_ATOMIC_CAS(&t->exp_task->expire, &old_exp, new_exp)) {
__ha_cpu_relax();
@ -563,6 +565,8 @@ void stktable_requeue_exp(struct stktable *t, const struct stksess *ts)
}
task_queue(t->exp_task);
HA_RWLOCK_WRUNLOCK(STK_TABLE_LOCK, &t->lock);
}
/* Returns a valid or initialized stksess for the specified stktable_key in the
@ -653,11 +657,12 @@ struct stksess *stktable_set_entry(struct stktable *table, struct stksess *nts)
}
/*
* Trash expired sticky sessions from table <t>. The next expiration date is
* returned.
* Task processing function to trash expired sticky sessions. A pointer to the
* task itself is returned since it never dies.
*/
static int stktable_trash_expired(struct stktable *t)
struct task *process_table_expire(struct task *task, void *context, unsigned int state)
{
struct stktable *t = context;
struct stksess *ts;
struct eb32_node *eb;
int looped = 0;
@ -717,20 +722,10 @@ static int stktable_trash_expired(struct stktable *t)
/* We have found no task to expire in any tree */
exp_next = TICK_ETERNITY;
out_unlock:
task->expire = exp_next;
HA_RWLOCK_WRUNLOCK(STK_TABLE_LOCK, &t->lock);
return exp_next;
}
/*
* Task processing function to trash expired sticky sessions. A pointer to the
* task itself is returned since it never dies.
*/
struct task *process_table_expire(struct task *task, void *context, unsigned int state)
{
struct stktable *t = context;
task->expire = stktable_trash_expired(t);
return task;
}