haproxy/include
Aurelien DARRAGON 3ffbf3896d BUG/MEDIUM: httpclient/lua: fix a race between lua GC and hlua_ctx_destroy
In bb581423b ("BUG/MEDIUM: httpclient/lua: crash when the lua task timeout
before the httpclient"), a new logic was implemented to make sure that
when a lua ctx destroyed, related httpclients are correctly destroyed too
to prevent a such httpclients from being resuscitated on a destroyed lua ctx.

This was implemented by adding a list of httpclients within the lua ctx,
and a new function, hlua_httpclient_destroy_all(), that is called under
hlua_ctx_destroy() and runs through the httpclients list in the lua context
to properly terminate them.

This was done with the assumption that no concurrent Lua garbage collection
cycles could occur on the same ressources, which seems OK since the "lua"
context is about to be freed and is not explicitly being used by other threads.

But when 'lua-load' is used, the main lua stack is shared between multiple
OS threads, which means that all lua ctx in the process are linked to the
same parent stack.
Yet it seems that lua GC, which can be triggered automatically under
lua_resume() or manually through lua_gc(), does not limit itself to the
"coroutine" stack (the stack referenced in lua->T) when performing the cleanup,
but is able to perform some cleanup on the main stack plus coroutines stacks
that were created under the same main stack (via lua_newthread()) as well.

This can be explained by the fact that lua_newthread() coroutines are not meant
to be thread-safe by design.
Source: http://lua-users.org/lists/lua-l/2011-07/msg00072.html (lua co-author)

It did not cause other issues so far because most of the time when using
'lua-load', the global lua lock is taken when performing critical operations
that are known to interfere with the main stack.
But here in hlua_httpclient_destroy_all(), we don't run under the global lock.

Now that we properly understand the issue, the fix is pretty trivial:

We could simply guard the hlua_httpclient_destroy_all() under the global
lua lock, this would work but it could increase the contention over the
global lock.

Instead, we switched 'lua->hc_list' which was introduced with bb581423b
from simple list to mt_list so that concurrent accesses between
hlua_httpclient_destroy_all and hlua_httpclient_gc() are properly handled.

The issue was reported by @Mark11122 on Github #2037.

This must be backported with bb581423b ("BUG/MEDIUM: httpclient/lua: crash
when the lua task timeout before the httpclient") as far as 2.5.
2023-02-22 11:44:22 +01:00
..
haproxy BUG/MEDIUM: httpclient/lua: fix a race between lua GC and hlua_ctx_destroy 2023-02-22 11:44:22 +01:00
import CLEANUP: assorted typo fixes in the code and comments 2022-11-30 14:02:36 +01:00
make BUILD: makefile: only consider settings from enabled options 2022-12-23 17:01:55 +01:00