This is a complementary patch to 8670db7 ("BUG/MAJOR: hlua: improper lock
usage with hlua_ctx_resume()") for hlua_filter_new().
Indeed, the HLUA_E_ERRMSG case still relies on the lua stack but didn't
take the lock to do so.
This should be backported up to 2.6.
Trying to register the same lua filter from global and per-thread context
(using 'lua-load' + 'lua-load-per-thread') causes a segmentation fault in
hlua_post_init().
This is due to a simple copy paste error as we try to print the function
name in the error message (like we do when loading the same lua function
from different contexts) instead of the filter name.
This should be backported up to 2.6.
The new "ssl-security-level" option allows one to change the OpenSSL
security level without having to change the openssl.cnf global file of
your distribution. This directives applies on every SSL_CTX context.
People sometimes change their security level directly in the ciphers
directive, however there are some cases when the security level change
is not applied in the right order (for example when applying a DH
param).
Before this patch, it was to possible to trick by using a specific
openssl.cnf file and start haproxy this way:
OPENSSL_CONF=./openssl.cnf ./haproxy -f bug-2468.cfg
Values for the security level can be found there:
https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_security_level.html
This was discussed in github issue #2468.
In issue #2448, users are complaining that FIPS is not working correctly
since the removal of SSL_library_init().
This was removed because SSL_library_init() is deprecated with OpenSSL
3.x and emits a warning. But the initialization was not needed anymore
because it is done at the first openssl API call.
However it some cases it is needed. SSL_library_init() is now a define
to OPENSSL_init_ssl(0, NULL). This patch adds OPENSSL_init_ssl(0, NULL)
to the init.
This could be backported in every stable branches, however let's wait
before backporting it.
3.0-dev1 introduced a small regression with commit b4db3be86e ("BUG/MINOR:
server: fix server_find_by_name() usage during parsing"). By changing the
way servers are indexed and moving it into the server template loop, the
first one is no longer indexed because the loop starts at low+1 since it
focuses on duplication. Let's index the first one explicitly now.
This should not be backported, unless the commit above is backported.
This was not useful and was using uninitialized value. Introduced with
the commit 08ac28237 ("MINOR: Add aes_gcm_enc converter").
Must be backported wherever the commit 08ac28237 was backported.
The issue was introduced with the commit c31499d74 ("MINOR: ssl: Add
aes_gcm_dec converter").
This must be backported to all stable branches where the above converter
is present, but it may need to be adjusted for older branches because of
code refactoring.
Where possible (FreeBSD 13+), use the public, documented interface to
the ELF auxiliary argument vector: elf_aux_info().
__elf_aux_vector is a private interface exported so that the runtime
linker can set its value during process startup and not intended for
public consumption. In FreeBSD 15 it has been removed from libc and
moved to libsys.
The previous attempt removed the TLSv1.3 version for the
"ciphersuites" keywords. However it looks like the TLSv1.2 support for
SSL_CTX_set_ciphersuites() is a bug, and can have inconsistent behavior.
This patch revert the previous attempt and add explaining about this
problem and clear examples on how to configure TLSv1.2 ciphers + TLSv1.3
ciphersuites.
Revert "DOC: configuration: clarify ciphersuites usage"
This reverts commit e2a44d6c94.
This must be backported to all stable branches.
Fixes issue #2459.
This commit removes qc_treat_rx_crypto_frms(). This function was used in
a single place inside qc_ssl_provide_all_quic_data(). Besides, its
naming was confusing as conceptually it is directly linked to quic_ssl
module instead of quic_rx.
Thus, body of qc_treat_rx_crypto_frms() is inlined directly inside
qc_ssl_provide_all_quic_data(). Also, qc_ssl_provide_quic_data() is now
only used inside quic_ssl to its scope is set to static. Overall, API
for CRYPTO frame handling is now cleaner.
On CRYPTO frames reception, tasklet is rescheduled with TASK_HEAVY to
limit CPU consumption. This commit slighly simplifies this by regrouping
TASK_HEAVY setting and tasklet_wakeup() instructions in a single
location in qc_handle_crypto_frm(). All other unnecessary
tasklet_wakeup() are removed.
Till now it was still needed to write rules to eliminate bad behaving
H2 clients, while most of the time it would be desirable to just be able
to set a threshold on the level of anomalies on a connection.
This is what this patch does. By setting a glitches threshold for frontend
and backend, it allows to automatically turn a connection to the error
state when the threshold is reached so that the connection dies by itself
without having to write possibly complex rules.
One subtlety is that we still have the error state being exclusive to the
parser's state so this requires the h2c_report_glitches() function to return
a status indicating if the threshold was reached or not so that processing
can instantly stop and bypass the state update, otherwise the state could
be turned back to a valid one (e.g. after parsing CONTINUATION); we should
really contemplate the possibility to use H2_CF_ERROR for this. Fortunately
there were very few places where a glitch was reported outside of an error
path so the changes are quite minor.
Now by setting the front value to 1000, a client flooding with short
CONTINUATION frames is instantly stopped.
The function aims at centralizing counter measures but due to the fact
that it only increments the counter by one unit, sometimes it was not
used and the value was calculated directly. Let's pass the increment in
argument so that it can be used everywhere.
Released version 3.0-dev5 with the following main changes :
- BUG/MEDIUM: applet: Fix HTX .rcv_buf callback function to release outbuf buffer
- BUG/MAJOR: ssl/ocsp: crash with ocsp when old process exit or using ocsp CLI
- BUG/MEDIUM: server: fix dynamic servers initial settings
- BUG/MINOR: ssl/cli: duplicate cleaning code in cli_parse_del_crtlist
- LICENSE: event_hdl: fix GPL license version
- LICENSE: http_ext: fix GPL license version
- BUG/MEDIUM: mux-h1: Fix again 0-copy forwarding of chunks with an unknown size
- BUG/MINOR: mux-h1: Properly report when mux is blocked during a nego
- MINOR: mux-h1: Move checks performed before a shutdown in a dedicated function
- MINOR: mux-h1: Move all stuff to detach a stream in an internal function
- MAJOR: mux-h1: Drain requests on client side before shut a stream down
- MEDIUM: htx/http-ana: No longer close connection on early HAProxy response
- MINOR: quic: filter show quic by address
- MINOR: quic: specify show quic output fields
- MINOR: quic: add MUX output for show quic
- CLEANUP: mux-h2: Fix h2s_make_data() comment about the return value
- DOC: configuration: clarify ciphersuites usage
- BUG/MINOR: config/quic: Alert about PROXY protocol use on a QUIC listener
- BUG/MINOR: hlua: Fix log level to the right value when set via TXN:set_loglevel
- MINOR: hlua: Be able to disable logging from lua
- BUG/MINOR: tools: seed the statistical PRNG slightly better
- BUG/MINOR: hlua: fix unsafe lua_tostring() usage with empty stack
- BUG/MINOR: hlua: don't use lua_tostring() from unprotected contexts
- BUG/MINOR: hlua: fix possible crash in hlua_filter_new() under load
- BUG/MINOR: hlua: improper lock usage in hlua_filter_callback()
- BUG/MINOR: hlua: improper lock usage in hlua_filter_new()
- BUG/MEDIUM: hlua: improper lock usage with SET_SAFE_LJMP()
- BUG/MAJOR: hlua: improper lock usage with hlua_ctx_resume()
- BUG/MINOR: hlua: don't call ha_alert() in hlua_event_subscribe()
- MINOR: hlua: use SEND_ERR to report errors in hlua_event_runner()
- CLEANUP: hlua: txn class functions may LJMP
- BUG/MINOR: sink: fix a race condition in the TCP log forwarding code
- BUILD: thread: move lock label definitions to thread-t.h
- BUILD: tree-wide: fix a few missing includes in a few files
- BUILD: buf: make b_ncat() take a const for the source
- CLEANUP: assorted typo fixes in the code and comments
- CLEANUP: fix typo in naming for variable "unused"
- CI: run more smoke tests on config syntax to check memory related issues
- CI: enable monthly build only test on netbsd-9.3
- CI: skip scheduled builds on forks
- BUG/MINOR: ssl/cli: typo in new ssl crl-file CLI description
- BUG/MEDIUM: quic: fix connection freeze on post handshake
- BUG/MINOR: mux-quic: fix crash on aborting uni remote stream
- CLEANUP: log: fix obsolete comment for add_sample_to_logformat_list()
- CLEANUP: tree-wide: use proper ERR_* return values for PRE_CHECK fcts
- BUG/MINOR: cfgparse: report proper location for log-format-sd errors
- MINOR: vars: export var_set and var_unset functions
- MINOR: Add aes_gcm_enc converter
- BUG/MEDIUM: quic: fix handshake freeze under high traffic
- MINOR: quic: always use ncbuf for rx CRYPTO
- BUILD: ssl: define EVP_CTRL_AEAD_GET_TAG for older versions
- DOC: design: write first notes about ring-v2
- OPTIM: sink: try to merge "dropped" messages faster
- OPTIM: sink: drop the sink lock used to count drops
- DEV: haring: make haring not depend on the struct ring itself
- DEV: haring: split the code between ring and buffer
- DEV: haring: automatically use the advertised ring header size
- BUILD: solaris: fix compilation errors
Compilation on solaris fails because of usage of names reserved on that
platform, i.e. 'queue' and 's_addr'.
This patch redefines 'queue' as '_queue' and renames 's_addr' to
'srv_addr' which fixes compilation for now.
Future plan: rename 'queue' in code base so define can be removed again.
Backporting: 2.9, 2.8
Instead of emitting a warning, since we don't need the ring struct
anymore, we can just read what we need, parse the buffer and use the
advertised offset. Thus for now -f is simply ignored.
By splitting the initialization and the parsing of the ring, we'll ease
the support for multiple ring sizes and get rid of the annoyances of the
optional lock.
haring needs to be self-sufficient about the ring format so that it continues
to build when the ring API changes. Let's import the struct ring definition
and call it "ring_v1".
The sink lock was made to prevent event producers from passing while
there were other threads trying to print a "dropped" message, in order
to guarantee the absence of reordering. It has a serious impact however,
which is that all threads need to take the read lock when producing a
regular trace even when there's no reader.
This patch takes a different approach. The drop counter is shifted left
by one so that the lowest bit is used to indicate that one thread is
already taking care of trying to dump the counter. Threads only read
this value normally, and will only try to change it if it's non-null,
in which case they'll first check if they are the first ones trying to
dump it, otherwise will simply count another drop and leave. This has
a large benefit. First, it will avoid the locking that causes stalls
as soon as a slow reader is present. Second, it avoids any write on the
fast path as long as there's no drop. And it remains very lightweight
since we just need to add +2 or subtract 2*dropped in operations, while
offering the guarantee that the sink_write() has succeeded before
unlocking the counter.
While a reader was previously limiting the traffic to 11k RPS under
4C/8T, now we reach 36k RPS vs 14k with no reader, so readers will no
longer slow the traffic down and will instead even speed it up due to
avoiding the contention down the chain in the ring. The locking cost
dropped from ~75% to ~60% now (it's in ring_write now).
When a reader doesn't read fast enough and causes drops, subsequent
threads try to produce a "dropped" message. But it takes time to
produce and emit this message, in part due to the use of chunk_printf()
that relies on vfprintf() which has to parse the printf format, and
during this time other threads may continue to increment the counter.
This is the reason why this is currently performed in a loop. When
reading what is received, it's common to see a large count followed
by one or two single-digit counts, indicating that we could possibly
have improved that by writing faster.
Let's improve the situation a little bit. First we're now using a
static message prefixed with enough space to write the digits, and a
call to ultoa_r() fills these digits from right to left so that we
don't have to process a format string nor perform a copy of the message.
Second, we now re-check the counter immediately after having prepared
the message so that we still get an opportunity for updating it. In
order to avoid too long loops, this is limited to 10 iterations.
Tests show that the number of single-digit "dropped" counters on output
now dropped roughly by 15-30%. Also, it was observed that with 8 threads,
there's almost never more than one retry.
Amaury reported that previous commit 08ac282375 ("MINOR: Add aes_gcm_enc
converter") broke the CI on OpenSSL 1.0.2 due to the define above not
existing there. Let's just map it to its older name when not existing.
For reference, these were renamed when switching to 1.1.0:
https://marc.info/?l=openssl-cvs&m=142244190907706&w=2
No backport is needed.
The previous patch fix the handling of in-order CRYPTO frames which
requires the usage of a new buffer for these data as their handling is
delayed to run under TASK_HEAVY.
In fact, as now all CRYPTO frames handling must be delayed, their
handling can be unify. This is the purpose of this commit, which removes
the just introduced new buffer. Now, all CRYPTO frames are buffered
inside the ncbuf. Unused elements such as crypto_frms member for
encryption level are also removed.
This commit is not a bugcfix but is a direct follow-up to the last one.
As such, it can probably be backported with it to 2.9 to reduce code
differences between these versions.
QUIC relies on SSL_do_hanshake() to be able to validate handshake. As
this function is computation heavy, it is since 2.9 called only under
TASK_HEAVY. This has been implemented by the following patch :
94d20be138
MEDIUM: quic: Heavy task mode during handshake
Instead of handling CRYPTO frames immediately during reception, this
patch delays the process to run under TASK_HEAVY tasklet. A frame copy
is stored in qel.rx.crypto_frms list. However, this frame still
reference the receive buffer. If the receive buffer is cleared before
the tasklet is rescheduled, it will point to garbage data, resulting in
haproxy decryption error. This happens if a fair amount of data is
received constantly to preempt the quic_conn tasklet execution.
This bug can be reproduced with a fair amount of clients. It is
exhibited by 'show quic full' which can report connections blocked on
handshake. Using the following commands result in h2load non able to
complete the last connections.
$ h2load --alpn-list h3 -t 8 -c 800 -m 10 -w 10 -n 8000 "https://127.0.0.1:20443/?s=10k"
Also, haproxy QUIC listener socket mode was active to trigger the issue.
This forces several connections to share the same reception buffer,
rendering the bug even more plausible to occur. It should be possible to
reproduce it with connection socket if increasing the clients amount.
To fix this bug, define a new buffer under quic_cstream. It is used
exclusively to copy CRYPTO data for in-order frame if ncbuf is empty.
This ensures data remains accessible even if receive buffer is cleared.
Note that this fix is only a temporary step. Indeed, a ncbuf is also
already used for out-of-order data. It should be possible to unify its
usage for both in and out-of-order data, rendering this new buffer
instance unnecessary. In this case, several unneeded elements will
become obsolete such as qel.rx.crypto_frms list. This will be done in a
future refactoring patch.
This must be backported up to 2.9.
The converter can be used to encrypt the raw byte input using the
AES-GCM algorithm, using provided nonce and key.
Co-authored-by: Dragan Dosen (ddosen@haproxy.com)
When a parsing error occurs inside a log-format-sd expression, we report
the location of the log-format directive (which may not be set) instead
of reporting the proper log-format-sd directive location where the parsing
error occured.
1|listen test
2| log-format "%B" # no error
3| log-format-sd "%bad" # error
| [ALERT] (322261) : config : Parsing [empty.conf:2]: failed to parse log-format-sd : no such format variable 'bad'. If you wanted to emit the '%' character verbatim, you need to use '%%'.
The fix consists in using the config hints dedicated to log-format-sd
directive instead of the log-format one.
The bug was introduced in 8a4e4420 ("MEDIUM: log-format: Use standard
HAProxy log system to report errors").
This should be backported to every stable versions.
httpclient_precheck(), ssl_ocsp_update_precheck(), and
resolvers_create_default() functions are registered through
REGISTER_PRE_CHECK() macro to be called by haproxy during init from the
pre_check_list list. When calling functions registered in pre_check_list,
haproxy expects ERR_* return values. However those 3 functions currently
use raw return values, so we better use explicit ERR_* macros to prevent
breakage in the future if ERR_* values mapping were to change.
Since 833cc794 ("MEDIUM: sample: handle comma-delimited converter list")
logformat expressions now support having a comma-delimited converter list
right after the fetch. Let's remove a leftover comment from the initial
implementation that says otherwise.
A remote unidirectional stream can be aborted prematurely if application
layers cannot identify its type. In this case, a STOP_SENDING frame is
emitted.
Since QUIC MUX refactoring, a crash would occur in this scenario due to
2 specific characteristics of remote uni streams :
* qcs.tx.fctl was not initialized completely. This cause a crash due to
BUG_ON() statement inside qcs_destroy().
* qcs.stream is never allocated. This caused qcs_prep_bytes() to crash
inside qcc_io_send().
This bug is considered minor as it happens only on very specific QUIC
clients. It was detected when using s2n-quic over interop.
This does not need to be backported.
After handshake completion, QUIC server is responsible to emit
HANDSHAKE_DONE frame. Some clients wait for it to begin STREAM
transfers.
Previously, there was no explicit tasklet_wakeup() after handshake
completion, which is necessary to emit post-handshake frames. In most
cases, this was undetected as most client continue emission which will
reschedule the tasklet. However, as there is no tasklet_wakeup(), this
is not a consistent behavior. If this bug occurs, it causes a connection
freeze, preventing the client to emit any request. The connection is
finally closed on idle timeout.
To fix this, add an explicit tasklet_wakeup() after handshake
completion. It sounds simple enough but in fact it's difficult to find
the correct location efor tasklet_wakeup() invocation, as post-handshake
is directly linked to connection accept, with different orderings.
Notably, if 0-RTT is used, connection can be accepted prior handshake
completion. Another major point is that along HANDSHAKE_DONE frame, a
series of NEW_CONNECTION_ID frames are emitted. However, these new CIDs
allocation must occur after connection is migrated to its new thread as
these CIDs are tied to it. A BUG_ON() is present to check this in
qc_set_tid_affinity().
With all this in mind, 2 locations were selected for the necessary
tasklet_wakeup() :
* on qc_xprt_start() : this is useful for standard case without 0-RTT.
This ensures that this is done only after connection thread migration.
* on qc_ssl_provide_all_quic_data() : this is done on handshake
completion with 0-RTT used. In this case only, connection is already
accepted and migrated, so tasklet_wakeup() is safe.
Note that as a side-change, quic_accept_push_qc() API has evolved to
better reflect differences between standard and 0-RTT usages. It is now
forbidden to call it multiple times on a single quic_conn instance. A
BUG_ON() has been added.
This issue is labelled as medium even though it seems pretty rare. It
was only reproducible using QUIC interop runner, with haproxy compiled
with LibreSSL with quic-go as client. However, affected code parts are
pretty sensible, which justify the chosen severity.
This should fix github issue #2418.
It should be backported up to 2.6, after a brief period of observation.
Note that the extra comment added in qc_set_tid_affinity() can be
removed in 2.6 as thread migration is not implemented for this version.
Other parts should apply without conflict.
tracking bleeding edge changes with some rare platforms or modern
compilers on scheduled basis is not what usually forks do. let's
skip by default in forks, if some fork is interested, it might be
enabled locally
In resolvers.c:rslv_promex_next_ts() and in
stick-tables.c:stk_promex_next_ts(), an unused argument was mistakenly
called "unsued" instead of "unused". Let's fix this in a separate patch
so that it can be omitted from backports if this causes build problems.
This is 39th iteration of typo fixes
The naming issue on the argument called "unsued" instead of "unused"
in two functions from resolvers and stick-tables was put into a second
patch so that it can be omitted if it were to cause backport issues.
In 2.7 with commit 35df34223b ("MINOR: buffers: split b_force_xfer() into
b_cpy() and b_force_xfer()"), b_ncat() was extracted from b_force_xfer()
but kept its source variable instead of constant, making it unusable for
calls from a const source. Let's just fix it.
Some include files, mostly types definitions, are missing a few includes
to define the types they're using, causing include ordering dependencies
between files, which are most often not seen due to the alphabetical
order of includes. Let's just fix them.
These were spotted by building pre-compiled headers for all these files
to .h.gch.
That's exactly the same as commit 53bfab080c ("BUG/MINOR: sink: fix a race
condition between the writer and the reader") that went into 2.7 and was
backported as far as 2.4, except that since the code was duplicated, the
second instance was not noticed, leaving the race present. The race has
a limited impact, if a forwarder reaches the end of the logs and a new
message arrives before it leaves, the forwarder will only wake up after
yet another new message will be sent. In practice it remains unnoticeable
because for the race to trigger, one needs to have a steady flow of logs,
which means the wakeup will happen anyway.
This should be backported, but no need to insist on it if it resists.
Instead of reporting lua errors using ha_alert(), let's use SEND_ERR()
helper which will also try to generate a log message according to lua
log settings.
hlua_event_subscribe() is meant to be called from a protected lua env
during init and/or runtime. As such, only hlua_event_sub() makes uses
of it: when an error happens hlua_event_sub() will already raise a Lua
exception. Thus it's not relevant to use ha_alert() there as it could
generate log pollution (error is relevant from Lua script point of view,
not from haproxy one).
This could be backported in 2.8.
hlua_ctx_resume() itself can safely be used as-is in a multithreading
context because it takes care of taking the lua lock.
However, when hlua_ctx_resume() returns, the lock is released and it is
thus the caller's responsibility to ensure it owns the lock prior to
performing additional manipulations on the Lua stack. Unfortunately, since
early haproxy lua implementation, we used to do it wrong:
The most common hlua_ctx_resume() pattern we can find in the code (because
it was duplicated over and over over time) is the following:
|ret = hlua_ctx_resume()
|switch (ret) {
| case HLUA_E_OK:
| break;
| case HLUA_E_ERRMSG:
| break;
| [...]
|}
Problem is: for some of the switch cases, we still perform lua stack
manipulations. This is the case for the HLUA_E_ERRMSG for instance where
we often use lua_tostring() to retrieve last lua error message on the top
of the stack, or sometimes for the HLUA_E_OK case, when we need to perform
some lua cleanup logic once the resume ended. But all of this is done
WITHOUT the lua lock, so this means that the main lua stack could be
accessed simultaneously by concurrent threads when a script was loaded
using 'lua-load'.
While it is not critical for switch-cases dedicated to error handling,
(those are not supposed to happen very often), it can be very problematic
for stack manipulations occuring in the HLUA_E_OK case under heavy load
for instance. In this case, main lua stack corruptions will eventually
happen. This is especially true inside hlua_filter_new(), where this bug
was known to cause lua stack corruptions under load, leading to lua errors
and even crashing the process as reported by @bgrooot in GH #2467.
The fix is relatively simple, once hlua_ctx_resume() returns: we should
consider that ANY lua stack access should be lua-lock protected. If the
related lua calls may raise lua errors, then (RE)SET_SAFE_LJMP
combination should be used as usual (it allows to lock the lua stack and
catch lua exceptions at the same time), else hlua_{lock,unlock} may be
used if no exceptions are expected.
This patch should fix GH #2467.
It should be backported to all stable versions.
[ada: some ctx adj will be required for older versions as event_hdl
doesn't exist prior to 2.8 and filters were implemented in 2.5, thus
some chunks won't apply]
When we want to perform some unsafe lua stack manipulations from an
unprotected lua environment, we use SET_SAFE_LJMP() RESET_SAFE_LJMP()
combination to lock lua stack and catch potential lua exceptions that
may occur between the two.
Hence, we regularly find this pattern (duplicated over and over):
|if (!SET_SAFE_LJMP(hlua)) {
| const char *error;
|
| if (lua_type(hlua->T, -1) == LUA_TSTRING)
| error = hlua_tostring_safe(hlua->T, -1);
| else
| error = "critical error";
| SEND_ERR(NULL, "*: %s.\n", error);
|}
This is wrong because when SET_SAFE_LJMP() returns false (meaning that an
exception was caught), then the lua lock was released already, thus the
caller is not expected to perform lua stack manipulations (because the
main lua stack may be shared between multiple threads). In the pattern
above we only want to retrieve the lua exception message which may be
found at the top of the stack, to do so we now explicitly take the lua
lock before accessing the lua stack. Note that hlua_lock() doesn't catch
lua exceptions so only safe lua functions are expected to be used there
(lua functions that may NOT raise exceptions).
It should be backported to every stable versions.
[ada: some ctx adj will be required for older versions as event_hdl
doesn't exist prior to 2.8 and filters were implemented in 2.5, thus
some chunks won't apply, but other fixes should stay relevant]
In hlua_filter_new(), after each hlua resume, we systematically try to
empty the stack by calling lua_settop(). However we're doing this without
locking the lua context, so it is unsafe in multithreading context if the
script is loaded using 'lua-load'. To fix the issue, we protect the call
with hlua_{lock,unlock}() helpers.
This should be backported up to 2.6.
In hlua_filter_callback(), some lua stack work is performed under
SET_SAFE_LJMP() guard which also takes care of locking the hlua context
when needed. However, a lua_gettop() call is performed out of the guard,
thus it is unsafe in multithreading context if the script is loaded using
'lua-load' because in this case the main lua stack is shared between
threads and each access to a lua stack must be performed under the lock,
thus we move lua_gettop() call under the lock.
It should be backported up to 2.6.