The EVP_MD_CTX_create() and EVP_MD_CTX_destroy() functions were renamed to
EVP_MD_CTX_new() and EVP_MD_CTX_free() in OpenSSL 1.1.0, respectively. These
functions are used by the digest converter, introduced by the commit 8e36651ed
("MINOR: sample: Add digest and hmac converters"). So for prior versions of
openssl, macros are used to fallback on old functions.
This patch must only be backported if the commit 8e36651ed is backported too.
This one really ought to be defined in hathreads.h like all other thread
definitions, which is what this patch does. As expected, all files but
one (regex.h) were already including hathreads.h when using THREAD_LOCAL;
regex.h was fixed for this.
This was the last entry in config.h which is now useless.
The setting of CONFIG_HAP_LOCKLESS_POOLS depending on threads and
compat was done in config.h for use only in memory.h and memory.c
where other settings are dealt with. Further, the default pool cache
size was set there from a fixed value instead of being set from
defaults.h
Let's move the decision to enable lockless pools via
CONFIG_HAP_LOCKLESS_POOLS to memory.h, and set the default pool
cache size in defaults.h like other default settings.
This was the next-to-last setting in config.h.
CONFIG_HAP_MEM_OPTIM was introduced with memory pools in 1.3 and dropped
in 1.6 when pools became the only way to allocate memory. Still the
option remained present in config.h. Let's kill it.
Just like in previous patch, it happens that HA_ATOMIC_UPDATE_MIN() and
HA_ATOMIC_UPDATE_MAX() would evaluate the (val) argument up to 3 times.
However this time it affects both thread and non-thread versions. It's
strange because the copy was properly performed for the (new) argument
in order to avoid this. Anyway it was done for the "val" one as well.
A quick code inspection showed that this currently has no effect as
these macros are fairly limited in usage.
It would be best to backport this for long-term stability (till 1.8)
but it will not fix an existing bug.
When threads are disabled, HA_ATOMIC_CAS() becomes a simple compound
expression. However this expression presents a problem, which is that
its arguments are evaluated multiple times, once for the comparison
and once again for the assignement. This presents a risk of performing
some side-effect operations twice in the non-threaded case (e.g. in
case of auto-increment or function return).
The macro was rewritten using local copies for arguments like the
other macros do.
Fortunately a complete inspection of the code indicates that this case
currently never happens. It was however responsible for the strict-aliasing
warning emitted when building fd.c without threads but with 64-bit CAS.
This may be backported as far as 1.8 though it will not fix any existing
bug and is more of a long-term safety measure in case a future fix would
depend on this behavior.
I changed my mind twice on this one and pushed after the last test with
threads disabled, without re-enabling long long, causing this rightful
build warning.
This needs to be backported if the previous commit ff64d3b027 ("MINOR:
threads: export the POSIX thread ID in panic dumps") is backported as
well.
It is very difficult to map a panic dump against a gdb thread dump
because the thread numbers do not match. However gdb provides the
pthread ID but this one is supposed to be opaque and not to be cast
to a scalar.
This patch provides a fnuction, ha_get_pthread_id() which retrieves
the pthread ID of the indicated thread and casts it to an unsigned
long long so as to lose the least possible amount of information from
it. This is done cleanly using a union to maintain alignment so as
long as these IDs are stored on 1..8 bytes they will be properly
reported. This ID is now presented in the panic dumps so it now
becomes possible to map these threads. When threads are disabled,
zero is returned. For example, this is a panic dump:
Thread 1 is about to kill the process.
*>Thread 1 : id=0x7fe92b825180 act=0 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
stuck=1 prof=0 harmless=0 wantrdv=0
cpu_ns: poll=5119122 now=2009446995 diff=2004327873
curr_task=0xc99bf0 (task) calls=4 last=0
fct=0x592440(task_run_applet) ctx=0xca9c50(<CLI>)
strm=0xc996a0 src=unix fe=GLOBAL be=GLOBAL dst=<CLI>
rqf=848202 rqa=0 rpf=80048202 rpa=0 sif=EST,200008 sib=EST,204018
af=(nil),0 csf=0xc9ba40,8200
ab=0xca9c50,4 csb=(nil),0
cof=0xbf0e50,1300:PASS(0xc9cee0)/RAW((nil))/unix_stream(20)
cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0)
call trace(20):
| 0x59e4cf [48 83 c4 10 5b 5d 41 5c]: wdt_handler+0xff/0x10c
| 0x7fe92c170690 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x13690
| 0x7ffce29519d9 [48 c1 e2 20 48 09 d0 48]: linux-vdso:+0x9d9
| 0x7ffce2951d54 [eb d9 f3 90 e9 1c ff ff]: linux-vdso:__vdso_gettimeofday+0x104/0x133
| 0x57b484 [48 89 e6 48 8d 7c 24 10]: main+0x157114
| 0x50ee6a [85 c0 75 76 48 8b 55 38]: main+0xeaafa
| 0x50f69c [48 63 54 24 20 85 c0 0f]: main+0xeb32c
| 0x59252c [48 c7 c6 d8 ff ff ff 44]: task_run_applet+0xec/0x88c
Thread 2 : id=0x7fe92b6e6700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=786738 now=1086955 diff=300217
curr_task=0
Thread 3 : id=0x7fe92aee5700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=828056 now=1129738 diff=301682
curr_task=0
Thread 4 : id=0x7fe92a6e4700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=818900 now=1153551 diff=334651
curr_task=0
And this is the gdb output:
(gdb) info thr
Id Target Id Frame
* 1 Thread 0x7fe92b825180 (LWP 15234) 0x00007fe92ba81d6b in raise () from /lib64/libc.so.6
2 Thread 0x7fe92b6e6700 (LWP 15235) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6
3 Thread 0x7fe92a6e4700 (LWP 15237) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6
4 Thread 0x7fe92aee5700 (LWP 15236) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6
We can clearly see that while threads 1 and 2 are the same, gdb's
threads 3 and 4 respectively are haproxy's threads 4 and 3.
This may be backported to 2.0 as it removes some confusion in github issues.
A shared tcp-check ruleset is now created to support LDAP check. This way no
extra memory is used if several backends use a LDAP check.
The following sequance is used :
tcp-check send-binary "300C020101600702010304008000"
tcp-check expect rbinary "^30" min-recv 14 \
on-error "Not LDAPv3 protocol"
tcp-check expect custom
The last expect rule relies on a custom function to check the LDAP server reply.
A share tcp-check ruleset is now created to support smtp checks. This way no
extra memory is used if several backends use a smtp check.
The following sequence is used :
tcp-check connect default linger
tcp-check expect rstring "^[0-9]{3}[ \r]" min-recv 4 \
error-status "L7RSP" on-error "%[check.payload(),cut_crlf]"
tcp-check expect rstring "^2[0-9]{2}[ \r]" min-recv 4 \
error-status "L7STS" \
on-error %[check.payload(4,0),ltrim(' '),cut_crlf] \
status-code "check.payload(0,3)"
tcp-echeck send "%[var(check.smtp_cmd)]\r\n" log-format
tcp-check expect rstring "^2[0-9]{2}[- \r]" min-recv 4 \
error-status "L7STS" \
on-error %[check.payload(4,0),ltrim(' '),cut_crlf] \
on-success "%[check.payload(4,0),ltrim(' '),cut_crlf]" \
status-code "check.payload(0,3)"
The variable check.smtp_cmd is by default the string "HELO localhost" by may be
customized setting <helo> and <domain> parameters on the option smtpchk
line. Note there is a difference with the old smtp check. The server gretting
message is checked before send the HELO/EHLO comand.
A share tcp-check ruleset is now created to support redis checks. This way no
extra memory is used if several backends use a redis check.
The following sequence is used :
tcp-check send "*1\r\n$4\r\nPING\r\n"
tcp-check expect string "+PONG\r\n" error-status "L7STS" \
on-error "%[check.payload(),cut_crlf]" on-success "Redis server is ok"
list_for_each_entry_rev() and list_for_each_entry_from_rev() and corresponding
safe versions have been added to iterate on a list in the reverse order. All
these functions work the same way than the forward versions, except they use the
.p field to move for an element to another.
The url_decode() function used by the url_dec converter and a few other
call points is ambiguous on its processing of the '+' character which
itself isn't stable in the spec. This one belongs to the reserved
characters for the query string but not for the path nor the scheme,
in which it must be left as-is. It's only in argument strings that
follow the application/x-www-form-urlencoded encoding that it must be
turned into a space, that is, in query strings and POST arguments.
The problem is that the function is used to process full URLs and
paths in various configs, and to process query strings from the stats
page for example.
This patch updates the function to differentiate the situation where
it's parsing a path and a query string. A new argument indicates if a
query string should be assumed, otherwise it's only assumed after seeing
a question mark.
The various locations in the code making use of this function were
updated to take care of this (most call places were using it to decode
POST arguments).
The url_dec converter is usually called on path or url samples, so it
needs to remain compatible with this and will default to parsing a path
and turning the '+' to a space only after a question mark. However in
situations where it would explicitly be extracted from a POST or a
query string, it now becomes possible to enforce the decoding by passing
a non-null value in argument.
It seems to be what was reported in issue #585. This fix may be
backported to older stable releases.
As reported in issue #596, the edx register isn't marked as clobbered
in div64_32(), which could technically allow gcc to try to reuse it
if it needed a copy of the 32 highest bits of the o1 register after
the operation.
Two attempts were tried, one using a dummy 32-bit local variable to
store the intermediary edx and another one switching to "=A" and making
result a long long. It turns out the former makes the resulting object
code significantly dirtier while the latter makes it better and was
kept. This is due to gcc's difficulties at working with register pairs
mixing 32- and 64- bit values on i386. It was verified that no code
change happened at all on x86_64, armv7, aarch64 nor mips32.
In practice it's only used by the frequency counters so this bug
cannot even be triggered but better fix it.
This may be backported to stable branches though it will not fix any
issue.
If haproxy fails to start and emits an alert, then it can be useful
to have it also emit the version and the path used to load it. Some
users may be mistakenly launching the wrong binary due to a misconfigured
PATH variable and this will save them some troubleshooting time when it
reports that some keywords are not understood.
What we do here is that we *try* to extract the binary name from the
AUX vector on glibc, and we report this as a NOTICE tag before the
very first alert is emitted.
Use the refcount of the SSL_CTX' to free them instead of freeing them on
certains conditions. That way we can free the SSL_CTX everywhere its
pointer is used.
The flush_lock was introduced, mostly to be sure that pool_gc() will never
dereference a pointer that has been free'd. __pool_get_first() was acquiring
the lock to, the fear was that otherwise that pointer could get free'd later,
and then pool_gc() would attempt to dereference it. However, that can not
happen, because the only functions that can free a pointer, when using
lockless pools, are pool_gc() and pool_flush(), and as long as those two
are mutually exclusive, nobody will be able to free the pointer while
pool_gc() attempts to access it.
So change the flush_lock to a spinlock, and don't bother acquire/release
it in __pool_get_first(), that way callers of __pool_get_first() won't have
to wait while the pool is flushed. The worst that can happen is we call
__pool_refill_alloc() while the pool is getting flushed, and memory can
get allocated just to be free'd.
This may help with github issue #552
This may be backported to 2.1, 2.0 and 1.9.
In the struct fdtab, introduce a new mask, running_mask. Each thread should
add its bit before using the fd.
Use the running_mask instead of a lock, in fd_insert/fd_delete, we'll just
spin as long as the mask is non-zero, to be sure we access the data
exclusively.
fd_set_running_excl() spins until the mask is 0, fd_set_running() just
adds the thread bit, and fd_clr_running() removes it.
With these debug options we still get these warnings:
include/common/memory.h:501:23: warning: null pointer dereference [-Wnull-dereference]
*(volatile int *)0 = 0;
~~~~~~~~~~~~~~~~~~~^~~
include/common/memory.h:460:22: warning: null pointer dereference [-Wnull-dereference]
*(volatile int *)0 = 0;
~~~~~~~~~~~~~~~~~~~^~~
These are purposely there to crash the process at specific locations.
But the annoying warnings do not help with debugging and they are not
even reliable as the compiler may decide to optimize them away. Let's
pass the pointer through DISGUISE() to avoid this.
It's more generic and versatile than the previous shut_your_big_mouth_gcc()
that was used to silence annoying warnings as it's not limited to ignoring
syscalls returns only. This allows us to get rid of the aforementioned
function and the shut_your_big_mouth_gcc_int variable, that started to
look ugly in multi-threaded environments.
Tim reported that BUG_ON() issues warnings on his distro, as the libc marks
some syscalls with __attribute__((warn_unused_result)). Let's pass the
write() result through DISGUISE() to hide it.
This does exactly the same as ALREADY_CHECKED() but does it inline,
returning an identical copy of the scalar variable without letting
the compiler know how it might have been transformed. This can
forcefully disable certain null-pointer checks or result checks when
known undesirable. Typically forcing a crash with *(DISGUISE(NULL))=0
will not cause a null-deref warning.
These are mostly comments in the code. A few error messages were fixed
and are of low enough importance not to deserve a backport. Some regtests
were also fixed.
Implement mt_list_to_list() and list_to_mt_list(), to be able to convert
from a struct list to a struct mt_list, and vice versa.
This is normally of no use, except for struct connection's list field, that
can go in either a struct list or a struct mt_list.
Ryan O'Hara reported that haproxy breaks on fedora-32 using gcc-10
(pre-release). It turns out that constructs such as:
while (item != head) {
item = LIST_ELEM(item.n);
}
loop forever, never matching <item> to <head> despite a printf there
showing them equal. In practice the problem is that the LIST_ELEM()
macro is wrong, it assigns the subtract of two pointers (an integer)
to another pointer through a cast to its pointer type. And GCC 10 now
considers that this cannot match a pointer and silently optimizes the
comparison away. A tested workaround for this is to build with
-fno-tree-pta. Note that older gcc versions even with -ftree-pta do
not exhibit this rather surprizing behavior.
This patch changes the test to instead cast the null-based address to
an int to get the offset and subtract it from the pointer, and this
time it works. There were just a few places to adjust. Ideally
offsetof() should be used but the LIST_ELEM() API doesn't make this
trivial as it's commonly called with a typeof(ptr) and not typeof(ptr*)
thus it would require to completely change the whole API, which is not
something workable in the short term, especially for a backport.
With this change, the emitted code is subtly different even on older
versions. A code size reduction of ~600 bytes and a total executable
size reduction of ~1kB are expected to be observed and should not be
taken as an anomaly. Typically this loop in dequeue_proxy_listeners() :
while ((listener = MT_LIST_POP(...)))
used to produce this code where the comparison is performed on RAX
while the new offset is assigned to RDI even though both are always
identical:
53ded8: 48 8d 78 c0 lea -0x40(%rax),%rdi
53dedc: 48 83 f8 40 cmp $0x40,%rax
53dee0: 74 39 je 53df1b <dequeue_proxy_listeners+0xab>
and now produces this one which is slightly more efficient as the
same register is used for both purposes:
53dd08: 48 83 ef 40 sub $0x40,%rdi
53dd0c: 74 2d je 53dd3b <dequeue_proxy_listeners+0x9b>
Similarly, retrieving the channel from a stream_interface using si_ic()
and si_oc() used to cause this (stream-int in rdi):
1cb7: c7 47 1c 00 02 00 00 movl $0x200,0x1c(%rdi)
1cbe: f6 47 04 10 testb $0x10,0x4(%rdi)
1cc2: 74 1c je 1ce0 <si_report_error+0x30>
1cc4: 48 81 ef 00 03 00 00 sub $0x300,%rdi
1ccb: 81 4f 10 00 08 00 00 orl $0x800,0x10(%rdi)
and now causes this:
1cb7: c7 47 1c 00 02 00 00 movl $0x200,0x1c(%rdi)
1cbe: f6 47 04 10 testb $0x10,0x4(%rdi)
1cc2: 74 1c je 1ce0 <si_report_error+0x30>
1cc4: 81 8f 10 fd ff ff 00 orl $0x800,-0x2f0(%rdi)
There is extremely little chance that this fix wakes up a dormant bug as
the emitted code effectively does what the source code intends.
This must be backported to all supported branches (dropping MT_LIST_ELEM
and the spoa_example parts as needed), since the bug is subtle and may
not always be visible even when compiling with gcc-10.
In MT_LIST_DEL_SAFE(), when the code was changed to use a temporary variable
instead of using the provided pointer directly, we shouldn't have changed
the code that set the pointer to NULL, as we really want the pointer
provided to be nullified, otherwise other parts of the code won't know
we just deleted an element, and bad things will happen.
This should be backported to 2.1.
The splice() syscall has been supported in glibc since version 2.5 issued
in 2006 and is present on supported systems so there's no need for having
our own arch-specific syscall definitions anymore.
This was made to support epoll on patched 2.4 kernels, and on early 2.6
using alternative libcs thanks to the arch-specific syscall definitions.
All the features we support have been around since 2.6.2 and present in
glibc since 2.3.2, neither of which are found in field anymore. Let's
simply drop this and use epoll normally.
The accept4() syscall has been present for a while now, there is no more
reason for maintaining our own arch-specific syscall implementation for
systems lacking it in libc but having it in the kernel.
This was introduced 10 years ago to squeeze a few CPU cycles per syscall
on 32-bit x86 machines and was already quite old by then, requiring to
explicitly enable support for this in the kernel. We don't even know if
it still builds, let alone if it works at all on recent kernels! Let's
completely drop this now.
We currently have two UUID generation functions, one for the sample
fetch and the other one in the SPOE filter. Both were a bit complicated
since they were made to support random() implementations returning an
arbitrary number of bits, and were throwing away 33 bits every 64. Now
we don't need this anymore, so let's have a generic function consuming
64 bits at once and use it as appropriate.
This is the replacement of failed attempt to add thread safety and
per-process sequences of random numbers initally tried with commit
1c306aa84d ("BUG/MEDIUM: random: implement per-thread and per-process
random sequences").
This new version takes a completely different approach and doesn't try
to work around the horrible OS-specific and non-portable random API
anymore. Instead it implements "xoroshiro128**", a reputedly high
quality random number generator, which is one of the many variants of
xorshift, which passes all quality tests and which is described here:
http://prng.di.unimi.it/
While not cryptographically secure, it is fast and features a 2^128-1
period. It supports fast jumps allowing to cut the period into smaller
non-overlapping sequences, which we use here to support up to 2^32
processes each having their own, non-overlapping sequence of 2^96
numbers (~7*10^28). This is enough to provide 1 billion randoms per
second and per process for 2200 billion years.
The implementation was made thread-safe either by using a double 64-bit
CAS on platforms supporting it (x86_64, aarch64) or by using a local
lock for the time needed to perform the shift operations. This ensures
that all threads pick numbers from the same pool so that it is not
needed to assign per-thread ranges. For processes we use the fast jump
method to advance the sequence by 2^96 for each process.
Before this patch, the following config:
global
nbproc 8
frontend f
bind :4445
mode http
log stdout format raw daemon
log-format "%[uuid] %pid"
redirect location /
Would produce this output:
a4d0ad64-2645-4b74-b894-48acce0669af 12987
a4d0ad64-2645-4b74-b894-48acce0669af 12992
a4d0ad64-2645-4b74-b894-48acce0669af 12986
a4d0ad64-2645-4b74-b894-48acce0669af 12988
a4d0ad64-2645-4b74-b894-48acce0669af 12991
a4d0ad64-2645-4b74-b894-48acce0669af 12989
a4d0ad64-2645-4b74-b894-48acce0669af 12990
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12987
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12992
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12986
(...)
And now produces:
f94b29b3-da74-4e03-a0c5-a532c635bad9 13011
47470c02-4862-4c33-80e7-a952899570e5 13014
86332123-539a-47bf-853f-8c8ea8b2a2b5 13013
8f9efa99-3143-47b2-83cf-d618c8dea711 13012
3cc0f5c7-d790-496b-8d39-bec77647af5b 13015
3ec64915-8f95-4374-9e66-e777dc8791e0 13009
0f9bf894-dcde-408c-b094-6e0bb3255452 13011
49c7bfde-3ffb-40e9-9a8d-8084d650ed8f 13014
e23f6f2e-35c5-4433-a294-b790ab902653 13012
There are multiple benefits to using this method. First, it doesn't
depend anymore on a non-portable API. Second it's thread safe. Third it
is fast and more proven than any hack we could attempt to try to work
around the deficiencies of the various implementations around.
This commit depends on previous patches "MINOR: tools: add 64-bit rotate
operators" and "BUG/MEDIUM: random: initialize the random pool a bit
better", all of which will need to be backported at least as far as
version 2.0. It doesn't require to backport the build fixes for circular
include files dependecy anymore.
This reverts commit 1c306aa84d.
It breaks the build on all non-glibc platforms. I got confused by the
man page (which possibly is the most confusing man page I've ever read
about a standard libc function) and mistakenly understood that random_r
was portable, especially since it appears in latest freebsd source as
well but not in released versions, and with a slightly different API :-/
We need to find a different solution with a fallback. Among the
possibilities, we may reintroduce this one with a fallback relying on
locking around the standard functions, keeping fingers crossed for no
other library function to call them in parallel, or we may also provide
our own PRNG, which is not necessarily more difficult than working
around the totally broken up design of the portable API.
As mentioned in previous patch, the random number generator was never
made thread-safe, which used not to be a problem for health checks
spreading, until the uuid sample fetch function appeared. Currently
it is possible for two threads or processes to produce exactly the
same UUID. In fact it's extremely likely that this will happen for
processes, as can be seen with this config:
global
nbproc 8
frontend f
bind :4445
mode http
log stdout daemon format raw
log-format "%[uuid] %pid"
redirect location /
It typically produces this log:
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30645
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30641
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30644
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30639
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30646
07764439-c24d-4e6f-a5a6-0138be59e7a8 30645
07764439-c24d-4e6f-a5a6-0138be59e7a8 30639
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30643
07764439-c24d-4e6f-a5a6-0138be59e7a8 30646
b6773fdd-678f-4d04-96f2-4fb11ad15d6b 30646
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30642
07764439-c24d-4e6f-a5a6-0138be59e7a8 30642
What this patch does is to use a distinct per-thread and per-process
seed to make sure the same sequences will not appear, and will then
extend these seeds by "burning" a number of randoms that depends on
the global random seed, the thread ID and the process ID. This adds
roughly 20 extra bits of randomness, resulting in 52 bits total per
thread and per process.
It only takes a few milliseconds to burn these randoms and given
that threads start with a different seed, we know they will not
catch each other. So these random extra bits are essentially added
to ensure randomness between boots and cluster instances.
This replaces all uses of random() with ha_random() which uses the
thread-local state.
This must be backported as far as 2.0 or any version having the
UUID sample-fetch function since it's the main victim here.
It's important to note that this patch, in addition to depending on
the previous one "BUG/MEDIUM: init: initialize the random pool a bit
better", also depends on the preceeding build fixes to address a
circular dependency issue in the include files that prevented it
from building. Part or all of these patches may need to be backported
or adapted as well.
The htx_find_offset() function may be used to look for a block at a specific
offset in an HTX message, starting from the message head. A compound result is
returned, an htx_ret structure, with the found block and the position of the
offset in the block. If the offset is ouside of the HTX message, the returned
block is NULL.
The b_insert_blk() function may now be used to insert a string, given a pointer
and the string length, at an absolute offset in a buffer, moving data between
this offset and the buffer's tail just after the end of the inserted string. The
buffer's length is automatically updated. This function supports wrapping. All
the string is copied or nothing. So it returns 0 if there are not enough space
to perform the copy. Otherwise, the number of bytes copied is returned.
`istalloc` allocates memory and returns an `ist` with the size `0` that points
to this allocation.
`istfree` frees the pointed memory and clears the pointer.