Some charsets can look like valid UTF-8, but aren't UTF-8. One example
is ISO-2022-JP. While ENCA apparently likes to get misdetect real UTF-8,
this is not the case with uchardet. uchardet can detect ISO-2022-JP
correctly, but didn't even get to try, because our own UTF-8 check
succeeded. So run the UTF-8 check when using ENCA only.
Fixes#2195.
For now, it needs to be explicitly selected. ENCA is still the default.
This assumes uchardet returns iconv names. This doesn't seem to be
always the case, and the result are lots of iconv errors. So
explicitly check for this situation, and print a warning if it
occurs. It's entirely possible that uchardet support is actually
useless, because names are not necessarily iconv-compatible (but
uchardet doesn't seem to document whether it attempts to return
iconv-compatible names if possible).
Fixes#908.
uchardet is written in C++, and thus doesn't appreciate the value of
using static strings, and internally stores the guessed charset as
allocated std::string. Add a minimal hack to deal with this. (I don't
appreciate that the code is potentially harder to understand by
returning either a static or allocated string, but I do appreciate for
not having to litter the existing code with strdups.)
There is not much of a reason to have these wrappers around. Use POSIX
standard functions directly, and use a separate utility function to take
care of the timespec calculations. (Course POSIX for using this weird
format for time values.)
We escape only characters below 32, plus " and \. UTF-8 should be apssed
through verbatim. Since char can be signed (and usually is), the check
broke and happened to escape UTF-8 encoded bytes too. This broke UTF-8
completely.
Note that we don't check for broken or invalid UTF-8, such as described
both in the client API and IPC docs.
Fixes#1874.
BSTR_P() passes the string length and start pointer to printf-like
functions. If the lenfth is 0, the pointer can be NULL, but we're
actually still not allowed to pass a NULL pointer in any case.
This is mostly a technically, because nobody in their right mind would
attempt to specifically break such cases. But it's still undefined
behavior, and some libcs might be strict about this.
It assumed that any >\"< sequence was an escape for >"<, but that is not
the case with JSON such as >{"ducks":"\\"}<. In this case, the second
>\< is obviously not starting an escape.
The JSON parser was introduced for the IPC protocol, but I guess it's
useful here too.
The motivation for this commit is the same as with 8e4fa5fc (again).
bstr.start can be NULL when bstr.len is 0, so don't call memcmp or
strncasecmp if that's the case. Passing NULL to string functions is
invalid C, even when the length is 0, and it causes Windows to raise an
invalid parameter error.
Should fix#1155
bstr.c doesn't really deserve its own directory, and compat had just
a few files, most of which may as well be in osdep. There isn't really
any justification for these extra directories, so get rid of them.
The compat/libav.h was empty - just delete it. We changed our approach
to API compatibility, and will likely not need it anymore.
Plan 9 has a very interesting synchronization mechanism, the
rendezvous() call. A good property of this is that you don't need to
explicitly initialize and destroy a barrier object, unlike as with e.g.
POSIX barriers (which are mandatory to begin with). Upon "meeting", they
can exchange a value.
This mechanism will be nice to synchronize certain stages of
initialization between threads in the following commit.
Unlike Plan 9 rendezvous(), this is not implemented with a hashtable,
because that would require additional effort (especially if you want to
make it actually scele). Unlike the Plan 9 variant, we use intptr_t
instead of void* as type for the value, because I expect that we will be
mostly passing a status code as value and not a pointer. Converting an
integer to void* requires two cast (because the integer needs to be
intptr_t), the other way around it's only one cast.
We don't particularly care about performance in this case either. It's
simply not important for our use-case. So a simple linked list is used
for waiters, and on wakeup, all waiters are temporarily woken up.
Useful for Windows stuff. Actually, ENCA support should catch this, but,
well, whatever, everyone seems to hate ENCA.
Detection with BOM is trivial, although it needs some hackery to
integrate it with the existing autodetection support. For one, change
the default value of --sub-codepage to make this easier.
Probably fixes issue #937 (the second part).
Something like "char *s = ...; isdigit(s[0]);" triggers undefined
behavior, because char can be signed, and thus s[0] can be a negative
value. The is*() functions require unsigned char _or_ EOF. EOF is a
special value outside of unsigned char range, thus the argument to the
is*() functions can't be a char.
This undefined behavior can actually trigger crashes if the
implementation of these functions e.g. uses lookup tables, which are
then indexed with out-of-range values.
Replace all <ctype.h> uses with our own custom mp_is*() functions added
with misc/ctype.h. As a bonus, these functions are locale-independent.
(Although currently, we _require_ C locale for other reasons.)
uint_least32_t could be larger than uint32_t, so the return values of
mp_ring_get_wpos/rpos must be adjusted. Actually just use unsigned long
as type instead, because that is less awkward than uint_least32_t.
In my opinion, we shouldn't use atomics at all, but ok.
This switches the mpv code to use C11 stdatomic.h, and for compilers
that don't support stdatomic.h yet, we emulate the subset used by mpv
using the builtins commonly provided by gcc and clang.
This supersedes an earlier similar attempt by Kovensky. That attempt
unfortunately relied on a big copypasted freebsd header (which also
depended on much more highly compiler-specific functionality, defined
reserved symbols, etc.), so it had to be NIH'ed.
Some issues:
- C11 says default initialization of atomics "produces a valid state",
but it's not sure whether the stored value is really 0. But we rely on
this.
- I'm pretty sure our use of the __atomic... builtins is/was incorrect.
We don't use atomic load/store intrinsics, and access stuff directly.
- Our wrapper actually does stricter typechecking than the stdatomic.h
implementation by gcc 4.9. We make the atomic types incompatible with
normal types by wrapping them into structs. (The FreeBSD wrapper does
the same.)
- I couldn't test on MinGW.
Use the time as returned by mp_time_us() for mpthread_cond_timedwait(),
instead of calculating the struct timespec value based on a timeout.
This (probably) makes it easier to wait for a specific deadline.
The here documented guarantee might simplify code using this mechanism
a lot, because it becomes unnecessary to invent a separate mechanism to
make the mp_dispatch_queue_process loop exit after processing a dispatch
callback. (Instead, the dispatch callback can set a flag, and the caller
of mp_dispatch_queue_process can check it.)
The wakeup is needed to make mp_dispatch_queue_process() return if
suspension is not needed anymore - which is only the case when the
request count reaches 0.
The assertion added with this commit always has/had to be true.
Note that this mechanism is similarly "unreliable" as for example
pthread_cond_timedwait(). Trying to target the exact wait time will just
make it more complex.
The main use case for this is for threads which either use the dispatch
centrally and want mp_dispatch_queue_process to do a blocking wait for
new work, or threads which have to implement timers. For the former,
anything is fine, as long as they don't have to do active waiting for
new works. For the former, callers are better off recalculating their
deadline after every call.
This is much simpler, leaves fairness isues etc. to the operating
system, and will work better with threading-related debugging tools.
The "trick" to this is that the lock can be acquired and held only while
the queue is in suspend mode. This way we don't need to make sure the
lock is held outside of mp_dispatch_queue_process, which would be quite
messy to get right, because it would have to be in locked state by
default.
mp_dispatch_queue_process() releases the queue->lock mutex while
processing a dispatch callback. But this allowed mp_dispatch_lock() to
grab the "logical" lock represented by queue->locked. Grabbing the
logical lock is not a problem in itself, but it can't be allowed to
happen while the callback is still running.
Fix this by claiming the logical lock while the dispatch callback is
processed. Also make sure that the thread calling mp_dispatch_lock() is
woken up properly.
Fortunately, this didn't matter, because the locking function is unused.
This was part of osdep/threads.c out of laziness. But it doesn't contain
anything OS dependent. Note that the rest of threads.c actually isn't
all that OS dependent either (just some minor ifdeffery to work around
the lack of clock_gettime() on OSX).