1
0
mirror of https://github.com/mpv-player/mpv synced 2025-02-15 19:47:32 +00:00
Commit Graph

13 Commits

Author SHA1 Message Date
wm4
fc7212b214 charset_conv: check for UTF-8 if uchardet returns unknown
When libuchardet returns an empty string, it can be either ASCII, UTF-8,
or an unknown encoding. Try to distinguish it from the unknown case by
checking for UTF-8. This avoids an annoying message, and avoids
unnecessary processing (we convert invalid UTF-8 sequences to latin1 to
workaround libavcodec's braindead UTF-8 check).
2015-12-20 20:55:24 +01:00
wm4
74c11f0c84 sub: detect charset in demuxer
Slightly simpler, and removes the need to pre-read all subtitle packets.

This still does the subtitle charset conversion on the packet level
(instead converting when parsing the file), so in theory this still
could provide a way to change the charset at runtime. But maybe even
this should be removed, as FFmpeg is somewhat likely to get its own
charset detection and conversion mechanism in the future. (Would have
to keep the subtitle file in memory to allow changing the charset on
the fly, I guess.)
2015-12-17 01:17:23 +01:00
wm4
384b13c5fd demux_libass: remove this demuxer
This loaded external .ass files via libass. libavformat's .ass reader is
now good enough, so use that instead.

Apparently libavformat still doesn't support fonts embedded into text
.ass files, but support for this has been accidentally broken in mpv for
a while anyway. (And only 1 person complained.)
2015-11-11 21:28:20 +01:00
wm4
d60270ed3d sub: fix --sub-codepage UTF-8 with fallback
Fixes e.g --sub-codepage=utf8:gb18030 if the subtitle us UTF-8.

This was broken in commit e5d31808.

Also log the detected charset in verbose mode.
2015-09-01 13:37:45 +02:00
wm4
e5d3180889 charset_conv: use our own UTF-8 check with ENCA only
Some charsets can look like valid UTF-8, but aren't UTF-8. One example
is ISO-2022-JP. While ENCA apparently likes to get misdetect real UTF-8,
this is not the case with uchardet. uchardet can detect ISO-2022-JP
correctly, but didn't even get to try, because our own UTF-8 check
succeeded. So run the UTF-8 check when using ENCA only.

Fixes #2195.
2015-08-04 18:58:58 +02:00
Jehan
e7897dfb9b charset_conv: "auto" encoding detection now uses uchardet.
If mpv is not built with uchardet, "enca" is still the fallback default
encoding detection.
2015-08-04 17:51:00 +02:00
wm4
bbbb2d9723 charset_conv: fix switched parameters
Fixes #2186.
2015-08-02 18:21:22 +02:00
wm4
a74914a057 charset_conv: add uchardet support
For now, it needs to be explicitly selected. ENCA is still the default.

This assumes uchardet returns iconv names. This doesn't seem to be
always the case, and the result are lots of iconv errors. So
explicitly check for this situation, and print a warning if it
occurs. It's entirely possible that uchardet support is actually
useless, because names are not necessarily iconv-compatible (but
uchardet doesn't seem to document whether it attempts to return
iconv-compatible names if possible).

Fixes #908.
2015-08-02 00:03:29 +02:00
wm4
11f2be2bcc charset_conv: make it possible to return an allocated string as guess
uchardet is written in C++, and thus doesn't appreciate the value of
using static strings, and internally stores the guessed charset as
allocated std::string. Add a minimal hack to deal with this. (I don't
appreciate that the code is potentially harder to understand by
returning either a static or allocated string, but I do appreciate for
not having to litter the existing code with strdups.)
2015-08-01 23:49:37 +02:00
wm4
aa1a383342 sub: add detection via BOM
Useful for Windows stuff. Actually, ENCA support should catch this, but,
well, whatever, everyone seems to hate ENCA.

Detection with BOM is trivial, although it needs some hackery to
integrate it with the existing autodetection support. For one, change
the default value of --sub-codepage to make this easier.

Probably fixes issue #937 (the second part).
2014-07-22 23:40:48 +02:00
wm4
f8c2dd1b78 build: include <strings.h> for strcasecmp()
It happens to work without strings.h on glibc or with _GNU_SOURCE, but
the POSIX standard requires including <strings.h>.

Hopefully fixes OSX build.
2014-07-10 08:29:32 +02:00
wm4
33c8fd789d charset_conv: mp_msg conversions 2013-12-21 21:43:16 +01:00
wm4
0112143fda Split mpvcore/ into common/, misc/, bstr/ 2013-12-17 02:39:45 +01:00