sub: add subtitle charset conversion
This code was once part of subreader.c, then traveled to libass, and now
made its way back to the fork of the fork of the original code, MPlayer.
It works pretty much the same as subreader.c, except that we have to
concatenate some packets to do auto-detection. This is rather annoying,
but for all we know the actual source file could be a binary format.
Unlike subreader.c, the iconv context is reopened on each packet. This
is simpler, and with respect to multibyte encodings, more robust.
Reopening is probably not a very fast, but I suspect subtitle charset
conversion is not an operation that happens often or has to be fast.
Also, this auto-detection is disabled for microdvd - this is the only
format we know that has binary data in its packets, but is actually
decoded to text. FFmpeg doesn't really allow us to solve this properly,
because a) the input packets can be binary, and b) the output will be
checked whether it's UTF-8, and if it's not, the output is thrown away
and an error message is printed. We could just recode the decoded
subtitles before sd_ass if it weren't for that.
2013-06-23 20:15:04 +00:00
|
|
|
#ifndef MP_CHARSET_CONV_H
|
|
|
|
#define MP_CHARSET_CONV_H
|
|
|
|
|
|
|
|
#include <stdbool.h>
|
2013-12-17 01:39:45 +00:00
|
|
|
#include "bstr/bstr.h"
|
sub: add subtitle charset conversion
This code was once part of subreader.c, then traveled to libass, and now
made its way back to the fork of the fork of the original code, MPlayer.
It works pretty much the same as subreader.c, except that we have to
concatenate some packets to do auto-detection. This is rather annoying,
but for all we know the actual source file could be a binary format.
Unlike subreader.c, the iconv context is reopened on each packet. This
is simpler, and with respect to multibyte encodings, more robust.
Reopening is probably not a very fast, but I suspect subtitle charset
conversion is not an operation that happens often or has to be fast.
Also, this auto-detection is disabled for microdvd - this is the only
format we know that has binary data in its packets, but is actually
decoded to text. FFmpeg doesn't really allow us to solve this properly,
because a) the input packets can be binary, and b) the output will be
checked whether it's UTF-8, and if it's not, the output is thrown away
and an error message is printed. We could just recode the decoded
subtitles before sd_ass if it weren't for that.
2013-06-23 20:15:04 +00:00
|
|
|
|
|
|
|
enum {
|
|
|
|
MP_ICONV_VERBOSE = 1, // print errors instead of failing silently
|
|
|
|
MP_ICONV_ALLOW_CUTOFF = 2, // allow partial input data
|
2013-08-15 17:29:42 +00:00
|
|
|
MP_STRICT_UTF8 = 4, // don't fall back to UTF-8-BROKEN when guessing
|
sub: add subtitle charset conversion
This code was once part of subreader.c, then traveled to libass, and now
made its way back to the fork of the fork of the original code, MPlayer.
It works pretty much the same as subreader.c, except that we have to
concatenate some packets to do auto-detection. This is rather annoying,
but for all we know the actual source file could be a binary format.
Unlike subreader.c, the iconv context is reopened on each packet. This
is simpler, and with respect to multibyte encodings, more robust.
Reopening is probably not a very fast, but I suspect subtitle charset
conversion is not an operation that happens often or has to be fast.
Also, this auto-detection is disabled for microdvd - this is the only
format we know that has binary data in its packets, but is actually
decoded to text. FFmpeg doesn't really allow us to solve this properly,
because a) the input packets can be binary, and b) the output will be
checked whether it's UTF-8, and if it's not, the output is thrown away
and an error message is printed. We could just recode the decoded
subtitles before sd_ass if it weren't for that.
2013-06-23 20:15:04 +00:00
|
|
|
};
|
|
|
|
|
2013-08-15 21:13:10 +00:00
|
|
|
bool mp_charset_is_utf8(const char *user_cp);
|
sub: add subtitle charset conversion
This code was once part of subreader.c, then traveled to libass, and now
made its way back to the fork of the fork of the original code, MPlayer.
It works pretty much the same as subreader.c, except that we have to
concatenate some packets to do auto-detection. This is rather annoying,
but for all we know the actual source file could be a binary format.
Unlike subreader.c, the iconv context is reopened on each packet. This
is simpler, and with respect to multibyte encodings, more robust.
Reopening is probably not a very fast, but I suspect subtitle charset
conversion is not an operation that happens often or has to be fast.
Also, this auto-detection is disabled for microdvd - this is the only
format we know that has binary data in its packets, but is actually
decoded to text. FFmpeg doesn't really allow us to solve this properly,
because a) the input packets can be binary, and b) the output will be
checked whether it's UTF-8, and if it's not, the output is thrown away
and an error message is printed. We could just recode the decoded
subtitles before sd_ass if it weren't for that.
2013-06-23 20:15:04 +00:00
|
|
|
bool mp_charset_requires_guess(const char *user_cp);
|
2013-08-15 17:29:42 +00:00
|
|
|
const char *mp_charset_guess(bstr buf, const char *user_cp, int flags);
|
sub: add subtitle charset conversion
This code was once part of subreader.c, then traveled to libass, and now
made its way back to the fork of the fork of the original code, MPlayer.
It works pretty much the same as subreader.c, except that we have to
concatenate some packets to do auto-detection. This is rather annoying,
but for all we know the actual source file could be a binary format.
Unlike subreader.c, the iconv context is reopened on each packet. This
is simpler, and with respect to multibyte encodings, more robust.
Reopening is probably not a very fast, but I suspect subtitle charset
conversion is not an operation that happens often or has to be fast.
Also, this auto-detection is disabled for microdvd - this is the only
format we know that has binary data in its packets, but is actually
decoded to text. FFmpeg doesn't really allow us to solve this properly,
because a) the input packets can be binary, and b) the output will be
checked whether it's UTF-8, and if it's not, the output is thrown away
and an error message is printed. We could just recode the decoded
subtitles before sd_ass if it weren't for that.
2013-06-23 20:15:04 +00:00
|
|
|
bstr mp_charset_guess_and_conv_to_utf8(bstr buf, const char *user_cp, int flags);
|
|
|
|
bstr mp_iconv_to_utf8(bstr buf, const char *cp, int flags);
|
|
|
|
|
|
|
|
#endif
|