charset_conv: Use CP949 instead of EUC-KR

iconv distinguishes between euc-kr and cp949, while libguess
and libuchardet don't (they only return euc-kr). iconv fails with
EILSEQ when its input encoding is set to euc-kr and the subs
contain characters not included in euc-kr. Since cp949 is an
extension of euc-kr, choose cp949 instead.
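
The failure mode and the fix described above can be sketched in isolation. This is a minimal illustration, not code from the patch: `prefer_cp949` and `converts_cleanly` are hypothetical helper names, and the fixed 256-byte output buffer is an assumption that only suits short probe strings.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <strings.h>

// Hypothetical helper mirroring the patch: treat a detected "EUC-KR" as its
// superset CP949 and pass every other codepage name through unchanged.
const char *prefer_cp949(const char *cp)
{
    assert(cp != NULL);
    return strcasecmp(cp, "EUC-KR") == 0 ? "CP949" : cp;
}

// Hypothetical probe: returns 1 if `len` bytes at `in` convert to UTF-8 from
// codepage `cp`, 0 if iconv rejects them with EILSEQ, and -1 on any other
// failure (unknown codepage, truncated input, output buffer too small).
int converts_cleanly(const char *cp, const char *in, size_t len)
{
    iconv_t cd = iconv_open("UTF-8", cp);
    if (cd == (iconv_t)(-1))
        return -1;

    char out[256];            // assumption: probe strings are short
    char *src = (char *)in;   // iconv() wants char **, but only reads the input
    char *dst = out;
    size_t src_left = len;
    size_t dst_left = sizeof(out);

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    int err = errno;          // save before iconv_close() can change it
    iconv_close(cd);

    if (rc != (size_t)(-1))
        return 1;
    return err == EILSEQ ? 0 : -1;
}
```

On an iconv implementation that separates the two codepages, `converts_cleanly("EUC-KR", ...)` would be expected to return 0 for subtitle bytes that use CP949-only syllables, while `converts_cleanly(prefer_cp949("EUC-KR"), ...)` would be expected to return 1. Implementations differ in which names they accept, so treat this as a probe rather than a guarantee.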

Signed-off-by: wm4 <wm4@nowhere>
Jeong Woon Choi 2016-09-02 18:32:14 +09:00 committed by wm4
parent c72df80460
commit 875aeb0f5c
1 changed file with 5 additions and 0 deletions


@@ -291,6 +291,11 @@ bstr mp_iconv_to_utf8(struct mp_log *log, bstr buf, const char *cp, int flags)
     if (strcasecmp(cp, "UTF-8-BROKEN") == 0)
         return bstr_sanitize_utf8_latin1(NULL, buf);

+    // Force CP949 over EUC-KR since iconv distinguishes them and
+    // EUC-KR causes error on CP949 encoded data
+    if (strcasecmp(cp, "EUC-KR") == 0)
+        cp = "CP949";
+
     iconv_t icdsc;
     if ((icdsc = iconv_open("UTF-8", cp)) == (iconv_t) (-1)) {
         if (flags & MP_ICONV_VERBOSE)