Normally, --subcp always forces conversion. This really always forces
conversion, even if the UTF-8 check on the input succeeds.
Extend the --subcp to allow codepages as fallback if UTF-8 doesn't
work. So, for example --subcp=utf8:cp1250 will use UTF-8 if the input
looks like UTF-8, and will fall back to use cp1250 if the UTF-8 check
fails.
I think this should actually be the default, but on the other hand,
this changes the semantics of the option, and a user would actually
expect --subcp to force conversion, rather than silently using UTF-8
if that happens to work.
Broken UTF-8 in this context means we treat it as UTF-8, but we also
interpret broken UTF-8 sequences as Latin1.
Also, run our own UTF-8 check function before the charset detectors.
This prevents from ENCA's UTF-8 check possibly messing up (like
detecting 7-bit clean UTF-8 as ASCII, or other things). It also takes
care of UTF-8 detection if no charset detector (ENCA, libguess) is
compiled in, and it lets us deal better with cut-off UTF-8 sequences.
core is used in many unix systems for core dumps. For that reason some tools
work under the assumption that the file is indeed a core dump (for example
autoconf does this).
This commit just renames the files. The following one will change all the
includes to fix compilation. This is done this way because git has a easier
time tracing file changes if there is a pure rename commit.