Commit Graph

42 Commits

Author SHA1 Message Date
wm4 13624b5c7a stream_libarchive: disable tar support
Unfortunately, libarchive detects a stream of 0s as tar, as demonstrated
by "mpv /dev/zero". This is inconvenient in some cases.

One example is the .cue demuxer trying to open a raw audio .bin file,
which it allows only if probing fails (as .bin is raw and normally will
not look like any real file format). Although this use-case is
worthless.
2020-02-02 17:35:57 +01:00
wm4 65f3c7453d stream_libarchive: more broken garbage
Why the fuck am I even bothering with this crap?
2020-01-20 19:58:51 +01:00
wm4 3ef0754102 Revert "stream_libarchive: remove "old" rar volume pattern"
This reverts commit 1b0129c414.

It turns out most of the files affected by the idiotic use-case actually
use this old naming pattern, which I hoped was unused.

This means for now we'll always assume .rar files are multi-part (until
proven otherwise), but the following commit tries to fix this.
2020-01-20 19:58:51 +01:00
wm4 90c11fa729 stream_libarchive: do not require leading / in archive URLs
The / was added some time ago, because it simplifies some other things.
But there is actually no reason to reject old URLs.
2020-01-19 19:26:51 +01:00
wm4 33e999de82 stream_libarchive: fix unnecessarily opening all volumes on opening
Seems like I'm still not done with rar playback stuff...

It turns out the reason for archive_read_open1() opening all volumes had
nothing to do with libarchive's rar code, but was a consequence of how
multi volume support is implemented in libarchive, and due to the fact
that we enabled archive_read_support_format_zip_seekable() (through
archive_read_support_format_zip()).

The seekable zip format will seek to the end of the file and search for
a zip "header" there. It could possibly be considered a libarchive bug
that it does that even if it's fairly sure that it's a RAR file.

We already do probing on a small buffer read from the start of the file
(i.e. not giving libarchive a way to seek the stream before we think
it's an archive), but that does not help, since libarchive needs to
probe _again_. libarchive does not seem to provide a function to query
the format (no archive_read_get_format()). Which seems quite strange,
but at least I didn't find one.

This commit works around this by doing some manual rar/zip probing. We
could have gone only with rar probing. But detecting zip separately
allows us to avoid that stream_libarchive seeks to the end during early
probing. This is an additional bonus on top of "fixing" multi volume
rar.

The zip probing is from archive_read_format_zip_streamable_bid(). The
rar signature is the common prefix of the rar and rar5 formats in
libarchive (presumably the RAR fixed header parts without version).
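
A minimal sketch of this kind of signature probing, in C. The helper names
are hypothetical and this is not the actual mpv code. The byte values are the
shared "Rar!" prefix of the RAR 4 and RAR 5 signatures and a few well-known
"PK" zip record markers; the exact set of markers libarchive checks may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Common prefix of the RAR 4 and RAR 5 signatures: "Rar!" 0x1A 0x07.
 * The version-specific bytes that follow are deliberately not checked. */
static const unsigned char rar_magic[] = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07};

/* Hypothetical helper: true if buf starts with the shared RAR signature. */
bool probe_rar(const unsigned char *buf, size_t len)
{
    return len >= sizeof(rar_magic) &&
           memcmp(buf, rar_magic, sizeof(rar_magic)) == 0;
}

/* Hypothetical helper: true if buf starts with a "PK" zip record marker.
 * Only a few common record types are checked (an assumption of this sketch). */
bool probe_zip(const unsigned char *buf, size_t len)
{
    if (len < 4 || buf[0] != 'P' || buf[1] != 'K')
        return false;
    return (buf[2] == 0x01 && buf[3] == 0x02) ||  /* central directory entry */
           (buf[2] == 0x03 && buf[3] == 0x04) ||  /* local file header */
           (buf[2] == 0x05 && buf[3] == 0x06) ||  /* end of central directory */
           (buf[2] == 0x07 && buf[3] == 0x08);    /* data descriptor */
}
```

Both checks only look at the first few bytes of the probe buffer, so neither
needs to seek the underlying stream.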

If the demuxer seeks to the end of the rar entry, this will still open
all volumes; I'm not sure whether the old/removed rar code in mpv could
handle this better.

See: #7182
2020-01-09 02:25:13 +01:00
wm4 28650e116a stream_libarchive: enable anger management
Well that was too much misery when trying to deal with an idiotic
feature, so it had to be compensated for.
Replace insults with proper explanation, libarchive sort of isn't guilty
in the first place, and their format support is pretty good all things
considered.
2020-01-07 15:32:27 +01:00
wm4 1b0129c414 stream_libarchive: remove "old" rar volume pattern
This turned every "normal" .rar filename into a multi-volume one
accidentally. Since the detection is purely by filename (due to lack of
support by libarchive I guess), it causes the nasty message added in the
previous commit to appear for every .rar file. Just drop it.
2020-01-04 20:49:00 +01:00
wm4 119bad4daa stream_libarchive: add annoying message regarding multi-volume archives
I'm done.
2020-01-04 20:00:43 +01:00
wm4 1b283f6b60 libarchive: some shitty hack to make opening slightly faster
See manpage additions. The libarchive behavior mentioned in the last
paragraph there is technically unrelated, but makes this new option
mostly pointless.

See: #7182
2020-01-04 19:56:09 +01:00
wm4 4231ce6f5f stream_libarchive: log each opened volume
To annoy the user.
2020-01-04 19:48:55 +01:00
wm4 4419d29bb0 stream_libarchive: remove unnecessary string list of volumes
Just add the entries as volumes directly.
2020-01-04 19:31:08 +01:00
wm4 04bde06095 stream_libarchive: some more hacks to improve multi-volume archives
Instead of opening every volume on start just to see if it's there, add
all volumes that could possibly exist, and "handle" it on opening. This
requires working around some of libarchive's amazing stupidity and using
some empirically determined behavior. Will possibly break if libarchive
changes some of this behavior.

See: #7182
2020-01-04 18:59:23 +01:00
wm4 657ce1b15c stream_libarchive: enable rar5 support
We whitelist formats (and not all of them). RAR v5 is a separate format
entry for inexplicable reasons. (It's a separate implementation, but who
really wants to support only either of the rar formats?)

I'm not sure if it was libarchive 3.3.3. Their git history is absolutely
chaotic. These people do not know how to use git. I couldn't be bothered
to dig deeper.
2020-01-04 17:15:09 +01:00
wm4 1cb9e7efb8 stream, demux: redo origin policy thing
mpv has a very weak and very annoying policy that determines whether a
playlist should be used or not. For example, if you play a remote
playlist, you usually don't want it to be able to read local filesystem
entries. (Although for a media player the impact is small I guess.)

It's weak and annoying in that it does not prevent certain cases
which could be interpreted as bad in some cases, such as allowing
playlists on the local filesystem to reference remote URLs. It probably
barely makes sense, but we just want to exclude some other "definitely
not a good idea" things, all while playlists generally just work, so
whatever.

The policy is:
- from the command line anything is played
- local playlists can reference anything except "unsafe" streams
  ("unsafe" means special stream inputs like libavfilter graphs)
- remote playlists can reference only remote URLs
- things like "memory://" and archives are "transparent" to this
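
A sketch of the policy list above as a single predicate, in C. The enum and
function names are hypothetical and only mirror the four rules; they are not
the identifiers used in the actual code.

```c
#include <stdbool.h>

/* Where the request to open something came from (hypothetical names). */
enum origin {
    ORIGIN_CMDLINE,   /* given directly by the user */
    ORIGIN_LOCAL,     /* referenced by a playlist on the local filesystem */
    ORIGIN_REMOTE,    /* referenced by a playlist fetched from the network */
};

/* What is being opened (hypothetical names). */
enum stream_kind {
    KIND_LOCAL_FILE,
    KIND_REMOTE_URL,
    KIND_UNSAFE,       /* special inputs such as libavfilter graphs */
    KIND_TRANSPARENT,  /* memory://, archives: do not alter the origin */
};

/* Returns whether something with the given origin may open this stream. */
bool origin_allows(enum origin from, enum stream_kind kind)
{
    if (kind == KIND_TRANSPARENT)
        return true;                 /* the origin is simply passed through */
    switch (from) {
    case ORIGIN_CMDLINE: return true;
    case ORIGIN_LOCAL:   return kind != KIND_UNSAFE;
    case ORIGIN_REMOTE:  return kind == KIND_REMOTE_URL;
    }
    return false;
}
```

For example, origin_allows(ORIGIN_REMOTE, KIND_LOCAL_FILE) is false, while
origin_allows(ORIGIN_LOCAL, KIND_REMOTE_URL) is true.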

This commit does... something. It replaces the weird stream flags with a
slightly clearer "origin" value, which is now consequently passed down
and used everywhere. It fixes some deviations from the described policy.

I wanted to force archives to reference only content within them, but
this would probably have been more complicated (or required different
abstractions), and I'm too lazy to figure it out, so archives are now
"transparent" (playlists within archives behave the same outside).

There may be a lot of bugs in this.

This is unfortunately a very noisy commit because:
- every stream open call now needs to pass the origin
- so does every demuxer open call (=> params param. gets mandatory)
- most streams were changed to provide the "origin" value
- the origin value needed to be passed along in a lot of places
- I was too lazy to split the commit

Fixes: #7274
2019-12-20 13:00:39 +01:00
wm4 572c32abbe libarchive: prefix entry names in archive URLs with '/'
This has the advantage that playlists within the archive will work as
expected, because demux_playlist will correctly join the archive base
URL and entry name. Before this change, it could skip before the "|",
resulting in a broken URL.
2019-12-20 08:35:08 +01:00
wm4 ac7f67b3f2 demux_mkv, stream: attempt to improve behavior in unseekable streams
stream_skip() semantics were kind of bad, especially after the recent
change to the stream code. Forward stream_skip() calls could still
trigger a seek and fail, even if it was supposed to actually skip data.
(Maybe the idea that stream_skip() should try to seek is worthless in
the first place.)

Rename it to stream_seek_skip() (takes absolute position now because I
think that's better), and make it always skip if the stream is marked as
forward.

While we're at it, make EOF detection more robust. I guess s->eof
shouldn't exist at all, since it's valid only "sometimes". It should be
removed... but not today. A 1-byte stream_read_peek() call is good to
get the s->eof flag set to a correct value.
2019-11-14 12:59:14 +01:00
wm4 e5a9b792ec stream: replace STREAM_CTRL_GET_SIZE with a proper entrypoint
This is overly convoluted as a stream control, and important enough to
warrant "first class" functionality.
2019-11-07 22:53:13 +01:00
wm4 d3479018db stream: change buffer argument types from char* to void*
This is slightly better, although not much, and ultimately doesn't
matter.

The public API in stream_cb.h also uses char*, but can't change that.
2019-11-07 22:53:13 +01:00
James Hilliard abfc58cad4 stream_libarchive: Always use LC_CTYPE_MASK for libarchive
Using LC_ALL_MASK is unnecessary and unreliable on some systems.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
2019-09-21 12:53:47 +02:00
wm4 f77515ebaf stream_libarchive: remove base filename stuff
Apparently this was so that when playing a video file from a .rar file,
it would load external subtitles with the same name (instead of looking
for mpv's rar:// mangled URL). This was requested on github almost 5
years ago. Seems like a weird feature, and I don't care. Drop it,
because it complicates some in progress change.
2019-09-19 20:37:04 +02:00
wm4 8ca4386366 stream_libarchive: fix another crash with broken rar files
libarchive (sometimes affectionately called libcve) has this annoying
behavior that if after a "fatal" error, you do any operation on the
archive context other than querying the error and closing the context,
you get a free CVE. So we close the archive context in these situations.
This can set p->mpa to NULL, so code accessing this field needs to be
careful.

This was not considered in a certain code path, and a simple truncated
.rar file made it crash. Part of the problem was that the file inside
the rar was a mkv file, which triggered seeking when the demux_mkv
resync code encountered bogus data.

This is probably a regression from a relatively recent change to this
code (in any case mpv 0.29.1 doesn't crash).

Fix this by adding the check.

There's also a mechanism to reopen an archive context used to emulate
seeking, since most libarchive format handlers don't support this
natively. Add a reopen call to the codepath, because obviously it should
always be possible to seek back into a "working" area of the file.

There is a second bug with this: if reopening fails, we don't adjust the
current position back to 0, which in some cases means we accidentally
return bogus data to the reader when we shouldn't. Fix this by always
resetting the position on reopening.
2019-09-19 20:37:03 +02:00
dudemanguy 037cbacb8c libarchive: add fallback for systems without C.UTF-8
2019-05-04 14:17:40 +02:00
Anton Kindestam 8b83c89966 Merge commit '559a400ac36e75a8d73ba263fd7fa6736df1c2da' into wm4-commits--merge-edition
This bumps libmpv version to 1.103
2018-12-05 19:19:24 +01:00
wm4 7e85dc2167 stream_libarchive: fix hangs when demuxer does out of bound seeks
This happened with a .flac file inside an archive. It tried to seek
beyond the end of the archive entry in a format where seeking isn't
supported. stream_libarchive handles these situations by skipping data.
But when the end of the archive is reached, archive_read_data() returns
0. While libarchive didn't bother to fucking document this, they do say
it's supposed to work like read(), so I guess a return value of 0 really
means EOF. So change the "< 0" to "<= 0". Also add some error logging.
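
A sketch of why the comparison matters, using a hypothetical in-memory source
in place of an archive entry (none of these names come from mpv or
libarchive). The mock read function follows read() semantics: bytes read,
0 at EOF, negative on error.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory source standing in for an archive entry. */
struct mem_src {
    const unsigned char *data;
    size_t size, pos;
};

/* read()-like: returns bytes read, 0 at EOF, negative on error. */
static long mem_read(struct mem_src *s, void *buf, size_t len)
{
    size_t left = s->size - s->pos;
    if (len > left)
        len = left;
    memcpy(buf, s->data + s->pos, len);
    s->pos += len;
    return (long)len;
}

/* Emulate a forward seek by discarding data until target is reached.
 * Returns the position actually reached, which is short of target if the
 * entry ends first. */
size_t skip_to(struct mem_src *s, size_t target)
{
    unsigned char scratch[4096];
    while (s->pos < target) {
        size_t want = target - s->pos;
        if (want > sizeof(scratch))
            want = sizeof(scratch);
        long r = mem_read(s, scratch, want);
        if (r <= 0)   /* checking only "< 0" would loop forever once r == 0 */
            break;
    }
    return s->pos;
}
```

With a 10 byte source, skip_to(&s, 100) stops at position 10 instead of
hanging.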

The same file actually worked without out of bounds reads when
extracted, so there still might be something very wrong.
2018-10-01 10:41:01 +02:00
wm4 31b78ad7fa misc: move mp_cancel from stream.c to thread_tools.c
It seems a bit inappropriate to have dumped this into stream.c, even if
it's roughly speaking its main user. At least it made its way somewhat
unfortunately to other components not related to the stream or demuxer
layer at all.

I'm too greedy to give this weird helper its own file, so dump it into
thread_tools.c.

Probably a somewhat pointless change.
2018-05-24 19:56:35 +02:00
wm4 987eecdb5a stream_libarchive: mark as needing cache
Seeking back can be excessively slow with most formats, so it'll benefit
from this.
2018-04-15 21:07:13 +03:00
wm4 bf111f9c3c stream_libarchive: fix seeking fallback
In commit 1199c1e3, we added checks to every libarchive API call to make
sure the archive was closed on ARCHIVE_FATAL - otherwise, libarchive
could endow us with free CVEs (such as it apparently happens when you
continue reading a rar archive that uses features not yet supported by
libarchive).

This broke the fallback for seeking in unseekable archive formats. Of
course libarchive won't tell us directly whether a format implementation
has seek support or not - and OF COURSE it returns ARCHIVE_FATAL if it
has no seek support. (The error string, which you can retrieve via API,
is actually more detailed, and also claims it's an "internal error". I
don't think so, libarchive.) Returning ARCHIVE_FATAL means we have to
assume free CVEs are ahead, and we have to close the archive. Which
breaks the fallback in a dumb way (we have no way of telling which of
those cases happened anyway).

Fix this by assuming that all seek errors are potentially due to lack of
seek support. If the seek call fails, reopen the archive, and set a flag
so the seek API is never tried again. (This means we can still skip
ahead for forward seeks, which is more efficient than skipping from the
start of the archive entry.)
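
A sketch of this fallback with a mocked entry, in C. All names are
hypothetical; the mock seek and reopen functions only stand in for the real
library calls, and advancing pos stands in for reading and discarding data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an opened archive entry. */
struct entry {
    size_t size;         /* total entry size */
    size_t pos;          /* current read position */
    bool native_seek;    /* whether the (mock) format handler can seek */
    bool seek_failed;    /* set once a native seek has failed */
    int reopen_count;    /* how often the entry had to be reopened */
};

/* Mock of a seek call that fails for handlers without seek support. */
static bool try_native_seek(struct entry *e, size_t target)
{
    if (!e->native_seek)
        return false;
    e->pos = target;
    return true;
}

/* Mock reopen: a fresh context always starts at position 0. */
static void reopen(struct entry *e)
{
    e->pos = 0;
    e->reopen_count++;
}

/* Seek to target, falling back to reopen + skip if seeking is unsupported. */
void entry_seek(struct entry *e, size_t target)
{
    if (target > e->size)
        target = e->size;
    if (!e->seek_failed) {
        if (try_native_seek(e, target))
            return;
        e->seek_failed = true;   /* never try the seek API again */
        reopen(e);               /* the failed call invalidated the context */
    }
    if (target < e->pos)
        reopen(e);               /* backward seek: restart from 0 */
    e->pos = target;             /* forward: read and discard up to target */
}
```

After the first failure, later forward seeks skip ahead from the current
position, and only backward seeks pay for another reopen.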

Also fix an old typo in an error message.
2017-12-24 21:33:16 +01:00
wm4 1e70e82baa stream_libarchive: workaround various types of locale braindeath
Fix that libarchive fails to return filenames for UTF-8/UTF-16 entries.
The reason is that it uses locales and all that garbage, and mpv does
not set a locale.

Both C locales and wchar_t are shitfucked retarded legacy braindeath. If
the C/POSIX standard committee had actually competent members, these
would have been deprecated or removed long ago. (I mean, they managed to
remove gets().) To justify this emotional outbreak potentially insulting
to unknown persons, I will write a lot of text. Those not comfortable
with toxic language should pretend this is a religious text.

C locales are supposed to be a way to support certain languages and
cultures easier. One example is character codepages. Back when UTF-8
was not invented yet, there were only 255 possible characters, which is
not enough for anything but English and some European languages. So they
decided to make the meaning of a character dependent on the current
codepage. The locale (LC_CTYPE specifically) determines what character
encoding is currently used.

Of course nowadays, this is legacy nonsense. Everything uses UTF-8 for
"char", and what doesn't is broken and terrible anyway. But the old ways
stayed with us, and the stupidity of it as well.

C locales were utterly moronic even when they were invented. The locale
(via setlocale()) is global state, and global state is not a reasonable
way to do anything. It will break libraries, or well modularized code.
(The latter would be forced to strictly guard all entrypoints to
set/restore locales, assuming a single threaded world.)

On top of that, setting a locale randomly changes the semantics of a
bunch of standard functions. If a function respects locale, you suddenly
can't rely on it to behave the same on all systems. Some behavior can
come as a surprise, and of course it will be dependent on the region of
the user (it doesn't help that most software is US-centric, and the US
locale is almost like the C locale, i.e. almost what you expect).

Idiotically, locales were not just used to define the current character
encoding, but the concept was used for a whole lot of things, like e. g.
whether numbers should use "," or "." as decimal separator. The latter
issue is actually much worse, because it breaks basic string conversion
or parsing of numbers for the purpose of interacting with file formats
and such.

Much can be said about how retarded locales are, even beyond what I just
wrote, or will write below. They are so hilariously misdesigned and
insufficient, I can't even fathom how this shit was _standardized_. (In
any case, that meant everyone was forced to implement it.) Many C
functions can't even do it correctly. For example, the character set
encoding can be a multibyte encoding (not just UTF-8, but awful garbage
like Shift JIS (sometimes called SHIT JIZZ)), yet functions like
toupper() can return only 1 byte. Or just take the fact that the locale
API tries to define standard paper sizes (LC_PAPER) or telephone number
formatting (LC_TELEPHONE). Who the fuck uses this, or would ever use
this?

But the badness doesn't stop here. At some point, they invented threads.
And they put absolutely no thought into how threads should interact with
locales. So they kept locales as global state. Because obviously, you
want to be able to change the semantics of basic string processing
functions _while_ they're running, right? (Any thread can call
setlocale() at any time, and it's supposed to change the locale of all
other threads.)

At this point, how the fuck are you supposed to do anything correctly?
You can't even temporarily switch the locale with setlocale(), because
it would asynchronously fuckup the other threads. All you can do is to
enforce a convention not to set anything but the C locale (this is what
mpv does), or to duplicate standard functions using code that doesn't
query locale (this is what e.g. libass does, a close dependency of mpv).

Imagine they had done this for certain other things. Like errno, with
all the brokenness of the locale API. This simply wouldn't have worked,
shit would just have been too broken. So they didn't. But locales give a
delicious sweet spot of brokenness, where things are broken enough to
cause neverending pain, but not broken enough that enough effort would
have been spent to fix it completely.

On that note, standard C11 actually can't stringify an error value. It
does define strerror(), but it's not thread safe, even though C11
supports threads. The idiots could just have defined it to be thread
safe. Even if your libc is horrible enough that it can't return string
literals, it could just use some thread local buffer. Because C11 does
define thread local variables. But hey, why care about details, if you
can just create a shitty standard?

(POSIX defines strerror_r(), which "solves" this problem, while still
not making strerror() thread safe.)

Anyway, back to threads. The interaction of locales and threads makes no
sense. Why would you make locales process global? Who even wanted it to
work this way? Who decided that it should keep working this way, despite
being so broken (and certainly causing implementation difficulties in
libc)? Was it just a fucked up psychopath?

Several decades later, the moronic standard committees noticed that this
was (still is) kind of a bad situation. Instead of fixing the situation,
they added more garbage on top of it. (Probably for the sake of
"compatibility"). Now there is a set of new functions, which allow you
to override the locale for the current thread. This means you can
temporarily override and restore the locale on all entrypoints of your
code (like you could with setlocale(), before threads were invented).

And of course not all operating systems or libcs implement this. For
example, I'm pretty sure Microsoft doesn't. (Microsoft got to fuck it up
as usual, and only provides _configthreadlocale(). This is shitfucked on
its own, because it's GLOBAL STATE to configure that GLOBAL STATE should
not be GLOBAL STATE, i.e. completely broken garbage, because it requires
agreement over all modules/libraries what behavior should be used. I
mean, sure, making setlocale() affect only the current thread would have
been the reasonable behavior. Making this behavior configurable isn't,
because you can't rely on what behavior is active.)

POSIX showed some minor decency by at least introducing some variations
of standard functions, which have a locale argument (e.g. toupper_l()).
You just pass the locale which you want to be used, and don't have to do
the set locale/call function/restore locale nonsense. But OF COURSE they
fucked this up too. In no less than 2 ways:

- There is no statically available handle for the C locale, so you have
  to initialize and store it somewhere, which makes it harder to make
  utility functions safe, that call locale-affected standard functions
  and expect C semantics. The easy solution, using pthread_once() and a
  global variable with the created locale, will not be easily accepted
  by pedantic assholes, because they'll worry about allocation failure,
  or leaking the locale when using this in library code (and then
  unloading the library). Or you could have complicated library
  init/uninit functions, which bring a big load of their own mess.
  Same for automagic DLL constructors/destructors.
- Not all functions have a variant that takes a locale argument, and
  they missed even some important ones, like snprintf() or strtod() WHAT
  THE FUCK WHAT THE FUCK WHAT THE FUCK WHAT THE FUCK WHAT THE FUCK WHAT
  THE FUCK WHAT THE FUCK WHAT THE FUCK WHAT THE FUCK

I would like to know why it took so long to standardize a half-assed
solution, that, apart from being conceptually half-assed, is even
incomplete and insufficient. The obvious way to fix this would have
been:

- deprecate the entire locale API and their use, and make it a NOP
- make UTF-8 the standard character type
- make the C locale behavior the default
- add new APIs that explicitly take locale objects
- provide an emulation layer, that can be used to transparently build
  legacy code without breaking them

But this wouldn't have been "compatible", and the apparently incompetent
standard committees would have never accepted this. As if anyone
actually used this legacy garbage, except other legacy garbage. Oh yeah,
and let's care a lot about legacy compatibility, and let's not care at
all about modern code that either has to suffer from this, or subtly
breaks when the wrong locales are active.

Last but not least, the UTF-8 locale name is apparently not even
standardized. At the moment I'm trying to use "C.UTF-8", which is
apparently glibc _and_ Debian specific. Got to use every opportunity to
make correct usage of UTF-8 harder. What luck that this commit is only
for some optional relatively obscure mpv feature.

Why is the C locale not UTF-8? Why did POSIX not standardize a UTF-8
locale? Well, according to something I heard a few years ago, they're
considering disallowing UTF-8 as locale, because UTF-8 would violate
certain invariants expected by C or POSIX. (But I'm not sure if I
remember this correctly - probably better not to rage about it.)

Now, on to libarchive.

libarchive intentionally uses the locale API and all the broken crap
around it to "convert" UTF-8 or UTF-16 (as contained in reasonably sane
archive formats) to "char*". This is a good start!

Since glibc does not think that the C locale uses UTF-8, this fails for
mpv. So trying to use archive_entry_pathname() to get the archive entry
name fails if the name contains non-ASCII characters.

Maybe use archive_entry_pathname_utf8()? Surely that should return
UTF-8, since its name seems to indicate that it returns UTF-8. But of
fucking course it doesn't! libarchive's horribly convoluted code (that
is full of locale API usage and other legacy shit, as well as ifdefs and
OS specific code, including Windows and fucking Cygwin) somehow fucks up
and fails if the locale is not set to UTF-8. I made a PR fixing this in
libarchive almost 2 years ago, but it was ignored.

So, would archive_entry_pathname_w() as fallback work? No, why would it?
Of course this _also_ involves shitfucked code that calls shitfucked
standard functions (or OS specific ifdeffed shitfuck). The truth is that
at least glibc changes the meaning of wchar_t depending on the locale.
Unlike what most people think, wchar_t is not standardized to be a UTF
variant (or even Unicode) - it's an encoding that uses basic units that
can be larger than 8 bits. It's an implementation defined thing. Windows
defines it as 2 bytes and UTF-16, and glibc defines it as 4 bytes and
UTF-32, but only if a UTF-8 locale is set (apparently).

Yes. Every libarchive function dealing with strings has 3 variants:
plain, _utf8, and _w. And none of these work if the locale is not set.
I cannot fathom why they even have a wchar_t variant, because it's
redundant and fucking useless for any modern code.

Writing a UTF-16 to UTF-8 conversion routine is maybe 3 pages of code,
or a few lines if you use iconv. But libarchive uses all this glorious
bullshit, and ends up with 3 not working API functions, and with over
4000 lines of its own string abstraction code with gratuitous amounts of
ifdefs and OS dependent code that breaks in a fairly common use case.
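
For scale, a compact sketch of such a locale-independent conversion in C.
The function name and its error convention are made up for this example, and
input is assumed to be UTF-16 code units already in host byte order. Unpaired
surrogates are replaced with U+FFFD, which is one possible policy among
several.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert UTF-16 code units (host byte order) to NUL-terminated UTF-8.
 * Unpaired surrogates become U+FFFD. Returns the number of bytes written
 * (excluding the NUL), or (size_t)-1 if dst is too small. */
size_t utf16_to_utf8(const uint16_t *src, size_t src_len,
                     char *dst, size_t dst_size)
{
    size_t o = 0;
    if (dst_size == 0)
        return (size_t)-1;
    for (size_t i = 0; i < src_len; i++) {
        uint32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            if (i + 1 < src_len &&
                src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
                i++;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {   /* stray low surrogate */
            cp = 0xFFFD;
        }

        size_t n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + n >= dst_size)                       /* keep room for the NUL */
            return (size_t)-1;

        if (n == 1) {
            dst[o++] = (char)cp;
        } else if (n == 2) {
            dst[o++] = (char)(0xC0 | (cp >> 6));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (n == 3) {
            dst[o++] = (char)(0xE0 | (cp >> 12));
            dst[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[o++] = (char)(0xF0 | (cp >> 18));
            dst[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    dst[o] = '\0';
    return o;
}
```

Nothing in it consults the locale, so the result is the same no matter what
setlocale() or uselocale() state the process is in.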

So what we do is:

- Use the idiotic POSIX 2008 API (uselocale() etc.) (Too bad for users
  who try to build this on a system that doesn't have these - hopefully
  none are left in 2017. But if there are, torturing them with obscure
  build errors is probably justified. Might be bad for Windows though,
  which is a very popular platform except on phones.)
- Use the "C.UTF-8" locale, which is probably not 100% standards
  compliant, but works on my system, so it's fine.
- Guard every libarchive call with uselocale() + restoring the locale.
- Be lazy and skip some libarchive calls. Look forward to the unlikely
  and astonishingly stupid bugs this could produce.
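
A sketch of the guard pattern from the list above, in C with the POSIX 2008
locale API. The wrapper and callback names are hypothetical. Running the
callback unchanged when "C.UTF-8" is unavailable is a choice made for this
sketch, not necessarily what the player does.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <locale.h>
#ifdef __APPLE__
#include <xlocale.h>
#endif
#include <stddef.h>

/* Hypothetical helper: run fn(arg) with LC_CTYPE switched to "C.UTF-8" for
 * the calling thread only, then restore the previous thread locale. */
int call_with_utf8_ctype(int (*fn)(void *), void *arg)
{
    locale_t utf8 = newlocale(LC_CTYPE_MASK, "C.UTF-8", (locale_t)0);
    if (utf8 == (locale_t)0)
        return fn(arg);              /* locale name unavailable: run unchanged */

    locale_t old = uselocale(utf8);  /* affects only the calling thread */
    int ret = fn(arg);
    uselocale(old);                  /* restore before freeing the new one */
    freelocale(utf8);
    return ret;
}

/* Example callback: reports whether a per-thread locale is active. */
int in_thread_locale(void *arg)
{
    (void)arg;
    return uselocale((locale_t)0) != LC_GLOBAL_LOCALE;
}
```

Each guarded library call would be wrapped this way, so other threads and the
rest of the program keep seeing the plain C locale.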

We could also just set a C UTF-8 locale in main (since that would have no
known negative effects on the rest of the code), but this won't work for
libmpv.

We assume that uselocale() never fails. In an unexplainable stroke of
luck, POSIX made the semantics of uselocale() nice enough that user code
can ignore failures without introducing crash or security bugs, even if
there should be an implementation fucked up enough where it's actually
possible that uselocale() fails even with valid input.

With all this shitty ugliness added, it finally works, without fucking
up other parts of the player. This is still less bad than that time when
libquvi fucked up OpenGL rendering, because calling a libquvi function
would load some proxy abstraction library, which in turn loaded a KDE
plugin (even if KDE was not used), which in turn called setlocale()
because Qt does this, and consequently made the mpv GLSL shader
generation code emit "," instead of "." for numbers, and of course only
for users who had that KDE plugin installed, and lived in a part of the
world where "." is not used as decimal separator.

All in all, I believe this proves that software developers as a whole
and as a culture produce worse results than drug addicted butt fucked
monkeys randomly hacking on typewriters while inhaling the fumes of a
radioactive dumpster fire fueled by chinese plastic toys for children
and Elton John/Justin Bieber crossover CDs for all eternity.
2017-11-12 13:36:56 +01:00
wm4 1199c1e38a stream_libarchive: stop reading on ARCHIVE_FATAL
According to

https://github.com/libarchive/libarchive/pull/773#issuecomment-334892291

we're not allowed to "continue reading" (post above) or performing "more
operations" (comments in archive.h header), whatever that means. Assume
closing and freeing the archive is still ok.

Since the code already includes logic for closing and reopening the
archive for seeking in unseekable archives, this probably isn't too bad.

Untested due to lack of crashing sample (I lost my original test case,
and a recently user-provided one didn't crash).
2017-11-02 18:47:05 +01:00
wm4 e3a57272a7 stream_libarchive: add some more points at which reading can be stopped
2016-10-01 18:19:57 +02:00
wm4 510c6a1be4 libarchive: sanitize non-UTF8 archive entries
Some client API users simply don't like such filenames. For their sake,
don't return them, but return a dummy filename instead. (Returning a
latin1-ized version would work too, but is slightly more work.)

Also remove the "\n" from the replacement dummy filename. This was
accidental.
2016-07-18 12:52:59 +02:00
wm4 fb8deb69a6 libarchive: unify entry iteration between stream/demux layers
No really good reason to duplicate this.
2016-07-18 12:44:56 +02:00
Kevin Mitchell dd0c85679b stream_libarchive: make libarchive seek callback lazy
This fixes problems seeking http streams to their end.
2015-11-09 22:41:19 -08:00
Kevin Mitchell 4efadb2808 stream_libarchive: add multivolume support
This commit introduces logic to read other volumes from the same source
as the primary archive. Both .rar formats as well as 7z are supported for now.

It also changes the libarchive callback structure to be per-volume
consistent with the libarchive internal client data array constructed
with archive_read_append_callback_data.

Added open, close and switch callbacks. Only the latter is strictly
required to make sure that the streams always start at position 0, but
leaving all volumes open can eat a lot of memory for archives with many
parts.
2015-11-09 22:41:19 -08:00
Kevin Mitchell cf5b117553 libarchive: remove redundant log prefix
"libarchive:" is already added by the logging system
2015-11-09 22:41:19 -08:00
wm4 5c3196d20b stream_libarchive: read tar only in "unsafe" mode
As expected, probing with libarchive is a disaster. Both libavformat and
libarchive are too eager to misdetect file formats just because files
"might" be of a specific type. In this case, it's mp3 vs. tar. To be
fair, neither file format has an actual header. I'm not sure why we'd
need tar support, but since libarchive provides it, and idiots on the
internet apparently pack media files in tar sometimes (really, idiots),
keep it for now, and probe tar last.
2015-08-22 22:13:20 +02:00
wm4 addbf8faae stream_libarchive: disable raw filter
Too many false positives (it accepts things like unspecific text files),
and also relatively useless.
2015-08-20 21:56:44 +02:00
wm4 1b93a7a895 stream_libarchive: fix libarchive callback signature
libarchive uses a quite confusing ifdeffery mess for some of the types
used in callbacks. Currently, archive_read_set_seek_callback() causes a
warning at least on Windows due to mismatching return type. The header
file uses __LA_INT64_T as return type, so I think the user is intended
to use int64_t.

(The ssize_t return type for the read_cb seems correct, on the other
hand.)
2015-08-20 11:08:22 +02:00
wm4 4427fa9900 stream_libarchive: restrict number of allowed formats
Most of what is not in this list is extremely obscure, or increases the
file format misdetection rate.
2015-08-18 23:26:40 +02:00
wm4 cf2fa9d3e5 stream: provide a stream_get_size() convenience function
And use it everywhere, instead of retrieving the size manually. Slight
simplification.
2015-08-18 00:10:54 +02:00
wm4 bf5eac8dd3 demux_libarchive: open flat compressed files
Things like .gz etc., which have no real file header. A mixed bag,
because it e.g. tends to misdetect mp3 files as compressed files or
something (of course it has no mp3 support - I don't know as what it
detects them). But requested by someone (or maybe not, I'm not sure
how to interpret that).
2015-08-17 23:59:55 +02:00
wm4 2b280f4522 stream: libarchive wrapper for reading compressed archives
This works similarly to the existing .rar support, but uses libarchive.
libarchive supports a number of formats, including zip and (most of)
rar.

Unfortunately, seeking does not work too well. Most libarchive readers
do not support seeking, so it's emulated by skipping data until the
target position. On backwards seek, the file is reopened. This works
fine on a local machine (and if the file is not too large), but will
perform not so well over a network connection.

This is disabled by default for now. One reason is that we try
libarchive on every file we open, before trying libavformat, and I'm not
sure if I trust libarchive that much yet. Another reason is that this
breaks multivolume rar support. While libarchive supports seeking in
rar, and (probably) supports multivolume archives, our support of
libarchive (probably) does not. I don't care about multivolume rar, but
vocal users do.
2015-08-17 00:55:26 +02:00