The old code ignored many corner cases, and even wrote "blacker than
black" to YUV images. Use the new pixel format metadata and other
recently added gimmicky crap, which should make this more correct. Even
the almighty fuckup of a format AV_PIX_FMT_UYYVYY411 should work,
although that couldn't be tested for obvious reasons.
This doesn't work for "monow", but this is so extremely fringe while at
the same time painful that I just won't care. In theory, it could be
modeled as some sort of inverted gray colorspace or something.
Remove the vaguely defined plane_bits and component_bits fields from
struct mp_imgfmt_desc. Add weird replacements for existing uses. Remove
the bytes[] field, replace uses with bpp[].
Fix some potential alignment issues in existing code. As a compromise,
split mp_image_pixel_ptr() into 2 functions, because I think it's a bad
idea to implicitly round, but for some callers being slightly less
strict is convenient.
This shouldn't really change anything. In fact, it's a 100% useless
change. I'm just cleaning up what I started almost 8 years ago (see
commit 00653a3eb0). With this I've decided to keep mp_imgfmt_desc,
just removing the weird parts, and keeping the saner parts.
Adding all these so I can use them for obscure processing purposes (see
later draw_bmp commit).
There isn't really a reason why they should exist. On the other hand,
they're just labels for formats that can be handled in a generic way,
and this commit adds support for them in the zimg wrapper and vo_gpu
just by making the formats exist. (Well, vo_gpu had to be fixed in the
previous commit.)
This is really basic for planar image data access; not sure why there
weren't such helpers before.
They also handle trickier formats that use bit-packing, or they would be
mich simpler. (This affects only BGR4/BGR4/MONOW/MONOH, I hope whoever
invented them is proud of triggering so many special cases for so little
gain.)
This is mostly for testing. It adds passing through the metadata through
the video chain. The metadata can be manipulated with vf_format. Support
for zimg alpha conversion (if built with zimg after it gained alpha
support) is implemented. Support premultiplied input in vo_gpu.
Some things still seem to be buggy.
Libav seems rather dead: no release for 2 years, no new git commits in
master for almost a year (with one exception ~6 months ago). From what I
can tell, some developers resigned themselves to the horrifying idea to
post patches to ffmpeg-devel instead, while the rest of the developers
went on to greener pastures.
Libav was a better project than FFmpeg. Unfortunately, FFmpeg won,
because it managed to keep the name and website. Libav was pushed more
and more into obscurity: while there was initially a big push for Libav,
FFmpeg just remained "in place" and visible for most people. FFmpeg was
slowly draining all manpower and energy from Libav. A big part of this
was that FFmpeg stole code from Libav (regular merges of the entire
Libav git tree), making it some sort of Frankenstein mirror of Libav,
think decaying zombie with additional legs ("features") nailed to it.
"Stealing" surely is the wrong word; I'm just aping the language that
some of the FFmpeg members used to use. All that is in the past now, I'm
probably the only person left who is annoyed by this, and with this
commit I'm putting this decade long problem finally to an end. I just
thought I'd express my annoyance about this fucking shitshow one last
time.
The most intrusive change in this commit is the resample filter, which
originally used libavresample. Since the FFmpeg developer refused to
enable libavresample by default for drama reasons, and the API was
slightly different, so the filter used some big preprocessor mess to
make it compatible to libswresample. All that falls away now. The
simplification to the build system is also significant.
With hwdec copy modes, mp_image_copy_attributes() is used to transfer
metadata other than the image data when copying the image from the
hardware surface. It didn't copy the closed caption data.
Fix this. This makes closed captions in copy mode work.
Fixes: #6376
And update the comment both explaining why this defaulting matters and
why we use BT.2020 instead.
tl;dr BT.709 clips even the one test file we *do* have, so if we don't
handle XYZ "natively" in vo_gpu we might as well at least handle it in a
way that runs less risk of clipping
This used to be needed for the "GPU memcpy" (shitty Intel methods to
deal with certain uncached memory types). This is now done in FFmpeg,
and the code in mp_image.c was just unnecessarily convoluted.
This was speculatively added 2 years ago in preparation for something
that apparently never happened. The D3D code was added as an "example",
but this too was never used/finished.
No reason to keep this.
Generally, using x86 SIMD efficiently (or crash-free) requires aligning
all data on boundaries of 16, 32, or 64 (depending on instruction set
used). 64 bytes is needed or AVX-512, 32 for old AVX, 16 for SSE. Both
FFmpeg and zimg usually require aligned data for this reason.
FFmpeg is very unclear about alignment. Yes, it requires you to align
data pointers and strides. No, it doesn't tell you how much, except
sometimes (libavcodec has a legacy-looking avcodec_align_dimensions2()
API function, that requires a heavy-weight AVCodecContext as argument).
Sometimes, FFmpeg will take a shit on YOUR and ITS OWN alignment. For
example, vf_crop will randomly reduce alignment of data pointers,
depending on the crop parameters. On the other hand, some libavfilter
filters or libavcodec encoders may randomly crash if they get the wrong
alignment. I have no idea how this thing works at all.
FFmpeg usually doesn't seem to signal alignment internal anywhere, and
usually leaves it to av_malloc() etc. to allocate with proper alignment.
libavutil/mem.c currently has a ALIGN define, which is set to 64 if
FFmpeg is built with AVX-512 support, or as low as 16 if built without
any AVX support. The really funny thing is that a normal FFmpeg build
will e.g. align tiny string allocations to 64 bytes, even if the machine
does not support AVX at all.
For zimg use (in a later commit), we also want guaranteed alignment.
Modern x86 should actually not be much slower at unaligned accesses, but
that doesn't help. zimg's dumb intrinsic code apparently randomly
chooses between aligned or unaligned accesses (depending on compiler, I
guess), and on some CPUs these can even cause crashes. So just treat the
requirement to align as a fact of life.
All this means that we should probably make sure our own allocations are
64 bit aligned. This still doesn't guarantee alignment in all cases, but
it's slightly better than before.
This also makes me wonder whether we should always override libavcodec's
buffer pool, just so we have a guaranteed alignment. Currently, we only
do that if --vd-lavc-dr is used (and if that actually works). On the
other hand, it always uses DR on my machine, so who cares.
See manpage additions. This is a huge hack. You can bet there are shit
tons of bugs. It's literally forcing square pegs into round holes.
Hopefully, the manpage wall of text makes it clear enough that the whole
shit can easily crash and burn. (Although it shouldn't literally crash.
That would be a bug. It possibly _could_ start a fire by entering some
sort of endless loop, not a literal one, just something where it tries
to do work without making progress.)
(Some obvious bugs I simply ignored for this initial version, but
there's a number of potential bugs I can't even imagine. Normal playback
should remain completely unaffected, though.)
How this works is also described in the manpage. Basically, we demux in
reverse, then we decode in reverse, then we render in reverse.
The decoding part is the simplest: just reorder the decoder output. This
weirdly integrates with the timeline/ordered chapter code, which also
has special requirements on feeding the packets to the decoder in a
non-straightforward way (it doesn't conflict, although a bugmessmass
breaks correct slicing of segments, so EDL/ordered chapter playback is
broken in backward direction).
Backward demuxing is pretty involved. In theory, it could be much
easier: simply iterating the usual demuxer output backward. But this
just doesn't fit into our code, so there's a cthulhu nightmare of shit.
To be specific, each stream (audio, video) is reversed separately. At
least this means we can do backward playback within cached content (for
example, you could play backwards in a live stream; on that note, it
disables prefetching, which would lead to losing new live video, but
this could be avoided).
The fuckmess also meant that I didn't bother trying to support
subtitles. Subtitles are a problem because they're "sparse" streams.
They need to be "passively" demuxed: you don't try to read a subtitle
packet, you demux audio and video, and then look whether there was a
subtitle packet. This means to get subtitles for a time range, you need
to know that you demuxed video and audio over this range, which becomes
pretty messy when you demux audio and video backwards separately.
Backward display is the most weird (and potentially buggy) part. To
avoid that we need to touch a LOT of timing code, we negate all
timestamps. The basic idea is that due to the navigation, all
comparisons and subtractions of timestamps keep working, and you don't
need to touch every single of them to "reverse" them.
E.g.:
bool before = pts_a < pts_b;
would need to be:
bool before = forward
? pts_a < pts_b
: pts_a > pts_b;
or:
bool before = pts_a * dir < pts_b * dir;
or if you, as it's implemented now, just do this after decoding:
pts_a *= dir;
pts_b *= dir;
and then in the normal timing/renderer code:
bool before = pts_a < pts_b;
Consequently, we don't need many changes in the latter code. But some
assumptions inhererently true for forward playback may have been broken
anyway. What is mainly needed is fixing places where values are passed
between positive and negative "domains". For example, seeking and
timestamp user display always uses positive timestamps. The main mess is
that it's not obvious which domain a given variable should or does use.
Well, in my tests with a single file, it suddenly started to work when I
did this. I'm honestly surprised that it did, and that I didn't have to
change a single line in the timing code past decoder (just something
minor to make external/cached text subtitles display). I committed it
immediately while avoiding thinking about it. But there really likely
are subtle problems of all sorts.
As far as I'm aware, gstreamer also supports backward playback. When I
looked at this years ago, I couldn't find a way to actually try this,
and I didn't revisit it now. Back then I also read talk slides from the
person who implemented it, and I'm not sure if and which ideas I might
have taken from it. It's possible that the timestamp reversal is
inspired by it, but I didn't check. (I think it claimed that it could
avoid large changes by changing a sign?)
VapourSynth has some sort of reverse function, which provides a backward
view on a video. The function itself is trivial to implement, as
VapourSynth aims to provide random access to video by frame numbers (so
you just request decreasing frame numbers). From what I remember, it
wasn't exactly fluid, but it worked. It's implemented by creating an
index, and seeking to the target on demand, and a bunch of caching. mpv
could use it, but it would either require using VapourSynth as demuxer
and decoder for everything, or replacing the current file every time
something is supposed to be played backwards.
FFmpeg's libavfilter has reversal filters for audio and video. These
require buffering the entire media data of the file, and don't really
fit into mpv's architecture. It could be used by playing a libavfilter
graph that also demuxes, but that's like VapourSynth but worse.
This helps with compatibility and/or performance in particular for
oddly-sized formats like rgb24. We use a loop to avoid having to
calculate the lcm (or waste bytes in the extremely common case of the
byte size and the stride align having shared factors).
Also rename stereo3d to stereo_in. The only real change is that the
vo_gpu OSD code now uses the actual stereo 3D mode, instead of the
--video-steroe-mode value. (Why does this vo_gpu code even exist?)
This means vf_vapoursynth doesn't need a hack to work around the filter
code, and libavfilter filters now actually get the frame_rate field on
input pads set.
The libavfilter doxygen says the frame_rate field is only to be set if
the frame rate is known to be constant, and uses the word "must" (which
probably means they really mean it?) - but ffmpeg.c sets the field to
mere guesses anyway, and it looks like this normally won't lead to
problems.
vf_vdpaupp crashed on certain files (with --hwdec=vdpau --deinterlace).
This happened for example with mpeg2 files, which for some reason
typically contain some AVFrame side data. It turns out the last change
in 55c88fdb8f was not quite clean, and forgot the special cases in
mp_image_new_dummy_ref(). This function is supposed to copy all metadata
from the argument passed, except buffer refs. But there were new buffer
refs, that were not cleared properly. Also, the ff_side_data pointer
must be cleared, or the new mp_image would try to free it on
destruction.
The bottom line is that mp_image_new_dummy_ref() is a pretty bad idea,
and I suppose all callers with non-NULL arguments should be changed to
create a blank mp_image, and copy frame properties as needed (this
includes callers of mp_image_new_custom_ref()).
Fixes#5630.
Useful for libavfilter. Somewhat risky, because we can't ensure the
consistency of the unknown side data (but this is a general problem with
side data, and libavfilter filters will usually get it wrong too _if_
there are conflict cases).
Fixes#5569.
We must not create new references herem because mp_image_new_ref() is
called later, and actually creates new references (including doing
actual error checking). Blame C, not me.
This is preparation for a change in vd_lavc.c: it should not have to
access the demuxer (to pass along closed captions), so the idea is to
make them part of mp_image, and to let the layer above vd_lavc propagate
the buffer.
Don't bother with preserving them for mp_image->AVFrame, because we
don't need this.
Reduce the trivial but still annoying code duplication in
mp_image_new_ref(), which has to create new buffer references and deal
with possible failure of creating them. The tricky part is that if
creating a reference fails, we must set the target to NULL, so that
unreferencing the failed new mp_image reference does not release the
buffer references of the original mp_image. For the same reason, the
code can't jump to error handling when it can't create a new reference,
and has to set a flag instead.
This fixes that AVFrames passing through libavfilter (such as with
--lavfi-complex) implicitly stripped some fields. I'm not actually sure
what to do with the mp_image_params.color.light field here (what happens
if the colorspace changed?) - there is no equivalent in AVFrame or
FFmpeg at all.
It did not affect the old --vf code, because it doesn't allow
libavfilter to change the metadata.
Also log the .light field in verbose mode.
This is where it should be. It only wasn't because of an old libavcodec
bug, that returned the side data only on every IDR. This required some
sort of caching, which is now dropped. (mp_image wouldn't have been able
to do this kind of caching, because this code is stateless.) We don't
support these old libavcodec versions anymore, which is why this is not
needed anymore.
Also move initialization of rotation/stereo stuff to dec_video.c.
The mechanism introduced in b135af6842 assumed AVHWFramesContext would
be enough. Apparently it's not - the intended use with Rockchip (not
Rokchip btw.) requires accessing actual frame data in order to access
the AVDRMFrameDescriptor struct.
Just pass the entire mp_image to the new function. This is more
flexible, although it slightly worries me that it will be less reusable
for things which require setting up mp_image_params before any real
frames are processed (such as filters).
The same should happen with any other side data that matters to mpv,
otherwise filters will drop it.
(No, don't try to argue that mpv should use AVFrame. That won't work.)
ffmpeg_garbage() is copy&paste from frame_new_side_data() in FFmpeg
(roughly feed201849b8f91), because it's not public API. The name
reflects my opinion about FFmpeg's API.
In mp_image_to_av_frame(), change the too-fragile
*new_ref = (struct mp_image){0};
into explicitly zeroing out the fields that are "transferred" to the
created AVFrame.
Merge mp_image_copy_fields_to_av_frame() into mp_image_from_av_frame(),
same for the other direction.
There isn't any good reason to keep them separate, and the refcounting
handling makes it only more awkward.
It seems this will be useful for Rokchip DRM hwcontext integration.
DRM hwcontexts have additional internal structure which can be different
depending on the decoder, and which is not part of the generic hwcontext
API. Rockchip has 1 layer, which EGL interop happens to translate to a
RGB texture, while VAAPI (mapped as DRM hwcontext) will use multiple
layers. Both will use sw_format=nv12, and thus are indistinguishable on
the mp_image_params level. But this is needed to initialize the EGL
mapping and the vo_gpu video renderer correctly.
We hope that the layer count is enough to tell whether EGL will
translate the data to a RGB texture (vs. 2 texture resembling raw nv12
data). For that we introduce MP_IMAGE_HW_FLAG_OPAQUE.
This commit adds the flag, infrastructure to set it, and an "example"
for D3D11.
The D3D11 addition is quite useless at this point. But later we want to
get rid of d3d11_update_image_attribs() anyway, while we still need a
way to force d3d11vpp filter insertion, so maybe it has some
justification (who knows). In any case it makes testing this easier.
Obviously it also adds some basic support for triggering the opaque
format for decoding, which will use a driver-specific format, but which
is not supported in shaders. The opaque flag is not used to determine
whether d3d11vpp needs to be inserted, though.
If the chroma location is missing, vo_gpu will use centered chroma.
Select a better chroma location by default: normally, it will always be
MPEG video chroma location. If full levels are used, use JPEG chroma
location, because that sort of sounds like it could make sense as it
might coincide with JPEG being decoded.
See e.g. #4804.
Now you need FFmpeg git, or something.
This also gets rid of the last real use of gpu_memcpy(). libavutil does
that itself. (vaapi.c still used it, but it was essentially unused,
because the code path isn't really in use anymore. It wasn't even
included due to the d3d-hwaccel dependency in wscript.)
See "Copyright" file for caveats.
This changes the remaining "almost LGPL" files to LGPL, because we think
that the conditions the author set for these was finally fulfilled.
This is "wrong", because you might want mp_image_copy_attributes() to
preserve the information that the colorspace parameters are unknown.
This is important for hwdec -copy modes, which call this function before
fix_image_params() and mp_colorspace_merge() are called.
Instead, just wipe the colorspace attributes if the pixel format changes
in an apparently incompatible way. Use mp_image_params_guess_csp() logic
for this and factor that into its own function.
mp_image_set_attributes() attempts to do something similar, so change
that in the same way. Also, mp_image_params_guess_csp() just returned if
the imgfmt was invalid or unset - just remove that part, because it
annoyingly doesn't fit into the new code, and had little reason to exist
to begin with. (Probably.)
I see no reason not to do this. I think the check comes from the time
when mp_image stored the image aspect ratio, instead of the pixel aspect
ratio, where the logic might have made more sense.
It was noticed that -copy hwdec modes typically dropped the
chroma_location field. This happened because the attributes on hw
download are copied with mp_image_copy_attributes(), which tries to copy
these parameters only if src and dst were both YUV (in an attempt to
copy parameters only if it makes sense).
But hardware formats did not have the YUV flag set (anymore?), and code
shouldn't attempt to check the flag in this way anyway. Drop the check,
and always copy the whole color metadata struct. There is a call to
mp_image_params_guess_csp() below, which tries to unset nonsense
metadata if it was copied from a YUV format to RGB. This function would
also do the right thing for hw formats (although for the cited bug only
the software case matters).
Fixes#4804.
This is needed for HAVE_SSE4_INTRINSICS. config.h used to be included as
a transitive dependency of vf.h, but the include statement was removed
from vf.h in 8f2ccba71b.
Also silence an unused variable warning that was introduced in the same
commit.
This adds handling of spherical video metadata: retrieving it from
demux_lavf and demux_mkv, passing it through filters, and adjusting it
with vf_format. This does not include support for rendering this type of
video.
We don't expect we need/want to support the other projection types like
cube maps, so we don't include that for now. They can be added later as
needed.
Also raise the maximum sizes of stringified image params, since they
can get really long.
Since these need to be refcounted, we throw them directly into struct
mp_image instead of being part of mp_colorspace. Even though they would
semantically make more sense in mp_colorspace, having them there is
really awkward because mp_colorspace is passed around and stored a lot,
and this way their lifetime is exactly tied to the lifetime of the
mp_image associated with it.
Refactor the image allocation code, and expose part of it as helper
code. This aims towards allowing callers to easily allocate mp_image
references from custom-allocated linear buffers. This is exposing only
as much as what should be actually required.
Slightly cleaner, possibly slightly more correct. (The last case should
be dead code now. In general, we can't know the implied colorspace from
a AV_PIX_FMT, at least not if FFmpeg adds a new one.)
Another legacy annoyance. The only place where packed YUV is still
important is slightly older Apple hardware or drivers, which require
it for efficient hardware decoding.