This commit adds the d3d11va-copy hwdec mode using the ffmpeg d3d11va
api. Functions in common with dxva2 are handled in a separate decode/d3d.c
file. A future commit will rewrite decode/dxva2.c to share this code.
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.
In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)
This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.
It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).
Some other notable changes (and potential improvements):
- texture expansion is now *always* handled in pass_read_video, and the
colormatrix never does this anymore. (Which means the code could
probably be removed from the colormatrix generation logic, modulo some
other VOs)
- struct fbo_tex now stores both its "physical" and "logical"
(configured) size, which cuts down on the amount of width/height
baggage on some function calls
- vo_opengl can now technically support textures with different bit
depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
inside img_format.c doesn't export this (nor does ffmpeg support it,
really) so the status quo of using the same tex_mul for all planes is
kept.
- dumb_mode is now only needed because of the indirect_fbo being in the
main rendering pipeline. If we reintroduce p->use_indirect and thread
a transform through the entire program this could be skipped where
unnecessary, allowing for the removal of dumb_mode. But I'm not sure
how to do this in a clean way. (Which is part of why it got introduced
to begin with)
- It would be trivial to resurrect source-shader now (it would just be
one extra 'if' inside pass_read_video).
Some VOs had support for these - remove them.
Typically, these formats will have only some use in cases where using
RGB software conversion with libswscale is faster than letting the
VO/GPU do it (i.e. almost never). For the sake of testing this case,
keep IMGFMT_RGB565. This is the least messy format, because it has no
padding/alpha bits with unknown semantics.
Note that decoding to these formats still works. We'll let libswscale
repack the data to whatever the VO in use can take.
This removes the need to define IMGFMT_GBRAP, which fixes compilation
with the current Libav release.
This also makes it automatically pick up a GBRP format with the same bit
width. (Unfortunately, it seems libswscale does not support conversion
to AV_PIX_FMT_GBRAP16, so our code falls back to 8 bit, removing
precision for video covered by subtitles in cases this code is used.)
Also, when the source video is e.g. 10 bit YUV, upsample to 16 bit.
Whether this is good or bad, it fixes behavior with alpha. Although I'm
not sure if the alpha range is really correct ([0,2^16-1] vs.
[0,255*256]). Keep in mind that libswscale doesn't even agree with the
way we do it.
This actually treats destination alpha correctly, and gives much better
results than before. I don't know if this is perfectly correct yet,
though. Slight difference with vo_opengl behavior suggests it might not
be.
Note that this does not affect VOs with true alpha support. vo_opengl
does not use this code at all, and does the alpha calculations in OpenGL
instead.
The computation of the tex_mul variable was broken in multiple ways.
This variable is used e.g. by debanding for moving expansion of 10 bit
fixed-point input to normalized range to another stage of processing.
One obvious bug was that the rgb555 pixel format was broken. This format
has component_bits=5, but obviously it's already sampled in normalized
range, and does not need expansion. The tex_mul-free code path avoids
this by not using the colormatrix. (The code was originally designed to
work around dealing with the generally complicated pixel formats by only
using the colormatrix in the YUV case.)
Another possible bug was with 10 bit input. It expanded the input by
bringing the [0,2^10) range to [0,1], and then treating the expanded
input as 16 bit input. I didn't bother to check what this actually
computed, but it's somewhat likely it was wrong anyway. Now it uses
mp_get_csp_mul(), and disables expansion when computing the YUV matrix.
Adds support for AV_PIX_FMT_GBRP9, AV_PIX_FMT_GBRP10, AV_PIX_FMT_GBRP12,
AV_PIX_FMT_GBRP14, AV_PIX_FMT_GBRP16, AV_PIX_FMT_GBRAP, and
AV_PIX_FMT_GBRAP16.
(Not that it matters, because nobody uses these anyway.)
VideoToolbox is preferred. Now that FFmpeg released 2.8, there's no
reason to support VDA anymore. In fact, we had a bug that made VDA not
useable with older FFmpeg versions in some newer mpv releases.
VideoToolbox is supported even on slightly older OSX versions, and if
not, you still can run mpv without hw decoding.
This affects vo_opengl_cb in particular: it'll most likely auto-load
VDA, and then the VideoToolbox decoder won't work. And everything fails.
This is mainly caused by FFmpeg using separate pixfmts for the _same_
thing (CVPixelBuffers), simply because libavcodec's architecture demands
that hwaccel backends are selected by pixfmts. (Which makes no sense,
but now we have the mess.)
So instead of duplicating FFmpeg's misdesign, just change the format to
our own canonical one on the image output by the decoder. Now the GL
interop code is exactly the same for VDA and VT, and we use the VT name
only.
VDA is being deprecated in OS X 10.11 so this is needed to keep hwdec working.
The code needs libavcodec support which was added recently (to FFmpeg git,
libav doesn't support it).
Signed-off-by: Stefano Pigozzi <stefano.pigozzi@gmail.com>
This requires FFmpeg git master for accelerated hardware decoding.
Keep in mind that FFmpeg must be compiled with --enable-mmal. Libav
will also work.
Most things work. Screenshots don't work with accelerated/opaque
decoding (except using full window screenshot mode). Subtitles are
very slow - even simple but huge overlays can cause frame drops.
This always uses fullscreen mode. It uses dispmanx and mmal directly,
and there are no window managers or anything on this level.
vo_opengl also kind of works, but is pretty useless and slow. It can't
use opaque hardware decoding (copy back can be used by forcing the
option --vd=lavc:h264_mmal). Keep in mind that the dispmanx backend
is preferred over the X11 ones in case you're trying on X11; but X11
is even more useless on RPI.
This doesn't correctly reject extended h264 profiles and thus doesn't
fallback to software decoding. The hw supports only up to the high
profile, and will e.g. return garbage for Hi10P video.
This sets a precedent of enabling hw decoding by default, but only
if RPI support is compiled (which most hopefully it will be disabled
on desktop Linux platforms). While it's more or less required to use
hw decoding on the weak RPI, it causes more problems than it solves
on real platforms (Linux has the Intel GPU problem, OSX still has
some cases with broken decoding.) So I can live with this compromise
of having different defaults depending on the platform.
Raspberry Pi 2 is required. This wasn't tested on the original RPI,
though at least decoding itself seems to work (but full playback was
not tested).
Simply clamp off the U/V components in the colormatrix, instead of doing
something special in the shader.
Also, since YA8/YA16 gave a plane_bits value of 16/32, and a colormatrix
calculation overflowed with 32, add a component_bits field to the image
format descriptor, which for YA8/YA16 returns 8/16 (the wrong value had
no bad consequences otherwise).
If video output and VO don't support the same format, a conversion
filter needs to be insert. Since a VO can support multiple formats, and
the filter chain also can deal with multiple formats, you basically have
to pick from a huge matrix of possible conversions.
The old MPlayer code had a quite naive algorithm: it first checked
whether any conversion from the list of preferred conversions matched,
and if not, it was falling back on checking a hardcoded list of output
formats (more or less sorted by quality). This had some unintended side-
effects, like not using obvious "replacement" formats, selecting the
wrong colorspace, selecting a bit depth that is too high or too low, and
more.
Use avcodec_find_best_pix_fmt_of_list() provided by FFmpeg instead. This
function was made for this purpose, and should select the "best" format.
Libav provides a similar function, but with a different name - there is
a function with the same name in FFmpeg, but it has different semantics
(I'm not sure if Libav or FFmpeg fucked up here).
This also removes handling of VFCAP_CSP_SUPPORTED vs.
VFCAP_CSP_SUPPORTED_BY_HW, which has no meaning anymore, except possibly
for filter chains with multiple scale filters.
Fixes#1494.
These formats are still supported; you just can't reference them via a
defined constants directly. They are now handled via the generic
passthrough.
(If you want to use such a format, you either have to add the entry
back, or use AV_PIX_FMT_* directly.)
This is a rather radical change: instead of maintaining a whitelist of
FFmpeg formats we support, we automatically support all formats.
In general, a format which doesn't have an explicit IMGFMT_* name will
be converted to a known format through libswscale, or will be handled
by code which can treat pixel formats in a generic way using the pixel
format description, like vo_opengl.
AV_PIX_FMT_UYYVYY411 is a special-case. It's packed YUV with chroma
subsampling by 4 in both directions. Its component order is documented
as "Cb Y0 Y1 Cr Y2 Y3", meaning there's one UV sample for 4 Y samples.
This means each pixel uses 1.5 bytes (4 pixels have 1 UV sample, so
4 bytes + 2 bytes). FFmpeg can actually handle this format with its
generic mechanism in an extremely awkward way, but it doesn't work for
us. Blacklist it, and hope no similar formats will be added in the
future.
Currently, the AV_PIX_FMT_*s allowed are limited to a numeric value of
500. More is not allowed, and there are some fixed size arrays that need
to contain any possible format (look for IMGFMT_END dependencies).
We could have this simpler by replacing IMGFMT_* with AV_PIX_FMT_*
through the whole codebase. But for now, this is better, because we
can compensate for formats missing in Libav or older FFmpeg versions,
like AV_PIX_FMT_RGB0 and others.
bstr.c doesn't really deserve its own directory, and compat had just
a few files, most of which may as well be in osdep. There isn't really
any justification for these extra directories, so get rid of them.
The compat/libav.h was empty - just delete it. We changed our approach
to API compatibility, and will likely not need it anymore.
There is no standard mechanism for detecting endianess. Doing it at
compile time in a portable way is probably hard. Doing it properly
with a configure check is probably hard too. Using the endian
definitions in <sys/types.h> (usually includes <endian.h>, which is
not available everywhere) works under circumstances, but the previous
commit broke it on OSX.
Ideally all code should be endian dependent, but that is not possible
due to the dependencies (such as FFmpeg, some video output APIs, some
audio output APIs).
Create a header osdep/endian.h, which contains various fallbacks.
Note that the last fallback uses libavutil; however, it's not clear
whether AV_HAVE_BIGENDIAN is a public symbol, or whether including
<libavutil/bswap.h> really makes it visible. And in fact we don't want
to pollute the namespace with libavutil definitions either. Thus it's
only the last fallback.
This affects packed RGB formats up to 16 bits per pixel. The old mplayer
names used LSB-to-MSB order, while FFmpeg (and some other libraries) use
MSB-to-LSB.
Nothing should change with this commit, i.e. no bit order or endian bugs
should be added or fixed. In some cases, the name stays the same, even
though the byte order changes, e.g. RGB8->BGR8 and BGR8->RGB8, and this
affects the user-visible names too; this might cause confusion.
These suffixes are annoying when they're redundant, so strip them
automatically. On little endian machines, always strip the "le" suffix,
and on big endian machines vice versa (although I don't think anyone
ever tried to run mpv on a big endian machine).
Since pixel format strings are returned by a certain function and we
can't just change static strings, use a trick to pass a stack buffer
transparently. But this also means the string can't be permanently
stored by the caller, so vf_dlopen.c has to be updated. There seems
to be no other case where this is done, though.
Integrate it with the existing surface allocator in vdpau.c. The changes
are a bit violent, because the vdpau API is so non-orthogonal: compared
to video surfaces, output surfaces use a different ID type, different
format types, and different API functions.
Also, introduce IMGFMT_VDPAU_OUTPUT for VdpOutputSurfaces wrapped in
mp_image, rather than hacking it. This is a bit cleaner.
The most user visible change is that "420p" is now displayed as
"yuv420p". This is what FFmpeg uses (almost), and is also less confusing
since "420p" is often confused with "420 pixels vertical resolution".
In general, we return the FFmpeg pixel format name. We still use our own
old mechanism to keep a list of exceptions to provide compatibility for
a while.
Also, never return NULL for image format names. If the format is unset
(0/IMGFMT_NONE), return "none". If the format has no name (probably
never happens, FFmpeg seems to guarantee that a name is set), return
"unknown".
They were used by ancient libavcodec versions. This also removes the
need to distinguish vdpau image formats at all (since there is only
one), and some code can be simplified.
Image formats used to be FourCCs, so unsigned int was better. But now
it's annoying and the only difference is that unsigned int is more to
type than int.
We got a crash in libavutil when encoding with Y8 (GRAY8). The reason
was that libavutil was copying an Y8 image allocated by us, and expected
a palette. This is because GRAY8 is a PSEUDOPAL format. It's not clear
what PSEUDOPAL means, and it makes literally no sense at all. However,
it does expect a palette allocated for some formats that are not
paletted, and libavutil crashed when trying to access the non-existent
palette.
PIX_FMT_VDA_VLD and PIX_FMT_VAAPI_VLD were never used anywhere. I'm not
sure why they were even added, and they sound like they are just for
compatibility with XvMC-style decoding, which sucks anyway.
Now that there's only a single vaapi format, remove the
IMGFMT_IS_VAAPI() macro. Also get rid of IMGFMT_IS_VDA(), which was
unused.
These formats are helpful for distinguishing surfaces with and without
alpha. Unfortunately, Libav and older version of FFmpeg don't support
them, so code will break. Fix this by treating these formats specially
on the mpv side, mapping them to RGBA on Libav, and unseting the alpha
bit in the mp_imgfmt_desc struct.
Decoding H264 using Video Decode Acceleration used the custom 'vda_h264_dec'
decoder in FFmpeg.
The Good: This new implementation has some advantages over the previous one:
- It works with Libav: vda_h264_dec never got into Libav since they prefer
client applications to use the hwaccel API.
- It is way more efficient: in my tests this implementation yields a
reduction of CPU usage of roughly ~50% compared to using `vda_h264_dec` and
~65-75% compared to h264 software decoding. This is mainly because
`vo_corevideo` was adapted to perform direct rendering of the
`CVPixelBufferRefs` created by the Video Decode Acceleration API Framework.
The Bad:
- `vo_corevideo` is required to use VDA decoding acceleration.
- only works with versions of ffmpeg/libav new enough (needs reference
refcounting). That is FFmpeg 2.0+ and Libav's git master currently.
The Ugly: VDA was hardcoded to use UYVY (2vuy) for the uploaded video texture.
One one end this makes the code simple since Apple's OpenGL implementation
actually supports this out of the box. It would be nice to support other
output image formats and choose the best format depending on the input, or at
least making it configurable. My tests indicate that CPU usage actually
increases with a 420p IMGFMT output which is not what I would have expected.
NOTE: There is a small memory leak with old versions of FFmpeg and with Libav
since the CVPixelBufferRef is not automatically released when the AVFrame is
deallocated. This can cause leaks inside libavcodec for decoded frames that
are discarded before mpv wraps them inside a refcounted mp_image (this only
happens on seeks).
For frames that enter mpv's refcounting facilities, this is not a problem
since we rewrap the CVPixelBufferRef in our mp_image that properly forwards
CVPixelBufferRetain/CvPixelBufferRelease calls to the underying
CVPixelBufferRef.
So, for FFmpeg use something more recent than `b3d63995` for Libav the patch
was posted to the dev ML in July and in review since, apparently, the proposed
fix is rather hacky.
This is based on the MPlayer VA API patches. To be exact it's based on
a very stripped down version of commit f1ad459a263f8537f6c from
git://gitorious.org/vaapi/mplayer.git.
This doesn't contain useless things like benchmarking hacks and the
demo code for GLX interop. Also, unlike in the original patch, decoding
and video output are split into separate source files (the separation
between decoding and display also makes pixel format hacks unnecessary).
On the other hand, some features not present in the original patch were
added, like screenshot support.
VA API is rather bad for actual video output. Dealing with older libva
versions or the completely broken vdpau backend doesn't help. OSD is
low quality and should be rather slow. In some cases, only either OSD
or subtitles can be shown at the same time (because OSD is drawn first,
OSD is prefered).
Also, libva can't decide whether it accepts straight or premultiplied
alpha for OSD sub-pictures: the vdpau backend seems to assume
premultiplied, while a native vaapi driver uses straight. So I picked
straight alpha. It doesn't matter much, because the blending code for
straight alpha I added to img_convert.c is probably buggy, and ASS
subtitles might be blended incorrectly.
Really good video output with VA API would probably use OpenGL and the
GL interop features, but at this point you might just use vo_opengl.
(Patches for making HW decoding with vo_opengl have a chance of being
accepted.)
Despite these issues, decoding seems to work ok. I still got tearing
on the Intel system I tested (Intel(R) Core(TM) i3-2350M). It was also
tested with the vdpau vaapi wrapper on a nvidia system; however this
was rather broken. (Fortunately, there is no reason to use mpv's VAAPI
support over native VDPAU.)
Move the decoder parts from vo_vdpau.c to a new file vdpau_old.c. This
file is named so because because it's written against the "old"
libavcodec vdpau pseudo-decoder (e.g. "h264_vdpau").
Add support for the "new" libavcodec vdpau support. This was recently
added and replaces the "old" vdpau parts. (In fact, Libav is about to
deprecate and remove the "old" API without deprecation grace period,
so we have to support it now. Moreover, there will probably be no Libav
release which supports both, so the transition is even less smooth than
we could hope, and we have to support both the old and new API.)
Whether the old or new API is used is checked by a configure test: if
the new API is found, it is used, otherwise the old API is assumed.
Some details might be handled differently. Especially display preemption
is a bit problematic with the "new" libavcodec vdpau support: it wants
to keep a pointer to a specific vdpau API function (which can be driver
specific, because preemption might switch drivers). Also, surface IDs
are now directly stored in AVFrames (and mp_images), so they can't be
forced to VDP_INVALID_HANDLE on preemption. (This changes even with
older libavcodec versions, because mp_image always uses the newer
representation to make vo_vdpau.c simpler.)
Decoder initialization in the new code tries to deal with codec
profiles, while the old code always uses the highest profile per codec.
Surface allocation changes. Since the decoder won't call config() in
vo_vdpau.c on video size change anymore, we allow allocating surfaces
of arbitrary size instead of locking it to what the VO was configured.
The non-hwdec code also has slightly different allocation behavior now.
Enabling the old vdpau special decoders via e.g. --vd=lavc:h264_vdpau
doesn't work anymore (a warning suggesting the --hwdec option is
printed instead).
And support the PIX_FMT_MONOWHITE pixel format. (This is really weird:
unlike PIX_FMT_MONOBLACK, it uses white pixels. I have no idea why
libavcodec doesn't just convert the pixel format on the fly, instead of
bothering everyone with really special pixel formats.)
This field contained the "average" bit depth per pixel. It serves no
purpose anymore. Remove it.
Only vo_opengl_old still uses this in order to allocate a buffer that is
shared between all planes.
This used to mean that there is more than one plane. This is not very
useful: IMGFMT_Y8 was not considered planar, but it's just a Y plane,
and could be treated the same as actual planar formats. On the other
hand, IMGFMT_NV12 is partially packed, and usually requires special
handling, but was considered planar.
Change its meaning. Now the flag is set if the format has a separate
plane for each component. IMGFMT_Y8 is now planar, IMGFMT_NV12 is not.
As an odd special case, IMGFMT_MONO (1 bit per pixel) is like a planar
RGB format with a single plane.
Remove the strange things the old mp_image_setfmt() code did to the
image format parameters. This includes setting chroma shift to 31 for
gray (Y8) formats and more.
Y8 + vo_opengl_old didn't actually work for unknown reasons (regression
in this branch). Fix this. The difference is that Y8 is now interpreted
as gray RGB (LUMINANCE texture) instead of involving YUV (and levels)
conversion.
Get rid of distinguishing RGB and BGR. There wasn't really any good
reason for this.
Remove mp_get_chroma_shift() and IMGFMT_IS_YUVP16*(). mp_imgfmt_desc
gives the same information and more.
Even though #ifdef ACCURATE is removed, the result should be about the
same. The fallback is only used by packed YUV formats (YUYV, NV12), and
doing 16 bit for them instead of 8 bit is not useful.
A side effect is that Y8 (gray) is not converted drawing subs, and for
alpha formats, the alpha plane is not removed. This means the number of
planes after upsampling can be 1-4 (1: gray, 2: gray+alpha, 3: planar,
4: planar+alpha). The code has to be adjusted accordingly to work on the
color planes only. Also remove the workaround for the chroma shift 31
hack.
mplayer's video chain traditionally used FourCCs for pixel formats. For
example, it used IMGFMT_YV12 for 4:2:0 YUV, which was defined to the
string 'YV12' interpreted as unsigned int. Additionally, it used to
encode information into the numeric values of some formats. The RGB
formats had their bit depth and endian encoded into the least
significant byte. Extended planar formats (420P10 etc.) had chroma
shift, endian, and component bit depth encoded. (This has been removed
in recent commits.)
Replace the FourCC mess with a simple enum. Remove all the redundant
formats like YV12/I420/IYUV. Replace some image format names by
something more intuitive, most importantly IMGFMT_YV12 -> IMGFMT_420P.
Add img_fourcc.h, which contains the old IDs for code that actually uses
FourCCs. Change the way demuxers, that output raw video, identify the
video format: they set either MP_FOURCC_RAWVIDEO or MP_FOURCC_IMGFMT to
request the rawvideo decoder, and sh_video->imgfmt specifies the pixel
format. Like the previous hack, this is supposed to avoid the need for
a complete codecs.cfg entry per format, or other lookup tables. (Note
that the RGB raw video FourCCs mostly rely on ffmpeg's mappings for NUT
raw video, but this is still considered better than adding a raw video
decoder - even if trivial, it would be full of annoying lookup tables.)
The TV code has not been tested.
Some corrective changes regarding endian and other image format flags
creep in.
According to DOCS/OUTDATED-tech/colorspaces.txt, the following formats
are supposed to be palettized:
IMGFMT_BGR8
IMGFMT_RGB8,
IMGFMT_BGR4_CHAR
IMGFMT_RGB4_CHAR
IMGFMT_BGR4
IMGFMT_RGB4
Of these, only BGR8 and RGB8 are actually treated as palettized in some
way. ffmpeg has only one palettized format (AV_PIX_FMT_PAL8), and
IMGFMT_BGR8 was inconsistently mapped to packed non-palettized RGB
formats too (AV_PIX_FMT_BGR8). Moreover, vf_scale.c contained messy
hacks to generate a palette when AV_PIX_FMT_BGR8 is output. (libswscale
does not support AV_PIX_FMT_PAL8 output in the first place.)
Get rid of all of this, and introduce IMGFMT_PAL8, which directly maps
to AV_PIX_FMT_PAL8. Remove the palette creation code from vf_scale.c.
IMGFMT_BGR8 maps to AV_PIX_FMT_RGB8 (don't ask me why it's swapped),
without any palette use. Enabling it in vo_x11 or using it as vf_scale
input seems to give correct results.