This commit replaces all uses of sig_peak and maps all HDR metadata.
Form notable changes mixed usage of maxCLL and max_luma is resolved and
not always max_luma is used which makes vo_gpu and vo_gpu_next behave
the same way.
We make the assumption that there is more buffer available
than indicated, this needs to be considered in this specific
location too as mp_image_crop understandably checks whether
we're about to do something unsafe.
minimal reproducer: mpv av://lavfi:testsrc -vf crop=320:239,format=yuv420p -o test.png
fixes#10469
In sum there were three issues:
1. nominal_peak_luminance is an output parameter, not input
2. mpv internally uses MP_REF_WHITE which we need to scale by
3. the value should be left at its default for SDR
Together with the last change this means mpv can sucessfully
take HDR screenshots with --screenshot-sw without resulting
in a brightly colored mess. Though note that zimg does not
perform proper tonemapping.
c784820454 introduced a bool option type
as a replacement for the flag type, but didn't actually transition and
remove the flag type because it would have been too much mundane work.
This is commonly used by UHD/HDR sources, and mpv hilariously ignores it
up until now, just blindly mapping it to MP_CHROMA_AUTO without even so
much as a warning message.
It would be justified to add all the other chroma locations as well, but
I'm lazy and just wanted to quickly fix this bug.
This probably makes it much faster (I wouldn't know, I didn't run any
benchmarks ). Seems to work as well (although I'm not sure, it's not
like I'd perform rigorous tests).
The scale_zimg test seems to mysteriously treat color in fully
transparent alpha differently, which makes no sense, and isn't visible
(but makes the test fail). I can't be bothered with investigating this
more. What do you do with failing tests? Correct, you disable them. Or
rather, you disable whatever appears to cause them to fail, which is the
threading in this case.
This change follows mostly the tile_example.cpp. The slice size uses a
minimum of 64, which was suggested by the zimg author. Some of this
commit is a bit inelegant and weird, such as recomputing the scale
factor for every slice, or the way slice_h is managed. Too lazy to make
this more elegant.
zimg git had a regressio around active_region (which is needed by the
slicing), which was fixed in commit 83071706b2e6bc634. Apparently, the
bug was never released, so just add a warning to the manpage.
The intention is to add slice-threading to the wrapper. For that
purpose, move all zimg related state to a new struct mp_zimg_state.
There is now an array of instances of this struct. The intention is to
have one instance for each thread. As of this commit, this is hardcoded
to 1 thread; the following commit extends this.
For whatever purpose. If anything, this makes the zimg wrapper cleaner.
The added tests are not particular exhaustive, but nice to have. This
also makes the scale_zimg.c test pretty useless, because it only tests
repacking (going through the zimg wrapper). In theory, the repack_tests
things could also be used on scalers, but I guess it doesn't matter.
Some things are added over the previous zimg wrapper code. For example,
some fringe formats can now be expanded to 8 bit per component for
convenience.
Only _writes_ are aligned, so the assumption doesn't work for reads. But
it's easy to fix by rounding down x0 to the next byte boundary. Writing
pixels outside of the read area is allowed, and we don't go out of
buffer bounds.
Patch by anon32, permission to do anything with it.
This is mostly for testing. It adds passing through the metadata through
the video chain. The metadata can be manipulated with vf_format. Support
for zimg alpha conversion (if built with zimg after it gained alpha
support) is implemented. Support premultiplied input in vo_gpu.
Some things still seem to be buggy.
Move lookup GBRP or planar gray/alpha formats to separate functions in
some cases.
Make setup_regular_rgb_packer() not use a 4:4:4 YUV format to "pass"
RGB. This was used as a "trick" to avoid the stupid GBRP plane
permutation, but it confused severely, so get rid of it. Just do the
reordering, even if the zimg wrapper itself will reorder it back (which
is so stupid that I used the other approach at first). The comment
saying IMGFMT_420P was bogus of course; typically it was IMGFMT_444P.
When I added mp_regular_imgfmt, I made the chroma subsampling use the
actual chroma division factor, instead of a shift (log2 of the actual
value). I had some ideas about how this was (probably?) more intuitive
and general. But nothing ever uses non-power of 2 subsampling (except
jpeg in rare cases apparently, because the world is a bad place).
Change the fields back to use shifts and rename them to avoid mistakes.
I must have messed this up when I actually added the Y210 format
(because that one is correct). So my comment in the commit adding this
about the FFmpeg pixfmt doxygen being wrong was wrong.
I'd like to use this opportunity to complain once more about the
existence of these terrible pixel formats.
Again worthless, slow, and only for libswscale parity.
With this, we support all formats libswscale supports, except bayer
input, and rgb4/bgr4 output. We even support some formats libswscale
doesn't.
It's possible that the zimg wrapper isn't always as fast as libswscale.
But there is optimization potential: the inner repack loops are
self-contained enough that they could be reasonably be implemented in
assembler (probably), and doing everything slice-wise should reduce the
overhead of the separate pack/unpack stages.
Just lazily tested.
The comment on AV_PIX_FMT_Y210LE seems to be wrong. It claims it's "like
YUYV422", bit it seems more like YVYU422, at last the way libswscale
input treats it. Maybe Intel pays its developers too much?
The repacker inner lop is probably rather inefficient. In theory we
could optimize it by reading the packed pixels as words, doing the
component reshuffling using compile time values etc., but I'd rather
keep the code size small. It's already bad enough that we have to
support 16 bit per component variants, just because this one Intel guy
couldn't keep it in his pants. In general, I can't be bothered to spend
time on optimizing it; I'm only doing this for fun (i.e. masochistic
obligation).
This covers 8 and 16 bit packed RGB formats. It doesn't really help with
any actual use-cases, other than giving the finger to libswscale.
One problem is with different color depths. For example, rgb565 provides
1 bit more resolution to the green channel. zimg can only dither to a
uniform depth. I tried dithering to the highest depth and shifting away
1 bit for the lower channels, but that looked ugly (or I messed up
somewhere), so instead it dithers to the lowest depth, and adjusts the
value range if needed. Testing with bgr4_byte (extreme case with 1/2/1
depths), it looks more "grainy" (ordered dithering artifacts) than
libswscale, but it also looks cleaner and smoother. It doesn't have
libswscale's weird red-shift. So I call it a success.
Big endian formats need to be handled explicitly; the generic big endian
swapper code assumes byte-aligned components.
Unpacking is done with shifts and 3 LUTs. This is symmetric to the
packer. Using a generated palette might be better, but I preferred to
keep the symmetry, and not having to mess with a generated palette and
the pal8 code.
This uses FFmepg pixfmts constants directly. I would have preferred
keeping zimg completely separate. But neither do I want to add an IMGFMT
alias for every of these formats, nor do I want to extend our imgfmt
code such that it can provide a complete description of each packed RGB
format (similar to FFmpeg pixdesc).
It also appears that FFmpeg pixdesc as well as the FFmpeg pixfmt doxygen
have an error regarding RGB8: the R/B bit depths are swapped. libswscale
appears to be handling them differently. Not completely sure, as this is
the only packed format case with R/B havuing different depths (instead
of G, the middle component, where things are symmetric).
One of the extremely annoying dumb things in ffmpeg is that most pixel
formats are available as little endian and big endian variants. (The
sane way would be having native endian formats only.) Usually, most of
the real codecs use native formats only, while non-native formats are
used by fringe raw codecs only. But the PNG encoders and decoders
unfortunately use big endian formats, and since PNG it such a popular
format, this causes problems for us. In particular, the current zimg
wrapper will refuse to work (and mpv will fall back to sws) when writing
non-8 bit PNGs.
So add non-native endian support to zimg. This is done in a fairly
"generic" way (which means lots of potential for bugs). If input is a
"regular" format (and just byte-swapped), the rest happens
automatically, which happens to cover all interesting formats.
Some things could be more efficient; for example, unpacking is done on
the data before it's passed to the unpacker. You could make endian
swapping part of the actual unpacking process, which might be slightly
faster. You could avoid copying twice in some cases (such as when
there's no actual repacker, or if alignment needs to be corrected). But
I don't really care. It's reasonably fast for the normal case.
Not entirely sure whether this is correct. Some (but not many) formats
are covered by the tests, some I tested manually. Some I can't even
test, because libswscale doesn't support them (like nv20*).
Change all OPT_* macros such that they don't define the entire m_option
initializer, and instead expand only to a part of it, which sets certain
fields. This requires changing almost every option declaration, because
they all use these macros. A declaration now always starts with
{"name", ...
followed by designated initializers only (possibly wrapped in macros).
The OPT_* macros now initialize the .offset and .type fields only,
sometimes also .priv and others.
I think this change makes the option macros less tricky. The old code
had to stuff everything into macro arguments (and attempted to allow
setting arbitrary fields by letting the user pass designated
initializers in the vararg parts). Some of this was made messy due to
C99 and C11 not allowing 0-sized varargs with ',' removal. It's also
possible that this change is pointless, other than cosmetic preferences.
Not too happy about some things. For example, the OPT_CHOICE()
indentation I applied looks a bit ugly.
Much of this change was done with regex search&replace, but some places
required manual editing. In particular, code in "obscure" areas (which I
didn't include in compilation) might be broken now.
In wayland_common.c the author of some option declarations confused the
flags parameter with the default value (though the default value was
also properly set below). I fixed this with this change.
Obviously, we don't want to lose fractions, and the zimg active_region
fields in fact have the type double. The integer division was wrong.
Also, always set active_region.width/height. It appears zimg behavior
does not change if they're set to the normal integer values, so the
extra check to not set them in this case was worthless.
As suggested by the zimg author: active_region is not supported on
outputs (and the API returns an error), so instead scale to the "full"
surface, but adjust the source rectangle such that the cropped output
image happens to cover the correct region.
Does this even work? Since Balmer Peak doesn't work, I can't really say,
but it seems to look correct.
This was a confusing name, because 1. there's also a z_planes[] field,
and 2. it was not specific to zimg indexes.
Possibly there used to be an idea involved about supporting alpha to
non-alpha formats by discarding the alpha plane, but zimg does this now
(and zimg will correctly blend the alpha component too).
The special thing about this format is
1. mpv assigns the component ID 4 to alpha, and component IDs 2 and 3
are not present, which causes some messy details.
2. zimg always wants the alpha plane as plane 3, and plane 1 and 2 are
not present, while FFmpeg/mpv put the alpha plane as plane 1.
In theory, 2. could be avoided, since FFmpeg actually doesn't have a any
2 plane formats (alpha is either packed, or plane 3). But having to skip
"empty" planes would break expectations.
zplanes is not equivalent to the mpv plane count (actually it was always
used this way), while zimg does not really have a plane count, but does,
in this case, only use plane 0 and 3, while 2 and 3 are unused and
unset. z_planes[] (not zplanes) is now always valid for all 4 array
entries (because it uses zimg indexes), but a -1 entry means it's an
unused plane.
I wonder if these conventions taken by mpv/zimg are not just causing
extra work. Maybe component IDs should just be indexes by the "natural"
order (e.g. R-G-B-A, Y-U-V-A, Y-A), and alpha should be represented as a
field that specifies the component ID for it, or just strictly assume
that 2/4 component formats always use the last component for alpha.
We reorder the planes between mpv and zimg conventions. It turns out the
code still confused when which convention was used.
So the way it actually works is that the _only_ place where zimg order
is used is the zimg_image_buffer.plane[] array. plane_aligned[] and
zmask[] were accessed incorrectly, although I guess it rarely had a
reason to fail (plane reordering is mostly for RGB, which has planes of
all the same size).
Adjust some comments accordingly too.
libzimg recently added direct alpha support and new API for it. (The API
change is rather minimal, and it turns out we can easily support old and
new zimg versions.)
This does not support _all_ alpha formats. For example, gray + alpha is
not supported yet, because my stupid design in the zimg wrapper would
require a planar gray + alpha format, while ffmpeg provides only a
packed one.
Instead, use a YUV planar format. It doesn't matter, since we use the
format only internally and for "management" purposes. We're only
interested in the physical layout, not what colorspace FFmpeg "forcibly"
associates with it.
Also get rid of using the old and slightly sketchy mp_imgfmt_find()
function. Yep, the IMGFMT_RGB30 now "constructs" the planar format,
instead of using a pixfmt constant. Slightly inconvenient, tricky, and
fragile, but I like it, so bugger off.
This whole thing gets rid of some of the strange plane permutations that
were needed earlier.
According to the definition of the GL format, and the definition in
img_format.h, and the actual output by vo_gpu, the order of components
was probably wrong. It's exceedingly likely that the vo_drm format (for
which this was originally written) has the same layout, so this was
probably a bug from when the zimg wrapper code was refactored.
This is mostly just because of the odd RGB default gamma issue, which
shouldn't have any real impact. This also sets allow_approximate_gamma,
which I hope is fine for normal use cases.
Normally, the Y plane can just be passed directly to zimg, and only the
chroma plane needs to be (de)interleaved. It still needs a copy if the Y
pointer is not aligned, though. (Whether this is actually a problem
depends on the CPU and probably zimg's compiler.)
This requires deciding per plane whether the plane should go through the
repack buffer or not. This logic is active in non-nv12 cases, because
not doing so would require extra code (maybe 2 lines or so).
repack_align is now always called, even if it's planar->planar with all
input aligned, but it won't actually do anything in that case. The
assumption is that zimg won't change behavior if you pass a callback
that does nothing versus passing NULL as callback.
This is for formats like nv12 (including p010, nv24, etc.). Might be
important for hardware decoding. Previously, this would have forced a
libswscale fallback.
The genericism makes this only slightly more complicated. The main
complication is due to the fact that mixing planar and packed stuff is
insane (thanks, Nvidia).
P010 output will actually happily set any of the 6 bit "padding" LSB,
that are normally supposed to be 0 (for unpadded data there is P016).
Scaling happens with 16 bit precision. Not going to bother adding an
extra packer which zeros them out, or with shifting them in
packing/unpacking. Lets just hope nobody notices.
We've set all planes to the same zmask. But for subsampled chroma, the
zmask obviously needs to be smaller. This could lead to out of bounds
memory read and write accesses.
Move the align repacker to a single function, since this is now more
convenient.
This probably covers all packed formats which have byte-aligned
component, no alpha, and no subsampling. Everything else needs more
imgfmt metadata, or something even more complicated. Alpha is primarily
not supported, because zimg requires a second scaler instance for it,
and handling packing/unpacking with it is an unacceptable mess.
Raise swscale and zimg default parameters. This restores screenshot
quality settings (maybe) unset in the commit before. Also expose some
more libswscale and zimg options.
Since these options are also used for VOs like x11 and drm, this will
make x11/drm/etc. much slower. For compensation, provide a profile that
sets the old option values: sw-fast. I'm also enabling zimg here, just
as an experiment.
The core problem is that we have a single set of command line options
which control the settings used for most swscale/zimg uses. This was
done in the previous commit. It cannot differentiate between the VOs,
which need to be realtime and may accept/require lower quality options,
and things like screenshots or vo_image, which can be slower, but should
not sacrifice quality by default.
Should this have two sets of options or something similar to do the
right thing depending on the code which calls libswscale? Maybe. Or
should I just ignore the problem, make it someone else's problem (users
who want to use software conversion VOs), provide a sub-optimal
solution, and call it a day? Definitely, sounds good, pushing to master,
goodbye.
Purpose uncertain. I guess it's slightly better, maybe.
The move of the sws/zimg options from VO opts (vo_opt_list) to the
top-level option list is tricky. VO opts have some helper code in vo.c,
that sends VOCTRL_SET_PANSCAN to the VO on every VO opts change. That's
because updating certain VO options used to be this way (and not just
the panscan option). This isn't needed anymore for sws/zimg options, so
explicitly move them away.