2505 Commits

Author SHA1 Message Date
Rémi Denis-Courmont b3825bbe45 riscv: test for assembler support
This should fix the build on LLVM 16 and earlier, at the cost of turning
all non-RVV optimisations off.
2023-12-08 17:21:09 +02:00
Alfred Wingate e5ce473040 swscale/x86/rgb_2_rgb: Add opaque pointer to missed definitions of ff_nv12ToUV
Opaque parameters were previously added to the original definition of
ff_nv12ToUV, leading to gcc noticing a type mismatch with -Wlto-type-mismatch.

f2de911818
https://bugs.gentoo.org/907484

Signed-off-by: Alfred Wingate <parona@protonmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2023-12-02 11:22:46 +01:00
xufuji456 cc86343b96 lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d
When building for iOS on arm64, the compiler emits a warning: "instruction movi.2d with immediate #0 may not function correctly on this CPU, converting to movi.16b"

Signed-off-by: xufuji456 <839789740@qq.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-11-28 15:54:49 +02:00
Rémi Denis-Courmont 6d60cc7baf sws/rgb2rgb: fix unaligned accesses in R-V V YUYV to I422p
In my personal opinion, we should not need to support unaligned YUY2
pixel maps. They should always be aligned to at least 32 bits, and the
current code assumes just 16 bits. However checkasm does test for
unaligned input bitmaps. QEMU accepts them, but real hardware does not.

In this particular case, we can at the same time improve performance and
handle unaligned inputs, so do just that.

uyvytoyuv422_c:      104379.0
uyvytoyuv422_c:      104060.0
uyvytoyuv422_rvv_i32: 25284.0 (before)
uyvytoyuv422_rvv_i32: 19303.2 (after)
2023-11-13 18:34:29 +02:00
Rémi Denis-Courmont 5b8b5ec9c5 sws/rgb2rgb: rework R-V V YUY2 to 4:2:2 planar
This saves three scratch registers and three instructions per line. The
performance gains are mostly negligible. The main point is to free up
registers for further rework.
2023-11-13 18:34:29 +02:00
Niklas Haas 736284e7b9 swscale/yuv2rgb: fix sws_getCoefficients for colorspace=0
The documentation states that invalid entries default to SWS_CS_DEFAULT.
A value of 0 is not a valid SWS_CS_*, yet the code incorrectly
hard-codes it to BT.709 coefficients instead of SWS_CS_DEFAULT.
2023-11-09 12:53:35 +01:00
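
The fallback rule described in the commit above can be sketched as follows. This is a minimal illustration, not the libswscale implementation: the enum names, the function name and the valid range of 1 to 10 are all assumptions made for the example.

```c
/* Hypothetical constants for this sketch; not the real SWS_CS_* values. */
enum { CS_BT709 = 1, CS_DEFAULT = 5, CS_MAX = 10 };

/* Map any invalid colorspace value, including 0, to the default entry
 * instead of silently treating 0 as BT.709. */
int normalize_colorspace(int colorspace)
{
    if (colorspace <= 0 || colorspace > CS_MAX)
        return CS_DEFAULT;
    return colorspace;
}
```

With this rule a caller passing 0 gets the same coefficients as a caller passing any other invalid value, which is what the documentation promises.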
Niklas Haas d043e5c54c swscale: don't omit ff_sws_init_range_convert for high-bit
This was a complete hack seemingly designed to work around a different
bug, which was fixed in the previous commit. As such, there is no more
reason not to do this, as it simply breaks changing color range in
sws_setColorspaceDetails for no reason.
2023-11-09 12:53:35 +01:00
Niklas Haas cedf589c09 swscale: fix sws_setColorspaceDetails after sws_init_context
More commonly, this fixes the case of sws_setColorspaceDetails after
sws_getContext, since the latter implies sws_init_context.

The problem here is that sws_init_context sets up the range conversion
and fast path tables based on the values of srcRange/dstRange at init
time. This may result in locking in a "wrong" path (either using
unscaled fast path when range conversion later required, or using
scaled slow path when range conversion becomes no longer required).

There are two ways out:

1. Always initialize range conversion and unscaled converters, even if
   they will be unused, and extend the runtime check.
2. Re-do initialization if the values change after
   sws_setColorspaceDetails.

I opted for approach 1 because it was simpler and easier to reason
about.

Reword the av_log message to make it clear that this special converter
is not necessarily used, depending on whether or not there is range
conversion or YUV matrix conversion going on.
2023-11-09 12:53:35 +01:00
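
Approach 1 from the commit above might look roughly like this. It is a sketch under stated assumptions: the struct, its fields and the function names are hypothetical stand-ins, and the real context carries far more state than shown here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a scaler context; not the real SwsContext. */
typedef struct ScalerSketch {
    int  src_range;        /* 0 = limited, 1 = full */
    int  dst_range;
    bool range_conv_ready; /* range converters were set up at init */
    bool unscaled_ready;   /* unscaled fast path was set up at init */
} ScalerSketch;

/* Approach 1: set up both paths unconditionally, whatever the ranges are. */
void sketch_init(ScalerSketch *c, int src_range, int dst_range)
{
    c->src_range        = src_range;
    c->dst_range        = dst_range;
    c->range_conv_ready = true;
    c->unscaled_ready   = true;
}

/* Ranges may change after init; nothing needs to be re-initialized. */
void sketch_set_ranges(ScalerSketch *c, int src_range, int dst_range)
{
    c->src_range = src_range;
    c->dst_range = dst_range;
}

/* Extended runtime check: the fast path is only taken when no range
 * conversion is needed at the time of the call. */
bool sketch_use_fast_path(const ScalerSketch *c)
{
    return c->unscaled_ready && c->src_range == c->dst_range;
}
```

Because the decision is made per call rather than frozen at init time, changing the ranges later cannot lock in the wrong path, at the cost of initializing converters that may never be used.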
Michael Niedermayer 47e784f881
Bump versions after 6.1
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-10-29 16:19:14 +01:00
Michael Niedermayer 9d3a7d30c4
Bump versions prior to 6.1
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-10-29 15:34:05 +01:00
Martin Storsjö a76b409dd0 aarch64: Reindent all assembly to 8/24 column indentation
libavcodec/aarch64/vc1dsp_neon.S is skipped here, as it intentionally
uses a layered indentation style to visually show how different
unrolled/interleaved phases fit together.

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-21 23:25:54 +03:00
Martin Storsjö 93cda5a9c2 aarch64: Lowercase UXTW/SXTW and similar flags
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-21 23:25:23 +03:00
Martin Storsjö 184103b310 aarch64: Consistently use lowercase for vector element specifiers
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-21 23:25:18 +03:00
Rémi Denis-Courmont 19baf4e009 swscale/rgb2rgb: R-V V deinterleaveBytes 2023-10-03 22:53:20 +03:00
Rémi Denis-Courmont ede3215115 swscale/rgb2rgb: fix extra iteration in R-V V interleave
There was an additional iteration doing nothing for each line,
due to checking the selected vector length instead of the available
vector length.
2023-10-03 22:53:20 +03:00
Rémi Denis-Courmont d14130aea3 swscale/rgb2rgb: unroll R-V V interleave_bytes 2023-10-03 20:48:47 +03:00
Rémi Denis-Courmont 6269c4a440 swscale/rgb2rgb: unroll RISC-V V uyvytoyuv422 2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont e50f8e861b swscale/rgb2rgb: avoid S-regs in RISC-V V uyvytoyuv422
We can make do with callee-clobbered registers only now.
As an added bonus, this makes the code XLEN-independent.
2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont be37a2e364 swscale/rgb2rgb: rework RISC-V V uyvytoyuv422
This avoids using relatively slow register strides.
2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont 1a4bd76ea5 swscale/rgb2rgb: remove R-V V shuffle_bytes_3012
This is slower than the Zbb version on real hardware due to register
strides. Proper support for vector byte-swap requires the Zvbb
extension, but it's much too early for me to worry about it.
2023-10-02 22:28:38 +03:00
Rémi Denis-Courmont c4a144c29d swscale/rgb2rgb: add R-V Zbb shuffle_bytes_3210 2023-10-02 22:28:25 +03:00
Paul B Mahol 29b673bdcf swscale: add GBRAP14 format support 2023-09-28 19:37:58 +02:00
Andreas Rheinhardt f8503b4c33 avutil/internal: Don't auto-include emms.h
Instead include emms.h wherever it is needed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-04 11:04:45 +02:00
L. E. Segovia ddc1cd5cdd configure: Set WIN32_LEAN_AND_MEAN at configure time
Including winsock2.h or windows.h without WIN32_LEAN_AND_MEAN causes
bzlib.h to parse as nonsense, due to an instance of #define small char
in rpcndr.h.

See:

https://stackoverflow.com/a/27794577

Signed-off-by: L. E. Segovia <amy@amyspark.me>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-08-14 22:57:28 +03:00
Rémi Denis-Courmont c2b38619c0 swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{1230,3012}
This avoids strided loads.

Before:
shuffle_bytes_1230_rvv_i32: 308.7
shuffle_bytes_3012_rvv_i32: 308.7

After:
shuffle_bytes_1230_rvv_i32: 46.7
shuffle_bytes_3012_rvv_i32: 46.7
2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont 15982554e6 swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{0321,2103}
This avoids strided loads.

Before:
shuffle_bytes_0321_rvv_i32: 307.7
shuffle_bytes_2103_rvv_i32: 308.7

After:
shuffle_bytes_0321_rvv_i32: 59.7
shuffle_bytes_2103_rvv_i32: 61.5
2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont d3948e4db5 swscale: inline ff_shuffle_bytes_3210_rvv
No functional changes.
2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont b6585eb04c lavu: add/use flag for RISC-V Zba extension
The code was blindly assuming that Zbb or V implied Zba. While the
former is practically always true, the latter broke some QEMU setups,
as V was introduced earlier than Zba.
2023-07-19 19:29:35 +03:00
Khem Raj a7b3c0203f libswscale/riscv: fix syntax of vsetvli
Add the missing operand, which Clang complains about but GCC assumes to be
'm1' if not specified.

Works around build failure with Clang:
| src/libswscale/riscv/rgb2rgb_rvv.S:88:25: error: operand must be e[8|16|32|64|128|256|512|1024],m[1|2|4|8|f2|f4|f8],[ta|tu],[ma|mu]
|         vsetvli t4, t3, e8, ta, ma
|                         ^

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-07-13 22:01:24 +03:00
Lynne b3fb73af6b
swscale: bump minor for implementing support for the new pixfmts 2023-05-29 00:42:02 +02:00
Lynne 934525eae0
lsws: add in/out support for the new 12-bit 2-plane 422 and 444 pixfmts 2023-05-29 00:41:35 +02:00
Jin Bo cb4ae8baee
swscale/la: Add following builtin optimized functions
yuv420_rgb24_lsx
yuv420_bgr24_lsx
yuv420_rgba32_lsx
yuv420_argb32_lsx
yuv420_bgra32_lsx
yuv420_abgr32_lsx
./configure --disable-lasx
ffmpeg -i ~/media/1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo
-pix_fmt rgb24 -y /dev/null -an
before: 184fps
after:  207fps

Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-05-25 21:05:15 +02:00
Lu Wang 4501b1dfd7
swscale/la: Optimize the functions of the swscale series with lsx.
./configure --disable-lasx
ffmpeg -i ~/media/1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480
-pix_fmt bgra -y /dev/null -an
before: 91fps
after:  160fps

Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-05-25 21:05:08 +02:00
Lynne a62a3930c2
swscale/ppc: remove hScale8To19_vsx
Fails checkasm on a Power9 system.
2023-05-20 20:07:18 +02:00
Michael Niedermayer 47ac3e6065
version.h: Bump minor post 6.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-02-19 18:37:36 +01:00
Michael Niedermayer 62efa096af
version.h: Bump minor for 6.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-02-19 18:32:07 +01:00
James Almer 5bad485603 Bump major versions of all libraries
Signed-off-by: James Almer <jamrial@gmail.com>
2023-02-09 15:35:14 +01:00
Tomas Härdin a678b0c252 sws/utils.c: Do not uselessly call initFilter() when unscaling 2023-02-08 15:53:55 +01:00
Lynne bbe95f7353
x86: replace explicit REP_RETs with RETs
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)

x86inc can automatically determine whether to use REP_RET rather than
RET in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessarily, despite the return being nowhere near a
branch.

The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.

In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
2023-02-01 04:23:55 +01:00
Andreas Rheinhardt 1ff9c07fa6 swscale/utils: Fix indentation
Forgotten after c1eb3e7fec.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-24 21:02:57 +01:00
Andreas Rheinhardt b2d1a25816 swscale/utils: Derive range from YUVJ-pix-fmt only once
Currently, it is done once per slice-thread, leading to
one warning per slice-thread in case a YUVJ pixel format
has been originally used.

This also fixes the anomaly that said parameters are only
updated for the user-facing context (whose values are retrievable
via av_opt_get()) if slice-threading is not in use.

Fixes ticket #9860.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-24 20:59:03 +01:00
Andreas Rheinhardt ff39dcb129 swscale/utils: Move functions to avoid forward declarations
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-24 20:58:21 +01:00
Andreas Rheinhardt baccc1c541 swscale/utils: Avoid calling ff_thread_once() unnecessarily
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-24 20:58:21 +01:00
Andreas Rheinhardt 8ee0711228 swscale/utils: Don't allocate AVFrames for slice contexts
Only the parent context's AVFrames are ever used.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-24 20:58:21 +01:00
Andreas Rheinhardt 64ed1d40df swscale/utils: Factor initializing single slice context out
Initializing slice threads currently uses the function
(sws_init_context()) that is also used for initializing
user-facing contexts with the only difference being that
nb_threads is set to one before initializing the slice contexts.

Yet sws_init_context() also initializes lots of stuff
that is not slice-dependent, i.e. (src|dst)Range. This
currently only works because the code sets these fields
to the same values for all slice contexts. This is not
nice; even worse, it entails that log messages are printed
once per slice context (and therefore fill the screen).

This commit lays the groundwork to fix this.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-24 20:58:21 +01:00
Michael Niedermayer ba209e3d51
swscale/input: Use more unsigned intermediates
Same principle as the previous commit: with sufficiently huge rgb2yuv table
values, the signed arithmetic produces wrong results and undefined behavior.
The unsigned arithmetic produces the same incorrect results. That is probably
ok as these cases with huge values seem not to occur in any real
use case.

Fixes: signed integer overflow
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-11-20 21:55:06 +01:00
Jeremy Dorfman ce566281f9
swscale/input: Use unsigned intermediates in rgb64ToUV_c_template
Large rgb2yuv tables and high pixel values cause the intermediate
int32_t of ru*r + gu*g + bu*b to exceed INT_MAX, which is undefined
behavior. This causes libswscale built with LLVM -fsanitize=undefined to
assert. Using unsigned integers instead has defined behavior and
produces identical results, and makes rgb64ToUV_c_template match
rgb64ToY_c_template.

Fixes: signed integer overflow
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-11-20 21:23:57 +01:00
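
The idea in the commit above can be illustrated with a small sketch. The function name, the shift value and the bias constant are assumptions chosen for the example and are not taken from the swscale sources; only the shape of the sum ru*r + gu*g + bu*b comes from the commit message.

```c
#include <stdint.h>

#define SKETCH_SHIFT 15 /* hypothetical fixed-point shift */

/* Coefficients may be negative. Converting them to uint32_t keeps their
 * two's-complement bit pattern, and unsigned arithmetic wraps modulo 2^32,
 * so an oversized sum is well defined instead of undefined behavior. */
uint16_t chroma_from_rgb(int32_t ru, int32_t gu, int32_t bu,
                         uint16_t r, uint16_t g, uint16_t b)
{
    uint32_t bias = (uint32_t)0x10001 << (SKETCH_SHIFT - 1);
    uint32_t acc  = (uint32_t)ru * r + (uint32_t)gu * g
                  + (uint32_t)bu * b + bias;
    return (uint16_t)(acc >> SKETCH_SHIFT);
}
```

Whenever the biased sum fits in the non-negative range of int32_t, this yields the same value as the signed computation, since both agree modulo 2^32.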
Andreas Rheinhardt b616b04704 swscale/utils: Remove obsolete 3DNow reference
swscale does not use 3DNow any more since commit
608319a311.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-09 17:39:00 +01:00
Michael Niedermayer b74f89caae
swscale/output: Bias 16bps output calculations to improve non overflowing range for GBRP16/GBRPF32
Fixes: integer overflow
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-11-04 22:44:16 +01:00
Michael Niedermayer 0f0afc7fb5
swscale/output: Bias 16bps output calculations to improve non overflowing range
Fixes: integer overflow
Fixes: ./ffmpeg   -f rawvideo -video_size 66x64 -pixel_format yuva420p10le   -i ~/videos/overflow_input_w66h64.yuva420p10le   -filter_complex "scale=flags=bicubic+full_chroma_int+full_chroma_inp+bitexact+accurate_rnd:in_color_matrix=bt2020:out_color_matrix=bt2020:in_range=full:out_range=full,format=rgba64[out]"   -pixel_format rgba64 -map '[out]'   -y overflow_w66h64.png

Found-by: Drew Dunne <asdunne@google.com>
Tested-by: Drew Dunne <asdunne@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-11-04 22:44:16 +01:00