Commit Graph

2283 Commits

Author SHA1 Message Date
Michael Niedermayer 9ee16a0ba2
swscale/input: Use more unsigned intermediates
Same principle as previous commit, with sufficiently huge rgb2yuv table
values this produces wrong results and undefined behavior.
The unsigned produces the same incorrect results. That is probably
ok as these cases with huge values seem not to occur in any real
use case.

Fixes: signed integer overflow
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit ba209e3d51)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-04-15 22:37:59 +02:00
Michael Niedermayer d1c90886cc
swscale/output: Bias 16bps output calculations to improve non overflowing range
Fixes: integer overflow
Fixes: ./ffmpeg   -f rawvideo -video_size 66x64 -pixel_format yuva420p10le   -i ~/videos/overflow_input_w66h64.yuva420p10le   -filter_complex "scale=flags=bicubic+full_chroma_int+full_chroma_inp+bitexact+accurate_rnd:in_color_matrix=bt2020:out_color_matrix=bt2020:in_range=full:out_range=full,format=rgba64[out]"   -pixel_format rgba64 -map '[out]'   -y overflow_w66h64.png

Found-by: Drew Dunne <asdunne@google.com>
Tested-by: Drew Dunne <asdunne@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 0f0afc7fb5)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-04-15 22:37:58 +02:00
Martin Storsjö 3993a90732 swscale: aarch64: Fix yuv2rgb with negative strides
Treat the 32 bit stride registers as signed.

Alternatively, we could make the stride arguments ptrdiff_t instead
of int, and changing all of the assembly to operate on these
registers with their full 64 bit width, but that's a much larger
and more intrusive change (and risks missing some operation, which
would clamp the intermediates to 32 bit still).

Fixes: https://trac.ffmpeg.org/ticket/9985

Signed-off-by: Martin Storsjö <martin@martin.st>
(cherry picked from commit cb803a0072)
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-11-04 14:32:19 +02:00
Michael Niedermayer 676dad0aeb swscale/alphablend: Fix slice handling
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 06d6726588)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-06 14:41:42 +02:00
Michael Niedermayer 149992e127 swscale/slice: Fix wrong return on error
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 7874d40f10)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-06 14:41:41 +02:00
Michael Niedermayer 4866d2a9ee swscale/slice: Check slice for allocation failure
Fixes: null pointer dereference
Fixes: alloc_slice.mp4

Found-by: Rafael Dutra <rafael.dutra@cispa.de>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 997f9cfc12)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-06 14:41:41 +02:00
Marton Balint c19641b2e2 swscale/x86/yuv2rgb: fix crashes when loading alpha from unaligned buffers
Regression since fc6a5883d6 on SSSE3 enabled
CPUs.

Fixes ticket #8955.

Signed-off-by: Marton Balint <cus@passwd.hu>
(cherry picked from commit 993429cfb4)
2020-11-02 00:51:05 +01:00
James Almer 799fc4d732 x86/yuv2rgb: fix crashes when storing data on unaligned buffers
Regression since fc6a5883d6 on SSSE3 enabled
CPUs.

Fixes ticket #8747

Signed-off-by: James Almer <jamrial@gmail.com>
(cherry picked from commit ba3e771a42)
2020-07-17 11:53:47 -03:00
Michael Niedermayer 0a8a96c251 Bump minor versions to separate 4.3 from master
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-06-08 22:49:04 +02:00
Martin Storsjö e0604d508e swscale: aarch64: Add a NEON implementation of interleaveBytes
This allows speeding up format conversions from yuv420 to nv12.

                             Cortex A53      A72      A73
interleave_bytes_c:             86077.5  51433.0  66972.0
interleave_bytes_neon:          19701.7  23019.2  15859.2
interleave_bytes_aligned_c:     86603.0  52017.2  67484.2
interleave_bytes_aligned_neon:   9061.0   7623.0   6309.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 23:38:17 +03:00
Josh de Kock 70b14cc8d6 swscale: arm: fix NEON hscale init
The NEON hscale function only supports X8 filter sizes and should only
be selected when these are being used. At the moment filterAlign is
set to 8 but in the future when extra NEON assembly for specific sizes is
added they will need to have checks here too.

The immediate usecase for this change is making the hscale checkasm
test easier and without NEON specific edge-cases (x86 already has these
guards).

This applies the same fix from 718c8f9aa5
on the 32 bit arm version of the function, fixing fate-checkasm-sw_scale
there.

Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 23:33:46 +03:00
Josh de Kock 718c8f9aa5 swscale: fix NEON hscale init
The NEON hscale function only supports X8 filter sizes and should only
be selected when these are being used. At the moment filterAlign is
set to 8 but in the future when extra NEON assembly for specific sizes is
added they will need to have checks here too.

The immediate usecase for this change is making the hscale checkasm
test easier and without NEON specific edge-cases (x86 already has these
guards).

Signed-off-by: Josh de Kock <josh@itanimul.li>
2020-05-15 10:29:30 +01:00
Mark Reid fabeef22d9 libswscale: fix for floating point formats, require full chroma
upon more floating point testing, looks like I missed adding this bit.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-12 01:00:28 +02:00
Mark Reid b4967fc71c libswscale: add output support for AV_PIX_FMT_GBRAPF32
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-05 20:06:58 +02:00
Mark Reid ba5d0515a6 libswscale: add input support AV_PIX_FMT_GBRAPF32
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-05 20:06:58 +02:00
Andreas Rheinhardt 2fae000994 swscale/vscale: Increase type strictness
libswscale/vscale.c makes extensive use of function pointers and in
doing so it converts these function pointers to and from a pointer to
void. Yet this is actually against the C standard:
C90 only guarantees that one can convert a pointer to any incomplete
type or object type to void* and back with the result comparing equal
to the original which makes pointers to void generic pointers to
incomplete or object type. Yet C90 lacks a generic function pointer
type.
C99 additionally guarantees that a pointer to a function of one type may
be converted to a pointer to a function of another type with the result
and the original comparing equal when converting back.
This makes any function pointer type a generic function pointer type.
Yet even this does not make pointers to void generic function pointers.

Both GCC and Clang emit warnings for this when in pedantic mode.

This commit fixes this by using a union that can hold one member of any
of the required function pointer types to store the function pointer.
This works even for C90.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2020-04-27 23:34:31 +02:00
Martin Storsjö 9025d5c5ce swscale: aarch64: Don't clobber callee-saved registers v8-v15
Signed-off-by: Martin Storsjö <martin@martin.st>
2020-04-21 23:41:13 +03:00
Martin Storsjö 872790b1f9 swscale: aarch64: Avoid using the x18 register
The x18 is a reserved platform register on Darwin and Windows.

x8/w8 seems to be unused in this function though (and same about
x10 and x14), so there's really no reason to use x18 here - just change
the uses of x18/w18 into x8/w8 instead without any further rewrites.

Signed-off-by: Martin Storsjö <martin@martin.st>
2020-04-20 00:09:34 +03:00
Michael Niedermayer be3c29e379 swscale/yuv2rgb: Fix vertical dither offset with slices
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-04-12 16:36:47 +02:00
Michael Niedermayer e057e83a4f swscale/output: Fix integer overflow in yuv2rgb_write_full() with out of range input
Fixes: signed integer overflow: 1169365504 + 981452800 cannot be represented in type 'int'
Fixes: ticket8293

Found-by: Suhwan
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-04-04 22:09:46 +02:00
Michael Niedermayer 49ba1879ad swscale/output: Fix integer overflow in alpha computation in yuv2gbrp16_full_X_c()
Fixes: signed integer overflow: 524280 * 4432 cannot be represented in type 'int'
Fixes: ticket8322

Found-by: Suhwan
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-04-04 22:09:46 +02:00
Ruiling Song 4700f7d6fc swscale/swscale: remove useless code
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-04-03 00:58:07 +02:00
Carl Eugen Hoyos 5f8c383452 lsws/input: Do not change transparency range.
Fixes ticket #8509.
2020-03-11 22:55:49 +01:00
Ting Fu 828f7db5d9 libswscale/x86/yuv2rgb: Fix Segmentation Fault when load unaligned data
Fixes ticket #8532

Signed-off-by: Ting Fu <ting.fu@intel.com>
2020-02-26 11:10:46 +01:00
Linjie Fu d2aa1fbfd4 swscale: Add swscale input support for Y210LE
Add swscale input support for Y210LE, output support and fate
test could be added later if there is requirement for software
CSC to this packed format.

Signed-off-by: Linjie Fu <linjie.fu@intel.com>
2020-02-24 00:09:51 +00:00
Ting Fu fc6a5883d6 libswscale/x86/yuv2rgb: add ssse3 version
Tested using this command:
/ffmpeg -pix_fmt yuv420p -s 1920*1080 -i ArashRawYuv420.yuv \
-vcodec rawvideo -s 1920*1080 -pix_fmt rgb24 -f null /dev/null

The fps increase from 389 to 640 on Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz

Signed-off-by: Ting Fu <ting.fu@intel.com>
2020-02-10 15:08:33 +01:00
Gautam Ramakrishnan da399e2135 libswscale/utils.c: Fix bug #8255
Bug #8255 points out a double free error in libwscale/utils.c file.
The double free is because the pointer to cascaded_context of an
sw_context is not set to NULL after freeing it. When the sw_context
is later freed, sws_freeContext is called on the cascaded_context,
causing a double free.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-02-09 23:33:18 +01:00
Ting Fu e934194b6a libswscale/x86/yuv2rgb: Change inline assembly into nasm code
The original inline assembly and nasm code have the same fps when called by command.
NASM code almost has no impact on the perfromance.

Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-02-05 17:41:59 +01:00
Michael Niedermayer d48e510124 swscale/input: Fix several invalid shifts related to rgb2yuv constants
Fixes: Invalid shifts
Fixes: #8140
Fixes: #8146

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-01-22 21:50:49 +01:00
Michael Niedermayer 7b7f97532b swscale/output: Fix several invalid shifts in yuv2rgb_full_1_c_template()
Fixes: Invalid shifts
Fixes: #8320

Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-01-22 18:41:46 +01:00
Michael Niedermayer a6ca22c118 swscale/swscale: Fix several invalid shifts related to vChrDrop
Fixes: Invalid shifts
Fixes: #8166
Fixes: filter-crop_scale_vflip FATE-test

Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-01-22 18:41:46 +01:00
Carl Eugen Hoyos 96fab29e96 Silence "string-plus-int" warning shown by clang.
libswscale/utils.c:89:42: warning: adding 'unsigned long' to a string does not append to the string [-Wstring-plus-int]
2020-01-06 22:38:56 +01:00
Sebastian Pop c3a17ffff6 swscale/aarch64: use multiply accumulate and shift-right narrow
This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid zips and
horizontal adds by using fused multiply adds. The patch also uses ld1r to load
one element and replicate it across all lanes of the vector. The patch also
improves the clipping code by removing the shift right instructions and
performing the shift with the shift-right narrow instructions.

I see 8% difference on an m6g instance with neoverse-n1 CPUs:
$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.014015 avg:0.014096 max:0.015018 min:0.013971
after:  t:0.012985 avg:0.013013 max:0.013996 min:0.012818

Tested with `make check` on aarch64-linux.

Signed-off-by: Sebastian Pop <spop@amazon.com>
Reviewed-by: Clément Bœsch <u@pkh.me>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-01-04 20:59:31 +01:00
Zhao Zhili 1e3e547a5b swscale/utils: remove access of AV_PIX_FMT_NB
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-12-31 12:37:47 +01:00
Sebastian Pop bd83191271 swscale/aarch64: use multiply accumulate and increase vector factor to 4
This patch implements ff_hscale_8_to_15_neon with NEON fused multiply accumulate
and bumps the vectorization factor from 2 to 4.
The speedup is of 25% on Graviton1 A1 instances based on A-72 cpus:

$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214
after:  t:0.032168 avg:0.032215 max:0.033081 min:0.032146

The speedup is of 39% on Graviton2 m6g instances based on Neoverse-N1 cpus:
$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.019446 avg:0.019423 max:0.019493 min:0.019181
after:  t:0.014015 avg:0.014096 max:0.015018 min:0.013971

Tested with `make check` on aarch64-linux.

Signed-off-by: Sebastian Pop <spop@amazon.com>
Reviewed-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-12-17 23:41:47 +01:00
Limin Wang 8558c231fb swscale/swscale_unscaled: add AV_PIX_FMT_GBRAP10 for LE and BE conversion wrapper
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-12-10 16:09:14 +01:00
Ting Fu 039a0ebe6f libswscale/swscale_unscaled.c: remove redundant code
Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-12-06 11:25:29 +01:00
Limin Wang a5e24be52a swscale/swscale_unscaled: fix gbrap10be md5 different on big endian system
You can reproduce it by below command:
./ffmpeg -f lavfi -i "testsrc=duration=1:rate=30" -vf format=gbrap10 -vcodec rawvideo \
    -pix_fmt gbrap10le -flags +bitexact -sws_flags +accurate_rnd+bitexact -fflags +bitexact  \
    -frames:v 1 -f nut md5:

little-endian:
f91e2edd8098276579c1929e5e160416
big-endian:
ba4d011dbbdc78ccbf6cc7d698630929

Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-11-01 14:43:16 +01:00
Michael Niedermayer d260621089 swscale/output: Avoid 64bit in Alpha in yuv2ya16_X_c_template()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-10-16 19:17:57 +02:00
Michael Niedermayer 3e6682931b swscale/output: Correct Alpha in yuv2ya16_X_c_template()
Untested, no testcase

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-10-16 19:17:57 +02:00
Michael Niedermayer 4f4ca675e5 swscale/output: Implement Luma computation from yuv2ya16_X_c_template() without 64bit
This also reverts 21838cad2f
The revert is in this commit to avoid 2 fate updates

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-10-16 19:17:57 +02:00
Daniel Kolesa e6625ca41f swscale: Fix AltiVec/VSX build with recent GCC
The argument to vec_splat_u16 must be a literal. By making the
function always inline and marking the arguments const, gcc can
turn those into literals, and avoid build errors like:

swscale_vsx.c:165:53: error: argument 1 must be a 5-bit signed literal

Fixes #7861.

Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Signed-off-by: Lauri Kasanen <cand@gmx.com>
2019-10-04 08:58:17 +03:00
Daniel Kolesa 1bdb47b734 swscale: Replace illegal vector keyword usage in altivec code
While this technically compiles in current ffmpeg, this is only
because ffmpeg is compiled in strict ISO C mode, which disables
the builtin 'vector' keyword for AltiVec/VSX. Instead this gets
replaced with a macro inside altivec.h, which defines vector to
be actually __vector, which accepts random types.

Normally, the vector keyword should be used only with plain
scalar non-typedef types, such as unsigned int. But we have the
vec_(s|u)(8|16|32) macros, which can be used in a portable manner,
in util_altivec.h in libavutil.

This is also consistent with other AltiVec/VSX code elsewhere in
the tree.

Fixes #7861.

Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Signed-off-by: Lauri Kasanen <cand@gmx.com>
2019-10-04 08:58:17 +03:00
Andreas Rheinhardt e2646e23be swscale/utils: Fix invalid left shifts of negative numbers
Affected the FATE-tests vsynth_lena-dv-411, vsynth1-dv-411,
vsynth2-dv-411 and hevc-paramchange-yuv420p.yuv420p10.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-09-28 17:24:32 +02:00
Andreas Rheinhardt 736c7c20e7 swscale/x86/swscale: Fix undefined left shifts of negative numbers
This affected many FATE-tests: The number of failing tests went down
from 663 to 344. (Both numbers exclude tests that failed because of
unaligned accesses in code that is inside #if HAVE_FAST_UNALIGNED.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-09-28 17:24:32 +02:00
Limin Wang cde1d70a39 swscale/swscale: cosmetics
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-09-27 10:58:30 +02:00
Paul B Mahol 21838cad2f swscale/output: fix signed integer overflow for ya16
Fixes #7666.
2019-09-26 15:56:47 +02:00
Limin Wang 29bde4b3b6 swscale/swscale: delete unwanted assignments
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-09-09 18:16:06 +02:00
Linjie Fu ef1342650f swscale/output: fix some code indentations
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-09-06 22:06:12 +02:00
Chip Kerchner 3a557c5d88 lsws/ppc/yuv2rgb_altivec: Replace vec_lvsl/vec_perm with vec_xl
gcc 6.x and 7.x generate wrong code for little endian machines
for the vec_lvsl/vec_perm instruction combos in some cases.
The bug was fixed in version 8.x
If these instructions are replaced with vec_xl, the problem goes
away for all versions of the compilers.

Fixes ticket #7124.
2019-08-13 02:21:24 +02:00