Fixes: signed integer overflow: 2147483424 * 2 cannot be represented in type 'int'
Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_WSVQA_fuzzer-4576211411795968
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 3.82046e+18 is outside the range of representable values of type 'unsigned int'
Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_WEBM_DASH_MANIFEST_fuzzer-6381436594421760
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 5053074104798691550 + 5053074104259715104 cannot be represented in type 'long'
Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_WAV_fuzzer-6515315309936640
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_W64_fuzzer-4704044498944000
Fixes: signed integer overflow: 520464 * 8224 cannot be represented in type 'int'
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 9223372036854775807 - -8000000 cannot be represented in type 'long'
Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_SBG_fuzzer-5133181743136768
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_RPL_fuzzer-4677434693517312
Fixes: signed integer overflow: 5555555555555555556 * 8 cannot be represented in type 'long long'
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-4802790784303104
Fixes: signed integer overflow: 1768972133 + 968491058 cannot be represented in type 'int'
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 1768972133 + 968491058 cannot be represented in type 'int'
Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-4802790784303104
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: negation of -2147483648 cannot be represented in type 'int'; cast to an unsigned type to negate this value to itself
Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_JACOSUB_fuzzer-5401294942371840
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_CONCAT_fuzzer-6434245599690752
Fixes: signed integer overflow: 9223372026773000000 + 22337000000 cannot be represented in type 'long'
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVARC_fuzzer-659847401740697
Fixes: signed integer overflow: 65312 * 34078 cannot be represented in type 'int'
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -2147483648 + -25122315 cannot be represented in type 'int'
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVARC_fuzzer-6199806972198912
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
mb_change_bits is given space based on height >> 2, while more data is read
Fixes: out of array access
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TRUEMOTION1_fuzzer-5201925062590464.fuzz
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 2147483647 + 4 cannot be represented in type 'int'
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_RTV1_fuzzer-6324303861514240
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -1575944192 + -602931200 cannot be represented in type 'int'
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_QOA_fuzzer-6470469339185152
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488
Fixes: out of array write
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488
Fixes: out of array write
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array access
Fixes: 67070/clusterfuzz-testcase-minimized-ffmpeg_IO_DEMUXER_fuzzer-5685384082161664
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Since log statements printing int64 were made portable in
4464b7eeb1, let us include
inttypes.h explicitly (as it is unclear where PRId64 and
such are coming from now).
Reported-by: Leo Izen <leo.izen@gmail.com>
Signed-off-by: Marth64 <marth64@proxyid.net>
The VP8 compressed header may not be byte-aligned due to boolean
coding. Round up byte count for accurate data positioning.
Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This commit adds value range checks to cbs_vp8_read_unsigned_le,
migrates fixed() to use it, and enforces little-endian consistency for
all read methods.
Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.
By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which we
store on the stack.
AWS Graviton 3:
put_hevc_qpel_bi_hv4_8_c: 385.7
put_hevc_qpel_bi_hv4_8_neon: 131.0
put_hevc_qpel_bi_hv4_8_i8mm: 92.2
put_hevc_qpel_bi_hv6_8_c: 701.0
put_hevc_qpel_bi_hv6_8_neon: 239.5
put_hevc_qpel_bi_hv6_8_i8mm: 191.0
put_hevc_qpel_bi_hv8_8_c: 1162.0
put_hevc_qpel_bi_hv8_8_neon: 228.0
put_hevc_qpel_bi_hv8_8_i8mm: 225.2
put_hevc_qpel_bi_hv12_8_c: 2305.0
put_hevc_qpel_bi_hv12_8_neon: 558.0
put_hevc_qpel_bi_hv12_8_i8mm: 483.2
put_hevc_qpel_bi_hv16_8_c: 3965.2
put_hevc_qpel_bi_hv16_8_neon: 732.7
put_hevc_qpel_bi_hv16_8_i8mm: 656.5
put_hevc_qpel_bi_hv24_8_c: 8709.7
put_hevc_qpel_bi_hv24_8_neon: 1555.2
put_hevc_qpel_bi_hv24_8_i8mm: 1448.7
put_hevc_qpel_bi_hv32_8_c: 14818.0
put_hevc_qpel_bi_hv32_8_neon: 2763.7
put_hevc_qpel_bi_hv32_8_i8mm: 2468.0
put_hevc_qpel_bi_hv48_8_c: 32855.5
put_hevc_qpel_bi_hv48_8_neon: 6107.2
put_hevc_qpel_bi_hv48_8_i8mm: 5452.7
put_hevc_qpel_bi_hv64_8_c: 57591.5
put_hevc_qpel_bi_hv64_8_neon: 10660.2
put_hevc_qpel_bi_hv64_8_i8mm: 9580.0
Signed-off-by: Martin Storsjö <martin@martin.st>
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.
AWS Graviton 3:
put_hevc_qpel_uni_w_hv4_8_c: 422.2
put_hevc_qpel_uni_w_hv4_8_neon: 140.7
put_hevc_qpel_uni_w_hv4_8_i8mm: 100.7
put_hevc_qpel_uni_w_hv8_8_c: 1208.0
put_hevc_qpel_uni_w_hv8_8_neon: 268.2
put_hevc_qpel_uni_w_hv8_8_i8mm: 261.5
put_hevc_qpel_uni_w_hv16_8_c: 4297.2
put_hevc_qpel_uni_w_hv16_8_neon: 802.2
put_hevc_qpel_uni_w_hv16_8_i8mm: 731.2
put_hevc_qpel_uni_w_hv32_8_c: 15518.5
put_hevc_qpel_uni_w_hv32_8_neon: 3085.2
put_hevc_qpel_uni_w_hv32_8_i8mm: 2783.2
put_hevc_qpel_uni_w_hv64_8_c: 57254.5
put_hevc_qpel_uni_w_hv64_8_neon: 11787.5
put_hevc_qpel_uni_w_hv64_8_i8mm: 10659.0
Signed-off-by: Martin Storsjö <martin@martin.st>
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.
By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which we
store on the stack.
AWS Graviton 3:
put_hevc_qpel_uni_hv4_8_c: 384.2
put_hevc_qpel_uni_hv4_8_neon: 127.5
put_hevc_qpel_uni_hv4_8_i8mm: 85.5
put_hevc_qpel_uni_hv6_8_c: 705.5
put_hevc_qpel_uni_hv6_8_neon: 224.5
put_hevc_qpel_uni_hv6_8_i8mm: 176.2
put_hevc_qpel_uni_hv8_8_c: 1136.5
put_hevc_qpel_uni_hv8_8_neon: 216.5
put_hevc_qpel_uni_hv8_8_i8mm: 214.0
put_hevc_qpel_uni_hv12_8_c: 2259.5
put_hevc_qpel_uni_hv12_8_neon: 498.5
put_hevc_qpel_uni_hv12_8_i8mm: 410.7
put_hevc_qpel_uni_hv16_8_c: 3824.7
put_hevc_qpel_uni_hv16_8_neon: 670.0
put_hevc_qpel_uni_hv16_8_i8mm: 603.7
put_hevc_qpel_uni_hv24_8_c: 8113.5
put_hevc_qpel_uni_hv24_8_neon: 1474.7
put_hevc_qpel_uni_hv24_8_i8mm: 1351.5
put_hevc_qpel_uni_hv32_8_c: 14744.5
put_hevc_qpel_uni_hv32_8_neon: 2599.7
put_hevc_qpel_uni_hv32_8_i8mm: 2266.0
put_hevc_qpel_uni_hv48_8_c: 32800.0
put_hevc_qpel_uni_hv48_8_neon: 5650.0
put_hevc_qpel_uni_hv48_8_i8mm: 5011.7
put_hevc_qpel_uni_hv64_8_c: 57856.2
put_hevc_qpel_uni_hv64_8_neon: 9863.5
put_hevc_qpel_uni_hv64_8_i8mm: 8767.7
Signed-off-by: Martin Storsjö <martin@martin.st>
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.
By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which we
store on the stack.
AWS Graviton 3:
put_hevc_qpel_hv4_8_c: 386.0
put_hevc_qpel_hv4_8_neon: 125.7
put_hevc_qpel_hv4_8_i8mm: 83.2
put_hevc_qpel_hv6_8_c: 749.0
put_hevc_qpel_hv6_8_neon: 207.0
put_hevc_qpel_hv6_8_i8mm: 166.0
put_hevc_qpel_hv8_8_c: 1305.2
put_hevc_qpel_hv8_8_neon: 216.5
put_hevc_qpel_hv8_8_i8mm: 213.0
put_hevc_qpel_hv12_8_c: 2570.5
put_hevc_qpel_hv12_8_neon: 480.0
put_hevc_qpel_hv12_8_i8mm: 398.2
put_hevc_qpel_hv16_8_c: 4158.7
put_hevc_qpel_hv16_8_neon: 659.7
put_hevc_qpel_hv16_8_i8mm: 593.5
put_hevc_qpel_hv24_8_c: 8626.7
put_hevc_qpel_hv24_8_neon: 1653.5
put_hevc_qpel_hv24_8_i8mm: 1398.7
put_hevc_qpel_hv32_8_c: 14646.0
put_hevc_qpel_hv32_8_neon: 2566.2
put_hevc_qpel_hv32_8_i8mm: 2287.5
put_hevc_qpel_hv48_8_c: 31072.5
put_hevc_qpel_hv48_8_neon: 6228.5
put_hevc_qpel_hv48_8_i8mm: 5291.0
put_hevc_qpel_hv64_8_c: 53847.2
put_hevc_qpel_hv64_8_neon: 9856.7
put_hevc_qpel_hv64_8_i8mm: 8831.0
Signed-off-by: Martin Storsjö <martin@martin.st>
The hv32 and hv64 functions were identical - both loop and
process 16 pixels at a time.
The hv16 function was near identical, except for the outer loop
(and using sp instead of a separate register).
Given the size of these functions, the extra cost of the outer
loop is negligible, so use the same function for hv16 as well.
This removes over 200 lines of duplicated assembly, and over 4 KB
of binary size.
Signed-off-by: Martin Storsjö <martin@martin.st>
The first horizontal filter can use either i8mm or plain neon
versions, while the second part is a pure neon implementation.
Signed-off-by: Martin Storsjö <martin@martin.st>
The first horizontal filter can use either i8mm or plain neon
versions, while the second part is a pure neon implementation.
Signed-off-by: Martin Storsjö <martin@martin.st>
For widths of 32 pixels and more, loop first horizontally,
then vertically.
Previously, this function would process a 16 pixel wide slice
of the block, looping vertically. After processing the whole
height, it would backtrack and process the next 16 pixel wide
slice.
When doing 8tap filtering horizontally, the function must load
7 more pixels (in practice, 8) following the actual inputs, and
this was done for each slice.
By iterating first horizontally throughout each line, then
vertically, we access data in a more cache friendly order, and
we don't need to reload data unnecessarily.
Keep the original order in put_hevc_\type\()_h12_8_neon; the
only suboptimal case there is for width=24. But specializing
an optimal variant for that would require more code, which
might not be worth it.
For the h16 case, this implementation would give a slowdown,
as it now loads the first 8 pixels separately from the rest, but
for larger widths, it is a gain. Therefore, keep the h16 case
as it was (but remove the outer loop), and create a new specialized
version for horizontal looping with 16 pixels at a time.
Before: Cortex A53 A72 A73 Graviton 3
put_hevc_qpel_h16_8_neon: 710.5 667.7 692.5 211.0
put_hevc_qpel_h32_8_neon: 2791.5 2643.5 2732.0 883.5
put_hevc_qpel_h64_8_neon: 10954.0 10657.0 10874.2 3241.5
After:
put_hevc_qpel_h16_8_neon: 697.5 663.5 705.7 212.5
put_hevc_qpel_h32_8_neon: 2767.2 2684.5 2791.2 920.5
put_hevc_qpel_h64_8_neon: 10559.2 10471.5 10932.2 3051.7
Signed-off-by: Martin Storsjö <martin@martin.st>
This gets rid of a couple instructions, but the actual performance
is almost identical on Cortex A72/A73. On Cortex A53, it is a
handful of cycles faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
Many of the routines within hevcdsp_epel_neon and hevcdsp_qpel_neon
store temporary buffers on the stack. When consuming it,
many of these functions use the stack pointer as incremental pointer
for reading the data (instead of storing it in another register),
which is rather unusual.
Technically, this is fine as long as the pointer remains properly
aligned.
However in the case of ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm,
after incrementing sp when reading data (within each 16 pixel
wide stripe) it would then reset the stack pointer back to a lower
value, for reading the next 16 pixel wide stripe, expecting the
data to remain untouched.
This can't be assumed; data on the stack below the stack pointer
can be clobbered (e.g. by a signal handler). Some OS ABIs
allow for a little margin that won't be touched, aka a red zone,
but not all do. The ones that do, guarantee 16 or 128 bytes, not
9 KB.
Convert this function to use a separate pointer register to
iterate through the data, retaining the stack pointer to point
at the bottom of the data we require to remain untouched.
Signed-off-by: Martin Storsjö <martin@martin.st>