ffmpeg

Commit Graph

Author	SHA1	Message	Date
Michael Niedermayer	6b213175c9	Bump after 7.0 branch point Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-27 01:04:54 +01:00
Michael Niedermayer	e7d938073e	doc/APIchanges: Add 7.0 cut point Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-27 01:04:54 +01:00
Michael Niedermayer	8f6bdfd4ec	Changelog: Add 7.0 point Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-27 01:04:54 +01:00
Michael Niedermayer	872980ace6	Bump prior release/7.0 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-27 01:04:53 +01:00
Matthieu Bouron	87ace5c4da	Changelog: add Android content URIs protocol entry Signed-off-by: Matthieu Bouron <matthieu.bouron@gmail.com>	2024-03-27 00:08:11 +01:00
Michael Niedermayer	4126a99d2b	doc/APIchange: Fill in some missing thingss Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:42:19 +01:00
Michael Niedermayer	86f73277bf	avformat/westwood_vqa: Fix 2g packets Fixes: signed integer overflow: 2147483424 * 2 cannot be represented in type 'int' Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_WSVQA_fuzzer-4576211411795968 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:43 +01:00
Michael Niedermayer	e849eb2343	avformat/matroskadec: Check timescale Fixes: 3.82046e+18 is outside the range of representable values of type 'unsigned int' Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_WEBM_DASH_MANIFEST_fuzzer-6381436594421760 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:42 +01:00
Michael Niedermayer	61dca9e150	avformat/wavdec: satuarte next_tag_ofs, data_end Fixes: signed integer overflow: 5053074104798691550 + 5053074104259715104 cannot be represented in type 'long' Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_WAV_fuzzer-6515315309936640 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:41 +01:00
Michael Niedermayer	75317ec442	avformat/wavdec: sanity check channels and bps before using them for block_align Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_W64_fuzzer-4704044498944000 Fixes: signed integer overflow: 520464 * 8224 cannot be represented in type 'int' Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:40 +01:00
Michael Niedermayer	0bed22d597	avformat/sbgdec: Check for negative duration Fixes: signed integer overflow: 9223372036854775807 - -8000000 cannot be represented in type 'long' Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_SBG_fuzzer-5133181743136768 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:40 +01:00
Michael Niedermayer	878625812f	avformat/rpl: Use 64bit for total_audio_size and check it Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_RPL_fuzzer-4677434693517312 Fixes: signed integer overflow: 5555555555555555556 * 8 cannot be represented in type 'long long' Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:39 +01:00
Michael Niedermayer	3d8d778a68	avformat/timecode: use 64bit for intermediate for rounding in fps_from_frame_rate() Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-4802790784303104 Fixes: signed integer overflow: 1768972133 + 968491058 cannot be represented in type 'int' Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:38 +01:00
Michael Niedermayer	f01a89c5a3	avformat/mov: use 64bit for intermediate for rounding Fixes: signed integer overflow: 1768972133 + 968491058 cannot be represented in type 'int' Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-4802790784303104 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:37 +01:00
Michael Niedermayer	746203af31	avformat/jacosubdec: Use 64bit for abs Fixes: negation of -2147483648 cannot be represented in type 'int'; cast to an unsigned type to negate this value to itself Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_JACOSUB_fuzzer-5401294942371840 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:36 +01:00
Michael Niedermayer	007486058c	avformat/concatdec: Check user_duration sum Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_CONCAT_fuzzer-6434245599690752 Fixes: signed integer overflow: 9223372026773000000 + 22337000000 cannot be represented in type 'long' Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:35 +01:00
Michael Niedermayer	1eb8cbd09c	avcodec/wavarc: avoid signed integer overflow in AC code Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVARC_fuzzer-659847401740697 Fixes: signed integer overflow: 65312 * 34078 cannot be represented in type 'int' Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	6009dd07bd	avcodec/wavarc: Avoid signed integer overflow in sample Fixes: signed integer overflow: -2147483648 + -25122315 cannot be represented in type 'int' Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVARC_fuzzer-6199806972198912 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	ebdcf98499	avcodec/truemotion1: Height not being a multiple of 4 is unsupported mb_change_bits is given space based on height >> 2, while more data is read Fixes: out of array access Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TRUEMOTION1_fuzzer-5201925062590464.fuzz Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	d188a86730	avcodec/rtv1: fix undefined FFALIGN Fixes: signed integer overflow: 2147483647 + 4 cannot be represented in type 'int' Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_RTV1_fuzzer-6324303861514240 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	7eabe56436	avcodec/qoadec: Fix undefined overflow in lms_predict Fixes: signed integer overflow: -1575944192 + -602931200 cannot be represented in type 'int' Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_QOA_fuzzer-6470469339185152 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	48eeb198a5	avcodec/hcadec: do not allow code to continue after failed init Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488 Fixes: out of array write Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	addb85ea39	avcodec/hcadec: do not set hfr_group_count to invalid values Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488 Fixes: out of array write Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	0a114d7318	avformat/mov: Do not deallocate heif_item in a input dependant way Fixes: out of array access Fixes: 67070/clusterfuzz-testcase-minimized-ffmpeg_IO_DEMUXER_fuzzer-5685384082161664 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Marth64	9df1182065	avformat/dvdvideodec: add explicit inttypes.h include Since log statements printing int64 were made portable in `4464b7eeb1`, let us include inttypes.h explicitly (as it is unclear where PRId64 and such are coming from now). Reported-by: Leo Izen <leo.izen@gmail.com> Signed-off-by: Marth64 <marth64@proxyid.net>	2024-03-26 11:40:12 -04:00
James Almer	1e7ba76562	avformat/mov: free HEIFItem.name when cleaning items in mov_read_trak Fixes memleaks. Signed-off-by: James Almer <jamrial@gmail.com>	2024-03-26 10:43:45 -03:00
Dai, Jianhui J	61afe4d98c	avcodec/cbs_vp8: Improve the bitstream position check The VP8 compressed header may not be byte-aligned due to boolean coding. Round up byte count for accurate data positioning. Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2024-03-26 09:05:04 -04:00
Dai, Jianhui J	63dea3c1e1	avcodec/cbs_vp8: Use little endian in fixed() This commit adds value range checks to cbs_vp8_read_unsigned_le, migrates fixed() to use it, and enforces little-endian consistency for all read methods. Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2024-03-26 09:04:44 -04:00
Wenbin Chen	ea2e0e92ed	doc: Add libtoch backend option to dnn_processing Signed-off-by: Wenbin Chen <wenbin.chen@intel.com> Reviewed-by: Guo Yejun <yejun.guo@intel.com>	2024-03-26 19:17:51 +08:00
Martin Storsjö	f872b19714	aarch64: hevc: Produce plain neon versions of qpel_bi_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which we store on the stack. AWS Graviton 3: put_hevc_qpel_bi_hv4_8_c: 385.7 put_hevc_qpel_bi_hv4_8_neon: 131.0 put_hevc_qpel_bi_hv4_8_i8mm: 92.2 put_hevc_qpel_bi_hv6_8_c: 701.0 put_hevc_qpel_bi_hv6_8_neon: 239.5 put_hevc_qpel_bi_hv6_8_i8mm: 191.0 put_hevc_qpel_bi_hv8_8_c: 1162.0 put_hevc_qpel_bi_hv8_8_neon: 228.0 put_hevc_qpel_bi_hv8_8_i8mm: 225.2 put_hevc_qpel_bi_hv12_8_c: 2305.0 put_hevc_qpel_bi_hv12_8_neon: 558.0 put_hevc_qpel_bi_hv12_8_i8mm: 483.2 put_hevc_qpel_bi_hv16_8_c: 3965.2 put_hevc_qpel_bi_hv16_8_neon: 732.7 put_hevc_qpel_bi_hv16_8_i8mm: 656.5 put_hevc_qpel_bi_hv24_8_c: 8709.7 put_hevc_qpel_bi_hv24_8_neon: 1555.2 put_hevc_qpel_bi_hv24_8_i8mm: 1448.7 put_hevc_qpel_bi_hv32_8_c: 14818.0 put_hevc_qpel_bi_hv32_8_neon: 2763.7 put_hevc_qpel_bi_hv32_8_i8mm: 2468.0 put_hevc_qpel_bi_hv48_8_c: 32855.5 put_hevc_qpel_bi_hv48_8_neon: 6107.2 put_hevc_qpel_bi_hv48_8_i8mm: 5452.7 put_hevc_qpel_bi_hv64_8_c: 57591.5 put_hevc_qpel_bi_hv64_8_neon: 10660.2 put_hevc_qpel_bi_hv64_8_i8mm: 9580.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	d21b9a0411	aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. AWS Graviton 3: put_hevc_qpel_uni_w_hv4_8_c: 422.2 put_hevc_qpel_uni_w_hv4_8_neon: 140.7 put_hevc_qpel_uni_w_hv4_8_i8mm: 100.7 put_hevc_qpel_uni_w_hv8_8_c: 1208.0 put_hevc_qpel_uni_w_hv8_8_neon: 268.2 put_hevc_qpel_uni_w_hv8_8_i8mm: 261.5 put_hevc_qpel_uni_w_hv16_8_c: 4297.2 put_hevc_qpel_uni_w_hv16_8_neon: 802.2 put_hevc_qpel_uni_w_hv16_8_i8mm: 731.2 put_hevc_qpel_uni_w_hv32_8_c: 15518.5 put_hevc_qpel_uni_w_hv32_8_neon: 3085.2 put_hevc_qpel_uni_w_hv32_8_i8mm: 2783.2 put_hevc_qpel_uni_w_hv64_8_c: 57254.5 put_hevc_qpel_uni_w_hv64_8_neon: 11787.5 put_hevc_qpel_uni_w_hv64_8_i8mm: 10659.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	5ab138673b	aarch64: hevc: Produce plain neon versions of qpel_uni_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which we store on the stack. AWS Graviton 3: put_hevc_qpel_uni_hv4_8_c: 384.2 put_hevc_qpel_uni_hv4_8_neon: 127.5 put_hevc_qpel_uni_hv4_8_i8mm: 85.5 put_hevc_qpel_uni_hv6_8_c: 705.5 put_hevc_qpel_uni_hv6_8_neon: 224.5 put_hevc_qpel_uni_hv6_8_i8mm: 176.2 put_hevc_qpel_uni_hv8_8_c: 1136.5 put_hevc_qpel_uni_hv8_8_neon: 216.5 put_hevc_qpel_uni_hv8_8_i8mm: 214.0 put_hevc_qpel_uni_hv12_8_c: 2259.5 put_hevc_qpel_uni_hv12_8_neon: 498.5 put_hevc_qpel_uni_hv12_8_i8mm: 410.7 put_hevc_qpel_uni_hv16_8_c: 3824.7 put_hevc_qpel_uni_hv16_8_neon: 670.0 put_hevc_qpel_uni_hv16_8_i8mm: 603.7 put_hevc_qpel_uni_hv24_8_c: 8113.5 put_hevc_qpel_uni_hv24_8_neon: 1474.7 put_hevc_qpel_uni_hv24_8_i8mm: 1351.5 put_hevc_qpel_uni_hv32_8_c: 14744.5 put_hevc_qpel_uni_hv32_8_neon: 2599.7 put_hevc_qpel_uni_hv32_8_i8mm: 2266.0 put_hevc_qpel_uni_hv48_8_c: 32800.0 put_hevc_qpel_uni_hv48_8_neon: 5650.0 put_hevc_qpel_uni_hv48_8_i8mm: 5011.7 put_hevc_qpel_uni_hv64_8_c: 57856.2 put_hevc_qpel_uni_hv64_8_neon: 9863.5 put_hevc_qpel_uni_hv64_8_i8mm: 8767.7 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	5cbeefc79e	aarch64: hevc: Produce plain neon versions of qpel_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which we store on the stack. AWS Graviton 3: put_hevc_qpel_hv4_8_c: 386.0 put_hevc_qpel_hv4_8_neon: 125.7 put_hevc_qpel_hv4_8_i8mm: 83.2 put_hevc_qpel_hv6_8_c: 749.0 put_hevc_qpel_hv6_8_neon: 207.0 put_hevc_qpel_hv6_8_i8mm: 166.0 put_hevc_qpel_hv8_8_c: 1305.2 put_hevc_qpel_hv8_8_neon: 216.5 put_hevc_qpel_hv8_8_i8mm: 213.0 put_hevc_qpel_hv12_8_c: 2570.5 put_hevc_qpel_hv12_8_neon: 480.0 put_hevc_qpel_hv12_8_i8mm: 398.2 put_hevc_qpel_hv16_8_c: 4158.7 put_hevc_qpel_hv16_8_neon: 659.7 put_hevc_qpel_hv16_8_i8mm: 593.5 put_hevc_qpel_hv24_8_c: 8626.7 put_hevc_qpel_hv24_8_neon: 1653.5 put_hevc_qpel_hv24_8_i8mm: 1398.7 put_hevc_qpel_hv32_8_c: 14646.0 put_hevc_qpel_hv32_8_neon: 2566.2 put_hevc_qpel_hv32_8_i8mm: 2287.5 put_hevc_qpel_hv48_8_c: 31072.5 put_hevc_qpel_hv48_8_neon: 6228.5 put_hevc_qpel_hv48_8_i8mm: 5291.0 put_hevc_qpel_hv64_8_c: 53847.2 put_hevc_qpel_hv64_8_neon: 9856.7 put_hevc_qpel_hv64_8_i8mm: 8831.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	20c38f4b8d	aarch64: hevc: Reorder qpel_hv functions to prepare for templating This is a pure reordering of code without changing anything in the individual functions. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:50 +02:00
Martin Storsjö	4f71e4ebf2	aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions The hv32 and hv64 functions were identical - both loop and process 16 pixels at a time. The hv16 function was near identical, except for the outer loop (and using sp instead of a separate register). Given the size of these functions, the extra cost of the outer loop is negligible, so use the same function for hv16 as well. This removes over 200 lines of duplicated assembly, and over 4 KB of binary size. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:40 +02:00
Martin Storsjö	4063e50eec	aarch64: hevc: Split the qpel_*_hv functions into two parts The first horizontal filter can use either i8mm or plain neon versions, while the second part is a pure neon implementation. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:29 +02:00
Martin Storsjö	ad01d06f91	aarch64: hevc: Implement a neon version of hevc_qpel_uni_w_h*_8 AWS Graviton 3: put_hevc_qpel_uni_w_h4_8_c: 159.0 put_hevc_qpel_uni_w_h4_8_neon: 64.2 put_hevc_qpel_uni_w_h4_8_i8mm: 40.0 put_hevc_qpel_uni_w_h6_8_c: 344.7 put_hevc_qpel_uni_w_h6_8_neon: 114.5 put_hevc_qpel_uni_w_h6_8_i8mm: 82.0 put_hevc_qpel_uni_w_h8_8_c: 596.2 put_hevc_qpel_uni_w_h8_8_neon: 132.2 put_hevc_qpel_uni_w_h8_8_i8mm: 106.0 put_hevc_qpel_uni_w_h12_8_c: 1325.0 put_hevc_qpel_uni_w_h12_8_neon: 299.0 put_hevc_qpel_uni_w_h12_8_i8mm: 211.5 put_hevc_qpel_uni_w_h16_8_c: 2300.0 put_hevc_qpel_uni_w_h16_8_neon: 422.0 put_hevc_qpel_uni_w_h16_8_i8mm: 286.2 put_hevc_qpel_uni_w_h24_8_c: 5059.0 put_hevc_qpel_uni_w_h24_8_neon: 912.2 put_hevc_qpel_uni_w_h24_8_i8mm: 664.2 put_hevc_qpel_uni_w_h32_8_c: 9198.2 put_hevc_qpel_uni_w_h32_8_neon: 1638.2 put_hevc_qpel_uni_w_h32_8_i8mm: 1033.7 put_hevc_qpel_uni_w_h48_8_c: 20754.7 put_hevc_qpel_uni_w_h48_8_neon: 3633.7 put_hevc_qpel_uni_w_h48_8_i8mm: 2300.7 put_hevc_qpel_uni_w_h64_8_c: 36854.7 put_hevc_qpel_uni_w_h64_8_neon: 6435.7 put_hevc_qpel_uni_w_h64_8_i8mm: 4039.2 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:03:18 +02:00
Martin Storsjö	de23b384fd	aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm In addition to just templating, this contains one change to ff_hevc_put_hevc_epel_bi_hv32_8, by setting the w6 register which ff_hevc_put_hevc_epel_h32_8_neon requires. AWS Graviton 3: put_hevc_epel_bi_hv4_8_c: 176.5 put_hevc_epel_bi_hv4_8_neon: 62.0 put_hevc_epel_bi_hv4_8_i8mm: 58.0 put_hevc_epel_bi_hv6_8_c: 343.7 put_hevc_epel_bi_hv6_8_neon: 109.7 put_hevc_epel_bi_hv6_8_i8mm: 105.7 put_hevc_epel_bi_hv8_8_c: 536.0 put_hevc_epel_bi_hv8_8_neon: 112.7 put_hevc_epel_bi_hv8_8_i8mm: 111.7 put_hevc_epel_bi_hv12_8_c: 1107.7 put_hevc_epel_bi_hv12_8_neon: 254.7 put_hevc_epel_bi_hv12_8_i8mm: 239.0 put_hevc_epel_bi_hv16_8_c: 1927.7 put_hevc_epel_bi_hv16_8_neon: 356.2 put_hevc_epel_bi_hv16_8_i8mm: 334.2 put_hevc_epel_bi_hv24_8_c: 4195.2 put_hevc_epel_bi_hv24_8_neon: 736.7 put_hevc_epel_bi_hv24_8_i8mm: 715.5 put_hevc_epel_bi_hv32_8_c: 7280.5 put_hevc_epel_bi_hv32_8_neon: 1287.7 put_hevc_epel_bi_hv32_8_i8mm: 1162.2 put_hevc_epel_bi_hv48_8_c: 16857.7 put_hevc_epel_bi_hv48_8_neon: 2836.2 put_hevc_epel_bi_hv48_8_i8mm: 2908.5 put_hevc_epel_bi_hv64_8_c: 29248.2 put_hevc_epel_bi_hv64_8_neon: 5051.7 put_hevc_epel_bi_hv64_8_i8mm: 4491.5 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:03:16 +02:00
Martin Storsjö	96e5adda9f	aarch64: hevc: Produce epel_uni_w_hv functions for both neon and i8mm AWS Graviton 3: put_hevc_epel_uni_w_hv4_8_c: 191.2 put_hevc_epel_uni_w_hv4_8_neon: 87.7 put_hevc_epel_uni_w_hv4_8_i8mm: 83.2 put_hevc_epel_uni_w_hv6_8_c: 349.5 put_hevc_epel_uni_w_hv6_8_neon: 153.0 put_hevc_epel_uni_w_hv6_8_i8mm: 148.5 put_hevc_epel_uni_w_hv8_8_c: 581.2 put_hevc_epel_uni_w_hv8_8_neon: 166.7 put_hevc_epel_uni_w_hv8_8_i8mm: 163.5 put_hevc_epel_uni_w_hv12_8_c: 1230.0 put_hevc_epel_uni_w_hv12_8_neon: 387.7 put_hevc_epel_uni_w_hv12_8_i8mm: 370.2 put_hevc_epel_uni_w_hv16_8_c: 2003.2 put_hevc_epel_uni_w_hv16_8_neon: 501.5 put_hevc_epel_uni_w_hv16_8_i8mm: 490.2 put_hevc_epel_uni_w_hv24_8_c: 4448.7 put_hevc_epel_uni_w_hv24_8_neon: 1092.2 put_hevc_epel_uni_w_hv24_8_i8mm: 1069.7 put_hevc_epel_uni_w_hv32_8_c: 7817.2 put_hevc_epel_uni_w_hv32_8_neon: 1916.2 put_hevc_epel_uni_w_hv32_8_i8mm: 1829.5 put_hevc_epel_uni_w_hv48_8_c: 16728.2 put_hevc_epel_uni_w_hv48_8_neon: 4263.7 put_hevc_epel_uni_w_hv48_8_i8mm: 4342.7 put_hevc_epel_uni_w_hv64_8_c: 29563.2 put_hevc_epel_uni_w_hv64_8_neon: 7474.2 put_hevc_epel_uni_w_hv64_8_i8mm: 7128.5 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:59:58 +02:00
Martin Storsjö	d7294199ab	aarch64: hevc: Produce epel_uni_hv functions for both neon and i8mm AWS Graviton 3: put_hevc_epel_uni_hv4_8_c: 163.5 put_hevc_epel_uni_hv4_8_neon: 59.7 put_hevc_epel_uni_hv4_8_i8mm: 57.5 put_hevc_epel_uni_hv6_8_c: 344.7 put_hevc_epel_uni_hv6_8_neon: 105.0 put_hevc_epel_uni_hv6_8_i8mm: 102.7 put_hevc_epel_uni_hv8_8_c: 552.2 put_hevc_epel_uni_hv8_8_neon: 111.2 put_hevc_epel_uni_hv8_8_i8mm: 104.0 put_hevc_epel_uni_hv12_8_c: 1195.0 put_hevc_epel_uni_hv12_8_neon: 248.7 put_hevc_epel_uni_hv12_8_i8mm: 229.5 put_hevc_epel_uni_hv16_8_c: 1910.2 put_hevc_epel_uni_hv16_8_neon: 339.5 put_hevc_epel_uni_hv16_8_i8mm: 323.2 put_hevc_epel_uni_hv24_8_c: 4048.2 put_hevc_epel_uni_hv24_8_neon: 737.7 put_hevc_epel_uni_hv24_8_i8mm: 713.7 put_hevc_epel_uni_hv32_8_c: 6865.7 put_hevc_epel_uni_hv32_8_neon: 1285.0 put_hevc_epel_uni_hv32_8_i8mm: 1206.0 put_hevc_epel_uni_hv48_8_c: 15830.5 put_hevc_epel_uni_hv48_8_neon: 2844.7 put_hevc_epel_uni_hv48_8_i8mm: 2914.0 put_hevc_epel_uni_hv64_8_c: 27912.7 put_hevc_epel_uni_hv64_8_neon: 4970.5 put_hevc_epel_uni_hv64_8_i8mm: 4653.7 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:59:28 +02:00
Martin Storsjö	7bf3d14769	aarch64: hevc: Produce epel_hv functions for both plain neon and i8mm AWS Graviton 3: put_hevc_epel_hv4_8_c: 163.7 put_hevc_epel_hv4_8_neon: 52.5 put_hevc_epel_hv4_8_i8mm: 49.5 put_hevc_epel_hv6_8_c: 292.2 put_hevc_epel_hv6_8_neon: 97.7 put_hevc_epel_hv6_8_i8mm: 101.2 put_hevc_epel_hv8_8_c: 471.0 put_hevc_epel_hv8_8_neon: 106.7 put_hevc_epel_hv8_8_i8mm: 102.5 put_hevc_epel_hv12_8_c: 1030.2 put_hevc_epel_hv12_8_neon: 240.5 put_hevc_epel_hv12_8_i8mm: 215.0 put_hevc_epel_hv16_8_c: 1711.5 put_hevc_epel_hv16_8_neon: 340.2 put_hevc_epel_hv16_8_i8mm: 319.2 put_hevc_epel_hv24_8_c: 3670.0 put_hevc_epel_hv24_8_neon: 702.0 put_hevc_epel_hv24_8_i8mm: 666.5 put_hevc_epel_hv32_8_c: 6785.5 put_hevc_epel_hv32_8_neon: 1247.0 put_hevc_epel_hv32_8_i8mm: 1169.0 put_hevc_epel_hv48_8_c: 14689.7 put_hevc_epel_hv48_8_neon: 2665.2 put_hevc_epel_hv48_8_i8mm: 2740.0 put_hevc_epel_hv64_8_c: 25899.2 put_hevc_epel_hv64_8_neon: 4801.2 put_hevc_epel_hv64_8_i8mm: 4487.7 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:59:19 +02:00
Martin Storsjö	5b5666e5ab	aarch64: hevc: Reorder epel_hv functions to prepare for templating This is a pure reordering of code without changing anything in the individual functions. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:59:07 +02:00
Martin Storsjö	e6d4c0e117	aarch64: hevc: Split the epel_*_hv functions into two parts The first horizontal filter can use either i8mm or plain neon versions, while the second part is a pure neon implementation. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:59:00 +02:00
Martin Storsjö	54af555bfa	aarch64: hevc: Implement a neon version of hevc_epel_uni_w_h*_8 AWS Graviton 3: put_hevc_epel_uni_w_h4_8_c: 97.2 put_hevc_epel_uni_w_h4_8_neon: 41.2 put_hevc_epel_uni_w_h4_8_i8mm: 35.2 put_hevc_epel_uni_w_h6_8_c: 203.7 put_hevc_epel_uni_w_h6_8_neon: 84.7 put_hevc_epel_uni_w_h6_8_i8mm: 74.7 put_hevc_epel_uni_w_h8_8_c: 345.7 put_hevc_epel_uni_w_h8_8_neon: 94.0 put_hevc_epel_uni_w_h8_8_i8mm: 80.7 put_hevc_epel_uni_w_h12_8_c: 768.7 put_hevc_epel_uni_w_h12_8_neon: 196.7 put_hevc_epel_uni_w_h12_8_i8mm: 169.7 put_hevc_epel_uni_w_h16_8_c: 1313.0 put_hevc_epel_uni_w_h16_8_neon: 290.7 put_hevc_epel_uni_w_h16_8_i8mm: 238.0 put_hevc_epel_uni_w_h24_8_c: 2877.5 put_hevc_epel_uni_w_h24_8_neon: 650.0 put_hevc_epel_uni_w_h24_8_i8mm: 512.0 put_hevc_epel_uni_w_h32_8_c: 5113.5 put_hevc_epel_uni_w_h32_8_neon: 1129.5 put_hevc_epel_uni_w_h32_8_i8mm: 739.2 put_hevc_epel_uni_w_h48_8_c: 11757.0 put_hevc_epel_uni_w_h48_8_neon: 2518.7 put_hevc_epel_uni_w_h48_8_i8mm: 1688.5 put_hevc_epel_uni_w_h64_8_c: 20478.0 put_hevc_epel_uni_w_h64_8_neon: 4411.7 put_hevc_epel_uni_w_h64_8_i8mm: 2884.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:58:47 +02:00
Martin Storsjö	6d384298ec	aarch64: hevc: Implement a neon version of put_hevc_epel_h*_8 AWS Graviton 3: put_hevc_epel_h4_8_c: 64.7 put_hevc_epel_h4_8_neon: 25.0 put_hevc_epel_h4_8_i8mm: 21.2 put_hevc_epel_h6_8_c: 130.0 put_hevc_epel_h6_8_neon: 40.7 put_hevc_epel_h6_8_i8mm: 36.5 put_hevc_epel_h8_8_c: 209.0 put_hevc_epel_h8_8_neon: 45.2 put_hevc_epel_h8_8_i8mm: 41.2 put_hevc_epel_h12_8_c: 465.5 put_hevc_epel_h12_8_neon: 104.5 put_hevc_epel_h12_8_i8mm: 86.5 put_hevc_epel_h16_8_c: 830.7 put_hevc_epel_h16_8_neon: 134.2 put_hevc_epel_h16_8_i8mm: 114.0 put_hevc_epel_h24_8_c: 1844.7 put_hevc_epel_h24_8_neon: 282.2 put_hevc_epel_h24_8_i8mm: 277.2 put_hevc_epel_h32_8_c: 3227.5 put_hevc_epel_h32_8_neon: 501.5 put_hevc_epel_h32_8_i8mm: 396.0 put_hevc_epel_h48_8_c: 7229.2 put_hevc_epel_h48_8_neon: 1120.2 put_hevc_epel_h48_8_i8mm: 901.2 put_hevc_epel_h64_8_c: 12869.0 put_hevc_epel_h64_8_neon: 1999.2 put_hevc_epel_h64_8_i8mm: 1610.5 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:58:29 +02:00
Martin Storsjö	8f03c30a17	aarch64: hevc: Use ld1r instead of ldr+dup in hevc_qpel_uni_w_h Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:58:20 +02:00
Martin Storsjö	717cc82d28	aarch64: hevc: Specialize put_hevc_\type\()_h*_8_neon for horizontal looping For widths of 32 pixels and more, loop first horizontally, then vertically. Previously, this function would process a 16 pixel wide slice of the block, looping vertically. After processing the whole height, it would backtrack and process the next 16 pixel wide slice. When doing 8tap filtering horizontally, the function must load 7 more pixels (in practice, 8) following the actual inputs, and this was done for each slice. By iterating first horizontally throughout each line, then vertically, we access data in a more cache friendly order, and we don't need to reload data unnecessarily. Keep the original order in put_hevc_\type\()_h12_8_neon; the only suboptimal case there is for width=24. But specializing an optimal variant for that would require more code, which might not be worth it. For the h16 case, this implementation would give a slowdown, as it now loads the first 8 pixels separately from the rest, but for larger widths, it is a gain. Therefore, keep the h16 case as it was (but remove the outer loop), and create a new specialized version for horizontal looping with 16 pixels at a time. Before: Cortex A53 A72 A73 Graviton 3 put_hevc_qpel_h16_8_neon: 710.5 667.7 692.5 211.0 put_hevc_qpel_h32_8_neon: 2791.5 2643.5 2732.0 883.5 put_hevc_qpel_h64_8_neon: 10954.0 10657.0 10874.2 3241.5 After: put_hevc_qpel_h16_8_neon: 697.5 663.5 705.7 212.5 put_hevc_qpel_h32_8_neon: 2767.2 2684.5 2791.2 920.5 put_hevc_qpel_h64_8_neon: 10559.2 10471.5 10932.2 3051.7 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:58:11 +02:00
Martin Storsjö	e3a54cabde	aarch64: hevc: Merge consecutive stores in put_hevc_\type\()_h16_8_neon This gets rid of a couple instructions, but the actual performance is almost identical on Cortex A72/A73. On Cortex A53, it is a handful of cycles faster. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:58:01 +02:00
Martin Storsjö	78db8405c0	aarch64: hevc: Don't iterate with sp in ff_hevc_put_hevc_qpel_uni_w_hv32/64_8_neon_i8mm Many of the routines within hevcdsp_epel_neon and hevcdsp_qpel_neon store temporary buffers on the stack. When consuming it, many of these functions use the stack pointer as incremental pointer for reading the data (instead of storing it in another register), which is rather unusual. Technically, this is fine as long as the pointer remains properly aligned. However in the case of ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm, after incrementing sp when reading data (within each 16 pixel wide stripe) it would then reset the stack pointer back to a lower value, for reading the next 16 pixel wide stripe, expecting the data to remain untouched. This can't be assumed; data on the stack below the stack pointer can be clobbered (e.g. by a signal handler). Some OS ABIs allow for a little margin that won't be touched, aka a red zone, but not all do. The ones that do, guarantee 16 or 128 bytes, not 9 KB. Convert this function to use a separate pointer register to iterate through the data, retaining the stack pointer to point at the bottom of the data we require to remain untouched. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:57:55 +02:00
Martin Storsjö	e66858fbab	aarch64: hevc: Reorder a misplaced function init line Group the epel and qpel functions together. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:57:50 +02:00

1 2 3 4 5 ...

114467 Commits All Branches Search

114467 Commits

All Branches