ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2024-12-19 05:55:07 +00:00

Author	SHA1	Message	Date
Andreas Rheinhardt	432e287e27	fftools/ffmpeg_sched: Explicitly return 0 on sch_enc_send() success Do not return the return value of the last enc_send_to_dst() call, as this would treat the last call differently from the earlier calls; furthermore, sch_enc_send() explicitly documents to always return 0 on success. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-28 03:06:13 +01:00
Niklas Haas	b89ee26539	avfilter: properly reduce YUV colorspace format lists Doing this with REDUCE_FORMATS() instead of swap_color_() is not only shorter, but more importantly comes with the benefit of being done inside a loop, allowing us to correctly propagate complex graphs involving multiple conversion filters (e.g. -vf scale,zscale). The latter family of swapping functions is only used to settle the best remaining* entry if no exact match was found, and as such was never the correct solution to YUV colorspaces, which only care about exact matches.	2024-03-27 19:11:27 +01:00
James Almer	189c32f536	avformat/mov: don't abort on duplicate Mastering Display Metadata boxes The VP9 spec defines a SmDm box for this information, and the ISOBMFF spec defines a mdvc one. If both are present, just ignore one of them. This is in line with clli and CoLL boxes. Fixes ticket #10711. Signed-off-by: James Almer <jamrial@gmail.com>	2024-03-27 13:51:28 -03:00
Andreas Rheinhardt	8ca57fcf9e	avutil/fifo, file: Remove unused headers Forgotten in `4105899245`, `4c92fc02f8`. Reviewed-by: Stefano Sabatini <stefasab@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-27 17:07:22 +01:00
Andreas Rheinhardt	9223c92c88	doc/examples: Always use <> includes Reviewed-by: Stefano Sabatini <stefasab@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-27 17:07:06 +01:00
Zhao Zhili	89e9486bc3	avcodec/h264_mp4toannexb: Fix heap buffer overflow Fixes: out of array write Fixes: 64407/clusterfuzz-testcase-minimized-ffmpeg_BSF_H264_MP4TOANNEXB_fuzzer-4966763443650560 mp4toannexb_filter counts the number of bytes needed in the first pass and allocates the memory, then does the memcpy in the second pass. Updating the sps/pps size in the loop makes the count invalid when SPS/PPS occur after an IDR slice. This patch processes in-band SPS/PPS before the two-pass loop. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-03-27 20:04:40 +08:00
Zhao Zhili	edb1f1bc09	tests: Remove fate-libx265-hdr10 The test depends on the compile option of x265. It failed when HIGH_BIT_DEPTH isn't enabled. It also failed when asan is enabled because of memory issue inside of x265, which I don't think can be fixed within FFmpeg. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-03-27 20:04:40 +08:00
Anton Khirnov	8fc1e1358b	fftools/ffmpeg_enc: simplify error handling for decoded_side_data setup There is no need to free the already-added items, they will be freed alongside the codec context. There is also little point in an error message, as the only reason this can fail is malloc failure.	2024-03-27 11:36:21 +01:00
Anton Khirnov	6f2cb0923c	fftools/ffmpeg_enc: move decoded_side_data setup out of video-only block Nothing about this code is video-specific.	2024-03-27 11:36:20 +01:00
Anton Khirnov	fabf148578	fftools/ffmpeg_enc: only promote first frame side data to global when meaningful Skip those side data types that do not make sense as global side data.	2024-03-27 11:35:27 +01:00
Anton Khirnov	2621be3539	lavu/frame: add side data descriptors They allow exporting extended information about side data types.	2024-03-27 11:33:45 +01:00
Michael Niedermayer	6b213175c9	Bump after 7.0 branch point Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-27 01:04:54 +01:00
Michael Niedermayer	e7d938073e	doc/APIchanges: Add 7.0 cut point Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-27 01:04:54 +01:00
Michael Niedermayer	8f6bdfd4ec	Changelog: Add 7.0 point Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-27 01:04:54 +01:00
Michael Niedermayer	872980ace6	Bump prior release/7.0 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-27 01:04:53 +01:00
Matthieu Bouron	87ace5c4da	Changelog: add Android content URIs protocol entry Signed-off-by: Matthieu Bouron <matthieu.bouron@gmail.com>	2024-03-27 00:08:11 +01:00
Michael Niedermayer	4126a99d2b	doc/APIchange: Fill in some missing things Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:42:19 +01:00
Michael Niedermayer	86f73277bf	avformat/westwood_vqa: Fix 2g packets Fixes: signed integer overflow: 2147483424 * 2 cannot be represented in type 'int' Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_WSVQA_fuzzer-4576211411795968 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:43 +01:00
Michael Niedermayer	e849eb2343	avformat/matroskadec: Check timescale Fixes: 3.82046e+18 is outside the range of representable values of type 'unsigned int' Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_WEBM_DASH_MANIFEST_fuzzer-6381436594421760 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:42 +01:00
Michael Niedermayer	61dca9e150	avformat/wavdec: saturate next_tag_ofs, data_end Fixes: signed integer overflow: 5053074104798691550 + 5053074104259715104 cannot be represented in type 'long' Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_WAV_fuzzer-6515315309936640 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:41 +01:00
Michael Niedermayer	75317ec442	avformat/wavdec: sanity check channels and bps before using them for block_align Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_W64_fuzzer-4704044498944000 Fixes: signed integer overflow: 520464 * 8224 cannot be represented in type 'int' Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:40 +01:00
Michael Niedermayer	0bed22d597	avformat/sbgdec: Check for negative duration Fixes: signed integer overflow: 9223372036854775807 - -8000000 cannot be represented in type 'long' Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_SBG_fuzzer-5133181743136768 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:40 +01:00
Michael Niedermayer	878625812f	avformat/rpl: Use 64bit for total_audio_size and check it Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_RPL_fuzzer-4677434693517312 Fixes: signed integer overflow: 5555555555555555556 * 8 cannot be represented in type 'long long' Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:39 +01:00
Michael Niedermayer	3d8d778a68	avformat/timecode: use 64bit for intermediate for rounding in fps_from_frame_rate() Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-4802790784303104 Fixes: signed integer overflow: 1768972133 + 968491058 cannot be represented in type 'int' Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:38 +01:00
Michael Niedermayer	f01a89c5a3	avformat/mov: use 64bit for intermediate for rounding Fixes: signed integer overflow: 1768972133 + 968491058 cannot be represented in type 'int' Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-4802790784303104 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:37 +01:00
Michael Niedermayer	746203af31	avformat/jacosubdec: Use 64bit for abs Fixes: negation of -2147483648 cannot be represented in type 'int'; cast to an unsigned type to negate this value to itself Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_JACOSUB_fuzzer-5401294942371840 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:36 +01:00
Michael Niedermayer	007486058c	avformat/concatdec: Check user_duration sum Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_CONCAT_fuzzer-6434245599690752 Fixes: signed integer overflow: 9223372026773000000 + 22337000000 cannot be represented in type 'long' Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:25:35 +01:00
Michael Niedermayer	1eb8cbd09c	avcodec/wavarc: avoid signed integer overflow in AC code Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVARC_fuzzer-659847401740697 Fixes: signed integer overflow: 65312 * 34078 cannot be represented in type 'int' Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	6009dd07bd	avcodec/wavarc: Avoid signed integer overflow in sample Fixes: signed integer overflow: -2147483648 + -25122315 cannot be represented in type 'int' Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVARC_fuzzer-6199806972198912 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	ebdcf98499	avcodec/truemotion1: Height not being a multiple of 4 is unsupported mb_change_bits is given space based on height >> 2, while more data is read Fixes: out of array access Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TRUEMOTION1_fuzzer-5201925062590464.fuzz Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	d188a86730	avcodec/rtv1: fix undefined FFALIGN Fixes: signed integer overflow: 2147483647 + 4 cannot be represented in type 'int' Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_RTV1_fuzzer-6324303861514240 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	7eabe56436	avcodec/qoadec: Fix undefined overflow in lms_predict Fixes: signed integer overflow: -1575944192 + -602931200 cannot be represented in type 'int' Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_QOA_fuzzer-6470469339185152 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	48eeb198a5	avcodec/hcadec: do not allow code to continue after failed init Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488 Fixes: out of array write Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	addb85ea39	avcodec/hcadec: do not set hfr_group_count to invalid values Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HCA_fuzzer-6247136417087488 Fixes: out of array write Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Michael Niedermayer	0a114d7318	avformat/mov: Do not deallocate heif_item in an input-dependent way Fixes: out of array access Fixes: 67070/clusterfuzz-testcase-minimized-ffmpeg_IO_DEMUXER_fuzzer-5685384082161664 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-26 23:19:49 +01:00
Marth64	9df1182065	avformat/dvdvideodec: add explicit inttypes.h include Since log statements printing int64 were made portable in `4464b7eeb1`, let us include inttypes.h explicitly (as it is unclear where PRId64 and such are coming from now). Reported-by: Leo Izen <leo.izen@gmail.com> Signed-off-by: Marth64 <marth64@proxyid.net>	2024-03-26 11:40:12 -04:00
James Almer	1e7ba76562	avformat/mov: free HEIFItem.name when cleaning items in mov_read_trak Fixes memleaks. Signed-off-by: James Almer <jamrial@gmail.com>	2024-03-26 10:43:45 -03:00
Dai, Jianhui J	61afe4d98c	avcodec/cbs_vp8: Improve the bitstream position check The VP8 compressed header may not be byte-aligned due to boolean coding. Round up byte count for accurate data positioning. Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2024-03-26 09:05:04 -04:00
Dai, Jianhui J	63dea3c1e1	avcodec/cbs_vp8: Use little endian in fixed() This commit adds value range checks to cbs_vp8_read_unsigned_le, migrates fixed() to use it, and enforces little-endian consistency for all read methods. Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2024-03-26 09:04:44 -04:00
Wenbin Chen	ea2e0e92ed	doc: Add libtorch backend option to dnn_processing Signed-off-by: Wenbin Chen <wenbin.chen@intel.com> Reviewed-by: Guo Yejun <yejun.guo@intel.com>	2024-03-26 19:17:51 +08:00
Martin Storsjö	f872b19714	aarch64: hevc: Produce plain neon versions of qpel_bi_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which we store on the stack. AWS Graviton 3: put_hevc_qpel_bi_hv4_8_c: 385.7 put_hevc_qpel_bi_hv4_8_neon: 131.0 put_hevc_qpel_bi_hv4_8_i8mm: 92.2 put_hevc_qpel_bi_hv6_8_c: 701.0 put_hevc_qpel_bi_hv6_8_neon: 239.5 put_hevc_qpel_bi_hv6_8_i8mm: 191.0 put_hevc_qpel_bi_hv8_8_c: 1162.0 put_hevc_qpel_bi_hv8_8_neon: 228.0 put_hevc_qpel_bi_hv8_8_i8mm: 225.2 put_hevc_qpel_bi_hv12_8_c: 2305.0 put_hevc_qpel_bi_hv12_8_neon: 558.0 put_hevc_qpel_bi_hv12_8_i8mm: 483.2 put_hevc_qpel_bi_hv16_8_c: 3965.2 put_hevc_qpel_bi_hv16_8_neon: 732.7 put_hevc_qpel_bi_hv16_8_i8mm: 656.5 put_hevc_qpel_bi_hv24_8_c: 8709.7 put_hevc_qpel_bi_hv24_8_neon: 1555.2 put_hevc_qpel_bi_hv24_8_i8mm: 1448.7 put_hevc_qpel_bi_hv32_8_c: 14818.0 put_hevc_qpel_bi_hv32_8_neon: 2763.7 put_hevc_qpel_bi_hv32_8_i8mm: 2468.0 put_hevc_qpel_bi_hv48_8_c: 32855.5 put_hevc_qpel_bi_hv48_8_neon: 6107.2 put_hevc_qpel_bi_hv48_8_i8mm: 5452.7 put_hevc_qpel_bi_hv64_8_c: 57591.5 put_hevc_qpel_bi_hv64_8_neon: 10660.2 put_hevc_qpel_bi_hv64_8_i8mm: 9580.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	d21b9a0411	aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. AWS Graviton 3: put_hevc_qpel_uni_w_hv4_8_c: 422.2 put_hevc_qpel_uni_w_hv4_8_neon: 140.7 put_hevc_qpel_uni_w_hv4_8_i8mm: 100.7 put_hevc_qpel_uni_w_hv8_8_c: 1208.0 put_hevc_qpel_uni_w_hv8_8_neon: 268.2 put_hevc_qpel_uni_w_hv8_8_i8mm: 261.5 put_hevc_qpel_uni_w_hv16_8_c: 4297.2 put_hevc_qpel_uni_w_hv16_8_neon: 802.2 put_hevc_qpel_uni_w_hv16_8_i8mm: 731.2 put_hevc_qpel_uni_w_hv32_8_c: 15518.5 put_hevc_qpel_uni_w_hv32_8_neon: 3085.2 put_hevc_qpel_uni_w_hv32_8_i8mm: 2783.2 put_hevc_qpel_uni_w_hv64_8_c: 57254.5 put_hevc_qpel_uni_w_hv64_8_neon: 11787.5 put_hevc_qpel_uni_w_hv64_8_i8mm: 10659.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	5ab138673b	aarch64: hevc: Produce plain neon versions of qpel_uni_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which we store on the stack. AWS Graviton 3: put_hevc_qpel_uni_hv4_8_c: 384.2 put_hevc_qpel_uni_hv4_8_neon: 127.5 put_hevc_qpel_uni_hv4_8_i8mm: 85.5 put_hevc_qpel_uni_hv6_8_c: 705.5 put_hevc_qpel_uni_hv6_8_neon: 224.5 put_hevc_qpel_uni_hv6_8_i8mm: 176.2 put_hevc_qpel_uni_hv8_8_c: 1136.5 put_hevc_qpel_uni_hv8_8_neon: 216.5 put_hevc_qpel_uni_hv8_8_i8mm: 214.0 put_hevc_qpel_uni_hv12_8_c: 2259.5 put_hevc_qpel_uni_hv12_8_neon: 498.5 put_hevc_qpel_uni_hv12_8_i8mm: 410.7 put_hevc_qpel_uni_hv16_8_c: 3824.7 put_hevc_qpel_uni_hv16_8_neon: 670.0 put_hevc_qpel_uni_hv16_8_i8mm: 603.7 put_hevc_qpel_uni_hv24_8_c: 8113.5 put_hevc_qpel_uni_hv24_8_neon: 1474.7 put_hevc_qpel_uni_hv24_8_i8mm: 1351.5 put_hevc_qpel_uni_hv32_8_c: 14744.5 put_hevc_qpel_uni_hv32_8_neon: 2599.7 put_hevc_qpel_uni_hv32_8_i8mm: 2266.0 put_hevc_qpel_uni_hv48_8_c: 32800.0 put_hevc_qpel_uni_hv48_8_neon: 5650.0 put_hevc_qpel_uni_hv48_8_i8mm: 5011.7 put_hevc_qpel_uni_hv64_8_c: 57856.2 put_hevc_qpel_uni_hv64_8_neon: 9863.5 put_hevc_qpel_uni_hv64_8_i8mm: 8767.7 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	5cbeefc79e	aarch64: hevc: Produce plain neon versions of qpel_hv As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which we store on the stack. AWS Graviton 3: put_hevc_qpel_hv4_8_c: 386.0 put_hevc_qpel_hv4_8_neon: 125.7 put_hevc_qpel_hv4_8_i8mm: 83.2 put_hevc_qpel_hv6_8_c: 749.0 put_hevc_qpel_hv6_8_neon: 207.0 put_hevc_qpel_hv6_8_i8mm: 166.0 put_hevc_qpel_hv8_8_c: 1305.2 put_hevc_qpel_hv8_8_neon: 216.5 put_hevc_qpel_hv8_8_i8mm: 213.0 put_hevc_qpel_hv12_8_c: 2570.5 put_hevc_qpel_hv12_8_neon: 480.0 put_hevc_qpel_hv12_8_i8mm: 398.2 put_hevc_qpel_hv16_8_c: 4158.7 put_hevc_qpel_hv16_8_neon: 659.7 put_hevc_qpel_hv16_8_i8mm: 593.5 put_hevc_qpel_hv24_8_c: 8626.7 put_hevc_qpel_hv24_8_neon: 1653.5 put_hevc_qpel_hv24_8_i8mm: 1398.7 put_hevc_qpel_hv32_8_c: 14646.0 put_hevc_qpel_hv32_8_neon: 2566.2 put_hevc_qpel_hv32_8_i8mm: 2287.5 put_hevc_qpel_hv48_8_c: 31072.5 put_hevc_qpel_hv48_8_neon: 6228.5 put_hevc_qpel_hv48_8_i8mm: 5291.0 put_hevc_qpel_hv64_8_c: 53847.2 put_hevc_qpel_hv64_8_neon: 9856.7 put_hevc_qpel_hv64_8_i8mm: 8831.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:55 +02:00
Martin Storsjö	20c38f4b8d	aarch64: hevc: Reorder qpel_hv functions to prepare for templating This is a pure reordering of code without changing anything in the individual functions. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:50 +02:00
Martin Storsjö	4f71e4ebf2	aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions The hv32 and hv64 functions were identical - both loop and process 16 pixels at a time. The hv16 function was near identical, except for the outer loop (and using sp instead of a separate register). Given the size of these functions, the extra cost of the outer loop is negligible, so use the same function for hv16 as well. This removes over 200 lines of duplicated assembly, and over 4 KB of binary size. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:40 +02:00
Martin Storsjö	4063e50eec	aarch64: hevc: Split the qpel_*_hv functions into two parts The first horizontal filter can use either i8mm or plain neon versions, while the second part is a pure neon implementation. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:05:29 +02:00
Martin Storsjö	ad01d06f91	aarch64: hevc: Implement a neon version of hevc_qpel_uni_w_h*_8 AWS Graviton 3: put_hevc_qpel_uni_w_h4_8_c: 159.0 put_hevc_qpel_uni_w_h4_8_neon: 64.2 put_hevc_qpel_uni_w_h4_8_i8mm: 40.0 put_hevc_qpel_uni_w_h6_8_c: 344.7 put_hevc_qpel_uni_w_h6_8_neon: 114.5 put_hevc_qpel_uni_w_h6_8_i8mm: 82.0 put_hevc_qpel_uni_w_h8_8_c: 596.2 put_hevc_qpel_uni_w_h8_8_neon: 132.2 put_hevc_qpel_uni_w_h8_8_i8mm: 106.0 put_hevc_qpel_uni_w_h12_8_c: 1325.0 put_hevc_qpel_uni_w_h12_8_neon: 299.0 put_hevc_qpel_uni_w_h12_8_i8mm: 211.5 put_hevc_qpel_uni_w_h16_8_c: 2300.0 put_hevc_qpel_uni_w_h16_8_neon: 422.0 put_hevc_qpel_uni_w_h16_8_i8mm: 286.2 put_hevc_qpel_uni_w_h24_8_c: 5059.0 put_hevc_qpel_uni_w_h24_8_neon: 912.2 put_hevc_qpel_uni_w_h24_8_i8mm: 664.2 put_hevc_qpel_uni_w_h32_8_c: 9198.2 put_hevc_qpel_uni_w_h32_8_neon: 1638.2 put_hevc_qpel_uni_w_h32_8_i8mm: 1033.7 put_hevc_qpel_uni_w_h48_8_c: 20754.7 put_hevc_qpel_uni_w_h48_8_neon: 3633.7 put_hevc_qpel_uni_w_h48_8_i8mm: 2300.7 put_hevc_qpel_uni_w_h64_8_c: 36854.7 put_hevc_qpel_uni_w_h64_8_neon: 6435.7 put_hevc_qpel_uni_w_h64_8_i8mm: 4039.2 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:03:18 +02:00
Martin Storsjö	de23b384fd	aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm In addition to just templating, this contains one change to ff_hevc_put_hevc_epel_bi_hv32_8, by setting the w6 register which ff_hevc_put_hevc_epel_h32_8_neon requires. AWS Graviton 3: put_hevc_epel_bi_hv4_8_c: 176.5 put_hevc_epel_bi_hv4_8_neon: 62.0 put_hevc_epel_bi_hv4_8_i8mm: 58.0 put_hevc_epel_bi_hv6_8_c: 343.7 put_hevc_epel_bi_hv6_8_neon: 109.7 put_hevc_epel_bi_hv6_8_i8mm: 105.7 put_hevc_epel_bi_hv8_8_c: 536.0 put_hevc_epel_bi_hv8_8_neon: 112.7 put_hevc_epel_bi_hv8_8_i8mm: 111.7 put_hevc_epel_bi_hv12_8_c: 1107.7 put_hevc_epel_bi_hv12_8_neon: 254.7 put_hevc_epel_bi_hv12_8_i8mm: 239.0 put_hevc_epel_bi_hv16_8_c: 1927.7 put_hevc_epel_bi_hv16_8_neon: 356.2 put_hevc_epel_bi_hv16_8_i8mm: 334.2 put_hevc_epel_bi_hv24_8_c: 4195.2 put_hevc_epel_bi_hv24_8_neon: 736.7 put_hevc_epel_bi_hv24_8_i8mm: 715.5 put_hevc_epel_bi_hv32_8_c: 7280.5 put_hevc_epel_bi_hv32_8_neon: 1287.7 put_hevc_epel_bi_hv32_8_i8mm: 1162.2 put_hevc_epel_bi_hv48_8_c: 16857.7 put_hevc_epel_bi_hv48_8_neon: 2836.2 put_hevc_epel_bi_hv48_8_i8mm: 2908.5 put_hevc_epel_bi_hv64_8_c: 29248.2 put_hevc_epel_bi_hv64_8_neon: 5051.7 put_hevc_epel_bi_hv64_8_i8mm: 4491.5 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 09:03:16 +02:00
Martin Storsjö	96e5adda9f	aarch64: hevc: Produce epel_uni_w_hv functions for both neon and i8mm AWS Graviton 3: put_hevc_epel_uni_w_hv4_8_c: 191.2 put_hevc_epel_uni_w_hv4_8_neon: 87.7 put_hevc_epel_uni_w_hv4_8_i8mm: 83.2 put_hevc_epel_uni_w_hv6_8_c: 349.5 put_hevc_epel_uni_w_hv6_8_neon: 153.0 put_hevc_epel_uni_w_hv6_8_i8mm: 148.5 put_hevc_epel_uni_w_hv8_8_c: 581.2 put_hevc_epel_uni_w_hv8_8_neon: 166.7 put_hevc_epel_uni_w_hv8_8_i8mm: 163.5 put_hevc_epel_uni_w_hv12_8_c: 1230.0 put_hevc_epel_uni_w_hv12_8_neon: 387.7 put_hevc_epel_uni_w_hv12_8_i8mm: 370.2 put_hevc_epel_uni_w_hv16_8_c: 2003.2 put_hevc_epel_uni_w_hv16_8_neon: 501.5 put_hevc_epel_uni_w_hv16_8_i8mm: 490.2 put_hevc_epel_uni_w_hv24_8_c: 4448.7 put_hevc_epel_uni_w_hv24_8_neon: 1092.2 put_hevc_epel_uni_w_hv24_8_i8mm: 1069.7 put_hevc_epel_uni_w_hv32_8_c: 7817.2 put_hevc_epel_uni_w_hv32_8_neon: 1916.2 put_hevc_epel_uni_w_hv32_8_i8mm: 1829.5 put_hevc_epel_uni_w_hv48_8_c: 16728.2 put_hevc_epel_uni_w_hv48_8_neon: 4263.7 put_hevc_epel_uni_w_hv48_8_i8mm: 4342.7 put_hevc_epel_uni_w_hv64_8_c: 29563.2 put_hevc_epel_uni_w_hv64_8_neon: 7474.2 put_hevc_epel_uni_w_hv64_8_i8mm: 7128.5 Signed-off-by: Martin Storsjö <martin@martin.st>	2024-03-26 08:59:58 +02:00
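
Several of the fuzzing fixes listed above (for example `61dca9e150`, `86f73277bf`, and `746203af31`) address signed integer overflow by saturating a sum, by checking a product before using it, or by widening to 64 bits before negating. The following is a minimal C sketch of those three patterns. It is not the code from the commits, and the helper names `sat_add64`, `checked_mul_int`, and `abs_wide` are hypothetical illustrations, not FFmpeg functions.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: add two int64_t values, clamping the result to the
 * int64_t range instead of triggering undefined signed overflow. */
static int64_t sat_add64(int64_t a, int64_t b)
{
    if (b > 0 && a > INT64_MAX - b)
        return INT64_MAX;
    if (b < 0 && a < INT64_MIN - b)
        return INT64_MIN;
    return a + b;
}

/* Hypothetical helper: multiply two ints using a 64-bit intermediate.
 * Returns 0 and stores the product in *out on success, or -1 if the
 * product does not fit in an int. */
static int checked_mul_int(int a, int b, int *out)
{
    int64_t wide = (int64_t)a * b;

    if (wide > INT_MAX || wide < INT_MIN)
        return -1;
    *out = (int)wide;
    return 0;
}

/* Hypothetical helper: absolute value of an int computed in 64 bits, so
 * that INT_MIN can be negated without overflow. */
static int64_t abs_wide(int v)
{
    int64_t w = v;

    return w < 0 ? -w : w;
}
```

With these definitions, the operands quoted in the commit messages no longer overflow: `sat_add64(5053074104798691550, 5053074104259715104)` clamps to `INT64_MAX`, `checked_mul_int(2147483424, 2, &out)` reports failure, and `abs_wide(INT_MIN)` yields 2147483648.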

114478 Commits