ffmpeg

Author	SHA1	Message	Date
Andreas Rheinhardt	fa06f48371	avfilter/bwdifdsp: Constify Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-09-28 00:17:47 +02:00
Andreas Rheinhardt	80afcc8539	avfilter/bwdif: Add proper BWDIFDSPContext This already avoids unnecessary indirectly included headers in the arch-specific vf_bwdif_init.c files; it is also in preparation for splitting the actual functions out of vf_bwdif.c. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-09-28 00:17:47 +02:00
Paul B Mahol	c5effe7d3d	avfilter/x86/af_afir: add FMA3 SIMD	2023-09-17 11:11:24 +02:00
Evgeny Pavlov	cb1479faca	avfilter/vf_ssim: Fix x86 assembly code for SSIM calculation This commit fixes bug #10495 The code had several bugs related to post-loop compensation code: - test assembly instruction performs bitwise AND operation and generate flags used by jz branch instruction. Wrong test condition leads to incorrect branching - Incorrect compensation code for some branches Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>	2023-08-21 17:04:51 +02:00
James Almer	aca8ceb870	x86/vf_bwdif_init: limit AVX2 functions using 256bit vectors to cpus known to be fast with it Signed-off-by: James Almer <jamrial@gmail.com>	2023-03-25 13:27:20 -03:00
James Darnley	073ec3b9da	avfilter/bwdif: add avx2 filter_line function 8-bit: 2.24x faster (1925±1.3 vs. 859±2.2 decicycles) compared with ssse3 10-bit: 2.00x faster (1703±1.7 vs. 853±2.0 decicycles) compared with ssse3	2023-03-25 02:38:17 +01:00
James Darnley	b503b5a0cf	avfilter/bwdif: move filter_line init to a dedicated function	2023-03-25 02:38:17 +01:00
Lynne	bbe95f7353	x86: replace explicit REP_RETs with RETs From x86inc: > On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either > a branch or a branch target. So switch to a 2-byte form of ret in that case. > We can automatically detect "follows a branch", but not a branch target. > (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.) x86inc can automatically determine whether to use REP_RET rather than REP in most of these cases, so impact is minimal. Additionally, a few REP_RETs were used unnecessary, despite the return being nowhere near a branch. The only CPUs affected were AMD K10s, made between 2007 and 2011, 16 years ago and 12 years ago, respectively. In the future, everyone involved with x86inc should consider dropping REP_RETs altogether.	2023-02-01 04:23:55 +01:00
Wang, Bin	459527108a	libavfilter/x86/vf_convolution: fix sobel swap issue on WIN64 Reviewed by: James Almer <jamrial@gmail.com> Signed-off-by: Wang, Bin <bin.wang@intel.com>	2022-11-21 12:28:25 +08:00
bwang30	3ab11dc5bb	libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI This commit enabled assembly code with intel AVX512 VNNI and added unit test for sobel filter sobel_c: 4537 sobel_avx512icl 2136 Signed-off-by: bwang30 <bin.wang@intel.com> Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2022-11-14 10:04:16 +08:00
Paul B Mahol	00b03331a0	avfilter/vf_threshold: fix handling of zero threshold	2022-10-27 10:23:24 +02:00
Andreas Rheinhardt	ed42a51930	avfilter/x86/vf_bwdif: Remove obsolete MMXEXT functions The only system which benefit from these are truely ancient 32bit x86s as all other systems use at least the SSE2 versions (this includes all x64 cpus (which is why this code is restricted to x86-32)). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:38:14 +02:00
Andreas Rheinhardt	7c3c1d938f	avfilter/x86/vf_idet: Remove obsolete MMX(EXT) functions The only system which benefit from these are truely ancient 32bit x86s as all other systems use at least the SSE2 versions (this includes all x64 cpus (which is why this code is restricted to x86-32)). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:38:01 +02:00
Andreas Rheinhardt	4d7128be9a	avfilter/x86/vf_yadif: Remove obsolete MMXEXT functions The only system which benefit from these are truely ancient 32bit x86s as all other systems use at least the SSE2 versions (this includes all x64 cpus (which is why this code is restricted to x86-32)). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:37:48 +02:00
Andreas Rheinhardt	77b2a422a0	avfilter/x86/vf_eq_init: Remove obsolete MMXEXT function x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2) for x64. So given that the only systems that benefit from process_mmxext are truely ancient 32bit x86s it is removed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:36:31 +02:00
Andreas Rheinhardt	c5dd2fdc09	avfilter/x86/vf_noise: Remove obsolete MMX function x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2) for x64. So given that the only systems that benefit from line_noise_mmx are truely ancient 32bit x86s it is removed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:32:08 +02:00
Andreas Rheinhardt	0df18f29ae	avfilter/af_afir: Only keep DSP stuff in header Only the AudioFIRDSPContext and the functions for its initialization are needed outside of lavfi/af_afir.c. Also rename the header to af_afirdsp.h to reflect the change. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-05-06 05:19:49 +02:00
Paul B Mahol	28d011516b	avfilter/x86/vf_limiter: use movu, dst may not be always aligned Happens with pad filter after limiter.	2022-03-24 09:44:09 +01:00
Marton Balint	5b3732227e	avfilter/x86/vf_blend: use unaligned movs for output Fixes crashes with: ffmpeg -f lavfi -i allyuv=d=1 -vf tblend=difference128,pad=5000:ih:1 -f null x Signed-off-by: Marton Balint <cus@passwd.hu>	2022-03-21 00:50:44 +01:00
Paul B Mahol	dae95b3ffd	avfilter/vf_maskedmerge: fix rounding when masking	2022-03-03 09:57:53 +01:00
Paul B Mahol	047c362d3c	avfilter/vf_nlmeans: add x86 SIMD	2021-11-11 21:54:46 +01:00
James Almer	39f3c98bb1	x86/vf_lut3d: use three operand form for some instructions Fixes compilation with old yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2021-10-14 18:09:38 -03:00
Mark Reid	3ee7250116	avfilter/vf_lut3d: fix building with --disable-optimizations	2021-10-13 18:01:21 +02:00
Mark Reid	716b396740	avfilter/vf_lut3d: add x86-optimized tetrahedral interpolation I spotted an interesting pattern that I didn't see before that leads to the implementation being faster. The bit shifting table I was using before is no longer needed, and was able to remove quite a few lines. I also add use of FMA on the AVX2 version. f32 1920x1080 1 thread with prelut c impl 1434012700 UNITS in lut3d->interp, 1 runs, 0 skips 1434035335 UNITS in lut3d->interp, 2 runs, 0 skips 1423615347 UNITS in lut3d->interp, 4 runs, 0 skips 1426268863 UNITS in lut3d->interp, 8 runs, 0 skips sse2 905484420 UNITS in lut3d->interp, 1 runs, 0 skips 905659010 UNITS in lut3d->interp, 2 runs, 0 skips 915167140 UNITS in lut3d->interp, 4 runs, 0 skips 915834222 UNITS in lut3d->interp, 8 runs, 0 skips avx 574794860 UNITS in lut3d->interp, 1 runs, 0 skips 581035090 UNITS in lut3d->interp, 2 runs, 0 skips 584116720 UNITS in lut3d->interp, 4 runs, 0 skips 581460290 UNITS in lut3d->interp, 8 runs, 0 skips avx2 301698880 UNITS in lut3d->interp, 1 runs, 0 skips 301982880 UNITS in lut3d->interp, 2 runs, 0 skips 306962430 UNITS in lut3d->interp, 4 runs, 0 skips 305472025 UNITS in lut3d->interp, 8 runs, 0 skips gbrap16 1920x1080 1 thread with prelut c impl 1480894840 UNITS in lut3d->interp, 1 runs, 0 skips 1502922990 UNITS in lut3d->interp, 2 runs, 0 skips 1496114307 UNITS in lut3d->interp, 4 runs, 0 skips 1492554551 UNITS in lut3d->interp, 8 runs, 0 skips sse2 980777180 UNITS in lut3d->interp, 1 runs, 0 skips 986121520 UNITS in lut3d->interp, 2 runs, 0 skips 986489840 UNITS in lut3d->interp, 4 runs, 0 skips 998832248 UNITS in lut3d->interp, 8 runs, 0 skips avx 622212360 UNITS in lut3d->interp, 1 runs, 0 skips 622981160 UNITS in lut3d->interp, 2 runs, 0 skips 645396315 UNITS in lut3d->interp, 4 runs, 0 skips 641057075 UNITS in lut3d->interp, 8 runs, 0 skips avx2 321336400 UNITS in lut3d->interp, 1 runs, 0 skips 321268920 UNITS in lut3d->interp, 2 runs, 0 skips 323459895 UNITS in lut3d->interp, 4 runs, 0 skips 324949967 UNITS in lut3d->interp, 8 runs, 0 skips	2021-10-10 22:23:48 +02:00
Wu Jianhua	e26c4d252f	avfilter/x86/vf_blend: unify indentation format Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-10-03 09:15:55 +02:00
Wu Jianhua	7bbad32d5a	libavfilter/x86/vf_gblur: correct the order of loop step The problem was caused by if the width of the processed block minus 1 is a multiple of the aligned number the instruction jle .bscale_scalar would skip the Optimized Loop Step, which will lead to an incorrect sampling when specifying steps more than 1. Move the Optimized Loop Step after .bscale_scalar to ensure the loop step is enabled. Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-09-18 12:38:01 +02:00
Wu Jianhua	fcf10c925d	libavfilter/x86/vf_gblur: fixed the fate-test failed on MacOS Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-09-18 12:37:56 +02:00
Wu Jianhua	4041c1029b	libavfilter/x86/vf_gblur: add localbuf and ff_horiz_slice_avx2/512() We introduced a ff_horiz_slice_avx2/512() implemented on a new algorithm. In a nutshell, the new algorithm does three things, gathering data from 8/16 rows, blurring data, and scattering data back to the image buffer. Here we used a customized transpose 8x8/16x16 to avoid the huge overhead brought by gather and scatter instructions, which is dependent on the temporary buffer called localbuf added newly. Performance data: ff_horiz_slice_avx2(old): 109.89 ff_horiz_slice_avx2(new): 666.67 ff_horiz_slice_avx512: 1000 Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com> Co-authored-by: Jin Jun <jun.i.jin@intel.com> Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-08-29 19:58:33 +02:00
Wu Jianhua	68a2722aee	libavfilter/x86/vf_gblur: add ff_verti_slice_avx2/512() The new vertical slice with AVX2/512 acceleration can significantly improve the performance of Gaussian Filter 2D. Performance data: ff_verti_slice_c: 32.57 ff_verti_slice_avx2: 476.19 ff_verti_slice_avx512: 833.33 Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com> Co-authored-by: Jin Jun <jun.i.jin@intel.com> Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-08-29 19:58:33 +02:00
Wu Jianhua	4a5e24721c	libavfilter/x86/vf_gblur: add ff_postscale_slice_avx512() Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com> Co-authored-by: Jin Jun <jun.i.jin@intel.com> Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-08-29 19:58:33 +02:00
Paul B Mahol	0068b3d0f0	avfilter/avf_showcqt: switch to TX FFT from avutil	2021-07-27 21:16:28 +02:00
Andreas Rheinhardt	4608f7cc6a	Remove unnecessary mem.h inclusions Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-07-22 14:47:57 +02:00
James Almer	1628409b18	x86/vf_gblur: fix reg name in UNIX64 prologue Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-17 15:51:28 -03:00
James Almer	2b4da1cb8c	x86/vf_gblur: fix postscale_slice prologue x86_32 ABI does not pass float arguments directly on xmm regs, and the Win64 ABI uses only the first four regs for this purpose. Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-17 13:33:20 -03:00
Paul B Mahol	44cf3a2b16	avfilter/x86/vf_gblur: add postscale SIMD	2021-02-16 21:12:11 +01:00
Paul B Mahol	c6ce18be08	avfilter/vf_convolution: add 16-column operation for filter_column() Based on patch by Xu Jun <xujunzz@sjtu.edu.cn>	2021-02-13 14:45:48 +01:00
Paul B Mahol	95183d25e8	avfilter/vf_atadenoise: add sigma options	2021-01-22 16:21:22 +01:00
Paul B Mahol	eaba6cecfb	avfilter/vf_v360: add mitchell interpolation	2020-10-04 19:23:52 +02:00
Paul B Mahol	fda5363c80	avfilter/x86/vf_convolution_init: there is asm only for 8bit depth	2020-09-15 08:13:04 +02:00
Limin Wang	71ec3e4583	Revert "avfilter/yadif: simplify the code for better readability" This reverts commit `2a9b934675`.	2020-08-27 07:30:30 +08:00
Limin Wang	2a9b934675	avfilter/yadif: simplify the code for better readability Signed-off-by: Limin Wang <lance.lmwang@gmail.com>	2020-08-26 14:21:11 +08:00
James Almer	320694ff84	x86/vf_blend: fix warnings about trailing empty parameters Finishes fixing ticket #8771 Signed-off-by: James Almer <jamrial@gmail.com>	2020-07-12 11:30:23 -03:00
Paul B Mahol	8e1354c95d	avfilter/x86/vf_v360_init: add missing cases	2020-04-02 12:25:37 +02:00
Paul B Mahol	e4809e12ea	avfilter/vf_v360: add SIMD for lagrange9 interpolation	2020-04-02 12:25:37 +02:00
Martin Storsjö	0815a22dcc	vf_ssim: Fix loading doubles to float registers on i386 This fixes the tests filter-refcmp-ssim-yuv and filter-refcmp-ssim-rgb on i386 after breaking in `fcc0424c93`. Signed-off-by: Martin Storsjö <martin@martin.st>	2020-02-05 14:38:26 +02:00
Paul B Mahol	fcc0424c93	avfilter/vf_ssim: improve precision Use doubles for accumulating floats.	2020-02-04 18:28:04 +01:00
Paul B Mahol	3bf28d40e5	avfilter/vf_v360: change remaps to int16_t type	2020-01-19 19:54:29 +01:00
Marton Balint	1f8e43938b	avfilter/x86/vf_interlace: always use unaligned movs Fixes crashes in command lines such as: ffmpeg -f lavfi -i testsrc2=704x576:r=50,interlace,pad=720:576:8 -f null none Related to ticket #6491. Signed-off-by: Marton Balint <cus@passwd.hu>	2019-12-15 00:23:03 +01:00
Paul B Mahol	ac0f5f4c17	avfilter/vf_maskedclamp: add x86 SIMD	2019-10-23 16:20:21 +02:00
James Almer	738bc3e742	x86/vf_transpose: make ff_transpose_8x8_16_sse2 work on x86_32 Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2019-10-22 13:51:13 -03:00
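
Commits 80afcc8539, b503b5a0cf and aca8ceb870 together describe a common libavfilter layout. A small DSP context holds function pointers, and a dedicated init function fills it in. AVX2 code paths are selected only on CPUs known to handle 256-bit vectors well. The C sketch below illustrates that layout under stated assumptions. Every name in it is hypothetical: `DemoDSPContext`, `demo_dsp_init`, the `DEMO_CPU_FLAG_*` bits and the stand-in filter. The "SIMD" versions simply call the C reference. FFmpeg's real init code obtains CPU capabilities from `av_get_cpu_flags()` and uses its own flag names.

```c
#include <assert.h>

/* Hypothetical CPU capability bits (not FFmpeg's AV_CPU_FLAG_* values). */
enum {
    DEMO_CPU_FLAG_SSSE3   = 1 << 0,
    DEMO_CPU_FLAG_AVX2    = 1 << 1,
    DEMO_CPU_FLAG_AVXSLOW = 1 << 2, /* AVX present, but 256-bit ops are slow */
};

typedef void (*filter_line_fn)(unsigned char *dst, const unsigned char *src, int w);

/* Hypothetical DSP context: one function pointer per hot inner loop. */
typedef struct DemoDSPContext {
    filter_line_fn filter_line;
} DemoDSPContext;

/* Portable C reference: a [1 2 1]/4 horizontal smoothing stand-in
 * with edge pixels replicated. */
static void filter_line_c(unsigned char *dst, const unsigned char *src, int w)
{
    assert(w > 0);
    for (int x = 0; x < w; x++) {
        int l = src[x > 0 ? x - 1 : x];
        int r = src[x < w - 1 ? x + 1 : x];
        dst[x] = (unsigned char)((l + 2 * src[x] + r + 2) >> 2);
    }
}

/* Stand-ins for assembly versions; real ones must match the C output. */
static void filter_line_ssse3(unsigned char *dst, const unsigned char *src, int w)
{
    filter_line_c(dst, src, w);
}

static void filter_line_avx2(unsigned char *dst, const unsigned char *src, int w)
{
    filter_line_c(dst, src, w);
}

/* Dedicated init: start from C, then upgrade in increasing order of speed. */
static void demo_dsp_init(DemoDSPContext *c, int cpu_flags)
{
    c->filter_line = filter_line_c;
    if (cpu_flags & DEMO_CPU_FLAG_SSSE3)
        c->filter_line = filter_line_ssse3;
    /* Only pick 256-bit code on CPUs not flagged as slow with it. */
    if ((cpu_flags & DEMO_CPU_FLAG_AVX2) && !(cpu_flags & DEMO_CPU_FLAG_AVXSLOW))
        c->filter_line = filter_line_avx2;
}
```

The checks run from slowest to fastest, so each later test can override an earlier choice. In this sketch, a CPU that reports AVX2 but is flagged as slow with 256-bit vectors keeps the SSSE3 path.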

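Commit 716b396740 adds x86 SIMD versions of tetrahedral interpolation for `vf_lut3d`. The sketch below is a rough scalar illustration of the technique. It is not the commit's assembly and is not necessarily identical to the C path in `vf_lut3d.c`. It interpolates inside a single LUT cell. The ordering of the fractional offsets `dr`, `dg` and `db` selects one of six tetrahedra that share the cell's main diagonal. The result then blends four corners with weights that sum to 1. The `Rgb` type and the function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical RGB triple used for LUT entries and results. */
typedef struct Rgb { float r, g, b; } Rgb;

/* Blend four corners with weights wa..wd (callers pass weights summing to 1). */
static Rgb blend4(Rgb a, float wa, Rgb b, float wb, Rgb c, float wc, Rgb d, float wd)
{
    Rgb o;
    o.r = wa * a.r + wb * b.r + wc * c.r + wd * d.r;
    o.g = wa * a.g + wb * b.g + wc * c.g + wd * d.g;
    o.b = wa * a.b + wb * b.b + wc * c.b + wd * d.b;
    return o;
}

/* Tetrahedral interpolation inside one LUT cell (cell is only read).
 * cell[i][j][k] holds the LUT value at lattice point (r0 + i, g0 + j, b0 + k);
 * dr, dg, db are the fractional offsets in [0, 1] within that cell. */
static Rgb interp_tetrahedral_cell(Rgb cell[2][2][2], float dr, float dg, float db)
{
    const Rgb c000 = cell[0][0][0];
    const Rgb c111 = cell[1][1][1];

    if (dr > dg) {
        if (dg > db)        /* dr > dg > db: path 000 -> 100 -> 110 -> 111 */
            return blend4(c000, 1 - dr, cell[1][0][0], dr - dg,
                          cell[1][1][0], dg - db, c111, db);
        else if (dr > db)   /* dr > db >= dg: path 000 -> 100 -> 101 -> 111 */
            return blend4(c000, 1 - dr, cell[1][0][0], dr - db,
                          cell[1][0][1], db - dg, c111, dg);
        else                /* db >= dr > dg: path 000 -> 001 -> 101 -> 111 */
            return blend4(c000, 1 - db, cell[0][0][1], db - dr,
                          cell[1][0][1], dr - dg, c111, dg);
    }
    if (db > dg)            /* db > dg >= dr: path 000 -> 001 -> 011 -> 111 */
        return blend4(c000, 1 - db, cell[0][0][1], db - dg,
                      cell[0][1][1], dg - dr, c111, dr);
    else if (db > dr)       /* dg >= db > dr: path 000 -> 010 -> 011 -> 111 */
        return blend4(c000, 1 - dg, cell[0][1][0], dg - db,
                      cell[0][1][1], db - dr, c111, dr);
    else                    /* dg >= dr >= db: path 000 -> 010 -> 110 -> 111 */
        return blend4(c000, 1 - dg, cell[0][1][0], dg - dr,
                      cell[1][1][0], dr - db, c111, db);
}
```

Each branch is an exact barycentric blend. As a result, an identity cell, where corner `cell[i][j][k]` equals `(i, j, k)`, returns the input offsets unchanged. That makes a convenient sanity check.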
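
Commit fcc0424c93 states only "Use doubles for accumulating floats." The sketch below shows the general effect that motivates such a change. It is not `vf_ssim`'s actual code. When many small float values are added into a float accumulator, precision is lost once the running sum grows. A double accumulator fed the same inputs keeps the error far smaller. The function names are hypothetical.

```c
#include <assert.h>

/* Sum n copies of x in a float accumulator: once the running sum is large,
 * each small addend is rounded to the coarse float grid and error piles up. */
static float sum_float_acc(float x, long n)
{
    float acc = 0.0f;
    for (long i = 0; i < n; i++)
        acc += x;
    return acc;
}

/* Same float inputs, accumulated in double (x is widened exactly). */
static double sum_double_acc(float x, long n)
{
    double acc = 0.0;
    for (long i = 0; i < n; i++)
        acc += x;
    return acc;
}
```

Consider ten million additions of `0.1f` on a typical SSE-based x86-64 build. The float accumulator ends up far from the exact total, while the double accumulator stays within about 1e-3 of it. Builds that evaluate float arithmetic in extended precision, such as x87 code on i386, can blur this difference.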

330 Commits
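
Commits 28d011516b, 5b3732227e and 1f8e43938b all switch SIMD output accesses from aligned to unaligned moves. When a `pad` filter follows, the destination can start at an offset inside a larger padded frame, so it may not be suitably aligned. The sketch below shows the arithmetic behind this, using the hypothetical helpers `is_aligned` and `pixel_offset`. It assumes an 8-bit plane whose base pointer and linesize are multiples of 16. Under that assumption, an x offset leaves the destination off a 16-byte boundary. Examples are the 8 pixels in `pad=720:576:8` and the 1 pixel in `pad=5000:ih:1`. A 16-byte boundary is what SSE aligned moves such as `movdqa` and `movaps` require.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of align bytes (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Byte offset of pixel (x, y) in an 8-bit plane with the given linesize. */
static ptrdiff_t pixel_offset(int x, int y, ptrdiff_t linesize)
{
    return (ptrdiff_t)y * linesize + x;
}
```

An aligned store to such a pointer faults, while an unaligned store (`movdqu`/`movu`) works at any address. This is why the fixes above use unaligned moves unconditionally rather than assuming the caller's buffer is aligned.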