ffmpeg

Commit Graph

Author	SHA1	Message	Date
Paul B Mahol	047c362d3c	avfilter/vf_nlmeans: add x86 SIMD	2021-11-11 21:54:46 +01:00
James Almer	39f3c98bb1	x86/vf_lut3d: use three operand form for some instructions Fixes compilation with old yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2021-10-14 18:09:38 -03:00
Mark Reid	3ee7250116	avfilter/vf_lut3d: fix building with --disable-optimizations	2021-10-13 18:01:21 +02:00
Mark Reid	716b396740	avfilter/vf_lut3d: add x86-optimized tetrahedral interpolation I spotted an interesting pattern that I didn't see before that leads to the implementation being faster. The bit shifting table I was using before is no longer needed, and was able to remove quite a few lines. I also add use of FMA on the AVX2 version. f32 1920x1080 1 thread with prelut c impl 1434012700 UNITS in lut3d->interp, 1 runs, 0 skips 1434035335 UNITS in lut3d->interp, 2 runs, 0 skips 1423615347 UNITS in lut3d->interp, 4 runs, 0 skips 1426268863 UNITS in lut3d->interp, 8 runs, 0 skips sse2 905484420 UNITS in lut3d->interp, 1 runs, 0 skips 905659010 UNITS in lut3d->interp, 2 runs, 0 skips 915167140 UNITS in lut3d->interp, 4 runs, 0 skips 915834222 UNITS in lut3d->interp, 8 runs, 0 skips avx 574794860 UNITS in lut3d->interp, 1 runs, 0 skips 581035090 UNITS in lut3d->interp, 2 runs, 0 skips 584116720 UNITS in lut3d->interp, 4 runs, 0 skips 581460290 UNITS in lut3d->interp, 8 runs, 0 skips avx2 301698880 UNITS in lut3d->interp, 1 runs, 0 skips 301982880 UNITS in lut3d->interp, 2 runs, 0 skips 306962430 UNITS in lut3d->interp, 4 runs, 0 skips 305472025 UNITS in lut3d->interp, 8 runs, 0 skips gbrap16 1920x1080 1 thread with prelut c impl 1480894840 UNITS in lut3d->interp, 1 runs, 0 skips 1502922990 UNITS in lut3d->interp, 2 runs, 0 skips 1496114307 UNITS in lut3d->interp, 4 runs, 0 skips 1492554551 UNITS in lut3d->interp, 8 runs, 0 skips sse2 980777180 UNITS in lut3d->interp, 1 runs, 0 skips 986121520 UNITS in lut3d->interp, 2 runs, 0 skips 986489840 UNITS in lut3d->interp, 4 runs, 0 skips 998832248 UNITS in lut3d->interp, 8 runs, 0 skips avx 622212360 UNITS in lut3d->interp, 1 runs, 0 skips 622981160 UNITS in lut3d->interp, 2 runs, 0 skips 645396315 UNITS in lut3d->interp, 4 runs, 0 skips 641057075 UNITS in lut3d->interp, 8 runs, 0 skips avx2 321336400 UNITS in lut3d->interp, 1 runs, 0 skips 321268920 UNITS in lut3d->interp, 2 runs, 0 skips 323459895 UNITS in lut3d->interp, 4 runs, 0 skips 324949967 UNITS in lut3d->interp, 8 runs, 0 skips	2021-10-10 22:23:48 +02:00
Wu Jianhua	e26c4d252f	avfilter/x86/vf_blend: unify indentation format Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-10-03 09:15:55 +02:00
Wu Jianhua	7bbad32d5a	libavfilter/x86/vf_gblur: correct the order of loop step The problem was caused by if the width of the processed block minus 1 is a multiple of the aligned number the instruction jle .bscale_scalar would skip the Optimized Loop Step, which will lead to an incorrect sampling when specifying steps more than 1. Move the Optimized Loop Step after .bscale_scalar to ensure the loop step is enabled. Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-09-18 12:38:01 +02:00
Wu Jianhua	fcf10c925d	libavfilter/x86/vf_gblur: fixed the fate-test failed on MacOS Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-09-18 12:37:56 +02:00
Wu Jianhua	4041c1029b	libavfilter/x86/vf_gblur: add localbuf and ff_horiz_slice_avx2/512() We introduced a ff_horiz_slice_avx2/512() implemented on a new algorithm. In a nutshell, the new algorithm does three things, gathering data from 8/16 rows, blurring data, and scattering data back to the image buffer. Here we used a customized transpose 8x8/16x16 to avoid the huge overhead brought by gather and scatter instructions, which is dependent on the temporary buffer called localbuf added newly. Performance data: ff_horiz_slice_avx2(old): 109.89 ff_horiz_slice_avx2(new): 666.67 ff_horiz_slice_avx512: 1000 Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com> Co-authored-by: Jin Jun <jun.i.jin@intel.com> Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-08-29 19:58:33 +02:00
Wu Jianhua	68a2722aee	libavfilter/x86/vf_gblur: add ff_verti_slice_avx2/512() The new vertical slice with AVX2/512 acceleration can significantly improve the performance of Gaussian Filter 2D. Performance data: ff_verti_slice_c: 32.57 ff_verti_slice_avx2: 476.19 ff_verti_slice_avx512: 833.33 Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com> Co-authored-by: Jin Jun <jun.i.jin@intel.com> Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-08-29 19:58:33 +02:00
Wu Jianhua	4a5e24721c	libavfilter/x86/vf_gblur: add ff_postscale_slice_avx512() Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com> Co-authored-by: Jin Jun <jun.i.jin@intel.com> Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-08-29 19:58:33 +02:00
Paul B Mahol	0068b3d0f0	avfilter/avf_showcqt: switch to TX FFT from avutil	2021-07-27 21:16:28 +02:00
Andreas Rheinhardt	4608f7cc6a	Remove unnecessary mem.h inclusions Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-07-22 14:47:57 +02:00
James Almer	1628409b18	x86/vf_gblur: fix reg name in UNIX64 prologue Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-17 15:51:28 -03:00
James Almer	2b4da1cb8c	x86/vf_gblur: fix postscale_slice prologue x86_32 ABI does not pass float arguments directly on xmm regs, and the Win64 ABI uses only the first four regs for this purpose. Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-17 13:33:20 -03:00
Paul B Mahol	44cf3a2b16	avfilter/x86/vf_gblur: add postscale SIMD	2021-02-16 21:12:11 +01:00
Paul B Mahol	c6ce18be08	avfilter/vf_convolution: add 16-column operation for filter_column() Based on patch by Xu Jun <xujunzz@sjtu.edu.cn>	2021-02-13 14:45:48 +01:00
Paul B Mahol	95183d25e8	avfilter/vf_atadenoise: add sigma options	2021-01-22 16:21:22 +01:00
Paul B Mahol	eaba6cecfb	avfilter/vf_v360: add mitchell interpolation	2020-10-04 19:23:52 +02:00
Paul B Mahol	fda5363c80	avfilter/x86/vf_convolution_init: there is asm only for 8bit depth	2020-09-15 08:13:04 +02:00
Limin Wang	71ec3e4583	Revert "avfilter/yadif: simplify the code for better readability" This reverts commit `2a9b934675`.	2020-08-27 07:30:30 +08:00
Limin Wang	2a9b934675	avfilter/yadif: simplify the code for better readability Signed-off-by: Limin Wang <lance.lmwang@gmail.com>	2020-08-26 14:21:11 +08:00
James Almer	320694ff84	x86/vf_blend: fix warnings about trailing empty parameters Finishes fixing ticket #8771 Signed-off-by: James Almer <jamrial@gmail.com>	2020-07-12 11:30:23 -03:00
Paul B Mahol	8e1354c95d	avfilter/x86/vf_v360_init: add missing cases	2020-04-02 12:25:37 +02:00
Paul B Mahol	e4809e12ea	avfilter/vf_v360: add SIMD for lagrange9 interpolation	2020-04-02 12:25:37 +02:00
Martin Storsjö	0815a22dcc	vf_ssim: Fix loading doubles to float registers on i386 This fixes the tests filter-refcmp-ssim-yuv and filter-refcmp-ssim-rgb on i386 after breaking in `fcc0424c93`. Signed-off-by: Martin Storsjö <martin@martin.st>	2020-02-05 14:38:26 +02:00
Paul B Mahol	fcc0424c93	avfilter/vf_ssim: improve precision Use doubles for accumulating floats.	2020-02-04 18:28:04 +01:00
Paul B Mahol	3bf28d40e5	avfilter/vf_v360: change remaps to int16_t type	2020-01-19 19:54:29 +01:00
Marton Balint	1f8e43938b	avfilter/x86/vf_interlace: always use unaligned movs Fixes crashes in command lines such as: ffmpeg -f lavfi -i testsrc2=704x576:r=50,interlace,pad=720:576:8 -f null none Related to ticket #6491. Signed-off-by: Marton Balint <cus@passwd.hu>	2019-12-15 00:23:03 +01:00
Paul B Mahol	ac0f5f4c17	avfilter/vf_maskedclamp: add x86 SIMD	2019-10-23 16:20:21 +02:00
James Almer	738bc3e742	x86/vf_transpose: make ff_transpose_8x8_16_sse2 work on x86_32 Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2019-10-22 13:51:13 -03:00
James Almer	27bae5aaca	x86/vf_transpose: fix cpuflags check Signed-off-by: James Almer <jamrial@gmail.com>	2019-10-21 17:01:39 -03:00
Paul B Mahol	ccd9bca15a	avfilter/vf_transpose: add x86 SIMD	2019-10-21 20:37:51 +02:00
Paul B Mahol	f7f4691f9f	avfilter/x86/vf_atadenoise: fix comment	2019-10-21 17:56:45 +02:00
Paul B Mahol	0ae6fb276b	avfilter/x86/vf_atadenoise: add SIMD for serial too	2019-10-17 21:05:50 +02:00
Paul B Mahol	71e33c6e01	avfilter/vf_atadenoise: add option to use additional algorithm	2019-10-17 20:28:31 +02:00
Paul B Mahol	295d99b439	avfilter/vf_adadenoise: add x86 SIMD	2019-10-17 19:44:11 +02:00
Paul B Mahol	64a805883d	avfilter/vf_gblur: fix heap-buffer overflow Fixes #8282	2019-10-16 12:13:04 +02:00
Andreas Rheinhardt	361fb42e1e	avcodec/filter: Remove extra '; ' outside of functions They are not allowed outside of functions. Fixes the warning "ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]" when compiling with GCC and -pedantic. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	2019-10-07 21:15:55 +02:00
James Almer	1dbd3c6116	avfilter/vf_eq: fix compilation with x86 asm disabled Signed-off-by: James Almer <jamrial@gmail.com>	2019-09-26 12:19:43 -03:00
Ting Fu	4f589d668e	avfilter/x86/vf_eq: add SSE2 version Signed-off-by: Ting Fu <ting.fu@intel.com>	2019-09-26 08:12:36 +08:00
Ting Fu	6aff2042d6	avfilter/x86/vf_eq: Change inline assembly into nasm code Signed-off-by: Ting Fu <ting.fu@intel.com>	2019-09-26 08:11:13 +08:00
Paul B Mahol	921eb21b1d	avfilter/x86/vf_360: add most of >8 depth asm	2019-09-16 10:21:16 +02:00
James Almer	4857688732	x86/vf_v360: use a faster horizontal add in remap4_8bit_line_avx2 Signed-off-by: James Almer <jamrial@gmail.com>	2019-09-06 12:11:46 -03:00
James Almer	2200cf1aca	x86/vf_v360: make remap{1,2}_8bit_line_avx2 work on x86_32 Signed-off-by: James Almer <jamrial@gmail.com>	2019-09-06 11:11:45 -03:00
Paul B Mahol	058bbf48c6	avfilter/vf_v360: x86 SIMD for interpolations	2019-09-06 14:10:37 +02:00
Ruiling Song	98e419cbf5	avfilter/vf_convolution: add x86 SIMD for filter_3x3() Tested using a simple command (apply edge enhance): ./ffmpeg_g -i ~/Downloads/bbb_sunflower_1080p_30fps_normal.mp4 \ -vf convolution="0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:5:1:1:1:0:128:128:128" \ -an -vframes 1000 -f null /dev/null The fps increase from 151 to 270 on my local machine. Signed-off-by: Ruiling Song <ruiling.song@intel.com>	2019-08-07 14:31:28 +08:00
James Almer	b8f1542dcb	avfilter/vf_gblur: add missing preprocessor check Fixes compilation on x86_32 Signed-off-by: James Almer <jamrial@gmail.com>	2019-06-12 10:54:59 -03:00
Ruiling Song	83f9da7768	avfilter/vf_gblur: add x86 SIMD optimizations The horizontal pass get ~2x performance with the patch under single thread. Tested overall performance using the command(avx2 enabled): ./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null ./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null For single thread, the fps improves from 43 to 60, about 40%. For multi-thread, the fps improves from 110 to 130, about 20%. Signed-off-by: Ruiling Song <ruiling.song@intel.com>	2019-06-12 08:53:11 +08:00
Paul B Mahol	dcae5ba322	avfilter: add anlmdn filter x86 SIMD optimizations	2019-01-10 21:49:47 +01:00
James Almer	ef67af31ff	x86/af_afir: use three operand form for some instructions Fixes compilation with old yasm versions. Signed-off-by: James Almer <jamrial@gmail.com>	2019-01-03 23:36:19 -03:00

310 Commits