20 Commits

Author SHA1 Message Date
Martin Vignali f3df42e81d avfilter/x86/vf_blend : add SIMD for 16 bit version of
grainextract
grainmerge
average
extremity
negation
2018-04-05 21:46:16 +02:00
Martin Vignali 8eb0bb1108 avfilter/x86/vf_blend : reorganize DIFFERENCE macro to reduce line duplication between 8bit and 16 bit version 2018-04-05 21:46:11 +02:00
Martin Vignali 53a03b5c8c avfilter/x86/vf_blend : add 16 bit version for BLEND_SIMPLE, phoenix, difference for SSE and AVX2 (x86_64) 2018-02-24 21:44:19 +01:00
Martin Vignali 3a230ce5fa avfilter/x86/vf_blend : add AVX2 version for each func except divide
and optimize average, grainextract, multiply, screen, grain merge
2018-01-28 20:21:32 +01:00
Paul B Mahol f8d0689d3f avfilter/vf_blend: rename addition128 and difference128 to grainmerge and grainextract 2017-08-24 14:45:52 +02:00
James Almer d2ef9e6e7f x86/vf_blend: use ABS2 macro 2017-06-27 20:45:55 -03:00
James Almer 0daa1cf073 x86/vf_blend: optimize difference and negation functions
Process more pixels per loop.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-27 13:17:23 -03:00
James Almer fa50d9360b x86/vf_blend: add sse and ssse3 extremity functions
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-27 13:17:23 -03:00
Timothy Gu 222e6da605 x86/vf_blend: Add SSE2 optimization for divide
4.5x faster than C float version with autovectorization
10x faster than C int version
25x faster than C float version without autovectorization
2016-02-28 08:19:09 -08:00
Timothy Gu 4574323973 vf_blend: Reduce number of arguments for kernel function 2016-02-14 08:58:41 -08:00
Timothy Gu 74f8d9aaef x86/vf_blend: Add SSE2 optimization for screen
10x faster than C.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
2016-02-10 11:26:04 -08:00
Timothy Gu c8b1612af0 x86/vf_blend: Move multiplying to a macro
Reviewed-by: Paul B Mahol <onemda@gmail.com>
2016-02-10 11:25:11 -08:00
Timothy Gu 253209ac44 vf_blend: Add SSE2 optimization for multiply
5 times faster than C, 3 times overall.
2016-02-08 13:35:24 -08:00
James Almer 8dba3fb8fd x86/vf_blend: add sse2 versions of blend_difference and blend_negation
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-12-24 13:05:27 -03:00
James Almer 02f428051a x86/vf_blend: make all functions work on x86_32
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-12-24 13:05:24 -03:00
James Almer 0988c68cf9 x86/vf_blend: simplify using macros
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-12-24 13:05:21 -03:00
Paul B Mahol 624a1a0e69 avfilter/x86/vf_blend.asm: hardmix: do same with two pxor instructions less
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-07 23:12:09 +02:00
Paul B Mahol e999210cec avfilter/x86/vf_blend.asm: 11th register is used, update functions
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-07 22:53:54 +02:00
Paul B Mahol 0948ba3204 avfilter/x86/vf_blend.asm: add hardmix and phoenix sse2 SIMD
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-07 22:50:15 +02:00
Paul B Mahol 9762554dd0 avfilter/vf_blend: add x86 SIMD for some modes
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-03 21:26:17 +02:00