Commit Graph

206 Commits

Author SHA1 Message Date
James Almer e3851169ee x86/vf_ssim: add ff_ssim_4x4_line_xop
~20% faster than ssse3. Also enabled for x86_32

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-20 13:18:05 -03:00
James Almer e1778fb657 x86/vf_ssim: fix some instruction comments
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-20 13:17:58 -03:00
Paul B Mahol eea08efc0d avfilter/x86/vf_psnr.asm: split one line of license text into two
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-07-14 23:54:26 +00:00
James Darnley bff7242608 avfilter/vf_removegrain: add x86 and x86_64 SSE2 functions
The speed of all modes increased by a factor of between 7.4 and 19.8, largely
depending on whether bytes are unpacked into words. Modes 2, 3, and 4 have been
sped up by a factor of 43 (thanks, quick sort!)

All modes are available on x86_64 but only modes 1, 10, 11, 12, 13, 14, 19, 20,
21, and 22 are available on x86 due to the number of SIMD registers used.

With a contribution from James Almer <jamrial@gmail.com>
2015-07-14 23:50:50 +00:00
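The following is based on my understanding of RemoveGrain, which this commit does not state. Modes 1 to 4 are rank-based clips. The centre pixel is clamped between the n-th smallest and the n-th largest of its 8 neighbours. Mode 1 uses the plain minimum and maximum, and mode 4 is close to a median. A straightforward C version sorts the neighbours with qsort, which is presumably the slow path the "quick sort" remark refers to. The sketch below is illustrative only, and `removegrain_rank_clip` is a hypothetical name.

```c
#include <stdlib.h>

/* qsort comparator for ints, written to avoid overflow from a - b. */
static int cmp_int(const void *p, const void *q)
{
    int a = *(const int *)p, b = *(const int *)q;
    return (a > b) - (a < b);
}

/* Hypothetical sketch of RemoveGrain modes 1..4 (assumed semantics):
 * clamp centre pixel c between the n-th smallest and n-th largest
 * of its 8 neighbours nb[]. */
static int removegrain_rank_clip(int c, const int nb[8], int n)
{
    int a[8];

    for (int i = 0; i < 8; i++)
        a[i] = nb[i];
    qsort(a, 8, sizeof(a[0]), cmp_int); /* the costly part in plain C */

    int lo = a[n - 1];  /* n-th smallest */
    int hi = a[8 - n];  /* n-th largest  */
    return c < lo ? lo : c > hi ? hi : c;
}
```

A SIMD version can avoid a general-purpose sort entirely, because only a few order statistics of exactly 8 values are needed.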
Ronald S. Bultje ae4c9ddebc vf_psnr: sse2 optimizations for sum-squared-error.
The internal line accumulator for 16-bit input can overflow, so I changed it
from int to uint64_t in the C code. The matching assembly looks a little
weird, but the output looks correct.

(avx2 should be trivial to add later.)

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-14 17:57:14 +02:00
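A worst-case 16-bit line makes the overflow easy to see. A single squared error can reach 65535² = 4294836225. That value already exceeds INT_MAX (2147483647) on common platforms with a 32-bit int. The minimal C sketch below shows a per-line accumulator that avoids this. The name `sse_line_16` is hypothetical, and this is not vf_psnr's actual code.

```c
#include <stdint.h>

/* Hypothetical helper (not vf_psnr's actual function): sum of squared
 * errors over one line of 16-bit samples. */
static uint64_t sse_line_16(const uint16_t *a, const uint16_t *b, int w)
{
    uint64_t sum = 0; /* 64-bit: one 16-bit squared error can exceed INT_MAX */

    for (int x = 0; x < w; x++) {
        int64_t d = (int64_t)a[x] - b[x];
        sum += (uint64_t)(d * d);
    }
    return sum;
}
```

Take four samples at 65535 compared against four samples at 0. The result is 4 × 4294836225 = 17179344900, which no 32-bit accumulator can hold.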
Ronald S. Bultje dfc58584b4 vf_ssim: x86 simd for ssim_4x4xN and ssim_endN.
Both are 2-2.5x faster than their C counterpart.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-14 05:07:07 +02:00
James Almer c16e99e3b3 x86: check for AV_CPU_FLAG_AVXSLOW where useful
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-01 00:15:35 +02:00
James Almer d68c05380c x86: check for AV_CPU_FLAG_AVXSLOW where useful
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-31 12:07:11 +02:00
Michael Niedermayer 52fc3e372f avfilter/x86/vf_hqdn3d: Fix register types
Fixes Ticket4301

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-27 05:18:55 +02:00
Michael Niedermayer 5bc2c39527 avfilter/x86/vf_fspp: Fix invalid combination of opcode and operands
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-26 01:43:47 +02:00
Michael Niedermayer a6f9a5d0f6 avfilter/x86/vf_fspp: Fix loop condition for column_fidct()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-01-28 17:23:27 +01:00
Michael Niedermayer f5b3257c50 avfilter/vf_eq: mark src as const
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-01-27 01:46:08 +01:00
Michael Niedermayer 530bf8ece6 avfilter/vf_eq: Fix clipping code
Found-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-01-26 23:46:44 +01:00
Arwa Arif 4c38e960d0 avfilter: Port mp=eq/eq2 to lavfi
Code adapted from James Darnley's port
Some fixes from Paul B Mahol <onemda@gmail.com>

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-01-26 00:14:04 +01:00
James Almer da02ee127a x86/vf_pp7: port dctB_mmx to yasm
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-09 20:02:27 -03:00
Arwa Arif a299cd5ab3 lavfi: port mp=pp7 to libavfilter
The only difference with mp=pp7 is that default mode is "medium", as stated
in the MPlayer docs, rather than "hard".

Signed-off-by: Stefano Sabatini <stefasab@gmail.com>
2015-01-09 17:26:31 +01:00
James Almer a4f876a1a2 x86/vf_fspp: move pxor in store slice functions out of the loop
m7 is not overwritten, so we only need to clear it once.
Found by Christophe Gisquet.

Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-26 17:15:34 -03:00
James Almer 466e32bf25 x86/vf_fspp: port inline asm to yasm
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-26 15:39:51 -03:00
James Almer b94e85453e avfilter/vf_fspp: add missing inline asm guards 2014-12-24 15:44:06 -03:00
Arwa Arif bdc4db0ee3 lavfi: port mp=fspp to a native libavfilter filter
Signed-off-by: Stefano Sabatini <stefasab@gmail.com>
2014-12-24 16:29:18 +01:00
Michael Niedermayer 6706a2986c avfilter/vf_spp: Fix overflow in 8bit store slice
Fixes regression with
ffplay -f lavfi -i testsrc=640x480  -vf format=gray,boxblur=20:10,geq="'mod(lum(X,Y),16)*15'",boxblur=10,geq="'abs(mod(lum(X,Y),15)-7)*32'",spp=4:40

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-21 01:48:19 +01:00
Michael Niedermayer 838aa08d75 avfilter/vf_spp: support 10bit per sample
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-15 18:49:35 +01:00
Michael Niedermayer 30d2ac4bf9 avfilter/vf_spp: change temporary to unsigned
More consistent with uspp and allows for future 10bit support

Found-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-12 13:34:18 +01:00
Kieran Kunhya 96fda42a8f vf_interlace: get rid of useless loads
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-11-27 13:57:50 +01:00
Michael Niedermayer ca59b5b6ec avfilter/x86/vf_interlace: remove redundant instructions
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-25 12:37:19 +01:00
Michael Niedermayer 3fe3c8abb1 Merge commit 'ca5c3ff90972a5c97aabda2ace57ba72dcd7d83b'
* commit 'ca5c3ff90972a5c97aabda2ace57ba72dcd7d83b':
  vf_interlace: x86: improve asm performance

Conflicts:
	libavfilter/x86/vf_interlace.asm

See: 05e4b25e9b
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-25 12:31:45 +01:00
Michael Niedermayer ca5c3ff909 vf_interlace: x86: improve asm performance
4775 decicycles -> 3688 decicycles
2014-11-25 02:00:06 +00:00
Michael Niedermayer 05e4b25e9b avfilter/x86/vf_interlace: rewrite asm
4775 decicycles -> 3688 decicycles

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-15 04:09:03 +01:00
Michael Niedermayer fb3eb57369 avfilter/tinterlace: add Support for ff_lowpass_line_avx() & ff_lowpass_line_sse2()
Based-on: 2e1704059a by Kieran Kunhya

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-15 04:02:33 +01:00
Michael Niedermayer 6f373d75e8 Merge commit '2e1704059ae8625beda2ffde847ad22c5ba416dc'
* commit '2e1704059ae8625beda2ffde847ad22c5ba416dc':
  vf_interlace: Add SIMD for lowpass filter

Conflicts:
	libavfilter/vf_interlace.c
	libavfilter/x86/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-15 02:39:49 +01:00
Kieran Kunhya 2e1704059a vf_interlace: Add SIMD for lowpass filter
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2014-11-15 00:35:31 +01:00
James Almer 864f9326fb x86/vf_noise: move asm code to a separate file
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-17 00:44:35 -03:00
Pascal Massimino 649b7a9946 av_filter/x86/idet: use HADDD where appropriate
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-09 19:02:49 -03:00
Pascal Massimino e3fd6a3a4e av_filter/x86/idet: MMX/SSE2 implementation of 16bits filter_line()
Tested on http://ps-auxw.de/10bit-h264-sample/10bit-eldorado.mkv
MMX:  ~30% faster decoding overall
SSE2: ~40% faster

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-09 16:47:22 +02:00
James Darnley db8970d7b6 vfi/x86/vf_idet: fix incorrect use of paddq
paddq is an SSE2 instruction so it cannot be used for MMX.

This was probably just a typo because the sums are dwords anyway.

Reviewed-by: Pascal Massimino <pascal.massimino@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 12:49:34 +02:00
Pascal Massimino 161fc0f463 avfilter/x86/idet: fix license header (GPL -> LGPL)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 12:22:36 +02:00
skal 406a9ccffe avfilter/vf_idet: MMX/MMXEXT/SSE2 implementation of idet's filter_line()
Integration by Neil Birkbeck, with help from Vitor Sessak.
Core SSE2 loop by Skal (pascal.massimino@gmail.com)

Reviewed-by: Clément Bœsch <u@pkh.me>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-04 22:19:00 +02:00
Andreas Cadhalpun 39a6e02fd4 fix spelling errors
Reviewed-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-12 22:33:27 +02:00
James Almer ddea3b7106 x86/yadif-10: remove duplicate ABS macro
And use the x86util ones instead, which are optimized for mmxext/sse2.
About 1% increase in performance on pre-SSSE3 processors.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-10 21:06:51 +02:00
Michael Niedermayer a348f4befe avfilter/x86/vf_pullup: fix "invalid combination of opcode and operands" with nasm
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-28 16:06:00 +02:00
Michael Niedermayer b8255a4c70 avfilter/x86/vf_pullup: fix old typo
This makes C and MMX match. There is no change to FATE, as the differences were
apparently not large enough to show up in FATE.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-25 18:22:48 +02:00
Michael Niedermayer 6dffc8f5aa avfilter/vf_pullup: use ptrdiff_t as stride argument for dsp functions
This should avoid issues on x86_64

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-25 18:22:31 +02:00
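On x86_64, the calling convention leaves the upper half of the 64-bit register unspecified when a 32-bit int is passed. Assembly that uses the full register as a stride can therefore misbehave. Declaring strides as ptrdiff_t makes them pointer-sized and signed, which avoids the problem. The minimal sketch below shows a block-difference function that takes such a stride. The name `block_diff_8x4` is hypothetical, and the function is only loosely modeled on the kind of dsp function pullup uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical example: sum of absolute differences over an 8x4 block.
 * stride is ptrdiff_t, so it is pointer-sized and may be negative. */
static int block_diff_8x4(const uint8_t *a, const uint8_t *b, ptrdiff_t stride)
{
    int diff = 0;

    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 8; j++)
            diff += abs(a[j] - b[j]);
        a += stride; /* full-width pointer arithmetic, no sign-extension needed */
        b += stride;
    }
    return diff;
}
```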
Christophe Gisquet 9107612818 x86util: add and use RSHIFT/LSHIFT macros
These macros take the shift amount in bytes, since the unit of the shift
argument differs between the MMX and SSE2 instructions.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-15 13:19:27 +02:00
Michael Niedermayer ebb21887b8 Merge commit '01c5779f56cf708e6cb88b11cfdc248cae7e2ee8'
* commit '01c5779f56cf708e6cb88b11cfdc248cae7e2ee8':
  x86: Drop some unnecessary YASM ifdefs

Conflicts:
	libavfilter/x86/vf_yadif_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 19:16:39 +02:00
Diego Biurrun 01c5779f56 x86: Drop some unnecessary YASM ifdefs
Dead code elimination is enough to avoid undefined references in these cases.
2014-04-04 19:08:05 +02:00
Robert Krüger 194ef56ba7 Change license of yadif from GPL to LGPL
Signed-off-by: Robert Krüger <krueger@lesspain.de>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-14 14:19:15 +01:00
Robert Krüger 4a38eeec38 Revert "Revert "vf_yadif: move x86 init code to x86/yadif.c""
This reverts commit 975110a85e.

Signed-off-by: Robert Krüger <krueger@lesspain.de>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-14 14:19:14 +01:00
Robert Krüger d8e763fda7 vf_yadif: Relicense from GPL to LGPL
All copyright holders have agreed to the relicensing.
2014-01-14 00:04:59 +01:00
Michael Niedermayer 975110a85e Revert "vf_yadif: move x86 init code to x86/yadif.c"
This reverts commit a87b17f328.
This reduces the amount of non-LGPL code, making relicensing to LGPL
easier.

Conflicts:

	libavfilter/vf_yadif.c
	libavfilter/x86/yadif.c
	libavfilter/x86/yadif_template.c
	libavfilter/yadif.h

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-12-01 20:26:26 +01:00
Clément Bœsch 969329fe11 Revert "Merge commit 'ed1a11ed52bbd1f15bb9b0416d69b7924bee3191'"
This reverts commit fc5fe4804f, reversing
changes made to ffe3350098.

The factoring is broken; it no longer calls the ssse3 code, and it calls
the mmx2 code with bad alignment. It also broke some FATE
instances.

Conflicts:
	libavfilter/x86/vf_gradfun_init.c
2013-11-01 14:28:08 +01:00
Michael Niedermayer c6125f5e1c avfilter/x86/vf_gradfun_init: fix some consts & related warnings
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-11-01 14:20:10 +01:00
Michael Niedermayer fc5fe4804f Merge commit 'ed1a11ed52bbd1f15bb9b0416d69b7924bee3191'
* commit 'ed1a11ed52bbd1f15bb9b0416d69b7924bee3191':
  gradfun: x86: Factor out common code for some gradfun_filter_line() variants

Conflicts:
	libavfilter/x86/vf_gradfun_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-11-01 10:26:49 +01:00
Michael Niedermayer ffe3350098 Merge commit 'ee80cf741a44115758e62399b7bde08d33161151'
* commit 'ee80cf741a44115758e62399b7bde08d33161151':
  avfilter: x86: K&R formatting cosmetics

Conflicts:
	libavfilter/x86/vf_gradfun_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-11-01 10:20:20 +01:00
Diego Biurrun ed1a11ed52 gradfun: x86: Factor out common code for some gradfun_filter_line() variants 2013-10-31 16:34:18 +01:00
Diego Biurrun ee80cf741a avfilter: x86: K&R formatting cosmetics 2013-10-31 12:15:54 +01:00
Michael Niedermayer a826efb55a avfilter/x86/vf_gradfun_init: fix const and related warnings
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-24 12:12:59 +02:00
Michael Niedermayer 1ea28ffc4d Merge commit '0e730494160d973400aed8d2addd1f58a0ec883e'
* commit '0e730494160d973400aed8d2addd1f58a0ec883e':
  avfilter: x86: Port gradfun filter optimizations to yasm

Conflicts:
	libavfilter/x86/vf_gradfun_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-24 10:35:39 +02:00
Daniel Kang 0e73049416 avfilter: x86: Port gradfun filter optimizations to yasm
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-10-23 14:50:27 +02:00
Michael Niedermayer f4f8499c19 Merge commit 'f6633c55a3c0e93a5b2bab6aa0692fb608f2a38d'
* commit 'f6633c55a3c0e93a5b2bab6aa0692fb608f2a38d':
  avfilter: Fix typo in Loren's email address

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-23 12:14:49 +02:00
Diego Biurrun f6633c55a3 avfilter: Fix typo in Loren's email address 2013-10-23 10:25:14 +02:00
Paul B Mahol 112017e990 avfilter/x86/vf_pullup: try to fix build on x64
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2013-09-17 17:20:58 +00:00
Paul B Mahol 9c774459a9 avfilter: port pullup filter from libmpcodecs
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2013-09-17 17:03:36 +00:00
Michael Niedermayer 9d01bf7d66 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  Consistently use "cpu_flags" as variable/parameter name for CPU flags

Conflicts:
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/h264dsp_init.c
	libavcodec/x86/hpeldsp_init.c
	libavcodec/x86/motion_est.c
	libavcodec/x86/mpegvideo.c
	libavcodec/x86/proresdsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-18 09:53:47 +02:00
Diego Biurrun 3ac7fa81b2 Consistently use "cpu_flags" as variable/parameter name for CPU flags 2013-07-18 00:31:35 +02:00
Clément Bœsch a2c547ffec lavfi: add spp filter. 2013-06-14 01:27:22 +02:00
James Darnley b0ef0ae776 yadif: restore speed of the C filtering code
Always use the special filter for the first and last 3 columns (only).

Changes made in 64ed397 slowed the filter to just under 3/4 of what it
was.  This commit restores the speed while maintaining identical output.

For reference, on my Athlon64:
1733222 decicycles in old
2358563 decicycles in new
1727558 decicycles in this

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-05-14 09:23:55 +02:00
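The split described above can be sketched in C. Bounds-checked code handles the 3 columns at each edge, and an unchecked fast path handles everything in between. In this sketch, a hypothetical 7-tap sum stands in for yadif's real spatial and temporal checks, and none of these names come from vf_yadif.c.

```c
#include <stdint.h>

/* Hypothetical 7-tap sum with edge clamping: safe for any column. */
static int tap_checked(const uint8_t *src, int w, int x)
{
    int sum = 0;
    for (int k = -3; k <= 3; k++) {
        int xx = x + k;
        if (xx < 0)     xx = 0;     /* clamp at the left edge  */
        if (xx > w - 1) xx = w - 1; /* clamp at the right edge */
        sum += src[xx];
    }
    return sum;
}

/* Same taps without checks: valid only for 3 <= x < w - 3. */
static int tap_fast(const uint8_t *src, int x)
{
    int sum = 0;
    for (int k = -3; k <= 3; k++)
        sum += src[x + k];
    return sum;
}

/* Checked path for the first and last 3 columns only, fast path elsewhere.
 * Every column is written exactly once, for any w >= 0. */
static void filter_row(int *dst, const uint8_t *src, int w)
{
    int x;
    for (x = 0; x < 3 && x < w; x++)
        dst[x] = tap_checked(src, w, x);
    for (; x < w - 3; x++)
        dst[x] = tap_fast(src, x);
    for (; x < w; x++)
        dst[x] = tap_checked(src, w, x);
}
```

In the interior, both paths read only in-range samples, so they produce identical output. This mirrors the commit's claim that the speed-up comes without changing the result.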
Michael Niedermayer 696f5f98e2 Merge commit '6e9f8d6a7d7392a236df19fef6f4eba41f18167e'
* commit '6e9f8d6a7d7392a236df19fef6f4eba41f18167e':
  x86: vf_yadif: Remove stray dsputil_mmx #include

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-05-09 11:51:40 +02:00
Diego Biurrun 6e9f8d6a7d x86: vf_yadif: Remove stray dsputil_mmx #include 2013-05-08 18:18:23 +02:00
Michael Niedermayer a8ff830b79 Merge commit '093804a93cc5da3f95f98265a5df116912443cec'
* commit '093804a93cc5da3f95f98265a5df116912443cec':
  avfilter: Add av_cold attributes to init/uninit functions

Conflicts:
	libavfilter/af_ashowinfo.c
	libavfilter/af_volume.c
	libavfilter/src_movie.c
	libavfilter/vf_lut.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-05-05 11:42:18 +02:00
Diego Biurrun 093804a93c avfilter: Add av_cold attributes to init/uninit functions 2013-05-04 21:10:05 +02:00
Michael Niedermayer 0a73803c86 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: Move some conditional code around to avoid unused variable warnings

Conflicts:
	libavcodec/x86/dsputil_mmx.c
	libavfilter/x86/vf_yadif_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-23 11:01:46 +02:00
Diego Biurrun c1ad70c3cb x86: Move some conditional code around to avoid unused variable warnings 2013-04-22 17:50:02 +02:00
Clément Bœsch 1ae44c87c9 lavfi/gradfun: remove rounding to match C and SSE code.
There is no noticeable benefit for such precision.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-03-28 07:59:29 +01:00
Clément Bœsch 38a2f88d39 lavfi/gradfun: fix dithering in MMX code.
Current dithering only uses the first 4 instead of the whole 8 random values.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-03-28 07:59:18 +01:00
Clément Bœsch 2d66fc543b lavfi/gradfun: fix rounding in MMX code.
Current code divides before increasing precision.

Also reduce upper bound for strength from 255 to 64.  This will prevent
an overflow in the SSSE3 and MMX filter_line code: delta is expressed as
a u16 being shifted left by 2. If it overflows, having a
strength not above 64 will make sure that m is set to 0 (making the
m*m*delta >> 14 expression void).

A value above 64 should not make any sense unless gradfun is used as
a blur filter.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-03-28 07:59:04 +01:00
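Dividing before scaling up to a higher fixed-point precision throws away the fractional part of the quotient. The minimal sketch below shows the two orderings. The helper names and numbers are hypothetical and do not reproduce gradfun's actual arithmetic.

```c
#include <stdint.h>

/* Divide first, then scale: the fraction is lost before the extra
 * precision bits are added. Assumes num >= 0 (left-shifting a negative
 * signed value is undefined in C). */
static int32_t divide_then_scale(int32_t num, int32_t den, int shift)
{
    return (num / den) << shift;
}

/* Scale first, then divide: the extra bits keep the fraction.
 * Same assumption, plus num << shift must fit in int32_t. */
static int32_t scale_then_divide(int32_t num, int32_t den, int shift)
{
    return (num << shift) / den;
}
```

Take num = 7, den = 2 and 7 extra bits. The first ordering gives 384 (3.0 in Q7), and the second gives 448 (3.5 in Q7).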
James Darnley c9a51c29fc yadif: remove an 'm' from the LOAD macro definition
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:33:49 +01:00
James Darnley 1d3b14cac2 yadif: remove repeated check on width
The filter already checks that width (and height) are greater than 3.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:33:30 +01:00
James Darnley 7976d92dac yadif: cosmetic indentation from previous commits
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:33:06 +01:00
James Darnley 0a5814c9ba yadif: x86 assembly for 9 to 14-bit samples
These smaller samples do not need to be unpacked to double words,
allowing the code to process more pixels every iteration (still 2 in MMX
but 6 in SSE2).  It also avoids emulating the missing double word
instructions on older instruction sets.

Like with the previous code for 16-bit samples this has been tested on
an Athlon64 and a Core2Quad.

Athlon64:
1809275 decicycles in C,    32718 runs, 50 skips
 911675 decicycles in mmx,  32727 runs, 41 skips, 2.0x faster
 495284 decicycles in sse2, 32747 runs, 21 skips, 3.7x faster

Core2Quad:
 921363 decicycles in C,     32756 runs, 12 skips
 486537 decicycles in mmx,   32764 runs,  4 skips, 1.9x faster
 293296 decicycles in sse2,  32759 runs,  9 skips, 3.1x faster
 284910 decicycles in ssse3, 32759 runs,  9 skips, 3.2x faster

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:32:54 +01:00
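A rough check can show why 9 to 14-bit samples can stay in 16-bit words. Assume the widest intermediate in a step is the sum of two samples, as in an average like (a + b) >> 1. That sum must then fit in a signed 16-bit lane. The helper below is hypothetical and only tests this assumption.

```c
#include <stdint.h>

/* Hypothetical check: does the sum of two samples of the given bit depth
 * fit in a signed 16-bit lane (int16_t)? Valid for 1 <= bits <= 16. */
static int pair_sum_fits_int16(int bits)
{
    int32_t max_sample = (1 << bits) - 1;
    return 2 * max_sample <= INT16_MAX;
}
```

For 14-bit input the worst case is 2 × 16383 = 32766, just under INT16_MAX (32767). At 15 bits the worst case is 65534, which no longer fits in a signed word.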
James Darnley 17e7b49501 yadif: x86 assembly for 16-bit samples
This is a fairly dumb copy of the assembly for 8-bit samples but it
works and produces identical output to the C version.  The options have
been tested on an Athlon64 and a Core2Quad.

Athlon64:
1810385 decicycles in C,    32726 runs, 42 skips
1080744 decicycles in mmx,  32744 runs, 24 skips, 1.7x faster
 818315 decicycles in sse2, 32735 runs, 33 skips, 2.2x faster

Core2Quad:
 924025 decicycles in C,     32750 runs, 18 skips
 623995 decicycles in mmx,   32767 runs,  1 skips, 1.5x faster
 406223 decicycles in sse2,  32764 runs,  4 skips, 2.3x faster
 387842 decicycles in ssse3, 32767 runs,  1 skips, 2.4x faster
 307726 decicycles in sse4,  32763 runs,  5 skips, 3.0x faster

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:32:34 +01:00
James Darnley 0735b50880 yadif: restore speed of the C filtering code
Always use the special filter for the first and last 3 columns (only).

Changes made in 64ed397 slowed the filter to just under 3/4 of what it
was.  This commit restores the speed while maintaining identical output.

For reference, on my Athlon64:
1733222 decicycles in old
2358563 decicycles in new
1727558 decicycles in this

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-13 22:07:25 +01:00
Loren Merritt 5b3c1aecb2 hqdn3d: Fix out of array read in LOWPASS
CC:libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-03-13 09:14:59 +01:00
Michael Niedermayer 446f7c62a2 Merge commit '64ed397635ef2666b0ca0c8d8c60a8bc44581d82'
* commit '64ed397635ef2666b0ca0c8d8c60a8bc44581d82':
  vf_yadif: fix out-of line reads

Conflicts:
	libavfilter/vf_yadif.c
	tests/ref/fate/filter-yadif-mode0
	tests/ref/fate/filter-yadif-mode1

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-16 09:09:38 +01:00
Anton Khirnov 64ed397635 vf_yadif: fix out-of line reads
Some changes in the border pixels, visually indistinguishable.
2013-02-15 16:08:33 +01:00
Michael Niedermayer 6e9f3f3b65 Merge commit '238614de679a71970c20d7c3fee08a322967ec40'
* commit '238614de679a71970c20d7c3fee08a322967ec40':
  cdgraphics: do not rely on get_buffer() initializing the frame.
  svq1: replace struct svq1_frame_size with an array.
  vf_yadif: silence a warning.

Conflicts:
	libavcodec/svq1dec.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-06 14:53:49 +01:00
Anton Khirnov 99162f8d46 vf_yadif: silence a warning.
clang says:
libavfilter/vf_yadif.c:192:28: warning: incompatible pointer types assigning to
'void (*)(uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int)'
from 'void (uint16_t *, uint16_t *, uint16_t *, uint16_t *, int, int, int, int, int)'
2013-02-06 10:21:51 +01:00
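clang warns because a function taking uint16_t pointers is assigned to a pointer whose type expects uint8_t pointers. Calling a function through an incompatible function-pointer type is undefined behaviour in C. One common way to avoid this class of warning is to give every bit-depth variant the same prototype and cast inside the function. This is not necessarily what this commit did. The minimal sketch below uses a hypothetical, simplified prototype with a plain copy as a placeholder filter.

```c
#include <stdint.h>

/* Hypothetical, simplified prototype shared by all bit depths. */
typedef void (*filter_line_fn)(void *dst, const void *src, int w);

static void filter_line_8bit(void *dst, const void *src, int w)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    for (int x = 0; x < w; x++)
        d[x] = s[x]; /* placeholder "filter": a plain copy */
}

static void filter_line_16bit(void *dst, const void *src, int w)
{
    uint16_t *d = dst;       /* the cast happens here, inside the function */
    const uint16_t *s = src;
    for (int x = 0; x < w; x++)
        d[x] = s[x];
}

/* Both variants match filter_line_fn exactly, so the assignment needs
 * no cast and there is no incompatible-pointer-types warning. */
static filter_line_fn pick_filter_line(int depth)
{
    return depth > 8 ? filter_line_16bit : filter_line_8bit;
}
```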
Michael Niedermayer 0b6f34cc9f Merge remote-tracking branch 'qatar/master'
* qatar/master:
  avfilter: x86: consistent filenames for filter optimizations

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-05 11:56:48 +01:00
Diego Biurrun e66240f22e avfilter: x86: consistent filenames for filter optimizations 2013-02-04 15:00:47 +01:00
Michael Niedermayer d593f2b241 avfilter/x86/vf_hqdn3d_init: fix author attribution & project name
Reference: 7a1944b907

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-02 13:18:09 +01:00
Michael Niedermayer 0d13a7b786 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  vf_hqdn3d: x86: Add proper arch optimization initialization

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-02 13:18:01 +01:00
Diego Biurrun 76d90125cd vf_hqdn3d: x86: Add proper arch optimization initialization 2013-02-01 13:11:45 +01:00
Michael Niedermayer 329675cfd7 Merge commit 'a1c525f7eb0783d31ba7a653865b6cbd3dc880de'
* commit 'a1c525f7eb0783d31ba7a653865b6cbd3dc880de':
  pcx: return meaningful error codes.
  tmv: return meaningful error codes.
  msrle: return meaningful error codes.
  cscd: return meaningful error codes.
  yadif: x86: fix build for compilers without aligned stack
  lavc: introduce the convenience function init_get_bits8
  lavc: check for overflow in init_get_bits

Conflicts:
	libavcodec/cscd.c
	libavcodec/pcx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-14 14:43:32 +01:00
Daniel Kang 67360ccd51 yadif: x86: fix build for compilers without aligned stack
Manually load registers to avoid using 8 registers on x86_32 with
compilers that do not align the stack (e.g. MSVC).

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-14 09:51:52 +01:00
Michael Niedermayer 65b8527993 Merge commit 'f7bf72a4a1146a7583577c9bdc066767e1ba3c6a'
* commit 'f7bf72a4a1146a7583577c9bdc066767e1ba3c6a':
  idcinvideo: correctly set AVFrame defaults
  yadif: Port inline assembly to yasm
  au: remove unnecessary casts
  au: return AVERROR codes instead of -1

Conflicts:
	libavcodec/idcinvideo.c
	libavfilter/x86/yadif_template.c
	libavformat/au.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-10 12:27:16 +01:00
Daniel Kang 899157b308 yadif: Port inline assembly to yasm
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-09 18:41:02 +01:00
Clément Bœsch 63e1fc2588 lavfi/gradfun: remove rounding to match C and SSE code.
There is no noticeable benefit for such precision.
2012-12-19 03:13:25 +01:00
Clément Bœsch 60ba9a9a88 lavfi/gradfun: fix dithering in MMX code.
Current dithering only uses the first 4 instead of the whole 8 random values.
2012-12-19 03:13:25 +01:00
Clément Bœsch 49de902a1e lavfi/gradfun: fix rounding in MMX code.
Current code divides before increasing precision.
2012-12-19 03:13:25 +01:00
Carl Eugen Hoyos 24b20087bd Fix compilation with yasm 0.6.2. 2012-12-07 00:26:45 +01:00
Michael Niedermayer 54a71f2e6c Merge commit 'b519298a1578e0c895d53d4b4ed8867b1c031a56'
* commit 'b519298a1578e0c895d53d4b4ed8867b1c031a56':
  pixdesc: fix yuva 10bit bit depth
  avconv: deprecate the -vol option
  x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling
  x86: af_volume: add SSE2-optimized s16 volume scaling

Conflicts:
	ffmpeg.c
	tests/ref/lavfi/pixdesc
	tests/ref/lavfi/pixfmts_copy
	tests/ref/lavfi/pixfmts_null
	tests/ref/lavfi/pixfmts_scale
	tests/ref/lavfi/pixfmts_vflip

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-06 15:55:47 +01:00