From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)
x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.
The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.
In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). So given that the only systems that
benefit from these functions are truely ancient 32bit x86s
they are removed.
Moreover, some of the removed code was buggy/not bitexact
and lead to failures involving the f32le and f32be versions of
gray, gbrp and gbrap on x86-32 when SSE2 was not disabled.
See e.g.
https://fate.ffmpeg.org/report.cgi?time=20220609221253&slot=x86_32-debian-kfreebsd-gcc-4.4-cpuflags-mmx
Notice that yuv2yuvX_mmx is not removed, because it is used
by SSE3 and AVX2 as fallback in case of unaligned data and
also for tail processing. I don't know why yuv2yuvX_mmxext
isn't being used for this; an earlier version [1] of
554c2bc708 used it, but
the version that was eventually applied does not.
[1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2020-November/272124.html
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also do some small cosmetic changes: Drop pointless _MMX suffix from ABSD2
macro name, drop pointless check for MMX support, we always assume MMX is
available in our SIMD code, fix spelling.
xmm6 was being clobbered in ff_hscale8to{15,19}_8_sse2 on Win64
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '6860b4081d046558c44b1b42f22022ea341a2a73':
x86: include x86inc.asm in x86util.asm
cng: Reindent some incorrectly indented lines
cngdec: Allow flushing the decoder
cngdec: Make the dbov variable have the right unit
cngdec: Fix the memset size to cover the full array
cngdec: Update the LPC coefficients after averaging the reflection coefficients
configure: fix print_config() with broke awks
Conflicts:
libavcodec/x86/ac3dsp.asm
libavcodec/x86/dct32.asm
libavcodec/x86/deinterlace.asm
libavcodec/x86/dsputil.asm
libavcodec/x86/dsputilenc.asm
libavcodec/x86/fft.asm
libavcodec/x86/fmtconvert.asm
libavcodec/x86/h264_chromamc.asm
libavcodec/x86/h264_deblock.asm
libavcodec/x86/h264_deblock_10bit.asm
libavcodec/x86/h264_idct.asm
libavcodec/x86/h264_idct_10bit.asm
libavcodec/x86/h264_intrapred.asm
libavcodec/x86/h264_intrapred_10bit.asm
libavcodec/x86/h264_weight.asm
libavcodec/x86/vc1dsp.asm
libavcodec/x86/vp3dsp.asm
libavcodec/x86/vp56dsp.asm
libavcodec/x86/vp8dsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments
Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
* qatar/master:
dxa: remove useless code
lavf: don't select an attached picture as default stream for seeking.
avconv: remove pointless checks.
avconv: check for get_filtered_frame() failure.
avconv: remove a pointless check.
swscale: convert hscale() to use named arguments.
x86inc: add *mp named argument support to DEFINE_ARGS.
swscale: convert hscale to cpuflags().
Conflicts:
ffmpeg.c
libswscale/x86/scale.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (46 commits)
mtv: Make sure audio_subsegments is not 0
v4l2: use V4L2_FMT_FLAG_EMULATED only if it is defined
avconv: add symbolic names for -vsync parameters
flvdec: Fix compiler warning for uninitialized variables
rtsp: Fix compiler warning for uninitialized variable
ulti: convert to new bytestream API.
swscale: Use standard multiple inclusion guards in ppc/ header files.
Place some START_TIMER invocations in separate blocks.
v4l2: list available formats
v4l2: set the proper codec_tag
v4l2: refactor device_open
v4l2: simplify away io_method
v4l2: cosmetics
v4l2: uniform and format options
v4l2: do not force interlaced mode
avio: exit early in fill_buffer without read_packet
vc1dec: fix invalid memory access for small video dimensions
rv34: fix invalid memory access for small video dimensions
rv34: joint coefficient decoding and dequantization
avplay: Don't call avio_set_interrupt_cb(NULL)
...
Conflicts:
Changelog
avconv.c
doc/APIchanges
doc/indevs.texi
libavcodec/adxenc.c
libavcodec/dnxhdenc.c
libavcodec/h264.c
libavdevice/v4l2.c
libavformat/flvdec.c
libavformat/mtv.c
libswscale/utils.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (23 commits)
applehttp: Properly clean up if unable to probe a segment
applehttp: Avoid reading uninitialized memory
fate: Replace misleading "aac" in the name of an ADTS test with "adts".
fate: Drop pointless "-an" from pictor test command.
fate: split off image codec FATE tests into their own file
fate: split off WMA codec FATE tests into their own file
fate: split off lossless video and audio FATE tests into their own files
fate: split off qtrle codec FATE tests into their own file
fate: split off Ut Video codec FATE tests into their own file
fate: split off screen codec FATE tests into their own file
fate: split off Real Inc. codec FATE tests into their own file
fate: split off AC-3 codec FATE tests into their own file
mpegvideo: remove abort() in ff_find_unused_picture()
rv40: NEON optimised loop filter strength selection
rv40: rearrange loop filter functions
configure: cosmetics: sort some lists where appropriate
swscale_mmx: drop no longer required parameters from VSCALEX macros
swscale: Mark yuv2planeX_8_mmx as MMX2; it contains MMX2 instructions.
build: conditionally compile x86 H.264 chroma optimizations
v410 encoder and decoder
...
Conflicts:
Changelog
configure
doc/developer.texi
doc/general.texi
libavcodec/arm/asm.S
libavcodec/avcodec.h
libavcodec/v410dec.c
libavcodec/v410enc.c
libavcodec/version.h
libavcodec/x86/Makefile
libavcodec/x86/dsputil_mmx.c
libswscale/x86/swscale_mmx.c
tests/Makefile
tests/fate2.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
lavf: pass options from AVFormatContext to avio.
avformat: Use avio_open2, pass the AVFormatContext interrupt_callback onwards
avio: add avio_open2, taking an interrupt callback and options
avio: add support for passing options to protocols.
avio: add and use ffurl_protocol_next().
avformat: Pass the interrupt callback on to chained muxers/demuxers
avio: Add an AVIOInterruptCB parameter to ffurl_open/ffurl_alloc
avformat: Use ff_check_interrupt
avio: Add an internal utility function for checking the new interrupt callback
avio: Add AVIOInterruptCB
texi2html: remove stray \n
doc: prettyfy the texi2html documentation
swscale: handle unaligned buffers in yuv2plane1
Conflicts:
libavformat/avformat.h
libavformat/avio.c
libavformat/mov.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (23 commits)
x86inc: use sse versions of common macros instead of sse2 when applicable
doc/APIchanges: add missing dates and hashes
lavf: don't return from void av_update_cur_dts()
Changelog: add more entries.
Changelog: update ffmpeg/avconv incompatibility list.
avconv: remove some redundant temporary variables.
avconv: fix broken indentation
avconv: move copy_initial_nonkeyframes to the options context.
avconv: use file:stream instead of file.stream in log messages.
doc/avconv: elaborate on basic functionality.
doc/avconv: -sample_fmts, not -help sample_fmts prints the sample formats
openssl: Only use CRYPTO_set_id_callback on OpenSSL < 1.0.0
Call avformat_network_init/deinit in the programs
Remove leftover includes of strings.h
avutil: Don't allow using strcasecmp/strncasecmp
Replace all usage of strcasecmp/strncasecmp
avstring: Add locale independent implementations of strcasecmp/strncasecmp
avstring: Add locale independent implementations of toupper/tolower
cosmetics: insert some spaces in explicit enum value assignments
move 8SVX audio codecs to the audio codec list part on the next bump
...
Conflicts:
avprobe.c
doc/APIchanges
ffplay.c
ffserver.c
libavcodec/avcodec.h
libavdevice/bktr.c
libavdevice/v4l.c
libavdevice/v4l2.c
libavformat/matroskaenc.c
libavformat/wtv.c
libavutil/avstring.c
libavutil/avstring.h
libavutil/avutil.h
libswscale/x86/swscale_template.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
Move id3v2 tag writing to a separate file.
swscale: add missing colons to x86 assembly yuv2planeX.
g722: split decoder and encoder into separate files
cosmetics: remove extra spaces before end-of-statement semi-colons
vorbisdec: check output buffer size before writing output
wavpack: calculate bpp using av_get_bytes_per_sample()
ac3enc: Set max value for mode options correctly
lavc: move get_b_cbp() from h263.h to mpeg4videoenc.c
mpeg12: move closed_gop from MpegEncContext to Mpeg1Context
mpeg12: move full_pel from MpegEncContext to Mpeg1Context
mpeg12: move Mpeg1Context from mpeg12.c to mpeg12.h
mpegvideo: remove some unused variables from MpegEncContext.
Conflicts:
libavcodec/mpeg12.c
libavformat/mp3enc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
id3v2: fix doxy comment - 'machine byte order' makes no sense on char arrays
VC1: restore mistakenly removed code
twinvq: check output buffer size before decoding
twinvq: return an error when the packet size is too small
lavf: export some forgotten symbols with non-av prefixes.
swscale: update altivec yuv2planeX asm to new per-plane API.
swscale: make yuv2yuvX_10_sse2/avx 8/9/16-bits aware.
yuv2planeX10 SIMD
swscale: decide whether to use yuv2plane1/X on a per-plane basis.
swscale: reintroduce full precision in 16-bit output.
Split up yuv2yuvX functions
Split out yuv2yuv1 luma and chroma in order to make them generic DSP functions
lavc: replace references to deprecated AVCodecContext.error_recognition to use AVCodecContext.err_recognition
lavc: translate non-flag-based er options into flag-based ef options at codec open
add -err_filter AVOptions to access flag-based error recognition
h264_weight: initialize "height" function argument properly.
presets: spelling error in libvpx 1080p50_60
avplay: fix fullscreen behaviour with SDL 1.2.14 on Mac OS X
Conflicts:
ffplay.c
libavformat/libavformat.v
libswscale/swscale.c
libswscale/x86/swscale_template.c
tests/ref/lavfi/pixfmts_scale
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (23 commits)
fix AC3ENC_OPT_MODE_ON/OFF
h264: fix HRD parameters parsing
prores: implement multithreading.
prores: idct sse2/sse4 optimizations.
swscale: use aligned move for storage into temporary buffer.
prores: extract idct into its own dspcontext and merge with put_pixels.
h264: fix invalid shifts in init_cavlc_level_tab()
intfloat_readwrite: fix signed addition overflows
mov: do not misreport empty stts
mov: cosmetics, fix for and if spacing
id3v2: fix NULL pointer dereference
mov: read album_artist atom
mov: fix disc/track numbers and totals
doc: fix references to obsolete presets directories for avconv/ffmpeg
flashsv: return more meaningful error value
flashsv: fix typo in av_log() message
smacker: validate channels and sample format.
smacker: check buffer size before reading output size
smacker: validate number of channels
smacker: Separate audio flags from sample rates in smacker demuxer.
...
Conflicts:
cmdutils.h
doc/ffmpeg.texi
libavcodec/Makefile
libavcodec/motion_est_template.c
libavformat/id3v2.c
libavformat/mov.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Speed: from 3.9x to 9.6x speed improvement over C, and some small
(up to 15%) speed improvements over existing MMX code (particularly
for bigger filters).