x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). So given that the only systems that
benefit from these functions are truely ancient 32bit x86s
they are removed.
Moreover, some of the removed code was buggy/not bitexact
and lead to failures involving the f32le and f32be versions of
gray, gbrp and gbrap on x86-32 when SSE2 was not disabled.
See e.g.
https://fate.ffmpeg.org/report.cgi?time=20220609221253&slot=x86_32-debian-kfreebsd-gcc-4.4-cpuflags-mmx
Notice that yuv2yuvX_mmx is not removed, because it is used
by SSE3 and AVX2 as fallback in case of unaligned data and
also for tail processing. I don't know why yuv2yuvX_mmxext
isn't being used for this; an earlier version [1] of
554c2bc708 used it, but
the version that was eventually applied does not.
[1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2020-November/272124.html
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The implementation is pretty straight-forward. Most of the existing
NV12 codepaths work regardless of subsampling and are re-used as is.
Where necessary I wrote the slightly different NV24 versions.
Finally, the one thing that confused me for a long time was the
asm specific x86 path that did an explicit exclusion check for NV12.
I replaced that with a semi-planar check and also updated the
equivalent PPC code, which Lauri kindly checked.
* commit 'be923ed659016350592acb9b3346f706f8170ac5':
x86: fmtconvert: port to cpuflags
x86: MMX2 ---> MMXEXT in macro names
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
lavr: fix handling of custom mix matrices
fate: force pix_fmt in lagarith-rgb32 test
fate: add tests for lagarith lossless video codec.
ARMv6: vp8: fix stack allocation with Apple's assembler
ARM: vp56: allow inline asm to build with clang
fft: 3dnow: fix register name typo in DECL_IMDCT macro
x86: dct32: port to cpuflags
x86: build: replace mmx2 by mmxext
Revert "wmapro: prevent division by zero when sample rate is unspecified"
wmapro: prevent division by zero when sample rate is unspecified
lagarith: fix color plane inversion for YUY2 output.
lagarith: pad RGB buffer by 1 byte.
dsputil: make add_hfyu_left_prediction_sse4() support unaligned src.
Conflicts:
doc/APIchanges
libavcodec/lagarith.c
libavfilter/x86/gradfun.c
libavutil/cpu.h
libavutil/version.h
libswscale/utils.c
libswscale/version.h
libswscale/x86/yuv2rgb.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
* qatar/master: (36 commits)
adpcmenc: Use correct frame_size for Yamaha ADPCM.
avcodec: add ff_samples_to_time_base() convenience function to internal.h
adx parser: set duration
mlp parser: set duration instead of frame_size
gsm parser: set duration
mpegaudio parser: set duration instead of frame_size
(e)ac3 parser: set duration instead of frame_size
flac parser: set duration instead of frame_size
avcodec: add duration field to AVCodecParserContext
avutil: add av_rescale_q_rnd() to allow different rounding
pnmdec: remove useless .pix_fmts
libmp3lame: support float and s32 sample formats
libmp3lame: renaming, rearrangement, alignment, and comments
libmp3lame: use the LAME default bit rate
libmp3lame: use avpriv_mpegaudio_decode_header() for output frame parsing
libmp3lame: cosmetics: remove some pointless comments
libmp3lame: convert some debugging code to av_dlog()
libmp3lame: remove outdated comment.
libmp3lame: do not set coded_frame->key_frame.
libmp3lame: improve error handling in MP3lame_encode_init()
...
Conflicts:
doc/APIchanges
libavcodec/libmp3lame.c
libavcodec/pcxenc.c
libavcodec/pnmdec.c
libavcodec/pnmenc.c
libavcodec/sgienc.c
libavcodec/utils.c
libavformat/hls.c
libavutil/avutil.h
libswscale/x86/swscale_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Revert "swscale: update context offsets after removal of AlpMmxFilter."
(commit a95e3fa90b)
and
Revert "swscale: Remove some write-only variables related to alpha handling."
(commit 9d03cb9fc5).
They broke alpha handling - it's the evil inline asm that still uses that
variable, so it's not truely write-only.
* qatar/master: (22 commits)
als: prevent infinite loop in zero_remaining().
cook: prevent div-by-zero if channels is zero.
pamenc: switch to encode2().
svq1enc: switch to encode2().
dvenc: switch to encode2().
dpxenc: switch to encode2().
pngenc: switch to encode2().
v210enc: switch to encode2().
xwdenc: switch to encode2().
ttadec: use branchless unsigned-to-signed unfolding
avcodec: add a Sun Rasterfile encoder
sunrast: Move common defines to a new header file.
cdxl: fix video decoding for some files
cdxl: fix audio for some samples
apetag: add proper support for binary tags
ttadec: remove dead code
swscale: make access to filter data conditional on filter type.
swscale: update context offsets after removal of AlpMmxFilter.
prores: initialise encoder and decoder parts only when needed
swscale: make monowhite/black RGB-independent.
...
Conflicts:
Changelog
libavcodec/alsdec.c
libavcodec/dpxenc.c
libavcodec/golomb.h
libavcodec/pamenc.c
libavcodec/pngenc.c
libavformat/img2.c
libswscale/output.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
swscale: make yuv2yuv1 use named registers.
h264: mark h264_idct_add8_10 with number of XMM registers.
swscale: fix V plane memory location in bilinear/unscaled RGB/YUYV case.
vp8: always update next_framep[] before returning from decode_frame().
avconv: estimate next_dts from framerate if it is set.
avconv: better next_dts usage.
avconv: rename InputStream.pts to last_dts.
avconv: reduce overloading for InputStream.pts.
avconv: rename InputStream.next_pts to next_dts.
avconv: rework -t handling for encoding.
avconv: set encoder timebase for subtitles.
pva-demux test: add -vn
swscale: K&R formatting cosmetics for SPARC code
apedec: allow the user to set the maximum number of output samples per call
apedec: do not unnecessarily zero output samples for mono frames
apedec: allocate a single flat buffer for decoded samples
apedec: use sizeof(field) instead of sizeof(type)
swscale: split C output functions into separate file.
swscale: Split C input functions into separate file.
bytestream: Add bytestream2 writing API.
The avconv changes are due to massive regressions and bugs not merged yet.
Conflicts:
ffmpeg.c
libavcodec/vp8.c
libswscale/swscale.c
libswscale/x86/swscale_template.c
tests/fate/demux.mak
tests/ref/lavf/asf
tests/ref/lavf/avi
tests/ref/lavf/mkv
tests/ref/lavf/mpg
tests/ref/lavf/nut
tests/ref/lavf/ogg
tests/ref/lavf/rm
tests/ref/lavf/ts
tests/ref/seek/lavf_avi
tests/ref/seek/lavf_mkv
tests/ref/seek/lavf_rm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (22 commits)
rv34: frame-level multi-threading
mpegvideo: claim ownership of referenced pictures
aacsbr: prevent out of bounds memcpy().
ipmovie: fix pts for CODEC_ID_INTERPLAY_DPCM
sierravmd: fix audio pts
bethsoftvideo: Use bytestream2 functions to prevent buffer overreads.
bmpenc: support for PIX_FMT_RGB444
swscale: fix crash in fast_bilinear code when compiled with -mred-zone.
swscale: specify register type.
rv34: use get_bits_left()
avconv: reinitialize the filtergraph on resolution change.
vsrc_buffer: error on changing frame parameters.
avconv: fix -copyinkf.
fate: Update file checksums after the mov muxer change in a78dbada55
movenc: Don't store a nonzero creation time if nothing was set by the caller
bmpdec: support for rgb444 with bitfields compression
rgb2rgb: allow conversion for <15 bpp
doc: fix stray reference to FFmpeg
v4l2: use C99 struct initializer
v4l2: poll the file descriptor
...
Conflicts:
avconv.c
libavcodec/aacsbr.c
libavcodec/bethsoftvideo.c
libavcodec/kmvc.c
libavdevice/v4l2.c
libavfilter/vsrc_buffer.c
libswscale/swscale_unscaled.c
libswscale/x86/input.asm
tests/ref/acodec/alac
tests/ref/acodec/pcm_s16be
tests/ref/acodec/pcm_s24be
tests/ref/acodec/pcm_s32be
tests/ref/acodec/pcm_s8
tests/ref/lavf/mov
tests/ref/vsynth1/dnxhd_1080i
tests/ref/vsynth1/mpeg4
tests/ref/vsynth1/qtrle
tests/ref/vsynth1/svq1
tests/ref/vsynth2/dnxhd_1080i
tests/ref/vsynth2/mpeg4
tests/ref/vsynth2/qtrle
tests/ref/vsynth2/svq1
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
fate: Add tests for more AAC features.
aacps: Add missing newline in error message.
fate: Add tests for vc1/wmapro in ism.
aacdec: Add a fate test for 5.1 channel SBR.
aacdec: Turn off PS for multichannel files that use PCE based configs.
cabac: remove put_cabac_u/ueg from cabac-test.
swscale: RGB4444 and BGR444 input
FATE: add test for xWMA demuxer.
FATE: add test for SMJPEG demuxer and associated IMA ADPCM audio decoder.
mpegaudiodec: optimized iMDCT transform
mpegaudiodec: change imdct window arrangment for better pointer alignment
mpegaudiodec: move imdct and windowing function to mpegaudiodsp
mpegaudiodec: interleave iMDCT buffer to simplify future SIMD implementations
swscale: convert yuy2/uyvy/nv12/nv21ToY/UV from inline asm to yasm.
FATE: test to exercise WTV demuxer.
mjpegdec: K&R formatting cosmetics
swscale: K&R formatting cosmetics for code examples
swscale: K&R reformatting cosmetics for header files
FATE test: cvid-grayscale; ensures that the grayscale Cinepak variant is exercised.
Conflicts:
libavcodec/cabac.c
libavcodec/mjpegdec.c
libavcodec/mpegaudiodec.c
libavcodec/mpegaudiodsp.c
libavcodec/mpegaudiodsp.h
libavcodec/mpegaudiodsp_template.c
libavcodec/x86/Makefile
libavcodec/x86/imdct36_sse.asm
libavcodec/x86/mpegaudiodec_mmx.c
libswscale/swscale-test.c
libswscale/swscale.c
libswscale/swscale_internal.h
libswscale/x86/swscale_template.c
tests/fate/demux.mak
tests/fate/microsoft.mak
tests/fate/video.mak
tests/fate/wma.mak
tests/ref/lavfi/pixfmts_scale
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (23 commits)
x86inc: use sse versions of common macros instead of sse2 when applicable
doc/APIchanges: add missing dates and hashes
lavf: don't return from void av_update_cur_dts()
Changelog: add more entries.
Changelog: update ffmpeg/avconv incompatibility list.
avconv: remove some redundant temporary variables.
avconv: fix broken indentation
avconv: move copy_initial_nonkeyframes to the options context.
avconv: use file:stream instead of file.stream in log messages.
doc/avconv: elaborate on basic functionality.
doc/avconv: -sample_fmts, not -help sample_fmts prints the sample formats
openssl: Only use CRYPTO_set_id_callback on OpenSSL < 1.0.0
Call avformat_network_init/deinit in the programs
Remove leftover includes of strings.h
avutil: Don't allow using strcasecmp/strncasecmp
Replace all usage of strcasecmp/strncasecmp
avstring: Add locale independent implementations of strcasecmp/strncasecmp
avstring: Add locale independent implementations of toupper/tolower
cosmetics: insert some spaces in explicit enum value assignments
move 8SVX audio codecs to the audio codec list part on the next bump
...
Conflicts:
avprobe.c
doc/APIchanges
ffplay.c
ffserver.c
libavcodec/avcodec.h
libavdevice/bktr.c
libavdevice/v4l.c
libavdevice/v4l2.c
libavformat/matroskaenc.c
libavformat/wtv.c
libavutil/avstring.c
libavutil/avstring.h
libavutil/avutil.h
libswscale/x86/swscale_template.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>