Commit Graph

13 Commits

Author SHA1 Message Date
Mikhail Nitenko 756d2e087a lavc/aarch64: move transpose_4x8H to neon.S
transpose_4x8H was declared in vp9lpf_16bpp_neon, however this macro is
not unique to vp9 and could be used elsewhere.

Signed-off-by: Mikhail Nitenko <mnitenko@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2021-08-21 00:06:26 +03:00
Derek Buitenhuis 87b8e95008 Merge commit 'cdb1665f70def544ddab3e3ed3763ef99c8b3873'
* commit 'cdb1665f70def544ddab3e3ed3763ef99c8b3873':
  aarch64: Make transpose_4x4H do a regular transpose

Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-04-24 12:51:42 +01:00
Martin Storsjö cdb1665f70 aarch64: Make transpose_4x4H do a regular transpose
Previously, ff_h264_idct_add_neon (originally in the arm version) used
a non-regular transpose in order to be able to use more instructions
that deal with registers as 128 bit register pairs. The aarch64
translation doesn't do it to the same extent, but brought along the
same structure since it was a straight translation.

This reshuffles ff_h264_idct_add_neon, bringing it closer to
the C implementation, making the transpose_4x4H macro do a regular
transpose, usable for other algorithms as well.

Previously, the third and fourth output from transpose_4x4H were
swapped, and prior to cc29d96d5a, the same inputs as well. In
addition to just swapping the outputs, also renumber the intermediate
registers for better readability (making the register order match
transpose_4x8B).

This runs with the same number of cycles as before.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-03-26 21:25:56 +02:00
Janne Grunau cc29d96d5a arm64: fix inverted register order in transpose_4x4H
Fix related register order issue in ff_h264_idct_add_neon.

Found-by: zjh8890 <243186085@qq.com>
2015-12-21 13:44:20 +01:00
Janne Grunau 2dba0407fd avcodec/arm64: fix inverted register order in transpose_4x4H
Fix related register order issue in ff_h264_idct_add_neon.

Found-by: zjh8890 <243186085@qq.com>

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-12-19 03:58:46 +01:00
Michael Niedermayer 95b59bfb9d Revert "avcodec/aarch64/neon.S: Update neon.s for transpose_4x4H"
The change was not correct and broke H264

This reverts commit cd83f899c9.
2015-12-17 21:26:37 +01:00
zjh8890 c18176bd55 avcodec/aarch64/neon.S: Update neon.s for transpose_4x4H
The transpose_4x4H is wrong which cost me much time to find this bug. The orders of r2 and r3 are wrong,
this bug waste me much time while I make aarch64 arm instruction which used the function.
2015-12-12 14:20:01 +01:00
Michael Niedermayer bf0470a5be Merge commit '36e3b1f2fd262028834a9d7b1eb533c1218ee6c2'
* commit '36e3b1f2fd262028834a9d7b1eb533c1218ee6c2':
  aarch64: h264 loop filter NEON optimizations

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-15 15:27:26 +01:00
Michael Niedermayer 19fc3c0122 Merge commit 'd5dd8c7bf0f0d77c581db3236e0d938f06fd5591'
* commit 'd5dd8c7bf0f0d77c581db3236e0d938f06fd5591':
  aarch64: h264 qpel NEON optimizations

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-15 15:13:41 +01:00
Michael Niedermayer fb1c786a9d Merge commit '8438b3f09f6b225d0886cc385117c38eb44ca0c1'
* commit '8438b3f09f6b225d0886cc385117c38eb44ca0c1':
  aarch64: h264 idct NEON assembler optimizations

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-15 15:06:47 +01:00
Janne Grunau 36e3b1f2fd aarch64: h264 loop filter NEON optimizations
Ported from ARMv7 NEON.
2014-01-15 12:31:04 +01:00
Janne Grunau d5dd8c7bf0 aarch64: h264 qpel NEON optimizations
Ported from ARMv7 NEON.
2014-01-15 12:17:49 +01:00
Janne Grunau 8438b3f09f aarch64: h264 idct NEON assembler optimizations
Ported from ARMv7 NEON.
2014-01-15 12:13:41 +01:00