Commit Graph

22 Commits

Author SHA1 Message Date
Grzegorz Bernacki 8f4b000c37 lavc/aarch64: Add neon implementation for vsse_intra8
Provide optimized implementation for vsse_intra8 for arm64.

Performance tests are shown below.
- vsse_5_c: 87.7
- vsse_5_neon: 26.2

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-10-04 13:24:20 +03:00
Grzegorz Bernacki bad67cb9fd lavc/aarch64: Provide optimized implementation of vsse8 for arm64.
Provide optimized implementation of vsse8 for arm64.

Performance comparison tests are shown below.
- vsse_1_c: 141.5
- vsse_1_neon: 32.5

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-10-04 13:24:20 +03:00
Grzegorz Bernacki faea56c9c7 lavc/aarch64: Provide neon implementation of nsse8
Add vectorized implementation of nsse8 function.

Performance comparison tests are shown below.
- nsse_1_c: 256.0
- nsse_1_neon: 82.7

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-10-04 13:24:20 +03:00
Grzegorz Bernacki f401a2af21 lavc/aarch64: Add neon implementation for pix_abs8 functions.
Provide optimized implementation of pix_abs8 function for arm64.

Performance comparison tests are shown below:
pix_abs_1_1_c: 162.5
pix_abs_1_1_neon: 27.0
pix_abs_1_2_c: 174.0
pix_abs_1_2_neon: 23.5
pix_abs_1_3_c: 203.2
pix_abs_1_3_neon: 34.7

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-10-04 13:24:04 +03:00
Hubert Mazur b2732115dd lavc/aarch64: Add neon implementation for pix_median_abs8
Provide optimized implementation for pix_median_abs8 function.

Performance comparison tests are shown below.
- median_sad_1_c: 277.0
- median_sad_1_neon: 82.0

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-21 12:57:56 +03:00
Hubert Mazur e9a6170213 lavc/aarch64: Add neon implementation for vsad8_intra
Provide optimized implementation for vsad8_intra function.

Performance comparison tests are shown below.
- vsad_5_c: 94.7
- vsad_5_neon: 20.7

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-21 12:57:56 +03:00
Hubert Mazur 0ee535b1db lavc/aarch64: Add neon implementation for pix_median_abs16
Provide optimized implementation for pix_median_abs16 function.

Performance comparison tests are shown below.
 - median_sad_0_c: 720.5
 - median_sad_0_neon: 127.2

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-21 12:57:56 +03:00
Hubert Mazur 06b98e396a lavc/aarch64: Provide neon implementation of nsse16
Add vectorized implementation of nsse16 function.

Performance comparison tests are shown below.
- nsse_0_c: 682.2
- nsse_0_neon: 116.5

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Hubert Mazur 908abe8032 lavc/aarch64: Add neon implementation for vsse_intra16
Provide optimized implementation for vsse_intra16 for arm64.

Performance tests are shown below.
- vsse_4_c: 155.2
- vsse_4_neon: 36.2

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Hubert Mazur ce03ea3e79 lavc/aarch64: Add neon implementation for vsad_intra16
Provide optimized implementation for vsad_intra16 function for arm64.

Performance comparison tests are shown below.
- vsad_4_c: 177.5
- vsad_4_neon: 23.5

Benchmarks and tests are run with checkasm tool on AWS Gravtion 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Hubert Mazur c495a4b32d lavc/aarch64: Add neon implementation of vsse16
Provide optimized implementation of vsse16 for arm64.

Performance comparison tests are shown below.
- vsse_0_c: 257.7
- vsse_0_neon: 59.2

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Hubert Mazur 200f5e578f lavc/aarch64: Add neon implementation for vsad16
Provide optimized implementation of vsad16 function for arm64.

Performance comparison tests are shown below.
- vsad_0_c: 285.2
- vsad_0_neon: 39.5

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Hubert Mazur 70efa4d011 lavc/aarch64: Add neon implementation for pix_abs8
Provide optimized implementation of pix_abs8 function for arm64.

Performance comparison tests are shown below.
- pix_abs_1_0_c: 101.2
- pix_abs_1_0_neon: 22.5
- sad_1_c: 101.2
- sad_1_neon: 22.5

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Hubert Mazur 74312e80d7 lavc/aarch64: Add neon implementation for sse8
Provide optimized implementation of sse8 function for arm64.

Performance comparison tests are shown below.
- sse_1_c: 130.7
- sse_1_neon: 29.7

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Hubert Mazur a2e45ad407 lavc/aarch64: Add neon implementation for pix_abs16_y2
Provide optimized implementation of pix_abs16_y2 function for arm64.

Performance comparison tests are shown below.
pix_abs_0_2_c: 317.2
pix_abs_0_2_neon: 37.5

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Hubert Mazur d7abb7d143 lavc/aarch64: Add neon implementation for sse4
Provide neon implementation for sse4 function.

Performance comparison tests are shown below.
- sse_2_c: 80.7
- sse_2_neon: 31.0

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Hubert Mazur ad251fd262 lavc/aarch64: Add neon implementation for sse16
Provide neon implementation for sse16 function.

Performance comparison tests are shown below.
- sse_0_c: 268.2
- sse_0_neon: 43.5

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Martin Storsjö 60109d5b3d aarch64: me_cmp: Fix the indentation of function declarations
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Andreas Rheinhardt abb85429f3 avcodec/me_cmp: Constify me_cmp_func buffer parameters
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-07-31 03:31:53 +02:00
Hubert Mazur 01e190dc99 lavc/aarch64: Add pix_abs16_x2 neon implementation
Provide neon implementation for pix_abs16_x2 function.

Performance tests of implementation are below.
 - pix_abs_0_1_c: 283.5
 - pix_abs_0_1_neon: 39.0

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-07-13 23:25:22 +03:00
Hubert Mazur eb7ab3928f lavc/aarch64: Hook up the existing ff_pix_abs16_neon to the sad[0] function pointer
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-07-11 23:58:28 +03:00
Swinney, Jonathan c471cc7474 lavc/aarch64: motion estimation functions in neon
- ff_pix_abs16_neon
 - ff_pix_abs16_xy2_neon

In direct micro benchmarks of these ff functions verses their C implementations,
these functions performed as follows on AWS Graviton 3.

ff_pix_abs16_neon:
pix_abs_0_0_c: 141.1
pix_abs_0_0_neon: 19.6

ff_pix_abs16_xy2_neon:
pix_abs_0_3_c: 269.1
pix_abs_0_3_neon: 39.3

Tested with:
./tests/checkasm/checkasm --test=motion --bench --disable-linux-perf

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-06-28 00:51:39 +03:00