ffmpeg/libswresample
Ganesh Ajjanagadde 9bec6d71a2 swresample/resample: speed up build_filter by 50%
This speeds up build_filter by ~ 50%. This gain should be pretty
consistent across all architectures and platforms.

Essentially, this relies on the observation that the filters have some
even/odd symmetry that may be exploited during the construction of the
polyphase filter bank. In particular, phases (scaled to [0, 1]) in [0.5, 1] are
easily derived from those in [0, 0.5], so expensive reevaluation of function
points is unnecessary. This requires some rather annoying even/odd
bookkeeping, as can be seen from the patch.
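
The mirroring idea can be sketched as follows. This is a minimal
illustration, not the code from the patch: kernel(), build_naive(),
build_symmetric() and max_abs_diff() are hypothetical names, the kernel is
a simple even function standing in for the real windowed sinc, and only an
even tap count is handled (the odd case is where the extra bookkeeping
comes in).

```c
#include <assert.h>

/* Hypothetical even kernel standing in for the windowed sinc:
 * kernel(-t, w) == kernel(t, w) for every t. */
static double kernel(double t, double half_width)
{
    double u = t / half_width;
    double w = 1.0 - u * u;          /* parabolic window, zero outside */
    return w > 0.0 ? w / (1.0 + t * t) : 0.0;
}

/* Reference: evaluate the kernel for every tap of every phase.
 * bank holds phases * taps coefficients, one row per phase. */
static void build_naive(double *bank, int phases, int taps)
{
    int center = (taps - 1) / 2;

    for (int ph = 0; ph < phases; ph++)
        for (int i = 0; i < taps; i++)
            bank[ph * taps + i] =
                kernel(i - center - (double)ph / phases, taps / 2.0);
}

/* Evaluate only phases 0 .. phases/2, then mirror the rest.
 * Assumes an even tap count and a phase count that is 1 or even. */
static void build_symmetric(double *bank, int phases, int taps)
{
    int center = (taps - 1) / 2;

    assert(taps % 2 == 0);
    assert(phases == 1 || phases % 2 == 0);

    /* Direct evaluation for the first half (both end points included). */
    for (int ph = 0; ph <= phases / 2; ph++)
        for (int i = 0; i < taps; i++)
            bank[ph * taps + i] =
                kernel(i - center - (double)ph / phases, taps / 2.0);

    /* Phase (phases - ph) is phase ph with the tap order reversed,
     * because the kernel is even. */
    for (int ph = 1; ph < phases / 2; ph++)
        for (int i = 0; i < taps; i++)
            bank[(phases - ph) * taps + (taps - 1 - i)] =
                bank[ph * taps + i];
}

/* Largest absolute difference between two coefficient arrays. */
static double max_abs_diff(const double *a, const double *b, int n)
{
    double worst = 0.0;

    for (int i = 0; i < n; i++) {
        double d = a[i] - b[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

In this sketch the kernel is evaluated for phases/2 + 1 rows instead of
phases rows, which is where a saving of roughly half would come from.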

I vaguely recall from signal processing theory that more general
symmetries allow even greater optimization of the construction. At a
high level, "even functions" correspond to a symmetry of order 2, and one
can imagine variations. Nevertheless, for the sake of some generality and
because of existing filters, this is all that is being exploited.

Currently, this patch relies on phase_count being even or (trivially) 1,
though this is not an inherent limitation to the approach. This
assumption is safe as phase_count is 1 << phase_bits, and is hence a
power of two. There is no way for user API to set it to a nontrivial odd
number. This assumption has been placed as an assert in the code.
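
As a rough illustration of that invariant (a sketch only:
phase_count_from_bits() is a hypothetical helper, not a function in
libswresample, and the bound on phase_bits is an assumption made here to
keep the shift well defined):

```c
#include <assert.h>

/* Hypothetical helper: derive the phase count from the number of phase
 * bits and check the property the symmetric construction relies on. */
static int phase_count_from_bits(int phase_bits)
{
    int phase_count;

    /* Assumed bound so that the shift below stays within int range. */
    assert(phase_bits >= 0 && phase_bits < 31);

    phase_count = 1 << phase_bits;

    /* A power of two is either 1 or even, never a nontrivial odd number. */
    assert(phase_count == 1 || phase_count % 2 == 0);

    return phase_count;
}
```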

To repeat, this assumes even symmetry of the filters, which is the most common
way to get generalized linear phase anyway and is true of all currently
supported filters.

As a side note, accuracy should be identical or perhaps slightly better,
since this "forces" the filter symmetries, which leads to a better phase
characteristic. As before, I can't test this claim easily, though it may
be of interest.

Patch tested with FATE.

Sample benchmark (x86-64, Haswell, GNU/Linux):

test: swr-resample-dblp-44100-2626

new:
527376779 decicycles in build_filter(loop 1000),     256 runs,      0 skips
524361765 decicycles in build_filter(loop 1000),     512 runs,      0 skips
516552574 decicycles in build_filter(loop 1000),    1024 runs,      0 skips

old:
974178658 decicycles in build_filter(loop 1000),     256 runs,      0 skips
972794408 decicycles in build_filter(loop 1000),     512 runs,      0 skips
954350046 decicycles in build_filter(loop 1000),    1024 runs,      0 skips
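
To put the figures above in relative terms, the saving can be computed as
below. percent_saved() is a hypothetical helper written for this note; for
the 1024-run rows it gives roughly 46% fewer decicycles (about a 1.85x
speedup), in line with the "~50%" quoted above.

```c
/* Hypothetical helper: percentage of time saved when the cost drops
 * from old_cycles to new_cycles. */
static double percent_saved(double old_cycles, double new_cycles)
{
    return 100.0 * (old_cycles - new_cycles) / old_cycles;
}
```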

Note that lower-level optimizations are entirely possible; I focused on
getting the high-level semantics correct. In any case, this should
provide a good foundation.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-11-04 17:05:57 -05:00
..
aarch64
arm
x86 x86/audio_convert: fix clobbering of xmm registers 2015-10-01 22:40:50 -03:00
audioconvert.c
audioconvert.h
dither_template.c
dither.c
libswresample.v
log2_tab.c
Makefile
noise_shaping_data.c
options.c swr: do not reject channel layouts that use channel 63 2015-10-28 19:25:49 +01:00
rematrix_template.c
rematrix.c swresample: slightly nicer debug output for auto matrix 2015-10-15 20:16:13 +02:00
resample_dsp.c
resample_template.c
resample.c swresample/resample: speed up build_filter by 50% 2015-11-04 17:05:57 -05:00
resample.h
soxr_resample.c
swresample_frame.c
swresample_internal.h swresample/swresample_internal: add av_warn_unused_result 2015-10-15 22:27:23 -04:00
swresample-test.c all: add const-correctness to qsort comparators 2015-10-25 10:07:20 -04:00
swresample.c
swresample.h doc/resampler, swresample/options: use proper capitalization 2015-10-10 20:49:54 +02:00
swresampleres.rc
version.h