ffmpeg

Commit Graph

Author	SHA1	Message	Date
Lynne	a56d7e0ca3	lavu/tx: add DCT-III implementation	2022-11-24 15:58:36 +01:00
Lynne	504b7bec1a	lavu/tx: add DCT-II implementation	2022-11-24 15:58:35 +01:00
Lynne	43d285a40f	lavu/tx: fix last coefficient scaling for R2C transforms This was a typo.	2022-11-24 15:58:35 +01:00
Lynne	8547123f3b	lavu/tx: generalize PFA FFTs This commit permits any stacking of FFTs of any size.	2022-11-24 15:58:34 +01:00
Lynne	87bae6b018	lavu/tx: refactor to explicitly track and convert lookup table order Necessary for generalizing PFAs.	2022-11-24 15:58:34 +01:00
Lynne	6ddd10c3e2	lavu/tx: allow codelets to specify a minimum number of matching factors	2022-11-24 15:58:33 +01:00
Lynne	fbe4fd992f	lavu/tx: support output stride in naive transforms Allows them to be used in general PFAs.	2022-11-24 15:58:31 +01:00
Lynne	68cabf8750	lavu/tx: add fft_inplace_small transforms This is much faster than the loop.	2022-11-24 15:58:30 +01:00
Lynne	fff3e1d848	lavu/tx: support out-of-place transforms in fft_inplace This makes testing easier, as a unified path can be used for in/out of place transforms.	2022-11-24 15:58:30 +01:00
Lynne	d260796f11	lavu/tx: make C ptwo transforms in+out of place We assume that _all_ in-place transforms can operate out of place, which isn't true, because the C ptwo transforms were always in-place (dst).	2022-11-24 15:58:29 +01:00
Lynne	37008dc402	lavu/tx: add naive_small FFT The same as naive but with precomputed tables. Makes it more useful for odd factors we don't support yet.	2022-11-24 15:58:29 +01:00
Lynne	e8a9b7b298	lavu/tx: list all odd-length FFT factors as regular codelets Allows them to be picked just like any other transform.	2022-11-24 15:58:28 +01:00
Lynne	45bd4bf79f	lavu/tx: generalize single-factor transforms Not that useful, but it gives us fast small odd-length transforms.	2022-11-24 15:58:28 +01:00
Lynne	79f11e2409	lavu/tx: make prime factor transforms truly in-place They all overwrote in[0] and then used it as a DC.	2022-11-24 15:58:28 +01:00
Andreas Rheinhardt	f8efd890bf	avutil/tx_template: Move function pointers to const memory This can be achieved by moving the AVOnce out of the structure containing the function pointers; the latter can then be made const. This also has the advantage of eliminating padding in the structure (sizeof(AVOnce) is four here) and allowing the AVOnces to be put into .bss (depending upon the implementation). Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-28 09:30:10 +02:00
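Here is a sketch of the layout change described above. It assumes POSIX pthread_once_t as the once type; on pthread builds AVOnce is expected to map to it, but treat that as an assumption. Keeping only function pointers in a const array lets the compiler place them in read-only memory. The zero-initialized once flags can then live in a separate, writable array. The tables and init functions are hypothetical.

```c
#include <pthread.h>

/* Hypothetical lookup tables and their generators. */
static int table_sq[8], table_dbl[8];
static void init_sq(void)  { for (int i = 0; i < 8; i++) table_sq[i] = i * i; }
static void init_dbl(void) { for (int i = 0; i < 8; i++) table_dbl[i] = 2 * i; }

/* Function pointers only: the array is const, so it can be placed in
 * read-only memory, and there is no padding next to a once flag. */
static const struct {
    void (*func)(void);
} init_tabs[] = {
    { init_sq  },
    { init_dbl },
};

/* Once flags kept separately; if PTHREAD_ONCE_INIT is all-zero bits,
 * this array may end up in .bss (implementation-dependent). */
static pthread_once_t init_once[] = {
    PTHREAD_ONCE_INIT,
    PTHREAD_ONCE_INIT,
};

/* Thread-safe, run-once initialization of table idx. */
void init_table(int idx)
{
    pthread_once(&init_once[idx], init_tabs[idx].func);
}
```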
Andreas Rheinhardt	188216581b	avutil/tx_template: Avoid code duplication Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-28 09:30:10 +02:00
Andreas Rheinhardt	2af5f55b2e	avutil/tx_template: Don't waste space for inexistent factors It is possible to avoid the factors array for the power-of-two tables for which said array is unused by using a different structure for initialization for power-of-two tables than for non-power-of-two-tables. This saves 31516B from .data. Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-28 09:29:41 +02:00
Lynne	ace42cf581	x86/tx_float: add 15xN PFA FFT AVX SIMD ~4x faster than the C version. The shuffles in the 15pt dim1 are seriously expensive. Not happy with it, but I'm content. Can be easily converted to pure AVX by removing all vpermpd/vpermps instructions.	2022-09-23 12:35:27 +02:00
Lynne	7e7baf8ab8	lavu/tx: do not steal lookup tables of subcontexts in the iMDCT As it happens, some still need their contexts.	2022-09-23 12:33:31 +02:00
Lynne	f1b35fc8f0	lavu/tx: remove av_cold from table definitions How did this get here?	2022-09-11 03:18:40 +02:00
Lynne	c92edd969a	lavu/tx: rotate 3 & 15-point exptabs This just inverts their signs. Simplifies SIMD.	2022-09-10 02:37:17 +02:00
Lynne	51172223fd	lavu/tx: generalize MDCTs The same code can perform any-length MDCTs with minimal changes.	2022-09-10 02:37:16 +02:00
Lynne	645a1f4422	lavu/tx: add the inplace flag to PFA FFTs They support in-place, because they have to use a temporary buffer.	2022-09-10 02:37:14 +02:00
Lynne	ae66a9db7b	lavu/tx: optimize and simplify inverse MDCTs Convert the input from a scatter to a gather instead, which is faster and better for SIMD. Also, add a pre-shuffled exptab version to avoid gathering there at all. This doubles the exptab size, but the speedup makes it worth it. In SIMD, the exptab will likely be purged to a higher cache anyway because of the FFT in the middle, and the amount of loads stays identical. For a 960-point inverse MDCT, the speedup is 10%. This makes it possible to write sane and fast SIMD versions of inverse MDCTs.	2022-08-16 01:22:38 +02:00
Lynne	af94ab7c7c	lavu/tx: add an RDFT implementation RDFTs are full of conventions that vary between implementations. What I've gone for here is what's most common among fftw, avcodec's rdft and what we use, the equivalent of which is DFT_R2C for forward and IDFT_C2R for inverse. The other 2 conventions (IDFT_R2C and DFT_C2R) were not used at all in our code, and their names are also not appropriate. If there's a use for either, we can easily add a flag which would just flip the sign on one exptab. For some unknown reason, possibly to allow reusing FFT's exp tables, av_rdft's C2R output is 0.5x lower than what it should be to ensure a proper back-and-forth conversion. This code outputs its real samples at the correct level, which matches FFTW's level, and allows the user to change the level and insert arbitrary multiplies for free by setting the scale option.	2022-01-26 04:12:46 +01:00
Lynne	ef4bd81615	lavu/tx: rewrite internal code as a tree-based codelet constructor This commit rewrites the internal transform code into a constructor that stitches transforms (codelets). This allows for transforms to reuse arbitrary parts of other transforms, and allows transforms to be stacked onto one another (such as a full iMDCT using a half-iMDCT which in turn uses an FFT). It also permits for each step to be individually replaced by assembly or a custom implementation (such as an ASIC).	2022-01-26 04:12:44 +01:00
Lynne	1978b143eb	checkasm: add av_tx FFT SIMD testing code This sadly required making changes to the code itself, due to the same context needing to be reused for both versions. The lookup table had to be duplicated for both versions.	2021-04-24 17:19:17 +02:00
Lynne	0072a42388	lavu/tx: add full-sized iMDCT transform flag	2021-04-24 17:17:27 +02:00
Lynne	8c55c82583	lavu/tx: add a 9-point FFT and (i)MDCT	2021-04-24 17:17:25 +02:00
Lynne	bd9ea917a3	lavu/tx: add a 7-point FFT and (i)MDCT	2021-04-24 17:17:23 +02:00
Lynne	89da62f2fc	lavu/tx: refactor power-of-two FFT This commit refactors the power-of-two FFT, making it faster and halving the size of all tables, making the code much smaller on all systems. This removes the big/small pass split, because on modern systems the "big" pass is always faster, and even on older machines there is no measurable speed difference.	2021-04-24 17:17:20 +02:00
Lynne	e20a39a375	lavu/tx: do not invert permutes on MDCTs	2021-02-27 05:01:17 +01:00
Lynne	8e94b7cff0	lavu/tx: invert permutation lookups out[lut[i]] = in[i] lookups were 4.04 times(!) slower than out[i] = in[lut[i]] lookups for an out-of-place FFT of length 4096. The permutes remain unchanged for anything but out-of-place monolithic FFT, as those benefit quite a lot from the current order (it means there's only 1 lookup necessary to add to an offset, rather than a full gather). The code was based around non-power-of-two FFTs, so this wasn't benchmarked early on.	2021-02-27 04:21:05 +01:00
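The two loop forms being compared can be sketched as follows. Inverting the lookup table once at init time lets the gather form produce the same result as the scatter form. The commit does not say why gathering was faster; purely sequential stores are one common explanation. The helper names are hypothetical.

```c
#include <stddef.h>

/* Scatter: out[lut[i]] = in[i]. Reads are sequential, writes jump around. */
void permute_scatter(float *out, const float *in, const int *lut, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[lut[i]] = in[i];
}

/* Gather: out[i] = in[inv[i]]. Writes are sequential, reads jump around. */
void permute_gather(float *out, const float *in, const int *inv, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[inv[i]];
}

/* Build the inverse table once so the gather form reproduces the scatter. */
void invert_lut(int *inv, const int *lut, size_t n)
{
    for (size_t i = 0; i < n; i++)
        inv[lut[i]] = (int)i;
}
```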
Lynne	10341743d2	lavu/tx: require output argument to match input for inplace transforms This simplifies some assembly code by a lot, by either saving a branch or saving an entire duplicated function.	2021-02-26 05:42:24 +01:00
Lynne	5ca40d6d94	lavu/tx: support in-place FFT transforms This commit adds support for in-place FFT transforms. Since our internal transforms were all in-place anyway, this only changes the permutation on the input. Unfortunately, research papers were of no help here. All focused on dry hardware implementations, where permutes are free, or on software implementations where binary bloat is of no concern, so storing dozens of copies of the transforms for each permutation and version is not considered bad practice. Still, for a pure C implementation, it's only around 28% slower than the multi-megabyte FFTW3 in unaligned mode. Unlike a closed permutation like with PFA, split-radix FFT bit-reversals contain multiple NOPs, multiple simple swaps, and a few chained swaps, so regular single-loop single-state permute loops were not possible. Instead, we filter out parts of the input indices which are redundant. This allows for a single branch, and with some clever AVX512 asm, could possibly be SIMD'd without refactoring. The inplace_idx array is guaranteed to never be larger than the revtab array, and in practice only requires around log2(len) entries. The power-of-two MDCTs can be done in-place as well. And it's possible to eliminate a copy in the compound MDCTs too, however it'll be slower than doing them out of place, and we'd need to dirty the input array.	2021-02-21 17:05:16 +01:00
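One standard way to permute in place without a full scratch buffer is the cycle-leader technique, sketched here for a permutation given in gather form (out[i] = in[lut[i]]). At init time, record one index per non-trivial cycle and drop fixed points. At run time, follow each cycle from that index. This is a generic technique, not the lavu/tx inplace_idx code, and all names are hypothetical.

```c
#include <stddef.h>

/* Init: store the smallest index of every cycle of length >= 2.
 * Fixed points (lut[i] == i) need no work and are filtered out.
 * Returns how many leaders were written. */
size_t find_cycle_leaders(size_t *leaders, const int *lut, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = (size_t)lut[i];
        while (j > i)              /* stop on returning to i or seeing a smaller index */
            j = (size_t)lut[j];
        if (j == i && (size_t)lut[i] != i)
            leaders[count++] = i;
    }
    return count;
}

/* Run time: apply data[i] = old_data[lut[i]] in place, one cycle at a time. */
void permute_inplace(float *data, const int *lut,
                     const size_t *leaders, size_t nb_leaders)
{
    for (size_t c = 0; c < nb_leaders; c++) {
        size_t start = leaders[c], j = start;
        float first = data[start];         /* saved before it is overwritten */
        while ((size_t)lut[j] != start) {
            data[j] = data[lut[j]];
            j = (size_t)lut[j];
        }
        data[j] = first;                   /* close the cycle */
    }
}
```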
James Almer	f6477ac9f4	avutil/tx: use ENOSYS instead of ENOTSUP It's the standard error code used across the codebase to signal unimplemented or unsupported features. Signed-off-by: James Almer <jamrial@gmail.com>	2021-01-13 23:02:47 -03:00
Lynne	06a8596825	lavu: support arbitrary-point FFTs and all even (i)MDCT transforms This patch adds support for arbitrary-point FFTs and all even MDCT transforms. Odd MDCTs are not supported yet as they're based on the DCT-II and DCT-III and they're very niche. With this we can now write tests.	2021-01-13 17:34:13 +01:00
Lynne	2465fe1302	lavu/tx: add 2-point FFT transform By itself, this allows 6-point, 10-point and 30-point transforms. When the 9-point transform is added it allows for 18-point FFT, and also for a 36-point MDCT (used by MP3).	2020-03-23 21:26:25 +00:00
Lynne	e1c84856bb	lavu/tx: improve 3-point fixed precision There's just no reason not to when it's so easy (albeit messy), and it's also reducing the precision of all non-power-of-two transforms that use it.	2020-02-14 19:58:14 +00:00
Lynne	223b58c74b	lavu/tx: slightly optimize fft15 Saves 2 additions.	2020-02-13 17:11:30 +00:00
Lynne	a38c6f47c9	lavu/tx: undef the correct macro It was renamed and no warning was given for undeffing a nonexisting one.	2020-02-13 17:11:30 +00:00
Lynne	e8f054b095	lavu/tx: implement 32 bit fixed point FFT and MDCT Required minimal changes to the code so made sense to implement. FFT and MDCT tested, the output of both was properly rounded. Fun fact: the non-power-of-two fixed-point FFT and MDCT are the fastest ever non-power-of-two fixed-point FFT and MDCT written. This can replace the power of two integer MDCTs in aac and ac3 if the MIPS optimizations are ported across. Unfortunately the ac3 encoder uses a 16-bit fixed point forward transform, unlike the decoder, which uses a 32bit inverse transform, so some modifications might be required there. The 3-point FFT is somewhat less accurate than it otherwise could be, having minor rounding errors with bigger transforms. However, this could be improved later, and the way it's currently written is the way one would write assembly for it. Similar rounding errors can also be found throughout the power of two FFTs as well, though those are more difficult to correct. Despite this, the integer transforms are more than accurate enough.	2020-02-13 17:10:34 +00:00
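The rounding mentioned above usually comes down to how Q31 products are narrowed back to 32 bits. Below is a minimal sketch of rounded Q31 multiplies. It assumes arithmetic right shifts of negative 64-bit values, which mainstream compilers provide but C leaves implementation-defined. It also assumes twiddles of magnitude at most 1 so the 64-bit sums cannot overflow. It is not the lavu/tx fixed-point code.

```c
#include <stdint.h>

/* Q31 multiply with round-half-up: (a * b) / 2^31. */
static inline int32_t q31_mul(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;
    return (int32_t)((p + 0x40000000) >> 31);
}

typedef struct { int32_t re, im; } cq31;

/* Complex Q31 multiply: accumulate in 64 bits, round once per component. */
static inline cq31 cq31_mul(cq31 a, cq31 b)
{
    int64_t re = (int64_t)a.re * b.re - (int64_t)a.im * b.im;
    int64_t im = (int64_t)a.re * b.im + (int64_t)a.im * b.re;
    cq31 r = { (int32_t)((re + 0x40000000) >> 31),
               (int32_t)((im + 0x40000000) >> 31) };
    return r;
}
```

Without the 0x40000000 bias, q31_mul(3, 0x40000000), that is 3 × 0.5, would truncate to 1 instead of rounding to 2.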
Lynne	42e2319ba9	lavu/tx: add support for double precision FFT and MDCT Simply moves and templates the actual transforms to support an additional data type. Unlike the float version, which is equal or better than libfftw3f, double precision output is bit identical with libfftw3.	2019-08-02 01:19:52 +01:00
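Templating a transform over the sample type can be sketched with a single macro that stamps out one function per type. The avutil/tx_template commits above suggest lavu/tx instead includes a shared template file once per type. The macro form here is only an illustrative stand-in, and butterfly_float / butterfly_double are hypothetical names.

```c
#include <stddef.h>

/* One radix-2 butterfly body, instantiated for each sample type.
 * a and b point to interleaved (re, im) pairs: a' = a + b, b' = a - b. */
#define DEFINE_BUTTERFLY(TYPE, SUFFIX)                \
    void butterfly_##SUFFIX(TYPE *a, TYPE *b)         \
    {                                                 \
        TYPE re = a[0], im = a[1];                    \
        a[0] = re + b[0]; a[1] = im + b[1];           \
        b[0] = re - b[0]; b[1] = im - b[1];           \
    }

DEFINE_BUTTERFLY(float,  float)
DEFINE_BUTTERFLY(double, double)
```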

43 Commits