scaletempo2 has an optimization where it first uses a step size of 5
together with quadratic interpolation to quickly find the approximate
position of the best overlap, and then does a more thorough search around
that area.
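A minimal sketch of this coarse-to-fine search, written against a caller-supplied similarity callback. The names `coarse_to_fine_search`, `similarity_fn`, `parabola_peak` and `example_peak_similarity` are hypothetical and do not mirror the actual scaletempo or scaletempo2 code, which operates on audio blocks and may differ in detail. The coarse pass evaluates every `step`-th candidate, a parabola through the best coarse sample and its two neighbours estimates where the real peak lies, and a fine pass scans exhaustively within one step of that estimate.

```c
/* Similarity callback (hypothetical): a higher value means a better overlap
 * at the given candidate offset. */
typedef double (*similarity_fn)(int offset, void *ctx);

/* Vertex of the parabola through (-1, ym), (0, y0), (1, yp), returned in
 * units of the coarse step. Falls back to 0 when the points are collinear. */
static double parabola_peak(double ym, double y0, double yp)
{
    double denom = ym - 2.0 * y0 + yp;
    if (denom == 0.0)
        return 0.0;
    return 0.5 * (ym - yp) / denom;
}

/* Example similarity with a single peak at *(int *)ctx, for testing. */
static double example_peak_similarity(int offset, void *ctx)
{
    double d = offset - *(const int *)ctx;
    return -d * d;
}

/* Finds the offset in [0, max_offset] with the highest similarity. */
static int coarse_to_fine_search(similarity_fn sim, void *ctx,
                                 int max_offset, int step)
{
    /* Coarse pass: evaluate only every step-th candidate. */
    int best = 0;
    double best_val = sim(0, ctx);
    for (int i = step; i <= max_offset; i += step) {
        double v = sim(i, ctx);
        if (v > best_val) {
            best_val = v;
            best = i;
        }
    }

    /* Estimate the true peak between coarse samples with a parabola,
     * if both coarse neighbours lie inside the search range. */
    double center = best;
    if (best - step >= 0 && best + step <= max_offset) {
        double ym = sim(best - step, ctx);
        double yp = sim(best + step, ctx);
        center = best + step * parabola_peak(ym, best_val, yp);
    }

    /* Fine pass: exhaustive search within one step of the estimate. */
    int lo = (int)(center - step);
    int hi = (int)(center + step);
    if (lo < 0)
        lo = 0;
    if (hi > max_offset)
        hi = max_offset;
    int fine_best = best;
    double fine_val = best_val;
    for (int i = lo; i <= hi; i++) {
        double v = sim(i, ctx);
        if (v > fine_val) {
            fine_val = v;
            fine_best = i;
        }
    }
    return fine_best;
}
```

A smaller `step` makes the coarse pass more expensive but less likely to land next to the wrong local maximum, which is the trade-off discussed for step sizes 3 and 5.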
Doing the same thing in scaletempo brought a 4.8x performance
improvement. However, in my measurements a step size of 3 more
consistently finds good overlaps, and it's still a 2.9x improvement for
this function.
I should note that while a step size of 3 produced better numbers,
I was not actually able to hear any difference in my test.
A step size of 3 was chosen in case it does make an audible
difference in some cases, and the CPU usage isn't really a problem
anymore, but that can be revisited in the future.
scaletempo2 is still faster than scaletempo with a step size of 5,
which I suspect is mostly because it uses some vectorized functions and
scaletempo does not.
This might seem counterintuitive at first, but we want to change the
sound in total as little as possible, not only the middle part of the
overlap.
This also removes the loop unrolling from the integer path to keep it as
close to the float path as possible. The difference in performance is
fairly small, and if such an optimization is deemed desirable in the
future, it should be implemented for both the float and integer paths.
Fixes corrupted audio after resize_input_buffer: realloc_2d did not move
the data to the new location. Rather than reimplementing more allocator
logic, migrate the internals to use talloc and grow the buffer with realloc.
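To illustrate why growing a two-dimensional buffer must move data, here is a sketch of a planar buffer stored in one allocation, where channel `c` starts at `data + c * capacity`. It uses plain `realloc` rather than talloc, and `struct planar_buf` and `planar_buf_grow` are hypothetical names, not the real helpers. `realloc` preserves the bytes, but once the per-channel stride grows, every channel except the first sits at the wrong offset until it is moved.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical planar buffer: one allocation, channel c starts at
 * data + c * capacity. */
struct planar_buf {
    float *data;
    int channels;
    int capacity; /* frames per channel */
};

/* Grow to new_capacity frames per channel, keeping the first `used`
 * (<= old capacity) frames of every channel. Returns 0 on success,
 * -1 on allocation failure. */
static int planar_buf_grow(struct planar_buf *b, int new_capacity, int used)
{
    if (new_capacity <= b->capacity)
        return 0;
    float *p = realloc(b->data,
                       sizeof(float) * (size_t)b->channels * new_capacity);
    if (!p)
        return -1;
    /* The channel stride changed: move rows from the last channel to the
     * first so no row is overwritten before it has been copied. memmove()
     * handles a row overlapping its own old position. */
    for (int c = b->channels - 1; c > 0; c--)
        memmove(p + (size_t)c * new_capacity, p + (size_t)c * b->capacity,
                sizeof(float) * (size_t)used);
    b->data = p;
    b->capacity = new_capacity;
    return 0;
}
```

Skipping the `memmove` loop reproduces the kind of corruption described above: channel 1 would then read channel 0's stale tail.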
Playback with many audio channels could be distorted when using
scaletempo2. This was most noticeable when there were a lot of quiet
channels and few louder channels.
Fix this by increasing the weight of louder channels in relation to
quieter channels. Each channel's target block energy is factored into
the usual similarity measure.
This should have little effect on very correlated channels (such as most
stereo media), where the factors are very similar for all channels.
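A sketch of the weighting idea, assuming per-channel normalized correlations have already been computed for a candidate block. Weighting each channel by its target block energy is one plausible reading of "each channel's target block energy is factored into the similarity measure"; the exact formula in scaletempo2 may differ, and both function names are hypothetical.

```c
/* Plain average of per-channel normalized correlations: every channel
 * counts the same, however quiet it is. */
static double unweighted_similarity(const double *corr, int channels)
{
    double sum = 0.0;
    for (int c = 0; c < channels; c++)
        sum += corr[c];
    return channels > 0 ? sum / channels : 0.0;
}

/* Each channel's correlation weighted by the energy of its target block,
 * so louder channels dominate the overlap decision. */
static double energy_weighted_similarity(const double *corr,
                                         const double *target_energy,
                                         int channels)
{
    double sum = 0.0, weight = 0.0;
    for (int c = 0; c < channels; c++) {
        sum += target_energy[c] * corr[c];
        weight += target_energy[c];
    }
    return weight > 0.0 ? sum / weight : 0.0;
}
```

With one loud, well-matched channel and three quiet, uncorrelated ones, the plain average drops to 0.25 while the weighted score stays close to 1. When all channels have the same correlation, as with highly correlated stereo, both measures agree.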
See-Also: #8705
See-Also: #13737
Lots of filters have generic internal function names like "process".
On a stack trace, all of the different filters use this name, which
causes confusion about which filter is actually being processed.
This renames these internal function names to carry the filter names.
This matches what had already been done for some filters.
With certain speed settings, the following can happen at the start of
the playback:
- can_perform_wsola returns false, so no frames are written
- mp_scaletempo2_frames_available returns true when
p->input_buffer_final_frames is 0 and target_block_index < 0
This results in an infinite loop that completely stalls audio filter
processing and playback. Fix this by only checking this condition
after the final frame is set.
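A simplified sketch of the predicate before and after the fix. The struct and both functions are hypothetical stand-ins with field names borrowed from the text; the real `mp_scaletempo2_frames_available` also considers whether a WSOLA iteration can run (`can_perform_wsola`), which is omitted here.

```c
#include <stdbool.h>

/* Hypothetical, simplified state; field names follow the text above. */
struct stretch_state {
    bool final_frame_set;          /* the last input frame has arrived */
    int input_buffer_final_frames; /* input frames left after EOF */
    int target_block_index;
    int num_complete_frames;       /* rendered frames not yet handed out */
};

/* Problematic variant: before EOF, 0 final frames and a negative target
 * index still report "frames available", although nothing gets written. */
static bool frames_available_old(const struct stretch_state *p)
{
    return p->num_complete_frames > 0 ||
           p->target_block_index < p->input_buffer_final_frames;
}

/* Fixed variant: the drain condition only applies once EOF is known. */
static bool frames_available(const struct stretch_state *p)
{
    return p->num_complete_frames > 0 ||
           (p->final_frame_set &&
            p->target_block_index < p->input_buffer_final_frames);
}
```

At the start of playback (no final frame yet, `target_block_index < 0`), the old variant returns true and the caller keeps asking for output that never comes; the fixed variant returns false, so the filter requests more input instead.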
Fixes: 8080d00d7f
Deprecated upstream 1cc24d7495
We need to reallocate the context here because `avcodec_free_context`
also frees the context, and we want to reuse the context with some
reconfig.
Why a bigger search-interval is required:
scaletempo2 doesn't do a good job when the signal contains frequencies
lower than 1/search_interval. With a search interval of 30ms, that means
anything below 33.333Hz sounds bad.
Depending on the genre, it's very common for music to contain frequencies
down to 30Hz, and sometimes even a little bit below that. Therefore a
higher default value is needed to handle such cases.
Based on that an argument can be made for a value of 50, as that should
work down to 20Hz, or something even higher because movies sometimes
have some infrasonic content.
However, the downside of big search intervals is increased CPU usage and
reduced intelligibility at higher speeds, as they effectively lead to
parts of the audio being skipped.
A value of 40 can handle frequencies down to 25Hz, enough for all music
except very rare edge cases, while still providing decent
intelligibility.
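Written out as arithmetic, the rule of thumb above is f_min = 1000 / search_interval_ms. The helper below (hypothetical name) just reproduces the figures quoted for 30, 40 and 50 ms.

```c
/* Lowest frequency (in Hz) whose full period fits into the search interval,
 * following the 1 / search_interval rule of thumb from the text. */
static double lowest_handled_frequency_hz(double search_interval_ms)
{
    return 1000.0 / search_interval_ms;
}
```

So 30 ms gives about 33.3 Hz, 40 ms gives 25 Hz, and 50 ms gives 20 Hz.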
Why a smaller window-size is required:
Large values reduce intelligibility at high speeds and therefore small
values are preferred.
However when values get too small it starts to sound weird
(similar to librubberband).
In my testing a value of 10 already works well, but adding a small
safety margin seems like a good idea, especially since it made no
noticeable difference to intelligibility, which is why 12 was chosen.
Avoid generating too much audio after EOF.
Note: This often has no effect, because less audio is produced than
required.
Usually this takes effect with the userspeed filter at high speed
(4x) and going back to 1x speed to remove the filter.
After the final input packet, the filter padded with silence to allow
one more iteration. That was not enough to process the final frames.
Continue padding the end of `input_buffer` with silence until the final
frames have been processed.
Implementation: Instead of padding when adding final samples, pad before
running WSOLA iteration. Count number of added silent frames and
remaining input frames for time keeping.
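A sketch of "pad before each iteration" for a single channel with a fixed capacity. `struct in_buf`, `pad_before_iteration` and `real_frames_remaining` are hypothetical names. The sketch assumes padding is always appended at the end and frames are consumed from the front, which lets the count of real input frames be derived from the number of added silent frames.

```c
#include <string.h>

#define IN_BUF_CAPACITY 1024

/* Hypothetical single-channel input buffer, for illustration only. */
struct in_buf {
    float data[IN_BUF_CAPACITY];
    int frames;        /* frames currently stored in data[] */
    int added_silence; /* silent frames appended after EOF, for time keeping */
    int eof;           /* nonzero once the final input packet was received */
};

/* Called before each WSOLA iteration. After EOF, top the buffer up with
 * silence until `needed` frames are available, and record how much of the
 * buffer is padding rather than real input. */
static void pad_before_iteration(struct in_buf *b, int needed)
{
    if (needed > IN_BUF_CAPACITY)
        needed = IN_BUF_CAPACITY;
    if (!b->eof || b->frames >= needed)
        return;
    int pad = needed - b->frames;
    memset(b->data + b->frames, 0, sizeof(float) * (size_t)pad);
    b->frames += pad;
    b->added_silence += pad;
}

/* Real (non-silent) frames still buffered, assuming padding sits at the end
 * and consumption happens from the front. */
static int real_frames_remaining(const struct in_buf *b)
{
    int real = b->frames - b->added_silence;
    return real > 0 ? real : 0;
}
```

Once `real_frames_remaining` reaches 0, the final input frames have been processed and padding can stop.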
This changes the emitted pts values from the start of the search block
to the center of the search block. Change initial `output_time`
accordingly. Initial `search_block_index` is irrelevant, because it's
overwritten before the first iteration.
Using the `output_time` removes the rounding of `search_block_index`,
which also fixes the <20 microsecond gaps in timestamps between output
packets.
Rationale:
The variance in audio position was in the range `0..search-interval`.
With this change, the range is
`(-search-interval / 2)..(search-interval / 2)`
which ensures lower maximum offset.
The target block can be anywhere in the previous search block, varying by
`search-interval` while the filter is active. This resulted in a constant
audio offset when returning to 1x playback speed.
- Move the search block to the target block to sync up exactly.
- Drop old frames to minimize input_buffer usage.
The internal time update function involved multiple problems:
- Time was updated after the WSOLA iteration. This means speed was
updated one iteration later than it could be.
- The update functions caused spikes of too many or too few samples
advanced, leading to audio glitches on speed changes.
- The inconsistent updates made it very difficult to produce gapless
audio packets.
- The `output_time` update function involved complicated feedback:
`search_block_index` influenced how many frames from `input_buffer`
are retained, which influenced how much `output_time` is changed,
which influenced `search_block_index`.
With these changes:
- Time is updated before WSOLA iterations. Speed changes are effective
instantly.
- There are no spikes in playback speed during speed changes.
- No significant gaps are introduced in output packets.
- The time update function becomes (function calls omitted for brevity)
output_time += ola_hop_size * playback_rate
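A sketch of this update rule with hypothetical names. Each iteration emits `ola_hop_size` output frames, which corresponds to `ola_hop_size * playback_rate` input frames, and the time is advanced before the iteration so a new rate applies immediately. Rounding to the nearest frame when deriving the search block index is an assumption of this sketch.

```c
/* Hypothetical, simplified time keeping state. */
struct timekeeper {
    double output_time;             /* center of the search block, in input frames */
    int ola_hop_size;               /* output frames produced per iteration */
    int search_block_center_offset; /* distance from block start to its center */
};

/* Advance time *before* an iteration and return the search block start
 * index to use for it. */
static int advance_time(struct timekeeper *t, double playback_rate)
{
    t->output_time += t->ola_hop_size * playback_rate;
    return (int)(t->output_time - t->search_block_center_offset + 0.5);
}
```

Starting from `output_time == search_block_center_offset`, a hop size of 256 at rate 1.0 moves the block start to 256, a following iteration at rate 2.0 moves it by 512 more, with no intermediate spike.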
Functions received a `playback_rate` parameter to check how many samples
are needed before iteration. Internal state is only updated when the
iteration is actually run, so the speed is allowed to change until
enough data is received.
The first WSOLA iteration overlapped audio with whatever was in the
`wsola_output` buffer. This was either silence (if not run before), or
old frames (if switching to 1x and back to a different speed).
Track the state of the output buffer and memcpy the whole window for the
first iteration instead.
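A single-channel sketch of tracking the output buffer state. `overlap_add` and its parameters are hypothetical, and it uses a linear crossfade where the real filter applies its own overlap-add window. It assumes the caller has already moved the previous window's tail to the start of `out`. Without valid previous content, blending would mix in silence or stale frames, so the whole window is copied instead.

```c
#include <stdbool.h>
#include <string.h>

/* Overlap-add one window into `out`. The first half of `out` is expected
 * to hold the tail of the previous window. */
static void overlap_add(float *out, const float *window, int window_len,
                        bool *out_valid)
{
    int hop = window_len / 2;
    if (!*out_valid) {
        /* Nothing meaningful to blend with: copy the whole window. */
        memcpy(out, window, sizeof(float) * (size_t)window_len);
        *out_valid = true;
        return;
    }
    /* Linear crossfade from old content to new over the overlap region. */
    for (int i = 0; i < hop; i++) {
        float w = (float)i / hop;
        out[i] = out[i] * (1.0f - w) + window[i] * w;
    }
    memcpy(out + hop, window + hop, sizeof(float) * (size_t)(window_len - hop));
}
```

Resetting `*out_valid` to false whenever the filter goes back to 1x gives the behavior described above on the next speed change.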
`output_time` is used to set the center of the search block. Init of
both `search_block_index` and `output_time` with 0 caused inconsistent
search block movement for the first iterations.
Initialize with `search_block_center_offset` for consistency with initial
`search_block_index`.
Fixes #12028
There was an additional issue that audio was always delayed by half the
configured search-interval. This was caused by the `out` buffer length
not being included in the delay calculation.
Notes:
- Every WSOLA iteration advances the input buffer by _some amount_, and
always produces exactly `ola_hop_size` frames in the output buffer.
- `mp_scaletempo2_fill_buffer` is always called with `ola_hop_size`
- Thus, the rendered frames are always cleared immediately after
processing, and `num_complete_frames` is 0 in the delay calculation.
- The factors contributing to delay are:
  - the pending samples in the input buffer according to the search
    block position, and
  - the pending rendered samples in the output buffer (always empty in
    practice).
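A sketch of deriving the delay only from buffer positions, with hypothetical names. The delay is expressed in input frames (divide by the sample rate for seconds). `search_block_center` stands for `output_time`, which marks the center of the search block. Scaling the pending output frames by `playback_rate` to convert them back to input time is an assumption of this sketch.

```c
/* Hypothetical delay estimate, in input frames, derived only from the
 * current buffer positions, with no persistent accumulated value. */
static double filter_delay_frames(double input_buffer_frames,
                                  double search_block_center,
                                  double num_complete_frames,
                                  double playback_rate)
{
    /* Input frames buffered beyond the center of the search block. */
    double pending_input = input_buffer_frames - search_block_center;
    /* Rendered output frames not yet handed out, converted back to input
     * time. This term is 0 in practice, as noted above. */
    double pending_output = num_complete_frames * playback_rate;
    return pending_input + pending_output;
}
```

Because nothing is carried over between calls, an error in one estimate cannot accumulate across speed changes.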
The frame_delay code looked like that of the rubberband filter, which
might not work for scaletempo2. Sometimes a different amount of input
audio was consumed by scaletempo2 than expected. It may have been caused
by speed changes being a more dynamic process in scaletempo2. This can
be seen from where `playback_rate` is used in `run_one_wsola_iteration`:
`playback_rate` is only referenced after the iteration, when updating
the time and removing old data from buffers.
In scaletempo2, the playback speed is applied by changing the amount the
search block is moved. That apparently averages out correctly at
constant playback speed, but when the speed changes, the error in this
assumption probably spikes. This error accumulated across all speed
changes because of the persistent `frame_delay` value.
With the removal of the persistent `frame_delay`, there should be no way
for the audio to drift. Because the delay is derived from filter buffer
positions, and the buffers are filled only as much as needed, the delay
always stays within buffer bounds.
c784820454 introduced a bool option type
as a replacement for the flag type, but didn't actually transition and
remove the flag type because it would have been too much mundane work.
When af_scaletempo2.c:process() detects a format change, it goes back
through mp_scaletempo2_init() to reinitialize everything. However,
mp_scaletempo2.input_buffer is not necessarily reallocated due to a
check in af_scaletempo2_internals.c:resize_input_buffer(). This is a
problem if the number of audio channels increases, since without
reallocating, the buffer for the new channel(s) will at best point to
NULL, and at worst uninitialized memory.
Since resize_input_buffer() is only called from two places, pull the size
check out into mp_scaletempo2_fill_input_buffer(). This allows each
caller to decide whether they want to resize or not. We could be
smarter about when to reallocate, but that would add a lot of machinery
for a case I don't expect to be hit often in practice.
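A sketch of moving the size check to the callers, using a hypothetical planar buffer with one allocation per channel and plain `malloc`/`realloc`. The resize helper itself never skips work, initialization always resizes (so a larger channel count cannot leave NULL or uninitialized planes), and the fill path only grows when it would overflow. All names are hypothetical and simplified compared to the real filter.

```c
#include <stdlib.h>

/* Hypothetical planar input buffer: one allocation per channel. */
struct input_buffer {
    float **planes;
    int channels;
    int capacity; /* frames per channel */
    int frames;   /* frames currently stored */
};

/* No size check here: every call reallocates to `channels` planes of
 * `capacity` frames, keeping existing samples where possible. On failure
 * the buffer is partially resized and should not be used. */
static int resize_input_buffer(struct input_buffer *b, int channels,
                               int capacity)
{
    /* Release planes of channels that are going away. */
    for (int c = channels; c < b->channels; c++)
        free(b->planes[c]);
    if (channels < b->channels)
        b->channels = channels;

    float **planes = realloc(b->planes, sizeof(*planes) * (size_t)channels);
    if (!planes)
        return -1;
    /* Slots for new channels start as NULL, so realloc() below allocates
     * them instead of touching uninitialized pointers. */
    for (int c = b->channels; c < channels; c++)
        planes[c] = NULL;
    b->planes = planes;
    b->channels = channels;

    for (int c = 0; c < channels; c++) {
        float *p = realloc(planes[c], sizeof(float) * (size_t)capacity);
        if (!p)
            return -1;
        planes[c] = p;
    }
    b->capacity = capacity;
    return 0;
}

/* Caller 1: (re)initialization always resizes, so a change in the channel
 * count can never be missed. */
static int input_buffer_init(struct input_buffer *b, int channels,
                             int capacity)
{
    b->frames = 0;
    return resize_input_buffer(b, channels, capacity);
}

/* Caller 2: appending grows the buffer only when it would overflow. */
static int input_buffer_reserve(struct input_buffer *b, int extra_frames)
{
    int needed = b->frames + extra_frames;
    if (needed <= b->capacity)
        return 0;
    return resize_input_buffer(b, b->channels, needed);
}
```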
Simply returning out of this function leaks avpkt, need to always "goto
done".
Rewrite the logic a bit to make it more clear what's going on (IMO).
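A sketch of the single-exit cleanup pattern. `struct packet`, `packet_alloc`, `packet_free` and `encode_one` are hypothetical stand-ins for the role `av_packet_alloc()`/`av_packet_free()` play in the real code; a counter makes leaks observable.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a packet object. */
struct packet {
    int size;
};

static int live_packets; /* allocated packets not yet freed */

static struct packet *packet_alloc(void)
{
    struct packet *p = calloc(1, sizeof(*p));
    if (p)
        live_packets++;
    return p;
}

static void packet_free(struct packet **p)
{
    if (*p) {
        free(*p);
        *p = NULL;
        live_packets--;
    }
}

/* Every exit after the allocation goes through `done`, so the packet is
 * released on error paths as well as on success. */
static int encode_one(int have_input, int encoder_ok)
{
    int ret = -1;
    struct packet *pkt = packet_alloc();
    if (!pkt)
        return -1; /* nothing allocated yet, a plain return is fine */

    if (!have_input)
        goto done; /* a bare "return" here would leak pkt */
    if (!encoder_ok)
        goto done;

    pkt->size = 1; /* stand-in for actually encoding data */
    ret = 0;

done:
    packet_free(&pkt);
    return ret;
}
```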
Fixes #9593
This brings my scaletempo2 benchmark down from ~22s to ~7s on my machine
(-march=native), and down to ~11s with a generic compile.
Guarded behind an appropriate #ifdef to avoid being ableist against
people who have the clinical need to run obscure platforms.
Closes #8848
scaletempo2 is a new audio filter for playing back
audio at modified speed and is based on chromium
commit 51ed77e3f37a9a9b80d6d0a8259e84a8ca635259.
It sounds subjectively better than the existing
implementations, scaletempo and rubberband.
This mode drops or repeats audio data to adapt to video speed, instead
of resampling it or such. It was added to deal with SPDIF. The
implementation was part of fill_audio_out_buffers() - the entire
function is something whose complexity exploded in my face, and which I
want to clean up, and this is hopefully a first step.
Put it in a filter, and mess with the shitty glue code. It's all sort of
roundabout and illogical, but that can be rectified later. The important
part is that it works much like the resample or scaletempo filters.
For PCM audio, this does not work on samples anymore. This makes it much
worse. But for PCM you can use saner mechanisms that sound better. Also,
something about PTS tracking is wrong. But not wasting more time on
this.
Replace use of .min==1 with a proper flag. This is a good idea, because
it has nothing to do with numeric limits (also see commit 9d32d62b61
for how this can go wrong).
With this, m_option.min/max are strictly used for numeric limits.
Change all OPT_* macros such that they don't define the entire m_option
initializer, and instead expand only to a part of it, which sets certain
fields. This requires changing almost every option declaration, because
they all use these macros. A declaration now always starts with
{"name", ...
followed by designated initializers only (possibly wrapped in macros).
The OPT_* macros now initialize the .offset and .type fields only,
sometimes also .priv and others.
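A heavily simplified model of this declaration style. `struct m_option` here has only a few fields, and `OPT_INT`, `OPT_BOOL` and `struct opts` are hypothetical reductions, not mpv's actual definitions. Each macro expands to a partial initializer that sets only `.offset` and `.type`. The declaration starts with the positional name and continues with designated initializers, which C99 allows to be mixed.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified option model. */
struct m_option_type {
    const char *name;
};
static const struct m_option_type m_option_type_int = {"int"};
static const struct m_option_type m_option_type_bool = {"bool"};

struct m_option {
    const char *name;
    const struct m_option_type *type;
    size_t offset;
    double min, max;
};

struct opts {
    int volume;
    int mute;
};

/* Each macro expands to *part* of an initializer: .offset and .type only. */
#define OPT_INT(field) \
    .offset = offsetof(struct opts, field), .type = &m_option_type_int
#define OPT_BOOL(field) \
    .offset = offsetof(struct opts, field), .type = &m_option_type_bool

static const struct m_option opts_list[] = {
    {"volume", OPT_INT(volume), .min = 0, .max = 100},
    {"mute", OPT_BOOL(mute)},
    {0} /* terminator */
};
```

Other fields such as `.min`/`.max` are set with ordinary designated initializers next to the macro instead of being threaded through macro arguments.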
I think this change makes the option macros less tricky. The old code
had to stuff everything into macro arguments (and attempted to allow
setting arbitrary fields by letting the user pass designated
initializers in the vararg parts). Some of this was made messy due to
C99 and C11 not allowing 0-sized varargs with ',' removal. It's also
possible that this change is pointless, other than cosmetic preferences.
Not too happy about some things. For example, the OPT_CHOICE()
indentation I applied looks a bit ugly.
Much of this change was done with regex search&replace, but some places
required manual editing. In particular, code in "obscure" areas (which I
didn't include in compilation) might be broken now.
In wayland_common.c the author of some option declarations confused the
flags parameter with the default value (though the default value was
also properly set below). I fixed this with this change.