avfilter: add Dynamic Audio Normalizer filter

LoRd_MuldeR 2015-07-07 16:19:59 +00:00 committed by Paul B Mahol
parent 3b365dda5c
commit 21436b95dc
4 changed files with 894 additions and 0 deletions


@@ -1544,6 +1544,164 @@ Optional. It should have a value much less than 1 (e.g. 0.05 or 0.02) and is
used to prevent clipping.
@end table
@section dynaudnorm
Dynamic Audio Normalizer.
This filter applies a certain amount of gain to the input audio in order
to bring its peak magnitude to a target level (e.g. 0 dBFS). However, in
contrast to more "simple" normalization algorithms, the Dynamic Audio
Normalizer *dynamically* re-adjusts the gain factor to the input audio.
This allows for applying extra gain to the "quiet" sections of the audio
while avoiding distortions or clipping the "loud" sections. In other words:
The Dynamic Audio Normalizer will "even out" the volume of quiet and loud
sections, in the sense that the volume of each section is brought to the
same target level. Note, however, that the Dynamic Audio Normalizer achieves
this goal *without* applying "dynamic range compression". It will retain 100%
of the dynamic range *within* each section of the audio file.
@table @option
@item f
Set the frame length in milliseconds. In range from 10 to 8000 milliseconds.
Default is 500 milliseconds.
The Dynamic Audio Normalizer processes the input audio in small chunks,
referred to as frames. This is required, because a peak magnitude has no
meaning for just a single sample value. Instead, we need to determine the
peak magnitude for a contiguous sequence of sample values. While a "standard"
normalizer would simply use the peak magnitude of the complete file, the
Dynamic Audio Normalizer determines the peak magnitude individually for each
frame. The length of a frame is specified in milliseconds. By default, the
Dynamic Audio Normalizer uses a frame length of 500 milliseconds, which has
been found to give good results with most files.
Note that the exact frame length, in number of samples, will be determined
automatically, based on the sampling rate of the individual input audio file.
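As a rough illustration, this conversion matches the @code{frame_size()}
helper in the @file{af_dynaudnorm.c} source added by this commit: the frame
length in milliseconds is scaled by the sample rate, rounded to the nearest
integer, and forced to an even number of samples.

@example
/* Sketch of the conversion performed by the filter. */
static int frame_size(int sample_rate, int frame_len_msec)
@{
    const int n = round((double)sample_rate * (frame_len_msec / 1000.0));
    return n + (n % 2); /* force an even sample count */
@}
@end example

At 44100 Hz, for instance, the default of 500 milliseconds yields a frame
length of 22050 samples.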
@item g
Set the Gaussian filter window size. In range from 3 to 301, must be an odd
number. Default is 31.
Probably the most important parameter of the Dynamic Audio Normalizer is the
@code{window size} of the Gaussian smoothing filter. The filter's window size
is specified in frames, centered around the current frame. For the sake of
simplicity, this must be an odd number. Consequently, the default value of 31
takes into account the current frame, as well as the 15 preceding frames and
the 15 subsequent frames. Using a larger window results in a stronger
smoothing effect and thus in less gain variation, i.e. slower gain
adaptation. Conversely, using a smaller window results in a weaker smoothing
effect and thus in more gain variation, i.e. faster gain adaptation.
In other words, the more you increase this value, the more the Dynamic Audio
Normalizer will behave like a "traditional" normalization filter. On the
contrary, the more you decrease this value, the more the Dynamic Audio
Normalizer will behave like a dynamic range compressor.
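For reference, the window weights are a sampled Gaussian kernel, re-normalized
so they sum to 1.0. The sketch below mirrors @code{init_gaussian_filter()} in
the source, with the context pointer dropped and @code{M_PI} used where the
source defines its own constant:

@example
const int    offset = filter_size / 2;
const double sigma  = (((filter_size / 2.0) - 1.0) / 3.0) + (1.0 / 3.0);
const double c1     = 1.0 / (sigma * sqrt(2.0 * M_PI));
const double c2     = 2.0 * sigma * sigma;
double total_weight = 0.0;
int i;

for (i = 0; i < filter_size; i++) @{
    const int x = i - offset;
    weights[i]    = c1 * exp(-(x * x) / c2);
    total_weight += weights[i];
@}
for (i = 0; i < filter_size; i++)
    weights[i] /= total_weight; /* normalize to a sum of 1.0 */
@end example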
@item p
Set the target peak value. In range from 0.0 to 1.0. Default is 0.95.
This specifies the highest permissible magnitude
level for the normalized audio input. This filter will try to approach the
target peak magnitude as closely as possible, but at the same time it also
makes sure that the normalized signal will never exceed the peak magnitude.
A frame's maximum local gain factor is imposed directly by the target peak
magnitude. The default value is 0.95 and thus leaves a headroom of 5%.
It is not recommended to go above this value.
@item m
Set the maximum gain factor. In range from 1.0 to 100.0. Default is 10.0.
The Dynamic Audio Normalizer determines the maximum possible (local) gain
factor for each input frame, i.e. the maximum gain factor that does not
result in clipping or distortion. The maximum gain factor is determined by
the frame's highest magnitude sample. However, the Dynamic Audio Normalizer
additionally bounds the frame's maximum gain factor by a predetermined
(global) maximum gain factor. This is done in order to avoid excessive gain
factors in "silent" or almost silent frames. By default, the maximum gain
factor is 10.0. For most inputs the default value should be sufficient, and
it is usually not recommended to increase this value. However, for input
with an extremely low overall volume level, it may be necessary to allow even
higher gain factors. Note, however, that the Dynamic Audio Normalizer does
not simply apply a "hard" threshold (i.e. cut off values above the threshold).
Instead, a "sigmoid" threshold function will be applied. This way, the
gain factors will smoothly approach the threshold value, but never exceed that
value.
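Concretely, the threshold function is built on the Gauss error function; in
sketch form, the @code{bound()} helper in the source is the entire mechanism:

@example
/* Smoothly limit val to (-threshold, threshold). */
static double bound(const double threshold, const double val)
@{
    const double k = 0.8862269254527580; /* sqrt(M_PI) / 2.0 */
    return erf(k * (val / threshold)) * threshold;
@}
@end example

For small gain factors this is nearly the identity, while larger factors
approach, but never reach, the threshold value.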
@item r
Set the target RMS. In range from 0.0 to 1.0. Default is 0.0 (disabled).
By default, the Dynamic Audio Normalizer performs "peak" normalization.
This means that the maximum local gain factor for each frame is defined
(only) by the frame's highest magnitude sample. This way, the samples can
be amplified as much as possible without exceeding the maximum signal
level, i.e. without clipping. Optionally, however, the Dynamic Audio
Normalizer can also take into account the frame's root mean square,
abbreviated RMS. In electrical engineering, the RMS is commonly used to
determine the power of a time-varying signal. It is therefore considered
that the RMS is a better approximation of the "perceived loudness" than
just looking at the signal's peak magnitude. Consequently, by adjusting all
frames to a constant RMS value, a uniform "perceived loudness" can be
established. If a target RMS value has been specified, a frame's local gain
factor is defined as the factor that would result in exactly that RMS value.
Note, however, that the maximum local gain factor is still restricted by the
frame's highest magnitude sample, in order to prevent clipping.
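In terms of the implementation in this commit, a frame's local gain factor
then becomes the following (a sketch of @code{get_max_local_gain()}, with the
context pointer and option fields abbreviated):

@example
/* RMS-derived gain, still capped by the peak-derived gain and the
 * global maximum amplification. */
const double peak_gain = peak_value / find_peak_magnitude(frame, channel);
const double rms_gain  = target_rms > DBL_EPSILON
                       ? target_rms / compute_frame_rms(frame, channel)
                       : DBL_MAX;
const double gain      = bound(max_amplification, FFMIN(peak_gain, rms_gain));
@end example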
@item n
Enable channel coupling. Enabled by default.
By default, the Dynamic Audio Normalizer will amplify all channels by the same
amount. This means the same gain factor will be applied to all channels, i.e.
the maximum possible gain factor is determined by the "loudest" channel.
However, in some recordings, it may happen that the volume of the different
channels is uneven, e.g. one channel may be "quieter" than the other one(s).
In this case, this option can be used to disable the channel coupling. This way,
the gain factor will be determined independently for each channel, depending
only on the individual channel's highest magnitude sample. This allows for
harmonizing the volume of the different channels.
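The @code{analyze_frame()} function in the source implements both cases; in
sketch form:

@example
if (channels_coupled) @{
    /* one gain factor for all channels, limited by the loudest one */
    const double gain = get_max_local_gain(s, frame, -1);
    for (c = 0; c < channels; c++)
        update_gain_history(s, c, gain);
@} else @{
    /* an independent gain factor per channel */
    for (c = 0; c < channels; c++)
        update_gain_history(s, c, get_max_local_gain(s, frame, c));
@}
@end example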
@item c
Enable DC bias correction. Disabled by default.
An audio signal (in the time domain) is a sequence of sample values.
In the Dynamic Audio Normalizer these sample values are represented in the
-1.0 to 1.0 range, regardless of the original input format. Normally, the
audio signal, or "waveform", should be centered around the zero point.
That means if we calculate the mean value of all samples in a file, or in a
single frame, then the result should be 0.0 or at least very close to that
value. If, however, there is a significant deviation of the mean value from
0.0, in either positive or negative direction, this is referred to as a
DC bias or DC offset. Since a DC bias is clearly undesirable, the Dynamic
Audio Normalizer provides optional DC bias correction.
With DC bias correction enabled, the Dynamic Audio Normalizer will determine
the mean value, or "DC correction" offset, of each input frame and subtract
that value from all of the frame's sample values, which ensures those samples
are centered around 0.0 again. Also, in order to avoid "gaps" at the frame
boundaries, the DC correction offset values will be interpolated smoothly
between neighbouring frames.
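A sketch of this update, mirroring @code{perform_dc_correction()} in the
source (variable names here are illustrative):

@example
double mean = 0.0;
for (i = 0; i < nb_samples; i++)
    mean += samples[i] / nb_samples;

/* exponential smoothing: weight the new frame's mean at 10% */
dc_value = is_first_frame ? mean
                          : 0.1 * mean + 0.9 * dc_value;

/* cross-fade from the previous offset to the new one */
for (i = 0; i < nb_samples; i++)
    samples[i] -= fade(prev_dc_value, dc_value, i, fade_factors);
@end example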
@item b
Enable alternative boundary mode. Disabled by default.
The Dynamic Audio Normalizer takes into account a certain neighbourhood
around each frame. This includes the preceding frames as well as the
subsequent frames. However, for the "boundary" frames, located at the very
beginning and at the very end of the audio file, not all neighbouring
frames are available. In particular, for the first few frames in the audio
file, the preceding frames are not known. And, similarly, for the last few
frames in the audio file, the subsequent frames are not known. Thus, the
question arises which gain factors should be assumed for the missing frames
in the "boundary" region. The Dynamic Audio Normalizer implements two modes
to deal with this situation. The default boundary mode assumes a gain factor
of exactly 1.0 for the missing frames, resulting in a smooth "fade in" and
"fade out" at the beginning and at the end of the input, respectively.
@item s
Set the compress factor. In range from 0.0 to 30.0. Default is 0.0.
By default, the Dynamic Audio Normalizer does not apply "traditional"
compression. This means that signal peaks will not be pruned and thus the
full dynamic range will be retained within each local neighbourhood. However,
in some cases it may be desirable to combine the Dynamic Audio Normalizer's
normalization algorithm with a more "traditional" compression.
For this purpose, the Dynamic Audio Normalizer provides an optional compression
(thresholding) function. If (and only if) the compression feature is enabled,
all input frames will be processed by a soft knee thresholding function prior
to the actual normalization process. Put simply, the thresholding function is
going to prune all samples whose magnitude exceeds a certain threshold value.
However, the Dynamic Audio Normalizer does not simply apply a fixed threshold
value. Instead, the threshold value will be adjusted for each individual
frame.
In general, smaller values of this parameter result in stronger compression,
and vice versa.
Values below 3.0 are not recommended, because audible distortion may appear.
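In sketch form, per frame and channel the implementation
(@code{perform_compression()} in the source) does the following, with the
threshold smoothing and cross-fading details omitted:

@example
const double stddev    = compute_frame_std_dev(s, frame, channel);
const double threshold = FFMIN(1.0, compress_factor * stddev);

for (i = 0; i < nb_samples; i++) /* soft-clip with the erf() bound */
    samples[i] = copysign(bound(threshold, fabs(samples[i])), samples[i]);
@end example

The threshold scales with the frame's standard deviation, so a smaller
compress factor clamps the signal closer to its typical level.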
@end table
@section earwax
Make audio easier to listen to on headphones.


@@ -67,6 +67,7 @@ OBJS-$(CONFIG_CHANNELSPLIT_FILTER) += af_channelsplit.o
OBJS-$(CONFIG_CHORUS_FILTER) += af_chorus.o generate_wave_table.o
OBJS-$(CONFIG_COMPAND_FILTER) += af_compand.o
OBJS-$(CONFIG_DCSHIFT_FILTER) += af_dcshift.o
OBJS-$(CONFIG_DYNAUDNORM_FILTER) += af_dynaudnorm.o
OBJS-$(CONFIG_EARWAX_FILTER) += af_earwax.o
OBJS-$(CONFIG_EBUR128_FILTER) += f_ebur128.o
OBJS-$(CONFIG_EQUALIZER_FILTER) += af_biquads.o

libavfilter/af_dynaudnorm.c (new file, 734 lines)

@@ -0,0 +1,734 @@
/*
* Dynamic Audio Normalizer
* Copyright (c) 2015 LoRd_MuldeR <mulder2@gmx.de>. Some rights reserved.
*
* This file is part of FFmpeg.
*
* FFmpeg is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2.1 of the License, or (at your option) any later version.
*
* FFmpeg is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with FFmpeg; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*/
/**
* @file
* Dynamic Audio Normalizer
*/
#include <float.h>
#include <math.h>
#include "libavutil/avassert.h"
#include "libavutil/opt.h"
#define FF_BUFQUEUE_SIZE 302
#include "libavfilter/bufferqueue.h"
#include "audio.h"
#include "avfilter.h"
#include "internal.h"
typedef struct cqueue {
double *elements;
int size;
int nb_elements;
int first;
} cqueue;
typedef struct DynamicAudioNormalizerContext {
const AVClass *class;
struct FFBufQueue queue;
int frame_len;
int frame_len_msec;
int filter_size;
int dc_correction;
int channels_coupled;
int alt_boundary_mode;
double peak_value;
double max_amplification;
double target_rms;
double compress_factor;
double *prev_amplification_factor;
double *dc_correction_value;
double *compress_threshold;
double *fade_factors[2];
double *weights;
int channels;
int delay;
cqueue **gain_history_original;
cqueue **gain_history_minimum;
cqueue **gain_history_smoothed;
} DynamicAudioNormalizerContext;
#define OFFSET(x) offsetof(DynamicAudioNormalizerContext, x)
#define FLAGS AV_OPT_FLAG_AUDIO_PARAM|AV_OPT_FLAG_FILTERING_PARAM
static const AVOption dynaudnorm_options[] = {
{ "f", "set the frame length in msec", OFFSET(frame_len_msec), AV_OPT_TYPE_INT, {.i64 = 500}, 10, 8000, FLAGS },
{ "g", "set the filter size", OFFSET(filter_size), AV_OPT_TYPE_INT, {.i64 = 31}, 3, 301, FLAGS },
{ "p", "set the peak value", OFFSET(peak_value), AV_OPT_TYPE_DOUBLE, {.dbl = 0.95}, 0.0, 1.0, FLAGS },
{ "m", "set the max amplification", OFFSET(max_amplification), AV_OPT_TYPE_DOUBLE, {.dbl = 10.0}, 1.0, 100.0, FLAGS },
{ "r", "set the target RMS", OFFSET(target_rms), AV_OPT_TYPE_DOUBLE, {.dbl = 0.0}, 0.0, 1.0, FLAGS },
{ "n", "enable channel coupling", OFFSET(channels_coupled), AV_OPT_TYPE_INT, {.i64 = 1}, 0, 1, FLAGS },
{ "c", "enable DC correction", OFFSET(dc_correction), AV_OPT_TYPE_INT, {.i64 = 0}, 0, 1, FLAGS },
{ "b", "enable alternative boundary mode", OFFSET(alt_boundary_mode), AV_OPT_TYPE_INT, {.i64 = 0}, 0, 1, FLAGS },
{ "s", "set the compress factor", OFFSET(compress_factor), AV_OPT_TYPE_DOUBLE, {.dbl = 0.0}, 0.0, 30.0, FLAGS },
{ NULL }
};
AVFILTER_DEFINE_CLASS(dynaudnorm);
static av_cold int init(AVFilterContext *ctx)
{
DynamicAudioNormalizerContext *s = ctx->priv;
if (!(s->filter_size & 1)) {
av_log(ctx, AV_LOG_ERROR, "filter size %d is invalid. Must be an odd value.\n", s->filter_size);
return AVERROR(EINVAL);
}
return 0;
}
static int query_formats(AVFilterContext *ctx)
{
AVFilterFormats *formats;
AVFilterChannelLayouts *layouts;
static const enum AVSampleFormat sample_fmts[] = {
AV_SAMPLE_FMT_DBLP,
AV_SAMPLE_FMT_NONE
};
int ret;
layouts = ff_all_channel_layouts();
if (!layouts)
return AVERROR(ENOMEM);
ret = ff_set_common_channel_layouts(ctx, layouts);
if (ret < 0)
return ret;
formats = ff_make_format_list(sample_fmts);
if (!formats)
return AVERROR(ENOMEM);
ret = ff_set_common_formats(ctx, formats);
if (ret < 0)
return ret;
formats = ff_all_samplerates();
if (!formats)
return AVERROR(ENOMEM);
return ff_set_common_samplerates(ctx, formats);
}
static inline int frame_size(int sample_rate, int frame_len_msec)
{
const int frame_size = round((double)sample_rate * (frame_len_msec / 1000.0));
return frame_size + (frame_size % 2);
}
static void precalculate_fade_factors(double *fade_factors[2], int frame_len)
{
const double step_size = 1.0 / frame_len;
int pos;
for (pos = 0; pos < frame_len; pos++) {
fade_factors[0][pos] = 1.0 - (step_size * (pos + 1.0));
fade_factors[1][pos] = 1.0 - fade_factors[0][pos];
}
}
static cqueue *cqueue_create(int size)
{
cqueue *q;
q = av_malloc(sizeof(cqueue));
if (!q)
return NULL;
q->size = size;
q->nb_elements = 0;
q->first = 0;
q->elements = av_malloc(sizeof(double) * size);
if (!q->elements) {
av_free(q);
return NULL;
}
return q;
}
static void cqueue_free(cqueue *q)
{
av_free(q->elements);
av_free(q);
}
static int cqueue_size(cqueue *q)
{
return q->nb_elements;
}
static int cqueue_empty(cqueue *q)
{
return !q->nb_elements;
}
static int cqueue_enqueue(cqueue *q, double element)
{
int i;
av_assert2(q->nb_elements != q->size);
i = (q->first + q->nb_elements) % q->size;
q->elements[i] = element;
q->nb_elements++;
return 0;
}
static double cqueue_peek(cqueue *q, int index)
{
av_assert2(index < q->nb_elements);
return q->elements[(q->first + index) % q->size];
}
static int cqueue_dequeue(cqueue *q, double *element)
{
av_assert2(!cqueue_empty(q));
*element = q->elements[q->first];
q->first = (q->first + 1) % q->size;
q->nb_elements--;
return 0;
}
static int cqueue_pop(cqueue *q)
{
av_assert2(!cqueue_empty(q));
q->first = (q->first + 1) % q->size;
q->nb_elements--;
return 0;
}
static const double s_pi = 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679;
static void init_gaussian_filter(DynamicAudioNormalizerContext *s)
{
double total_weight = 0.0;
const double sigma = (((s->filter_size / 2.0) - 1.0) / 3.0) + (1.0 / 3.0);
double adjust;
int i;
// Pre-compute constants
const int offset = s->filter_size / 2;
const double c1 = 1.0 / (sigma * sqrt(2.0 * s_pi));
const double c2 = 2.0 * pow(sigma, 2.0);
// Compute weights
for (i = 0; i < s->filter_size; i++) {
const int x = i - offset;
s->weights[i] = c1 * exp(-(pow(x, 2.0) / c2));
total_weight += s->weights[i];
}
// Adjust weights
adjust = 1.0 / total_weight;
for (i = 0; i < s->filter_size; i++) {
s->weights[i] *= adjust;
}
}
static int config_input(AVFilterLink *inlink)
{
AVFilterContext *ctx = inlink->dst;
DynamicAudioNormalizerContext *s = ctx->priv;
int c;
s->frame_len =
inlink->min_samples =
inlink->max_samples =
inlink->partial_buf_size = frame_size(inlink->sample_rate, s->frame_len_msec);
av_log(ctx, AV_LOG_DEBUG, "frame len %d\n", s->frame_len);
s->fade_factors[0] = av_malloc(s->frame_len * sizeof(*s->fade_factors[0]));
s->fade_factors[1] = av_malloc(s->frame_len * sizeof(*s->fade_factors[1]));
s->prev_amplification_factor = av_malloc(inlink->channels * sizeof(*s->prev_amplification_factor));
s->dc_correction_value = av_calloc(inlink->channels, sizeof(*s->dc_correction_value));
s->compress_threshold = av_calloc(inlink->channels, sizeof(*s->compress_threshold));
s->gain_history_original = av_calloc(inlink->channels, sizeof(*s->gain_history_original));
s->gain_history_minimum = av_calloc(inlink->channels, sizeof(*s->gain_history_minimum));
s->gain_history_smoothed = av_calloc(inlink->channels, sizeof(*s->gain_history_smoothed));
s->weights = av_malloc(s->filter_size * sizeof(*s->weights));
if (!s->prev_amplification_factor || !s->dc_correction_value ||
!s->compress_threshold || !s->fade_factors[0] || !s->fade_factors[1] ||
!s->gain_history_original || !s->gain_history_minimum ||
!s->gain_history_smoothed || !s->weights)
return AVERROR(ENOMEM);
for (c = 0; c < inlink->channels; c++) {
s->prev_amplification_factor[c] = 1.0;
s->gain_history_original[c] = cqueue_create(s->filter_size);
s->gain_history_minimum[c] = cqueue_create(s->filter_size);
s->gain_history_smoothed[c] = cqueue_create(s->filter_size);
if (!s->gain_history_original[c] || !s->gain_history_minimum[c] ||
!s->gain_history_smoothed[c])
return AVERROR(ENOMEM);
}
precalculate_fade_factors(s->fade_factors, s->frame_len);
init_gaussian_filter(s);
s->channels = inlink->channels;
s->delay = s->filter_size;
return 0;
}
static int config_output(AVFilterLink *outlink)
{
outlink->flags |= FF_LINK_FLAG_REQUEST_LOOP;
return 0;
}
static inline double fade(double prev, double next, int pos,
double *fade_factors[2])
{
return fade_factors[0][pos] * prev + fade_factors[1][pos] * next;
}
static inline double pow2(const double value)
{
return value * value;
}
static inline double bound(const double threshold, const double val)
{
const double CONST = 0.8862269254527580136490837416705725913987747280611935; //sqrt(PI) / 2.0
return erf(CONST * (val / threshold)) * threshold;
}
static double find_peak_magnitude(AVFrame *frame, int channel)
{
double max = DBL_EPSILON;
int c, i;
if (channel == -1) {
for (c = 0; c < frame->channels; c++) {
double *data_ptr = (double *)frame->extended_data[c];
for (i = 0; i < frame->nb_samples; i++)
max = FFMAX(max, fabs(data_ptr[i]));
}
} else {
double *data_ptr = (double *)frame->extended_data[channel];
for (i = 0; i < frame->nb_samples; i++)
max = FFMAX(max, fabs(data_ptr[i]));
}
return max;
}
static double compute_frame_rms(AVFrame *frame, int channel)
{
double rms_value = 0.0;
int c, i;
if (channel == -1) {
for (c = 0; c < frame->channels; c++) {
const double *data_ptr = (double *)frame->extended_data[c];
for (i = 0; i < frame->nb_samples; i++) {
rms_value += pow2(data_ptr[i]);
}
}
rms_value /= frame->nb_samples * frame->channels;
} else {
const double *data_ptr = (double *)frame->extended_data[channel];
for (i = 0; i < frame->nb_samples; i++) {
rms_value += pow2(data_ptr[i]);
}
rms_value /= frame->nb_samples;
}
return FFMAX(sqrt(rms_value), DBL_EPSILON);
}
static double get_max_local_gain(DynamicAudioNormalizerContext *s, AVFrame *frame,
int channel)
{
const double maximum_gain = s->peak_value / find_peak_magnitude(frame, channel);
const double rms_gain = s->target_rms > DBL_EPSILON ? (s->target_rms / compute_frame_rms(frame, channel)) : DBL_MAX;
return bound(s->max_amplification, FFMIN(maximum_gain, rms_gain));
}
static double minimum_filter(cqueue *q)
{
double min = DBL_MAX;
int i;
for (i = 0; i < cqueue_size(q); i++) {
min = FFMIN(min, cqueue_peek(q, i));
}
return min;
}
static double gaussian_filter(DynamicAudioNormalizerContext *s, cqueue *q)
{
double result = 0.0;
int i;
for (i = 0; i < cqueue_size(q); i++) {
result += cqueue_peek(q, i) * s->weights[i];
}
return result;
}
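/* Feed one new per-frame gain factor into the three-stage history:
raw gains -> sliding minimum filter -> Gaussian smoothing. The minimum
stage guarantees the smoothed gain never exceeds the local limit of any
frame inside the window; the Gaussian stage makes it vary smoothly. */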
static void update_gain_history(DynamicAudioNormalizerContext *s, int channel,
double current_gain_factor)
{
if (cqueue_empty(s->gain_history_original[channel]) ||
cqueue_empty(s->gain_history_minimum[channel])) {
const int pre_fill_size = s->filter_size / 2;
s->prev_amplification_factor[channel] = s->alt_boundary_mode ? current_gain_factor : 1.0;
while (cqueue_size(s->gain_history_original[channel]) < pre_fill_size) {
cqueue_enqueue(s->gain_history_original[channel], s->alt_boundary_mode ? current_gain_factor : 1.0);
}
while (cqueue_size(s->gain_history_minimum[channel]) < pre_fill_size) {
cqueue_enqueue(s->gain_history_minimum[channel], s->alt_boundary_mode ? current_gain_factor : 1.0);
}
}
cqueue_enqueue(s->gain_history_original[channel], current_gain_factor);
while (cqueue_size(s->gain_history_original[channel]) >= s->filter_size) {
double minimum;
av_assert0(cqueue_size(s->gain_history_original[channel]) == s->filter_size);
minimum = minimum_filter(s->gain_history_original[channel]);
cqueue_enqueue(s->gain_history_minimum[channel], minimum);
cqueue_pop(s->gain_history_original[channel]);
}
while (cqueue_size(s->gain_history_minimum[channel]) >= s->filter_size) {
double smoothed;
av_assert0(cqueue_size(s->gain_history_minimum[channel]) == s->filter_size);
smoothed = gaussian_filter(s, s->gain_history_minimum[channel]);
cqueue_enqueue(s->gain_history_smoothed[channel], smoothed);
cqueue_pop(s->gain_history_minimum[channel]);
}
}
static inline double update_value(double new, double old, double aggressiveness)
{
av_assert0((aggressiveness >= 0.0) && (aggressiveness <= 1.0));
return aggressiveness * new + (1.0 - aggressiveness) * old;
}
static void perform_dc_correction(DynamicAudioNormalizerContext *s, AVFrame *frame)
{
const double diff = 1.0 / frame->nb_samples;
int is_first_frame = cqueue_empty(s->gain_history_original[0]);
int c, i;
for (c = 0; c < s->channels; c++) {
double *dst_ptr = (double *)frame->extended_data[c];
double current_average_value = 0.0;
double prev_value;
for (i = 0; i < frame->nb_samples; i++)
current_average_value += dst_ptr[i] * diff;
prev_value = is_first_frame ? current_average_value : s->dc_correction_value[c];
s->dc_correction_value[c] = is_first_frame ? current_average_value : update_value(current_average_value, s->dc_correction_value[c], 0.1);
for (i = 0; i < frame->nb_samples; i++) {
dst_ptr[i] -= fade(prev_value, s->dc_correction_value[c], i, s->fade_factors);
}
}
}
static double setup_compress_thresh(double threshold)
{
if ((threshold > DBL_EPSILON) && (threshold < (1.0 - DBL_EPSILON))) {
double current_threshold = threshold;
double step_size = 1.0;
while (step_size > DBL_EPSILON) {
while ((current_threshold + step_size > current_threshold) &&
(bound(current_threshold + step_size, 1.0) <= threshold)) {
current_threshold += step_size;
}
step_size /= 2.0;
}
return current_threshold;
} else {
return threshold;
}
}
static double compute_frame_std_dev(DynamicAudioNormalizerContext *s,
AVFrame *frame, int channel)
{
double variance = 0.0;
int i, c;
if (channel == -1) {
for (c = 0; c < s->channels; c++) {
const double *data_ptr = (double *)frame->extended_data[c];
for (i = 0; i < frame->nb_samples; i++) {
variance += pow2(data_ptr[i]); // Assume that MEAN is *zero*
}
}
variance /= (s->channels * frame->nb_samples) - 1;
} else {
const double *data_ptr = (double *)frame->extended_data[channel];
for (i = 0; i < frame->nb_samples; i++) {
variance += pow2(data_ptr[i]); // Assume that MEAN is *zero*
}
variance /= frame->nb_samples - 1;
}
return FFMAX(sqrt(variance), DBL_EPSILON);
}
static void perform_compression(DynamicAudioNormalizerContext *s, AVFrame *frame)
{
int is_first_frame = cqueue_empty(s->gain_history_original[0]);
int c, i;
if (s->channels_coupled) {
const double standard_deviation = compute_frame_std_dev(s, frame, -1);
const double current_threshold = FFMIN(1.0, s->compress_factor * standard_deviation);
const double prev_value = is_first_frame ? current_threshold : s->compress_threshold[0];
double prev_actual_thresh, curr_actual_thresh;
s->compress_threshold[0] = is_first_frame ? current_threshold : update_value(current_threshold, s->compress_threshold[0], (1.0/3.0));
prev_actual_thresh = setup_compress_thresh(prev_value);
curr_actual_thresh = setup_compress_thresh(s->compress_threshold[0]);
for (c = 0; c < s->channels; c++) {
double *const dst_ptr = (double *)frame->extended_data[c];
for (i = 0; i < frame->nb_samples; i++) {
const double localThresh = fade(prev_actual_thresh, curr_actual_thresh, i, s->fade_factors);
dst_ptr[i] = copysign(bound(localThresh, fabs(dst_ptr[i])), dst_ptr[i]);
}
}
} else {
for (c = 0; c < s->channels; c++) {
double *const dst_ptr = (double *)frame->extended_data[c];
const double standard_deviation = compute_frame_std_dev(s, frame, c);
const double current_threshold = setup_compress_thresh(FFMIN(1.0, s->compress_factor * standard_deviation));
const double prev_value = is_first_frame ? current_threshold : s->compress_threshold[c];
double prev_actual_thresh, curr_actual_thresh;
s->compress_threshold[c] = is_first_frame ? current_threshold : update_value(current_threshold, s->compress_threshold[c], 1.0/3.0);
prev_actual_thresh = setup_compress_thresh(prev_value);
curr_actual_thresh = setup_compress_thresh(s->compress_threshold[c]);
for (i = 0; i < frame->nb_samples; i++) {
const double localThresh = fade(prev_actual_thresh, curr_actual_thresh, i, s->fade_factors);
dst_ptr[i] = copysign(bound(localThresh, fabs(dst_ptr[i])), dst_ptr[i]);
}
}
}
}
static void analyze_frame(DynamicAudioNormalizerContext *s, AVFrame *frame)
{
if (s->dc_correction) {
perform_dc_correction(s, frame);
}
if (s->compress_factor > DBL_EPSILON) {
perform_compression(s, frame);
}
if (s->channels_coupled) {
const double current_gain_factor = get_max_local_gain(s, frame, -1);
int c;
for (c = 0; c < s->channels; c++)
update_gain_history(s, c, current_gain_factor);
} else {
int c;
for (c = 0; c < s->channels; c++)
update_gain_history(s, c, get_max_local_gain(s, frame, c));
}
}
static void amplify_frame(DynamicAudioNormalizerContext *s, AVFrame *frame)
{
int c, i;
for (c = 0; c < s->channels; c++) {
double *dst_ptr = (double *)frame->extended_data[c];
double current_amplification_factor;
cqueue_dequeue(s->gain_history_smoothed[c], &current_amplification_factor);
for (i = 0; i < frame->nb_samples; i++) {
const double amplification_factor = fade(s->prev_amplification_factor[c],
current_amplification_factor, i,
s->fade_factors);
dst_ptr[i] *= amplification_factor;
if (fabs(dst_ptr[i]) > s->peak_value)
dst_ptr[i] = copysign(s->peak_value, dst_ptr[i]);
}
s->prev_amplification_factor[c] = current_amplification_factor;
}
}
static int filter_frame(AVFilterLink *inlink, AVFrame *in)
{
AVFilterContext *ctx = inlink->dst;
DynamicAudioNormalizerContext *s = ctx->priv;
AVFilterLink *outlink = inlink->dst->outputs[0];
int ret = 0;
if (!cqueue_empty(s->gain_history_smoothed[0])) {
AVFrame *out = ff_bufqueue_get(&s->queue);
amplify_frame(s, out);
ret = ff_filter_frame(outlink, out);
}
analyze_frame(s, in);
ff_bufqueue_add(ctx, &s->queue, in);
return ret;
}
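/* Called at EOF: the smoothed gain history lags the input by the filter
delay, so synthesize filler frames and push them through filter_frame()
to drain the real frames still waiting in the queue. */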
static int flush_buffer(DynamicAudioNormalizerContext *s, AVFilterLink *inlink,
AVFilterLink *outlink)
{
AVFrame *out = ff_get_audio_buffer(outlink, s->frame_len);
int c, i;
if (!out)
return AVERROR(ENOMEM);
for (c = 0; c < s->channels; c++) {
double *dst_ptr = (double *)out->extended_data[c];
for (i = 0; i < out->nb_samples; i++) {
dst_ptr[i] = s->alt_boundary_mode ? DBL_EPSILON : ((s->target_rms > DBL_EPSILON) ? FFMIN(s->peak_value, s->target_rms) : s->peak_value);
if (s->dc_correction) {
dst_ptr[i] *= ((i % 2) == 1) ? -1 : 1;
dst_ptr[i] += s->dc_correction_value[c];
}
}
}
s->delay--;
return filter_frame(inlink, out);
}
static int request_frame(AVFilterLink *outlink)
{
AVFilterContext *ctx = outlink->src;
DynamicAudioNormalizerContext *s = ctx->priv;
int ret = 0;
ret = ff_request_frame(ctx->inputs[0]);
if (ret == AVERROR_EOF && !ctx->is_disabled && s->delay)
ret = flush_buffer(s, ctx->inputs[0], outlink);
return ret;
}
static av_cold void uninit(AVFilterContext *ctx)
{
DynamicAudioNormalizerContext *s = ctx->priv;
int c;
av_freep(&s->prev_amplification_factor);
av_freep(&s->dc_correction_value);
av_freep(&s->compress_threshold);
av_freep(&s->fade_factors[0]);
av_freep(&s->fade_factors[1]);
for (c = 0; c < s->channels; c++) {
cqueue_free(s->gain_history_original[c]);
cqueue_free(s->gain_history_minimum[c]);
cqueue_free(s->gain_history_smoothed[c]);
}
av_freep(&s->gain_history_original);
av_freep(&s->gain_history_minimum);
av_freep(&s->gain_history_smoothed);
av_freep(&s->weights);
ff_bufqueue_discard_all(&s->queue);
}
static const AVFilterPad avfilter_af_dynaudnorm_inputs[] = {
{
.name = "default",
.type = AVMEDIA_TYPE_AUDIO,
.filter_frame = filter_frame,
.config_props = config_input,
.needs_writable = 1,
},
{ NULL }
};
static const AVFilterPad avfilter_af_dynaudnorm_outputs[] = {
{
.name = "default",
.type = AVMEDIA_TYPE_AUDIO,
.config_props = config_output,
.request_frame = request_frame,
},
{ NULL }
};
AVFilter ff_af_dynaudnorm = {
.name = "dynaudnorm",
.description = NULL_IF_CONFIG_SMALL("Dynamic Audio Normalizer."),
.query_formats = query_formats,
.priv_size = sizeof(DynamicAudioNormalizerContext),
.init = init,
.uninit = uninit,
.inputs = avfilter_af_dynaudnorm_inputs,
.outputs = avfilter_af_dynaudnorm_outputs,
.priv_class = &dynaudnorm_class,
};


@@ -83,6 +83,7 @@ void avfilter_register_all(void)
REGISTER_FILTER(CHORUS, chorus, af);
REGISTER_FILTER(COMPAND, compand, af);
REGISTER_FILTER(DCSHIFT, dcshift, af);
REGISTER_FILTER(DYNAUDNORM, dynaudnorm, af);
REGISTER_FILTER(EARWAX, earwax, af);
REGISTER_FILTER(EBUR128, ebur128, af);
REGISTER_FILTER(EQUALIZER, equalizer, af);