This patch introduces a new frame side data type AVFilmGrainParams for use
with video codecs which support it.
It can save a lot of memory used for duplicate processed reference frames and
reduce copies when applying film grain during presentation.
This is intended to replace the deprecated the AV_FRAME_DATA_QP_TABLE*
API and extend it to a wider range of codecs.
In the future, it may also be extended to support other encoding
parameters such as motion vectors.
Additional changes by Anton Khirnov <anton@khirnov.net> with suggestions
by Lynne <dev@lynne.ee>.
Signed-off-by: Juan De León <juandl@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This matches the inclusion of the other hwcontext_<hwaccel>.h headers.
The skipping of the header depending on build flags is already present.
Signed-off-by: Daniel Playfair Cal: <daniel.playfair.cal@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Required minimal changes to the code so made sense to implement.
FFT and MDCT tested, the output of both was properly rounded.
Fun fact: the non-power-of-two fixed-point FFT and MDCT are the fastest ever
non-power-of-two fixed-point FFT and MDCT written.
This can replace the power of two integer MDCTs in aac and ac3 if the
MIPS optimizations are ported across.
Unfortunately the ac3 encoder uses a 16-bit fixed point forward transform,
unlike the encoder which uses a 32bit inverse transform, so some modifications
might be required there.
The 3-point FFT is somewhat less accurate than it otherwise could be,
having minor rounding errors with bigger transforms. However, this
could be improved later, and the way its currently written is the way one
would write assembly for it.
Similar rounding errors can also be found throughout the power of two FFTs
as well, though those are more difficult to correct.
Despite this, the integer transforms are more than accurate enough.
This commit adds the necessary code to initialize and use a Vulkan device
within the hwcontext libavutil framework.
Currently direct mapping to VAAPI and DRM frames is functional, and
transfers to CUDA and native frames are supported.
Lets hope the future Vulkan video decode extension fits well within this
framework.
Simply moves and templates the actual transforms to support an
additional data type.
Unlike the float version, which is equal or better than libfftw3f,
double precision output is bit identical with libfftw3.
This commit adds a new API to libavutil to allow for arbitrary transformations
on various types of data.
This is a partly new implementation, with the power of two transforms taken
from libavcodec/fft_template, the 5 and 15-point FFT taken from mdct15, while
the 3-point FFT was written from scratch.
The (i)mdct folding code is taken from mdct15 as well, as the mdct_template
code was somewhat old, messy and not easy to separate.
A notable feature of this implementation is that it allows for 3xM and 5xM
based transforms, where M is a power of two, e.g. 384, 640, 768, 1280, etc.
AC-4 uses 3xM transforms while Siren uses 5xM transforms, so the code will
allow for decoding of such streams.
A non-exaustive list of supported sizes:
4, 8, 12, 16, 20, 24, 32, 40, 48, 60, 64, 80, 96, 120, 128, 160, 192, 240,
256, 320, 384, 480, 512, 640, 768, 960, 1024, 1280, 1536, 1920, 2048, 2560...
The API was designed such that it allows for not only 1D transforms but also
2D transforms of certain block sizes. This was partly on accident as the stride
argument is required for Opus MDCTs, but can be used in the context of a 2D
transform as well.
Also, various data types would be implemented eventually as well, such as
"double" and "int32_t".
Some performance comparisons with libfftw3f (SIMD disabled for both):
120:
22353 decicycles in fftwf_execute, 1024 runs, 0 skips
21836 decicycles in compound_fft_15x8, 1024 runs, 0 skips
128:
22003 decicycles in fftwf_execute, 1024 runs, 0 skips
23132 decicycles in monolithic_fft_ptwo, 1024 runs, 0 skips
384:
75939 decicycles in fftwf_execute, 1024 runs, 0 skips
73973 decicycles in compound_fft_3x128, 1024 runs, 0 skips
640:
104354 decicycles in fftwf_execute, 1024 runs, 0 skips
149518 decicycles in compound_fft_5x128, 1024 runs, 0 skips
768:
109323 decicycles in fftwf_execute, 1024 runs, 0 skips
164096 decicycles in compound_fft_3x256, 1024 runs, 0 skips
960:
186210 decicycles in fftwf_execute, 1024 runs, 0 skips
215256 decicycles in compound_fft_15x64, 1024 runs, 0 skips
1024:
163464 decicycles in fftwf_execute, 1024 runs, 0 skips
199686 decicycles in monolithic_fft_ptwo, 1024 runs, 0 skips
With SIMD we should be faster than fftw for 15xM transforms as our fft15 SIMD
is around 2x faster than theirs, even if our ptwo SIMD is slightly slower.
The goal is to remove the libavcodec/mdct15 code and deprecate the
libavcodec/avfft interface once aarch64 and x86 SIMD code has been ported.
New code throughout the project should use this API.
The implementation passes fate when used in Opus, AAC and Vorbis, and the output
is identical with ATRAC9 as well.
The dynamic metadata contains data for color volume transform -
application 4 of SMPTE 2094-40:2016 standard. The data comes from
HEVC in the SEI_TYPE_USER_DATA_REGISTERED_ITU_T_T35.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
We have a pattern of wrapping CUDA calls to print errors and
normalise return values that is used in a couple of places. To
avoid duplication and increase consistency, let's put the wrapper
implementation in a shared place and use it everywhere.
Affects:
* avcodec/cuviddec
* avcodec/nvdec
* avcodec/nvenc
* avfilter/vf_scale_cuda
* avfilter/vf_scale_npp
* avfilter/vf_thumbnail_cuda
* avfilter/vf_transpose_npp
* avfilter/vf_yadif_cuda
This uses any devices it can find on the host system - on a system with no
hardware device support or in builds with no support included it will do
nothing and pass.
This new side-data will contain info on how a packet is encrypted.
This allows the app to handle packet decryption.
Signed-off-by: Jacob Trimble <modmaker@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This was added in early 2013 and abandoned several months later; as far as
I can tell, there are no external users. Future OpenCL use will be via
hwcontext, which requires neither special OpenCL-only API nor global state
in libavutil.
All internal users are also deleted - this is just the unsharp filter
(replaced by unsharp_opencl, which is more flexible) and the deshake filter
(no replacement).
Rework it to improve performance. Now mutex is not shared by workers,
instead each worker has its own mutex and condition variable. This
reduces lock contention between workers. Also use atomic variable for
counter.
The interface also allows execute to run special function on main
thread, requested by Ronald.
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
To be used with the new d3d11 hwaccel decode API.
With the new hwaccel API, we don't want surfaces to depend on the
decoder (other than the required dimension and format). The old D3D11VA
pixfmt uses ID3D11VideoDecoderOutputView pointers, which include the
decoder configuration, and thus is incompatible with the new hwaccel
API. This patch introduces AV_PIX_FMT_D3D11, which uses ID3D11Texture2D
and an index. It's simpler and compatible with the new hwaccel API.
The introduced hwcontext supports only the new pixfmt.
Frame upload code untested.
Significantly based on work by Steve Lhomme <robux4@gmail.com>, but with
heavy changes/rewrites.
Merges Libav commit fff90422d1.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
This adds tons of code for no other benefit than making VideoToolbox
support conform with the new hwaccel API (using hw_device_ctx and
hw_frames_ctx).
Since VideoToolbox decoding does not actually require the user to
allocate frames, the new code does mostly nothing.
One benefit is that ffmpeg_videotoolbox.c can be dropped once generic
hwaccel support for ffmpeg.c is merged from Libav.
Does not consider VDA or VideoToolbox encoding.
Fun fact: the frame transfer functions are copied from vaapi, as the
mapping makes copying generic boilerplate. Mapping itself is not
exported by the VT code, because I don't know how to test.
* commit '92db5083077a8b0f8e1050507671b456fd155125':
build: Generate pkg-config files from Make and not from configure
build: Store library version numbers in .version files
Includes cherry-picked commits 8a34f36593 and
ee164727dd to fix issues.
Changes were also made to retain support for raise_major and build_suffix.
Reviewed-by: ubitux
Merged-by: James Almer <jamrial@gmail.com>
* commit '11a9320de54759340531177c9f2b1e31e6112cc2':
build: Move build-system-related helper files to a separate subdirectory
"ffbuild" directory name is used instead of "avbuild".
Merged-by: Clément Bœsch <u@pkh.me>
* commit '2170017a1cd033b6f28e16476921022712a522d8':
avutil: fix data race in av_get_cpu_flags()
This commit is a noop, see fed50c4304
Merged-by: James Almer <jamrial@gmail.com>
This moves work from the configure to the Make stage where it can
be parallelized and ensures that pkgconfig files are updated when
library versions change.
Bug-Id: 449
Make the one-time initialization in av_get_cpu_flags() thread-safe. The
static variables |flags|, |cpuflags_mask|, and |checked| in
libavutil/cpu.c are read and written using normal load and store
operations. These are considered as data races. The fix is to use atomic
load and store operations.
Remove the |checked| variable because the invalid value of -1 for
|flags| can be used to indicate the same condition. Rename |flags| to
|cpu_flags| and move it to file scope.
The fix can be verified by running the libavutil/tests/cpu_init.c test
program under ThreadSanitizer:
./configure --toolchain=clang-tsan
make libavutil/tests/cpu_init
libavutil/tests/cpu_init
There should be no warnings from ThreadSanitizer.
Co-author: Dmitry Vyukov of Google, who suggested the data race fix.
Signed-off-by: Wan-Teh Chang <wtc@google.com>
While no decoder currently exports spherical information, this type
represents a frame property that has to be passed through from container
to frames.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
While no decoder currently exports spherical information, this type
represents a frame property that has to be passed through from container
to frames.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Fixes make checkheaders on systems without the Cuda Toolkit, which
was broken after the dynlink changes.
Signed-off-by: James Almer <jamrial@gmail.com>