configure checks that the assembler supports the B extension (or rather
its constituents) anyway. These macros were dodging sanity checks for
unsupported instructions and nothing else.
The RISC-V B bit manipulation extension was ratified only two months ago.
But it is strictly equivalent to the union of the zba, zbb and zbs
extensions which were defined almost 3 years earlier. Rather than require
new assembler, we can just match the extension name manually and translate
it into its constituent parts.
If __riscv_hwprobe() fails, then the kernel version is presumably too
old. There is not much point falling back to the auxillary vector.
- The Linux kernel requires I, so the flag is always set on Linux, and
run-time detection is unnecessary. Our RISC-V assembler does anyway not
support targets without I.
- Linux can compile with or without F and D, but it cannot perform
run-time detection for them (a kernel with F support will not boot a
processor without F). The run-time detection is thus useless in that
case. Besides F and D extensions are used throughout the C code, so
their run-time detection would not be practical.
- Support for V was added in a later kernel version than riscv_hwprobe(),
so the system call will always be available if the kernel supports V.
The only exception would be vendor kernel forks, but those are known to
haphasardly pretend to support V on systems without actual V support, or
with only pre-ratification binary-incompatible version. Furthermore, a
large chunk of our optimisations require Zba and/or Zbb which cannot be
detected with HWCAP in those kernels.
For what it is worth, OpenJDK already took a similar action. Note that this
keeps AT_HWCAP usage for platforms with neither C run-time <sys/hwprobe.h>
nor kernel <asm/hwprobe.h>, notably kernels other than Linux.
Fixes: CID1604383 Unchecked return value
Fixes: CID1604439 Unchecked return value
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: CID1604487 Unchecked return value
Fixes: CID1604494 Unchecked return value
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: negation of -2147483648 cannot be represented in type 'int'; cast to an unsigned type to negate this value to itself
Fixes: 68550/clusterfuzz-testcase-minimized-ffmpeg_dem_MXF_fuzzer-6424065930756096
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
I've accidentally used API not available on the checked version.
Additionally check for the SDK to be new enough to even have the
CVImageBufferCreateColorSpaceFromAttachments API to not fail
the build.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
There's nothing stopping users from writing to such buffers.
Its more accurate to say they are singular, i.e. not duplicated
between multiple submissions.
This can be helpful for global statistics, or error propagation
purposes.
Should make strict compilers happy.
Also, make AV_COPY128 use integer operations while at it. Removing the
inclusion of immintrin.h ensures a lot less intrinsic related headers are
included as well, which fixes a clash of defines with some Clang versions.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: James Almer <jamrial@gmail.com>
av_executor_execute run the task directly when thread is disabled.
The task can schedule a new task by call av_executor_execute. This
forms an implicit recursive call. This patch removed the recursive
call.
Removed by accident in the previous commits. This makes the code only run when
compiled with GCC and Clang like before. Support for other compilers like msvc
can be added later.
Signed-off-by: James Almer <jamrial@gmail.com>
This has the benefit of removing any SSE -> AVX penalty that may happen when
the compiler emits VEX encoded instructions.
Signed-off-by: James Almer <jamrial@gmail.com>
When called inside a loop, the inline asm version results in one pxor
unnecessarely emitted per iteration, as the contents of the __asm__() block are
opaque to the compiler's instruction scheduler.
This is not the case with intrinsics, where pxor will be emitted once with any
half decent compiler.
This also has the benefit of removing any SSE -> AVX penalty that may happen
when the compiler emits VEX encoded instructions.
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes: CID1591930 Wrong sizeof argument
Sponsored-by: Sovereign Tech Fund
Reviewed-by: Steve Lhomme <robux4@ycbcr.xyz>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: CID1591944 Wrong sizeof argument
Sponsored-by: Sovereign Tech Fund
Reviewed-by: Steve Lhomme <robux4@ycbcr.xyz>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: CID1598558 Resource leak
Sponsored-by: Sovereign Tech Fund
Reviewed-by: Steve Lhomme <robux4@ycbcr.xyz>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: CID1591909 Wrong sizeof argument
Sponsored-by: Sovereign Tech Fund
Reviewed-by: Steve Lhomme <robux4@ycbcr.xyz>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
In addition to the other properties, try to obtain the right
CGColorSpace and set it as well, else it could lead to a CVBuffer
tagged as BT.2020 but with a CGColorSpace indicating BT.709.
Therefore it is essential for consistency to set a colorspace
according to the other values, or if none can be obtained (for example
because the other values are all unspecified) unset it as well.
Fix#10884
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The documentation was not clear at all what specifically the
function does, so it was left unspecified if it will unset or
not touch attachments it could not map from the AVFrame.
The documentation of the return value was wrong as well.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
When mapping AVFrame properties to the CVBuffer attachments, it is
necessary to properly delete undefined attachments, else we can
leave incorrect values in there guessed from VideoToolbox for
example, leading to inconsistent results where the AVFrame and
CVBuffer differ in metadata.
Ref #10884
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Given that a video stream/frame may have only one view or both views coded with
the packing information being unavailable, this commit adds a new type value
AV_STEREO3D_UNSPEC for this purpose.
The most common case for this is container level signaling of Stereo3D video
where the specifics are defined at the bitstream level.
Signed-off-by: James Almer <jamrial@gmail.com>
Before the patch, disable threads support at configure/build time
was the only method to force zero thread in executor. However,
it's common practice for libavcodec to run on caller's thread when
user specify thread number to one. And for WASM environment, whether
threads are supported needs to be detected at runtime. So executor
should support zero thread at runtime.
A single thread executor can be useful, e.g., to handle network
protocol. So we can't take thread_count one as zero thread, which
disabled a valid usercase.
Other libraries take -threads 0 to mean auto. Executor as a low
level utils doesn't do cpu detect. So take thread_count zero as
zero thread, literally.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
This avoids hardcoding any implementation-specific limitiations as
part of the API, and allows for future expandability.
This also allows API users to more conveniently convert the
values into floats without hardcoding specific conversion constants.
The API was committed a few days ago, so changing this field now
is within the realms of acceptable.
On m1, kpc_get_counter_count(KPC_MASK) return 8 in my test. The
exact value doesn't matter in our case, as long as we have a
sufficiently large array
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The default timer register pmccntr_el0 usually requires enabling
access with e.g. a kernel module (while it is accessible by
default on Windows). On Linux, the default for checkasm benchmarks
is to use perf (if suitable headers are available) though.
On macOS, using cntvct_el0 gives measurements with the same
magnitude as mach_absolute_time (which is used currently), but
possibly with a little less overhead/noise.
Signed-off-by: Martin Storsjö <martin@martin.st>
The vendor has long since switched to Arm, with the last product
reaching their official end-of-life over 11 years ago. Linux support for
the ISA was dropped 7 years ago. More importantly, this architecture was
never supported by upstream GCC, and the vendor fork is stuck at version
4.2, which FFmpeg no longer supports (as per C11 requirement).
Presumably, this is still the case given the lack of vendor support.
Indeed all of the code being removed here consisted of inline assembler
scalar optimisations. A sane C compiler should be able to perform those
automatically nowadays (with the sole exception of fast CLZ detection),
but this is moot as this architecture is evidently dead.
C code or compiler built-ins are preferable over inline assembler for
byte-swaps as it allows for better optimisations (e.g. instruction
scheduling) which would otherwise be impossible.
As with f64c2e710f for x86 and Arm,
this removes the inline assembler on GCC (and Clang) since we now
require recent enough compiler versions. This indeed seems to work on
AArch64, SuperH and, if Zbb is enabled, RISC-V. (AVR32 was not tested
since it has no known working compilers at this time.)
Since the C11 support is required, those GCC versions can no longer be
supported anyhow. (Clang pretends to be GCC 4.4, but it looks like the
code was intended for old GCC specifically.)
Since the C11 support is required, those GCC versions can no longer be
supported anyhow. (Clang pretends to be GCC 4.4, but the removed code
does not seem to have been intended for Clang.)
Otherwise nothing is written into the destination when a write mapping
is requested.
For example, a vulkan frame mapped from a drm frame (which is wrapped as
a vaapi frame in the example) is used as the output of scale_vulkan
filter, it always gets a green screen without this patch.
ffmpeg -init_hw_device vaapi=va -init_hw_device vulkan=vulkan@va
-filter_hw_device vulkan -f lavfi -i testsrc=size=352x288,format=nv12
-vf
"hwupload,scale_vulkan,hwmap=derive_device=vaapi:reverse=1,format=vaapi,hwdownload,format=nv12"
-f nut - | ffplay -
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
This adds runtime support to use Zbb REV8 for 32- and 64-bit byte-wise
swaps. The result is about five times slower than if targetting Zbb
statically, but still a lot faster than the default bespoke C code or a
call to GCC run-time functions.
For 16-bit swap, this is however unsurprisingly a lot worse, and so this
sticks to the baseline. In fact, even using REV8 statically does not
seem to be beneficial in that case.
Zbb static Zbb dynamic I baseline
bswap16: 0.668184765 3.340764069 0.668029012
bswap32: 0.668174014 3.340763319 9.353855435
bswap64: 0.668221765 3.340496313 14.698672283
(seconds for 1 billion iterations on a SiFive-U74 core)
Due to hysterical raisins, most RISC-V Linux distributions target a
RV64GC baseline excluding the Bit-manipulation ISA extensions, most
notably:
- Zba: address generation extension and
- Zbb: basic bit manipulation extension.
Most CPUs that would make sense to run FFmpeg on support Zba and Zbb
(including the current FATE runner), so it makes sense to optimise for
them. In fact a large chunk of existing assembler optimisations relies
on Zba and/or Zbb.
Since we cannot patch shared library code, the next best thing is to
carry a flag initialised at load-time and check it on need basis.
This results in 3 instructions overhead on isolated use, e.g.:
1: AUIPC rd, %pcrel_hi(ff_rv_zbb_supported)
LBU rd, %pcrel_lo(1b)(rd)
BEQZ rd, non_Zbb_fallback_code
// Zbb code here
The C compiler will typically load the flag ahead of time to reducing
latency, and can also keep it around if Zbb is used multiple times in a
single optimisation scope. For this to work, the flag symbol must be
hidden; otherwise the optimisation degrades with a GOT look-up to
support interposition:
1: AUIPC rd, GOT_OFFSET_HI
LD rd, GOT_OFFSET_LO(rd)
LBU rd, (rd)
BEQZ rd, non_Zbb_fallback_code
// Zbb code here
This patch adds code to provision the flag in libraries using bit
manipulation functions from libavutil: byte-swap, bit-weight and
counting leading or trailing zeroes.
It will fallback to mach_absolute_time inside libavutil/timer.h
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The function pointer is appended to the structure for backward binary
compatibility. Fortunately, this is allocated by libavutil, not by the
user, so increasing the structure size is safe.
This is test code after all so it should test things
Fixes: CID1518990 Unchecked return value
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Failure is possible due to strdup()
Fixes: CID1516764 Dereference null return value
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: The warnings from CID1598553 Uninitialized scalar variable
Passing partly initialized structs is ugly and asking for hard to rieproduce bugs,
The uninitialized fields where not used
Reviewed-by: "Xiang, Haihao" <haihao.xiang-at-intel.com@ffmpeg.org>
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
When AVHWFramesContext.initial_pool_size is 0, a dynamic frame pool is
required. We may support this under certain conditions, e.g. oneVPL 2.9+
support dynamic frame allocation, we needn't provide a fixed frame pool
in the mfxFrameAllocator.Alloc callback.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
vtype_vli computes the VTYPE value with the optimal LMUL for a given
element width, tail and mask policies and a run-time vector length.
vtype_ivli does the same, but with the compile-time constant vector
length.
vwtypei and vntypei can be used to widen or narrow a VTYPE value for
use in mixed-width vector-optimised functions.
The B Bit manipulation extension was not defined to this day, and
probably never will. Instead it was broken down into Zba, Zbb, Zbc and
Zbs with no particular blessed set to make up B.
This removes the bogus field test. Linux never set this bit, nor
(AFAICT) did FreeBSD or any other OS. We can always add it back in the
unlikely event that it gets taken into use.
Not all C run-times support this, and even then, it will be a while
before distributions provide recent enough versions thereof.
Since this is a trivial system call wrapper, we might just as well call
the corresponding kernel system call directly where the C run-time lacks
support but the kernel headers are new enough (as is the case on Debian
Unstable at the time of writing). In doing so, we need to add a few more
guards as the first suitable kernel (headers) release did not expose the
V, Zba and Zbb extensions.
This adds the Linux-specific function call to detect CPU features. Unlike
the more portable auxillary vector, this supports extensions other than
single lettered ones. At this point, FFmpeg already needs this to detect
Zba and Zbb at run-time, and probably will need it for Zvbb in the near
future.
Support will be available in glibc 2.40 onward.
Clarify comment regarding type of integers regarding AV_OPT_TYPE_IMAGE_SIZE.
Signed-off-by: Marcus B Spencer <marcus@marcusspencer.xyz>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is a common error code from e.g. CDNs or cloud storage, and
it is useful to be able to handle it differently to a generic
4XX code.
Its source is RFC6585.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
av_mastering_display_metadata_alloc() is not useful in scenarios where you need to
know the runtime size of AVMasteringDisplayMetadata.
Signed-off-by: James Almer <jamrial@gmail.com>
* SMPTE ST 2128 IPT-C2 defines the coefficients utilized in DoVi
Profile 5. Profile 5 can thus now be represented in VUI as
{AVCOL_RANGE_JPEG, AVCOL_PRI_BT2020, AVCOL_TRC_SMPTE2084,
AVCOL_SPC_IPT_C2, AVCHROMA_LOC_LEFT} (although other chroma
sample locations are allowed). AVCOL_TRC_SMPTE2084 should in
this case be interpreted as 'PQ with reshaping'.
* YCgCo-Re and YCgCo-Ro define the bitexact YCgCo-R, where the
number of bits added to a source RGB bit depth is 2 (i.e., even)
and 1 (i.e., odd), respectively.
As well as accessors plus a function for allocating this struct with
extension blocks,
Definitions generously taken from quietvoid/dovi_tool, which is
assembled as a collection of various patent fragments, as well as output
by the official Dolby Vision bitstream verifier tool.
The NLQ pivots are not documented but should be present in the header
for profile 7 RPU format. It has been verified using Dolby's
verification toolkit.
Signed-off-by: quietvoid <tcChlisop0@gmail.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
struct Foo * declares a new type (namely struct Foo)
if there is no declaration of struct Foo already visible
in the current scope; otherwise it is just a pointer to
an element of the already declared type "struct Foo".
There is a gotcha with the first case:
struct Foo is only declared in its scope; a later declaration
of struct Foo in an enclosing scope declares a different type.
This happens in hwcontext_vulkan.h if it is included before
hwcontext.h, because some declarations of struct AVHWDeviceContext
and struct AVHWFramesContext have function prototype scope.
Compilers warn about this (during checkheaders):
‘struct AVHWDeviceContext’ declared inside parameter list will not
be visible outside of this definition or declaration
Fix this by including hwcontext.h.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also move AV_CHECK_OFFSET to its only user, namely
lavc/arm/mpegvideo_arm.c and rename it to CHECK_OFFSET.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There are lots of files that don't need it: The number of object
files that actually need it went down from 2011 to 884 here.
Keep it for external users in order to not cause breakages.
Also improve the other headers a bit while just at it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also update the checks that guard against inserting
a new enum entry in the middle of a range.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
GCC 9-13 do not emit warnings for this at all optimization
levels even when -Wmaybe-uninitialized is not disabled.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
C11 required to use ATOMIC_VAR_INIT to statically initialize
atomic objects with static storage duration. Yet this macro
was unsuitable for initializing structures [1] and was actually
unneeded for all known implementations (this includes our
compatibility fallback implementations which simply wrap the value
in parentheses: #define ATOMIC_VAR_INIT(value) (value)).
Therefore C17 deprecated the macro and C23 actually removed it [2].
Since commit 5ff0eb34d2 we default
to C17 if the compiler supports it; Clang warns about ATOMIC_VAR_INIT
in this mode. Given that no implementation ever needed this macro,
this commit stops using it to avoid this warning.
[1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2396.htm#dr_485
[2]: https://en.cppreference.com/w/c/atomic/ATOMIC_VAR_INIT
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
A pointer conversion is UB if the resulting pointer is not
correctly aligned for the resultant type, even if no
load/store is ever performed through that pointer (C11 6.3.2.3 (7)).
This may happen in opt_copy_elem(), because the pointers are
converted even when they belong to a type that does not guarantee
sufficient alignment.
Fix this by deferring the cast after having checked the type.
Also make the casts -Wcast-qual safe and avoid an indirection
for src.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is not documented to be safe and in any case it is nonsense:
Currently av_strdup(NULL) returns NULL and in order to distinguish
this from a genuine allocation failure, opt_copy_elem()
checked afterwards whether src was actually NULL. But then one
can simply check in advance whether one should call av_strdup()
at all.
set_string() was even worse and returned ENOMEM in case the value
to be duplicated is NULL; this only worked because
av_opt_set_defaults2() does not check the return value at all
(given that it can't propagate it).
These two places account for 389114 of 390356 av_strdup(NULL)
calls during one FATE run.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: 62276/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-4802790784303104
Fixes: signed integer overflow: 1768972133 + 968491058 cannot be represented in type 'int'
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>