Commit Graph

44427 Commits

Author SHA1 Message Date
Luca Barbato 9c2d36fcaf dv: Convert to the new bitstream reader 2017-02-11 20:29:44 +01:00
Luca Barbato ba30b74686 aac: Validate the sbr sample rate before using the value
Avoid a floating point exception.

Bug-Id: 1027
CC: libav-stable@libav.org
2017-02-11 20:23:11 +01:00
Luca Barbato 0ee78020cd configure: Move up the avbuild directory creation
The early check for inconsistent in-source vs out-of-source build
cannot generate a config.log otherwise.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-02-11 15:47:43 +01:00
wm4 c2f97f0508 hwcontext_dxva2: support D3D9Ex
D3D9Ex uses different driver paths. This helps with "headless"
configurations when no user logs in. Plain D3D9 device creation will
fail if no user is logged in, while it works with D3D9Ex.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2017-02-11 11:37:45 +01:00
wm4 04f3bd3496 AVFrame: add an opaque_ref field
This is an extended version of the AVFrame.opaque field, which can be
used to attach arbitrary user information to an AVFrame.

The usefulness of the opaque field is rather limited, because it can
store only up to 32 bits of information (or 64 bit on 64 bit systems).
It's not possible to set this field to a memory allocation, because
there is no way to deallocate it correctly.

The opaque_ref field circumvents this by letting the user set an
AVBuffer, which makes the user data refcounted.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2017-02-11 11:37:45 +01:00
Anton Khirnov 4de220d2e3 frame: allow align=0 (meaning automatic) for av_frame_get_buffer()
This will avoid every caller from hardcoding some specific alignment,
which may break in the future with new instruction sets.
2017-02-11 11:37:45 +01:00
Anton Khirnov f44ec22e09 lavc: use av_cpu_max_align() instead of hardcoding alignment requirements 2017-02-11 11:37:45 +01:00
Anton Khirnov e6bff23f1e cpu: add a function for querying maximum required data alignment 2017-02-11 11:37:45 +01:00
Anton Khirnov 5c8a5765dc scale_npp: explicitly set the output frames context for passthrough mode
This is no longer done automatically for filters marked as
hwframe-aware.
2017-02-11 11:37:45 +01:00
Anton Khirnov 6f554521af Use the new AVIOContext destructor. 2017-02-11 11:37:45 +01:00
Anton Khirnov 99684f3ae7 avio: add a destructor for AVIOContext
Before this commit, AVIOContext is to be freed with a plain av_free(),
which prevents us from adding any deeper structure to it.
2017-02-11 11:37:44 +01:00
Martin Storsjö 435cd7bc99 arm: vp9lpf: Use orrs instead of orr+cmp
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-11 00:44:04 +02:00
Martin Storsjö e1f9de86f4 arm/aarch64: vp9lpf: Calculate !hev directly
Previously we first calculated hev, and then negated it.

Since we were able to schedule the negation in the middle
of another calculation, we don't see any gain in all cases.

Before:                     Cortex A7      A8      A9     A53  A53/AArch64
vp9_loop_filter_v_4_8_neon:     147.0   129.0   115.8    89.0         88.7
vp9_loop_filter_v_8_8_neon:     242.0   198.5   174.7   140.0        136.7
vp9_loop_filter_v_16_8_neon:    500.0   419.5   382.7   293.0        275.7
vp9_loop_filter_v_16_16_neon:   971.2   825.5   731.5   579.0        453.0
After:
vp9_loop_filter_v_4_8_neon:     143.0   127.7   114.8    88.0         87.7
vp9_loop_filter_v_8_8_neon:     241.0   197.2   173.7   140.0        136.7
vp9_loop_filter_v_16_8_neon:    497.0   419.5   379.7   293.0        275.7
vp9_loop_filter_v_16_16_neon:   965.2   818.7   731.4   579.0        452.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-11 00:43:59 +02:00
Martin Storsjö 3fcf788fbb aarch64: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling
This work is sponsored by, and copyright, Google.

Before:                           Cortex A53
vp9_inv_dct_dct_16x16_sub1_add_neon:   235.3
vp9_inv_dct_dct_32x32_sub1_add_neon:   555.1
After:
vp9_inv_dct_dct_16x16_sub1_add_neon:   180.2
vp9_inv_dct_dct_32x32_sub1_add_neon:   475.3

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-11 00:31:58 +02:00
Martin Storsjö a76bf8cf12 arm: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling
This work is sponsored by, and copyright, Google.

Before:                            Cortex A7      A8      A9     A53
vp9_inv_dct_dct_16x16_sub1_add_neon:   273.0   189.5   211.7   235.8
vp9_inv_dct_dct_32x32_sub1_add_neon:   752.0   459.2   862.2   553.9
After:
vp9_inv_dct_dct_16x16_sub1_add_neon:   226.5   145.0   225.1   171.8
vp9_inv_dct_dct_32x32_sub1_add_neon:   721.2   415.7   727.6   475.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-11 00:31:52 +02:00
Martin Storsjö 388e0d2515 aarch64: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter
No measured speedup on a Cortex A53, but other cores might benefit.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-11 00:08:50 +02:00
Martin Storsjö fea92a4b57 arm: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter
Before:                    Cortex A7      A8     A9     A53
vp9_put_8tap_smooth_4h_neon:   378.1   273.2  340.7   229.5
After:
vp9_put_8tap_smooth_4h_neon:   352.1   222.2  290.5   229.5

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-11 00:08:37 +02:00
Martin Storsjö 5e0c2158fb aarch64: vp9mc: Simplify the extmla macro parameters
Fold the field lengths into the macro.

This makes the macro invocations much more readable, when the
lines are shorter.

This also makes it easier to use only half the registers within
the macro.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-11 00:08:29 +02:00
Vittorio Giovara 53ea595eec mov: Rework stsc index validation
In order to avoid potential integer overflow change the comparison
and make sure to use the same unsigned type for both elements.
2017-02-10 16:26:16 -05:00
Vittorio Giovara ce6d72d107 imgutils: Document av_image_get_buffer_size() 2017-02-10 16:25:58 -05:00
Luca Barbato b6093e8c72 hlsenc: Correctly write down all 16 bytes in hex
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-02-10 14:12:16 +01:00
Martin Storsjö bc25897630 utvideodec: Add a missing include
This was missing from 77c23704c7, fixing building.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-10 09:31:49 +02:00
Timo Rothenpieler a52976c0fe nvenc: make gpu indices independent of supported capabilities
Do not allocate a CUDA context for every available gpu.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-02-09 23:29:32 +01:00
Derek Buitenhuis 77c23704c7 avcodec: Mark some codecs with threadsafe init as such
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-02-09 23:28:18 +01:00
Martin Storsjö 0c0b87f12d aarch64: vp9itxfm: Fix incorrect vertical alignment
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 23:57:06 +02:00
Martin Storsjö 8476eb0d3a aarch64: vp9itxfm: Update a comment to refer to a register with a different name
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 23:57:02 +02:00
Martin Storsjö 3dd7827258 aarch64: vp9itxfm: Use the right lane sizes in 8x8 for improved readability
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 23:56:59 +02:00
Martin Storsjö ed8d293306 aarch64: vp9itxfm: Use a single lane ld1 instead of ld1r where possible
The ld1r is a leftover from the arm version, where this trick is
beneficial on some cores.

Use a single-lane load where we don't need the semantics of ld1r.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 23:56:54 +02:00
Martin Storsjö 4da4b2b87f aarch64: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 23:56:50 +02:00
Martin Storsjö 3933b86bb9 arm: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 23:56:44 +02:00
Martin Storsjö a63da4511d aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32
This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 14740 bytes to 24292 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:
vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
vp9_inv_dct_dct_16x16_sub2_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub8_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub12_add_neon:   1387.4
vp9_inv_dct_dct_16x16_sub16_add_neon:   1387.6
vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    5198.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    5198.6
vp9_inv_dct_dct_32x32_sub8_add_neon:    5196.3
vp9_inv_dct_dct_32x32_sub12_add_neon:   6183.4
vp9_inv_dct_dct_32x32_sub16_add_neon:   6174.3
vp9_inv_dct_dct_32x32_sub20_add_neon:   7151.4
vp9_inv_dct_dct_32x32_sub24_add_neon:   7145.3
vp9_inv_dct_dct_32x32_sub28_add_neon:   8119.3
vp9_inv_dct_dct_32x32_sub32_add_neon:   8118.7

After:
vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
vp9_inv_dct_dct_16x16_sub2_add_neon:     640.8
vp9_inv_dct_dct_16x16_sub4_add_neon:     639.0
vp9_inv_dct_dct_16x16_sub8_add_neon:     842.0
vp9_inv_dct_dct_16x16_sub12_add_neon:   1388.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   1389.3
vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    3685.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    3685.1
vp9_inv_dct_dct_32x32_sub8_add_neon:    3684.4
vp9_inv_dct_dct_32x32_sub12_add_neon:   5312.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   5315.4
vp9_inv_dct_dct_32x32_sub20_add_neon:   7154.9
vp9_inv_dct_dct_32x32_sub24_add_neon:   7154.5
vp9_inv_dct_dct_32x32_sub28_add_neon:   8126.6
vp9_inv_dct_dct_32x32_sub32_add_neon:   8127.2

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:32:03 +02:00
Martin Storsjö 5eb5aec475 arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible
This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 12388 bytes to 19784 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    212.0    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2102.1   1521.7   1736.2   1265.8
vp9_inv_dct_dct_16x16_sub4_add_neon:    2104.5   1533.0   1736.6   1265.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2484.8   1828.7   2014.4   1506.5
vp9_inv_dct_dct_16x16_sub12_add_neon:   2851.2   2117.8   2294.8   1753.2
vp9_inv_dct_dct_16x16_sub16_add_neon:   3239.4   2408.3   2543.5   1994.9
vp9_inv_dct_dct_32x32_sub1_add_neon:     758.3    456.7    864.5    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10776.7   7949.8   8567.7   6819.7
vp9_inv_dct_dct_32x32_sub4_add_neon:   10865.6   8131.5   8589.6   6816.3
vp9_inv_dct_dct_32x32_sub8_add_neon:   12053.9   9271.3   9387.7   7564.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  13328.3  10463.2  10217.0   8321.3
vp9_inv_dct_dct_32x32_sub16_add_neon:  14176.4  11509.5  11018.7   9062.3
vp9_inv_dct_dct_32x32_sub20_add_neon:  15301.5  12999.9  11855.1   9828.2
vp9_inv_dct_dct_32x32_sub24_add_neon:  16482.7  14931.5  12650.1  10575.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17589.5  15811.9  13482.8  11333.4
vp9_inv_dct_dct_32x32_sub32_add_neon:  18696.2  17049.2  14355.6  12089.7

After:
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    1203.5    998.2   1035.3    763.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1203.5    998.1   1035.5    760.8
vp9_inv_dct_dct_16x16_sub8_add_neon:    1926.1   1610.6   1722.1   1271.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2873.2   2129.7   2285.1   1757.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   3221.4   2520.3   2557.6   2002.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     753.0    457.5    866.6    554.6
vp9_inv_dct_dct_32x32_sub2_add_neon:    7554.6   5652.4   6048.4   4920.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    7549.9   5685.0   6046.9   4925.7
vp9_inv_dct_dct_32x32_sub8_add_neon:    8336.9   6704.5   6604.0   5478.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  10914.0   9777.2   9240.4   7416.9
vp9_inv_dct_dct_32x32_sub16_add_neon:  11859.2  11223.3   9966.3   8095.1
vp9_inv_dct_dct_32x32_sub20_add_neon:  15237.1  13029.4  11838.3   9829.4
vp9_inv_dct_dct_32x32_sub24_add_neon:  16293.2  14379.8  12644.9  10572.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17424.3  15734.7  13473.0  11326.9
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.3  17457.0  14298.6  12080.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:32:00 +02:00
Martin Storsjö 79d332ebbd aarch64: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function
This allows reusing the macro for a separate implementation of the
pass2 function.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:31:56 +02:00
Martin Storsjö 47b3c2c18d arm: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function
This allows reusing the macro for a separate implementation of the
pass2 function.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:31:53 +02:00
Martin Storsjö 115476018d aarch64: vp9itxfm: Make the larger core transforms standalone functions
This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from
19496 to 14740 bytes.

This gives a small slowdown of a couple of tens of cycles, but makes
it more feasible to add more optimized versions of these transforms.

Before:
vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   8095.7

After:
vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub16_add_neon:   1390.1
vp9_inv_dct_dct_32x32_sub4_add_neon:    5199.9
vp9_inv_dct_dct_32x32_sub32_add_neon:   8125.8

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:31:45 +02:00
Martin Storsjö 0331c3f5e8 arm: vp9itxfm: Make the larger core transforms standalone functions
This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from
15324 to 12388 bytes.

This gives a small slowdown of a couple tens of cycles, up to around
150 cycles for the full case of the largest transform, but makes
it more feasible to add more optimized versions of these transforms.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub4_add_neon:    2063.4   1516.0   1719.5   1245.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3279.3   2454.5   2525.2   1982.3
vp9_inv_dct_dct_32x32_sub4_add_neon:   10750.0   7955.4   8525.6   6754.2
vp9_inv_dct_dct_32x32_sub32_add_neon:  18574.0  17108.4  14216.7  12010.2

After:
vp9_inv_dct_dct_16x16_sub4_add_neon:    2060.8   1608.5   1735.7   1262.0
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.2   2443.5   2546.1   1999.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10682.0   8043.8   8581.3   6810.1
vp9_inv_dct_dct_32x32_sub32_add_neon:  18522.4  17277.4  14286.7  12087.9

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:31:40 +02:00
Diego Biurrun c546147db0 configure: Correctly recurse in do_check_deps()
Fixes all sorts of configuration problems introducec by dad7a9c7c0
on non-Linux or non-vanilla configs. Also removes a line made redundant
in that commit.
2017-02-08 21:23:41 +01:00
Martin Storsjö 57ec83e424 omx: Use the EOS flag to handle flushing at the end
This avoids having to count the number of frames sent to the codec
and the number of output packets received; instead just wait until
the encoder returns a buffer with the EOS flag set.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-08 11:50:57 +02:00
Diego Biurrun dad7a9c7c0 configure: Rework dependency handling for conflicting components
This makes the feature more visible and obvious.
2017-02-07 19:06:04 +01:00
Diego Biurrun 9127ac5ebc configure: Add name parameter to require_pkg_config() helper function
This allows distinguishing between the internal variable name for
external libraries and the pkg-config package name. Having both
names available avoids special-casing outside the helper function
when the two identifiers do not match.
2017-02-07 19:06:02 +01:00
Diego Biurrun a25dac976a Use bitstream_init8() where appropriate 2017-02-07 18:27:21 +01:00
Diego Biurrun 71a49fe25f configure: Use cppflags check helper functions where appropriate 2017-02-06 15:43:56 +01:00
Diego Biurrun 0ce3761c78 configure: Add stdlib.h #include to CPPFLAGS check helper functions
This ensures that added CPPFLAGS are validated against libc headers.
2017-02-06 15:43:56 +01:00
Alexandra Hájková f7ec7f546f wma: Convert to the new bitstream reader 2017-02-06 15:13:34 +01:00
Martin Storsjö 58d87e0f49 aarch64: vp9itxfm: Restructure the idct32 store macros
This avoids concatenation, which can't be used if the whole macro
is wrapped within another macro.

This is also arguably more readable.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-05 13:05:32 +02:00
Martin Storsjö 3bc5b28d5a arm: vp9itxfm: Avoid .irp when it doesn't save any lines
This makes it more readable.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-05 12:59:19 +02:00
John Stebbins 8e67039c63 asfdec: Use the ASF stream count when iterating
The AVFormat stream count can be larger due external factors, such as
an id3 tag appended.

Avoid an out of bound read.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-02-04 15:21:12 +01:00
Diego Biurrun 7abdd026df asm: Consistently uppercase SECTION markers 2017-02-03 11:37:53 +01:00
Diego Biurrun 740b0bf03b build: Ignore generated .version files 2017-02-03 11:37:53 +01:00
Martin Storsjö 15a92e0c40 rtmp: Correctly handle the Window Acknowledgement Size packets
This swaps which field is set when the Window Acknowledgement Size
and Set Peer BW packets are received, renames the fields in
order to clarify their role further and adds verbose comments
explaining their respective roles and how well the code currently
does what it is supposed to.

The Set Peer BW packet tells the receiver of the packet (which
can be either client or server) that it should not send more data
if it already has sent more data than the specified number of bytes,
without receiving acknowledgement for them. Actually checking this
limit is currently not implemented.

In order to be able to check that properly, one can send the
Window Acknowledgement Size packet, which tells the receiver of the
packet that it needs to send Acknowledgement packets
(RTMP_PT_BYTES_READ) at least after receiving a given number of bytes
since the last Acknowledgement.

Therefore, when we receive a Window Acknowledgement Size packet,
this sets the maximum number of bytes we can receive without sending
an Acknowledgement; therefore when handling this packet we should set
the receive_report_size field (previously client_report_size).

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-03 09:27:41 +02:00