This has been a long standing annoyance - ffmpeg is removing
sizeof(AVPacket) from the API which means you cannot stack-allocate
AVPacket anymore. However, that is something we take advantage of
because we use short-lived AVPackets to bridge from native mpv packets
in our main decoding paths.
We don't think that switching these to `av_packet_alloc` is desirable,
given the cost of heap allocation, so this change takes a different
approach - allocating a single packet in the relevant context and
reusing it over and over.
That's fairly straight-forward, with the main caveat being that
re-initialising the packet is unintuitive. There is no function that
does exactly what we need (what `av_init_packet` did). The closest is
`av_packet_unref`, which additionally frees buffers and side-data.
However, we don't copy those things - we just assign them in from our
own packet, so we have to explicitly clear the pointers before calling
`av_packet_unref`. But at least we can make a wrapper function for
that.
The weirdest part of the change is the handling of the vtt subtitle
conversion. This requires two packets, so I had to pre-allocate two in
the context struct. That sounds excessive, but if allocating the
primary packet is too expensive, then allocating the secondary one for
vtt subtitles must also be too expensive.
This change is not conditional as heap allocated AVPackets were
available for years and years before the deprecation.
This allows users to control whether full dialogue subtitles are displayed
with an audio track already in their preferred subtitle language.
Additionally, this improves handling for the forced flag, automatically
selecting between forced and unforced subtitle streams based on the user's
settings and the selected audio.
Seems like this is requested all the time.
It seems libass allows out of range values, but does allows the subtitle
to go out of the screen at the bottom (only when moving it to the top
it's "clamped"). Too bad, don't do that then. The bitmap sub rendering
code on the other hand is under our control, and will not move a
subtitle out of the screen.
Fixes: #7986
The OSD is passed to VOs via struct sub_bitmaps, which has a change_id
field. This field is incremented whenever there is a (potential) change
to the other struct contents. If not, the VO can rely on it not having
changed. This must include for example sub_bitmap.x and sub_bitmap.dw.
If these two fields (and y equivalents) change, change_id must change,
even if the subtitle bitmap data might still be the same.
sd_lavc.c stopped respecting this at some unknown point. It could
sometimes cause problems, though usually only with bad and old VOs which
somehow relied on this more than vo_gpu. (I've actually encountered this
before with sd_lavc subtitle scaling, as indicated by a nasty comment,
though probably didn't track this down, since said old VOs can die in a
fire.)
Fix this by maintaining the change_id explicitly. Unfortunately adds
even more code. Instead of comparing the result we could track property
changes, but I think this is better. The number of parts is always very
low with this subtitle decoder, so there's no actual performance issue
to worry about.
This could be triggered by scaling changes (video-zoom etc.), but
probably also changing bitmap subtitle position or scaling.
Making OSD/subtitle bitmaps refcounted was planend a longer time ago,
e.g. the sub_bitmaps.packed field (which refcounts the subtitle bitmap
data) was added in 2016. But nothing benefited much from it, because
struct sub_bitmaps was usually stack allocated, and there was this weird
callback stuff through osd_draw().
Make it possible to get actually refcounted subtitle bitmaps on the OSD
API level. For this, we just copy all subtitle data other than the
bitmaps with sub_bitmaps_copy(). At first, I had planned some fancy
refcount shit, but when that was a big mess and hard to debug and just
boiled to emulating malloc(), I made it a full allocation+copy. This
affects mostly the parts array. With crazy ASS subtitles, this parts
array can get pretty big (thousands of elements or more), in which case
the extra alloc/copy could become performance relevant. But then again
this is just pure bullshit, and I see no need to care. In practice, this
extra work most likely gets drowned out by libass murdering a single
core (while mpv is waiting for it) anyway. So fuck it.
I just wanted this so draw_bmp.c requires only a single call to render
everything. VOs also can benefit from this, because the weird callback
shit isn't necessary anymore (simpler code), but I haven't done anything
about it yet. In general I'd hope this will work towards simplifying the
OSD layer, which is prerequisite for making actual further improvements.
I haven't tested some cases such as the "overlay-add" command. Maybe it
crashes now? Who knows, who cares.
In addition, it might be worthwhile to reduce the code duplication
between all the things that output subtitle bitmaps (with repacking,
image allocation, etc.), but that's orthogonal.
A mkv sample file was provided to me, which contained a moving PGS
subtitle track, with the same track rendered into the video as
reference. The subtitle track appeared to stutter (while the video one
was smooth). It turns out this was a timestamp rounding issue in mpv.
The subtitle timestamps in the file match the video ones exactly.
They're the same within the mpv demuxer too. Unfortunately, the
conversion from and to libavcodec timestamps is lossy, because mpv uses
a non-integer timebase, while libavcodec supports integers only. See
mp_pts_to_av() and mp_pts_from_av(). The recovered timestamp is almost
the same, but is off by a very minor part. As a result, the timestamps
won't compare equal, and if that happens, display of the subtitle frame
is skipped. Subtitle timestamps don't go through this conversion
because... libavcodec is special? The libavcodec subtitle API is
special.
Fix this by giving it a microsecond of slack. This is basically as if we
used an internal microseconds integer timebase, but only for the purpose
of image subtitle display.
The same could happen to sd_ass, except in practice it doesn't. ASS
subtitles (well, .ass files) inherently use a timebase incompatible to
video, so to ensure frame exactness, ASS timestamps are usually set to
slightly before the video frame's.
Discussion of better solutions:
One could rewrite mpv not to use float timestamps. You'd probably pick
some integer timebase instead (like microseconds), which would avoid the
libavcodec interop issue. At the very least this would be a lot of work.
It would be interesting to know whether the rounding in ther mpv<->lavc
timestamp conversion could be fixed to round-trip in this case. The
conversion tries to avoid problems by using the source timebase (e.g.
milliseconds with mkv). But in general some rounding is unavoidable,
because something between decoder and lowest demuxer layer could
transform the timestamps.
One could extend libavcodec to attach arbitrary information to avpacket
and return it in the resulting avframe. To some degree, such a mechanism
already exists (side data). But there are certain problems that make
this unfeasible and broken.
One could pass through exact mpv float timestamps by reinterpret-casting
them to int64_t, the FFmpeg timestamp type. Actually mpv used to do
this. But there were problems, such as FFmpeg (or things used by FFmpeg)
wanting to interpret the timestamps. Awful shit that make mpv change to
the current approach.
There's probably more but I'm getting bored. With some luck I wasted
precious seconds of your life with my nonsense.
Libav seems rather dead: no release for 2 years, no new git commits in
master for almost a year (with one exception ~6 months ago). From what I
can tell, some developers resigned themselves to the horrifying idea to
post patches to ffmpeg-devel instead, while the rest of the developers
went on to greener pastures.
Libav was a better project than FFmpeg. Unfortunately, FFmpeg won,
because it managed to keep the name and website. Libav was pushed more
and more into obscurity: while there was initially a big push for Libav,
FFmpeg just remained "in place" and visible for most people. FFmpeg was
slowly draining all manpower and energy from Libav. A big part of this
was that FFmpeg stole code from Libav (regular merges of the entire
Libav git tree), making it some sort of Frankenstein mirror of Libav,
think decaying zombie with additional legs ("features") nailed to it.
"Stealing" surely is the wrong word; I'm just aping the language that
some of the FFmpeg members used to use. All that is in the past now, I'm
probably the only person left who is annoyed by this, and with this
commit I'm putting this decade long problem finally to an end. I just
thought I'd express my annoyance about this fucking shitshow one last
time.
The most intrusive change in this commit is the resample filter, which
originally used libavresample. Since the FFmpeg developer refused to
enable libavresample by default for drama reasons, and the API was
slightly different, so the filter used some big preprocessor mess to
make it compatible to libswresample. All that falls away now. The
simplification to the build system is also significant.
Do what we do best in multimedia: add conflicting hacks on top of other
hacks, that fix a single sample, and may break other ones.
In this case, it only happens if the file is most likely already broken
(subtitle bounding boxes go outside of the subtitle "canvas"), so it's
OK. The file still looks broken (and, in fact, the file is completely
fucking broken), but you can see the subtitles.
But in summary, this is not actually something I should have bothered
about.
I noticed that MPlayer shows the subtitles "correctly", but this is only
because they have a hack that extends subtitles with small resolution to
a larger hardcoded resolution. This hack was removed from mpv, because
it broke some completely legitimate files. As another really funny fact,
MPlayer's default video output (vdpau) appears to display this file
correctly, but only because it handles narrow aspect ratios (that extend
the height instead of the width) incorrectly. It extends the height, but
leaves the video with 1:1 aspect ratio at the top. It seems to repeat
the last video line. (-vo xv and -vo gl show it correctly, i.e.
stretched like mpv, by the way.) For some reason, the sample file at
hand is extended with black, so the subtitles are rendered into a black
area below the video, which is almost reasonable. So, MPlayer may
display this file "correctly", but in fact it only happens to do so
because of 1 hack that breaks legitimate files, and 1 bug. What the
fuck.
Fixes: #7218 (sort of)
Simple enough to do. May have mixed results. Typically, bitmap subtitles
will have a tight bounding box around the rendered text. But if for
example there is text on the top and bottom, it may be a single big
bitmap with a large transparent area between top and bottom. In
particular, DVD subtitles are really just a single screen-sized
RLE-encoded bitmap, though libavcodec will crop off transparent areas.
Like with sd_ass, you can't move subtitles _down_ if they are already in
their origin position. This could probably be improved, but I don't want
to deal with that right now.
UB-sanitizer complains that we shift bits into the sign (when a is
used). Change it to unsigned, which in theory is more correct and
silences the warning.
Doesn't matter in practice, both the "bug" and the fix have 0 impact.
Remove them from the big MPOpts struct and move them to their sub
structs. In the places where their fields are used, create a private
copy of the structs, instead of accessing the semi-deprecated global
option struct instance (mpv_global.opts) directly.
This actually makes accessing these options finally thread-safe. They
weren't even if they should have for years. (Including some potential
for undefined behavior when e.g. the OSD font was changed at runtime.)
This is mostly transparent. All options get moved around, but most users
of the options just need to access a different struct (changing sd.opts
to a different type changes a lot of uses, for example).
One thing which has to be considered and could cause potential
regressions is that the new option copies must be explicitly updated.
sub_update_opts() takes care of this for example.
Another thing is that writing to the option structs manually won't work,
because the changes won't be propagated to other copies. Apparently the
only affected case is the implementation of the sub-step command, which
tries to change sub_delay. Handle this one explicitly (osd_changed()
doesn't need to be called anymore, because changing the option triggers
UPDATE_OSD, and updates the OSD as a consequence). The way the option
value is propagated is rather hacky, but for now this will do.
It was split at least across osd.c and sd_ass.c/sd_lavc.c. sd_lavc.c
actually ignored most of the more obscure subtitle timing things.
There's no reason for this - just move it all to dec_sub.c (mostly from
sd_ass.c, because it has some of the most complex stuff).
Now timestamps are transformed as they enter or leave dec_sub.c.
There appear to have been some subtle mismatches about how subtitle
timestamps were transformed, e.g. sd_functions.accepts_packet didn't
apply the subtitle speed to the timestamp. This patch should fix them,
although it's not clear if they caused actual misbehavior.
The semantics of SD_CTRL_SUB_STEP are slightly changed, which is the
reason for the changes in command.c and sd_lavc.c.
This API isn't deprecated (yet?), but it's still inferior and harder to
use than avcodec_free_context().
Leave the call only in 1 case in af_lavcac3enc.c, where we apparently
seriously close and reopen the encoder for whatever reason.
All contributors have agreed.
Compared to sd_ass.c, this has a pretty simple history:
av_sub.c -> sub/av_sub.c -> sub/sd_lavc.c
At one point, some code from spudec.c was added to it, but it was
removed again later.
This core of this heuristic was once copied from MPlayer's spudec.c. I
think it was meant for the case when the resolution field was missing or
so.
I couldn't find a file for which this actually does something. On the
other hand, there are samples which actually have a smaller resolution
than 720x576, and which are broken by this old hack.
For subtitles that set no resolution (I'm not sure which codec/container
that would be), there's still the fallback on video resolution.
Just get rid of this hack. Also cleanup a bit. SD_CTRL_GET_RESOLUTION
hasn't been used since DVD menu removal. get_resolution() is left with 1
call site, and would be quite awkward to keep, so un-inline it.
Whitelisting supported codecs is (probably) still better than just
allowing everything, given the weird FFmpeg API. I'm also assuming
Libav doesn't even have the codec ID, but I didn't check.
Also add a --teletext-page option, since otherwise it decodes every
teletext page and shows them in succession.
And yes, we can't use av_opt_set_int() - instead we have to set it as
string. Because FFmpeg's option system is terrible.
Like it's done for audio and video. Just to be uniform.
I'm sorry for deleting the anti-ffmpeg vitriol. It's still all true, but
since we decided to always set the timebase, the crappiness is isolated
to FFmpeg internals.
The accepts_packet packet callback is supposed to deal with subtitle
decoders which have only a small queue of current subtitle events (i.e.
sd_lavc.c), in case feeding it too many packets would discard events
that are still needed.
Normally, the number of subtitles that need to be preserved is estimated
by the rendering pts (get_bitmaps() argument). Rendering lags behind
decoding, so normally the rendering pts is smaller than the next video
frame pts, and we simply discard all subtitle events until the rendering
pts.
This breaks down in some annoying corner cases. One of them is seeking
backwards: the VO will still try to render the old PTS during seeks,
which passes a high PTS to the subtitle renderer, which in turn would
discard more subtitles than it should. There is a similar issue with
forward seeks. Add hacks to deal with those issues.
There should be a better way to deal with the essentially unknown
"rendering position", which is made worse by screenshots or rendering
with vf_sub. At the very least, we could handle seeks better, and e.g.
either force the VO not to re-render subs after seeks (ugly), or
introduce seek sequence numbers to distinguish attempts to render
earlier subtitles when a seek is done.
The intention is to let mp_ass_packer_pack() produce different output
for the RGBA and LIBASS formats. VOs (or whatever generates the OSD)
currently do not signal a preferred format, and this mechanism just
exists to switch between RGBA and LIBASS formats correctly, preferring
LIBASS if the VO supports it.
Since there are not many sub-rectangles, this doesn't cost too much. On
the other hand, it avoids frequent warnings with vo_xv.
Also, the second copy in mp_blur_rgba_sub_bitmap() can be dropped.
The previous few commits changed sd_lavc.c's output to packed RGB sub-
images. In particular, this means all sub-bitmaps are part of a larger,
single bitmap. Change the vo_opengl OSD code such that it can make use
of this, and upload the pre-packed image, instead of packing and copying
them again.
This complicates the upload code a bit (4 code paths due to messy PBO
handling). The plan is to make sub-bitmaps always packed, but some more
work is required to reach this point. The plan is to pack libass images
as well. Since this implies a copy, this will make it easy to refcount
the result.
(This is all targeted towards vo_opengl. Other VOs, vo_xv, vo_x11, and
vo_wayland in particular, will become less efficient. Although at least
vo_vdpau and vo_direct3d could be switched to the new method as well.)
The sub-bitmaps get extended by --sub-gauss, so we have to compute the
bounding box on the original subs. Not sure if this is really
eqauivalent to what the code did before, and I don't have the sample
anymore. (But this approach sure is a _shitty_ hack.)
Implement it directly in sd_lavc.c as well. Blurring requires extending
the size of the sub-images by the blur radius. Since we now want
sub_bitmaps to be packed into a single image, and we don't want to
repack for blurring, we add some extra padding to each sub-bitmap in the
initial packing, and then extend their size later. This relies on the
previous bitmap_packer commit, which always adds the padding in all
cases.
Since blurring is now done on parts of a large bitmap, the data pointers
can become unaligned, depending on their position. To avoid shitty
libswscale printing a dumb warning, allocate an extra image, so that the
blurring pass is done on two newly allocated images. (I don't find this
feature important enough to waste more time on it.)
The previous refactor accidentally broke this feature due to a logic bug
in osd.c. It didn't matter before it happened to break, and doesn't
matter now since the code paths are different.
Until now, subtitle renderers could export SUBBITMAP_INDEXED, which is a
8 bit per pixel with palette format. sd_lavc.c was the only renderer
doing this, and the result was converted to RGBA in every use-case
(except maybe when the subtitles were hidden.)
Change it so that sd_lavc.c converts to RGBA on its own. This simplifies
everything a bit, and the palette handling can be removed from the
common code.
This is also preparation for making subtitle images refcounted. The
"caching" in img_convert.c is a PITA in this respect, and needs to be
redone. So getting rid of some img_convert.c code is a positive side-
effect. Also related to refcounted subtitles is packing them into a
single mp_image. Fewer objects to refcount is easier, and for the libass
format the same will be done. The plan is to remove manual packing from
the VOs which need single images entirely.
Older ffmpeg releases don't have ffmpeg git commit
50401f5fb7d778583b03a13bc4440f71063d319d, which fixes ffmpeg's
pkt_timebase check to reject its default "unset" timebase as invalid.
The consequence was that all non-PGS bitmap subtitle timestamps were
forced to 0.
Of course this hit _only_ shitty distros using outdated/badly maintained
ffmpeg releases, so this is not worth working around. I've already
wasted a lot of time on analyzing this dumb issue, and it could be
useful for bisecting, so don't drop pre-3.0 ffmpeg just yet.
Fixes#3109.
There is an obscure feature which requires essentially reordering PTS
from different packets.
Unfortunately, libavcodec introduced a ridiculously shitty API for
this, which works very much unlike the audio/video API. Instead of
simply passing through the PTS, it wants to fuck with it for no reason,
and even worse, fucks with other fields and changes their semantivcs
(??????). This affects AVSubtitle.end_display_time. This probably will
cause issues for us, and I have no desire to find out whether it will.
Since only PGS requires this, and it happens not to use
end_display_time, do it for PGS only.
Fixes#3016.
Basically, this information is useless, because some muxers (hurr
libavformat) write bogus information anyway. This means if we e.g. see
PGS packets in mkv with duration explicitly set to 0, we must not trust
that value anyway. (The FFmpeg API problem is leaking into files, how
nice.)
This is mainly a refactor. I'm hoping it will make some things easier
in the future due to cleanly separating codec metadata and stream
metadata.
Also, declare that the "codec" field can not be NULL anymore. demux.c
will set it to "" if it's NULL when added. This gets rid of a corner
case everything had to handle, but which rarely happened.
Just simplify by removing parts not needed anymore. This includes
merging dec_sub allocation and initialization (since things making
initialization complicated were removed), or format support queries (it
simply tries to create a decoder, and if that fails, tries the next
one).
MPlayer traditionally always used the display aspect ratio, e.g. 16:9,
while FFmpeg uses the sample (aka pixel) aspect ratio.
Both have a bunch of advantages and disadvantages. Actually, it seems
using sample aspect ratio is generally nicer. The main reason for the
change is making mpv closer to how FFmpeg works in order to make life
easier. It's also nice that everything uses integer fractions instead
of floats now (except --video-aspect option/property).
Note that there is at least 1 user-visible change: vf_dsize now does
not set the display size, only the display aspect ratio. This is
because the image_params d_w/d_h fields did not just set the display
aspect, but also the size (except in encoding mode).
It was stupid. The only thing that still effectively used it was
sd_lavc_conv - all other "filters" were the subtitle decoder/renderers
for text (sd_ass) and bitmap (sd_lavc) subtitles.
While having a subtitle filter chain was interesting (and actually
worked in almost the same way as the audio/video ones), I didn't
manage to use it in a meaningful way, and I couldn't e.g. factor
secondary features like fixing subtitle timing into filters.
Refactor the shit and drop unneeded things as it goes.