Using --sub-filter-regex-plain (default:no)
The ass-to-plaintext functionality already existed at sd_ass.c, but
it's internal and uses a private buffer type, so a trivial utility
wrapper was added with standard char*/bstr interface.
The plaintext can be multi-line, and the multi-line regexp flag is now
always set, but only affects plaintext (the ASS source is one line).
Pretty much identical to filter-regex but with JS expressions and
requires only JS support. Shares the filter-regex-* control options.
The target audience is Windows users - where filter-regex doesn't
work due to missing APIs, but mujs builds cleanly on Windows, and JS
is usually enabled in 3rd party Windows mpv builds.
Lua could have been used with similar effort, however, the JS regex
syntax is more extensive and also much more similar to POSIX.
Add two stand-alone function to help with the text-extraction task
which ass filters need. Makes it easier to add new filters without
cargo-culting this functionality.
Currently, on malformed event (which shouldn't happen), a warning is
printed when a filter tries to extract the text, so if few filters
are enabled, we'll get multiple warnings (like before) - not critical.
The regex filter now uses these utils, the SDH filter not yet.
This allows users to control whether full dialogue subtitles are displayed
with an audio track already in their preferred subtitle language.
Additionally, this improves handling for the forced flag, automatically
selecting between forced and unforced subtitle streams based on the user's
settings and the selected audio.
Options like --sub-ass-force-style and others could not be changed at
runtime (the changes didn't take any effect). Fix this by using the
brutal approach, and completely reinit the subtitle state when this
happens. Maybe a bit clunky, but for now I'd rather not put more effort
into this.
Fixes: #7689
See manpage additions. This was requested, sort of. Although what has
been requested might be something completely different. So this is
speculative.
This also changes sub_get_text() to return an allocated copy, because
the buffer shit was too damn messy.
Making OSD/subtitle bitmaps refcounted was planend a longer time ago,
e.g. the sub_bitmaps.packed field (which refcounts the subtitle bitmap
data) was added in 2016. But nothing benefited much from it, because
struct sub_bitmaps was usually stack allocated, and there was this weird
callback stuff through osd_draw().
Make it possible to get actually refcounted subtitle bitmaps on the OSD
API level. For this, we just copy all subtitle data other than the
bitmaps with sub_bitmaps_copy(). At first, I had planned some fancy
refcount shit, but when that was a big mess and hard to debug and just
boiled to emulating malloc(), I made it a full allocation+copy. This
affects mostly the parts array. With crazy ASS subtitles, this parts
array can get pretty big (thousands of elements or more), in which case
the extra alloc/copy could become performance relevant. But then again
this is just pure bullshit, and I see no need to care. In practice, this
extra work most likely gets drowned out by libass murdering a single
core (while mpv is waiting for it) anyway. So fuck it.
I just wanted this so draw_bmp.c requires only a single call to render
everything. VOs also can benefit from this, because the weird callback
shit isn't necessary anymore (simpler code), but I haven't done anything
about it yet. In general I'd hope this will work towards simplifying the
OSD layer, which is prerequisite for making actual further improvements.
I haven't tested some cases such as the "overlay-add" command. Maybe it
crashes now? Who knows, who cares.
In addition, it might be worthwhile to reduce the code duplication
between all the things that output subtitle bitmaps (with repacking,
image allocation, etc.), but that's orthogonal.
Works as ad-filter. I had some more plans, for example replacing
matching text with different text, but for now it's dropping matches
only. There's a big warning in the manpage that I might change
semantics. For example, I might turn it into a primitive sed.
In a sane world, you'd probably write a simple script that processes
downloaded subtitles before giving them to mpv, and avoid all this
complexity. But we don't live in a sane world, and the sooner you learn
this, the happier you will be. (But I also want to run this on muxed
subtitles.)
This is pretty straightforward. We use POSIX regexes, which are readily
available without additional pain or dependencies. This also means it's
(apparently) not available on win32 (MinGW). The regex list is because I
hate big monolithic regexes, and this makes it slightly better.
Very superficially tested.
Until now, filter_sdh was simply a function that was called by sd_ass
directly (if enabled).
I want to add another filter, so it's time to turn this into a somewhat
more general subtitle filtering infrastructure.
I pondered whether to reuse the audio/video filtering stuff - but better
not. Also, since subtitles are horrible and tend to refuse proper
abstraction, it's still messed into sd_ass, instead of working on the
dec_sub.c level. Actually mpv used to have subtitle "filters" and even
made subtitle converters part of it, but it was fairly horrible, so
don't do that again.
In addition, make runtime changes possible. Since this was supposed to
be a quick hack, I just decided to put all subtitle filter options into
a separate option group (=> simpler change notification), to manually
push the change through the playloop (like it was sort of before for OSD
options), and to recreate the sub filter chain completely in every
change. Should be good enough.
One strangeness is that due to prefetching and such, most subtitle
packets (or those some time ahead) are actually done filtering when we
change, so the user still needs to manually seek to actually refresh
everything. And since subtitle data is usually cached in ASS_Track (for
other terrible but user-friendly reasons), we also must clear the
subtitle data, but of course only on seek, since otherwise all subtitles
would just disappear. What a fucking mess, but such is life. We could
trigger a "refresh seek" to make this more automatic, but I don't feel
like it currently.
This is slightly inefficient (lots of allocations and copying), but I
decided that it doesn't matter. Could matter slightly for crazy ASS
subtitles that render with thousands of events.
Not very well tested. Still seems to work, but I didn't have many test
cases.
This is for easier use with the "delay_open" feature added in the
previous commit. The "null" codec is reported if the codec is unknown
(because the stream was not opened yet at time the tracks were added).
The rest of the timeline mechanism will set the correct codec at
runtime. But this means every time a delay-loaded track is selected, it
wants to initialize a decoder for the "null" codec.
Accept a "null" decoder. But since FFmpeg has no such codec, and out of
my own laziness, just let it fall back to "common" codecs that need no
other initialization data.
The change, in an earlier commit, in format for ass to handle results
in a different number of fields to skip. Correct that so SDH filtering
works.
Should fix issue #7188
It would always autodetect it based on the passed style block,
but as we are defining it - we might as well define it always.
(As far as I can see all decoders in libavcodec utilize 4+ style
blocks)
Remove them from the big MPOpts struct and move them to their sub
structs. In the places where their fields are used, create a private
copy of the structs, instead of accessing the semi-deprecated global
option struct instance (mpv_global.opts) directly.
This actually makes accessing these options finally thread-safe. They
weren't even if they should have for years. (Including some potential
for undefined behavior when e.g. the OSD font was changed at runtime.)
This is mostly transparent. All options get moved around, but most users
of the options just need to access a different struct (changing sd.opts
to a different type changes a lot of uses, for example).
One thing which has to be considered and could cause potential
regressions is that the new option copies must be explicitly updated.
sub_update_opts() takes care of this for example.
Another thing is that writing to the option structs manually won't work,
because the changes won't be propagated to other copies. Apparently the
only affected case is the implementation of the sub-step command, which
tries to change sub_delay. Handle this one explicitly (osd_changed()
doesn't need to be called anymore, because changing the option triggers
UPDATE_OSD, and updates the OSD as a consequence). The way the option
value is propagated is rather hacky, but for now this will do.
It was split at least across osd.c and sd_ass.c/sd_lavc.c. sd_lavc.c
actually ignored most of the more obscure subtitle timing things.
There's no reason for this - just move it all to dec_sub.c (mostly from
sd_ass.c, because it has some of the most complex stuff).
Now timestamps are transformed as they enter or leave dec_sub.c.
There appear to have been some subtle mismatches about how subtitle
timestamps were transformed, e.g. sd_functions.accepts_packet didn't
apply the subtitle speed to the timestamp. This patch should fix them,
although it's not clear if they caused actual misbehavior.
The semantics of SD_CTRL_SUB_STEP are slightly changed, which is the
reason for the changes in command.c and sd_lavc.c.
The OpenType Font File specification recommends that "Collection fonts
that use CFF or CFF2 outlines should have an .OTC extension." mpv
should accept .otc as a fallback extension for font detection should
the mimetype detection fail.
IETF RFC8081 added the "font" top-level media type,
including font/ttf, font/otf, font/sfnt, and also
font/collection. These font formats are all supported
by mpv/libass but they are not accepted as valid
Matroska mime types. mpv can load them via file extension
and they work as expected, so files using the new types
should not trigger a warning from mpv.
List of changes:
1. Rename `signfs` to `scale`, to better match what it actually does
(force --sub-scale to apply to ASS subtitles), and fix the blatantly
wrong documentation (it actually specifically does *not* apply to
signs)
2. Rename `--sub-ass-style-override` to `--sub-ass-override` to help
reduce confusion between it and `--sub-ass-force-style`, as well as
pointing out that it doesn't necessarily actually override styles.
(The new `scale` option, for example, only sets
ASS_OVERRIDE_BIT_FONT_SIZE, but not ASS_OVERRIDE_BIT_STYLE)
3. Mention that `--sub-ass-override` is generally sort of smart about
only overriding dialog, not signs.
All contributors of the code used for sd_ass.c agreed to the LGPL
relicensing. Some code has a very chaotic history, due to MPlayer
subtitle handling being awful, chaotic, and having been refactored a
dozen of times. Most of the subtitle code was actually rewritten from
scratch (a few times), and the initial sd_ass.c was pretty tiny. So we
should be fine, but it's still a good idea to look at this closely.
Potentially problematic cases of old code leaking into sd_ass.c are
mentioned below.
Some code originates from demux_mkv. Most of this was added by eugeni,
and later moved into mplayer.c or mpcommon.c. The old demux_mkv ASS/SSA
subtitle code is somewhat dangerous from a legal perspective, because it
involves 2 patches by a certain Tristan/z80, who disagreed with LGPL,
and who told us to "rewrite" parts we need. These patches were for
converting the ASS packet data to the old MPlayer text subtitle data
structures. None of that survived in the current code base.
Moving the subtitle handling out of demux_mkv happened in the following
commits: bdb6a07d2a, de73d4dd97, 61e4a80191. The code by
z80 was removed in b44202b69f.
At this time, the z80 code was located in mplayer.c and subreader.c.
The code was fully removed, being unnecessary due to the entire old
subtitle rendering code being removed. This adds a ass_to_plaintext(),
function, which replaces the old ASS tag stripping code in
sub_add_text(), which was based on the z80 code. The new function was
intended to strip ASS tags in a correct way, instead of somehow
dealing with other subtitle types (like HTML-style SRT tags), so it
was written from scratch.
Another potential issue is the --sub-fix-timing option is based on
-overlapsub added in d459e64463. But the implementation is new, and
no code from that commit is used in sd_ass.c. The new implementation
started out in 64b1374a44. (The following commit, bd45eb468c
removes the original code that was replaced.) The code was later
moved into sd_ass.c.
The --sub-fps option has a similar history.
Add subtitle filter to remove additions for deaf or hard-of-hearing
(SDH). This is for English, but may in part work for others too.
This is an ASS filter and the intention is that it can always be
enabled as it by default do not remove parts that may be normal text.
Harder filtering can be enabled with an additional option.
Signed-off-by: wm4 <wm4@nowhere>
This means the subtitles will show as "intended".
For some weird reason, --sub-ass-style-override is the option that
controls style override, which implies it's specific to ASS. While that
seems weird and doesn't always reflect reality, I don't care about that
now.
To make it easier for the eyes, multi line subtitles should
be left justified (for most languages).
This adds an option to define how subtitles are to be justified
inpendently of how they are aligned.
Also add option to enable --sub-justify to be applied on ASS subtitles.
Rename the text subtitle options from --sub-text- to --sub-
and --ass- options to --sub-ass-.
The intention is to common sub options to prefixed --sub-
and special ASS option be seen as a special version of sub options.
The OSD options that work like the --sub- options are still named
--osd-.
Man page updated including a short note about renamed --sub-text-*
and --ass-* options to --sub-* and --sub-ass-*.
Secondary subtitle streams (to be shown on the top of the screen along
main subtitle stream) were shown with normal alignment. This is because
we tell libass to override the alignment style (a relatively recent
change, see commit 2f1eb49e). This would behave differently with old
libass versions too.
To escape the mess, just set the alignment explicitly with an override
tag instead of modifying the style.
This is a bug fix, and the text alignment functionality probably got
lost sometime along the way.
For ASS subtitles, this could have unintended consequences, so it's hard
to get right - thus it's not applied to ASS subtitles.
For other text subtitles, this should be fine, though. It still works on
ASS subtitles as promised by the manpage if --no-sub-ass is used.
This has two reasons:
1. I tend to add new fields to this metadata, and every time I've done
so I've consistently forgotten to update all of the dozens of places in
which this colorimetry metadata might end up getting used. While most
usages don't really care about most of the metadata, sometimes the
intend was simply to “copy” the colorimetry metadata from one struct to
another. With this being inside a substruct, those lines of code can now
simply read a.color = b.color without having to care about added or
removed fields.
2. It makes the type definitions nicer for upcoming refactors.
In going through all of the usages, I also expanded a few where I felt
that omitting the “young” fields was a bug.
The intention is to let mp_ass_packer_pack() produce different output
for the RGBA and LIBASS formats. VOs (or whatever generates the OSD)
currently do not signal a preferred format, and this mechanism just
exists to switch between RGBA and LIBASS formats correctly, preferring
LIBASS if the VO supports it.
Change all producer of libass images to packing the bitmaps into a
single larger bitmap directly when they're output. This is supposed to
help working towards refcounted sub bitmaps.
This will reduce performance for VOs like vo_xv, but not for vo_opengl.
vo_opengl simply will pick up the pre-packed sub bitmaps, and skip
packing them again. vo_xv will copy and pack the sub bitmaps
unnecessarily - but if we want sub bitmap refcounting, they'd have to be
copied anyway.
The packing code cannot be removed yet from vo_opengl, because there are
certain corner cases that still produce unpackad other sub bitmaps.
Actual refcounting will also require more work.
libass's ass_process_chunk expects long long int for the timecode and
durations arguments, thus should use llrint instead of lrint.
This does not cause any problems on most platforms, but on cygwin, it
causes strange subtitle behaviour, like subtitles not showing, getting
stuck or old subtitles showing at the same time as new subtitles.
--sub-ass=no / --ass=no still work, but --ass-style-override=strip is
preferred now. With this change, --ass-style-override can control all
the types of style overriding.
Instead of passing an explicit cache to the function, the res parameter
is used. Also, instead of replacing its contents, sub bitmaps are now
appended to it (all assuming the format doesn't actually change).
This is preparation for the following commits.
Subtitles can be preloaded, which means they're fully read and copied
into ASS_Track. This in turn is mainly for the sake of being able to do
subtitle seeking (when it comes down to it, subtitle seeking is the
cause for most trouble here).
Commit a714f8e92 broke preloaded subtitles which have events with
unknown duration, such as some MicroDVD samples. The event list gets
cleared on every seek, so the property of being preloaded obviously gets
lost.
Fix this by moving most of the preloading logic to dec_sub.c. If the
subtitle list gets cleared, they are not considered preloaded anymore,
and the logic for demuxed subtitles is used.
As another minor thing, preloadeding subtitles did neither disable the
demux stream, nor did it discard packets. Thus you could get queue
overflows in theory (harmless, but annoying). Fix this by explicitly
discarding packets in preloaded mode.
In summary, now the only difference between preloaded and normal
demuxing are:
1. a seek is issued, and all packets are read on start
2. during playback, discard the packets instead of feeding them to the
subtitle decoder
This is still petty annoying. It would be nice if maintaining the
subtitle index (and maybe a subtitle packet cache for instant subtitle
presentation when seeking back) could be maintained in the demuxer
instead. Half of all file formats with interleaved subtitles have
this anyway (mp4, mkv muxed with newer mkvmerge).
Although there is logic to prune subtitles as soon as they get too old
in this mode, this is not done for the _currently_ shown subtitles. Thus
explicitly clearing subtitles on seek is required to avoid duplicate
subtitles in certain cases when seeking.
Deals with broken mkv subtitle tracks generated by tvheadend. The subs
are srt, but without packet durations.
We need this logic for CCs anyway. CCs in particular will be unaffected
by this change because they are also marked with unknown duration. It
could be that there are actual demuxers outputting CCs - in this case,
we rely on the fact that they don't set a (meaningless) packet duration
(or we'd have to work that around).
Commit 8d4a179c made subtitle decoders pick up fonts strictly from the
same source file (i.e. the same demuxer).
It breaks some fucked up use-case, and 2 people on this earth complained
about the change because of this. Add it back.
This copies all attached fonts on each subtitle init. I considered
converting attachments to use refcounting, but it'd probably be much
more complex.
Since it's slightly harder to get a list of active demuxers with
duplicate removed, the prev_demuxer variable serves as a hack to achieve
almost the same thing, except in weird corner cases. (In which fonts
could be added twice.)
This is mainly a refactor. I'm hoping it will make some things easier
in the future due to cleanly separating codec metadata and stream
metadata.
Also, declare that the "codec" field can not be NULL anymore. demux.c
will set it to "" if it's NULL when added. This gets rid of a corner
case everything had to handle, but which rarely happened.