Commit Graph

18 Commits

Author SHA1 Message Date
Dudemanguy 9bf4b9d5d4 filter_sdh: optimize get_char_bytes
strlen is only relevant if the length is less than [1, 4], so this can
be replaced with strnlen instead which will only traverse characters
upto the maxlen insted of the entire string length. It also makes MPMIN
unneeded. Also fix a comment.
2024-01-15 16:05:17 +00:00
Dudemanguy 2dd3951a9c filter_sdh: fix incorrect placement of null terminator
The +1 here is not correct. For a 4-byte unicode character, this would
throw a runtime error because the +1 would try to assign the null
terminator past the actual bound of our array. Just remove it since it
should be exactly equal to whatever we have for bytes.
2024-01-12 20:46:00 -06:00
Dudemanguy e15b2b19a3 filter_sdh: sanitize get_char_bytes heuristic to avoid overflow
There's a simple check in filter_sdh that gets the bytes of the first
character in a string in order to do pointer arthimetic to filter the
string. The problem is that it is possible for the amount of bytes to be
greater than the actual length of the string for certain unicode
characters. This can't be worked with so enforce the strlen as the
absolute minimum here to avoid overflow situations.

Fixes #13237.
2024-01-12 20:45:40 -06:00
Dudemanguy 443c2487d7 filter_sdh: add full width parentheses to the enclosures string
Since these are technically parentheses, we'll treat them the same way
as normal parenthesis. Fixes #11155.
2023-12-08 18:14:06 +00:00
Dudemanguy ce958b7742 filter_sdh: add --sub-filter-sdh-enclosures option
This filter is a bit complicated, but one of the essential parts of it
is removing text enclosed by particular set of characters (e.g. text
inbetween []). This was previously hardcoded to only take into account
parenthesis and brackets, but people may want to filter more things so
make this customizable. The option only takes "left hand characters" so
the right pair is mapped internally if applicable. If not, then we just
use the same character. Fixes #8268 since the unicode character in
question can just be passed to this option.
2023-12-08 18:14:06 +00:00
Dudemanguy b7d85f0d4a filter_sdh: combine skip_bracketed and skip_parenthesized
These two functions are almost exactly the same. The parenthesis variant
is essentially just a special case with more conditions to not remove
text. These can easily be combined together into one generic
skip_enclosed function to handle both cases. We also use char * instead
of char for the character comparison here since not everything is
neccesarily 1 byte and can fit into a char. This will be useful for the
following commits where we extend this logic further.
2023-12-08 18:14:06 +00:00
Alexander Seiler bdf7b5c3b8 various: fix various typos in the code base
Signed-off-by: Alexander Seiler <seileralex@gmail.com>
2023-03-28 19:29:44 +00:00
Guido Cella fe9e074752 various: remove trailing whitespace 2022-05-14 14:51:34 +00:00
Cœur bb5b4b1ba6 various: fix typos 2022-04-25 09:07:18 -04:00
Avi Halachmi (:avih) d82a073069 sub: SDH filter: use unified text-extraction utils 2021-08-05 21:32:22 +03:00
Avi Halachmi (:avih) 62704e2315 sub: SDH filter: small refinements
1. On a pathological case where event_format is NULL, previously the
   filter was trying to use it with each new sub - and re-failed. Now
   the filter gets disabled on init (event_format doesn't change).

2. Previously, if the filter didn't modify the text or if the text
   could not be extracted - it still allocated a new packet with same
   content. Now it returns the original, saving a whole lot of no-ops
   (there are still few allocations in this case though).

1 above is preparation for the next commit, and 2 was trivial, but
there's more to do if anyone cares (NIH string functions instead of
bstr, unused arguments, messages could be improved, and more).
2021-08-05 21:32:22 +03:00
wm4 0b35b4c917 sub: make filter_sdh a "proper" filter, allow runtime changes
Until now, filter_sdh was simply a function that was called by sd_ass
directly (if enabled).

I want to add another filter, so it's time to turn this into a somewhat
more general subtitle filtering infrastructure.

I pondered whether to reuse the audio/video filtering stuff - but better
not. Also, since subtitles are horrible and tend to refuse proper
abstraction, it's still messed into sd_ass, instead of working on the
dec_sub.c level. Actually mpv used to have subtitle "filters" and even
made subtitle converters part of it, but it was fairly horrible, so
don't do that again.

In addition, make runtime changes possible. Since this was supposed to
be a quick hack, I just decided to put all subtitle filter options into
a separate option group (=> simpler change notification), to manually
push the change through the playloop (like it was sort of before for OSD
options), and to recreate the sub filter chain completely in every
change. Should be good enough.

One strangeness is that due to prefetching and such, most subtitle
packets (or those some time ahead) are actually done filtering when we
change, so the user still needs to manually seek to actually refresh
everything. And since subtitle data is usually cached in ASS_Track (for
other terrible but user-friendly reasons), we also must clear the
subtitle data, but of course only on seek, since otherwise all subtitles
would just disappear. What a fucking mess, but such is life. We could
trigger a "refresh seek" to make this more automatic, but I don't feel
like it currently.

This is slightly inefficient (lots of allocations and copying), but I
decided that it doesn't matter. Could matter slightly for crazy ASS
subtitles that render with thousands of events.

Not very well tested. Still seems to work, but I didn't have many test
cases.
2020-02-16 02:07:24 +01:00
Dan Oscarsson eb1d50ba20 sub: enhance SDH filtering
It is not uncommon with a speaker label with [xxxx] inside.
They should also be filtered out.
2020-02-09 16:18:34 +01:00
zc62 b9cde8d95c sub: recognize UTF-8 characters in SDH subtitle filter
Only printable ASCII characters were considered to be valid texts. Make
it possible that UTF-8 contents are also considered valid.

This does not make the SDH subtitle filter support non-English
languages. This just prevents the filter from blindly marking lines that
have only UTF-8 characters as empty.

Fixes #6502
2019-03-02 02:05:58 +01:00
wm4 f4cb4fb962 filter_sdh: remove pointless set_pos function
This change was requested during patch review, but apparently it was
overlooked on merge.
2017-04-20 08:22:46 +02:00
wm4 19db7a79d4 filter_sdh: change license to LGPL 2017-04-20 08:19:22 +02:00
Dan Oscarsson 1a0c340a5e sub: minor sdh filter fixes
When doing harder filtering not require a space after : results
in lines with a clock (like 10:05) to be taken as a speaker label.
So require a space after : even when doing harder filtering as
missing space is very uncommon.
Some like to add text in parentheses in the speaker label,
like XXX (loud):  or just (loud):
allow parentheses when doing harder filtering
2017-04-15 16:38:38 +02:00
Dan Oscarsson 5b75142a1d sub: add SDH subtitle filter
Add subtitle filter to remove additions for deaf or hard-of-hearing
(SDH). This is for English, but may in part work for others too.
This is an ASS filter and the intention is that it can always be
enabled as it by default do not remove parts that may be normal text.
Harder filtering can be enabled with an additional option.

Signed-off-by: wm4 <wm4@nowhere>
2017-03-25 15:04:05 +01:00