ytdl_hook: fix url_is_safe to match URL protocols properly

Some youtube_dl extractors, for example Funimation, return URLs which
contain other URLs inside of them, like this:
https://example.com/video?parameter=https://example.net/something

The url_is_safe function uses a pattern to match the protocol at the
start of the URL. Before this commit, this pattern was not compliant
with the URL spec (see the definition of "A URL-scheme string"):
https://url.spec.whatwg.org/#url-writing
Therefore it would match any characters, including "://", up to the
last occurrence of "://" in the string. For the above URL the captured
protocol would be
https://example.com/video?parameter=https
which is not in safe_protos, so the video would not play.
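
The greedy match can be reproduced in a standalone Lua interpreter.
This is only an illustrative sketch: the URL is the example above, and
old_pattern is a local name chosen here for the pre-commit pattern.

```lua
-- Pattern used by url_is_safe before this commit.
local old_pattern = "^(.+)://"
local url = "https://example.com/video?parameter=https://example.net/something"

-- ".+" is greedy, so the capture runs up to the last "://" in the string.
local proto = url:match(old_pattern)
print(proto)  --> https://example.com/video?parameter=https
```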

Now the protocol must start with a letter and can only contain
alphanumerics, ".", "+" or "-", as the spec says, so the pattern only
matches the first protocol in the URL ("https" in the above example).

Previously the URL also had to contain "//" after the ":". Data URLs
do not contain "//" (see https://datatracker.ietf.org/doc/html/rfc2397),
so now the pattern does not look for "//", only ":".
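
The behaviour of the new pattern can be checked the same way. This is
a minimal sketch, assuming a standalone Lua interpreter; scheme_of is a
hypothetical helper written for this example and is not part of
ytdl_hook, and the data URL is made up for illustration.

```lua
-- Pattern used by url_is_safe after this commit.
local new_pattern = "^(%a[%w+.-]*):"

-- Hypothetical helper for this sketch; mirrors the expression in url_is_safe.
local function scheme_of(url)
    return type(url) == "string" and url:match(new_pattern) or nil
end

-- Only the first protocol is captured, even with a nested URL.
print(scheme_of("https://example.com/video?parameter=https://example.net/something"))  --> https
-- Data URLs have no "//" after the ":" but still match.
print(scheme_of("data:text/plain,hello"))  --> data
-- A protocol must start with a letter, so this does not match.
print(scheme_of("1abc://example.com"))     --> nil
```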
George Brooke 2022-03-02 17:03:51 +00:00 committed by avih
parent b1fb4b783b
commit 1a3e85ec33
1 changed file with 1 addition and 1 deletion

@@ -153,7 +153,7 @@ local function edl_escape(url)
 end

 local function url_is_safe(url)
-    local proto = type(url) == "string" and url:match("^(.+)://") or nil
+    local proto = type(url) == "string" and url:match("^(%a[%w+.-]*):") or nil
     local safe = proto and safe_protos[proto]
     if not safe then
         msg.error(("Ignoring potentially unsafe url: '%s'"):format(url))