Compare commits

36 Commits

Author SHA1 Message Date
pseyfert 5d426c74ce
Merge b3ab97c05e into e0727e4ab6 2024-04-19 16:13:24 +03:00
dirkf e0727e4ab6 [postprocessor/ffmpeg] Fix finding ffprobe (bug in 21792b8)
Fixes 21792b88b7 (commitcomment-140705274), thx: vonProteus
2024-04-07 15:33:30 +01:00
Ori Avtalion 4ea59c6107
[utils] Fix crash in _report_ignoring_subs from c58b655 (#32762)
Align `utils.bug_reports_message()` with yt-dlp https://github.com/yt-dlp/yt-dlp/commit/5873d4ccdd, thanks fstirlitz

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2024-04-05 15:25:29 +01:00
dirkf 21792b88b7 [external/FFmpeg] Fix and improve --ffmpeg-location handling
* pass YoutubeDL (FileDownloader) to FFmpegPostProcessor constructor
* consolidate path search in FFmpegPostProcessor
* make availability of FFmpegFD depend on existence of FFmpegPostProcessor
* detect ffmpeg executable on instantiation of FFmpegFD
* resolves #32735
2024-03-27 13:11:17 +00:00
dirkf d8f134a664 [downloader/external] Fix "Resource Warning" in downloader test
* add compat_subprocess_Popen context manager
* apply context manager in FFmpegFD._call_downloader()
2024-03-27 13:11:17 +00:00
dirkf 31a15a7c8d [compat] Simplify/fix compat_html_parser_HTMLParseError 2024-03-27 13:11:17 +00:00
dirkf 19dc10b986 [utils] Apply compat_contextlib_suppress 2024-03-27 13:11:17 +00:00
dirkf 182f63e82a [compat] Add compat_contextlib_suppress
with compat_contextlib_suppress(*Exceptions):
    # code that fails silently for any of Exceptions
2024-03-27 13:11:17 +00:00
gy-chen 71211e7db7
[Youtube] Fix unwanted private method __ie_msg in f8b0135850
Fixes `AttributeError no attribute '_YoutubeIE__ie_msg'` if unable to decode n-parameter
2024-03-23 15:30:13 +00:00
Zizheng Guo a96a45b2cd
[Vimeo] Improve `config` extraction (#32742)
* update for more robust json parsing
2024-03-12 11:44:13 +00:00
hatsomatt 820fae3b3a [Videa] Fix extraction
* update API URL
* from https://github.com/yt-dlp/yt-dlp/pull/8003
* thanks to the authors!

Closes yt-dlp/7427
Authored by: hatsomatt, aky-01
2024-03-08 13:14:52 +00:00
dirkf aef24d97e9 [Videa] Align with yt-dlp 2024-03-08 13:14:52 +00:00
dirkf f7b30e3f73 [XFileShare] Update extractor for 2024
* simplify aa_decode()
* review and update supported sites and tests
* in above, include FileMoon.sx, and remove separate module
* incorporate changes from yt-dlp
* allow for decoding multiple scripts (eg, FileMoon)
* use new JWPlayer extraction
2024-03-08 13:03:42 +00:00
dirkf f66372403f [InfoExtractor] Rework and improve JWPlayer extraction
* use traverse_obj() and _search_json()
* support playlist `.load({**video1},{**video2}, ...)`
* support transform_source=... for _extract_jwplayer_data()
2024-03-08 13:03:42 +00:00
dirkf 7216fa2ac4 [InfoExtractor] Add `_search_json()`
* uses the error diagnostic to truncate the JSON string
* may be confused by non-C-Pythons
2024-03-08 13:03:42 +00:00
dirkf acc383b9e3 [utils] Let int_or_none() accept a base, like int() 2024-03-08 13:03:42 +00:00
Hubert Hirtz f0812d7848
[utils] Handle user:pass in URLs (#28801)
* Handle user:pass in URLs

Fixes "nonnumeric port" errors when youtube-dl is given URLs with
usernames and passwords such as:

    http://username:password@example.com/myvideo.mp4

Refs:
- https://en.wikipedia.org/wiki/Basic_access_authentication
- https://tools.ietf.org/html/rfc1738#section-3.1
- https://docs.python.org/3.8/library/urllib.parse.html#urllib.parse.urlsplit

Fixes #18276 (point 4)
Fixes #20258
Fixes #26211 (see comment)

* Align code with yt-dlp

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2024-03-04 01:27:55 +00:00
Aaron Tan 40bd5c1815
[caffeine.tv] Add new extractor (#32514)
* Add CaffeineTVIE info extractor to support site caffeine.tv

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2024-02-22 12:54:07 +00:00
dirkf 70f230f9cf
[GBNews]Add new extractor for GB News TV channel (#29432)
* Add extractor for GB News TV channel

* Support more GBNews URL formats
Allow alphanumeric and _ in place of `shows`, which redirect to site's preferred URL

* Update for 2024
2024-02-22 12:44:00 +00:00
dirkf 48ddab1f3a
[downloader/external] Fix WgetFD proxy (rev 2)
From PR (defunct source), closes #29343.
Matches https://github.com/yt-dlp/yt-dlp/pull/3152
Thx former user kikuyan.
2024-02-21 16:29:08 +00:00
dirkf 7687389f08 [Vbox7] Improve extraction, adding features from yt-dlp PR #9100
* changes from https://github.com/yt-dlp/yt-dlp/pull/9100 (thx
seproDev):
  - attempt HLS extraction
  - re-enable XFF
  - test `view_count`, `duration` extraction
* improve commenting, error checks
2024-02-19 00:53:22 +00:00
dirkf 4416f82c80 [Vbox7IE] Sanitise ld+json containing unexpected characters
* based on PR #29680
* added hack to force invoking `transform_source`
* fixes #26218
2024-02-02 12:36:05 +00:00
dirkf bdda6b81df [Vbox7IE] Improve extraction
* DASH extraction no longer fails with new range support
* but always find combined formats if available
* suppress ineffective XFF geo-bypass (causes time-outs)
* adapted from https://github.com/ytdl-org/youtube-dl/pull/29680
* thx former GH user kikuyan
2024-02-02 12:36:05 +00:00
dirkf 1fd8f802b8 [InfoExtractor] Correctly resolve BaseURL in DASH manifest
Specs:
* ISO/IEC 23009-1:2012 section 5.6
* RFC 3986 section 5.
2024-02-02 12:36:05 +00:00
dirkf 4eaeb9b2c6 [InfoExtractor] Support byte range for DASH
* adapted from https://github.com/ytdl-org/youtube-dl/pull/30279
* thx former GH user kikuyan
2024-02-02 12:36:05 +00:00
dirkf bec9180e89 [downloader/dash] Support `range` in fragment (format f'{start}-{end}')
* adapted from https://github.com/ytdl-org/youtube-dl/pull/30279
 * thx former GH user kikuyan
2024-02-02 12:36:05 +00:00
dirkf c58b655a9e [InfoExtractor] Support DASH subtitle extraction (yt-dlp back-port) 2024-02-02 12:36:05 +00:00
dirkf dc512e3a8a [YouTube] Fix `like_count` extraction using `likeButtonViewModel`
* also fix various tests
* TODO: check against yt-dlp tests
2024-01-22 11:10:34 +00:00
dirkf f8b0135850 [YouTube] Rework n-sig processing, realigning with yt-dlp
* apply n-sig before chunked fragments, fixes #32692
2024-01-22 11:10:34 +00:00
dirkf 640d39f03a [InfoExtractor] Support some warning and `._downloader` shortcut methods from yt-dlp 2024-01-22 11:10:34 +00:00
dirkf 6651871416 [compat] Rework compat for `method` parameter of `compat_urllib_request.Request` constructor
* fixes #32573
* does not break `utils.HEADrequest` (eg)
2024-01-22 11:10:34 +00:00
mk-pmb be008e657d [core] Fix format string injection for metadata JSON filename message. 2023-12-06 02:45:41 +00:00
Robotix b1bbc1e502
[Epidemic Sound] Add new extractor (#32628)
* Add simple extractor
* Support separate tracks
* Use index as id instead of slug

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-12-06 01:17:57 +00:00
dirkf 55a442adae
[Imgur] Overhaul extractor module (#32612)
Revise extractors for new API and page formats
2023-12-05 20:02:30 +00:00
mimvahedi c62936a5f2
[telewebion] Fix extraction (#32634)
* [telewebion] fix extraction

Resolves https://github.com/ytdl-org/youtube-dl/issues/5135#issuecomment-932952119

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-12-02 15:25:09 +00:00
Paul Seyfert b3ab97c05e rewrite zsh completion generation 2021-03-06 00:06:35 +01:00
29 changed files with 2582 additions and 847 deletions
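
The `_search_json()` commit in the list above says it uses the parser's error diagnostic to truncate the JSON string. A minimal sketch of that idea follows. It is not the project's implementation: the helper name `search_json_sketch` and the sample page text are hypothetical, and it assumes Python 3.5+, where `json.JSONDecodeError` exposes `msg` and `pos`. It relies on the wording of the "Extra data" diagnostic, which may be why the commit warns about other Python implementations.

```python
import json
import re


def search_json_sketch(start_pattern, text):
    """Return the JSON value that follows start_pattern in text, or None."""
    match = re.search(start_pattern, text)
    if match is None:
        return None
    candidate = text[match.end():]
    try:
        return json.loads(candidate)
    except json.JSONDecodeError as err:
        # "Extra data" means a complete value was parsed and err.pos marks
        # where the trailing, non-JSON text begins.
        if err.msg != 'Extra data':
            raise
        return json.loads(candidate[:err.pos])


# Hypothetical page fragment: a JSON object followed by more script text.
page = 'jwplayer("player").setup({"file": "v.mp4", "tracks": [1, 2]}); init();'
config = search_json_sketch(r'\.setup\(', page)
```

The second `json.loads()` call parses only the prefix that the first attempt accepted, so no hand-written bracket matching is needed.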

View File

@ -3,26 +3,8 @@
__youtube_dl() {
local curcontext="$curcontext" fileopts diropts cur prev
typeset -A opt_args
fileopts="{{fileopts}}"
diropts="{{diropts}}"
cur=$words[CURRENT]
case $cur in
:)
_arguments '*: :(::ytfavorites ::ytrecommended ::ytsubscriptions ::ytwatchlater ::ythistory)'
;;
*)
prev=$words[CURRENT-1]
if [[ ${prev} =~ ${fileopts} ]]; then
_path_files
elif [[ ${prev} =~ ${diropts} ]]; then
_path_files -/
elif [[ ${prev} == "--recode-video" ]]; then
_arguments '*: :(mp4 flv ogg webm mkv)'
else
_arguments '*: :({{flags}})'
fi
;;
esac
_arguments {{args}} \
'*: :(::ytfavorites ::ytrecommended ::ytsubscriptions ::ytwatchlater ::ythistory)'
}
__youtube_dl
__youtube_dl

View File

@ -4,6 +4,7 @@ from __future__ import unicode_literals
import os
from os.path import dirname as dirn
import sys
import optparse
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
import youtube_dl
@ -17,30 +18,44 @@ ZSH_COMPLETION_TEMPLATE = "devscripts/zsh-completion.in"
def build_completion(opt_parser):
opts = [opt for group in opt_parser.option_groups
for opt in group.option_list]
opts_file = [opt for opt in opts if opt.metavar == "FILE"]
opts_dir = [opt for opt in opts if opt.metavar == "DIR"]
fileopts = []
for opt in opts_file:
# escaping is hard:
# - help may contain colons
# - metavar must have colons : escaped
# - single quotes must be removed
# - help must have square brackets [] escaped
def metaparse(opt):
if "--recode-video" == opt.get_opt_string():
return ":{}:(mp4 flv ogg webm mkv)".format(opt.metavar)
if opt.metavar is None:
return ""
if opt.metavar == "FILE":
return ":FILE:_files"
if opt.metavar == "DIR":
return ":DIR:_directories"
else:
return ":{}:".format(opt.metavar.replace(":", "\\:"))
def helpescape(opthelp):
if opthelp == optparse.SUPPRESS_HELP:
return ""
return "[{}]".format(opthelp.replace("'", "\"").replace("]", "\\]").replace("[", "\\["))
def optionexclude(opt):
# When an argument has a long and short version, the arguments entry shall be
# _arguments \
# "(-t --thing)"{-t,--thing}"[do things]:WHAT_THING:"
# i.e. in parentheses with space the explanation of redundancy and in curly braces
# regular shell expansion to create two mostly identical entries.
if opt._short_opts:
fileopts.extend(opt._short_opts)
if opt._long_opts:
fileopts.extend(opt._long_opts)
return "({0} {1})'{{{0},{1}}}'".format(opt._short_opts[0], opt.get_opt_string())
return "{}".format(opt.get_opt_string())
diropts = []
for opt in opts_dir:
if opt._short_opts:
diropts.extend(opt._short_opts)
if opt._long_opts:
diropts.extend(opt._long_opts)
flags = [opt.get_opt_string() for opt in opts]
mytest = ["'{}{}{}'".format(optionexclude(opt), helpescape(opt.help), metaparse(opt)) for opt in opts]
template = read_file(ZSH_COMPLETION_TEMPLATE)
template = template.replace("{{fileopts}}", "|".join(fileopts))
template = template.replace("{{diropts}}", "|".join(diropts))
template = template.replace("{{flags}}", " ".join(flags))
template = template.replace("{{args}}", " \\\n ".join(mytest))
write_file(ZSH_COMPLETION_FILE, template)
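
As a rough illustration of what the rewritten generator emits per option, the sketch below builds a single zsh `_arguments` spec from an `optparse` option. It is a simplified approximation of the hunk above, not the committed code: the function name `zsh_argument_spec` and the sample options are hypothetical, the `--recode-video` special case is left out, and a guard for options without help text is added as an assumption.

```python
import optparse


def zsh_argument_spec(opt):
    """Build one zsh `_arguments` spec string for an optparse option."""
    long_opt = opt.get_opt_string()
    if opt._short_opts and opt._long_opts:
        # '(-o --output)'{-o,--output}'...' lists both spellings as
        # mutually exclusive and lets brace expansion create two entries.
        head = "({0} {1})'{{{0},{1}}}'".format(opt._short_opts[0], long_opt)
    else:
        head = long_opt
    if opt.help in (None, optparse.SUPPRESS_HELP):
        description = ''
    else:
        # Single quotes would end the shell word; brackets delimit the help.
        escaped = opt.help.replace("'", '"').replace(']', '\\]').replace('[', '\\[')
        description = '[{0}]'.format(escaped)
    if opt.metavar is None:
        action = ''
    elif opt.metavar == 'FILE':
        action = ':FILE:_files'
    elif opt.metavar == 'DIR':
        action = ':DIR:_directories'
    else:
        action = ':{0}:'.format(opt.metavar.replace(':', '\\:'))
    return "'{0}{1}{2}'".format(head, description, action)


# Hypothetical options, used only to show the output shape.
parser = optparse.OptionParser()
parser.add_option('-o', '--output', metavar='FILE', help='write to FILE [no default]')
parser.add_option('--quiet', action='store_true', help="don't print progress")
specs = [zsh_argument_spec(opt) for opt in parser.option_list
         if opt.get_opt_string() != '--help']
```

Each resulting string is one line of the `{{args}}` block that the template passes to `_arguments`.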

View File

@ -993,7 +993,8 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
'tbr': 5997.485,
'width': 1920,
'height': 1080,
}]
}],
{},
), (
# https://github.com/ytdl-org/youtube-dl/pull/14844
'urls_only',
@ -1076,7 +1077,8 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
'tbr': 4400,
'width': 1920,
'height': 1080,
}]
}],
{},
), (
# https://github.com/ytdl-org/youtube-dl/issues/20346
# Media considered unfragmented even though it contains
@ -1122,18 +1124,185 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
'width': 360,
'height': 360,
'fps': 30,
}]
}],
{},
), (
# https://github.com/ytdl-org/youtube-dl/issues/30235
# Bento4 generated test mpd
# mp4dash --mpd-name=manifest.mpd --no-split --use-segment-list mediafiles
'url_and_range',
'http://unknown/manifest.mpd', # mpd_url
'http://unknown/', # mpd_base_url
[{
'manifest_url': 'http://unknown/manifest.mpd',
'fragment_base_url': 'http://unknown/',
'ext': 'm4a',
'format_id': 'audio-und-mp4a.40.2',
'format_note': 'DASH audio',
'container': 'm4a_dash',
'protocol': 'http_dash_segments',
'acodec': 'mp4a.40.2',
'vcodec': 'none',
'tbr': 98.808,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'fragment_base_url': 'http://unknown/',
'ext': 'mp4',
'format_id': 'video-avc1',
'format_note': 'DASH video',
'container': 'mp4_dash',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.4D401E',
'tbr': 699.597,
'width': 768,
'height': 432
}],
{},
), (
# https://github.com/ytdl-org/youtube-dl/issues/27575
# GPAC generated test mpd
# MP4Box -dash 10000 -single-file -out manifest.mpd mediafiles
'range_only',
'http://unknown/manifest.mpd', # mpd_url
'http://unknown/', # mpd_base_url
[{
'manifest_url': 'http://unknown/manifest.mpd',
'fragment_base_url': 'http://unknown/audio_dashinit.mp4',
'ext': 'm4a',
'format_id': '2',
'format_note': 'DASH audio',
'container': 'm4a_dash',
'protocol': 'http_dash_segments',
'acodec': 'mp4a.40.2',
'vcodec': 'none',
'tbr': 98.096,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'fragment_base_url': 'http://unknown/video_dashinit.mp4',
'ext': 'mp4',
'format_id': '1',
'format_note': 'DASH video',
'container': 'mp4_dash',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.4D401E',
'tbr': 526.987,
'width': 768,
'height': 432
}],
{},
), (
'subtitles',
'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/',
[{
'format_id': 'audio=128001',
'manifest_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'ext': 'm4a',
'tbr': 128.001,
'asr': 48000,
'format_note': 'DASH audio',
'container': 'm4a_dash',
'vcodec': 'none',
'acodec': 'mp4a.40.2',
'url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'fragment_base_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/dash/',
'protocol': 'http_dash_segments',
}, {
'format_id': 'video=100000',
'manifest_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'ext': 'mp4',
'width': 336,
'height': 144,
'tbr': 100,
'format_note': 'DASH video',
'container': 'mp4_dash',
'vcodec': 'avc1.4D401F',
'acodec': 'none',
'url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'fragment_base_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/dash/',
'protocol': 'http_dash_segments',
}, {
'format_id': 'video=326000',
'manifest_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'ext': 'mp4',
'width': 562,
'height': 240,
'tbr': 326,
'format_note': 'DASH video',
'container': 'mp4_dash',
'vcodec': 'avc1.4D401F',
'acodec': 'none',
'url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'fragment_base_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/dash/',
'protocol': 'http_dash_segments',
}, {
'format_id': 'video=698000',
'manifest_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'ext': 'mp4',
'width': 844,
'height': 360,
'tbr': 698,
'format_note': 'DASH video',
'container': 'mp4_dash',
'vcodec': 'avc1.4D401F',
'acodec': 'none',
'url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'fragment_base_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/dash/',
'protocol': 'http_dash_segments',
}, {
'format_id': 'video=1493000',
'manifest_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'ext': 'mp4',
'width': 1126,
'height': 480,
'tbr': 1493,
'format_note': 'DASH video',
'container': 'mp4_dash',
'vcodec': 'avc1.4D401F',
'acodec': 'none',
'url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'fragment_base_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/dash/',
'protocol': 'http_dash_segments',
}, {
'format_id': 'video=4482000',
'manifest_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'ext': 'mp4',
'width': 1688,
'height': 720,
'tbr': 4482,
'format_note': 'DASH video',
'container': 'mp4_dash',
'vcodec': 'avc1.4D401F',
'acodec': 'none',
'url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'fragment_base_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/dash/',
'protocol': 'http_dash_segments',
}],
{
'en': [
{
'ext': 'mp4',
'manifest_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'fragment_base_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/dash/',
'protocol': 'http_dash_segments',
}
]
},
)
]
for mpd_file, mpd_url, mpd_base_url, expected_formats in _TEST_CASES:
for mpd_file, mpd_url, mpd_base_url, expected_formats, expected_subtitles in _TEST_CASES:
with open('./test/testdata/mpd/%s.mpd' % mpd_file,
mode='r', encoding='utf-8') as f:
formats = self.ie._parse_mpd_formats(
formats, subtitles = self.ie._parse_mpd_formats_and_subtitles(
compat_etree_fromstring(f.read().encode('utf-8')),
mpd_base_url=mpd_base_url, mpd_url=mpd_url)
self.ie._sort_formats(formats)
expect_value(self, formats, expected_formats, None)
expect_value(self, subtitles, expected_subtitles, None)
def test_parse_f4m_formats(self):
_TEST_CASES = [

View File

@ -23,6 +23,7 @@ from youtube_dl.compat import (
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlencode,
compat_urllib_request,
)
@ -135,6 +136,19 @@ class TestCompat(unittest.TestCase):
self.assertEqual(compat_casefold('\u03a3'), '\u03c3')
self.assertEqual(compat_casefold('A\u0345\u03a3'), 'a\u03b9\u03c3')
def test_compat_urllib_request_Request(self):
self.assertEqual(
compat_urllib_request.Request('http://127.0.0.1', method='PUT').get_method(),
'PUT')
class PUTrequest(compat_urllib_request.Request):
def get_method(self):
return 'PUT'
self.assertEqual(
PUTrequest('http://127.0.0.1').get_method(),
'PUT')
if __name__ == '__main__':
unittest.main()
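
For reference, the behaviour that `test_compat_urllib_request_Request` expects matches what Python 3's own `urllib.request.Request` does. The sketch below uses the standard library class directly rather than the compat wrapper; the class name `PutRequest` is hypothetical.

```python
from urllib.request import Request

# The method can be given explicitly to the constructor...
explicit = Request('http://127.0.0.1', method='PUT')


# ...or fixed by a subclass that overrides get_method().
class PutRequest(Request):
    def get_method(self):
        return 'PUT'


# Without either, the method is derived from whether a body is present.
without_body = Request('http://127.0.0.1')
with_body = Request('http://127.0.0.1', data=b'payload')
```

Constructing a `Request` does not open a connection, so the checks run offline.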

View File

@ -18,6 +18,7 @@ from test.helper import (
)
from youtube_dl import YoutubeDL
from youtube_dl.compat import (
compat_contextlib_suppress,
compat_http_cookiejar_Cookie,
compat_http_server,
compat_kwargs,
@ -35,6 +36,9 @@ from youtube_dl.downloader.external import (
HttpieFD,
WgetFD,
)
from youtube_dl.postprocessor import (
FFmpegPostProcessor,
)
import threading
TEST_SIZE = 10 * 1024
@ -227,7 +231,17 @@ class TestAria2cFD(unittest.TestCase):
self.assertIn('--load-cookies=%s' % downloader._cookies_tempfile, cmd)
@ifExternalFDAvailable(FFmpegFD)
# Handle delegated availability
def ifFFmpegFDAvailable(externalFD):
# raise SkipTest, or set False!
avail = ifExternalFDAvailable(externalFD) and False
with compat_contextlib_suppress(Exception):
avail = FFmpegPostProcessor(downloader=None).available
return unittest.skipUnless(
avail, externalFD.get_basename() + ' not found')
@ifFFmpegFDAvailable(FFmpegFD)
class TestFFmpegFD(unittest.TestCase):
_args = []
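
The delegated-availability decorator above combines exception suppression with `unittest.skipUnless`. A stand-alone sketch of the same pattern, using the standard library `contextlib.suppress` in place of the compat shim, is shown here. The names `skip_unless_probe`, `failing_probe` and `TestNeedsFFmpeg` are hypothetical.

```python
import contextlib
import unittest


def skip_unless_probe(probe, name):
    """Return a decorator that skips the test unless probe() is truthy."""
    available = False
    # Any failure while probing is treated as "not available".
    with contextlib.suppress(Exception):
        available = bool(probe())
    return unittest.skipUnless(available, name + ' not found')


def failing_probe():
    # Stands in for a constructor that cannot locate its executable.
    raise OSError('no such executable')


@skip_unless_probe(failing_probe, 'ffmpeg')
class TestNeedsFFmpeg(unittest.TestCase):
    def test_something(self):
        self.fail('not reached while the probe fails')
```

Because the probe runs at decoration time, a missing tool turns into a skipped test case instead of an import-time error.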

View File

@ -295,6 +295,7 @@ class TestNRKSubtitles(BaseTestSubtitles):
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
self.DL.params['format'] = 'best/bestvideo'
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['nb-ttv']))
self.assertEqual(md5(subtitles['nb-ttv']), '67e06ff02d0deaf975e68f6cb8f6a149')

View File

@ -81,6 +81,7 @@ from youtube_dl.utils import (
sanitize_filename,
sanitize_path,
sanitize_url,
sanitized_Request,
shell_quote,
smuggle_url,
str_or_none,
@ -255,6 +256,18 @@ class TestUtil(unittest.TestCase):
self.assertEqual(sanitize_url('https://foo.bar'), 'https://foo.bar')
self.assertEqual(sanitize_url('foo bar'), 'foo bar')
def test_sanitized_Request(self):
self.assertFalse(sanitized_Request('http://foo.bar').has_header('Authorization'))
self.assertFalse(sanitized_Request('http://:foo.bar').has_header('Authorization'))
self.assertEqual(sanitized_Request('http://@foo.bar').get_header('Authorization'),
'Basic Og==')
self.assertEqual(sanitized_Request('http://:pass@foo.bar').get_header('Authorization'),
'Basic OnBhc3M=')
self.assertEqual(sanitized_Request('http://user:@foo.bar').get_header('Authorization'),
'Basic dXNlcjo=')
self.assertEqual(sanitized_Request('http://user:pass@foo.bar').get_header('Authorization'),
'Basic dXNlcjpwYXNz')
def test_expand_path(self):
def env(var):
return '%{0}%'.format(var) if sys.platform == 'win32' else '${0}'.format(var)
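
The header values asserted in `test_sanitized_Request` are the Base64 encoding of `user:pass` taken from the URL. The sketch below reproduces that mapping with the standard library only. It is an illustration, not the `sanitized_Request` implementation: the function name `basic_auth_from_url` is hypothetical, and percent-decoding of credentials and IPv6 hosts are ignored.

```python
import base64
from urllib.parse import urlsplit


def basic_auth_from_url(url):
    """Return (url_without_credentials, Authorization value or None)."""
    parts = urlsplit(url)
    if parts.username is None:
        # No '@' in the network location: nothing to extract.
        return url, None
    netloc = parts.hostname
    if parts.port:
        netloc += ':%d' % parts.port
    credentials = '%s:%s' % (parts.username, parts.password or '')
    token = base64.b64encode(credentials.encode('utf-8')).decode('ascii')
    return parts._replace(netloc=netloc).geturl(), 'Basic ' + token


clean_url, header = basic_auth_from_url('http://user:pass@foo.bar')
```

Stripping the credentials from the network location is what avoids the "nonnumeric port" error mentioned in the commit message, since `pass@foo.bar` is no longer read as a port.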
@ -512,11 +525,14 @@ class TestUtil(unittest.TestCase):
self.assertEqual(float_or_none(set()), None)
def test_int_or_none(self):
self.assertEqual(int_or_none(42), 42)
self.assertEqual(int_or_none('42'), 42)
self.assertEqual(int_or_none(''), None)
self.assertEqual(int_or_none(None), None)
self.assertEqual(int_or_none([]), None)
self.assertEqual(int_or_none(set()), None)
self.assertEqual(int_or_none('42', base=8), 34)
self.assertRaises(TypeError, int_or_none(42, base=8))
def test_str_to_int(self):
self.assertEqual(str_to_int('123,456'), 123456)
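
The new `base` argument follows the built-in `int()`, whose rules are shown in this short sketch. It exercises only the built-in, not `int_or_none()` itself; the helper name `base_requires_text` is hypothetical.

```python
# Text parses in the requested base: '42' in octal is 4 * 8 + 2.
octal = int('42', 8)


def base_requires_text(value, base):
    """Return True if int(value, base) rejects value for not being text."""
    try:
        int(value, base)
    except TypeError:
        return True
    return False
```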

test/testdata/mpd/range_only.mpd (new vendored file, 35 lines)

@ -0,0 +1,35 @@
<?xml version="1.0"?>
<!-- MPD file Generated with GPAC version 1.0.1-revrelease at 2021-11-27T20:53:11.690Z -->
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" minBufferTime="PT1.500S" type="static" mediaPresentationDuration="PT0H0M30.196S" maxSegmentDuration="PT0H0M10.027S" profiles="urn:mpeg:dash:profile:full:2011">
<ProgramInformation moreInformationURL="http://gpac.io">
<Title>manifest.mpd generated by GPAC</Title>
</ProgramInformation>
<Period duration="PT0H0M30.196S">
<AdaptationSet segmentAlignment="true" maxWidth="768" maxHeight="432" maxFrameRate="30000/1001" par="16:9" lang="und" startWithSAP="1">
<Representation id="1" mimeType="video/mp4" codecs="avc1.4D401E" width="768" height="432" frameRate="30000/1001" sar="1:1" bandwidth="526987">
<BaseURL>video_dashinit.mp4</BaseURL>
<SegmentList timescale="90000" duration="900000">
<Initialization range="0-881"/>
<SegmentURL mediaRange="882-876094" indexRange="882-925"/>
<SegmentURL mediaRange="876095-1466732" indexRange="876095-876138"/>
<SegmentURL mediaRange="1466733-1953615" indexRange="1466733-1466776"/>
<SegmentURL mediaRange="1953616-1994211" indexRange="1953616-1953659"/>
</SegmentList>
</Representation>
</AdaptationSet>
<AdaptationSet segmentAlignment="true" lang="und" startWithSAP="1">
<Representation id="2" mimeType="audio/mp4" codecs="mp4a.40.2" audioSamplingRate="48000" bandwidth="98096">
<AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="2"/>
<BaseURL>audio_dashinit.mp4</BaseURL>
<SegmentList timescale="48000" duration="480000">
<Initialization range="0-752"/>
<SegmentURL mediaRange="753-124129" indexRange="753-796"/>
<SegmentURL mediaRange="124130-250544" indexRange="124130-124173"/>
<SegmentURL mediaRange="250545-374929" indexRange="250545-250588"/>
</SegmentList>
</Representation>
</AdaptationSet>
</Period>
</MPD>

test/testdata/mpd/subtitles.mpd (new vendored file, 351 lines)

@ -0,0 +1,351 @@
<?xml version="1.0" encoding="utf-8"?>
<!-- Created with Unified Streaming Platform (version=1.10.18-20255) -->
<MPD
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn:mpeg:dash:schema:mpd:2011"
xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd"
type="static"
mediaPresentationDuration="PT14M48S"
maxSegmentDuration="PT1M"
minBufferTime="PT10S"
profiles="urn:mpeg:dash:profile:isoff-live:2011">
<Period
id="1"
duration="PT14M48S">
<BaseURL>dash/</BaseURL>
<AdaptationSet
id="1"
group="1"
contentType="audio"
segmentAlignment="true"
audioSamplingRate="48000"
mimeType="audio/mp4"
codecs="mp4a.40.2"
startWithSAP="1">
<AudioChannelConfiguration
schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011"
value="2" />
<Role schemeIdUri="urn:mpeg:dash:role:2011" value="main" />
<SegmentTemplate
timescale="48000"
initialization="3144-kZT4LWMQw6Rh7Kpd-$RepresentationID$.dash"
media="3144-kZT4LWMQw6Rh7Kpd-$RepresentationID$-$Time$.dash">
<SegmentTimeline>
<S t="0" d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="3584" />
</SegmentTimeline>
</SegmentTemplate>
<Representation
id="audio=128001"
bandwidth="128001">
</Representation>
</AdaptationSet>
<AdaptationSet
id="2"
group="3"
contentType="text"
lang="en"
mimeType="application/mp4"
codecs="stpp"
startWithSAP="1">
<Role schemeIdUri="urn:mpeg:dash:role:2011" value="subtitle" />
<SegmentTemplate
timescale="1000"
initialization="3144-kZT4LWMQw6Rh7Kpd-$RepresentationID$.dash"
media="3144-kZT4LWMQw6Rh7Kpd-$RepresentationID$-$Time$.dash">
<SegmentTimeline>
<S t="0" d="60000" r="9" />
<S d="24000" />
</SegmentTimeline>
</SegmentTemplate>
<Representation
id="textstream_eng=1000"
bandwidth="1000">
</Representation>
</AdaptationSet>
<AdaptationSet
id="3"
group="2"
contentType="video"
par="960:409"
minBandwidth="100000"
maxBandwidth="4482000"
maxWidth="1689"
maxHeight="720"
segmentAlignment="true"
mimeType="video/mp4"
codecs="avc1.4D401F"
startWithSAP="1">
<Role schemeIdUri="urn:mpeg:dash:role:2011" value="main" />
<SegmentTemplate
timescale="12288"
initialization="3144-kZT4LWMQw6Rh7Kpd-$RepresentationID$.dash"
media="3144-kZT4LWMQw6Rh7Kpd-$RepresentationID$-$Time$.dash">
<SegmentTimeline>
<S t="0" d="24576" r="443" />
</SegmentTimeline>
</SegmentTemplate>
<Representation
id="video=100000"
bandwidth="100000"
width="336"
height="144"
sar="2880:2863"
scanType="progressive">
</Representation>
<Representation
id="video=326000"
bandwidth="326000"
width="562"
height="240"
sar="115200:114929"
scanType="progressive">
</Representation>
<Representation
id="video=698000"
bandwidth="698000"
width="844"
height="360"
sar="86400:86299"
scanType="progressive">
</Representation>
<Representation
id="video=1493000"
bandwidth="1493000"
width="1126"
height="480"
sar="230400:230267"
scanType="progressive">
</Representation>
<Representation
id="video=4482000"
bandwidth="4482000"
width="1688"
height="720"
sar="86400:86299"
scanType="progressive">
</Representation>
</AdaptationSet>
</Period>
</MPD>

test/testdata/mpd/url_and_range.mpd (new vendored file, 32 lines)

@ -0,0 +1,32 @@
<?xml version="1.0" ?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" profiles="urn:mpeg:dash:profile:isoff-live:2011" minBufferTime="PT10.01S" mediaPresentationDuration="PT30.097S" type="static">
<!-- Created with Bento4 mp4-dash.py, VERSION=2.0.0-639 -->
<Period>
<!-- Video -->
<AdaptationSet mimeType="video/mp4" segmentAlignment="true" startWithSAP="1" maxWidth="768" maxHeight="432">
<Representation id="video-avc1" codecs="avc1.4D401E" width="768" height="432" scanType="progressive" frameRate="30000/1001" bandwidth="699597">
<SegmentList timescale="1000" duration="10010">
<Initialization sourceURL="video-frag.mp4" range="36-746"/>
<SegmentURL media="video-frag.mp4" mediaRange="747-876117"/>
<SegmentURL media="video-frag.mp4" mediaRange="876118-1466913"/>
<SegmentURL media="video-frag.mp4" mediaRange="1466914-1953954"/>
<SegmentURL media="video-frag.mp4" mediaRange="1953955-1994652"/>
</SegmentList>
</Representation>
</AdaptationSet>
<!-- Audio -->
<AdaptationSet mimeType="audio/mp4" startWithSAP="1" segmentAlignment="true">
<Representation id="audio-und-mp4a.40.2" codecs="mp4a.40.2" bandwidth="98808" audioSamplingRate="48000">
<AudioChannelConfiguration schemeIdUri="urn:mpeg:mpegB:cicp:ChannelConfiguration" value="2"/>
<SegmentList timescale="1000" duration="10010">
<Initialization sourceURL="audio-frag.mp4" range="32-623"/>
<SegmentURL media="audio-frag.mp4" mediaRange="624-124199"/>
<SegmentURL media="audio-frag.mp4" mediaRange="124200-250303"/>
<SegmentURL media="audio-frag.mp4" mediaRange="250304-374365"/>
<SegmentURL media="audio-frag.mp4" mediaRange="374366-374836"/>
</SegmentList>
</Representation>
</AdaptationSet>
</Period>
</MPD>
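
A rough sketch of how a `SegmentList` like the one in this test file maps to (URL, byte range) pairs, since every segment points at the same file and differs only in `mediaRange`. It uses only `xml.etree` from the standard library; `list_fragments` and the trimmed `SAMPLE` document are illustrative names and are not part of youtube-dl.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the MPD root element
MPD_NS = '{urn:mpeg:dash:schema:mpd:2011}'

# Trimmed copy of the video AdaptationSet from url_and_range.mpd
SAMPLE = '''<?xml version="1.0" ?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="video-avc1" bandwidth="699597">
        <SegmentList timescale="1000" duration="10010">
          <Initialization sourceURL="video-frag.mp4" range="36-746"/>
          <SegmentURL media="video-frag.mp4" mediaRange="747-876117"/>
          <SegmentURL media="video-frag.mp4" mediaRange="876118-1466913"/>
        </SegmentList>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>'''


def list_fragments(mpd_text):
    """Return [(url, byte_range), ...] for the first SegmentList found.

    Hypothetical helper: the initialization segment comes first,
    followed by each SegmentURL in document order.
    """
    root = ET.fromstring(mpd_text)
    seg_list = root.find('.//%sSegmentList' % MPD_NS)
    fragments = []
    init = seg_list.find(MPD_NS + 'Initialization')
    if init is not None:
        fragments.append((init.get('sourceURL'), init.get('range')))
    for seg in seg_list.findall(MPD_NS + 'SegmentURL'):
        fragments.append((seg.get('media'), seg.get('mediaRange')))
    return fragments
```

Each pair would then be fetched with an HTTP `Range: bytes=<range>` header rather than as a separate file.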


@@ -2635,12 +2635,12 @@ class YoutubeDL(object):
self.to_screen(msg('[info] %s is already present', label.title()))
return 'exists'
else:
self.to_screen(msg('[info] Writing %s as JSON to: ' + infofn, label))
self.to_screen(msg('[info] Writing %s as JSON to: ', label) + infofn)
try:
write_json_file(self.filter_requested_info(info_dict), infofn)
return True
except (OSError, IOError):
self.report_error(msg('Cannot write %s to JSON file ' + infofn, label))
self.report_error(msg('Cannot write %s to JSON file ', label) + infofn)
return
def _write_thumbnails(self, info_dict, filename):
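
The hunk above moves `infofn` out of the format string. A minimal sketch of why that matters, assuming `msg` applies `%` formatting to its first argument (the real helper is not shown in this diff, so `msg` here is a hypothetical stand-in): a filename containing `%` would otherwise be interpreted as a format directive.

```python
def msg(fmt, label):
    # Hypothetical stand-in for the helper used in the hunk:
    # assumed to apply %-formatting to fmt.
    return fmt % (label,)


def announce_buggy(label, infofn):
    # Old form: the filename becomes part of the format string
    return msg('[info] Writing %s as JSON to: ' + infofn, label)


def announce_fixed(label, infofn):
    # New form: format first, then append the filename verbatim
    return msg('[info] Writing %s as JSON to: ', label) + infofn


def buggy_raises(label, infofn):
    """Return True if the old form fails on this filename."""
    try:
        announce_buggy(label, infofn)
    except (ValueError, TypeError):
        return True
    return False
```

With a filename such as `100%_done.info.json`, the old form fails during formatting while the new form prints the name unchanged.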


@@ -58,19 +58,26 @@ except ImportError: # Python 2
# Also fix up lack of method arg in old Pythons
try:
_req = compat_urllib_request.Request
_req('http://127.0.0.1', method='GET')
type(compat_urllib_request.Request('http://127.0.0.1', method='GET'))
except TypeError:
class _request(object):
def __new__(cls, url, *args, **kwargs):
method = kwargs.pop('method', None)
r = _req(url, *args, **kwargs)
if method:
r.get_method = types.MethodType(lambda _: method, r)
return r
def _add_init_method_arg(cls):
compat_urllib_request.Request = _request
init = cls.__init__
def wrapped_init(self, *args, **kwargs):
method = kwargs.pop('method', 'GET')
init(self, *args, **kwargs)
if any(callable(x.__dict__.get('get_method')) for x in (self.__class__, self) if x != cls):
# allow instance or its subclass to override get_method()
return
if self.has_data() and method == 'GET':
method = 'POST'
self.get_method = types.MethodType(lambda _: method, self)
cls.__init__ = wrapped_init
_add_init_method_arg(compat_urllib_request.Request)
del _add_init_method_arg
try:
import urllib.error as compat_urllib_error
@@ -2414,29 +2421,26 @@ except ImportError: # Python 2
compat_urllib_request_urlretrieve = compat_urlretrieve
try:
from HTMLParser import (
HTMLParser as compat_HTMLParser,
HTMLParseError as compat_HTMLParseError)
except ImportError: # Python 3
from html.parser import HTMLParser as compat_HTMLParser
except ImportError: # Python 2
from HTMLParser import HTMLParser as compat_HTMLParser
compat_html_parser_HTMLParser = compat_HTMLParser
try: # Python 2
from HTMLParser import HTMLParseError as compat_HTMLParseError
except ImportError: # Python <3.4
try:
from html.parser import HTMLParseError as compat_HTMLParseError
except ImportError: # Python >3.4
# HTMLParseError has been deprecated in Python 3.3 and removed in
# HTMLParseError was deprecated in Python 3.3 and removed in
# Python 3.5. Introducing dummy exception for Python >3.5 for compatible
# and uniform cross-version exception handling
class compat_HTMLParseError(Exception):
pass
compat_html_parser_HTMLParser = compat_HTMLParser
compat_html_parser_HTMLParseError = compat_HTMLParseError
try:
from subprocess import DEVNULL
compat_subprocess_get_DEVNULL = lambda: DEVNULL
except ImportError:
_DEVNULL = subprocess.DEVNULL
compat_subprocess_get_DEVNULL = lambda: _DEVNULL
except AttributeError:
compat_subprocess_get_DEVNULL = lambda: open(os.path.devnull, 'w')
try:
@@ -2936,6 +2940,51 @@ else:
compat_socket_create_connection = socket.create_connection
try:
from contextlib import suppress as compat_contextlib_suppress
except ImportError:
class compat_contextlib_suppress(object):
_exceptions = None
def __init__(self, *exceptions):
super(compat_contextlib_suppress, self).__init__()
# TODO: [Base]ExceptionGroup (3.12+)
self._exceptions = exceptions
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
return exc_val is not None and isinstance(exc_val, self._exceptions or tuple())
# subprocess.Popen context manager
# avoids leaking handles if .communicate() is not called
try:
_Popen = subprocess.Popen
# check for required context manager attributes
_Popen.__enter__ and _Popen.__exit__
compat_subprocess_Popen = _Popen
except AttributeError:
# not a context manager - make one
from contextlib import contextmanager
@contextmanager
def compat_subprocess_Popen(*args, **kwargs):
popen = None
try:
popen = _Popen(*args, **kwargs)
yield popen
finally:
if popen:
for f in (popen.stdin, popen.stdout, popen.stderr):
if f:
# repeated .close() is OK, but just in case
with compat_contextlib_suppress(EnvironmentError):
f.close()
popen.wait()
# Fix https://github.com/ytdl-org/youtube-dl/issues/4223
# See http://bugs.python.org/issue9161 for what is broken
def workaround_optparse_bug9161():
@@ -3256,6 +3305,7 @@ __all__ = [
'compat_http_cookiejar_Cookie',
'compat_http_cookies',
'compat_http_cookies_SimpleCookie',
'compat_contextlib_suppress',
'compat_ctypes_WINFUNCTYPE',
'compat_etree_fromstring',
'compat_filter',
@@ -3291,6 +3341,7 @@ __all__ = [
'compat_struct_pack',
'compat_struct_unpack',
'compat_subprocess_get_DEVNULL',
'compat_subprocess_Popen',
'compat_tokenize_tokenize',
'compat_urllib_error',
'compat_urllib_parse',
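
For readers unfamiliar with the pattern behind `compat_contextlib_suppress`, here is a standalone sketch of a `contextlib.suppress` fallback and the quiet-close use seen in the Popen wrapper above. `suppress_sketch`, `close_quietly` and `propagates` are illustrative names; the class mirrors the diff's fallback but is not the shipped code.

```python
class suppress_sketch(object):
    """Context manager that swallows the listed exception types."""

    def __init__(self, *exceptions):
        self._exceptions = exceptions

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Returning True tells Python the exception was handled;
        # anything not listed propagates as usual.
        return exc_val is not None and isinstance(exc_val, self._exceptions)


def close_quietly(f):
    """Close f, ignoring OS-level errors (as the Popen wrapper does)."""
    with suppress_sketch(EnvironmentError):
        f.close()
    return True


def propagates(exc, *suppressed):
    """Return True if raising exc escapes a block suppressing `suppressed`."""
    try:
        with suppress_sketch(*suppressed):
            raise exc
    except Exception:
        return True
    return False
```

On Python 3.4+ the standard `contextlib.suppress` is imported instead, so the fallback only matters on older interpreters.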


@@ -35,6 +35,7 @@ class DashSegmentsFD(FragmentFD):
for frag_index, fragment in enumerate(fragments, 1):
if frag_index <= ctx['fragment_index']:
continue
success = False
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = frag_index == 1 or not skip_unavailable_fragments
@@ -42,10 +43,14 @@ class DashSegmentsFD(FragmentFD):
if not fragment_url:
assert fragment_base_url
fragment_url = urljoin(fragment_base_url, fragment['path'])
success = False
headers = info_dict.get('http_headers')
fragment_range = fragment.get('range')
if fragment_range:
headers = headers.copy() if headers else {}
headers['Range'] = 'bytes=%s' % (fragment_range,)
for count in itertools.count():
try:
success, frag_content = self._download_fragment(ctx, fragment_url, info_dict)
success, frag_content = self._download_fragment(ctx, fragment_url, info_dict, headers)
if not success:
return False
self._append_fragment(ctx, frag_content)
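
The header handling added in this hunk can be read in isolation as the sketch below. `headers_for_fragment` is a hypothetical helper name; the logic follows the hunk: copy the shared headers only when a fragment carries a `range`, so the caller's dictionary is never mutated.

```python
def headers_for_fragment(fragment, base_headers=None):
    """Return the headers to send for one fragment.

    fragment: dict that may hold a 'range' value of the form 'start-end'.
    base_headers: shared request headers (may be None).
    """
    headers = base_headers
    fragment_range = fragment.get('range')
    if fragment_range:
        # Copy before adding Range so other fragments are unaffected
        headers = dict(headers) if headers else {}
        headers['Range'] = 'bytes=%s' % (fragment_range,)
    return headers
```

A fragment without a `range` gets the shared headers object back unchanged, which matches the previous behaviour for ordinary DASH segments.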


@@ -11,8 +11,14 @@ from .common import FileDownloader
from ..compat import (
compat_setenv,
compat_str,
compat_subprocess_Popen,
)
from ..postprocessor.ffmpeg import FFmpegPostProcessor, EXT_TO_OUT_FORMATS
try:
from ..postprocessor.ffmpeg import FFmpegPostProcessor, EXT_TO_OUT_FORMATS
except ImportError:
FFmpegPostProcessor = None
from ..utils import (
cli_option,
cli_valueless_option,
@@ -206,7 +212,10 @@ class WgetFD(ExternalFD):
retry[1] = '0'
cmd += retry
cmd += self._option('--bind-address', 'source_address')
cmd += self._option('--proxy', 'proxy')
proxy = self.params.get('proxy')
if proxy:
for var in ('http_proxy', 'https_proxy'):
cmd += ['--execute', '%s=%s' % (var, proxy)]
cmd += self._valueless_option('--no-check-certificate', 'nocheckcertificate')
cmd += self._configuration_args()
cmd += ['--', info_dict['url']]
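
The WgetFD change replaces a `--proxy <url>` argument with wgetrc-style settings passed through `--execute`. A small sketch of the resulting argument list, assuming (as the hunk does) that the same proxy URL applies to both HTTP and HTTPS; `wget_proxy_args` is an illustrative name.

```python
def wget_proxy_args(proxy):
    """Build the wget arguments that configure a proxy, or [] if none."""
    args = []
    if proxy:
        # Each --execute runs one wgetrc command, e.g. http_proxy=<url>
        for var in ('http_proxy', 'https_proxy'):
            args += ['--execute', '%s=%s' % (var, proxy)]
    return args
```

For example, `wget_proxy_args('http://127.0.0.1:3128')` yields two `--execute` pairs, one per scheme.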
@@ -358,13 +367,14 @@ class FFmpegFD(ExternalFD):
@classmethod
def available(cls):
return FFmpegPostProcessor().available
# actual availability can only be confirmed for an instance
return bool(FFmpegPostProcessor)
def _call_downloader(self, tmpfilename, info_dict):
url = info_dict['url']
ffpp = FFmpegPostProcessor(downloader=self)
# `downloader` means the parent `YoutubeDL`
ffpp = FFmpegPostProcessor(downloader=self.ydl)
if not ffpp.available:
self.report_error('m3u8 download detected but ffmpeg or avconv could not be found. Please install one.')
self.report_error('ffmpeg required for download but no ffmpeg (nor avconv) executable could be found. Please install one.')
return False
ffpp.check_version()
@@ -393,6 +403,7 @@ class FFmpegFD(ExternalFD):
# if end_time:
# args += ['-t', compat_str(end_time - start_time)]
url = info_dict['url']
cookies = self.ydl.cookiejar.get_cookies_for_url(url)
if cookies:
args.extend(['-cookies', ''.join(
@@ -480,21 +491,25 @@ class FFmpegFD(ExternalFD):
self._debug_cmd(args)
proc = subprocess.Popen(args, stdin=subprocess.PIPE, env=env)
try:
retval = proc.wait()
except BaseException as e:
# subprocess.run would send the SIGKILL signal to ffmpeg and the
# mp4 file couldn't be played, but if we ask ffmpeg to quit it
# produces a file that is playable (this is mostly useful for live
# streams). Note that Windows is not affected and produces playable
# files (see https://github.com/ytdl-org/youtube-dl/issues/8300).
if isinstance(e, KeyboardInterrupt) and sys.platform != 'win32':
process_communicate_or_kill(proc, b'q')
else:
proc.kill()
proc.wait()
raise
# From [1], a PIPE opened in Popen() should be closed, unless
# .communicate() is called. Avoid leaking any PIPEs by using Popen
# as a context manager (newer Python 3.x and compat)
# Fixes "Resource Warning" in test/test_downloader_external.py
# [1] https://devpress.csdn.net/python/62fde12d7e66823466192e48.html
with compat_subprocess_Popen(args, stdin=subprocess.PIPE, env=env) as proc:
try:
retval = proc.wait()
except BaseException as e:
# subprocess.run would send the SIGKILL signal to ffmpeg and the
# mp4 file couldn't be played, but if we ask ffmpeg to quit it
# produces a file that is playable (this is mostly useful for live
# streams). Note that Windows is not affected and produces playable
# files (see https://github.com/ytdl-org/youtube-dl/issues/8300).
if isinstance(e, KeyboardInterrupt) and sys.platform != 'win32':
process_communicate_or_kill(proc, b'q')
else:
proc.kill()
raise
return retval
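
A Python 3 only sketch of the pattern the last hunk adopts: using `Popen` as a context manager so that the stdin pipe is closed and the child is reaped even if `.communicate()` is never called. `run_and_wait` is a hypothetical name, and the graceful `q` shutdown for ffmpeg is left out to keep the example short.

```python
import subprocess
import sys


def run_and_wait(args):
    """Run args with a piped stdin and return the exit status."""
    # Leaving the with-block closes proc.stdin and waits for the child
    with subprocess.Popen(args, stdin=subprocess.PIPE) as proc:
        try:
            return proc.wait()
        except BaseException:
            # On interruption, stop the child before re-raising
            proc.kill()
            raise


if __name__ == '__main__':
    status = run_and_wait([sys.executable, '-c', 'import sys; sys.exit(3)'])
    print(status)
```

On interpreters where `Popen` lacks `__enter__`/`__exit__`, the diff's `compat_subprocess_Popen` wrapper provides the equivalent cleanup.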


@@ -0,0 +1,79 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
merge_dicts,
parse_iso8601,
T,
traverse_obj,
txt_or_none,
urljoin,
)
class CaffeineTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?caffeine\.tv/[^/]+/video/(?P<id>[0-9a-f-]+)'
_TESTS = [{
'url': 'https://www.caffeine.tv/TsuSurf/video/cffc0a00-e73f-11ec-8080-80017d29f26e',
'info_dict': {
'id': 'cffc0a00-e73f-11ec-8080-80017d29f26e',
'ext': 'mp4',
'title': 'GOOOOD MORNINNNNN #highlights',
'timestamp': 1654702180,
'upload_date': '20220608',
'uploader': 'TsuSurf',
'duration': 3145,
'age_limit': 17,
},
'params': {
'format': 'bestvideo',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
json_data = self._download_json(
'https://api.caffeine.tv/social/public/activity/' + video_id,
video_id)
broadcast_info = traverse_obj(json_data, ('broadcast_info', T(dict))) or {}
title = broadcast_info['broadcast_title']
video_url = broadcast_info['video_url']
ext = determine_ext(video_url)
if ext == 'm3u8':
formats = self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8',
fatal=False)
else:
formats = [{'url': video_url}]
self._sort_formats(formats)
return merge_dicts({
'id': video_id,
'title': title,
'formats': formats,
}, traverse_obj(json_data, {
'uploader': ((None, 'user'), 'username'),
}, get_all=False), traverse_obj(json_data, {
'like_count': ('like_count', T(int_or_none)),
'view_count': ('view_count', T(int_or_none)),
'comment_count': ('comment_count', T(int_or_none)),
'tags': ('tags', Ellipsis, T(txt_or_none)),
'is_live': 'is_live',
'uploader': ('user', 'name'),
}), traverse_obj(broadcast_info, {
'duration': ('content_duration', T(int_or_none)),
'timestamp': ('broadcast_start_time', T(parse_iso8601)),
'thumbnail': ('preview_image_path', T(lambda u: urljoin(url, u))),
'age_limit': ('content_rating', T(lambda r: r and {
# assume Apple Store ratings [1]
# 1. https://en.wikipedia.org/wiki/Mobile_software_content_rating_system
'FOUR_PLUS': 0,
'NINE_PLUS': 9,
'TWELVE_PLUS': 12,
'SEVENTEEN_PLUS': 17,
}.get(r, 17))),
}))
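
The `age_limit` lambda in the new extractor packs a lookup and a fallback into one expression. Spelled out as a sketch (the names `RATING_TO_AGE` and `age_limit_from_rating` are illustrative): a missing rating yields no age limit, a known rating maps to its age, and any other non-empty rating is treated conservatively as 17.

```python
# Mapping copied from the extractor; it assumes Apple Store style ratings
RATING_TO_AGE = {
    'FOUR_PLUS': 0,
    'NINE_PLUS': 9,
    'TWELVE_PLUS': 12,
    'SEVENTEEN_PLUS': 17,
}


def age_limit_from_rating(rating):
    """Return an age limit for a content rating, or None if absent."""
    if not rating:
        # The original lambda returns the falsy value itself here;
        # this sketch normalises that to None.
        return None
    # Unknown but present ratings fall back to the strictest value
    return RATING_TO_AGE.get(rating, 17)
```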


@@ -2,6 +2,7 @@
from __future__ import unicode_literals
import base64
import collections
import datetime
import functools
import hashlib
@@ -24,6 +25,7 @@ from ..compat import (
compat_getpass,
compat_integer_types,
compat_http_client,
compat_kwargs,
compat_map as map,
compat_open as open,
compat_os_name,
@@ -58,6 +60,7 @@ from ..utils import (
GeoRestrictedError,
GeoUtils,
int_or_none,
join_nonempty,
js_to_json,
JSON_LD_RE,
mimetype2ext,
@@ -74,6 +77,7 @@ from ..utils import (
str_or_none,
str_to_int,
strip_or_none,
T,
traverse_obj,
try_get,
unescapeHTML,
@@ -180,6 +184,8 @@ class InfoExtractor(object):
fragment_base_url
* "duration" (optional, int or float)
* "filesize" (optional, int)
* "range" (optional, str of the form "start-end"
to use in HTTP Range header)
* preference Order number of this format. If this field is
present and not None, the formats get sorted
by this field, regardless of all other values.
@@ -596,6 +602,14 @@ class InfoExtractor(object):
"""Sets the downloader for this IE."""
self._downloader = downloader
@property
def cache(self):
return self._downloader.cache
@property
def cookiejar(self):
return self._downloader.cookiejar
def _real_initialize(self):
"""Real initialization process. Redefine in subclasses."""
pass
@@ -942,14 +956,47 @@ class InfoExtractor(object):
else:
self.report_warning(errmsg + str(ve))
def report_warning(self, msg, video_id=None):
def __ie_msg(self, *msg):
return '[{0}] {1}'.format(self.IE_NAME, ''.join(msg))
# msg, video_id=None, *args, only_once=False, **kwargs
def report_warning(self, msg, *args, **kwargs):
if len(args) > 0:
video_id = args[0]
args = args[1:]
else:
video_id = kwargs.pop('video_id', None)
idstr = '' if video_id is None else '%s: ' % video_id
self._downloader.report_warning(
'[%s] %s%s' % (self.IE_NAME, idstr, msg))
self.__ie_msg(idstr, msg), *args, **kwargs)
def to_screen(self, msg):
"""Print msg to screen, prefixing it with '[ie_name]'"""
self._downloader.to_screen('[%s] %s' % (self.IE_NAME, msg))
self._downloader.to_screen(self.__ie_msg(msg))
def write_debug(self, msg, only_once=False, _cache=[]):
'''Log debug message or Print message to stderr'''
if not self.get_param('verbose', False):
return
message = '[debug] ' + self.__ie_msg(msg)
logger = self.get_param('logger')
if logger:
logger.debug(message)
else:
if only_once and hash(message) in _cache:
return
self._downloader.to_stderr(message)
_cache.append(hash(message))
# name, default=None, *args, **kwargs
def get_param(self, name, *args, **kwargs):
default, args = (args[0], args[1:]) if len(args) > 0 else (kwargs.pop('default', None), args)
if self._downloader:
return self._downloader.params.get(name, default, *args, **kwargs)
return default
def report_drm(self, video_id):
self.raise_no_formats('This video is DRM protected', expected=True, video_id=video_id)
def report_extraction(self, id_or_name):
"""Report information extraction."""
@@ -977,6 +1024,15 @@ class InfoExtractor(object):
def raise_geo_restricted(msg='This video is not available from your location due to geo restriction', countries=None):
raise GeoRestrictedError(msg, countries=countries)
def raise_no_formats(self, msg, expected=False, video_id=None):
if expected and (
self.get_param('ignore_no_formats_error') or self.get_param('wait_for_video')):
self.report_warning(msg, video_id)
elif isinstance(msg, ExtractorError):
raise msg
else:
raise ExtractorError(msg, expected=expected, video_id=video_id)
# Methods for following #608
@staticmethod
def url_result(url, ie=None, video_id=None, video_title=None):
@@ -1047,6 +1103,60 @@ class InfoExtractor(object):
self._downloader.report_warning('unable to extract %s' % _name + bug_reports_message())
return None
def _search_json(self, start_pattern, string, name, video_id, **kwargs):
"""Searches string for the JSON object specified by start_pattern"""
# self, start_pattern, string, name, video_id, *, end_pattern='',
# contains_pattern=r'{(?s:.+)}', fatal=True, default=NO_DEFAULT
# NB: end_pattern is only used to reduce the size of the initial match
end_pattern = kwargs.pop('end_pattern', '')
# (?:[\s\S]) simulates (?(s):.) (eg)
contains_pattern = kwargs.pop('contains_pattern', r'{[\s\S]+}')
fatal = kwargs.pop('fatal', True)
default = kwargs.pop('default', NO_DEFAULT)
if default is NO_DEFAULT:
default, has_default = {}, False
else:
fatal, has_default = False, True
json_string = self._search_regex(
r'(?:{0})\s*(?P<json>{1})\s*(?:{2})'.format(
start_pattern, contains_pattern, end_pattern),
string, name, group='json', fatal=fatal, default=None if has_default else NO_DEFAULT)
if not json_string:
return default
# yt-dlp has a special JSON parser that allows trailing text.
# Until that arrives here, the diagnostic from the exception
# raised by json.loads() is used to extract the wanted text.
# Either way, it's a problem if a transform_source() can't
# handle the trailing text.
# force an exception
kwargs['fatal'] = True
# self._downloader._format_err(name, self._downloader.Styles.EMPHASIS)
for _ in range(2):
try:
# return self._parse_json(json_string, video_id, ignore_extra=True, **kwargs)
transform_source = kwargs.pop('transform_source', None)
if transform_source:
json_string = transform_source(json_string)
return self._parse_json(json_string, video_id, **compat_kwargs(kwargs))
except ExtractorError as e:
end = int_or_none(self._search_regex(r'\(char\s+(\d+)', error_to_compat_str(e), 'end', default=None))
if end is not None:
json_string = json_string[:end]
continue
msg = 'Unable to extract {0} - Failed to parse JSON'.format(name)
if fatal:
raise ExtractorError(msg, cause=e.cause, video_id=video_id)
elif not has_default:
self.report_warning(
'{0}: {1}'.format(msg, error_to_compat_str(e)), video_id=video_id)
return default
def _html_search_regex(self, pattern, string, name, default=NO_DEFAULT, fatal=True, flags=0, group=None):
"""
Like _search_regex, but strips HTML tags and unescapes entities.
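
The retry loop in `_search_json` relies on the position reported in the JSON parser's error message to cut off trailing text. A standalone sketch of that idea using only the standard library; `loads_ignoring_trailing` is a hypothetical name, and the reliance on the `(char N)` part of the message follows the diff rather than any documented contract.

```python
import json
import re


def loads_ignoring_trailing(text):
    """Parse JSON that may be followed by unrelated trailing text."""
    for _ in range(2):
        try:
            return json.loads(text)
        except ValueError as e:
            # json reports e.g. "Extra data: line 1 column 9 (char 8)"
            m = re.search(r'\(char\s+(\d+)', str(e))
            if m is None:
                raise
            # Keep only the part before the reported offset, then retry once
            text = text[:int(m.group(1))]
    raise ValueError('Failed to parse JSON')
```

The greedy `contains_pattern` can capture more than the object itself, so one truncate-and-retry pass is what lets a match like `{"a": 1}; var x = 2` still parse.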
@@ -1701,6 +1811,12 @@ class InfoExtractor(object):
'format_note': 'Quality selection URL',
}
def _report_ignoring_subs(self, name):
self.report_warning(bug_reports_message(
'Ignoring subtitle tracks found in the {0} manifest; '
'if any subtitle tracks are missing,'.format(name)
), only_once=True)
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
entry_protocol='m3u8', preference=None,
m3u8_id=None, note=None, errnote=None,
@@ -2141,23 +2257,46 @@ class InfoExtractor(object):
})
return entries
def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, data=None, headers={}, query={}):
def _extract_mpd_formats(self, *args, **kwargs):
fmts, subs = self._extract_mpd_formats_and_subtitles(*args, **kwargs)
if subs:
self._report_ignoring_subs('DASH')
return fmts
def _extract_mpd_formats_and_subtitles(
self, mpd_url, video_id, mpd_id=None, note=None, errnote=None,
fatal=True, data=None, headers=None, query=None):
# TODO: or not? param not yet implemented
if self.get_param('ignore_no_formats_error'):
fatal = False
res = self._download_xml_handle(
mpd_url, video_id,
note=note or 'Downloading MPD manifest',
errnote=errnote or 'Failed to download MPD manifest',
fatal=fatal, data=data, headers=headers, query=query)
note='Downloading MPD manifest' if note is None else note,
errnote='Failed to download MPD manifest' if errnote is None else errnote,
fatal=fatal, data=data, headers=headers or {}, query=query or {})
if res is False:
return []
return [], {}
mpd_doc, urlh = res
if mpd_doc is None:
return []
mpd_base_url = base_url(urlh.geturl())
return [], {}
return self._parse_mpd_formats(
# We could have been redirected to a new url when we retrieved our mpd file.
mpd_url = urlh.geturl()
mpd_base_url = base_url(mpd_url)
return self._parse_mpd_formats_and_subtitles(
mpd_doc, mpd_id, mpd_base_url, mpd_url)
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', mpd_url=None):
def _parse_mpd_formats(self, *args, **kwargs):
fmts, subs = self._parse_mpd_formats_and_subtitles(*args, **kwargs)
if subs:
self._report_ignoring_subs('DASH')
return fmts
def _parse_mpd_formats_and_subtitles(
self, mpd_doc, mpd_id=None, mpd_base_url='', mpd_url=None):
"""
Parse formats from MPD manifest.
References:
@@ -2165,8 +2304,10 @@ class InfoExtractor(object):
http://standards.iso.org/ittf/PubliclyAvailableStandards/c065274_ISO_IEC_23009-1_2014.zip
2. https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP
"""
if mpd_doc.get('type') == 'dynamic':
return []
# TODO: param not yet implemented: default like previous yt-dl logic
if not self.get_param('dynamic_mpd', False):
if mpd_doc.get('type') == 'dynamic':
return [], {}
namespace = self._search_regex(r'(?i)^{([^}]+)?}MPD$', mpd_doc.tag, 'namespace', default=None)
@@ -2176,8 +2317,24 @@ class InfoExtractor(object):
def is_drm_protected(element):
return element.find(_add_ns('ContentProtection')) is not None
from ..utils import YoutubeDLHandler
fix_path = YoutubeDLHandler._fix_path
def resolve_base_url(element, parent_base_url=None):
# TODO: use native XML traversal when ready
b_url = traverse_obj(element, (
T(lambda e: e.find(_add_ns('BaseURL')).text)))
if parent_base_url and b_url:
if not parent_base_url[-1] in ('/', ':'):
parent_base_url += '/'
b_url = compat_urlparse.urljoin(parent_base_url, b_url)
if b_url:
b_url = fix_path(b_url)
return b_url or parent_base_url
def extract_multisegment_info(element, ms_parent_info):
ms_info = ms_parent_info.copy()
base_url = ms_info['base_url'] = resolve_base_url(element, ms_info.get('base_url'))
# As per [1, 5.3.9.2.2] SegmentList and SegmentTemplate share some
# common attributes and elements. We will only extract relevant
@@ -2211,15 +2368,27 @@ class InfoExtractor(object):
def extract_Initialization(source):
initialization = source.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
ms_info['initialization_url'] = initialization.get('sourceURL') or base_url
initialization_url_range = initialization.get('range')
if initialization_url_range:
ms_info['initialization_url_range'] = initialization_url_range
segment_list = element.find(_add_ns('SegmentList'))
if segment_list is not None:
extract_common(segment_list)
extract_Initialization(segment_list)
segment_urls_e = segment_list.findall(_add_ns('SegmentURL'))
if segment_urls_e:
ms_info['segment_urls'] = [segment.attrib['media'] for segment in segment_urls_e]
segment_urls = traverse_obj(segment_urls_e, (
Ellipsis, T(lambda e: e.attrib), 'media'))
if segment_urls:
ms_info['segment_urls'] = segment_urls
segment_urls_range = traverse_obj(segment_urls_e, (
Ellipsis, T(lambda e: e.attrib), 'mediaRange',
T(lambda r: re.findall(r'^\d+-\d+$', r)), 0))
if segment_urls_range:
ms_info['segment_urls_range'] = segment_urls_range
if not segment_urls:
ms_info['segment_urls'] = [base_url for _ in segment_urls_range]
else:
segment_template = element.find(_add_ns('SegmentTemplate'))
if segment_template is not None:
@@ -2235,17 +2404,20 @@ class InfoExtractor(object):
return ms_info
mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
formats = []
formats, subtitles = [], {}
stream_numbers = collections.defaultdict(int)
mpd_base_url = resolve_base_url(mpd_doc, mpd_base_url or mpd_url)
for period in mpd_doc.findall(_add_ns('Period')):
period_duration = parse_duration(period.get('duration')) or mpd_duration
period_ms_info = extract_multisegment_info(period, {
'start_number': 1,
'timescale': 1,
'base_url': mpd_base_url,
})
for adaptation_set in period.findall(_add_ns('AdaptationSet')):
if is_drm_protected(adaptation_set):
continue
adaption_set_ms_info = extract_multisegment_info(adaptation_set, period_ms_info)
adaptation_set_ms_info = extract_multisegment_info(adaptation_set, period_ms_info)
for representation in adaptation_set.findall(_add_ns('Representation')):
if is_drm_protected(representation):
continue
@@ -2253,27 +2425,35 @@ class InfoExtractor(object):
representation_attrib.update(representation.attrib)
# According to [1, 5.3.7.2, Table 9, page 41], @mimeType is mandatory
mime_type = representation_attrib['mimeType']
content_type = mime_type.split('/')[0]
if content_type == 'text':
# TODO implement WebVTT downloading
pass
elif content_type in ('video', 'audio'):
base_url = ''
for element in (representation, adaptation_set, period, mpd_doc):
base_url_e = element.find(_add_ns('BaseURL'))
if base_url_e is not None:
base_url = base_url_e.text + base_url
if re.match(r'^https?://', base_url):
break
if mpd_base_url and not re.match(r'^https?://', base_url):
if not mpd_base_url.endswith('/') and not base_url.startswith('/'):
mpd_base_url += '/'
base_url = mpd_base_url + base_url
representation_id = representation_attrib.get('id')
lang = representation_attrib.get('lang')
url_el = representation.find(_add_ns('BaseURL'))
filesize = int_or_none(url_el.attrib.get('{http://youtube.com/yt/2012/10/10}contentLength') if url_el is not None else None)
bandwidth = int_or_none(representation_attrib.get('bandwidth'))
content_type = representation_attrib.get('contentType') or mime_type.split('/')[0]
codec_str = representation_attrib.get('codecs', '')
# Some kind of binary subtitle found in some youtube livestreams
if mime_type == 'application/x-rawcc':
codecs = {'scodec': codec_str}
else:
codecs = parse_codecs(codec_str)
if content_type not in ('video', 'audio', 'text'):
if mime_type == 'image/jpeg':
content_type = mime_type
elif codecs.get('vcodec', 'none') != 'none':
content_type = 'video'
elif codecs.get('acodec', 'none') != 'none':
content_type = 'audio'
elif codecs.get('scodec', 'none') != 'none':
content_type = 'text'
elif mimetype2ext(mime_type) in ('tt', 'dfxp', 'ttml', 'xml', 'json'):
content_type = 'text'
else:
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
continue
representation_id = representation_attrib.get('id')
lang = representation_attrib.get('lang')
url_el = representation.find(_add_ns('BaseURL'))
filesize = int_or_none(url_el.get('{http://youtube.com/yt/2012/10/10}contentLength') if url_el is not None else None)
bandwidth = int_or_none(representation_attrib.get('bandwidth'))
format_id = join_nonempty(representation_id or content_type, mpd_id)
if content_type in ('video', 'audio'):
f = {
'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
'manifest_url': mpd_url,
@@ -2288,104 +2468,130 @@ class InfoExtractor(object):
'filesize': filesize,
'container': mimetype2ext(mime_type) + '_dash',
}
f.update(parse_codecs(representation_attrib.get('codecs')))
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
f.update(codecs)
elif content_type == 'text':
f = {
'ext': mimetype2ext(mime_type),
'manifest_url': mpd_url,
'filesize': filesize,
}
elif content_type == 'image/jpeg':
# See test case in VikiIE
# https://www.viki.com/videos/1175236v-choosing-spouse-by-lottery-episode-1
f = {
'format_id': format_id,
'ext': 'mhtml',
'manifest_url': mpd_url,
'format_note': 'DASH storyboards (jpeg)',
'acodec': 'none',
'vcodec': 'none',
}
if is_drm_protected(adaptation_set) or is_drm_protected(representation):
f['has_drm'] = True
representation_ms_info = extract_multisegment_info(representation, adaptation_set_ms_info)
def prepare_template(template_name, identifiers):
tmpl = representation_ms_info[template_name]
# First of, % characters outside $...$ templates
# must be escaped by doubling for proper processing
# by % operator string formatting used further (see
# https://github.com/ytdl-org/youtube-dl/issues/16867).
t = ''
in_template = False
for c in tmpl:
def prepare_template(template_name, identifiers):
tmpl = representation_ms_info[template_name]
# First of, % characters outside $...$ templates
# must be escaped by doubling for proper processing
# by % operator string formatting used further (see
# https://github.com/ytdl-org/youtube-dl/issues/16867).
t = ''
in_template = False
for c in tmpl:
t += c
if c == '$':
in_template = not in_template
elif c == '%' and not in_template:
t += c
if c == '$':
in_template = not in_template
elif c == '%' and not in_template:
t += c
# Next, $...$ templates are translated to their
# %(...) counterparts to be used with % operator
t = t.replace('$RepresentationID$', representation_id)
t = re.sub(r'\$(%s)\$' % '|'.join(identifiers), r'%(\1)d', t)
t = re.sub(r'\$(%s)%%([^$]+)\$' % '|'.join(identifiers), r'%(\1)\2', t)
t.replace('$$', '$')
return t
# Next, $...$ templates are translated to their
# %(...) counterparts to be used with % operator
t = t.replace('$RepresentationID$', representation_id)
t = re.sub(r'\$(%s)\$' % '|'.join(identifiers), r'%(\1)d', t)
t = re.sub(r'\$(%s)%%([^$]+)\$' % '|'.join(identifiers), r'%(\1)\2', t)
t.replace('$$', '$')
return t
# @initialization is a regular template like @media one
# so it should be handled just the same way (see
# https://github.com/ytdl-org/youtube-dl/issues/11605)
if 'initialization' in representation_ms_info:
initialization_template = prepare_template(
'initialization',
# As per [1, 5.3.9.4.2, Table 15, page 54] $Number$ and
# $Time$ shall not be included for @initialization thus
# only $Bandwidth$ remains
('Bandwidth', ))
representation_ms_info['initialization_url'] = initialization_template % {
'Bandwidth': bandwidth,
}
# @initialization is a regular template like @media one
# so it should be handled just the same way (see
# https://github.com/ytdl-org/youtube-dl/issues/11605)
if 'initialization' in representation_ms_info:
initialization_template = prepare_template(
'initialization',
# As per [1, 5.3.9.4.2, Table 15, page 54] $Number$ and
# $Time$ shall not be included for @initialization thus
# only $Bandwidth$ remains
('Bandwidth', ))
representation_ms_info['initialization_url'] = initialization_template % {
'Bandwidth': bandwidth,
}
def location_key(location):
return 'url' if re.match(r'^https?://', location) else 'path'
def location_key(location):
return 'url' if re.match(r'^https?://', location) else 'path'
if 'segment_urls' not in representation_ms_info and 'media' in representation_ms_info:
def calc_segment_duration():
return float_or_none(
representation_ms_info['segment_duration'],
representation_ms_info['timescale']) if 'segment_duration' in representation_ms_info else None
media_template = prepare_template('media', ('Number', 'Bandwidth', 'Time'))
media_location_key = location_key(media_template)
if 'segment_urls' not in representation_ms_info and 'media' in representation_ms_info:
# As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
# can't be used at the same time
if '%(Number' in media_template and 's' not in representation_ms_info:
segment_duration = None
if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info:
segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
representation_ms_info['fragments'] = [{
media_location_key: media_template % {
'Number': segment_number,
'Bandwidth': bandwidth,
},
'duration': segment_duration,
} for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
else:
# $Number*$ or $Time$ in media template with S list available
# Example $Number*$: http://www.svtplay.se/klipp/9023742/stopptid-om-bjorn-borg
# Example $Time$: https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411
representation_ms_info['fragments'] = []
segment_time = 0
segment_d = None
segment_number = representation_ms_info['start_number']
media_template = prepare_template('media', ('Number', 'Bandwidth', 'Time'))
media_location_key = location_key(media_template)
def add_segment_url():
segment_url = media_template % {
'Time': segment_time,
'Bandwidth': bandwidth,
'Number': segment_number,
}
representation_ms_info['fragments'].append({
media_location_key: segment_url,
'duration': float_or_none(segment_d, representation_ms_info['timescale']),
})
# As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
# can't be used at the same time
if '%(Number' in media_template and 's' not in representation_ms_info:
segment_duration = None
if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info:
segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(
float_or_none(period_duration, segment_duration, default=0)))
representation_ms_info['fragments'] = [{
media_location_key: media_template % {
'Number': segment_number,
'Bandwidth': bandwidth,
},
'duration': segment_duration,
} for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
else:
# $Number*$ or $Time$ in media template with S list available
# Example $Number*$: http://www.svtplay.se/klipp/9023742/stopptid-om-bjorn-borg
# Example $Time$: https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411
representation_ms_info['fragments'] = []
segment_time = 0
segment_d = None
segment_number = representation_ms_info['start_number']
for num, s in enumerate(representation_ms_info['s']):
segment_time = s.get('t') or segment_time
segment_d = s['d']
def add_segment_url():
segment_url = media_template % {
'Time': segment_time,
'Bandwidth': bandwidth,
'Number': segment_number,
}
representation_ms_info['fragments'].append({
media_location_key: segment_url,
'duration': float_or_none(segment_d, representation_ms_info['timescale']),
})
for num, s in enumerate(representation_ms_info['s']):
segment_time = s.get('t') or segment_time
segment_d = s['d']
add_segment_url()
segment_number += 1
for r in range(s.get('r', 0)):
segment_time += segment_d
add_segment_url()
segment_number += 1
for r in range(s.get('r', 0)):
segment_time += segment_d
add_segment_url()
segment_number += 1
segment_time += segment_d
elif 'segment_urls' in representation_ms_info and 's' in representation_ms_info:
segment_time += segment_d
elif 'segment_urls' in representation_ms_info:
fragments = []
if 's' in representation_ms_info:
# No media template
# Example: https://www.youtube.com/watch?v=iXZV5uAYMJI
# or any YouTube dashsegments video
fragments = []
segment_index = 0
timescale = representation_ms_info['timescale']
for s in representation_ms_info['s']:
@@ -2397,48 +2603,78 @@ class InfoExtractor(object):
'duration': duration,
})
segment_index += 1
representation_ms_info['fragments'] = fragments
elif 'segment_urls' in representation_ms_info:
elif 'segment_urls_range' in representation_ms_info:
# Segment URLs with mediaRange
# Example: https://kinescope.io/200615537/master.mpd
# https://github.com/ytdl-org/youtube-dl/issues/30235
# or any mpd generated with Bento4 `mp4dash --no-split --use-segment-list`
segment_duration = calc_segment_duration()
for segment_url, segment_url_range in zip(
representation_ms_info['segment_urls'], representation_ms_info['segment_urls_range']):
fragments.append({
location_key(segment_url): segment_url,
'range': segment_url_range,
'duration': segment_duration,
})
else:
# Segment URLs with no SegmentTimeline
# Example: https://www.seznam.cz/zpravy/clanek/cesko-zasahne-vitr-o-sile-vichrice-muze-byt-i-zivotu-nebezpecny-39091
# https://github.com/ytdl-org/youtube-dl/pull/14844
fragments = []
segment_duration = float_or_none(
representation_ms_info['segment_duration'],
representation_ms_info['timescale']) if 'segment_duration' in representation_ms_info else None
segment_duration = calc_segment_duration()
for segment_url in representation_ms_info['segment_urls']:
fragment = {
fragments.append({
location_key(segment_url): segment_url,
}
if segment_duration:
fragment['duration'] = segment_duration
fragments.append(fragment)
representation_ms_info['fragments'] = fragments
# If there is a fragments key available then we correctly recognized fragmented media.
# Otherwise we will assume unfragmented media with direct access. Technically, such
# assumption is not necessarily correct since we may simply have no support for
# some forms of fragmented media renditions yet, but for now we'll use this fallback.
if 'fragments' in representation_ms_info:
f.update({
# NB: mpd_url may be empty when MPD manifest is parsed from a string
'url': mpd_url or base_url,
'fragment_base_url': base_url,
'fragments': [],
'protocol': 'http_dash_segments',
'duration': segment_duration,
})
representation_ms_info['fragments'] = fragments
# If there is a fragments key available then we correctly recognized fragmented media.
# Otherwise we will assume unfragmented media with direct access. Technically, such
# assumption is not necessarily correct since we may simply have no support for
# some forms of fragmented media renditions yet, but for now we'll use this fallback.
if 'fragments' in representation_ms_info:
base_url = representation_ms_info['base_url']
f.update({
# NB: mpd_url may be empty when MPD manifest is parsed from a string
'url': mpd_url or base_url,
'fragment_base_url': base_url,
'fragments': [],
'protocol': 'http_dash_segments',
})
if 'initialization_url' in representation_ms_info and 'initialization_url_range' in representation_ms_info:
# Initialization URL with range (accompanied by Segment URLs with mediaRange above)
# https://github.com/ytdl-org/youtube-dl/issues/30235
initialization_url = representation_ms_info['initialization_url']
f['fragments'].append({
location_key(initialization_url): initialization_url,
'range': representation_ms_info['initialization_url_range'],
})
if 'initialization_url' in representation_ms_info:
initialization_url = representation_ms_info['initialization_url']
if not f.get('url'):
f['url'] = initialization_url
f['fragments'].append({location_key(initialization_url): initialization_url})
f['fragments'].extend(representation_ms_info['fragments'])
else:
# Assuming direct URL to unfragmented media.
f['url'] = base_url
formats.append(f)
elif 'initialization_url' in representation_ms_info:
initialization_url = representation_ms_info['initialization_url']
if not f.get('url'):
f['url'] = initialization_url
f['fragments'].append({location_key(initialization_url): initialization_url})
elif 'initialization_url_range' in representation_ms_info:
# no Initialization URL but range (accompanied by no Segment URLs but mediaRange above)
# https://github.com/ytdl-org/youtube-dl/issues/27575
f['fragments'].append({
location_key(base_url): base_url,
'range': representation_ms_info['initialization_url_range'],
})
f['fragments'].extend(representation_ms_info['fragments'])
if not period_duration:
period_duration = sum(traverse_obj(representation_ms_info, (
'fragments', Ellipsis, 'duration', T(float_or_none))))
else:
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats
# Assuming direct URL to unfragmented media.
f['url'] = representation_ms_info['base_url']
if content_type in ('video', 'audio', 'image/jpeg'):
f['manifest_stream_number'] = stream_numbers[f['url']]
stream_numbers[f['url']] += 1
formats.append(f)
elif content_type == 'text':
subtitles.setdefault(lang or 'und', []).append(f)
return formats, subtitles
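The `S`-list branch in the hunk above expands each SegmentTimeline entry into one fragment per repeat. A minimal standalone sketch of that loop follows, assuming each entry is a dict with `d` (duration) and optional `t` (start time) and `r` (repeat count) keys as in the diff; the function name `expand_segment_timeline` and the returned dict layout are illustrative only and are not part of youtube-dl.

```python
def expand_segment_timeline(s_list, timescale, start_number=1):
    """Expand SegmentTimeline entries into one record per media segment.

    Hypothetical helper: mirrors the Number/Time bookkeeping of the
    diff's loop, but returns plain dicts instead of formatted URLs.
    """
    fragments = []
    segment_time = 0
    segment_number = start_number
    for s in s_list:
        # An explicit 't' resets the running time, otherwise keep counting
        segment_time = s.get('t') or segment_time
        segment_d = s['d']
        # 'r' is the number of *additional* repeats, hence r + 1 segments
        for _ in range(s.get('r', 0) + 1):
            fragments.append({
                'Number': segment_number,
                'Time': segment_time,
                'duration': segment_d / float(timescale),
            })
            segment_number += 1
            segment_time += segment_d
    return fragments


frags = expand_segment_timeline([{'t': 0, 'd': 10, 'r': 2}, {'d': 5}], timescale=10)
```

Each record carries the values that would be substituted for `$Number$` and `$Time$` in a media template.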
def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnote=None, fatal=True, data=None, headers={}, query={}):
res = self._download_xml_handle(
@@ -2785,25 +3021,22 @@ class InfoExtractor(object):
return formats
def _find_jwplayer_data(self, webpage, video_id=None, transform_source=js_to_json):
mobj = re.search(
r'''(?s)jwplayer\s*\(\s*(?P<q>'|")(?!(?P=q)).+(?P=q)\s*\)(?!</script>).*?\.\s*setup\s*\(\s*(?P<options>(?:\([^)]*\)|[^)])+)\s*\)''',
webpage)
if mobj:
try:
jwplayer_data = self._parse_json(mobj.group('options'),
video_id=video_id,
transform_source=transform_source)
except ExtractorError:
pass
else:
if isinstance(jwplayer_data, dict):
return jwplayer_data
return self._search_json(
r'''(?<!-)\bjwplayer\s*\(\s*(?P<q>'|")(?!(?P=q)).+(?P=q)\s*\)(?:(?!</script>).)*?\.\s*(?:setup\s*\(|(?P<load>load)\s*\(\s*\[)''',
webpage, 'JWPlayer data', video_id,
# must be a {...} or sequence, ending
contains_pattern=r'\{[\s\S]*}(?(load)(?:\s*,\s*\{[\s\S]*})*)', end_pattern=r'(?(load)\]|\))',
transform_source=transform_source, default=None)
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._find_jwplayer_data(
webpage, video_id, transform_source=js_to_json)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
# allow passing `transform_source` through to _find_jwplayer_data()
transform_source = kwargs.pop('transform_source', None)
kwfind = compat_kwargs({'transform_source': transform_source}) if transform_source else {}
jwplayer_data = self._find_jwplayer_data(webpage, video_id, **kwfind)
return self._parse_jwplayer_data(jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
@@ -2837,22 +3070,14 @@ class InfoExtractor(object):
mpd_id=mpd_id, rtmp_params=rtmp_params, base_url=base_url)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if not isinstance(track, dict):
continue
track_kind = track.get('kind')
if not track_kind or not isinstance(track_kind, compat_str):
continue
if track_kind.lower() not in ('captions', 'subtitles'):
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:
continue
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track_url)
})
for track in traverse_obj(video_data, (
'tracks', lambda _, t: t.get('kind').lower() in ('captions', 'subtitles'))):
track_url = urljoin(base_url, track.get('file'))
if not track_url:
continue
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track_url)
})
entry = {
'id': this_video_id,
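The subtitle hunk above keeps only tracks whose `kind` is `captions` or `subtitles` and groups them by label. A rough standalone sketch of that filtering, without `traverse_obj`, might look like the following; `collect_subtitles` is a hypothetical name, and URL joining against `base_url` is left out for brevity.

```python
def collect_subtitles(tracks, default_lang='en'):
    """Group caption/subtitle tracks by label (illustrative helper only)."""
    subtitles = {}
    for track in tracks or []:
        if not isinstance(track, dict):
            continue
        kind = track.get('kind')
        # Skip tracks with a missing or non-string kind, e.g. thumbnails
        if not isinstance(kind, str) or kind.lower() not in ('captions', 'subtitles'):
            continue
        track_url = track.get('file')
        if not track_url:
            continue
        subtitles.setdefault(track.get('label') or default_lang, []).append({
            'url': track_url,
        })
    return subtitles


subs = collect_subtitles([
    {'kind': 'Captions', 'file': 'https://example.invalid/en.vtt', 'label': 'English'},
    {'kind': 'thumbnails', 'file': 'https://example.invalid/thumbs.vtt'},
    {'kind': 'subtitles', 'file': 'https://example.invalid/xx.vtt'},
    {'kind': None, 'file': 'https://example.invalid/none.vtt'},
])
```

The sample track list and its `example.invalid` URLs are made up for the illustration.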


@@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
float_or_none,
T,
traverse_obj,
txt_or_none,
unified_timestamp,
url_or_none,
)
class EpidemicSoundIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?epidemicsound\.com/track/(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'https://www.epidemicsound.com/track/yFfQVRpSPz/',
'md5': 'd98ff2ddb49e8acab9716541cbc9dfac',
'info_dict': {
'id': '45014',
'display_id': 'yFfQVRpSPz',
'ext': 'mp3',
'tags': ['foley', 'door', 'knock', 'glass', 'window', 'glass door knock'],
'title': 'Door Knock Door 1',
'duration': 1,
'thumbnail': 'https://cdn.epidemicsound.com/curation-assets/commercial-release-cover-images/default-sfx/3000x3000.jpg',
'timestamp': 1415320353,
'upload_date': '20141107',
'age_limit': None,
# check that the "best" format was found, since test file MD5 doesn't
# distinguish the formats
'format': 'full',
},
}, {
'url': 'https://www.epidemicsound.com/track/mj8GTTwsZd/',
'md5': 'c82b745890f9baf18dc2f8d568ee3830',
'info_dict': {
'id': '148700',
'display_id': 'mj8GTTwsZd',
'ext': 'mp3',
'tags': ['liquid drum n bass', 'energetic'],
'title': 'Noplace',
'duration': 237,
'thumbnail': 'https://cdn.epidemicsound.com/curation-assets/commercial-release-cover-images/11138/3000x3000.jpg',
'timestamp': 1694426482,
'release_timestamp': 1700535606,
'upload_date': '20230911',
'age_limit': None,
'format': 'full',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
json_data = self._download_json('https://www.epidemicsound.com/json/track/' + video_id, video_id)
def fmt_or_none(f):
if not f.get('format'):
f['format'] = f.get('format_id')
elif not f.get('format_id'):
f['format_id'] = f['format']
if not (f['url'] and f['format']):
return
if f.get('format_note'):
f['format_note'] = 'track ID ' + f['format_note']
f['preference'] = -1 if f['format'] == 'full' else -2
return f
formats = traverse_obj(json_data, (
'stems', T(dict.items), Ellipsis, {
'format': (0, T(txt_or_none)),
'format_note': (1, 's3TrackId', T(txt_or_none)),
'format_id': (1, 'stemType', T(txt_or_none)),
'url': (1, 'lqMp3Url', T(url_or_none)),
}, T(fmt_or_none)))
self._sort_formats(formats)
info = traverse_obj(json_data, {
'id': ('id', T(txt_or_none)),
'tags': ('metadataTags', Ellipsis, T(txt_or_none)),
'title': ('title', T(txt_or_none)),
'duration': ('length', T(float_or_none)),
'timestamp': ('added', T(unified_timestamp)),
'thumbnail': (('imageUrl', 'cover'), T(url_or_none)),
'age_limit': ('isExplicit', T(lambda b: 18 if b else None)),
'release_timestamp': ('releaseDate', T(unified_timestamp)),
}, get_all=False)
info.update(traverse_obj(json_data, {
'categories': ('genres', Ellipsis, 'tag', T(txt_or_none)),
'tags': ('metadataTags', Ellipsis, T(txt_or_none)),
}))
info.update({
'display_id': video_id,
'formats': formats,
})
return info
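The `fmt_or_none` helper above fills in whichever of `format` and `format_id` is missing, drops entries without a URL, and ranks the `full` mix above the individual stems. A self-contained sketch of the same idea is shown below. It is an approximation, not the extractor's code: `normalise_format` and `build_formats` are hypothetical names, the sample `stems` data is invented, and `.get()` is used defensively where the diff indexes directly.

```python
def normalise_format(f):
    """Return a cleaned-up format dict, or None if it is unusable."""
    if not f.get('format'):
        f['format'] = f.get('format_id')
    elif not f.get('format_id'):
        f['format_id'] = f['format']
    if not (f.get('url') and f.get('format')):
        return None
    if f.get('format_note'):
        f['format_note'] = 'track ID ' + f['format_note']
    # The complete mix is preferred over the separate stems
    f['preference'] = -1 if f['format'] == 'full' else -2
    return f


def build_formats(stems):
    """Turn a {name: stem_info} mapping into a worst-to-best format list."""
    formats = []
    for name, stem in stems.items():
        track_id = stem.get('s3TrackId')
        f = normalise_format({
            'format': name,
            'format_note': str(track_id) if track_id is not None else None,
            'format_id': stem.get('stemType'),
            'url': stem.get('lqMp3Url'),
        })
        if f:
            formats.append(f)
    return sorted(formats, key=lambda f: f['preference'])


# Invented sample data shaped like the 'stems' object in the diff
SAMPLE_STEMS = {
    'full': {'s3TrackId': 1, 'stemType': 'full', 'lqMp3Url': 'https://example.invalid/full.mp3'},
    'bass': {'s3TrackId': 2, 'stemType': 'bass', 'lqMp3Url': 'https://example.invalid/bass.mp3'},
    'drums': {'s3TrackId': 3, 'stemType': 'drums'},  # no URL, so it is dropped
}
formats = build_formats(SAMPLE_STEMS)
```

Sorting by ascending `preference` puts the preferred entry last, which is the order the test's `'format': 'full'` check relies on.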


@@ -159,6 +159,7 @@ from .businessinsider import BusinessInsiderIE
from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
from .c56 import C56IE
from .caffeine import CaffeineTVIE
from .callin import CallinIE
from .camdemy import (
CamdemyIE,
@@ -357,6 +358,7 @@ from .ellentube import (
from .elpais import ElPaisIE
from .embedly import EmbedlyIE
from .engadget import EngadgetIE
from .epidemicsound import EpidemicSoundIE
from .eporner import EpornerIE
from .eroprofile import EroProfileIE
from .escapist import EscapistIE
@@ -381,7 +383,6 @@ from .fc2 import (
FC2EmbedIE,
)
from .fczenit import FczenitIE
from .filemoon import FileMoonIE
from .fifa import FifaIE
from .filmon import (
FilmOnIE,
@@ -442,6 +443,7 @@ from .gamespot import GameSpotIE
from .gamestar import GameStarIE
from .gaskrank import GaskrankIE
from .gazeta import GazetaIE
from .gbnews import GBNewsIE
from .gdcvault import GDCVaultIE
from .gedidigital import GediDigitalIE
from .generic import GenericIE


@@ -1,43 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
decode_packed_codes,
js_to_json,
)
class FileMoonIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?filemoon\.sx/./(?P<id>\w+)'
_TEST = {
'url': 'https://filemoon.sx/e/dw40rxrzruqz',
'md5': '5a713742f57ac4aef29b74733e8dda01',
'info_dict': {
'id': 'dw40rxrzruqz',
'title': 'dw40rxrzruqz',
'ext': 'mp4'
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
matches = re.findall(r'(?s)(eval.*?)</script>', webpage)
packed = matches[-1]
unpacked = decode_packed_codes(packed)
jwplayer_sources = self._parse_json(
self._search_regex(
r'(?s)player\s*\.\s*setup\s*\(\s*\{\s*sources\s*:\s*(.*?])', unpacked, 'jwplayer sources'),
video_id, transform_source=js_to_json)
formats = self._parse_jwplayer_formats(jwplayer_sources, video_id)
return {
'id': video_id,
'title': self._generic_title(url) or video_id,
'formats': formats
}


@@ -0,0 +1,139 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
extract_attributes,
ExtractorError,
T,
traverse_obj,
txt_or_none,
url_or_none,
)
class GBNewsIE(InfoExtractor):
IE_DESC = 'GB News clips, features and live stream'
# \w+ is normally shows or news, but apparently any word redirects to the correct URL
_VALID_URL = r'https?://(?:www\.)?gbnews\.(?:uk|com)/(?:\w+/)?(?P<id>[^#?]+)'
_PLATFORM = 'safari'
_SSMP_URL = 'https://mm-v2.simplestream.com/ssmp/api.php'
_TESTS = [{
'url': 'https://www.gbnews.uk/shows/andrew-neils-message-to-companies-choosing-to-boycott-gb-news/106889',
'info_dict': {
'id': '106889',
'ext': 'mp4',
'title': "Andrew Neil's message to companies choosing to boycott GB News",
'description': 'md5:b281f5d22fd6d5eda64a4e3ba771b351',
},
'skip': '404 not found',
}, {
'url': 'https://www.gbnews.com/news/bbc-claudine-gay-harvard-university-antisemitism-row',
'info_dict': {
'id': '52264136',
'display_id': 'bbc-claudine-gay-harvard-university-antisemitism-row',
'ext': 'mp4',
'title': 'BBC deletes post after furious backlash over headline downplaying antisemitism',
'description': 'The post was criticised by former employers of the broadcaster',
},
}, {
'url': 'https://www.gbnews.uk/watchlive',
'info_dict': {
'id': '1069',
'display_id': 'watchlive',
'ext': 'mp4',
'title': 'GB News Live',
'is_live': True,
},
'params': {
'skip_download': 'm3u8',
},
}]
def _real_extract(self, url):
display_id = self._match_id(url).split('/')[-1]
webpage = self._download_webpage(url, display_id)
# extraction based on https://github.com/ytdl-org/youtube-dl/issues/29341
'''
<div id="video-106908"
class="simplestream"
data-id="GB001"
data-type="vod"
data-key="3Li3Nt2Qs8Ct3Xq9Fi5Uy0Mb2Bj0Qs"
data-token="f9c317c727dc07f515b20036c8ef14a6"
data-expiry="1624300052"
data-uvid="37900558"
data-poster="https://thumbnails.simplestreamcdn.com/gbnews/ondemand/37900558.jpg?width=700&"
data-npaw="false"
data-env="production">
'''
# exception if no match
video_data = self._search_regex(
r'(<div\s[^>]*\bclass\s*=\s*(\'|")(?!.*sidebar\b)simplestream(?:\s[\s\w$-]*)?\2[^>]*>)',
webpage, 'video data')
video_data = extract_attributes(video_data)
ss_id = video_data.get('data-id')
if not ss_id:
raise ExtractorError('Simplestream ID not found')
json_data = self._download_json(
self._SSMP_URL, display_id,
note='Downloading Simplestream JSON metadata',
errnote='Unable to download Simplestream JSON metadata',
query={
'id': ss_id,
'env': video_data.get('data-env', 'production'),
}, fatal=False)
meta_url = traverse_obj(json_data, ('response', 'api_hostname'))
if not meta_url:
raise ExtractorError('No API host found')
uvid = video_data['data-uvid']
dtype = video_data.get('data-type')
stream_data = self._download_json(
'%s/api/%s/stream/%s' % (meta_url, 'show' if dtype == 'vod' else dtype, uvid),
uvid,
query={
'key': video_data.get('data-key'),
'platform': self._PLATFORM,
},
headers={
'Token': video_data.get('data-token'),
'Token-Expiry': video_data.get('data-expiry'),
'Uvid': uvid,
}, fatal=False)
stream_url = traverse_obj(stream_data, (
'response', 'stream', T(url_or_none)))
if not stream_url:
raise ExtractorError('No stream data/URL')
# now known to be a dict
stream_data = stream_data['response']
drm = stream_data.get('drm')
if drm:
self.report_drm(uvid)
formats = self._extract_m3u8_formats(
stream_url, uvid, ext='mp4', entry_protocol='m3u8_native',
fatal=False)
# exception if no formats
self._sort_formats(formats)
return {
'id': uvid,
'display_id': display_id,
'title': (traverse_obj(stream_data, ('title', T(txt_or_none)))
or self._og_search_title(webpage, default=None)
or display_id.replace('-', ' ').capitalize()),
'description': self._og_search_description(webpage, default=None),
'thumbnail': (traverse_obj(video_data, ('data-poster', T(url_or_none)))
or self._og_search_thumbnail(webpage)),
'formats': formats,
'is_live': (dtype == 'live') or None,
}
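The GB News extractor above locates the `simplestream` `<div>`, reads its `data-*` attributes, and builds the stream API path from `data-type` and `data-uvid`. The sketch below reproduces those two steps with the standard library only. It assumes Python 3, uses `html.parser` as a stand-in for youtube-dl's `extract_attributes`, and the names `simplestream_attrs`, `stream_endpoint` and the `api.example.invalid` host are hypothetical.

```python
import re
from html.parser import HTMLParser

# Same pattern as in the diff: a <div> whose class starts with "simplestream"
SIMPLESTREAM_RE = r'(<div\s[^>]*\bclass\s*=\s*(\'|")(?!.*sidebar\b)simplestream(?:\s[\s\w$-]*)?\2[^>]*>)'


class _FirstTagAttrs(HTMLParser):
    """Collect the attributes of the first start tag fed to the parser."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = dict(attrs)


def simplestream_attrs(webpage):
    mobj = re.search(SIMPLESTREAM_RE, webpage)
    if not mobj:
        return None
    parser = _FirstTagAttrs()
    parser.feed(mobj.group(1))
    return parser.attrs


def stream_endpoint(api_host, attrs):
    # 'vod' items are served from the 'show' endpoint, as in the diff
    dtype = attrs.get('data-type')
    return '%s/api/%s/stream/%s' % (
        api_host, 'show' if dtype == 'vod' else dtype, attrs['data-uvid'])


SAMPLE = ('<html><div id="video-106908" class="simplestream" data-id="GB001" '
          'data-type="vod" data-uvid="37900558" data-env="production"></div></html>')
attrs = simplestream_attrs(SAMPLE)
endpoint = stream_endpoint('https://api.example.invalid', attrs)
```

In the extractor itself the API host comes from the Simplestream metadata response rather than being hard-coded.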


@@ -1,101 +1,267 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
ExtractorError,
float_or_none,
int_or_none,
js_to_json,
merge_dicts,
mimetype2ext,
ExtractorError,
parse_iso8601,
T,
traverse_obj,
txt_or_none,
url_or_none,
)
class ImgurIE(InfoExtractor):
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?!(?:a|gallery|(?:t(?:opic)?|r)/[^/]+)/)(?P<id>[a-zA-Z0-9]+)'
class ImgurBaseIE(InfoExtractor):
# hard-coded value, as also used by ArchiveTeam
_CLIENT_ID = '546c25a59c58ad7'
@classmethod
def _imgur_result(cls, item_id):
return cls.url_result('imgur:%s' % item_id, ImgurIE.ie_key(), item_id)
def _call_api(self, endpoint, video_id, **kwargs):
return self._download_json(
'https://api.imgur.com/post/v1/%s/%s?client_id=%s&include=media,account' % (endpoint, video_id, self._CLIENT_ID),
video_id, **kwargs)
@staticmethod
def get_description(s):
if 'Discover the magic of the internet at Imgur' in s:
return None
return txt_or_none(s)
class ImgurIE(ImgurBaseIE):
_VALID_URL = r'''(?x)
(?:
https?://(?:i\.)?imgur\.com/(?!(?:a|gallery|t|topic|r)/)|
imgur:
)(?P<id>[a-zA-Z0-9]+)
'''
_TESTS = [{
'url': 'https://i.imgur.com/A61SaA1.gifv',
'url': 'https://imgur.com/A61SaA1',
'info_dict': {
'id': 'A61SaA1',
'ext': 'mp4',
'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
'timestamp': 1416446068,
'upload_date': '20141120',
},
}, {
'url': 'https://imgur.com/A61SaA1',
'url': 'https://i.imgur.com/A61SaA1.gifv',
'only_matching': True,
}, {
'url': 'https://i.imgur.com/crGpqCV.mp4',
'only_matching': True,
}, {
# no title
# previously, no title
'url': 'https://i.imgur.com/jxBXAMC.gifv',
'only_matching': True,
'info_dict': {
'id': 'jxBXAMC',
'ext': 'mp4',
'title': 'Fahaka puffer feeding',
'timestamp': 1533835503,
'upload_date': '20180809',
},
}]
def _extract_twitter_formats(self, html, tw_id='twitter', **kwargs):
fatal = kwargs.pop('fatal', False)
tw_stream = self._html_search_meta('twitter:player:stream', html, fatal=fatal, **kwargs)
if not tw_stream:
return []
ext = mimetype2ext(self._html_search_meta(
'twitter:player:stream:content_type', html, default=None))
width, height = (int_or_none(self._html_search_meta('twitter:player:' + v, html, default=None))
for v in ('width', 'height'))
return [{
'format_id': tw_id,
'url': tw_stream,
'ext': ext or determine_ext(tw_stream),
'width': width,
'height': height,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._call_api('media', video_id, fatal=False, expected_status=404)
webpage = self._download_webpage(
'https://i.imgur.com/{id}.gifv'.format(id=video_id), video_id)
'https://i.imgur.com/{id}.gifv'.format(id=video_id), video_id, fatal=not data) or ''
width = int_or_none(self._og_search_property(
'video:width', webpage, default=None))
height = int_or_none(self._og_search_property(
'video:height', webpage, default=None))
if not traverse_obj(data, ('media', 0, (
('type', T(lambda t: t == 'video' or None)),
('metadata', 'is_animated'))), get_all=False):
raise ExtractorError(
'%s is not a video or animated image' % video_id,
expected=True)
media_fmt = traverse_obj(data, ('media', 0, {
'url': ('url', T(url_or_none)),
'ext': 'ext',
'width': ('width', T(int_or_none)),
'height': ('height', T(int_or_none)),
'filesize': ('size', T(int_or_none)),
'acodec': ('metadata', 'has_sound', T(lambda b: None if b else 'none')),
}))
media_url = traverse_obj(media_fmt, 'url')
if media_url:
if not media_fmt.get('ext'):
media_fmt['ext'] = mimetype2ext(traverse_obj(
data, ('media', 0, 'mime_type'))) or determine_ext(media_url)
if traverse_obj(data, ('media', 0, 'type')) == 'image':
media_fmt['acodec'] = 'none'
media_fmt.setdefault('preference', -10)
tw_formats = self._extract_twitter_formats(webpage)
if traverse_obj(tw_formats, (0, 'url')) == media_url:
tw_formats = []
else:
# maybe this isn't an animated image/video?
self._check_formats(tw_formats, video_id)
video_elements = self._search_regex(
r'(?s)<div class="video-elements">(.*?)</div>',
webpage, 'video elements', default=None)
if not video_elements:
if not (video_elements or tw_formats or media_url):
raise ExtractorError(
'No sources found for video %s. Maybe an image?' % video_id,
'No sources found for video %s. Maybe a plain image?' % video_id,
expected=True)
formats = []
for m in re.finditer(r'<source\s+src="(?P<src>[^"]+)"\s+type="(?P<type>[^"]+)"', video_elements):
formats.append({
'format_id': m.group('type').partition('/')[2],
'url': self._proto_relative_url(m.group('src')),
'ext': mimetype2ext(m.group('type')),
'width': width,
'height': height,
def mung_format(fmt, *extra):
fmt.update({
'http_headers': {
'User-Agent': 'youtube-dl (like wget)',
},
})
for d in extra:
fmt.update(d)
return fmt
gif_json = self._search_regex(
r'(?s)var\s+videoItem\s*=\s*(\{.*?\})',
webpage, 'GIF code', fatal=False)
if gif_json:
gifd = self._parse_json(
gif_json, video_id, transform_source=js_to_json)
formats.append({
'format_id': 'gif',
'preference': -10,
'width': width,
'height': height,
'ext': 'gif',
'acodec': 'none',
'vcodec': 'gif',
'container': 'gif',
'url': self._proto_relative_url(gifd['gifUrl']),
'filesize': gifd.get('size'),
'http_headers': {
'User-Agent': 'youtube-dl (like wget)',
},
})
if video_elements:
def og_get_size(media_type):
return dict((p, int_or_none(self._og_search_property(
':'.join((media_type, p)), webpage, default=None)))
for p in ('width', 'height'))
size = og_get_size('video')
if all(v is None for v in size.values()):
size = og_get_size('image')
formats = traverse_obj(
re.finditer(r'<source\s+src="(?P<src>[^"]+)"\s+type="(?P<type>[^"]+)"', video_elements),
(Ellipsis, {
'format_id': ('type', T(lambda s: s.partition('/')[2])),
'url': ('src', T(self._proto_relative_url)),
'ext': ('type', T(mimetype2ext)),
}, T(lambda f: mung_format(f, size))))
gif_json = self._search_regex(
r'(?s)var\s+videoItem\s*=\s*(\{.*?\})',
webpage, 'GIF code', fatal=False)
MUST_BRANCH = (None, T(lambda _: None))
formats.extend(traverse_obj(gif_json, (
T(lambda j: self._parse_json(
j, video_id, transform_source=js_to_json, fatal=False)), {
'url': ('gifUrl', T(self._proto_relative_url)),
'filesize': ('size', T(int_or_none)),
}, T(lambda f: mung_format(f, size, {
'format_id': 'gif',
'preference': -10, # gifs are worse than videos
'ext': 'gif',
'acodec': 'none',
'vcodec': 'gif',
'container': 'gif',
})), MUST_BRANCH)))
else:
formats = []
# maybe add formats from JSON or page Twitter metadata
if not any((u == media_url) for u in traverse_obj(formats, (Ellipsis, 'url'))):
formats.append(mung_format(media_fmt))
tw_url = traverse_obj(tw_formats, (0, 'url'))
if not any((u == tw_url) for u in traverse_obj(formats, (Ellipsis, 'url'))):
formats.extend(mung_format(f) for f in tw_formats)
self._sort_formats(formats)
return {
return merge_dicts(traverse_obj(data, {
'uploader_id': ('account_id', T(txt_or_none),
T(lambda a: a if int_or_none(a) != 0 else None)),
'uploader': ('account', 'username', T(txt_or_none)),
'uploader_url': ('account', 'avatar_url', T(url_or_none)),
'like_count': ('upvote_count', T(int_or_none)),
'dislike_count': ('downvote_count', T(int_or_none)),
'comment_count': ('comment_count', T(int_or_none)),
'age_limit': ('is_mature', T(lambda x: 18 if x else None)),
'timestamp': (('updated_at', 'created_at'), T(parse_iso8601)),
'release_timestamp': ('created_at', T(parse_iso8601)),
}, get_all=False), traverse_obj(data, ('media', 0, 'metadata', {
'title': ('title', T(txt_or_none)),
'description': ('description', T(self.get_description)),
'duration': ('duration', T(float_or_none)),
'timestamp': (('updated_at', 'created_at'), T(parse_iso8601)),
'release_timestamp': ('created_at', T(parse_iso8601)),
})), {
'id': video_id,
'formats': formats,
'title': self._og_search_title(webpage, default=video_id),
}
'title': self._og_search_title(webpage, default='Imgur video ' + video_id),
'description': self.get_description(self._og_search_description(webpage)),
'thumbnail': url_or_none(self._html_search_meta('thumbnailUrl', webpage, default=None)),
})
class ImgurGalleryIE(InfoExtractor):
class ImgurGalleryBaseIE(ImgurBaseIE):
_GALLERY = True
def _real_extract(self, url):
gallery_id = self._match_id(url)
data = self._call_api('albums', gallery_id, fatal=False, expected_status=404)
info = traverse_obj(data, {
'title': ('title', T(txt_or_none)),
'description': ('description', T(self.get_description)),
})
if traverse_obj(data, 'is_album'):
def yield_media_ids():
for m_id in traverse_obj(data, (
'media', lambda _, v: v.get('type') == 'video' or v['metadata']['is_animated'],
'id', T(txt_or_none))):
yield m_id
# if a gallery with exactly one video, apply album metadata to video
media_id = (
self._GALLERY
and traverse_obj(data, ('image_count', T(lambda c: c == 1)))
and next(yield_media_ids(), None))
if not media_id:
result = self.playlist_result(
map(self._imgur_result, yield_media_ids()), gallery_id)
result.update(info)
return result
gallery_id = media_id
result = self._imgur_result(gallery_id)
info['_type'] = 'url_transparent'
result.update(info)
return result
class ImgurGalleryIE(ImgurGalleryBaseIE):
IE_NAME = 'imgur:gallery'
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:gallery|(?:t(?:opic)?|r)/[^/]+)/(?P<id>[a-zA-Z0-9]+)'
@@ -106,49 +272,93 @@ class ImgurGalleryIE(InfoExtractor):
'title': 'Adding faces make every GIF better',
},
'playlist_count': 25,
'skip': 'Zoinks! You\'ve taken a wrong turn.',
}, {
# TODO: static images - replace with animated/video gallery
'url': 'http://imgur.com/topic/Aww/ll5Vk',
'only_matching': True,
}, {
'url': 'https://imgur.com/gallery/YcAQlkx',
'add_ies': ['Imgur'],
'info_dict': {
'id': 'YcAQlkx',
'ext': 'mp4',
'title': 'Classic Steve Carell gif...cracks me up everytime....damn the repost downvotes....',
}
'timestamp': 1358554297,
'upload_date': '20130119',
'uploader_id': '1648642',
'uploader': 'wittyusernamehere',
},
}, {
# TODO: static image - replace with animated/video gallery
'url': 'http://imgur.com/topic/Funny/N8rOudd',
'only_matching': True,
}, {
'url': 'http://imgur.com/r/aww/VQcQPhM',
'only_matching': True,
'add_ies': ['Imgur'],
'info_dict': {
'id': 'VQcQPhM',
'ext': 'mp4',
'title': 'The boss is here',
'timestamp': 1476494751,
'upload_date': '20161015',
'uploader_id': '19138530',
'uploader': 'thematrixcam',
},
},
# from PR #16674
{
'url': 'https://imgur.com/t/unmuted/6lAn9VQ',
'info_dict': {
'id': '6lAn9VQ',
'title': 'Penguins !',
},
'playlist_count': 3,
}, {
'url': 'https://imgur.com/t/unmuted/kx2uD3C',
'add_ies': ['Imgur'],
'info_dict': {
'id': 'ZVMv45i',
'ext': 'mp4',
'title': 'Intruder',
'timestamp': 1528129683,
'upload_date': '20180604',
},
}, {
'url': 'https://imgur.com/t/unmuted/wXSK0YH',
'add_ies': ['Imgur'],
'info_dict': {
'id': 'JCAP4io',
'ext': 'mp4',
'title': 're:I got the blues$',
'description': 'Lukas vocal stylings.\n\nFP edit: dont encourage me. Ill never stop posting Luka and friends.',
'timestamp': 1527809525,
'upload_date': '20180531',
},
}]
def _real_extract(self, url):
gallery_id = self._match_id(url)
data = self._download_json(
'https://imgur.com/gallery/%s.json' % gallery_id,
gallery_id)['data']['image']
if data.get('is_album'):
entries = [
self.url_result('http://imgur.com/%s' % image['hash'], ImgurIE.ie_key(), image['hash'])
for image in data['album_images']['images'] if image.get('hash')]
return self.playlist_result(entries, gallery_id, data.get('title'), data.get('description'))
return self.url_result('http://imgur.com/%s' % gallery_id, ImgurIE.ie_key(), gallery_id)
class ImgurAlbumIE(ImgurGalleryIE):
class ImgurAlbumIE(ImgurGalleryBaseIE):
IE_NAME = 'imgur:album'
_VALID_URL = r'https?://(?:i\.)?imgur\.com/a/(?P<id>[a-zA-Z0-9]+)'
_GALLERY = False
_TESTS = [{
# TODO: only static images - replace with animated/video gallery
'url': 'http://imgur.com/a/j6Orj',
'only_matching': True,
},
# from PR #21693
{
'url': 'https://imgur.com/a/iX265HX',
'info_dict': {
'id': 'j6Orj',
'title': 'A Literary Analysis of "Star Wars: The Force Awakens"',
'id': 'iX265HX',
'title': 'enen-no-shouboutai'
},
'playlist_count': 12,
'playlist_count': 2,
}, {
'url': 'https://imgur.com/a/8pih2Ed',
'info_dict': {
'id': '8pih2Ed'
},
'playlist_mincount': 1,
}]
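The rewritten `ImgurIE._VALID_URL` above accepts both ordinary item URLs and an internal `imgur:<id>` form, while leaving album, gallery, topic and subreddit paths to the other extractors. A quick way to check that behaviour is to run the pattern directly with `re.match`; the helper name `imgur_item_id` is illustrative only.

```python
import re

# Pattern copied from the new ImgurIE._VALID_URL in the diff
IMGUR_ITEM_RE = r'''(?x)
    (?:
        https?://(?:i\.)?imgur\.com/(?!(?:a|gallery|t|topic|r)/)|
        imgur:
    )(?P<id>[a-zA-Z0-9]+)
'''


def imgur_item_id(url):
    """Return the item ID if the URL is a single-item URL, else None."""
    mobj = re.match(IMGUR_ITEM_RE, url)
    return mobj.group('id') if mobj else None


results = {
    url: imgur_item_id(url)
    for url in (
        'https://i.imgur.com/A61SaA1.gifv',
        'imgur:A61SaA1',
        'https://imgur.com/gallery/YcAQlkx',
        'https://imgur.com/a/j6Orj',
        'https://imgur.com/t/unmuted/6lAn9VQ',
    )
}
```

The `imgur:` form is what `_imgur_result()` emits, so gallery and album entries can be routed back to `ImgurIE` by ID.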


@@ -3,17 +3,23 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
float_or_none,
int_or_none,
url_or_none,
)
class TelewebionIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?telewebion\.com/#!/episode/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?telewebion\.com/(episode|clip)/(?P<id>[a-zA-Z0-9]+)'
_TEST = {
'url': 'http://www.telewebion.com/#!/episode/1263668/',
'url': 'http://www.telewebion.com/episode/0x1b3139c/',
'info_dict': {
'id': '1263668',
'id': '0x1b3139c',
'ext': 'mp4',
'title': 'قرعه\u200cکشی لیگ قهرمانان اروپا',
'thumbnail': r're:^https?://.*\.jpg',
'thumbnail': r're:^https?://static\.telewebion\.com/episodeImages/.*/default',
'view_count': int,
},
'params': {
@@ -25,31 +31,24 @@ class TelewebionIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
secure_token = self._download_webpage(
'http://m.s2.telewebion.com/op/op?action=getSecurityToken', video_id)
episode_details = self._download_json(
'http://m.s2.telewebion.com/op/op', video_id,
query={'action': 'getEpisodeDetails', 'episode_id': video_id})
episode_details = self._download_json('https://gateway.telewebion.ir/kandoo/episode/getEpisodeDetail/?EpisodeId={0}'.format(video_id), video_id)
episode_details = episode_details['body']['queryEpisode'][0]
m3u8_url = 'http://m.s1.telewebion.com/smil/%s.m3u8?filepath=%s&m3u8=1&secure_token=%s' % (
video_id, episode_details['file_path'], secure_token)
channel_id = episode_details['channel']['descriptor']
episode_image_id = episode_details.get('image')
episode_image = 'https://static.telewebion.com/episodeImages/{0}/default'.format(episode_image_id) if episode_image_id else None
m3u8_url = 'https://cdna.telewebion.com/{0}/episode/{1}/playlist.m3u8'.format(channel_id, video_id)
formats = self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', m3u8_id='hls')
picture_paths = [
episode_details.get('picture_path'),
episode_details.get('large_picture_path'),
]
thumbnails = [{
'url': picture_path,
'preference': idx,
} for idx, picture_path in enumerate(picture_paths) if picture_path is not None]
m3u8_url, video_id, ext='mp4', m3u8_id='hls',
entry_protocol='m3u8_native')
self._sort_formats(formats)
return {
'id': video_id,
'title': episode_details['title'],
'formats': formats,
'thumbnails': thumbnails,
'view_count': episode_details.get('view_count'),
'thumbnail': url_or_none(episode_image),
'view_count': int_or_none(episode_details.get('view_count')),
'duration': float_or_none(episode_details.get('duration')),
}
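
The `_VALID_URL` change above drops the `#!` fragment form and widens the ID from digits to alphanumerics. A minimal sketch of the effect, using only the two patterns and the two test URLs that appear in this hunk (the names `OLD_VALID_URL`, `NEW_VALID_URL` and `match_id` are illustrative, not part of the extractor):

```python
import re

# Patterns copied from the hunk; the constant names are for illustration only.
OLD_VALID_URL = r'https?://(?:www\.)?telewebion\.com/#!/episode/(?P<id>\d+)'
NEW_VALID_URL = r'https?://(?:www\.)?telewebion\.com/(episode|clip)/(?P<id>[a-zA-Z0-9]+)'

old_url = 'http://www.telewebion.com/#!/episode/1263668/'
new_url = 'http://www.telewebion.com/episode/0x1b3139c/'


def match_id(pattern, url):
    """Return the captured id, or None when the URL does not match."""
    m = re.match(pattern, url)
    return m.group('id') if m else None


old_on_old = match_id(OLD_VALID_URL, old_url)   # numeric id from the legacy URL
new_on_new = match_id(NEW_VALID_URL, new_url)   # hex-like id from the new URL
new_on_old = match_id(NEW_VALID_URL, old_url)   # legacy URLs no longer match
old_on_new = match_id(OLD_VALID_URL, new_url)   # and the old pattern misses new URLs
```

If this reading is right, legacy `#!/episode/...` links would need a separate `only_matching` pattern to remain supported; the hunk does not add one.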


@ -2,9 +2,22 @@
from __future__ import unicode_literals
import re
import time
from .common import InfoExtractor
from ..utils import ExtractorError
from ..compat import compat_kwargs
from ..utils import (
base_url,
determine_ext,
ExtractorError,
float_or_none,
merge_dicts,
T,
traverse_obj,
txt_or_none,
url_basename,
url_or_none,
)
class Vbox7IE(InfoExtractor):
@ -20,23 +33,27 @@ class Vbox7IE(InfoExtractor):
)
(?P<id>[\da-fA-F]+)
'''
_EMBED_REGEX = [r'<iframe[^>]+src=(?P<q>["\'])(?P<url>(?:https?:)?//vbox7\.com/emb/external\.php.+?)(?P=q)']
_GEO_COUNTRIES = ['BG']
_TESTS = [{
'url': 'http://vbox7.com/play:0946fff23c',
'md5': 'a60f9ab3a3a2f013ef9a967d5f7be5bf',
# the http: URL just redirects here
'url': 'https://vbox7.com/play:0946fff23c',
'md5': '50ca1f78345a9c15391af47d8062d074',
'info_dict': {
'id': '0946fff23c',
'ext': 'mp4',
'title': 'Борисов: Притеснен съм за бъдещето на България',
'description': 'По думите му е опасно страната ни да бъде обявена за "сигурна"',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1470982814,
'upload_date': '20160812',
'uploader': 'zdraveibulgaria',
'thumbnail': r're:^https?://.*\.jpg$',
'view_count': int,
'duration': 2640,
},
'params': {
'proxy': '127.0.0.1:8118',
},
'expected_warnings': [
'Unable to download webpage',
],
}, {
'url': 'http://vbox7.com/play:249bb972c2',
'md5': '99f65c0c9ef9b682b97313e052734c3f',
@ -44,8 +61,15 @@ class Vbox7IE(InfoExtractor):
'id': '249bb972c2',
'ext': 'mp4',
'title': 'Смях! Чудо - чист за секунди - Скрита камера',
'description': 'Смях! Чудо - чист за секунди - Скрита камера',
'timestamp': 1360215023,
'upload_date': '20130207',
'uploader': 'svideteliat_ot_varshava',
'thumbnail': 'https://i49.vbox7.com/o/249/249bb972c20.jpg',
'view_count': int,
'duration': 83,
},
'skip': 'georestricted',
'expected_warnings': ['Failed to download m3u8 information'],
}, {
'url': 'http://vbox7.com/emb/external.php?vid=a240d20f9c&autoplay=1',
'only_matching': True,
@ -54,52 +78,127 @@ class Vbox7IE(InfoExtractor):
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src=(?P<q>["\'])(?P<url>(?:https?:)?//vbox7\.com/emb/external\.php.+?)(?P=q)',
webpage)
@classmethod
def _extract_url(cls, webpage):
mobj = re.search(cls._EMBED_REGEX[0], webpage)
if mobj:
return mobj.group('url')
# specialisation to transform what looks like ld+json that
# may contain invalid character combinations
# transform_source=None, fatal=True
def _parse_json(self, json_string, video_id, *args, **kwargs):
if '"@context"' in json_string[:30]:
# this is ld+json, or that's the way to bet
transform_source = args[0] if len(args) > 0 else kwargs.get('transform_source')
if not transform_source:
def fix_chars(src):
# fix malformed ld+json: replace raw CRLFs with escaped LFs
return re.sub(
r'"[^"]+"', lambda m: re.sub(r'\r?\n', r'\\n', m.group(0)), src)
if len(args) > 0:
args = (fix_chars,) + args[1:]
else:
kwargs['transform_source'] = fix_chars
kwargs = compat_kwargs(kwargs)
return super(Vbox7IE, self)._parse_json(
json_string, video_id, *args, **kwargs)
def _real_extract(self, url):
video_id = self._match_id(url)
url = 'https://vbox7.com/play:%s' % (video_id,)
now = time.time()
response = self._download_json(
'https://www.vbox7.com/ajax/video/nextvideo.php?vid=%s' % video_id,
video_id)
'https://www.vbox7.com/aj/player/item/options', video_id,
query={'vid': video_id}, headers={'Referer': url})
# estimate time to which possible `ago` member is relative
now = now + 0.5 * (time.time() - now)
if 'error' in response:
if traverse_obj(response, 'error'):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, response['error']), expected=True)
video = response['options']
src_url = traverse_obj(response, ('options', 'src', T(url_or_none))) or ''
title = video['title']
video_url = video['src']
if '/na.mp4' in video_url:
fmt_base = url_basename(src_url).rsplit('.', 1)[0].rsplit('_', 1)[0]
if fmt_base in ('na', 'vn'):
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
uploader = video.get('uploader')
ext = determine_ext(src_url)
if ext == 'mpd':
# extract MPD
try:
formats, subtitles = self._extract_mpd_formats_and_subtitles(
src_url, video_id, 'dash', fatal=False)
except KeyError: # fatal doesn't catch this
self.report_warning('Failed to parse MPD manifest')
formats, subtitles = [], {}
elif ext != 'm3u8':
formats = [{
'url': src_url,
}] if src_url else []
subtitles = {}
webpage = self._download_webpage(
'http://vbox7.com/play:%s' % video_id, video_id, fatal=None)
if src_url:
# possibly extract HLS, based on https://github.com/yt-dlp/yt-dlp/pull/9100
fmt_base = base_url(src_url) + fmt_base
# prepare for _extract_m3u8_formats_and_subtitles()
# hls_formats, hls_subs = self._extract_m3u8_formats_and_subtitles(
hls_formats = self._extract_m3u8_formats(
'{0}.m3u8'.format(fmt_base), video_id, m3u8_id='hls', fatal=False)
formats.extend(hls_formats)
# self._merge_subtitles(hls_subs, target=subtitles)
info = {}
# In case MPD/HLS cannot be parsed, or anyway, get mp4 combined
# formats usually provided to Safari, iOS, and old Windows
video = response['options']
resolutions = (1080, 720, 480, 240, 144)
highest_res = traverse_obj(video, (
'highestRes', T(int))) or resolutions[0]
resolutions = traverse_obj(video, (
'resolutions', lambda _, r: highest_res >= int(r) > 0)) or resolutions
mp4_formats = traverse_obj(resolutions, (
Ellipsis, T(lambda res: {
'url': '{0}_{1}.mp4'.format(fmt_base, res),
'format_id': 'http-{0}'.format(res),
'height': res,
})))
# if above formats are flaky, enable the line below
# self._check_formats(mp4_formats, video_id)
formats.extend(mp4_formats)
if webpage:
info = self._search_json_ld(
webpage.replace('"/*@context"', '"@context"'), video_id,
fatal=False)
self._sort_formats(formats)
info.update({
webpage = self._download_webpage(url, video_id, fatal=False) or ''
info = self._search_json_ld(
webpage.replace('"/*@context"', '"@context"'), video_id,
fatal=False) if webpage else {}
if not info.get('title'):
info['title'] = traverse_obj(response, (
'options', 'title', T(txt_or_none))) or self._og_search_title(webpage)
def if_missing(k):
return lambda x: None if k in info else x
info = merge_dicts(info, {
'id': video_id,
'title': title,
'url': video_url,
'uploader': uploader,
'thumbnail': self._proto_relative_url(
'formats': formats,
'subtitles': subtitles or None,
}, info, traverse_obj(response, ('options', {
'uploader': ('uploader', T(txt_or_none)),
'timestamp': ('ago', T(if_missing('timestamp')), T(lambda t: int(round((now - t) / 60.0)) * 60)),
'duration': ('duration', T(if_missing('duration')), T(float_or_none)),
})))
if 'thumbnail' not in info:
info['thumbnail'] = self._proto_relative_url(
info.get('thumbnail') or self._og_search_thumbnail(webpage),
'http:'),
})
'https:'),
return info
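
The `_parse_json` override in this file installs a `fix_chars` transform for ld+json that contains raw line breaks inside string values. The following is a standalone sketch of that transform, with the regex taken from the hunk; the sample `broken` document is made up for illustration, and the approach presumably assumes the strings contain no escaped double quotes, since the `"[^"]+"` pairing would otherwise drift.

```python
import json
import re


def fix_chars(src):
    # Within each double-quoted run, replace a raw CRLF or LF
    # with the two-character escape sequence \n.
    return re.sub(
        r'"[^"]+"', lambda m: re.sub(r'\r?\n', r'\\n', m.group(0)), src)


# Hypothetical malformed ld+json: a raw CRLF inside a string value.
broken = '{"@context": "https://schema.org", "description": "line one\r\nline two"}'

try:
    json.loads(broken)
    strict_ok = True
except ValueError:
    # json rejects unescaped control characters inside strings by default
    strict_ok = False

fixed = json.loads(fix_chars(broken))
```

Passing the repaired text through `json.loads` yields a normal newline in the decoded value, which is why the extractor can hand `fix_chars` to the base class as `transform_source`.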


@ -6,22 +6,31 @@ import re
import string
from .common import InfoExtractor
from ..compat import (
compat_b64decode,
compat_ord,
compat_struct_pack,
)
from ..utils import (
ExtractorError,
int_or_none,
mimetype2ext,
parse_codecs,
parse_qs,
update_url_query,
urljoin,
xpath_element,
xpath_text,
)
from ..compat import (
compat_b64decode,
compat_ord,
compat_struct_pack,
compat_urlparse,
)
def compat_random_choices(population, *args, **kwargs):
# weights=None, *, cum_weights=None, k=1
# limited implementation needed here
weights = args[0] if args else kwargs.get('weights')
assert all(w is None for w in (weights, kwargs.get('cum_weights')))
k = kwargs.get('k', 1)
return ''.join(random.choice(population) for _ in range(k))
class VideaIE(InfoExtractor):
@ -35,6 +44,7 @@ class VideaIE(InfoExtractor):
)
(?P<id>[^?#&]+)
'''
_EMBED_REGEX = [r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//videa\.hu/player\?.*?\bv=.+?)\1']
_TESTS = [{
'url': 'http://videa.hu/videok/allatok/az-orult-kigyasz-285-kigyot-kigyo-8YfIAjxwWGwT8HVQ',
'md5': '97a7af41faeaffd9f1fc864a7c7e7603',
@ -44,6 +54,7 @@ class VideaIE(InfoExtractor):
'title': 'Az őrült kígyász 285 kígyót enged szabadon',
'thumbnail': r're:^https?://.*',
'duration': 21,
'age_limit': 0,
},
}, {
'url': 'http://videa.hu/videok/origo/jarmuvek/supercars-elozes-jAHDWfWSJH5XuFhH',
@ -54,6 +65,7 @@ class VideaIE(InfoExtractor):
'title': 'Supercars előzés',
'thumbnail': r're:^https?://.*',
'duration': 64,
'age_limit': 0,
},
}, {
'url': 'http://videa.hu/player?v=8YfIAjxwWGwT8HVQ',
@ -64,6 +76,7 @@ class VideaIE(InfoExtractor):
'title': 'Az őrült kígyász 285 kígyót enged szabadon',
'thumbnail': r're:^https?://.*',
'duration': 21,
'age_limit': 0,
},
}, {
'url': 'http://videa.hu/player/v/8YfIAjxwWGwT8HVQ?autoplay=1',
@ -80,11 +93,14 @@ class VideaIE(InfoExtractor):
}]
_STATIC_SECRET = 'xHb0ZvME5q8CBcoQi6AngerDu3FGO9fkUlwPmLVY_RTzj2hJIS4NasXWKy1td7p'
@staticmethod
def _extract_urls(webpage):
return [url for _, url in re.findall(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//videa\.hu/player\?.*?\bv=.+?)\1',
webpage)]
@classmethod
def _extract_urls(cls, webpage):
def yield_urls():
for pattern in cls._EMBED_REGEX:
for m in re.finditer(pattern, webpage):
yield m.group('url')
return list(yield_urls())
@staticmethod
def rc4(cipher_text, key):
@ -130,13 +146,13 @@ class VideaIE(InfoExtractor):
for i in range(0, 32):
result += s[i - (self._STATIC_SECRET.index(l[i]) - 31)]
query = compat_urlparse.parse_qs(compat_urlparse.urlparse(player_url).query)
random_seed = ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(8))
query = parse_qs(player_url)
random_seed = ''.join(compat_random_choices(string.ascii_letters + string.digits, k=8))
query['_s'] = random_seed
query['_t'] = result[:16]
b64_info, handle = self._download_webpage_handle(
'http://videa.hu/videaplayer_get_xml.php', video_id, query=query)
'http://videa.hu/player/xml', video_id, query=query)
if b64_info.startswith('<?xml'):
info = self._parse_xml(b64_info, video_id)
else:


@ -673,8 +673,8 @@ class VimeoIE(VimeoBaseInfoExtractor):
raise
if '//player.vimeo.com/video/' in url:
config = self._parse_json(self._search_regex(
r'(?s)\b(?:playerC|c)onfig\s*=\s*({.+?})\s*[;\n]', webpage, 'info section'), video_id)
config = self._search_json(
r'\b(?:playerC|c)onfig\s*=', webpage, 'info section', video_id)
if config.get('view') == 4:
config = self._verify_player_video_password(
redirect_url, video_id, headers)


@ -4,20 +4,28 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_chr
from ..compat import (
compat_chr,
compat_zip as zip,
)
from ..utils import (
clean_html,
decode_packed_codes,
determine_ext,
ExtractorError,
get_element_by_id,
int_or_none,
js_to_json,
merge_dicts,
T,
traverse_obj,
url_or_none,
urlencode_postdata,
)
# based on openload_decode from 2bfeee69b976fe049761dd3012e30b637ee05a58
def aa_decode(aa_code):
symbol_table = [
symbol_table = (
('7', '((゚ー゚) + (o^_^o))'),
('6', '((o^_^o) +(o^_^o))'),
('5', '((゚ー゚) + (゚Θ゚))'),
@ -26,84 +34,180 @@ def aa_decode(aa_code):
('3', '(o^_^o)'),
('1', '(゚Θ゚)'),
('0', '(c^_^o)'),
]
('+', ''),
)
delim = '(゚Д゚)[゚ε゚]+'
ret = ''
for aa_char in aa_code.split(delim):
def chr_from_code(c):
for val, pat in symbol_table:
aa_char = aa_char.replace(pat, val)
aa_char = aa_char.replace('+ ', '')
m = re.match(r'^\d+', aa_char)
if m:
ret += compat_chr(int(m.group(0), 8))
c = c.replace(pat, val)
if c.startswith(('u', 'U')):
base = 16
c = c[1:]
else:
m = re.match(r'^u([\da-f]+)', aa_char)
if m:
ret += compat_chr(int(m.group(1), 16))
return ret
base = 10
c = int_or_none(c, base=base)
return '' if c is None else compat_chr(c)
return ''.join(
chr_from_code(aa_char)
for aa_char in aa_code.split(delim))
class XFileShareIE(InfoExtractor):
_SITES = (
(r'aparat\.cam', 'Aparat'),
(r'clipwatching\.com', 'ClipWatching'),
(r'gounlimited\.to', 'GoUnlimited'),
(r'govid\.me', 'GoVid'),
(r'holavid\.com', 'HolaVid'),
(r'streamty\.com', 'Streamty'),
(r'thevideobee\.to', 'TheVideoBee'),
(r'uqload\.com', 'Uqload'),
(r'vidbom\.com', 'VidBom'),
(r'vidlo\.us', 'vidlo'),
(r'vidlocker\.xyz', 'VidLocker'),
(r'vidshare\.tv', 'VidShare'),
(r'vup\.to', 'VUp'),
# status check 2024-02: site availability, G site: search
(r'aparat\.cam', 'Aparat'), # Cloudflare says host error 522, apparently changed to wolfstream.tv
(r'filemoon\.sx/.', 'FileMoon'),
(r'gounlimited\.to', 'GoUnlimited'), # no media pages listed
(r'govid\.me', 'GoVid'), # no media pages listed
(r'highstream\.tv', 'HighStream'), # clipwatching.com redirects here
(r'holavid\.com', 'HolaVid'), # Cloudflare says host error 522
# (r'streamty\.com', 'Streamty'), # no media pages listed, connection timeout
# (r'thevideobee\.to', 'TheVideoBee'), # no pages listed, refuses connection
(r'uqload\.to', 'Uqload'), # .com, .co redirect here
(r'(?:vedbam\.xyz|vadbam\.net)', 'V?dB?m'), # vidbom.com redirects here, but no valid media pages listed
(r'vidlo\.us', 'vidlo'), # no valid media pages listed
(r'vidlocker\.xyz', 'VidLocker'), # no media pages listed
(r'(?:w\d\.)?viidshar\.com', 'VidShare'), # vidshare.tv redirects here
# (r'vup\.to', 'VUp'), # domain not found
(r'wolfstream\.tv', 'WolfStream'),
(r'xvideosharing\.com', 'XVideoSharing'),
(r'xvideosharing\.com', 'XVideoSharing'), # just started showing 'maintenance mode'
)
IE_DESC = 'XFileShare based sites: %s' % ', '.join(list(zip(*_SITES))[1])
IE_DESC = 'XFileShare-based sites: %s' % ', '.join(list(zip(*_SITES))[1])
_VALID_URL = (r'https?://(?:www\.)?(?P<host>%s)/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
% '|'.join(site for site in list(zip(*_SITES))[0]))
_EMBED_REGEX = [r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:%s)/embed-[0-9a-zA-Z]+.*?)\1' % '|'.join(site for site in list(zip(*_SITES))[0])]
_FILE_NOT_FOUND_REGEXES = (
r'>(?:404 - )?File Not Found<',
r'>The file was removed by administrator<',
)
_TITLE_REGEXES = (
r'style="z-index: [0-9]+;">([^<]+)</span>',
r'<td nowrap>([^<]+)</td>',
r'h4-fine[^>]*>([^<]+)<',
r'>Watch (.+)[ <]',
r'<h2 class="video-page-head">([^<]+)</h2>',
r'<h2 style="[^"]*color:#403f3d[^"]*"[^>]*>([^<]+)<', # streamin.to (dead)
r'title\s*:\s*"([^"]+)"', # govid.me
)
_SOURCE_URL_REGEXES = (
r'(?:file|src)\s*:\s*(["\'])(?P<url>http(?:(?!\1).)+\.(?:m3u8|mp4|flv)(?:(?!\1).)*)\1',
r'file_link\s*=\s*(["\'])(?P<url>http(?:(?!\1).)+)\1',
r'addVariable\((\\?["\'])file\1\s*,\s*(\\?["\'])(?P<url>http(?:(?!\2).)+)\2\)',
r'<embed[^>]+src=(["\'])(?P<url>http(?:(?!\1).)+\.(?:m3u8|mp4|flv)(?:(?!\1).)*)\1',
)
_THUMBNAIL_REGEXES = (
r'<video[^>]+poster="([^"]+)"',
r'(?:image|poster)\s*:\s*["\'](http[^"\']+)["\'],',
)
_TESTS = [{
'url': 'http://xvideosharing.com/fq65f94nd2ve',
'md5': '4181f63957e8fe90ac836fa58dc3c8a6',
'note': 'link in `sources`',
'url': 'https://uqload.to/dcsu06gdb45o',
'md5': '7f8db187b254379440bf4fcad094ae86',
'info_dict': {
'id': 'fq65f94nd2ve',
'id': 'dcsu06gdb45o',
'ext': 'mp4',
'title': 'sample',
'thumbnail': r're:http://.*\.jpg',
'title': 'f2e31015957e74c8c8427982e161c3fc mp4',
'thumbnail': r're:https://.*\.jpg'
},
'params': {
'nocheckcertificate': True,
},
'expected_warnings': ['Unable to extract JWPlayer data'],
}, {
'note': 'link in decoded `sources`',
'url': 'https://xvideosharing.com/1tlg6agrrdgc',
'md5': '2608ce41932c1657ae56258a64e647d9',
'info_dict': {
'id': '1tlg6agrrdgc',
'ext': 'mp4',
'title': '0121',
'thumbnail': r're:https?://.*\.jpg',
},
'skip': 'This server is in maintenance mode.',
}, {
'note': 'JWPlayer link in un-p,a,c,k,e,d JS',
'url': 'https://filemoon.sx/e/dw40rxrzruqz',
'md5': '5a713742f57ac4aef29b74733e8dda01',
'info_dict': {
'id': 'dw40rxrzruqz',
'title': 'dw40rxrzruqz',
'ext': 'mp4'
},
}, {
'note': 'JWPlayer link in un-p,a,c,k,e,d JS',
'url': 'https://vadbam.net/6lnbkci96wly.html',
'md5': 'a1616800076177e2ac769203957c54bc',
'info_dict': {
'id': '6lnbkci96wly',
'title': 'Heart Crime S01 E03 weciima autos',
'ext': 'mp4'
},
}, {
'note': 'JWPlayer link in clear',
'url': 'https://w1.viidshar.com/nnibe0xf0h79.html',
'md5': 'f0a580ce9df06cc61b4a5c979d672367',
'info_dict': {
'id': 'nnibe0xf0h79',
'title': 'JaGa 68ar',
'ext': 'mp4'
},
'params': {
'skip_download': 'ffmpeg',
},
'expected_warnings': ['hlsnative has detected features it does not support'],
}, {
'note': 'JWPlayer link in clear',
'url': 'https://wolfstream.tv/a3drtehyrg52.html',
'md5': '1901d86a79c5e0c6a51bdc9a4cfd3769',
'info_dict': {
'id': 'a3drtehyrg52',
'title': 'NFL 2023 W04 DET@GB',
'ext': 'mp4'
},
}, {
'url': 'https://aparat.cam/n4d6dh0wvlpr',
'only_matching': True,
}, {
'url': 'https://wolfstream.tv/nthme29v9u2x',
'url': 'https://uqload.to/ug5somm0ctnk.html',
'only_matching': True,
}, {
'url': 'https://highstream.tv/2owiyz3sjoux',
'only_matching': True,
}, {
'url': 'https://vedbam.xyz/6lnbkci96wly.html',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url')
for mobj in re.finditer(
r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:%s)/embed-[0-9a-zA-Z]+.*?)\1'
% '|'.join(site for site in list(zip(*XFileShareIE._SITES))[0]),
webpage)]
@classmethod
def _extract_urls(cls, webpage):
def yield_urls():
for regex in cls._EMBED_REGEX:
for mobj in re.finditer(regex, webpage):
yield mobj.group('url')
return list(yield_urls())
def _real_extract(self, url):
host, video_id = re.match(self._VALID_URL, url).groups()
host, video_id = self._match_valid_url(url).group('host', 'id')
url = 'https://%s/' % host + ('embed-%s.html' % video_id if host in ('govid.me', 'vidlo.us') else video_id)
url = 'https://%s/%s' % (
host,
'embed-%s.html' % video_id if host in ('govid.me', 'vidlo.us') else video_id)
webpage = self._download_webpage(url, video_id)
if any(re.search(p, webpage) for p in self._FILE_NOT_FOUND_REGEXES):
container_div = get_element_by_id('container', webpage) or webpage
if self._search_regex(
r'>This server is in maintenance mode\.', container_div,
'maint error', group=0, default=None):
raise ExtractorError(clean_html(container_div), expected=True)
if self._search_regex(
self._FILE_NOT_FOUND_REGEXES, container_div,
'missing video error', group=0, default=None):
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
fields = self._hidden_inputs(webpage)
@ -122,59 +226,43 @@ class XFileShareIE(InfoExtractor):
'Content-type': 'application/x-www-form-urlencoded',
})
title = (self._search_regex(
(r'style="z-index: [0-9]+;">([^<]+)</span>',
r'<td nowrap>([^<]+)</td>',
r'h4-fine[^>]*>([^<]+)<',
r'>Watch (.+)[ <]',
r'<h2 class="video-page-head">([^<]+)</h2>',
r'<h2 style="[^"]*color:#403f3d[^"]*"[^>]*>([^<]+)<', # streamin.to
r'title\s*:\s*"([^"]+)"'), # govid.me
webpage, 'title', default=None) or self._og_search_title(
webpage, default=None) or video_id).strip()
title = (
self._search_regex(self._TITLE_REGEXES, webpage, 'title', default=None)
or self._og_search_title(webpage, default=None)
or video_id).strip()
for regex, func in (
(r'(eval\(function\(p,a,c,k,e,d\){.+)', decode_packed_codes),
(r'(゚.+)', aa_decode)):
obf_code = self._search_regex(regex, webpage, 'obfuscated code', default=None)
if obf_code:
webpage = webpage.replace(obf_code, func(obf_code))
obf_code = True
while obf_code:
for regex, func in (
(r'(?s)(?<!-)\b(eval\(function\(p,a,c,k,e,d\)\{(?:(?!</script>).)+\)\))',
decode_packed_codes),
(r'(゚.+)', aa_decode)):
obf_code = self._search_regex(regex, webpage, 'obfuscated code', default=None)
if obf_code:
webpage = webpage.replace(obf_code, func(obf_code))
break
formats = []
jwplayer_data = self._find_jwplayer_data(
webpage.replace(r'\'', '\''), video_id)
result = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False,
m3u8_id='hls', mpd_id='dash')
jwplayer_data = self._search_regex(
[
r'jwplayer\("[^"]+"\)\.load\(\[({.+?})\]\);',
r'jwplayer\("[^"]+"\)\.setup\(({.+?})\);',
], webpage,
'jwplayer data', default=None)
if jwplayer_data:
jwplayer_data = self._parse_json(
jwplayer_data.replace(r"\'", "'"), video_id, js_to_json)
if not traverse_obj(result, 'formats'):
if jwplayer_data:
formats = self._parse_jwplayer_data(
jwplayer_data, video_id, False,
m3u8_id='hls', mpd_id='dash')['formats']
if not formats:
urls = []
for regex in (
r'(?:file|src)\s*:\s*(["\'])(?P<url>http(?:(?!\1).)+\.(?:m3u8|mp4|flv)(?:(?!\1).)*)\1',
r'file_link\s*=\s*(["\'])(?P<url>http(?:(?!\1).)+)\1',
r'addVariable\((\\?["\'])file\1\s*,\s*(\\?["\'])(?P<url>http(?:(?!\2).)+)\2\)',
r'<embed[^>]+src=(["\'])(?P<url>http(?:(?!\1).)+\.(?:m3u8|mp4|flv)(?:(?!\1).)*)\1'):
self.report_warning(
'Failed to extract JWPlayer formats', video_id=video_id)
urls = set()
for regex in self._SOURCE_URL_REGEXES:
for mobj in re.finditer(regex, webpage):
video_url = mobj.group('url')
if video_url not in urls:
urls.append(video_url)
urls.add(mobj.group('url'))
sources = self._search_regex(
r'sources\s*:\s*(\[(?!{)[^\]]+\])', webpage, 'sources', default=None)
if sources:
urls.extend(self._parse_json(sources, video_id))
urls.update(traverse_obj(sources, (T(lambda s: self._parse_json(s, video_id)), Ellipsis)))
formats = []
for video_url in urls:
for video_url in traverse_obj(urls, (Ellipsis, T(url_or_none))):
if determine_ext(video_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4',
@ -185,17 +273,19 @@ class XFileShareIE(InfoExtractor):
'url': video_url,
'format_id': 'sd',
})
self._sort_formats(formats)
result = {'formats': formats}
self._sort_formats(result['formats'])
thumbnail = self._search_regex(
[
r'<video[^>]+poster="([^"]+)"',
r'(?:image|poster)\s*:\s*["\'](http[^"\']+)["\'],',
], webpage, 'thumbnail', default=None)
self._THUMBNAIL_REGEXES, webpage, 'thumbnail', default=None)
return {
if not (title or result.get('title')):
title = self._generic_title(url) or video_id
return merge_dicts(result, {
'id': video_id,
'title': title,
'title': title or None,
'thumbnail': thumbnail,
'formats': formats,
}
'http_headers': {'Referer': url}
})


@ -2,6 +2,7 @@
from __future__ import unicode_literals
import collections
import itertools
import json
import os.path
@ -23,10 +24,10 @@ from ..compat import (
)
from ..jsinterp import JSInterpreter
from ..utils import (
ExtractorError,
clean_html,
dict_get,
error_to_compat_str,
ExtractorError,
float_or_none,
extract_attributes,
get_element_by_attribute,
@ -36,7 +37,9 @@ from ..utils import (
LazyList,
merge_dicts,
mimetype2ext,
NO_DEFAULT,
parse_codecs,
parse_count,
parse_duration,
parse_qs,
qualities,
@ -44,7 +47,9 @@ from ..utils import (
smuggle_url,
str_or_none,
str_to_int,
T,
traverse_obj,
try_call,
try_get,
txt_or_none,
unescapeHTML,
@ -1247,7 +1252,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'title': 'IMG 3456',
'description': '',
'upload_date': '20170613',
'uploader': 'ElevageOrVert',
'uploader': "l'Or Vert asbl",
'uploader_id': '@ElevageOrVert',
},
'params': {
@ -1460,6 +1465,30 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self._code_cache = {}
self._player_cache = {}
# *ytcfgs, webpage=None
def _extract_player_url(self, *ytcfgs, **kw_webpage):
if ytcfgs and not isinstance(ytcfgs[0], dict):
webpage = kw_webpage.get('webpage') or ytcfgs[0]
if webpage:
player_url = self._search_regex(
r'"(?:PLAYER_JS_URL|jsUrl)"\s*:\s*"([^"]+)"',
webpage or '', 'player URL', fatal=False)
if player_url:
ytcfgs = ytcfgs + ({'PLAYER_JS_URL': player_url},)
return traverse_obj(
ytcfgs, (Ellipsis, 'PLAYER_JS_URL'), (Ellipsis, 'WEB_PLAYER_CONTEXT_CONFIGS', Ellipsis, 'jsUrl'),
get_all=False, expected_type=lambda u: urljoin('https://www.youtube.com', u))
def _download_player_url(self, video_id, fatal=False):
res = self._download_webpage(
'https://www.youtube.com/iframe_api',
note='Downloading iframe API JS', video_id=video_id, fatal=fatal)
player_version = self._search_regex(
r'player\\?/([0-9a-fA-F]{8})\\?/', res or '', 'player version', fatal=fatal,
default=NO_DEFAULT if res else None)
if player_version:
return 'https://www.youtube.com/s/player/{0}/player_ias.vflset/en_US/base.js'.format(player_version)
def _signature_cache_id(self, example_sig):
""" Return a string representation of a signature """
return '.'.join(compat_str(len(part)) for part in example_sig.split('.'))
@ -1474,46 +1503,49 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
raise ExtractorError('Cannot identify player %r' % player_url)
return id_m.group('id')
def _get_player_code(self, video_id, player_url, player_id=None):
def _load_player(self, video_id, player_url, fatal=True, player_id=None):
if not player_id:
player_id = self._extract_player_info(player_url)
if player_id not in self._code_cache:
self._code_cache[player_id] = self._download_webpage(
player_url, video_id,
code = self._download_webpage(
player_url, video_id, fatal=fatal,
note='Downloading player ' + player_id,
errnote='Download of %s failed' % player_url)
return self._code_cache[player_id]
if code:
self._code_cache[player_id] = code
return self._code_cache[player_id] if fatal else self._code_cache.get(player_id)
def _extract_signature_function(self, video_id, player_url, example_sig):
player_id = self._extract_player_info(player_url)
# Read from filesystem cache
func_id = 'js_%s_%s' % (
func_id = 'js_{0}_{1}'.format(
player_id, self._signature_cache_id(example_sig))
assert os.path.basename(func_id) == func_id
cache_spec = self._downloader.cache.load('youtube-sigfuncs', func_id)
if cache_spec is not None:
return lambda s: ''.join(s[i] for i in cache_spec)
self.write_debug('Extracting signature function {0}'.format(func_id))
cache_spec, code = self.cache.load('youtube-sigfuncs', func_id), None
code = self._get_player_code(video_id, player_url, player_id)
res = self._parse_sig_js(code)
if not cache_spec:
code = self._load_player(video_id, player_url, player_id=player_id)
if code:
res = self._parse_sig_js(code)
test_string = ''.join(map(compat_chr, range(len(example_sig))))
cache_spec = [ord(c) for c in res(test_string)]
self.cache.store('youtube-sigfuncs', func_id, cache_spec)
test_string = ''.join(map(compat_chr, range(len(example_sig))))
cache_res = res(test_string)
cache_spec = [ord(c) for c in cache_res]
self._downloader.cache.store('youtube-sigfuncs', func_id, cache_spec)
return res
return lambda s: ''.join(s[i] for i in cache_spec)
def _print_sig_code(self, func, example_sig):
if not self.get_param('youtube_print_sig_code'):
return
def gen_sig_code(idxs):
def _genslice(start, end, step):
starts = '' if start == 0 else str(start)
ends = (':%d' % (end + step)) if end + step >= 0 else ':'
steps = '' if step == 1 else (':%d' % step)
return 's[%s%s%s]' % (starts, ends, steps)
return 's[{0}{1}{2}]'.format(starts, ends, steps)
step = None
# Squelch pyflakes warnings - start will be set when step is set
@ -1564,143 +1596,137 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
jscode, 'Initial JS player signature function name', group='sig')
jsi = JSInterpreter(jscode)
initial_function = jsi.extract_function(funcname)
return lambda s: initial_function([s])
def _cached(self, func, *cache_id):
def inner(*args, **kwargs):
if cache_id not in self._player_cache:
try:
self._player_cache[cache_id] = func(*args, **kwargs)
except ExtractorError as e:
self._player_cache[cache_id] = e
except Exception as e:
self._player_cache[cache_id] = ExtractorError(traceback.format_exc(), cause=e)
ret = self._player_cache[cache_id]
if isinstance(ret, Exception):
raise ret
return ret
return inner
def _decrypt_signature(self, s, video_id, player_url):
"""Turn the encrypted s field into a working signature"""
if player_url is None:
raise ExtractorError('Cannot decrypt signature without player_url')
try:
player_id = (player_url, self._signature_cache_id(s))
if player_id not in self._player_cache:
func = self._extract_signature_function(
video_id, player_url, s
)
self._player_cache[player_id] = func
func = self._player_cache[player_id]
if self._downloader.params.get('youtube_print_sig_code'):
self._print_sig_code(func, s)
return func(s)
except Exception as e:
tb = traceback.format_exc()
raise ExtractorError(
'Signature extraction failed: ' + tb, cause=e)
def _extract_player_url(self, webpage):
player_url = self._search_regex(
r'"(?:PLAYER_JS_URL|jsUrl)"\s*:\s*"([^"]+)"',
webpage or '', 'player URL', fatal=False)
if not player_url:
return
if player_url.startswith('//'):
player_url = 'https:' + player_url
elif not re.match(r'https?://', player_url):
player_url = compat_urllib_parse.urljoin(
'https://www.youtube.com', player_url)
return player_url
extract_sig = self._cached(
self._extract_signature_function, 'sig', player_url, self._signature_cache_id(s))
func = extract_sig(video_id, player_url, s)
self._print_sig_code(func, s)
return func(s)
# from yt-dlp
# See also:
# 1. https://github.com/ytdl-org/youtube-dl/issues/29326#issuecomment-894619419
# 2. https://code.videolan.org/videolan/vlc/-/blob/4fb284e5af69aa9ac2100ccbdd3b88debec9987f/share/lua/playlist/youtube.lua#L116
# 3. https://github.com/ytdl-org/youtube-dl/issues/30097#issuecomment-950157377
def _extract_n_function_name(self, jscode):
target = r'(?P<nfunc>[a-zA-Z_$][\w$]*)(?:\[(?P<idx>\d+)\])?'
nfunc_and_idx = self._search_regex(
r'\.get\("n"\)\)&&\(b=(%s)\([\w$]+\)' % (target, ),
jscode, 'Initial JS player n function name')
nfunc, idx = re.match(target, nfunc_and_idx).group('nfunc', 'idx')
if not idx:
return nfunc
VAR_RE_TMPL = r'var\s+%s\s*=\s*(?P<name>\[(?P<alias>%s)\])[;,]'
note = 'Initial JS player n function {0} (%s[%s])' % (nfunc, idx)
def search_function_code(needle, group):
return self._search_regex(
VAR_RE_TMPL % (re.escape(nfunc), needle), jscode,
note.format(group), group=group)
if int_or_none(idx) == 0:
real_nfunc = search_function_code(r'[a-zA-Z_$][\w$]*', group='alias')
if real_nfunc:
return real_nfunc
return self._parse_json(
search_function_code('.+?', group='name'),
nfunc, transform_source=js_to_json)[int(idx)]
def _extract_n_function(self, video_id, player_url):
player_id = self._extract_player_info(player_url)
func_code = self._downloader.cache.load('youtube-nsig', player_id)
if func_code:
jsi = JSInterpreter(func_code)
else:
jscode = self._get_player_code(video_id, player_url, player_id)
funcname = self._extract_n_function_name(jscode)
jsi = JSInterpreter(jscode)
func_code = jsi.extract_function_code(funcname)
self._downloader.cache.store('youtube-nsig', player_id, func_code)
if self._downloader.params.get('youtube_print_sig_code'):
self.to_screen('Extracted nsig function from {0}:\n{1}\n'.format(player_id, func_code[1]))
return lambda s: jsi.extract_function_from_code(*func_code)([s])
def _n_descramble(self, n_param, player_url, video_id):
"""Compute the response to YT's "n" parameter challenge,
or None
Args:
n_param -- challenge string that is the value of the
URL's "n" query parameter
player_url -- URL of YT player JS
video_id
"""
sig_id = ('nsig_value', n_param)
if sig_id in self._player_cache:
return self._player_cache[sig_id]
def _decrypt_nsig(self, n, video_id, player_url):
"""Turn the encrypted n field into a working signature"""
if player_url is None:
raise ExtractorError('Cannot decrypt nsig without player_url')
try:
player_id = ('nsig', player_url)
if player_id not in self._player_cache:
self._player_cache[player_id] = self._extract_n_function(video_id, player_url)
func = self._player_cache[player_id]
ret = func(n_param)
if ret.startswith('enhanced_except_'):
raise ExtractorError('Unhandled exception in decode')
self._player_cache[sig_id] = ret
if self._downloader.params.get('verbose', False):
self._downloader.to_screen('[debug] [%s] %s' % (self.IE_NAME, 'Decrypted nsig {0} => {1}'.format(n_param, self._player_cache[sig_id])))
return self._player_cache[sig_id]
except Exception as e:
self._downloader.report_warning(
'[%s] %s (%s %s)' % (
self.IE_NAME,
jsi, player_id, func_code = self._extract_n_function_code(video_id, player_url)
except ExtractorError as e:
raise ExtractorError('Unable to extract nsig function code', cause=e)
if self.get_param('youtube_print_sig_code'):
self.to_screen('Extracted nsig function from {0}:\n{1}\n'.format(
player_id, func_code[1]))
try:
extract_nsig = self._cached(self._extract_n_function_from_code, 'nsig func', player_url)
ret = extract_nsig(jsi, func_code)(n)
except JSInterpreter.Exception as e:
self.report_warning(
'%s (%s %s)' % (
'Unable to decode n-parameter: download likely to be throttled',
error_to_compat_str(e),
traceback.format_exc()))
traceback.format_exc()),
video_id=video_id)
return
self.write_debug('Decrypted nsig {0} => {1}'.format(n, ret))
return ret
def _extract_n_function_name(self, jscode):
func_name, idx = self._search_regex(
r'\.get\("n"\)\)&&\(b=(?P<nfunc>[a-zA-Z_$][\w$]*)(?:\[(?P<idx>\d+)\])?\([\w$]+\)',
jscode, 'Initial JS player n function name', group=('nfunc', 'idx'))
if not idx:
return func_name
return self._parse_json(self._search_regex(
r'var {0}\s*=\s*(\[.+?\])\s*[,;]'.format(re.escape(func_name)), jscode,
'Initial JS player n function list ({0}.{1})'.format(func_name, idx)),
func_name, transform_source=js_to_json)[int(idx)]
def _extract_n_function_code(self, video_id, player_url):
player_id = self._extract_player_info(player_url)
func_code = self.cache.load('youtube-nsig', player_id)
jscode = func_code or self._load_player(video_id, player_url)
jsi = JSInterpreter(jscode)
if func_code:
return jsi, player_id, func_code
func_name = self._extract_n_function_name(jscode)
# For redundancy
func_code = self._search_regex(
r'''(?xs)%s\s*=\s*function\s*\((?P<var>[\w$]+)\)\s*
# NB: The end of the regex is intentionally kept strict
{(?P<code>.+?}\s*return\ [\w$]+.join\(""\))};''' % func_name,
jscode, 'nsig function', group=('var', 'code'), default=None)
if func_code:
func_code = ([func_code[0]], func_code[1])
else:
self.write_debug('Extracting nsig function with jsinterp')
func_code = jsi.extract_function_code(func_name)
self.cache.store('youtube-nsig', player_id, func_code)
return jsi, player_id, func_code
def _extract_n_function_from_code(self, jsi, func_code):
func = jsi.extract_function_from_code(*func_code)
def extract_nsig(s):
try:
ret = func([s])
except JSInterpreter.Exception:
raise
except Exception as e:
raise JSInterpreter.Exception(traceback.format_exc(), cause=e)
if ret.startswith('enhanced_except_'):
raise JSInterpreter.Exception('Signature function returned an exception')
return ret
return extract_nsig
def _unthrottle_format_urls(self, video_id, player_url, *formats):
def decrypt_nsig(n):
return self._cached(self._decrypt_nsig, 'nsig', n, player_url)
def _unthrottle_format_urls(self, video_id, player_url, formats):
for fmt in formats:
parsed_fmt_url = compat_urllib_parse.urlparse(fmt['url'])
n_param = compat_parse_qs(parsed_fmt_url.query).get('n')
if not n_param:
continue
n_param = n_param[-1]
n_response = self._n_descramble(n_param, player_url, video_id)
n_response = decrypt_nsig(n_param)(n_param, video_id, player_url)
if n_response is None:
# give up if descrambling failed
break
for fmt_dct in traverse_obj(fmt, (None, (None, ('fragments', Ellipsis))), expected_type=dict):
fmt_dct['url'] = update_url(
fmt_dct['url'], query_update={'n': [n_response]})
fmt['url'] = update_url_query(fmt['url'], {'n': n_response})
# from yt-dlp, with tweaks
def _extract_signature_timestamp(self, video_id, player_url, ytcfg=None, fatal=False):
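The two-step lookup in the new `_extract_n_function_name` above (find the name, then resolve an array alias such as `Xyz[0]`) can be tried out with the standard `re` module alone. This is a minimal sketch under stated assumptions, not the extractor's code: `SAMPLE_JS` is invented, `find_n_function_name` is a hypothetical helper, and the array literal is split by hand where the real code goes through `js_to_json` and `_parse_json`.

```python
import re

# Invented player snippet: the n function is referenced through an array alias.
SAMPLE_JS = 'var Xyz=[abc];c=a.get("n"))&&(b=Xyz[0](c),a.set("n",b))'

# Same pattern as in the hunk above, split over two lines for readability.
NAME_RE = (r'\.get\("n"\)\)&&\(b=(?P<nfunc>[a-zA-Z_$][\w$]*)'
           r'(?:\[(?P<idx>\d+)\])?\([\w$]+\)')


def find_n_function_name(jscode):
    """Hypothetical helper: resolve the n function name, following an array alias."""
    mobj = re.search(NAME_RE, jscode)
    if not mobj:
        return None
    func_name, idx = mobj.group('nfunc', 'idx')
    if not idx:
        # Direct reference, e.g. b=abc(c)
        return func_name
    # Aliased reference, e.g. b=Xyz[0](c): look up `var Xyz = [...]`.
    list_mobj = re.search(
        r'var {0}\s*=\s*\[(.+?)\]\s*[,;]'.format(re.escape(func_name)), jscode)
    if not list_mobj:
        return None
    # Simplification (assumption): split on commas instead of parsing as JSON.
    names = [n.strip() for n in list_mobj.group(1).split(',')]
    return names[int(idx)]
```

With the invented sample, `find_n_function_name(SAMPLE_JS)` resolves the alias `Xyz[0]` to `abc`.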
@ -1708,16 +1734,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
Extract signatureTimestamp (sts)
Required to tell API what sig/player version is in use.
"""
sts = int_or_none(ytcfg.get('STS')) if isinstance(ytcfg, dict) else None
sts = traverse_obj(ytcfg, 'STS', expected_type=int)
if not sts:
# Attempt to extract from player
if player_url is None:
error_msg = 'Cannot extract signature timestamp without player_url.'
if fatal:
raise ExtractorError(error_msg)
self._downloader.report_warning(error_msg)
self.report_warning(error_msg)
return
code = self._get_player_code(video_id, player_url)
code = self._load_player(video_id, player_url, fatal=fatal)
sts = int_or_none(self._search_regex(
r'(?:signatureTimestamp|sts)\s*:\s*(?P<sts>[0-9]{5})', code or '',
'JS player signature timestamp', group='sts', fatal=fatal))
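The signature timestamp pattern in this hunk can be checked in isolation. A minimal sketch, assuming plain `re` in place of `_search_regex`; `find_signature_timestamp` and the sample strings are hypothetical.

```python
import re

# Pattern copied from the hunk above.
STS_RE = r'(?:signatureTimestamp|sts)\s*:\s*(?P<sts>[0-9]{5})'


def find_signature_timestamp(player_code):
    """Hypothetical helper: return the 5-digit timestamp as an int, or None."""
    # `or ''` mirrors the `code or ''` guard for a player that failed to load.
    mobj = re.search(STS_RE, player_code or '')
    return int(mobj.group('sts')) if mobj else None
```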
@ -1733,12 +1759,18 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# cpn generation algorithm is reverse engineered from base.js.
# In fact it works even with dummy cpn.
CPN_ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_'
cpn = ''.join((CPN_ALPHABET[random.randint(0, 256) & 63] for _ in range(0, 16)))
cpn = ''.join(CPN_ALPHABET[random.randint(0, 256) & 63] for _ in range(0, 16))
playback_url = update_url(
playback_url, query_update={
'ver': ['2'],
'cpn': [cpn],
# more consistent results setting it to right before the end
qs = parse_qs(playback_url)
video_length = '{0}'.format(float((qs.get('len') or ['1.5'])[0]) - 1)
playback_url = update_url_query(
playback_url, {
'ver': '2',
'cpn': cpn,
'cmt': video_length,
'el': 'detailpage', # otherwise defaults to "shorts"
})
self._download_webpage(
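The `cpn` expression in this hunk relies on `& 63` to keep every index inside the 64-character alphabet. A minimal sketch of just that step; `make_cpn` is a hypothetical wrapper, not a function in the source.

```python
import random

CPN_ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_'


def make_cpn(length=16):
    """Hypothetical wrapper: build a random cpn string of the given length."""
    # `& 63` keeps the low six bits, so the index is always within 0..63.
    return ''.join(
        CPN_ALPHABET[random.randint(0, 256) & 63] for _ in range(length))
```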
@ -1986,8 +2018,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
else:
self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
if not player_url:
player_url = self._extract_player_url(webpage)
formats = []
itags = []
itags = collections.defaultdict(set)
itag_qualities = {}
q = qualities(['tiny', 'small', 'medium', 'large', 'hd720', 'hd1080', 'hd1440', 'hd2160', 'hd2880', 'highres'])
CHUNK_SIZE = 10 << 20
@ -2003,58 +2038,92 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
})
} for range_start in range(0, f['filesize'], CHUNK_SIZE))
lower = lambda s: s.lower()
for fmt in streaming_formats:
if fmt.get('targetDurationSec') or fmt.get('drmFamilies'):
if fmt.get('targetDurationSec'):
continue
itag = str_or_none(fmt.get('itag'))
quality = fmt.get('quality')
if itag and quality:
audio_track = traverse_obj(fmt, ('audioTrack', T(dict))) or {}
quality = traverse_obj(fmt, ((
# The 3gp format (17) in android client has a quality of "small",
# but is actually worse than other formats
T(lambda _: 'tiny' if itag == 17 else None),
('quality', T(lambda q: q if q and q != 'tiny' else None)),
('audioQuality', T(lower)),
'quality'), T(txt_or_none)), get_all=False)
if quality and itag:
itag_qualities[itag] = quality
# FORMAT_STREAM_TYPE_OTF(otf=1) requires downloading the init fragment
# (adding `&sq=0` to the URL) and parsing emsg box to determine the
# number of fragment that would subsequently requested with (`&sq=N`)
# number of fragments that would subsequently be requested with (`&sq=N`)
if fmt.get('type') == 'FORMAT_STREAM_TYPE_OTF':
continue
fmt_url = fmt.get('url')
if not fmt_url:
sc = compat_parse_qs(fmt.get('signatureCipher'))
fmt_url = url_or_none(try_get(sc, lambda x: x['url'][0]))
encrypted_sig = try_get(sc, lambda x: x['s'][0])
if not (sc and fmt_url and encrypted_sig):
fmt_url = traverse_obj(sc, ('url', -1, T(url_or_none)))
encrypted_sig = traverse_obj(sc, ('s', -1))
if not (fmt_url and encrypted_sig):
continue
if not player_url:
player_url = self._extract_player_url(webpage)
player_url = player_url or self._extract_player_url(webpage)
if not player_url:
continue
signature = self._decrypt_signature(sc['s'][0], video_id, player_url)
sp = try_get(sc, lambda x: x['sp'][0]) or 'signature'
fmt_url += '&' + sp + '=' + signature
try:
fmt_url = update_url_query(fmt_url, {
traverse_obj(sc, ('sp', -1)) or 'signature':
[self._decrypt_signature(encrypted_sig, video_id, player_url)],
})
except ExtractorError as e:
self.report_warning('Signature extraction failed: Some formats may be missing',
video_id=video_id, only_once=True)
self.write_debug(error_to_compat_str(e), only_once=True)
continue
if itag:
itags.append(itag)
tbr = float_or_none(
fmt.get('averageBitrate') or fmt.get('bitrate'), 1000)
language_preference = (
10 if audio_track.get('audioIsDefault')
else -10 if 'descriptive' in (traverse_obj(audio_track, ('displayName', T(lower))) or '')
else -1)
name = (
traverse_obj(fmt, ('qualityLabel', T(txt_or_none)))
or quality.replace('audio_quality_', ''))
dct = {
'asr': int_or_none(fmt.get('audioSampleRate')),
'filesize': int_or_none(fmt.get('contentLength')),
'format_id': itag,
'format_note': fmt.get('qualityLabel') or quality,
'fps': int_or_none(fmt.get('fps')),
'height': int_or_none(fmt.get('height')),
'quality': q(quality),
'tbr': tbr,
'format_id': join_nonempty(itag, fmt.get('isDrc') and 'drc'),
'url': fmt_url,
'width': fmt.get('width'),
# Format 22 is likely to be damaged: see https://github.com/yt-dlp/yt-dlp/issues/3372
'source_preference': ((-5 if itag == '22' else -1)
+ (100 if 'Premium' in name else 0)),
'quality': q(quality),
'language': join_nonempty(audio_track.get('id', '').split('.')[0],
'desc' if language_preference < -1 else '') or None,
'language_preference': language_preference,
# Strictly de-prioritize 3gp formats
'preference': -2 if itag == '17' else None,
}
mimetype = fmt.get('mimeType')
if mimetype:
mobj = re.match(
r'((?:[^/]+)/(?:[^;]+))(?:;\s*codecs="([^"]+)")?', mimetype)
if mobj:
dct['ext'] = mimetype2ext(mobj.group(1))
dct.update(parse_codecs(mobj.group(2)))
if itag:
itags[itag].add(('https', dct.get('language')))
self._unthrottle_format_urls(video_id, player_url, dct)
dct.update(traverse_obj(fmt, {
'asr': ('audioSampleRate', T(int_or_none)),
'filesize': ('contentLength', T(int_or_none)),
'format_note': ('qualityLabel', T(lambda x: x or quality)),
# for some formats, fps is wrongly returned as 1
'fps': ('fps', T(int_or_none), T(lambda f: f if f > 1 else None)),
'audio_channels': ('audioChannels', T(int_or_none)),
'height': ('height', T(int_or_none)),
'has_drm': ('drmFamilies', T(bool)),
'tbr': (('averageBitrate', 'bitrate'), T(lambda t: float_or_none(t, 1000))),
'width': ('width', T(int_or_none)),
'_duration_ms': ('approxDurationMs', T(int_or_none)),
}, get_all=False))
mime_mobj = re.match(
r'((?:[^/]+)/(?:[^;]+))(?:;\s*codecs="([^"]+)")?', fmt.get('mimeType') or '')
if mime_mobj:
dct['ext'] = mimetype2ext(mime_mobj.group(1))
dct.update(parse_codecs(mime_mobj.group(2)))
single_stream = 'none' in (dct.get(c) for c in ('acodec', 'vcodec'))
if single_stream and dct.get('ext'):
dct['container'] = dct['ext'] + '_dash'
@ -2069,32 +2138,62 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
formats.append(dct)
def process_manifest_format(f, proto, client_name, itag, all_formats=False):
key = (proto, f.get('language'))
if not all_formats and key in itags[itag]:
return False
itags[itag].add(key)
if itag:
f['format_id'] = (
'{0}-{1}'.format(itag, proto)
if all_formats or any(p != proto for p, _ in itags[itag])
else itag)
if f.get('source_preference') is None:
f['source_preference'] = -1
if itag in ('616', '235'):
f['format_note'] = join_nonempty(f.get('format_note'), 'Premium', delim=' ')
f['source_preference'] += 100
f['quality'] = q(traverse_obj(f, (
'format_id', T(lambda s: itag_qualities[s.split('-')[0]])), default=-1))
if try_call(lambda: f['fps'] <= 1):
del f['fps']
if proto == 'hls' and f.get('has_drm'):
f['has_drm'] = 'maybe'
f['source_preference'] -= 5
return True
hls_manifest_url = streaming_data.get('hlsManifestUrl')
if hls_manifest_url:
for f in self._extract_m3u8_formats(
hls_manifest_url, video_id, 'mp4', fatal=False):
itag = self._search_regex(
r'/itag/(\d+)', f['url'], 'itag', default=None)
if itag:
f['format_id'] = itag
formats.append(f)
if process_manifest_format(
f, 'hls', None, self._search_regex(
r'/itag/(\d+)', f['url'], 'itag', default=None)):
formats.append(f)
if self._downloader.params.get('youtube_include_dash_manifest', True):
dash_manifest_url = streaming_data.get('dashManifestUrl')
if dash_manifest_url:
for f in self._extract_mpd_formats(
dash_manifest_url, video_id, fatal=False):
itag = f['format_id']
if itag in itags:
continue
if itag in itag_qualities:
f['quality'] = q(itag_qualities[itag])
filesize = int_or_none(self._search_regex(
r'/clen/(\d+)', f.get('fragment_base_url')
or f['url'], 'file size', default=None))
if filesize:
f['filesize'] = filesize
formats.append(f)
if process_manifest_format(
f, 'dash', None, f['format_id']):
f['filesize'] = traverse_obj(f, (
('fragment_base_url', 'url'), T(lambda u: self._search_regex(
r'/clen/(\d+)', u, 'file size', default=None)),
T(int_or_none)), get_all=False)
formats.append(f)
playable_formats = [f for f in formats if not f.get('has_drm')]
if formats and not playable_formats:
# If there are no formats that definitely don't have DRM, all have DRM
self.report_drm(video_id)
formats[:] = playable_formats
if not formats:
if streaming_data.get('licenseInfos'):
@ -2166,6 +2265,17 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
video_details.get('lengthSeconds')
or microformat.get('lengthSeconds')) \
or parse_duration(search_meta('duration'))
for f in formats:
# Some formats may have much smaller duration than others (possibly damaged during encoding)
# but avoid false positives with small duration differences.
# Ref: https://github.com/yt-dlp/yt-dlp/issues/2823
if try_call(lambda x: float(x.pop('_duration_ms')) / duration < 500, args=(f,)):
self.report_warning(
'{0}: Some possibly damaged formats will be deprioritized'.format(video_id), only_once=True)
# Strictly de-prioritize damaged formats
f['preference'] = -10
is_live = video_details.get('isLive')
owner_profile_url = self._yt_urljoin(self._extract_author_var(
@ -2174,10 +2284,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
uploader = self._extract_author_var(
webpage, 'name', videodetails=video_details, metadata=microformat)
if not player_url:
player_url = self._extract_player_url(webpage)
self._unthrottle_format_urls(video_id, player_url, formats)
info = {
'id': video_id,
'title': self._live_title(video_title) if is_live else video_title,
@ -2370,6 +2476,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'like_count': str_to_int(like_count),
'dislike_count': str_to_int(dislike_count),
})
else:
info['like_count'] = traverse_obj(vpir, (
'videoActions', 'menuRenderer', 'topLevelButtons', Ellipsis,
'segmentedLikeDislikeButtonViewModel', 'likeButtonViewModel', 'likeButtonViewModel',
'toggleButtonViewModel', 'toggleButtonViewModel', 'defaultButtonViewModel',
'buttonViewModel', (('title', ('accessibilityText', T(lambda s: s.split()), Ellipsis))), T(parse_count)),
get_all=False)
vsir = content.get('videoSecondaryInfoRenderer')
if vsir:
rows = try_get(
@ -2484,7 +2598,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'playlist_mincount': 94,
'info_dict': {
'id': 'UCqj7Cz7revf5maW9g5pgNcg',
'title': 'Igor Kleiner - Playlists',
'title': r're:Igor Kleiner(?: Ph\.D\.)? - Playlists',
'description': 'md5:be97ee0f14ee314f1f002cf187166ee2',
'uploader': 'Igor Kleiner',
'uploader_id': '@IgorDataScience',
@ -2495,7 +2609,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'playlist_mincount': 94,
'info_dict': {
'id': 'UCqj7Cz7revf5maW9g5pgNcg',
'title': 'Igor Kleiner - Playlists',
'title': r're:Igor Kleiner(?: Ph\.D\.)? - Playlists',
'description': 'md5:be97ee0f14ee314f1f002cf187166ee2',
'uploader': 'Igor Kleiner',
'uploader_id': '@IgorDataScience',
@ -2607,7 +2721,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
'url': 'https://www.youtube.com/channel/UCKfVa3S1e4PHvxWcwyMMg8w/channels',
'info_dict': {
'id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
'title': 'lex will - Channels',
'title': r're:lex will - (?:Home|Channels)',
'description': 'md5:2163c5d0ff54ed5f598d6a7e6211e488',
'uploader': 'lex will',
'uploader_id': '@lexwill718',


@ -74,8 +74,11 @@ class FFmpegPostProcessor(PostProcessor):
return FFmpegPostProcessor(downloader)._versions
def _determine_executables(self):
programs = ['avprobe', 'avconv', 'ffmpeg', 'ffprobe']
# ordered to match prefer_ffmpeg!
convs = ['ffmpeg', 'avconv']
probes = ['ffprobe', 'avprobe']
prefer_ffmpeg = True
programs = convs + probes
def get_ffmpeg_version(path):
ver = get_exe_version(path, args=['-version'])
@ -96,6 +99,7 @@ class FFmpegPostProcessor(PostProcessor):
self._paths = None
self._versions = None
location = None
if self._downloader:
prefer_ffmpeg = self._downloader.params.get('prefer_ffmpeg', True)
location = self._downloader.params.get('ffmpeg_location')
@ -118,33 +122,21 @@ class FFmpegPostProcessor(PostProcessor):
location = os.path.dirname(os.path.abspath(location))
if basename in ('ffmpeg', 'ffprobe'):
prefer_ffmpeg = True
self._paths = dict(
(p, p if location is None else os.path.join(location, p))
for p in programs)
self._versions = dict(
x for x in (
(p, get_ffmpeg_version(self._paths[p])) for p in programs)
if x[1] is not None)
self._paths = dict(
(p, os.path.join(location, p)) for p in programs)
self._versions = dict(
(p, get_ffmpeg_version(self._paths[p])) for p in programs)
if self._versions is None:
self._versions = dict(
(p, get_ffmpeg_version(p)) for p in programs)
self._paths = dict((p, p) for p in programs)
if prefer_ffmpeg is False:
prefs = ('avconv', 'ffmpeg')
else:
prefs = ('ffmpeg', 'avconv')
for p in prefs:
if self._versions[p]:
self.basename = p
break
if prefer_ffmpeg is False:
prefs = ('avprobe', 'ffprobe')
else:
prefs = ('ffprobe', 'avprobe')
for p in prefs:
if self._versions[p]:
self.probe_basename = p
break
basenames = [None, None]
for i, progs in enumerate((convs, probes)):
for p in progs[::-1 if prefer_ffmpeg is False else 1]:
if self._versions.get(p):
basenames[i] = p
break
self.basename, self.probe_basename = basenames
@property
def available(self):
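The consolidated selection loop at the end of `_determine_executables` can be exercised on its own. A minimal sketch, assuming `versions` is a plain dict that maps program names to version strings or `None`; `pick_basenames` is a hypothetical standalone helper, not part of the postprocessor.

```python
def pick_basenames(versions, prefer_ffmpeg=True):
    """Hypothetical helper mirroring the selection loop in the hunk above."""
    # Ordered to match prefer_ffmpeg, as in the source.
    convs = ['ffmpeg', 'avconv']
    probes = ['ffprobe', 'avprobe']
    basenames = [None, None]
    for i, progs in enumerate((convs, probes)):
        # Search in reverse only when prefer_ffmpeg is explicitly False.
        for p in progs[::-1 if prefer_ffmpeg is False else 1]:
            # .get() tolerates programs that were never probed.
            if versions.get(p):
                basenames[i] = p
                break
    return tuple(basenames)
```

Using `.get()` here matters because a program can be absent from the mapping altogether, in which case indexing with `[p]` would raise `KeyError`.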


@ -45,6 +45,7 @@ from .compat import (
compat_casefold,
compat_chr,
compat_collections_abc,
compat_contextlib_suppress,
compat_cookiejar,
compat_ctypes_WINFUNCTYPE,
compat_datetime_timedelta_total_seconds,
@ -1855,25 +1856,18 @@ def write_json_file(obj, fn):
try:
with tf:
json.dump(obj, tf)
if sys.platform == 'win32':
# Need to remove existing file on Windows, else os.rename raises
# WindowsError or FileExistsError.
try:
with compat_contextlib_suppress(OSError):
if sys.platform == 'win32':
# Need to remove existing file on Windows, else os.rename raises
# WindowsError or FileExistsError.
os.unlink(fn)
except OSError:
pass
try:
mask = os.umask(0)
os.umask(mask)
os.chmod(tf.name, 0o666 & ~mask)
except OSError:
pass
os.rename(tf.name, fn)
except Exception:
try:
with compat_contextlib_suppress(OSError):
os.remove(tf.name)
except OSError:
pass
raise
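The `try: ... except OSError: pass` blocks replaced in this hunk behave like a context manager that swallows the listed exceptions. A minimal sketch of that idea, assuming the compat shim follows the semantics of the standard `contextlib.suppress`; the `suppress` class and `remove_quietly` below are illustrative stand-ins, not the code in `compat.py`.

```python
import os


class suppress(object):
    """Minimal stand-in for contextlib.suppress (assumed shim behaviour)."""

    def __init__(self, *exceptions):
        self._exceptions = exceptions

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Returning True swallows the exception; False lets it propagate.
        return exc_type is not None and issubclass(exc_type, self._exceptions)


def remove_quietly(path):
    """Hypothetical helper: delete path, ignoring OSError such as a missing file."""
    with suppress(OSError):
        os.remove(path)
        return True
    # Reached only when os.remove() raised and the error was suppressed.
    return False
```

Exceptions not named in the constructor still propagate, so the body keeps failing loudly for unexpected errors.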
@ -2033,14 +2027,13 @@ def extract_attributes(html_element):
NB HTMLParser is stricter in Python 2.6 & 3.2 than in later versions,
but the cases in the unit test will work for all of 2.6, 2.7, 3.2-3.5.
"""
parser = HTMLAttributeParser()
try:
parser.feed(html_element)
parser.close()
# Older Python may throw HTMLParseError in case of malformed HTML
except compat_HTMLParseError:
pass
return parser.attrs
ret = None
# Older Python may throw HTMLParseError in case of malformed HTML (and on .close()!)
with compat_contextlib_suppress(compat_HTMLParseError):
with contextlib.closing(HTMLAttributeParser()) as parser:
parser.feed(html_element)
ret = parser.attrs
return ret or {}
def clean_html(html):
@ -2182,8 +2175,28 @@ def sanitize_url(url):
return url
def extract_basic_auth(url):
parts = compat_urllib_parse.urlsplit(url)
if parts.username is None:
return url, None
url = compat_urllib_parse.urlunsplit(parts._replace(netloc=(
parts.hostname if parts.port is None
else '%s:%d' % (parts.hostname, parts.port))))
auth_payload = base64.b64encode(
('%s:%s' % (parts.username, parts.password or '')).encode('utf-8'))
return url, 'Basic {0}'.format(auth_payload.decode('ascii'))
def sanitized_Request(url, *args, **kwargs):
return compat_urllib_request.Request(escape_url(sanitize_url(url)), *args, **kwargs)
url, auth_header = extract_basic_auth(escape_url(sanitize_url(url)))
if auth_header is not None:
headers = args[1] if len(args) > 1 else kwargs.get('headers')
headers = headers or {}
headers['Authorization'] = auth_header
if len(args) <= 1 and kwargs.get('headers') is None:
kwargs['headers'] = headers
kwargs = compat_kwargs(kwargs)
return compat_urllib_request.Request(url, *args, **kwargs)
def expand_path(s):
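The effect of the new `extract_basic_auth` can be seen on a concrete URL. A minimal sketch, assuming Python 3 and using `urllib.parse` directly where the source uses `compat_urllib_parse`; the example URLs are invented.

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def extract_basic_auth(url):
    """Split userinfo out of url; return (clean_url, Authorization value or None)."""
    parts = urlsplit(url)
    if parts.username is None:
        return url, None
    # Rebuild the netloc without the user:password@ prefix.
    netloc = (parts.hostname if parts.port is None
              else '%s:%d' % (parts.hostname, parts.port))
    clean_url = urlunsplit(parts._replace(netloc=netloc))
    payload = base64.b64encode(
        ('%s:%s' % (parts.username, parts.password or '')).encode('utf-8'))
    return clean_url, 'Basic {0}'.format(payload.decode('ascii'))
```

For `http://user:pass@example.com/path` this yields the URL without credentials plus the header value `Basic dXNlcjpwYXNz`, which `sanitized_Request` then places in the `Authorization` header.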
@ -2221,7 +2234,8 @@ def _htmlentity_transform(entity_with_semicolon):
numstr = '0%s' % numstr
else:
base = 10
# See https://github.com/ytdl-org/youtube-dl/issues/7518
# See https://github.com/ytdl-org/youtube-dl/issues/7518
# Also, weirdly, compat_contextlib_suppress fails here in 2.6
try:
return compat_chr(int(numstr, base))
except ValueError:
@ -2328,11 +2342,9 @@ def make_HTTPS_handler(params, **kwargs):
# Some servers may (wrongly) reject requests if ALPN extension is not sent. See:
# https://github.com/python/cpython/issues/85140
# https://github.com/yt-dlp/yt-dlp/issues/3878
try:
with compat_contextlib_suppress(AttributeError, NotImplementedError):
# fails for Python < 2.7.10, not ssl.HAS_ALPN
ctx.set_alpn_protocols(ALPN_PROTOCOLS)
except (AttributeError, NotImplementedError):
# Python < 2.7.10, not ssl.HAS_ALPN
pass
opts_no_check_certificate = params.get('nocheckcertificate', False)
if hasattr(ssl, 'create_default_context'): # Python >= 3.4 or 2.7.9
@ -2342,12 +2354,10 @@ def make_HTTPS_handler(params, **kwargs):
context.check_hostname = False
context.verify_mode = ssl.CERT_NONE
try:
with compat_contextlib_suppress(TypeError):
# Fails with Python 2.7.8 (create_default_context present
# but HTTPSHandler has no context=)
return YoutubeDLHTTPSHandler(params, context=context, **kwargs)
except TypeError:
# Python 2.7.8
# (create_default_context present but HTTPSHandler has no context=)
pass
if sys.version_info < (3, 2):
return YoutubeDLHTTPSHandler(params, **kwargs)
@ -2361,15 +2371,24 @@ def make_HTTPS_handler(params, **kwargs):
return YoutubeDLHTTPSHandler(params, context=context, **kwargs)
def bug_reports_message():
def bug_reports_message(before=';'):
if ytdl_is_updateable():
update_cmd = 'type youtube-dl -U to update'
else:
update_cmd = 'see https://yt-dl.org/update on how to update'
msg = '; please report this issue on https://yt-dl.org/bug .'
msg += ' Make sure you are using the latest version; %s.' % update_cmd
msg += ' Be sure to call youtube-dl with the --verbose flag and include its complete output.'
return msg
update_cmd = 'see https://github.com/ytdl-org/youtube-dl/#user-content-installation on how to update'
msg = (
'please report this issue on https://github.com/ytdl-org/youtube-dl/issues ,'
' using the appropriate issue template.'
' Make sure you are using the latest version; %s.'
' Be sure to call youtube-dl with the --verbose option and include the complete output.'
) % update_cmd
before = (before or '').rstrip()
if not before or before.endswith(('.', '!', '?')):
msg = msg[0].title() + msg[1:]
return (before + ' ' if before else '') + msg
class YoutubeDLError(Exception):
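The new `before` parameter decides whether the hint continues a sentence or starts a new one. A minimal sketch of only that joining logic; `join_report_hint` and its shortened default message are hypothetical, and the update-command text is left out.

```python
def join_report_hint(before=';', msg='please report this issue.'):
    """Hypothetical helper: join `before` and `msg` as the new function does."""
    before = (before or '').rstrip()
    # Capitalize when there is no prefix or the prefix already ends a sentence.
    if not before or before.endswith(('.', '!', '?')):
        msg = msg[0].title() + msg[1:]
    return (before + ' ' if before else '') + msg
```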
@ -3156,12 +3175,10 @@ def parse_iso8601(date_str, delimiter='T', timezone=None):
if timezone is None:
timezone, date_str = extract_timezone(date_str)
try:
with compat_contextlib_suppress(ValueError):
date_format = '%Y-%m-%d{0}%H:%M:%S'.format(delimiter)
dt = datetime.datetime.strptime(date_str, date_format) - timezone
return calendar.timegm(dt.timetuple())
except ValueError:
pass
def date_formats(day_first=True):
@ -3181,17 +3198,13 @@ def unified_strdate(date_str, day_first=True):
_, date_str = extract_timezone(date_str)
for expression in date_formats(day_first):
try:
with compat_contextlib_suppress(ValueError):
upload_date = datetime.datetime.strptime(date_str, expression).strftime('%Y%m%d')
except ValueError:
pass
if upload_date is None:
timetuple = email.utils.parsedate_tz(date_str)
if timetuple:
try:
with compat_contextlib_suppress(ValueError):
upload_date = datetime.datetime(*timetuple[:6]).strftime('%Y%m%d')
except ValueError:
pass
if upload_date is not None:
return compat_str(upload_date)
@ -3220,11 +3233,9 @@ def unified_timestamp(date_str, day_first=True):
date_str = m.group(1)
for expression in date_formats(day_first):
try:
with compat_contextlib_suppress(ValueError):
dt = datetime.datetime.strptime(date_str, expression) - timezone + datetime.timedelta(hours=pm_delta)
return calendar.timegm(dt.timetuple())
except ValueError:
pass
timetuple = email.utils.parsedate_tz(date_str)
if timetuple:
return calendar.timegm(timetuple) + pm_delta * 3600 - compat_datetime_timedelta_total_seconds(timezone)
@ -3832,14 +3843,15 @@ class PUTRequest(compat_urllib_request.Request):
return 'PUT'
def int_or_none(v, scale=1, default=None, get_attr=None, invscale=1):
def int_or_none(v, scale=1, default=None, get_attr=None, invscale=1, base=None):
if get_attr:
if v is not None:
v = getattr(v, get_attr, None)
if v in (None, ''):
return default
try:
return int(v) * invscale // scale
# like int, raise if base is specified and v is not a string
return (int(v) if base is None else int(v, base=base)) * invscale // scale
except (ValueError, TypeError, OverflowError):
return default
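The new `base` argument follows the rules of the built-in `int`: a base is only accepted together with a string, and any failure falls back to `default`. The block below is a re-indented reconstruction of the new side of this hunk so that the behaviour can be tried directly; the sample values are invented.

```python
def int_or_none(v, scale=1, default=None, get_attr=None, invscale=1, base=None):
    if get_attr:
        if v is not None:
            v = getattr(v, get_attr, None)
    if v in (None, ''):
        return default
    try:
        # like int, raise if base is specified and v is not a string
        return (int(v) if base is None else int(v, base=base)) * invscale // scale
    except (ValueError, TypeError, OverflowError):
        # int(255, base=16) raises TypeError, which ends up here
        return default
```

So `int_or_none('ff', base=16)` gives `255`, while `int_or_none(255, base=16)` gives `None` because the non-string input is rejected by `int` and the `TypeError` is caught.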