Compare commits

...

352 Commits

Author SHA1 Message Date
dirkf 668332b973 [YouPorn] Add playlist extractors
* YouPornCategoryIE
* YouPornChannelIE
* YouPornCollectionIE
* YouPornStarIE
* YouPornTagIE
* YouPornVideosIE,
2024-04-22 01:34:26 +01:00
dirkf 0b2ce3685e [YouPorn] Improve extraction
* detect unwatchable videos
* improve duration extraction
* fix count extraction and support large values
* detect and remove SEO spam boilerplate description
2024-04-22 01:34:26 +01:00
dirkf c2766cb80e [test/test_download] Support 'playlist_maxcount:count' expected value
* parallel to `playlist_mincount'
* specify both for a range of playlist lengths
* if max < min the test will always fail!
2024-04-22 01:34:26 +01:00
dirkf eb38665438 [YouPorn] Incorporate yt-dlp PR 8827
* from https://github.com/yt-dlp/yt-dlp/pull/8827
* extract from webpage instead of broken API URL
* thx The-MAGI
2024-04-22 01:34:26 +01:00
dirkf e0727e4ab6 [postprocessor/ffmpeg] Fix finding ffprobe (bug in 21792b8)
Fixes 21792b88b7 (commitcomment-140705274), thx: vonProteus
2024-04-07 15:33:30 +01:00
Ori Avtalion 4ea59c6107
[utils] Fix crash in _report_ignoring_subs from c58b655 (#32762)
Align `utils.bug_reports_message()` with yt-dlp https://github.com/yt-dlp/yt-dlp/commit/5873d4ccdd, thanks fstirlitz

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2024-04-05 15:25:29 +01:00
dirkf 21792b88b7 [external/FFmpeg] Fix and improve --ffmpeg-location handling
* pass YoutubeDL (FileDownloader) to FFmpegPostProcessor constructor
* consolidate path search in FFmpegPostProcessor
* make availability of FFmpegFD depend on existence of FFmpegPostProcessor
* detect ffmpeg executable on instantiation of FFmpegFD
* resolves #32735
2024-03-27 13:11:17 +00:00
dirkf d8f134a664 [downloader/external] Fix "Resource Warning" in downloader test
* add compat_subprocess_Popen context manager
* apply context manager in FFmpegFD._call_downloader()
2024-03-27 13:11:17 +00:00
dirkf 31a15a7c8d [compat] Simplify/fix compat_html_parser_HTMLParseError 2024-03-27 13:11:17 +00:00
dirkf 19dc10b986 [utils] Apply compat_contextlib_suppress 2024-03-27 13:11:17 +00:00
dirkf 182f63e82a [compat] Add compat_contextlib_suppress
with compat_contextlib_suppress(*Exceptions):
    # code that fails silently for any of Exceptions
2024-03-27 13:11:17 +00:00
gy-chen 71211e7db7
[Youtube] Fix unwanted private method __ie_msg in f8b0135850
Fixes `AttributeError no attribute '_YoutubeIE__ie_msg'` if unable to decode n-parameter
2024-03-23 15:30:13 +00:00
Zizheng Guo a96a45b2cd
[Vimeo] Improve `config` extraction (#32742)
* update for more robust json parsing
2024-03-12 11:44:13 +00:00
hatsomatt 820fae3b3a [Videa] Fix extraction
* update API URL
* from https://github.com/yt-dlp/yt-dlp/pull/8003
* thanks to the authors!

Closes yt-dlp/7427
Authored by: hatsomatt, aky-01
2024-03-08 13:14:52 +00:00
dirkf aef24d97e9 [Videa] Align with yt-dlp 2024-03-08 13:14:52 +00:00
dirkf f7b30e3f73 [XFileShare] Update extractor for 2024
* simplify aa_decode()
* review and update supported sites and tests
* in above, include FileMoon.sx, and remove separate module
* incorporate changes from yt-dlp
* allow for decoding multiple scripts (eg, FileMoon)
* use new JWPlayer extraction
2024-03-08 13:03:42 +00:00
dirkf f66372403f [InfoExtractor] Rework and improve JWPlayer extraction
* use traverse_obj() and _search_json()
* support playlist `.load({**video1},{**video2}, ...)`
* support transform_source=... for _extract_jwplayer_data()
2024-03-08 13:03:42 +00:00
dirkf 7216fa2ac4 [InfoExtractor] Add `_search_json()`
* uses the error diagnostic to truncate the JSON string
* may be confused by non-C-Pythons
2024-03-08 13:03:42 +00:00
dirkf acc383b9e3 [utils] Let int_or_none() accept a base, like int() 2024-03-08 13:03:42 +00:00
Hubert Hirtz f0812d7848
[utils] Handle user:pass in URLs (#28801)
* Handle user:pass in URLs

Fixes "nonnumeric port" errors when youtube-dl is given URLs with
usernames and passwords such as:

    http://username:password@example.com/myvideo.mp4

Refs:
- https://en.wikipedia.org/wiki/Basic_access_authentication
- https://tools.ietf.org/html/rfc1738#section-3.1
- https://docs.python.org/3.8/library/urllib.parse.html#urllib.parse.urlsplit

Fixes #18276 (point 4)
Fixes #20258
Fixes #26211 (see comment)

* Align code with yt-dlp

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2024-03-04 01:27:55 +00:00
Aaron Tan 40bd5c1815
[caffeine.tv] Add new extractor (#32514)
* Add CaffeineTVIE info extractor to support site caffeine.tv

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2024-02-22 12:54:07 +00:00
dirkf 70f230f9cf
[GBNews]Add new extractor for GB News TV channel (#29432)
* Add extractor for GB News TV channel

* Support more GBNews URL formats
Allow alphanumeric and _ in place of `shows`, which redirect to site's preferred URL

* Update for 2024
2024-02-22 12:44:00 +00:00
dirkf 48ddab1f3a
[downloader/external] Fix WgetFD proxy (rev 2)
From PR (defunct source), closes #29343.
Matches https://github.com/yt-dlp/yt-dlp/pull/3152
Thx former user kikuyan.
2024-02-21 16:29:08 +00:00
dirkf 7687389f08 [Vbox7] Improve extraction, adding features from yt-dlp PR #9100
* changes from https://github.com/yt-dlp/yt-dlp/pull/9100 (thx
seproDev):
  - attempt HLS extraction
  - re-enable XFF
  - test `view_count`, `duration` extraction
* improve commenting, error checks
2024-02-19 00:53:22 +00:00
dirkf 4416f82c80 [Vbox7IE] Sanitise ld+json containing unexpected characters
* based on PR #29680
* added hack to force invoking `transform_source`
* fixes #26218
2024-02-02 12:36:05 +00:00
dirkf bdda6b81df [Vbox7IE] Improve extraction
* DASH extraction no longer fails with new range support
* but always find combined formats if available
* suppress ineffective XFF geo-bypass (causes time-outs)
* adapted from https://github.com/ytdl-org/youtube-dl/pull/29680
* thx former GH user kikuyan
2024-02-02 12:36:05 +00:00
dirkf 1fd8f802b8 [InfoExtractor] Correctly resolve BaseURL in DASH manifest
Specs:
* ISO/IEC 23009-1:2012 section 5.6
* RFC 3986 section 5.
2024-02-02 12:36:05 +00:00
dirkf 4eaeb9b2c6 [InfoExtractor] Support byte range for DASH
* adapted from https://github.com/ytdl-org/youtube-dl/pull/30279
* thx former GH user kikuyan
2024-02-02 12:36:05 +00:00
dirkf bec9180e89 [downloader/dash] Support `range` in fragment (format f'{start}-{end}')
* adapted from https://github.com/ytdl-org/youtube-dl/pull/30279
 * thx former GH user kikuyan
2024-02-02 12:36:05 +00:00
dirkf c58b655a9e [InfoExtractor] Support DASH subtitle extraction (yt-dlp back-port) 2024-02-02 12:36:05 +00:00
dirkf dc512e3a8a [YouTube] Fix `like_count` extraction using `likeButtonViewModel`
* also fix various tests
* TODO: check against yt-dlp tests
2024-01-22 11:10:34 +00:00
dirkf f8b0135850 [YouTube] Rework n-sig processing, realigning with yt-dlp
* apply n-sig before chunked fragments, fixes #32692
2024-01-22 11:10:34 +00:00
dirkf 640d39f03a [InfoExtractor] Support some warning and `._downloader` shortcut methods from yt-dlp 2024-01-22 11:10:34 +00:00
dirkf 6651871416 [compat] Rework compat for `method` parameter of `compat_urllib_request.Request` constructor
* fixes #32573
* does not break `utils.HEADrequest` (eg)
2024-01-22 11:10:34 +00:00
mk-pmb be008e657d [core] Fix format string injection for metadata JSON filename message. 2023-12-06 02:45:41 +00:00
Robotix b1bbc1e502
[Epidemic Sound] Add new extractor (#32628)
* Add simple extractor
* Support separate tracks
* Use index as id instead of slug

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-12-06 01:17:57 +00:00
dirkf 55a442adae
[Imgur] Overhaul extractor module (#32612)
Revise extractors for new API and page formats
2023-12-05 20:02:30 +00:00
mimvahedi c62936a5f2
[telewebion] Fix extraction (#32634)
* [telewebion] fix extraction

Resolves https://github.com/ytdl-org/youtube-dl/issues/5135#issuecomment-932952119

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-12-02 15:25:09 +00:00
dirkf 427472351c [utils] Make restricted filenames ignore characters in Unicode categories Mark, Other
Resolves #32629
2023-11-29 22:08:01 +00:00
dirkf c6538ed323 [workflows/ci.yml] Use setup-python for now released Python 3.12 2023-11-29 22:08:01 +00:00
dirkf 8d227cb97b [workflows/ci.yml] Actually use default values for push and pull_request 2023-11-29 22:08:01 +00:00
dirkf 4e115e18cb [workflows/ci.yml] Run apt-get update before installing 2023-11-29 22:08:01 +00:00
ReenigneArcher b7fca0fab3 [Youtube] Update consent cookie handling to match site
Apologies for force push!
[skip ci]
2023-11-29 21:43:02 +00:00
dirkf 00ef748cc0 [downloader] Fix baa6c5e: show ETA of http download as ETA instead of total d/l time 2023-09-24 22:07:47 +01:00
dirkf 66ab0814c4 [utils] Revert bbd3e7e, updating docstring, test instead 2023-09-03 23:15:19 +01:00
dirkf bbd3e7e999 [utils] Properly handle list values in update_url()
An actual list value in a query update could have been treated
as a list of values because of the key:list parse_qs format.
2023-09-03 01:18:22 +01:00
dirkf 21caaf2380 [test] Remove redundancy from lambda expected value regex 2023-09-03 01:13:40 +01:00
dirkf 31f50c8194 [S4C] Add thumbnail extraction, extract series as playlist
Based on https://github.com/yt-dlp/yt-dlp/pull/7776: thx ifan-t, bashonly
2023-08-31 23:16:50 +01:00
dirkf 7d58f0769a
[ci.yml] Improve conditions for nosetest installations 2023-08-31 17:16:47 +01:00
dirkf 86e3cf5e58 [S4C] Add extractor for Sianel Pedwar Cymru
* from https://github.com/yt-dlp/yt-dlp/pull/7730, thx ifan-t, bashonly
2023-08-04 22:54:12 +01:00
dirkf 2efc8de4d2 [utils] Advertise optional supported `Content-Encoding`s 2023-08-01 01:05:09 +01:00
dirkf e4178b5af3 [utils] Add and use `filter_dict()` from yt-dlp 2023-08-01 01:05:09 +01:00
dirkf 2d2a4bc832 [utils] Revise `isinstance()` tests (especially for str/unicode/bytes) to complete Linter fix 2023-08-01 01:05:09 +01:00
dirkf 7d965e6b65 [utils] Avoid comparing `type(var)`, etc, to pass new Linter rules 2023-08-01 01:05:09 +01:00
dirkf abef53466d [utils] Rework URL path munging for ., .. components
* move processing to YoutubeDLHandler
* also process `Location` header for redirect
* use tests from https://github.com/yt-dlp/yt-dlp/pull/7662
2023-07-29 14:27:26 +01:00
dirkf e7926ae9f4 [utils] Rework decoding of `Content-Encoding`s
* support nested encodings
* support optional `br` encoding, if brotli package is installed
* support optional 'compress' encoding, if ncompress package is installed
* response `Content-Encoding` has only unprocessed encodings, or removed
* response `Content-Length` is decoded length (usable for filesize metadata)
* use zlib for both deflate and gzip decompression
* some elements taken from yt-dlp: thx especially coletdjnz
2023-07-29 14:27:26 +01:00
dirkf 87e578c9b8 [workflows/ci.yml] Update to setup-java@v3
* avoid Node 12 deprecation
2023-07-29 14:27:26 +01:00
dirkf 0861812d72 [build] Fix typo in `devscripts/fish-completion.py` (fix 2285605) 2023-07-25 15:11:15 +01:00
dirkf b870181229 [build] Extend use of `devscripts/utils` 2023-07-25 13:19:43 +01:00
dirkf a25e9f3c84 [compat] Use `compat_open()` 2023-07-25 13:19:43 +01:00
dirkf aac33155e4 [build] Add and use `devscripts/utils` 2023-07-25 13:19:43 +01:00
dirkf 2b7dd3b2a2 [utils] Fix update_Request() with empty data (not None) 2023-07-25 13:19:43 +01:00
dirkf 44faa71b19 [test/test_execution.py] Use `compat_subprocess_get_DEVNULL()` 2023-07-25 13:19:43 +01:00
dirkf 7bce2ad441 [build] Fix various Jython CI and test issues 2023-07-25 13:19:43 +01:00
dirkf ca71e56c48 [workflows/ci.yml] Build 3.12 with pyenv 2023-07-25 13:19:43 +01:00
dirkf 2a4e9faa77 [doc] Update developer guidance
* mention pynose
* mention traverse_obj and add/revise examples

[skip ci]
2023-07-25 13:19:43 +01:00
dirkf 74eef6bb5e [workflows/ci.yml] Extend Python versions
* add 3.10 - 3.12
* use https://pypi.org/project/pynose/ for Py >= 3.9
* test Windows with 3.4
* set defaults (main, both) except push: (all, core)
2023-07-25 13:19:43 +01:00
dirkf 1fa8b86f0b [utils] Remove stray undocumented Host header in redirect (fix 46fde7c) 2023-07-20 05:29:59 +01:00
dirkf b2ba24bb02 [InfoExtractor] Add `_match_valid_url()` class method and refactor
* API compatible with yt-dlp
* also support Sequence of patterns in _VALID_URL
* one place to compile _VALID_URL
* TODO: remove existing extractor shims
2023-07-19 22:14:50 +01:00
dirkf a190b55964 [utils] Fix broken Py 3.11+ compat in `traverse_obj()`
* inspect.getargspec is missing despite doc claiming backward compat
* replace with emulation of `Signature.bind()`
2023-07-19 22:14:50 +01:00
dirkf b2741f2654 [InfoExtractor] Add search methods for Next/Nuxt.js from yt-dlp
* add _search_nextjs_data(), from https://github.com/yt-dlp/yt-dlp/pull/1386
  thanks selfisekai
* add _search_nuxt_data(), from https://github.com/yt-dlp/yt-dlp/pull/1921,
  thanks Lesmiscore, pukkandan
* add tests for the above
* also fix HTML5 type recognition and tests, from
  222a230871,
  thanks Lesmiscore
* update extractors in PR using above, fix tests.
2023-07-19 22:14:50 +01:00
dirkf 8465222041 [Clipchamp] Add new extractor back-ported from yt-dlp 2023-07-19 22:14:50 +01:00
dirkf 4339910df3 [DLF] Add site extractors back-ported from yt-dlp
* from https://github.com/yt-dlp/yt-dlp/pull/6697, thanks nick-cd
2023-07-19 22:14:50 +01:00
dirkf eaaf4c6736 [Whyp] Add extractor back-ported from yt-dlp
* from https://github.com/yt-dlp/yt-dlp/pull/6803, thanks CoryTibbettsDev
2023-07-19 22:14:50 +01:00
dirkf 4566e6e53e [GlobalPlayer] Add site extractors back-ported from yt-dlp
* from https://github.com/yt-dlp/yt-dlp/pull/6903, thanks garret1317
2023-07-19 22:14:50 +01:00
dirkf 1e8ccdd2eb [InfoExtractor] Support groups in _`search_regex()`, etc 2023-07-19 22:14:50 +01:00
dirkf cb9366eda5 [utils] Minor updates (merge_dicts, T)
A couple of mods to ease yt-dlp back-ports:
* add kwargs to merge_dicts:
  `unblank=True` (disallow empty string), `rev=False` (reverse the merge list)
* add `T(x)` shortcut for `{x}`, unsupported in Py2.6
2023-07-19 22:14:50 +01:00
dirkf d9d07a9581 [utils] Improve js_to_json, align with yt-dlp
* support variable substitution, from https://github.com/yt-dlp/yt-dlp/pull/#521 etc,
  thanks ChillingPepper, Grub4k, pukkandan
* improve escape handling, from https://github.com/yt-dlp/yt-dlp/pull/#521
  thanks Grub4k
* support template strings from https://github.com/yt-dlp/yt-dlp/pull/6623
  thanks Grub4k
* add limited `!` evaluation (eg, !!0 -> false, see tests)
2023-07-19 22:14:50 +01:00
dirkf 825a40744b [utils] Align traverse_obj() with yt-dlp
Thanks Grub4k for these:
* traverse `Iterable`s, from https://github.com/yt-dlp/yt-dlp/pull/6902, etc
* traverse `set` key for transformations/filters, `re.Match` group names, from
  776995bc10, etc
* traverse `re.Match`es, from https://github.com/yt-dlp/yt-dlp/pull/5174
* always return list when branching, from https://github.com/yt-dlp/yt-dlp/pull/5170
2023-07-19 22:14:50 +01:00
dirkf 47214e46d8 [compat] Fix old Pythons broken loading of valueless cookie attributes
Cookie string parsing in Py 2.6.9, probably earlier, requires `=`.
Also 3.2, though the CPython code appears to be OK: 3.1 was also wrong.
2023-07-18 10:50:46 +01:00
dirkf 1d8d5a93f7 [test] Fixes for old Pythons 2023-07-18 10:50:46 +01:00
dirkf 1634b1d61e [doc] Warn against setting cookies with --add-header 2023-07-18 10:50:46 +01:00
bashonly 21438a4194 [downloader/external] Fix cookie support 2023-07-18 10:50:46 +01:00
Simon Sawicki 8334ec961b [core] Process header cookies on loading 2023-07-18 10:50:46 +01:00
bashonly 3801d36416 [utils] `YoutubeDLCookieJar`: Add `get_cookie_header` and `get_cookies_for_url` methods 2023-07-18 10:50:46 +01:00
dirkf b383be9887 [core] Remove `Cookie` header on redirect to prevent leaks
Adated from yt-dlp/yt-dlp-ghsa-v8mc-9377-rwjj/pull/1/commits/101caac
Thx coletdjnz
2023-07-18 10:50:46 +01:00
dirkf 46fde7caee [core] Update redirect handling from yt-dlp
* Thx coletdjnz: https://github.com/yt-dlp/yt-dlp/pull/7094
* add test that redirected `POST` loses its `Content-Type`
2023-07-18 10:50:46 +01:00
dirkf 648dc5304c [compat] Add Request and HTTPClient compat for redirect
* support `method` parameter of `Request.__init__`  (Py 2 and old Py 3)
* support `getcode` method of compat_http_client.HTTPResponse (Py 2)
2023-07-18 10:50:46 +01:00
dirkf 1720c04dc5 [test] Make skipped tests in test_execution work with Py 2.6 2023-07-18 10:50:46 +01:00
dirkf d5ef405c5d [core] Align error reporting methods with yt-dlp 2023-07-18 10:50:46 +01:00
dirkf f47fdb9564 [utils] Add {expected_type} and Iterable support to traverse_obj() 2023-07-18 10:50:46 +01:00
dirkf b6dff4073d [core] Revert version display from b8a86dc 2023-07-18 10:50:46 +01:00
dirkf f24bc9272e [Misc] Fixes for 2.6 compatibility 2023-07-05 22:58:54 +01:00
dirkf b08a580906 [workflows/ci.yml] Fix test support for Py 2.6 2023-07-05 22:58:09 +01:00
dirkf 2500300c2a [workflows/ci.yml] Restore test support for Py 3.2 2023-07-05 22:51:15 +01:00
dirkf 58fc5bde47 [workflows/ci.yml] Restore test support for Py 3.3, 3.4, and add 2.6 2023-06-23 00:15:06 +01:00
dirkf fa7f0effbe [YouTube] Avoid crash in author extraction 2023-06-22 23:14:21 +01:00
dirkf ebdc82c586 [workflows/ci.yml] Replace actions/setup-python for legacy Pythons
Thanks MatteoH2O1999: https://github.com/MatteoH2O1999/setup-python
2023-06-22 23:12:22 +01:00
pukkandan 9112e668a5 [YouTube] Improve nsig function name extraction
Fixes player b7910ca8, using `,` vs `;`
See https://github.com/ytdl-org/youtube-dl/issues/32292#issuecomment-1602231170

Co-authored-by: dirkf
2023-06-22 16:46:53 +01:00
dirkf 07af47960f [YouTube] Improve fix for ae8ba2c
Thx: https://github.com/yt-dlp/yt-dlp/commit/01aba25
2023-06-18 00:52:18 +01:00
dirkf ae8ba2c319 [YouTube] Fix `KeyError QV` in signature extraction failed
* temporarily force missing global definition into sig JS
* improve test: thanks https://github.com/yt-dlp/yt-dlp/issues/7327#issuecomment-1595274615
* resolves #32314
2023-06-17 15:55:19 +01:00
dirkf d6433cbb2c [jsinterp] Don't find unrelated objects 2023-06-17 15:46:12 +01:00
dirkf ff75c300f5 [jsinterp] Fix test for failed match in extract_object() 2023-06-17 15:34:11 +01:00
dirkf a2534f7b88 [jsinterp] Fix div bug breaking player 8c7583ff
Thx bashonly: https://github.com/ytdl-org/youtube-dl/issues/32292#issuecomment-1585639223
Fixes #32292
2023-06-11 17:23:00 +01:00
dirkf b8a86dcf1a [core] Revise 1f7c6f8 to help downstream merger (possibly) 2023-05-26 20:25:25 +01:00
dirkf 2389c7cbd3 [compat] Fix casefold import __all__ syntax in a19855f 2023-05-23 17:11:22 +01:00
dirkf ee731f3d00 [ITV] Fix UA capitalisation in 384f632 2023-05-23 16:50:25 +01:00
dirkf 1f7c6f8b2b [core] Further improve platform debug log
* see d1c6c5c
2023-05-23 16:50:25 +01:00
dirkf d89c2137ba [jsinterp] Small updates for a85a875
* update signature tests
* clarify NaN handling
2023-05-23 16:50:25 +01:00
dirkf d1c6c5c4d6 [core] Improve platform debug log, based on yt-dlp 2023-05-11 21:17:31 +01:00
dirkf 6ed3433828 [jsinterp] Add short-cut evaluation for common expression
* special handling for (d%e.length+e.length)%e.length speeds up ~6%
2023-05-11 21:02:01 +01:00
dirkf a85a875fef [jsinterp] Handle NaN in bitwise operators
* also add _NaN
* also pull function naming from yt-dlp
2023-05-11 20:59:30 +01:00
dirkf 11cc3f3ad0 [utils] Fix `compiled_regex_type` in 249f2b6 2023-05-11 20:53:07 +01:00
dirkf 64d6dd64c8 [YouTube] Support Releases tab 2023-04-23 22:58:35 +01:00
dirkf 211cbfd5d4 [jsinterp] Minimally handle arithmetic operator precedence
Resolves #32066
2023-04-21 14:04:30 +01:00
dirkf 26035bde46 [DashSegmentsFD] Correctly detect errors when `fragment_retries` == 0
* use the success flag instead of the retry count
* establish the fragment_url outside the retry loop
* only report skipping a fragment once.
* resolves #32033
2023-04-13 00:23:17 +01:00
dirkf 2da3fa04a6 [YouTube] Simplify signature patterns 2023-04-12 23:53:14 +01:00
Gabriel Nagy 735e87adfc
[core] Sanitize info dict before dumping JSON (fixes fe7e130) (#32032)
* follow up to fe7e130 which didn't fix everything.

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-04-12 23:40:38 +01:00
dirkf fe7e13066c [core] Add and use sanitize_info() method from yt-dlp 2023-04-10 17:12:31 +01:00
dirkf 213d1d91bf [core] No longer importing copy 2023-04-06 19:49:46 +01:00
dirkf f8253a5289 [core] Avoid deepcopy of ctx dict (fix f35b757) (Pt 2) 2023-04-06 19:42:36 +01:00
dirkf d6ae3b77cd [core] Avoid deepcopy of ctx dict (fix f35b757)
* may now contain `LazyList`s
* resolves #31999
2023-04-06 14:56:55 +01:00
dirkf 9f4d83ff42 [options] Add --mtime option, unsets default --no-mtime
* resolves #1709 (!)
2023-04-05 19:05:16 +01:00
dirkf 25124bd640 [devscripts] Improve hack to convert command-line options to API options
* define equality for DateRange
* don't show default DateRange
2023-04-05 19:05:16 +01:00
dirkf 78da22489b [compat] Add and use `compat_open()` like Py3 `open()`
* resolves FIXME: ytdl-org/youtube-dl/commit/dfe5fa4
2023-04-05 18:57:37 +01:00
dirkf 557dbac173 [FragmentFD] Fix iteration with infinite limit
* fixes ytdl-org/youtube-dl/baa6c5e
* resolves #31885
2023-04-05 18:55:41 +01:00
dirkf cdf40b6aa6 [test] Update tests for Ubuntu 20.04
* 18.04 test runner was withdrawn
* for now, disable Py 3.3/3.4 tests
2023-04-05 18:54:30 +01:00
pukkandan 3f6d2bd76f [extractor/youtube] Bypass throttling for `-f17`
and related cleanup

Thanks @AudricV for the finding

Ref: yt-dlp/yt-dlp/commit/c9abebb
2023-03-19 02:29:00 +00:00
pukkandan 88f28f620b [extractor/youtube] Construct fragment list lazily
Ref: yt-dlp/yt-dlp/commit/e389d17
See: yt-dlp/yt-dlp#6517
2023-03-19 02:29:00 +00:00
dirkf f35b757c82 [utils] Ensure `allow_types` for `variadic()` is a tuple 2023-03-19 02:29:00 +00:00
dirkf 45495228b7 [downloader/http] Only check for resumability when actually resuming 2023-03-19 02:15:41 +00:00
dirkf 6fece0a96b [AENetworksBaseIE] Report missing show data instead of crash 2023-03-14 16:23:20 +00:00
dirkf 70ff013910 [devscripts] Add a hack to convert command-line options to API options 2023-03-14 16:23:20 +00:00
dirkf e8de54bce5 [core] Handle `/../` sequences in HTTP URLs
* use Python's RFC implementation for embedded sequences
* hack: strip unbalanced leading `../` from path, like eg Firefox

See https://github.com/yt-dlp/yt-dlp/issues/3355
2023-03-14 16:23:20 +00:00
dirkf baa6c5e95c [FragmentFD] Respect `--no-continue`
* discard partial fragment on `--no-continue`
* continue with correct progress display otherwise

Resolves #21467
2023-03-14 16:23:20 +00:00
dirkf 5c985d4f81 [downloader] Let _ffmpeg_ handle DASH segments
Fixes https://github.com/ytdl-org/youtube-dl/issues/31792 after 3da1783.
2023-03-14 16:23:20 +00:00
dirkf 8c86fd33dc
[doc] Improve "guidance" on bug reporting 2023-03-09 16:40:30 +00:00
Sophira 27d41d7365
[doc] Recommend "Get cookies.txt LOCALLY" extension in README.md (#31763)
* remove link to suspect "Get cookies.txt" extension, dropped from Chrome store
* link to new Manifest V3-compatible open-source "Get cookies.txt LOCALLY" extension.

Fixes #31465.
2023-03-07 15:49:31 +00:00
dirkf 0402710227 [jsinterp] Fix regexp parsing and .replace[All] method
* For performance, make regexp object instantiation lazy
 * Other small performance improvements
2023-03-07 01:24:21 +00:00
pukkandan 3e92c60fcd [jsinterp] Handle `Date` at epoch 0
See yt-dlp/yt_dlp#6400
2023-03-03 15:02:15 +00:00
pukkandan 3da17834a4 [Youtube] Construct dash formats with `range` query
See yt-dlp/yt_dlp#6369
2023-03-03 15:02:15 +00:00
dirkf f7ce98a21e [YouTube] Support @owner format in uploader_id etc
* implement https://github.com/ytdl-org/youtube-dl/issues/31530#issuecomment-1435734719
* update affected tests
* misc clean-ups
2023-02-24 12:22:16 +00:00
dirkf e67e52a8f8 [test] Support test-case with volatile ID (eg live show)
Signalled by regexp ID value, eg: `'id': r're:[\da-zA-Z_-]{8,}'`
2023-02-24 12:22:16 +00:00
pukkandan 1d3751c3fe Escape URLs in `sanitized_Request`, not `sanitize_url` d2558234cf5dd12d6896eed5427b7dcdb3ab7b5a added escaping of URLs while sanitizing. However, `sanitize_url` may not always receive an actual URL. Eg: When using `youtube-dl "search query" --default-search ytsearch`, `search query` gets escaped to `search%20query` before being prefixed with `ytsearch:` which is not the intended behavior. So the escaping is moved to `sanitized_Request` instead. 2023-02-20 20:27:25 +00:00
df 6067451e43 [Vimeo] Fix e19ec52 for tween-age Pythons
* a check in older Pythons in the 2.7 and earlier, 3.3, 3.4 series caused "sre_constants.error: nothing to repeat"
* satisfy the check by avoiding nested qualifiers that can match empty string

Resolves #31597
2023-02-20 01:41:46 +00:00
dirkf 57802e632f [jsinterp] Fix dict comprehension for Py2.6
Resolves #31600
2023-02-19 13:48:58 +00:00
dirkf 2dd6c6edd8
[YouTube] Avoid crash if uploader_id extraction fails
See #31530.
2023-02-17 11:16:54 +00:00
dirkf dd9aa74bee [test] Avoid name TestIE which causes a pytest warning
See: 060ac76257
2023-02-14 16:36:40 +00:00
dirkf 42b098dd79 [InfoExtractor] Handle unquoted values in OpenGraph searches 2023-02-14 02:53:16 +00:00
fonkap 6f8c2635a5 [StreamsbIE] Add extractor for streamsb.com (viewsb.com) (#31517)
* Add extractor for streamsb.com (viewsb.com)

* make data url using app.js version

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-13 03:54:51 +00:00
fonkap de48105dd8 [KommunetvIE] Add extractor for kommunetv.no (#31516)
* Add extractor for kommunetv.no
* Using utils.update_url instead of regex

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-13 03:54:51 +00:00
fonkap 822f19f05d [FileMoonIE] Add extractor for filemoon.sx (#31515)
---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-13 03:54:51 +00:00
teddy171 33db85c571 [feat]: Add support to external downloader aria2p (#31500)
* feat: add class Aria2pFD

* feat: create call_downloader function

* feat: a colorful download interface to aria2pFD

* feat: change value name

* Apply suggestions from code review

Co-authored-by: dirkf <fieldhouse@gmx.net>

* Typo in suggestion

* fix: remove unused value

* fix: add not function to return value(0 is normal); add total_seconds to download.eta(timedelta object); add waiting status when hook progress

* fix: remove unuse method ..utils.format_bytes

* fix: be up to flake8

* fix: be up to flake8

* Apply suggestions from code review

* [feat] test external downloader aria2p

* [feat] test external downloader aria2p

* [fix] test_external_downloader.py

* Apply suggestions from code review

Co-authored-by: dirkf <fieldhouse@gmx.net>

* Apply suggestions from code review

Co-authored-by: dirkf <fieldhouse@gmx.net>

* Update test/test_external_downloader.py

Co-authored-by: dirkf <fieldhouse@gmx.net>

* Update test/test_external_downloader.py

Co-authored-by: dirkf <fieldhouse@gmx.net>

* Update youtube_dl/downloader/external.py

Co-authored-by: dirkf <fieldhouse@gmx.net>

* refactoring code and fix bugs

* Apply suggestions from code review

* Rename test_external_downloader.py to test_downloader_external.py

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-13 03:54:51 +00:00
Valentin Metz f33923cba7 [rbgtum] Add new extractor (#31305)
* [rbgtum] Add new extractor

* Small update, force CI

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-13 03:54:51 +00:00
dirkf e8198c517b [YouTube] Fix tests 2023-02-13 03:54:51 +00:00
dirkf bafb6dec72 [YouTube] Refresh compat/utils usage
* import parse_qs()
* import parse_qs in lazy_extractors (clears old TODO)
* clean up old compiled lazy_extractors for Py2
* use update_url()
2023-02-13 03:54:51 +00:00
dirkf 4e04f10499 [compat] Update test_compat
[skip ci]
2023-02-13 03:54:51 +00:00
dirkf 90c9f789d9 [utils] Add parse_qs, update_url
[skip ci]
2023-02-13 03:54:51 +00:00
dirkf 249f2b6316 [compat] Systematise compat_ naming
[skip ci]
2023-02-13 03:54:51 +00:00
dirkf d6b14ba316 [test] Fix TestAgeRestriction
* age restriction may cause DownloadError
* update obsolete test URLs
[skip ci]
2023-02-13 03:54:51 +00:00
dirkf 30e986b834 [YouTube] Add `signatureTimestamp` for age-gate bypass 2023-02-13 03:54:51 +00:00
dirkf 58988c1421 [YouTube] Bypass age-gating for certain restricted videos
* Use TVHTML5_SIMPLY_EMBEDDED_PLAYER client

* Also add and fix tests

* Introduce and use new utility function `update_url()`
2023-02-13 03:54:51 +00:00
dirkf e19ec52322 [Vimeo] Support /user{video_id}/{slug} URL format 2023-02-12 22:16:00 +00:00
dirkf f2f90887ca [Vimeo] Fix `Unable to extract info section` redux
* as reported in yt-dlp/yt-dlp#6149
* also allow newline in target JSON object
2023-02-12 22:16:00 +00:00
dirkf cd987e6fca [jsinterp] Nits 2023-02-12 22:16:00 +00:00
dirkf d947ffe8e3 [IGN] Overhaul extractor to avoid URL redirection loop
Consequently/also:
* centralise video data extraction
* detect 404 and 503 expected errors
* handle the test video in IGNVideo
* handle two additional page formats for the tests in IGNArticle
2023-02-12 22:16:00 +00:00
dirkf 384f632e8a
[ITV] Overhaul ITV extractor (#30266)
* support ITVX URLs (thanks Vangelis66)
* support legacy ITV Hub URLs
* include extraction fix 4c57dd2 from sleaux-meaux 3 May 2021
* include extraction fix 6fbcc16, fix by staubichsauger & pukkandan
* work-around duration parsing pending fix to utils.parse_duration
* apply default vanilla UA for pages and media to avoid site blocking
* also detect and report `Episode not found` instead of generic 404
* rework ITVBTCCIE with geo-block detection, best effort geo-restriction handling, news article support
* fix tests
2023-02-03 21:10:07 +00:00
dirkf 9d17948b5a
[myvideoge] Add new extractor (#31360)
NB download tests on CI servers blocked 

Co-authored-by: Alfonso Solbes <fonk666@gmail.com>
2023-02-02 23:25:44 +00:00
afterdelight f316f5d4e3
[xhamster] add support for new domain xhvid.com (#31370) 2023-02-02 23:20:14 +00:00
dirkf bc6f94e459
[FIFA] Back-port extractor from yt-dlp (#31385) 2023-02-02 23:19:03 +00:00
Epsilonator be3392a0d4
[Blerp] Add new extractor (#31398)
Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:33:09 +00:00
zhangeric-15 6d829d8119
[YouTube] Fix not finding videos listed under a channel's "shorts" subpage. (#31409)
Resolves #31336

Co-authored-by: Jouni Järvinen <rautamiekka@users.noreply.github.com>
Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:26:31 +00:00
Ruowang Sun 98b0cf1cd0
[Callin] Add new extractor (#31414)
Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:21:05 +00:00
Leon Etienne e9611a2a36
[pr0gramm] implement InfoExtractor, Resolves #31433 (#31434)
* [pr0gramm] implement infoextractor

* [pr0gramm] remove misplaced comment, uncapture regex-group

* [pr0gramm]: specify utf-8 coding

* [pr0gramm]: add trailing comma to lists for maintainability

* [pr0gramm]: ie only sets upload_date attribute

* [pr0gramm]: add video_id to title

* [pr0gramm]: more forgiving _valid_url regex

* [pr0gramm]: add uploader to title, if set

* Discriminate URL pattern

---------

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:13:39 +00:00
JChris246 807e593a32
[cammodels] fix and improve extractor (#31453)
Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:12:36 +00:00
Rodrigo Dias 297fbff23b
[doc] Fixed typo appearing to promise an example (#31489)
Resolves #31425 

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 17:10:09 +00:00
Brian Marks 37cbdfa0e7
[americastestkitchen] Add support for downloading entire series (#31493)
Also
* support new sites and URL patterns
* back-port from yt-dlp

Co-authored-by: dirkf <fieldhouse@gmx.net>
2023-02-02 16:58:21 +00:00
dirkf 295736c9cb [jsinterp] Improve parsing
* support subset `... else if ...`
* support `while`
* add `RegExp` class
* generalise `new` support
* limited more debug strings
* matching test changes
2023-02-02 16:31:49 +00:00
pukkandan 14ef89a8da Support `if` statements
Fix for yt-dlp/yt_dlp#6131
Closes #31509
2023-02-02 13:12:46 +00:00
dirkf 195f22f679
[generic] Improve KVS (etc) extraction 2022-11-13 15:09:29 +00:00
dirkf fc2beab0e7
[generic] Improve KVS (etc) extraction
* detect kt_player('kt_player', 'https://.../kt_player.swf?v=5...
* detect age limit if 18 USC 2257 is mentioned
* test with shooshtime.com

Partially resolves #31332.
2022-11-13 14:59:30 +00:00
FraFraFra-LongD 1a4fbe8462
Added ThisVid.com support (#29187)
* add ThisVidIE, ThisVidMemberIE, ThisVidPlaylistIE
* redirect embed to main page for more metadata
* use KVS extraction newly added to GenericIE and remove duplicate tests
* also add MrDeepFake etc compat to GenericIE
(closes #22390)

Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-11-13 13:22:04 +00:00
dirkf c2f9be3e63
[generic] Add KVS player extraction 2022-11-12 11:55:05 +00:00
dirkf 604762a9f8
[common:jwplayer] Improve jwplayer extraction and parsing (#31000)
* don't crash parser if jwplayer_data is invalid (empty, or no formats)
* use `label` in `sources[n]` as `format_id`
* relax `jwplayer().setup(...)` RE (also rework PR #27274 enhancement)
* detect more manifest formats in _parse_jwplayer_formats() (from PR #29596)
* improve metadata extraction (from PR #25433)
* remember URLs in a set
* use parse_resolution() in format
* extract filesize in format (from yt-dlp)

Co-authored-by: kikuyan <kikuyan@users.noreply.github.com>
Co-authored-by: martin54 <martin54@users.noreply.github.com>
2022-11-11 00:49:13 +00:00
Moises Lima 47e70fff8b
[PeekVids, PlayVids] Add new extractor (#29765)
* Merge back-port from yt-dlp
* Merge features from PR #29798
* Improve metadata extraction

Co-authored-by: dirkf <fieldhouse@gmx.net>
Co-authored by: AXDOOMER
2022-11-09 20:26:30 +00:00
dirkf de39d1281c
[extractor/ceskatelevize] Back-port extractor from yt-dlp, etc (#30713)
* back-port extractor, removing CeskaTelevizePoradyIE
* follow redirect URL
* support liveBroadcast and videobonusDetail in __NEXT__ data
* return single video for singleton playlist
* fix/add tests
2022-11-04 10:13:07 +00:00
Andrei Lebedev 27ed77aabb
[utils] Backport traverse_obj (etc) from yt-dlp (#31156)
* Backport traverse_obj and closely related function from yt-dlp (code by pukkandan)
* Backport LazyList, variadic(), try_call (code by pukkandan)
* Recast using yt-dlp's newer traverse_obj() implementation and tests (code by grub4k)
* Add tests for Unicode case folding support matching Py3.5+ (requires f102e3d)
* Improve/add tests for variadic, try_call, join_nonempty

Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-11-03 10:09:37 +00:00
dirkf c4b19a8816
[compat] Work around in case folding for narrow Python build
Resolves #31324.
2022-11-02 11:56:26 +00:00
dirkf 087ddc2371
[compat] Add test for compat_casefold() 2022-11-01 22:47:02 +00:00
dirkf 65ccb0dd4e
[compat] Add test for compat_casefold() 2022-11-01 21:33:39 +00:00
dirkf a874871801
[compat] Reformat casefold.py for easier updating 2022-11-01 19:25:59 +00:00
dirkf b7c25959f0
[compat] Unify unicode/str compat and move up 2022-11-01 12:40:23 +00:00
dirkf f102e3dc4e
[compat] Add compat_casefold and compat_re_Match, for traverse_obj() port 2022-10-31 21:27:14 +00:00
dirkf a19855f0f5
[compat] Add Python 2 Unicode casefold using a trivial wrapper around icu/CaseFolding.txt 2022-10-31 21:18:36 +00:00
Xie Yanbo ce5d36486e
[netease] Support urls shared from mobile app (#31304)
Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-10-30 11:48:44 +00:00
Xie Yanbo d25cf62086
[netease] Impove error handling (#31303)
* add warnings for users outside of China
* skip empty song urls

Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-10-30 11:46:46 +00:00
dirkf 502cefa41f
[Vimeo] Update variable name in hydration JSON pattern
Fixes #31311
2022-10-27 14:33:00 +00:00
dirkf 0faa45d6c0
[BongaCams] Support new .net domain
Resolves #31262.
2022-10-20 11:06:44 +00:00
ache 447edc48e6
Fix ADN extractor (#31275)
* Rename Anime Digital Network to Animation Digital Network, animationdigitalnetwork.fr
* Update the test to an available video
* Update the decoding key of subtitles
* Keep the support of old URLs
* Add a test to match the old URL
* Reduce redundancy of the URL name
* Fix md5 ^^"
* Fix undefined _BASE
* Process HTTP error text (eg geo-block) correctly and uniformly in Py3, Py2
* Skip test for CI since geo-blocked

Signed-off-by: ache <ache@ache.one>
Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-10-18 16:06:27 +01:00
dirkf ee8560d01e
[ManyVids] Support new single-page app structure 2022-10-13 02:42:49 +00:00
dirkf 7135277fec
[ManyVids] Support new single-page app structure
See https://github.com/yt-dlp/yt-dlp/issues/5210#issuecomment-1276919962.
2022-10-13 01:59:01 +00:00
dirkf 7bbd5b13d4
[Motherless] Pull from yt-dlp, etc
* use username field
* loosen regexes
* warn on page count 0 in group
* avoid reloading group page 1
Closes #29626
2022-10-12 01:09:55 +01:00
Xie Yanbo c91cbf6072
[netease] Get netease music download url through player api (#31235)
* remove unplayable song from test
* compatible with python 2
* using standard User_Agent, fix imports
* use hash instead of long description
* fix lint
* fix hash
2022-10-11 13:55:09 +01:00
dirkf 11b284c81f
[Common:JWPlayer] Fix x1000 scaling error
See https://github.com/yt-dlp/yt-dlp/issues/5106#issuecomment-1264625161
2022-10-11 12:36:44 +00:00
dirkf c94a459a24
[utils] Sanitize look-alike Unicode glyphs in non-ID filename fields when --restrict-filenames
Implements https://github.com/ytdl-org/youtube-dl/issues/31216#issuecomment-1236102822, which has a test.
2022-10-11 12:18:12 +00:00
dirkf 6e2626f092
[JSInterp] Improve separation logic
Based on 0468a3b325
2022-10-11 05:58:10 +01:00
dirkf c282e5f8d7 [ZDF] Overhaul ZDF extractors
* pull some yt-dlp changes into ZDFBaseIE._extract_format()
* add test cases from yt-dlp to ZDFIE
* fix crash in ZDFIE._extract_mobile() when object had no `formitaeten`
* improve title extraction in ZDFChannelIE (remove trailing station ident)
* avoid extracting non-video playlist items (fixes #31149)
2022-10-11 00:05:17 +01:00
dirkf 2ced5a7912 [test] Implement string "lambda x: condition(x)" as an expected value
Semantics equivalent to `assert condition(got)`
2022-10-11 00:05:17 +01:00
Xiyue 82e4eca711
[motherless] Fixed the broken uploader_id in the extractor (#31243)
* Fixed the broken uploader_id in the extractor.
* Make uploader_id RE looser
* Fix uploader_id in test Motherless_3
* Fix group pagination
* # coding: utf-8

Co-authored-by: Andy Xuming <xuminic@gmail.com>
Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-10-10 23:52:48 +01:00
dirkf 1b1442887e
[manyvids] Improve extraction (#31172)
* extract all formats from page
* extract description, uploader, views, likes
* downrate previews
* fix tests
* use txt_or_none()
2022-10-10 19:26:32 +01:00
dirkf 22127b271c
[NRK] Remove explicit Accept-Encoding header that invites Brotli
Fixes #31285
2022-10-10 17:41:40 +00:00
coletdjnz d35557a75d [Telegraaf] Use mobile GraphQL API endpoint
Workaround for Cloudflare 403
Fixes https://github.com/yt-dlp/yt-dlp/issues/5000
Authored by: coletdjnz
2022-10-04 11:43:08 +01:00
dirkf 9493ffdb8b [test] Use windows-2019 for tests
(At least for now) resolves #31249
2022-10-04 11:31:29 +01:00
pukkandan 7009bb9f31 [jsinterp] Workaround operator associativity issue
* temporary fix for player 5a3b6271 [1]

1. https://github.com/yt-dlp/yt-dlp/issues/4635#issuecomment-1235384480
2022-09-03 00:53:56 +01:00
dirkf 218c423bc0 [cache] Add cache validation by program version, based on yt-dlp 2022-09-01 13:28:30 +01:00
dirkf 55c823634d [jsinterp] Handle new YT players 113ca41c, c57c113c
* add NaN
* allow any white-space character for `after_op`
* align with yt-dlp f26af78a8ac11d9d617ed31ea5282cfaa5bcbcfa (charcodeAt and bitwise overflow)
* allow escaping in regex, fixing player c57c113c
2022-09-01 10:57:12 +01:00
dirkf 4050e10a4c [options] Document that postprocessing is not forced by --postprocessor-args
Resolves #30307
2022-08-29 13:02:17 +01:00
dirkf ed5c44e7b7 [compat] Replace deficient ChainMap class in Py3.3 and earlier
* fix version check
2022-08-26 12:22:01 +01:00
dirkf 0f6422590e [compat] Replace deficient ChainMap class in Py3.3 and earlier 2022-08-26 10:24:42 +01:00
dirkf 4c6fba3765 [jsinterp] Improve try/catch/finally support 2022-08-26 08:51:17 +01:00
dirkf d619dd712f [jsinterp] Fix bug in operator precedence
* from 164b03c486
* added tests
2022-08-25 12:16:10 +01:00
dirkf 573b13410e [YouTube] Improve error check for n-sig processing 2022-08-25 12:14:59 +01:00
dirkf 66e58dccc2 [core] Avoid processing empty format list after removing bad formats
* also ensure compat encoding of error strings
2022-08-21 00:45:06 +01:00
dirkf 556862bc91 [utils] Ensure RFC3986 encoding result is unicode 2022-08-21 00:45:06 +01:00
gudata a8d5316aaf
[infoq] Avoid crash if the page has no `mp3Form`
* proposed fix for issue #31131, aligns with yt-dlp

Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-08-19 21:00:21 +01:00
dirkf fd3f3bebd0 [uktvplay] Support domain without .uktv 2022-08-19 19:11:08 +01:00
dirkf 46b8ae2f52 [jsinterp] Clean up and pull yt-dlp style
* add compat_re_Pattern
* improve compat_collections_chain_map
* use class JS_Undefined
* remove unused code
2022-08-19 15:34:33 +01:00
dirkf 538ec65ba7
[jsinterp] Handle regexp literals and throw/catch execution (#31182)
* based on f6ca640b12, thanks pukkandan
* adds parse support for regexp flags
2022-08-19 11:45:04 +01:00
dirkf b0a60ce203
[jsinterp] Improve JS language support (#31175)
* operator ??
* operator ?.
* operator **
* accurate operator functions
* `undefined` handling
* object literals {a: 1, "b": expr}
* more tests for weird JS comparisons: see https://github.com/ytdl-org/youtube-dl/issues/31173#issuecomment-1217854397.
2022-08-17 14:22:02 +01:00
dirkf e52e8b8111 [postprocessor] Don't replace existing value with null metadata parsed from title 2022-08-15 16:45:04 +01:00
dirkf d231b56717
[jsinterp] Overhaul JSInterp to handle new YT players 4c3f79c5, 324f67b9 (#31170)
* back-port from yt-dlp 8f53dc44a0cc1c2d98c35740b9293462c080f5d0, thanks pukkandan
* also support void, improve <</>> precedence, improve expressions in comma-list
* add more tests
2022-08-14 18:45:45 +01:00
dirkf e6a836d54c [core] Make `--max-downloads ...` stop immediately on reaching the limit
Based on and closes #26638.
2022-08-10 15:37:59 +01:00
dirkf deee741fb1
[test, etc] Improve download test logs; also clean up some new flake8 issues (#31153)
* [test] Identify testcase errors better
* [test] Identify download errors better
* [extractor/minds] Linter
* [extractor/aes] Linter
2022-08-09 21:05:00 +01:00
Wes adb5294177
[aenetworks] Update _THEPLATFORM_KEY and _THEPLATFORM_SECRET (#29749)
Fixes ytdl-org/youtube-dl#29300
2022-07-30 02:10:00 +01:00
Kyraminol Endyeran 5f5c127ece
[VVVVID] Support video/dash types (#31060)
Resolves #31030.
2022-07-12 00:35:40 +01:00
dirkf 090acd58c1
[options] Improve be35e53 (--match-/reject-title parameter value)
Resolves #31064.
2022-07-03 20:05:21 +01:00
dirkf a03b9775d5 [Mediaset] Support player version number in URL pattern
Ref: https://github.com/yt-dlp/yt-dlp/issues/4141
2022-06-26 14:24:06 +01:00
dirkf 8a158a936c [NHK] Use new API URL 2022-06-15 18:28:19 +01:00
dirkf 11665dd236 [test] Fix linter for 3aa94d7945 2022-06-15 18:28:19 +01:00
dirkf cc179df346 [XHamster] Support xhday.com alias, extract `uploader_id`
* support xhday.com alias for xhamster.com (resolves #31023)
  Authored by: dirkf
* extract `uploader_id`:
  from 908b56eaf7
  (PR https://github.com/yt-dlp/yt-dlp/pull/844)
  Authored by: octotherp
2022-06-12 14:10:38 +01:00
pukkandan 0700fde640 [utils, etc] Kill child processes when yt-dl is killed
* derived from PR #26592, closes #26592

Authored by: Unrud
2022-06-10 19:57:46 +01:00
dirkf 811c480f7b [YouTube] Support JSON3 subtitle format
* subtitle tests updated to match
2022-06-09 15:25:23 +01:00
dirkf 3aa94d7945 [test] Fix workable subtitle tests (except YT) and mark others as skip, broken
* broken tests need to be fixed when fixing the respective IE
2022-06-08 23:11:33 +01:00
dirkf ef044be34b [test] Skip not _WORKING IE in subtitle tests; use unittest.skipTest throughout 2022-06-08 15:52:21 +01:00
dirkf 530f4582d0 [HRFernsehen] Back-port new extractor from yt-dlp
Closes #26445, where this was originally proposed.
2022-06-06 19:29:48 +01:00
pukkandan 1baa0f5f66 [utils] Escape URL while sanitizing
Closes #31008, #yt-dlp/263

While this fixes the issue in question, it does not try to address the root-cause of the problem
Refer: 915f911e365736227e134ad654601443dbfd7ccb, f5fa042c82300218a2d07b95dd6b9c0756745db3
2022-06-06 16:03:04 +01:00
LewdyCoder 9aa8e5340f
[Readme] Clarified extractor naming (#29799)
* Exported usable extractors must be named `xxxxIE`

Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-05-30 02:50:50 +01:00
dirkf 04fd3289d3 [YouPorn] Improve `upload_date` extraction
See https://github.com/yt-dlp/yt-dlp/issues/2701#issuecomment-1034341883
2022-05-28 13:54:32 +01:00
dirkf 52c3751df7 [utils] Enable ALPN in HTTPS to satisfy broken servers
See https://github.com/yt-dlp/yt-dlp/issues/3878
2022-05-28 13:52:51 +01:00
dirkf 187a48aee2 [YouTube] Handle player c5a4daa1 with indirect n-function definition
* resolves #30976
2022-05-24 15:43:56 +01:00
Jacob Chapman be35e5343a Update options.py 2022-05-20 05:25:54 +01:00
dirkf c3deca86ae
[wat.tv] Add version `pver` to metadata API call
Resolves #30959.
2022-05-19 17:41:48 +00:00
dirkf c7965b9fc2
[NHK] Support alphabetic characters in 7-char NhkVod IDs (#29682) 2022-05-09 18:54:41 +01:00
dirkf e988fa4523 [doc] Clarify test naming 2022-04-29 16:56:00 +01:00
dirkf e27d8d819f
[streamcz] Remove empty `'{}'.format()` for Py2.6
Use `'-join()'` here, or `{0}`, ..., in general.
2022-04-29 13:36:02 +01:00
Árni Dagur ebc627847c
[KTH] Add new extractor for KTH play (#30885)
* Implement extractor for KTH play
* Make KTH Play url regex more relaxed
2022-04-28 10:18:10 +01:00
dirkf a0068bd6be [Youtube] Fix "n" descrambling for player fae06c11
Resolves #30856.
2022-04-15 16:07:09 +01:00
dirkf b764dbe773
Disable blank issues 2022-04-10 05:49:09 +01:00
nixxo 871645a4a4 [RAI] Fix extraction of http formats
From https://github.com/yt-dlp/yt-dlp/pull/3272
Closes https://github.com/yt-dlp/yt-dlp/issues/3270
Authored by: nixxo
2022-04-05 15:21:59 +01:00
nixxo 1f50a07771 [RAI] Extend formats with direct http mp4 link (PR #27990)
* initial support for creating direct mp4 link
* improved regexes and info extraction
* added "connection: close" to request headers
* updated to https://github.com/yt-dlp/yt-dlp/pull/208
2022-04-05 15:21:59 +01:00
nixxo 9e5ca66f16 [RAI] Added checks for DRM protected content (PR #27657)
reviewed by pukkandan (https://github.com/yt-dlp/yt-dlp/pull/150)
2022-04-05 15:21:59 +01:00
lihan7 17d295a1ec [extractor/bilibili] Fix path "/audio/auxxxxx" download return 403 2022-04-01 00:46:34 +01:00
dirkf 49c5293014 Ignore --external-downloader-args if --external-downloader was rejected
... and generate warning
2022-03-25 14:47:26 +00:00
df 6508688e88 Make default upload_/release_date a compat_str
Ensures download tests pass in Python 2 as well as 3; also
add YoutubeDL tests for timestamp -> upload_date etc.
2022-02-26 10:29:42 +00:00
dirkf 4194d253c0 Avoid skipping ID when unlisted_hash is numeric
Pattern needed a non-greedy match; also replaced a redundant test with one for this, issue 29690
2022-02-26 10:29:42 +00:00
dirkf f8e543c906 [Alsace20TV] Add new extractors Alsace20TVIE, Alsace20TVEmbedIE 2022-02-24 18:43:47 +00:00
dirkf c4d1738316 [CPAC] Add extractor for Canadian Parliament
CPACIE: single episode
CPACPlaylistIE: playlists and searches
2022-02-24 18:27:57 +00:00
dirkf 1f13ccfd7f
Fixed groups() call on potentially empty regex search object (#30676)
* Fixed groups() call on potentially empty regex search object.
- https://github.com/ytdl-org/youtube-dl/issues/30521

* minimising lines changed

Co-authored-by: yayorbitgum <50963144+yayorbitgum@users.noreply.github.com>
2022-02-24 18:26:58 +00:00
marieell 923292ba64 [aliexpress] Fix test case 2022-02-24 13:44:52 +00:00
Lesmiscore (Naoya Ozaki) 782bfd26db
[bigo] add support for bigo.tv (#30635)
* [bigo] add support for bigo.tv

* [bigo] prepend "Bigo says"

* title fallback

* add error for invalid json data
2022-02-24 13:34:32 +00:00
Vladimir Stavrinov 3472227074
[rutv] fix vbr for empty string value (#30623)
* [rutv] use str_to_int() (thx dirkf)
2022-02-14 17:54:31 +00:00
Petr Vaněk bf23bc0489 add missing __future__ import unicode_literals 2022-02-14 07:07:05 +00:00
Petr Vaněk 85bf26c1d0 resolve problem with unpacking operator for <py3.5 2022-02-14 07:07:05 +00:00
Petr Vaněk d8adca1b66 [streamcz] test fixes and one additional test 2022-02-14 07:07:05 +00:00
Petr Vaněk d02064218b do not use f-strings 2022-02-14 07:07:05 +00:00
Petr Vaněk b1297308fb avoid traverse_obj function 2022-02-14 07:07:05 +00:00
Petr Vaněk 8088ce036a revert: use _match_valid_url function 2022-02-14 07:07:05 +00:00
Petr Vaněk 29f7bfc4d7 [streamcz] cherry-pick from yt-dlp
Cherry-picked-from: 7d449fff5346 ("[streamcz] Fix extractor (#1616)")
2022-02-14 07:07:05 +00:00
dirkf 74f8cc48af [extractor/videa] Back-port from yt-dlp PRs 463+1028
Authored by: nyuszika7h
2022-02-11 12:43:26 +00:00
kikuyan 8ff961d10f [extractor/videa] fix extraction in Py2
Fixes #30416
2022-02-11 12:43:26 +00:00
dirkf 266b6ef185 [BBC] Also allow PID with leading 'l' (live?) 2022-02-09 21:21:59 +00:00
dirkf 825d3426c5
[Nuvid] Use site JSON for video details (#29332)
Back-port yt-dlp PR 1022 onto PR #17890 and update

Video details aren't in the original HTML now but populated by async JS

Co-authored by: u-spec-png
Co-authored by: vidaritos
2022-02-09 02:40:34 +00:00
dirkf 47b0c8697a [ARD] Back-port subtitle extraction from yt-dlp PR 2409
Authored by: fstirlitz
Fixes #30543
Closes #17766 (thanks ngdio)
2022-02-07 13:47:38 +00:00
Seonghyeon Cho 734dfbb4e3 Remove redundant assigning `format_id` 2022-02-05 03:04:35 +00:00
df ddc080a562 Add ArteTVCategoryIE to support category playlists 2022-02-05 03:02:56 +00:00
Abdullah Ibn Fulan 16a3fe2ba6 Updated Album URL regex
Mistakenly forgot to edit a line in last commit.

Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-02-05 02:53:23 +00:00
Abdullah Ibn Fulan c820a284a2 [extractor/audiomack] Updated URL regex, corrected invalid testcases, fixed bug
Co-authored-by: dirkf <fieldhouse@gmx.net>
2022-02-05 02:53:23 +00:00
dirkf 58babe9af7 Support __INITIAL_DATA__ with stringified JSON
Add test and fix test for bbcthreeConfig
2022-02-05 02:51:46 +00:00
df 6d4932f023 Try for timestamp, description from window.__INITIAL_DATA__ pages 2022-02-05 02:51:46 +00:00
dirkf 92d73ef393 [niconico] Implement heartbeat for download 2022-02-05 02:47:21 +00:00
dirkf 91278f4b6b [niconico] Back-port extractor from yt-dlp
Add Nico search extractors, fix extraction
2022-02-05 02:47:21 +00:00
dirkf 73e1ab6125 [test:download] Only extract enough videos for playlist_mincount 2022-02-05 02:47:21 +00:00
dirkf 584715a803 [applepodcasts] Extract default thumbnail image 2022-02-05 02:32:45 +00:00
dirkf e00b0eab1e [applepodcasts] Improve format extraction
Set acodec and vcodec, etc, to avoid breaking, eg, bestaudio
2022-02-05 02:32:45 +00:00
dirkf 005339d637 [applepodcasts] Support new AMP-ish page structure 2022-02-05 02:32:45 +00:00
Chris Rose 23ad6402a6 xvideos: Fix for #30271 2022-02-05 02:24:51 +00:00
dirkf 9642344965 Fix tests for working IEs; disable obsolete WDRMobile 2022-02-05 02:22:45 +00:00
dirkf 568c7005d5 Fix WDRMaus; extend URL matching for other Maus pages; improve ID extraction 2022-02-05 02:22:45 +00:00
dirkf 5cb4833f40 Update URPlayIE extractor for Next.js page format, with subtitles 2022-02-05 02:16:53 +00:00
dirkf 5197336de6 Support more deeply nested ptmd_path with test, update tests 2022-02-05 02:14:35 +00:00
dirkf 01824d275b Additional tweaks: allow any .ndr.de, simplify quote match 2022-02-05 02:12:44 +00:00
dirkf 39a98b09a2 Fix NDR, NJoy tests 2022-02-05 02:12:44 +00:00
dirkf f0a05a55c2 NJoy: improve extraction of NDR id, description, etc with current page formats 2022-02-05 02:12:44 +00:00
dirkf 4186e81777 NDR: improve extraction of NDR id, description, etc with current page formats 2022-02-05 02:12:44 +00:00
dirkf b494824286 Support Tele5 pages with Discovery Networks format instead of JWPlatform 2022-02-05 02:08:11 +00:00
dirkf 8248133e5e Back-port yt-dlp Viki extractor
From https://github.com/yt-dlp/yt-dlp/pull/2540
2022-02-04 15:49:12 +00:00
dirkf 27dbf6f0ab Return the item itself if playlist has one entry
Removes playlist spam from log
2022-02-04 14:28:50 +00:00
dirkf 61d791726f Find TV2DK Kaltura ID in Nuxt.js page format 2022-02-04 14:28:50 +00:00
pukkandan 0c0876f790 [youtube:search] Add tests 2022-02-04 11:09:18 +00:00
dirkf 7a497f1405 Rework 2c2c2bd with an actual Mix page and realistic playlist size
From 2c2c2bd348 (commitcomment-65953545)
2022-02-04 04:09:23 +00:00
dirkf 5add3f4373 Merge branch 'pukkandan-yt-searchurl' into yt-dl-master
Closes #27749
2022-02-04 03:50:32 +00:00
pukkandan 78ce962f4f [youtube] Support channel search
Code from cd684175ad
2022-02-03 01:02:58 +00:00
dirkf 41f0043983 Avoid crashing if n-sig decode fails 2022-02-02 14:25:03 +00:00
dirkf 34c06b16f5 Support Youtube Shorts URL format 2022-02-01 14:40:20 +00:00
dirkf 1e677567cd
[YouTube] Fix n-sig for player e06dea74 (#30582)
From yt-dl commit 48416bc
2022-02-01 14:39:03 +00:00
df af9e72507e Implement n-param descrambling using JSInterp
Fixes #29326, closes #29790, closes #30004, closes #30024, closes #30052,
closes #30088, closes #30097, closes #30102, closes #30109, closes #30119,
closes #30125, closes #30128, closes #30162, closes #30173, closes #30186,
closes #30192, closes #30221, closes #30239, closes #30539, closes #30552.
2022-01-31 00:19:58 +00:00
dirkf 6ca7b77696 Refactor JSInterpreter._separate
yt-dlp/yt-dlp/@06dfe0a, improve _MATCHING_PARENS
2022-01-30 00:05:54 +00:00
dirkf 9d142109f4 Back-port test_youtube_signature.py from yt-dlp and fix JSInterp accordingly 2022-01-30 00:05:54 +00:00
dirkf 1ca673bd98 Fix splice to handle float
Needed for new youtube js player f1ca6900
Add 57dbe8077f (diff-729b57caa8d006426f6a8960c061f519a8b6658682284015e069745af52ffb07)
2022-01-30 00:05:54 +00:00
df e1eae16b56 Handle default in switch better
Add a1fc7ca074
Thanks coletdjnz
2022-01-30 00:05:54 +00:00
df 96f87aaa3b Back-port JS interpreter upgrade from yt-dlp PR #1437 2022-01-30 00:05:54 +00:00
df 5f5de51a49 Add compat_map/filter and use the former 2022-01-30 00:05:36 +00:00
df 39ca35e765 Fix test_youtube_flat_playlist_extraction 2022-01-29 20:00:21 +00:00
df d76d59d99d Remove obsolete non-working test_youtube_toptracks 2022-01-29 20:00:21 +00:00
df 2c2c2bd348 Fix test_youtube_mix 2022-01-29 20:00:21 +00:00
df 46e0a729b2 Remove obsolete test_youtube_course 2022-01-29 20:00:21 +00:00
df 57044eaceb Fix test_youtube_playlist_noplaylist 2022-01-29 20:00:21 +00:00
pukkandan a3373da70c
Merge branch 'UP/youtube-dl' into dl/YoutubeSearchURLIE 2022-01-30 01:07:28 +05:30
pukkandan 2c4cb134a9
Fix max_results 2022-01-30 00:54:22 +05:30
pukkandan bfe72723d8
Use `itertools.islice` 2022-01-30 00:49:55 +05:30
pukkandan ed99d68bdd
Add back `YoutubeSearchURLIE` 2022-01-30 00:41:47 +05:30
Sergey M․ 5014bd67c2
release 2021.12.17 2021-12-17 01:49:07 +07:00
Sergey M․ e418823350
[ChangeLog] Actualize
[ci skip]
2021-12-17 01:43:16 +07:00
lanegramling b5242da7d2
[youtube] Update signature function patterns (closes #30363) (#30366) 2021-12-17 01:42:17 +07:00
bopol a803582717
[peertube] only call description endpoint if necessary (#29383) 2021-07-01 06:53:22 +00:00
Remita Amine 7fb9564420 [periscope] pass referer to HLS requests(closes #29419) 2021-06-28 20:08:39 +01:00
Aleri Kaisattera 379f52a495
[liveleak] Remove extractor (closes #17625, closes #24222) (#29331) 2021-06-21 04:23:50 +07:00
Sergey M․ cb668eb973
[pornhub] Add support for pornhubthbh7ap3u.onion 2021-06-21 04:08:15 +07:00
Sergey M․ 751c9ae39a
[pornhub] Detect geo restriction 2021-06-21 03:33:43 +07:00
Sergey M․ da32828208
[pornhub] Dismiss tbr extracted from download URLs (closes #28927)
No longer reliable
2021-06-21 03:22:37 +07:00
Sergey M․ 2ccee8db74
[curiositystream:collection] Extend _VALID_URL (closes #26326, closes #29117) 2021-06-21 01:54:52 +07:00
Sergey M․ 47f2f2fbe9
[youtube] Make get_video_info processing more robust (closes #29333) 2021-06-21 01:35:21 +07:00
Sergey M․ 03ab02730f
[youtube] Workaround for get_video_info request (refs #29333)
See https://github.com/ytdl-org/youtube-dl/issues/29333#issuecomment-864049544
2021-06-21 01:34:27 +07:00
Tianyi Shi 4c77a2e538
[bilibili] Strip uploader name (#29202) 2021-06-21 01:03:21 +07:00
bopol 4131703001
[youtube] Update invidious instance list (#29281) 2021-06-21 00:42:09 +07:00
Logan B cc21aebe90
[umg:de] Update GraphQL API URL (#29304)
Previous one no longer resolves

Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-06-21 00:41:14 +07:00
Sergey M․ 57b9a4b4c6
[nrk] Switch psapi URL to https (closes #29344)
Catalog calls no longer work via http
2021-06-21 00:36:28 +07:00
kikuyan 3a7ef27cf3
[postprocessor/ffmpeg] Show ffmpeg output on error (refs #22680) (#29336) 2021-06-20 23:58:19 +07:00
kikuyan a7f61feab2
[egghead] Add support for app.egghead.io (closes #28404) (#29303)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-06-17 10:34:33 +07:00
kikuyan 8fe5d54eb7
[appleconnect] Fix extraction (#29208) 2021-06-17 04:12:13 +07:00
kikuyan d156bc8d59
[orf:tvthek] Add support for MPD formats (closes #28672) (#29236) 2021-06-17 04:02:06 +07:00
Sergey M c2350cac24
[README.md] Update MSVC 2010 redist URL (closes #29222) 2021-06-06 05:32:27 +07:00
166 changed files with 17380 additions and 3713 deletions

View File

@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.12.17. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dl version **2021.06.06**
- [ ] I've verified that I'm running youtube-dl version **2021.12.17**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2021.06.06
[debug] youtube-dl version 2021.12.17
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -19,7 +19,7 @@ labels: 'site-support-request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.12.17. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dl version **2021.06.06**
- [ ] I've verified that I'm running youtube-dl version **2021.12.17**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones

View File

@ -18,13 +18,13 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.12.17. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dl version **2021.06.06**
- [ ] I've verified that I'm running youtube-dl version **2021.12.17**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones

View File

@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.12.17. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dl version **2021.06.06**
- [ ] I've verified that I'm running youtube-dl version **2021.12.17**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2021.06.06
[debug] youtube-dl version 2021.12.17
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -19,13 +19,13 @@ labels: 'request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.12.17. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dl version **2021.06.06**
- [ ] I've verified that I'm running youtube-dl version **2021.12.17**
- [ ] I've searched the bugtracker for similar feature requests including closed ones

1
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View File

@ -0,0 +1 @@
blank_issues_enabled: false

View File

@ -1,81 +1,476 @@
name: CI
on: [push, pull_request]
env:
all-cpython-versions: 2.6, 2.7, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, 3.11, 3.12
main-cpython-versions: 2.7, 3.2, 3.5, 3.9, 3.11
pypy-versions: pypy-2.7, pypy-3.6, pypy-3.7
cpython-versions: main
test-set: core
# Python beta version to be built using pyenv before setup-python support
# Must also be included in all-cpython-versions
next: 3.13
on:
push:
# push inputs aren't known to GitHub
inputs:
cpython-versions:
type: string
default: all
test-set:
type: string
default: core
pull_request:
# pull_request inputs aren't known to GitHub
inputs:
cpython-versions:
type: string
default: main
test-set:
type: string
default: both
workflow_dispatch:
inputs:
cpython-versions:
type: choice
description: CPython versions (main = 2.7, 3.2, 3.5, 3.9, 3.11)
options:
- all
- main
required: true
default: main
test-set:
type: choice
description: core, download
options:
- both
- core
- download
required: true
default: both
permissions:
contents: read
jobs:
select:
name: Select tests from inputs
runs-on: ubuntu-latest
outputs:
cpython-versions: ${{ steps.run.outputs.cpython-versions }}
test-set: ${{ steps.run.outputs.test-set }}
own-pip-versions: ${{ steps.run.outputs.own-pip-versions }}
steps:
# push and pull_request inputs aren't known to GitHub (pt3)
- name: Set push defaults
if: ${{ github.event_name == 'push' }}
env:
cpython-versions: all
test-set: core
run: |
echo "cpython-versions=${{env.cpython-versions}}" >> "$GITHUB_ENV"
echo "test_set=${{env.test_set}}" >> "$GITHUB_ENV"
- name: Get pull_request inputs
if: ${{ github.event_name == 'pull_request' }}
env:
cpython-versions: main
test-set: both
run: |
echo "cpython-versions=${{env.cpython-versions}}" >> "$GITHUB_ENV"
echo "test_set=${{env.test_set}}" >> "$GITHUB_ENV"
- name: Make version array
id: run
run: |
# Make a JSON Array from comma/space-separated string (no extra escaping)
json_list() { \
ret=""; IFS="${IFS},"; set -- $*; \
for a in "$@"; do \
ret=$(printf '%s"%s"' "${ret}${ret:+, }" "$a"); \
done; \
printf '[%s]' "$ret"; }
tests="${{ inputs.test-set || env.test-set }}"
[ $tests = both ] && tests="core download"
printf 'test-set=%s\n' "$(json_list $tests)" >> "$GITHUB_OUTPUT"
versions="${{ inputs.cpython-versions || env.cpython-versions }}"
if [ "$versions" = all ]; then \
versions="${{ env.all-cpython-versions }}"; else \
versions="${{ env.main-cpython-versions }}"; \
fi
printf 'cpython-versions=%s\n' \
"$(json_list ${versions}${versions:+, }${{ env.pypy-versions }})" >> "$GITHUB_OUTPUT"
# versions with a special get-pip.py in a per-version subdirectory
printf 'own-pip-versions=%s\n' \
"$(json_list 2.6, 2.7, 3.2, 3.3, 3.4, 3.5, 3.6)" >> "$GITHUB_OUTPUT"
tests:
name: Tests
name: Run tests
needs: select
permissions:
contents: read
packages: write
runs-on: ${{ matrix.os }}
env:
PIP: python -m pip
PIP_DISABLE_PIP_VERSION_CHECK: true
PIP_NO_PYTHON_VERSION_WARNING: true
strategy:
fail-fast: true
matrix:
os: [ubuntu-18.04]
# TODO: python 2.6
python-version: [2.7, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, pypy-2.7, pypy-3.6, pypy-3.7]
os: [ubuntu-20.04]
python-version: ${{ fromJSON(needs.select.outputs.cpython-versions) }}
python-impl: [cpython]
ytdl-test-set: [core, download]
ytdl-test-set: ${{ fromJSON(needs.select.outputs.test-set) }}
run-tests-ext: [sh]
include:
# python 3.2 is only available on windows via setup-python
- os: windows-latest
python-version: 3.2
- os: windows-2019
python-version: 3.4
python-impl: cpython
ytdl-test-set: core
ytdl-test-set: ${{ contains(needs.select.outputs.test-set, 'core') && 'core' || 'nocore' }}
run-tests-ext: bat
- os: windows-latest
python-version: 3.2
- os: windows-2019
python-version: 3.4
python-impl: cpython
ytdl-test-set: download
ytdl-test-set: ${{ contains(needs.select.outputs.test-set, 'download') && 'download' || 'nodownload' }}
run-tests-ext: bat
# jython
- os: ubuntu-18.04
- os: ubuntu-20.04
python-version: 2.7
python-impl: jython
ytdl-test-set: core
ytdl-test-set: ${{ contains(needs.select.outputs.test-set, 'core') && 'core' || 'nocore' }}
run-tests-ext: sh
- os: ubuntu-18.04
- os: ubuntu-20.04
python-version: 2.7
python-impl: jython
ytdl-test-set: download
ytdl-test-set: ${{ contains(needs.select.outputs.test-set, 'download') && 'download' || 'nodownload' }}
run-tests-ext: sh
steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
if: ${{ matrix.python-impl == 'cpython' }}
- name: Prepare Linux
if: ${{ startswith(matrix.os, 'ubuntu') }}
shell: bash
run: |
# apt in runner, if needed, may not be up-to-date
sudo apt-get update
- name: Checkout
uses: actions/checkout@v3
#-------- Python 3 -----
- name: Set up supported Python ${{ matrix.python-version }}
id: setup-python
if: ${{ matrix.python-impl == 'cpython' && matrix.python-version != '2.6' && matrix.python-version != '2.7' && matrix.python-version != env.next }}
# wrap broken actions/setup-python@v4
# NB may run apt-get install in Linux
uses: ytdl-org/setup-python@v1
with:
python-version: ${{ matrix.python-version }}
cache-build: true
allow-build: info
- name: Locate supported Python ${{ matrix.python-version }}
if: ${{ env.pythonLocation }}
shell: bash
run: |
echo "PYTHONHOME=${pythonLocation}" >> "$GITHUB_ENV"
export expected="${{ steps.setup-python.outputs.python-path }}"
dirname() { printf '%s\n' \
'import os, sys' \
'print(os.path.dirname(sys.argv[1]))' \
| ${expected} - "$1"; }
expd="$(dirname "$expected")"
export python="$(command -v python)"
[ "$expd" = "$(dirname "$python")" ] || echo "PATH=$expd:${PATH}" >> "$GITHUB_ENV"
[ -x "$python" ] || printf '%s\n' \
'import os' \
'exp = os.environ["expected"]' \
'python = os.environ["python"]' \
'exps = os.path.split(exp)' \
'if python and (os.path.dirname(python) == exp[0]):' \
' exit(0)' \
'exps[1] = "python" + os.path.splitext(exps[1])[1]' \
'python = os.path.join(*exps)' \
'try:' \
' os.symlink(exp, python)' \
'except AttributeError:' \
' os.rename(exp, python)' \
| ${expected} -
printf '%s\n' \
'import sys' \
'print(sys.path)' \
| ${expected} -
#-------- Python next (was 3.12) -
- name: Set up CPython 3.next environment
if: ${{ matrix.python-impl == 'cpython' && matrix.python-version == env.next }}
shell: bash
run: |
PYENV_ROOT=$HOME/.local/share/pyenv
echo "PYENV_ROOT=${PYENV_ROOT}" >> "$GITHUB_ENV"
- name: Cache Python 3.next
id: cachenext
if: ${{ matrix.python-impl == 'cpython' && matrix.python-version == env.next }}
uses: actions/cache@v3
with:
key: python-${{ env.next }}
path: |
${{ env.PYENV_ROOT }}
- name: Build and set up Python 3.next
if: ${{ matrix.python-impl == 'cpython' && matrix.python-version == env.next && ! steps.cachenext.outputs.cache-hit }}
# dl and build locally
shell: bash
run: |
# Install build environment
sudo apt-get install -y build-essential llvm libssl-dev tk-dev \
libncursesw5-dev libreadline-dev libsqlite3-dev \
libffi-dev xz-utils zlib1g-dev libbz2-dev liblzma-dev
# Download PyEnv from its GitHub repository.
export PYENV_ROOT=${{ env.PYENV_ROOT }}
export PATH=$PYENV_ROOT/bin:$PATH
git clone "https://github.com/pyenv/pyenv.git" "$PYENV_ROOT"
pyenv install ${{ env.next }}
- name: Locate Python 3.next
if: ${{ matrix.python-impl == 'cpython' && matrix.python-version == env.next }}
shell: bash
run: |
PYTHONHOME="$(echo "${{ env.PYENV_ROOT }}/versions/${{ env.next }}."*)"
test -n "$PYTHONHOME"
echo "PYTHONHOME=$PYTHONHOME" >> "$GITHUB_ENV"
echo "PATH=${PYTHONHOME}/bin:$PATH" >> "$GITHUB_ENV"
#-------- Python 2.7 --
- name: Set up Python 2.7
if: ${{ matrix.python-impl == 'cpython' && matrix.python-version == '2.7' }}
# install 2.7
shell: bash
run: |
sudo apt-get install -y python2 python-is-python2
echo "PYTHONHOME=/usr" >> "$GITHUB_ENV"
#-------- Python 2.6 --
- name: Set up Python 2.6 environment
if: ${{ matrix.python-impl == 'cpython' && matrix.python-version == '2.6' }}
shell: bash
run: |
openssl_name=openssl-1.0.2u
echo "openssl_name=${openssl_name}" >> "$GITHUB_ENV"
openssl_dir=$HOME/.local/opt/$openssl_name
echo "openssl_dir=${openssl_dir}" >> "$GITHUB_ENV"
PYENV_ROOT=$HOME/.local/share/pyenv
echo "PYENV_ROOT=${PYENV_ROOT}" >> "$GITHUB_ENV"
sudo apt-get install -y openssl ca-certificates
- name: Cache Python 2.6
id: cache26
if: ${{ matrix.python-version == '2.6' }}
uses: actions/cache@v3
with:
key: python-2.6.9
path: |
${{ env.openssl_dir }}
${{ env.PYENV_ROOT }}
- name: Build and set up Python 2.6
if: ${{ matrix.python-impl == 'cpython' && matrix.python-version == '2.6' && ! steps.cache26.outputs.cache-hit }}
# dl and build locally
shell: bash
run: |
# Install build environment
sudo apt-get install -y build-essential llvm libssl-dev tk-dev \
libncursesw5-dev libreadline-dev libsqlite3-dev \
libffi-dev xz-utils zlib1g-dev libbz2-dev liblzma-dev
# Download and install OpenSSL 1.0.2, back in time
openssl_name=${{ env.openssl_name }}
openssl_targz=${openssl_name}.tar.gz
openssl_dir=${{ env.openssl_dir }}
openssl_inc=$openssl_dir/include
openssl_lib=$openssl_dir/lib
openssl_ssl=$openssl_dir/ssl
curl -L "https://www.openssl.org/source/$openssl_targz" -o $openssl_targz
tar -xf $openssl_targz
( cd $openssl_name; \
./config --prefix=$openssl_dir --openssldir=${openssl_dir}/ssl \
--libdir=lib -Wl,-rpath=${openssl_dir}/lib shared zlib-dynamic && \
make && \
make install )
rm -rf $openssl_name
rmdir $openssl_ssl/certs && ln -s /etc/ssl/certs $openssl_ssl/certs
# Download PyEnv from its GitHub repository.
export PYENV_ROOT=${{ env.PYENV_ROOT }}
export PATH=$PYENV_ROOT/bin:$PATH
git clone "https://github.com/pyenv/pyenv.git" "$PYENV_ROOT"
# Prevent pyenv build trying (and failing) to update pip
export GET_PIP=get-pip-2.6.py
echo 'import sys; sys.exit(0)' > ${GET_PIP}
GET_PIP=$(realpath $GET_PIP)
# Build and install Python
export CFLAGS="-I$openssl_inc"
export LDFLAGS="-L$openssl_lib"
export LD_LIBRARY_PATH="$openssl_lib"
pyenv install 2.6.9
- name: Locate Python 2.6
if: ${{ matrix.python-impl == 'cpython' && matrix.python-version == '2.6' }}
shell: bash
run: |
PYTHONHOME="${{ env.PYENV_ROOT }}/versions/2.6.9"
echo "PYTHONHOME=$PYTHONHOME" >> "$GITHUB_ENV"
echo "PATH=${PYTHONHOME}/bin:$PATH" >> "$GITHUB_ENV"
echo "LD_LIBRARY_PATH=${{ env.openssl_dir }}/lib${LD_LIBRARY_PATH:+:}${LD_LIBRARY_PATH}" >> "$GITHUB_ENV"
#-------- Jython ------
- name: Set up Java 8
if: ${{ matrix.python-impl == 'jython' }}
uses: actions/setup-java@v1
uses: actions/setup-java@v3
with:
java-version: 8
distribution: 'zulu'
- name: Setup Jython environment
if: ${{ matrix.python-impl == 'jython' }}
shell: bash
run: |
echo "JYTHON_ROOT=${HOME}/jython" >> "$GITHUB_ENV"
echo "PIP=pip" >> "$GITHUB_ENV"
- name: Cache Jython
id: cachejy
if: ${{ matrix.python-impl == 'jython' && matrix.python-version == '2.7' }}
uses: actions/cache@v3
with:
# 2.7.3 now available, may solve SNI issue
key: jython-2.7.1
path: |
${{ env.JYTHON_ROOT }}
- name: Install Jython
if: ${{ matrix.python-impl == 'jython' }}
if: ${{ matrix.python-impl == 'jython' && matrix.python-version == '2.7' && ! steps.cachejy.outputs.cache-hit }}
shell: bash
run: |
wget https://repo1.maven.org/maven2/org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar -O jython-installer.jar
java -jar jython-installer.jar -s -d "$HOME/jython"
echo "$HOME/jython/bin" >> $GITHUB_PATH
- name: Install nose
if: ${{ matrix.python-impl != 'jython' }}
run: pip install nose
- name: Install nose (Jython)
if: ${{ matrix.python-impl == 'jython' }}
# Working around deprecation of support for non-SNI clients at PyPI CDN (see https://status.python.org/incidents/hzmjhqsdjqgb)
JYTHON_ROOT="${{ env.JYTHON_ROOT }}"
curl -L "https://repo1.maven.org/maven2/org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar" -o jython-installer.jar
java -jar jython-installer.jar -s -d "${JYTHON_ROOT}"
echo "${JYTHON_ROOT}/bin" >> "$GITHUB_PATH"
- name: Set up cached Jython
if: ${{ steps.cachejy.outputs.cache-hit }}
shell: bash
run: |
wget https://files.pythonhosted.org/packages/99/4f/13fb671119e65c4dce97c60e67d3fd9e6f7f809f2b307e2611f4701205cb/nose-1.3.7-py2-none-any.whl
pip install nose-1.3.7-py2-none-any.whl
JYTHON_ROOT="${{ env.JYTHON_ROOT }}"
echo "${JYTHON_ROOT}/bin" >> $GITHUB_PATH
- name: Install supporting Python 2.7 if possible
if: ${{ steps.cachejy.outputs.cache-hit }}
shell: bash
run: |
sudo apt-get install -y python2.7 || true
#-------- pip ---------
- name: Set up supported Python ${{ matrix.python-version }} pip
if: ${{ (matrix.python-version != '3.2' && steps.setup-python.outputs.python-path) || matrix.python-version == '2.7' }}
# This step may run in either Linux or Windows
shell: bash
run: |
echo "$PATH"
echo "$PYTHONHOME"
# curl is available on both Windows and Linux, -L follows redirects, -O gets name
python -m ensurepip || python -m pip --version || { \
get_pip="${{ contains(needs.select.outputs.own-pip-versions, matrix.python-version) && format('{0}/', matrix.python-version) || '' }}"; \
curl -L -O "https://bootstrap.pypa.io/pip/${get_pip}get-pip.py"; \
python get-pip.py; }
- name: Set up Python 2.6 pip
if: ${{ matrix.python-version == '2.6' }}
shell: bash
run: |
python -m pip --version || { \
curl -L -O "https://bootstrap.pypa.io/pip/2.6/get-pip.py"; \
curl -L -O "https://files.pythonhosted.org/packages/ac/95/a05b56bb975efa78d3557efa36acaf9cf5d2fd0ee0062060493687432e03/pip-9.0.3-py2.py3-none-any.whl"; \
python get-pip.py --no-setuptools --no-wheel pip-9.0.3-py2.py3-none-any.whl; }
# work-around to invoke pip module on 2.6: https://bugs.python.org/issue2751
echo "PIP=python -m pip.__main__" >> "$GITHUB_ENV"
- name: Set up other Python ${{ matrix.python-version }} pip
if: ${{ matrix.python-version == '3.2' && steps.setup-python.outputs.python-path }}
shell: bash
run: |
python -m pip --version || { \
curl -L -O "https://bootstrap.pypa.io/pip/3.2/get-pip.py"; \
curl -L -O "https://files.pythonhosted.org/packages/b2/d0/cd115fe345dd6f07ec1c780020a7dfe74966fceeb171e0f20d1d4905b0b7/pip-7.1.2-py2.py3-none-any.whl"; \
python get-pip.py --no-setuptools --no-wheel pip-7.1.2-py2.py3-none-any.whl; }
#-------- unittest ----
- name: Upgrade Unittest for Python 2.6
if: ${{ matrix.python-version == '2.6' }}
shell: bash
run: |
# Work around deprecation of support for non-SNI clients at PyPI CDN (see https://status.python.org/incidents/hzmjhqsdjqgb)
$PIP -qq show unittest2 || { \
for u in "65/26/32b8464df2a97e6dd1b656ed26b2c194606c16fe163c695a992b36c11cdf/six-1.13.0-py2.py3-none-any.whl" \
"f2/94/3af39d34be01a24a6e65433d19e107099374224905f1e0cc6bbe1fd22a2f/argparse-1.4.0-py2.py3-none-any.whl" \
"c7/a3/c5da2a44c85bfbb6eebcfc1dde24933f8704441b98fdde6528f4831757a6/linecache2-1.0.0-py2.py3-none-any.whl" \
"17/0a/6ac05a3723017a967193456a2efa0aa9ac4b51456891af1e2353bb9de21e/traceback2-1.4.0-py2.py3-none-any.whl" \
"72/20/7f0f433060a962200b7272b8c12ba90ef5b903e218174301d0abfd523813/unittest2-1.1.0-py2.py3-none-any.whl"; do \
curl -L -O "https://files.pythonhosted.org/packages/${u}"; \
$PIP install ${u##*/}; \
done; }
# make tests use unittest2
for test in ./test/test_*.py ./test/helper.py; do
sed -r -i -e '/^import unittest$/s/test/test2 as unittest/' "$test"
done
#-------- nose --------
- name: Install nose for Python ${{ matrix.python-version }}
if: ${{ (matrix.python-version != '3.2' && steps.setup-python.outputs.python-path) || (matrix.python-impl == 'cpython' && (matrix.python-version == '2.7' || matrix.python-version == env.next)) }}
shell: bash
run: |
echo "$PATH"
echo "$PYTHONHOME"
# Use PyNose for recent Pythons instead of Nose
py3ver="${{ matrix.python-version }}"
py3ver=${py3ver#3.}
[ "$py3ver" != "${{ matrix.python-version }}" ] && py3ver=${py3ver%.*} || py3ver=0
[ "$py3ver" -ge 9 ] && nose=pynose || nose=nose
$PIP -qq show $nose || $PIP install $nose
- name: Install nose for other Python 2
if: ${{ matrix.python-impl == 'jython' || (matrix.python-impl == 'cpython' && matrix.python-version == '2.6') }}
shell: bash
run: |
# Work around deprecation of support for non-SNI clients at PyPI CDN (see https://status.python.org/incidents/hzmjhqsdjqgb)
$PIP -qq show nose || { \
curl -L -O "https://files.pythonhosted.org/packages/99/4f/13fb671119e65c4dce97c60e67d3fd9e6f7f809f2b307e2611f4701205cb/nose-1.3.7-py2-none-any.whl"; \
$PIP install nose-1.3.7-py2-none-any.whl; }
- name: Install nose for other Python 3
if: ${{ matrix.python-version == '3.2' && steps.setup-python.outputs.python-path }}
shell: bash
run: |
$PIP -qq show nose || { \
curl -L -O "https://files.pythonhosted.org/packages/15/d8/dd071918c040f50fa1cf80da16423af51ff8ce4a0f2399b7bf8de45ac3d9/nose-1.3.7-py3-none-any.whl"; \
$PIP install nose-1.3.7-py3-none-any.whl; }
- name: Set up nosetest test
if: ${{ contains(needs.select.outputs.test-set, matrix.ytdl-test-set ) }}
shell: bash
run: |
# set PYTHON_VER
PYTHON_VER=${{ matrix.python-version }}
[ "${PYTHON_VER#*-}" != "$PYTHON_VER" ] || PYTHON_VER="${{ matrix.python-impl }}-${PYTHON_VER}"
echo "PYTHON_VER=$PYTHON_VER" >> "$GITHUB_ENV"
echo "PYTHON_IMPL=${{ matrix.python-impl }}" >> "$GITHUB_ENV"
# define a test to validate the Python version used by nosetests
printf '%s\n' \
'from __future__ import unicode_literals' \
'import sys, os, platform' \
'try:' \
' import unittest2 as unittest' \
'except ImportError:' \
' import unittest' \
'class TestPython(unittest.TestCase):' \
' def setUp(self):' \
' self.ver = os.environ["PYTHON_VER"].split("-")' \
' def test_python_ver(self):' \
' self.assertEqual(["%d" % v for v in sys.version_info[:2]], self.ver[-1].split(".")[:2])' \
' self.assertTrue(sys.version.startswith(self.ver[-1]))' \
' self.assertIn(self.ver[0], ",".join((sys.version, platform.python_implementation())).lower())' \
' def test_python_impl(self):' \
' self.assertIn(platform.python_implementation().lower(), (os.environ["PYTHON_IMPL"], self.ver[0]))' \
> test/test_python.py
#-------- TESTS -------
- name: Run tests
if: ${{ contains(needs.select.outputs.test-set, matrix.ytdl-test-set ) }}
continue-on-error: ${{ matrix.ytdl-test-set == 'download' || matrix.python-impl == 'jython' }}
env:
YTDL_TEST_SET: ${{ matrix.ytdl-test-set }}
run: ./devscripts/run_tests.${{ matrix.run-tests-ext }}
run: |
./devscripts/run_tests.${{ matrix.run-tests-ext }}
flake8:
name: Linter
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: 3.9
- name: Install flake8
run: pip install flake8
- name: Run flake8
run: flake8 .

View File

@ -150,7 +150,7 @@ After you have ensured this site is distributing its content legally, you can fo
# TODO more properties (see youtube_dl/extractor/common.py)
}
```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). This makes the extractor available for use, as long as the class ends with `IE`.
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):

View File

@ -1,3 +1,28 @@
version 2021.12.17
Core
* [postprocessor/ffmpeg] Show ffmpeg output on error (#22680, #29336)
Extractors
* [youtube] Update signature function patterns (#30363, #30366)
* [peertube] Only call description endpoint if necessary (#29383)
* [periscope] Pass referer to HLS requests (#29419)
- [liveleak] Remove extractor (#17625, #24222, #29331)
+ [pornhub] Add support for pornhubthbh7ap3u.onion
* [pornhub] Detect geo restriction
* [pornhub] Dismiss tbr extracted from download URLs (#28927)
* [curiositystream:collection] Extend _VALID_URL (#26326, #29117)
* [youtube] Make get_video_info processing more robust (#29333)
* [youtube] Workaround for get_video_info request (#29333)
* [bilibili] Strip uploader name (#29202)
* [youtube] Update invidious instance list (#29281)
* [umg:de] Update GraphQL API URL (#29304)
* [nrk] Switch psapi URL to https (#29344)
+ [egghead] Add support for app.egghead.io (#28404, #29303)
* [appleconnect] Fix extraction (#29208)
+ [orf:tvthek] Add support for MPD formats (#28672, #29236)
version 2021.06.06
Extractors

149
README.md
View File

@ -33,7 +33,7 @@ Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.ex
You can also use pip:
sudo -H pip install --upgrade youtube-dl
This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information.
macOS users can install youtube-dl with [Homebrew](https://brew.sh/):
@ -563,7 +563,7 @@ The basic usage is not to set any template arguments when downloading a single f
- `is_live` (boolean): Whether this video is a live stream or a fixed-length video
- `start_time` (numeric): Time in seconds where the reproduction should start, as specified in the URL
- `end_time` (numeric): Time in seconds where the reproduction should end, as specified in the URL
- `format` (string): A human-readable description of the format
- `format` (string): A human-readable description of the format
- `format_id` (string): Format code specified by `--format`
- `format_note` (string): Additional info about the format
- `width` (numeric): Width of the video
@ -632,7 +632,7 @@ To use percent literals in an output template use `%%`. To output to stdout use
The current default template is `%(title)s-%(id)s.%(ext)s`.
In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:
In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title.
#### Output template and Windows batch files
@ -675,7 +675,7 @@ The general syntax for format selection is `--format FORMAT` or shorter `-f FORM
**tl;dr:** [navigate me to examples](#format-selection-examples).
The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific.
The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific.
You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download the best quality format of a particular file extension served as a single file, e.g. `-f webm` will download the best quality format with the `webm` extension served as a single file.
@ -760,7 +760,7 @@ Videos can be filtered by their upload date using the options `--date`, `--dateb
- Absolute dates: Dates in the format `YYYYMMDD`.
- Relative dates: Dates in the format `(now|today)[+-][0-9](day|week|month|year)(s)?`
Examples:
```bash
@ -893,7 +893,7 @@ Since June 2012 ([#342](https://github.com/ytdl-org/youtube-dl/issues/342)) yout
### The exe throws an error due to missing `MSVCR100.dll`
To run the exe you need to install first the [Microsoft Visual C++ 2010 Redistributable Package (x86)](https://www.microsoft.com/en-US/download/details.aspx?id=5555).
To run the exe you need to install first the [Microsoft Visual C++ 2010 Service Pack 1 Redistributable Package (x86)](https://download.microsoft.com/download/1/6/5/165255E7-1014-4D0A-B094-B6A430A6BFFC/vcredist_x86.exe).
### On Windows, how should I set up ffmpeg and youtube-dl? Where should I put the exe files?
@ -918,7 +918,7 @@ Either prepend `https://www.youtube.com/watch?v=` or separate the ID from the op
Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [Get cookies.txt](https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid/) (for Chrome) or [cookies.txt](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) (for Firefox).
In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [Get cookies.txt LOCALLY](https://chrome.google.com/webstore/detail/get-cookiestxt-locally/cclelndahbckbenkjhflpdbgdldlbecc) (for Chrome) or [cookies.txt](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) (for Firefox).
Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
@ -1000,6 +1000,8 @@ To run the test, simply invoke your favorite test runner, or execute a test file
python test/test_download.py
nosetests
For Python versions 3.6 and later, you can use [pynose](https://pypi.org/project/pynose/) to implement `nosetests`. The original [nose](https://pypi.org/project/nose/) has not been upgraded for 3.10 and later.
See item 6 of [new extractor tutorial](#adding-support-for-a-new-site) for how to run extractor specific test cases.
If you want to create a build of youtube-dl yourself, you'll need
@ -1069,9 +1071,11 @@ After you have ensured this site is distributing its content legally, you can fo
}
```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test (actually, test case) then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note:
* the test names use the extractor class name **without the trailing `IE`**
* tests with `only_matching` key in test's dict are not counted.
8. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
9. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
$ flake8 youtube_dl/extractor/yourextractor.py
@ -1089,7 +1093,7 @@ In any case, thank you very much for your contributions!
## youtube-dl coding conventions
This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code.
This section introduces guidelines for writing idiomatic, robust and future-proof extractor code.
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hosters out of your control and this layout tends to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize dependency on the source's layout and even to make the code foresee potential future changes and be ready for that. This is important because it will allow the extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with a fix incorporated, all the previous versions become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say, some non rolling release distros may never receive an update at all.
@ -1112,7 +1116,7 @@ Say you have some source dictionary `meta` that you've fetched as JSON with HTTP
```python
meta = self._download_json(url, video_id)
```
Assume at this point `meta`'s layout is:
```python
@ -1156,7 +1160,7 @@ description = self._search_regex(
```
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that may or may not be present.
### Provide fallbacks
When extracting metadata try to do so from multiple sources. For example if `title` is present in several places, try extracting from at least some of them. This makes it more future-proof in case some of the sources become unavailable.
@ -1204,7 +1208,7 @@ r'(id|ID)=(?P<id>\d+)'
#### Make regular expressions relaxed and flexible
When using regular expressions try to write them fuzzy, relaxed and flexible, skipping insignificant parts that are more likely to change, allowing both single and double quotes for quoted values and so on.
##### Example
Say you need to extract `title` from the following HTML code:
@ -1228,7 +1232,7 @@ title = self._search_regex(
webpage, 'title', group='title')
```
Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute:
Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute:
The code definitely should not look like:
@ -1329,27 +1333,114 @@ Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`]
Use `url_or_none` for safe URL processing.
Use `try_get` for safe metadata extraction from parsed JSON.
Use `traverse_obj` for safe metadata extraction from parsed JSON.
Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
Explore [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
#### More examples
##### Safely extract optional description from parsed JSON
When processing complex JSON, as often returned by site API requests or stashed in web pages for "hydration", you can use the `traverse_obj()` utility function to handle multiple fallback values and to ensure the expected type of metadata items. The function's docstring defines how the function works: also review usage in the codebase for more examples.
In this example, a text `description`, or `None`, is pulled from the `.result.video[0].summary` member of the parsed JSON `response`, if available.
```python
description = traverse_obj(response, ('result', 'video', 0, 'summary', T(compat_str)))
```
`T(...)` is a shorthand for a set literal; if you hate people who still run Python 2.6, `T(type_or_transformation)` could be written as a set literal `{type_or_transformation}`.
Some extractors use the older and less capable `try_get()` function in the same way.
```python
description = try_get(response, lambda x: x['result']['video'][0]['summary'], compat_str)
```
##### Safely extract more optional metadata
In this example, various optional metadata values are extracted from the `.result.video[0]` member of the parsed JSON `response`, which is expected to be a JS object, parsed into a `dict`, with no crash if that isn't so, or if any of the target values are missing or invalid.
```python
video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
video = traverse_obj(response, ('result', 'video', 0, T(dict))) or {}
# formerly:
# video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
description = video.get('summary')
duration = float_or_none(video.get('durationMs'), scale=1000)
view_count = int_or_none(video.get('views'))
```
#### Safely extract nested lists
Suppose you've extracted JSON like this into a Python data structure named `media_json` using, say, the `_download_json()` or `_parse_json()` methods of `InfoExtractor`:
```json
{
"title": "Example video",
"comment": "try extracting this",
"media": [{
"type": "bad",
"size": 320,
"url": "https://some.cdn.site/bad.mp4"
}, {
"type": "streaming",
"url": "https://some.cdn.site/hls.m3u8"
}, {
"type": "super",
"size": 1280,
"url": "https://some.cdn.site/good.webm"
}],
"moreStuff": "more values",
...
}
```
Then extractor code like this can collect the various fields of the JSON:
```python
...
from ..utils import (
determine_ext,
int_or_none,
T,
traverse_obj,
txt_or_none,
url_or_none,
)
...
...
info_dict = {}
# extract title and description if valid and not empty
info_dict.update(traverse_obj(media_json, {
'title': ('title', T(txt_or_none)),
'description': ('comment', T(txt_or_none)),
}))
# extract any recognisable media formats
fmts = []
# traverse into "media" list, extract `dict`s with desired keys
for fmt in traverse_obj(media_json, ('media', Ellipsis, {
'format_id': ('type', T(txt_or_none)),
'url': ('url', T(url_or_none)),
'width': ('size', T(int_or_none)), })):
# bad `fmt` values were `None` and removed
if 'url' not in fmt:
continue
fmt_url = fmt['url'] # known to be valid URL
ext = determine_ext(fmt_url)
if ext == 'm3u8':
fmts.extend(self._extract_m3u8_formats(fmt_url, video_id, 'mp4', fatal=False))
else:
fmt['ext'] = ext
fmts.append(fmt)
# sort, raise if no formats
self._sort_formats(fmts)
info_dict['formats'] = fmts
...
```
The extractor raises an exception rather than random crashes if the JSON structure changes so that no formats are found.
# EMBEDDING YOUTUBE-DL
youtube-dl makes the best effort to be a good command-line program, and thus should be callable from any programming language. If you encounter any problems parsing its output, feel free to [create a report](https://github.com/ytdl-org/youtube-dl/issues/new).
@ -1406,7 +1497,11 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
# BUGS
Bugs and suggestions should be reported at: <https://github.com/ytdl-org/youtube-dl/issues>. Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](https://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
Bugs and suggestions should be reported in the issue tracker: <https://github.com/ytdl-org/youtube-dl/issues> (<https://yt-dl.org/bug> is an alias for this). Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](https://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
## Opening a bug report or suggestion
Be sure to follow instructions provided **below** and **in the issue tracker**. Complete the appropriate issue template fully. Consider whether your problem is covered by an existing issue: if so, follow the discussion there. Avoid commenting on existing duplicate issues as such comments do not add to the discussion of the issue and are liable to be treated as spam.
**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
```
@ -1426,17 +1521,17 @@ $ youtube-dl -v <your command line>
The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
Please re-read your issue once again to avoid a couple of common mistakes (you can and should use this as a checklist):
Finally please review your issue to avoid various common mistakes (you can and should use this as a checklist) listed below.
### Is the description of the issue itself sufficient?
We often get issue reports that we cannot really decipher. While in most cases we eventually get the required information after asking back multiple times, this poses an unnecessary drain on our resources. Many contributors, including myself, are also not native speakers, so we may misread some parts.
We often get issue reports that are hard to understand. To avoid subsequent clarifications, and to assist participants who are not native English speakers, please elaborate on what feature you are requesting, or what bug you want to be fixed.
So please elaborate on what feature you are requesting, or what bug you want to be fixed. Make sure that it's obvious
Make sure that it's obvious
- What the problem is
- How it could be fixed
- How your proposed solution would look like
- How your proposed solution would look
If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a committer myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.
@ -1446,14 +1541,14 @@ If your server has multiple IPs or you suspect censorship, adding `--call-home`
**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `https://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `https://www.youtube.com/`) is *not* an example URL.
### Is the issue already documented?
Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or browse the [GitHub Issues](https://github.com/ytdl-org/youtube-dl/search?type=Issues) of this repository. Initially, at least, use the search term `-label:duplicate` to focus on active issues. If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
### Are you using the latest version?
Before reporting any issue, type `youtube-dl -U`. This should report that you're up-to-date. About 20% of the reports we receive are already fixed, but people are using outdated versions. This goes for feature requests as well.
### Is the issue already documented?
Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or browse the [GitHub Issues](https://github.com/ytdl-org/youtube-dl/search?type=Issues) of this repository. If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
### Why are existing options not enough?
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/ytdl-org/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.

1
devscripts/__init__.py Normal file
View File

@ -0,0 +1 @@
# Empty file needed to make devscripts.utils properly importable from outside

View File

@ -5,8 +5,12 @@ import os
from os.path import dirname as dirn
import sys
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
sys.path.insert(0, dirn(dirn(os.path.abspath(__file__))))
import youtube_dl
from youtube_dl.compat import compat_open as open
from utils import read_file
BASH_COMPLETION_FILE = "youtube-dl.bash-completion"
BASH_COMPLETION_TEMPLATE = "devscripts/bash-completion.in"
@ -18,9 +22,8 @@ def build_completion(opt_parser):
for option in group.option_list:
# for every long flag
opts_flag.append(option.get_opt_string())
with open(BASH_COMPLETION_TEMPLATE) as f:
template = f.read()
with open(BASH_COMPLETION_FILE, "w") as f:
template = read_file(BASH_COMPLETION_TEMPLATE)
with open(BASH_COMPLETION_FILE, "w", encoding='utf-8') as f:
# just using the special char
filled_template = template.replace("{{flags}}", " ".join(opts_flag))
f.write(filled_template)

83
devscripts/cli_to_api.py Executable file
View File

@ -0,0 +1,83 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
"""
This script displays the API parameters corresponding to a yt-dl command line
Example:
$ ./cli_to_api.py -f best
{u'format': 'best'}
$
"""
# Allow direct execution
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import youtube_dl
from types import MethodType
def cli_to_api(*opts):
YDL = youtube_dl.YoutubeDL
# to extract the parsed options, break out of YoutubeDL instantiation
# return options via this Exception
class ParseYTDLResult(Exception):
def __init__(self, result):
super(ParseYTDLResult, self).__init__('result')
self.opts = result
# replacement constructor that raises ParseYTDLResult
def ytdl_init(ydl, ydl_opts):
super(YDL, ydl).__init__(ydl_opts)
raise ParseYTDLResult(ydl_opts)
# patch in the constructor
YDL.__init__ = MethodType(ytdl_init, YDL)
# core parser
def parsed_options(argv):
try:
youtube_dl._real_main(list(argv))
except ParseYTDLResult as result:
return result.opts
# from https://github.com/yt-dlp/yt-dlp/issues/5859#issuecomment-1363938900
default = parsed_options([])
def neq_opt(a, b):
if a == b:
return False
if a is None and repr(type(object)).endswith(".utils.DateRange'>"):
return '0001-01-01 - 9999-12-31' != '{0}'.format(b)
return a != b
diff = dict((k, v) for k, v in parsed_options(opts).items() if neq_opt(default[k], v))
if 'postprocessors' in diff:
diff['postprocessors'] = [pp for pp in diff['postprocessors'] if pp not in default['postprocessors']]
return diff
def main():
from pprint import PrettyPrinter
pprint = PrettyPrinter()
super_format = pprint.format
def format(object, context, maxlevels, level):
if repr(type(object)).endswith(".utils.DateRange'>"):
return '{0}: {1}>'.format(repr(object)[:-2], object), True, False
return super_format(object, context, maxlevels, level)
pprint.format = format
pprint.pprint(cli_to_api(*sys.argv))
if __name__ == '__main__':
main()

View File

@ -1,7 +1,6 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import json
import mimetypes
import netrc
@ -10,7 +9,9 @@ import os
import re
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
dirn = os.path.dirname
sys.path.insert(0, dirn(dirn(os.path.abspath(__file__))))
from youtube_dl.compat import (
compat_basestring,
@ -22,6 +23,7 @@ from youtube_dl.utils import (
make_HTTPS_handler,
sanitized_Request,
)
from utils import read_file
class GitHubReleaser(object):
@ -89,8 +91,7 @@ def main():
changelog_file, version, build_path = args
with io.open(changelog_file, encoding='utf-8') as inf:
changelog = inf.read()
changelog = read_file(changelog_file)
mobj = re.search(r'(?s)version %s\n{2}(.+?)\n{3}' % version, changelog)
body = mobj.group(1) if mobj else ''

View File

@ -6,10 +6,13 @@ import os
from os.path import dirname as dirn
import sys
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
sys.path.insert(0, dirn(dirn(os.path.abspath(__file__))))
import youtube_dl
from youtube_dl.utils import shell_quote
from utils import read_file, write_file
FISH_COMPLETION_FILE = 'youtube-dl.fish'
FISH_COMPLETION_TEMPLATE = 'devscripts/fish-completion.in'
@ -38,11 +41,9 @@ def build_completion(opt_parser):
complete_cmd.extend(EXTRA_ARGS.get(long_option, []))
commands.append(shell_quote(complete_cmd))
with open(FISH_COMPLETION_TEMPLATE) as f:
template = f.read()
template = read_file(FISH_COMPLETION_TEMPLATE)
filled_template = template.replace('{{commands}}', '\n'.join(commands))
with open(FISH_COMPLETION_FILE, 'w') as f:
f.write(filled_template)
write_file(FISH_COMPLETION_FILE, filled_template)
parser = youtube_dl.parseOpts()[0]

View File

@ -6,16 +6,21 @@ import sys
import hashlib
import os.path
dirn = os.path.dirname
sys.path.insert(0, dirn(dirn(dirn(os.path.abspath(__file__)))))
from devscripts.utils import read_file, write_file
from youtube_dl.compat import compat_open as open
if len(sys.argv) <= 1:
print('Specify the version number as parameter')
sys.exit()
version = sys.argv[1]
with open('update/LATEST_VERSION', 'w') as f:
f.write(version)
write_file('update/LATEST_VERSION', version)
versions_info = json.load(open('update/versions.json'))
versions_info = json.loads(read_file('update/versions.json'))
if 'signature' in versions_info:
del versions_info['signature']
@ -39,5 +44,5 @@ for key, filename in filenames.items():
versions_info['versions'][version] = new_version
versions_info['latest'] = version
with open('update/versions.json', 'w') as jsonf:
json.dump(versions_info, jsonf, indent=4, sort_keys=True)
with open('update/versions.json', 'w', encoding='utf-8') as jsonf:
json.dumps(versions_info, jsonf, indent=4, sort_keys=True)

View File

@ -2,14 +2,21 @@
from __future__ import unicode_literals
import json
import os.path
import sys
versions_info = json.load(open('update/versions.json'))
dirn = os.path.dirname
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
from utils import read_file, write_file
versions_info = json.loads(read_file('update/versions.json'))
version = versions_info['latest']
version_dict = versions_info['versions'][version]
# Read template page
with open('download.html.in', 'r', encoding='utf-8') as tmplf:
template = tmplf.read()
template = read_file('download.html.in')
template = template.replace('@PROGRAM_VERSION@', version)
template = template.replace('@PROGRAM_URL@', version_dict['bin'][0])
@ -18,5 +25,5 @@ template = template.replace('@EXE_URL@', version_dict['exe'][0])
template = template.replace('@EXE_SHA256SUM@', version_dict['exe'][1])
template = template.replace('@TAR_URL@', version_dict['tar'][0])
template = template.replace('@TAR_SHA256SUM@', version_dict['tar'][1])
with open('download.html', 'w', encoding='utf-8') as dlf:
dlf.write(template)
write_file('download.html', template)

View File

@ -5,17 +5,22 @@ from __future__ import with_statement, unicode_literals
import datetime
import glob
import io # For Python 2 compatibility
import os
import re
import sys
year = str(datetime.datetime.now().year)
dirn = os.path.dirname
sys.path.insert(0, dirn(dirn(dirn(os.path.abspath(__file__)))))
from devscripts.utils import read_file, write_file
from youtube_dl import compat_str
year = compat_str(datetime.datetime.now().year)
for fn in glob.glob('*.html*'):
with io.open(fn, encoding='utf-8') as f:
content = f.read()
content = read_file(fn)
newc = re.sub(r'(?P<copyright>Copyright © 2011-)(?P<year>[0-9]{4})', 'Copyright © 2011-' + year, content)
if content != newc:
tmpFn = fn + '.part'
with io.open(tmpFn, 'wt', encoding='utf-8') as outf:
outf.write(newc)
write_file(tmpFn, newc)
os.rename(tmpFn, fn)

View File

@ -2,10 +2,16 @@
from __future__ import unicode_literals
import datetime
import io
import json
import os.path
import textwrap
import sys
dirn = os.path.dirname
sys.path.insert(0, dirn(dirn(os.path.abspath(__file__))))
from utils import write_file
atom_template = textwrap.dedent("""\
<?xml version="1.0" encoding="utf-8"?>
@ -72,5 +78,4 @@ for v in versions:
entries_str = textwrap.indent(''.join(entries), '\t')
atom_template = atom_template.replace('@ENTRIES@', entries_str)
with io.open('update/releases.atom', 'w', encoding='utf-8') as atom_file:
atom_file.write(atom_template)
write_file('update/releases.atom', atom_template)

View File

@ -5,15 +5,17 @@ import sys
import os
import textwrap
dirn = os.path.dirname
# We must be able to import youtube_dl
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
sys.path.insert(0, dirn(dirn(dirn(os.path.abspath(__file__)))))
import youtube_dl
from devscripts.utils import read_file, write_file
def main():
with open('supportedsites.html.in', 'r', encoding='utf-8') as tmplf:
template = tmplf.read()
template = read_file('supportedsites.html.in')
ie_htmls = []
for ie in youtube_dl.list_extractors(age_limit=None):
@ -29,8 +31,7 @@ def main():
template = template.replace('@SITES@', textwrap.indent('\n'.join(ie_htmls), '\t'))
with open('supportedsites.html', 'w', encoding='utf-8') as sitesf:
sitesf.write(template)
write_file('supportedsites.html', template)
if __name__ == '__main__':

View File

@ -1,10 +1,11 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
import re
from utils import read_file, write_file
def main():
parser = optparse.OptionParser(usage='%prog INFILE OUTFILE')
@ -14,8 +15,7 @@ def main():
infile, outfile = args
with io.open(infile, encoding='utf-8') as inf:
readme = inf.read()
readme = read_file(infile)
bug_text = re.search(
r'(?s)#\s*BUGS\s*[^\n]*\s*(.*?)#\s*COPYRIGHT', readme).group(1)
@ -25,8 +25,7 @@ def main():
out = bug_text + dev_text
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
write_file(outfile, out)
if __name__ == '__main__':

View File

@ -1,8 +1,11 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
import os.path
import sys
from utils import read_file, read_version, write_file
def main():
@ -13,17 +16,11 @@ def main():
infile, outfile = args
with io.open(infile, encoding='utf-8') as inf:
issue_template_tmpl = inf.read()
issue_template_tmpl = read_file(infile)
# Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(),
'youtube_dl/version.py', 'exec'))
out = issue_template_tmpl % {'version': read_version()}
out = issue_template_tmpl % {'version': locals()['__version__']}
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
write_file(outfile, out)
if __name__ == '__main__':
main()

View File

@ -1,28 +1,49 @@
from __future__ import unicode_literals, print_function
from inspect import getsource
import io
import os
from os.path import dirname as dirn
import re
import sys
print('WARNING: Lazy loading extractors is an experimental feature that may not always work', file=sys.stderr)
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
sys.path.insert(0, dirn(dirn(os.path.abspath(__file__))))
lazy_extractors_filename = sys.argv[1]
if os.path.exists(lazy_extractors_filename):
os.remove(lazy_extractors_filename)
# Py2: may be confused by leftover lazy_extractors.pyc
if sys.version_info[0] < 3:
for c in ('c', 'o'):
try:
os.remove(lazy_extractors_filename + 'c')
except OSError:
pass
from devscripts.utils import read_file, write_file
from youtube_dl.compat import compat_register_utf8
compat_register_utf8()
from youtube_dl.extractor import _ALL_CLASSES
from youtube_dl.extractor.common import InfoExtractor, SearchInfoExtractor
with open('devscripts/lazy_load_template.py', 'rt') as f:
module_template = f.read()
module_template = read_file('devscripts/lazy_load_template.py')
def get_source(m):
return re.sub(r'(?m)^\s*#.*\n', '', getsource(m))
module_contents = [
module_template + '\n' + getsource(InfoExtractor.suitable) + '\n',
'class LazyLoadSearchExtractor(LazyLoadExtractor):\n pass\n']
module_template,
get_source(InfoExtractor.suitable),
get_source(InfoExtractor._match_valid_url) + '\n',
'class LazyLoadSearchExtractor(LazyLoadExtractor):\n pass\n',
# needed for suitable() methods of Youtube extractor (see #28780)
'from youtube_dl.utils import parse_qs, variadic\n',
]
ie_template = '''
class {name}({bases}):
@ -54,7 +75,7 @@ def build_lazy_ie(ie, name):
valid_url=valid_url,
module=ie.__module__)
if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
s += '\n' + getsource(ie.suitable)
s += '\n' + get_source(ie.suitable)
if hasattr(ie, '_make_valid_url'):
# search extractors
s += make_valid_template.format(valid_url=ie._make_valid_url())
@ -94,7 +115,17 @@ for ie in ordered_cls:
module_contents.append(
'_ALL_CLASSES = [{0}]'.format(', '.join(names)))
module_src = '\n'.join(module_contents) + '\n'
module_src = '\n'.join(module_contents)
with io.open(lazy_extractors_filename, 'wt', encoding='utf-8') as f:
f.write(module_src)
write_file(lazy_extractors_filename, module_src + '\n')
# work around JVM byte code module limit in Jython
if sys.platform.startswith('java') and sys.version_info[:2] == (2, 7):
import subprocess
from youtube_dl.compat import compat_subprocess_get_DEVNULL
# if Python 2.7 is available, use it to compile the module for Jython
try:
# if Python 2.7 is available, use it to compile the module for Jython
subprocess.check_call(['python2.7', '-m', 'py_compile', lazy_extractors_filename], stdout=compat_subprocess_get_DEVNULL())
except Exception:
pass

View File

@ -1,8 +1,14 @@
from __future__ import unicode_literals
import io
import sys
import os.path
import re
import sys
dirn = os.path.dirname
sys.path.insert(0, dirn(dirn(os.path.abspath(__file__))))
from utils import read_file
from youtube_dl.compat import compat_open as open
README_FILE = 'README.md'
helptext = sys.stdin.read()
@ -10,8 +16,7 @@ helptext = sys.stdin.read()
if isinstance(helptext, bytes):
helptext = helptext.decode('utf-8')
with io.open(README_FILE, encoding='utf-8') as f:
oldreadme = f.read()
oldreadme = read_file(README_FILE)
header = oldreadme[:oldreadme.index('# OPTIONS')]
footer = oldreadme[oldreadme.index('# CONFIGURATION'):]
@ -20,7 +25,7 @@ options = helptext[helptext.index(' General Options:') + 19:]
options = re.sub(r'(?m)^ (\w.+)$', r'## \1', options)
options = '# OPTIONS\n' + options + '\n'
with io.open(README_FILE, 'w', encoding='utf-8') as f:
with open(README_FILE, 'w', encoding='utf-8') as f:
f.write(header)
f.write(options)
f.write(footer)

View File

@ -1,17 +1,19 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
import os
import os.path
import sys
# Import youtube_dl
ROOT_DIR = os.path.join(os.path.dirname(__file__), '..')
sys.path.insert(0, ROOT_DIR)
dirn = os.path.dirname
sys.path.insert(0, dirn(dirn(os.path.abspath(__file__))))
import youtube_dl
from utils import write_file
def main():
parser = optparse.OptionParser(usage='%prog OUTFILE.md')
@ -38,8 +40,7 @@ def main():
' - ' + md + '\n'
for md in gen_ies_md(ies))
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
write_file(outfile, out)
if __name__ == '__main__':

View File

@ -1,13 +1,13 @@
from __future__ import unicode_literals
import io
import optparse
import os.path
import re
from utils import read_file, write_file
ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
README_FILE = os.path.join(ROOT_DIR, 'README.md')
PREFIX = r'''%YOUTUBE-DL(1)
# NAME
@ -29,8 +29,7 @@ def main():
outfile, = args
with io.open(README_FILE, encoding='utf-8') as f:
readme = f.read()
readme = read_file(README_FILE)
readme = re.sub(r'(?s)^.*?(?=# DESCRIPTION)', '', readme)
readme = re.sub(r'\s+youtube-dl \[OPTIONS\] URL \[URL\.\.\.\]', '', readme)
@ -38,8 +37,7 @@ def main():
readme = filter_options(readme)
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(readme)
write_file(outfile, readme)
def filter_options(readme):

62
devscripts/utils.py Normal file
View File

@ -0,0 +1,62 @@
# coding: utf-8
from __future__ import unicode_literals
import argparse
import functools
import os.path
import subprocess
import sys
dirn = os.path.dirname
sys.path.insert(0, dirn(dirn(os.path.abspath(__file__))))
from youtube_dl.compat import (
compat_kwargs,
compat_open as open,
)
def read_file(fname):
with open(fname, encoding='utf-8') as f:
return f.read()
def write_file(fname, content, mode='w'):
with open(fname, mode, encoding='utf-8') as f:
return f.write(content)
def read_version(fname='youtube_dl/version.py'):
"""Get the version without importing the package"""
exec(compile(read_file(fname), fname, 'exec'))
return locals()['__version__']
def get_filename_args(has_infile=False, default_outfile=None):
parser = argparse.ArgumentParser()
if has_infile:
parser.add_argument('infile', help='Input file')
kwargs = {'nargs': '?', 'default': default_outfile} if default_outfile else {}
kwargs['help'] = 'Output file'
parser.add_argument('outfile', **compat_kwargs(kwargs))
opts = parser.parse_args()
if has_infile:
return opts.infile, opts.outfile
return opts.outfile
def compose_functions(*functions):
return lambda x: functools.reduce(lambda y, f: f(y), functions, x)
def run_process(*args, **kwargs):
kwargs.setdefault('text', True)
kwargs.setdefault('check', True)
kwargs.setdefault('capture_output', True)
if kwargs['text']:
kwargs.setdefault('encoding', 'utf-8')
kwargs.setdefault('errors', 'replace')
kwargs = compat_kwargs(kwargs)
return subprocess.run(args, **kwargs)

View File

@ -7,6 +7,8 @@ import sys
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
import youtube_dl
from utils import read_file, write_file
ZSH_COMPLETION_FILE = "youtube-dl.zsh"
ZSH_COMPLETION_TEMPLATE = "devscripts/zsh-completion.in"
@ -34,15 +36,13 @@ def build_completion(opt_parser):
flags = [opt.get_opt_string() for opt in opts]
with open(ZSH_COMPLETION_TEMPLATE) as f:
template = f.read()
template = read_file(ZSH_COMPLETION_TEMPLATE)
template = template.replace("{{fileopts}}", "|".join(fileopts))
template = template.replace("{{diropts}}", "|".join(diropts))
template = template.replace("{{flags}}", " ".join(flags))
with open(ZSH_COMPLETION_FILE, "w") as f:
f.write(template)
write_file(ZSH_COMPLETION_FILE, template)
parser = youtube_dl.parseOpts()[0]

View File

@ -472,8 +472,6 @@
- **LinuxAcademy**
- **LiTV**
- **LiveJournal**
- **LiveLeak**
- **LiveLeakEmbed**
- **livestream**
- **livestream:original**
- **LnkGo**

View File

@ -1,7 +1,6 @@
from __future__ import unicode_literals
import errno
import io
import hashlib
import json
import os.path
@ -9,14 +8,17 @@ import re
import types
import ssl
import sys
import unittest
import youtube_dl.extractor
from youtube_dl import YoutubeDL
from youtube_dl.compat import (
compat_open as open,
compat_os_name,
compat_str,
)
from youtube_dl.utils import (
IDENTITY,
preferredencoding,
write_string,
)
@ -27,10 +29,10 @@ def get_params(override=None):
"parameters.json")
LOCAL_PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
"local_parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
with open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
if os.path.exists(LOCAL_PARAMETERS_FILE):
with io.open(LOCAL_PARAMETERS_FILE, encoding='utf-8') as pf:
with open(LOCAL_PARAMETERS_FILE, encoding='utf-8') as pf:
parameters.update(json.load(pf))
if override:
parameters.update(override)
@ -72,7 +74,8 @@ class FakeYDL(YoutubeDL):
def to_screen(self, s, skip_eol=None):
print(s)
def trouble(self, s, tb=None):
def trouble(self, *args, **kwargs):
s = args[0] if len(args) > 0 else kwargs.get('message', 'Missing message')
raise Exception(s)
def download(self, x):
@ -89,6 +92,17 @@ class FakeYDL(YoutubeDL):
self.report_warning = types.MethodType(report_warning, self)
class FakeLogger(object):
def debug(self, msg):
pass
def warning(self, msg):
pass
def error(self, msg):
pass
def gettestcases(include_onlymatching=False):
for ie in youtube_dl.extractor.gen_extractors():
for tc in ie.get_testcases(include_onlymatching):
@ -128,6 +142,12 @@ def expect_value(self, got, expected, field):
self.assertTrue(
contains_str in got,
'field %s (value: %r) should contain %r' % (field, got, contains_str))
elif isinstance(expected, compat_str) and re.match(r'lambda \w+:', expected):
fn = eval(expected)
suite = expected.split(':', 1)[1].strip()
self.assertTrue(
fn(got),
'Expected field %s to meet condition %s, but value %r failed ' % (field, suite, got))
elif isinstance(expected, type):
self.assertTrue(
isinstance(got, expected),
@ -137,7 +157,7 @@ def expect_value(self, got, expected, field):
elif isinstance(expected, list) and isinstance(got, list):
self.assertEqual(
len(expected), len(got),
'Expect a list of length %d, but got a list of length %d for field %s' % (
'Expected a list of length %d, but got a list of length %d for field %s' % (
len(expected), len(got), field))
for index, (item_got, item_expected) in enumerate(zip(got, expected)):
type_got = type(item_got)
@ -280,3 +300,7 @@ def http_server_port(httpd):
else:
sock = httpd.socket
return sock.getsockname()[1]
def expectedFailureIf(cond):
return unittest.expectedFailure if cond else IDENTITY

View File

@ -18,7 +18,6 @@
"noprogress": false,
"outtmpl": "%(id)s.%(ext)s",
"password": null,
"playlistend": -1,
"playliststart": 1,
"prefer_free_formats": false,
"quiet": false,

View File

@ -3,19 +3,37 @@
from __future__ import unicode_literals
# Allow direct execution
import io
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, expect_dict, expect_value, http_server_port
from youtube_dl.compat import compat_etree_fromstring, compat_http_server
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError
import threading
from test.helper import (
expect_dict,
expect_value,
FakeYDL,
http_server_port,
)
from youtube_dl.compat import (
compat_etree_fromstring,
compat_http_server,
compat_open as open,
)
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import (
get_info_extractor,
YoutubeIE,
)
from youtube_dl.utils import (
encode_data_uri,
ExtractorError,
RegexNotFoundError,
strip_jsonp,
)
TEAPOT_RESPONSE_STATUS = 418
TEAPOT_RESPONSE_BODY = "<h1>418 I'm a teapot</h1>"
@ -35,13 +53,13 @@ class InfoExtractorTestRequestHandler(compat_http_server.BaseHTTPRequestHandler)
assert False
class TestIE(InfoExtractor):
class DummyIE(InfoExtractor):
pass
class TestInfoExtractor(unittest.TestCase):
def setUp(self):
self.ie = TestIE(FakeYDL())
self.ie = DummyIE(FakeYDL())
def test_ie_key(self):
self.assertEqual(get_info_extractor(YoutubeIE.ie_key()), YoutubeIE)
@ -62,6 +80,7 @@ class TestInfoExtractor(unittest.TestCase):
<meta name="og:test1" content='foo > < bar'/>
<meta name="og:test2" content="foo >//< bar"/>
<meta property=og-test3 content='Ill-formatted opengraph'/>
<meta property=og:test4 content=unquoted-value/>
'''
self.assertEqual(ie._og_search_title(html), 'Foo')
self.assertEqual(ie._og_search_description(html), 'Some video\'s description ')
@ -74,6 +93,7 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(ie._og_search_property(('test0', 'test1'), html), 'foo > < bar')
self.assertRaises(RegexNotFoundError, ie._og_search_property, 'test0', html, None, fatal=True)
self.assertRaises(RegexNotFoundError, ie._og_search_property, ('test0', 'test00'), html, None, fatal=True)
self.assertEqual(ie._og_search_property('test4', html), 'unquoted-value')
def test_html_search_meta(self):
ie = self.ie
@ -98,6 +118,71 @@ class TestInfoExtractor(unittest.TestCase):
self.assertRaises(RegexNotFoundError, ie._html_search_meta, 'z', html, None, fatal=True)
self.assertRaises(RegexNotFoundError, ie._html_search_meta, ('z', 'x'), html, None, fatal=True)
def test_search_nextjs_data(self):
html = '''
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content=
"text/html; charset=utf-8">
<meta name="viewport" content="width=device-width">
<title>Test _search_nextjs_data()</title>
</head>
<body>
<div id="__next">
<div style="background-color:#17171E" class="FU" dir="ltr">
<div class="sc-93de261d-0 dyzzYE">
<div>
<header class="HD"></header>
<main class="MN">
<div style="height:0" class="HT0">
<div style="width:NaN%" data-testid=
"stream-container" class="WDN"></div>
</div>
</main>
</div>
<footer class="sc-6e5faf91-0 dEGaHS"></footer>
</div>
</div>
</div>
<script id="__NEXT_DATA__" type="application/json">
{"props":{"pageProps":{"video":{"id":"testid"}}}}
</script>
</body>
</html>
'''
search = self.ie._search_nextjs_data(html, 'testID')
self.assertEqual(search['props']['pageProps']['video']['id'], 'testid')
def test_search_nuxt_data(self):
html = '''
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content=
"text/html; charset=utf-8">
<title>Nuxt.js Test Page</title>
<meta name="viewport" content=
"width=device-width, initial-scale=1">
<meta data-hid="robots" name="robots" content="all">
</head>
<body class="BD">
<div id="__layout">
<h1 class="H1">Example heading</h1>
<div class="IN">
<p>Decoy text</p>
</div>
</div>
<script>
window.__NUXT__=(function(a,b,c,d,e,f,g,h){return {decoy:" default",data:[{track:{id:f,title:g}}]}}(null,null,"c",null,null,"testid","Nuxt.js title",null));
</script>
<script src="/_nuxt/a12345b.js" defer="defer"></script>
</body>
</html>
'''
search = self.ie._search_nuxt_data(html, 'testID')
self.assertEqual(search['track']['id'], 'testid')
def test_search_json_ld_realworld(self):
# https://github.com/ytdl-org/youtube-dl/issues/23306
expect_dict(
@ -346,6 +431,24 @@ class TestInfoExtractor(unittest.TestCase):
}],
})
# from https://0000.studio/
# with type attribute but without extension in URL
expect_dict(
self,
self.ie._parse_html5_media_entries(
'https://0000.studio',
r'''
<video src="https://d1ggyt9m8pwf3g.cloudfront.net/protected/ap-northeast-1:1864af40-28d5-492b-b739-b32314b1a527/archive/clip/838db6a7-8973-4cd6-840d-8517e4093c92"
controls="controls" type="video/mp4" preload="metadata" autoplay="autoplay" playsinline class="object-contain">
</video>
''', None)[0],
{
'formats': [{
'url': 'https://d1ggyt9m8pwf3g.cloudfront.net/protected/ap-northeast-1:1864af40-28d5-492b-b739-b32314b1a527/archive/clip/838db6a7-8973-4cd6-840d-8517e4093c92',
'ext': 'mp4',
}],
})
def test_extract_jwplayer_data_realworld(self):
# from http://www.suffolk.edu/sjc/
expect_dict(
@ -799,8 +902,8 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
]
for m3u8_file, m3u8_url, expected_formats in _TEST_CASES:
with io.open('./test/testdata/m3u8/%s.m3u8' % m3u8_file,
mode='r', encoding='utf-8') as f:
with open('./test/testdata/m3u8/%s.m3u8' % m3u8_file,
mode='r', encoding='utf-8') as f:
formats = self.ie._parse_m3u8_formats(
f.read(), m3u8_url, ext='mp4')
self.ie._sort_formats(formats)
@ -890,7 +993,8 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
'tbr': 5997.485,
'width': 1920,
'height': 1080,
}]
}],
{},
), (
# https://github.com/ytdl-org/youtube-dl/pull/14844
'urls_only',
@ -973,7 +1077,8 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
'tbr': 4400,
'width': 1920,
'height': 1080,
}]
}],
{},
), (
# https://github.com/ytdl-org/youtube-dl/issues/20346
# Media considered unfragmented even though it contains
@ -1019,18 +1124,185 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
'width': 360,
'height': 360,
'fps': 30,
}]
}],
{},
), (
# https://github.com/ytdl-org/youtube-dl/issues/30235
# Bento4 generated test mpd
# mp4dash --mpd-name=manifest.mpd --no-split --use-segment-list mediafiles
'url_and_range',
'http://unknown/manifest.mpd', # mpd_url
'http://unknown/', # mpd_base_url
[{
'manifest_url': 'http://unknown/manifest.mpd',
'fragment_base_url': 'http://unknown/',
'ext': 'm4a',
'format_id': 'audio-und-mp4a.40.2',
'format_note': 'DASH audio',
'container': 'm4a_dash',
'protocol': 'http_dash_segments',
'acodec': 'mp4a.40.2',
'vcodec': 'none',
'tbr': 98.808,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'fragment_base_url': 'http://unknown/',
'ext': 'mp4',
'format_id': 'video-avc1',
'format_note': 'DASH video',
'container': 'mp4_dash',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.4D401E',
'tbr': 699.597,
'width': 768,
'height': 432
}],
{},
), (
# https://github.com/ytdl-org/youtube-dl/issues/27575
# GPAC generated test mpd
# MP4Box -dash 10000 -single-file -out manifest.mpd mediafiles
'range_only',
'http://unknown/manifest.mpd', # mpd_url
'http://unknown/', # mpd_base_url
[{
'manifest_url': 'http://unknown/manifest.mpd',
'fragment_base_url': 'http://unknown/audio_dashinit.mp4',
'ext': 'm4a',
'format_id': '2',
'format_note': 'DASH audio',
'container': 'm4a_dash',
'protocol': 'http_dash_segments',
'acodec': 'mp4a.40.2',
'vcodec': 'none',
'tbr': 98.096,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'fragment_base_url': 'http://unknown/video_dashinit.mp4',
'ext': 'mp4',
'format_id': '1',
'format_note': 'DASH video',
'container': 'mp4_dash',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.4D401E',
'tbr': 526.987,
'width': 768,
'height': 432
}],
{},
), (
'subtitles',
'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/',
[{
'format_id': 'audio=128001',
'manifest_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'ext': 'm4a',
'tbr': 128.001,
'asr': 48000,
'format_note': 'DASH audio',
'container': 'm4a_dash',
'vcodec': 'none',
'acodec': 'mp4a.40.2',
'url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'fragment_base_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/dash/',
'protocol': 'http_dash_segments',
}, {
'format_id': 'video=100000',
'manifest_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'ext': 'mp4',
'width': 336,
'height': 144,
'tbr': 100,
'format_note': 'DASH video',
'container': 'mp4_dash',
'vcodec': 'avc1.4D401F',
'acodec': 'none',
'url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'fragment_base_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/dash/',
'protocol': 'http_dash_segments',
}, {
'format_id': 'video=326000',
'manifest_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'ext': 'mp4',
'width': 562,
'height': 240,
'tbr': 326,
'format_note': 'DASH video',
'container': 'mp4_dash',
'vcodec': 'avc1.4D401F',
'acodec': 'none',
'url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'fragment_base_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/dash/',
'protocol': 'http_dash_segments',
}, {
'format_id': 'video=698000',
'manifest_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'ext': 'mp4',
'width': 844,
'height': 360,
'tbr': 698,
'format_note': 'DASH video',
'container': 'mp4_dash',
'vcodec': 'avc1.4D401F',
'acodec': 'none',
'url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'fragment_base_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/dash/',
'protocol': 'http_dash_segments',
}, {
'format_id': 'video=1493000',
'manifest_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'ext': 'mp4',
'width': 1126,
'height': 480,
'tbr': 1493,
'format_note': 'DASH video',
'container': 'mp4_dash',
'vcodec': 'avc1.4D401F',
'acodec': 'none',
'url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'fragment_base_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/dash/',
'protocol': 'http_dash_segments',
}, {
'format_id': 'video=4482000',
'manifest_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'ext': 'mp4',
'width': 1688,
'height': 720,
'tbr': 4482,
'format_note': 'DASH video',
'container': 'mp4_dash',
'vcodec': 'avc1.4D401F',
'acodec': 'none',
'url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'fragment_base_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/dash/',
'protocol': 'http_dash_segments',
}],
{
'en': [
{
'ext': 'mp4',
'manifest_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/manifest.mpd',
'fragment_base_url': 'https://sdn-global-streaming-cache-3qsdn.akamaized.net/stream/3144/files/17/07/672975/3144-kZT4LWMQw6Rh7Kpd.ism/dash/',
'protocol': 'http_dash_segments',
}
]
},
)
]
for mpd_file, mpd_url, mpd_base_url, expected_formats in _TEST_CASES:
with io.open('./test/testdata/mpd/%s.mpd' % mpd_file,
mode='r', encoding='utf-8') as f:
formats = self.ie._parse_mpd_formats(
for mpd_file, mpd_url, mpd_base_url, expected_formats, expected_subtitles in _TEST_CASES:
with open('./test/testdata/mpd/%s.mpd' % mpd_file,
mode='r', encoding='utf-8') as f:
formats, subtitles = self.ie._parse_mpd_formats_and_subtitles(
compat_etree_fromstring(f.read().encode('utf-8')),
mpd_base_url=mpd_base_url, mpd_url=mpd_url)
self.ie._sort_formats(formats)
expect_value(self, formats, expected_formats, None)
expect_value(self, subtitles, expected_subtitles, None)
def test_parse_f4m_formats(self):
_TEST_CASES = [
@ -1051,8 +1323,8 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
]
for f4m_file, f4m_url, expected_formats in _TEST_CASES:
with io.open('./test/testdata/f4m/%s.f4m' % f4m_file,
mode='r', encoding='utf-8') as f:
with open('./test/testdata/f4m/%s.f4m' % f4m_file,
mode='r', encoding='utf-8') as f:
formats = self.ie._parse_f4m_formats(
compat_etree_fromstring(f.read().encode('utf-8')),
f4m_url, None)
@ -1099,8 +1371,8 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
]
for xspf_file, xspf_url, expected_entries in _TEST_CASES:
with io.open('./test/testdata/xspf/%s.xspf' % xspf_file,
mode='r', encoding='utf-8') as f:
with open('./test/testdata/xspf/%s.xspf' % xspf_file,
mode='r', encoding='utf-8') as f:
entries = self.ie._parse_xspf(
compat_etree_fromstring(f.read().encode('utf-8')),
xspf_file, xspf_url=xspf_url, xspf_base_url=xspf_url)

View File

@ -10,14 +10,31 @@ import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import copy
import json
from test.helper import FakeYDL, assertRegexpMatches
from test.helper import (
FakeYDL,
assertRegexpMatches,
try_rm,
)
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_str, compat_urllib_error
from youtube_dl.compat import (
compat_http_cookiejar_Cookie,
compat_http_cookies_SimpleCookie,
compat_kwargs,
compat_open as open,
compat_str,
compat_urllib_error,
)
from youtube_dl.extractor import YoutubeIE
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.postprocessor.common import PostProcessor
from youtube_dl.utils import ExtractorError, match_filter_func
from youtube_dl.utils import (
ExtractorError,
match_filter_func,
traverse_obj,
)
TEST_URL = 'http://localhost/sample.mp4'
@ -29,11 +46,14 @@ class YDL(FakeYDL):
self.msgs = []
def process_info(self, info_dict):
self.downloaded_info_dicts.append(info_dict)
self.downloaded_info_dicts.append(info_dict.copy())
def to_screen(self, msg):
self.msgs.append(msg)
def dl(self, *args, **kwargs):
assert False, 'Downloader must not be invoked for test_YoutubeDL'
def _make_result(formats, **kwargs):
res = {
@ -42,8 +62,9 @@ def _make_result(formats, **kwargs):
'title': 'testttitle',
'extractor': 'testex',
'extractor_key': 'TestEx',
'webpage_url': 'http://example.com/watch?v=shenanigans',
}
res.update(**kwargs)
res.update(**compat_kwargs(kwargs))
return res
@ -681,12 +702,12 @@ class TestYoutubeDL(unittest.TestCase):
class SimplePP(PostProcessor):
def run(self, info):
with open(audiofile, 'wt') as f:
with open(audiofile, 'w') as f:
f.write('EXAMPLE')
return [info['filepath']], info
def run_pp(params, PP):
with open(filename, 'wt') as f:
with open(filename, 'w') as f:
f.write('EXAMPLE')
ydl = YoutubeDL(params)
ydl.add_post_processor(PP())
@ -705,7 +726,7 @@ class TestYoutubeDL(unittest.TestCase):
class ModifierPP(PostProcessor):
def run(self, info):
with open(info['filepath'], 'wt') as f:
with open(info['filepath'], 'w') as f:
f.write('MODIFIED')
return [], info
@ -930,17 +951,11 @@ class TestYoutubeDL(unittest.TestCase):
# Test case for https://github.com/ytdl-org/youtube-dl/issues/27064
def test_ignoreerrors_for_playlist_with_url_transparent_iterable_entries(self):
class _YDL(YDL):
def __init__(self, *args, **kwargs):
super(_YDL, self).__init__(*args, **kwargs)
def trouble(self, s, tb=None):
pass
ydl = _YDL({
ydl = YDL({
'format': 'extra',
'ignoreerrors': True,
})
ydl.trouble = lambda *_, **__: None
class VideoIE(InfoExtractor):
_VALID_URL = r'video:(?P<id>\d+)'
@ -997,6 +1012,180 @@ class TestYoutubeDL(unittest.TestCase):
self.assertEqual(downloaded['extractor'], 'Video')
self.assertEqual(downloaded['extractor_key'], 'Video')
def test_default_times(self):
"""Test addition of missing upload/release/_date from /release_/timestamp"""
info = {
'id': '1234',
'url': TEST_URL,
'title': 'Title',
'ext': 'mp4',
'timestamp': 1631352900,
'release_timestamp': 1632995931,
}
params = {'simulate': True, }
ydl = FakeYDL(params)
out_info = ydl.process_ie_result(info)
self.assertTrue(isinstance(out_info['upload_date'], compat_str))
self.assertEqual(out_info['upload_date'], '20210911')
self.assertTrue(isinstance(out_info['release_date'], compat_str))
self.assertEqual(out_info['release_date'], '20210930')
class TestYoutubeDLCookies(unittest.TestCase):
@staticmethod
def encode_cookie(cookie):
if not isinstance(cookie, dict):
cookie = vars(cookie)
for name, value in cookie.items():
yield name, compat_str(value)
@classmethod
def comparable_cookies(cls, cookies):
# Work around cookiejar cookies not being unicode strings
return sorted(map(tuple, map(sorted, map(cls.encode_cookie, cookies))))
def assertSameCookies(self, c1, c2, msg=None):
return self.assertEqual(
*map(self.comparable_cookies, (c1, c2)),
msg=msg)
def assertSameCookieStrings(self, c1, c2, msg=None):
return self.assertSameCookies(
*map(lambda c: compat_http_cookies_SimpleCookie(c).values(), (c1, c2)),
msg=msg)
def test_header_cookies(self):
ydl = FakeYDL()
ydl.report_warning = lambda *_, **__: None
def cookie(name, value, version=None, domain='', path='', secure=False, expires=None):
return compat_http_cookiejar_Cookie(
version or 0, name, value, None, False,
domain, bool(domain), bool(domain), path, bool(path),
secure, expires, False, None, None, rest={})
test_url, test_domain = (t % ('yt.dl',) for t in ('https://%s/test', '.%s'))
def test(encoded_cookies, cookies, headers=False, round_trip=None, error_re=None):
def _test():
ydl.cookiejar.clear()
ydl._load_cookies(encoded_cookies, autoscope=headers)
if headers:
ydl._apply_header_cookies(test_url)
data = {'url': test_url}
ydl._calc_headers(data)
self.assertSameCookies(
cookies, ydl.cookiejar,
'Extracted cookiejar.Cookie is not the same')
if not headers:
self.assertSameCookieStrings(
data.get('cookies'), round_trip or encoded_cookies,
msg='Cookie is not the same as round trip')
ydl.__dict__['_YoutubeDL__header_cookies'] = []
try:
_test()
except AssertionError:
raise
except Exception as e:
if not error_re:
raise
assertRegexpMatches(self, e.args[0], error_re.join(('.*',) * 2))
test('test=value; Domain=' + test_domain, [cookie('test', 'value', domain=test_domain)])
test('test=value', [cookie('test', 'value')], error_re='Unscoped cookies are not allowed')
test('cookie1=value1; Domain={0}; Path=/test; cookie2=value2; Domain={0}; Path=/'.format(test_domain), [
cookie('cookie1', 'value1', domain=test_domain, path='/test'),
cookie('cookie2', 'value2', domain=test_domain, path='/')])
cookie_kw = compat_kwargs(
{'domain': test_domain, 'path': '/test', 'secure': True, 'expires': '9999999999', })
test('test=value; Domain={domain}; Path={path}; Secure; Expires={expires}'.format(**cookie_kw), [
cookie('test', 'value', **cookie_kw)])
test('test="value; "; path=/test; domain=' + test_domain, [
cookie('test', 'value; ', domain=test_domain, path='/test')],
round_trip='test="value\\073 "; Domain={0}; Path=/test'.format(test_domain))
test('name=; Domain=' + test_domain, [cookie('name', '', domain=test_domain)],
round_trip='name=""; Domain=' + test_domain)
test('test=value', [cookie('test', 'value', domain=test_domain)], headers=True)
test('cookie1=value; Domain={0}; cookie2=value'.format(test_domain), [],
headers=True, error_re='Invalid syntax')
ydl.report_warning = ydl.report_error
test('test=value', [], headers=True, error_re='Passing cookies as a header is a potential security risk')
def test_infojson_cookies(self):
TEST_FILE = 'test_infojson_cookies.info.json'
TEST_URL = 'https://example.com/example.mp4'
COOKIES = 'a=b; Domain=.example.com; c=d; Domain=.example.com'
COOKIE_HEADER = {'Cookie': 'a=b; c=d'}
ydl = FakeYDL()
ydl.process_info = lambda x: ydl._write_info_json('test', x, TEST_FILE)
def make_info(info_header_cookies=False, fmts_header_cookies=False, cookies_field=False):
fmt = {'url': TEST_URL}
if fmts_header_cookies:
fmt['http_headers'] = COOKIE_HEADER
if cookies_field:
fmt['cookies'] = COOKIES
return _make_result([fmt], http_headers=COOKIE_HEADER if info_header_cookies else None)
def test(initial_info, note):
def failure_msg(why):
return ' when '.join((why, note))
result = {}
result['processed'] = ydl.process_ie_result(initial_info)
self.assertTrue(ydl.cookiejar.get_cookies_for_url(TEST_URL),
msg=failure_msg('No cookies set in cookiejar after initial process'))
ydl.cookiejar.clear()
with open(TEST_FILE) as infojson:
result['loaded'] = ydl.sanitize_info(json.load(infojson), True)
result['final'] = ydl.process_ie_result(result['loaded'].copy(), download=False)
self.assertTrue(ydl.cookiejar.get_cookies_for_url(TEST_URL),
msg=failure_msg('No cookies set in cookiejar after final process'))
ydl.cookiejar.clear()
for key in ('processed', 'loaded', 'final'):
info = result[key]
self.assertIsNone(
traverse_obj(info, ((None, ('formats', 0)), 'http_headers', 'Cookie'), casesense=False, get_all=False),
msg=failure_msg('Cookie header not removed in {0} result'.format(key)))
self.assertSameCookieStrings(
traverse_obj(info, ((None, ('formats', 0)), 'cookies'), get_all=False), COOKIES,
msg=failure_msg('No cookies field found in {0} result'.format(key)))
test({'url': TEST_URL, 'http_headers': COOKIE_HEADER, 'id': '1', 'title': 'x'}, 'no formats field')
test(make_info(info_header_cookies=True), 'info_dict header cokies')
test(make_info(fmts_header_cookies=True), 'format header cookies')
test(make_info(info_header_cookies=True, fmts_header_cookies=True), 'info_dict and format header cookies')
test(make_info(info_header_cookies=True, fmts_header_cookies=True, cookies_field=True), 'all cookies fields')
test(make_info(cookies_field=True), 'cookies format field')
test({'url': TEST_URL, 'cookies': COOKIES, 'id': '1', 'title': 'x'}, 'info_dict cookies field only')
try_rm(TEST_FILE)
def test_add_headers_cookie(self):
def check_for_cookie_header(result):
return traverse_obj(result, ((None, ('formats', 0)), 'http_headers', 'Cookie'), casesense=False, get_all=False)
ydl = FakeYDL({'http_headers': {'Cookie': 'a=b'}})
ydl._apply_header_cookies(_make_result([])['webpage_url']) # Scope to input webpage URL: .example.com
fmt = {'url': 'https://example.com/video.mp4'}
result = ydl.process_ie_result(_make_result([fmt]), download=False)
self.assertIsNone(check_for_cookie_header(result), msg='http_headers cookies in result info_dict')
self.assertEqual(result.get('cookies'), 'a=b; Domain=.example.com', msg='No cookies were set in cookies field')
self.assertIn('a=b', ydl.cookiejar.get_cookie_header(fmt['url']), msg='No cookies were set in cookiejar')
fmt = {'url': 'https://wrong.com/video.mp4'}
result = ydl.process_ie_result(_make_result([fmt]), download=False)
self.assertIsNone(check_for_cookie_header(result), msg='http_headers cookies for wrong domain')
self.assertFalse(result.get('cookies'), msg='Cookies set in cookies field for wrong domain')
self.assertFalse(ydl.cookiejar.get_cookie_header(fmt['url']), msg='Cookies set in cookiejar for wrong domain')
if __name__ == '__main__':
unittest.main()

View File

@ -46,6 +46,20 @@ class TestYoutubeDLCookieJar(unittest.TestCase):
# will be ignored
self.assertFalse(cookiejar._cookies)
def test_get_cookie_header(self):
cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/httponly_cookies.txt')
cookiejar.load(ignore_discard=True, ignore_expires=True)
header = cookiejar.get_cookie_header('https://www.foobar.foobar')
self.assertIn('HTTPONLY_COOKIE', header)
def test_get_cookies_for_url(self):
cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/session_cookies.txt')
cookiejar.load(ignore_discard=True, ignore_expires=True)
cookies = cookiejar.get_cookies_for_url('https://www.foobar.foobar/')
self.assertEqual(len(cookies), 2)
cookies = cookiejar.get_cookies_for_url('https://foobar.foobar/')
self.assertFalse(cookies)
if __name__ == '__main__':
unittest.main()

View File

@ -8,7 +8,7 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.aes import aes_decrypt, aes_encrypt, aes_cbc_decrypt, aes_cbc_encrypt, aes_decrypt_text
from youtube_dl.aes import aes_decrypt, aes_encrypt, aes_cbc_decrypt, aes_cbc_encrypt, aes_decrypt_text, aes_ecb_encrypt
from youtube_dl.utils import bytes_to_intlist, intlist_to_bytes
import base64
@ -58,6 +58,13 @@ class TestAES(unittest.TestCase):
decrypted = (aes_decrypt_text(encrypted, password, 32))
self.assertEqual(decrypted, self.secret_msg)
def test_ecb_encrypt(self):
data = bytes_to_intlist(self.secret_msg)
encrypted = intlist_to_bytes(aes_ecb_encrypt(data, self.key))
self.assertEqual(
encrypted,
b'\xaa\x86]\x81\x97>\x02\x92\x9d\x1bR[[L/u\xd3&\xd1(h\xde{\x81\x94\xba\x02\xae\xbd\xa6\xd0:')
if __name__ == '__main__':
unittest.main()

View File

@ -11,6 +11,7 @@ from test.helper import try_rm
from youtube_dl import YoutubeDL
from youtube_dl.utils import DownloadError
def _download_restricted(url, filename, age):
@ -26,7 +27,10 @@ def _download_restricted(url, filename, age):
ydl.add_default_info_extractors()
json_filename = os.path.splitext(filename)[0] + '.info.json'
try_rm(json_filename)
ydl.download([url])
try:
ydl.download([url])
except DownloadError:
try_rm(json_filename)
res = os.path.exists(json_filename)
try_rm(json_filename)
return res
@ -38,12 +42,12 @@ class TestAgeRestriction(unittest.TestCase):
self.assertFalse(_download_restricted(url, filename, age))
def test_youtube(self):
self._assert_restricted('07FYdnEawAQ', '07FYdnEawAQ.mp4', 10)
self._assert_restricted('HtVdAasjOgU', 'HtVdAasjOgU.mp4', 10)
def test_youporn(self):
self._assert_restricted(
'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
'505835.mp4', 2, old_age=25)
'https://www.youporn.com/watch/16715086/sex-ed-in-detention-18-asmr/',
'16715086.mp4', 2, old_age=25)
if __name__ == '__main__':

View File

@ -66,9 +66,9 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch('https://www.youtube.com/feed/watch_later', ['youtube:tab'])
self.assertMatch('https://www.youtube.com/feed/subscriptions', ['youtube:tab'])
# def test_youtube_search_matching(self):
# self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
# self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
def test_youtube_search_matching(self):
self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
def test_facebook_matching(self):
self.assertTrue(FacebookIE.suitable('https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268'))

View File

@ -3,17 +3,18 @@
from __future__ import unicode_literals
import shutil
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import shutil
from test.helper import FakeYDL
from youtube_dl.cache import Cache
from youtube_dl.utils import version_tuple
from youtube_dl.version import __version__
def _is_empty(d):
@ -54,6 +55,17 @@ class TestCache(unittest.TestCase):
self.assertFalse(os.path.exists(self.test_dir))
self.assertEqual(c.load('test_cache', 'k.'), None)
def test_cache_validation(self):
ydl = FakeYDL({
'cachedir': self.test_dir,
})
c = Cache(ydl)
obj = {'x': 1, 'y': ['ä', '\\a', True]}
c.store('test_cache', 'k.', obj)
self.assertEqual(c.load('test_cache', 'k.', min_ver='1970.01.01'), obj)
new_version = '.'.join(('%d' % ((v + 1) if i == 0 else v, )) for i, v in enumerate(version_tuple(__version__)))
self.assertIs(c.load('test_cache', 'k.', min_ver=new_version), None)
if __name__ == '__main__':
unittest.main()

View File

@ -11,6 +11,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.compat import (
compat_casefold,
compat_getenv,
compat_setenv,
compat_etree_Element,
@ -22,6 +23,7 @@ from youtube_dl.compat import (
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlencode,
compat_urllib_request,
)
@ -47,10 +49,11 @@ class TestCompat(unittest.TestCase):
def test_all_present(self):
import youtube_dl.compat
all_names = youtube_dl.compat.__all__
present_names = set(filter(
all_names = sorted(
youtube_dl.compat.__all__ + youtube_dl.compat.legacy)
present_names = set(map(compat_str, filter(
lambda c: '_' in c and not c.startswith('_'),
dir(youtube_dl.compat))) - set(['unicode_literals'])
dir(youtube_dl.compat)))) - set(['unicode_literals'])
self.assertEqual(all_names, sorted(present_names))
def test_compat_urllib_parse_unquote(self):
@ -118,9 +121,34 @@ class TestCompat(unittest.TestCase):
<smil xmlns="http://www.w3.org/2001/SMIL20/Language"></smil>'''
compat_etree_fromstring(xml)
def test_struct_unpack(self):
def test_compat_struct_unpack(self):
self.assertEqual(compat_struct_unpack('!B', b'\x00'), (0,))
def test_compat_casefold(self):
if hasattr(compat_str, 'casefold'):
# don't bother to test str.casefold() (again)
return
# thanks https://bugs.python.org/file24232/casefolding.patch
self.assertEqual(compat_casefold('hello'), 'hello')
self.assertEqual(compat_casefold('hELlo'), 'hello')
self.assertEqual(compat_casefold('ß'), 'ss')
self.assertEqual(compat_casefold(''), 'fi')
self.assertEqual(compat_casefold('\u03a3'), '\u03c3')
self.assertEqual(compat_casefold('A\u0345\u03a3'), 'a\u03b9\u03c3')
def test_compat_urllib_request_Request(self):
self.assertEqual(
compat_urllib_request.Request('http://127.0.0.1', method='PUT').get_method(),
'PUT')
class PUTrequest(compat_urllib_request.Request):
def get_method(self):
return 'PUT'
self.assertEqual(
PUTrequest('http://127.0.0.1').get_method(),
'PUT')
if __name__ == '__main__':
unittest.main()

View File

@ -10,6 +10,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import (
assertGreaterEqual,
assertLessEqual,
expect_warnings,
get_params,
gettestcases,
@ -20,19 +21,20 @@ from test.helper import (
import hashlib
import io
import json
import socket
import youtube_dl.YoutubeDL
from youtube_dl.compat import (
compat_http_client,
compat_urllib_error,
compat_HTTPError,
compat_open as open,
compat_urllib_error,
)
from youtube_dl.utils import (
DownloadError,
ExtractorError,
error_to_compat_str,
format_bytes,
UnavailableVideoError,
)
@ -100,27 +102,31 @@ def generator(test_case, tname):
def print_skipping(reason):
print('Skipping %s: %s' % (test_case['name'], reason))
self.skipTest(reason)
if not ie.working():
print_skipping('IE marked as not _WORKING')
return
for tc in test_cases:
info_dict = tc.get('info_dict', {})
if not (info_dict.get('id') and info_dict.get('ext')):
raise Exception('Test definition incorrect. The output file cannot be known. Are both \'id\' and \'ext\' keys present?')
raise Exception('Test definition (%s) requires both \'id\' and \'ext\' keys present to define the output file' % (tname, ))
if 'skip' in test_case:
print_skipping(test_case['skip'])
return
for other_ie in other_ies:
if not other_ie.working():
print_skipping('test depends on %sIE, marked as not WORKING' % other_ie.ie_key())
return
params = get_params(test_case.get('params', {}))
params['outtmpl'] = tname + '_' + params['outtmpl']
if is_playlist and 'playlist' not in test_case:
params.setdefault('extract_flat', 'in_playlist')
params.setdefault('playlistend',
test_case['playlist_maxcount'] + 1
if test_case.get('playlist_maxcount')
else test_case.get('playlist_mincount'))
params.setdefault('skip_download', True)
ydl = YoutubeDL(params, auto_init=False)
@ -146,6 +152,7 @@ def generator(test_case, tname):
try_rm(tc_filename)
try_rm(tc_filename + '.part')
try_rm(os.path.splitext(tc_filename)[0] + '.info.json')
try_rm_tcs_files()
try:
try_num = 1
@ -160,7 +167,9 @@ def generator(test_case, tname):
except (DownloadError, ExtractorError) as err:
# Check if the exception is not a network related one
if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError, compat_http_client.BadStatusLine) or (err.exc_info[0] == compat_HTTPError and err.exc_info[1].code == 503):
raise
msg = getattr(err, 'msg', error_to_compat_str(err))
err.msg = '%s (%s)' % (msg, tname, )
raise err
if try_num == RETRIES:
report_warning('%s failed due to network errors, skipping...' % tname)
@ -185,6 +194,14 @@ def generator(test_case, tname):
'Expected at least %d in playlist %s, but got only %d' % (
test_case['playlist_mincount'], test_case['url'],
len(res_dict['entries'])))
if 'playlist_maxcount' in test_case:
assertLessEqual(
self,
len(res_dict['entries']),
test_case['playlist_maxcount'],
'Expected at most %d in playlist %s, but got %d' % (
test_case['playlist_maxcount'], test_case['url'],
len(res_dict['entries'])))
if 'playlist_count' in test_case:
self.assertEqual(
len(res_dict['entries']),
@ -209,7 +226,15 @@ def generator(test_case, tname):
# First, check test cases' data against extracted data alone
expect_info_dict(self, tc_res_dict, tc.get('info_dict', {}))
# Now, check downloaded file consistency
# support test-case with volatile ID, signalled by regexp value
if tc.get('info_dict', {}).get('id', '').startswith('re:'):
test_id = tc['info_dict']['id']
tc['info_dict']['id'] = tc_res_dict['id']
else:
test_id = None
tc_filename = get_tc_filename(tc)
if test_id:
tc['info_dict']['id'] = test_id
if not test_case.get('params', {}).get('skip_download', False):
self.assertTrue(os.path.exists(tc_filename), msg='Missing file ' + tc_filename)
self.assertTrue(tc_filename in finished_hook_called)
@ -232,7 +257,7 @@ def generator(test_case, tname):
self.assertTrue(
os.path.exists(info_json_fn),
'Missing info file %s' % info_json_fn)
with io.open(info_json_fn, encoding='utf-8') as infof:
with open(info_json_fn, encoding='utf-8') as infof:
info_dict = json.load(infof)
expect_info_dict(self, info_dict, tc.get('info_dict', {}))
finally:

View File

@ -0,0 +1,272 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
# Allow direct execution
import os
import re
import sys
import subprocess
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import (
FakeLogger,
FakeYDL,
http_server_port,
try_rm,
)
from youtube_dl import YoutubeDL
from youtube_dl.compat import (
compat_contextlib_suppress,
compat_http_cookiejar_Cookie,
compat_http_server,
compat_kwargs,
)
from youtube_dl.utils import (
encodeFilename,
join_nonempty,
)
from youtube_dl.downloader.external import (
Aria2cFD,
Aria2pFD,
AxelFD,
CurlFD,
FFmpegFD,
HttpieFD,
WgetFD,
)
from youtube_dl.postprocessor import (
FFmpegPostProcessor,
)
import threading
TEST_SIZE = 10 * 1024
TEST_COOKIE = {
'version': 0,
'name': 'test',
'value': 'ytdlp',
'port': None,
'port_specified': False,
'domain': '.example.com',
'domain_specified': True,
'domain_initial_dot': False,
'path': '/',
'path_specified': True,
'secure': False,
'expires': None,
'discard': False,
'comment': None,
'comment_url': None,
'rest': {},
}
TEST_COOKIE_VALUE = join_nonempty('name', 'value', delim='=', from_dict=TEST_COOKIE)
TEST_INFO = {'url': 'http://www.example.com/'}
def cookiejar_Cookie(**cookie_args):
return compat_http_cookiejar_Cookie(**compat_kwargs(cookie_args))
def ifExternalFDAvailable(externalFD):
return unittest.skipUnless(externalFD.available(),
externalFD.get_basename() + ' not found')
class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
def log_message(self, format, *args):
pass
def send_content_range(self, total=None):
range_header = self.headers.get('Range')
start = end = None
if range_header:
mobj = re.match(r'bytes=(\d+)-(\d+)', range_header)
if mobj:
start, end = (int(mobj.group(i)) for i in (1, 2))
valid_range = start is not None and end is not None
if valid_range:
content_range = 'bytes %d-%d' % (start, end)
if total:
content_range += '/%d' % total
self.send_header('Content-Range', content_range)
return (end - start + 1) if valid_range else total
def serve(self, range=True, content_length=True):
self.send_response(200)
self.send_header('Content-Type', 'video/mp4')
size = TEST_SIZE
if range:
size = self.send_content_range(TEST_SIZE)
if content_length:
self.send_header('Content-Length', size)
self.end_headers()
self.wfile.write(b'#' * size)
def do_GET(self):
if self.path == '/regular':
self.serve()
elif self.path == '/no-content-length':
self.serve(content_length=False)
elif self.path == '/no-range':
self.serve(range=False)
elif self.path == '/no-range-no-content-length':
self.serve(range=False, content_length=False)
else:
assert False, 'unrecognised server path'
@ifExternalFDAvailable(Aria2pFD)
class TestAria2pFD(unittest.TestCase):
def setUp(self):
self.httpd = compat_http_server.HTTPServer(
('127.0.0.1', 0), HTTPTestRequestHandler)
self.port = http_server_port(self.httpd)
self.server_thread = threading.Thread(target=self.httpd.serve_forever)
self.server_thread.daemon = True
self.server_thread.start()
def download(self, params, ep):
with subprocess.Popen(
['aria2c', '--enable-rpc'],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL
) as process:
if not process.poll():
filename = 'testfile.mp4'
params['logger'] = FakeLogger()
params['outtmpl'] = filename
ydl = YoutubeDL(params)
try_rm(encodeFilename(filename))
self.assertEqual(ydl.download(['http://127.0.0.1:%d/%s' % (self.port, ep)]), 0)
self.assertEqual(os.path.getsize(encodeFilename(filename)), TEST_SIZE)
try_rm(encodeFilename(filename))
process.kill()
def download_all(self, params):
for ep in ('regular', 'no-content-length', 'no-range', 'no-range-no-content-length'):
self.download(params, ep)
def test_regular(self):
self.download_all({'external_downloader': 'aria2p'})
def test_chunked(self):
self.download_all({
'external_downloader': 'aria2p',
'http_chunk_size': 1000,
})
@ifExternalFDAvailable(HttpieFD)
class TestHttpieFD(unittest.TestCase):
def test_make_cmd(self):
with FakeYDL() as ydl:
downloader = HttpieFD(ydl, {})
self.assertEqual(
downloader._make_cmd('test', TEST_INFO),
['http', '--download', '--output', 'test', 'http://www.example.com/'])
# Test cookie header is added
ydl.cookiejar.set_cookie(cookiejar_Cookie(**TEST_COOKIE))
self.assertEqual(
downloader._make_cmd('test', TEST_INFO),
['http', '--download', '--output', 'test',
'http://www.example.com/', 'Cookie:' + TEST_COOKIE_VALUE])
@ifExternalFDAvailable(AxelFD)
class TestAxelFD(unittest.TestCase):
def test_make_cmd(self):
with FakeYDL() as ydl:
downloader = AxelFD(ydl, {})
self.assertEqual(
downloader._make_cmd('test', TEST_INFO),
['axel', '-o', 'test', '--', 'http://www.example.com/'])
# Test cookie header is added
ydl.cookiejar.set_cookie(cookiejar_Cookie(**TEST_COOKIE))
self.assertEqual(
downloader._make_cmd('test', TEST_INFO),
['axel', '-o', 'test', '-H', 'Cookie: ' + TEST_COOKIE_VALUE,
'--max-redirect=0', '--', 'http://www.example.com/'])
@ifExternalFDAvailable(WgetFD)
class TestWgetFD(unittest.TestCase):
def test_make_cmd(self):
with FakeYDL() as ydl:
downloader = WgetFD(ydl, {})
self.assertNotIn('--load-cookies', downloader._make_cmd('test', TEST_INFO))
# Test cookiejar tempfile arg is added
ydl.cookiejar.set_cookie(cookiejar_Cookie(**TEST_COOKIE))
self.assertIn('--load-cookies', downloader._make_cmd('test', TEST_INFO))
@ifExternalFDAvailable(CurlFD)
class TestCurlFD(unittest.TestCase):
def test_make_cmd(self):
with FakeYDL() as ydl:
downloader = CurlFD(ydl, {})
self.assertNotIn('--cookie', downloader._make_cmd('test', TEST_INFO))
# Test cookie header is added
ydl.cookiejar.set_cookie(cookiejar_Cookie(**TEST_COOKIE))
self.assertIn('--cookie', downloader._make_cmd('test', TEST_INFO))
self.assertIn(TEST_COOKIE_VALUE, downloader._make_cmd('test', TEST_INFO))
@ifExternalFDAvailable(Aria2cFD)
class TestAria2cFD(unittest.TestCase):
def test_make_cmd(self):
with FakeYDL() as ydl:
downloader = Aria2cFD(ydl, {})
downloader._make_cmd('test', TEST_INFO)
self.assertFalse(hasattr(downloader, '_cookies_tempfile'))
# Test cookiejar tempfile arg is added
ydl.cookiejar.set_cookie(cookiejar_Cookie(**TEST_COOKIE))
cmd = downloader._make_cmd('test', TEST_INFO)
self.assertIn('--load-cookies=%s' % downloader._cookies_tempfile, cmd)
# Handle delegated availability
def ifFFmpegFDAvailable(externalFD):
# raise SkipTest, or set False!
avail = ifExternalFDAvailable(externalFD) and False
with compat_contextlib_suppress(Exception):
avail = FFmpegPostProcessor(downloader=None).available
return unittest.skipUnless(
avail, externalFD.get_basename() + ' not found')
@ifFFmpegFDAvailable(FFmpegFD)
class TestFFmpegFD(unittest.TestCase):
_args = []
def _test_cmd(self, args):
self._args = args
def test_make_cmd(self):
with FakeYDL() as ydl:
downloader = FFmpegFD(ydl, {})
downloader._debug_cmd = self._test_cmd
info_dict = TEST_INFO.copy()
info_dict['ext'] = 'mp4'
downloader._call_downloader('test', info_dict)
self.assertEqual(self._args, [
'ffmpeg', '-y', '-i', 'http://www.example.com/',
'-c', 'copy', '-f', 'mp4', 'file:test'])
# Test cookies arg is added
ydl.cookiejar.set_cookie(cookiejar_Cookie(**TEST_COOKIE))
downloader._call_downloader('test', info_dict)
self.assertEqual(self._args, [
'ffmpeg', '-y', '-cookies', TEST_COOKIE_VALUE + '; path=/; domain=.example.com;\r\n',
'-i', 'http://www.example.com/', '-c', 'copy', '-f', 'mp4', 'file:test'])
if __name__ == '__main__':
unittest.main()

View File

@ -9,7 +9,11 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import http_server_port, try_rm
from test.helper import (
FakeLogger,
http_server_port,
try_rm,
)
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_http_server
from youtube_dl.downloader.http import HttpFD
@ -66,17 +70,6 @@ class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
assert False
class FakeLogger(object):
def debug(self, msg):
pass
def warning(self, msg):
pass
def error(self, msg):
pass
class TestHttpFD(unittest.TestCase):
def setUp(self):
self.httpd = compat_http_server.HTTPServer(
@ -95,7 +88,7 @@ class TestHttpFD(unittest.TestCase):
self.assertTrue(downloader.real_download(filename, {
'url': 'http://127.0.0.1:%d/%s' % (self.port, ep),
}))
self.assertEqual(os.path.getsize(encodeFilename(filename)), TEST_SIZE)
self.assertEqual(os.path.getsize(encodeFilename(filename)), TEST_SIZE, ep)
try_rm(encodeFilename(filename))
def download_all(self, params):

View File

@ -8,46 +8,54 @@ import unittest
import sys
import os
import subprocess
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.utils import encodeArgument
rootDir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, rootDir)
try:
_DEV_NULL = subprocess.DEVNULL
except AttributeError:
_DEV_NULL = open(os.devnull, 'wb')
from youtube_dl.compat import compat_register_utf8, compat_subprocess_get_DEVNULL
from youtube_dl.utils import encodeArgument
compat_register_utf8()
_DEV_NULL = compat_subprocess_get_DEVNULL()
class TestExecution(unittest.TestCase):
def setUp(self):
self.module = 'youtube_dl'
if sys.version_info < (2, 7):
self.module += '.__main__'
def test_import(self):
subprocess.check_call([sys.executable, '-c', 'import youtube_dl'], cwd=rootDir)
def test_module_exec(self):
if sys.version_info >= (2, 7): # Python 2.6 doesn't support package execution
subprocess.check_call([sys.executable, '-m', 'youtube_dl', '--version'], cwd=rootDir, stdout=_DEV_NULL)
subprocess.check_call([sys.executable, '-m', self.module, '--version'], cwd=rootDir, stdout=_DEV_NULL)
def test_main_exec(self):
subprocess.check_call([sys.executable, 'youtube_dl/__main__.py', '--version'], cwd=rootDir, stdout=_DEV_NULL)
subprocess.check_call([sys.executable, os.path.normpath('youtube_dl/__main__.py'), '--version'], cwd=rootDir, stdout=_DEV_NULL)
def test_cmdline_umlauts(self):
os.environ['PYTHONIOENCODING'] = 'utf-8'
p = subprocess.Popen(
[sys.executable, 'youtube_dl/__main__.py', encodeArgument('ä'), '--version'],
[sys.executable, '-m', self.module, encodeArgument('ä'), '--version'],
cwd=rootDir, stdout=_DEV_NULL, stderr=subprocess.PIPE)
_, stderr = p.communicate()
self.assertFalse(stderr)
def test_lazy_extractors(self):
lazy_extractors = os.path.normpath('youtube_dl/extractor/lazy_extractors.py')
try:
subprocess.check_call([sys.executable, 'devscripts/make_lazy_extractors.py', 'youtube_dl/extractor/lazy_extractors.py'], cwd=rootDir, stdout=_DEV_NULL)
subprocess.check_call([sys.executable, 'test/test_all_urls.py'], cwd=rootDir, stdout=_DEV_NULL)
subprocess.check_call([sys.executable, os.path.normpath('devscripts/make_lazy_extractors.py'), lazy_extractors], cwd=rootDir, stdout=_DEV_NULL)
subprocess.check_call([sys.executable, os.path.normpath('test/test_all_urls.py')], cwd=rootDir, stdout=_DEV_NULL)
finally:
try:
os.remove('youtube_dl/extractor/lazy_extractors.py')
except (IOError, OSError):
pass
for x in ('', 'c') if sys.version_info[0] < 3 else ('',):
try:
os.remove(lazy_extractors + x)
except OSError:
pass
if __name__ == '__main__':

View File

@ -8,30 +8,163 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import http_server_port
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_http_server, compat_urllib_request
import contextlib
import gzip
import io
import ssl
import tempfile
import threading
import zlib
# avoid deprecated alias assertRaisesRegexp
if hasattr(unittest.TestCase, 'assertRaisesRegex'):
unittest.TestCase.assertRaisesRegexp = unittest.TestCase.assertRaisesRegex
try:
import brotli
except ImportError:
brotli = None
try:
from urllib.request import pathname2url
except ImportError:
from urllib import pathname2url
from youtube_dl.compat import (
compat_http_cookiejar_Cookie,
compat_http_server,
compat_str as str,
compat_urllib_error,
compat_urllib_HTTPError,
compat_urllib_parse,
compat_urllib_request,
)
from youtube_dl.utils import (
sanitized_Request,
update_Request,
urlencode_postdata,
)
from test.helper import (
expectedFailureIf,
FakeYDL,
FakeLogger,
http_server_port,
)
from youtube_dl import YoutubeDL
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
protocol_version = 'HTTP/1.1'
# work-around old/new -style class inheritance
def super(self, meth_name, *args, **kwargs):
from types import MethodType
try:
super()
fn = lambda s, m, *a, **k: getattr(super(), m)(*a, **k)
except TypeError:
fn = lambda s, m, *a, **k: getattr(compat_http_server.BaseHTTPRequestHandler, m)(s, *a, **k)
self.super = MethodType(fn, self)
return self.super(meth_name, *args, **kwargs)
def log_message(self, format, *args):
pass
def _headers(self):
payload = str(self.headers).encode('utf-8')
self.send_response(200)
self.send_header('Content-Type', 'application/json')
self.send_header('Content-Length', str(len(payload)))
self.end_headers()
self.wfile.write(payload)
def _redirect(self):
self.send_response(int(self.path[len('/redirect_'):]))
self.send_header('Location', '/method')
self.send_header('Content-Length', '0')
self.end_headers()
def _method(self, method, payload=None):
self.send_response(200)
self.send_header('Content-Length', str(len(payload or '')))
self.send_header('Method', method)
self.end_headers()
if payload:
self.wfile.write(payload)
def _status(self, status):
payload = '<html>{0} NOT FOUND</html>'.format(status).encode('utf-8')
self.send_response(int(status))
self.send_header('Content-Type', 'text/html; charset=utf-8')
self.send_header('Content-Length', str(len(payload)))
self.end_headers()
self.wfile.write(payload)
def _read_data(self):
if 'Content-Length' in self.headers:
return self.rfile.read(int(self.headers['Content-Length']))
def _test_url(self, path, host='127.0.0.1', scheme='http', port=None):
return '{0}://{1}:{2}/{3}'.format(
scheme, host,
port if port is not None
else http_server_port(self.server), path)
def do_POST(self):
data = self._read_data()
if self.path.startswith('/redirect_'):
self._redirect()
elif self.path.startswith('/method'):
self._method('POST', data)
elif self.path.startswith('/headers'):
self._headers()
else:
self._status(404)
def do_HEAD(self):
if self.path.startswith('/redirect_'):
self._redirect()
elif self.path.startswith('/method'):
self._method('HEAD')
else:
self._status(404)
def do_PUT(self):
data = self._read_data()
if self.path.startswith('/redirect_'):
self._redirect()
elif self.path.startswith('/method'):
self._method('PUT', data)
else:
self._status(404)
def do_GET(self):
def respond(payload=b'<html><video src="/vid.mp4" /></html>',
payload_type='text/html; charset=utf-8',
payload_encoding=None,
resp_code=200):
self.send_response(resp_code)
self.send_header('Content-Type', payload_type)
if payload_encoding:
self.send_header('Content-Encoding', payload_encoding)
self.send_header('Content-Length', str(len(payload))) # required for persistent connections
self.end_headers()
self.wfile.write(payload)
def gzip_compress(p):
buf = io.BytesIO()
with contextlib.closing(gzip.GzipFile(fileobj=buf, mode='wb')) as f:
f.write(p)
return buf.getvalue()
if self.path == '/video.html':
self.send_response(200)
self.send_header('Content-Type', 'text/html; charset=utf-8')
self.end_headers()
self.wfile.write(b'<html><video src="/vid.mp4" /></html>')
respond()
elif self.path == '/vid.mp4':
self.send_response(200)
self.send_header('Content-Type', 'video/mp4')
self.end_headers()
self.wfile.write(b'\x00\x00\x00\x00\x20\x66\x74[video]')
respond(b'\x00\x00\x00\x00\x20\x66\x74[video]', 'video/mp4')
elif self.path == '/302':
if sys.version_info[0] == 3:
# XXX: Python 3 http server does not allow non-ASCII header values
@ -39,71 +172,336 @@ class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
self.end_headers()
return
new_url = 'http://127.0.0.1:%d/中文.html' % http_server_port(self.server)
new_url = self._test_url('中文.html')
self.send_response(302)
self.send_header(b'Location', new_url.encode('utf-8'))
self.end_headers()
elif self.path == '/%E4%B8%AD%E6%96%87.html':
self.send_response(200)
self.send_header('Content-Type', 'text/html; charset=utf-8')
respond()
elif self.path == '/%c7%9f':
respond()
elif self.path == '/redirect_dotsegments':
self.send_response(301)
# redirect to /headers but with dot segments before
self.send_header('Location', '/a/b/./../../headers')
self.send_header('Content-Length', '0')
self.end_headers()
self.wfile.write(b'<html><video src="/vid.mp4" /></html>')
elif self.path.startswith('/redirect_'):
self._redirect()
elif self.path.startswith('/method'):
self._method('GET')
elif self.path.startswith('/headers'):
self._headers()
elif self.path.startswith('/308-to-headers'):
self.send_response(308)
self.send_header('Location', '/headers')
self.send_header('Content-Length', '0')
self.end_headers()
elif self.path == '/trailing_garbage':
payload = b'<html><video src="/vid.mp4" /></html>'
compressed = gzip_compress(payload) + b'trailing garbage'
respond(compressed, payload_encoding='gzip')
elif self.path == '/302-non-ascii-redirect':
new_url = self._test_url('中文.html')
# actually respond with permanent redirect
self.send_response(301)
self.send_header('Location', new_url)
self.send_header('Content-Length', '0')
self.end_headers()
elif self.path == '/content-encoding':
encodings = self.headers.get('ytdl-encoding', '')
payload = b'<html><video src="/vid.mp4" /></html>'
for encoding in filter(None, (e.strip() for e in encodings.split(','))):
if encoding == 'br' and brotli:
payload = brotli.compress(payload)
elif encoding == 'gzip':
payload = gzip_compress(payload)
elif encoding == 'deflate':
payload = zlib.compress(payload)
elif encoding == 'unsupported':
payload = b'raw'
break
else:
self._status(415)
return
respond(payload, payload_encoding=encodings)
else:
assert False
self._status(404)
def send_header(self, keyword, value):
"""
Forcibly allow HTTP server to send non percent-encoded non-ASCII characters in headers.
This is against what is defined in RFC 3986: but we need to test that we support this
since some sites incorrectly do this.
"""
if keyword.lower() == 'connection':
return self.super('send_header', keyword, value)
class FakeLogger(object):
def debug(self, msg):
pass
if not hasattr(self, '_headers_buffer'):
self._headers_buffer = []
def warning(self, msg):
pass
self._headers_buffer.append('{0}: {1}\r\n'.format(keyword, value).encode('utf-8'))
def error(self, msg):
pass
def end_headers(self):
if hasattr(self, '_headers_buffer'):
self.wfile.write(b''.join(self._headers_buffer))
self._headers_buffer = []
self.super('end_headers')
class TestHTTP(unittest.TestCase):
# when does it make sense to check the SSL certificate?
_check_cert = (
sys.version_info >= (3, 2)
or (sys.version_info[0] == 2 and sys.version_info[1:] >= (7, 19)))
def setUp(self):
self.httpd = compat_http_server.HTTPServer(
# HTTP server
self.http_httpd = compat_http_server.HTTPServer(
('127.0.0.1', 0), HTTPTestRequestHandler)
self.port = http_server_port(self.httpd)
self.server_thread = threading.Thread(target=self.httpd.serve_forever)
self.server_thread.daemon = True
self.server_thread.start()
self.http_port = http_server_port(self.http_httpd)
self.http_server_thread = threading.Thread(target=self.http_httpd.serve_forever)
self.http_server_thread.daemon = True
self.http_server_thread.start()
try:
from http.server import ThreadingHTTPServer
except ImportError:
try:
from socketserver import ThreadingMixIn
except ImportError:
from SocketServer import ThreadingMixIn
class ThreadingHTTPServer(ThreadingMixIn, compat_http_server.HTTPServer):
pass
# HTTPS server
certfn = os.path.join(TEST_DIR, 'testcert.pem')
self.https_httpd = ThreadingHTTPServer(
('127.0.0.1', 0), HTTPTestRequestHandler)
try:
sslctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
sslctx.verify_mode = ssl.CERT_NONE
sslctx.check_hostname = False
sslctx.load_cert_chain(certfn, None)
self.https_httpd.socket = sslctx.wrap_socket(
self.https_httpd.socket, server_side=True)
except AttributeError:
self.https_httpd.socket = ssl.wrap_socket(
self.https_httpd.socket, certfile=certfn, server_side=True)
self.https_port = http_server_port(self.https_httpd)
self.https_server_thread = threading.Thread(target=self.https_httpd.serve_forever)
self.https_server_thread.daemon = True
self.https_server_thread.start()
def tearDown(self):
def closer(svr):
def _closer():
svr.shutdown()
svr.server_close()
return _closer
shutdown_thread = threading.Thread(target=closer(self.http_httpd))
shutdown_thread.start()
self.http_server_thread.join(2.0)
shutdown_thread = threading.Thread(target=closer(self.https_httpd))
shutdown_thread.start()
self.https_server_thread.join(2.0)
def _test_url(self, path, host='127.0.0.1', scheme='http', port=None):
return '{0}://{1}:{2}/{3}'.format(
scheme, host,
port if port is not None
else self.https_port if scheme == 'https'
else self.http_port, path)
@unittest.skipUnless(_check_cert, 'No support for certificate check in SSL')
def test_nocheckcertificate(self):
with FakeYDL({'logger': FakeLogger()}) as ydl:
with self.assertRaises(compat_urllib_error.URLError):
ydl.urlopen(sanitized_Request(self._test_url('headers', scheme='https')))
with FakeYDL({'logger': FakeLogger(), 'nocheckcertificate': True}) as ydl:
r = ydl.urlopen(sanitized_Request(self._test_url('headers', scheme='https')))
self.assertEqual(r.getcode(), 200)
r.close()
def test_percent_encode(self):
with FakeYDL() as ydl:
# Unicode characters should be encoded with uppercase percent-encoding
res = ydl.urlopen(sanitized_Request(self._test_url('中文.html')))
self.assertEqual(res.getcode(), 200)
res.close()
# don't normalize existing percent encodings
res = ydl.urlopen(sanitized_Request(self._test_url('%c7%9f')))
self.assertEqual(res.getcode(), 200)
res.close()
def test_unicode_path_redirection(self):
# XXX: Python 3 http server does not allow non-ASCII header values
if sys.version_info[0] == 3:
return
with FakeYDL() as ydl:
r = ydl.urlopen(sanitized_Request(self._test_url('302-non-ascii-redirect')))
self.assertEqual(r.url, self._test_url('%E4%B8%AD%E6%96%87.html'))
r.close()
ydl = YoutubeDL({'logger': FakeLogger()})
r = ydl.extract_info('http://127.0.0.1:%d/302' % self.port)
self.assertEqual(r['entries'][0]['url'], 'http://127.0.0.1:%d/vid.mp4' % self.port)
def test_redirect(self):
with FakeYDL() as ydl:
def do_req(redirect_status, method, check_no_content=False):
data = b'testdata' if method in ('POST', 'PUT') else None
res = ydl.urlopen(sanitized_Request(
self._test_url('redirect_{0}'.format(redirect_status)),
method=method, data=data))
if check_no_content:
self.assertNotIn('Content-Type', res.headers)
return res.read().decode('utf-8'), res.headers.get('method', '')
# A 303 must either use GET or HEAD for subsequent request
self.assertEqual(do_req(303, 'POST'), ('', 'GET'))
self.assertEqual(do_req(303, 'HEAD'), ('', 'HEAD'))
self.assertEqual(do_req(303, 'PUT'), ('', 'GET'))
class TestHTTPS(unittest.TestCase):
def setUp(self):
certfn = os.path.join(TEST_DIR, 'testcert.pem')
self.httpd = compat_http_server.HTTPServer(
('127.0.0.1', 0), HTTPTestRequestHandler)
self.httpd.socket = ssl.wrap_socket(
self.httpd.socket, certfile=certfn, server_side=True)
self.port = http_server_port(self.httpd)
self.server_thread = threading.Thread(target=self.httpd.serve_forever)
self.server_thread.daemon = True
self.server_thread.start()
# 301 and 302 turn POST only into a GET, with no Content-Type
self.assertEqual(do_req(301, 'POST', True), ('', 'GET'))
self.assertEqual(do_req(301, 'HEAD'), ('', 'HEAD'))
self.assertEqual(do_req(302, 'POST', True), ('', 'GET'))
self.assertEqual(do_req(302, 'HEAD'), ('', 'HEAD'))
def test_nocheckcertificate(self):
if sys.version_info >= (2, 7, 9): # No certificate checking anyways
ydl = YoutubeDL({'logger': FakeLogger()})
self.assertRaises(
Exception,
ydl.extract_info, 'https://127.0.0.1:%d/video.html' % self.port)
self.assertEqual(do_req(301, 'PUT'), ('testdata', 'PUT'))
self.assertEqual(do_req(302, 'PUT'), ('testdata', 'PUT'))
ydl = YoutubeDL({'logger': FakeLogger(), 'nocheckcertificate': True})
r = ydl.extract_info('https://127.0.0.1:%d/video.html' % self.port)
self.assertEqual(r['entries'][0]['url'], 'https://127.0.0.1:%d/vid.mp4' % self.port)
# 307 and 308 should not change method
for m in ('POST', 'PUT'):
self.assertEqual(do_req(307, m), ('testdata', m))
self.assertEqual(do_req(308, m), ('testdata', m))
self.assertEqual(do_req(307, 'HEAD'), ('', 'HEAD'))
self.assertEqual(do_req(308, 'HEAD'), ('', 'HEAD'))
# These should not redirect and instead raise an HTTPError
for code in (300, 304, 305, 306):
with self.assertRaises(compat_urllib_HTTPError):
do_req(code, 'GET')
# Jython 2.7.1 times out for some reason
@expectedFailureIf(sys.platform.startswith('java') and sys.version_info < (2, 7, 2))
def test_content_type(self):
# https://github.com/yt-dlp/yt-dlp/commit/379a4f161d4ad3e40932dcf5aca6e6fb9715ab28
with FakeYDL({'nocheckcertificate': True}) as ydl:
# method should be auto-detected as POST
r = sanitized_Request(self._test_url('headers', scheme='https'), data=urlencode_postdata({'test': 'test'}))
headers = ydl.urlopen(r).read().decode('utf-8')
self.assertIn('Content-Type: application/x-www-form-urlencoded', headers)
# test http
r = sanitized_Request(self._test_url('headers'), data=urlencode_postdata({'test': 'test'}))
headers = ydl.urlopen(r).read().decode('utf-8')
self.assertIn('Content-Type: application/x-www-form-urlencoded', headers)
def test_update_req(self):
req = sanitized_Request('http://example.com')
assert req.data is None
assert req.get_method() == 'GET'
assert not req.has_header('Content-Type')
# Test that zero-byte payloads will be sent
req = update_Request(req, data=b'')
assert req.data == b''
assert req.get_method() == 'POST'
# yt-dl expects data to be encoded and Content-Type to be added by sender
# assert req.get_header('Content-Type') == 'application/x-www-form-urlencoded'
def test_cookiejar(self):
with FakeYDL() as ydl:
ydl.cookiejar.set_cookie(compat_http_cookiejar_Cookie(
0, 'test', 'ytdl', None, False, '127.0.0.1', True,
False, '/headers', True, False, None, False, None, None, {}))
data = ydl.urlopen(sanitized_Request(
self._test_url('headers'))).read().decode('utf-8')
self.assertIn('Cookie: test=ytdl', data)
def test_passed_cookie_header(self):
# We should accept a Cookie header being passed as in normal headers and handle it appropriately.
with FakeYDL() as ydl:
# Specified Cookie header should be used
res = ydl.urlopen(sanitized_Request(
self._test_url('headers'), headers={'Cookie': 'test=test'})).read().decode('utf-8')
self.assertIn('Cookie: test=test', res)
# Specified Cookie header should be removed on any redirect
res = ydl.urlopen(sanitized_Request(
self._test_url('308-to-headers'), headers={'Cookie': 'test=test'})).read().decode('utf-8')
self.assertNotIn('Cookie: test=test', res)
# Specified Cookie header should override global cookiejar for that request
ydl.cookiejar.set_cookie(compat_http_cookiejar_Cookie(
0, 'test', 'ytdlp', None, False, '127.0.0.1', True,
False, '/headers', True, False, None, False, None, None, {}))
data = ydl.urlopen(sanitized_Request(
self._test_url('headers'), headers={'Cookie': 'test=test'})).read().decode('utf-8')
self.assertNotIn('Cookie: test=ytdlp', data)
self.assertIn('Cookie: test=test', data)
def test_no_compression_compat_header(self):
with FakeYDL() as ydl:
data = ydl.urlopen(
sanitized_Request(
self._test_url('headers'),
headers={'Youtubedl-no-compression': True})).read()
self.assertIn(b'Accept-Encoding: identity', data)
self.assertNotIn(b'youtubedl-no-compression', data.lower())
def test_gzip_trailing_garbage(self):
# https://github.com/ytdl-org/youtube-dl/commit/aa3e950764337ef9800c936f4de89b31c00dfcf5
# https://github.com/ytdl-org/youtube-dl/commit/6f2ec15cee79d35dba065677cad9da7491ec6e6f
with FakeYDL() as ydl:
data = ydl.urlopen(sanitized_Request(self._test_url('trailing_garbage'))).read().decode('utf-8')
self.assertEqual(data, '<html><video src="/vid.mp4" /></html>')
def __test_compression(self, encoding):
with FakeYDL() as ydl:
res = ydl.urlopen(
sanitized_Request(
self._test_url('content-encoding'),
headers={'ytdl-encoding': encoding}))
# decoded encodings are removed: only check for valid decompressed data
self.assertEqual(res.read(), b'<html><video src="/vid.mp4" /></html>')
@unittest.skipUnless(brotli, 'brotli support is not installed')
def test_brotli(self):
self.__test_compression('br')
def test_deflate(self):
self.__test_compression('deflate')
def test_gzip(self):
self.__test_compression('gzip')
def test_multiple_encodings(self):
# https://www.rfc-editor.org/rfc/rfc9110.html#section-8.4
for pair in ('gzip,deflate', 'deflate, gzip', 'gzip, gzip', 'deflate, deflate'):
self.__test_compression(pair)
def test_unsupported_encoding(self):
# it should return the raw content
with FakeYDL() as ydl:
res = ydl.urlopen(
sanitized_Request(
self._test_url('content-encoding'),
headers={'ytdl-encoding': 'unsupported'}))
self.assertEqual(res.headers.get('Content-Encoding'), 'unsupported')
self.assertEqual(res.read(), b'raw')
def test_remove_dot_segments(self):
with FakeYDL() as ydl:
res = ydl.urlopen(sanitized_Request(self._test_url('a/b/./../../headers')))
self.assertEqual(compat_urllib_parse.urlparse(res.geturl()).path, '/headers')
res = ydl.urlopen(sanitized_Request(self._test_url('redirect_dotsegments')))
self.assertEqual(compat_urllib_parse.urlparse(res.geturl()).path, '/headers')
def _build_proxy_handler(name):
@ -117,7 +515,7 @@ def _build_proxy_handler(name):
self.send_response(200)
self.send_header('Content-Type', 'text/plain; charset=utf-8')
self.end_headers()
self.wfile.write('{self.proxy_name}: {self.path}'.format(self=self).encode('utf-8'))
self.wfile.write('{0}: {1}'.format(self.proxy_name, self.path).encode('utf-8'))
return HTTPTestRequestHandler
@ -137,10 +535,30 @@ class TestProxy(unittest.TestCase):
self.geo_proxy_thread.daemon = True
self.geo_proxy_thread.start()
def tearDown(self):
def closer(svr):
def _closer():
svr.shutdown()
svr.server_close()
return _closer
shutdown_thread = threading.Thread(target=closer(self.proxy))
shutdown_thread.start()
self.proxy_thread.join(2.0)
shutdown_thread = threading.Thread(target=closer(self.geo_proxy))
shutdown_thread.start()
self.geo_proxy_thread.join(2.0)
def _test_proxy(self, host='127.0.0.1', port=None):
return '{0}:{1}'.format(
host, port if port is not None else self.port)
def test_proxy(self):
geo_proxy = '127.0.0.1:{0}'.format(self.geo_port)
geo_proxy = self._test_proxy(port=self.geo_port)
ydl = YoutubeDL({
'proxy': '127.0.0.1:{0}'.format(self.port),
'proxy': self._test_proxy(),
'geo_verification_proxy': geo_proxy,
})
url = 'http://foo.com/bar'
@ -154,7 +572,7 @@ class TestProxy(unittest.TestCase):
def test_proxy_with_idn(self):
ydl = YoutubeDL({
'proxy': '127.0.0.1:{0}'.format(self.port),
'proxy': self._test_proxy(),
})
url = 'http://中文.tw/'
response = ydl.urlopen(url).read().decode('utf-8')
@ -162,5 +580,25 @@ class TestProxy(unittest.TestCase):
self.assertEqual(response, 'normal: http://xn--fiq228c.tw/')
class TestFileURL(unittest.TestCase):
# See https://github.com/ytdl-org/youtube-dl/issues/8227
def test_file_urls(self):
tf = tempfile.NamedTemporaryFile(delete=False)
tf.write(b'foobar')
tf.close()
url = compat_urllib_parse.urljoin('file://', pathname2url(tf.name))
with FakeYDL() as ydl:
self.assertRaisesRegexp(
compat_urllib_error.URLError, 'file:// scheme is explicitly disabled in youtube-dl for security reasons', ydl.urlopen, url)
# not yet implemented
"""
with FakeYDL({'enable_file_urls': True}) as ydl:
res = ydl.urlopen(url)
self.assertEqual(res.read(), b'foobar')
res.close()
"""
os.unlink(tf.name)
if __name__ == '__main__':
unittest.main()

View File

@ -8,17 +8,24 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.jsinterp import JSInterpreter
import math
import re
from youtube_dl.jsinterp import JS_Undefined, JSInterpreter
class TestJSInterpreter(unittest.TestCase):
def test_basic(self):
jsi = JSInterpreter('function x(){;}')
self.assertEqual(jsi.call_function('x'), None)
self.assertEqual(repr(jsi.extract_function('x')), 'F<x>')
jsi = JSInterpreter('function x3(){return 42;}')
self.assertEqual(jsi.call_function('x3'), 42)
jsi = JSInterpreter('function x3(){42}')
self.assertEqual(jsi.call_function('x3'), None)
jsi = JSInterpreter('var x5 = function(){return 42;}')
self.assertEqual(jsi.call_function('x5'), 42)
@ -26,6 +33,55 @@ class TestJSInterpreter(unittest.TestCase):
jsi = JSInterpreter('function x4(a){return 2*a+1;}')
self.assertEqual(jsi.call_function('x4', 3), 7)
def test_add(self):
jsi = JSInterpreter('function f(){return 42 + 7;}')
self.assertEqual(jsi.call_function('f'), 49)
jsi = JSInterpreter('function f(){return 42 + undefined;}')
self.assertTrue(math.isnan(jsi.call_function('f')))
jsi = JSInterpreter('function f(){return 42 + null;}')
self.assertEqual(jsi.call_function('f'), 42)
def test_sub(self):
jsi = JSInterpreter('function f(){return 42 - 7;}')
self.assertEqual(jsi.call_function('f'), 35)
jsi = JSInterpreter('function f(){return 42 - undefined;}')
self.assertTrue(math.isnan(jsi.call_function('f')))
jsi = JSInterpreter('function f(){return 42 - null;}')
self.assertEqual(jsi.call_function('f'), 42)
def test_mul(self):
jsi = JSInterpreter('function f(){return 42 * 7;}')
self.assertEqual(jsi.call_function('f'), 294)
jsi = JSInterpreter('function f(){return 42 * undefined;}')
self.assertTrue(math.isnan(jsi.call_function('f')))
jsi = JSInterpreter('function f(){return 42 * null;}')
self.assertEqual(jsi.call_function('f'), 0)
def test_div(self):
jsi = JSInterpreter('function f(a, b){return a / b;}')
self.assertTrue(math.isnan(jsi.call_function('f', 0, 0)))
self.assertTrue(math.isnan(jsi.call_function('f', JS_Undefined, 1)))
self.assertTrue(math.isinf(jsi.call_function('f', 2, 0)))
self.assertEqual(jsi.call_function('f', 0, 3), 0)
def test_mod(self):
jsi = JSInterpreter('function f(){return 42 % 7;}')
self.assertEqual(jsi.call_function('f'), 0)
jsi = JSInterpreter('function f(){return 42 % 0;}')
self.assertTrue(math.isnan(jsi.call_function('f')))
jsi = JSInterpreter('function f(){return 42 % undefined;}')
self.assertTrue(math.isnan(jsi.call_function('f')))
def test_exp(self):
jsi = JSInterpreter('function f(){return 42 ** 2;}')
self.assertEqual(jsi.call_function('f'), 1764)
jsi = JSInterpreter('function f(){return 42 ** undefined;}')
self.assertTrue(math.isnan(jsi.call_function('f')))
jsi = JSInterpreter('function f(){return 42 ** null;}')
self.assertEqual(jsi.call_function('f'), 1)
jsi = JSInterpreter('function f(){return undefined ** 42;}')
self.assertTrue(math.isnan(jsi.call_function('f')))
def test_empty_return(self):
jsi = JSInterpreter('function f(){return; y()}')
self.assertEqual(jsi.call_function('f'), None)
@ -45,14 +101,32 @@ class TestJSInterpreter(unittest.TestCase):
jsi = JSInterpreter('function f(){return 1 << 5;}')
self.assertEqual(jsi.call_function('f'), 32)
jsi = JSInterpreter('function f(){return 2 ** 5}')
self.assertEqual(jsi.call_function('f'), 32)
jsi = JSInterpreter('function f(){return 19 & 21;}')
self.assertEqual(jsi.call_function('f'), 17)
jsi = JSInterpreter('function f(){return 11 >> 2;}')
self.assertEqual(jsi.call_function('f'), 2)
jsi = JSInterpreter('function f(){return []? 2+3: 4;}')
self.assertEqual(jsi.call_function('f'), 5)
jsi = JSInterpreter('function f(){return 1 == 2}')
self.assertEqual(jsi.call_function('f'), False)
jsi = JSInterpreter('function f(){return 0 && 1 || 2;}')
self.assertEqual(jsi.call_function('f'), 2)
jsi = JSInterpreter('function f(){return 0 ?? 42;}')
self.assertEqual(jsi.call_function('f'), 0)
jsi = JSInterpreter('function f(){return "life, the universe and everything" < 42;}')
self.assertFalse(jsi.call_function('f'))
def test_array_access(self):
jsi = JSInterpreter('function f(){var x = [1,2,3]; x[0] = 4; x[0] = 5; x[2] = 7; return x;}')
jsi = JSInterpreter('function f(){var x = [1,2,3]; x[0] = 4; x[0] = 5; x[2.0] = 7; return x;}')
self.assertEqual(jsi.call_function('f'), [5, 2, 7])
def test_parens(self):
@ -62,6 +136,10 @@ class TestJSInterpreter(unittest.TestCase):
jsi = JSInterpreter('function f(){return (1 + 2) * 3;}')
self.assertEqual(jsi.call_function('f'), 9)
def test_quotes(self):
jsi = JSInterpreter(r'function f(){return "a\"\\("}')
self.assertEqual(jsi.call_function('f'), r'a"\(')
def test_assignments(self):
jsi = JSInterpreter('function f(){var x = 20; x = 30 + 1; return x;}')
self.assertEqual(jsi.call_function('f'), 31)
@ -104,13 +182,410 @@ class TestJSInterpreter(unittest.TestCase):
}''')
self.assertEqual(jsi.call_function('x'), [20, 20, 30, 40, 50])
def test_builtins(self):
jsi = JSInterpreter('''
function x() { return NaN }
''')
self.assertTrue(math.isnan(jsi.call_function('x')))
def test_Date(self):
jsi = JSInterpreter('''
function x(dt) { return new Date(dt) - 0; }
''')
self.assertEqual(jsi.call_function('x', 'Wednesday 31 December 1969 18:01:26 MDT'), 86000)
# date format m/d/y
self.assertEqual(jsi.call_function('x', '12/31/1969 18:01:26 MDT'), 86000)
# epoch 0
self.assertEqual(jsi.call_function('x', '1 January 1970 00:00:00 UTC'), 0)
def test_call(self):
jsi = JSInterpreter('''
function x() { return 2; }
function y(a) { return x() + a; }
function y(a) { return x() + (a?a:0); }
function z() { return y(3); }
''')
self.assertEqual(jsi.call_function('z'), 5)
self.assertEqual(jsi.call_function('y'), 2)
def test_if(self):
jsi = JSInterpreter('''
function x() {
let a = 9;
if (0==0) {a++}
return a
}''')
self.assertEqual(jsi.call_function('x'), 10)
jsi = JSInterpreter('''
function x() {
if (0==0) {return 10}
}''')
self.assertEqual(jsi.call_function('x'), 10)
jsi = JSInterpreter('''
function x() {
if (0!=0) {return 1}
else {return 10}
}''')
self.assertEqual(jsi.call_function('x'), 10)
""" # Unsupported
jsi = JSInterpreter('''
function x() {
if (0!=0) return 1;
else {return 10}
}''')
self.assertEqual(jsi.call_function('x'), 10)
"""
def test_elseif(self):
jsi = JSInterpreter('''
function x() {
if (0!=0) {return 1}
else if (1==0) {return 2}
else {return 10}
}''')
self.assertEqual(jsi.call_function('x'), 10)
""" # Unsupported
jsi = JSInterpreter('''
function x() {
if (0!=0) return 1;
else if (1==0) {return 2}
else {return 10}
}''')
self.assertEqual(jsi.call_function('x'), 10)
# etc
"""
def test_for_loop(self):
# function x() { a=0; for (i=0; i-10; i++) {a++} a }
jsi = JSInterpreter('''
function x() { a=0; for (i=0; i-10; i++) {a++} return a }
''')
self.assertEqual(jsi.call_function('x'), 10)
def test_while_loop(self):
# function x() { a=0; while (a<10) {a++} a }
jsi = JSInterpreter('''
function x() { a=0; while (a<10) {a++} return a }
''')
self.assertEqual(jsi.call_function('x'), 10)
def test_switch(self):
jsi = JSInterpreter('''
function x(f) { switch(f){
case 1:f+=1;
case 2:f+=2;
case 3:f+=3;break;
case 4:f+=4;
default:f=0;
} return f }
''')
self.assertEqual(jsi.call_function('x', 1), 7)
self.assertEqual(jsi.call_function('x', 3), 6)
self.assertEqual(jsi.call_function('x', 5), 0)
def test_switch_default(self):
jsi = JSInterpreter('''
function x(f) { switch(f){
case 2: f+=2;
default: f-=1;
case 5:
case 6: f+=6;
case 0: break;
case 1: f+=1;
} return f }
''')
self.assertEqual(jsi.call_function('x', 1), 2)
self.assertEqual(jsi.call_function('x', 5), 11)
self.assertEqual(jsi.call_function('x', 9), 14)
def test_try(self):
jsi = JSInterpreter('''
function x() { try{return 10} catch(e){return 5} }
''')
self.assertEqual(jsi.call_function('x'), 10)
def test_catch(self):
jsi = JSInterpreter('''
function x() { try{throw 10} catch(e){return 5} }
''')
self.assertEqual(jsi.call_function('x'), 5)
def test_finally(self):
jsi = JSInterpreter('''
function x() { try{throw 10} finally {return 42} }
''')
self.assertEqual(jsi.call_function('x'), 42)
jsi = JSInterpreter('''
function x() { try{throw 10} catch(e){return 5} finally {return 42} }
''')
self.assertEqual(jsi.call_function('x'), 42)
def test_nested_try(self):
jsi = JSInterpreter('''
function x() {try {
try{throw 10} finally {throw 42}
} catch(e){return 5} }
''')
self.assertEqual(jsi.call_function('x'), 5)
def test_for_loop_continue(self):
jsi = JSInterpreter('''
function x() { a=0; for (i=0; i-10; i++) { continue; a++ } return a }
''')
self.assertEqual(jsi.call_function('x'), 0)
def test_for_loop_break(self):
jsi = JSInterpreter('''
function x() { a=0; for (i=0; i-10; i++) { break; a++ } return a }
''')
self.assertEqual(jsi.call_function('x'), 0)
def test_for_loop_try(self):
jsi = JSInterpreter('''
function x() {
for (i=0; i-10; i++) { try { if (i == 5) throw i} catch {return 10} finally {break} };
return 42 }
''')
self.assertEqual(jsi.call_function('x'), 42)
def test_literal_list(self):
jsi = JSInterpreter('''
function x() { return [1, 2, "asdf", [5, 6, 7]][3] }
''')
self.assertEqual(jsi.call_function('x'), [5, 6, 7])
def test_comma(self):
jsi = JSInterpreter('''
function x() { a=5; a -= 1, a+=3; return a }
''')
self.assertEqual(jsi.call_function('x'), 7)
jsi = JSInterpreter('''
function x() { a=5; return (a -= 1, a+=3, a); }
''')
self.assertEqual(jsi.call_function('x'), 7)
jsi = JSInterpreter('''
function x() { return (l=[0,1,2,3], function(a, b){return a+b})((l[1], l[2]), l[3]) }
''')
self.assertEqual(jsi.call_function('x'), 5)
def test_void(self):
jsi = JSInterpreter('''
function x() { return void 42; }
''')
self.assertEqual(jsi.call_function('x'), None)
def test_return_function(self):
jsi = JSInterpreter('''
function x() { return [1, function(){return 1}][1] }
''')
self.assertEqual(jsi.call_function('x')([]), 1)
def test_null(self):
jsi = JSInterpreter('''
function x() { return null; }
''')
self.assertIs(jsi.call_function('x'), None)
jsi = JSInterpreter('''
function x() { return [null > 0, null < 0, null == 0, null === 0]; }
''')
self.assertEqual(jsi.call_function('x'), [False, False, False, False])
jsi = JSInterpreter('''
function x() { return [null >= 0, null <= 0]; }
''')
self.assertEqual(jsi.call_function('x'), [True, True])
def test_undefined(self):
jsi = JSInterpreter('''
function x() { return undefined === undefined; }
''')
self.assertTrue(jsi.call_function('x'))
jsi = JSInterpreter('''
function x() { return undefined; }
''')
self.assertIs(jsi.call_function('x'), JS_Undefined)
jsi = JSInterpreter('''
function x() { let v; return v; }
''')
self.assertIs(jsi.call_function('x'), JS_Undefined)
jsi = JSInterpreter('''
function x() { return [undefined === undefined, undefined == undefined, undefined < undefined, undefined > undefined]; }
''')
self.assertEqual(jsi.call_function('x'), [True, True, False, False])
jsi = JSInterpreter('''
function x() { return [undefined === 0, undefined == 0, undefined < 0, undefined > 0]; }
''')
self.assertEqual(jsi.call_function('x'), [False, False, False, False])
jsi = JSInterpreter('''
function x() { return [undefined >= 0, undefined <= 0]; }
''')
self.assertEqual(jsi.call_function('x'), [False, False])
jsi = JSInterpreter('''
function x() { return [undefined > null, undefined < null, undefined == null, undefined === null]; }
''')
self.assertEqual(jsi.call_function('x'), [False, False, True, False])
jsi = JSInterpreter('''
function x() { return [undefined === null, undefined == null, undefined < null, undefined > null]; }
''')
self.assertEqual(jsi.call_function('x'), [False, True, False, False])
jsi = JSInterpreter('''
function x() { let v; return [42+v, v+42, v**42, 42**v, 0**v]; }
''')
for y in jsi.call_function('x'):
self.assertTrue(math.isnan(y))
jsi = JSInterpreter('''
function x() { let v; return v**0; }
''')
self.assertEqual(jsi.call_function('x'), 1)
jsi = JSInterpreter('''
function x() { let v; return [v>42, v<=42, v&&42, 42&&v]; }
''')
self.assertEqual(jsi.call_function('x'), [False, False, JS_Undefined, JS_Undefined])
jsi = JSInterpreter('function x(){return undefined ?? 42; }')
self.assertEqual(jsi.call_function('x'), 42)
def test_object(self):
jsi = JSInterpreter('''
function x() { return {}; }
''')
self.assertEqual(jsi.call_function('x'), {})
jsi = JSInterpreter('''
function x() { let a = {m1: 42, m2: 0 }; return [a["m1"], a.m2]; }
''')
self.assertEqual(jsi.call_function('x'), [42, 0])
jsi = JSInterpreter('''
function x() { let a; return a?.qq; }
''')
self.assertIs(jsi.call_function('x'), JS_Undefined)
jsi = JSInterpreter('''
function x() { let a = {m1: 42, m2: 0 }; return a?.qq; }
''')
self.assertIs(jsi.call_function('x'), JS_Undefined)
def test_regex(self):
jsi = JSInterpreter('''
function x() { let a=/,,[/,913,/](,)}/; }
''')
self.assertIs(jsi.call_function('x'), None)
jsi = JSInterpreter('''
function x() { let a=/,,[/,913,/](,)}/; "".replace(a, ""); return a; }
''')
attrs = set(('findall', 'finditer', 'match', 'scanner', 'search',
'split', 'sub', 'subn'))
if sys.version_info >= (2, 7):
# documented for 2.6 but may not be found
attrs.update(('flags', 'groupindex', 'groups', 'pattern'))
self.assertSetEqual(set(dir(jsi.call_function('x'))) & attrs, attrs)
jsi = JSInterpreter('''
function x() { let a=/,,[/,913,/](,)}/i; return a; }
''')
self.assertEqual(jsi.call_function('x').flags & ~re.U, re.I)
jsi = JSInterpreter(r'''
function x() { let a="data-name".replace("data-", ""); return a }
''')
self.assertEqual(jsi.call_function('x'), 'name')
jsi = JSInterpreter(r'''
function x() { let a="data-name".replace(new RegExp("^.+-"), ""); return a; }
''')
self.assertEqual(jsi.call_function('x'), 'name')
jsi = JSInterpreter(r'''
function x() { let a="data-name".replace(/^.+-/, ""); return a; }
''')
self.assertEqual(jsi.call_function('x'), 'name')
jsi = JSInterpreter(r'''
function x() { let a="data-name".replace(/a/g, "o"); return a; }
''')
self.assertEqual(jsi.call_function('x'), 'doto-nome')
jsi = JSInterpreter(r'''
function x() { let a="data-name".replaceAll("a", "o"); return a; }
''')
self.assertEqual(jsi.call_function('x'), 'doto-nome')
jsi = JSInterpreter(r'''
function x() { let a=[/[)\\]/]; return a[0]; }
''')
self.assertEqual(jsi.call_function('x').pattern, r'[)\\]')
""" # fails
jsi = JSInterpreter(r'''
function x() { let a=100; a/=/[0-9]+/.exec('divide by 20 today')[0]; }
''')
self.assertEqual(jsi.call_function('x'), 5)
"""
def test_char_code_at(self):
jsi = JSInterpreter('function x(i){return "test".charCodeAt(i)}')
self.assertEqual(jsi.call_function('x', 0), 116)
self.assertEqual(jsi.call_function('x', 1), 101)
self.assertEqual(jsi.call_function('x', 2), 115)
self.assertEqual(jsi.call_function('x', 3), 116)
self.assertEqual(jsi.call_function('x', 4), None)
self.assertEqual(jsi.call_function('x', 'not_a_number'), 116)
def test_bitwise_operators_overflow(self):
jsi = JSInterpreter('function x(){return -524999584 << 5}')
self.assertEqual(jsi.call_function('x'), 379882496)
jsi = JSInterpreter('function x(){return 1236566549 << 5}')
self.assertEqual(jsi.call_function('x'), 915423904)
def test_bitwise_operators_madness(self):
jsi = JSInterpreter('function x(){return null << 5}')
self.assertEqual(jsi.call_function('x'), 0)
jsi = JSInterpreter('function x(){return undefined >> 5}')
self.assertEqual(jsi.call_function('x'), 0)
jsi = JSInterpreter('function x(){return 42 << NaN}')
self.assertEqual(jsi.call_function('x'), 42)
jsi = JSInterpreter('function x(){return 42 << Infinity}')
self.assertEqual(jsi.call_function('x'), 42)
def test_32066(self):
jsi = JSInterpreter("function x(){return Math.pow(3, 5) + new Date('1970-01-01T08:01:42.000+08:00') / 1000 * -239 - -24205;}")
self.assertEqual(jsi.call_function('x'), 70)
def test_unary_operators(self):
jsi = JSInterpreter('function f(){return 2 - - - 2;}')
self.assertEqual(jsi.call_function('f'), 0)
# fails
# jsi = JSInterpreter('function f(){return 2 + - + - - 2;}')
# self.assertEqual(jsi.call_function('f'), 0)
""" # fails so far
def test_packed(self):
jsi = JSInterpreter('''function x(p,a,c,k,e,d){while(c--)if(k[c])p=p.replace(new RegExp('\\b'+c.toString(a)+'\\b','g'),k[c]);return p}''')
self.assertEqual(jsi.call_function('x', '''h 7=g("1j");7.7h({7g:[{33:"w://7f-7e-7d-7c.v.7b/7a/79/78/77/76.74?t=73&s=2s&e=72&f=2t&71=70.0.0.1&6z=6y&6x=6w"}],6v:"w://32.v.u/6u.31",16:"r%",15:"r%",6t:"6s",6r:"",6q:"l",6p:"l",6o:"6n",6m:\'6l\',6k:"6j",9:[{33:"/2u?b=6i&n=50&6h=w://32.v.u/6g.31",6f:"6e"}],1y:{6d:1,6c:\'#6b\',6a:\'#69\',68:"67",66:30,65:r,},"64":{63:"%62 2m%m%61%5z%5y%5x.u%5w%5v%5u.2y%22 2k%m%1o%22 5t%m%1o%22 5s%m%1o%22 2j%m%5r%22 16%m%5q%22 15%m%5p%22 5o%2z%5n%5m%2z",5l:"w://v.u/d/1k/5k.2y",5j:[]},\'5i\':{"5h":"5g"},5f:"5e",5d:"w://v.u",5c:{},5b:l,1x:[0.25,0.50,0.75,1,1.25,1.5,2]});h 1m,1n,5a;h 59=0,58=0;h 7=g("1j");h 2x=0,57=0,56=0;$.55({54:{\'53-52\':\'2i-51\'}});7.j(\'4z\',6(x){c(5>0&&x.1l>=5&&1n!=1){1n=1;$(\'q.4y\').4x(\'4w\')}});7.j(\'13\',6(x){2x=x.1l});7.j(\'2g\',6(x){2w(x)});7.j(\'4v\',6(){$(\'q.2v\').4u()});6 2w(x){$(\'q.2v\').4t();c(1m)19;1m=1;17=0;c(4s.4r===l){17=1}$.4q(\'/2u?b=4p&2l=1k&4o=2t-4n-4m-2s-4l&4k=&4j=&4i=&17=\'+17,6(2r){$(\'#4h\').4g(2r)});$(\'.3-8-4f-4e:4d("4c")\').2h(6(e){2q();g().4b(0);g().4a(l)});6 2q(){h $14=$("<q />").2p({1l:"49",16:"r%",15:"r%",48:0,2n:0,2o:47,46:"45(10%, 10%, 10%, 0.4)","44-43":"42"});$("<41 />").2p({16:"60%",15:"60%",2o:40,"3z-2n":"3y"}).3x({\'2m\':\'/?b=3w&2l=1k\',\'2k\':\'0\',\'2j\':\'2i\'}).2f($14);$14.2h(6(){$(3v).3u();g().2g()});$14.2f($(\'#1j\'))}g().13(0);}6 3t(){h 9=7.1b(2e);2d.2c(9);c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==2e){2d.2c(\'!!=\'+i);7.1p(i)}}}}7.j(\'3s\',6(){g().1h("/2a/3r.29","3q 10 28",6(){g().13(g().27()+10)},"2b");$("q[26=2b]").23().21(\'.3-20-1z\');g().1h("/2a/3p.29","3o 10 28",6(){h 12=g().27()-10;c(12<0)12=0;g().13(12)},"24");$("q[26=24]").23().21(\'.3-20-1z\');});6 1i(){}7.j(\'3n\',6(){1i()});7.j(\'3m\',6(){1i()});7.j("k",6(y){h 9=7.1b();c(9.n<2)19;$(\'.3-8-3l-3k\').3j(6(){$(\'#3-8-a-k\').1e(\'3-8-a-z\');$(\'.3-a-k\').p(\'o-1f\',\'11\')});7.1h("/3i/3h.3g","3f 3e",6(){$(\'.3-1w\').3d(\'3-8-1v\');$(\'.3-8-1y, .3-8-1x\').p(\'o-1g\',\'11\');c($(\'.3-1w\').3c(\'3-8-1v\')){$(\'.3-a-k\').p(\'o-1g\',\'l\');$(\'.3-a-k\').p(\'o-1f\',\'l\');$(\'.3-8-a\').1e(\'3-8-a-z\');$(\'.3-8-a:1u\').3b(\'3-8-a-z\')}3a{$(\'.3-a-k\').p(\'o-1g\',\'11\');$(\'.3-a-k\').p(\'o-1f\',\'11\');$(\'.3-8-a:1u\').1e(\'3-8-a-z\')}},"39");7.j("38",6(y){1d.37(\'1c\',y.9[y.36].1a)});c(1d.1t(\'1c\')){35("1s(1d.1t(\'1c\'));",34)}});h 18;6 1s(1q){h 9=7.1b();c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==1q){c(i==18){19}18=i;7.1p(i)}}}}',36,270,'|||jw|||function|player|settings|tracks|submenu||if||||jwplayer|var||on|audioTracks|true|3D|length|aria|attr|div|100|||sx|filemoon|https||event|active||false|tt|seek|dd|height|width|adb|current_audio|return|name|getAudioTracks|default_audio|localStorage|removeClass|expanded|checked|addButton|callMeMaybe|vplayer|0fxcyc2ajhp1|position|vvplay|vvad|220|setCurrentAudioTrack|audio_name|for|audio_set|getItem|last|open|controls|playbackRates|captions|rewind|icon|insertAfter||detach|ff00||button|getPosition|sec|png|player8|ff11|log|console|track_name|appendTo|play|click|no|scrolling|frameborder|file_code|src|top|zIndex|css|showCCform|data|1662367683|383371|dl|video_ad|doPlay|prevt|mp4|3E||jpg|thumbs|file|300|setTimeout|currentTrack|setItem|audioTrackChanged|dualSound|else|addClass|hasClass|toggleClass|Track|Audio|svg|dualy|images|mousedown|buttons|topbar|playAttemptFailed|beforePlay|Rewind|fr|Forward|ff|ready|set_audio_track|remove|this|upload_srt|prop|50px|margin|1000001|iframe|center|align|text|rgba|background|1000000|left|absolute|pause|setCurrentCaptions|Upload|contains|item|content|html|fviews|referer|prem|embed|3e57249ef633e0d03bf76ceb8d8a4b65|216|83|hash|view|get|TokenZir|window|hide|show|complete|slow|fadeIn|video_ad_fadein|time||cache|Cache|Content|headers|ajaxSetup|v2done|tott|vastdone2|vastdone1|vvbefore|playbackRateControls|cast|aboutlink|FileMoon|abouttext|UHD|1870|qualityLabels|sites|GNOME_POWER|link|2Fiframe|3C|allowfullscreen|22360|22640|22no|marginheight|marginwidth|2FGNOME_POWER|2F0fxcyc2ajhp1|2Fe|2Ffilemoon|2F|3A||22https|3Ciframe|code|sharing|fontOpacity|backgroundOpacity|Tahoma|fontFamily|303030|backgroundColor|FFFFFF|color|userFontScale|thumbnails|kind|0fxcyc2ajhp10000|url|get_slides|start|startparam|none|preload|html5|primary|hlshtml|androidhls|duration|uniform|stretching|0fxcyc2ajhp1_xt|image|2048|sp|6871|asn|127|srv|43200|_g3XlBcu2lmD9oDexD2NLWSmah2Nu3XcDrl93m9PwXY|m3u8||master|0fxcyc2ajhp1_x|00076|01|hls2|to|s01|delivery|storage|moon|sources|setup'''.split('|')))
"""
if __name__ == '__main__':

View File

@ -38,6 +38,9 @@ class BaseTestSubtitles(unittest.TestCase):
self.DL = FakeYDL()
self.ie = self.IE()
self.DL.add_info_extractor(self.ie)
if not self.IE.working():
print('Skipping: %s marked as not _WORKING' % self.IE.ie_key())
self.skipTest('IE marked as not _WORKING')
def getInfoDict(self):
info_dict = self.DL.extract_info(self.url, download=False)
@ -56,6 +59,21 @@ class BaseTestSubtitles(unittest.TestCase):
class TestYoutubeSubtitles(BaseTestSubtitles):
# Available subtitles for QRS8MkLhQmM:
# Language formats
# ru vtt, ttml, srv3, srv2, srv1, json3
# fr vtt, ttml, srv3, srv2, srv1, json3
# en vtt, ttml, srv3, srv2, srv1, json3
# nl vtt, ttml, srv3, srv2, srv1, json3
# de vtt, ttml, srv3, srv2, srv1, json3
# ko vtt, ttml, srv3, srv2, srv1, json3
# it vtt, ttml, srv3, srv2, srv1, json3
# zh-Hant vtt, ttml, srv3, srv2, srv1, json3
# hi vtt, ttml, srv3, srv2, srv1, json3
# pt-BR vtt, ttml, srv3, srv2, srv1, json3
# es-MX vtt, ttml, srv3, srv2, srv1, json3
# ja vtt, ttml, srv3, srv2, srv1, json3
# pl vtt, ttml, srv3, srv2, srv1, json3
url = 'QRS8MkLhQmM'
IE = YoutubeIE
@ -64,41 +82,60 @@ class TestYoutubeSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 13)
self.assertEqual(md5(subtitles['en']), '3cb210999d3e021bd6c7f0ea751eab06')
self.assertEqual(md5(subtitles['it']), '6d752b98c31f1cf8d597050c7a2cb4b5')
self.assertEqual(md5(subtitles['en']), 'ae1bd34126571a77aabd4d276b28044d')
self.assertEqual(md5(subtitles['it']), '0e0b667ba68411d88fd1c5f4f4eab2f9')
for lang in ['fr', 'de']:
self.assertTrue(subtitles.get(lang) is not None, 'Subtitles for \'%s\' not extracted' % lang)
def test_youtube_subtitles_ttml_format(self):
def _test_subtitles_format(self, fmt, md5_hash, lang='en'):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'ttml'
self.DL.params['subtitlesformat'] = fmt
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), 'e306f8c42842f723447d9f63ad65df54')
self.assertEqual(md5(subtitles[lang]), md5_hash)
def test_youtube_subtitles_ttml_format(self):
self._test_subtitles_format('ttml', 'c97ddf1217390906fa9fbd34901f3da2')
def test_youtube_subtitles_vtt_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'vtt'
self._test_subtitles_format('vtt', 'ae1bd34126571a77aabd4d276b28044d')
def test_youtube_subtitles_json3_format(self):
self._test_subtitles_format('json3', '688dd1ce0981683867e7fe6fde2a224b')
def _test_automatic_captions(self, url, lang):
self.url = url
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = [lang]
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '3cb210999d3e021bd6c7f0ea751eab06')
self.assertTrue(subtitles[lang] is not None)
def test_youtube_automatic_captions(self):
self.url = '8YoUxe5ncPo'
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertTrue(subtitles['it'] is not None)
# Available automatic captions for 8YoUxe5ncPo:
# Language formats (all in vtt, ttml, srv3, srv2, srv1, json3)
# gu, zh-Hans, zh-Hant, gd, ga, gl, lb, la, lo, tt, tr,
# lv, lt, tk, th, tg, te, fil, haw, yi, ceb, yo, de, da,
# el, eo, en, eu, et, es, ru, rw, ro, bn, be, bg, uk, jv,
# bs, ja, or, xh, co, ca, cy, cs, ps, pt, pa, vi, pl, hy,
# hr, ht, hu, hmn, hi, ha, mg, uz, ml, mn, mi, mk, ur,
# mt, ms, mr, ug, ta, my, af, sw, is, am,
# *it*, iw, sv, ar,
# su, zu, az, id, ig, nl, no, ne, ny, fr, ku, fy, fa, fi,
# ka, kk, sr, sq, ko, kn, km, st, sk, si, so, sn, sm, sl,
# ky, sd
# ...
self._test_automatic_captions('8YoUxe5ncPo', 'it')
@unittest.skip('ASR subs all in all supported langs now')
def test_youtube_translated_subtitles(self):
# This video has a subtitles track, which can be translated
self.url = 'Ky9eprVWzlI'
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertTrue(subtitles['it'] is not None)
# This video has a subtitles track, which can be translated (#4555)
self._test_automatic_captions('Ky9eprVWzlI', 'it')
def test_youtube_nosubtitles(self):
self.DL.expect_warning('video doesn\'t have subtitles')
self.url = 'n5BB19UTcdA'
# Available automatic captions for 8YoUxe5ncPo:
# ...
# 8YoUxe5ncPo has no subtitles
self.url = '8YoUxe5ncPo'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
@ -128,6 +165,7 @@ class TestDailymotionSubtitles(BaseTestSubtitles):
self.assertFalse(subtitles)
@unittest.skip('IE broken')
class TestTedSubtitles(BaseTestSubtitles):
url = 'http://www.ted.com/talks/dan_dennett_on_our_consciousness.html'
IE = TEDIE
@ -152,18 +190,19 @@ class TestVimeoSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['de', 'en', 'es', 'fr']))
self.assertEqual(md5(subtitles['en']), '8062383cf4dec168fc40a088aa6d5888')
self.assertEqual(md5(subtitles['fr']), 'b6191146a6c5d3a452244d853fde6dc8')
self.assertEqual(md5(subtitles['en']), '386cbc9320b94e25cb364b97935e5dd1')
self.assertEqual(md5(subtitles['fr']), 'c9b69eef35bc6641c0d4da8a04f9dfac')
def test_nosubtitles(self):
self.DL.expect_warning('video doesn\'t have subtitles')
self.url = 'http://vimeo.com/56015672'
self.url = 'http://vimeo.com/68093876'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertFalse(subtitles)
@unittest.skip('IE broken')
class TestWallaSubtitles(BaseTestSubtitles):
url = 'http://vod.walla.co.il/movie/2705958/the-yes-men'
IE = WallaIE
@ -185,6 +224,7 @@ class TestWallaSubtitles(BaseTestSubtitles):
self.assertFalse(subtitles)
@unittest.skip('IE broken')
class TestCeskaTelevizeSubtitles(BaseTestSubtitles):
url = 'http://www.ceskatelevize.cz/ivysilani/10600540290-u6-uzasny-svet-techniky'
IE = CeskaTelevizeIE
@ -206,6 +246,7 @@ class TestCeskaTelevizeSubtitles(BaseTestSubtitles):
self.assertFalse(subtitles)
@unittest.skip('IE broken')
class TestLyndaSubtitles(BaseTestSubtitles):
url = 'http://www.lynda.com/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html'
IE = LyndaIE
@ -218,6 +259,7 @@ class TestLyndaSubtitles(BaseTestSubtitles):
self.assertEqual(md5(subtitles['en']), '09bbe67222259bed60deaa26997d73a7')
@unittest.skip('IE broken')
class TestNPOSubtitles(BaseTestSubtitles):
url = 'http://www.npo.nl/nos-journaal/28-08-2014/POW_00722860'
IE = NPOIE
@ -230,6 +272,7 @@ class TestNPOSubtitles(BaseTestSubtitles):
self.assertEqual(md5(subtitles['nl']), 'fc6435027572b63fb4ab143abd5ad3f4')
@unittest.skip('IE broken')
class TestMTVSubtitles(BaseTestSubtitles):
url = 'http://www.cc.com/video-clips/p63lk0/adam-devine-s-house-party-chasing-white-swans'
IE = ComedyCentralIE
@ -252,9 +295,10 @@ class TestNRKSubtitles(BaseTestSubtitles):
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
self.DL.params['format'] = 'best/bestvideo'
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['no']))
self.assertEqual(md5(subtitles['no']), '544fa917d3197fcbee64634559221cc2')
self.assertEqual(set(subtitles.keys()), set(['nb-ttv']))
self.assertEqual(md5(subtitles['nb-ttv']), '67e06ff02d0deaf975e68f6cb8f6a149')
class TestRaiPlaySubtitles(BaseTestSubtitles):
@ -277,6 +321,7 @@ class TestRaiPlaySubtitles(BaseTestSubtitles):
self.assertEqual(md5(subtitles['it']), '4b3264186fbb103508abe5311cfcb9cd')
@unittest.skip('IE broken - DRM only')
class TestVikiSubtitles(BaseTestSubtitles):
url = 'http://www.viki.com/videos/1060846v-punch-episode-18'
IE = VikiIE
@ -303,6 +348,7 @@ class TestThePlatformSubtitles(BaseTestSubtitles):
self.assertEqual(md5(subtitles['en']), '97e7670cbae3c4d26ae8bcc7fdd78d4b')
@unittest.skip('IE broken')
class TestThePlatformFeedSubtitles(BaseTestSubtitles):
url = 'http://feed.theplatform.com/f/7wvmTC/msnbc_video-p-test?form=json&pretty=true&range=-40&byGuid=n_hardball_5biden_140207'
IE = ThePlatformFeedIE
@ -338,7 +384,7 @@ class TestDemocracynowSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['en']))
self.assertEqual(md5(subtitles['en']), 'acaca989e24a9e45a6719c9b3d60815c')
self.assertEqual(md5(subtitles['en']), 'a3cc4c0b5eadd74d9974f1c1f5101045')
def test_subtitles_in_page(self):
self.url = 'http://www.democracynow.org/2015/7/3/this_flag_comes_down_today_bree'
@ -346,7 +392,7 @@ class TestDemocracynowSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['en']))
self.assertEqual(md5(subtitles['en']), 'acaca989e24a9e45a6719c9b3d60815c')
self.assertEqual(md5(subtitles['en']), 'a3cc4c0b5eadd74d9974f1c1f5101045')
if __name__ == '__main__':

View File

@ -5,16 +5,18 @@ from __future__ import unicode_literals
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
dirn = os.path.dirname
sys.path.insert(0, dirn(dirn(os.path.abspath(__file__))))
import errno
import io
import json
import re
import subprocess
from youtube_dl.swfinterp import SWFInterpreter
from youtube_dl.compat import compat_open as open
TEST_DIR = os.path.join(
@ -43,7 +45,7 @@ def _make_testfunc(testfile):
'-static-link-runtime-shared-libraries', as_file])
except OSError as ose:
if ose.errno == errno.ENOENT:
print('mxmlc not found! Skipping test.')
self.skipTest('mxmlc not found!')
return
raise
@ -51,7 +53,7 @@ def _make_testfunc(testfile):
swf_content = swf_f.read()
swfi = SWFInterpreter(swf_content)
with io.open(as_file, 'r', encoding='utf-8') as as_f:
with open(as_file, 'r', encoding='utf-8') as as_f:
as_content = as_f.read()
def _find_spec(key):

View File

@ -2,19 +2,21 @@ from __future__ import unicode_literals
# Allow direct execution
import os
import re
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import io
import re
dirn = os.path.dirname
rootDir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
rootDir = dirn(dirn(os.path.abspath(__file__)))
sys.path.insert(0, rootDir)
IGNORED_FILES = [
'setup.py', # http://bugs.python.org/issue13943
'conf.py',
'buildserver.py',
'get-pip.py',
]
IGNORED_DIRS = [
@ -23,6 +25,7 @@ IGNORED_DIRS = [
]
from test.helper import assertRegexpMatches
from youtube_dl.compat import compat_open as open
class TestUnicodeLiterals(unittest.TestCase):
@ -40,7 +43,7 @@ class TestUnicodeLiterals(unittest.TestCase):
continue
fn = os.path.join(dirpath, basename)
with io.open(fn, encoding='utf-8') as inf:
with open(fn, encoding='utf-8') as inf:
code = inf.read()
if "'" not in code and '"' not in code:

View File

@ -12,13 +12,15 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Various small unit tests
import io
import itertools
import json
import re
import xml.etree.ElementTree
from youtube_dl.utils import (
age_restricted,
args_to_str,
encode_base_n,
base_url,
caesar,
clean_html,
clean_podcast_url,
@ -27,10 +29,12 @@ from youtube_dl.utils import (
detect_exe_version,
determine_ext,
dict_get,
encode_base_n,
encode_compat_str,
encodeFilename,
escape_rfc3986,
escape_url,
expand_path,
extract_attributes,
ExtractorError,
find_xpath_attr,
@ -40,12 +44,16 @@ from youtube_dl.utils import (
get_element_by_attribute,
get_elements_by_class,
get_elements_by_attribute,
get_first,
InAdvancePagedList,
int_or_none,
intlist_to_bytes,
is_html,
join_nonempty,
js_to_json,
LazyList,
limit_length,
lowercase_escape,
merge_dicts,
mimetype2ext,
month_by_name,
@ -54,44 +62,49 @@ from youtube_dl.utils import (
OnDemandPagedList,
orderedSet,
parse_age_limit,
parse_bitrate,
parse_duration,
parse_filesize,
parse_codecs,
parse_count,
parse_iso8601,
parse_resolution,
parse_bitrate,
parse_qs,
pkcs1pad,
read_batch_urls,
sanitize_filename,
sanitize_path,
sanitize_url,
expand_path,
prepend_extension,
replace_extension,
read_batch_urls,
remove_start,
remove_end,
remove_quotes,
replace_extension,
rot47,
sanitize_filename,
sanitize_path,
sanitize_url,
sanitized_Request,
shell_quote,
smuggle_url,
str_or_none,
str_to_int,
strip_jsonp,
strip_or_none,
subtitles_filename,
T,
timeconvert,
traverse_obj,
try_call,
unescapeHTML,
unified_strdate,
unified_timestamp,
unsmuggle_url,
uppercase_escape,
lowercase_escape,
url_basename,
url_or_none,
base_url,
urljoin,
urlencode_postdata,
urshift,
update_url_query,
variadic,
version_tuple,
xpath_with_ns,
xpath_element,
@ -104,7 +117,7 @@ from youtube_dl.utils import (
cli_option,
cli_valueless_option,
cli_bool_option,
parse_codecs,
YoutubeDLHandler,
)
from youtube_dl.compat import (
compat_chr,
@ -112,12 +125,17 @@ from youtube_dl.compat import (
compat_getenv,
compat_os_name,
compat_setenv,
compat_str,
compat_urlparse,
compat_parse_qs,
)
class TestUtil(unittest.TestCase):
# yt-dlp shim
def assertCountEqual(self, expected, got, msg='count should be the same'):
return self.assertEqual(len(tuple(expected)), len(tuple(got)), msg=msg)
def test_timeconvert(self):
self.assertTrue(timeconvert('') is None)
self.assertTrue(timeconvert('bougrg') is None)
@ -236,6 +254,19 @@ class TestUtil(unittest.TestCase):
self.assertEqual(sanitize_url('httpss://foo.bar'), 'https://foo.bar')
self.assertEqual(sanitize_url('rmtps://foo.bar'), 'rtmps://foo.bar')
self.assertEqual(sanitize_url('https://foo.bar'), 'https://foo.bar')
self.assertEqual(sanitize_url('foo bar'), 'foo bar')
def test_sanitized_Request(self):
self.assertFalse(sanitized_Request('http://foo.bar').has_header('Authorization'))
self.assertFalse(sanitized_Request('http://:foo.bar').has_header('Authorization'))
self.assertEqual(sanitized_Request('http://@foo.bar').get_header('Authorization'),
'Basic Og==')
self.assertEqual(sanitized_Request('http://:pass@foo.bar').get_header('Authorization'),
'Basic OnBhc3M=')
self.assertEqual(sanitized_Request('http://user:@foo.bar').get_header('Authorization'),
'Basic dXNlcjo=')
self.assertEqual(sanitized_Request('http://user:pass@foo.bar').get_header('Authorization'),
'Basic dXNlcjpwYXNz')
def test_expand_path(self):
def env(var):
@ -370,6 +401,9 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_timestamp('Sep 11, 2013 | 5:49 AM'), 1378878540)
self.assertEqual(unified_timestamp('December 15, 2017 at 7:49 am'), 1513324140)
self.assertEqual(unified_timestamp('2018-03-14T08:32:43.1493874+00:00'), 1521016363)
self.assertEqual(unified_timestamp('December 31 1969 20:00:01 EDT'), 1)
self.assertEqual(unified_timestamp('Wednesday 31 December 1969 18:01:26 MDT'), 86)
self.assertEqual(unified_timestamp('12/31/1969 20:01:18 EDT', False), 78)
def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@ -491,11 +525,14 @@ class TestUtil(unittest.TestCase):
self.assertEqual(float_or_none(set()), None)
def test_int_or_none(self):
self.assertEqual(int_or_none(42), 42)
self.assertEqual(int_or_none('42'), 42)
self.assertEqual(int_or_none(''), None)
self.assertEqual(int_or_none(None), None)
self.assertEqual(int_or_none([]), None)
self.assertEqual(int_or_none(set()), None)
self.assertEqual(int_or_none('42', base=8), 34)
self.assertRaises(TypeError, int_or_none(42, base=8))
def test_str_to_int(self):
self.assertEqual(str_to_int('123,456'), 123456)
@ -662,38 +699,36 @@ class TestUtil(unittest.TestCase):
self.assertTrue(isinstance(data, bytes))
def test_update_url_query(self):
def query_dict(url):
return compat_parse_qs(compat_urlparse.urlparse(url).query)
self.assertEqual(query_dict(update_url_query(
self.assertEqual(parse_qs(update_url_query(
'http://example.com/path', {'quality': ['HD'], 'format': ['mp4']})),
query_dict('http://example.com/path?quality=HD&format=mp4'))
self.assertEqual(query_dict(update_url_query(
parse_qs('http://example.com/path?quality=HD&format=mp4'))
self.assertEqual(parse_qs(update_url_query(
'http://example.com/path', {'system': ['LINUX', 'WINDOWS']})),
query_dict('http://example.com/path?system=LINUX&system=WINDOWS'))
self.assertEqual(query_dict(update_url_query(
parse_qs('http://example.com/path?system=LINUX&system=WINDOWS'))
self.assertEqual(parse_qs(update_url_query(
'http://example.com/path', {'fields': 'id,formats,subtitles'})),
query_dict('http://example.com/path?fields=id,formats,subtitles'))
self.assertEqual(query_dict(update_url_query(
parse_qs('http://example.com/path?fields=id,formats,subtitles'))
self.assertEqual(parse_qs(update_url_query(
'http://example.com/path', {'fields': ('id,formats,subtitles', 'thumbnails')})),
query_dict('http://example.com/path?fields=id,formats,subtitles&fields=thumbnails'))
self.assertEqual(query_dict(update_url_query(
parse_qs('http://example.com/path?fields=id,formats,subtitles&fields=thumbnails'))
self.assertEqual(parse_qs(update_url_query(
'http://example.com/path?manifest=f4m', {'manifest': []})),
query_dict('http://example.com/path'))
self.assertEqual(query_dict(update_url_query(
parse_qs('http://example.com/path'))
self.assertEqual(parse_qs(update_url_query(
'http://example.com/path?system=LINUX&system=WINDOWS', {'system': 'LINUX'})),
query_dict('http://example.com/path?system=LINUX'))
self.assertEqual(query_dict(update_url_query(
parse_qs('http://example.com/path?system=LINUX'))
self.assertEqual(parse_qs(update_url_query(
'http://example.com/path', {'fields': b'id,formats,subtitles'})),
query_dict('http://example.com/path?fields=id,formats,subtitles'))
self.assertEqual(query_dict(update_url_query(
parse_qs('http://example.com/path?fields=id,formats,subtitles'))
self.assertEqual(parse_qs(update_url_query(
'http://example.com/path', {'width': 1080, 'height': 720})),
query_dict('http://example.com/path?width=1080&height=720'))
self.assertEqual(query_dict(update_url_query(
parse_qs('http://example.com/path?width=1080&height=720'))
self.assertEqual(parse_qs(update_url_query(
'http://example.com/path', {'bitrate': 5020.43})),
query_dict('http://example.com/path?bitrate=5020.43'))
self.assertEqual(query_dict(update_url_query(
parse_qs('http://example.com/path?bitrate=5020.43'))
self.assertEqual(parse_qs(update_url_query(
'http://example.com/path', {'test': '第二行тест'})),
query_dict('http://example.com/path?test=%E7%AC%AC%E4%BA%8C%E8%A1%8C%D1%82%D0%B5%D1%81%D1%82'))
parse_qs('http://example.com/path?test=%E7%AC%AC%E4%BA%8C%E8%A1%8C%D1%82%D0%B5%D1%81%D1%82'))
def test_multipart_encode(self):
self.assertEqual(
@ -885,6 +920,111 @@ class TestUtil(unittest.TestCase):
)
self.assertEqual(escape_url('http://vimeo.com/56015672#at=0'), 'http://vimeo.com/56015672#at=0')
def test_remove_dot_segments(self):
def remove_dot_segments(p):
q = '' if p.startswith('/') else '/'
p = 'http://example.com' + q + p
p = compat_urlparse.urlsplit(YoutubeDLHandler._fix_path(p)).path
return p[1:] if q else p
self.assertEqual(remove_dot_segments('/a/b/c/./../../g'), '/a/g')
self.assertEqual(remove_dot_segments('mid/content=5/../6'), 'mid/6')
self.assertEqual(remove_dot_segments('/ad/../cd'), '/cd')
self.assertEqual(remove_dot_segments('/ad/../cd/'), '/cd/')
self.assertEqual(remove_dot_segments('/..'), '/')
self.assertEqual(remove_dot_segments('/./'), '/')
self.assertEqual(remove_dot_segments('/./a'), '/a')
self.assertEqual(remove_dot_segments('/abc/./.././d/././e/.././f/./../../ghi'), '/ghi')
self.assertEqual(remove_dot_segments('/'), '/')
self.assertEqual(remove_dot_segments('/t'), '/t')
self.assertEqual(remove_dot_segments('t'), 't')
self.assertEqual(remove_dot_segments(''), '')
self.assertEqual(remove_dot_segments('/../a/b/c'), '/a/b/c')
self.assertEqual(remove_dot_segments('../a'), 'a')
self.assertEqual(remove_dot_segments('./a'), 'a')
self.assertEqual(remove_dot_segments('.'), '')
self.assertEqual(remove_dot_segments('////'), '////')
def test_js_to_json_vars_strings(self):
self.assertDictEqual(
json.loads(js_to_json(
'''{
'null': a,
'nullStr': b,
'true': c,
'trueStr': d,
'false': e,
'falseStr': f,
'unresolvedVar': g,
}''',
{
'a': 'null',
'b': '"null"',
'c': 'true',
'd': '"true"',
'e': 'false',
'f': '"false"',
'g': 'var',
}
)),
{
'null': None,
'nullStr': 'null',
'true': True,
'trueStr': 'true',
'false': False,
'falseStr': 'false',
'unresolvedVar': 'var'
}
)
self.assertDictEqual(
json.loads(js_to_json(
'''{
'int': a,
'intStr': b,
'float': c,
'floatStr': d,
}''',
{
'a': '123',
'b': '"123"',
'c': '1.23',
'd': '"1.23"',
}
)),
{
'int': 123,
'intStr': '123',
'float': 1.23,
'floatStr': '1.23',
}
)
self.assertDictEqual(
json.loads(js_to_json(
'''{
'object': a,
'objectStr': b,
'array': c,
'arrayStr': d,
}''',
{
'a': '{}',
'b': '"{}"',
'c': '[]',
'd': '"[]"',
}
)),
{
'object': {},
'objectStr': '{}',
'array': [],
'arrayStr': '[]',
}
)
def test_js_to_json_realworld(self):
inp = '''{
'clip':{'provider':'pseudo'}
@ -955,10 +1095,10 @@ class TestUtil(unittest.TestCase):
!42: 42
}''')
self.assertEqual(json.loads(on), {
'a': 0,
'b': 1,
'c': 0,
'd': 42.42,
'a': True,
'b': False,
'c': False,
'd': True,
'e': [],
'f': "abc",
'g': "",
@ -1028,10 +1168,26 @@ class TestUtil(unittest.TestCase):
on = js_to_json('{ "040": "040" }')
self.assertEqual(json.loads(on), {'040': '040'})
on = js_to_json('[1,//{},\n2]')
self.assertEqual(json.loads(on), [1, 2])
on = js_to_json(r'"\^\$\#"')
self.assertEqual(json.loads(on), R'^$#', msg='Unnecessary escapes should be stripped')
on = js_to_json('\'"\\""\'')
self.assertEqual(json.loads(on), '"""', msg='Unnecessary quote escape should be escaped')
def test_js_to_json_malformed(self):
self.assertEqual(js_to_json('42a1'), '42"a1"')
self.assertEqual(js_to_json('42a-1'), '42"a"-1')
def test_js_to_json_template_literal(self):
self.assertEqual(js_to_json('`Hello ${name}`', {'name': '"world"'}), '"Hello world"')
self.assertEqual(js_to_json('`${name}${name}`', {'name': '"X"'}), '"XX"')
self.assertEqual(js_to_json('`${name}${name}`', {'name': '5'}), '"55"')
self.assertEqual(js_to_json('`${name}"${name}"`', {'name': '5'}), '"5\\"5\\""')
self.assertEqual(js_to_json('`${name}`', {}), '"name"')
def test_extract_attributes(self):
self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'})
self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'})
@ -1475,6 +1631,414 @@ Line 1
self.assertEqual(clean_podcast_url('https://www.podtrac.com/pts/redirect.mp3/chtbl.com/track/5899E/traffic.megaphone.fm/HSW7835899191.mp3'), 'https://traffic.megaphone.fm/HSW7835899191.mp3')
self.assertEqual(clean_podcast_url('https://play.podtrac.com/npr-344098539/edge1.pod.npr.org/anon.npr-podcasts/podcast/npr/waitwait/2020/10/20201003_waitwait_wwdtmpodcast201003-015621a5-f035-4eca-a9a1-7c118d90bc3c.mp3'), 'https://edge1.pod.npr.org/anon.npr-podcasts/podcast/npr/waitwait/2020/10/20201003_waitwait_wwdtmpodcast201003-015621a5-f035-4eca-a9a1-7c118d90bc3c.mp3')
def test_LazyList(self):
it = list(range(10))
self.assertEqual(list(LazyList(it)), it)
self.assertEqual(LazyList(it).exhaust(), it)
self.assertEqual(LazyList(it)[5], it[5])
self.assertEqual(LazyList(it)[5:], it[5:])
self.assertEqual(LazyList(it)[:5], it[:5])
self.assertEqual(LazyList(it)[::2], it[::2])
self.assertEqual(LazyList(it)[1::2], it[1::2])
self.assertEqual(LazyList(it)[5::-1], it[5::-1])
self.assertEqual(LazyList(it)[6:2:-2], it[6:2:-2])
self.assertEqual(LazyList(it)[::-1], it[::-1])
self.assertTrue(LazyList(it))
self.assertFalse(LazyList(range(0)))
self.assertEqual(len(LazyList(it)), len(it))
self.assertEqual(repr(LazyList(it)), repr(it))
self.assertEqual(compat_str(LazyList(it)), compat_str(it))
self.assertEqual(list(LazyList(it, reverse=True)), it[::-1])
self.assertEqual(list(reversed(LazyList(it))[::-1]), it)
self.assertEqual(list(reversed(LazyList(it))[1:3:7]), it[::-1][1:3:7])
def test_LazyList_laziness(self):
def test(ll, idx, val, cache):
self.assertEqual(ll[idx], val)
self.assertEqual(ll._cache, list(cache))
ll = LazyList(range(10))
test(ll, 0, 0, range(1))
test(ll, 5, 5, range(6))
test(ll, -3, 7, range(10))
ll = LazyList(range(10), reverse=True)
test(ll, -1, 0, range(1))
test(ll, 3, 6, range(10))
ll = LazyList(itertools.count())
test(ll, 10, 10, range(11))
ll = reversed(ll)
test(ll, -15, 14, range(15))
def test_try_call(self):
def total(*x, **kwargs):
return sum(x) + sum(kwargs.values())
self.assertEqual(try_call(None), None,
msg='not a fn should give None')
self.assertEqual(try_call(lambda: 1), 1,
msg='int fn with no expected_type should give int')
self.assertEqual(try_call(lambda: 1, expected_type=int), 1,
msg='int fn with expected_type int should give int')
self.assertEqual(try_call(lambda: 1, expected_type=dict), None,
msg='int fn with wrong expected_type should give None')
self.assertEqual(try_call(total, args=(0, 1, 0, ), expected_type=int), 1,
msg='fn should accept arglist')
self.assertEqual(try_call(total, kwargs={'a': 0, 'b': 1, 'c': 0}, expected_type=int), 1,
msg='fn should accept kwargs')
self.assertEqual(try_call(lambda: 1, expected_type=dict), None,
msg='int fn with no expected_type should give None')
self.assertEqual(try_call(lambda x: {}, total, args=(42, ), expected_type=int), 42,
msg='expect first int result with expected_type int')
def test_variadic(self):
self.assertEqual(variadic(None), (None, ))
self.assertEqual(variadic('spam'), ('spam', ))
self.assertEqual(variadic('spam', allowed_types=dict), 'spam')
self.assertEqual(variadic('spam', allowed_types=[dict]), 'spam')
def test_traverse_obj(self):
str = compat_str
_TEST_DATA = {
100: 100,
1.2: 1.2,
'str': 'str',
'None': None,
'...': Ellipsis,
'urls': [
{'index': 0, 'url': 'https://www.example.com/0'},
{'index': 1, 'url': 'https://www.example.com/1'},
],
'data': (
{'index': 2},
{'index': 3},
),
'dict': {},
}
# define a pukka Iterable
def iter_range(stop):
for from_ in range(stop):
yield from_
# Test base functionality
self.assertEqual(traverse_obj(_TEST_DATA, ('str',)), 'str',
msg='allow tuple path')
self.assertEqual(traverse_obj(_TEST_DATA, ['str']), 'str',
msg='allow list path')
self.assertEqual(traverse_obj(_TEST_DATA, (value for value in ("str",))), 'str',
msg='allow iterable path')
self.assertEqual(traverse_obj(_TEST_DATA, 'str'), 'str',
msg='single items should be treated as a path')
self.assertEqual(traverse_obj(_TEST_DATA, None), _TEST_DATA)
self.assertEqual(traverse_obj(_TEST_DATA, 100), 100)
self.assertEqual(traverse_obj(_TEST_DATA, 1.2), 1.2)
# Test Ellipsis behavior
self.assertCountEqual(traverse_obj(_TEST_DATA, Ellipsis),
(item for item in _TEST_DATA.values() if item not in (None, {})),
msg='`...` should give all non-discarded values')
self.assertCountEqual(traverse_obj(_TEST_DATA, ('urls', 0, Ellipsis)), _TEST_DATA['urls'][0].values(),
msg='`...` selection for dicts should select all values')
self.assertEqual(traverse_obj(_TEST_DATA, (Ellipsis, Ellipsis, 'url')),
['https://www.example.com/0', 'https://www.example.com/1'],
msg='nested `...` queries should work')
self.assertCountEqual(traverse_obj(_TEST_DATA, (Ellipsis, Ellipsis, 'index')), iter_range(4),
msg='`...` query result should be flattened')
self.assertEqual(traverse_obj(iter(range(4)), Ellipsis), list(range(4)),
msg='`...` should accept iterables')
# Test function as key
self.assertEqual(traverse_obj(_TEST_DATA, lambda x, y: x == 'urls' and isinstance(y, list)),
[_TEST_DATA['urls']],
msg='function as query key should perform a filter based on (key, value)')
self.assertCountEqual(traverse_obj(_TEST_DATA, lambda _, x: isinstance(x[0], str)), set(('str',)),
msg='exceptions in the query function should be caught')
self.assertEqual(traverse_obj(iter(range(4)), lambda _, x: x % 2 == 0), [0, 2],
msg='function key should accept iterables')
if __debug__:
with self.assertRaises(Exception, msg='Wrong function signature should raise in debug'):
traverse_obj(_TEST_DATA, lambda a: Ellipsis)
with self.assertRaises(Exception, msg='Wrong function signature should raise in debug'):
traverse_obj(_TEST_DATA, lambda a, b, c: Ellipsis)
# Test set as key (transformation/type, like `expected_type`)
self.assertEqual(traverse_obj(_TEST_DATA, (Ellipsis, T(str.upper), )), ['STR'],
msg='Function in set should be a transformation')
self.assertEqual(traverse_obj(_TEST_DATA, (Ellipsis, T(str))), ['str'],
msg='Type in set should be a type filter')
self.assertEqual(traverse_obj(_TEST_DATA, T(dict)), _TEST_DATA,
msg='A single set should be wrapped into a path')
self.assertEqual(traverse_obj(_TEST_DATA, (Ellipsis, T(str.upper))), ['STR'],
msg='Transformation function should not raise')
self.assertEqual(traverse_obj(_TEST_DATA, (Ellipsis, T(str_or_none))),
[item for item in map(str_or_none, _TEST_DATA.values()) if item is not None],
msg='Function in set should be a transformation')
if __debug__:
with self.assertRaises(Exception, msg='Sets with length != 1 should raise in debug'):
traverse_obj(_TEST_DATA, set())
with self.assertRaises(Exception, msg='Sets with length != 1 should raise in debug'):
traverse_obj(_TEST_DATA, set((str.upper, str)))
# Test `slice` as a key
_SLICE_DATA = [0, 1, 2, 3, 4]
self.assertEqual(traverse_obj(_TEST_DATA, ('dict', slice(1))), None,
msg='slice on a dictionary should not throw')
self.assertEqual(traverse_obj(_SLICE_DATA, slice(1)), _SLICE_DATA[:1],
msg='slice key should apply slice to sequence')
self.assertEqual(traverse_obj(_SLICE_DATA, slice(1, 2)), _SLICE_DATA[1:2],
msg='slice key should apply slice to sequence')
self.assertEqual(traverse_obj(_SLICE_DATA, slice(1, 4, 2)), _SLICE_DATA[1:4:2],
msg='slice key should apply slice to sequence')
# Test alternative paths
self.assertEqual(traverse_obj(_TEST_DATA, 'fail', 'str'), 'str',
msg='multiple `paths` should be treated as alternative paths')
self.assertEqual(traverse_obj(_TEST_DATA, 'str', 100), 'str',
msg='alternatives should exit early')
self.assertEqual(traverse_obj(_TEST_DATA, 'fail', 'fail'), None,
msg='alternatives should return `default` if exhausted')
self.assertEqual(traverse_obj(_TEST_DATA, (Ellipsis, 'fail'), 100), 100,
msg='alternatives should track their own branching return')
self.assertEqual(traverse_obj(_TEST_DATA, ('dict', Ellipsis), ('data', Ellipsis)), list(_TEST_DATA['data']),
msg='alternatives on empty objects should search further')
# Test branch and path nesting
self.assertEqual(traverse_obj(_TEST_DATA, ('urls', (3, 0), 'url')), ['https://www.example.com/0'],
msg='tuple as key should be treated as branches')
self.assertEqual(traverse_obj(_TEST_DATA, ('urls', [3, 0], 'url')), ['https://www.example.com/0'],
msg='list as key should be treated as branches')
self.assertEqual(traverse_obj(_TEST_DATA, ('urls', ((1, 'fail'), (0, 'url')))), ['https://www.example.com/0'],
msg='double nesting in path should be treated as paths')
self.assertEqual(traverse_obj(['0', [1, 2]], [(0, 1), 0]), [1],
msg='do not fail early on branching')
self.assertCountEqual(traverse_obj(_TEST_DATA, ('urls', ((1, ('fail', 'url')), (0, 'url')))),
['https://www.example.com/0', 'https://www.example.com/1'],
msg='triple nesting in path should be treated as branches')
self.assertEqual(traverse_obj(_TEST_DATA, ('urls', ('fail', (Ellipsis, 'url')))),
['https://www.example.com/0', 'https://www.example.com/1'],
msg='ellipsis as branch path start gets flattened')
# Test dictionary as key
self.assertEqual(traverse_obj(_TEST_DATA, {0: 100, 1: 1.2}), {0: 100, 1: 1.2},
msg='dict key should result in a dict with the same keys')
self.assertEqual(traverse_obj(_TEST_DATA, {0: ('urls', 0, 'url')}),
{0: 'https://www.example.com/0'},
msg='dict key should allow paths')
self.assertEqual(traverse_obj(_TEST_DATA, {0: ('urls', (3, 0), 'url')}),
{0: ['https://www.example.com/0']},
msg='tuple in dict path should be treated as branches')
self.assertEqual(traverse_obj(_TEST_DATA, {0: ('urls', ((1, 'fail'), (0, 'url')))}),
{0: ['https://www.example.com/0']},
msg='double nesting in dict path should be treated as paths')
self.assertEqual(traverse_obj(_TEST_DATA, {0: ('urls', ((1, ('fail', 'url')), (0, 'url')))}),
{0: ['https://www.example.com/1', 'https://www.example.com/0']},
msg='triple nesting in dict path should be treated as branches')
self.assertEqual(traverse_obj(_TEST_DATA, {0: 'fail'}), {},
msg='remove `None` values when top level dict key fails')
self.assertEqual(traverse_obj(_TEST_DATA, {0: 'fail'}, default=Ellipsis), {0: Ellipsis},
msg='use `default` if key fails and `default`')
self.assertEqual(traverse_obj(_TEST_DATA, {0: 'dict'}), {},
msg='remove empty values when dict key')
self.assertEqual(traverse_obj(_TEST_DATA, {0: 'dict'}, default=Ellipsis), {0: Ellipsis},
msg='use `default` when dict key and a default')
self.assertEqual(traverse_obj(_TEST_DATA, {0: {0: 'fail'}}), {},
msg='remove empty values when nested dict key fails')
self.assertEqual(traverse_obj(None, {0: 'fail'}), {},
msg='default to dict if pruned')
self.assertEqual(traverse_obj(None, {0: 'fail'}, default=Ellipsis), {0: Ellipsis},
msg='default to dict if pruned and default is given')
self.assertEqual(traverse_obj(_TEST_DATA, {0: {0: 'fail'}}, default=Ellipsis), {0: {0: Ellipsis}},
msg='use nested `default` when nested dict key fails and `default`')
self.assertEqual(traverse_obj(_TEST_DATA, {0: ('dict', Ellipsis)}), {},
msg='remove key if branch in dict key not successful')
# Testing default parameter behavior
_DEFAULT_DATA = {'None': None, 'int': 0, 'list': []}
self.assertEqual(traverse_obj(_DEFAULT_DATA, 'fail'), None,
msg='default value should be `None`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, 'fail', 'fail', default=Ellipsis), Ellipsis,
msg='chained fails should result in default')
self.assertEqual(traverse_obj(_DEFAULT_DATA, 'None', 'int'), 0,
msg='should not short cirquit on `None`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, 'fail', default=1), 1,
msg='invalid dict key should result in `default`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, 'None', default=1), 1,
msg='`None` is a deliberate sentinel and should become `default`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, ('list', 10)), None,
msg='`IndexError` should result in `default`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, (Ellipsis, 'fail'), default=1), 1,
msg='if branched but not successful return `default` if defined, not `[]`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, (Ellipsis, 'fail'), default=None), None,
msg='if branched but not successful return `default` even if `default` is `None`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, (Ellipsis, 'fail')), [],
msg='if branched but not successful return `[]`, not `default`')
self.assertEqual(traverse_obj(_DEFAULT_DATA, ('list', Ellipsis)), [],
msg='if branched but object is empty return `[]`, not `default`')
self.assertEqual(traverse_obj(None, Ellipsis), [],
msg='if branched but object is `None` return `[]`, not `default`')
self.assertEqual(traverse_obj({0: None}, (0, Ellipsis)), [],
msg='if branched but state is `None` return `[]`, not `default`')
branching_paths = [
('fail', Ellipsis),
(Ellipsis, 'fail'),
100 * ('fail',) + (Ellipsis,),
(Ellipsis,) + 100 * ('fail',),
]
for branching_path in branching_paths:
self.assertEqual(traverse_obj({}, branching_path), [],
msg='if branched but state is `None`, return `[]` (not `default`)')
self.assertEqual(traverse_obj({}, 'fail', branching_path), [],
msg='if branching in last alternative and previous did not match, return `[]` (not `default`)')
self.assertEqual(traverse_obj({0: 'x'}, 0, branching_path), 'x',
msg='if branching in last alternative and previous did match, return single value')
self.assertEqual(traverse_obj({0: 'x'}, branching_path, 0), 'x',
msg='if branching in first alternative and non-branching path does match, return single value')
self.assertEqual(traverse_obj({}, branching_path, 'fail'), None,
msg='if branching in first alternative and non-branching path does not match, return `default`')
# Testing expected_type behavior
_EXPECTED_TYPE_DATA = {'str': 'str', 'int': 0}
self.assertEqual(traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=str),
'str', msg='accept matching `expected_type` type')
self.assertEqual(traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=int),
None, msg='reject non-matching `expected_type` type')
self.assertEqual(traverse_obj(_EXPECTED_TYPE_DATA, 'int', expected_type=lambda x: str(x)),
'0', msg='transform type using type function')
self.assertEqual(traverse_obj(_EXPECTED_TYPE_DATA, 'str', expected_type=lambda _: 1 / 0),
None, msg='wrap expected_type function in try_call')
self.assertEqual(traverse_obj(_EXPECTED_TYPE_DATA, Ellipsis, expected_type=str),
['str'], msg='eliminate items that expected_type fails on')
self.assertEqual(traverse_obj(_TEST_DATA, {0: 100, 1: 1.2}, expected_type=int),
{0: 100}, msg='type as expected_type should filter dict values')
self.assertEqual(traverse_obj(_TEST_DATA, {0: 100, 1: 1.2, 2: 'None'}, expected_type=str_or_none),
{0: '100', 1: '1.2'}, msg='function as expected_type should transform dict values')
self.assertEqual(traverse_obj(_TEST_DATA, ({0: 1.2}, 0, set((int_or_none,))), expected_type=int),
1, msg='expected_type should not filter non-final dict values')
self.assertEqual(traverse_obj(_TEST_DATA, {0: {0: 100, 1: 'str'}}, expected_type=int),
{0: {0: 100}}, msg='expected_type should transform deep dict values')
self.assertEqual(traverse_obj(_TEST_DATA, [({0: '...'}, {0: '...'})], expected_type=type(Ellipsis)),
[{0: Ellipsis}, {0: Ellipsis}], msg='expected_type should transform branched dict values')
self.assertEqual(traverse_obj({1: {3: 4}}, [(1, 2), 3], expected_type=int),
[4], msg='expected_type regression for type matching in tuple branching')
self.assertEqual(traverse_obj(_TEST_DATA, ['data', Ellipsis], expected_type=int),
[], msg='expected_type regression for type matching in dict result')
# Test get_all behavior
_GET_ALL_DATA = {'key': [0, 1, 2]}
self.assertEqual(traverse_obj(_GET_ALL_DATA, ('key', Ellipsis), get_all=False), 0,
msg='if not `get_all`, return only first matching value')
self.assertEqual(traverse_obj(_GET_ALL_DATA, Ellipsis, get_all=False), [0, 1, 2],
msg='do not overflatten if not `get_all`')
# Test casesense behavior
_CASESENSE_DATA = {
'KeY': 'value0',
0: {
'KeY': 'value1',
0: {'KeY': 'value2'},
},
# FULLWIDTH LATIN CAPITAL LETTER K
'\uff2bey': 'value3',
}
self.assertEqual(traverse_obj(_CASESENSE_DATA, 'key'), None,
msg='dict keys should be case sensitive unless `casesense`')
self.assertEqual(traverse_obj(_CASESENSE_DATA, 'keY',
casesense=False), 'value0',
msg='allow non matching key case if `casesense`')
self.assertEqual(traverse_obj(_CASESENSE_DATA, '\uff4bey', # FULLWIDTH LATIN SMALL LETTER K
casesense=False), 'value3',
msg='allow non matching Unicode key case if `casesense`')
self.assertEqual(traverse_obj(_CASESENSE_DATA, (0, ('keY',)),
casesense=False), ['value1'],
msg='allow non matching key case in branch if `casesense`')
self.assertEqual(traverse_obj(_CASESENSE_DATA, (0, ((0, 'keY'),)),
casesense=False), ['value2'],
msg='allow non matching key case in branch path if `casesense`')
# Test traverse_string behavior
_TRAVERSE_STRING_DATA = {'str': 'str', 1.2: 1.2}
self.assertEqual(traverse_obj(_TRAVERSE_STRING_DATA, ('str', 0)), None,
msg='do not traverse into string if not `traverse_string`')
self.assertEqual(traverse_obj(_TRAVERSE_STRING_DATA, ('str', 0),
_traverse_string=True), 's',
msg='traverse into string if `traverse_string`')
self.assertEqual(traverse_obj(_TRAVERSE_STRING_DATA, (1.2, 1),
_traverse_string=True), '.',
msg='traverse into converted data if `traverse_string`')
self.assertEqual(traverse_obj(_TRAVERSE_STRING_DATA, ('str', Ellipsis),
_traverse_string=True), 'str',
msg='`...` should result in string (same value) if `traverse_string`')
self.assertEqual(traverse_obj(_TRAVERSE_STRING_DATA, ('str', slice(0, None, 2)),
_traverse_string=True), 'sr',
msg='`slice` should result in string if `traverse_string`')
self.assertEqual(traverse_obj(_TRAVERSE_STRING_DATA, ('str', lambda i, v: i or v == 's'),
_traverse_string=True), 'str',
msg='function should result in string if `traverse_string`')
self.assertEqual(traverse_obj(_TRAVERSE_STRING_DATA, ('str', (0, 2)),
_traverse_string=True), ['s', 'r'],
msg='branching should result in list if `traverse_string`')
self.assertEqual(traverse_obj({}, (0, Ellipsis), _traverse_string=True), [],
msg='branching should result in list if `traverse_string`')
self.assertEqual(traverse_obj({}, (0, lambda x, y: True), _traverse_string=True), [],
msg='branching should result in list if `traverse_string`')
self.assertEqual(traverse_obj({}, (0, slice(1)), _traverse_string=True), [],
msg='branching should result in list if `traverse_string`')
# Test is_user_input behavior
_IS_USER_INPUT_DATA = {'range8': list(range(8))}
self.assertEqual(traverse_obj(_IS_USER_INPUT_DATA, ('range8', '3'),
_is_user_input=True), 3,
msg='allow for string indexing if `is_user_input`')
self.assertCountEqual(traverse_obj(_IS_USER_INPUT_DATA, ('range8', '3:'),
_is_user_input=True), tuple(range(8))[3:],
msg='allow for string slice if `is_user_input`')
self.assertCountEqual(traverse_obj(_IS_USER_INPUT_DATA, ('range8', ':4:2'),
_is_user_input=True), tuple(range(8))[:4:2],
msg='allow step in string slice if `is_user_input`')
self.assertCountEqual(traverse_obj(_IS_USER_INPUT_DATA, ('range8', ':'),
_is_user_input=True), range(8),
msg='`:` should be treated as `...` if `is_user_input`')
with self.assertRaises(TypeError, msg='too many params should result in error'):
traverse_obj(_IS_USER_INPUT_DATA, ('range8', ':::'), _is_user_input=True)
# Test re.Match as input obj
mobj = re.match(r'^0(12)(?P<group>3)(4)?$', '0123')
self.assertEqual(traverse_obj(mobj, Ellipsis), [x for x in mobj.groups() if x is not None],
msg='`...` on a `re.Match` should give its `groups()`')
self.assertEqual(traverse_obj(mobj, lambda k, _: k in (0, 2)), ['0123', '3'],
msg='function on a `re.Match` should give groupno, value starting at 0')
self.assertEqual(traverse_obj(mobj, 'group'), '3',
msg='str key on a `re.Match` should give group with that name')
self.assertEqual(traverse_obj(mobj, 2), '3',
msg='int key on a `re.Match` should give group with that name')
self.assertEqual(traverse_obj(mobj, 'gRoUp', casesense=False), '3',
msg='str key on a `re.Match` should respect casesense')
self.assertEqual(traverse_obj(mobj, 'fail'), None,
msg='failing str key on a `re.Match` should return `default`')
self.assertEqual(traverse_obj(mobj, 'gRoUpS', casesense=False), None,
msg='failing str key on a `re.Match` should return `default`')
self.assertEqual(traverse_obj(mobj, 8), None,
msg='failing int key on a `re.Match` should return `default`')
self.assertEqual(traverse_obj(mobj, lambda k, _: k in (0, 'group')), ['0123', '3'],
msg='function on a `re.Match` should give group name as well')
def test_get_first(self):
self.assertEqual(get_first([{'a': None}, {'a': 'spam'}], 'a'), 'spam')
def test_join_nonempty(self):
self.assertEqual(join_nonempty('a', 'b'), 'a-b')
self.assertEqual(join_nonempty(
'a', 'b', 'c', 'd',
from_dict={'a': 'c', 'c': [], 'b': 'd', 'd': None}), 'c-d')
if __name__ == '__main__':
unittest.main()

View File

@ -11,12 +11,11 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params, try_rm
import io
import xml.etree.ElementTree
import youtube_dl.YoutubeDL
import youtube_dl.extractor
from youtube_dl.compat import compat_open as open
class YoutubeDL(youtube_dl.YoutubeDL):
@ -51,7 +50,7 @@ class TestAnnotations(unittest.TestCase):
ydl.download([TEST_ID])
self.assertTrue(os.path.exists(ANNOTATIONS_FILE))
annoxml = None
with io.open(ANNOTATIONS_FILE, 'r', encoding='utf-8') as annof:
with open(ANNOTATIONS_FILE, 'r', encoding='utf-8') as annof:
annoxml = xml.etree.ElementTree.parse(annof)
self.assertTrue(annoxml is not None, 'Failed to parse annotations XML')
root = annoxml.getroot()

View File

@ -1,4 +1,5 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
# Allow direct execution
@ -9,11 +10,10 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
from youtube_dl.extractor import (
YoutubeIE,
YoutubePlaylistIE,
YoutubeTabIE,
YoutubeIE,
)
@ -25,38 +25,23 @@ class TestYoutubeLists(unittest.TestCase):
def test_youtube_playlist_noplaylist(self):
dl = FakeYDL()
dl.params['noplaylist'] = True
dl.params['format'] = 'best'
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=FXxLjLQi3Fg&list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
self.assertEqual(result['_type'], 'url')
result = dl.extract_info(result['url'], download=False, ie_key=result.get('ie_key'), process=False)
self.assertEqual(YoutubeIE().extract_id(result['url']), 'FXxLjLQi3Fg')
def test_youtube_course(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
# TODO find a > 100 (paginating?) videos course
result = ie.extract('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
entries = list(result['entries'])
self.assertEqual(YoutubeIE().extract_id(entries[0]['url']), 'j9WZyLZCBzs')
self.assertEqual(len(entries), 25)
self.assertEqual(YoutubeIE().extract_id(entries[-1]['url']), 'rYefUsYuEp0')
def test_youtube_mix(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=W01L70IGBgE&index=2&list=RDOQpdSVF_k_w')
entries = result['entries']
self.assertTrue(len(entries) >= 50)
dl.params['format'] = 'best'
ie = YoutubeTabIE(dl)
result = dl.extract_info('https://www.youtube.com/watch?v=tyITL_exICo&list=RDCLAK5uy_kLWIr9gv1XLlPbaDS965-Db4TrBoUTxQ8',
download=False, ie_key=ie.ie_key(), process=True)
entries = (result or {}).get('entries', [{'id': 'not_found', }])
self.assertTrue(len(entries) >= 25)
original_video = entries[0]
self.assertEqual(original_video['id'], 'OQpdSVF_k_w')
def test_youtube_toptracks(self):
print('Skipping: The playlist page gives error 500')
return
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/playlist?list=MCUS')
entries = result['entries']
self.assertEqual(len(entries), 100)
self.assertEqual(original_video['id'], 'tyITL_exICo')
def test_youtube_flat_playlist_extraction(self):
dl = FakeYDL()
@ -67,7 +52,7 @@ class TestYoutubeLists(unittest.TestCase):
entries = list(result['entries'])
self.assertTrue(len(entries) == 1)
video = entries[0]
self.assertEqual(video['_type'], 'url_transparent')
self.assertEqual(video['_type'], 'url')
self.assertEqual(video['ie_key'], 'Youtube')
self.assertEqual(video['id'], 'BaW_jenozKc')
self.assertEqual(video['url'], 'BaW_jenozKc')

View File

@ -8,15 +8,20 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import io
import re
import string
from youtube_dl.compat import (
compat_open as open,
compat_str,
compat_urlretrieve,
)
from test.helper import FakeYDL
from youtube_dl.extractor import YoutubeIE
from youtube_dl.compat import compat_str, compat_urlretrieve
from youtube_dl.jsinterp import JSInterpreter
_TESTS = [
_SIG_TESTS = [
(
'https://s.ytimg.com/yts/jsbin/html5player-vflHOr_nV.js',
86,
@ -64,10 +69,102 @@ _TESTS = [
)
]
_NSIG_TESTS = [
(
'https://www.youtube.com/s/player/7862ca1f/player_ias.vflset/en_US/base.js',
'X_LCxVDjAavgE5t', 'yxJ1dM6iz5ogUg',
),
(
'https://www.youtube.com/s/player/9216d1f7/player_ias.vflset/en_US/base.js',
'SLp9F5bwjAdhE9F-', 'gWnb9IK2DJ8Q1w',
),
(
'https://www.youtube.com/s/player/f8cb7a3b/player_ias.vflset/en_US/base.js',
'oBo2h5euWy6osrUt', 'ivXHpm7qJjJN',
),
(
'https://www.youtube.com/s/player/2dfe380c/player_ias.vflset/en_US/base.js',
'oBo2h5euWy6osrUt', '3DIBbn3qdQ',
),
(
'https://www.youtube.com/s/player/f1ca6900/player_ias.vflset/en_US/base.js',
'cu3wyu6LQn2hse', 'jvxetvmlI9AN9Q',
),
(
'https://www.youtube.com/s/player/8040e515/player_ias.vflset/en_US/base.js',
'wvOFaY-yjgDuIEg5', 'HkfBFDHmgw4rsw',
),
(
'https://www.youtube.com/s/player/e06dea74/player_ias.vflset/en_US/base.js',
'AiuodmaDDYw8d3y4bf', 'ankd8eza2T6Qmw',
),
(
'https://www.youtube.com/s/player/5dd88d1d/player-plasma-ias-phone-en_US.vflset/base.js',
'kSxKFLeqzv_ZyHSAt', 'n8gS8oRlHOxPFA',
),
(
'https://www.youtube.com/s/player/324f67b9/player_ias.vflset/en_US/base.js',
'xdftNy7dh9QGnhW', '22qLGxrmX8F1rA',
),
(
'https://www.youtube.com/s/player/4c3f79c5/player_ias.vflset/en_US/base.js',
'TDCstCG66tEAO5pR9o', 'dbxNtZ14c-yWyw',
),
(
'https://www.youtube.com/s/player/c81bbb4a/player_ias.vflset/en_US/base.js',
'gre3EcLurNY2vqp94', 'Z9DfGxWP115WTg',
),
(
'https://www.youtube.com/s/player/1f7d5369/player_ias.vflset/en_US/base.js',
'batNX7sYqIJdkJ', 'IhOkL_zxbkOZBw',
),
(
'https://www.youtube.com/s/player/009f1d77/player_ias.vflset/en_US/base.js',
'5dwFHw8aFWQUQtffRq', 'audescmLUzI3jw',
),
(
'https://www.youtube.com/s/player/dc0c6770/player_ias.vflset/en_US/base.js',
'5EHDMgYLV6HPGk_Mu-kk', 'n9lUJLHbxUI0GQ',
),
(
'https://www.youtube.com/s/player/c2199353/player_ias.vflset/en_US/base.js',
'5EHDMgYLV6HPGk_Mu-kk', 'AD5rgS85EkrE7',
),
(
'https://www.youtube.com/s/player/113ca41c/player_ias.vflset/en_US/base.js',
'cgYl-tlYkhjT7A', 'hI7BBr2zUgcmMg',
),
(
'https://www.youtube.com/s/player/c57c113c/player_ias.vflset/en_US/base.js',
'-Txvy6bT5R6LqgnQNx', 'dcklJCnRUHbgSg',
),
(
'https://www.youtube.com/s/player/5a3b6271/player_ias.vflset/en_US/base.js',
'B2j7f_UPT4rfje85Lu_e', 'm5DmNymaGQ5RdQ',
),
(
'https://www.youtube.com/s/player/dac945fd/player_ias.vflset/en_US/base.js',
'o8BkRxXhuYsBCWi6RplPdP', '3Lx32v_hmzTm6A',
),
(
'https://www.youtube.com/s/player/6f20102c/player_ias.vflset/en_US/base.js',
'lE8DhoDmKqnmJJ', 'pJTTX6XyJP2BYw',
),
(
'https://www.youtube.com/s/player/cfa9e7cb/player_ias.vflset/en_US/base.js',
'qO0NiMtYQ7TeJnfFG2', 'k9cuJDHNS5O7kQ',
),
(
'https://www.youtube.com/s/player/b7910ca8/player_ias.vflset/en_US/base.js',
'_hXMCwMt9qE310D', 'LoZMgkkofRMCZQ',
),
]
class TestPlayerInfo(unittest.TestCase):
def test_youtube_extract_player_info(self):
PLAYER_URLS = (
('https://www.youtube.com/s/player/4c3f79c5/player_ias.vflset/en_US/base.js', '4c3f79c5'),
('https://www.youtube.com/s/player/64dddad9/player_ias.vflset/en_US/base.js', '64dddad9'),
('https://www.youtube.com/s/player/64dddad9/player_ias.vflset/fr_FR/base.js', '64dddad9'),
('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-phone-en_US.vflset/base.js', '64dddad9'),
@ -90,40 +187,61 @@ class TestPlayerInfo(unittest.TestCase):
class TestSignature(unittest.TestCase):
def setUp(self):
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
self.TESTDATA_DIR = os.path.join(TEST_DIR, 'testdata')
self.TESTDATA_DIR = os.path.join(TEST_DIR, 'testdata/sigs')
if not os.path.exists(self.TESTDATA_DIR):
os.mkdir(self.TESTDATA_DIR)
def make_tfunc(url, sig_input, expected_sig):
m = re.match(r'.*-([a-zA-Z0-9_-]+)(?:/watch_as3|/html5player)?\.[a-z]+$', url)
assert m, '%r should follow URL format' % url
test_id = m.group(1)
def test_func(self):
basename = 'player-%s.js' % test_id
fn = os.path.join(self.TESTDATA_DIR, basename)
if not os.path.exists(fn):
compat_urlretrieve(url, fn)
ydl = FakeYDL()
ie = YoutubeIE(ydl)
with io.open(fn, encoding='utf-8') as testf:
jscode = testf.read()
func = ie._parse_sig_js(jscode)
src_sig = (
compat_str(string.printable[:sig_input])
if isinstance(sig_input, int) else sig_input)
got_sig = func(src_sig)
self.assertEqual(got_sig, expected_sig)
test_func.__name__ = str('test_signature_js_' + test_id)
setattr(TestSignature, test_func.__name__, test_func)
def tearDown(self):
try:
for f in os.listdir(self.TESTDATA_DIR):
os.remove(f)
except OSError:
pass
for test_spec in _TESTS:
make_tfunc(*test_spec)
def t_factory(name, sig_func, url_pattern):
def make_tfunc(url, sig_input, expected_sig):
m = url_pattern.match(url)
assert m, '%r should follow URL format' % url
test_id = m.group('id')
def test_func(self):
basename = 'player-{0}-{1}.js'.format(name, test_id)
fn = os.path.join(self.TESTDATA_DIR, basename)
if not os.path.exists(fn):
compat_urlretrieve(url, fn)
with open(fn, encoding='utf-8') as testf:
jscode = testf.read()
self.assertEqual(sig_func(jscode, sig_input), expected_sig)
test_func.__name__ = str('test_{0}_js_{1}'.format(name, test_id))
setattr(TestSignature, test_func.__name__, test_func)
return make_tfunc
def signature(jscode, sig_input):
func = YoutubeIE(FakeYDL())._parse_sig_js(jscode)
src_sig = (
compat_str(string.printable[:sig_input])
if isinstance(sig_input, int) else sig_input)
return func(src_sig)
def n_sig(jscode, sig_input):
funcname = YoutubeIE(FakeYDL())._extract_n_function_name(jscode)
return JSInterpreter(jscode).call_function(funcname, sig_input)
make_sig_test = t_factory(
'signature', signature, re.compile(r'.*-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player)?\.[a-z]+$'))
for test_spec in _SIG_TESTS:
make_sig_test(*test_spec)
make_nsig_test = t_factory(
'nsig', n_sig, re.compile(r'.+/player/(?P<id>[a-zA-Z0-9_-]+)/.+.js$'))
for test_spec in _NSIG_TESTS:
make_nsig_test(*test_spec)
if __name__ == '__main__':

35
test/testdata/mpd/range_only.mpd vendored Normal file
View File

@ -0,0 +1,35 @@
<?xml version="1.0"?>
<!-- MPD file Generated with GPAC version 1.0.1-revrelease at 2021-11-27T20:53:11.690Z -->
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" minBufferTime="PT1.500S" type="static" mediaPresentationDuration="PT0H0M30.196S" maxSegmentDuration="PT0H0M10.027S" profiles="urn:mpeg:dash:profile:full:2011">
<ProgramInformation moreInformationURL="http://gpac.io">
<Title>manifest.mpd generated by GPAC</Title>
</ProgramInformation>
<Period duration="PT0H0M30.196S">
<AdaptationSet segmentAlignment="true" maxWidth="768" maxHeight="432" maxFrameRate="30000/1001" par="16:9" lang="und" startWithSAP="1">
<Representation id="1" mimeType="video/mp4" codecs="avc1.4D401E" width="768" height="432" frameRate="30000/1001" sar="1:1" bandwidth="526987">
<BaseURL>video_dashinit.mp4</BaseURL>
<SegmentList timescale="90000" duration="900000">
<Initialization range="0-881"/>
<SegmentURL mediaRange="882-876094" indexRange="882-925"/>
<SegmentURL mediaRange="876095-1466732" indexRange="876095-876138"/>
<SegmentURL mediaRange="1466733-1953615" indexRange="1466733-1466776"/>
<SegmentURL mediaRange="1953616-1994211" indexRange="1953616-1953659"/>
</SegmentList>
</Representation>
</AdaptationSet>
<AdaptationSet segmentAlignment="true" lang="und" startWithSAP="1">
<Representation id="2" mimeType="audio/mp4" codecs="mp4a.40.2" audioSamplingRate="48000" bandwidth="98096">
<AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="2"/>
<BaseURL>audio_dashinit.mp4</BaseURL>
<SegmentList timescale="48000" duration="480000">
<Initialization range="0-752"/>
<SegmentURL mediaRange="753-124129" indexRange="753-796"/>
<SegmentURL mediaRange="124130-250544" indexRange="124130-124173"/>
<SegmentURL mediaRange="250545-374929" indexRange="250545-250588"/>
</SegmentList>
</Representation>
</AdaptationSet>
</Period>
</MPD>

351
test/testdata/mpd/subtitles.mpd vendored Normal file
View File

@ -0,0 +1,351 @@
<?xml version="1.0" encoding="utf-8"?>
<!-- Created with Unified Streaming Platform (version=1.10.18-20255) -->
<MPD
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn:mpeg:dash:schema:mpd:2011"
xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd"
type="static"
mediaPresentationDuration="PT14M48S"
maxSegmentDuration="PT1M"
minBufferTime="PT10S"
profiles="urn:mpeg:dash:profile:isoff-live:2011">
<Period
id="1"
duration="PT14M48S">
<BaseURL>dash/</BaseURL>
<AdaptationSet
id="1"
group="1"
contentType="audio"
segmentAlignment="true"
audioSamplingRate="48000"
mimeType="audio/mp4"
codecs="mp4a.40.2"
startWithSAP="1">
<AudioChannelConfiguration
schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011"
value="2" />
<Role schemeIdUri="urn:mpeg:dash:role:2011" value="main" />
<SegmentTemplate
timescale="48000"
initialization="3144-kZT4LWMQw6Rh7Kpd-$RepresentationID$.dash"
media="3144-kZT4LWMQw6Rh7Kpd-$RepresentationID$-$Time$.dash">
<SegmentTimeline>
<S t="0" d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="96256" r="2" />
<S d="95232" />
<S d="3584" />
</SegmentTimeline>
</SegmentTemplate>
<Representation
id="audio=128001"
bandwidth="128001">
</Representation>
</AdaptationSet>
<AdaptationSet
id="2"
group="3"
contentType="text"
lang="en"
mimeType="application/mp4"
codecs="stpp"
startWithSAP="1">
<Role schemeIdUri="urn:mpeg:dash:role:2011" value="subtitle" />
<SegmentTemplate
timescale="1000"
initialization="3144-kZT4LWMQw6Rh7Kpd-$RepresentationID$.dash"
media="3144-kZT4LWMQw6Rh7Kpd-$RepresentationID$-$Time$.dash">
<SegmentTimeline>
<S t="0" d="60000" r="9" />
<S d="24000" />
</SegmentTimeline>
</SegmentTemplate>
<Representation
id="textstream_eng=1000"
bandwidth="1000">
</Representation>
</AdaptationSet>
<AdaptationSet
id="3"
group="2"
contentType="video"
par="960:409"
minBandwidth="100000"
maxBandwidth="4482000"
maxWidth="1689"
maxHeight="720"
segmentAlignment="true"
mimeType="video/mp4"
codecs="avc1.4D401F"
startWithSAP="1">
<Role schemeIdUri="urn:mpeg:dash:role:2011" value="main" />
<SegmentTemplate
timescale="12288"
initialization="3144-kZT4LWMQw6Rh7Kpd-$RepresentationID$.dash"
media="3144-kZT4LWMQw6Rh7Kpd-$RepresentationID$-$Time$.dash">
<SegmentTimeline>
<S t="0" d="24576" r="443" />
</SegmentTimeline>
</SegmentTemplate>
<Representation
id="video=100000"
bandwidth="100000"
width="336"
height="144"
sar="2880:2863"
scanType="progressive">
</Representation>
<Representation
id="video=326000"
bandwidth="326000"
width="562"
height="240"
sar="115200:114929"
scanType="progressive">
</Representation>
<Representation
id="video=698000"
bandwidth="698000"
width="844"
height="360"
sar="86400:86299"
scanType="progressive">
</Representation>
<Representation
id="video=1493000"
bandwidth="1493000"
width="1126"
height="480"
sar="230400:230267"
scanType="progressive">
</Representation>
<Representation
id="video=4482000"
bandwidth="4482000"
width="1688"
height="720"
sar="86400:86299"
scanType="progressive">
</Representation>
</AdaptationSet>
</Period>
</MPD>

32
test/testdata/mpd/url_and_range.mpd vendored Normal file
View File

@ -0,0 +1,32 @@
<?xml version="1.0" ?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" profiles="urn:mpeg:dash:profile:isoff-live:2011" minBufferTime="PT10.01S" mediaPresentationDuration="PT30.097S" type="static">
<!-- Created with Bento4 mp4-dash.py, VERSION=2.0.0-639 -->
<Period>
<!-- Video -->
<AdaptationSet mimeType="video/mp4" segmentAlignment="true" startWithSAP="1" maxWidth="768" maxHeight="432">
<Representation id="video-avc1" codecs="avc1.4D401E" width="768" height="432" scanType="progressive" frameRate="30000/1001" bandwidth="699597">
<SegmentList timescale="1000" duration="10010">
<Initialization sourceURL="video-frag.mp4" range="36-746"/>
<SegmentURL media="video-frag.mp4" mediaRange="747-876117"/>
<SegmentURL media="video-frag.mp4" mediaRange="876118-1466913"/>
<SegmentURL media="video-frag.mp4" mediaRange="1466914-1953954"/>
<SegmentURL media="video-frag.mp4" mediaRange="1953955-1994652"/>
</SegmentList>
</Representation>
</AdaptationSet>
<!-- Audio -->
<AdaptationSet mimeType="audio/mp4" startWithSAP="1" segmentAlignment="true">
<Representation id="audio-und-mp4a.40.2" codecs="mp4a.40.2" bandwidth="98808" audioSamplingRate="48000">
<AudioChannelConfiguration schemeIdUri="urn:mpeg:mpegB:cicp:ChannelConfiguration" value="2"/>
<SegmentList timescale="1000" duration="10010">
<Initialization sourceURL="audio-frag.mp4" range="32-623"/>
<SegmentURL media="audio-frag.mp4" mediaRange="624-124199"/>
<SegmentURL media="audio-frag.mp4" mediaRange="124200-250303"/>
<SegmentURL media="audio-frag.mp4" mediaRange="250304-374365"/>
<SegmentURL media="audio-frag.mp4" mediaRange="374366-374836"/>
</SegmentList>
</Representation>
</AdaptationSet>
</Period>
</MPD>

View File

@ -4,11 +4,9 @@
from __future__ import absolute_import, unicode_literals
import collections
import contextlib
import copy
import datetime
import errno
import fileinput
import io
import itertools
import json
@ -26,25 +24,38 @@ import tokenize
import traceback
import random
try:
from ssl import OPENSSL_VERSION
except ImportError:
# Must be Python 2.6, should be built against 1.0.2
OPENSSL_VERSION = 'OpenSSL 1.0.2(?)'
from string import ascii_letters
from .compat import (
compat_basestring,
compat_cookiejar,
compat_collections_chain_map as ChainMap,
compat_filter as filter,
compat_get_terminal_size,
compat_http_client,
compat_http_cookiejar_Cookie,
compat_http_cookies_SimpleCookie,
compat_integer_types,
compat_kwargs,
compat_map as map,
compat_numeric_types,
compat_open as open,
compat_os_name,
compat_str,
compat_tokenize_tokenize,
compat_urllib_error,
compat_urllib_parse,
compat_urllib_request,
compat_urllib_request_DataHandler,
)
from .utils import (
age_restricted,
args_to_str,
bug_reports_message,
ContentTooShortError,
date_from_str,
DateRange,
@ -62,7 +73,9 @@ from .utils import (
GeoRestrictedError,
int_or_none,
ISO3166Utils,
join_nonempty,
locked_file,
LazyList,
make_HTTPS_handler,
MaxDownloadsReached,
orderedSet,
@ -73,6 +86,7 @@ from .utils import (
PostProcessingError,
preferredencoding,
prepend_extension,
process_communicate_or_kill,
register_socks_protocols,
render_table,
replace_extension,
@ -84,6 +98,7 @@ from .utils import (
std_headers,
str_or_none,
subtitles_filename,
traverse_obj,
UnavailableVideoError,
url_basename,
version_tuple,
@ -93,6 +108,7 @@ from .utils import (
YoutubeDLCookieProcessor,
YoutubeDLHandler,
YoutubeDLRedirectHandler,
ytdl_is_updateable,
)
from .cache import Cache
from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
@ -362,6 +378,9 @@ class YoutubeDL(object):
self.params.update(params)
self.cache = Cache(self)
self._header_cookies = []
self._load_cookies_from_headers(self.params.get('http_headers'))
def check_deprecated(param, option, suggestion):
if self.params.get(param) is not None:
self.report_warning(
@ -568,7 +587,7 @@ class YoutubeDL(object):
if self.params.get('cookiefile') is not None:
self.cookiejar.save(ignore_discard=True, ignore_expires=True)
def trouble(self, message=None, tb=None):
def trouble(self, *args, **kwargs):
"""Determine action to take when a download problem appears.
Depending on if the downloader has been configured to ignore
@ -577,6 +596,11 @@ class YoutubeDL(object):
tb, if given, is additional traceback information.
"""
# message=None, tb=None, is_error=True
message = args[0] if len(args) > 0 else kwargs.get('message', None)
tb = args[1] if len(args) > 1 else kwargs.get('tb', None)
is_error = args[2] if len(args) > 2 else kwargs.get('is_error', True)
if message is not None:
self.to_stderr(message)
if self.params.get('verbose'):
@ -589,7 +613,10 @@ class YoutubeDL(object):
else:
tb_data = traceback.format_list(traceback.extract_stack())
tb = ''.join(tb_data)
self.to_stderr(tb)
if tb:
self.to_stderr(tb)
if not is_error:
return
if not self.params.get('ignoreerrors', False):
if sys.exc_info()[0] and hasattr(sys.exc_info()[1], 'exc_info') and sys.exc_info()[1].exc_info[0]:
exc_info = sys.exc_info()[1].exc_info
@ -598,11 +625,18 @@ class YoutubeDL(object):
raise DownloadError(message, exc_info)
self._download_retcode = 1
def report_warning(self, message):
def report_warning(self, message, only_once=False, _cache={}):
'''
Print the message to stderr, it will be prefixed with 'WARNING:'
If stderr is a tty file the 'WARNING:' will be colored
'''
if only_once:
m_hash = hash((self, message))
m_cnt = _cache.setdefault(m_hash, 0)
_cache[m_hash] = m_cnt + 1
if m_cnt > 0:
return
if self.params.get('logger') is not None:
self.params['logger'].warning(message)
else:
@ -615,7 +649,7 @@ class YoutubeDL(object):
warning_message = '%s %s' % (_msg_header, message)
self.to_stderr(warning_message)
def report_error(self, message, tb=None):
def report_error(self, message, *args, **kwargs):
'''
Do the same as trouble, but prefixes the message with 'ERROR:', colored
in red if stderr is a tty file.
@ -624,8 +658,18 @@ class YoutubeDL(object):
_msg_header = '\033[0;31mERROR:\033[0m'
else:
_msg_header = 'ERROR:'
error_message = '%s %s' % (_msg_header, message)
self.trouble(error_message, tb)
kwargs['message'] = '%s %s' % (_msg_header, message)
self.trouble(*args, **kwargs)
def report_unscoped_cookies(self, *args, **kwargs):
# message=None, tb=False, is_error=False
if len(args) <= 2:
kwargs.setdefault('is_error', False)
if len(args) <= 0:
kwargs.setdefault(
'message',
'Unscoped cookies are not allowed: please specify some sort of scoping')
self.report_error(*args, **kwargs)
def report_file_already_downloaded(self, file_name):
"""Report file has already been fully downloaded."""
@ -720,7 +764,7 @@ class YoutubeDL(object):
filename = encodeFilename(filename, True).decode(preferredencoding())
return sanitize_path(filename)
except ValueError as err:
self.report_error('Error in output template: ' + str(err) + ' (encoding: ' + repr(preferredencoding()) + ')')
self.report_error('Error in output template: ' + error_to_compat_str(err) + ' (encoding: ' + repr(preferredencoding()) + ')')
return None
def _match_entry(self, info_dict, incomplete):
@ -821,7 +865,7 @@ class YoutubeDL(object):
msg += '\nYou might want to use a VPN or a proxy server (with --proxy) to workaround.'
self.report_error(msg)
except ExtractorError as e: # An error we somewhat expected
self.report_error(compat_str(e), e.format_traceback())
self.report_error(compat_str(e), tb=e.format_traceback())
except MaxDownloadsReached:
raise
except Exception as e:
@ -831,8 +875,83 @@ class YoutubeDL(object):
raise
return wrapper
def _remove_cookie_header(self, http_headers):
"""Filters out `Cookie` header from an `http_headers` dict
The `Cookie` header is removed to prevent leaks as a result of unscoped cookies.
See: https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-v8mc-9377-rwjj
@param http_headers An `http_headers` dict from which any `Cookie` header
should be removed, or None
"""
return dict(filter(lambda pair: pair[0].lower() != 'cookie', (http_headers or {}).items()))
def _load_cookies(self, data, **kwargs):
"""Loads cookies from a `Cookie` header
This tries to work around the security vulnerability of passing cookies to every domain.
@param data The Cookie header as a string to load the cookies from
@param autoscope If `False`, scope cookies using Set-Cookie syntax and error for cookie without domains
If `True`, save cookies for later to be stored in the jar with a limited scope
If a URL, save cookies in the jar with the domain of the URL
"""
# autoscope=True (kw-only)
autoscope = kwargs.get('autoscope', True)
for cookie in compat_http_cookies_SimpleCookie(data).values() if data else []:
if autoscope and any(cookie.values()):
raise ValueError('Invalid syntax in Cookie Header')
domain = cookie.get('domain') or ''
expiry = cookie.get('expires')
if expiry == '': # 0 is valid so we check for `''` explicitly
expiry = None
prepared_cookie = compat_http_cookiejar_Cookie(
cookie.get('version') or 0, cookie.key, cookie.value, None, False,
domain, True, True, cookie.get('path') or '', bool(cookie.get('path')),
bool(cookie.get('secure')), expiry, False, None, None, {})
if domain:
self.cookiejar.set_cookie(prepared_cookie)
elif autoscope is True:
self.report_warning(
'Passing cookies as a header is a potential security risk; '
'they will be scoped to the domain of the downloaded urls. '
'Please consider loading cookies from a file or browser instead.',
only_once=True)
self._header_cookies.append(prepared_cookie)
elif autoscope:
self.report_warning(
'The extractor result contains an unscoped cookie as an HTTP header. '
'If you are specifying an input URL, ' + bug_reports_message(),
only_once=True)
self._apply_header_cookies(autoscope, [prepared_cookie])
else:
self.report_unscoped_cookies()
def _load_cookies_from_headers(self, headers):
self._load_cookies(traverse_obj(headers, 'cookie', casesense=False))
def _apply_header_cookies(self, url, cookies=None):
"""This method applies stray header cookies to the provided url
This loads header cookies and scopes them to the domain provided in `url`.
While this is not ideal, it helps reduce the risk of them being sent to
an unintended destination.
"""
parsed = compat_urllib_parse.urlparse(url)
if not parsed.hostname:
return
for cookie in map(copy.copy, cookies or self._header_cookies):
cookie.domain = '.' + parsed.hostname
self.cookiejar.set_cookie(cookie)
@__handle_extraction_exceptions
def __extract_info(self, url, ie, download, extra_info, process):
# Compat with passing cookies in http headers
self._apply_header_cookies(url)
ie_result = ie.extract(url)
if ie_result is None: # Finished already (backwards compatibility; listformats and friends should be moved here)
return
@ -858,7 +977,7 @@ class YoutubeDL(object):
def process_ie_result(self, ie_result, download=True, extra_info={}):
"""
Take the result of the ie(may be modified) and resolve all unresolved
Take the result of the ie (may be modified) and resolve all unresolved
references (URLs, playlist items).
It will also download the videos if 'download'.
@ -1385,17 +1504,16 @@ class YoutubeDL(object):
'abr': formats_info[1].get('abr'),
'ext': output_ext,
}
video_selector, audio_selector = map(_build_selector_function, selector.selector)
def selector_function(ctx):
for pair in itertools.product(
video_selector(copy.deepcopy(ctx)), audio_selector(copy.deepcopy(ctx))):
selector_fn = lambda x: _build_selector_function(x)(ctx)
for pair in itertools.product(*map(selector_fn, selector.selector)):
yield _merge(pair)
filters = [self._build_format_filter(f) for f in selector.filters]
def final_selector(ctx):
ctx_copy = copy.deepcopy(ctx)
ctx_copy = dict(ctx)
for _filter in filters:
ctx_copy['formats'] = list(filter(_filter, ctx_copy['formats']))
return selector_function(ctx_copy)
@ -1430,23 +1548,45 @@ class YoutubeDL(object):
parsed_selector = _parse_format_selection(iter(TokenIterator(tokens)))
return _build_selector_function(parsed_selector)
def _calc_headers(self, info_dict):
res = std_headers.copy()
def _calc_headers(self, info_dict, load_cookies=False):
if load_cookies: # For --load-info-json
# load cookies from http_headers in legacy info.json
self._load_cookies(traverse_obj(info_dict, ('http_headers', 'Cookie'), casesense=False),
autoscope=info_dict['url'])
# load scoped cookies from info.json
self._load_cookies(info_dict.get('cookies'), autoscope=False)
add_headers = info_dict.get('http_headers')
if add_headers:
res.update(add_headers)
cookies = self._calc_cookies(info_dict)
cookies = self.cookiejar.get_cookies_for_url(info_dict['url'])
if cookies:
res['Cookie'] = cookies
# Make a string like name1=val1; attr1=a_val1; ...name2=val2; ...
# By convention a cookie name can't be a well-known attribute name
# so this syntax is unambiguous and can be parsed by (eg) SimpleCookie
encoder = compat_http_cookies_SimpleCookie()
values = []
attributes = (('Domain', '='), ('Path', '='), ('Secure',), ('Expires', '='), ('Version', '='))
attributes = tuple([x[0].lower()] + list(x) for x in attributes)
for cookie in cookies:
_, value = encoder.value_encode(cookie.value)
# Py 2 '' --> '', Py 3 '' --> '""'
if value == '':
value = '""'
values.append('='.join((cookie.name, value)))
for attr in attributes:
value = getattr(cookie, attr[0], None)
if value:
values.append('%s%s' % (''.join(attr[1:]), value if len(attr) == 3 else ''))
info_dict['cookies'] = '; '.join(values)
res = std_headers.copy()
res.update(info_dict.get('http_headers') or {})
res = self._remove_cookie_header(res)
if 'X-Forwarded-For' not in res:
x_forwarded_for_ip = info_dict.get('__x_forwarded_for_ip')
if x_forwarded_for_ip:
res['X-Forwarded-For'] = x_forwarded_for_ip
return res
return res or None
def _calc_cookies(self, info_dict):
pr = sanitized_Request(info_dict['url'])
@ -1529,7 +1669,7 @@ class YoutubeDL(object):
# see http://bugs.python.org/issue1646728)
try:
upload_date = datetime.datetime.utcfromtimestamp(info_dict[ts_key])
info_dict[date_key] = upload_date.strftime('%Y%m%d')
info_dict[date_key] = compat_str(upload_date.strftime('%Y%m%d'))
except (ValueError, OverflowError, OSError):
pass
@ -1569,9 +1709,6 @@ class YoutubeDL(object):
else:
formats = info_dict['formats']
if not formats:
raise ExtractorError('No video formats found!')
def is_wellformed(f):
url = f.get('url')
if not url:
@ -1584,7 +1721,10 @@ class YoutubeDL(object):
return True
# Filter out malformed formats for better extraction robustness
formats = list(filter(is_wellformed, formats))
formats = list(filter(is_wellformed, formats or []))
if not formats:
raise ExtractorError('No video formats found!')
formats_dict = {}
@ -1625,10 +1765,13 @@ class YoutubeDL(object):
format['protocol'] = determine_protocol(format)
# Add HTTP headers, so that external programs can use them from the
# json output
full_format_info = info_dict.copy()
full_format_info.update(format)
format['http_headers'] = self._calc_headers(full_format_info)
# Remove private housekeeping stuff
format['http_headers'] = self._calc_headers(ChainMap(format, info_dict), load_cookies=True)
# Safeguard against old/insecure infojson when using --load-info-json
info_dict['http_headers'] = self._remove_cookie_header(
info_dict.get('http_headers') or {}) or None
# Remove private housekeeping stuff (copied to http_headers in _calc_headers())
if '__x_forwarded_for_ip' in info_dict:
del info_dict['__x_forwarded_for_ip']
@ -1771,17 +1914,16 @@ class YoutubeDL(object):
self.to_stdout(formatSeconds(info_dict['duration']))
print_mandatory('format')
if self.params.get('forcejson', False):
self.to_stdout(json.dumps(info_dict))
self.to_stdout(json.dumps(self.sanitize_info(info_dict)))
def process_info(self, info_dict):
"""Process a single resolved IE result."""
assert info_dict.get('_type', 'video') == 'video'
max_downloads = self.params.get('max_downloads')
if max_downloads is not None:
if self._num_downloads >= int(max_downloads):
raise MaxDownloadsReached()
max_downloads = int_or_none(self.params.get('max_downloads')) or float('inf')
if self._num_downloads >= max_downloads:
raise MaxDownloadsReached()
# TODO: backward compatibility, to be removed
info_dict['fulltitle'] = info_dict['title']
@ -1832,7 +1974,7 @@ class YoutubeDL(object):
else:
try:
self.to_screen('[info] Writing video description to: ' + descfn)
with io.open(encodeFilename(descfn), 'w', encoding='utf-8') as descfile:
with open(encodeFilename(descfn), 'w', encoding='utf-8') as descfile:
descfile.write(info_dict['description'])
except (OSError, IOError):
self.report_error('Cannot write description file ' + descfn)
@ -1847,7 +1989,7 @@ class YoutubeDL(object):
else:
try:
self.to_screen('[info] Writing video annotations to: ' + annofn)
with io.open(encodeFilename(annofn), 'w', encoding='utf-8') as annofile:
with open(encodeFilename(annofn), 'w', encoding='utf-8') as annofile:
annofile.write(info_dict['annotations'])
except (KeyError, TypeError):
self.report_warning('There are no annotations to write.')
@ -1874,7 +2016,7 @@ class YoutubeDL(object):
try:
# Use newline='' to prevent conversion of newline characters
# See https://github.com/ytdl-org/youtube-dl/issues/10268
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
with open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
subfile.write(sub_info['data'])
except (OSError, IOError):
self.report_error('Cannot write subtitles file ' + sub_filename)
@ -1883,36 +2025,41 @@ class YoutubeDL(object):
try:
sub_data = ie._request_webpage(
sub_info['url'], info_dict['id'], note=False).read()
with io.open(encodeFilename(sub_filename), 'wb') as subfile:
with open(encodeFilename(sub_filename), 'wb') as subfile:
subfile.write(sub_data)
except (ExtractorError, IOError, OSError, ValueError) as err:
self.report_warning('Unable to download subtitle for "%s": %s' %
(sub_lang, error_to_compat_str(err)))
continue
if self.params.get('writeinfojson', False):
infofn = replace_extension(filename, 'info.json', info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(infofn)):
self.to_screen('[info] Video description metadata is already present')
else:
self.to_screen('[info] Writing video description metadata as JSON to: ' + infofn)
try:
write_json_file(self.filter_requested_info(info_dict), infofn)
except (OSError, IOError):
self.report_error('Cannot write metadata to JSON file ' + infofn)
return
self._write_info_json(
'video description', info_dict,
replace_extension(filename, 'info.json', info_dict.get('ext')))
self._write_thumbnails(info_dict, filename)
if not self.params.get('skip_download', False):
try:
def checked_get_suitable_downloader(info_dict, params):
ed_args = params.get('external_downloader_args')
dler = get_suitable_downloader(info_dict, params)
if ed_args and not params.get('external_downloader_args'):
# external_downloader_args was cleared because external_downloader was rejected
self.report_warning('Requested external downloader cannot be used: '
'ignoring --external-downloader-args.')
return dler
def dl(name, info):
fd = get_suitable_downloader(info, self.params)(self, self.params)
fd = checked_get_suitable_downloader(info, self.params)(self, self.params)
for ph in self._progress_hooks:
fd.add_progress_hook(ph)
if self.params.get('verbose'):
self.to_screen('[debug] Invoking downloader on %r' % info.get('url'))
return fd.download(name, info)
new_info = dict((k, v) for k, v in info.items() if not k.startswith('__p'))
new_info['http_headers'] = self._calc_headers(new_info)
return fd.download(name, new_info)
if info_dict.get('requested_formats') is not None:
downloaded = []
@ -2049,9 +2196,12 @@ class YoutubeDL(object):
try:
self.post_process(filename, info_dict)
except (PostProcessingError) as err:
self.report_error('postprocessing: %s' % str(err))
self.report_error('postprocessing: %s' % error_to_compat_str(err))
return
self.record_download_archive(info_dict)
# avoid possible nugatory search for further items (PR #26638)
if self._num_downloads >= max_downloads:
raise MaxDownloadsReached()
def download(self, url_list):
"""Download a given list of URLs."""
@ -2074,16 +2224,13 @@ class YoutubeDL(object):
raise
else:
if self.params.get('dump_single_json', False):
self.to_stdout(json.dumps(res))
self.to_stdout(json.dumps(self.sanitize_info(res)))
return self._download_retcode
def download_with_info_file(self, info_filename):
with contextlib.closing(fileinput.FileInput(
[info_filename], mode='r',
openhook=fileinput.hook_encoded('utf-8'))) as f:
# FileInput doesn't have a read method, we can't call json.load
info = self.filter_requested_info(json.loads('\n'.join(f)))
with open(info_filename, encoding='utf-8') as f:
info = self.filter_requested_info(json.load(f))
try:
self.process_ie_result(info, download=True)
except DownloadError:
@ -2096,10 +2243,36 @@ class YoutubeDL(object):
return self._download_retcode
@staticmethod
def filter_requested_info(info_dict):
return dict(
(k, v) for k, v in info_dict.items()
if k not in ['requested_formats', 'requested_subtitles'])
def sanitize_info(info_dict, remove_private_keys=False):
''' Sanitize the infodict for converting to json '''
if info_dict is None:
return info_dict
if remove_private_keys:
reject = lambda k, v: (v is None
or k.startswith('__')
or k in ('requested_formats',
'requested_subtitles'))
else:
reject = lambda k, v: False
def filter_fn(obj):
if isinstance(obj, dict):
return dict((k, filter_fn(v)) for k, v in obj.items() if not reject(k, v))
elif isinstance(obj, (list, tuple, set, LazyList)):
return list(map(filter_fn, obj))
elif obj is None or any(isinstance(obj, c)
for c in (compat_integer_types,
(compat_str, float, bool))):
return obj
else:
return repr(obj)
return filter_fn(info_dict)
@classmethod
def filter_requested_info(cls, info_dict):
return cls.sanitize_info(info_dict, True)
def post_process(self, filename, ie_info):
"""Run all the postprocessors on the given file."""
@ -2306,18 +2479,21 @@ class YoutubeDL(object):
self.get_encoding()))
write_string(encoding_str, encoding=None)
self._write_string('[debug] youtube-dl version ' + __version__ + '\n')
writeln_debug = lambda *s: self._write_string('[debug] %s\n' % (''.join(s), ))
writeln_debug('youtube-dl version ', __version__)
if _LAZY_LOADER:
self._write_string('[debug] Lazy loading extractors enabled' + '\n')
writeln_debug('Lazy loading extractors enabled')
if ytdl_is_updateable():
writeln_debug('Single file build')
try:
sp = subprocess.Popen(
['git', 'rev-parse', '--short', 'HEAD'],
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
cwd=os.path.dirname(os.path.abspath(__file__)))
out, err = sp.communicate()
out, err = process_communicate_or_kill(sp)
out = out.decode().strip()
if re.match('[0-9a-f]+', out):
self._write_string('[debug] Git HEAD: ' + out + '\n')
writeln_debug('Git HEAD: ', out)
except Exception:
try:
sys.exc_clear()
@ -2330,9 +2506,22 @@ class YoutubeDL(object):
return impl_name + ' version %d.%d.%d' % sys.pypy_version_info[:3]
return impl_name
self._write_string('[debug] Python version %s (%s) - %s\n' % (
platform.python_version(), python_implementation(),
platform_name()))
def libc_ver():
try:
return platform.libc_ver()
except OSError: # We may not have access to the executable
return []
libc = join_nonempty(*libc_ver(), delim=' ')
writeln_debug('Python %s (%s %s %s) - %s - %s%s' % (
platform.python_version(),
python_implementation(),
platform.machine(),
platform.architecture()[0],
platform_name(),
OPENSSL_VERSION,
(' - %s' % (libc, )) if libc else ''
))
exe_versions = FFmpegPostProcessor.get_versions(self)
exe_versions['rtmpdump'] = rtmpdump_version()
@ -2344,17 +2533,17 @@ class YoutubeDL(object):
)
if not exe_str:
exe_str = 'none'
self._write_string('[debug] exe versions: %s\n' % exe_str)
writeln_debug('exe versions: %s' % (exe_str, ))
proxy_map = {}
for handler in self._opener.handlers:
if hasattr(handler, 'proxies'):
proxy_map.update(handler.proxies)
self._write_string('[debug] Proxy map: ' + compat_str(proxy_map) + '\n')
writeln_debug('Proxy map: ', compat_str(proxy_map))
if self.params.get('call_home', False):
ipaddr = self.urlopen('https://yt-dl.org/ip').read().decode('utf-8')
self._write_string('[debug] Public IP address: %s\n' % ipaddr)
writeln_debug('Public IP address: %s' % (ipaddr, ))
latest_version = self.urlopen(
'https://yt-dl.org/latest/version').read().decode('utf-8')
if version_tuple(latest_version) > version_tuple(__version__):
@ -2371,7 +2560,7 @@ class YoutubeDL(object):
opts_proxy = self.params.get('proxy')
if opts_cookiefile is None:
self.cookiejar = compat_cookiejar.CookieJar()
self.cookiejar = YoutubeDLCookieJar()
else:
opts_cookiefile = expand_path(opts_cookiefile)
self.cookiejar = YoutubeDLCookieJar(opts_cookiefile)
@ -2432,6 +2621,28 @@ class YoutubeDL(object):
encoding = preferredencoding()
return encoding
def _write_info_json(self, label, info_dict, infofn, overwrite=None):
if not self.params.get('writeinfojson', False):
return False
def msg(fmt, lbl):
return fmt % (lbl + ' metadata',)
if overwrite is None:
overwrite = not self.params.get('nooverwrites', False)
if not overwrite and os.path.exists(encodeFilename(infofn)):
self.to_screen(msg('[info] %s is already present', label.title()))
return 'exists'
else:
self.to_screen(msg('[info] Writing %s as JSON to: ', label) + infofn)
try:
write_json_file(self.filter_requested_info(info_dict), infofn)
return True
except (OSError, IOError):
self.report_error(msg('Cannot write %s to JSON file ', label) + infofn)
return
def _write_thumbnails(self, info_dict, filename):
if self.params.get('writethumbnail', False):
thumbnails = info_dict.get('thumbnails')

View File

@ -5,7 +5,6 @@ from __future__ import unicode_literals
__license__ = 'Public Domain'
import codecs
import io
import os
import random
@ -17,6 +16,7 @@ from .options import (
)
from .compat import (
compat_getpass,
compat_register_utf8,
compat_shlex_split,
workaround_optparse_bug9161,
)
@ -46,10 +46,8 @@ from .YoutubeDL import YoutubeDL
def _real_main(argv=None):
# Compatibility fixes for Windows
if sys.platform == 'win32':
# https://github.com/ytdl-org/youtube-dl/issues/820
codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)
# Compatibility fix for Windows
compat_register_utf8()
workaround_optparse_bug9161()

View File

@ -8,6 +8,18 @@ from .utils import bytes_to_intlist, intlist_to_bytes
BLOCK_SIZE_BYTES = 16
def pkcs7_padding(data):
"""
PKCS#7 padding
@param {int[]} data cleartext
@returns {int[]} padding data
"""
remaining_length = BLOCK_SIZE_BYTES - len(data) % BLOCK_SIZE_BYTES
return data + [remaining_length] * remaining_length
def aes_ctr_decrypt(data, key, counter):
"""
Decrypt with aes in counter mode
@ -76,8 +88,7 @@ def aes_cbc_encrypt(data, key, iv):
previous_cipher_block = iv
for i in range(block_count):
block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES]
remaining_length = BLOCK_SIZE_BYTES - len(block)
block += [remaining_length] * remaining_length
block = pkcs7_padding(block)
mixed_block = xor(block, previous_cipher_block)
encrypted_block = aes_encrypt(mixed_block, expanded_key)
@ -88,6 +99,28 @@ def aes_cbc_encrypt(data, key, iv):
return encrypted_data
def aes_ecb_encrypt(data, key):
"""
Encrypt with aes in ECB mode. Using PKCS#7 padding
@param {int[]} data cleartext
@param {int[]} key 16/24/32-Byte cipher key
@returns {int[]} encrypted data
"""
expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
encrypted_data = []
for i in range(block_count):
block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES]
block = pkcs7_padding(block)
encrypted_block = aes_encrypt(block, expanded_key)
encrypted_data += encrypted_block
return encrypted_data
def key_expansion(data):
"""
Generate key schedule
@ -303,7 +336,7 @@ def xor(data1, data2):
def rijndael_mul(a, b):
if(a == 0 or b == 0):
if (a == 0 or b == 0):
return 0
return RIJNDAEL_EXP_TABLE[(RIJNDAEL_LOG_TABLE[a] + RIJNDAEL_LOG_TABLE[b]) % 0xFF]

View File

@ -1,21 +1,32 @@
from __future__ import unicode_literals
import errno
import io
import json
import os
import re
import shutil
import traceback
from .compat import compat_getenv
from .compat import (
compat_getenv,
compat_open as open,
)
from .utils import (
error_to_compat_str,
expand_path,
is_outdated_version,
try_get,
write_json_file,
)
from .version import __version__
class Cache(object):
_YTDL_DIR = 'youtube-dl'
_VERSION_KEY = _YTDL_DIR + '_version'
_DEFAULT_VERSION = '2021.12.17'
def __init__(self, ydl):
self._ydl = ydl
@ -23,7 +34,7 @@ class Cache(object):
res = self._ydl.params.get('cachedir')
if res is None:
cache_root = compat_getenv('XDG_CACHE_HOME', '~/.cache')
res = os.path.join(cache_root, 'youtube-dl')
res = os.path.join(cache_root, self._YTDL_DIR)
return expand_path(res)
def _get_cache_fn(self, section, key, dtype):
@ -50,13 +61,22 @@ class Cache(object):
except OSError as ose:
if ose.errno != errno.EEXIST:
raise
write_json_file(data, fn)
write_json_file({self._VERSION_KEY: __version__, 'data': data}, fn)
except Exception:
tb = traceback.format_exc()
self._ydl.report_warning(
'Writing cache to %r failed: %s' % (fn, tb))
def load(self, section, key, dtype='json', default=None):
def _validate(self, data, min_ver):
version = try_get(data, lambda x: x[self._VERSION_KEY])
if not version: # Backward compatibility
data, version = {'data': data}, self._DEFAULT_VERSION
if not is_outdated_version(version, min_ver or '0', assume_new=False):
return data['data']
self._ydl.to_screen(
'Discarding old cache from version {version} (needs {min_ver})'.format(**locals()))
def load(self, section, key, dtype='json', default=None, min_ver=None):
assert dtype in ('json',)
if not self.enabled:
@ -65,13 +85,13 @@ class Cache(object):
cache_fn = self._get_cache_fn(section, key, dtype)
try:
try:
with io.open(cache_fn, 'r', encoding='utf-8') as cachef:
return json.load(cachef)
with open(cache_fn, 'r', encoding='utf-8') as cachef:
return self._validate(json.load(cachef), min_ver)
except ValueError:
try:
file_size = os.path.getsize(cache_fn)
except (OSError, IOError) as oe:
file_size = str(oe)
file_size = error_to_compat_str(oe)
self._ydl.report_warning(
'Cache retrieval from %s failed (%s)' % (cache_fn, file_size))
except IOError:

1667
youtube_dl/casefold.py Normal file

File diff suppressed because it is too large Load Diff

View File

@ -1,10 +1,12 @@
# coding: utf-8
from __future__ import unicode_literals
from __future__ import division
import base64
import binascii
import collections
import ctypes
import datetime
import email
import getpass
import io
@ -19,14 +21,64 @@ import socket
import struct
import subprocess
import sys
import types
import xml.etree.ElementTree
# naming convention
# 'compat_' + Python3_name.replace('.', '_')
# other aliases exist for convenience and/or legacy
# deal with critical unicode/str things first
try:
# Python 2
compat_str, compat_basestring, compat_chr = (
unicode, basestring, unichr
)
except NameError:
compat_str, compat_basestring, compat_chr = (
str, (str, bytes), chr
)
# casefold
try:
compat_str.casefold
compat_casefold = lambda s: s.casefold()
except AttributeError:
from .casefold import casefold as compat_casefold
try:
import collections.abc as compat_collections_abc
except ImportError:
import collections as compat_collections_abc
try:
import urllib.request as compat_urllib_request
except ImportError: # Python 2
import urllib2 as compat_urllib_request
# Also fix up lack of method arg in old Pythons
try:
type(compat_urllib_request.Request('http://127.0.0.1', method='GET'))
except TypeError:
def _add_init_method_arg(cls):
init = cls.__init__
def wrapped_init(self, *args, **kwargs):
method = kwargs.pop('method', 'GET')
init(self, *args, **kwargs)
if any(callable(x.__dict__.get('get_method')) for x in (self.__class__, self) if x != cls):
# allow instance or its subclass to override get_method()
return
if self.has_data() and method == 'GET':
method = 'POST'
self.get_method = types.MethodType(lambda _: method, self)
cls.__init__ = wrapped_init
_add_init_method_arg(compat_urllib_request.Request)
del _add_init_method_arg
try:
import urllib.error as compat_urllib_error
except ImportError: # Python 2
@ -36,26 +88,32 @@ try:
import urllib.parse as compat_urllib_parse
except ImportError: # Python 2
import urllib as compat_urllib_parse
import urlparse as _urlparse
for a in dir(_urlparse):
if not hasattr(compat_urllib_parse, a):
setattr(compat_urllib_parse, a, getattr(_urlparse, a))
del _urlparse
try:
from urllib.parse import urlparse as compat_urllib_parse_urlparse
except ImportError: # Python 2
from urlparse import urlparse as compat_urllib_parse_urlparse
try:
import urllib.parse as compat_urlparse
except ImportError: # Python 2
import urlparse as compat_urlparse
# unfavoured aliases
compat_urlparse = compat_urllib_parse
compat_urllib_parse_urlparse = compat_urllib_parse.urlparse
try:
import urllib.response as compat_urllib_response
except ImportError: # Python 2
import urllib as compat_urllib_response
try:
compat_urllib_response.addinfourl.status
except AttributeError:
# .getcode() is deprecated in Py 3.
compat_urllib_response.addinfourl.status = property(lambda self: self.getcode())
try:
import http.cookiejar as compat_cookiejar
except ImportError: # Python 2
import cookielib as compat_cookiejar
compat_http_cookiejar = compat_cookiejar
if sys.version_info[0] == 2:
class compat_cookiejar_Cookie(compat_cookiejar.Cookie):
@ -67,20 +125,35 @@ if sys.version_info[0] == 2:
compat_cookiejar.Cookie.__init__(self, version, name, value, *args, **kwargs)
else:
compat_cookiejar_Cookie = compat_cookiejar.Cookie
compat_http_cookiejar_Cookie = compat_cookiejar_Cookie
try:
import http.cookies as compat_cookies
except ImportError: # Python 2
import Cookie as compat_cookies
compat_http_cookies = compat_cookies
if sys.version_info[0] == 2:
if sys.version_info[0] == 2 or sys.version_info < (3, 3):
class compat_cookies_SimpleCookie(compat_cookies.SimpleCookie):
def load(self, rawdata):
if isinstance(rawdata, compat_str):
rawdata = str(rawdata)
return super(compat_cookies_SimpleCookie, self).load(rawdata)
must_have_value = 0
if not isinstance(rawdata, dict):
if sys.version_info[:2] != (2, 7) or sys.platform.startswith('java'):
# attribute must have value for parsing
rawdata, must_have_value = re.subn(
r'(?i)(;\s*)(secure|httponly)(\s*(?:;|$))', r'\1\2=\2\3', rawdata)
if sys.version_info[0] == 2:
if isinstance(rawdata, compat_str):
rawdata = str(rawdata)
super(compat_cookies_SimpleCookie, self).load(rawdata)
if must_have_value > 0:
for morsel in self.values():
for attr in ('secure', 'httponly'):
if morsel.get(attr):
morsel[attr] = True
else:
compat_cookies_SimpleCookie = compat_cookies.SimpleCookie
compat_http_cookies_SimpleCookie = compat_cookies_SimpleCookie
try:
import html.entities as compat_html_entities
@ -2329,39 +2402,45 @@ try:
import http.client as compat_http_client
except ImportError: # Python 2
import httplib as compat_http_client
try:
compat_http_client.HTTPResponse.getcode
except AttributeError:
# Py < 3.1
compat_http_client.HTTPResponse.getcode = lambda self: self.status
try:
from urllib.error import HTTPError as compat_HTTPError
except ImportError: # Python 2
from urllib2 import HTTPError as compat_HTTPError
compat_urllib_HTTPError = compat_HTTPError
try:
from urllib.request import urlretrieve as compat_urlretrieve
except ImportError: # Python 2
from urllib import urlretrieve as compat_urlretrieve
compat_urllib_request_urlretrieve = compat_urlretrieve
try:
from HTMLParser import (
HTMLParser as compat_HTMLParser,
HTMLParseError as compat_HTMLParseError)
except ImportError: # Python 3
from html.parser import HTMLParser as compat_HTMLParser
except ImportError: # Python 2
from HTMLParser import HTMLParser as compat_HTMLParser
try: # Python 2
from HTMLParser import HTMLParseError as compat_HTMLParseError
except ImportError: # Python <3.4
try:
from html.parser import HTMLParseError as compat_HTMLParseError
except ImportError: # Python >3.4
# HTMLParseError has been deprecated in Python 3.3 and removed in
# HTMLParseError was deprecated in Python 3.3 and removed in
# Python 3.5. Introducing dummy exception for Python >3.5 for compatible
# and uniform cross-version exception handling
class compat_HTMLParseError(Exception):
pass
compat_html_parser_HTMLParser = compat_HTMLParser
compat_html_parser_HTMLParseError = compat_HTMLParseError
try:
from subprocess import DEVNULL
compat_subprocess_get_DEVNULL = lambda: DEVNULL
except ImportError:
_DEVNULL = subprocess.DEVNULL
compat_subprocess_get_DEVNULL = lambda: _DEVNULL
except AttributeError:
compat_subprocess_get_DEVNULL = lambda: open(os.path.devnull, 'w')
try:
@ -2369,15 +2448,12 @@ try:
except ImportError:
import BaseHTTPServer as compat_http_server
try:
compat_str = unicode # Python 2
except NameError:
compat_str = str
try:
from urllib.parse import unquote_to_bytes as compat_urllib_parse_unquote_to_bytes
from urllib.parse import unquote as compat_urllib_parse_unquote
from urllib.parse import unquote_plus as compat_urllib_parse_unquote_plus
from urllib.parse import urlencode as compat_urllib_parse_urlencode
from urllib.parse import parse_qs as compat_parse_qs
except ImportError: # Python 2
_asciire = (compat_urllib_parse._asciire if hasattr(compat_urllib_parse, '_asciire')
else re.compile(r'([\x00-\x7f]+)'))
@ -2444,9 +2520,6 @@ except ImportError: # Python 2
string = string.replace('+', ' ')
return compat_urllib_parse_unquote(string, encoding, errors)
try:
from urllib.parse import urlencode as compat_urllib_parse_urlencode
except ImportError: # Python 2
# Python 2 will choke in urlencode on mixture of byte and unicode strings.
# Possible solutions are to either port it from python 3 with all
# the friends or manually ensure input query contains only byte strings.
@ -2468,7 +2541,62 @@ except ImportError: # Python 2
def encode_list(l):
return [encode_elem(e) for e in l]
return compat_urllib_parse.urlencode(encode_elem(query), doseq=doseq)
return compat_urllib_parse._urlencode(encode_elem(query), doseq=doseq)
# HACK: The following is the correct parse_qs implementation from cpython 3's stdlib.
# Python 2's version is apparently totally broken
def _parse_qsl(qs, keep_blank_values=False, strict_parsing=False,
encoding='utf-8', errors='replace'):
qs, _coerce_result = qs, compat_str
pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
r = []
for name_value in pairs:
if not name_value and not strict_parsing:
continue
nv = name_value.split('=', 1)
if len(nv) != 2:
if strict_parsing:
raise ValueError('bad query field: %r' % (name_value,))
# Handle case of a control-name with no equal sign
if keep_blank_values:
nv.append('')
else:
continue
if len(nv[1]) or keep_blank_values:
name = nv[0].replace('+', ' ')
name = compat_urllib_parse_unquote(
name, encoding=encoding, errors=errors)
name = _coerce_result(name)
value = nv[1].replace('+', ' ')
value = compat_urllib_parse_unquote(
value, encoding=encoding, errors=errors)
value = _coerce_result(value)
r.append((name, value))
return r
def compat_parse_qs(qs, keep_blank_values=False, strict_parsing=False,
encoding='utf-8', errors='replace'):
parsed_result = {}
pairs = _parse_qsl(qs, keep_blank_values, strict_parsing,
encoding=encoding, errors=errors)
for name, value in pairs:
if name in parsed_result:
parsed_result[name].append(value)
else:
parsed_result[name] = [value]
return parsed_result
setattr(compat_urllib_parse, '_urlencode',
getattr(compat_urllib_parse, 'urlencode'))
for name, fix in (
('unquote_to_bytes', compat_urllib_parse_unquote_to_bytes),
('parse_unquote', compat_urllib_parse_unquote),
('unquote_plus', compat_urllib_parse_unquote_plus),
('urlencode', compat_urllib_parse_urlencode),
('parse_qs', compat_parse_qs)):
setattr(compat_urllib_parse, name, fix)
compat_urllib_parse_parse_qs = compat_parse_qs
try:
from urllib.request import DataHandler as compat_urllib_request_DataHandler
@ -2504,21 +2632,11 @@ except ImportError: # Python < 3.4
return compat_urllib_response.addinfourl(io.BytesIO(data), headers, url)
try:
compat_basestring = basestring # Python 2
except NameError:
compat_basestring = str
try:
compat_chr = unichr # Python 2
except NameError:
compat_chr = chr
try:
from xml.etree.ElementTree import ParseError as compat_xml_parse_error
except ImportError: # Python 2.6
from xml.parsers.expat import ExpatError as compat_xml_parse_error
compat_xml_etree_ElementTree_ParseError = compat_xml_parse_error
etree = xml.etree.ElementTree
@ -2532,10 +2650,11 @@ try:
# xml.etree.ElementTree.Element is a method in Python <=2.6 and
# the following will crash with:
# TypeError: isinstance() arg 2 must be a class, type, or tuple of classes and types
isinstance(None, xml.etree.ElementTree.Element)
isinstance(None, etree.Element)
from xml.etree.ElementTree import Element as compat_etree_Element
except TypeError: # Python <=2.6
from xml.etree.ElementTree import _ElementInterface as compat_etree_Element
compat_xml_etree_ElementTree_Element = compat_etree_Element
if sys.version_info[0] >= 3:
def compat_etree_fromstring(text):
@ -2591,6 +2710,7 @@ else:
if k == uri or v == prefix:
del etree._namespace_map[k]
etree._namespace_map[uri] = prefix
compat_xml_etree_register_namespace = compat_etree_register_namespace
if sys.version_info < (2, 7):
# Here comes the crazy part: In 2.6, if the xpath is a unicode,
@ -2602,53 +2722,6 @@ if sys.version_info < (2, 7):
else:
compat_xpath = lambda xpath: xpath
try:
from urllib.parse import parse_qs as compat_parse_qs
except ImportError: # Python 2
# HACK: The following is the correct parse_qs implementation from cpython 3's stdlib.
# Python 2's version is apparently totally broken
def _parse_qsl(qs, keep_blank_values=False, strict_parsing=False,
encoding='utf-8', errors='replace'):
qs, _coerce_result = qs, compat_str
pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
r = []
for name_value in pairs:
if not name_value and not strict_parsing:
continue
nv = name_value.split('=', 1)
if len(nv) != 2:
if strict_parsing:
raise ValueError('bad query field: %r' % (name_value,))
# Handle case of a control-name with no equal sign
if keep_blank_values:
nv.append('')
else:
continue
if len(nv[1]) or keep_blank_values:
name = nv[0].replace('+', ' ')
name = compat_urllib_parse_unquote(
name, encoding=encoding, errors=errors)
name = _coerce_result(name)
value = nv[1].replace('+', ' ')
value = compat_urllib_parse_unquote(
value, encoding=encoding, errors=errors)
value = _coerce_result(value)
r.append((name, value))
return r
def compat_parse_qs(qs, keep_blank_values=False, strict_parsing=False,
encoding='utf-8', errors='replace'):
parsed_result = {}
pairs = _parse_qsl(qs, keep_blank_values, strict_parsing,
encoding=encoding, errors=errors)
for name, value in pairs:
if name in parsed_result:
parsed_result[name].append(value)
else:
parsed_result[name] = [value]
return parsed_result
compat_os_name = os._name if os.name == 'java' else os.name
@ -2773,6 +2846,8 @@ else:
else:
compat_expanduser = os.path.expanduser
compat_os_path_expanduser = compat_expanduser
if compat_os_name == 'nt' and sys.version_info < (3, 8):
# os.path.realpath on Windows does not follow symbolic links
@ -2784,6 +2859,8 @@ if compat_os_name == 'nt' and sys.version_info < (3, 8):
else:
compat_realpath = os.path.realpath
compat_os_path_realpath = compat_realpath
if sys.version_info < (3, 0):
def compat_print(s):
@ -2804,11 +2881,15 @@ if sys.version_info < (3, 0) and sys.platform == 'win32':
else:
compat_getpass = getpass.getpass
compat_getpass_getpass = compat_getpass
try:
compat_input = raw_input
except NameError: # Python 3
compat_input = input
# Python < 2.6.5 require kwargs to be bytes
try:
def _testfunc(x):
@ -2859,6 +2940,51 @@ else:
compat_socket_create_connection = socket.create_connection
try:
from contextlib import suppress as compat_contextlib_suppress
except ImportError:
class compat_contextlib_suppress(object):
_exceptions = None
def __init__(self, *exceptions):
super(compat_contextlib_suppress, self).__init__()
# TODO: [Base]ExceptionGroup (3.12+)
self._exceptions = exceptions
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
return exc_val is not None and isinstance(exc_val, self._exceptions or tuple())
# subprocess.Popen context manager
# avoids leaking handles if .communicate() is not called
try:
_Popen = subprocess.Popen
# check for required context manager attributes
_Popen.__enter__ and _Popen.__exit__
compat_subprocess_Popen = _Popen
except AttributeError:
# not a context manager - make one
from contextlib import contextmanager
@contextmanager
def compat_subprocess_Popen(*args, **kwargs):
popen = None
try:
popen = _Popen(*args, **kwargs)
yield popen
finally:
if popen:
for f in (popen.stdin, popen.stdout, popen.stderr):
if f:
# repeated .close() is OK, but just in case
with compat_contextlib_suppress(EnvironmentError):
f.close()
popen.wait()
# Fix https://github.com/ytdl-org/youtube-dl/issues/4223
# See http://bugs.python.org/issue9161 for what is broken
def workaround_optparse_bug9161():
@ -2886,6 +3012,7 @@ else:
_terminal_size = collections.namedtuple('terminal_size', ['columns', 'lines'])
def compat_get_terminal_size(fallback=(80, 24)):
from .utils import process_communicate_or_kill
columns = compat_getenv('COLUMNS')
if columns:
columns = int(columns)
@ -2902,7 +3029,7 @@ else:
sp = subprocess.Popen(
['stty', 'size'],
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = sp.communicate()
out, err = process_communicate_or_kill(sp)
_lines, _columns = map(int, out.split())
except Exception:
_columns, _lines = _terminal_size(*fallback)
@ -2913,15 +3040,16 @@ else:
lines = _lines
return _terminal_size(columns, lines)
try:
itertools.count(start=0, step=1)
compat_itertools_count = itertools.count
except TypeError: # Python 2.6
def compat_itertools_count(start=0, step=1):
n = start
while True:
yield n
n += step
yield start
start += step
if sys.version_info >= (3, 0):
from tokenize import tokenize as compat_tokenize_tokenize
@ -2962,6 +3090,24 @@ else:
compat_Struct = struct.Struct
# compat_map/filter() returning an iterator, supposedly the
# same versioning as for zip below
try:
from future_builtins import map as compat_map
except ImportError:
try:
from itertools import imap as compat_map
except ImportError:
compat_map = map
try:
from future_builtins import filter as compat_filter
except ImportError:
try:
from itertools import ifilter as compat_filter
except ImportError:
compat_filter = filter
try:
from future_builtins import zip as compat_zip
except ImportError: # not 2.6+ or is 3.x
@ -2971,6 +3117,82 @@ except ImportError: # not 2.6+ or is 3.x
compat_zip = zip
# method renamed between Py2/3
try:
from itertools import zip_longest as compat_itertools_zip_longest
except ImportError:
from itertools import izip_longest as compat_itertools_zip_longest
# new class in collections
try:
from collections import ChainMap as compat_collections_chain_map
# Py3.3's ChainMap is deficient
if sys.version_info < (3, 4):
raise ImportError
except ImportError:
# Py <= 3.3
class compat_collections_chain_map(compat_collections_abc.MutableMapping):
maps = [{}]
def __init__(self, *maps):
self.maps = list(maps) or [{}]
def __getitem__(self, k):
for m in self.maps:
if k in m:
return m[k]
raise KeyError(k)
def __setitem__(self, k, v):
self.maps[0].__setitem__(k, v)
return
def __contains__(self, k):
return any((k in m) for m in self.maps)
def __delitem(self, k):
if k in self.maps[0]:
del self.maps[0][k]
return
raise KeyError(k)
def __delitem__(self, k):
self.__delitem(k)
def __iter__(self):
return itertools.chain(*reversed(self.maps))
def __len__(self):
return len(iter(self))
# to match Py3, don't del directly
def pop(self, k, *args):
if self.__contains__(k):
off = self.__getitem__(k)
self.__delitem(k)
return off
elif len(args) > 0:
return args[0]
raise KeyError(k)
def new_child(self, m=None, **kwargs):
m = m or {}
m.update(kwargs)
return compat_collections_chain_map(m, *self.maps)
@property
def parents(self):
return compat_collections_chain_map(*(self.maps[1:]))
# Pythons disagree on the type of a pattern (RegexObject, _sre.SRE_Pattern, Pattern, ...?)
compat_re_Pattern = type(re.compile(''))
# and on the type of a match
compat_re_Match = type(re.match('a', 'a'))
if sys.version_info < (3, 3):
def compat_b64decode(s, *args, **kwargs):
if isinstance(s, compat_str):
@ -2979,6 +3201,8 @@ if sys.version_info < (3, 3):
else:
compat_b64decode = base64.b64decode
compat_base64_b64decode = compat_b64decode
if platform.python_implementation() == 'PyPy' and sys.pypy_version_info < (5, 4, 0):
# PyPy2 prior to version 5.4.0 expects byte strings as Windows function
@ -2998,26 +3222,96 @@ else:
return ctypes.WINFUNCTYPE(*args, **kwargs)
__all__ = [
if sys.version_info < (3, 0):
# open(file, mode='r', buffering=- 1, encoding=None, errors=None, newline=None, closefd=True) not: opener=None
def compat_open(file_, *args, **kwargs):
if len(args) > 6 or 'opener' in kwargs:
raise ValueError('open: unsupported argument "opener"')
return io.open(file_, *args, **kwargs)
else:
compat_open = open
# compat_register_utf8
def compat_register_utf8():
if sys.platform == 'win32':
# https://github.com/ytdl-org/youtube-dl/issues/820
from codecs import register, lookup
register(
lambda name: lookup('utf-8') if name == 'cp65001' else None)
# compat_datetime_timedelta_total_seconds
try:
compat_datetime_timedelta_total_seconds = datetime.timedelta.total_seconds
except AttributeError:
# Py 2.6
def compat_datetime_timedelta_total_seconds(td):
return (td.microseconds + (td.seconds + td.days * 24 * 3600) * 10**6) / 10**6
# optional decompression packages
# PyPi brotli package implements 'br' Content-Encoding
try:
import brotli as compat_brotli
except ImportError:
compat_brotli = None
# PyPi ncompress package implements 'compress' Content-Encoding
try:
import ncompress as compat_ncompress
except ImportError:
compat_ncompress = None
legacy = [
'compat_HTMLParseError',
'compat_HTMLParser',
'compat_HTTPError',
'compat_Struct',
'compat_b64decode',
'compat_basestring',
'compat_chr',
'compat_cookiejar',
'compat_cookiejar_Cookie',
'compat_cookies',
'compat_cookies_SimpleCookie',
'compat_ctypes_WINFUNCTYPE',
'compat_etree_Element',
'compat_etree_fromstring',
'compat_etree_register_namespace',
'compat_expanduser',
'compat_getpass',
'compat_parse_qs',
'compat_realpath',
'compat_urllib_parse_parse_qs',
'compat_urllib_parse_unquote',
'compat_urllib_parse_unquote_plus',
'compat_urllib_parse_unquote_to_bytes',
'compat_urllib_parse_urlencode',
'compat_urllib_parse_urlparse',
'compat_urlparse',
'compat_urlretrieve',
'compat_xml_parse_error',
]
__all__ = [
'compat_html_parser_HTMLParseError',
'compat_html_parser_HTMLParser',
'compat_Struct',
'compat_base64_b64decode',
'compat_basestring',
'compat_brotli',
'compat_casefold',
'compat_chr',
'compat_collections_abc',
'compat_collections_chain_map',
'compat_datetime_timedelta_total_seconds',
'compat_http_cookiejar',
'compat_http_cookiejar_Cookie',
'compat_http_cookies',
'compat_http_cookies_SimpleCookie',
'compat_contextlib_suppress',
'compat_ctypes_WINFUNCTYPE',
'compat_etree_fromstring',
'compat_filter',
'compat_get_terminal_size',
'compat_getenv',
'compat_getpass',
'compat_getpass_getpass',
'compat_html_entities',
'compat_html_entities_html5',
'compat_http_client',
@ -3025,13 +3319,20 @@ __all__ = [
'compat_input',
'compat_integer_types',
'compat_itertools_count',
'compat_itertools_zip_longest',
'compat_kwargs',
'compat_map',
'compat_ncompress',
'compat_numeric_types',
'compat_open',
'compat_ord',
'compat_os_name',
'compat_parse_qs',
'compat_os_path_expanduser',
'compat_os_path_realpath',
'compat_print',
'compat_realpath',
'compat_re_Match',
'compat_re_Pattern',
'compat_register_utf8',
'compat_setenv',
'compat_shlex_quote',
'compat_shlex_split',
@ -3040,20 +3341,18 @@ __all__ = [
'compat_struct_pack',
'compat_struct_unpack',
'compat_subprocess_get_DEVNULL',
'compat_subprocess_Popen',
'compat_tokenize_tokenize',
'compat_urllib_error',
'compat_urllib_parse',
'compat_urllib_parse_unquote',
'compat_urllib_parse_unquote_plus',
'compat_urllib_parse_unquote_to_bytes',
'compat_urllib_parse_urlencode',
'compat_urllib_parse_urlparse',
'compat_urllib_request',
'compat_urllib_request_DataHandler',
'compat_urllib_response',
'compat_urlparse',
'compat_urlretrieve',
'compat_xml_parse_error',
'compat_urllib_request_urlretrieve',
'compat_urllib_HTTPError',
'compat_xml_etree_ElementTree_Element',
'compat_xml_etree_ElementTree_ParseError',
'compat_xml_etree_register_namespace',
'compat_xpath',
'compat_zip',
'workaround_optparse_bug9161',

View File

@ -1,22 +1,31 @@
from __future__ import unicode_literals
from ..utils import (
determine_protocol,
)
def get_suitable_downloader(info_dict, params={}):
info_dict['protocol'] = determine_protocol(info_dict)
info_copy = info_dict.copy()
return _get_suitable_downloader(info_copy, params)
# Some of these require get_suitable_downloader
from .common import FileDownloader
from .dash import DashSegmentsFD
from .f4m import F4mFD
from .hls import HlsFD
from .http import HttpFD
from .rtmp import RtmpFD
from .dash import DashSegmentsFD
from .rtsp import RtspFD
from .ism import IsmFD
from .niconico import NiconicoDmcFD
from .external import (
get_external_downloader,
FFmpegFD,
)
from ..utils import (
determine_protocol,
)
PROTOCOL_MAP = {
'rtmp': RtmpFD,
'm3u8_native': HlsFD,
@ -26,13 +35,12 @@ PROTOCOL_MAP = {
'f4m': F4mFD,
'http_dash_segments': DashSegmentsFD,
'ism': IsmFD,
'niconico_dmc': NiconicoDmcFD,
}
def get_suitable_downloader(info_dict, params={}):
def _get_suitable_downloader(info_dict, params={}):
"""Get the downloader class that can handle the info dict."""
protocol = determine_protocol(info_dict)
info_dict['protocol'] = protocol
# if (info_dict.get('start_time') or info_dict.get('end_time')) and not info_dict.get('requested_formats') and FFmpegFD.can_download(info_dict):
# return FFmpegFD
@ -42,7 +50,11 @@ def get_suitable_downloader(info_dict, params={}):
ed = get_external_downloader(external_downloader)
if ed.can_download(info_dict):
return ed
# Avoid using unwanted args since external_downloader was rejected
if params.get('external_downloader_args'):
params['external_downloader_args'] = None
protocol = info_dict['protocol']
if protocol.startswith('m3u8') and info_dict.get('is_live'):
return FFmpegFD

View File

@ -88,17 +88,21 @@ class FileDownloader(object):
return '---.-%'
return '%6s' % ('%3.1f%%' % percent)
@staticmethod
def calc_eta(start, now, total, current):
@classmethod
def calc_eta(cls, start_or_rate, now_or_remaining, *args):
if len(args) < 2:
rate, remaining = (start_or_rate, now_or_remaining)
if None in (rate, remaining):
return None
return int(float(remaining) / rate)
start, now = (start_or_rate, now_or_remaining)
total, current = args[:2]
if total is None:
return None
if now is None:
now = time.time()
dif = now - start
if current == 0 or dif < 0.001: # One millisecond
return None
rate = float(current) / dif
return int((float(total) - float(current)) / rate)
rate = cls.calc_speed(start, now, current)
return rate and int((float(total) - float(current)) / rate)
@staticmethod
def format_eta(eta):
@ -123,6 +127,12 @@ class FileDownloader(object):
def format_retries(retries):
return 'inf' if retries == float('inf') else '%.0f' % retries
@staticmethod
def filesize_or_none(unencoded_filename):
fn = encodeFilename(unencoded_filename)
if os.path.isfile(fn):
return os.path.getsize(fn)
@staticmethod
def best_block_size(elapsed_time, bytes):
new_min = max(bytes / 2.0, 1.0)
@ -329,6 +339,10 @@ class FileDownloader(object):
def download(self, filename, info_dict):
"""Download to a filename using the info from info_dict
Return True on success and False otherwise
This method filters the `Cookie` header from the info_dict to prevent leaks.
Downloaders have their own way of handling cookies.
See: https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-v8mc-9377-rwjj
"""
nooverwrites_and_exists = (

View File

@ -1,5 +1,7 @@
from __future__ import unicode_literals
import itertools
from .fragment import FragmentFD
from ..compat import compat_urllib_error
from ..utils import (
@ -30,26 +32,28 @@ class DashSegmentsFD(FragmentFD):
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
frag_index = 0
for i, fragment in enumerate(fragments):
frag_index += 1
for frag_index, fragment in enumerate(fragments, 1):
if frag_index <= ctx['fragment_index']:
continue
success = False
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = i == 0 or not skip_unavailable_fragments
count = 0
while count <= fragment_retries:
fatal = frag_index == 1 or not skip_unavailable_fragments
fragment_url = fragment.get('url')
if not fragment_url:
assert fragment_base_url
fragment_url = urljoin(fragment_base_url, fragment['path'])
headers = info_dict.get('http_headers')
fragment_range = fragment.get('range')
if fragment_range:
headers = headers.copy() if headers else {}
headers['Range'] = 'bytes=%s' % (fragment_range,)
for count in itertools.count():
try:
fragment_url = fragment.get('url')
if not fragment_url:
assert fragment_base_url
fragment_url = urljoin(fragment_base_url, fragment['path'])
success, frag_content = self._download_fragment(ctx, fragment_url, info_dict)
success, frag_content = self._download_fragment(ctx, fragment_url, info_dict, headers)
if not success:
return False
self._append_fragment(ctx, frag_content)
break
except compat_urllib_error.HTTPError as err:
# YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately
@ -57,22 +61,21 @@ class DashSegmentsFD(FragmentFD):
# is usually enough) thus allowing to download the whole file successfully.
# To be future-proof we will retry all fragments that fail with any
# HTTP error.
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
if count < fragment_retries:
self.report_retry_fragment(err, frag_index, count + 1, fragment_retries)
continue
except DownloadError:
# Don't retry fragment if error occurred during HTTP downloading
# itself since it has own retry settings
if not fatal:
self.report_skip_fragment(frag_index)
break
raise
# itself since it has its own retry settings
if fatal:
raise
break
if count > fragment_retries:
if not success:
if not fatal:
self.report_skip_fragment(frag_index)
continue
self.report_error('giving up after %s fragment retries' % fragment_retries)
self.report_error('giving up after %s fragment retries' % count)
return False
self._finish_frag_download(ctx)

View File

@ -1,17 +1,24 @@
from __future__ import unicode_literals
import os.path
import os
import re
import subprocess
import sys
import tempfile
import time
from .common import FileDownloader
from ..compat import (
compat_setenv,
compat_str,
compat_subprocess_Popen,
)
from ..postprocessor.ffmpeg import FFmpegPostProcessor, EXT_TO_OUT_FORMATS
try:
from ..postprocessor.ffmpeg import FFmpegPostProcessor, EXT_TO_OUT_FORMATS
except ImportError:
FFmpegPostProcessor = None
from ..utils import (
cli_option,
cli_valueless_option,
@ -22,6 +29,9 @@ from ..utils import (
handle_youtubedl_headers,
check_executable,
is_outdated_version,
process_communicate_or_kill,
T,
traverse_obj,
)
@ -29,6 +39,7 @@ class ExternalFD(FileDownloader):
def real_download(self, filename, info_dict):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
self._cookies_tempfile = None
try:
started = time.time()
@ -41,6 +52,13 @@ class ExternalFD(FileDownloader):
# should take place
retval = 0
self.to_screen('[%s] Interrupted by user' % self.get_basename())
finally:
if self._cookies_tempfile and os.path.isfile(self._cookies_tempfile):
try:
os.remove(self._cookies_tempfile)
except OSError:
self.report_warning(
'Unable to delete temporary cookies file "{0}"'.format(self._cookies_tempfile))
if retval == 0:
status = {
@ -96,6 +114,16 @@ class ExternalFD(FileDownloader):
def _configuration_args(self, default=[]):
return cli_configuration_args(self.params, 'external_downloader_args', default)
def _write_cookies(self):
if not self.ydl.cookiejar.filename:
tmp_cookies = tempfile.NamedTemporaryFile(suffix='.cookies', delete=False)
tmp_cookies.close()
self._cookies_tempfile = tmp_cookies.name
self.to_screen('[download] Writing temporary cookies file to "{0}"'.format(self._cookies_tempfile))
# real_download resets _cookies_tempfile; if it's None, save() will write to cookiejar.filename
self.ydl.cookiejar.save(self._cookies_tempfile, ignore_discard=True, ignore_expires=True)
return self.ydl.cookiejar.filename or self._cookies_tempfile
def _call_downloader(self, tmpfilename, info_dict):
""" Either overwrite this or implement _make_cmd """
cmd = [encodeArgument(a) for a in self._make_cmd(tmpfilename, info_dict)]
@ -104,18 +132,26 @@ class ExternalFD(FileDownloader):
p = subprocess.Popen(
cmd, stderr=subprocess.PIPE)
_, stderr = p.communicate()
_, stderr = process_communicate_or_kill(p)
if p.returncode != 0:
self.to_stderr(stderr.decode('utf-8', 'replace'))
return p.returncode
@staticmethod
def _header_items(info_dict):
return traverse_obj(
info_dict, ('http_headers', T(dict.items), Ellipsis))
class CurlFD(ExternalFD):
AVAILABLE_OPT = '-V'
def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '--location', '-o', tmpfilename]
for key, val in info_dict['http_headers'].items():
cmd = [self.exe, '--location', '-o', tmpfilename, '--compressed']
cookie_header = self.ydl.cookiejar.get_cookie_header(info_dict['url'])
if cookie_header:
cmd += ['--cookie', cookie_header]
for key, val in self._header_items(info_dict):
cmd += ['--header', '%s: %s' % (key, val)]
cmd += self._bool_option('--continue-at', 'continuedl', '-', '0')
cmd += self._valueless_option('--silent', 'noprogress')
@ -141,7 +177,7 @@ class CurlFD(ExternalFD):
# curl writes the progress to stderr so don't capture it.
p = subprocess.Popen(cmd)
p.communicate()
process_communicate_or_kill(p)
return p.returncode
@ -150,8 +186,11 @@ class AxelFD(ExternalFD):
def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '-o', tmpfilename]
for key, val in info_dict['http_headers'].items():
for key, val in self._header_items(info_dict):
cmd += ['-H', '%s: %s' % (key, val)]
cookie_header = self.ydl.cookiejar.get_cookie_header(info_dict['url'])
if cookie_header:
cmd += ['-H', 'Cookie: {0}'.format(cookie_header), '--max-redirect=0']
cmd += self._configuration_args()
cmd += ['--', info_dict['url']]
return cmd
@ -161,8 +200,10 @@ class WgetFD(ExternalFD):
AVAILABLE_OPT = '--version'
def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '-O', tmpfilename, '-nv', '--no-cookies']
for key, val in info_dict['http_headers'].items():
cmd = [self.exe, '-O', tmpfilename, '-nv', '--compression=auto']
if self.ydl.cookiejar.get_cookie_header(info_dict['url']):
cmd += ['--load-cookies', self._write_cookies()]
for key, val in self._header_items(info_dict):
cmd += ['--header', '%s: %s' % (key, val)]
cmd += self._option('--limit-rate', 'ratelimit')
retry = self._option('--tries', 'retries')
@ -171,7 +212,10 @@ class WgetFD(ExternalFD):
retry[1] = '0'
cmd += retry
cmd += self._option('--bind-address', 'source_address')
cmd += self._option('--proxy', 'proxy')
proxy = self.params.get('proxy')
if proxy:
for var in ('http_proxy', 'https_proxy'):
cmd += ['--execute', '%s=%s' % (var, proxy)]
cmd += self._valueless_option('--no-check-certificate', 'nocheckcertificate')
cmd += self._configuration_args()
cmd += ['--', info_dict['url']]
@ -181,24 +225,121 @@ class WgetFD(ExternalFD):
class Aria2cFD(ExternalFD):
AVAILABLE_OPT = '-v'
@staticmethod
def _aria2c_filename(fn):
return fn if os.path.isabs(fn) else os.path.join('.', fn)
def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '-c']
cmd += self._configuration_args([
'--min-split-size', '1M', '--max-connection-per-server', '4'])
dn = os.path.dirname(tmpfilename)
if dn:
cmd += ['--dir', dn]
cmd += ['--out', os.path.basename(tmpfilename)]
for key, val in info_dict['http_headers'].items():
cmd = [self.exe, '-c',
'--console-log-level=warn', '--summary-interval=0', '--download-result=hide',
'--http-accept-gzip=true', '--file-allocation=none', '-x16', '-j16', '-s16']
if 'fragments' in info_dict:
cmd += ['--allow-overwrite=true', '--allow-piece-length-change=true']
else:
cmd += ['--min-split-size', '1M']
if self.ydl.cookiejar.get_cookie_header(info_dict['url']):
cmd += ['--load-cookies={0}'.format(self._write_cookies())]
for key, val in self._header_items(info_dict):
cmd += ['--header', '%s: %s' % (key, val)]
cmd += self._configuration_args(['--max-connection-per-server', '4'])
cmd += ['--out', os.path.basename(tmpfilename)]
cmd += self._option('--max-overall-download-limit', 'ratelimit')
cmd += self._option('--interface', 'source_address')
cmd += self._option('--all-proxy', 'proxy')
cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=')
cmd += self._bool_option('--remote-time', 'updatetime', 'true', 'false', '=')
cmd += ['--', info_dict['url']]
cmd += self._bool_option('--show-console-readout', 'noprogress', 'false', 'true', '=')
cmd += self._configuration_args()
# aria2c strips out spaces from the beginning/end of filenames and paths.
# We work around this issue by adding a "./" to the beginning of the
# filename and relative path, and adding a "/" at the end of the path.
# See: https://github.com/yt-dlp/yt-dlp/issues/276
# https://github.com/ytdl-org/youtube-dl/issues/20312
# https://github.com/aria2/aria2/issues/1373
dn = os.path.dirname(tmpfilename)
if dn:
cmd += ['--dir', self._aria2c_filename(dn) + os.path.sep]
if 'fragments' not in info_dict:
cmd += ['--out', self._aria2c_filename(os.path.basename(tmpfilename))]
cmd += ['--auto-file-renaming=false']
if 'fragments' in info_dict:
cmd += ['--file-allocation=none', '--uri-selector=inorder']
url_list_file = '%s.frag.urls' % (tmpfilename, )
url_list = []
for frag_index, fragment in enumerate(info_dict['fragments']):
fragment_filename = '%s-Frag%d' % (os.path.basename(tmpfilename), frag_index)
url_list.append('%s\n\tout=%s' % (fragment['url'], self._aria2c_filename(fragment_filename)))
stream, _ = self.sanitize_open(url_list_file, 'wb')
stream.write('\n'.join(url_list).encode())
stream.close()
cmd += ['-i', self._aria2c_filename(url_list_file)]
else:
cmd += ['--', info_dict['url']]
return cmd
class Aria2pFD(ExternalFD):
''' Aria2pFD class
This class support to use aria2p as downloader.
(Aria2p, a command-line tool and Python library to interact with an aria2c daemon process
through JSON-RPC.)
It can help you to get download progress more easily.
To use aria2p as downloader, you need to install aria2c and aria2p, aria2p can download with pip.
Then run aria2c in the background and enable with the --enable-rpc option.
'''
try:
import aria2p
__avail = True
except ImportError:
__avail = False
@classmethod
def available(cls):
return cls.__avail
def _call_downloader(self, tmpfilename, info_dict):
aria2 = self.aria2p.API(
self.aria2p.Client(
host='http://localhost',
port=6800,
secret=''
)
)
options = {
'min-split-size': '1M',
'max-connection-per-server': 4,
'auto-file-renaming': 'false',
}
options['dir'] = os.path.dirname(tmpfilename) or os.path.abspath('.')
options['out'] = os.path.basename(tmpfilename)
if self.ydl.cookiejar.get_cookie_header(info_dict['url']):
options['load-cookies'] = self._write_cookies()
options['header'] = []
for key, val in self._header_items(info_dict):
options['header'].append('{0}: {1}'.format(key, val))
download = aria2.add_uris([info_dict['url']], options)
status = {
'status': 'downloading',
'tmpfilename': tmpfilename,
}
started = time.time()
while download.status in ['active', 'waiting']:
download = aria2.get_download(download.gid)
status.update({
'downloaded_bytes': download.completed_length,
'total_bytes': download.total_length,
'elapsed': time.time() - started,
'eta': download.eta.total_seconds(),
'speed': download.download_speed,
})
self._hook_progress(status)
time.sleep(.5)
return download.status != 'complete'
class HttpieFD(ExternalFD):
@classmethod
def available(cls):
@ -206,25 +347,34 @@ class HttpieFD(ExternalFD):
def _make_cmd(self, tmpfilename, info_dict):
cmd = ['http', '--download', '--output', tmpfilename, info_dict['url']]
for key, val in info_dict['http_headers'].items():
for key, val in self._header_items(info_dict):
cmd += ['%s:%s' % (key, val)]
# httpie 3.1.0+ removes the Cookie header on redirect, so this should be safe for now. [1]
# If we ever need cookie handling for redirects, we can export the cookiejar into a session. [2]
# 1: https://github.com/httpie/httpie/security/advisories/GHSA-9w4w-cpc8-h2fq
# 2: https://httpie.io/docs/cli/sessions
cookie_header = self.ydl.cookiejar.get_cookie_header(info_dict['url'])
if cookie_header:
cmd += ['Cookie:%s' % cookie_header]
return cmd
class FFmpegFD(ExternalFD):
@classmethod
def supports(cls, info_dict):
return info_dict['protocol'] in ('http', 'https', 'ftp', 'ftps', 'm3u8', 'rtsp', 'rtmp', 'mms')
return info_dict['protocol'] in ('http', 'https', 'ftp', 'ftps', 'm3u8', 'rtsp', 'rtmp', 'mms', 'http_dash_segments')
@classmethod
def available(cls):
return FFmpegPostProcessor().available
# actual availability can only be confirmed for an instance
return bool(FFmpegPostProcessor)
def _call_downloader(self, tmpfilename, info_dict):
url = info_dict['url']
ffpp = FFmpegPostProcessor(downloader=self)
# `downloader` means the parent `YoutubeDL`
ffpp = FFmpegPostProcessor(downloader=self.ydl)
if not ffpp.available:
self.report_error('m3u8 download detected but ffmpeg or avconv could not be found. Please install one.')
self.report_error('ffmpeg required for download but no ffmpeg (nor avconv) executable could be found. Please install one.')
return False
ffpp.check_version()
@ -253,7 +403,15 @@ class FFmpegFD(ExternalFD):
# if end_time:
# args += ['-t', compat_str(end_time - start_time)]
if info_dict['http_headers'] and re.match(r'^https?://', url):
url = info_dict['url']
cookies = self.ydl.cookiejar.get_cookies_for_url(url)
if cookies:
args.extend(['-cookies', ''.join(
'{0}={1}; path={2}; domain={3};\r\n'.format(
cookie.name, cookie.value, cookie.path, cookie.domain)
for cookie in cookies)])
if info_dict.get('http_headers') and re.match(r'^https?://', url):
# Trailing \r\n after each HTTP header is important to prevent warning from ffmpeg/avconv:
# [http @ 00000000003d2fa0] No trailing CRLF found in HTTP header.
headers = handle_youtubedl_headers(info_dict['http_headers'])
@ -333,18 +491,25 @@ class FFmpegFD(ExternalFD):
self._debug_cmd(args)
proc = subprocess.Popen(args, stdin=subprocess.PIPE, env=env)
try:
retval = proc.wait()
except KeyboardInterrupt:
# subprocces.run would send the SIGKILL signal to ffmpeg and the
# mp4 file couldn't be played, but if we ask ffmpeg to quit it
# produces a file that is playable (this is mostly useful for live
# streams). Note that Windows is not affected and produces playable
# files (see https://github.com/ytdl-org/youtube-dl/issues/8300).
if sys.platform != 'win32':
proc.communicate(b'q')
raise
# From [1], a PIPE opened in Popen() should be closed, unless
# .communicate() is called. Avoid leaking any PIPEs by using Popen
# as a context manager (newer Python 3.x and compat)
# Fixes "Resource Warning" in test/test_downloader_external.py
# [1] https://devpress.csdn.net/python/62fde12d7e66823466192e48.html
with compat_subprocess_Popen(args, stdin=subprocess.PIPE, env=env) as proc:
try:
retval = proc.wait()
except BaseException as e:
# subprocess.run would send the SIGKILL signal to ffmpeg and the
# mp4 file couldn't be played, but if we ask ffmpeg to quit it
# produces a file that is playable (this is mostly useful for live
# streams). Note that Windows is not affected and produces playable
# files (see https://github.com/ytdl-org/youtube-dl/issues/8300).
if isinstance(e, KeyboardInterrupt) and sys.platform != 'win32':
process_communicate_or_kill(proc, b'q')
else:
proc.kill()
raise
return retval

View File

@ -71,7 +71,7 @@ class FragmentFD(FileDownloader):
@staticmethod
def __do_ytdl_file(ctx):
return not ctx['live'] and not ctx['tmpfilename'] == '-'
return ctx['live'] is not True and ctx['tmpfilename'] != '-'
def _read_ytdl_file(self, ctx):
assert 'ytdl_corrupt' not in ctx
@ -101,6 +101,13 @@ class FragmentFD(FileDownloader):
'url': frag_url,
'http_headers': headers or info_dict.get('http_headers'),
}
frag_resume_len = 0
if ctx['dl'].params.get('continuedl', True):
frag_resume_len = self.filesize_or_none(
self.temp_name(fragment_filename))
fragment_info_dict['frag_resume_len'] = frag_resume_len
ctx['frag_resume_len'] = frag_resume_len or 0
success = ctx['dl'].download(fragment_filename, fragment_info_dict)
if not success:
return False, None
@ -124,9 +131,7 @@ class FragmentFD(FileDownloader):
del ctx['fragment_filename_sanitized']
def _prepare_frag_download(self, ctx):
if 'live' not in ctx:
ctx['live'] = False
if not ctx['live']:
if not ctx.setdefault('live', False):
total_frags_str = '%d' % ctx['total_frags']
ad_frags = ctx.get('ad_frags', 0)
if ad_frags:
@ -136,10 +141,11 @@ class FragmentFD(FileDownloader):
self.to_screen(
'[%s] Total fragments: %s' % (self.FD_NAME, total_frags_str))
self.report_destination(ctx['filename'])
continuedl = self.params.get('continuedl', True)
dl = HttpQuietDownloader(
self.ydl,
{
'continuedl': True,
'continuedl': continuedl,
'quiet': True,
'noprogress': True,
'ratelimit': self.params.get('ratelimit'),
@ -150,12 +156,11 @@ class FragmentFD(FileDownloader):
)
tmpfilename = self.temp_name(ctx['filename'])
open_mode = 'wb'
resume_len = 0
# Establish possible resume length
if os.path.isfile(encodeFilename(tmpfilename)):
resume_len = self.filesize_or_none(tmpfilename) or 0
if resume_len > 0:
open_mode = 'ab'
resume_len = os.path.getsize(encodeFilename(tmpfilename))
# Should be initialized before ytdl file check
ctx.update({
@ -164,7 +169,8 @@ class FragmentFD(FileDownloader):
})
if self.__do_ytdl_file(ctx):
if os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename']))):
ytdl_file_exists = os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename'])))
if continuedl and ytdl_file_exists:
self._read_ytdl_file(ctx)
is_corrupt = ctx.get('ytdl_corrupt') is True
is_inconsistent = ctx['fragment_index'] > 0 and resume_len == 0
@ -178,7 +184,12 @@ class FragmentFD(FileDownloader):
if 'ytdl_corrupt' in ctx:
del ctx['ytdl_corrupt']
self._write_ytdl_file(ctx)
else:
if not continuedl:
if ytdl_file_exists:
self._read_ytdl_file(ctx)
ctx['fragment_index'] = resume_len = 0
self._write_ytdl_file(ctx)
assert ctx['fragment_index'] == 0
@ -209,6 +220,7 @@ class FragmentFD(FileDownloader):
start = time.time()
ctx.update({
'started': start,
'fragment_started': start,
# Amount of fragment's bytes downloaded by the time of the previous
# frag progress hook invocation
'prev_frag_downloaded_bytes': 0,
@ -218,6 +230,9 @@ class FragmentFD(FileDownloader):
if s['status'] not in ('downloading', 'finished'):
return
if not total_frags and ctx.get('fragment_count'):
state['fragment_count'] = ctx['fragment_count']
time_now = time.time()
state['elapsed'] = time_now - start
frag_total_bytes = s.get('total_bytes') or 0
@ -232,16 +247,17 @@ class FragmentFD(FileDownloader):
ctx['fragment_index'] = state['fragment_index']
state['downloaded_bytes'] += frag_total_bytes - ctx['prev_frag_downloaded_bytes']
ctx['complete_frags_downloaded_bytes'] = state['downloaded_bytes']
ctx['speed'] = state['speed'] = self.calc_speed(
ctx['fragment_started'], time_now, frag_total_bytes)
ctx['fragment_started'] = time.time()
ctx['prev_frag_downloaded_bytes'] = 0
else:
frag_downloaded_bytes = s['downloaded_bytes']
state['downloaded_bytes'] += frag_downloaded_bytes - ctx['prev_frag_downloaded_bytes']
ctx['speed'] = state['speed'] = self.calc_speed(
ctx['fragment_started'], time_now, frag_downloaded_bytes - ctx['frag_resume_len'])
if not ctx['live']:
state['eta'] = self.calc_eta(
start, time_now, estimated_size - resume_len,
state['downloaded_bytes'] - resume_len)
state['speed'] = s.get('speed') or ctx.get('speed')
ctx['speed'] = state['speed']
state['eta'] = self.calc_eta(state['speed'], estimated_size - state['downloaded_bytes'])
ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes
self._hook_progress(state)
@ -268,7 +284,7 @@ class FragmentFD(FileDownloader):
os.utime(ctx['filename'], (time.time(), filetime))
except Exception:
pass
downloaded_bytes = os.path.getsize(encodeFilename(ctx['filename']))
downloaded_bytes = self.filesize_or_none(ctx['filename']) or 0
self._hook_progress({
'downloaded_bytes': downloaded_bytes,

View File

@ -58,9 +58,9 @@ class HttpFD(FileDownloader):
if self.params.get('continuedl', True):
# Establish possible resume length
if os.path.isfile(encodeFilename(ctx.tmpfilename)):
ctx.resume_len = os.path.getsize(
encodeFilename(ctx.tmpfilename))
ctx.resume_len = info_dict.get('frag_resume_len')
if ctx.resume_len is None:
ctx.resume_len = self.filesize_or_none(ctx.tmpfilename) or 0
ctx.is_resume = ctx.resume_len > 0
@ -115,9 +115,9 @@ class HttpFD(FileDownloader):
raise RetryDownload(err)
raise err
# When trying to resume, Content-Range HTTP header of response has to be checked
# to match the value of requested Range HTTP header. This is due to a webservers
# to match the value of requested Range HTTP header. This is due to webservers
# that don't support resuming and serve a whole file with no Content-Range
# set in response despite of requested Range (see
# set in response despite requested Range (see
# https://github.com/ytdl-org/youtube-dl/issues/6057#issuecomment-126129799)
if has_range:
content_range = ctx.data.headers.get('Content-Range')
@ -141,7 +141,8 @@ class HttpFD(FileDownloader):
# Content-Range is either not present or invalid. Assuming remote webserver is
# trying to send the whole file, resume is not possible, so wiping the local file
# and performing entire redownload
self.report_unable_to_resume()
if range_start > 0:
self.report_unable_to_resume()
ctx.resume_len = 0
ctx.open_mode = 'wb'
ctx.data_len = int_or_none(ctx.data.info().get('Content-length', None))
@ -293,10 +294,7 @@ class HttpFD(FileDownloader):
# Progress message
speed = self.calc_speed(start, now, byte_counter - ctx.resume_len)
if ctx.data_len is None:
eta = None
else:
eta = self.calc_eta(start, time.time(), ctx.data_len - ctx.resume_len, byte_counter - ctx.resume_len)
eta = self.calc_eta(speed, ctx.data_len and (ctx.data_len - byte_counter))
self._hook_progress({
'status': 'downloading',

View File

@ -0,0 +1,66 @@
# coding: utf-8
from __future__ import unicode_literals
try:
import threading
except ImportError:
threading = None
from .common import FileDownloader
from ..downloader import get_suitable_downloader
from ..extractor.niconico import NiconicoIE
from ..utils import sanitized_Request
class NiconicoDmcFD(FileDownloader):
""" Downloading niconico douga from DMC with heartbeat """
FD_NAME = 'niconico_dmc'
def real_download(self, filename, info_dict):
self.to_screen('[%s] Downloading from DMC' % self.FD_NAME)
ie = NiconicoIE(self.ydl)
info_dict, heartbeat_info_dict = ie._get_heartbeat_info(info_dict)
fd = get_suitable_downloader(info_dict, params=self.params)(self.ydl, self.params)
for ph in self._progress_hooks:
fd.add_progress_hook(ph)
if not threading:
self.to_screen('[%s] Threading for Heartbeat not available' % self.FD_NAME)
return fd.real_download(filename, info_dict)
success = download_complete = False
timer = [None]
heartbeat_lock = threading.Lock()
heartbeat_url = heartbeat_info_dict['url']
heartbeat_data = heartbeat_info_dict['data'].encode()
heartbeat_interval = heartbeat_info_dict.get('interval', 30)
request = sanitized_Request(heartbeat_url, heartbeat_data)
def heartbeat():
try:
self.ydl.urlopen(request).read()
except Exception:
self.to_screen('[%s] Heartbeat failed' % self.FD_NAME)
with heartbeat_lock:
if not download_complete:
timer[0] = threading.Timer(heartbeat_interval, heartbeat)
timer[0].start()
heartbeat_info_dict['ping']()
self.to_screen('[%s] Heartbeat with %d second interval ...' % (self.FD_NAME, heartbeat_interval))
try:
heartbeat()
if type(fd).__name__ == 'HlsFD':
info_dict.update(ie._extract_m3u8_formats(info_dict['url'], info_dict['id'])[0])
success = fd.real_download(filename, info_dict)
finally:
if heartbeat_lock:
with heartbeat_lock:
timer[0].cancel()
download_complete = True
return success

View File

@ -89,11 +89,13 @@ class RtmpFD(FileDownloader):
self.to_screen('')
cursor_in_new_line = True
self.to_screen('[rtmpdump] ' + line)
finally:
if not cursor_in_new_line:
self.to_screen('')
return proc.wait()
except BaseException: # Including KeyboardInterrupt
proc.kill()
proc.wait()
if not cursor_in_new_line:
self.to_screen('')
return proc.returncode
raise
url = info_dict['url']
player_url = info_dict.get('player_url')

View File

@ -31,30 +31,34 @@ from ..utils import (
class ADNIE(InfoExtractor):
IE_DESC = 'Anime Digital Network'
_VALID_URL = r'https?://(?:www\.)?animedigitalnetwork\.fr/video/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://animedigitalnetwork.fr/video/blue-exorcist-kyoto-saga/7778-episode-1-debut-des-hostilites',
'md5': '0319c99885ff5547565cacb4f3f9348d',
IE_DESC = 'Animation Digital Network'
_VALID_URL = r'https?://(?:www\.)?(?:animation|anime)digitalnetwork\.fr/video/[^/]+/(?P<id>\d+)'
_TESTS = [{
'url': 'https://animationdigitalnetwork.fr/video/fruits-basket/9841-episode-1-a-ce-soir',
'md5': '1c9ef066ceb302c86f80c2b371615261',
'info_dict': {
'id': '7778',
'id': '9841',
'ext': 'mp4',
'title': 'Blue Exorcist - Kyôto Saga - Episode 1',
'description': 'md5:2f7b5aa76edbc1a7a92cedcda8a528d5',
'series': 'Blue Exorcist - Kyôto Saga',
'duration': 1467,
'release_date': '20170106',
'title': 'Fruits Basket - Episode 1',
'description': 'md5:14be2f72c3c96809b0ca424b0097d336',
'series': 'Fruits Basket',
'duration': 1437,
'release_date': '20190405',
'comment_count': int,
'average_rating': float,
'season_number': 2,
'episode': 'Début des hostilités',
'season_number': 1,
'episode': 'À ce soir !',
'episode_number': 1,
}
}
},
'skip': 'Only available in region (FR, ...)',
}, {
'url': 'http://animedigitalnetwork.fr/video/blue-exorcist-kyoto-saga/7778-episode-1-debut-des-hostilites',
'only_matching': True,
}]
_NETRC_MACHINE = 'animedigitalnetwork'
_BASE_URL = 'http://animedigitalnetwork.fr'
_API_BASE_URL = 'https://gw.api.animedigitalnetwork.fr/'
_NETRC_MACHINE = 'animationdigitalnetwork'
_BASE = 'animationdigitalnetwork.fr'
_API_BASE_URL = 'https://gw.api.' + _BASE + '/'
_PLAYER_BASE_URL = _API_BASE_URL + 'player/'
_HEADERS = {}
_LOGIN_ERR_MESSAGE = 'Unable to log in'
@ -82,14 +86,14 @@ class ADNIE(InfoExtractor):
if subtitle_location:
enc_subtitles = self._download_webpage(
subtitle_location, video_id, 'Downloading subtitles data',
fatal=False, headers={'Origin': 'https://animedigitalnetwork.fr'})
fatal=False, headers={'Origin': 'https://' + self._BASE})
if not enc_subtitles:
return None
# http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
# http://animationdigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
bytes_to_intlist(compat_b64decode(enc_subtitles[24:])),
bytes_to_intlist(binascii.unhexlify(self._K + 'ab9f52f5baae7c72')),
bytes_to_intlist(binascii.unhexlify(self._K + '7fac1178830cfe0c')),
bytes_to_intlist(compat_b64decode(enc_subtitles[:24]))
))
subtitles_json = self._parse_json(
@ -138,9 +142,9 @@ Format: Marked,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text'''
if not username:
return
try:
url = self._API_BASE_URL + 'authentication/login'
access_token = (self._download_json(
self._API_BASE_URL + 'authentication/login', None,
'Logging in', self._LOGIN_ERR_MESSAGE, fatal=False,
url, None, 'Logging in', self._LOGIN_ERR_MESSAGE, fatal=False,
data=urlencode_postdata({
'password': password,
'rememberMe': False,
@ -153,7 +157,8 @@ Format: Marked,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text'''
message = None
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
resp = self._parse_json(
e.cause.read().decode(), None, fatal=False) or {}
self._webpage_read_content(e.cause, url, username),
username, fatal=False) or {}
message = resp.get('message') or resp.get('code')
self.report_warning(message or self._LOGIN_ERR_MESSAGE)
@ -211,7 +216,9 @@ Format: Marked,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text'''
# This usually goes away with a different random pkcs1pad, so retry
continue
error = self._parse_json(e.cause.read(), video_id)
error = self._parse_json(
self._webpage_read_content(e.cause, links_url, video_id),
video_id, fatal=False) or {}
message = error.get('message')
if e.cause.code == 403 and error.get('code') == 'player-bad-geolocation-country':
self.raise_geo_restricted(msg=message)

View File

@ -8,6 +8,8 @@ from ..utils import (
ExtractorError,
GeoRestrictedError,
int_or_none,
remove_start,
traverse_obj,
update_url_query,
urlencode_postdata,
)
@ -20,8 +22,8 @@ class AENetworksBaseIE(ThePlatformIE):
(?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
fyi\.tv
)/'''
_THEPLATFORM_KEY = 'crazyjava'
_THEPLATFORM_SECRET = 's3cr3t'
_THEPLATFORM_KEY = '43jXaGRQud'
_THEPLATFORM_SECRET = 'S10BPXHMlb'
_DOMAIN_MAP = {
'history.com': ('HISTORY', 'history'),
'aetv.com': ('AETV', 'aetv'),
@ -33,14 +35,17 @@ class AENetworksBaseIE(ThePlatformIE):
}
def _extract_aen_smil(self, smil_url, video_id, auth=None):
query = {'mbr': 'true'}
query = {
'mbr': 'true',
'formats': 'M3U+none,MPEG-DASH+none,MPEG4,MP3',
}
if auth:
query['auth'] = auth
TP_SMIL_QUERY = [{
'assetTypes': 'high_video_ak',
'switch': 'hls_high_ak'
'switch': 'hls_high_ak',
}, {
'assetTypes': 'high_video_s3'
'assetTypes': 'high_video_s3',
}, {
'assetTypes': 'high_video_s3',
'switch': 'hls_high_fastly',
@ -75,7 +80,14 @@ class AENetworksBaseIE(ThePlatformIE):
requestor_id, brand = self._DOMAIN_MAP[domain]
result = self._download_json(
'https://feeds.video.aetnd.com/api/v2/%s/videos' % brand,
filter_value, query={'filter[%s]' % filter_key: filter_value})['results'][0]
filter_value, query={'filter[%s]' % filter_key: filter_value})
result = traverse_obj(
result, ('results',
lambda k, v: k == 0 and v[filter_key] == filter_value),
get_all=False)
if not result:
raise ExtractorError('Show not found in A&E feed (too new?)', expected=True,
video_id=remove_start(filter_value, '/'))
title = result['title']
video_id = result['id']
media_url = result['publicUrl']
@ -126,7 +138,7 @@ class AENetworksIE(AENetworksBaseIE):
'skip_download': True,
},
'add_ie': ['ThePlatform'],
'skip': 'This video is only available for users of participating TV providers.',
'skip': 'Geo-restricted - This content is not available in your location.'
}, {
'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
'info_dict': {
@ -143,6 +155,7 @@ class AENetworksIE(AENetworksBaseIE):
'skip_download': True,
},
'add_ie': ['ThePlatform'],
'skip': 'This video is only available for users of participating TV providers.',
}, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
'only_matching': True

View File

@ -18,7 +18,7 @@ class AliExpressLiveIE(InfoExtractor):
'id': '2800002704436634',
'ext': 'mp4',
'title': 'CASIMA7.22',
'thumbnail': r're:http://.*\.jpg',
'thumbnail': r're:https?://.*\.jpg',
'uploader': 'CASIMA Official Store',
'timestamp': 1500717600,
'upload_date': '20170722',

View File

@ -0,0 +1,89 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
dict_get,
get_element_by_class,
int_or_none,
unified_strdate,
url_or_none,
)
class Alsace20TVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?alsace20\.tv/(?:[\w-]+/)+[\w-]+-(?P<id>[\w]+)'
_TESTS = [{
'url': 'https://www.alsace20.tv/VOD/Actu/JT/Votre-JT-jeudi-3-fevrier-lyNHCXpYJh.html',
# 'md5': 'd91851bf9af73c0ad9b2cdf76c127fbb',
'info_dict': {
'id': 'lyNHCXpYJh',
'ext': 'mp4',
'description': 'md5:fc0bc4a0692d3d2dba4524053de4c7b7',
'title': 'Votre JT du jeudi 3 février',
'upload_date': '20220203',
'thumbnail': r're:https?://.+\.jpg',
'duration': 1073,
'view_count': int,
},
'params': {
'format': 'bestvideo',
},
}]
def _extract_video(self, video_id, url=None):
info = self._download_json(
'https://www.alsace20.tv/visionneuse/visio_v9_js.php?key=%s&habillage=0&mode=html' % (video_id, ),
video_id) or {}
title = info['titre']
formats = []
for res, fmt_url in (info.get('files') or {}).items():
formats.extend(
self._extract_smil_formats(fmt_url, video_id, fatal=False)
if '/smil:_' in fmt_url
else self._extract_mpd_formats(fmt_url, video_id, mpd_id=res, fatal=False))
self._sort_formats(formats)
webpage = (url and self._download_webpage(url, video_id, fatal=False)) or ''
thumbnail = url_or_none(dict_get(info, ('image', 'preview', )) or self._og_search_thumbnail(webpage))
upload_date = self._search_regex(r'/(\d{6})_', thumbnail, 'upload_date', default=None)
upload_date = unified_strdate('20%s-%s-%s' % (upload_date[:2], upload_date[2:4], upload_date[4:])) if upload_date else None
return {
'id': video_id,
'title': title,
'formats': formats,
'description': clean_html(get_element_by_class('wysiwyg', webpage)),
'upload_date': upload_date,
'thumbnail': thumbnail,
'duration': int_or_none(self._og_search_property('video:duration', webpage) if webpage else None),
'view_count': int_or_none(info.get('nb_vues')),
}
def _real_extract(self, url):
video_id = self._match_id(url)
return self._extract_video(video_id, url)
class Alsace20TVEmbedIE(Alsace20TVIE):
_VALID_URL = r'https?://(?:www\.)?alsace20\.tv/emb/(?P<id>[\w]+)'
_TESTS = [{
'url': 'https://www.alsace20.tv/emb/lyNHCXpYJh',
# 'md5': 'd91851bf9af73c0ad9b2cdf76c127fbb',
'info_dict': {
'id': 'lyNHCXpYJh',
'ext': 'mp4',
'title': 'Votre JT du jeudi 3 février',
'upload_date': '20220203',
'thumbnail': r're:https?://.+\.jpg',
'view_count': int,
},
'params': {
'format': 'bestvideo',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self._extract_video(video_id)

View File

@ -15,7 +15,7 @@ from ..utils import (
class AmericasTestKitchenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:americastestkitchen|cooks(?:country|illustrated))\.com/(?P<resource_type>episode|videos)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?(?:americastestkitchen|cooks(?:country|illustrated))\.com/(?:cooks(?:country|illustrated)/)?(?P<resource_type>episode|videos)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers',
'md5': 'b861c3e365ac38ad319cfd509c30577f',
@ -23,15 +23,20 @@ class AmericasTestKitchenIE(InfoExtractor):
'id': '5b400b9ee338f922cb06450c',
'title': 'Japanese Suppers',
'ext': 'mp4',
'display_id': 'weeknight-japanese-suppers',
'description': 'md5:64e606bfee910627efc4b5f050de92b3',
'thumbnail': r're:^https?://',
'timestamp': 1523318400,
'upload_date': '20180410',
'release_date': '20180410',
'timestamp': 1523304000,
'upload_date': '20180409',
'release_date': '20180409',
'series': "America's Test Kitchen",
'season': 'Season 18',
'season_number': 18,
'episode': 'Japanese Suppers',
'episode_number': 15,
'duration': 1376,
'thumbnail': r're:^https?://',
'average_rating': 0,
'view_count': int,
},
'params': {
'skip_download': True,
@ -44,15 +49,20 @@ class AmericasTestKitchenIE(InfoExtractor):
'id': '5fbe8c61bda2010001c6763b',
'title': 'Simple Chicken Dinner',
'ext': 'mp4',
'display_id': 'atktv_2103_simple-chicken-dinner_full-episode_web-mp4',
'description': 'md5:eb68737cc2fd4c26ca7db30139d109e7',
'thumbnail': r're:^https?://',
'timestamp': 1610755200,
'upload_date': '20210116',
'release_date': '20210116',
'timestamp': 1610737200,
'upload_date': '20210115',
'release_date': '20210115',
'series': "America's Test Kitchen",
'season': 'Season 21',
'season_number': 21,
'episode': 'Simple Chicken Dinner',
'episode_number': 3,
'duration': 1397,
'thumbnail': r're:^https?://',
'view_count': int,
'average_rating': 0,
},
'params': {
'skip_download': True,
@ -60,6 +70,12 @@ class AmericasTestKitchenIE(InfoExtractor):
}, {
'url': 'https://www.americastestkitchen.com/videos/3420-pan-seared-salmon',
'only_matching': True,
}, {
'url': 'https://www.americastestkitchen.com/cookscountry/episode/564-when-only-chocolate-will-do',
'only_matching': True,
}, {
'url': 'https://www.americastestkitchen.com/cooksillustrated/videos/4478-beef-wellington',
'only_matching': True,
}, {
'url': 'https://www.cookscountry.com/episode/564-when-only-chocolate-will-do',
'only_matching': True,
@ -94,7 +110,7 @@ class AmericasTestKitchenIE(InfoExtractor):
class AmericasTestKitchenSeasonIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<show>americastestkitchen|cookscountry)\.com/episodes/browse/season_(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?(?P<show>americastestkitchen|(?P<cooks>cooks(?:country|illustrated)))\.com(?:(?:/(?P<show2>cooks(?:country|illustrated)))?(?:/?$|(?<!ated)(?<!ated\.com)/episodes/browse/season_(?P<season>\d+)))'
_TESTS = [{
# ATK Season
'url': 'https://www.americastestkitchen.com/episodes/browse/season_1',
@ -105,48 +121,93 @@ class AmericasTestKitchenSeasonIE(InfoExtractor):
'playlist_count': 13,
}, {
# Cooks Country Season
'url': 'https://www.cookscountry.com/episodes/browse/season_12',
'url': 'https://www.americastestkitchen.com/cookscountry/episodes/browse/season_12',
'info_dict': {
'id': 'season_12',
'title': 'Season 12',
},
'playlist_count': 13,
}, {
# America's Test Kitchen Series
'url': 'https://www.americastestkitchen.com/',
'info_dict': {
'id': 'americastestkitchen',
'title': 'America\'s Test Kitchen',
},
'playlist_count': 558,
}, {
# Cooks Country Series
'url': 'https://www.americastestkitchen.com/cookscountry',
'info_dict': {
'id': 'cookscountry',
'title': 'Cook\'s Country',
},
'playlist_count': 199,
}, {
'url': 'https://www.americastestkitchen.com/cookscountry/',
'only_matching': True,
}, {
'url': 'https://www.cookscountry.com/episodes/browse/season_12',
'only_matching': True,
}, {
'url': 'https://www.cookscountry.com',
'only_matching': True,
}, {
'url': 'https://www.americastestkitchen.com/cooksillustrated/',
'only_matching': True,
}, {
'url': 'https://www.cooksillustrated.com',
'only_matching': True,
}]
def _real_extract(self, url):
show_name, season_number = re.match(self._VALID_URL, url).groups()
season_number = int(season_number)
match = re.match(self._VALID_URL, url).groupdict()
show = match.get('show2')
show_path = ('/' + show) if show else ''
show = show or match['show']
season_number = int_or_none(match.get('season'))
slug = 'atk' if show_name == 'americastestkitchen' else 'cco'
slug, title = {
'americastestkitchen': ('atk', 'America\'s Test Kitchen'),
'cookscountry': ('cco', 'Cook\'s Country'),
'cooksillustrated': ('cio', 'Cook\'s Illustrated'),
}[show]
season = 'Season %d' % season_number
facet_filters = [
'search_document_klass:episode',
'search_show_slug:' + slug,
]
if season_number:
playlist_id = 'season_%d' % season_number
playlist_title = 'Season %d' % season_number
facet_filters.append('search_season_list:' + playlist_title)
else:
playlist_id = show
playlist_title = title
season_search = self._download_json(
'https://y1fnzxui30-dsn.algolia.net/1/indexes/everest_search_%s_season_desc_production' % slug,
season, headers={
'Origin': 'https://www.%s.com' % show_name,
playlist_id, headers={
'Origin': 'https://www.americastestkitchen.com',
'X-Algolia-API-Key': '8d504d0099ed27c1b73708d22871d805',
'X-Algolia-Application-Id': 'Y1FNZXUI30',
}, query={
'facetFilters': json.dumps([
'search_season_list:' + season,
'search_document_klass:episode',
'search_show_slug:' + slug,
]),
'attributesToRetrieve': 'description,search_%s_episode_number,search_document_date,search_url,title' % slug,
'facetFilters': json.dumps(facet_filters),
'attributesToRetrieve': 'description,search_%s_episode_number,search_document_date,search_url,title,search_atk_episode_season' % slug,
'attributesToHighlight': '',
'hitsPerPage': 1000,
})
def entries():
for episode in (season_search.get('hits') or []):
search_url = episode.get('search_url')
search_url = episode.get('search_url') # always formatted like '/episode/123-title-of-episode'
if not search_url:
continue
yield {
'_type': 'url',
'url': 'https://www.%s.com%s' % (show_name, search_url),
'id': try_get(episode, lambda e: e['objectID'].split('_')[-1]),
'url': 'https://www.americastestkitchen.com%s%s' % (show_path, search_url),
'id': try_get(episode, lambda e: e['objectID'].rsplit('_', 1)[-1]),
'title': episode.get('title'),
'description': episode.get('description'),
'timestamp': unified_timestamp(episode.get('search_document_date')),
@ -156,4 +217,4 @@ class AmericasTestKitchenSeasonIE(InfoExtractor):
}
return self.playlist_result(
entries(), 'season_%d' % season_number, season)
entries(), playlist_id, playlist_title)

View File

@ -9,10 +9,10 @@ from ..utils import (
class AppleConnectIE(InfoExtractor):
_VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/idsa\.(?P<id>[\w-]+)'
_TEST = {
_VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/(?:id)?sa\.(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://itunes.apple.com/us/post/idsa.4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'md5': 'e7c38568a01ea45402570e6029206723',
'md5': 'c1d41f72c8bcaf222e089434619316e4',
'info_dict': {
'id': '4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'ext': 'm4v',
@ -22,7 +22,10 @@ class AppleConnectIE(InfoExtractor):
'upload_date': '20150710',
'timestamp': 1436545535,
},
}
}, {
'url': 'https://itunes.apple.com/us/post/sa.0fe0229f-2457-11e5-9f40-1bb645f2d5d9',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@ -36,7 +39,7 @@ class AppleConnectIE(InfoExtractor):
video_data = self._parse_json(video_json, video_id)
timestamp = str_to_int(self._html_search_regex(r'data-timestamp="(\d+)"', webpage, 'timestamp'))
like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count'))
like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count', default=None))
return {
'id': video_id,

View File

@ -3,8 +3,11 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
clean_podcast_url,
get_element_by_class,
int_or_none,
parse_codecs,
parse_iso8601,
try_get,
)
@ -14,16 +17,17 @@ class ApplePodcastsIE(InfoExtractor):
_VALID_URL = r'https?://podcasts\.apple\.com/(?:[^/]+/)?podcast(?:/[^/]+){1,2}.*?\bi=(?P<id>\d+)'
_TESTS = [{
'url': 'https://podcasts.apple.com/us/podcast/207-whitney-webb-returns/id1135137367?i=1000482637777',
'md5': 'df02e6acb11c10e844946a39e7222b08',
'md5': '41dc31cd650143e530d9423b6b5a344f',
'info_dict': {
'id': '1000482637777',
'ext': 'mp3',
'title': '207 - Whitney Webb Returns',
'description': 'md5:13a73bade02d2e43737751e3987e1399',
'description': 'md5:75ef4316031df7b41ced4e7b987f79c6',
'upload_date': '20200705',
'timestamp': 1593921600,
'duration': 6425,
'timestamp': 1593932400,
'duration': 6454,
'series': 'The Tim Dillon Show',
'thumbnail': 're:.+[.](png|jpe?g|webp)',
}
}, {
'url': 'https://podcasts.apple.com/podcast/207-whitney-webb-returns/id1135137367?i=1000482637777',
@ -39,19 +43,40 @@ class ApplePodcastsIE(InfoExtractor):
def _real_extract(self, url):
episode_id = self._match_id(url)
webpage = self._download_webpage(url, episode_id)
ember_data = self._parse_json(self._search_regex(
r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
webpage, 'ember data'), episode_id)
ember_data = ember_data.get(episode_id) or ember_data
episode = ember_data['data']['attributes']
episode_data = {}
ember_data = {}
# new page type 2021-11
amp_data = self._parse_json(self._search_regex(
r'(?s)id="shoebox-media-api-cache-amp-podcasts"[^>]*>\s*({.+?})\s*<',
webpage, 'AMP data', default='{}'), episode_id, fatal=False) or {}
amp_data = try_get(amp_data,
lambda a: self._parse_json(
next(a[x] for x in iter(a) if episode_id in x),
episode_id),
dict) or {}
amp_data = amp_data.get('d') or []
episode_data = try_get(
amp_data,
lambda a: next(x for x in a
if x['type'] == 'podcast-episodes' and x['id'] == episode_id),
dict)
if not episode_data:
# try pre 2021-11 page type: TODO: consider deleting if no longer used
ember_data = self._parse_json(self._search_regex(
r'(?s)id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
webpage, 'ember data'), episode_id) or {}
ember_data = ember_data.get(episode_id) or ember_data
episode_data = try_get(ember_data, lambda x: x['data'], dict)
episode = episode_data['attributes']
description = episode.get('description') or {}
series = None
for inc in (ember_data.get('included') or []):
for inc in (amp_data or ember_data.get('included') or []):
if inc.get('type') == 'media/podcast':
series = try_get(inc, lambda x: x['attributes']['name'])
series = series or clean_html(get_element_by_class('podcast-header__identity', webpage))
return {
info = [{
'id': episode_id,
'title': episode['name'],
'url': clean_podcast_url(episode['assetUrl']),
@ -59,4 +84,10 @@ class ApplePodcastsIE(InfoExtractor):
'timestamp': parse_iso8601(episode.get('releaseDateTime')),
'duration': int_or_none(episode.get('durationInMilliseconds'), 1000),
'series': series,
}
'thumbnail': self._og_search_thumbnail(webpage),
}]
self._sort_formats(info)
info = info[0]
codecs = parse_codecs(info.get('ext', 'mp3'))
info.update(codecs)
return info

View File

@ -332,9 +332,24 @@ class ARDIE(InfoExtractor):
formats.append(f)
self._sort_formats(formats)
_SUB_FORMATS = (
('./dataTimedText', 'ttml'),
('./dataTimedTextNoOffset', 'ttml'),
('./dataTimedTextVtt', 'vtt'),
)
subtitles = {}
for subsel, subext in _SUB_FORMATS:
for node in video_node.findall(subsel):
subtitles.setdefault('de', []).append({
'url': node.attrib['url'],
'ext': subext,
})
return {
'id': xpath_text(video_node, './videoId', default=display_id),
'formats': formats,
'subtitles': subtitles,
'display_id': display_id,
'title': video_node.find('./title').text,
'duration': parse_duration(video_node.find('./duration').text),

View File

@ -12,6 +12,7 @@ from ..utils import (
ExtractorError,
int_or_none,
qualities,
strip_or_none,
try_get,
unified_strdate,
url_or_none,
@ -252,3 +253,49 @@ class ArteTVPlaylistIE(ArteTVBaseIE):
title = collection.get('title')
description = collection.get('shortDescription') or collection.get('teaserText')
return self.playlist_result(entries, playlist_id, title, description)
class ArteTVCategoryIE(ArteTVBaseIE):
_VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>%s)/videos/(?P<id>[\w-]+(?:/[\w-]+)*)/?\s*$' % ArteTVBaseIE._ARTE_LANGUAGES
_TESTS = [{
'url': 'https://www.arte.tv/en/videos/politics-and-society/',
'info_dict': {
'id': 'politics-and-society',
'title': 'Politics and society',
'description': 'Investigative documentary series, geopolitical analysis, and international commentary',
},
'playlist_mincount': 13,
},
]
@classmethod
def suitable(cls, url):
return (
not any(ie.suitable(url) for ie in (ArteTVIE, ArteTVPlaylistIE, ))
and super(ArteTVCategoryIE, cls).suitable(url))
def _real_extract(self, url):
lang, playlist_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, playlist_id)
items = []
for video in re.finditer(
r'<a\b[^>]*?href\s*=\s*(?P<q>"|\'|\b)(?P<url>https?://www\.arte\.tv/%s/videos/[\w/-]+)(?P=q)' % lang,
webpage):
video = video.group('url')
if video == url:
continue
if any(ie.suitable(video) for ie in (ArteTVIE, ArteTVPlaylistIE, )):
items.append(video)
if items:
title = (self._og_search_title(webpage, default=None)
or self._html_search_regex(r'<title\b[^>]*>([^<]+)</title>', default=None))
title = strip_or_none(title.rsplit('|', 1)[0]) or self._generic_title(url)
result = self.playlist_from_matches(items, playlist_id=playlist_id, playlist_title=title)
if result:
description = self._og_search_description(webpage, default=None)
if description:
result['description'] = description
return result

View File

@ -14,7 +14,7 @@ from ..utils import (
class AudiomackIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?audiomack\.com/song/(?P<id>[\w/-]+)'
_VALID_URL = r'https?://(?:www\.)?audiomack\.com/(?:song/|(?=.+/song/))(?P<id>[\w/-]+)'
IE_NAME = 'audiomack'
_TESTS = [
# hosted on audiomack
@ -29,25 +29,27 @@ class AudiomackIE(InfoExtractor):
}
},
# audiomack wrapper around soundcloud song
# Needs new test URL.
{
'add_ie': ['Soundcloud'],
'url': 'http://www.audiomack.com/song/hip-hop-daily/black-mamba-freestyle',
'info_dict': {
'id': '258901379',
'ext': 'mp3',
'description': 'mamba day freestyle for the legend Kobe Bryant ',
'title': 'Black Mamba Freestyle [Prod. By Danny Wolf]',
'uploader': 'ILOVEMAKONNEN',
'upload_date': '20160414',
}
'only_matching': True,
# 'info_dict': {
# 'id': '258901379',
# 'ext': 'mp3',
# 'description': 'mamba day freestyle for the legend Kobe Bryant ',
# 'title': 'Black Mamba Freestyle [Prod. By Danny Wolf]',
# 'uploader': 'ILOVEMAKONNEN',
# 'upload_date': '20160414',
# }
},
]
def _real_extract(self, url):
# URLs end with [uploader name]/[uploader title]
# URLs end with [uploader name]/song/[uploader title]
# this title is whatever the user types in, and is rarely
# the proper song title. Real metadata is in the api response
album_url_tag = self._match_id(url)
album_url_tag = self._match_id(url).replace('/song/', '/')
# Request the extended version of the api for extra fields like artist and title
api_response = self._download_json(
@ -73,13 +75,13 @@ class AudiomackIE(InfoExtractor):
class AudiomackAlbumIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?audiomack\.com/album/(?P<id>[\w/-]+)'
_VALID_URL = r'https?://(?:www\.)?audiomack\.com/(?:album/|(?=.+/album/))(?P<id>[\w/-]+)'
IE_NAME = 'audiomack:album'
_TESTS = [
# Standard album playlist
{
'url': 'http://www.audiomack.com/album/flytunezcom/tha-tour-part-2-mixtape',
'playlist_count': 15,
'playlist_count': 11,
'info_dict':
{
'id': '812251',
@ -95,24 +97,24 @@ class AudiomackAlbumIE(InfoExtractor):
},
'playlist': [{
'info_dict': {
'title': 'PPP (Pistol P Project) - 9. Heaven or Hell (CHIMACA) ft Zuse (prod by DJ FU)',
'id': '837577',
'title': 'PPP (Pistol P Project) - 10. 4 Minutes Of Hell Part 4 (prod by DY OF 808 MAFIA)',
'id': '837580',
'ext': 'mp3',
'uploader': 'Lil Herb a.k.a. G Herbo',
}
}],
'params': {
'playliststart': 9,
'playlistend': 9,
'playliststart': 2,
'playlistend': 2,
}
}
]
def _real_extract(self, url):
# URLs end with [uploader name]/[uploader title]
# URLs end with [uploader name]/album/[uploader title]
# this title is whatever the user types in, and is rarely
# the proper song title. Real metadata is in the api response
album_url_tag = self._match_id(url)
album_url_tag = self._match_id(url).replace('/album/', '/')
result = {'_type': 'playlist', 'entries': []}
# There is no one endpoint for album metadata - instead it is included/repeated in each song's metadata
# Therefore we don't know how many songs the album has and must infi-loop until failure
@ -134,7 +136,7 @@ class AudiomackAlbumIE(InfoExtractor):
# Pull out the album metadata and add to result (if it exists)
for resultkey, apikey in [('id', 'album_id'), ('title', 'album_title')]:
if apikey in api_response and resultkey not in result:
result[resultkey] = api_response[apikey]
result[resultkey] = compat_str(api_response[apikey])
song_id = url_basename(api_response['url']).rpartition('.')[0]
result['entries'].append({
'id': compat_str(api_response.get('id', song_id)),

View File

@ -12,6 +12,7 @@ from ..compat import (
compat_HTTPError,
compat_parse_qs,
compat_str,
compat_urllib_error,
compat_urllib_parse_urlparse,
compat_urlparse,
)
@ -39,7 +40,7 @@ from ..utils import (
class BBCCoUkIE(InfoExtractor):
IE_NAME = 'bbc.co.uk'
IE_DESC = 'BBC iPlayer'
_ID_REGEX = r'(?:[pbm][\da-z]{7}|w[\da-z]{7,14})'
_ID_REGEX = r'(?:[pbml][\da-z]{7}|w[\da-z]{7,14})'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?bbc\.co\.uk/
@ -395,9 +396,17 @@ class BBCCoUkIE(InfoExtractor):
formats.extend(self._extract_mpd_formats(
href, programme_id, mpd_id=format_id, fatal=False))
elif transfer_format == 'hls':
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
# TODO: let expected_status be passed into _extract_xxx_formats() instead
try:
fmts = self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False)
except ExtractorError as e:
if not (isinstance(e.exc_info[1], compat_urllib_error.HTTPError)
and e.exc_info[1].code in (403, 404)):
raise
fmts = []
formats.extend(fmts)
elif transfer_format == 'hds':
formats.extend(self._extract_f4m_formats(
href, programme_id, f4m_id=format_id, fatal=False))
@ -775,21 +784,33 @@ class BBCIE(BBCCoUkIE):
'timestamp': 1437785037,
'upload_date': '20150725',
},
}, {
# video with window.__INITIAL_DATA__ and value as JSON string
'url': 'https://www.bbc.com/news/av/world-europe-59468682',
'info_dict': {
'id': 'p0b71qth',
'ext': 'mp4',
'title': 'Why France is making this woman a national hero',
'description': 'md5:7affdfab80e9c3a1f976230a1ff4d5e4',
'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1638230731,
'upload_date': '20211130',
},
}, {
# single video article embedded with data-media-vpid
'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
'only_matching': True,
}, {
# bbcthreeConfig
'url': 'https://www.bbc.co.uk/bbcthree/clip/73d0bbd0-abc3-4cea-b3c0-cdae21905eb1',
'info_dict': {
'id': 'p06556y7',
'ext': 'mp4',
'title': 'Transfers: Cristiano Ronaldo to Man Utd, Arsenal to spend?',
'description': 'md5:4b7dfd063d5a789a1512e99662be3ddd',
'title': 'Things Not To Say to people that live on council estates',
'description': "From being labelled a 'chav', to the presumption that they're 'scroungers', people who live on council estates encounter all kinds of prejudices and false assumptions about themselves, their families, and their lifestyles. Here, eight people discuss the common statements, misconceptions, and clichés that they're tired of hearing.",
'duration': 360,
'thumbnail': r're:https?://.+/.+\.jpg',
},
'params': {
'skip_download': True,
}
}, {
# window.__PRELOADED_STATE__
'url': 'https://www.bbc.co.uk/radio/play/b0b9z4yl',
@ -1162,9 +1183,16 @@ class BBCIE(BBCCoUkIE):
return self.playlist_result(
entries, playlist_id, playlist_title, playlist_description)
initial_data = self._parse_json(self._search_regex(
r'window\.__INITIAL_DATA__\s*=\s*({.+?});', webpage,
'preload state', default='{}'), playlist_id, fatal=False)
initial_data = self._search_regex(
r'window\.__INITIAL_DATA__\s*=\s*("{.+?}")\s*;', webpage,
'quoted preload state', default=None)
if initial_data is None:
initial_data = self._search_regex(
r'window\.__INITIAL_DATA__\s*=\s*({.+?})\s*;', webpage,
'preload state', default={})
else:
initial_data = self._parse_json(initial_data or '"{}"', playlist_id, fatal=False)
initial_data = self._parse_json(initial_data, playlist_id, fatal=False)
if initial_data:
def parse_media(media):
if not media:
@ -1205,7 +1233,10 @@ class BBCIE(BBCCoUkIE):
if name == 'media-experience':
parse_media(try_get(resp, lambda x: x['data']['initialItem']['mediaItem'], dict))
elif name == 'article':
for block in (try_get(resp, lambda x: x['data']['blocks'], list) or []):
for block in (try_get(resp,
(lambda x: x['data']['blocks'],
lambda x: x['data']['content']['model']['blocks'],),
list) or []):
if block.get('type') != 'media':
continue
parse_media(block.get('model'))

View File

@ -0,0 +1,59 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import ExtractorError, urlencode_postdata
class BigoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bigo\.tv/(?:[a-z]{2,}/)?(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://www.bigo.tv/ja/221338632',
'info_dict': {
'id': '6576287577575737440',
'title': '土よ〜💁‍♂️ 休憩室/REST room',
'thumbnail': r're:https?://.+',
'uploader': '✨Shin💫',
'uploader_id': '221338632',
'is_live': True,
},
'skip': 'livestream',
}, {
'url': 'https://www.bigo.tv/th/Tarlerm1304',
'only_matching': True,
}, {
'url': 'https://bigo.tv/115976881',
'only_matching': True,
}]
def _real_extract(self, url):
user_id = self._match_id(url)
info_raw = self._download_json(
'https://bigo.tv/studio/getInternalStudioInfo',
user_id, data=urlencode_postdata({'siteId': user_id}))
if not isinstance(info_raw, dict):
raise ExtractorError('Received invalid JSON data')
if info_raw.get('code'):
raise ExtractorError(
'Bigo says: %s (code %s)' % (info_raw.get('msg'), info_raw.get('code')), expected=True)
info = info_raw.get('data') or {}
if not info.get('alive'):
raise ExtractorError('This user is offline.', expected=True)
return {
'id': info.get('roomId') or user_id,
'title': info.get('roomTopic') or info.get('nick_name') or user_id,
'formats': [{
'url': info.get('hls_src'),
'ext': 'mp4',
'protocol': 'm3u8',
}],
'thumbnail': info.get('snapshot'),
'uploader': info.get('nick_name'),
'uploader_id': user_id,
'is_live': True,
}

View File

@ -233,7 +233,7 @@ class BiliBiliIE(InfoExtractor):
webpage)
if uploader_mobj:
info.update({
'uploader': uploader_mobj.group('name'),
'uploader': uploader_mobj.group('name').strip(),
'uploader_id': uploader_mobj.group('id'),
})
if not info.get('uploader'):
@ -369,6 +369,11 @@ class BilibiliAudioIE(BilibiliAudioBaseIE):
'filesize': int_or_none(play_data.get('size')),
}]
for a_format in formats:
a_format.setdefault('http_headers', {}).update({
'Referer': url,
})
song = self._call_api('song/info', au_id)
title = song['title']
statistic = song.get('statistic') or {}

View File

@ -0,0 +1,173 @@
# coding: utf-8
from __future__ import unicode_literals
import json
from ..utils import (
strip_or_none,
traverse_obj,
)
from .common import InfoExtractor
class BlerpIE(InfoExtractor):
IE_NAME = 'blerp'
_VALID_URL = r'https?://(?:www\.)?blerp\.com/soundbites/(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'https://blerp.com/soundbites/6320fe8745636cb4dd677a5a',
'info_dict': {
'id': '6320fe8745636cb4dd677a5a',
'title': 'Samsung Galaxy S8 Over the Horizon Ringtone 2016',
'uploader': 'luminousaj',
'uploader_id': '5fb81e51aa66ae000c395478',
'ext': 'mp3',
'tags': ['samsung', 'galaxy', 's8', 'over the horizon', '2016', 'ringtone'],
}
}, {
'url': 'https://blerp.com/soundbites/5bc94ef4796001000498429f',
'info_dict': {
'id': '5bc94ef4796001000498429f',
'title': 'Yee',
'uploader': '179617322678353920',
'uploader_id': '5ba99cf71386730004552c42',
'ext': 'mp3',
'tags': ['YEE', 'YEET', 'wo ha haah catchy tune yee', 'yee']
}
}]
_GRAPHQL_OPERATIONNAME = "webBitePageGetBite"
_GRAPHQL_QUERY = (
'''query webBitePageGetBite($_id: MongoID!) {
web {
biteById(_id: $_id) {
...bitePageFrag
__typename
}
__typename
}
}
fragment bitePageFrag on Bite {
_id
title
userKeywords
keywords
color
visibility
isPremium
owned
price
extraReview
isAudioExists
image {
filename
original {
url
__typename
}
__typename
}
userReactions {
_id
reactions
createdAt
__typename
}
topReactions
totalSaveCount
saved
blerpLibraryType
license
licenseMetaData
playCount
totalShareCount
totalFavoriteCount
totalAddedToBoardCount
userCategory
userAudioQuality
audioCreationState
transcription
userTranscription
description
createdAt
updatedAt
author
listingType
ownerObject {
_id
username
profileImage {
filename
original {
url
__typename
}
__typename
}
__typename
}
transcription
favorited
visibility
isCurated
sourceUrl
audienceRating
strictAudienceRating
ownerId
reportObject {
reportedContentStatus
__typename
}
giphy {
mp4
gif
__typename
}
audio {
filename
original {
url
__typename
}
mp3 {
url
__typename
}
__typename
}
__typename
}
''')
def _real_extract(self, url):
audio_id = self._match_id(url)
data = {
'operationName': self._GRAPHQL_OPERATIONNAME,
'query': self._GRAPHQL_QUERY,
'variables': {
'_id': audio_id
}
}
headers = {
'Content-Type': 'application/json'
}
json_result = self._download_json('https://api.blerp.com/graphql',
audio_id, data=json.dumps(data).encode('utf-8'), headers=headers)
bite_json = json_result['data']['web']['biteById']
info_dict = {
'id': bite_json['_id'],
'url': bite_json['audio']['mp3']['url'],
'title': bite_json['title'],
'uploader': traverse_obj(bite_json, ('ownerObject', 'username'), expected_type=strip_or_none),
'uploader_id': traverse_obj(bite_json, ('ownerObject', '_id'), expected_type=strip_or_none),
'ext': 'mp3',
'tags': list(filter(None, map(strip_or_none, (traverse_obj(bite_json, 'userKeywords', expected_type=list) or []))) or None)
}
return info_dict

View File

@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
import re
@ -12,13 +13,28 @@ from ..utils import (
class BongaCamsIE(InfoExtractor):
_VALID_URL = r'https?://(?P<host>(?:[^/]+\.)?bongacams\d*\.com)/(?P<id>[^/?&#]+)'
_VALID_URL = r'https?://(?P<host>(?:[^/]+\.)?bongacams\d*\.(?:com|net))/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://de.bongacams.com/azumi-8',
'only_matching': True,
}, {
'url': 'https://cn.bongacams.com/azumi-8',
'only_matching': True,
}, {
'url': 'https://de.bongacams.net/claireashton',
'info_dict': {
'id': 'claireashton',
'ext': 'mp4',
'title': r're:ClaireAshton \d{4}-\d{2}-\d{2} \d{2}:\d{2}',
'age_limit': 18,
'uploader_id': 'ClaireAshton',
'uploader': 'ClaireAshton',
'like_count': int,
'is_live': True,
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):

View File

@ -0,0 +1,79 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
merge_dicts,
parse_iso8601,
T,
traverse_obj,
txt_or_none,
urljoin,
)
class CaffeineTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?caffeine\.tv/[^/]+/video/(?P<id>[0-9a-f-]+)'
_TESTS = [{
'url': 'https://www.caffeine.tv/TsuSurf/video/cffc0a00-e73f-11ec-8080-80017d29f26e',
'info_dict': {
'id': 'cffc0a00-e73f-11ec-8080-80017d29f26e',
'ext': 'mp4',
'title': 'GOOOOD MORNINNNNN #highlights',
'timestamp': 1654702180,
'upload_date': '20220608',
'uploader': 'TsuSurf',
'duration': 3145,
'age_limit': 17,
},
'params': {
'format': 'bestvideo',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
json_data = self._download_json(
'https://api.caffeine.tv/social/public/activity/' + video_id,
video_id)
broadcast_info = traverse_obj(json_data, ('broadcast_info', T(dict))) or {}
title = broadcast_info['broadcast_title']
video_url = broadcast_info['video_url']
ext = determine_ext(video_url)
if ext == 'm3u8':
formats = self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8',
fatal=False)
else:
formats = [{'url': video_url}]
self._sort_formats(formats)
return merge_dicts({
'id': video_id,
'title': title,
'formats': formats,
}, traverse_obj(json_data, {
'uploader': ((None, 'user'), 'username'),
}, get_all=False), traverse_obj(json_data, {
'like_count': ('like_count', T(int_or_none)),
'view_count': ('view_count', T(int_or_none)),
'comment_count': ('comment_count', T(int_or_none)),
'tags': ('tags', Ellipsis, T(txt_or_none)),
'is_live': 'is_live',
'uploader': ('user', 'name'),
}), traverse_obj(broadcast_info, {
'duration': ('content_duration', T(int_or_none)),
'timestamp': ('broadcast_start_time', T(parse_iso8601)),
'thumbnail': ('preview_image_path', T(lambda u: urljoin(url, u))),
'age_limit': ('content_rating', T(lambda r: r and {
# assume Apple Store ratings [1]
# 1. https://en.wikipedia.org/wiki/Mobile_software_content_rating_system
'FOUR_PLUS': 0,
'NINE_PLUS': 9,
'TWELVE_PLUS': 12,
'SEVENTEEN_PLUS': 17,
}.get(r, 17))),
}))

View File

@ -0,0 +1,74 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
traverse_obj,
try_get,
)
class CallinIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?callin\.com/episode/(?:[^/#?-]+-)*(?P<id>[^/#?-]+)'
_TESTS = [{
'url': 'https://www.callin.com/episode/fcc-commissioner-brendan-carr-on-elons-PrumRdSQJW',
'md5': '14ede27ee2c957b7e4db93140fc0745c',
'info_dict': {
'id': 'PrumRdSQJW',
'ext': 'mp4',
'title': 'FCC Commissioner Brendan Carr on Elons Starlink',
'description': 'Or, why the government doesnt like SpaceX',
'channel': 'The Pull Request',
'channel_url': 'https://callin.com/show/the-pull-request-ucnDJmEKAa',
}
}, {
'url': 'https://www.callin.com/episode/episode-81-elites-melt-down-over-student-debt-lzxMidUnjA',
'md5': '16f704ddbf82a27e3930533b12062f07',
'info_dict': {
'id': 'lzxMidUnjA',
'ext': 'mp4',
'title': 'Episode 81- Elites MELT DOWN over Student Debt Victory? Rumble in NYC?',
'description': 'Lets talk todays episode about the primary election shake up in NYC and the elites melting down over student debt cancelation.',
'channel': 'The DEBRIEF With Briahna Joy Gray',
'channel_url': 'https://callin.com/show/the-debrief-with-briahna-joy-gray-siiFDzGegm',
}
}]
def _search_nextjs_data(self, webpage, video_id, transform_source=None, fatal=True, **kw):
return self._parse_json(
self._search_regex(
r'(?s)<script[^>]+id=[\'"]__NEXT_DATA__[\'"][^>]*>([^<]+)</script>',
webpage, 'next.js data', fatal=fatal, **kw),
video_id, transform_source=transform_source, fatal=fatal)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
next_data = self._search_nextjs_data(webpage, video_id)
episode = traverse_obj(next_data, ('props', 'pageProps', 'episode'), expected_type=dict)
if not episode:
raise ExtractorError('Failed to find episode data')
title = episode.get('title') or self._og_search_title(webpage)
description = episode.get('description') or self._og_search_description(webpage)
formats = []
formats.extend(self._extract_m3u8_formats(
episode.get('m3u8'), video_id, 'mp4',
entry_protocol='m3u8_native', fatal=False))
self._sort_formats(formats)
channel = try_get(episode, lambda x: x['show']['title'], compat_str)
channel_url = try_get(episode, lambda x: x['show']['linkObj']['resourceUrl'], compat_str)
return {
'id': video_id,
'title': title,
'description': description,
'formats': formats,
'channel': channel,
'channel_url': channel_url,
}

View File

@ -3,7 +3,6 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
url_or_none,
)
@ -20,32 +19,11 @@ class CamModelsIE(InfoExtractor):
def _real_extract(self, url):
user_id = self._match_id(url)
webpage = self._download_webpage(
url, user_id, headers=self.geo_verification_headers())
manifest_root = self._html_search_regex(
r'manifestUrlRoot=([^&\']+)', webpage, 'manifest', default=None)
if not manifest_root:
ERRORS = (
("I'm offline, but let's stay connected", 'This user is currently offline'),
('in a private show', 'This user is in a private show'),
('is currently performing LIVE', 'This model is currently performing live'),
)
for pattern, message in ERRORS:
if pattern in webpage:
error = message
expected = True
break
else:
error = 'Unable to find manifest URL root'
expected = False
raise ExtractorError(error, expected=expected)
manifest = self._download_json(
'%s%s.json' % (manifest_root, user_id), user_id)
'https://manifest-server.naiadsystems.com/live/s:%s.json' % user_id, user_id)
formats = []
thumbnails = []
for format_id, format_dict in manifest['formats'].items():
if not isinstance(format_dict, dict):
continue
@ -85,6 +63,13 @@ class CamModelsIE(InfoExtractor):
'preference': -1,
})
else:
if format_id == 'jpeg':
thumbnails.append({
'url': f['url'],
'width': f['width'],
'height': f['height'],
'format_id': f['format_id'],
})
continue
formats.append(f)
self._sort_formats(formats)
@ -92,6 +77,7 @@ class CamModelsIE(InfoExtractor):
return {
'id': user_id,
'title': self._live_title(user_id),
'thumbnails': thumbnails,
'is_live': True,
'formats': formats,
'age_limit': 18

View File

@ -12,35 +12,21 @@ from ..utils import (
ExtractorError,
float_or_none,
sanitized_Request,
unescapeHTML,
update_url_query,
str_or_none,
traverse_obj,
urlencode_postdata,
USER_AGENTS,
)
class CeskaTelevizeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ceskatelevize\.cz/ivysilani/(?:[^/?#&]+/)*(?P<id>[^/#?]+)'
_VALID_URL = r'https?://(?:www\.)?ceskatelevize\.cz/(?:ivysilani|porady|zive)/(?:[^/?#&]+/)*(?P<id>[^/#?]+)'
_TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
'info_dict': {
'id': '61924494877246241',
'ext': 'mp4',
'title': 'Hyde Park Civilizace: Život v Grónsku',
'description': 'md5:3fec8f6bb497be5cdb0c9e8781076626',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 3350,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://www.ceskatelevize.cz/ivysilani/10441294653-hyde-park-civilizace/215411058090502/bonus/20641-bonus-01-en',
'info_dict': {
'id': '61924494877028507',
'ext': 'mp4',
'title': 'Hyde Park Civilizace: Bonus 01 - En',
'title': 'Bonus 01 - En - Hyde Park Civilizace',
'description': 'English Subtittles',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 81.3,
@ -51,31 +37,111 @@ class CeskaTelevizeIE(InfoExtractor):
},
}, {
# live stream
'url': 'http://www.ceskatelevize.cz/ivysilani/zive/ct4/',
'url': 'http://www.ceskatelevize.cz/zive/ct1/',
'info_dict': {
'id': 402,
'id': '102',
'ext': 'mp4',
'title': r're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'title': r'ČT1 - živé vysílání online',
'description': 'Sledujte živé vysílání kanálu ČT1 online. Vybírat si můžete i z dalších kanálů České televize na kterémkoli z vašich zařízení.',
'is_live': True,
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Georestricted to Czech Republic',
}, {
# another
'url': 'http://www.ceskatelevize.cz/ivysilani/zive/ct4/',
'only_matching': True,
'info_dict': {
'id': 402,
'ext': 'mp4',
'title': r're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'is_live': True,
},
# 'skip': 'Georestricted to Czech Republic',
}, {
'url': 'http://www.ceskatelevize.cz/ivysilani/embed/iFramePlayer.php?hash=d6a3e1370d2e4fa76296b90bad4dfc19673b641e&IDEC=217 562 22150/0004&channelID=1&width=100%25',
'only_matching': True,
}, {
# video with 18+ caution trailer
'url': 'http://www.ceskatelevize.cz/porady/10520528904-queer/215562210900007-bogotart/',
'info_dict': {
'id': '215562210900007-bogotart',
'title': 'Bogotart - Queer',
'description': 'Hlavní město Kolumbie v doprovodu queer umělců. Vroucí svět plný vášně, sebevědomí, ale i násilí a bolesti',
},
'playlist': [{
'info_dict': {
'id': '61924494877311053',
'ext': 'mp4',
'title': 'Bogotart - Queer (Varování 18+)',
'duration': 11.9,
},
}, {
'info_dict': {
'id': '61924494877068022',
'ext': 'mp4',
'title': 'Bogotart - Queer (Queer)',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 1558.3,
},
}],
'params': {
# m3u8 download
'skip_download': True,
},
}, {
# iframe embed
'url': 'http://www.ceskatelevize.cz/porady/10614999031-neviditelni/21251212048/',
'only_matching': True,
}]
def _search_nextjs_data(self, webpage, video_id, **kw):
return self._parse_json(
self._search_regex(
r'(?s)<script[^>]+id=[\'"]__NEXT_DATA__[\'"][^>]*>([^<]+)</script>',
webpage, 'next.js data', **kw),
video_id, **kw)
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage, urlh = self._download_webpage_handle(url, playlist_id)
parsed_url = compat_urllib_parse_urlparse(urlh.geturl())
site_name = self._og_search_property('site_name', webpage, fatal=False, default='Česká televize')
playlist_title = self._og_search_title(webpage, default=None)
if site_name and playlist_title:
playlist_title = re.split(r'\s*[—|]\s*%s' % (site_name, ), playlist_title, 1)[0]
playlist_description = self._og_search_description(webpage, default=None)
if playlist_description:
playlist_description = playlist_description.replace('\xa0', ' ')
webpage = self._download_webpage(url, playlist_id)
type_ = 'IDEC'
if re.search(r'(^/porady|/zive)/', parsed_url.path):
next_data = self._search_nextjs_data(webpage, playlist_id)
if '/zive/' in parsed_url.path:
idec = traverse_obj(next_data, ('props', 'pageProps', 'data', 'liveBroadcast', 'current', 'idec'), get_all=False)
else:
idec = traverse_obj(next_data, ('props', 'pageProps', 'data', ('show', 'mediaMeta'), 'idec'), get_all=False)
if not idec:
idec = traverse_obj(next_data, ('props', 'pageProps', 'data', 'videobonusDetail', 'bonusId'), get_all=False)
if idec:
type_ = 'bonus'
if not idec:
raise ExtractorError('Failed to find IDEC id')
iframe_hash = self._download_webpage(
'https://www.ceskatelevize.cz/v-api/iframe-hash/',
playlist_id, note='Getting IFRAME hash')
query = {'hash': iframe_hash, 'origin': 'iVysilani', 'autoStart': 'true', type_: idec, }
webpage = self._download_webpage(
'https://www.ceskatelevize.cz/ivysilani/embed/iFramePlayer.php',
playlist_id, note='Downloading player', query=query)
NOT_AVAILABLE_STRING = 'This content is not available at your territory due to limited copyright.'
if '%s</p>' % NOT_AVAILABLE_STRING in webpage:
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
self.raise_geo_restricted(NOT_AVAILABLE_STRING)
if any(not_found in webpage for not_found in ('Neplatný parametr pro videopřehrávač', 'IDEC nebyl nalezen', )):
raise ExtractorError('no video with IDEC available', video_id=idec, expected=True)
type_ = None
episode_id = None
@ -100,7 +166,7 @@ class CeskaTelevizeIE(InfoExtractor):
data = {
'playlist[0][type]': type_,
'playlist[0][id]': episode_id,
'requestUrl': compat_urllib_parse_urlparse(url).path,
'requestUrl': parsed_url.path,
'requestSource': 'iVysilani',
}
@ -108,7 +174,7 @@ class CeskaTelevizeIE(InfoExtractor):
for user_agent in (None, USER_AGENTS['Safari']):
req = sanitized_Request(
'https://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
'https://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist/',
data=urlencode_postdata(data))
req.add_header('Content-type', 'application/x-www-form-urlencoded')
@ -130,9 +196,6 @@ class CeskaTelevizeIE(InfoExtractor):
req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage, default=None)
playlist_description = self._og_search_description(webpage, default=None)
playlist = self._download_json(req, playlist_id, fatal=False)
if not playlist:
continue
@ -167,7 +230,7 @@ class CeskaTelevizeIE(InfoExtractor):
entries[num]['formats'].extend(formats)
continue
item_id = item.get('id') or item['assetId']
item_id = str_or_none(item.get('id') or item['assetId'])
title = item['title']
duration = float_or_none(item.get('duration'))
@ -181,8 +244,6 @@ class CeskaTelevizeIE(InfoExtractor):
if playlist_len == 1:
final_title = playlist_title or title
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
@ -200,6 +261,8 @@ class CeskaTelevizeIE(InfoExtractor):
for e in entries:
self._sort_formats(e['formats'])
if len(entries) == 1:
return entries[0]
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
def _get_subtitles(self, episode_id, subs):
@ -236,54 +299,3 @@ class CeskaTelevizeIE(InfoExtractor):
yield line
return '\r\n'.join(_fix_subtitle(subtitles))
class CeskaTelevizePoradyIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ceskatelevize\.cz/porady/(?:[^/?#&]+/)*(?P<id>[^/#?]+)'
_TESTS = [{
# video with 18+ caution trailer
'url': 'http://www.ceskatelevize.cz/porady/10520528904-queer/215562210900007-bogotart/',
'info_dict': {
'id': '215562210900007-bogotart',
'title': 'Queer: Bogotart',
'description': 'Alternativní průvodce současným queer světem',
},
'playlist': [{
'info_dict': {
'id': '61924494876844842',
'ext': 'mp4',
'title': 'Queer: Bogotart (Varování 18+)',
'duration': 10.2,
},
}, {
'info_dict': {
'id': '61924494877068022',
'ext': 'mp4',
'title': 'Queer: Bogotart (Queer)',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 1558.3,
},
}],
'params': {
# m3u8 download
'skip_download': True,
},
}, {
# iframe embed
'url': 'http://www.ceskatelevize.cz/porady/10614999031-neviditelni/21251212048/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
data_url = update_url_query(unescapeHTML(self._search_regex(
(r'<span[^>]*\bdata-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?ceskatelevize\.cz/ivysilani/embed/iFramePlayer\.php.*?)\1'),
webpage, 'iframe player url', group='url')), query={
'autoStart': 'true',
})
return self.url_result(data_url, ie=CeskaTelevizeIE.ie_key())

View File

@ -0,0 +1,69 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
merge_dicts,
T,
traverse_obj,
unified_timestamp,
url_or_none,
)
class ClipchampIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?clipchamp\.com/watch/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://clipchamp.com/watch/gRXZ4ZhdDaU',
'info_dict': {
'id': 'gRXZ4ZhdDaU',
'ext': 'mp4',
'title': 'Untitled video',
'uploader': 'Alexander Schwartz',
'timestamp': 1680805580,
'upload_date': '20230406',
'thumbnail': r're:^https?://.+\.jpg',
},
'params': {
'skip_download': 'm3u8',
'format': 'bestvideo',
},
}]
_STREAM_URL_TMPL = 'https://%s.cloudflarestream.com/%s/manifest/video.%s'
_STREAM_URL_QUERY = {'parentOrigin': 'https://clipchamp.com'}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
data = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['video']
storage_location = data.get('storage_location')
if storage_location != 'cf_stream':
raise ExtractorError('Unsupported clip storage location "%s"' % (storage_location,))
path = data['download_url']
iframe = self._download_webpage(
'https://iframe.cloudflarestream.com/' + path, video_id, 'Downloading player iframe')
subdomain = self._search_regex(
r'''\bcustomer-domain-prefix\s*=\s*("|')(?P<sd>[\w-]+)\1''', iframe,
'subdomain', group='sd', fatal=False) or 'customer-2ut9yn3y6fta1yxe'
formats = self._extract_mpd_formats(
self._STREAM_URL_TMPL % (subdomain, path, 'mpd'), video_id,
query=self._STREAM_URL_QUERY, fatal=False, mpd_id='dash')
formats.extend(self._extract_m3u8_formats(
self._STREAM_URL_TMPL % (subdomain, path, 'm3u8'), video_id, 'mp4',
query=self._STREAM_URL_QUERY, fatal=False, m3u8_id='hls'))
return merge_dicts({
'id': video_id,
'formats': formats,
'uploader': ' '.join(traverse_obj(data, ('creator', ('first_name', 'last_name'), T(compat_str)))) or None,
}, traverse_obj(data, {
'title': ('project', 'project_name', T(compat_str)),
'timestamp': ('created_at', T(unified_timestamp)),
'thumbnail': ('thumbnail_url', T(url_or_none)),
}), rev=True)

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,148 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
str_or_none,
try_get,
unified_timestamp,
update_url_query,
urljoin,
)
# compat_range
try:
if callable(xrange):
range = xrange
except (NameError, TypeError):
pass
class CPACIE(InfoExtractor):
IE_NAME = 'cpac'
_VALID_URL = r'https?://(?:www\.)?cpac\.ca/(?P<fr>l-)?episode\?id=(?P<id>[\da-f]{8}(?:-[\da-f]{4}){3}-[\da-f]{12})'
_TEST = {
# 'url': 'http://www.cpac.ca/en/programs/primetime-politics/episodes/65490909',
'url': 'https://www.cpac.ca/episode?id=fc7edcae-4660-47e1-ba61-5b7f29a9db0f',
'md5': 'e46ad699caafd7aa6024279f2614e8fa',
'info_dict': {
'id': 'fc7edcae-4660-47e1-ba61-5b7f29a9db0f',
'ext': 'mp4',
'upload_date': '20220215',
'title': 'News Conference to Celebrate National Kindness Week February 15, 2022',
'description': 'md5:466a206abd21f3a6f776cdef290c23fb',
'timestamp': 1644901200,
},
'params': {
'format': 'bestvideo',
'hls_prefer_native': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
url_lang = 'fr' if '/l-episode?' in url else 'en'
content = self._download_json(
'https://www.cpac.ca/api/1/services/contentModel.json?url=/site/website/episode/index.xml&crafterSite=cpacca&id=' + video_id,
video_id)
video_url = try_get(content, lambda x: x['page']['details']['videoUrl'], compat_str)
formats = []
if video_url:
content = content['page']
title = str_or_none(content['details']['title_%s_t' % (url_lang, )])
formats = self._extract_m3u8_formats(video_url, video_id, m3u8_id='hls', ext='mp4')
for fmt in formats:
# prefer language to match URL
fmt_lang = fmt.get('language')
if fmt_lang == url_lang:
fmt['language_preference'] = 10
elif not fmt_lang:
fmt['language_preference'] = -1
else:
fmt['language_preference'] = -10
self._sort_formats(formats)
category = str_or_none(content['details']['category_%s_t' % (url_lang, )])
def is_live(v_type):
return (v_type == 'live') if v_type is not None else None
return {
'id': video_id,
'formats': formats,
'title': title,
'description': str_or_none(content['details'].get('description_%s_t' % (url_lang, ))),
'timestamp': unified_timestamp(content['details'].get('liveDateTime')),
'category': [category] if category else None,
'thumbnail': urljoin(url, str_or_none(content['details'].get('image_%s_s' % (url_lang, )))),
'is_live': is_live(content['details'].get('type')),
}
class CPACPlaylistIE(InfoExtractor):
IE_NAME = 'cpac:playlist'
_VALID_URL = r'(?i)https?://(?:www\.)?cpac\.ca/(?:program|search|(?P<fr>emission|rechercher))\?(?:[^&]+&)*?(?P<id>(?:id=\d+|programId=\d+|key=[^&]+))'
_TESTS = [{
'url': 'https://www.cpac.ca/program?id=6',
'info_dict': {
'id': 'id=6',
'title': 'Headline Politics',
'description': 'Watch CPACs signature long-form coverage of the days pressing political events as they unfold.',
},
'playlist_count': 10,
}, {
'url': 'https://www.cpac.ca/search?key=hudson&type=all&order=desc',
'info_dict': {
'id': 'key=hudson',
'title': 'hudson',
},
'playlist_count': 22,
}, {
'url': 'https://www.cpac.ca/search?programId=50',
'info_dict': {
'id': 'programId=50',
'title': '50',
},
'playlist_count': 9,
}, {
'url': 'https://www.cpac.ca/emission?id=6',
'only_matching': True,
}, {
'url': 'https://www.cpac.ca/rechercher?key=hudson&type=all&order=desc',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
url_lang = 'fr' if any(x in url for x in ('/emission?', '/rechercher?')) else 'en'
pl_type, list_type = ('program', 'itemList') if any(x in url for x in ('/program?', '/emission?')) else ('search', 'searchResult')
api_url = (
'https://www.cpac.ca/api/1/services/contentModel.json?url=/site/website/%s/index.xml&crafterSite=cpacca&%s'
% (pl_type, video_id, ))
content = self._download_json(api_url, video_id)
entries = []
total_pages = int_or_none(try_get(content, lambda x: x['page'][list_type]['totalPages']), default=1)
for page in range(1, total_pages + 1):
if page > 1:
api_url = update_url_query(api_url, {'page': '%d' % (page, ), })
content = self._download_json(
api_url, video_id,
note='Downloading continuation - %d' % (page, ),
fatal=False)
for item in try_get(content, lambda x: x['page'][list_type]['item'], list) or []:
episode_url = urljoin(url, try_get(item, lambda x: x['url_%s_s' % (url_lang, )]))
if episode_url:
entries.append(episode_url)
return self.playlist_result(
(self.url_result(entry) for entry in entries),
playlist_id=video_id,
playlist_title=try_get(content, lambda x: x['page']['program']['title_%s_t' % (url_lang, )]) or video_id.split('=')[-1],
playlist_description=try_get(content, lambda x: x['page']['program']['description_%s_t' % (url_lang, )]),
)

View File

@ -145,7 +145,7 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream:collection'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collection|series)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collections?|series)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://app.curiositystream.com/collection/2',
'info_dict': {
@ -157,6 +157,9 @@ class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
}, {
'url': 'https://curiositystream.com/series/2',
'only_matching': True,
}, {
'url': 'https://curiositystream.com/collections/36',
'only_matching': True,
}]
def _real_extract(self, url):

204
youtube_dl/extractor/dlf.py Normal file
View File

@ -0,0 +1,204 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
)
from ..utils import (
determine_ext,
extract_attributes,
int_or_none,
merge_dicts,
traverse_obj,
url_or_none,
variadic,
)
class DLFBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?deutschlandfunk\.de/'
_BUTTON_REGEX = r'(<button[^>]+alt="Anhören"[^>]+data-audio-diraid[^>]*>)'
def _parse_button_attrs(self, button, audio_id=None):
attrs = extract_attributes(button)
audio_id = audio_id or attrs['data-audio-diraid']
url = traverse_obj(
attrs, 'data-audio-download-src', 'data-audio', 'data-audioreference',
'data-audio-src', expected_type=url_or_none)
ext = determine_ext(url)
formats = (self._extract_m3u8_formats(url, audio_id, fatal=False)
if ext == 'm3u8' else [{'url': url, 'ext': ext, 'vcodec': 'none'}])
self._sort_formats(formats)
def traverse_attrs(path):
path = list(variadic(path))
t = path.pop() if callable(path[-1]) else None
return traverse_obj(attrs, path, expected_type=t, get_all=False)
def txt_or_none(v, default=None):
return default if v is None else (compat_str(v).strip() or default)
return merge_dicts(*reversed([{
'id': audio_id,
# 'extractor_key': DLFIE.ie_key(),
# 'extractor': DLFIE.IE_NAME,
'formats': formats,
}, dict((k, traverse_attrs(v)) for k, v in {
'title': (('data-audiotitle', 'data-audio-title', 'data-audio-download-tracking-title'), txt_or_none),
'duration': (('data-audioduration', 'data-audio-duration'), int_or_none),
'thumbnail': ('data-audioimage', url_or_none),
'uploader': 'data-audio-producer',
'series': 'data-audio-series',
'channel': 'data-audio-origin-site-name',
'webpage_url': ('data-audio-download-tracking-path', url_or_none),
}.items())]))
class DLFIE(DLFBaseIE):
IE_NAME = 'dlf'
_VALID_URL = DLFBaseIE._VALID_URL_BASE + r'[\w-]+-dlf-(?P<id>[\da-f]{8})-100\.html'
_TESTS = [
# Audio as an HLS stream
{
'url': 'https://www.deutschlandfunk.de/tanz-der-saiteninstrumente-das-wild-strings-trio-aus-slowenien-dlf-03a3eb19-100.html',
'info_dict': {
'id': '03a3eb19',
'title': r're:Tanz der Saiteninstrumente [-/] Das Wild Strings Trio aus Slowenien',
'ext': 'm4a',
'duration': 3298,
'thumbnail': 'https://assets.deutschlandfunk.de/FALLBACK-IMAGE-AUDIO/512x512.png?t=1603714364673',
'uploader': 'Deutschlandfunk',
'series': 'On Stage',
'channel': 'deutschlandfunk'
},
'params': {
'skip_download': 'm3u8'
},
'skip': 'This webpage no longer exists'
}, {
'url': 'https://www.deutschlandfunk.de/russische-athleten-kehren-zurueck-auf-die-sportbuehne-ein-gefaehrlicher-tueroeffner-dlf-d9cc1856-100.html',
'info_dict': {
'id': 'd9cc1856',
'title': 'Russische Athleten kehren zurück auf die Sportbühne: Ein gefährlicher Türöffner',
'ext': 'mp3',
'duration': 291,
'thumbnail': 'https://assets.deutschlandfunk.de/FALLBACK-IMAGE-AUDIO/512x512.png?t=1603714364673',
'uploader': 'Deutschlandfunk',
'series': 'Kommentare und Themen der Woche',
'channel': 'deutschlandfunk'
}
},
]
def _real_extract(self, url):
audio_id = self._match_id(url)
webpage = self._download_webpage(url, audio_id)
return self._parse_button_attrs(
self._search_regex(self._BUTTON_REGEX, webpage, 'button'), audio_id)
class DLFCorpusIE(DLFBaseIE):
IE_NAME = 'dlf:corpus'
IE_DESC = 'DLF Multi-feed Archives'
_VALID_URL = DLFBaseIE._VALID_URL_BASE + r'(?P<id>(?![\w-]+-dlf-[\da-f]{8})[\w-]+-\d+)\.html'
_TESTS = [
# Recorded news broadcast with referrals to related broadcasts
{
'url': 'https://www.deutschlandfunk.de/fechten-russland-belarus-ukraine-protest-100.html',
'info_dict': {
'id': 'fechten-russland-belarus-ukraine-protest-100',
'title': r're:Wiederzulassung als neutrale Athleten [-/] Was die Rückkehr russischer und belarussischer Sportler beim Fechten bedeutet',
'description': 'md5:91340aab29c71aa7518ad5be13d1e8ad'
},
'playlist_mincount': 5,
'playlist': [{
'info_dict': {
'id': '1fc5d64a',
'title': r're:Wiederzulassung als neutrale Athleten [-/] Was die Rückkehr russischer und belarussischer Sportler beim Fechten bedeutet',
'ext': 'mp3',
'duration': 252,
'thumbnail': 'https://assets.deutschlandfunk.de/aad16241-6b76-4a09-958b-96d0ee1d6f57/512x512.jpg?t=1679480020313',
'uploader': 'Deutschlandfunk',
'series': 'Sport',
'channel': 'deutschlandfunk'
}
}, {
'info_dict': {
'id': '2ada145f',
'title': r're:(?:Sportpolitik / )?Fechtverband votiert für Rückkehr russischer Athleten',
'ext': 'mp3',
'duration': 336,
'thumbnail': 'https://assets.deutschlandfunk.de/FILE_93982766f7317df30409b8a184ac044a/512x512.jpg?t=1678547581005',
'uploader': 'Deutschlandfunk',
'series': 'Deutschlandfunk Nova',
'channel': 'deutschlandfunk-nova'
}
}, {
'info_dict': {
'id': '5e55e8c9',
'title': r're:Wiederzulassung von Russland und Belarus [-/] "Herumlavieren" des Fechter-Bundes sorgt für Unverständnis',
'ext': 'mp3',
'duration': 187,
'thumbnail': 'https://assets.deutschlandfunk.de/a595989d-1ed1-4a2e-8370-b64d7f11d757/512x512.jpg?t=1679173825412',
'uploader': 'Deutschlandfunk',
'series': 'Sport am Samstag',
'channel': 'deutschlandfunk'
}
}, {
'info_dict': {
'id': '47e1a096',
'title': r're:Rückkehr Russlands im Fechten [-/] "Fassungslos, dass es einfach so passiert ist"',
'ext': 'mp3',
'duration': 602,
'thumbnail': 'https://assets.deutschlandfunk.de/da4c494a-21cc-48b4-9cc7-40e09fd442c2/512x512.jpg?t=1678562155770',
'uploader': 'Deutschlandfunk',
'series': 'Sport am Samstag',
'channel': 'deutschlandfunk'
}
}, {
'info_dict': {
'id': '5e55e8c9',
'title': r're:Wiederzulassung von Russland und Belarus [-/] "Herumlavieren" des Fechter-Bundes sorgt für Unverständnis',
'ext': 'mp3',
'duration': 187,
'thumbnail': 'https://assets.deutschlandfunk.de/a595989d-1ed1-4a2e-8370-b64d7f11d757/512x512.jpg?t=1679173825412',
'uploader': 'Deutschlandfunk',
'series': 'Sport am Samstag',
'channel': 'deutschlandfunk'
}
}]
},
# Podcast feed with tag buttons, playlist count fluctuates
{
'url': 'https://www.deutschlandfunk.de/kommentare-und-themen-der-woche-100.html',
'info_dict': {
'id': 'kommentare-und-themen-der-woche-100',
'title': 'Meinung - Kommentare und Themen der Woche',
'description': 'md5:2901bbd65cd2d45e116d399a099ce5d5',
},
'playlist_mincount': 10,
},
# Podcast feed with no description
{
'url': 'https://www.deutschlandfunk.de/podcast-tolle-idee-100.html',
'info_dict': {
'id': 'podcast-tolle-idee-100',
'title': 'Wissenschaftspodcast - Tolle Idee! - Was wurde daraus?',
},
'playlist_mincount': 11,
},
]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
return self.playlist_result(
map(self._parse_button_attrs, re.findall(self._BUTTON_REGEX, webpage)),
playlist_id, self._html_search_meta(['og:title', 'twitter:title'], webpage, default=None),
self._html_search_meta(['description', 'og:description', 'twitter:description'], webpage, default=None))

View File

@ -22,16 +22,19 @@ class EggheadBaseIE(InfoExtractor):
class EggheadCourseIE(EggheadBaseIE):
IE_DESC = 'egghead.io course'
IE_NAME = 'egghead:course'
_VALID_URL = r'https://egghead\.io/courses/(?P<id>[^/?#&]+)'
_TEST = {
_VALID_URL = r'https://(?:app\.)?egghead\.io/(?:course|playlist)s/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript',
'playlist_count': 29,
'info_dict': {
'id': '72',
'id': '432655',
'title': 'Professor Frisby Introduces Composable Functional JavaScript',
'description': 're:(?s)^This course teaches the ubiquitous.*You\'ll start composing functionality before you know it.$',
},
}
}, {
'url': 'https://app.egghead.io/playlists/professor-frisby-introduces-composable-functional-javascript',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
@ -65,7 +68,7 @@ class EggheadCourseIE(EggheadBaseIE):
class EggheadLessonIE(EggheadBaseIE):
IE_DESC = 'egghead.io lesson'
IE_NAME = 'egghead:lesson'
_VALID_URL = r'https://egghead\.io/(?:api/v1/)?lessons/(?P<id>[^/?#&]+)'
_VALID_URL = r'https://(?:app\.)?egghead\.io/(?:api/v1/)?lessons/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box',
'info_dict': {
@ -88,6 +91,9 @@ class EggheadLessonIE(EggheadBaseIE):
}, {
'url': 'https://egghead.io/api/v1/lessons/react-add-redux-to-a-react-application',
'only_matching': True,
}, {
'url': 'https://app.egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
float_or_none,
T,
traverse_obj,
txt_or_none,
unified_timestamp,
url_or_none,
)
class EpidemicSoundIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?epidemicsound\.com/track/(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'https://www.epidemicsound.com/track/yFfQVRpSPz/',
'md5': 'd98ff2ddb49e8acab9716541cbc9dfac',
'info_dict': {
'id': '45014',
'display_id': 'yFfQVRpSPz',
'ext': 'mp3',
'tags': ['foley', 'door', 'knock', 'glass', 'window', 'glass door knock'],
'title': 'Door Knock Door 1',
'duration': 1,
'thumbnail': 'https://cdn.epidemicsound.com/curation-assets/commercial-release-cover-images/default-sfx/3000x3000.jpg',
'timestamp': 1415320353,
'upload_date': '20141107',
'age_limit': None,
# check that the "best" format was found, since test file MD5 doesn't
# distinguish the formats
'format': 'full',
},
}, {
'url': 'https://www.epidemicsound.com/track/mj8GTTwsZd/',
'md5': 'c82b745890f9baf18dc2f8d568ee3830',
'info_dict': {
'id': '148700',
'display_id': 'mj8GTTwsZd',
'ext': 'mp3',
'tags': ['liquid drum n bass', 'energetic'],
'title': 'Noplace',
'duration': 237,
'thumbnail': 'https://cdn.epidemicsound.com/curation-assets/commercial-release-cover-images/11138/3000x3000.jpg',
'timestamp': 1694426482,
'release_timestamp': 1700535606,
'upload_date': '20230911',
'age_limit': None,
'format': 'full',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
json_data = self._download_json('https://www.epidemicsound.com/json/track/' + video_id, video_id)
def fmt_or_none(f):
if not f.get('format'):
f['format'] = f.get('format_id')
elif not f.get('format_id'):
f['format_id'] = f['format']
if not (f['url'] and f['format']):
return
if f.get('format_note'):
f['format_note'] = 'track ID ' + f['format_note']
f['preference'] = -1 if f['format'] == 'full' else -2
return f
formats = traverse_obj(json_data, (
'stems', T(dict.items), Ellipsis, {
'format': (0, T(txt_or_none)),
'format_note': (1, 's3TrackId', T(txt_or_none)),
'format_id': (1, 'stemType', T(txt_or_none)),
'url': (1, 'lqMp3Url', T(url_or_none)),
}, T(fmt_or_none)))
self._sort_formats(formats)
info = traverse_obj(json_data, {
'id': ('id', T(txt_or_none)),
'tags': ('metadataTags', Ellipsis, T(txt_or_none)),
'title': ('title', T(txt_or_none)),
'duration': ('length', T(float_or_none)),
'timestamp': ('added', T(unified_timestamp)),
'thumbnail': (('imageUrl', 'cover'), T(url_or_none)),
'age_limit': ('isExplicit', T(lambda b: 18 if b else None)),
'release_timestamp': ('releaseDate', T(unified_timestamp)),
}, get_all=False)
info.update(traverse_obj(json_data, {
'categories': ('genres', Ellipsis, 'tag', T(txt_or_none)),
'tags': ('metadataTags', Ellipsis, T(txt_or_none)),
}))
info.update({
'display_id': video_id,
'formats': formats,
})
return info

View File

@ -51,6 +51,10 @@ from .anvato import AnvatoIE
from .aol import AolIE
from .allocine import AllocineIE
from .aliexpress import AliExpressLiveIE
from .alsace20tv import (
Alsace20TVIE,
Alsace20TVEmbedIE,
)
from .apa import APAIE
from .aparat import AparatIE
from .appleconnect import AppleConnectIE
@ -71,6 +75,7 @@ from .arte import (
ArteTVIE,
ArteTVEmbedIE,
ArteTVPlaylistIE,
ArteTVCategoryIE,
)
from .arnes import ArnesIE
from .asiancrush import (
@ -114,6 +119,7 @@ from .bfmtv import (
)
from .bibeltv import BibelTVIE
from .bigflix import BigflixIE
from .bigo import BigoIE
from .bild import BildIE
from .bilibili import (
BiliBiliIE,
@ -132,6 +138,7 @@ from .bleacherreport import (
BleacherReportIE,
BleacherReportCMSIE,
)
from .blerp import BlerpIE
from .bloomberg import BloombergIE
from .bokecc import BokeCCIE
from .bongacams import BongaCamsIE
@ -152,6 +159,8 @@ from .businessinsider import BusinessInsiderIE
from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
from .c56 import C56IE
from .caffeine import CaffeineTVIE
from .callin import CallinIE
from .camdemy import (
CamdemyIE,
CamdemyFolderIE
@ -202,10 +211,7 @@ from .ccc import (
from .ccma import CCMAIE
from .cctv import CCTVIE
from .cda import CDAIE
from .ceskatelevize import (
CeskaTelevizeIE,
CeskaTelevizePoradyIE,
)
from .ceskatelevize import CeskaTelevizeIE
from .channel9 import Channel9IE
from .charlierose import CharlieRoseIE
from .chaturbate import ChaturbateIE
@ -221,6 +227,7 @@ from .ciscolive import (
CiscoLiveSearchIE,
)
from .cjsw import CJSWIE
from .clipchamp import ClipchampIE
from .cliphunter import CliphunterIE
from .clippit import ClippitIE
from .cliprs import ClipRsIE
@ -253,6 +260,10 @@ from .commonprotocols import (
from .condenast import CondeNastIE
from .contv import CONtvIE
from .corus import CorusIE
from .cpac import (
CPACIE,
CPACPlaylistIE,
)
from .cracked import CrackedIE
from .crackle import CrackleIE
from .crooksandliars import CrooksAndLiarsIE
@ -286,6 +297,10 @@ from .dbtv import DBTVIE
from .dctp import DctpTvIE
from .deezer import DeezerPlaylistIE
from .democracynow import DemocracynowIE
from .dlf import (
DLFCorpusIE,
DLFIE,
)
from .dfb import DFBIE
from .dhm import DHMIE
from .digg import DiggIE
@ -343,6 +358,7 @@ from .ellentube import (
from .elpais import ElPaisIE
from .embedly import EmbedlyIE
from .engadget import EngadgetIE
from .epidemicsound import EpidemicSoundIE
from .eporner import EpornerIE
from .eroprofile import EroProfileIE
from .escapist import EscapistIE
@ -367,6 +383,7 @@ from .fc2 import (
FC2EmbedIE,
)
from .fczenit import FczenitIE
from .fifa import FifaIE
from .filmon import (
FilmOnIE,
FilmOnChannelIE,
@ -426,6 +443,7 @@ from .gamespot import GameSpotIE
from .gamestar import GameStarIE
from .gaskrank import GaskrankIE
from .gazeta import GazetaIE
from .gbnews import GBNewsIE
from .gdcvault import GDCVaultIE
from .gedidigital import GediDigitalIE
from .generic import GenericIE
@ -433,6 +451,13 @@ from .gfycat import GfycatIE
from .giantbomb import GiantBombIE
from .giga import GigaIE
from .glide import GlideIE
from .globalplayer import (
GlobalPlayerLiveIE,
GlobalPlayerLivePlaylistIE,
GlobalPlayerAudioIE,
GlobalPlayerAudioEpisodeIE,
GlobalPlayerVideoIE
)
from .globo import (
GloboIE,
GloboArticleIE,
@ -469,6 +494,7 @@ from .hotstar import (
)
from .howcast import HowcastIE
from .howstuffworks import HowStuffWorksIE
from .hrfernsehen import HRFernsehenIE
from .hrti import (
HRTiIE,
HRTiPlaylistIE,
@ -545,8 +571,10 @@ from .khanacademy import (
from .kickstarter import KickStarterIE
from .kinja import KinjaEmbedIE
from .kinopoisk import KinoPoiskIE
from .kommunetv import KommunetvIE
from .konserthusetplay import KonserthusetPlayIE
from .krasview import KrasViewIE
from .kth import KTHIE
from .ku6 import Ku6IE
from .kusi import KUSIIE
from .kuwo import (
@ -610,10 +638,6 @@ from .linkedin import (
from .linuxacademy import LinuxAcademyIE
from .litv import LiTVIE
from .livejournal import LiveJournalIE
from .liveleak import (
LiveLeakIE,
LiveLeakEmbedIE,
)
from .livestream import (
LivestreamIE,
LivestreamOriginalIE,
@ -720,6 +744,7 @@ from .myvi import (
MyviIE,
MyviEmbedIE,
)
from .myvideoge import MyVideoGeIE
from .myvidster import MyVidsterIE
from .nationalgeographic import (
NationalGeographicVideoIE,
@ -793,7 +818,14 @@ from .nick import (
NickNightIE,
NickRuIE,
)
from .niconico import NiconicoIE, NiconicoPlaylistIE
from .niconico import (
NiconicoIE,
NiconicoPlaylistIE,
NiconicoUserIE,
NicovideoSearchIE,
NicovideoSearchDateIE,
NicovideoSearchURLIE,
)
from .ninecninemedia import NineCNineMediaIE
from .ninegag import NineGagIE
from .ninenow import NineNowIE
@ -897,6 +929,10 @@ from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
from .pbs import PBSIE
from .pearvideo import PearVideoIE
from .peekvids import (
PeekVidsIE,
PlayVidsIE,
)
from .peertube import PeerTubeIE
from .people import PeopleIE
from .performgroup import PerformGroupIE
@ -953,6 +989,10 @@ from .pornhub import (
from .pornotube import PornotubeIE
from .pornovoisines import PornoVoisinesIE
from .pornoxo import PornoXOIE
from .pr0gramm import (
Pr0grammIE,
Pr0grammStaticIE,
)
from .puhutv import (
PuhuTVIE,
PuhuTVSerieIE,
@ -990,6 +1030,10 @@ from .raywenderlich import (
RayWenderlichIE,
RayWenderlichCourseIE,
)
from .rbgtum import (
RbgTumIE,
RbgTumCourseIE,
)
from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redbulltv import (
@ -1045,6 +1089,10 @@ from .rutube import (
from .rutv import RUTVIE
from .ruutu import RuutuIE
from .ruv import RuvIE
from .s4c import (
S4CIE,
S4CSeriesIE,
)
from .safari import (
SafariIE,
SafariApiIE,
@ -1180,6 +1228,7 @@ from .storyfire import (
from .streamable import StreamableIE
from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE
from .streamsb import StreamsbIE
from .streetvoice import StreetVoiceIE
from .stretchinternet import StretchInternetIE
from .stv import STVPlayerIE
@ -1249,6 +1298,11 @@ from .theweatherchannel import TheWeatherChannelIE
from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE
from .thisoldhouse import ThisOldHouseIE
from .thisvid import (
ThisVidIE,
ThisVidMemberIE,
ThisVidPlaylistIE,
)
from .threeqsdn import ThreeQSDNIE
from .tiktok import (
TikTokIE,
@ -1533,6 +1587,7 @@ from .weibo import (
WeiboMobileIE
)
from .weiqitv import WeiqiTVIE
from .whyp import WhypIE
from .wistia import (
WistiaIE,
WistiaPlaylistIE,
@ -1598,7 +1653,15 @@ from .younow import (
YouNowChannelIE,
YouNowMomentIE,
)
from .youporn import YouPornIE
from .youporn import (
YouPornIE,
YouPornCategoryIE,
YouPornChannelIE,
YouPornCollectionIE,
YouPornStarIE,
YouPornTagIE,
YouPornVideosIE,
)
from .yourporn import YourPornIE
from .yourupload import YourUploadIE
from .youtube import (
@ -1610,7 +1673,7 @@ from .youtube import (
YoutubeRecommendedIE,
YoutubeSearchDateIE,
YoutubeSearchIE,
#YoutubeSearchURLIE,
YoutubeSearchURLIE,
YoutubeSubscriptionsIE,
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,

View File

@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
traverse_obj,
unified_timestamp,
)
if not callable(getattr(InfoExtractor, '_match_valid_url', None)):
BaseInfoExtractor = InfoExtractor
import re
class InfoExtractor(BaseInfoExtractor):
@classmethod
def _match_valid_url(cls, url):
return re.match(cls._VALID_URL, url)
class FifaIE(InfoExtractor):
_VALID_URL = r'https?://www.fifa.com/fifaplus/(?P<locale>\w{2})/watch/([^#?]+/)?(?P<id>\w+)'
_TESTS = [{
'url': 'https://www.fifa.com/fifaplus/en/watch/7on10qPcnyLajDDU3ntg6y',
'info_dict': {
'id': '7on10qPcnyLajDDU3ntg6y',
'title': 'Italy v France | Final | 2006 FIFA World Cup Germany™ | Full Match Replay',
'description': 'md5:f4520d0ee80529c8ba4134a7d692ff8b',
'ext': 'mp4',
'categories': ['FIFA Tournaments'],
'thumbnail': 'https://digitalhub.fifa.com/transform/135e2656-3a51-407b-8810-6c34bec5b59b/FMR_2006_Italy_France_Final_Hero',
'duration': 8165,
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.fifa.com/fifaplus/pt/watch/1cg5r5Qt6Qt12ilkDgb1sV',
'info_dict': {
'id': '1cg5r5Qt6Qt12ilkDgb1sV',
'title': 'Brazil v Germany | Semi-finals | 2014 FIFA World Cup Brazil™ | Extended Highlights',
'description': 'md5:d908c74ee66322b804ae2e521b02a855',
'ext': 'mp4',
'categories': ['FIFA Tournaments', 'Highlights'],
'thumbnail': 'https://digitalhub.fifa.com/transform/d8fe6f61-276d-4a73-a7fe-6878a35fd082/FIFAPLS_100EXTHL_2014BRAvGER_TMB',
'duration': 902,
'release_timestamp': 1404777600,
'release_date': '20140708',
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.fifa.com/fifaplus/fr/watch/3C6gQH9C2DLwzNx7BMRQdp',
'info_dict': {
'id': '3C6gQH9C2DLwzNx7BMRQdp',
'title': 'Josimar goal against Northern Ireland | Classic Goals',
'description': 'md5:cbe7e7bb52f603c9f1fe9a4780fe983b',
'ext': 'mp4',
'categories': ['FIFA Tournaments', 'Goal'],
'duration': 28,
'thumbnail': 'https://digitalhub.fifa.com/transform/f9301391-f8d9-48b5-823e-c093ac5e3e11/CG_MEN_1986_JOSIMAR',
},
'params': {'skip_download': 'm3u8'},
}]
def _real_extract(self, url):
video_id, locale = self._match_valid_url(url).group('id', 'locale')
webpage = self._download_webpage(url, video_id)
preconnect_link = self._search_regex(
r'<link\b[^>]+\brel\s*=\s*"preconnect"[^>]+href\s*=\s*"([^"]+)"', webpage, 'Preconnect Link')
video_details = self._download_json(
'{preconnect_link}/sections/videoDetails/{video_id}'.format(**locals()), video_id, 'Downloading Video Details', fatal=False)
preplay_parameters = self._download_json(
'{preconnect_link}/videoPlayerData/{video_id}'.format(**locals()), video_id, 'Downloading Preplay Parameters')['preplayParameters']
content_data = self._download_json(
# 1. query string is expected to be sent as-is
# 2. `sig` must be appended
# 3. if absent, the call appears to work but the manifest is bad (404)
'https://content.uplynk.com/preplay/{contentId}/multiple.json?{queryStr}&sig={signature}'.format(**preplay_parameters),
video_id, 'Downloading Content Data')
# formats, subtitles = self._extract_m3u8_formats_and_subtitles(content_data['playURL'], video_id)
formats, subtitles = self._extract_m3u8_formats(content_data['playURL'], video_id, ext='mp4', entry_protocol='m3u8_native'), None
self._sort_formats(formats)
return {
'id': video_id,
'title': video_details['title'],
'description': video_details.get('description'),
'duration': int_or_none(video_details.get('duration')),
'release_timestamp': unified_timestamp(video_details.get('dateOfRelease')),
'categories': traverse_obj(video_details, (('videoCategory', 'videoSubcategory'),)),
'thumbnail': traverse_obj(video_details, ('backgroundImage', 'src')),
'formats': formats,
'subtitles': subtitles,
}

View File

@ -0,0 +1,139 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
extract_attributes,
ExtractorError,
T,
traverse_obj,
txt_or_none,
url_or_none,
)
class GBNewsIE(InfoExtractor):
IE_DESC = 'GB News clips, features and live stream'
# \w+ is normally shows or news, but apparently any word redirects to the correct URL
_VALID_URL = r'https?://(?:www\.)?gbnews\.(?:uk|com)/(?:\w+/)?(?P<id>[^#?]+)'
_PLATFORM = 'safari'
_SSMP_URL = 'https://mm-v2.simplestream.com/ssmp/api.php'
_TESTS = [{
'url': 'https://www.gbnews.uk/shows/andrew-neils-message-to-companies-choosing-to-boycott-gb-news/106889',
'info_dict': {
'id': '106889',
'ext': 'mp4',
'title': "Andrew Neil's message to companies choosing to boycott GB News",
'description': 'md5:b281f5d22fd6d5eda64a4e3ba771b351',
},
'skip': '404 not found',
}, {
'url': 'https://www.gbnews.com/news/bbc-claudine-gay-harvard-university-antisemitism-row',
'info_dict': {
'id': '52264136',
'display_id': 'bbc-claudine-gay-harvard-university-antisemitism-row',
'ext': 'mp4',
'title': 'BBC deletes post after furious backlash over headline downplaying antisemitism',
'description': 'The post was criticised by former employers of the broadcaster',
},
}, {
'url': 'https://www.gbnews.uk/watchlive',
'info_dict': {
'id': '1069',
'display_id': 'watchlive',
'ext': 'mp4',
'title': 'GB News Live',
'is_live': True,
},
'params': {
'skip_download': 'm3u8',
},
}]
def _real_extract(self, url):
display_id = self._match_id(url).split('/')[-1]
webpage = self._download_webpage(url, display_id)
# extraction based on https://github.com/ytdl-org/youtube-dl/issues/29341
'''
<div id="video-106908"
class="simplestream"
data-id="GB001"
data-type="vod"
data-key="3Li3Nt2Qs8Ct3Xq9Fi5Uy0Mb2Bj0Qs"
data-token="f9c317c727dc07f515b20036c8ef14a6"
data-expiry="1624300052"
data-uvid="37900558"
data-poster="https://thumbnails.simplestreamcdn.com/gbnews/ondemand/37900558.jpg?width=700&"
data-npaw="false"
data-env="production">
'''
# exception if no match
video_data = self._search_regex(
r'(<div\s[^>]*\bclass\s*=\s*(\'|")(?!.*sidebar\b)simplestream(?:\s[\s\w$-]*)?\2[^>]*>)',
webpage, 'video data')
video_data = extract_attributes(video_data)
ss_id = video_data.get('data-id')
if not ss_id:
raise ExtractorError('Simplestream ID not found')
json_data = self._download_json(
self._SSMP_URL, display_id,
note='Downloading Simplestream JSON metadata',
errnote='Unable to download Simplestream JSON metadata',
query={
'id': ss_id,
'env': video_data.get('data-env', 'production'),
}, fatal=False)
meta_url = traverse_obj(json_data, ('response', 'api_hostname'))
if not meta_url:
raise ExtractorError('No API host found')
uvid = video_data['data-uvid']
dtype = video_data.get('data-type')
stream_data = self._download_json(
'%s/api/%s/stream/%s' % (meta_url, 'show' if dtype == 'vod' else dtype, uvid),
uvid,
query={
'key': video_data.get('data-key'),
'platform': self._PLATFORM,
},
headers={
'Token': video_data.get('data-token'),
'Token-Expiry': video_data.get('data-expiry'),
'Uvid': uvid,
}, fatal=False)
stream_url = traverse_obj(stream_data, (
'response', 'stream', T(url_or_none)))
if not stream_url:
raise ExtractorError('No stream data/URL')
# now known to be a dict
stream_data = stream_data['response']
drm = stream_data.get('drm')
if drm:
self.report_drm(uvid)
formats = self._extract_m3u8_formats(
stream_url, uvid, ext='mp4', entry_protocol='m3u8_native',
fatal=False)
# exception if no formats
self._sort_formats(formats)
return {
'id': uvid,
'display_id': display_id,
'title': (traverse_obj(stream_data, ('title', T(txt_or_none)))
or self._og_search_title(webpage, default=None)
or display_id.replace('-', ' ').capitalize()),
'description': self._og_search_description(webpage, default=None),
'thumbnail': (traverse_obj(video_data, ('data-poster', T(url_or_none)))
or self._og_search_thumbnail(webpage)),
'formats': formats,
'is_live': (dtype == 'live') or None,
}

View File

@ -28,6 +28,7 @@ from ..utils import (
mimetype2ext,
orderedSet,
parse_duration,
parse_resolution,
sanitized_Request,
smuggle_url,
unescapeHTML,
@ -35,6 +36,7 @@ from ..utils import (
unsmuggle_url,
UnsupportedError,
url_or_none,
urljoin,
xpath_attr,
xpath_text,
xpath_with_ns,
@ -84,7 +86,6 @@ from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE
from .arkena import ArkenaIE
from .instagram import InstagramIE
from .liveleak import LiveLeakIE
from .threeqsdn import ThreeQSDNIE
from .theplatform import ThePlatformIE
from .kaltura import KalturaIE
@ -1629,31 +1630,6 @@ class GenericIE(InfoExtractor):
'upload_date': '20160409',
},
},
# LiveLeak embed
{
'url': 'http://www.wykop.pl/link/3088787/',
'md5': '7619da8c820e835bef21a1efa2a0fc71',
'info_dict': {
'id': '874_1459135191',
'ext': 'mp4',
'title': 'Man shows poor quality of new apartment building',
'description': 'The wall is like a sand pile.',
'uploader': 'Lake8737',
},
'add_ie': [LiveLeakIE.ie_key()],
},
# Another LiveLeak embed pattern (#13336)
{
'url': 'https://milo.yiannopoulos.net/2017/06/concealed-carry-robbery/',
'info_dict': {
'id': '2eb_1496309988',
'ext': 'mp4',
'title': 'Thief robs place where everyone was armed',
'description': 'md5:694d73ee79e535953cf2488562288eee',
'uploader': 'brazilwtf',
},
'add_ie': [LiveLeakIE.ie_key()],
},
# Duplicated embedded video URLs
{
'url': 'http://www.hudl.com/athlete/2538180/highlights/149298443',
@ -2253,6 +2229,116 @@ class GenericIE(InfoExtractor):
# Sibnet embed (https://help.sibnet.ru/?sibnet_video_embed)
'url': 'https://phpbb3.x-tk.ru/bbcode-video-sibnet-t24.html',
'only_matching': True,
}, {
# KVS Player
'url': 'https://www.kvs-demo.com/videos/105/kelis-4th-of-july/',
'info_dict': {
'id': '105',
'display_id': 'kelis-4th-of-july',
'ext': 'mp4',
'title': 'Kelis - 4th Of July',
'thumbnail': r're:https://(?:www\.)?kvs-demo.com/contents/videos_screenshots/0/105/preview.jpg',
},
}, {
# KVS Player
'url': 'https://www.kvs-demo.com/embed/105/',
'info_dict': {
'id': '105',
'display_id': 'kelis-4th-of-july',
'ext': 'mp4',
'title': 'Kelis - 4th Of July / Embed Player',
'thumbnail': r're:https://(?:www\.)?kvs-demo.com/contents/videos_screenshots/0/105/preview.jpg',
},
'params': {
'skip_download': True,
},
}, {
# KVS Player (tested also in thisvid.py)
'url': 'https://youix.com/video/leningrad-zoj/',
'md5': '94f96ba95706dc3880812b27b7d8a2b8',
'info_dict': {
'id': '18485',
'display_id': 'leningrad-zoj',
'ext': 'mp4',
'title': 'Клип: Ленинград - ЗОЖ скачать, смотреть онлайн | Youix.com',
'thumbnail': r're:https://youix.com/contents/videos_screenshots/18000/18485/preview(?:_480x320_youix_com.mp4)?\.jpg',
},
}, {
# KVS Player
'url': 'https://youix.com/embed/18485',
'md5': '94f96ba95706dc3880812b27b7d8a2b8',
'info_dict': {
'id': '18485',
'display_id': 'leningrad-zoj',
'ext': 'mp4',
'title': 'Ленинград - ЗОЖ',
'thumbnail': r're:https://youix.com/contents/videos_screenshots/18000/18485/preview(?:_480x320_youix_com.mp4)?\.jpg',
},
}, {
# KVS Player
'url': 'https://bogmedia.org/videos/21217/40-nochey-40-nights-2016/',
'md5': '94166bdb26b4cb1fb9214319a629fc51',
'info_dict': {
'id': '21217',
'display_id': '40-nochey-2016',
'ext': 'mp4',
'title': '40 ночей (2016) - BogMedia.org',
'description': 'md5:4e6d7d622636eb7948275432eb256dc3',
'thumbnail': 'https://bogmedia.org/contents/videos_screenshots/21000/21217/preview_480p.mp4.jpg',
},
}, {
# KVS Player (for sites that serve kt_player.js via non-https urls)
'url': 'http://www.camhub.world/embed/389508',
'md5': 'fbe89af4cfb59c8fd9f34a202bb03e32',
'info_dict': {
'id': '389508',
'display_id': 'syren-de-mer-onlyfans-05-07-2020have-a-happy-safe-holiday5f014e68a220979bdb8cd-source',
'ext': 'mp4',
'title': 'Syren De Mer onlyfans_05-07-2020Have_a_happy_safe_holiday5f014e68a220979bdb8cd_source / Embed плеер',
'thumbnail': r're:https?://www\.camhub\.world/contents/videos_screenshots/389000/389508/preview\.mp4\.jpg',
},
}, {
'url': 'https://mrdeepfakes.com/video/5/selena-gomez-pov-deep-fakes',
'md5': 'fec4ad5ec150f655e0c74c696a4a2ff4',
'info_dict': {
'id': '5',
'display_id': 'selena-gomez-pov-deep-fakes',
'ext': 'mp4',
'title': 'Selena Gomez POV (Deep Fakes) DeepFake Porn - MrDeepFakes',
'description': 'md5:17d1f84b578c9c26875ac5ef9a932354',
'height': 720,
'age_limit': 18,
},
}, {
'url': 'https://shooshtime.com/videos/284002/just-out-of-the-shower-joi/',
'md5': 'e2f0a4c329f7986280b7328e24036d60',
'info_dict': {
'id': '284002',
'display_id': 'just-out-of-the-shower-joi',
'ext': 'mp4',
'title': 'Just Out Of The Shower JOI - Shooshtime',
'height': 720,
'age_limit': 18,
},
}, {
# would like to use the yt-dl test video but searching for
# '"\'/\\ä↭𝕐' fails, so using an old vid from YouTube Korea
'note': 'Test default search',
'url': 'Shorts로 허락 필요없이 놀자! (BTS편)',
'info_dict': {
'id': 'usDGO4Zb-dc',
'ext': 'mp4',
'title': 'YouTube Shorts로 허락 필요없이 놀자! (BTS편)',
'description': 'md5:96e31607eba81ab441567b5e289f4716',
'upload_date': '20211107',
'uploader': 'YouTube Korea',
'location': '대한민국',
},
'params': {
'default_search': 'ytsearch',
'skip_download': True,
},
'expected_warnings': ['uploader id'],
},
]
@ -2358,6 +2444,88 @@ class GenericIE(InfoExtractor):
'title': title,
}
def _extract_kvs(self, url, webpage, video_id):
def getlicensetoken(license):
modlicense = license.replace('$', '').replace('0', '1')
center = int(len(modlicense) / 2)
fronthalf = int(modlicense[:center + 1])
backhalf = int(modlicense[center:])
modlicense = compat_str(4 * abs(fronthalf - backhalf))
def parts():
for o in range(0, center + 1):
for i in range(1, 5):
yield compat_str((int(license[o + i]) + int(modlicense[o])) % 10)
return ''.join(parts())
def getrealurl(video_url, license_code):
if not video_url.startswith('function/0/'):
return video_url # not obfuscated
url_path, _, url_query = video_url.partition('?')
urlparts = url_path.split('/')[2:]
license = getlicensetoken(license_code)
newmagic = urlparts[5][:32]
def spells(x, o):
l = (o + sum(int(n) for n in license[o:])) % 32
for i in range(0, len(x)):
yield {l: x[o], o: x[l]}.get(i, x[i])
for o in range(len(newmagic) - 1, -1, -1):
newmagic = ''.join(spells(newmagic, o))
urlparts[5] = newmagic + urlparts[5][32:]
return '/'.join(urlparts) + '?' + url_query
flashvars = self._search_regex(
r'(?s)<script\b[^>]*>.*?var\s+flashvars\s*=\s*(\{.+?\});.*?</script>',
webpage, 'flashvars')
flashvars = self._parse_json(flashvars, video_id, transform_source=js_to_json)
# extract the part after the last / as the display_id from the
# canonical URL.
display_id = self._search_regex(
r'(?:<link href="https?://[^"]+/(.+?)/?" rel="canonical"\s*/?>'
r'|<link rel="canonical" href="https?://[^"]+/(.+?)/?"\s*/?>)',
webpage, 'display_id', fatal=False
)
title = self._html_search_regex(r'<(?:h1|title)>(?:Video: )?(.+?)</(?:h1|title)>', webpage, 'title')
thumbnail = flashvars['preview_url']
if thumbnail.startswith('//'):
protocol, _, _ = url.partition('/')
thumbnail = protocol + thumbnail
url_keys = list(filter(re.compile(r'^video_(?:url|alt_url\d*)$').match, flashvars.keys()))
formats = []
for key in url_keys:
if '/get_file/' not in flashvars[key]:
continue
format_id = flashvars.get(key + '_text', key)
formats.append(merge_dicts(
parse_resolution(format_id) or parse_resolution(flashvars[key]), {
'url': urljoin(url, getrealurl(flashvars[key], flashvars['license_code'])),
'format_id': format_id,
'ext': 'mp4',
'http_headers': {'Referer': url},
}))
if not formats[-1].get('height'):
formats[-1]['quality'] = 1
self._sort_formats(formats)
return {
'id': flashvars['video_id'],
'display_id': display_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}
def _real_extract(self, url):
if url.startswith('//'):
return self.url_result(self.http_scheme() + url)
@ -2566,9 +2734,16 @@ class GenericIE(InfoExtractor):
# but actually don't.
AGE_LIMIT_MARKERS = [
r'Proudly Labeled <a href="http://www\.rtalabel\.org/" title="Restricted to Adults">RTA</a>',
r'>[^<]*you acknowledge you are at least (\d+) years old',
r'>\s*(?:18\s+U(?:\.S\.C\.|SC)\s+)?(?:§+\s*)?2257\b',
]
if any(re.search(marker, webpage) for marker in AGE_LIMIT_MARKERS):
age_limit = 18
for marker in AGE_LIMIT_MARKERS:
m = re.search(marker, webpage)
if not m:
continue
age_limit = max(
age_limit or 0,
int_or_none(m.groups() and m.group(1), default=18))
# video uploader is domain name
video_uploader = self._search_regex(
@ -3179,11 +3354,6 @@ class GenericIE(InfoExtractor):
return self.url_result(
self._proto_relative_url(instagram_embed_url), InstagramIE.ie_key())
# Look for LiveLeak embeds
liveleak_urls = LiveLeakIE._extract_urls(webpage)
if liveleak_urls:
return self.playlist_from_matches(liveleak_urls, video_id, video_title)
# Look for 3Q SDN embeds
threeqsdn_url = ThreeQSDNIE._extract_url(webpage)
if threeqsdn_url:
@ -3420,6 +3590,20 @@ class GenericIE(InfoExtractor):
info_dict['formats'] = formats
return info_dict
# Look for generic KVS player (before ld+json for tests)
found = self._search_regex(
(r'<script\b[^>]+?\bsrc\s*=\s*(["\'])https?://(?:\S+?/)+kt_player\.js\?v=(?P<ver>\d+(?:\.\d+)+)\1[^>]*>',
# kt_player('kt_player', 'https://i.shoosh.co/player/kt_player.swf?v=5.5.1', ...
r'kt_player\s*\(\s*(["\'])(?:(?!\1)[\w\W])+\1\s*,\s*(["\'])https?://(?:\S+?/)+kt_player\.swf\?v=(?P<ver>\d+(?:\.\d+)+)\2\s*,',
), webpage, 'KVS player', group='ver', default=False)
if found:
self.report_extraction('%s: KVS Player' % (video_id, ))
if found.split('.')[0] not in ('4', '5', '6'):
self.report_warning('Untested major version (%s) in player engine - download may fail.' % (found, ))
return merge_dicts(
self._extract_kvs(url, webpage, video_id),
info_dict)
# Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld(
webpage, video_id, default={}, expected_type='VideoObject')

View File

@ -0,0 +1,273 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
join_nonempty,
merge_dicts,
parse_duration,
str_or_none,
T,
traverse_obj,
unified_strdate,
unified_timestamp,
urlhandle_detect_ext,
)
class GlobalPlayerBaseIE(InfoExtractor):
def _get_page_props(self, url, video_id):
webpage = self._download_webpage(url, video_id)
return self._search_nextjs_data(webpage, video_id)['props']['pageProps']
def _request_ext(self, url, video_id):
return urlhandle_detect_ext(self._request_webpage( # Server rejects HEAD requests
url, video_id, note='Determining source extension'))
@staticmethod
def _clean_desc(x):
x = clean_html(x)
if x:
x = x.replace('\xa0', ' ')
return x
def _extract_audio(self, episode, series):
return merge_dicts({
'vcodec': 'none',
}, traverse_obj(series, {
'series': 'title',
'series_id': 'id',
'thumbnail': 'imageUrl',
'uploader': 'itunesAuthor', # podcasts only
}), traverse_obj(episode, {
'id': 'id',
'description': ('description', T(self._clean_desc)),
'duration': ('duration', T(parse_duration)),
'thumbnail': 'imageUrl',
'url': 'streamUrl',
'timestamp': (('pubDate', 'startDate'), T(unified_timestamp)),
'title': 'title',
}, get_all=False), rev=True)
class GlobalPlayerLiveIE(GlobalPlayerBaseIE):
_VALID_URL = r'https?://www\.globalplayer\.com/live/(?P<id>\w+)/\w+'
_TESTS = [{
'url': 'https://www.globalplayer.com/live/smoothchill/uk/',
'info_dict': {
'id': '2mx1E',
'ext': 'aac',
'display_id': 'smoothchill-uk',
'title': 're:^Smooth Chill.+$',
'thumbnail': 'https://herald.musicradio.com/media/f296ade8-50c9-4f60-911f-924e96873620.png',
'description': 'Music To Chill To',
# 'live_status': 'is_live',
'is_live': True,
},
}, {
# national station
'url': 'https://www.globalplayer.com/live/heart/uk/',
'info_dict': {
'id': '2mwx4',
'ext': 'aac',
'description': 'turn up the feel good!',
'thumbnail': 'https://herald.musicradio.com/media/49b9e8cb-15bf-4bf2-8c28-a4850cc6b0f3.png',
# 'live_status': 'is_live',
'is_live': True,
'title': 're:^Heart UK.+$',
'display_id': 'heart-uk',
},
}, {
# regional variation
'url': 'https://www.globalplayer.com/live/heart/london/',
'info_dict': {
'id': 'AMqg',
'ext': 'aac',
'thumbnail': 'https://herald.musicradio.com/media/49b9e8cb-15bf-4bf2-8c28-a4850cc6b0f3.png',
'title': 're:^Heart London.+$',
# 'live_status': 'is_live',
'is_live': True,
'display_id': 'heart-london',
'description': 'turn up the feel good!',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
station = self._get_page_props(url, video_id)['station']
stream_url = station['streamUrl']
return merge_dicts({
'id': station['id'],
'display_id': (
join_nonempty('brandSlug', 'slug', from_dict=station)
or station.get('legacyStationPrefix')),
'url': stream_url,
'ext': self._request_ext(stream_url, video_id),
'vcodec': 'none',
'is_live': True,
}, {
'title': self._live_title(traverse_obj(
station, (('name', 'brandName'), T(str_or_none)),
get_all=False)),
}, traverse_obj(station, {
'description': 'tagline',
'thumbnail': 'brandLogo',
}), rev=True)
class GlobalPlayerLivePlaylistIE(GlobalPlayerBaseIE):
_VALID_URL = r'https?://www\.globalplayer\.com/playlists/(?P<id>\w+)'
_TESTS = [{
# "live playlist"
'url': 'https://www.globalplayer.com/playlists/8bLk/',
'info_dict': {
'id': '8bLk',
'ext': 'aac',
# 'live_status': 'is_live',
'is_live': True,
'description': r're:(?s).+\bclassical\b.+\bClassic FM Hall [oO]f Fame\b',
'thumbnail': 'https://images.globalplayer.com/images/551379?width=450&signature=oMLPZIoi5_dBSHnTMREW0Xg76mA=',
'title': 're:Classic FM Hall of Fame.+$'
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
station = self._get_page_props(url, video_id)['playlistData']
stream_url = station['streamUrl']
return merge_dicts({
'id': video_id,
'url': stream_url,
'ext': self._request_ext(stream_url, video_id),
'vcodec': 'none',
'is_live': True,
}, traverse_obj(station, {
'title': 'title',
'description': ('description', T(self._clean_desc)),
'thumbnail': 'image',
}), rev=True)
class GlobalPlayerAudioIE(GlobalPlayerBaseIE):
_VALID_URL = r'https?://www\.globalplayer\.com/(?:(?P<podcast>podcasts)/|catchup/\w+/\w+/)(?P<id>\w+)/?(?:$|[?#])'
_TESTS = [{
# podcast
'url': 'https://www.globalplayer.com/podcasts/42KuaM/',
'playlist_mincount': 5,
'info_dict': {
'id': '42KuaM',
'title': 'Filthy Ritual',
'thumbnail': 'md5:60286e7d12d795bd1bbc9efc6cee643e',
'categories': ['Society & Culture', 'True Crime'],
'uploader': 'Global',
'description': r're:(?s).+\bscam\b.+?\bseries available now\b',
},
}, {
# radio catchup
'url': 'https://www.globalplayer.com/catchup/lbc/uk/46vyD7z/',
'playlist_mincount': 2,
'info_dict': {
'id': '46vyD7z',
'description': 'Nick Ferrari At Breakfast is Leading Britain\'s Conversation.',
'title': 'Nick Ferrari',
'thumbnail': 'md5:4df24d8a226f5b2508efbcc6ae874ebf',
},
}]
def _real_extract(self, url):
video_id, podcast = self._match_valid_url(url).group('id', 'podcast')
props = self._get_page_props(url, video_id)
series = props['podcastInfo'] if podcast else props['catchupInfo']
return merge_dicts({
'_type': 'playlist',
'id': video_id,
'entries': [self._extract_audio(ep, series) for ep in traverse_obj(
series, ('episodes', lambda _, v: v['id'] and v['streamUrl']))],
'categories': traverse_obj(series, ('categories', Ellipsis, 'name')) or None,
}, traverse_obj(series, {
'description': ('description', T(self._clean_desc)),
'thumbnail': 'imageUrl',
'title': 'title',
'uploader': 'itunesAuthor', # podcasts only
}), rev=True)
class GlobalPlayerAudioEpisodeIE(GlobalPlayerBaseIE):
_VALID_URL = r'https?://www\.globalplayer\.com/(?:(?P<podcast>podcasts)|catchup/\w+/\w+)/episodes/(?P<id>\w+)/?(?:$|[?#])'
_TESTS = [{
# podcast
'url': 'https://www.globalplayer.com/podcasts/episodes/7DrfNnE/',
'info_dict': {
'id': '7DrfNnE',
'ext': 'mp3',
'title': 'Filthy Ritual - Trailer',
'description': 'md5:1f1562fd0f01b4773b590984f94223e0',
'thumbnail': 'md5:60286e7d12d795bd1bbc9efc6cee643e',
'duration': 225.0,
'timestamp': 1681254900,
'series': 'Filthy Ritual',
'series_id': '42KuaM',
'upload_date': '20230411',
'uploader': 'Global',
},
}, {
# radio catchup
'url': 'https://www.globalplayer.com/catchup/lbc/uk/episodes/2zGq26Vcv1fCWhddC4JAwETXWe/',
'only_matching': True,
# expired: refresh the details with a current show for a full test
'info_dict': {
'id': '2zGq26Vcv1fCWhddC4JAwETXWe',
'ext': 'm4a',
'timestamp': 1682056800,
'series': 'Nick Ferrari',
'thumbnail': 'md5:4df24d8a226f5b2508efbcc6ae874ebf',
'upload_date': '20230421',
'series_id': '46vyD7z',
'description': 'Nick Ferrari At Breakfast is Leading Britain\'s Conversation.',
'title': 'Nick Ferrari',
'duration': 10800.0,
},
}]
def _real_extract(self, url):
video_id, podcast = self._match_valid_url(url).group('id', 'podcast')
props = self._get_page_props(url, video_id)
episode = props['podcastEpisode'] if podcast else props['catchupEpisode']
return self._extract_audio(
episode, traverse_obj(episode, 'podcast', 'show', expected_type=dict) or {})
class GlobalPlayerVideoIE(GlobalPlayerBaseIE):
_VALID_URL = r'https?://www\.globalplayer\.com/videos/(?P<id>\w+)'
_TESTS = [{
'url': 'https://www.globalplayer.com/videos/2JsSZ7Gm2uP/',
'info_dict': {
'id': '2JsSZ7Gm2uP',
'ext': 'mp4',
'description': 'md5:6a9f063c67c42f218e42eee7d0298bfd',
'thumbnail': 'md5:d4498af48e15aae4839ce77b97d39550',
'upload_date': '20230420',
'title': 'Treble Malakai Bayoh sings a sublime Handel aria at Classic FM Live',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
meta = self._get_page_props(url, video_id)['videoData']
return merge_dicts({
'id': video_id,
}, traverse_obj(meta, {
'url': 'url',
'thumbnail': ('image', 'url'),
'title': 'title',
'upload_date': ('publish_date', T(unified_strdate)),
'description': 'description',
}), rev=True)

Some files were not shown because too many files have changed in this diff Show More